{"id":21424,"date":"2013-10-23T17:00:35","date_gmt":"2013-10-23T22:00:35","guid":{"rendered":"http:\/\/www.shamusyoung.com\/twentysidedtale\/?p=21424"},"modified":"2015-07-01T04:48:41","modified_gmt":"2015-07-01T09:48:41","slug":"project-good-robot-26-shader-shenanigans","status":"publish","type":"post","link":"https:\/\/www.shamusyoung.com\/twentysidedtale\/?p=21424","title":{"rendered":"Project Good Robot 26: Shader Shenanigans"},"content":{"rendered":"<p>It turns out that the most capricious, buggy, unpredictable, and mysterious part of my program is the vertex shader.  This is frustrating because it&#8217;s the thing I know the least about, the thing with the least documentation, and the thing that&#8217;s hardest to debug. Of course, these are also the reasons the problem exists in the first place.<\/p>\n<p>The vertex shader malfunctions on about half of the testing machines, and it malfunctions in a different way in each place. There&#8217;s no pattern to these behaviors as far as I can tell. It works fine on one Linux machine, and malfunctions on another. Fine on one XP machine, not on the other. Works great on my Windows 7 machine, but goes crazy on another. Maybe I could nail something down if I started collecting data on driver versions and graphics card manufacturers, but that&#8217;s probably not a good way to spend my time. I mean, if I discover the problem is with laptops using ATI cards, that doesn&#8217;t help me solve the problem. I&#8217;m not John Carmack and I don&#8217;t have encyclopedic knowledge of all the various drivers and their quirks and exceptions. For me, knowing where the problem happens doesn&#8217;t put me any closer to the solution.<\/p>\n<p>I need the game to run properly on all these machines, and the stuff I&#8217;m doing is so stone-age simple that there&#8217;s no reason for this chaos.  It should just work. 
Barring that, I would hope the system could at least have the decency to fail predictably.<\/p>\n<p>I&#8217;m sure all of these problems are either from bugs in my shader code or from holes in the documentation. But I can&#8217;t find the problem if I don&#8217;t understand it.<\/p>\n<p>I&#8217;ve described how shaders work before.  You might <a href=\"?p=15956\" title=\"Project Octant Part 11: Shaders\">remember<\/a> this diagram:<\/p>\n<p><!--more--><table   class=\"\" cellpadding='0' cellspacing='0' border='0' align='center'><tr><td><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/octant11_1.png' class='insetimage'   alt='octant11_1.png' title='octant11_1.png'\/><\/td><\/tr><\/table><\/p>\n<p>So at one point I had a bug in my shader where it was passing along the red, green, and blue values of the polygon, but NOT passing along the alpha value. The outcomes:<\/p>\n<ol>\n<li>My machine: Shader just silently forced all alpha values to 1, thus rendering everything. The only hint that anything was amiss was that smoke particles weren&#8217;t fading out the way I expected.\n<li>User #1: Shader silently forced all alpha values to zero, thus not rendering anything.\n<li>User #2: Alpha had a random value, resulting in sprites flickering on and off randomly like strobe lights.\n<li>User #3: Shader apparently used one of the given color values for the alpha channel, so the brighter the color, the more opaque the polygon.\n<\/ol>\n<p>One bug, four divergent outcomes. I managed to track that one down, but I&#8217;m still getting random behavior. <\/p>\n<p>Part of the problem is that there are a half dozen ways to do everything, and it&#8217;s never clear which is the &#8220;right&#8221; way.  We&#8217;ve got different graphics cards, different drivers, and different versions of the OpenGL Shading Language (GLSL). 
New cards add new ways of doing things, which results in new stuff being added to the language, which results in situations where sometimes things are REMOVED from the language, which results in <acronym title=\"No... This. Is. MADNESS!\">Sparta<\/acronym>. <\/p>\n<p>I&#8217;ve been trying to read up on the subject, but it&#8217;s hard to make any headway. The shading language has gone through a few revisions, so old example code might not compile, or it might compile with warnings, or it might compile but malfunction. Tutorials seem to begin mid-stream instead of introducing first concepts and building up. <\/p>\n<p>Actually, this is my biggest complaint. I really feel like the &#8220;Hello World&#8221; of vertex shaders should be a short example that simply reproduces the fixed-function pipeline (what you get when not using shaders) and is compatible with all versions of the shader language. I&#8217;ve googled around, and all I can find are non-working \/ incomplete examples, dead links, and <a href=\"?p=21365\" title=\"How to Forum\">jackasses<\/a> asking &#8220;Why would you want to do that?&#8221;<\/p>\n<p>Yes, there are books on the topic, but my shader program is so simple I really hate to purchase and read a friggin&#8217; textbook just to shove some 2D quads around the screen. That&#8217;s like getting a medical degree so you can take aspirin. <\/p>\n<p>Basically, I&#8217;m rendering thousands of squares. For convenience, each sprite has a position, a rotation, and a size.  Now, I can do math to turn these values into the corners of a square, but that&#8217;s a lot of trig to give to your CPU. When we have thousands of particles and each particle has 4 corners, we end up doing a lot of sine and cosine operations to figure out where all the squares belong. 
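<\/p>\n<p>For reference, here is a minimal sketch of the fixed-function &#8220;Hello World&#8221; described above, sized for unlit 2D quads like these. It assumes a compatibility context and GLSL 1.20, and it only covers color, texture coordinate 0, and position (no lighting, fog, or texture matrix). It also cannot be compatible with every version of the language: the <code>gl_Vertex<\/code>, <code>gl_Color<\/code>, and <code>gl_ModelViewProjectionMatrix<\/code> built-ins it relies on were deprecated in GLSL 1.30 and are absent from core profiles.<\/p>\n<pre lang=\"glsl\">\r\n#version 120\r\n\/\/ Pass-through vertex shader: roughly what the fixed-function pipeline\r\n\/\/ does for unlit, textured 2D quads (compatibility profile only).\r\nvoid main()\r\n{\r\n  \/\/ Copy all four channels so alpha is never left undefined.\r\n  gl_FrontColor = gl_Color;\r\n  \/\/ Pass the texture coordinate for unit 0 through (texture matrix not applied).\r\n  gl_TexCoord[0] = gl_MultiTexCoord0;\r\n  \/\/ The same model-view-projection transform that fixed function applies.\r\n  gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\r\n}\r\n<\/pre>\n<p>In GLSL 1.20 the last line could also be written <code>gl_Position = ftransform();<\/code>, which is meant to produce exactly the position the fixed-function pipeline would.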
<\/p>\n<p>This is what sprites look like if I draw boxes around them:<\/p>\n<p><table   class=\"\" cellpadding='0' cellspacing='0' border='0' align='center'><tr><td><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/gr27_sprites.jpg' class='insetimage'   alt='gr27_sprites.jpg' title='gr27_sprites.jpg'\/><\/td><\/tr><\/table><\/p>\n<p>In my profiling, it looks like the vast majority of my time is spent generating these squares and shoveling them at the graphics hardware. It&#8217;s the only part of my program where I&#8217;ve made any effort to speed things up, and it&#8217;s still the slowest part of the game. AI and collision detection are a distant second and third. They&#8217;re so distant that it&#8217;s actually tough to tell which one is second and which one is third. The CPU usage of building sprites is so big and noisy that measuring the other two is like trying to figure out if your cell phone light is brighter than a candle when you&#8217;re standing in direct sunlight. (To be fair, my performance profiling tools are really rudimentary. I mean, my clock only has millisecond resolution, so I need to take a lot of measurements over many frames. Also, a LOT of CPU time goes unmeasured if it&#8217;s happening in things like audio-processing threads.)<\/p>\n<p>With a shader, we can simply hand all of that annoying and CPU-sucking math off to the graphics card.  The graphics card is barely working, and the CPU has lots of stuff to worry about already. Let&#8217;s just send it a simple 1&#215;1 rectangle along with the position, scale, and rotation values and let the hardware do the math.<\/p>\n<p>Imagine our CPU and graphics card are roommates. Our CPU is Alice. Alice is a healthy but underweight lady who has many jobs working as a sound engineer, musician, animator, file clerk, garbage collector, and analyst. Her roommate is AN ENTIRE FOOTBALL TEAM, who is currently unemployed except for some occasional one-man jobs that show up on the weekend. 
<\/p>\n<p>Then a new job opening appears: Moving heavy appliances. Who should take the job? Alice, or FOOTBALL TEAM? <\/p>\n<p>This shader is my attempt to pass the work off to FOOTBALL TEAM instead of just dumping more work on Alice, but it&#8217;s turning into a time-sucking circus of bugs and frustration. This one feature accounts for over half my bug reports for the last few builds I&#8217;ve done. <\/p>\n<p>For the curious out there, this is the shader I&#8217;m working with:<\/p>\n<pre lang=\"glsl\" line=\"1\">\r\n#version 120 \r\n#define TEX0\t\t\tgl_TexCoord[0]\r\n\r\nattribute float\tattrib_angle;\r\nattribute float\tattrib_scale;\r\nattribute vec3\t\tattrib_position;\r\nattribute vec3\t\tattrib_atlas;\r\n\r\nvoid main()\r\n{\r\n  vec4       vert;\r\n  float      texture_unit;\r\n  float      rad;\r\n  vec2       rotate;\r\n  \r\n  \/\/Color is pass-through.\r\n  gl_FrontColor.rgba = gl_Color.rgba;\r\n  \/\/Atlas pos contains the column, row, and scale of our sprite TEXTURE.\r\n  texture_unit = (1.0 \/ 32.0) * attrib_atlas.z; \/\/Default grid is 32x32.\r\n  TEX0.xy = (attrib_atlas.xy + gl_MultiTexCoord0.xy) * texture_unit;\r\n  TEX0.y = 1.0 - TEX0.y; \/\/Because OpenGL thinks upside-down.\r\n  \/\/Finally, prepare the vertex for the frag shader\r\n  vert = gl_Vertex;\r\n  rad = radians (attrib_angle);\r\n  rotate.x = sin (rad);\r\n  rotate.y = cos (rad);\r\n  vert.x = gl_Vertex.x * rotate.y - gl_Vertex.y * rotate.x;\r\n  vert.y = gl_Vertex.x * rotate.x + gl_Vertex.y * rotate.y;\r\n  vert.xy *= attrib_scale;\r\n  vert.xyz += attrib_position;\r\n  gl_Position = gl_ModelViewProjectionMatrix * vert;\r\n}\r\n<\/pre>\n<p>Sigh. 32 lines of non-branching code. It&#8217;s just some simple math. It should not be this mysterious and difficult thing. 
<\/p>\n<p>(For the curious, the bug I mentioned earlier was line 17, where it originally said:<\/p>\n<pre lang=\"glsl\" line=\"17\">\r\n  gl_FrontColor.rgb = gl_Color.rgb;\r\n<\/pre>\n<p>That&#8217;s pretty subtle, and I could have any number of bugs like that in my code, where undefined behavior will work fine on one machine and cause chaos on another.)<\/p>\n<p>On one hand, using a shader is the right thing to do, performance-wise. On the other hand, this is a massive time-sink and the project would be much further along if I wasn&#8217;t wasting so much time on it. I have no way of knowing if the next round of changes will clear things up until I send it out to my testers, and when they report visual glitches I don&#8217;t know what to make of them because it&#8217;s all random. <\/p>\n<p>I have features I could be writing, and the playtester feedback would be much more useful if they could actually play the game. <\/p>\n<p>So this is a frustrating spot to be in.  Do I waste more time doing it right, or cut corners and have an inefficient game that requires a lot more horsepower than it should? I&#8217;ve taken such pains to make a lightweight, trim, and efficient little program. It would kill me to ruin all of that now. It would also kill me to spend more days on this crap. <\/p>\n<p>I&#8217;ve actually re-written the shader this week.  If it&#8217;s still wonky I&#8217;ll just disable it and re-visit this topic later in the project. <\/p>\n<p>EDIT: My new shader failed on the first machine. <\/p>\n<p><code> 0:6(22): error: 'in' qualifier in declaration of 'attrib_angle' only valid for function parameters in GLSL 1.20<\/code><\/p>\n<p>So it works fine on my machine and compiles with no warnings, but is an INVALID program on another. This error is worded in such a way that indicates it shouldn&#8217;t compile OR work ANYWHERE. <\/p>\n<p>So that&#8217;s horrible. 
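<\/p>\n<p>A probable explanation for that error: in GLSL 1.20, per-vertex inputs are declared with the <code>attribute<\/code> qualifier and vertex-to-fragment outputs with <code>varying<\/code>; global <code>in<\/code> and <code>out<\/code> declarations only became legal in GLSL 1.30. A strict compiler is therefore within its rights to reject <code>in<\/code> under <code>#version 120<\/code>, while a more lenient driver may accept it without a warning. A minimal sketch of the 1.20 form, reusing the attribute names and the rotate, scale, and translate math from the listing above, with the texture-atlas lookup left out for brevity (illustrative, not a drop-in replacement):<\/p>\n<pre lang=\"glsl\">\r\n#version 120\r\n\/\/ GLSL 1.20: vertex inputs use the attribute qualifier, not in.\r\nattribute float attrib_angle;\r\nattribute float attrib_scale;\r\nattribute vec3  attrib_position;\r\n\r\nvoid main()\r\n{\r\n  \/\/ Pass the full RGBA color and texture coordinate 0 through.\r\n  gl_FrontColor = gl_Color;\r\n  gl_TexCoord[0] = gl_MultiTexCoord0;\r\n  \/\/ Rotate the unit quad about its origin, scale it, then move it into place.\r\n  float rad = radians(attrib_angle);\r\n  float s = sin(rad);\r\n  float c = cos(rad);\r\n  vec4 vert = gl_Vertex;\r\n  vert.x = gl_Vertex.x * c - gl_Vertex.y * s;\r\n  vert.y = gl_Vertex.x * s + gl_Vertex.y * c;\r\n  vert.xy *= attrib_scale;\r\n  vert.xyz += attrib_position;\r\n  gl_Position = gl_ModelViewProjectionMatrix * vert;\r\n}\r\n<\/pre>\n<p>On the CPU side, these attributes still have to be wired up by name, for example with <code>glBindAttribLocation<\/code> before linking or <code>glGetAttribLocation<\/code> after linking.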
<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It turns out that the most capricious, buggy, unpredictable, and mysterious part of my program is the vertex shader. This is frustrating because it&#8217;s the thing I know the least about, the thing with the least documentation, and the thing that&#8217;s hardest to debug. Of course, these are also the reasons the problem exists in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[498],"tags":[],"class_list":["post-21424","post","type-post","status-publish","format-standard","hentry","category-good-robot"],"_links":{"self":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/21424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=21424"}],"version-history":[{"count":0,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/21424\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=21424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=21424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=21424"}],"curies":[{"name":"wp","href":"https:\/\/ap
i.w.org\/{rel}","templated":true}]}}