{"id":48683,"date":"2019-12-05T06:00:56","date_gmt":"2019-12-05T11:00:56","guid":{"rendered":"https:\/\/www.shamusyoung.com\/twentysidedtale\/?p=48683"},"modified":"2019-12-05T09:06:36","modified_gmt":"2019-12-05T14:06:36","slug":"programming-vexations-part-12-soa-vs-aos","status":"publish","type":"post","link":"https:\/\/www.shamusyoung.com\/twentysidedtale\/?p=48683","title":{"rendered":"Programming Vexations Part 12: SOA vs. AOS"},"content":{"rendered":"<p>In previous entries, I talked about how you sometimes need to worry about small blocks of memory. In some cases, you may even want to worry about how the data is arranged in memory. To show how this works, let&#8217;s look at our old friend the space marine:<\/p>\n<h3>Space Marines<\/h3>\n<pre lang=\"cpp\">\r\nclass SpaceMarine\r\n{\r\n  bool dead;\r\n  char[8] name;\r\n  Vector3 position;\r\n  float stubble;\r\n  short bullets;\r\n  short kills;\r\n  short dead_wives;\r\n  bool cigar;\r\n};<\/pre>\n<p>This is a contrived example, but it&#8217;s good enough for illustration. This is what the data structure looks like for a single space marine. Here is that same list of fields, except I&#8217;ve color-coded them and made boxes illustrating how much memory each field takes up:<\/p>\n<p><!--more--><\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_layout1.png' width=100% alt='' title=''\/><\/div><div class='mouseover-alt'><\/div><\/p>\n<p>You can see that the name takes up 8 bytes of data. The position is a collection of X, Y, and Z values. Each of those is 4 bytes, so the total size of position is 12. We&#8217;re using 4 bytes to store how much chin stubble he has and 2 bytes for his bullet supply. You get the idea. If you add these all up, we need 32 bytes to store a single marine.\u00a0<\/p>\n<p>When these are stored in memory, they&#8217;ll be packed together like so:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_layout2.png' width=100% alt='' title=''\/><\/div><div class='mouseover-alt'><\/div><\/p>\n<p>Just remember that memory addresses aren&#8217;t arranged in a grid like this. Memory is a linear thing, which would mean it&#8217;s just one row that stretches on for billions of bytes. But that&#8217;s kind of hard to show in an image, so I&#8217;m having the memory wrap to another line to make this readable.<\/p>\n<p>Conceptually, this is how programmers want to think of a space marine. He&#8217;s this self-contained thing and we manipulate his data every frame to make gameplay stuff happen. We run some sort of game update on him and change his position, subtract some bullets as he shoots stuff, add to his kill counter, and maybe grow out his chin stubble a little. Whatever.\u00a0<\/p>\n<p>But this probably isn&#8217;t the most common use-case for a space marine. This most common thing isn&#8217;t updating the marine, but checking for interactions with other objects in the scene.<\/p>\n<h3>The Most Common Operation<\/h3>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_marines.jpg' width=100% alt='Judging by Gears of War, I&apos;m betting the most common operation is Marine.Shout()' title='Judging by Gears of War, I&apos;m betting the most common operation is Marine.Shout()'\/><\/div><div class='mouseover-alt'>Judging by Gears of War, I&apos;m betting the most common operation is Marine.Shout()<\/div><\/p>\n<p>Let&#8217;s say in our scene, we&#8217;ve got a bunch of shit flying around that needs to interact with space marines. Bullets and lasers need to damage them<span class='snote' title='1'>After creating the above diagrams, I&#8217;m finally noticing I didn&#8217;t give the marines a hitpoint variable. So I guess all hits are an insta-kill in this game?<\/span>. Explosions need to shove them around. Debris and physics objects need to bounce off of them. Particles need to avoid passing through them. Crushy machinery should squish them. Vehicles should run them over. Landmines should go off if a space marine stands on them. And so on.<\/p>\n<p>So every frame, every one of these objects needs to crank through the entire list of space marines and test to see where it is so it can decide if an interaction is needed. For the vast majority of these tests, the answer will be &#8220;no&#8221;.\u00a0<\/p>\n<p>Like I said earlier in this series, your processor has a tiny pool of memory called the L1 cache. If the L1 cache is 64 bytes, then it can hold exactly 2 space marines at a time. So as all those game objects are cranking through the list, it will look like this:<\/p>\n<p><tt>check SpaceMarine #1<\/p>\n<p>check SpaceMarine #2<\/p>\n<p>(cache miss)<\/p>\n<p>check SpaceMarine #3<\/p>\n<p>check SpaceMarine #4<\/p>\n<p>(cache miss)<\/p>\n<p>check SpaceMarine #5<\/p>\n<p>check SpaceMarine #6<\/p>\n<p>(cache miss)<\/tt><\/p>\n<p>And so on&#8230;<\/p>\n<p>If there are 16 SpaceMarines and 100 gameplay objects that need to interact with them, then you need to crank through the list of marines 100 times. That means we&#8217;re going to do 1,600 position checks, which will result in 800 cache misses<span class='snote' title='2'>Actually, the L2 cache is really important and will help us out a lot here, but I&#8217;m trying to keep this as simple as possible.<\/span>.<\/p>\n<p>These cache misses are invisible in the code. The programmer doesn&#8217;t write a line of code saying to get another block of memory. That&#8217;s all handled by the compiler. They just write a couple of lines to loop through all the space marines and check their positions. This invisibility is one of the reasons that programmers like me are often blind to their true cost. The thinking will go something like this:<\/p>\n<p>&#8220;Man, all these landmines are doing a distance calculation every frame, and that involves doing a square root. Those are expensive! Maybe I should optimize that somehow.&#8221; But meanwhile the square root operation is a minority of the cost. Most processor cycles are being spent idle while waiting for things to arrive from main memory.<\/p>\n<h3>So What is the Programmer Supposed to Do?<\/h3>\n<p>Here&#8217;s the problem you&#8217;re trying to solve:<\/p>\n<p>All of your various game objects (landmines, bullets, explosions, etc) need to do per-frame checks to see if they&#8217;re interacting with space marines. Every object needs to check itself against every marine. At this stage, you don&#8217;t care about anything except the marine&#8217;s position. The problem is that you&#8217;ve got all this other data &#8211; like the marine&#8217;s name and number of wives &#8211; mixed in with those position values. You don&#8217;t care about that data, but it winds up taking up space in the cache and causing performance-destroying cache misses.<\/p>\n<p>A list of three SpaceMarines would look like this in memory:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_layout3.png' width=100% alt='' title=''\/><\/div><div class='mouseover-alt'><\/div><\/p>\n<p>The long distance between the address of the position variable for SpaceMarine #1 and the position variable of SpaceMarine #2 is the problem. Keep in mind that this is a <b>very<\/b> simplified structure. In GoodRobot, each robot had <b>dozens<\/b> of properties. And that was just a simple 2D game. Your real-world 3D game is going to have a huge distance between the position of SpaceMarine #1 and the position of SpaceMarine #2. Very likely, you&#8217;ll get a cache miss for every one.<\/p>\n<p>I&#8217;ve never worked on a AAA project like this, but based on discussions I get the sense that the most common way to handle this is to copy the data into an array.\u00a0 Just before you do your checks, you copy all the position values into a temporary buffer. Then you do your checks against the buffer. In the rare<span class='snote' title='3'>Rare from the program&#8217;s point of view. To the player, bullets are hitting marines CONSTANTLY. To the program, it did tens of thousands of checks before it found a case where a bullet hit somebody.<\/span> occasion that a bullet actually hits someone, it can then interact with the full SpaceMarine data structure to see what happens.<\/p>\n<p>The problem is that there&#8217;s an overhead to copying this data into a buffer. You&#8217;re now storing the same data in two different places. If the position of a SpaceMarine changes<span class='snote' title='4'>Explosions are notoriously shove-y.<\/span> then you have to synchronize the data with the temp buffer or you might get weird behavior. This adds a lot of complexity to the code, particularly if you&#8217;re interacting with many different data values, some of those values need to change, and sometimes you have to deal with cascading changes like I discussed earlier in this series.<\/p>\n<p>This is messy. Isn&#8217;t there some way we can have the best of both worlds? Is there any way we can skip this messy, confusing step with the temp buffer without all the performance problems?<\/p>\n<p>Like I said, a list of three SpaceMarines would look like this:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_layout3.png' width=100% alt='' title=''\/><\/div><div class='mouseover-alt'><\/div><\/p>\n<p>If all of the SpaceMarines were packed in such a way that similar variables were stored together, then it wouldn&#8217;t be a big deal to crank through the list and check all the positions. A list of three SpaceMarines would be packed like so:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/vex_layout4.png' width=100% alt='' title=''\/><\/div><div class='mouseover-alt'><\/div><\/p>\n<p>If the L1 cache is 64 bytes and a single position value is 12, then the positions of five consecutive SpaceMarines could be held in the L1 cache at the same time. Instead of triggering a cache miss with every other check, you&#8217;d get this:<\/p>\n<p>check SpaceMarine #1<\/p>\n<p>check SpaceMarine #2<\/p>\n<p>check SpaceMarine #3<\/p>\n<p>check SpaceMarine #4<\/p>\n<p>check SpaceMarine #5<\/p>\n<p>(cache miss)<\/p>\n<p>That&#8217;s going to be more than twice as fast!\u00a0<\/p>\n<p>So why can&#8217;t they be packed like this?<\/p>\n<p>Well, they can. Sort of.<\/p>\n<h3>Vexation #6: Refactoring Code is Not Programming<\/h3>\n<p>The problem is that in C++, packing the attributes of your marines together would require you to completely re-write the code. <b>Everything<\/b> related to SpaceMarines &#8211; every allocation, every check, every interaction, every deletion &#8211; would need to be changed. You&#8217;d have to give up your object-oriented class structure and break the variables into seperate lists. You&#8217;ll no longer have a list of 50 SpaceMarines. Instead you&#8217;ll have a list of 50 names, 50 positions, 50 cigar flags, and so on. Instead of deleting marine #5, you&#8217;ll have to delete name #5, position #5, stubble #5, bullets #5, and so on. If you forget one, then the lists will get out of sync with each other and you&#8217;ll get a bunch of strange bugs. If another programmer comes along, they won&#8217;t have that nice class definition to help them understand how this part of the code works.<\/p>\n<p>In short, doing things the &#8220;right&#8221; way makes for lousy, verbose, clumsy, confusing code. Worse, switching between these two memory layouts is not easy. If you&#8217;re trying to compare the two, then it&#8217;s going to take a lot of fussing to gather the data.\u00a0<\/p>\n<p>These two different ways of storing data are referred to this:<\/p>\n<p>Array of Structs:(AoS)\u00a0 A list of objects (like marines) where objects are stored one after another.<\/p>\n<p>Struct of Arrays: (SoA) An object that contains lists of fields, so similar fields are stored together.<\/p>\n<p>You can think of this as a list of containers vs. a container of lists. These are two conceptually different ideas and the implementation for them is very different. If you realize your project needs to use the other memory layout<span class='snote' title='5'>Or if you&#8217;re curious and want to compare the two.<\/span> then you need to re-write basically everything.<\/p>\n<p>In gaming, you&#8217;re very often doing batch operations where you only care about one part of a large structure. Like, maybe we have a series of loops that look over all of the space marines and checks for a particular situation.\u00a0<\/p>\n<pre lang=\"cpp\">\r\n\/\/Some C-ish pseudo-code\r\n\/\/Let's assume these marines have a hitpoint value, even though the earlier ones didn't.\r\nforeach (SpaceMarine Jack in MarineList) {\r\n\u00a0\u00a0if (Jack.Hitpoints () <= 0)\r\n\u00a0\u00a0\u00a0\u00a0Jack.die ();\u00a0\r\n}<\/pre>\n<p>If you have a list of 50 space marines and you're using AoS, then you get a cache miss every single time you check a marine. However, if you do happen to find a marine that needs to die(), the die() function will probably run nice and fast because the entire soon-to-be-dead marine will already be loaded into the L1 cache. If that same list is stored in SoA format, then checking marines will be blazing fast, but processing the death of a particular marine will be slow because his individual fields are all scattered around in memory.<\/p>\n<p>So to know which system is best, we need to ask ourselves: <strong>Which is the most common operation, checking or die()ing?<\/strong>\u00a0<\/p>\n<p>Assuming this is your typical video game, then obviously checking will be massively more common than die()ing. If you're got 50 marines, the game runs at 60fps, you check for death every frame, and marines die an average of one every ten seconds or so, then you will need to check hitpoints 30,000 times for every time you need to call die(). Checking hitpoints is massively more common than calling die(), so it makes sense to optimize for checking.<\/p>\n<p>Sadly, none of the C languages offer a way to store your data as a SoA without giving up all of your object-oriented design. To the best of my knowledge, no major language does.<\/p>\n<p>I don't know why this is. I'm not being snarky. I'm not saying, \"I don't know why those dum-dum language designers didn't think of this.\" There's probably a shining good reason why compilers work this way, it's just that those reasons fall outside of my area of knowledge. My guess is that our modern-day concerns over cache misses are relatively new in terms of language development.\u00a0<\/p>\n<p>If a language designed for games offered you a way to switch between a SoA and AoS as needed, then it would be really useful in cases where you need to do bulk checking on game objects.<\/p>\n<p>JAI will supposedly have this feature, and I wonder if it's not going to pop up in some other languages in the next few years. I find it hard to believe that video games are the only domain that would benefit from this.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In previous entries, I talked about how you sometimes need to worry about small blocks of memory. In some cases, you may even want to worry about how the data is arranged in memory. To show how this works, let&#8217;s look at our old friend the space marine: Space Marines class SpaceMarine { bool dead; [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[66],"tags":[],"class_list":["post-48683","post","type-post","status-publish","format-standard","hentry","category-programming"],"_links":{"self":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/48683","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=48683"}],"version-history":[{"count":15,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/48683\/revisions"}],"predecessor-version":[{"id":48700,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/48683\/revisions\/48700"}],"wp:attachment":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=48683"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=48683"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=48683"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}