Project Good Robot #31: So Obvious I Can’t See It

By Shamus
on Mar 12, 2015
Filed under:
Good Robot

It’s been about a year, but project Good Robot is moving again. I’m working with Pyrodactyl to make the game into something that’s hopefully worth paying money for. The game is going on Steam Greenlight. Here’s the trailer:


Link (YouTube)

If you’d like the game to see the light of day, please vote for it and spread the word. It would be much appreciated. It will be a lot easier to plan development if we don’t have to worry about getting stuck in Greenlight for months. Arvind (head honcho of Pyrodactyl) put up his previous title A.Typical RPG, and it’s still awaiting approval after a few weeks[1]. So apparently this can take a while.

This project nearly feels like my old day job. In a good way. Once a week we get together and talk about the game, and when the meeting is over I’ve got a list of crap to do.

Here is the most interesting problem I ran into last week:

One of the things that’s been on the to-do list since last year is the fact that the game isn’t performing the way it should. Once I get a few thousand particles on screen, the framerate drops like a rock. This is pretty mysterious. The last serious particle engine I wrote was almost a decade ago. It was more complex than this one, it worked in 3D, and I’m pretty sure it could handle 2,000 particles without too much trouble.

Two thousand particles is nothing. Their polygons are ridiculously cheap, and the processing to move them around is pretty lightweight. I’ve tried tightening up the particle animation code and offloading a bit of the math onto the graphics card, but neither seemed to help. This is an annoying head-scratcher of a problem. This system is so simple I just can’t see where the bottleneck is[2].

It works like this:

SO MANY PARTICLES WHO THOUGHT OF THIS?!?

As the game is played, various systems create particles. As lasers fly around, they leave these brief trails of glowing pixels. Every time something hits a wall, it makes this little scatter of rocks. When robots die, they give off a trail of black smoke. An explosion makes a big cluster of dozens (or a hundred) bits of fire and smoke. Half a dozen sparks are given off by lasers when they hit something. You get the idea.

All of these particles are added to the particle system. It has a great big queue of all the active particles, and the new ones get added to the end of the list. Once a frame, the particle manager[3] runs through the list of active particles and does a little bit of processing for each one. If its life span is over, it’s marked for deletion. If not, the PM makes the particle tumble or fall or move around however it’s supposed to move. Then, when render time comes around, the rendering system runs through the list and sends the polygons to the GPU.
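In code, that once-a-frame pass might look something like this. This is my own sketch, not the actual Good Robot source; the names (Particle, UpdateParticles) and fields are hypothetical:

```cpp
#include <vector>

// Hypothetical particle record; field names are illustrative.
struct Particle {
  float x, y;    // position
  float dx, dy;  // velocity
  int   life;    // frames left to live
  bool  dead;    // marked for deletion
};

// Once per frame: retire anything whose life span is over, and move
// everything else around however it's supposed to move.
void UpdateParticles(std::vector<Particle>& particles) {
  for (Particle& p : particles) {
    if (p.dead)
      continue;
    if (--p.life <= 0) {  // life span over: mark for deletion
      p.dead = true;
      continue;
    }
    p.x += p.dx;  // "tumble or fall", simplified to straight motion
    p.y += p.dy;
  }
}
```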

That’s it. The entire particle system is less than 300 lines of dead-simple code. Not being able to find the performance slowdown here feels like losing track of a couch in an empty room. There just aren’t many places for something that big to hide.

It’s like a rave where everyone hates you and wants to kill you.

Eventually I give up and the project goes dark for a year. When I come back I start poking at the problem again. Previously I had the number of particles limited to 2,000. Just for giggles I up the limit to 10,000 and have the player character spew a shower of hundreds of sparks every few milliseconds. Sure enough, we get a massive slowdown again. But the odd thing is that as the number of particles passes 3k, the framerate is still good. At 5k, the framerate is still good. Same thing at 8k. Then we hit the 10k limit and the framerate drops like a rock.

So it’s not that the system slows down when we have more than 2,000 particles; it’s that the framerate tanks whenever we hit the limit, whatever that limit might be.

I spend several seconds looking dumbfounded at the screen before I begin nodding my head. Of course. OF COURSE. I am a fool.

Last week we talked about how there are a lot of ways to store crap in memory. The popular choice is an array: you just get a big ol’ wad of memory and pack your things next to each other. Another way to store things is with a linked list. A linked list is a great way to store an ever-changing list of stuff and an even better way of annoying everyone who isn’t an old-school C coder.

A linked list in action:

//We are doing things C-style. 
//Let's program like it's 1989!
struct Foo
{
  int   awesomeness;
  int   color;
  char* favorite_dance;
  float psi;
  Foo*  next_foo; //This one is the tricky bit. A pointer to another Foo.
};

You create a Foo object. It’s got some properties that you care about. One of those properties is that it has a pointer to a Foo object. So I make one of these objects. Let’s call this one FooAlice. It’s the only one I’ve got. I allocate a tiny little bit of memory for it and store my values there.

Then I need another Foo. So I make FooBob. I get the memory, fill in the values, and then I set its pointer to the location of FooAlice. When FooCarl shows up, I do the same thing, but its pointer is set to the location of FooBob.

FooCarl » FooBob » FooAlice

So at any given time I only have direct access to one Foo, which is the most recently created. So if I ever want to see FooAlice again, I have to ask FooCarl where to find FooBob, and then ask FooBob where to find FooAlice. It’s like one of those scavenger hunts where you see a stickynote telling you to go to the kitchen, and in the kitchen you find a note telling you to go to the bedroom, which has a note to go to the bathroom, where you find a note telling you to stop wasting all the stickynotes and clean up this mess. When you’re looking at the first note, you have no idea where the trail will end, so you have no way of getting there except to follow the chain.
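The scavenger hunt, in code. This is a sketch using the Foo struct above (with the string made const for convenience); the search function is my own invention:

```cpp
#include <cstring>

// The Foo chain from the article; each node only knows where the
// next (older) one lives.
struct Foo {
  int         awesomeness;
  const char* favorite_dance;
  Foo*        next_foo;  // pointer to the next link in the chain
};

// To reach an older Foo we have no choice but to follow the chain,
// one sticky note at a time.
Foo* FindByDance(Foo* newest, const char* dance) {
  for (Foo* f = newest; f != nullptr; f = f->next_foo)
    if (std::strcmp(f->favorite_dance, dance) == 0)
      return f;
  return nullptr;  // walked the whole trail; no such Foo
}
```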

You may ask, quite reasonably, “Shamus dear boy, this sounds like a lot of faffing about to get back to our FooAlice. Why don’t we store all these locations so we can recall them later?”

Yes, you CAN do that. At which point you have an array again. And arrays have their own drawbacks, as we’ll see in a minute.

This art is being re-done. Next time I show this part of the game, it ought to look very different.

The advantage of linked lists is that you never have to shuffle around any big blocks of memory. You don’t reserve, free, or move groups of objects, so changing the list has a fixed (and very tiny) cost. If I’m done with FooBob, I just point FooCarl at FooAlice, and then I free up the memory used by FooBob.

FooCarl » FooAlice
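Dropping FooBob looks roughly like this. Again a sketch of mine, not the game's code, using a trimmed-down Foo:

```cpp
#include <cstdlib>

// Same Foo idea as above, trimmed to the parts that matter here.
struct Foo {
  int  awesomeness;
  Foo* next_foo;
};

// Unlink 'doomed' from the chain: redirect whatever pointed at it,
// then free its memory. No other node moves an inch.
void Unlink(Foo** head, Foo* doomed) {
  Foo** link = head;
  while (*link != doomed)     // find the pointer aimed at doomed
    link = &(*link)->next_foo;
  *link = doomed->next_foo;   // FooCarl now points straight at FooAlice
  std::free(doomed);          // C-style cleanup, like it's 1989
}
```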

The disadvantage of linked lists is that they’re a bit hard for some people to wrap their heads around. They require a lot of dumb boilerplate code. And (most importantly) accessing specific items in the list is slow. There’s no way to quickly jump to item #10 in the list.

Most modern coders would turn their nose up at linked lists, but I think they’re a fine way to store stuff like these particle lists. I never need to do something to ONE particle. I only need to do things to ALL of them. I’m either moving them or rendering them. This is a great place to use a linked list, as long as you’re comfortable with the ugly retro way of doing things.

Too bad that’s not what I used here.

I used one of the sexy new C++ style vector arrays:

vector<Foo>   my_foo_list;

Using a vector is a lot less code than setting up a linked list. It has a bunch of features that do things for you. Delete stuff! Add new stuff! Insert stuff! There’s a lot of code packed inside these tools, and that code is generally hidden from people like me. You just create this array of crap and let the vector code handle it all for you.

And that’s fine. But I sort of forgot that I’d taken that shortcut.

Remember that new particles get added to the END of the list. So the particles at the start of the list tend to be the oldest and the ones at the end tend to be the youngest[4]. So when I would delete particle #10:

my_foo_list.erase (my_foo_list.begin()+10);

I hate that syntax, but whatever. The point is, this isn’t a linked list. This is an array. You can’t have gaps in an array. Everything has to be packed together. So when I delete #10 in a list of 2,000, it has to move the last 1,990 items over one space. (#11 becomes #10, #12 becomes #11, and so on.) I have no idea if it does this as a single move operation, or if it does 1,990 memory copies. It doesn’t matter. It’s ruinously slow. Deleting the first item in the list means moving every single following item in the list.
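You can actually watch this happen. Here's a little instrumented sketch of mine (not from the game) with a stand-in element that counts how many copies a single front erase triggers. The standard guarantees erase calls the assignment operator once per element after the erased one:

```cpp
#include <cstddef>
#include <vector>

// A stand-in element that counts how many times the vector has to
// copy it around. (Illustrative only.)
struct Counter {
  static std::size_t moves;
  Counter() = default;
  Counter(const Counter&) { ++moves; }
  Counter& operator=(const Counter&) {
    ++moves;
    return *this;
  }
};
std::size_t Counter::moves = 0;

// Erase the first of 2,000 elements and report how many element
// copies the vector performed to close the gap.
std::size_t CostOfFrontErase() {
  std::vector<Counter> v(2000);
  Counter::moves = 0;
  v.erase(v.begin());     // one line of code...
  return Counter::moves;  // ...and 1,999 element copies
}
```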

So let’s say this frame I create 200 particles. Then another 200 particles next frame. Then another 200 the frame after that. For the purpose of simplicity, let’s say I’ve got the particle limit set to 1,000. After six frames, the particle manager realizes it has 1,200 particles. Rather than throwing away the new ones, it throws away the oldest ones. So it immediately kills the first 200 particles in the list. One at a time.

Remove the first particle in the list. Move the remaining 1,199 particles over to fill the gap.
Remove the first particle in the list. Move the remaining 1,198 particles over to fill the gap.
Remove the first particle in the list. Move the remaining 1,197 particles over to fill the gap.

…and so on.

It needs to copy the entire list 200 times.

I have no idea how the vector works under the hood. Maybe when it moves them over it does so in one bulk copy operation. But MAYBE it does them one at a time. In an ideal world, these two things would be the same. But when you’re talking about C++ objects (structs or classes), there’s all sorts of hoodoo that might be going on behind the scenes. It’s the difference between moving a box across the room, or taking all the items out of a box and placing them in an identical box on the other side of the room. The result is identical but the steps are vastly different.

The point is, deleting items out of the list like this is preposterously slow, and a single line of code generated a huge number of operations that completely overshadowed the hundreds of lines of obvious processing and rendering code.

The problem was right there in front of me. I scrolled past it a dozen times. But I didn’t look at it because I’m not used to worrying about list management. It’s usually a performance non-issue[5], and so I went looking for the usual suspects: processing data and rendering polygons. It was a strange professional blind spot that haunted me way longer than it should have, and a less experienced coder would probably have found it sooner.

I could fix this by moving to a linked list, but in the interest of self-education I try a different approach. I change it so particles are only cleaned up once a second. When that happens, instead of removing items from the start of the list, it runs through all the particles and sticks all of the still-active particles in a new list. Then the new list replaces the old.
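That sweep might look something like this. A sketch under my own assumptions (a simple life counter), not the actual fix as committed:

```cpp
#include <vector>

struct Particle {
  int life;  // frames remaining; dead at zero
};

// The periodic sweep: copy the survivors into a fresh list and swap
// it in, instead of erasing dead particles one at a time.
void CompactParticles(std::vector<Particle>& particles) {
  std::vector<Particle> survivors;
  survivors.reserve(particles.size());
  for (const Particle& p : particles)
    if (p.life > 0)
      survivors.push_back(p);
  particles.swap(survivors);  // the old list is thrown away in one go
}
```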

This fixes the speed problem and now the particle code runs at least an order of magnitude faster.

There are probably more speed gains to be had, but I have no idea if it’s worth the time. The game is smooth again, and it’s not time to optimize things yet. (This wasn’t optimizing, this was basic sanity management. This mystery was driving me crazy.)

Onward.


Footnotes:

[1] A vote for that game would be appreciated as well!

[2] Most of this analysis took place before I had access to good profiling tools.

[3] No, I DIDN’T call it “Particle Man” in the code. I have SOME self-restraint. Also, I didn’t think of it until now. I wish Visual Studio had better refactoring tools.

[4] Particles have different lifespans, so sometimes ones near the middle of the list “die” before ones at the start.

[5] Again, assuming you don’t need to fiddle around with individual members.



Comments (165):

  1. Tobias says:

    I haven’t done any serious programming in years, but shouldn’t you have a profiler that tells you those things?

  2. Ilseroth says:

    Glad to see Good Robot is on its way. I am kinda curious regarding your last post of Good Robot compared to now.

    Part of your frustrations (besides the odd slowdown issues which you seem to have semi-fixed) was the concept that, as you put it, it was a “jellybean” game; while you were aiming for something a little more engaging.

    So the question is, do you think you have found a way to improve on that concept; or that you have embraced the concept of just being a quick, fun game?

  3. Df458 says:

    Interesting problem. I’m a relatively new (5 years) coder, and it seems like every major bug is caused by something blindingly obvious that takes hours to notice.

    Speaking of data structures, I’ve been told a few times that the C++ STL structures are very slow in general, and that I should either roll my own or find a library with more efficient ones. Do you know if it’s typically worth the potential speed boost when developing games, or is it not as much of an issue as people make it out to be?

    • Bloodsquirrel says:

      I wouldn’t say that they’re “very slow in general”, but you have to be aware that different types of lists are designed for different types of usage. Vector, for example, is optimized for fast access. It’s also reasonably fast to add items to it as long as you aren’t causing it to resize too often. It’s flat-out slow to delete items.

      For really high-performance code, you often need to fit the data structure as exactly to your usage as possible.

      • Ingvar M says:

        Fast random access.
        If you keep adding data to the end of a vector, you can get amortized O(1) cost for each element added, but that requires essentially doubling the size of the allocated backing store every time you run out (and it’s only O(1) on average; any specific add may take noticeably longer).

        Deletion is O(n), due to need to copy over the gaps. As long as you’re fine with “deleted items” still hanging about, you can use deletion markers and occasional cleanup sweeps (what Shamus did), but that does technically break your O(1) lookup.

    • Ian says:

      Speaking as a long term C++ developer the answer is realistically no. These days stl is fine for the majority of tasks you are going to throw at it. Plus there’s nothing worse than revisiting code 10 years down the line and finding old libraries tucked in that may or may not be still maintained.

      Optimising too soon in a project can waste a lot of developer time (expensive) to save a tiny amount of processing time (cheap). Obviously there are times where spending a month to make something run 1% faster is a good use of the month over the lifetime of the software but it’s not often the case.

      In a previous job the coding test was to make a particle engine in C++. Rolling your own linked list was an almost certain way to not get an interview as it showed you didn’t know stl (mentioned as a technology the test wanted to see used).

      Edit: I agree with the above answer, it’s all about knowing which collection type you want to use.

    • Kian says:

      This is absolutely wrong. Most STL containers are as fast as equivalent C structures (that attempt to provide similar facilities, obviously). However, you have to understand what promises they make and what performance is like for different structures and operations. Which you do by reading the documentation for them.

      Writing your own structures will give you the knowledge of how your structure works, buts it’s generally better to learn about the STL because everyone else in your team is not going to know how to use your container. So you’re basically making your life “easier” (your container is going to need to be developed, then debugged, then profiled for performance, etc) at the expense of everyone else’s.

      • Shamus says:

        “Which you do by reading the documentation for them.”

        Do you mean this documentation?

        http://www.cplusplus.com/reference/vector/vector/

        It sticks to syntax and never touches on theory / usage / tradeoffs. If there’s better documentation, Google hasn’t found it for me.

        As far as I can tell this stuff is folk knowledge that gets passed around in conversation and forums.

        • Abnaxis says:

          Erm…?

          …Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.

          Compared to the other dynamic sequence containers (deques, lists and forward_lists), vectors are very efficient accessing its elements (just like arrays) and relatively efficient adding or removing elements from its end. For operations that involve inserting or removing elements at positions other than the end, they perform worse than the others, and have less consistent iterators and references than lists and forward_lists…

          Not meaning to be a jerk, but isn’t that exactly what those paragraphs describe?

          • Kian says:

            To add to that, the linked documentation for erase says: http://www.cplusplus.com/reference/vector/vector/erase/

            Erase elements
            Removes from the vector either a single element (position) or a range of elements ([first,last)).

            This effectively reduces the container size by the number of elements removed, which are destroyed.

            Because vectors use an array as their underlying storage, erasing elements in positions other than the vector end causes the container to relocate all the elements after the segment erased to their new positions. This is generally an inefficient operation compared to the one performed for the same operation by other kinds of sequence containers (such as list or forward_list).

            It also mentions time complexity for the operation, data races, and such.

          • WILL says:

            I have to agree here, STL is incredibly well documented.

        • lethal_guitar says:

          Hey Shamus,

          in case you didn’t know it: Nowadays, cppreference.com is considered a much better source than cplusplus.com. The latter sometimes contains incomplete or outdated information. It’s also not the official documentation, even though it tries to come across like that. The only official C++ documentation is the standard.. which costs a lot of money and is borderline unreadable.

          cppreference.com, on the other hand, is updated frequently, very accurate, and often contains information regarding upcoming language features very early on.

          Regarding your specific example: The vector docs on cppreference.com contain a lot of information about time complexity, how the data is laid out etc.

          • Carlos says:

            Note that while the official versions of the standards are relatively expensive for the casual reader (30-60 USD), you can get a free version of the draft version (which is very close to the final version) online as a PDF for free. I keep this stackoverflow link with a good list of where to find it bookmarked:

            http://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents

            Further, the writing of the standard is coordinated on Github, and new drafts can be found here:

            https://github.com/cplusplus/draft

            That said, I would not recommend reading the standard if all you want is a reference guide. It’s very much like reading a legal contract, and it’s not for the faint of heart. cppreference is a great place to get up-to-date documentation on the C++ standard library including a good summary of the run-time characteristics of the various contained data structures.

    • Abnaxis says:

      I think Shamus’s example here is the perfect situation for deviating from STL data structures and writing your own.

      The reason for this, is because the main benefit you get from using STL is so you don’t have to write boilerplate code for the stuff like pop() and erase() and iterators. However, this application doesn’t fit neatly into any of the STL structs–he’s doing both random removals (dropping particles that have reached end of life is essentially random) and ordered removals (if particle limit is 1000 and you have 1200 particles, drop the oldest 200).

      To me, this means that no matter what, you’re going to be writing that boilerplate code. If you try to use a std::queue it’s going to kill you to iterate through the entire list and drop dead particles. If you use std::vector, it’s going to kill you to drop the oldest particles when you reach the limit (as happened here). The only way to avoid this is to use a more basic container like a std::list or a std::forward_list, which will make your code a lot less readable and cludgy (because you’re bolting queue functionality onto a list) than it would be if you just make your own struct designed to do both.

      • Ed Lu says:

        Even in this case, though, I wouldn’t roll everything from scratch. I think what everyone’s forgetting here is that you don’t need to use just one master-structure for all the particles. Here’s a sketch of my approach:

        struct ParticleMan {
          std::vector<Particle> pool;
          std::priority_queue<Particle*, std::vector<Particle*>, fancy_comparator> age_ordered_particles;
        };

        The std::vector pool is used as an object pool and for easy iteration, and the std::priority_queue age_ordered_particles is used to sort the particles by age. The std::vector comes with 10000 (or whatever your limit is) pre-allocated particles, each of which has an “active” flag. When the game requests a new particle, we go looking for the first particle within the pool without the active flag set, initialize it, and stuff it in the age_ordered_particles queue. If there are no inactive particles, we instead reset whatever the top (oldest) element of age_ordered_particles is, and use that instead. Additionally, each tick of the game, we look at the top element in age_ordered_particles. If it’s too old, we set it to inactive and remove it from the queue, returning it to the pool. Rendering would be as simple as iterating through all the elements of pool and rendering the element only if it has the active flag set.

        There are a few optimizations to be made here, the most important one being the linear-time search through the pool every time we make a new particle. I’ll leave that one to this excellent article on object pools.

        • Abnaxis says:

          But what do you get out of doing it that way? You’re still going to be writing about the same number of lines of code for your struct, except all you interactions are going to be with std::priority_queue <Particle&, fancy_comparator> instead of MySweetParticleStructWithADescriptiveNameInPlainEnglish.

          Unless it’s saving you lines of code, I don’t see any justification in using STL, but that’s probably my preference.

          • Ed Lu says:

            For me, it’s less about lines of code and more about the tried-and-tested-ness of the STL. If I implemented vector/prioq myself, or whatever other data structure, I’d be constantly worrying about whether or not the implementation was correct. I’d never be sure at what level a problem exists. This way, I know that at least my operations with std::vector and std::priority_queue will do the right thing, and the (asymptotic) running times of those operations are a known and guaranteed quantity. Assuming my system’s C++ implementation is correct, of course.

            You’re right about interacting through an interface, though. The code for ParticleMan up above wouldn’t go straight into Shamus’ ParticleManager (bad naming on my part), it’d exist as its own class and have functions like AddParticle(), Tick(), and so on. One big sticking point is how those Particles get passed to the rendering engine, but that’s something I’d worry about if I were actually implementing this.

      • Richard says:

        No. Absolutely not.

        It’s simply a great example of accidentally using an inappropriate datastructure.
        It doesn’t matter whether you wrote it or not, using the wrong datastructure gives you slow results.

        The STL is almost certainly faster than anything you could write that does the same thing, it’s been extremely well tested and is very well documented.

        However, if you use a std::list or std::forward_list for something where you often need to access element #n (random access), or std::vector where you’re deleting things from the middle, it’s going to be slow.

        One of the great things about using the STL is that you can often swap out the datastructure for something else, without changing any of the code that uses the structure.

        • Bryan says:

          Yeah, just changing std::vector to std::list in this case would almost assuredly fix the entire issue without changing anything else…

          Unless he’s iterating over the vector today with an int, instead of with an iterator; then the list would be a lot slower to iterate (n**2, as each access would be O(n) … on the other hand, operator[] on a std::list probably doesn’t exist, either, so it probably just wouldn’t compile). But since he’s doing a .erase(), I suspect it’s an iterator instead already.

          (I’d recommend using auto-iteration and C++11’s “for (const auto& elem : list)” syntax … except that you have to be able to do those .erase() calls, and the : iteration syntax doesn’t give you an iterator.)

          Oh. And deleting lots of stuff off the end of the list when necessary should be the same complexity on a std::list as a std::vector. Either you have to run the destructor for every deleted element, or not (you don’t if it’s POD; you do otherwise). If you do, it’s O(n) either way. If you don’t, then since both have a .erase() overload that takes two iterators instead of having to call .erase() N times on a single element each time, it should be constant-time. (Especially the list version.)
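          The two-iterator overload Bryan mentions looks like this; a sketch of mine, with hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Dropping the oldest n elements with erase(first, last): the
// survivors shift once, instead of once per deleted element.
void DropOldest(std::vector<int>& v, std::size_t n) {
  if (n > v.size()) n = v.size();
  v.erase(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n));
}
```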

    • WILL says:

      They are not slow, they require a good understanding of their inner workings, that’s all. Maybe this is just because I was taught this at University, but looking into efficient data structures and basic algorithms is essential to good performance.

    • lethal_guitar says:

      ” I’ve been told a few times that the C++ STL structures are very slow in general, and that I should either roll my own or find a library with more efficient ones”

      That was indeed once true, in the early days of C++. But it isn’t anymore. Unfortunately, some people still insist on not using the STL. Which is in general a bad idea. Consider:

      1. You have a whole bunch of extra code that may contain bugs. Sometimes subtle ones, that only trigger in specific edge cases. Good luck figuring those out..
      2. You won’t profit at all from improvements to the STL when upgrading your compiler.
      3. Your code is now tied to your custom library, so it’s harder to reuse
      4. It’s probably hard to use your custom containers with STL algorithms, like std::accumulate etc., unless you go to the trouble of implementing the full STL container interface, including iterators etc.

      Point 2 is especially important considering the switch to C++ 11. The STL containers that ship with C++ 11 compliant compilers have proper move semantics, for example. This means that some code might become faster just by recompiling with a C++ 11 compiler. If you use your own containers, OTOH, you will have to update that yourself, or you won’t benefit from these new language features – which in turn means, your code will be slower.

    • Blake says:

      We use custom data structures here, and the main reasons for that are readability (trying to debug inside STL containers is quite zany), simplicity in defining how the memory is laid out (everybody here knows how to put a FixedVector in somewhere, but writing a custom allocator for a std::vector isn’t something anyone here has done), and guarantees we’re running the same code on all platforms.

      Our vector type is as fast as you can make it, as the [] operator is inlined and just calls it on the underlying memory (after doing bounds checks in non-retail builds), and it never throws any exceptions or does anything like that.
      I’d imagine our map is probably a touch slower than stl though, and we don’t have the same range of data structures as stl does.

      Basically as the others said, stl is fine as long as you completely understand the structures you’re using and are comfortable with your data structures being different on different platforms.

      Something worth checking out is EASTL, EA’s STL implementation. It is free, and addresses some of the complaints game devs have about STL.
      At the very least this document about it explains the differences and why you might want them:
      http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html

  4. Tintenseher says:

    I am so excited. To Greenlight!

  5. krellen says:

    Did your experimentation with shaders come up with a solution for the light-bleed problem I had in the Inky Depths level?

    (I think the background lights were never “background” while I was playing, but it wasn’t an issue except with the bright white lights in the Inky Depths.)

  6. Robyrt says:

    Great writeup on what is generally a very dry, unexciting programming problem (choice of data structures). But when it’s framed as “Why does my framerate drop like a rock?” you can read all the way through.

  7. Kian says:

    A vector is really just a pointer to an allocated chunk of memory. If you delete something from the middle, it will probably do a memmove of everything behind to keep it compact.

    The way to delete things from the middle is to swap the item you want to delete with the last item in the vector, then pop the last item. This should only be used if you don’t care about the order, which in a particle list you don’t.
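    Sketched out (my own generic version, not from Kian's code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Swap-and-pop: constant-time removal from anywhere in a vector, at
// the cost of element order. For particles, order doesn't matter.
template <typename T>
void SwapAndPop(std::vector<T>& v, std::size_t index) {
  std::swap(v[index], v.back());  // move the last element into the hole
  v.pop_back();                   // shrink by one; nothing else shifts
}
```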

    • Dan Efran says:

      This. Protip.

      Elegant and efficient, whereas, Shamus, your “copy most of the array, but only every second” fix is…how to say this politely?…a cringe-inducing kludge.* Please do the swap-pop thing instead.

      *Specifically, you’ve reinvented garbage collection.

      • Shamus says:

        Yeah, I think this is the way to go. Actually, you don’t even need to SWAP, because you’re not preserving the one being thrown away.

        if (we’re not at the end of the list)
          particle[this one] = particle[last one];
        particle.pop ();

        Only a couple of lines of code, and it lets us keep the particles packed together in memory.

        This FEELS strange because I’m re-ordering the particles. But there’s no reason to care about order.

        • Abnaxis says:

          ….really?

          Well, there went the wind in my sails. I thought you were removing the oldest particles first when you got to the particle cap? Switching particles around will break that order.

          • Shamus says:

            That’s a good point. For cases when we go over the limit, we would naturally want the oldest first, and this essentially puts the NEWEST first. It’s a really interesting trade-off. In practice, it means that particles will vanish early if we go over the limit. Which will only happen when the scene is INSANELY busy.

            I’m not sure the user could tell, in this case. Which means I SHOULDN’T care. (But deep down, I actually do.)

            • Dan Efran says:

              Oops, good point….

              Here’s an idea. Rather than having the garbage collector come in and lop off some particles when you go over budget, why not let the particle system automatically tune itself as it gets close? Specifically, I’m thinking you could just put a scaling factor on particle ages, so particles die of old age more readily – younger – as you approach the particle cap.

              Works best if you go over budget gradually, not in big bursts, but it’s only about one line of code.
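
              One way that scaling factor could look, as a sketch – the names, thresholds, and ramp are all assumptions, not anything from the actual game:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tuning: as the live count approaches the cap, age every
// particle faster so old ones expire before the cap is ever hit.
// Returns the multiplier applied to each particle's aging this frame.
float AgeScale(std::size_t liveCount, std::size_t cap) {
    float load = static_cast<float>(liveCount) / static_cast<float>(cap);
    if (load < 0.8f)
        return 1.0f;                       // under 80% of budget: normal aging
    return 1.0f + (load - 0.8f) * 10.0f;   // ramp up to 3x aging at 100%
}
```

              Each frame you’d multiply `dt` (or the age increment) by this factor. Under a gradual ramp-up the oldest particles fade early and the cap is never reached; a single huge burst can still blow past it, as noted above.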

              • Kian says:

                While I’d like some hard data about whether to put a cap at all, or how high it should be, let’s see if we can come up with a way to track the oldest elements.

                – When a particle is created, it is added to the vector at the back.
                – Particles can have differing lifetimes.
                – When a particle dies, it is removed from the vector by “copy and pop”.
                – As a result, our vector is generally unordered.

                Possible solution: If we are over budget, sort the vector by lifetime, youngest to oldest, remove the required range from the back until we are inside our budget. As an addendum, don’t remove “just enough”, remove enough so we have 10% budget free.

                Since all the oldest particles will be at the back, you can remove them in one pass basically for free. If our limit is 1000, and our current count is 1200, remove 1200-900 = 300 particles. This leaves 100 spots free so that if next frame we add even more particles, we won’t have to immediately do the sort again and we give the system a bit of breathing room.

                To sort, use the algorithm header’s std::sort:
                http://en.cppreference.com/w/cpp/algorithm/sort

                void ParticleUpdate()
                {
                  // Add new particles

                  // Update and remove expired particles

                  // Check if we are over budget
                  if (particles.size() > k_particleLimit)
                  {
                    // Comparison operator for particles by lifetime (less is younger)
                    struct
                    {
                      bool operator()(Particle const& a, Particle const& b) const
                      {
                        return a.lifetime < b.lifetime;
                      }
                    } lifetimeComp;
                    std::sort(particles.begin(), particles.end(), lifetimeComp);
                    // Remove the oldest ones, leaving 10% of the budget free
                    size_t removeCount = particles.size() - k_particleLimit * 9 / 10;
                    particles.erase(particles.end() - removeCount, particles.end());
                  }
                }

                • Dan Efran says:

                  To be honest, my favorite approach for particle systems is a circular buffer.

                  Standard stuff but for those who haven’t seen it:

                  Preallocate the array as big as you’re willing to let it go. Keep track of the first particle and the first empty space beyond the last particle – not pointers, just array indices will do.

                  The indices start out equal, which signifies an empty list.
                  Pop is just moving the end marker back, push is writing there and moving it forward.
                  Delete is the swap and pop dance again.
                  You can do the FIFO bulk delete of the oldest particles by moving the start marker past the stuff you want to forget. You just change one variable, no need to move memory or write zeroes to the array or anything. You just move the starting line.

                  This works because you wrap around whenever you get to the end of the array: it’s circular. Which does mean you write a bit of extra code to make it work – wrapping around correctly, and not letting the start and end indices collide when it gets full – but if you like coding, that’s some fun code to write, and it’s really only a few lines.

                  The benefit is that ALL these operations are O(1), basically as efficient as possible.
                  Perfect for this job, if you can afford to keep all the memory allocated all the time.
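
                  A minimal version of that ring, under assumed names (one slot is left unused so that `head == tail` always means “empty”):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Particle { float life; };  // stand-in for the real particle struct

// Fixed-capacity circular buffer: `head_` is the oldest live particle,
// `tail_` is the first free slot past the newest one.
template <std::size_t Capacity>
class ParticleRing {
public:
    // O(1) add at the end; fails (drops the particle) when full.
    bool Push(const Particle& p) {
        std::size_t next = Wrap(tail_ + 1);
        if (next == head_) return false;
        slots_[tail_] = p;
        tail_ = next;
        return true;
    }
    // O(1)-per-particle bulk FIFO delete: just move the starting line.
    void DropOldest(std::size_t n) {
        while (n-- && head_ != tail_)
            head_ = Wrap(head_ + 1);
    }
    std::size_t Size() const { return Wrap(tail_ + Capacity - head_); }

private:
    static std::size_t Wrap(std::size_t i) { return i % Capacity; }
    std::array<Particle, Capacity> slots_{};
    std::size_t head_ = 0, tail_ = 0;
};
```

                  No memory is moved and nothing is zeroed; dropping the hundred oldest particles is just index arithmetic.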

                  • Kian says:

                    What I don’t like about them is that you have to have an “inactive” state for particles (since you don’t always remove from the front, particles may die at any position if new particles are shorter-lived). Which is an additional bit of complexity to keep track of.

                    I’m pretty confident that any approach he chooses, so long as it plays nice with the underlying data structure, is going to be fine. He’s not running up against hardware limits.

                    • Dan Efran says:

                      Not sure I see the problem.

                      He’s already got a “marked for deletion” state – that’s pretty standard and not a big deal, basically a quick “if not dead” check while you’re already running through the list to do particle stuff.

                      Then when you’re ready to, you can delete a dead particle from the middle, with total efficiency, using the swap and pop trick.

                      (In fact you can do this as soon as a particle dies, with no need for an inactive state. But marking for deletion is often better than immediate deleting in game engines. Because reasons.)

                      Whereas calling a sort function has a cost. Sorting’s another thing that eats up time if you pick the wrong algorithm, and I’d hesitate to add a sort to something like a particle system, which can and should be tight and fast and simple.

                  • Kacky Snorgle says:

                    “Delete is the swap and pop dance again.”

                    Dumb newbie question: With your circular setup, why do you even *need* a delete operation? It seems like it’d be simpler to just leave the dead particles in place, to eventually be overwritten once they get “lapped” by newly created particles. You wouldn’t even need to bother checking whether your new particles are overwriting dead particles or live particles, since the latter is the desired behavior anyway when the particle buffer gets too full.

                    • Dan Efran says:

                      Not a dumb question, actually that’s a very good point. A clever simplification of the code, that might help performance (by omitting the delete operations) or hurt it (by leaving more dead particles in the list to be skipped over all the time) but probably not by much either way.

                      Edit: D’oh! Never mind, it’s not such a good idea. Sorry. Forgot a detail.

                      You want to use the delete technique to close up the gaps, so that when you’re adding new particles you don’t have to search the list for gaps to add them in. You could do that (someone else suggests it below) but it’s less efficient and more complex. Stick with the delete plan.

                    • Kacky Snorgle says:

                      Hmmm…I was thinking that new particles would always be added at the end of the list, rather than searching for a “better” place to put them. The only time you’d have to search through the list would be when the oldest live particle (the current beginning of the list) dies, and you have to search for the next-oldest live particle in order to reset your start index.

                      But I guess that if particle lifetimes vary widely enough, that could end up creating a very sparse list, with way too many dead particles being skipped over every frame. Which would also mean that the buffer size might have to be much larger than the expected number of live particles, in order to avoid premature overwrites even under non-exceptional circumstances. Which doesn’t sound as efficient as I was hoping.

                      Ah. Like I said, dumb newbie question. :)

                    • Blake says:

                      Although it does depend on how big the total buffer is and how often you’ll have gaps for large amounts of time.
                      It might just be fine to mark a particle as inactive and wait until the ones before it are also clear, and my instinct would be to say that is probably the case.
                      And if you have really long-lived particles messing with things then yes, you could go for a pop-n-swap, or move the long-lived particle back to the end when it’s at the front, or simply put long-lived particles into a separate buffer.

                      I think the choice of implementation would really depend on how the game is planning on using it.

            • Purple Library Guy says:

              Maybe you should just swap the order automatically in the first place, so the oldest particles are all on the other end and if you run up against the cap you can delete from the easy/fast end and nothing needs to get moved.

              • Kian says:

                Then he’d have to add new particles at the front, which means bumping every other particle one step back every time he adds a particle, instead of every time he removes them.

        • Causeless says:

          Resizing the array at all is something that should be avoided. Frequently growing and shrinking a vector can cause serious performance issues: some other allocation may sit right behind it in memory, which means the entire array needs to be moved, and either way you’re putting a lot of trust in the allocator!

          A linked list also isn’t perfect. Although inserting and deleting particles is easy, actually iterating through them is SLOW. If you have 2000 particles, that means you are probably going to have these particles all over the place in memory, and you are very likely to be causing cache misses constantly as you move incoherently around RAM. It’s also wasteful in memory, needing an extra pointer per particle.

          The best solution would probably be to use an object pool. Instead of resizing the entire array whenever a particle is added, you are essentially making your own very very simple memory allocator, which can only supply and delete blocks that are exactly 1 particle wide.

          First, you never truly delete particles, by which I mean freeing memory. Deleting particles is just implicit – when a particle has no more lifetime left, you skip simulating and rendering it, but don’t actually mess around with its memory.

          While you loop through all your particles (including dead ones), updating them, if you want to create a new particle you just use a dead particle and copy a new particle object over it. If a particle dies, you don’t need to do any messy resizing. In fact, you don’t really need to do anything at all.

          The key thing is that you are writing a system which understands that even if a particle was just deleted you are likely to make a new one almost immediately, instead of letting std::vector guess at your intentions.

          Of course, this all may be a bit much focus on what is probably an already good enough system. Still, if you ever need insane amounts of particles constantly being created and deleted – and who doesn’t love huge scale, right? – an object pool is a good solution.
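
          A bare-bones sketch of that pool, with assumed names – the linear search for a free slot is the simplest possible allocator, not necessarily what you’d ship:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle {
    float life = 0.0f;  // <= 0 means this slot is free
};

// Fixed-size pool: slots are allocated once and recycled, never freed.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t capacity) : slots_(capacity) {}

    // Reuse the first dead slot; silently drop the particle if none is free.
    bool Spawn(float life) {
        for (Particle& p : slots_)
            if (p.life <= 0.0f) { p.life = life; return true; }
        return false;
    }

    // Age everything; dead particles are skipped, never deallocated.
    void Update(float dt) {
        for (Particle& p : slots_)
            if (p.life > 0.0f) p.life -= dt;
    }

    std::size_t LiveCount() const {
        std::size_t n = 0;
        for (const Particle& p : slots_)
            if (p.life > 0.0f) ++n;
        return n;
    }

private:
    std::vector<Particle> slots_;
};
```

          A real pool would keep a free list instead of scanning for a dead slot, but the memory behavior is the point: one allocation up front, zero allocator traffic per frame.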

    • Abnaxis says:

      Except he does care about order, when the particle limit is reached. When he hits the limit, he starts popping particles in FIFO order.

      For my part, I probably would have rolled my own queue instead of using stl. I don’t think any of the STL structs are good both for the FIFO removals and the random-access removals that this application needs. std::vector stinks at FIFO and std::queue stinks at erasing stuff in the middle.

      EDIT: Or, I guess, use one of the standard lists and use iterators to do popping and removing in a loop, but there’s not a whole heck of a lot of difference between that and a (implemented, documented and maintained by me in plain understandable syntax) hand-made data structure at that point…

      • Kian says:

        He could remove the limit, and test if the limit even needs to exist at all. As he said, he only found the issue when he reached the arbitrarily placed limit. He moved the limit up and his performance wasn’t affected. Sure, there’s an upper limit that he can’t realistically pass. But if in the normal course of the game you don’t get close to it, who cares?

        • Abnaxis says:

          I don’t care whether the limit needs to exist or not.

          The design goal is to create a data structure that allows us to both efficiently drop particles that have reached their end-of-life and also to drop particles in FIFO order once an arbitrary number of particles is reached. I don’t care what the reasoning behind this design goal is–that’s the design goal.

          Bar none, the most frustrating phrase you will ever find in a discussion about programming is “why would you want to do that?” It doesn’t matter if dropping extra particles is good or bad for performance, it matters that the design goals are met.

          • Kian says:

            Well, we might argue about the design goals then. I think the design goal is to store particles for the purpose of running the game efficiently. The limit is a constraint that flows from that design goal, since processing too many particles could slow down the game. However, without evidence that the limit is necessary it is an unreasonable constraint, and so it is reasonable to leave it out.

            • Abnaxis says:

              Define “unreasonable”? It’s five lines of code that, on a first-pass inspection, seem like a perfectly reasonable thing to include to make sure players can’t find some nice edge case where they can make infinity particles and crash the system. It’s only in hindsight that we know it causes a problem.

              And even then, that’s missing the point. The point of this entire exercise is to say “this is what we’re trying to do, this is why the way we’re doing it doesn’t work, this is a better way to accomplish our aims.” Dropping the limit is all well and good, until you run across the next situation where you actually need to implement a limit without hanging the system (because you’re dealing with polygons or projectiles instead of particles), and when you go looking for help all you can find is “just don’t check the limit.”

              • Kian says:

                “Unreasonable”: there is no clear reason why it is implemented this specific way.

                “The point of this entire exercise is to say “this is what we’re trying to do, this is why the way we’re doing it doesn’t work, this is a better way to accomplish our aims.””

                I’m of the opinion that the point is to ship a functional game. Not create a particular data structure with a set of arbitrary constraints. And in any case, I’m following your line of reasoning:

                “this is what we’re trying to do”: Remove the oldest particle from the list.
                “this is why the way we’re doing it doesn’t work”: Vectors are terrible at removing from the front, which is where we were storing the oldest particles.
                “this is a better way to accomplish our aims”: Don’t remove the oldest particles, because it is most likely superfluous. Determine first if it is needed at all.

                • Abnaxis says:

                  “this is what we’re trying to do”: Remove the oldest particle from the list.
                  “this is why the way we’re doing it doesn’t work”: Vectors are terrible at removing from the front, which is where we were storing the oldest particles.
                  “this is a better way to accomplish our aims”: Don’t remove the oldest particles, because it is most likely superfluous. Determine first if it is needed at all.

                  You are literally saying, “the best way to accomplish a thing is to not try to accomplish a thing.” This is starting to get into one of those obtuse arguments that flame-wars are made of.

                  As far as you and I are concerned, as strangers on the internet who have zero point of reference for the performance of the code in question other than “it works slower when the particle limit is reached,” it is 100% unhelpful to say “justify why you want to include a particle limit before we will discuss how to implement it.” There are a million reasons why you might want to implement a hard limit on the number of particles rendered–some of which might be performance-related but some of which aren’t. Those reasons aren’t germane to the design problem being postulated.

                  • Purple Library Guy says:

                    Well, I suppose it depends on whether you’re approaching it as a problem with shipping a game or a problem with implementing an idea. Both approaches have their benefits.
                    In this case, I suspect that, as you’ve suggested, there’s some value to having a particle limit even if you’re not sure it will ever be needed, because that way you know no weird corner case can shove the number of particles through the roof. But that in itself is coming back to talking about problems with shipping a game, given which you shouldn’t simply rule out of bounds the other guy’s question of whether the particle limit is needed to ship a game.

                    • Abnaxis says:

                      I feel like I’m fundamentally failing to make my point 0.o

                      The only reason why I’m talking about what value a particle limit might-or-might-not have is because I’m trying to convince that the whether or not there is value to it doesn’t matter. If someone goes on a forum to discuss some code they are trying to implement, responding with nothing other than “don’t even try unless you know it will improve performance” is the opposite of helpful. It’s responding to a request of “tell me how to do <thing>” with “I don’t think you need to know about <thing>.”

                      I know “early optimization is the root of all evil” and whatnot, but coming to the discussion with that attitude clutters up the discussion with non-answers and creates redundancy because later, when I have a use-case where I actually really do need <thing>, I have to come back and ask again in the vain hope someone might answer this time. This is a large part of why there are so many unhelpful Google results related to code questions. Even if Shamus doesn’t need to put a limit on particles, maybe Joe Google (or Future Shamus) does need a limit on SpaceMarines, and now gets to be frustrated by parsing through fifty threads of “why would you want to do that before you profile?”

                    • Kian says:

                      I understand your point, I think. I just don’t agree with it :P

                      If I understand correctly, you are of the opinion that if someone asks “how do I do x”, people should answer how to do x, first and foremost.

                      I’m on the camp that thinks that if someone asks “how do I do x”, it’s important to understand why they want to do x in the first place, and if x is indeed the best approach. The fact that you don’t know how to do it is already a red flag. There are many reasons why someone might be struggling to do something. It may be that the language is trying to steer you away from that, or the library you are using is trying to steer you away, etc.

                      The point being, you can’t just jump straight to answering the question. You need to believe that you will actually be helping the person by providing an answer. In this particular example, Shamus posted that he increased the limit to five times its previous value, and it didn’t affect performance. Since the limit is a performance guard in the first place, the fact that changing it has no effect is a glaring fault. In fact, the limit is now introducing a performance issue because it needs to be enforced.

                      Design is all about trade-offs. The question “how do I implement this limit” has no correct answer unless you know what the implications of all the different trade-offs are. There is no single “right” answer, each answer depends on the situation you are trying to solve. In this case, I don’t see that there is a problem to fix in the first place, so any attempt to impose a limit would make the program worse.

                    • Abnaxis says:

                      I’m not necessarily saying you shouldn’t ever ask about why someone wants to do x, but that any sentiments in that vein should come only after you have actually answered the question posed. To me, a good answer is, “If I needed to do X, I would use Y function from library Z, but I would caution against going through all this effort unless you’re sure you actually need to in order to accomplish your goals because of Good Reasons.”

                      I have yet to observe any response other than utter annoyance if you try to tell someone why they shouldn’t be doing something without actually telling them how to do it, and with good reason. For one thing, if they don’t understand how to do X they sure as hell aren’t going to understand why not to do X. For another thing, it makes an assumption that they are telling you everything you need to know in order to evaluate whether they should be doing X, which probably isn’t the case if they are asking how to do X. Finally, the whole exchange amounts to basically spam to the guy who really does need X, and is coming through after the fact trying to figure out to do it through internet research.

                  • Kian says:

                    We know a bit more than that. From the post:
                    “Previously I had the number of particles limited to 2,000. Just for giggles I up the limit to 10,000 and have the player character spew a shower of hundreds of sparks every few milliseconds. Sure enough, we get a massive slowdown again. But the odd thing is that as the number of particles passes 3k, the framerate is still good. At 5k, the framerate is still good. Same thing at 8k. Then we hit the 10k limit and the framerate drops like a rock.

                    So it’s not that the system slows down when we have more than 2,000 polygons, it’s that the framerate tanks whenever we hit the limit, whatever that limit might be.”

                    He increased the limit by a factor of 5, and it didn’t affect his performance. So why is the limit at 2k in the first place? If he hadn’t set a limit in the first place, he might not have gotten frustrated with the project and gone on a one year hiatus.

                    The limit is extra complexity. Extra complexity has to be justified with data. When something is hurting you for no gain, it is absolutely the correct thing to do to reevaluate if that thing is necessary. There is a cost associated with maintaining that limit, so the limit should be giving something back to the project. From what Shamus has described, all it’s done so far is cost a year of development and untold frustration for no measurable performance or robustness gain.

                    “There are a million reasons why you might want to implement a hard limit on the number of particles rendered”
                    Awesome. Name one that applies to this project.

                    EDIT- How do you quote things, or add code tags or such? I can’t html.

                    • Abnaxis says:

                      There are special <blockquote> tags for quoting things. Also, if you want to use angle brackets in your text, you need to use the HTML escape codes (&lt; and &gt; for < and >) for them to appear. I don’t think there’s an explicit code tag, though.

                      On topic, it’s not our place to ask why the limit is there. Responding to an inquiry in this way is basically trying to Kobayashi Maru your way through the exchange, by turning the question around so you don’t have to answer it.

                      There are plenty, plenty of reasons to increase complexity without data. Buffer overflow prevention. Strict memory limitations. Not wanting your particle code to cause a CTD if another construct in your system (say, collision detection) spawns infinity particles. Code consistency. Giving players the option to choose how busy-full-of-particles their game is.

                      Even if none of these particular limitations are part of the design as has been presented to you right now, I guarantee there is someone, somewhere, designing a game who is balancing every one of these factors, as well as a hundred more I couldn’t think of off the top of my head. That’s why it doesn’t help to try to solely address why someone wants to implement whatever they’re asking about–even if you’re right and it doesn’t help performance in this particular case, you are actively making it harder to implement a limit in other cases where it actually would be useful.

                    • Downie says:

                      The particle system issue did not ‘cost a year of development’ (that was more of a “why isn’t this more fun?” issue I believe). It’s just a year-old bug he finally got around to investigating.

                    • Kian says:

                      There are plenty, plenty of reasons to increase complexity without data. Buffer overflow prevention. Strict memory limitations. Not wanting your particle code to cause a CTD if another construct in your system (say, collision detection) spawns infinity particles. Code consistency. Giving players the option to choose how busy-full-of-particles their game is.

                      Buffer overflow prevention is data. Or rather, the fact that a buffer overflow might happen is data. So you then check if the data applies to your case. It’s possible to write code in such a way that buffer overflows are not possible in a certain piece of code. In that case, there’s no point in adding buffer overflow prevention. For example, vector’s at() member provides checked access to an element (it throws std::out_of_range if you try to access an element beyond the end). If you are using it, checking before accessing the element is redundant. And in any case, checking where the bug could be originating is better than trying to catch its effects later.

                      Strict memory limitations are also data. You can’t try to fit within a certain memory footprint without knowing what that footprint is. And in any case, you would then have to check where the best place to add limits is to maximize your gain. If you are blowing your memory budget by 5 MBs, limiting a 500 KB structure is not going to help you.

                      If your physics system is spawning infinity particles, maybe fix your physics system? Here you are falling into a trap of your own doing. You are trying to prevent possible bugs by writing more code, which could also have possible bugs. So how do you prevent the bugs from your protection code? More possibly buggy code?

                      Code consistency? Not sure what that means in this context.

                      Giving the player control over how busy it looks? You don’t do that by lowering the particle limit, you do that by lowering the amount of particles being spawned. Otherwise, the game looks just as busy until you hit the limit, and then particle effects start popping off before they are complete, so they look odd (a trail stops abruptly instead of fading away, etc). What you want to do is spawn 3 particles instead of 6, not kill 3 other particles before they are done.

                      See, you provide data for why you want something, and other options for achieving that goal pop up. If I just answered how to do it, I wouldn’t be helping you.

                      Anyway, above I mentioned one way of doing it, sorting the array and popping them. A different approach could be to “double buffer” your particle system. Have two arrays, and instead of operating on the particles in the current array, copy them over with the applied changes to a second array. The nice thing about this is that you can just not copy the ones you want to delete, so you never change the order, and if you are over budget you just start copying from some position beyond the beginning. You then swap the two arrays, or set the second as active, or however you want to call “setting the other active”.

                      This is a “functional” approach, and it has the side-benefit that your particle system is now parallelizable, since you can have several threads reading the vector and writing to different spots in the (preallocated) second array.
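
                      As a sketch, with assumed names (the two-vector swap is the whole trick – dying particles are simply never copied to the new frame):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Particle { float life; };  // stand-in for the real particle struct

// Double-buffered update: read from `front_`, write survivors to `back_`,
// then swap. No erase is ever needed, and relative order is preserved.
class DoubleBufferedParticles {
public:
    void Spawn(float life) { front_.push_back({life}); }

    void Update(float dt) {
        back_.clear();  // capacity is retained, so this doesn't reallocate
        for (const Particle& p : front_) {
            float remaining = p.life - dt;
            if (remaining > 0.0f)          // not copying a particle = deleting it
                back_.push_back({remaining});
        }
        std::swap(front_, back_);          // the other buffer becomes active
    }

    std::size_t Count() const { return front_.size(); }

private:
    std::vector<Particle> front_, back_;
};
```

                      Since the read buffer is never mutated during Update, several threads could each process a slice of `front_` and write to preallocated regions of `back_`, which is the parallelization benefit mentioned above.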

                    • Abnaxis says:

                      When you said “data” I assumed you meant “empirical measurement,” not “piece of information.” Partly because that’s usually what “data” means to me, and partly because your definition makes the statement “extra complexity has to be justified with data” essentially parse down to “every piece of code you write has to have some manner of justification for you to write it.” Well, yeah.

                      I know you offered a solution above, but what spawned this whole dialog was the sentiment of “remove the limit and test whether you need it” in this particular thread, which didn’t offer any such solution (and, notably, was made before you made any suggestion). I’m not trying to be combative, I’m just following through on my assertion–which was true when I made it–that only saying “don’t do X” isn’t helpful if someone asks “how do I X?”

                      All of the justifications I listed were off-the-cuff possible reasons for wanting to impose a limit. The important point is that reasons to do it actually exist. In order to get those reasons, the asker has to understand all the reasons they might want to do it and has to express those reasons in words that you haven’t misconstrued (which is actually my main source of frustration if anyone asks for my justification first–I know it’s justified, just take my word for it, OK? =) ). They also need to understand how to do X before they can understand why to do X or avoid X, which means they will probably read your response, make some untoward comments regarding your ancestry, and move on to the next Google result in search of their answer.

                      I’m not saying the information you are trying to impart is completely without merit, or that your general philosophy is wrong, but you’re going about it wrong if you want to actually help someone understand X.

                    • Kian says:

                      When you said “data” I assumed you meant “empirical measurement” not “piece of information.” Partly, because that’s usually what “data” means to me, and partially because the way you are defining it makes now the statement “extra complexity has to be justified with data” essentially parse down to “every piece of code you write has to have some manner of justification for you to write it.” Well, yeah.

                      Well, I do mean “empirical measurement” most of the time, but it’s not always applicable. You don’t “measure” buffer overflows, for example. But you can know if they are likely or not. If you are building a library interface, you don’t have control over the inputs, so you assume that data is bad and you have to check that you are not using those inputs to do unreasonable things. But you only do that in the boundaries. Inside your code, you keep checks during development to catch bugs but remove them in release mode. The boundary checks however have to stay in release. You don’t need to test whether people are trying to abuse your code, you just take that on faith :P

                      Regarding this particular case, I’ll point out first that Shamus didn’t ask how to solve the issue, we just volunteered alternative solutions :P However, the second point and what I hope to get across, is that there is no single right way of doing x. If someone asks “how do I get to the library”, I could tell them how I’d get to the library, but that’s not likely to be useful to them. So I’d need to ask them “well, where do you live? Do you have a car or a bike? Where are you now?” It’s not reasonable for them to complain “Geez, what’s with all the questions?! I just want you to tell me how to get to the library!”

                      That’s what it feels like when someone asks “how do I do x” in programming. There are a million ways of doing any one particular thing. Which way is the right way of doing it depends on your requirements and situation. The way I would do something might not fit your situation, and it’s not free for me to come up with ways of doing things. It takes me time to write down directions. I don’t want to write down how I’d get to the library only to be met with “well, I tried it and it didn’t work, and now I’m even more lost!”

                    • Abnaxis says:

                      But your library analogy falls apart, because you aren’t asking for clarification on the request for directions, you’re questioning whether the desire to go to the library is valid. It’s the difference between someone asking “how do I get to the library?” and responding with “what direction are you coming from?”, or responding with “The library? But that’s, like, outside, and you have the internet. Why would you ever want to go to the library?” One of those gets me closer to the information I need. The other one might help me if I’ve never heard of the internet, but in all probability it’s more likely to just tick me off.

                      I would agree that a lot of times the answer to a question is “it depends…” That’s why I think it’s a fine idea to include a disclaimer alongside any suggestion made, to underscore the possible pros and cons of implementing X in the way you suggest.

                      And while there may be numerous ways to solve every problem, there are only so many ways to solve any particular problem cleanly. If you limit yourself to the answers that are simple and follow standard practices, it’s not that hard to spitball a solution and see if it sticks even if you don’t have complete knowledge of the issue.

                    • Kian says:

                      If you limit yourself to the answers that are simple and follow standard practices, it’s not that hard to spitball a solution and see if it sticks even if you don’t have complete knowledge of the issue.

                      It can come across as condescending if you offer CS 101 answers to problems. Besides, what do you do if the simple, best-practice answer is “don’t do that; it goes against best practices and isn’t simple”? Such as implementing features that you don’t need. Like limits that don’t guard against perceivable problems :P

      • Svick says:

        What about std::deque? That is guaranteed to be fast at adding at the end and removing from the beginning. And you can remove from the middle, but doing that can be slow.

        • Zak McKracken says:

          I guess that would be a problem if you have particles with different life spans — in that case many of them would be removed from the middle and thus hard to access. If you could iterate through them and then remove them without the library internally having to iterate through to their address, that would make things easier — but I don’t know about the C++ implementation. Python does not seem to allow such a thing.

      • I wonder if FILO would make sense. As long as the limit is not hit, it does not really matter whether it’s FIFO or FILO or anything else, since everything is iterated anyway.

        But as soon as the limit is hit you want FILO (First In Last Out).

        Myself I’d probably do a resizeable array.

        The array can be sized/resized at startup (maybe by detecting available memory, etc.) or by a graphics option in the game.
        During gameplay it would be static in size.

        It would be FILO order based on duration, maybe with priority added in as well.

        That way low-priority effects won’t be added until higher-priority ones are taken care of.
        If all are the same priority, then the ones with the longest duration are taken care of first (as one assumes those are more important than short-duration ones).
        If they have the same duration, then the oldest one is first in the queue.

        So that’s priority & duration & time, 3 factors to help decide how to handle a list.
        And also discard those deemed too old to handle (particle effects are usually very time specific).

    • Volfram says:

      “The way to delete things from the middle is to swap the item you want to delete with the last item in the vector, then pop the last item.”
      I wanted to suggest exactly this.

      My intent for my own particle system was to have all the particles in a (more or less) static array. When a particle is “removed,” its slot is zeroed out and marked as available. When a new particle is added, it goes through the array until it finds an available spot and is placed there. No available spots? Either that particle isn’t created, or kill the particle closest to expiration and take its spot. (I haven’t written it, so I haven’t settled on a particular solution.)

      Downside: you always pay the memory for N particles. Upside: you never have more than N particles and don’t need to fiddle with memory allocation.

      So instead I wrote a procedural image library. 1000:1 or better effective compression ratios, and it only took 4500 lines of monolithic, barely-commented code.

  8. Septyn says:

    And this is why you don’t trust C++ for anything. :)
    Reminds me of a Perl project where I had to anonymize users in a way that could be reversed if necessary. Just assign a unique random number to each, right? Easy. Except when you have 60,000 users and Perl’s native rand() only produces 32,767 distinct values…which I only discovered after the script that should run for 5 minutes hadn’t completed after a long lunch. Hidden pitfalls in programming languages suck.

    • Kian says:

      That’s not a hidden pitfall. “I didn’t read the docs” is not a pitfall. You have to check that functions you call actually fit your purpose before calling them.

      C++ has some of the best documentation available. Not only is it comprehensive, you can find multiple sites with examples of use. My favorite is http://en.cppreference.com/w/Main_Page, but there are other good ones out there.

      Shamus’ problem was his algorithm for removing elements was a poor fit for the data structure he chose. He’s used to using lists (which while simple to implement, and easy to add or remove at arbitrary points, are horrible for modern computers to traverse because they induce a cache miss for EVERY SINGLE ELEMENT), so vector’s implementation caught him by surprise. But if he’d hand-rolled his own vector, it would have been just as bad. Of course, then he wouldn’t have done it because he’d have understood better why it was a bad idea.

      • Shamus says:

        So that’s the documentation. I’ve done dozens of searches on stl stuff, and Google NEVER had that site anywhere in the top 3. That’s not the fault of the language, of course. But it is really curious.

        • Kian says:

          Huh. I must have trained Google well during my years of searching for “c++ std::string”, “c++ std::vector” and such :P These days, it offers loads of results for c++ documentation for anything remotely like a programming query. Glad to be of assistance.

          • psivamp says:

            Ditto. I get cppreference as my top result for any C/C++ standard component or function.

            Roger’s comment below may also help explain why my coworkers and I can find solid research and our CTO can’t.

        • That’s the issue these days, when folks say “just google it” your response should be “which google?”

          You have “their” Google, you have “your” Google and then you have “Google”.

          People say “Google” but they really mean “their” Google.

          Before people say “just google it” maybe they should test using Incognito mode first; you may see the results end up very different.

          Comparing using duckduckgo.com may also be worth it.

          It is not unusual for a search term to be one of the first two results for one person while ending up on page 3 for others.

          It is possible to turn off personalized search for Google (IIRC), assuming you can find the right place to do so that is.

          • DIN aDN says:

            Wait, what? Google curates searches on a per-user basis these days?

            Holy cow. I did not realise this. That’s like – that’s like having your local library gradually reorder itself based on your borrowing habits. If your local library was for-profit. Which in that case, I could see it doing s- damn it. Must stop thinking of search engines like catalog browsers.

            Still, that is very good to know.

        • Bryan says:

          Well, it’s documentation anyway.

          Neither cplusplus.com nor cppreference.com nor SGI’s STL docs are the official documentation, but both cppreference and the SGI docs have been good places for me to look. Not sure which of them is better, or where cplusplus.com falls in that ordering…

    • Neko says:

      Oh wow. I was incredulous to hear that, thinking “no way can Perl’s rand() return such a limited range”, but wanted to check and found this stackoverflow post that confirms it can be that limited.

      In Perl 5’s defence, it’s using the underlying system’s random number generator, and while I have 48 bits available (checked via `perl -V:randbits`), apparently on Win32 you only get 15.

      • Many PRNGs spit out a floating-point number and a range is created based on that; if that floating-point number is 32-bit, that limits the precision and hence the range you can get.
        It really depends on how the PRNG is designed.

        • Gnagn says:

          C++’s rand() is only required to return ints between 0 and 32767. This actually bit us once when migrating from Borland C++ to Visual C++. Borland had defined RAND_MAX as 4,294,967,295 (max uint), but Microsoft had it at 32767 and this killed some routines that were depending on the larger range. Luckily we were migrating to a version of VC++ that supported C++11’s random library, and could just rip out the crappy generator and put in a shiny new one.

  9. kikito says:

    Huge disclaimer: my C++ days are long over so take what I say with a grain of salt.

    The std lib provides several list implementations. In particular, http://www.cplusplus.com/reference/forward_list/forward_list/ seems to do what you wrote “automatically,” keeping the boilerplate to a minimum.

    If all you are doing is adding, drawing, and erasing particles, it would seem that you don’t need to parse them in any particular order, or access them randomly. Yet you are saying that you need them. May I ask why?

    • Kian says:

      PLEASE don’t use lists. Lists are horrible data structures. They might have been better back when the memory/cpu speed divide was smaller, so it didn’t take hundreds of CPU cycles to fetch things from RAM, but now a days, everyone who knows about containers will tell you to stay away from them. Including Stroustrup, the guy who came up with the language: http://blog.davidecoppola.com/2014/05/cpp-benchmarks-vector-vs-list-vs-deque/

      • Xeorm says:

        Well, they’re still useful at times, as long as they’re used properly and at the right times. Your link, for example, shows exactly how you don’t want to use them. If one is using them, they’re only useful when you’re going through them sequentially and handling insertion/deletion during that time, rather than doing random insertions and deletions from outside like that.

        This does tend to mean that you should never use std::list because it’s not really built to take advantage of the strengths of a linked list, and so fails to be at all useful. Most of the time they’re useful would be more as logical storage/linking rather than as a pure storage mechanism. http://gameprogrammingpatterns.com/object-pool.html is a great example of where a linked list can be quite useful.

        • Abnaxis says:

          Yeah, I was thinking the same thing. I know a lot of the performance hit isn’t just from having to iterate through the list on a removal operation, but I wonder how much it reduces overhead if you’re iterating through the list already? Like in the particle example–if Shamus made a list, then used an iterator to go through and test/remove each element and copy it into an array bound for a GL render call all in the same loop, it shouldn’t be a huge burden on performance. It’s all O(1) removals.

        • David says:

          So, linked lists turn out to be one of those really really popular data structures that get talked about in CS classes, but that you should almost never use. Doesn’t matter whether it’s std::list or your own implementation. So the first link you sent was generally correct, but they didn’t explain why. So why shouldn’t you use linked lists? Cache locality.

          The way it works is that whenever you do a memory read from a location, the CPU brings in a whole bunch of the memory around the location into the cache. Accessing data from the cache is extremely fast. Accessing data from RAM is much slower.

          So what does all this have to do with anything? Well, with an array or a std::vector, all the data is stored contiguously in memory. This is guaranteed. So when you iterate over a vector, all the neighboring vector elements get pulled into the cache, so you only have to hit RAM at most once. But with a linked list, your list elements are stored all over the place in RAM, which means that when you access one element of the list while iterating over it, your cache gets polluted with a bunch of meaningless garbage. Then you’re hitting RAM for every single element of your list, which is ridiculously slow if you care about performance.

          There are other problems with linked lists that make them slow, too. The short version is that you never use a linked list unless you plan to do lots of insertions and never iterate over the list (which is kind of a ridiculous scenario in most cases).

          • Abnaxis says:

            The weird thing is, there’s no reason for list allocation to be so inefficient. I don’t see any reason the STL couldn’t just as easily allocate memory to a list the same way it allocates vectors; it just…doesn’t, I guess.

            I wonder if future iterations of the C++ standard will rectify this, or if we are going to have this conflict between how the structure works and how CS students are taught for the next few decades.

            • Kian says:

              There are reasons for why it’s terrible. It’s because it has to fit certain constraints. If it allocated a pool like vector, then iterators or pointers to elements might be invalidated when you add things, since it might need to move all the elements to a bigger contiguous block of memory. Lists, however, promise that once you have a pointer or iterator you can play around with the list and it will always be valid until you destruct it.

              You might counter with “well, why not allocate buckets, like deque?” If it did that, then memory usage could balloon, since you might reach a situation where each bucket has one element, because you removed all the other ones.

              “Ah, but you could allocate new elements in the gaps in between elements and avoid that!” But doing that would require keeping an associated data structure that keeps track of all that, which would hurt insertion and removal times, which are also constraints.

              Ultimately, there is no reason to try to “fix” lists. If a list doesn’t fit your requirements, then use another data structure. No data structure is perfect, they each offer different trade-offs. Lists’ main advantage is they’re easy to build, which is why they show up prominently in CS courses. They can be used to learn about data structures, interfaces, algorithms and complexity and efficiency. They’re just no good for production code.

      • Blake says:

        I thought the same when Shamus mentioned using a linked list.
        So many cache misses.

        I think a circular buffer is the way to go in this case, either plugging holes by swapping another element in or just allowing the holes knowing they’ll be re-usable when the elements before them get used up.
        This also means if your buffer gets full, you can always remove the first element knowing it was the oldest element.

        Contiguous memory, fixed size and fast.

  10. Adam Phant says:

    Using particles to make trails seems wasteful. I don’t know if OpenGL supports this, but in HLSL you can write a (vertex?) shader that creates a polygon trail, with a gradient of vertex colors. I think the vertex colors can be alpha blended too, so the tail end fades off.

    • WILL says:

      Or you use dynamic buffers and OpenGL 3 functions to render a single trail object. I’ve made a simple implementation of it and it works pretty well.

      You might be thinking of geometry shaders.

      • Richard says:

        Geometry shaders are great for particles.

        You can just hand the GPU the ‘global’ stuff (texture map etc) once, then each frame gets a list of particle coordinates, each with a type and any other per-particle data, and boom, particles coming out of your ears!

      • Adam Phant says:

        I probably was thinking of geometry shaders. There are too many types of shaders.

  11. Issachar says:

    Shamus, I can’t help thinking that the trailer might spark more interest in the game if it offered just a bit more information. Maybe a few short points like:

    Infinite Levels (if that’s accurate to say)
    More than X types of enemies
    Customize your robot through power-ups
    …and so forth.

    Good luck — I really hope Good Robot gets votes quickly!

    • Invariel says:

      The problem with “More than X types of enemies” is that the actual count Y > X is frequently just X + 1. I do agree that powerups (in addition to the already mentioned skill trees) would be a nice mention.

      • Arvind says:

        Good Robot’s design is still evolving and we didn’t want to make promises like “more than 15 types of enemies” until we’re sure of the number ourselves.

        Skill trees, infinite levels and so on will be elaborated upon once we’re happy enough with the feature to show it to the public. I think for the Greenlight trailer it was probably better to err on the side of caution (that being said, our future trailers will definitely use these suggestions).

    • Kian says:

      I’m not very fond of trailers that claim “over X kind of things!” and such. I mean, you could make hundreds of enemy “types” by re-skinning them or changing some minimal aspect, and it wouldn’t add to gameplay. “Monster A has 100 hp and attacks for 50, while monster B has 101 hp and attacks for 49. Totally different!” In essence, you’re just saying “Numbers!”

  12. Andy says:

    Why does it sound like all of your shots are being fired in my right ear?

  13. Mad says:

    Why not use std::list or as somebody else mentioned std::forward_list ?

    • Kian says:

      Because lists are bad and should not be used.

    • WILL says:

      Lists cause massive amounts of cache misses. You should practically never use them.

      • Blake says:

        Agreed, and a particle system designed to have thousands of elements is just about the worst case for this.
        You’ll really be wanting contiguous memory.

      • Bryan says:

        No more, and no fewer, than a vector of pointers.

        And in particular, you *shouldn’t care* if the pointer traversal is actually causing you performance problems, until and unless profiling points to that being the problem.

        In the case of this post, the profiling pointed to the vector mid-element deletion being the problem. If future slowdowns happen because of the pointer chasing (e.g. the CPU performance counters show a lot of cache misses) then maybe consider another data structure. But for now, lists are *far* more of a win because the garbage collection doesn’t require an N**2 algorithm.

        Don’t try to microoptimize the constants out in front of the algorithm when you can just change N**2 to N. (Well, unless N is tiny…)

  14. Neko says:

    I wonder if another problem you were hitting is std::vector’s reallocate-copy-free cycle whenever it grew past its reserved size. I’ve always found it to have sucky performance unless I explicitly .reserve() a suitable amount of space beforehand.

    I think your singly-linked list solution is quite elegant really – use the right tool for the job. You don’t need random access by index for your particles. You do want some guarantees about the complexity of insertion operations.

    As for deletion though, consider setting a simple flag on inactive particles once they ‘die’, and then amortizing your ‘clean up’ operation over several frames. You can skip over inactive particles when rendering, and every 10 frames or so check for and delete a few of the inactive entries so your list doesn’t balloon up with dead particles. Or forget list management entirely and use a fixed array from the beginning – it would limit the max number of particles you could ever have, but would avoid costly news and frees.

    • If using an array there is no need to free anything; you just mark a slot as “free” and then re-use it.

      As I said in another comment here, the size of the array is determined at game start (based on memory or something else), at level start, or by graphics settings in the game (that the user can change), or a mix of all three.

      One danger of actually freeing once every xx frames is that you could end up with a skipped frame if the processing causes the frame render to be delayed (because the list was being vacuumed).

  15. Kian says:

    It seems like having a limit at all is a bit of premature optimization. What analysis did you use to set the limit at 1000? Is a limit even necessary, in the normal course of the game?

    You could test a few runs, log the framerate and number of particles, and see if there is a correlation that might imply you need to set a limit in the first place. If you don’t need to set a limit, then just don’t. I know this probably feels crazy, you’ve been coding for longer than I have so you might be used to being hardware constrained. Current day computers, even computers half a decade old, are ridiculously powerful. You’re unlikely to bump into problems, and if you do, that’s when you start looking into how to fix it. Maybe a limit is necessary, but maybe that limit is 500k, not 1k. You need to test first.

    • Valid points. Unless the desire is to wash the entire screen in particles, there is automatically an average number of particles for the levels.
      Setting a max slightly over that, and having vacuuming occur (if a list), will prevent runaway memory, I guess.

      If a sizeable array per level, then vacuuming may never be needed; I do not foresee Good Robot using PhysX-level particle counts or whatever.
      So even a modest PC should be able to handle whatever particles are thrown at it.

      Memory is not really the issue here (hence why an array can easily be used); it’s more about how many particles can be handled at once without framerate drops.

      There are a few rare games that let you set the max number of soundeffects (and max number of particles) the game is permitted to use.

      This can be amusing (or annoying depending on your point of view) on games as you notice sounds (like shots) fail to be heard even though you see them.

  16. Drew says:

    Very excited for you Shamus.

    Am I the only one who reads the logo as “God Robot”? I understand, the eyes are supposed to create two different Os, but it reads to me as God Robot, because the head is a single item.

    Oh well, still pretty cool to have a project on Greenlight. Here’s hoping you get the necessary votes.

  17. Abnaxis says:

    Does anyone else do a double-take on the good robot logo while associating it with Bad Robot, or is that just me?

  18. Bropocalypse says:

    This makes me think of why I was seeing such a lag in my code when I was working on pathfinding… Part of the operation involved removing extra nodes in the path to achieve a smoother trail, and I was removing them from a list… It makes more sense now. I’m sure there were further things I could have done to make it tidy and efficient, but that project is long dead.

    • Reusable array. If you know that a path is never longer than, say, 1000 nodes, then make an array that size.
      And if you want to smooth the path, simply “disable” the nodes in question.

      The array itself can be traversed very quickly (skipping the disabled nodes) to show a smoother path.

      Memory use is rarely the enemy today.

      • Bropocalypse says:

        That makes sense… Though given I was trying to make an RTS engine I was afraid that having several hundred lists of paths in memory was going to be a hog… Though in retrospect it’d be chump change compared to the actual map grid.

  19. silver Harloe says:

    I haven’t looked at the implementation of STL, but the usual biggest problem with linked lists is memory fragmentation. Over time you end up with elements scattered throughout your available memory, and it can make finding a space to put a new larger structure difficult.

    Then again, maybe C++ has a fix for memory fragmentation, because I haven’t heard of anyone talk about issues with allocating and freeing hundreds of thousands of small objects in years and years.

    Then again, I notice a lot of software that doesn’t exactly leak memory, but nevertheless loses performance with run-time. So it’s possible people just say “eh, memory is cheap” and don’t even care. Or don’t run long enough or varied enough tests for them to ever notice.

  20. Cybron says:

    “I spend several seconds looking dumbfounded at the screen before I begin nodding my head. Of course. OF COURSE. I am a fool.”

    The best description of the debugging process I’ve seen in a while. Though my personal experience tends to include heavy doses of “What knucklehead wrote this crap?” with the answer only occasionally being me.

    Also, I love how the moment you bring up a programming problem your comment section turns into a completely typical coding discussion, complete with “Why would you want to do that?”, arguments about best practices, and RTFMs. Some things never change.

    • Abnaxis says:

      My personal favorite while debugging: “How the hell did this code ever even appear to work?”

      • Atle says:

        I’ve added a couple of those comments.

        But the worst situation when debugging code written by someone else is coming across code that is undocumented and doesn’t seem to do anything. Sometimes it’s impossible to tell if it covers some special corner case I just can’t think of at the moment.

        • Alexander The 1st says:

          When I worked for a year on a particular project with the Seam framework, we actually had a template comment for a specific situation that otherwise would leave people scratching their head.

          Basically, the default way to start a new conversation was to annotate at the beginning of a function:

          @Begin(nested=true)
          someFunction(){

          }

          whereas the way we did it for this one sector of functions was:

          someFunction(){
          conversation.begin(true)
          }

          The comment explaining why we couldn’t just do the top was important for every time we hit the exception, because it’s an edge case not covered by the manual of the framework we used, and…well, it certainly looks like it’s just doing the same thing, so clearly the first way is right, right?

          Luckily we had the comment in question, or else I would’ve been confused every time that change was made.

  21. lethal_guitar says:

    Awesome, I’m glad the project is now officially back on track. Please have my Greenlight vote! Keeping my fingers crossed that it quickly makes it onto Steam.

    I’m really excited to see it evolve, read about the development here on the blog, and, of course: Play it once it’s done!

  22. Draklaw says:

    Well, as already mentioned, don’t use linked lists. They are really bad for cache coherency and don’t help much here (moreover, you would need some pool allocator or something to reduce memory-allocation overhead). Sure, you would not have had this problem with them, but that doesn’t mean they are the right answer.

    std::vector will be fine, and there are several ways to do what you want with it. If you only need to delete the first n elements, myVector.erase(myVector.begin(), myVector.begin() + n) will do the job in linear time (and simplify your code). Alternatively, you can use a std::deque, which allows adding/removing elements at both the beginning and the end in constant time while keeping data in densely-packed blocks.

    If you want to remove particles according to some arbitrary logic (e.g. particles with the “dead” flag on), you can use the std::remove_if() algorithm, which does exactly that in linear time (or code it yourself; it is not that hard). For example: std::remove_if(myVector.begin(), myVector.end(), isDead), where isDead is a function (or a functor) that takes a particle reference as a parameter and returns true if the particle must be removed (is dead). Note that remove_if only shifts the survivors to the front; you then call myVector.erase() on the iterator it returns to actually shrink the vector. I suggest using this solution, which allows you to remove the particle-count limit if you want. If you find it too costly, you can even run it only once every few frames and just skip dead particles while updating/rendering.

    Yet another approach if you stick with the particles count limit is to just not delete particles at all. Just flag them as deleted and do not update or render them. The problem is to insert new particles in the array. You can store a pointer to the first free slot and then store in each deleted particle a pointer to the next free slot, thus creating a linked-list of free slots. So adding or removing a particle boils down to adding or removing stuff from a linked-list. Yet, updating and rendering is still done by traversing a densely packed array. However, this is the most complicated solution and I am not sure it is significantly faster (if faster) than the previous one.

    In the end, the most important thing here is not the solution employed in the end, it is that data structures matters.

    • guy says:

      Yet another approach if you stick with the particles count limit is to just not delete particles at all. Just flag them as deleted and do not update or render them.

      *two hours later*

      “Why did I suddenly run out of memory? There’s only two enemies on the screen!”

      Well, actually the linked list would fix that. But then you have an extra pointer per particle and the C++ people scream at you.

      • Draklaw says:

        Well, by “if you stick with the particles count limit” I mean that the particles array is preallocated at a fixed size, so running out of memory is not possible. The trick is to reuse deleted particles when you add a new one, hence the “linked-list of free slots.”

        Linked-lists are rarely useful. Sure, you can insert or remove elements anywhere in constant time, but in order to do this, you need a pointer to the element before the one you want to insert/delete (or to the element you want to delete, if this is a doubly-linked list). If you don’t have it, you will need to search through your list, which is an O(n) operation, so no better than vectors.

        In Shamus’ case, if he wants to remove all particles that are too old, assuming an unsorted list of particles, it requires an O(n) operation both with a linked-list and with an array/vector. The difference is that the array approach will need to move a lot of data (if you remove only the first element, the whole rest of the array needs to be moved), whereas the linked-list will only modify a few pointers. But on modern hardware, unless you deal with big particles, moving stuff will be much less of a problem than the cache misses caused by the linked-list.

        However, a thing to take into consideration in C++ is that moving data is not trivial. Doing a memmove is most of the time not allowed, because objects may have constructors/destructors that do arbitrary stuff. So std::vector needs to call the copy constructor for each object it moves, then the destructor of the previous one (both should be no-ops in the particle case). (In C++11, the move constructor replaces the copy constructor in this case, but it does not change anything as long as the class does no memory allocation, which should be true for plain particles.) If the particles are plain C-like structs, the STL may decide to do a memmove, but I think it depends on the STL implementation you use.

        Still, for a few thousand particles, I don’t think it would be a problem, but the only way to know is to test and profile…

        • Carlos says:

          A few points:

          “Linked-lists are rarely useful”
          Linked lists are useful for exactly 2 use-cases:
          1. You need the memory address of the item you’re storing to remain constant while it’s live. std::list (a doubly-linked list) guarantees this.
          2. You need a sequence container or can’t implement operator< for a std::set, and you need O(1) removal of an arbitrary item. In my experience, this is an uncommon use case as other std data structures tend to perform better even in these conditions.

          "O(n) operation both with a linked-list an an array/vector"
          I see a lot of people in here saying this, and I find it a bit disingenuous. To be more accurate, you must say:
          + If n is the number of items in the container being removed from,
          + and k is then number of items being removed from the front of the container,
          + then removing the elements is O(n) for a vector
          + and O(k) for a list.
          This is an important distinction because if you're only removing one element from the front, then for a list it's a constant-time operation, but for a vector, you still pay a huge penalty in moving around all following elements back by 1.

          "moving stuff will be much less a problem than cache-miss due to the linked-list"
          Given that Shamus examined the runtime of his code and found the move to be expensive, I'd say that the vector's move is much more expensive than the cache misses due to the linked-list. I'm not saying that a linked-list is a better structure for this, but that the move is happening often enough (he's probably batch-deleting every frame, or every few frames) that the cost is huge. Further, you're not considering that Shamus is doing more than deleting particles in batch; he's iterating over all particles nearly every frame. Using a linked list, that's ~O(num_particles) cache misses for every frame. Cache misses are not cheap, and this really adds up. Let me illustrate my point with a little test. Here's a C++ function:


          constexpr int kNumItems = 100000;

          template <typename Container>
          int GetSum(Container& c) {
            for (int i = 0; i < kNumItems; ++i) {
              c.push_back(i);
            }

            int sum = 0;
            for (int i = 0; i < kNumItems; ++i) {
              for (int& j : c) {
                sum += j;
              }
            }

            return sum;
          }

          If I run it with a vector:


          #include <vector>
          #include <iostream>

          int main() {
            std::vector<int> c;
            c.reserve(kNumItems);
            std::cout << GetSum(c) << std::endl;
            return 0;
          }

          I get the results:


          $ clang++ -O3 -std=c++11 ./cache-miss.cc && time ./a.out
          677203456

          real 0m1.222s
          user 0m1.220s
          sys 0m0.000s

          If I run it with a doubly-linked list:


          #include <list>
          #include <iostream>

          int main() {
            std::list<int> c;
            std::cout << GetSum(c) << std::endl;
            return 0;
          }

          I get the results:


          $ clang++ -O3 -std=c++11 ./cache-miss.cc && time ./a.out
          677203456

          real 0m14.763s
          user 0m14.744s
          sys 0m0.000s

          And one more time using a singly-linked list instead of a doubly-linked list:


          #include <forward_list>
          #include <iostream>

          constexpr int kNumItems = 100000;

          template <typename Container>
          int GetSum(Container& c) {
            for (int i = 0; i < kNumItems; ++i) {
              c.push_front(i);
            }

            int sum = 0;
            for (int i = 0; i < kNumItems; ++i) {
              for (int& j : c) {
                sum += j;
              }
            }

            return sum;
          }

          int main() {
            std::forward_list<int> c;
            std::cout << GetSum(c) << std::endl;
            return 0;
          }

          $ clang++ -O3 -std=c++11 ./cache-miss.cc && time ./a.out
          677203456

          real 0m16.572s
          user 0m16.545s
          sys 0m0.004s

          That's an order of magnitude worse in run time just for choosing a list instead of a vector for iteration. Almost all of that is due to cache misses in iterating over the list. So, if the majority of your time is spent iterating (as I imagine is the case with this particle manager), it’s worth it to spend time picking a data structure that gives you good data locality.

          For a case like this, I’d implement it as a ring-buffer. Just have a static array of particles, and keep two indexes into the buffer where the “first” and “last” item is. When you go over the size of your array, just over-write the “last” index modulo the size of the array, and increment the “first” index. The operation is equivalent to removing the first item from the array and adding a new one at the tail. Moreover, you don’t pay the cost of new memory allocation (you’re reusing the memory from the element you’re overwriting), and the whole thing happens in constant – O(1) – time.
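          That ring buffer is only a few lines. A minimal sketch, assuming particles die in the order they were born (RingBuffer and its members are illustrative names, not actual game code):

```cpp
#include <array>
#include <cstddef>

// Fixed-size ring buffer: when full, pushing a new element silently
// overwrites the oldest one, which is exactly the "kill the oldest
// particle first" policy. No allocation ever happens after construction.
template <typename T, std::size_t N>
class RingBuffer {
public:
    void push(const T& item) {
        buf[tail] = item;
        tail = (tail + 1) % N;
        if (count == N)
            head = (head + 1) % N;  // we just overwrote the oldest element
        else
            ++count;
    }

    std::size_t size() const { return count; }

    // Element i, counted from the oldest element still alive.
    const T& at(std::size_t i) const { return buf[(head + i) % N]; }

private:
    std::array<T, N> buf{};
    std::size_t head = 0, tail = 0, count = 0;
};
```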

          I’m really sorry if I sound too aggressive above. It’s really not my intention to. It’s just that most of my day is spent thinking about things like this, so it’s important to me when it comes up.

      • Kian says:

        It’s not just the two extra pointers (forward and back, to be able to delete). It’s the cache miss every time you chase a pointer.

        Imagine you’re in a library. You have a table, a notepad, and a pen. You are handed a book with instructions to follow. So you start doing the operations in your little notepad, and the book says “find the numbers in page 6 of book #0123”. So you get up from the table, look in the library for book “#0123”, bring it to your table, and move to page 6. You start working on the numbers in there, and at the end of the page it says “continued on page 19 of book #9456”. So you get up, find the book, bring it back to your table, operate on the numbers in page 19, and at the end of the page it says “continued on page 63 of book #4782”. And so on, for every page. You’ve spent most of your time going to fetch books to bring to your table, and only operated on a handful of pages.

        Meanwhile, another person in another table had a single book filled with numbers. At most, he had to get up a handful of times when the book ran out of pages and told him to continue with the next book. He got a lot more work done than you, in much less time.

        You were working with a linked list. The other guy was working with a vector. That’s why lists are horrible. They keep your processor busy going to look for the information instead of operating on it.

  23. Worthstream says:

    And Rutskarn will be the writer, too!? O_O
    This will be a game that is fun to play while also having a good (or at least pun-filled) story!?

    Brb, going to upvote on Greenlight.

    • Hmm, is he going to be a bad robot (Josh), Good robot (Shamus), Punny robot (Rutskarn) or something else?
      And who will do the voice for the robot?

      Shamus or Rutskarn with some vocoder effect might do wonders, to provide small quips that can be played here and there.

      For example, flying straight into the wall/edge he can say “Ow”, or in the case of Josh doing play testing, “Ow” “Ow” “Ow” “Ow” “Ow”.

      Or missing a super powerup (that just faded out maybe?) he could say “Aaaw!” in a kind of sad voice.

      And when moving to a new level he could say things like:

      “Uiiiiii” or “Ooooh” “Green, very green” “Pretty” and so on.

      Or if in a lava level and the lava can hurt and there is lava damage, “Ouch, hot hot hot”.

      Or if flying back up (in case the player thought they missed anything, but there really is nothing back that way):
      “Hello? Anyone there? Helloooo! I guess nothing’s there!”

      And should the robot get some silly orders (from whatever gives orders) he could maybe do a “Sigh! Really?!”

      Nice little touches like that.

  24. Zak McKracken says:

    Yeah, Good Robot!

    (Also, I very much trust this will be available outside of Steam, right?)

  25. The Snide Sniper says:

    There are tons of comments already, so I haven’t had a chance to read them…

    The solution is simple:
    Don’t remove elements from the beginning of the vector/array; instead replace them with elements from the end. This will be an O(1) operation.

    In pseudocode, when you want to remove particle i from an array of n particles, perform the following steps:
    Replace particle i’s data with the data from the last particle (index n-1).
    Remove the last particle from the array.

    With particles, you probably don’t care about the order in which they’re stored, so why spend all that time maintaining the order?
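    That swap-with-the-last-element removal (often called “swap and pop”) really is just two lines. A minimal sketch, not Shamus’ actual code:

```cpp
#include <cstddef>
#include <vector>

// O(1) unordered removal: overwrite the doomed element with the last one,
// then shrink the vector by one. Nothing else in the vector moves, but the
// ordering of the elements is not preserved.
template <typename T>
void swapAndPop(std::vector<T>& v, std::size_t i) {
    v[i] = v.back();
    v.pop_back();
}
```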

    • Yep, I came here to say this. Has the added advantage of avoiding the butt-ugly erase(myVector.begin() + i) syntax. But, as the man says, the problem here isn’t in finding nice solutions, it’s in noticing where the problem lies in the first place.

    • Alexander The 1st says:

      The problem here is that he actually does care about the order – the order is how he determines which particles to kill when the limit is reached, oldest ones first.

  26. MichaelG says:

    But at each draw cycle, you are going to want a contiguous array of triangle vertexes for the particles, right? You aren’t reallocating/generating this each time are you?

    So I would be thinking that the result drives the data structure, meaning it should be a fixed-sized array allocated in a single chunk. But since your vertexes won’t include the particle state info, there must be a parallel array that has this information.

    So if you used zero-sized triangles for deleted elements, then your parallel array would just use indexes for links and implement a free list in the deleted elements. Now you never allocate or delete any elements, just play with the list pointers.

    In my experience, it isn’t just data structure that causes problems. new and delete are deadly slow under C++ and should be avoided as much as possible.

    • Re-allocation is evil.
      Allocate/prepare as much as possible at the start of a level or the game. Then re-use what you can.

      Same is true for loops, setup/populate variables etc. before entering the loop.
      I’ve seen some do allocations in a loop, which makes me cringe. If you know the number of iterations and the amount of memory you need, then allocate it all before entering the loop. You get speed boosts and you avoid memory fragmentation as well.

      That’s the drawback of STLs, frameworks, and middleware: you get so used to others doing the coding for you that you end up forgetting how to do simple logical coding yourself.

      I have often coded something myself but ended up using middleware/libraries as those had more pre-made features or better optimized code, but I only did that if I knew how to code the same thing myself.

      I hate blackbox coding. (in this context I mean using code that I have no idea how it actually does things).

      • Kian says:

        Well, vectors and similar STL containers do allow you to preallocate, with “reserve”. That way you only ask for memory once, and you can fill it up later without reallocating or copying.

  27. Mephane says:

    I am so looking forward to this game. I am especially looking forward to the procedural level generation. :)

  28. nm says:

    So this is super belated because I didn’t have time to comment earlier, but Shamus, you need a good profiler. This is the kind of issue you’d have found in 20 seconds by running the profiler and seeing that the majority of your time was spent on that one line of code.

    Valgrind has a profiler, and oprofile works well. The best profiler I’ve used was developed in-house for our weird architecture and just sampled the PC (using JTAG) every millisecond. Anyway, my point is that these tools are free to use and good at their jobs. There’s no reason to bang your head against the wall looking for the source of your performance problem.

  29. Geebs says:

    I wasn’t aware that there even was a way to remove a bunch of objects from an array other than making an array of things to delete and then removing them in a second step. I guess I now know why things work that way :)

  30. RCN says:

    Hmm… watching the video, I noticed very little feedback from enemy robots being shot. Maybe it is because of a piercing upgrade or whatever, but it felt remarkably low on the kinesthetics that Chris loves to faff about. I mean, you want to feel your shots hitting their marks.

    Maybe the sound design makes the shots more satisfying, but the little bobbing of the enemy robots doesn’t seem to convey enough. I can’t really say why; it certainly is MORE than the usual “enemy blinks with light every time it is hit” of other old 2D shooters.

    Maybe if there was more obvious particles of pieces and bits flying off the robots with each hit? I dunno… sometimes less is more, but other times more is more too.

  31. Spammy says:

    Putting on my Jim Sterling voice… Blah blah Greenlight is the devil blah any schmuck can put games up blah overcrowding blah curation blip blah there’s 5000 games on Steam Valve fix this for me clap your hands and thank God… for me.

    Not being serious, just trying to poke fun at Sterling’s style and his seemingly favorite subject matter.

    Also Shamus I was re-reading the Good Robot blog posts and Frontier: Reboot, Unearth, and Good Robot aren’t categorized for easy reading from the page your Programming button links to, like Frontier and Pixel City are.

  32. Forty says:

    In your trailer, all of the game audio is coming out of the right channel. Can you fix that? It makes it really tiring to listen to.

  33. Bropocalypse says:

    Shamus, you expressed that you were having some difficulty deciding on gameplay aspects. Is Pyrodactyl helping you with that? Are you free to discuss what changes or additions they’ve suggested?

  34. Erik says:

    As an embedded programmer working mostly in classic C, there’s a completely different pattern that immediately sprang to mind: use two lists.

    Sorta like this:
    {
      // Pop from one list, process, then push to the other.
      while (!empty(ParticleList)) {
        particle = pop(ParticleList);
        particle.process();
        push(particle, ProcessedList);
      }

      // Then swap the lists back for the next frame.
      temp = ParticleList;
      ParticleList = ProcessedList;
      ProcessedList = temp;
    }

    This is a reasonably common pattern for things that run in shared interrupts, like system timers. When the timer interrupt goes off, pull all the expired timer blocks into a separate list and get out of the interrupt as fast as possible. Then, after you’re out of the interrupt, the OS can walk the Expired list and post the timer events without locking out the rest of the system hardware from being serviced. (In that case

  35. kdansky says:

    I urge anyone interested in these things to do a few things:

    1. Use a profiler. Visual Studio has a very good one included. Learn how to use it. Never optimize anything before you have profiled it. In 2015, it’s painfully amateurish to optimize without one, because we have a lot of unintuitive issues, such as cache behaviour.

    2. Read up on current performance benchmarks. Linked lists are generally horribly slow because they are cache-unfriendly. Replacing a vector with a list only results in a performance boost if the vector was mis-used. Even in typical “lists do this well” cases, lists are always slower than vectors nowadays, because we have so much cache. Herb Sutter has a great Channel 9 talk on the subject:
    http://channel9.msdn.com/Events/Build/2014/2-661

    The gist is: If you have 2000 elements in a list and you iterate through them, you’ll suffer a few hundred cache misses (roughly one every 16 elements at best). Every single cache miss costs you 200 cycles for a DRAM access. You can run through your 2000-element array in roughly 2200 cycles (one cache miss for the first element, then about one cycle per element). Copying 1999 elements is even cheaper than 2000 cycles, because you can do that unrolled at 4 operations per cycle. VC++ does that optimization automatically anyway.

    Which means it is many magnitudes cheaper to delete the head of the vector and copy it in full than it is to just look at the last element in a list. Your problem here is not that you delete the first element. Your problem is that you do it more than once per frame. You wrote a pattern that batches things, and then you didn’t batch them.

    The only reason to use a list is when you have external pointers that need to remember a specific object, and those objects need to go into a collection. That’s a bad design. Therefore no lists.

    3. If you want a buffer with particles, allocate a big array and mark the dead ones as dead instead of actually deleting them, then overwrite them. You’ll waste a few bytes on dead particles (a handful of megabytes at most), but spend much less time copying the same few values. Every so often you can pack your array if you really want to, but it’s most likely unnecessary. Allocate at the end until you run out of space, then start over at the beginning. Keep an “alive” counter so you know when to increase array size and don’t spend a ton of time going through filled space.

  36. ET says:

    @Shamus

    Instead of a list, why not use a height-balanced binary search tree instead? Then all your insertions, deletions, and lookups are O(log n) *, and you still get O(n) time for traversing the list. You’d have a blazing fast particle system that could handle a gojillion particles!

    * Log n is, for most purposes, close enough to a constant time, that it doesn’t matter.

    • Carlos says:

      The problem with tree structures is more or less the same problem with linked-list structures, since most trees are implemented using linked, heap-allocated nodes. That is, that you’ll incur a runtime overhead due to frequent cache misses when iterating over the nodes of the tree. I left a comment with a microbenchmark showing how bad this can be (in my case, an order of magnitude as compared to a vector).

      What a lot of people tend to forget about big-O is that it’s asymptotic and has hidden constants that can really throw off your estimates. On modern processors, conforming to hardware quirks is oftentimes more important towards performance than just using the algorithm with the best big-O runtime. For a really good example of this see:

      http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array

      Where someone finds that it’s an order of magnitude faster to sort an array before iterating through it [O(n log n) + O(n)] than it is to iterate through it without sorting [O(n)]. I highly recommend reading the explanation, as it was really eye-opening for me the first time I read it[1].

      [1]: TLDR, a component of your processor called the branch-predictor looks ahead and guesses which branch of an if statement you’ll take, and if it’s wrong, there’s a huge performance penalty. It happens to be right more often in this case of a sorted array than it is with the unsorted array.

  37. kaypy says:

    Is Good Robot likely to end up on GOG as well as steam? (I’m guessing so- eg my copy of Unrest is from GOG…)

    Given the above, is it dishonest to vote for it on Greenlight if I intend to buy it elsewhere given the choice?

  38. MrGuy says:

    Naive question.

    It sounds like a lot of the complexity comes from the fact that particles can get deleted from any point in the list. Which, as I understand it, is a result of particles having different “life spans,” and the actions that trigger the particles happening in random orders (e.g. a long-lived particle effect being triggered before a shorter-lived effect).

    Just wondering if the single queue is a problem, and if we can get rid of the problem by doing the queueing differently.

    Rather than have a single queue of “all the particles,” have three queues – one for the “long lived particles,” one for the “medium lived particles,” and one for the “short lived particles.” Long particles last 1 second, medium 1/2 second, short 1/4 second. When a new particle is created, it is placed into the correct queue for its type (you can trade off the number of queues/number of time to live values you want to support – one list per type of particle might not be crazy).

    This approach gives us a set of lists where we always add to the end, and delete from the front – we never need to delete from the middle (because a particle never expires sooner than an older particle with the same TTL). The tradeoff is a larger number of queues that Particle Man has to iterate through, but each one is shorter, so the number of particle update operations is the same.

    • Dan Efran says:

      Sure, that avoids deleting from the middle. But you’re not really simplifying or solving a big problem. Deleting from the middle of an array is just a few lines of clever code, not “a lot of complexity.” It’s really nothing to fear. You just need to not do it a totally wrong way, is all.

      Maintaining different lists of particles because their lifespans differ? Also just a bit of extra code, and it works fine. To me it feels like too much extra bookkeeping for a typical particle system (which can be quite simple, and in which you really want to be able to make any kind of particle any time, even a never-before-seen kind, yet treat them all the same).

      Unless you need to do it for more compelling reasons…and I’ve certainly pre-sorted actors into multiple lists just as you suggest, when they’re in categories that get handled differently and it would be silly to categorize them again every frame while running through a single list. But for particle lifespan I wouldn’t bother.

    • kdansky says:

      You’re trying a heuristic approach to a problem that can be solved properly with data structure design.

  39. Jamie Pate says:

    Bah, use a sparse array (mark particles as ‘inactive’ when they are not in use); you know the maximum number of particles beforehand!
    Just keep the index of the most recently deleted one, then update that as you insert new ones (or keep a small queue of ‘new’ particles that you insert into empty spaces every time you encounter them as you scan the list for rendering). Particles don’t have to interact with each other (I guess they might have to interact with walls/entities though), so you should be touching each one at most once per frame.

  40. Carlos says:

    Hey Shamus,
    If you still have the vector-based code lying around, I’d love to see what would happen if you did a quick find/replace of std::vector in your ParticleMan with std::deque. The two have an almost identical interface, so it should just be a drop-in replacement, but std::deque is implemented such that removing elements from the front of a std::deque is efficient (there’s no big shift of elements like you get with a std::vector).

    From http://en.cppreference.com/w/cpp/container/deque
    > Insertion or removal of elements at the end or beginning – amortized constant O(1)

    vs. http://en.cppreference.com/w/cpp/container/vector
    > Insertion or removal of elements – linear in distance to the end of the vector O(n)
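    A toy sketch of what popping expired particles off the front might look like with a deque (Particle and expireOld are made-up names, not the game’s actual code):

```cpp
#include <deque>

// Stand-in particle type: just a birth timestamp.
struct Particle {
    int bornFrame;
};

// Particles are pushed to the back as they are created, so the oldest are
// always at the front. pop_front() on a deque is amortized O(1), with no
// big shift of elements like a vector would do.
void expireOld(std::deque<Particle>& d, int currentFrame, int lifetime) {
    while (!d.empty() && currentFrame - d.front().bornFrame >= lifetime)
        d.pop_front();
}
```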

  41. Kdansky says:

    You don’t need to delete from the front or add at the front, because *order does not matter for your particles*. You can just copy the last particle into the now-empty front spot, overwriting the dead particle in the process. Keep a single int to remember your last particle.

  42. Frank G says:

    I haven’t read through all of the replies so someone else may have already suggested this, but…

    If you’re iterating over the vector of particles every frame to do some operation, you can remove the old particles in that same iteration. Since the output vector size is guaranteed to be less than or equal to the input size, you can reuse the space. Just keep track of both the input and output position. You increment the input position for every particle. If the particle is kept, you write it to the next output position and increment the output position. This will compact the vector on-the-fly with little overhead vs. a read-only iteration, so it can be done every frame. It doesn’t even require creating a new vector.

  43. Nixitur says:

    “Assuming you don’t need to fiddle around with individual members.”
    I’m pretty sure that’s how you go blind.

  44. RichardB says:

    I reckon the best reference for these sorts of issues is still the book series “Effective C++”, “More Effective C++” and “Effective STL” by Scott Meyers. Very readable in bite-sized chunks, and they taught me more than anything else out there. I can’t recommend them highly enough. In fact I’d go so far as to call them required reading, which is not something I do often.

    There’s also “Effective Modern C++” that deals with C++ 11 and 14 but I don’t write in C++ any more, so haven’t read it yet.
