I don’t have a new entry for Good Robot done, so I’m going to resort to the crutch of delinquent bloggers everywhere and post a YouTube video made by someone else and use it as a conversation-starter.
Computers do a lot of sorting. Alphabetizing names, sorting enemies by distance, arranging a list of polygons from far to near, putting a high score table in order, arranging files according to size or date, ordering search results by word frequency, and a million other things. There are a lot of different ways of sorting lists of values.Some clever person wrote a program to show the sorting take place and emit audio based on the current state of the list. I find the result to be kind of hypnotic:
Shamus, why are there so many different sort algorithms?
Good question, me. The reason is that there are a lot of different sort problems. The above sorts look like they’re sorting some really predictable data: A list of all the numbers from 1 to 1,000, or whatever. This is not always the case. Sometimes it is very not the case.
Back in the old days sort was depicted as being a tradeoff between speed and memory usage. If you’ve got lots of memory (or the data is small) then you can create space for a copy of the stuff you’re sorting and fill in the new space as you sort. If you don’t, then you have to move the entries around in-place. It’s like re-arranging a room. It’s easy if you’ve got another room you can shove some items into, and a lot harder if you have to keep all the furniture in the room as you work. A lot of sorts are optimized for different levels of memory use. Maybe one is slow but only requires room for 1 additional item. Another is faster but requires enough memory to contain half of the list. Another is faster still but requires a complete copy of the list, thus doubling memory usage.
Likewise, optimal techniques will vary based on what you’re sorting. If I’m sorting the integers from 1 to 100 and I’ve got a bunch of instances of the number 10 next to each other, I know those values are sorted relative to each other and don’t need to be fussed with. If I find another 10, I can stick it before or after these other tens and it doesn’t matter. But if I’m sorting floating-point numbers like 3.285714285714286 and 12.83333333333333 then I’m probably not going to have large blocks of same-value entries next to each other. Also, I think some sorts make guesses about where items appear in the list based on average or median values. That’s great if you can say, “These values all must fall within the range of 1-100, so I know this value of 80 will be somewhere near the top.” But if your list is (for whatever reason) all the digits from 1 to 1,000 but also there is a value of ten billion stuck in there, then assumptions about ranges or averages could really make a mess of things.
Then there are concerns about how scrambled the list might be. If you begin with the assumption that the list is complete chaos at the outset, then you might use something optimized for really complex sorting. But what if the list is only chaos sometimes, but the majority of the time it’s in perfect order with just one item out of place? Going back to our furniture-moving analogy: It’s overkill to shove all the furniture into the spare room if all you want to do is move the end-table.
Maybe you’ve got a data set that’s too massive to fit in memory, so you need a sort that that’s good at operating on isolated segments. Maybe the entries are massive and moving them takes a lot of time (maybe you need to write to disk every time you move something) so you want a system that might spend more CPU time thinking so it can do the sort in the fewest possible moves.
And let us not forget the all important cost: Programmer time. If you’re never doing more than occasionally sorting a couple hundred small items, then you should not, under any circumstances, worry, argue, or even think about what sort to use. Just use whatever sort is handy, and if you don’t have one handy just use stupid bubble sort. Programmer time is precious and your brainpower is probably needed elsewhere.
I say this knowing nobody will take my advice because getting things just right is part of the fun of this job.
There are two major schools of thought about how you should write software. Here's what they are and why people argue about it.
Lost Laughs in Leisure Suit Larry
Why was this classic adventure game so funny in the 80's, and why did it stop being funny?
Push the Button!
Scenes from Half-Life 2:Episode 2, showing Gordon Freeman being a jerk.
What did web browsers look like 20 years ago, and what kind of crazy features did they have?
Good to be the King?
Which would you rather be: A king in the middle ages, or a lower-income laborer in the 21st century?