It’s time to solve a mystery. (A boring technical coding mystery.)
It’s been bothering me for years: Why does my site go down when I get linked on Reddit or Stumbleupon? It shouldn’t. I use Hosting Matters, and they host blogs that are far larger than mine without difficulty. Usually my site will just vanish for a few hours, and after the fact I’ll be able to sort out what happened by looking at the incoming traffic.
Eventually I noticed that not all traffic surges were created equal. It was really only the ones that linked to DM of the Rings that crushed my site. Links to other stuff wouldn’t even cause a hiccup, even if the overall traffic spike was larger. The natural assumption is that it must be due to the fact that the comic is image-heavy. Except, this problem persisted even when the images were moved off-site. Hmmm. What about the scripts that set up all the comic navigation stuff? That seems like a stretch. Perhaps the script I wrote that embeds images so I don’t have to clutter up my prose with a lot of HTML? Meh. It might be inefficient, but I can’t imagine it’s anywhere near bad enough to bring down the site where the average comic was one or two images. But what else could it be? What else makes the comics different from the other posts on my site?
I’d ponder this problem for a day or so and then forget all about it until the next time my site went down. This cycle has been repeating itself for a couple of years now.
I’ve finally figured it out. And I am really embarrassed at how long it took me.
It’s the number of comments. Also: Duh.
I was always looking at the early comics, where the comment count is a couple dozen or so per entry. Later in the series the comments regularly top a hundred, and the finale clocks in at just under seven hundred comments.
I run some tests on my local machine, where I have a functional mirror of this blog. It takes my machine an agonizing three and a half seconds to generate the page for the finale. Now, I’m sure the server at Hosting Matters is many times faster than my humble computer, but 3.5 seconds is still insane. Get a few thousand people in there chewing through those high-comment posts, and it’s easy to see how they could render the machine helpless. I try disabling the display of comments and it is able to generate the page in just 0.14 seconds. So 96% of the CPU time is spent churning out the HTML for the comments, and the other 4% is spent on everything else.
But why? Why is it taking so long to list the comments? My first thought is that this is caused by my dice roller. The code to display the dice for a comment is a recursive operation that subtracts from a number and then calls itself until the number is zero.
As of this writing, the current comment count on the DM of the Rings finale is 678. This requires ten levels of recursion. (Six 100 sided dice and four 20 sided dice.) Keep in mind it has to roll a handful of dice for each and every comment in the list, with the number of dice needed gradually going up.
Lots of dice times lots of comments time lots of comment-heavy posts = a crap-ton of HTML.
Every comment has a bunch of HTML around it. Nested DIV tags and SPAN tags and TABLE tags and… wow. This is really messy. There is a lot of extra stuff in here. I remember how this came about: Most of it was for cross-platform reasons. The differing behaviors of IE 6, IE 7, and Firefox were crazy to sort through, and often making very slight changes would blow out the formatting on one browser or another. Once I had it working, I stopped messing with it because I was pissed off and sick of fighting with it. So I left this crufty mess in place, even though I most likely don’t need the majority of it.
Just to make sure I’m getting a clean test, I remove ALL of this extraneous HTML. I just have a single line-break between comments, but otherwise they are a big jumble. That’s fine. I’ll sort that out later. Finally, I disable the comment editing plugin and SuperCache so that I can be sure they aren’t part of the measurement.
The code that is left is as streamlined as it gets: WordPress yanks comments out of the database and spews them onto the page without doing anything else. It now takes almost exactly 2 seconds to build the page. Nice, but I have a feeling there is still a really big bottleneck in here someplace.
Time to get systematic. Each comment has several distinct parts:
1) The comment number
2) The dice
3) The avatar
4) The name / link of the author
5) The permalink of the comment itself, which is wrapped around…
6) The date on which the comment was made and…
7) The time of day of the comment
8) An admin edit link (just for me)
9) The actual comment text
Hm. So which of these steps would be the most time consuming? My primary suspects are #2, #3, and #9. Time to run some tests. I’m going to disable an item, reload the page a few times to see how much time it saves, then re-enable it and run the same test for the next one. This test isn’t super-accurate. Values jump around in 0.05 increments, but considering how huge our bottleneck is this shouldn’t be a problem.
But let’s see how it goes…
One of my prime suspects – my dice roller – is so fast that it’s basically “free”. (I give myself a pat on the back.) #9, the comment text, is indeed a high priced thing to process. But the absurd surprise is that #6 and #7 are really expensive. Together, it takes longer to show the date and time of when you left a comment then it does to display the actual text! What!?!
I am at a loss, here. Suddenly, everything I know is wrong.
While I can’t explain why it’s taking so long to do #6 and #7, I can clearly see that I don’t need to do them both. In both cases, I’m retrieving the timestamp, I’m just printing it differently. It’s like if your mom asked you what the date is, so you decide to walk five miles into town and look at the marque at he bank. After you walk all the way back home and tell her, she sends you back to the bank to find out what time it is. I don’t know why it’s taking so long, but at least we can get both nuggets of data on the same overlong trip.
That shaves a nice chunk off of the loading time.
The next thing I notice is that there are two ways I can get the comment text. I can do it the way I’ve been doing it:
Or I can replace it with a very similar line of code:
echo get_comment_text ();
The first is what takes over 0.6 seconds. The second happens so fast I can’t even measure it with the benchmarking tools I’m using. It is, in effect, “free”.
This is one of those changes that seems a little too good to be true. I must be giving up something by switching to the new way. But what? How do comment_text () and get_comment_text () differ?
I have looked in the WordPress docs, and found nothing. As in: Neither one is even listed. I looked in the support / discussion forums, and the WordPress community is its usual helpful self. In fact, someone actually asked my question.
Alan: Hey, what’s the difference between comment_text () and get_comment_text ()?
Betty: Have you tried looking them up in the docs?
Alan: Uh. Yeah? Nothing there?
Betty: Did you try reading the source?
I cannot stress how useless it is to post RTFM in a forum like this. If you don’t know, just keep your yap shut and maybe someone else will come along and help. Of course you can read the source. It’s time consuming and you’re most likely consulting the docs and forums in an attempt to learn from someone who has already done so. This is why the forum exists.
Perhaps I just have bad luck, but in all my trips to WordPress.com, this has been my experience every time. Nobody knows, and everyone tells you to read docs that don’t exist.
Item #8 is also inexplicably expensive. Its only job is to print nothing to every single visitor to the site, unless the viewer is me, in which case it needs to print a little link. I never click on that link, so this element can be removed. I have no idea why it eats so much time, but at least I can cut it.
We’re now down to the page taking 0.85 seconds. That’s a huge improvement, but it really bothers me that just printing the stupid date takes half of that.
I check into it some more, and it look like all the time is being eaten when converting date formats. Apparently the date of each comment is stored thusly:
But of course it’s much nicer to display that as:
March 29th, 2009 03:03pm
But the latter costs me 0.4 seconds. I can double the speed of this code by sacrificing usability. It seems like the thing to do is to keep the nice dates on posts with less than a hundred comments. Once a post passes this, it will start cutting corners and display the all numeric date.
(Remember, when I say 0.4 seconds, I’m talking about processing the 600+ comment page, not just a single comment. The differences would be too small for me to measure if I was only working with one comment at a time.)
I make some other minor changes, and in the end the page loading happens in about 0.31 seconds. It’s more than ten times faster than it was when I started.
I haven’t made the changes live yet. While mucking about with this mess I discovered that WordPress now supports threaded discussions. We need this here. We have large, diverse, and multi-threaded conversations on this site.
Still, it’s clear this place needs an update before the next traffic surge.
Why Batman Can't Kill
His problem isn't that he's dumb, the problem is that he bends the world he inhabits.
The Gradient of Plot Holes
Most stories have plot holes. The failure isn't that they exist, it's when you notice them while immersed in the story.
Trashing the Heap
What does it mean when a program crashes, and why does it happen?
A horrible, railroading, stupid, contrived, and painfully ill-conceived roleplaying campaign. All in good fun.
Trusting the System
How do you know the rules of the game are what the game claims? More importantly, how do the DEVELOPERS know?