Database Abuse

By Shamus Posted Wednesday Jan 31, 2018

Filed under: Projects. 117 comments.

The conventional wisdom is that WordPress is really database intensive. I’ve never seen an official critique of it or a proper comparison between WordPress and some other CMS[1] or publishing platform. This is just the sort of thing I see dismissive blowhards say on forums or StackOverflow. “What did you expect? WordPress puts a heavy load on a mySQL server.” I never really questioned it myself.

But last week’s attempt to move this website to DreamHost got me thinking about it again. Is WordPress really that bad? How many DB queries does it take to create a single instance of the front page? How does that compare to other platforms?

Some Background

Here is a look at the DB for this site. We're over 400,000 comments these days. Crazy.

Just in case all of this jargon is opaque to you, here’s a quick explanation:

You’ve probably noticed that WordPress is a dynamic system. It’s not a bunch of fixed, static pages. The content of the pages changes constantly. When someone leaves a comment, you see it right away[2]. On the front page, the number of comments is always up-to-date.

This is accomplished by lashing together two different bits of technology: PHP and mySQL.

PHP is a “Server-side scripting language”. It is also a hilariously dysfunctional language. I understand why it became popular: It solved an important problem, which was the need for dynamic content on the web. What I don’t understand is why – despite its abominable reputation as a nest of confusing syntax, security concerns, undocumented behavior, misleading function names, and alarming bugs – it hasn’t been replaced in the 23 years since it was introduced. Microsoft took a crack at it with their Active Server Pages, but the world runs mostly on Linux servers and Linux admins are not eager to adopt proprietary Microsoft software.

Anyway.

When you visit this site, your web browser requests a document called index.php, which is a PHP script. The server opens up the script and runs it. The script spits out some HTML, which the server then sends to you.

mySQL is a database. It holds information in tables. In the case of this blog, there’s a table with a list of every post on the site. There’s another table of every comment, along with what posts they belong to. Then we have a table that keeps track of all the categories. Another for users. And so on.

So when the WordPress PHP script runs, it will talk to the mySQL database and ask for information. Stuff like, “Hey, I’m building the front page so I need the 12 most recently published posts.” Or maybe “Excuse me, I’m trying to build a page for post number 41607 and I need a list of every comment assigned to this post.” PHP takes this information, crams it into some sort of shape according to the site theme, and presents it to you. Each of these SQL requests is called a query.
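In SQL terms, a request like “give me the 12 most recent posts” boils down to something in this shape. This is a simplified sketch for illustration; the actual queries WordPress generates show up in the log later in this post:

```sql
-- Simplified: the newest 12 published posts, newest first.
SELECT ID, post_title, post_date
  FROM wp_posts
 WHERE post_type = 'post'
   AND post_status = 'publish'
 ORDER BY post_date DESC
 LIMIT 12;
```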

mySQL Servers Are Not Created Equal

Trust me, I have a system. Just DON'T UNPLUG ANYTHING.

For context: For the last 12 years, Hosting Matters (HM) has been my web host. Last week I briefly tried Dreamhost (DH) before canceling the service because it ran too slowly. On HM, the front page would generally load in less than a second. Let’s call it half a second. (With the exception of the hours between 11pm EST and 1am EST, when the site would be crazy slow for no reason I could ever figure out.) Once I’d made a copy of this site on DH, I found it loaded somewhere between 5 seconds and 10 seconds. So that’s somewhere between 10× and 20× slower. This is especially bad considering the site was still private and I was the only one viewing it. I can’t imagine how much slower it would be under normal traffic load.

Once we’d proved that the mySQL server was the bottleneck, I upgraded to their “premium” SQL server, where your database runs on a virtual machine all by itself and doesn’t have to share with other people using the same host[3]. This resulted in no change. That prompted me to cancel the service. I finally migrated to 1And1 hosting, which is what I’m using now. And we all remember how that turned out.

So How Many Queries?

According to my calculations, your plan is garbage.

So how many queries does it take to do something simple like assemble the front page of the blog? I decided to find out. Rather than spend hours tracing through the countless scripts that drive WordPress and figuring out the flow of the program, I figured I could look at things from the other end and just examine the mySQL logs.

This would be hard to do on a live site that’s open to the public. If you just watch the spew of queries as they come in then it will be hard to tell which scripts they belong to and what they’re all doing. But I have a copy of this site on my local machine. I was able to load the page exactly once with a clean log file. Afterwards, I looked in the log. Here is what I found:

Query	SET global general_log_file = "p:/wamp/mysql_general.log"
Version: 5.6.17 (MySQL Community Server (GPL)). started WITH:
TCP Port: 3306, Named Pipe: /tmp/mysql.sock
CONNECT	[email protected] ON 
SELECT @@SESSION.sql_mode
SET SESSION sql_mode='NO_ENGINE_SUBSTITUTION'
DB	shamusyo_dnd
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'
SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish')  ORDER BY wp_posts.post_date DESC LIMIT 0, 12
SELECT FOUND_ROWS()
SELECT wp_posts.* FROM wp_posts WHERE ID IN (41602,41591,41519,41384,41474,41480,41322,41455,41408,41172,41417,38241)
SELECT  t.*, tt.*, tr.object_id FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format') AND tr.object_id IN (38241, 41172, 41322, 41384, 41408, 41417, 41455, 41474, 41480, 41519, 41591, 41602) ORDER BY t.name ASC
SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (38241,41172,41322,41384,41408,41417,41455,41474,41480,41519,41591,41602) ORDER BY meta_id ASC
SELECT option_value FROM wp_options WHERE option_name = 'can_compress_scripts' LIMIT 1
SELECT * FROM wp_posts WHERE ID = 34542 LIMIT 1
SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (34542) ORDER BY meta_id ASC
SELECT YEAR(post_date) AS `year`, MONTH(post_date) AS `month`, COUNT(ID) AS posts FROM wp_posts  WHERE post_type = 'post' AND post_status = 'publish' GROUP BY YEAR(post_date), MONTH(post_date) ORDER BY post_date DESC
SELECT  t.*, tt.* FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('category') AND tt.count > 0 ORDER BY t.name ASC
SELECT term_id, meta_key, meta_value FROM wp_termmeta WHERE term_id IN (12,607,609,102,1,287,14,530,587,55,611,5,498,610,242,13,508,7,468,9,111,16,11,66,10,4,8,554,188,15,608,120,598) ORDER BY meta_id ASC
SELECT * FROM wp_users WHERE ID = '1'
SELECT user_id, meta_key, meta_value FROM wp_usermeta WHERE user_id IN (1) ORDER BY umeta_id ASC
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41602) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-24 06:34:21' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-24 06:34:21' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41591) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-23 01:54:29' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-23 01:54:29' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41519) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-21 06:00:51' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-21 06:00:51' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41384) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-18 06:00:55' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (609) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-18 06:00:55' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (609) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41474) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-16 06:00:06' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (102) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT * FROM wp_posts WHERE ID = 41164 LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-16 06:00:06' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (102) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41480) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-14 07:51:50' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (5) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT * FROM wp_posts WHERE ID = 40917 LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-14 07:51:50' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (5) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41322) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-11 06:00:12' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (609) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-11 06:00:12' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (609) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41455) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-10 06:00:32' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT * FROM wp_posts WHERE ID = 41258 LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-10 06:00:32' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41408) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-09 06:00:47' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (120) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-09 06:00:47' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (120) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41172) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-07 06:00:28' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (4) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT * FROM wp_posts WHERE ID = 41240 LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-07 06:00:28' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (4) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT * FROM wp_users WHERE ID = '5'
SELECT user_id, meta_key, meta_value FROM wp_usermeta WHERE user_id IN (5) ORDER BY umeta_id ASC
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41417) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-05 12:23:36' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (120) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT * FROM wp_posts WHERE ID = 41369 LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-05 12:23:36' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (120) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (38241) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-04 06:00:47' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (609) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT * FROM wp_posts WHERE ID = 38156 LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-04 06:00:47' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (609) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1
CONNECT	[email protected] ON promo
Query	SELECT * FROM entries
Quit	
Quit

All of that activity is the result of loading the front page once. Note that I stripped out a lot of extraneous things like timestamps, tabs, and a few other bits in order to make this a little easier to follow. It wasn’t much, but every little bit helps.

My word processor counts 60 individual SELECT statements. That seems like a lot, but I’ve never used Drupal, Joomla, Typo3, or any of the hot young CMS platforms that are out there right now. If you use one, I’d love to know how much DB activity the thing generates.

Breaking It Down

Please enjoy this picture of Jack Black explaining things as a way to break up this wall of text and code.

Are you curious what all that SQL gibberish means? No? Too bad, because you’re about to read an explanation. Prepare to be informed against your will.

Query	SET global general_log_file = "p:/wamp/mysql_general.log"
Version: 5.6.17 (MySQL Community Server (GPL)). started WITH:
TCP Port: 3306, Named Pipe: /tmp/mysql.sock
CONNECT	[email protected] ON 
SELECT @@SESSION.sql_mode
SET SESSION sql_mode='NO_ENGINE_SUBSTITUTION'
DB	shamusyo_dnd

This is just some basic handshaking. PHP says “Hi” to mySQL and lets it know what database it’s going to be using. The database is called “shamusyo_dnd”. “shamusyo” was my username on my old, old host way back in 2006, and they automatically put my username on the front of the database name. I called it “dnd” because this crazy site originally started out as the record of a D&D campaign. If you’re worried I’m creating some sort of security risk, I’ll point out that on my new host all the databases, users, and logins are different.

In a normal situation, it would be a security risk to reveal all of this information to the public, but in this case it only gives you insight into the setup I have on my home PC, and that server is not reachable from the outside internet. It can only serve pages to itself. The worst this might do is give you a hint how to hack my SQL database if you were already sitting at my computer. And if you get that far then you probably have a gun, and you can just point it at me and I’ll tell you the password is “wefightorwedie”.

Once PHP is connected, we get our first real query:

SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'

This one pulls a bunch of information out of the (very small) options table. This is all basic stuff like what theme is in use, what the homepage is, the preferred date format, the title of the site, how many posts should appear on the front page, and so on. Obviously the script needs this information before it can do anything else.
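For a sense of what comes back, here are a few rows in the shape that query returns. The option names below are standard WordPress keys; the values are invented for illustration:

```sql
-- Sample result of the autoload query (values are made up):
-- option_name     | option_value
-- ----------------+------------------
-- blogname        | Example Blog
-- posts_per_page  | 12
-- template        | example-theme
-- date_format     | F j, Y
```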

SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  
    AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish')  
    ORDER BY wp_posts.post_date DESC LIMIT 0, 12
SELECT FOUND_ROWS()

Now it’s time to begin building the front page. I have the blog set to show the most recent 12 entries, so this query looks into the table of all posts and asks for the most recent 12 that are published. This query returns the unique ID numbers for the 12 posts in question.

SELECT wp_posts.* FROM wp_posts WHERE ID IN (41602,41591,41519,41384,41474,41480,41322,41455,41408,41172,41417,38241)

Now that it has the ID numbers, it asks for the data on each of those posts. This test was done during the site blackout, so the first number is the ID of this post that was at the top of the page at the time: Welcome To the New Site. The very last number is the ID of the post that was at the bottom of the page: Borderlands Part 23: The Big Googly Eye of Helios. You can see these numbers in the address bar when you’re browsing the site. (This post has ID# 41607.)

SELECT  t.*, tt.*, tr.object_id FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category', 'post_tag', 'post_format') AND tr.object_id IN (38241, 41172, 41322, 41384, 41408, 41417, 41455, 41474, 41480, 41519, 41591, 41602) ORDER BY t.name ASC

Here is where we get knee-deep into the sorcery. In a sane universe, each post would belong to one category. But years ago, WordPress decided to generalize their category system. You have categories of posts[4]. You can have categories of links if your blog uses a blogroll. And finally you can add “tags” to a post. I haven’t used tags in ages, but if I did you’d see them at the end of a post. If I file a post under “Comic Books” and tag it with “spider-man” and “dr-doom”, then technically you can click on any of those three things and see all posts filed under those terms.

But WordPress took all three of these concepts and stuffed them into the same database tables. What this means is that if you want to know what category a post is in, you have to do this roundabout thing where you have to deal with four tables. There’s the wp_terms table that lists all the terms (like ‘DM of the Rings’). Then there’s wp_term_taxonomy, which holds the type and description for each term (like ‘A Webcomic of Lord of the Rings as a D&D campaign’). Then the wp_term_relationships table links posts to their terms. Then wp_termmeta holds extra data about each term. (The table is empty on my site.)

That’s a lot of database juggling. I don’t know enough about the design of WordPress to know if this was a good idea or not, but I will say I’m glad I’m not the one who has to maintain the code that handles this.

If I’m reading my mySQL right, then this INNER JOIN is asking the database, “Hey, I’ve got this list of posts. Look in this other cross-reference table and find the taxonomy that matches the category of this post.” This would give the category description. Which means this query is possibly a big waste. Category descriptions exist on my site from back in the old days, but they’re no longer shown to the user.
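For reference, here’s the minimal version of that category lookup chain, stripped of everything else. The table and column names are the real WordPress ones; post 41602 is just the example from above:

```sql
-- wp_terms:              term_id, name                ("Projects", "spider-man", ...)
-- wp_term_taxonomy:      term_taxonomy_id, term_id, taxonomy, description
-- wp_term_relationships: object_id, term_taxonomy_id  (object_id = the post ID)
SELECT t.name
  FROM wp_terms AS t
  JOIN wp_term_taxonomy AS tt ON tt.term_id = t.term_id
  JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id
 WHERE tt.taxonomy = 'category'
   AND tr.object_id = 41602;
```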

SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (38241,41172,41322,41384,41408,41417,41455,41474,41480,41519,41591,41602) ORDER BY meta_id ASC

Here we’re taking a peek in the post_meta table. This is a generic table where site admins like me can stuff extra info not covered by the regular WordPress database. In my case, I use it to store the splash image you see at the top of every post. So here WordPress is just getting that info for our 12 posts.

SELECT option_value FROM wp_options WHERE option_name = 'can_compress_scripts' LIMIT 1

Huh. I don’t know why it’s asking this now. There’s an option in WordPress that allows it to compress Javascript and CSS. Fine. But why is it asking this in the middle of building a page? It seems like this query could have been rolled into the very first look at the wp_options table above, although note that the first query only fetched rows with autoload = 'yes', so any option not flagged for autoloading triggers its own lookup like this one.

SELECT * FROM wp_posts WHERE ID = 34542 LIMIT 1
SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (34542) ORDER BY meta_id ASC

What is going on here? We’re querying another post, and its metadata (the splash image). The problem is, there’s no such thing as post #34542. Doesn’t exist. I can’t explain this.

SELECT YEAR(post_date) AS `year`, MONTH(post_date) AS `month`, COUNT(ID) AS posts FROM wp_posts  WHERE post_type = 'post' AND post_status = 'publish' GROUP BY YEAR(post_date), MONTH(post_date) ORDER BY post_date DESC

This one tripped me up for a bit, because I was expecting the queries to go down the page. But here it looks like it’s doing a query to create the monthly archive dropdown you see at the top of the page. Which makes me wonder: Does anyone ever use that thing? I don’t know if I’ve ever explored a site using the monthly archive. (I’m not thinking of getting rid of it. I’m just generally curious when this gets used.)

SELECT  t.*, tt.* FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('category') AND tt.count > 0 ORDER BY t.name ASC
SELECT term_id, meta_key, meta_value FROM wp_termmeta WHERE term_id IN (12,607,609,102,1,287,14,530,587,55,611,5,498,610,242,13,508,7,468,9,111,16,11,66,10,4,8,554,188,15,608,120,598) ORDER BY meta_id ASC

And here it creates the category dropdown. I actually use this one all the time.

SELECT * FROM wp_users WHERE ID = '1'
SELECT user_id, meta_key, meta_value FROM wp_usermeta WHERE user_id IN (1) ORDER BY umeta_id ASC

Okay, at this point the script is FINALLY walking down the page and assembling the individual posts. It’s on the top post. I’m the author, and it looks up my user so it can put my information at the top of the post and my byline at the bottom.

SELECT  t.term_id, tt.parent, tt.count, tt.taxonomy FROM wp_terms AS t  INNER JOIN wp_term_taxonomy AS tt ON t.term_id = tt.term_id INNER JOIN wp_term_relationships AS tr ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy IN ('category') AND tr.object_id IN (41602) ORDER BY t.name ASC
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date < '2018-01-24 06:34:21' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date DESC LIMIT 1
SELECT p.ID FROM wp_posts AS p  INNER JOIN wp_term_relationships AS tr ON p.ID = tr.object_id INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE p.post_date > '2018-01-24 06:34:21' AND p.post_type = 'post' AND tt.taxonomy = 'category' AND tt.term_id IN (111) AND p.post_status = 'publish' ORDER BY p.post_date ASC LIMIT 1

Ah! And now I understand why the front page is generating so many queries.

Here it’s starting with the post at the top, which we already established is #41602 – Welcome To the New Site. The first line is looking up category descriptions again, which it should already have and which my site doesn’t use anyway, but whatever.

The second line is doing a search for a post in the same category that was published directly before this one. The third line is looking for the post after this one. These queries are generated by the buttons at the top of every post. These are designed to allow you to conveniently step through a long-running series a post at a time.

So creating the previous post / next post buttons takes 3 database queries. One to query the totally unused category description, and the other two to get the previous and next posts. It goes through this process 12 times as it walks down the page.
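As an aside, the prev/next pair could in principle come back in one round trip instead of two. This is a hypothetical rewrite of the two logged queries above using UNION ALL, not anything WordPress actually does; just the plain SQL idea:

```sql
-- Fetch both neighbors of the 2018-01-24 post in category 111 at once.
(SELECT p.ID, 'prev' AS direction
   FROM wp_posts AS p
   JOIN wp_term_relationships AS tr ON p.ID = tr.object_id
   JOIN wp_term_taxonomy AS tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
  WHERE p.post_date < '2018-01-24 06:34:21'
    AND p.post_type = 'post' AND p.post_status = 'publish'
    AND tt.taxonomy = 'category' AND tt.term_id IN (111)
  ORDER BY p.post_date DESC LIMIT 1)
UNION ALL
(SELECT p.ID, 'next' AS direction
   FROM wp_posts AS p
   JOIN wp_term_relationships AS tr ON p.ID = tr.object_id
   JOIN wp_term_taxonomy AS tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
  WHERE p.post_date > '2018-01-24 06:34:21'
    AND p.post_type = 'post' AND p.post_status = 'publish'
    AND tt.taxonomy = 'category' AND tt.term_id IN (111)
  ORDER BY p.post_date ASC LIMIT 1);
```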

CONNECT	[email protected] ON promo
Query	SELECT * FROM entries
Quit	
Quit

And here it gets to the bottom of the page. It connects to the “promo” database, which is a database I added, entirely apart from WordPress. It stores all of those little post suggestions you see at the end of an article. I actually request the entire table, all 130+ entries. This is probably inefficient, but I honestly can’t think of a better way to handle it. I want several random entries from the table and I can’t see how to do that. I’ve read some tutorials that suggest using RAND(), but RAND() doesn’t seem to be part of the version of mySQL I’m using? I don’t know.

This is an inefficient design, but the database is small enough that it doesn’t really matter.
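For what it’s worth, RAND() has been a standard MySQL function for a long time (it’s present in 5.6), so the usual pattern for grabbing a few random rows looks like the following. The caveat is that ORDER BY RAND() sorts the entire table on every call, which is harmless at 130 rows but scales badly:

```sql
-- Pick 3 random promo entries. ORDER BY RAND() gives every row a random
-- sort key and keeps the first three after sorting.
SELECT *
  FROM entries
 ORDER BY RAND()
 LIMIT 3;
```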

Once it has the promo, it terminates its connection to both databases and we’re done.

Wrapping Up

Most people don't know this, but databases are actually shaped like 3-layer cakes.

It looks like a lot of the database traffic is my fault. To make the prev / next buttons for each post requires 3 queries. 3 queries x 12 posts = 36 total queries. Remember we counted 60 total queries in this single page-load, so just over half of all database transactions are spent making those links.

I suppose there are some things I could do to speed this up. I could leave the chronological navigation buttons off of the front page. It’s unlikely that a user will use them from there. Think about it: Load the front page, read the first couple of intro paragraphs[5] and then LEAVE this post without reading the rest of it so you can read the previous one? I’m not saying it never happens, but it does sound like an unusual situation.

The other thing I could do is restrict chronological navigation to certain categories. Clearly the big retrospectives, the webcomic, and the weekly column need them. But Notices? Does anyone really page through the Notices category, browsing my announcements on site changes?

I’m not saying I’m planning on doing anything right now. It’s just something to think about.

In any case, I think I’ve demonstrated that my site generates a lot of extra queries. It looks like a more conventional WP site will generate between 20 and 30 queries. Maybe that’s a lot, maybe it isn’t.

I don’t know if we learned anything but we definitely just killed ten minutes. You’re welcome.

 

Footnotes:

[1] Content Management System

[2] Assuming my spam filter doesn’t decide to harass them for no reason.

[3] Technically you’re still competing for CPU and memory with the other users, but now you’re supposedly guaranteed a fixed share of it. Or something.

[4] For example, this one is filed under “projects”.

[5] Assuming it’s a good day and I didn’t put the whole post on the front page.




117 thoughts on “Database Abuse”

  1. Olivier FAURE says:

Eh… I’ve occasionally used all the features you’re considering removing, and I think having them be unavailable for unpredictable reasons would be a little annoying, but I guess you could do that if your performance really suffers.

    Re: the database clutter, it sounds like the major problem is that the SQL calls aren’t bundled nearly enough. I’m not very good at SQL, but I don’t see any logical reason that the database couldn’t do one

    SELECT current_post, current_post_image, category, prev_post, next_post FROM db

    call instead of having like a billion calls for the same data. It’s not like it would even require hard plumbing, since most of the times the only thing you’re doing is adding posts in chronological orders, and for the rare exceptions you can just sort the entire database again (though I’m guessing WordPress doesn’t really let you customize database calls anyway).

    1. EwgB says:

      If I got it correctly, the previous/next buttons need the posts from the same category, so it wouldn’t do to just select three posts in chronological order. That’s why WordPress is doing all that JOIN magic.

      1. Abnaxis says:

But it already knows what category the post is in. I’m pretty sure there’s a way to index into a sorted list (which ORDER BY does), so I bet there’s a way to say “grab this post, plus the next one above and below after you ORDER BY date.”

        I don’t know how efficient sorts are in a mySQL database, but it’s doing two separately when it’s getting the next and previous posts, which to my mind MUST be eating up some time.
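Something like SQL window functions would do it in one pass, though that needs a newer MySQL (8.0+, with LAG/LEAD support) than most shared hosts run:

```sql
-- Prev/next IDs in a single pass via window functions.
-- LAG/LEAD peek at the neighboring rows of the sorted set.
-- Requires MySQL 8.0+; older versions lack these functions.
SELECT ID,
       LAG(ID)  OVER (ORDER BY post_date) AS prev_id,
       LEAD(ID) OVER (ORDER BY post_date) AS next_id
FROM wp_posts
WHERE post_status = 'publish';
```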

        1. guy says:

          Actually, probably not. If you look closely, the next query is pulling out only posts with dates greater than the current post, while the previous query is pulling out only posts with dates less than the current post, so the ORDER BY statements are sorting different lists entirely. So it’s only the WHERE clause that’s getting run twice, and in return the sorts are on less data. So it runs an O(n) (probably) search twice to get k and j entries, then runs an O(klog(k)) sort and an O(jlog(j)) sort rather than running an O(n) search once and then an O((k+j)log(k+j)) sort. There are sorts that are faster than O(nlog(n)), but they depend on knowing certain properties of the input to be sorted.

          That probably doesn’t save more time than the joins take, but the database might cache the join and not actually run it twice in a row.

          1. guy says:

            Okay, now I’ve checked and my intuition was wrong; the cost of running the initial search twice is more than the savings from splitting the data set to be sorted.

          2. Droid says:

            The only sorting algorithms I know that are faster than n log(n) basically cheat by adding a “constant” m that implicitly depends on n (in the sense that every real-world use-case with very large n will also have very large m). But since it’s a constant and not a variable, you can hide it in the big-O, and therefore your algorithm is O(n) or whatever.

            Also, since k+j = n-1 (assuming it’s “dates strictly less than” and “dates strictly higher than”), I’m not sure O(n) + O(k log(k)) + O(j log(j)) is faster than O(n log (n)). Of course it will depend on what the constants hidden in the big-O of the search and the sort actually are, but if you think about it in the following way, with C being the constant for the search and D being the constant for the sort:
            C*n + D*(k log(k)) + D*(j log(j)) < C*n + D*(n/2 log(n/2)) + D*(n/2 log(n/2))
            (since an equal split is the best possible split, and k+j < n).
            Then we get that
            C*n + D*(n/2 log(n/2)) + D*(n/2 log(n/2)) = C*n + D*n*log(n/2) = n*(C+D*log(n)-D*log(2)).
            On the other hand, O(n log(n)) would just be D*n*log(n).
            This would mean that splitting into two queries is faster only if C < D*log(2) ~= 0.69*D.

            Uuhhh, sorry, my brain was still in math mode…

            1. guy says:

              The only sorting algorithms I know that are faster than n log(n) basically cheat by adding a “constant” m that implicitly depends on n (in the sense that every real-world use-case with very large n will also have very large m).

Not necessarily; the constants tend to depend on the number of possible distinct values for entries rather than the number of entries. In a lot of real-world applications, that’s unhelpful because you’re sorting unique things, but if you wanted to order WoW characters by level then a time complexity of n+100 (counting sort) is a lot better than n log(n).

        2. Uristqwerty says:

          Databases love to cheat, so chances are that wordpress told the DB to maintain an index of posts by date, and the DB can skip sorting entirely and just scan the index list from the starting point until it finds a post matching all of the conditions.

          Further, an index can contain more than just one field, so it might even be set up so that it doesn’t have to load the main table at all.
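For example, a composite index along these lines (the column choice is just a guess at what would help here) lets the filters and the date scan all ride the same index:

```sql
-- Composite index so the adjacent-post query can filter on
-- type/status and then walk post_date in index order. ID is
-- included so the lookup may not touch the base table at all.
CREATE INDEX idx_type_status_date
    ON wp_posts (post_type, post_status, post_date, ID);
```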

    2. guy says:

      The necessary information for the queries is scattered across multiple tables, so you need a different merge for getting next and previous posts than you do for getting post information. It’d probably be possible to merge the three queries into one, but it’d be a really convoluted query and probably not actually particularly faster.

      1. Olivier FAURE says:

        It could if WordPress cached the “previous post” and “next post” values alongside each post (would also imply less sorting and joining for an operation that always returns the same value).

        There would be some plumbing to do to avoid cache invalidation (eg if a post is deleted or changes categories), but since 99.9% of use cases are “Add a new post”, you can just re-sort the entire database for every other operation (which, if I understood those queries, is basically what WordPress does right now, except they do it every single time you load a page).

        Do that and bam, a single query can load everything you need for an arbitrary number of posts, with minimal sorting.

  2. EwgB says:

I can tell that I never use the month dropdown. Or the categories. Or the front page for that matter. I only come to the blog via my RSS reader (yeah, I’m one of three people in the world still using one!). I do use the previous/next post buttons, but only those in the post itself, not the front page.

    1. CJK says:

      I’m the second. I’m pretty sure Shamus himself is the third.

      1. redsoxfantom says:

I’m the fourth!

        (obligatory “There are dozens of us! Dozens!”)

        1. MichaelGC says:

          I don’t think Shamus does use one – vague memory he’s mentioned not doing so at some point (probably when things were formatting weirdly for RSS). So, that bumps you up to third, and I shall go fourth!

          1. Cubic says:

            I too use RSS, my stalwart grognards.

            1. Erik says:

              Add another to the RSS contingent. What’re we up to now, on all the reply threads? 7? 10?

              I would be completely lost without my RSS feeds – it’s how I see the world, in a very real way. I have probably ~1000 feeds, categorized and subcategorized, and it would take me months to even come close to recapturing them (and could never get all of them). I should probably download the current list and back it up more often….

              1. Blake says:

                Another RSS person here. Look at all of us!

                1. Philadelphus says:

                  We are legion!

                  1. Alan says:

                    A very small, but loud legion. :-)
                    – Legionaire #12

                  2. Milo Christiansen says:

                    Heck, I wrote my own!

RSS Forever!

      2. Asdasd says:

        I was the fourth, but I lost my list of subscriptions when I had to recover from a hard drive crash. This was the final nail in the coffin of my webcomic-reading life :(

    2. MichaelG says:

      What are people using these days instead of RSS? Twitter, Facebook?

      1. Viktor says:

        Storing all bookmarks according to the day of the week they update and then right-click>Open all in new tabs on that day.

        1. Penn says:

          That sounds like a nightmare. RSS is the obvious way to go and I can’t figure out why everyone doesn’t use it.

          1. Cubic says:

            I have something like 800 feeds in my RSS reader so the bookmark approach just doesn’t work.

          2. Rick C says:

            As far as I can tell, that’s because Google killed an RSS service they used to provide.

            1. Primogenitor says:

They did kill Google Reader, but because this is the internet, it has a living clone at https://theoldreader.com which you can use for free.

        2. MelTorefas says:

          This is me. Except, I open them one at a time for some reason.

        3. Raphaël says:

          No need for that, you can use this wonderful add-on which does exactly that! That’s for firefox, but I bet a similar add-on for chrome exists.

        4. Olivier FAURE says:

I went back to doing that when Firefox Quantum was released and switched to a permissions model for extensions that’s incompatible with having an in-browser RSS plugin.

          That was pretty annoying, so I’ve switched to Inoreader instead. Much simpler. In the long term, I think I’ll just use a self-hosted RSS reader, have it run in the background at all times, and connect to it in localhost with firefox.

      2. John says:

        I just check the site every day or two. It has honestly never occurred to me to do anything else. Shamus posts fairly regularly, after all.

      3. I used twitter and blog rolls/webcomic lists from other blogs/webcomics.

        These days it’s mostly just “huh, I haven’t looked at X, Y, or Z blog/webcomic in a while.” At which point the monthly thing is great, as it allows me to quickly more or less figure out when I last read and go forward from there.

        But I never use Shamus’s monthly thing cause I check here daily! (It’s one of 2 blogs I check daily, the other’s an elderly woman’s knitting blog)

        1. Lanthanide says:

          That’s quite strange company for a gaming / nerditry blog to keep as a ‘most-read’ companion.

    3. Kasumi Ghia says:

      feedly.com reports it has 1K followers of the RSS feed.

      1. Philadelphus says:

Yeah, I’m one of them. Though I often just use it for notification during my morning routine and look up the article on my phone during my commute.

        1. MichaelGC says:

Aye, me too. (I’m probably two or three of them, actually.) I use Feedly regular-style for most sites but it’s probably a little superfluous for twentysidedtale, as I tend to check in a couple of times a day anyway to see what Daemian (er, folks) are saying in the comments…

          PS I wonder can we do emojis now the site has settled a little? ?

          Edit We cannot. We can do a good line in oddly placed ? questionmarks, though.

          ?

        2. Cuthalion says:

          I am another. Any content that I want to keep regularly up-to-date with, I use RSS for. Why keep checking a bunch of sites for updates if I can see in one place which ones have new stuff and click straight to the new stuff?

          I almost never navigate to the site directly or check the site itself. The only time I come through a route other than RSS is if I’m showing someone the site or looking for an older article.

  3. DanMan says:

    In reference to your “maybe the promo database is inefficient”, in terms of Database usage, it’s the most efficient it can be apart from adding indexes and querying by them. If it’s a small enough table, then even querying indexes is more intensive than just dumping all the data.

    Every time you have a conditional on the query, it slows down the query. SELECT * FROM table is very efficient Database usage. Pretty much the only way to get more efficient would be SELECT * FROM table WHERE primary_key = ‘id’. Again, unless the table is large enough to be taking up multiple pages on the disk causing the hardware to need to move to retrieve the data, a whole dump is fine.

    Source: Worked as a data developer with Database Admins on SQL tuning for 2 years

    1. thak777 says:

      Sorry, but SELECT * FROM is only a good idea for VERY small tables. And if you have tables without indexes (designed to help your WHERE clauses), you have a bad database design. Period.

      I/O from disk is the slowest thing you can do, and when you run a query like this, you’re not limiting either the columns or the rows. It’s going to hit every. single. row. in the table and pull every.single.column back, whether you are going to use it or not.

      Limiting the amount of data–rows and columns!–that comes back from a query is one of the best things you can do to limit memory pressure on your cache, and reduce the amount of I/O (physical reads) you have to do from disk.
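Concretely, compare these two (hypothetical column list):

```sql
-- Pulls every column of every row back into memory
SELECT * FROM wp_posts;

-- Pulls only the rows and columns the page will actually render
SELECT ID, post_title, post_date
FROM wp_posts
WHERE post_status = 'publish'
ORDER BY post_date DESC
LIMIT 12;
```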

      Source: 20+ years as a database programmer/software architect/database professor

      1. FluffySquirrel says:

Yeah, but you’re unlikely to get anything like that in a platform which is designed to be as modular as wordpress is, which is ultimately why most things like this end up with a bit of overhead in the amount of queries and such. When any extra plugins can be adding column names, it does often just come down to ‘ehh, just get all the columns’ so whatever can use them

        They’re not wrong in my experience regarding the joins, it’s usually the massive joined queries which can cause a lot of the slowdown.. but even then I’ve seen things be fine on one system and slow as hell on another.. I suspect a lot of that just comes from the individual hosts particular setup of MySQL.. so often I find database performance a bit of a crapshoot

      2. silver Harloe says:

        We know the promo widget only has 130 rows. If we assume it also has, say, 4 columns (id, post id to link to, image reference (href? I don’t see him joining image_ids to an image map table), and the short text blurb), then I think we’re fine with the ‘select * from’…

        Though then we have to choose 6 random things from the return data in PHP’s bloated “mixed array and hashmap monster it calls ‘array'”… and I wonder if Shamus really can’t use “order by rand() limit 6” on the select – maybe his info that “rand() isn’t a thing” is based on the version of mysql he had at his old host. Since he just migrated he should be on the latest mysql now (because his new hosting provider should be keeping it up to date for possible security patches even if not for any other reason)

        1. Shamus says:

          columns:
          ID, title, image, url, description

ID is just your standard autoincrement index. I tried using rand() on the new host. Also randomize(), which I saw in other examples. mySQL always acted like it was invalid. I can’t explain that.

          1. MrPyro says:

            How are you actually using RAND() in your query? Normally RAND is part of the ORDER BY statement, so you’d have something like:

            SELECT ID FROM entries ORDER BY RAND() LIMIT 6;

            Which would return 6 random IDs from the table; you can then get the rest of the details from there using the IDs as a search term.

  4. onodera says:

    The number of queries is usually not very indicative of page load performance. Yes, each one adds some fixed overhead, but you (the reader) should profile your page to understand which queries are slow first, since bad schemas and bad queries are the source of slowdowns much more often.

    1. silver Harloe says:

This. Programmers are the worst, THE WORST, at guessing what’s actually making their code slow. Always profile. I mean, for PHP/MySQL, a bunch of times you’ll find the connection and login is slower than your actual queries. Unless you have a bunch of crazy joins.

      1. Daemian Lucifer says:

        But this isnt Shamus’s code.Its code from someone else used by a code from another someone else,and Shamus is just trying to find his way in that whole mess.

  5. Yerushalmi says:

    I don’t know if I’ve ever explored a site using the monthly archive. (I’m not thinking of getting rid of it. I’m just generally curious when this gets used.)

    I’ve used it on many occasions. Generally I use it when I’ve first discovered a site and want to see what sort of content it’s had through the years (and not just the most recent stuff). Or if I’ve been following a site for long enough to know that I like it, and am hungry for more content than they’re currently providing, so I jump way back into the archives and read stuff from before I started following.

    In your case, however, I’ve never used it – I started following you close enough to the beginning of the site that I went back and read your entire archive in chronological order, and ever since then I get your posts via RSS feed. So I’ve never had the need to search through old stuff you’ve done to see if there’s anything I haven’t seen yet.

    1. MichaelGC says:

      I’ve used this site’s archive in exactly the manners described in your first paragraph, aye. Haven’t needed it for a while now, though.

  6. Daemian Lucifer says:

    Is this a bug,or a feature?

    https://imgur.com/a/ZcUEy

    Whatever it is,its trippy.

    1. Echo Tango says:

      I was going to post the same glitch! :)

      I’m on Firefox, Ubuntu, up to date.

    2. Dreadjaws says:

      It has to be a bug, because it makes the full lines of code unreadable (not that particular one, though). Apparently, the text box gets larger when you hover your mouse over it, but if the line of code is longer than the allotted space for the page, you’re toast.

      Not that I can understand most of this code anyway, but still, if someone wants to see it, it’s a problem.

      Unless Shamus doesn’t want people to look at the full code, and in that case it is a feature.

      1. Daemian Lucifer says:

        Nah,when theres more text the scroll bar remains there.It just disappears from this one because the text box is increased juuust enough to display this whole line.

        Though if your screen is already filled by the regular text of the blog,this increase would indeed put the text out of bounds.

        1. Dreadjaws says:

          Yeah, the scroll bar remains, but if you move it to the right, it won’t let you reach the end of the text (example: https://imgur.com/a/gJksX). Note that the black bars are where the blog “ends” for the article, but as the background shows, it’s still part of the blog.

          1. In fact, you can get to the end of the text box by using the L/R arrow keys.

    3. Cordance says:

They behave strangely if the string is longer than the column is wide. If you mouse over the box it will display all the text to the right. However, then the box changes shape as a scroll bar is no longer needed, so your mouse is no longer located in the box and it collapses back, until the mouse is located right of the box, where it expands and is stable. I’m guessing it is an unwanted effect.

  7. Aquarion says:

    Yeah, that’s a fairly standard set of queries for a WP front page. It’s worth noting that a decently configured MySQL server will have query caching, so in a live environment a lot of those queries won’t be interrogating the database per hit.

The easiest way to speed up the front page would be to use something like WP Super Cache, configured to render the less dynamic pages (Front, Month index, Category index) and serve them statically (regenerating when something would change them), leaving your post pages dynamic (to avoid screwing up edit plugins and such). That’ll mostly avoid a badly behaved bot slowing down your site by requesting high-CPU pages a few per second, but comes at the cost of dynamic content on those pages (like your entries) being random-per-cache-expiry rather than random-per-hit.

    The best thing I did for my site speed was to offload static assets to a dedicated CDN cache (I’m using Amazon Cloudfront) also with WP Super Cache.

  8. Hal says:

    FYI, this post was causing Chrome on Android to lock up and crash for me.

    1. Echo Tango says:

      Maybe you’ve got an old phone, or the company that you bought it from put in some custom Android nonsense. I know the last phone I had stopped receiving Android updates, because the company who sold it didn’t feel like porting them to that model of phone anymore, after 2 years. (And the hardware would still last for several more years…)

      1. Hal says:

        It’s a Galaxy S8, so it’s certainly not about the phone being outdated. And it’s on Chrome for Android, not the default Samsung internet browser.

        1. Echo Tango says:

          I’ve got Android and Chrome too, no problems.

    2. guy says:

      Same; clearing the cache in settings>storage fixed it.

    3. Zerbin says:

      Same. Brand new phone, Chrome for Android. Frozen solid.

  9. Daemian Lucifer says:

    Think about it: Load the front page, read the first couple of intro paragraphs[5] and then LEAVE this post without reading the rest of it so you can read the previous one? I’m not saying it never happens, but it does sound like an unusual situation.

    Not that unusual actually.Since I did that somewhat recently,Ill give you an example why:Not being able to read a few entries of one of your long running series when they are posted,I decided to binge them on a free day.So,seeing how the most recent one was on the front page,I immediately opened the previous one in a new tab while opening the recent one in the current tab.

    1. MichaelGC says:

      Indeed, or: “Ah, Bobby the Tongue is writing about Batman again; I shall re-read the previous post to remind me where we’re up to.” Work of a moment to just open the new post and navigate back from there, though! Definitely not critical to have that ability from the front page.

    2. Duoae says:

      I do this a lot. I tend to bulk read entries if life has been particularly busy or I didn’t want to read a series when it started.

      Usually, it happens when I’m reminded of a series by a new post and realise i didn’t get around to reading the last entry. I don’t want to click through to the article in order to go back a post in the series – that’s fairly inefficient.

    3. Bubble181 says:

      Or when I’ve been away for a week or more, and I see a new post, start reading, realize “wait, I haven’t read the last one yet, apparently”

  10. MichaelGC says:

    For example, this one is filed under “projects[.]”

Last used in May of 2015, so great to see it getting a work-out!

    We’re over 400,000 comments these days. Crazy.

    Right, but approx. 380,000 of those were Daemian.

  11. Ryan says:

    Developer and DBA here.

First, your comment about the inner joins is slightly off. The terms and taxonomy/description are directly related. The relationship table is for linking the term/taxonomy to the *post*, not for linking the term to the description. Also, an inner join is probably better described as the equivalent of the intersection on a Venn diagram, so it’s more like “give me A and B, where A/B are related by X, but ONLY if A and B BOTH have a row that matches for X — if only one has it, don’t bother me.” In your case, *every* post will have a taxonomy/term it links to, so you won’t have to worry about that side being missing from the Venn diagram.

    Second, I’m almost certain that the query asking for the post that doesn’t yet exist is the part of the page that’s filling out the “Next” part of “Next Post / Previous Post” links, but in this case it’s the generic version, as opposed to the category-specific version you correctly labeled as part of each post. It doesn’t know that the first load wouldn’t have a “next” to go to until after it executes that, because the script is simply standardized for every run. But here’s the thing: I don’t see you using the next/prev option at all on the page (and you’re using standard paging controls rather than id-driven paging), so that query actually is almost certainly useless.

    Last, the other comment that “select *” is efficient is true, at least as it relates to small tables, because it’s the “where” clause that adds processing cycles. 130-ish entries is not going to stress anything there. Though I would add that the “WHERE 1=1” in the row search query near the beginning is unnecessary.

    1. Abnaxis says:

      I was going to ask about the “WHERE 1=1” clause. Why in the world would you ever add that to a query?

      1. Dev Null says:

        The query is being generated dynamically. It was easier (note “easier”, not “more efficient”) for the programmer to always have a WHERE clause, and just drop in the conditional at runtime. If there is no condition, they just default to “1=1”. Which should have 0 effect on the efficiency, since the database engine is smart enough to figure out that this is static, and always true.

        1. Ryan says:

          You’re almost certainly correct about why it’s there — the script can slap on as many “AND [Something]” statements as they need, or none at all, and it won’t break, and in return the developers don’t have to worry about making the first add-on a “Where” instead. But it is also lazy.

          The good news, though, is that as you recognized it is *merely* unnecessary, as in any decent DB iteration the SQL evaluation and generation of the execution plan would strip it out, so it’s just a syntactic legacy of the laziness, not an efficiency concern.

          1. Abnaxis says:

That’s actually another question I had: just how much of this is automated, and how much does Shamus have control over? It’s hard to tell from the article.

            1. Dev Null says:

              It’s all automated. The WordPress developers wrote all the code; Shamus can only really affect it by adding or removing elements of his site that cause WordPress to run queries.

              (Well, he could modify the WordPress code himself, but that way lies madness.)

              1. FluffySquirrel says:

                I dunno about madness, it can actually be super simple. Or it can be horrendously complicated. Welcome to third party mods

      2. Leonardo Herrera says:

        Because the query was generated by some database access abstraction / template.

      3. Shamus says:

        The base of the statement is:

        SELECT * FROM wp_posts WHERE 1=1

        And then onto that it adds a bunch of conditionals. If we’re looking for stuff that’s been published (to exclude drafts) then we can add:

AND post_status='published'

        Which would give us:

SELECT * FROM wp_posts WHERE 1=1 AND post_status='published'

As the WordPress code runs, it continues to build this query. One bit of code adds on “AND post_status='published'”. Another one adds on something like “AND post_date>some_date”. Just stick them together like legos.

        But let’s say we left off the 1=1 condition, which is, as you noticed, completely useless. Then in a query with NO conditionals (effectively, “show me everything”, which is what I see in the admin panel) you’d end up with:

        SELECT * FROM wp_posts WHERE

        And this would generate an error, because “WHERE” needs to be followed by a conditional.

        The OTHER way you could do it is to start without the WHERE clause:

        SELECT * FROM wp_posts

        And then every bit of code that adds a conditional, must first check to see if it is the FIRST conditional, and if so, then it will add WHERE thing='value' and if it’s not the first then add AND thing='value'

Basically, having a do-nothing conditional at the start means the PHP code can be simpler and one bit of code doesn’t need to know or care what other parts of the code might be doing with the query.

        1. silver Harloe says:

The other other way they could have written it would be to store the filters in an array rather than immediately tacking them onto a SQL string:

$filters = [];
if ( $some_condition ) $filters[] = "post_status='published'";
if ( $some_other_condition ) $filters[] = "some other sql";

if ( count( $filters ) > 0 ) {
    $sql = 'select * from wp_posts where ' . join( ' and ', $filters );
} else {
    $sql = 'select * from wp_posts';
}

  12. Will says:

    A few potentially-interesting technical asides.

    When you visit this site, your web browser requests a document called index.php, which is a PHP script. The server opens up the script and runs it.

    This style of building a webapp, where each page load results in running the page script from scratch, is called “CGI” (“Common Gateway Interface”, the etymology of which is far too obscure to get into here). It has largely fallen out of favor for modern webapps because it’s relatively slow compared to having a server constantly running, with all the code already loaded and slowly-changing values (like the config options or post text) cached.

    (Actually, because this is tech and we will never ever replace anything when we could hack around its deficiencies instead, the site almost certainly uses a technology called “FastCGI”, which runs a server that preloads the script interpreter and may also cache the scripts, though they still actually run from scratch and can’t store config or cache content. It is, per the name, much faster than regular slow CGI, but also complicated and very easy to screw up.)

    So creating the previous post / next post buttons takes 3 database queries. One to query the totally unused category description, and the other two to get the previous and next posts. It goes through this process 12 times as it walks down the page.

    This is a really common problem with automatically-generated queries, called the “N+1 query problem” (the first query provides N results—in the case of Shamus’s front page, always 12—and then the system generates one more query to get the details of each of those N). For pulling 12 text pages for the front page, the cost is relatively low, but in bad cases, there may be thousands of additional queries, and processes which should take a few seconds can take hours instead.
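In query terms, the shape of the problem looks like this (table names are just for illustration):

```sql
-- The N+1 shape: one query for the list...
SELECT ID FROM wp_posts ORDER BY post_date DESC LIMIT 12;
-- ...then one more query per result, N times over:
SELECT * FROM wp_postmeta WHERE post_id = 41602;

-- versus a single joined query that fetches it all at once:
SELECT p.ID, m.meta_key, m.meta_value
FROM wp_posts AS p
JOIN wp_postmeta AS m ON m.post_id = p.ID
ORDER BY p.post_date DESC;
```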

    As one final note, I find it interesting that this database structure (multiple tables referencing each other by fields, called a “relational database”, which is the structure of the vast majority of modern databases) is actually a fairly recent invention. In a field where 90% of the commonly-used data structures were invented in the 50s or 60s, relational databases weren’t commercially available until 1979.

    1. Alan says:

      I’m not fluent in PHP, but I was under the impression that most of the interpreter-embedded-in-the-webserver systems could maintain state between executions. Often the database connection would be shared for efficiency.

      While CGIs and CGI-like things are less efficient, they're pretty good for small to medium scale sites, and let you host on cheap providers. Once you start maintaining your own always(ish)-on server, there is a significant jump in price tag and maintenance complexity. Compared to CGI or even FastCGI, writing a persistent server that robustly handles problems and securely separates user sessions is complicated and easy to screw up.

      1. Will says:

        I’m not fluent in PHP, but I was under the impression that most of the interpreter-embedded-in-the-webserver systems could maintain state between executions.

        I’m certainly no expert, either. The fastCGI server pooling database connections sounds very plausible. There are certainly no application-level caches, though. Every script still runs from scratch, even if some of the resources it uses are kept around from previous runs by the server.

        While CGIs and CGI-like things are less efficient, they’re pretty good for small to medium scale sites, and let you host on cheap providers.

        While it may have sounded like it, I was absolutely not ripping on CGI, and only a little bit on fCGI. I completely agree with you here. Both are an excellent approach to simple, small or low-traffic dynamic sites, and there’s something to be said for the enforced statelessness. (After all, the only thing to differentiate a persistent server is that it shares state across requests, and “shared state” is a dirty phrase amongst programmers for a reason.) It doesn’t scale up in many ways, but neither do most websites, and as you say, the complexity and cost burden are significant.

        1. silver Harloe says:

          The reason PHP became popular and remains in use is because it *doesn’t* do CGI. Instead, there is “mod_php” – a module for Apache that has the PHP interpreter as a .so (like a .dll, but for *n*x)

          Instead of “.php” being run as a CGI (or piped to a FastCGI daemon), it’s passed to a function in the server that probably already has PHP running (except right after a server restart). It still has to interpret your page, but it doesn’t have the startup cost of fork/exec and getting PHP itself running. So basically, the advantages of FastCGI without making Apache dependent on an external process that always seemed to crash and then not get restarted properly, and ARGH, everything about FastCGI was a nightmare for system administrators. AND with FastCGI you had to clear memory between page turns yourself, which mod_php takes care of for you. Sadly this often means your page turns need to reconnect to the DB, but you’re already using PHP – you don’t care *that* much about performance.

          There’s also a mod_perl but it lets you hook into a bunch of internals of Apache and doesn’t provide a built-in “use these tags to just drop code onto a regular HTML page” (there’s stuff for that NOW, but decades too late).

          So basically PHP was just easier for system administrators – it had a module you tie to Apache that had configs to restrict people to code in their own directory and the common library, didn’t have any stage where it needed to be root to set up pipes or anything, and was basically considered “safe” to put on your webserver and let your customers upload their code and run it (fairly or not).

  13. TheCheerfulPessimist says:

    Well, now you can’t use the password “wefightorwedie” anymore. Your new password is obviously, “butwhatdotheyeat.”

    1. Droid says:

      “youokaybuddy”

      1. That always makes me crack a smile when I read/hear it.

    2. Ander says:

      The password was a pretty good punchline to what I hadn’t realized would be that kind of joke. It’s a clever gag, yet it also proves that we will never ever be done discussing Mass Effect and how it was bad before the ending.

  14. Lazlo says:

    Does it bother anyone else that the person in the “Wrapping Up” picture is apparently drawing a database schema backwards on a piece of glass? I mean, sure, that’s a handy talent if you’ve got it, but in order to use it, the room would have to be set up in such a way that this is normal.

    1. *waves hand*

      Won’t say it bothered me exactly, but I did notice it was all in reverse relative to the lady apparently writing it. (And speaking as someone who spent a while in school pissing about with mirror writing, she’s pretty good! XD )

      Random thing I just noticed: I clicked to reply here, but then scrolled on down to the end of the comments first. When I got there, I discovered that the comments just…. stopped, with no comment box available. When I refreshed, it closed my (as-yet unused) reply window and restored the comment box at the end. I’m a bit hazy today, but I think it didn’t do that before? Didn’t it show both (reply and comment) previously? Or am I going mad?

      1. Droid says:

        It always worked like that. You can click “reply here”, then the comment box moves up from the bottom to where your post will be inserted. That’s what the “click here to cancel reply” button at the top of the reply box is for, after all.

        1. Ah right. I honestly couldn’t remember. Thanks!

    2. Paul Spooner says:

      The easiest way to get this photo would be to write normally on the glass and then mirror the photo. Most people are right-handed. Subject is shown writing with their left hand. Thesis supported.

  15. The fastest code possible is the code that never needs to run.

    By this I mean, ideally you should only need to pull data from the database that cannot remain static for a long period of time.

    A “Static Website” should be your goal. (this is basically the TLDR; read on for a more roundabout verbose suggestion)

    Now ignoring the WordPress limitation (I’m basing my example on being able to wave a magic wand at the database itself here), here is what “my ideal” would be for your site (if I were you, Shamus).

    Using this as an example https://shamusyoung.com/twentysidedtale/?p=41607

    The search input, I’d probably move that to the bottom. I never use it. I never use the monthly drop down. I don’t even use the category thing. This would free up some space to redo the twitter thing, maybe enough space to show 2-3 tweets horizontally? The search could probably be tucked into a “search magnifying glass” along with the RSS/All posts/comments stuff on the top right. Maybe a “hamburger menu” could be used to tuck some of this stuff away?

    The article image is maybe a tad big for my taste, and I’d prolly make it a little smaller so I could fit the author/date/category/comment count to the left or right of the article image.

    After the opening paragraph (the jump) add a strip suggesting random articles, 5-7 seems a good number if presented as images.

    Then follows the article.

    After the article’s last paragraph, add a strip of the most popular random articles (but link to the start of that series or category); 5-7 seems a good number here too if presented as images.

    Then follows all the comments, and comment box. And then at the bottom a copyright message and maybe a contact link (to the contact page).

    And here’s the big thing. The entire page is static. So your CMS of choice would publish the article, but compile it, make it static.
    The comments can be loaded using a javascript loader: just feed the script function the article id (this would be part of the static page as well) and it would load in the comments. By the time the user has read through the article (or scrolled through), all the comments would have loaded.

    Same thing with the random articles and popular article strips. These too would be loaded using javascript. The bottom one is no issue as it will have plenty of time to load. The topmost one would need to be run early in the page rendering process. I’m not a fan of having a javascript hold back the rendering of the html and css of a page, but in this case it’s hard to avoid; I guess a placeholder could be used. This does mean that the top strip may not update until the user has already scrolled down some. But it’s unlikely they’ll go for that at once, and they may scroll back up again later (in which case a late-load javascript would have had time to finish loading the strip).

    My website provider uses Varnish and Nginx to cache and present static files. If done right, a normal user would never need to hit the database, or in my case for popular pages they never need to hit the disk at all; it’s all fetched from memory cache instead.

    One issue though is the urls. https://shamusyoung.com/twentysidedtale/?p=41607 is a dynamic url.
    You do not want a PHP script redirecting this or loading the static content; sure, it’s faster than fetching it à la WordPress, but it’s still overhead that can be avoided.

    Breaking urls is an issue, but you could do some .htaccess (or apache config) magic and have it rewrite https://shamusyoung.com/twentysidedtale/?p=41607 to be https://shamusyoung.com/twentysidedtale/p/41607/ for example.
    This would allow the webserver to serve a static page (which would technically be stored as https://shamusyoung.com/twentysidedtale/p/41607/index.html )
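
    A hedged sketch of what that .htaccess rule might look like (untested; it assumes mod_rewrite is enabled, and the paths just follow the examples above):

    ```apache
    RewriteEngine On

    # /twentysidedtale/?p=41607  ->  /twentysidedtale/p/41607/
    RewriteCond %{QUERY_STRING} ^p=([0-9]+)$
    RewriteRule ^twentysidedtale/?$ /twentysidedtale/p/%1/? [R=301,L]

    # Serve the pre-rendered file for the clean URL.
    RewriteRule ^twentysidedtale/p/([0-9]+)/?$ /twentysidedtale/p/$1/index.html [L]
    ```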

    Since your site is using WordPress you can’t do any of these things easily. But it might be worth considering it in the long run.

    The backend (CMS) would store all articles and things in a database, allowing for easy editing.

    The javascript loaders would do XHR POST requests to PHP (or other language) scripts on the server, which either fetch directly from the database or through an intermediary cache. Having the script fetch from a local .txt file if it’s been less than x minutes avoids having to hit the database that often; that way you can have it refresh the cached strip if it’s been longer than 30 minutes since it was last cached, for example.
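
    A rough sketch of that cache-file idea in Python (the filename, TTL, and the stand-in database query are all invented for illustration):

    ```python
    import json
    import os
    import time

    CACHE_FILE = "popular_strip.json"   # invented filename
    CACHE_TTL = 30 * 60                 # 30 minutes, per the suggestion above

    def fetch_popular_from_db():
        # Stand-in for the real database query.
        return [{"id": 41607, "title": "Database Abuse"}]

    def get_popular_strip():
        """Serve from the cache file unless it's older than the TTL."""
        try:
            age = time.time() - os.path.getmtime(CACHE_FILE)
            if age < CACHE_TTL:
                with open(CACHE_FILE) as f:
                    return json.load(f)
        except OSError:
            pass  # no cache file yet; fall through and rebuild it
        data = fetch_popular_from_db()
        with open(CACHE_FILE, "w") as f:
            json.dump(data, f)
        return data
    ```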

    Code like this is “simple” and fun to write too IMO.

    Except for the few times you edit an article, the page itself does not ever need to be edited again (sitewide CSS redesigns aside). Only the comments and the promo strip(s) would need to change.

    Don’t worry about Google either; these days Google’s search engine renders the javascript (and thus the page as a user would see it), so Googling your site for stuff said in the comments would work and point the result to the article page. (If you use an XHR POST request then Google won’t index that script page; if you used an XHR GET it might end up indexing it unless you use noindex for that url in robots.txt.)

    Hopefully you or others reading this got a few ideas from my ramblings above.

    It’s much faster to read from a cached file on disk than from a database (that could be on a different server) when you do scripts, but it’s even faster if you don’t have to run a script at all and just let the webserver fetch a static webpage and serve that directly.

  16. Shamus, a lil’ tip regarding previous and next buttons.

    People do not use them that often. You can probably reduce load by doing the following.
    Have the buttons change to urls like https://shamusyoung.com/twentysidedtale/page.php?p=41607&=prev
    and
    https://shamusyoung.com/twentysidedtale/page.php?p=41607&=next

    If possible use some htaccess rewrite magic so they’ll be https://shamusyoung.com/twentysidedtale/p/41607/prev
    (if you follow what I say in the other comments that’s currently stuck in moderation queue).
    Or https://shamusyoung.com/twentysidedtale/?p=41607&=prev

    For those wondering how this will work (if it’s not obvious). You are basically sending the current “article id” along with a “next” or “prev” keyword to a script.
    That script queries the database for the current + 1 (or current -1) entry, then it’ll answer back to the browser with a temporary 302 redirect to 41606 or 41608 for example.
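
    A sketch of that lookup in Python against SQLite (using an ordered query rather than a literal +1, since post ids usually have gaps; all names and ids are invented):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        INSERT INTO posts VALUES
            (41605, 'Older'), (41607, 'Current'), (41609, 'Newer');
    """)

    def resolve(post_id, direction):
        """Find the id the 302 redirect should point at (None at either end)."""
        if direction == "next":
            sql = "SELECT id FROM posts WHERE id > ? ORDER BY id ASC  LIMIT 1"
        else:
            sql = "SELECT id FROM posts WHERE id < ? ORDER BY id DESC LIMIT 1"
        row = conn.execute(sql, (post_id,)).fetchone()
        return row[0] if row else None

    # The script would then answer with something like:
    #   Status: 302 Found
    #   Location: /twentysidedtale/?p=41609
    ```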

    This would eliminate any db lookups for the prev and next links when loading the current article.

    In my case I almost never use the prev and next links, so the script overhead for when I do use them would be 99% less than today, when the lookups happen every time I view an article page.

    1. Lee says:

      The problem there is that the Previous and Next links actually include the title of the posts they point at, not just “Previous” and “Next”. Your solution requires Shamus to change his design, and removes information that the user might find helpful.

      1. “the Previous and Next links actually include the title of the posts”, OK, but the previous one can be made part of the static page (it’s unlikely an article is inserted between two older ones).

        As to the “next”, that might be more cumbersome. The latest one would have no “next”, so when publishing a new article the previous article would have to be re-rendered.

        But again, that is just a “one time thing”, rather than doing it every time a page is loaded.

        1. Ralph says:

          These links are the perfect example of where a bit of denormalisation would be appropriate.

          Store the next/previous post’s title and id in the post table itself, along with the rest of the post’s data, and you can then display the links without needing additional queries.

          The downside is that when you add/edit/delete a post you need to update not just the post in question, but the posts on either side of it so the links match up. Mess this up and your links might be out of sync (but you can always repair them based on the current contents of the table).
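
          A sketch of that repair-from-the-table idea using Python's sqlite3 (the schema and titles are invented for illustration):

          ```python
          import sqlite3

          conn = sqlite3.connect(":memory:")
          conn.executescript("""
              CREATE TABLE posts (
                  id INTEGER PRIMARY KEY,
                  title TEXT,
                  prev_id INTEGER, prev_title TEXT,  -- denormalised neighbours
                  next_id INTEGER, next_title TEXT
              );
          """)

          def repair_links():
              """Rebuild every row's neighbour columns from the table itself."""
              rows = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
              for i, (pid, _title) in enumerate(rows):
                  prev_ = rows[i - 1] if i > 0 else (None, None)
                  next_ = rows[i + 1] if i < len(rows) - 1 else (None, None)
                  conn.execute(
                      "UPDATE posts SET prev_id=?, prev_title=?, "
                      "next_id=?, next_title=? WHERE id=?",
                      (prev_[0], prev_[1], next_[0], next_[1], pid),
                  )

          conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                           [(1, "First"), (2, "Second"), (3, "Third")])
          repair_links()

          # Rendering a post's nav links is now a single-row read, no extra queries.
          row = conn.execute(
              "SELECT prev_title, next_title FROM posts WHERE id = 2"
          ).fetchone()
          ```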

  17. Dev Null says:

    the monthly archive dropdown you see at the top of the page. Which makes me wonder: Does anyone ever use that thing?

    I use it infrequently on your site. Not to explore, but when I’m remembering a post from earlier in the week/month, and want to pop back in to see what new comments turned up / if anyone replied to my comments / add something new that has occurred to me.

    As others have mentioned, the number of queries isn’t going to affect efficiency much – those options tables are tiny, and you’re looking them up by an indexed field. In fact, the only table of any real size from a database perspective is liable to be the comments, and at 400,000 rows that’s really peanuts too, given that you’re almost certainly also looking that up by an indexed field. If anything, my guess would be that any slowdown is going to mostly be due to your provider using really cheap slow storage – like something that offlines content to the “cloud” (from their point of view; it’s obviously already there, from yours) and fetches it back as required. Or else they’ve got a limited internet pipe that you’re sharing with all of their other customers, and it’s bottlenecking there.

    I wouldn’t be surprised if there was a WP plugin that would measure your database access times and dump them into a log, if you’re really interested in seeing where the biggest timesinks are.

  18. Richard says:

    Databases are very good at SELECT queries, that’s basically the entire point.

    The things to avoid are in general:
    Returning more data than you actually wanted
    – Don’t ask for all rows when you only want one or two rows. Use LIMIT etc.

    Querying without an index
    – Make sure the Primary Key is the one you actually use most often.

    Having too many indexes
    – Every time you add an item all the related indexes have to be updated. If those indices aren’t used often – or ever! – then that’s wasted effort.

    1. Dev Null says:

      As a general rule, you are absolutely correct. But for a db back-end to a webpage, you basically can’t have too many indexes. If it takes 30 seconds to post a new page, and that shaves half a second off the load time, it was totally worth it. Slightly less true for comments, but only slightly.

  19. Daemian Lucifer says:

    What is going on here? We’re querying another post, and its metadata (the splash image). The problem is, there’s no such thing as post #34542. Doesn’t exist. I can’t explain this.

    And now we know

    https://www.youtube.com/watch?v=pele5vptVgc

  20. Paul Spooner says:

    Since you did all these tests at home, I posit this qualifies as domestic database abuse.

  21. J Greely says:

    I cringe whenever I see PHP code for accessing a database: horrible flashbacks to the clowns who built the first version of our customer site. Bad enough that they were using PHP, but they weren’t even good at that; they copy-pasted everywhere instead of pulling things out into function calls, leading to indents so deep that half the code was “continued on next monitor”.

    Also, their idea of source control was to comment out large blocks of old code (with the ‘#’ in column 1), add a detailed comment explaining the new code, and then check it in. Since you usually couldn’t fit the entire line on screen, you often found yourself reading through two pages of code before discovering it wasn’t actually used. And of course they all used different editors, so half the checkins differed mostly by whitespace. Worst of all, the least-competent of the bunch actually ran a PHP user’s group in the heart of Silicon Valley…

    When I migrated my blog off of Movable Type (which, due to all the MySQL queries, ran badly on a tiny Amazon virtual), I ended up going with the static site generator Hugo, with comments managed by adding a bit of Javascript to call an Isso server (tiny and fast). 10 seconds to build ~4,500 articles on my laptop, another 10 to rsync it to the server. A lot more up-front work than importing the old data into a newer blog engine, but the resulting site is faster, much more secure, and checked into Mercurial. Difficult to extend to a group blog, but fortunately I don’t need that.

    I’ve always been annoyed by the “your web site is a database” model. It makes for pretty demos, but in production you end up bolting on elaborate caching schemes to get “acceptable performance”, because you’ve been writing queries instead of reports. Then comes the day when someone logs in and sees another user’s cached data…

    -j

  22. Sean Hagen says:

    Before making any decisions, look into using ‘EXPLAIN’ to learn why queries might be running long: https://dev.mysql.com/doc/refman/5.7/en/explain.html

    There’s tons of great posts out there that show how to use the results from EXPLAIN to figure out why a query takes forever. It might not be that you’re making 3 extra queries for every post, it might just be that you need to slap an index on something so it finishes faster.
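
    MySQL’s EXPLAIN output is its own format, but the before/after effect of adding an index can be sketched with SQLite’s EXPLAIN QUERY PLAN (table and index names invented):

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
    )

    def plan(sql):
        # EXPLAIN QUERY PLAN describes how SQLite intends to run the query.
        return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

    query = "SELECT body FROM comments WHERE post_id = 41607"
    before = plan(query)   # full table scan
    conn.execute("CREATE INDEX idx_comments_post ON comments (post_id)")
    after = plan(query)    # index search
    ```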

  23. TylerDurd0n says:

    This is no panacea, but as already mentioned the biggest speed boost is achieved by making your host serve static pages instead of running every request through PHP (thus bypassing database accesses completely). There are very complicated ways to do this (e.g. WP-Super-Cache and their ilk) and then there’s NGINX.

    NGINX can be configured to locally store the output of PHP-FPM as (gzipped) HTML files and serve that HTML directly on subsequent requests, bypassing PHP entirely. So pages are served near instantly if NGINX can find a cache entry (you can configure it to automatically bypass the cache, e.g. if the WordPress login cookie is part of the request, and it can even be configured to cache the file descriptors to speed up local file access) – otherwise it passes the request on to PHP again.
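
    Something like this, roughly (an untested sketch; the zone name, socket path, and timings are all invented):

    ```nginx
    fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=wpcache:100m
                       inactive=60m;

    server {
        location ~ \.php$ {
            fastcgi_pass unix:/run/php-fpm.sock;
            include fastcgi_params;

            fastcgi_cache wpcache;
            fastcgi_cache_key "$scheme$request_method$host$request_uri";
            fastcgi_cache_valid 200 10m;

            # Skip the cache for logged-in WordPress users.
            set $skip_cache 0;
            if ($http_cookie ~* "wordpress_logged_in") { set $skip_cache 1; }
            fastcgi_cache_bypass $skip_cache;
            fastcgi_no_cache $skip_cache;
        }
    }
    ```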

    Using a memcache like Redis (or old-school memcached) allows WordPress to bypass the database for static data if no cached HTML has been found by NGINX, speeding up those page requests as well. Finally, plugins can be used to combine/compress all your CSS/JS into single files, reducing HTTP overhead and improving caching of these assets.

    Using “static” comment forms is tricky though – Jetpack’s WordPress comments or Disqus circumvent this as they’re implemented as iFrames (so the markup can be static and comments will still be “fresh”) otherwise you’ll need to invalidate the cache each time a new comment is posted (or use “micro caching” – very short cache TTLs, which help on pages with loads of traffic nonetheless). Without a “custom” plugin NGINX can’t invalidate single pages’ cache entries (by default you can only throw away the whole cache for a zone), so WP-Super-Cache (which handles the HTML creation/storage in PHP) might still be the better solution there.

    TL;DR: NGINX -> Fastcgi Caching -> PHP -> Redis/Memcached -> MySQL

    Also: Don’t use MyISAM.. just don’t..

  24. Blake says:

    I haven’t touched databases since I was at university a decade ago, but my first thought on those prev/next things is that they should really be generated once for each post, doing a doubly-linked-list thing, with the result stashed on the post_meta table or something.

    Each time a new post is posted, it looks up the previous post in that category; if it’s found, it updates its own prev field and the previous post’s next field.
    It could also look up the next one at post time if you were posting in the past for any reason.

  25. FYI: your local mysql server is just an XSS vulnerability away from being pivoted to from a website. (assuming the XSS vuln is scanning your lan for open sql servers, and has targeted you, having read your blog)

    Not that you’d probably care if someone defaced your local copy of your website ¯\_(ツ)_/¯

    1. Noumenon72 says:

      What do you mean by “pivoted to”?

    2. Decius says:

      “Defacing” your local copy of a website with arbitrary code could be hella bad.

  26. Alan says:

    Is there a good reason to query the post IDs, then query the exact same table to get the rest of the fields? Sure, you’ll need the IDs for later queries, but the IDs are among the data returned in the second query. I suspect it’s incredibly cheap, but it bugs me.

    My best guess is that perhaps some queries are parallel, so armed with the IDs, you can fire off a bunch of followup queries simultaneously.

  27. Decius says:

    How often will someone on the front page want to find the post in the same category as the one that they are previewing that was posted AFTER it?

    How often is that even not null?

    -12 queries per load for changing the front page not to care about that.

    1. Droid says:

      Even if it IS not null for some page, that means the “next post” shown there will definitely be on the front page further above, so you have to have already seen and scrolled past it before reaching the “next” button that leads you to that page.

      Sorry if that sentence wasn’t clear.

  28. Lord Visjes says:

    Hi Shamus, just wanted to let you know that I do use the monthly archives function. It comes in handy when I quickly want to go back to where I left off when re-reading your old post (and I’m not using the same machine). Do you still have the problem of slow loading pages or is the speed about the same as it was at Hosting Matters? I sincerely hope your hosting problems will be over soon, although I do enjoy a good rant now and then.

  29. Joe says:

    Which makes me wonder: Does anyone ever use that thing? I don’t know if I’ve ever explored a site using the monthly archive.

    I did a full archive binge of the site a year or so ago, and used that extensively. From the start up to the then-present day. Very useful.

  30. (dangit, Sean Hagen posted before me, welp, hope that Ol’ Shamus notices at least one of our comments)

    Shamus, have you captured the SQL statements used for a page, and then repeated them with the EXPLAIN syntax added?
    That will show you how the database performs them: was there an index it could use to speed things up? If not, then how did it look things up?

    Sometimes this can reveal small changes you can make to get a boost in lookup speeds. Sometimes you might find that adding an extra index across two columns will cause the same query to hit that index and avoid a slow row-by-row scan.

    Using WP you are limited in what you can tweak, but comparing two similar plugins with EXPLAIN might reveal the more efficient one.

    Optimizing Queries with EXPLAIN at https://dev.mysql.com/doc/refman/5.7/en/using-explain.html

  31. MrPyro says:

    Oh God WordPress still uses MyISAM tables. Looking at those table definitions makes my inner DBA very sad.

  32. Cuthalion says:

    At my workplace, we use a custom CMS we wrote using the Django Python framework. 60 queries wouldn’t be unusual for something like a homepage or topic page (if it had various subtopics, etc.), but that’s still high enough that we might look for redundancies to eliminate if we’re bored. Definitely high for a page that doesn’t show many different things.

    In your case, I did think it was high, since your homepage doesn’t have a lot of different types of things it has to show. When I skimmed your SQL output, I did have a “wtf?” moment when I saw all the queries for individual posts! The next/previous buttons being the culprit makes a lot of sense.

  33. ANGELA JACKSON says:

    Wow
