{"id":38522,"date":"2017-05-07T06:47:16","date_gmt":"2017-05-07T10:47:16","guid":{"rendered":"http:\/\/www.shamusyoung.com\/twentysidedtale\/?p=38522"},"modified":"2020-08-23T19:06:24","modified_gmt":"2020-08-23T23:06:24","slug":"how-many-words","status":"publish","type":"post","link":"https:\/\/www.shamusyoung.com\/twentysidedtale\/?p=38522","title":{"rendered":"How Many Words?"},"content":{"rendered":"<p>I have been doing this site for a dozen years, but the question didn&#8217;t occur to me until now. I noticed the three-year anniversary of my Patreon campaign was coming up and I was looking for a way to quantify my overall output. The question is:<\/p>\n<p><strong>How many words do I write in a year?<\/strong><\/p>\n<p>Of course, this number will go up and down from year to year. Some years my big project is <a href=\"?p=612\">a comic<\/a> that will naturally be more image than words. Other years I end up posting most of my words on <a href=\"http:\/\/www.escapistmagazine.com\/articles\/view\/columns\/experienced-points\">the Escapist<\/a>. Sometimes I&#8217;ll focus on <a href=\"?p=25833\">video content<\/a> and sometimes <a href=\"?p=27792\">I&#8217;ll lose my mind<\/a> and write over a hundred thousand words about one videogame franchise.<\/p>\n<p>But still. Even if I don&#8217;t have a convenient way to measure stuff I&#8217;ve done for other sites, we ought to be able to get some sort of handle on how many words I write on <em>this<\/em> site, right? I mean, I&#8217;ve got the database <em>right here<\/em>. (You can&#8217;t see it, but I&#8217;m holding up the database and gesturing with it right now.) That should have all the information we need.<\/p>\n<p>I suppose the first step is to filter out the stuff not written by me. To date, 5,025 posts have been published on this site. (This includes posts you haven&#8217;t read yet, like the future entries in my <a href=\"?p=36555\">Arkham City<\/a> and <a href=\"?p=38399\">Zenimax vs. Facebook<\/a> series.) 
321 of them have been written by other people, and the remaining 4,704 posts were written by me. So all we need to do is get a word count on those posts and we&#8217;ll have what we need, right?<\/p>\n<p>Well&#8230;<\/p>\n<p><!--more--><br \/>\nNot all word counts are created equal. In fact, as far as I can tell <em>none of them are<\/em>. If we look at the text<span class='snote' title='1'>I&#8217;m talking about the raw text I see in the editor, which &#8211; due to markup &#8211; is different from the text you read on the site.<\/span> of <a href=\"?p=32341\">the very first entry of my Final Fantasy X<\/a> series, we&#8217;ll see that WordPress reports it as being 1,930 words long. If I take that exact same text and post it into Google Docs<span class='snote' title='2'>I write most of my long-form stuff in Google Docs. The editor is more comfortable, the spelling and grammar checking is more robust, and I can&#8217;t accidentally publish a half-finished article when trying to save my work.<\/span> it gives me a word count of 2,149. That is not a small difference! If I paste the same text into <a href=\"https:\/\/wordcounter.net\/\">this word counter<\/a> it tells me the text is 2,085 words. And if you copy &#038; paste that post somewhere else for a count, you&#8217;ll probably get another answer entirely. <\/p>\n<p>I&#8217;m assuming the difference comes down to HTML markup. For example, in the paragraph above I&#8217;ve got the sentence:<\/p>\n<pre lang=\"html\">In fact, as far as I can tell <em>none of them are<\/em>.<\/pre>\n<p>One word counter probably counts special characters like &lt; and \/ as word breaks, and another only counts whitespace. So one will see &#8220;<code>&lt;em&gt;none<\/code>&#8221; as the word &#8220;em&#8221; followed by &#8220;none&#8221;, and the other will see it as one big word. From experimenting, it looks like the WordPress counter is actually smart enough to pull out HTML, so &#8220;em&#8221; won&#8217;t get counted at all. 
Which means the WordPress count is probably the number we&#8217;re interested in. <\/p>\n<p>This is still not perfect. I think the WordPress counter gets confused by the shortcode markup I use for images, footnotes<span class='snote' title='3'>Like this one.<\/span>, and YouTube embeds. But this is basically close enough for our purposes.<\/p>\n<p>The more serious problem is that the word count isn&#8217;t stored in the database. If I want to know the word count of a post I have to open up the post in the editor and look at it. Not to sound lazy, but I don&#8217;t actually want to spend two full work days opening up 4,704 individual posts in the WordPress editor. The editor is not snappy and it does not open quickly. It takes several seconds to open a post<span class='snote' title='4'>You can see why I prefer to write in Google Docs!<\/span> and there&#8217;s no good way to navigate between published posts chronologically. <\/p>\n<p>About the only thing we have to work with on the database side is a brute-force character count of the text. That&#8217;s ugly. I think the best I can do is look at the character counts and compare them to the displayed word count. That should give me a ballpark &#8220;characters per word&#8221; that I can use to derive the numbers we need.<\/p>\n<p>I gather up the last dozen posts and look up their word counts. I put them in a Google spreadsheet along with their character counts, and it tells me I average about 6.64 characters per word. So if the database tells me a post is 10,000 characters long, it means it was in the ballpark of 1,506 words. <\/p>\n<p>That sounds high. The average in standard English text is 5.1 characters per word. While I&#8217;d love to claim that I&#8217;ve got one o&#8217; them fancy vocabularies that lets me use a lot of fifty cent words, I think this inflation of word length is more appropriately blamed on all the HTML and shortcode. Still, maybe the last couple of weeks have been atypical? 
Just for the sake of completeness, I do the same experiment again using every single post I wrote in June of last year. During that time I wrote 37 posts. I open each one in the editor to get the official HTML-free word count. Added together they come to 34,337 words. If I divide the total number of characters in those posts by this number I get 6.63. <\/p>\n<p>Wow. That&#8217;s amazingly consistent. I think I can proceed feeling pretty confident that I wrote a word for every 6.6 characters in a post. In any case, we finally have what we need to answer this stupid question: How many words do I write per year?<\/p>\n<p>All I need to do is get the numbers out of the database. I&#8217;ll admit I&#8217;m pretty rubbish at SQL. I&#8217;m one of those people who knows <em>juuust<\/em> enough to be dangerous. I never have the guts to make changes to the database via MySQL. My interactions are strictly read-only. Here is what I come up with:<\/p>\n<pre lang=\"sql\">SELECT SUM(CHAR_LENGTH(post_content)) FROM wp_posts WHERE post_author='1' AND \r\npost_status='publish' AND post_date >= '2017-01-01' AND post_date < '2018-01-01' LIMIT 10000;<\/pre>\n<p>(The \"limit 10000\" is because I'm feeding these queries into the database via phpMyAdmin, and if you don't specify a limit it defaults to 10.)<\/p>\n<p>If I do one of these for every year since the site's inception, it should give me the character counts. Note that this count is based on the year starting Jan 1st, and not in September when the site was launched. This means the first \"year\" is just a few months long. <\/p>\n<p>Taking those results and assuming 6.6 characters = 1 word, I get:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/2017_words_per_year.jpg' width=100% alt='Yes this should be counting image mouseover text like the stuff you&apos;re reading now.' 
title='Yes this should be counting image mouseover text like the stuff you&apos;re reading now.'\/><\/div><div class='mouseover-alt'>Yes this should be counting image mouseover text like the stuff you&apos;re reading now.<\/div><\/p>\n<p>That dip in 2010 is when I was making content three times a week for The Escapist. Plus I still had a day job. The Patreon campaign began in 2014, and that's when I really started treating the site like a full-time job. <a href=\"?p=27403\">I wasn't happy with my output at the end of year one<\/a>, mostly due to <a href=\"?p=24592\">problems in my personal life<\/a> and some projects that hit a dead end without ever turning into blog posts. But I'm pretty happy with my output since then. <\/p>\n<p>To put these numbers in context, your typical young adult novel is somewhere in the 50k to 90k words ballpark. I think the first Harry Potter book is probably around 75k or so. Hefty adult books are maybe double that. <a href=\"http:\/\/lotrproject.com\/statistics\/books\/wordscount\">The Two Towers clocks in at 156k words<\/a>. Which means last year I wrote 4.5 Harry Potters worth of content, or a little more than <em>The Two Towers + Return of the King<\/em>. <\/p>\n<p>Well, it's a bit more complicated than that. This is actually a count of what I posted and not what I wrote. For instance, <a href=\"?p=30811\">last year I did re-posts of old Escapist content<\/a>. But I also did some non-trivial edits to those things. I don't want to haggle over where we draw the line between \"writing\", \"re-writing\", and \"editing\", and so let's just ignore this while I make dismissive hand-wavey motions.<\/p>\n<p>I'm kind of surprised by my output in 2006. That's a lot of words. On the other hand, I think those are pretty low-quality words. They're mostly random dashed-off thoughts. They're barely proofed, there aren't any images, and almost no links. The stuff I'm writing now is more analysis. It's researched, proofread, and annotated. 
It's got lots of screenshots with captions. The transformation probably began when DMotR took off and I became aware I was writing for thousands and not just a small group of friends.<\/p>\n<p>While we're at it, let's look at how many posts I've put up every year:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/2017_posts_per_year.jpg' width=100% alt='Posts per year.' title='Posts per year.'\/><\/div><div class='mouseover-alt'>Posts per year.<\/div><\/p>\n<p>Again, that is not at all what I would have expected. I don't remember being nearly that busy in 2006. But I guess I can't argue with the data. I do remember hearing the advice, \"You should make sure to post once every day!\" and taking it to heart. Like most blogging advice, this is misleading. It's true that the most popular blogs have regular content. In the same way, many successful men wear suits every day. But wearing a suit every day will not make me successful. The <em>actual<\/em> advice you're looking for is, \"Write stuff that other people want to read.\" But that's sort of obvious and nobody knows how to teach other people to do it. So instead we get shallow advice like, \"Post every day\" and \"Check your SEO performance\", because those are things you can quantify. <\/p>\n<p>At any rate, the \"Post every day\" mindset resulted in me posting a lot of ephemeral dross in the early days of the site. That declining red bar graph is probably a good indicator of an overall rise in quality. <\/p>\n<p>Since I've already got the data in a spreadsheet, I might as well look at how long-winded I'm becoming. Here is the average length of a post, in words:<\/p>\n<p><div class='imagefull'><img src='https:\/\/www.shamusyoung.com\/twentysidedtale\/images\/2017_postlength_per_year.jpg' width=100% alt='Number of words per post.' title='Number of words per post.'\/><\/div><div class='mouseover-alt'>Number of words per post.<\/div><\/p>\n<p>Poor 2010. 
I guess I was just posting Spoiler Warning videos and links to my Escapist content. <\/p>\n<p>Well, I don't know if that was interesting, but it was a fun little project. <\/p>\n<p>And because I know you'll be curious at this point: According to WordPress, this post is 1,705 words long.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have been doing this site for a dozen years, but the question didn&#8217;t occur to me until now. I noticed the three-year anniversary of my Patreon campaign was coming up and I was looking for a way to quantify my overall output. The question is: How many words do I write in a year? [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[610],"tags":[],"class_list":["post-38522","post","type-post","status-publish","format-standard","hentry","category-landmarks"],"_links":{"self":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/38522","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=38522"}],"version-history":[{"count":1,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/38522\/revisions"}],"predecessor-version":[{"id":50617,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=\/wp\/v2\/posts\/38522\/revisions\/50617"}],"wp:attachment":[{"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=
%2Fwp%2Fv2%2Fmedia&parent=38522"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=38522"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.shamusyoung.com\/twentysidedtale\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=38522"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}