Who Ya Callin’ a “girl”?

By Shamus Posted Sunday May 6, 2007

Filed under: Links 55 comments

Random surfing took me here, which led me to Gender Genie, which is a program that examines written text and attempts to ascertain the gender of the writer. Amazingly, the analysis is done on seemingly innocuous words like “if”, “with” and “where” and not by looking for obvious male / female subject matter like cars vs. cats. (Or whatever stereotypes seem likely.) Also interesting is the size of their word lists. Gender Genie uses just sixteen “female” words and 17 “male” ones when determining the supposed gender of the author.
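
The scoring idea is simple enough to sketch. What follows is a rough Python illustration of a weighted keyword count, not Gender Genie's actual code. The weights for "with" and "not" are the ones a commenter quotes further down; every other word and weight here is a made-up placeholder, and the function names `score` and `guess` are mine.

```python
import re

# Illustrative tables only. "with" (52) and "not" (27) are weights quoted in
# the comments; all other entries are hypothetical placeholders, NOT the
# real Gender Genie lists.
FEMALE_WEIGHTS = {"with": 52, "not": 27, "where": 18, "she": 6}
MALE_WEIGHTS = {"the": 7, "a": 6, "as": 23, "are": 28}


def score(text):
    """Return (female_score, male_score) for a piece of text."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    female = sum(FEMALE_WEIGHTS.get(word, 0) for word in words)
    male = sum(MALE_WEIGHTS.get(word, 0) for word in words)
    return female, male


def guess(text):
    """Pick whichever side scored higher; a tie means no guess."""
    female, male = score(text)
    if female == male:
        return "unknown"
    return "female" if female > male else "male"
```

Under these assumed tables, a sentence containing none of the listed words scores zero on both sides and comes back as "unknown", which is why very short inputs tell you so little.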

The program claims 80% accuracy. That’s pretty interesting to me, although Gender Genie thinks I’m a woman. I tried out several long posts (their directions suggest that text should be at least 500 words in length) and Gender Genie regularly called me a woman. When I use shorter posts, my score tips female by an even wider margin. I tried text from a few people in my blogroll, and it correctly and unambiguously identified everyone else.

I wonder what it is about my style that is causing this? It really was amazing to see that my various ruminations on roleplaying games, videogames, and geek culture – all of which seem like nominally male-dominated pursuits to me – were somehow feminine to Gender Genie. I don’t think this is bad. I’m not insulted. I don’t think this means the software sucks. I just find it curious.

The gender politics behind the system will probably chafe some (it made me roll my eyes a couple of times) but laying aside why the designer thinks males and females use various words at the given frequency, the truth remains that males and females really do write differently and this difference can usually be detected via a brute-force word count. I can’t help but get the feeling that the authors might be trying to prove something about males and females with this exercise. Maybe one too many agenda-driven gender studies has left me paranoid and jaded. In any case, the mathematics at the end interest me far more than the reading of tea leaves taking place in the main body.

I’d like to see how it does against other people. I tried a few people from my regular reading list, although I don’t want to offend anyone by outing the gender of their authorial voice. If you want to try it yourself, just grab a longish post of yours, stick it into the thingy, and see what guess it makes about your gender.

And finally: This post, which is 450 words long, scores 525 female and 468 male. Maybe I should grow a mustache and see if that nudges the score in my favor.

LATER: Other reactions to the program here and here.

 



55 thoughts on “Who Ya Callin’ a “girl”?”

  1. Three females, two males, when I (male) tried it out.

    It reckoned that Kurt Vonnegut’s “2BR02B” was male, by 3991 to 3080, though.

    Then I fed it Ernest Hemingway’s “Soldier’s Home”. Which it confidently declared, by a margin of 4245 to 3027, to have been written by a woman.

  2. Marmot says:

    Well, first of all, a bit of femininity is not a bad thing (assuming that such attributes actually are related to genders at all, which I kind of oppose). But this site seems ridiculously biased. Hell, almost anything I fed it from my own writings to others’ came out as female, even long flame posts at World of Warcraft forums, which really stretches it!

  3. mark says:

    sounds to me like it expects women to be better writers….

  4. Issachar says:

    I did this a few years ago with bits of my own writing, and the software declared me to be female, which I am not.

    I seem to recall reading an article about Gender Genie (although I don’t think it had that name back then), which is how I found the web site in the first place. The article offered more details on how the software makes its determinations of authorial gender, but for the life of me I can’t remember where I read it or what it said. I think one factor was the use of the active voice versus the passive, and the occurrence of “I” statements, which are supposedly more common in male writing.

    Very interesting stuff. Even if it thinks I’m a girl.

  5. ruleant says:

    I submitted 4 or 5 of my blog entries to Gender Genie, and all were rated male (even this small entry). Good thing, as I am male.
    I noticed that when submitting a technical post, it contained even more male words than female ones.

  6. eloj says:

    I tried it on a text of about 1300 words, which I segmented (very roughly) as follows:

    first tried the first third: Male (correct)
    then about two thirds: Female with small margin (wrong)
    and then the whole thing: Female with large margin (wrong)

    But then, I’m not a native speaker, which may play a part.

  7. Issachar says:

    Drat. Gotta learn to view all the linked pages before posting, in case Shamus preemptively ninjas me to a reference. The article in The Guardian, to which Shamus has provided a link, was probably the one I read a few years ago. Go read it if you haven’t already; it’s interesting.

  8. gedece says:

    Don’t worry, I’m sure this will happen to a lot of us here, especially the ones that like to read a lot and are thus used to a wider vocabulary, and tend to use more words to say the same thing, only to better clarify the concepts.

    In fact, I think this applies especially to us nerds, geeks and roleplayers, who are into these male-dominated areas (yes, there are female nerds, geeks and roleplayers, but they are few), which do not fit the macho male stereotype. And the vocabulary you use is almost always determined by the ones you talk to/write to more often.

  9. Carl the Bold says:

    Eventually, I’ll have to find a 500 word something that I’ve written, and see just how many X Chromosomes I’ve got. Until then, here are some results of what it says for various sentences I typed in.

    “I am a man.” – Male
    “I am a woman.” – Male
    “I ain’t no woman.” – Female
    “My nipples explode with delight.” – Female
    “She’s got huge. . .tracts of land.” – Female
    “My hemorrhoids are acting up.” – Male
    “Call me Ishmael.” – Female
    “Shiny. Let’s be bad guys.” – Female (!) (“Well, Jayne ain’t a girl!”)
    “We gotta go to that crappy town where I’m a hero!” – Female

    Then I started putting in some things spoken in DMotR (hee hee):

    “You’re confusing us with heroes who are lawful-stupid.” – Male
    “Yeah, but they’re all chicks. Naked chicks. Leafy, naked tree chicks.” – Male
    “I was failing before we even started this campaign.” – Female (well, it nailed that one!)

    “I hate this campaign.” – Unknown

    That was fun while it lasted. Oh, look, breakfast. Bye.

  10. V says:

    Every time I put in a post complaining about work I was female. Everything else, posts about anything else, stories and old school papers all came up male. Of course nearly all of these online things incorrectly peg me as a guy.

  11. Oliver says:

    I saw this some time ago, and at that time it classified me as male based on my old blog.

    I tried this for my personal blog. For myself it’s really a toss up between male and female. However, I do notice a bit of a pattern concerning my subject matter at the time. Generally, when I am reflecting deeply upon experiences, or telling a story, the algorithm thinks I’m female. When I am speaking about the present in a lighthearted manner, or projecting into the future, it finds that I am male. (When I am talking extensively about a scientific, historical, or mathematical topic, it also finds me to be male.)

    I can’t put my finger on it, but I do have some notion it’s related to the tone I use for different things– the choice is *fairly* unconscious, since I am used to writing using many different tones and can switch back and forth if need be.

    That being said, it’s interesting to see the correlation between the subject matter of the entries in my new blog and what the algorithm decides. I started my new blog to work on my writing, particularly my fiction– and it makes sense that most of the posts are evaluated as “female”.

    Of course, this is just a particular pattern with me, a coincidence that the storytelling voice I use is “female” and the precision based voice is “male”. I seriously doubt that this model would hold up consistently for a larger number of people.

    Fun, but note that it’s a *simplified* version of the algorithm mentioned in the articles you linked. So I do not think that the web app can claim to be 80% accurate. Also, Computational Linguistics is a very ill-defined field— trust me, I used to be in it. (Happily, I’ve ended up studying mathematics. No offense to my former colleagues– there are good and even brilliant people in every field, no matter how ill-defined.)

    I was trying to get to 500 words with this comment so I could use the form to evaluate my writing style, but meh… anyway with just 365 words, it thinks I am male.


    Words: 365
    (NOTE: The genie works best on texts of more than 500 words.)

    Female Score: 429
    Male Score: 505

    The Gender Genie thinks the author of this passage is: male!

  12. Strangeite says:

    I just tried something I wrote concerning the nature of Snape after I finished reading the last Harry Potter and it gave me a score of 1022 female versus 834 male on a 636 word essay. I am a man.

  13. Arson55 says:

    I decided to test some of my writing. I mostly write fiction. I started with the novel I’m working on. The first 6000 words made it think I was female… Then the next 6000 got a definite female result (well over a 9000 female score to a just over a 6000 male score). The next 3000 words also got a female response.

    I tried again with something else; I figured that since the protagonists in the other work were female that might be affecting it given that her and she were among the words it used. So I put in something where the protagonist was male…still female. I tried one more story of mine, and finally got a male response. Strange…

    But whatever, I don’t really find myself caring. I just find it odd that it has found so many of us to be female.

  14. trousercuit says:

    I do machine learning research, so this is right up my alley. One thing you’ve got to remember is that the classifier most likely can’t get 80% accuracy on text that comes from a different population than the one it’s trained on. In other words, if you want it to do well on Internet posts that have to do with gaming and such, you need to train it on those kinds of posts from both male and female authors.

    I’ve read about similar things before. Keep in mind that the list of words it looks for aren’t things that a *human* chose – a *machine learning algorithm* tagged them as words that tend to indicate male-ness or female-ness. Authors of one such system posited that their system’s choice of words reflected the idea that men tend to write about concrete events, and women abstract ideas. In theirs, which also used part-of-speech tags, male writers tended to favor nouns and verbs, while female writers favored adverbs and adjectives. (The specific words they tended to favor reflected the concrete/abstract division as well.)

    Weird stuff, but very cool. It’s very likely you like to write about ideas rather than events, Shamus.

  15. AngiePen says:

    I ran some of my fiction through Gender Genie a while back and the result depended on the kind of story. If the plot was lighter, the characters less blatantly masculine, the tone fluffier or more humorous, then it generally came up female (which I am.) If the plot was heavier, the characters rougher around the edges, the tone darker and more hard-edged in general, then it generally came up male. Which is fine with me, actually. I’ve always tried to shift my writing style IAW the story I’m telling and it looks like I’m succeeding. A bunch of my writer friends who ran stories through it got similar results.

    I have a feeling this thing is more likely to be accurate (by whatever percentage — I don’t quite buy 80% for this particular version either) with non-fiction than fiction. Fiction writers generally do alter their styles to suit the story, as I do, and that would tend to throw it way off.

    Angie

  16. Apparently, I am about 2/3rds male according to my writing.

    Typically, my more technical-related posts tend more male, and my more productivity-oriented or story-based posts are female. Though in most cases, it’s a slender margin.

    Oh, and my rants are often male. Go figure!

  17. Kelly says:

    I, a female, was listed as male for every blog post I submitted, rather definitively as well. I freely admit to being an atypical female however. I’m more of a math and science type geek than one o’ those readin’ and writin’ ones. However, for NaNoWriMo last year, I wrote a (really bad) fictional story of a teenage girl. Segments of that writing turned up adamantly female. It turns out I used “she” a lot.

  18. Nite says:

    Everything I’ve ever written that I’ve run through that thingy comes out female. Then again, every single gender test (be it colors or thinking or writing or anything) pops me out as female. All in all, every single thing (including on-line real-person contacts) notes me as a female. Every single thing, short of a look in the mirror that is, which is kinda sad, since I’d like to be, but whatever, seems the inner me is female in any case :)

  19. wally says:

    wow it worked 100%…i tried 5 times i also tried quoting people
    and it was still right!….wow

  20. JD Malmquist says:

    This is just fascinating. I’m trying to write a novel, and I’m happy to report the results on my first chapter:

    Words: 687
    Female score: 495
    Male score: 1424

    But I’m more interested in that list of words. I’m wondering if something written deliberately using more words of the ‘wrong’ gender for the author would help it sound truly differently gendered… no pun or twisted metaphor intended.

  21. Luke says:

    It’s a tossup for me. Some of my posts come up as male, and some as female. When I first did this a while back, most of my posts were coming up as female. Now I found a couple of them that were classified as male.

    Funny thing – the male posts were my angry rants about human stupidity, lame users, etc. The female ones were the ones where I was more light hearted or humorous.

    Also, I always assumed that this thing would sometimes be thrown off by my English usage. I have been living in the US for the past 8 years or so, but I still have some slight accent and often use a word in a weird way, or construct a sentence that sounds odd to native speakers.

  22. Chilango2 says:

    After testing this out, I have come to the belief that there’s something about blog posts that skews the numbers towards Male. It answered Male to me (correct), my wife, and every female friend on her blog network, for every entry I tried. The numbers were *less* skewed male than posts written by actual males, but yeah.

  23. Mordaedil says:

    Odd. I’ve written a couple of fictive background stories for RPG characters and it nailed their gender, but not the gender of myself.

  24. I submitted quite a lot of my writings to Gender Genie.

    Where I was making statements or using words like “is” and “must”, I rated as male.

    Where I was discussing feelings or using words like “might” and “could” I rated female.

    Seems pretty biased in what represents masculinity or femininity. But it was interesting.

    (This post shows as male, because I’m making statements!)

  25. Morte says:

    Wouldn’t worry Shamus old bean, I’m a lady too. Whereas in real life I’m a manly blokey bod, enjoying manly pursuits like drinking beer, talking to dames, shooting guns, fighting and needlepoint.

    Damn and blast, the truth is out

  26. Hal says:

    Pure, weapons-grade bolonium.

    Even a schmuck of a writer like me knows that some words are best for making certain points. These word designations are horrible.

    Like some of you, my scores depend on the subject I’m writing on. If I’m writing about religion/philosophy, politics, or news, I’m apparently feminine to varying degrees. When I write about video games or porn? All man.

    Silliness, if you ask me.

  27. Carl the Bold says:

    So I was eating lunch thinking about this more, and I remembered the ‘Tandem Story’ (http://www.snopes.com/college/homework/writing.asp), written by Rebecca and Gary, and wondered what it would say. If you’d never read it, go read it.

    The Gender Genie correctly predicted the gender of the author when I put them in by paragraph, right up to the point where they started complaining about each other. Those paragraphs were all assumed to have been written by a male (in fact, the female score was zero!).

    You don’t have to think it’s fun. I did. :)

  28. Don says:

    I’m male when I write about anime, female when I write about anything else.

  29. Thad says:

    Shamus, you should try your System Shock novel.

  30. Phobiac says:

    If it makes you feel any better, putting the lyrics for Bohemian Rhapsody in makes it say that the author is female.
    Although I did always wonder about Freddie Mercury.

  31. Thpbltblt says:

    For the most part, the Gender Genie thinks I’m male, which would make it correct. However, most of my blog posts are short, sweet, and to the point. On my longer posts, I seem to take on a feminine edge. On one post in particular regarding a recent familial visit to my domicile (1387 words), the GG decided I was overwhelmingly female (1828 versus 1326). This little snippet I’ve typed up right here, not counting this sentence, is decidedly male scoring 80 to 5; go figure.

  32. Dave says:

    Looks to me that the thing decides that females write in the past tense and males are more direct with simple sentences... Hmm... I wonder what it does with misspellled stuff.

  33. Steve says:

    There’s a simple rule of thumb: Don’t believe anything supported by the Guardian! Bunch of unshaven trouser-wearing feminists :)

  34. Vykromod says:

    I’ve put in the last 15 posts I made on a forum, considering them to be blog entries. These cover everything from local politics to experiences in Games Workshop to a lengthy rant about things I hate.

    Male results: 5
    Female results: 10

    I’m male. Unless someone’s not telling me something.

    The site says texts of 500 words tend to give more reliable results and everything I’ve put in is shorter than this. The longest post comes in at 462 words and did give a male result, but even putting more weight behind that I seem to use feminine keywords more often.

    For no real reason, I also put this comment (down to this sentence) through the Genie.

    Female Score: 56
    Male Score: 182

    These things will never be entirely reliable, though.

  35. Jade says:

    Alright – I’m a geek chick who tried it out. I fully expected to be labeled as male due to my tendency to possess the humor of a 12 year old boy, and the fact I used entries from my “tech” rant blog.

    Interestingly, the longer entries talking about something that happened were all labeled as female, and the shorter ones were labeled male. The shorter ones all seemed to be just quick facts and notes about the future, so I think I agree with whoever said it seems to go by tenses.

    And I even used a lot of “male” words like “fart” and “butt-pipe!” =D

    PS I’ve been a long time lurker of your blog, and this is the first time I’ve posted – just wanted to say it’s fantastic!

  36. I fed in a whole bunch of my recent entries from http://www.thealexandrian.net and got back consistent results:

    If I was talking about games (whether D&D or computer games), it said I was male.

    If I was talking about anything else, I was female.

  37. mom says:

    Females are statistically MORE VERBAL. Bigger vocabulary, more intricate sentences, less direct (wordier). You’ve always been waaay more verbal than most. Just my guess. Also, the spell checker masks your male tendency to misspell. Also a guess.

  38. karrde says:

    A couple of posts about books/history came up male, but a post about programming to solve math problems came up marginally female.

    What I wonder is, how did they come up with a number like 80%?

    Was it a “multiple submissions from same author, take the majority” type of measure? Or was it an “80% are identified correctly at least once” type?

  39. karrde says:

    Just for kicks–I found a couple of files that had been emailed to me by an ex-girlfriend (emailed before she was an ex-gf, that is). She had some ability in writing poetry, so I fed two of her poems into the Gender Genie.

    Both came up male. (One by a factor of 5%, the other by 15% or so.) Strange, when compared to the better-known poetry put in by Phobiac.

    However, in her defense, Mademoiselle X.G.F. fit into the “very literate” category, with science-geekery and some talent with computers thrown in on the side.

    (And why do I keep a bunch of emails and poems in files on my HDD from a relationship that died three years ago? Probably because I’m an obsessive archiver of information…and my HDD isn’t full yet.)

  40. Myxx says:

    Interestingly enough, I got all male results (which is accurate). However, on my snippets from grad school work the margin was much wider (500+ words), as opposed to blog entries and even Amazon product reviews (less than 500 words), which while still male, resulted in a much slimmer margin.

    Of course, then I tested this post, and it resulted in a F9/M58… go figure.

  41. John says:

    Well it is ‘Mo May’ – grow your mo to support prostate cancer awareness. Though I think referring to Chuck Norris a lot would help more

  42. Thad says:

    80% accuracy… ie, me and four of my friends tried it out, and it guessed four correctly.

  43. MintSkittle says:

    Well, I’m not a blogger, so I copy/pasted a bit of fanfic I was working on into the genie. As a side note, the fic is nowhere near done. It’s currently only a prologue and first chapter, and will probably never be finished.

    Words: 1504
    Female Score: 1415
    Male Score: 1780

    It says I’m leaning towards male. I’ll go with that.

  44. Carl the Bold says:

    37 mom Says: Females are statistically MORE VERBAL. Bigger vocabulary, more intricate sentences, less direct (wordier). You've always been waaay more verbal than most.

    That’s got me curious enough to ask a personal question, Shamus. What did you get on your SATs? I suspect you did really well on both sections (there are three now), but if there had been a section on penmanship, for some reason, I suspect you would have failed that.

  45. Avatar says:

    You’re a guy?!?

  46. trousercuit says:

    karrde:

    “What I wonder is, how did they come up with a number like 80%?”

    Most likely cross validation. That is, you split the data set (novels, articles, whatever) up into ten (or whatever) chunks, train the algorithm on nine of them, and use the last for testing accuracy. Make each chunk the testing chunk exactly once. Average the accuracies.
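
    That procedure can be sketched in a few lines of Python. This is only an illustration of k-fold cross validation in general, not how the 80% figure was actually produced. The `train` and `accuracy` callables are hypothetical stand-ins for whatever learning algorithm and scoring rule are being evaluated.

```python
def cross_validate(samples, train, accuracy, k=10):
    """Average test accuracy over k folds.

    `train(rows)` and `accuracy(model, rows)` are hypothetical callables
    supplied by the caller; they are assumptions of this sketch.
    """
    if len(samples) < k:
        raise ValueError("need at least k samples")

    # Deal the samples into k chunks, round-robin.
    folds = [samples[i::k] for i in range(k)]

    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every chunk except the one held out for testing.
        train_rows = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train(train_rows)
        scores.append(accuracy(model, test_fold))

    # Each chunk has been the testing chunk exactly once; average the results.
    return sum(scores) / len(scores)
```

    The point of holding a chunk out is that the algorithm is never scored on text it was trained on, so the average is an estimate for new text from the same population only.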

  47. Sarah says:

    OK, weird. This is even more messed up than that test to see which House you’d be in at Hogwarts. I put a page from one of my half-finished novels up…900something female to 400something male. well, i feel validated…but oddly stereotyped.

  48. Clodia says:

    Six male, three female. I’m actually female. However, most of my blog posts are less than five hundred words – I’m not a very chatty person, apparently.

  49. Ace says:

    Turns out I’m a woman as well. Wow.. I feel a whole lot more in touch with my woman side, all of a sudden. I’m off to watch my little pony and pick flowers now. The genie wills it!

  50. Gropos says:

    The site is faux scientific.

  51. Purple Library Guy says:

    Looks like bunk to me.
    It doesn’t look as if it generalizes at all well across populations–at most, it might be identifying some kind of cultural artifact in the original text samples that led to the list of words being spat out. Populations like the one those text samples came from may find it has some accuracy. Everyone else can be expected to find little if any predictive value.
    And if as trousercuit says the word list was spat out by a computer algorithm, it would be kind of uninteresting even if it worked 100% for everyone, because the moment anyone tried to come up with an idea about *why* those words were predictive, they’d be blowing smoke.

    As for someone’s comment that women are statistically more verbal–you want to watch that kind of statistics. They tend to be technically true but immensely misleading, because the reality is that the small statistical average variations between genders on nearly anything don’t mean much at an individual level. The population is almost all overlap, with a tiny bit of edge that doesn’t.

    It’s also unclear how much any given apparent difference is cultural (and “cultural” is a misleading statistical thing too–in effect, everyone is raised in a somewhat different “culture” from everyone else). I mean, time was when guys were expected to be the rational gender with the big vocabulary that did all the writing. And they probably did have bigger vocabularies, ’cause women didn’t get educated.

    Even brain studies showing differences in information processing aren’t conclusive. For one thing, they’re going to be statistical too; sure, on a TV show you’ll get shown the most characteristic “male” and “female” pattern they got, and it will look quite different. But in reality there are going to be lots of different patterns some more common than others, and some people will look like the other gender, and left handers may tend to look different, and yadda yadda yadda. Data is always messy. Second, because brains are plastic–they change over your lifetime, make new connections, form new patterns; it’s unclear whether we grow new neurons, but the existing ones certainly grow and change. Plausibly one might take a guy, do the scans, find a classic “male” pattern, then train, educate, or immerse him in an environment that hones “female” aptitudes, do the scan again and find a “female” pattern simply because he’s formed the kinds of connections a skilled person develops.

    Things are complicated, and I mistrust a lot of the results we see on this kind of stuff. There are too many sources of error and artifacts that are really difficult to screen for. And too many people not interested in finding them because the results will look more dramatic if they don’t.

  52. Rebecca says:

    I think the pronouns could throw it off. It thinks my fiction is male, but I write in third-person, not first-person, and more about guys than women. Regular blog entries vary. Perhaps to get the best results you need to enter in a very large sample of work. And maybe I should write more female characters.

  53. I guess internet-people express themselves in a far different way than the thing is used to.
    My Nanowrimo story of 2005 scored 69454 Male vs 84981 Female. That’s what I get for a male protagonist who speaks in first person, while of the other characters, the girls are discussed the most.
    Hm…
    With = 458 x 52 = 23816
    Not = 682 x 27 = 18414
    Or it’s just my style. Less than 11k points coming from words like she/her/hers.
    Wait, “the” and “a” are both considered male words? Then how on earth can one use a noun without being male?

    With my 2006 Nano I scored less big a difference: 75232/80838 M/F
    It’s funny how the amounts of female words are all a tiny bit less, while the male words nearly all are (far) greater in quantity.
    In both cases I have “with” and “not” as main female words, adding up to roughly half of the female score, and “what”, “as”, “are” and “the” being over half of the male score.
    Yup, definitely my writing style.

    This blog-entry is male, my referring to “with” and “not” are about the only things actually causing a female score.
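
    The arithmetic in this comment is easy to check. The snippet below assumes that "458 x 52" means 458 occurrences of "with" at a weight of 52 points each (and likewise for "not"), and uses the 2005 female total of 84981 quoted above. The variable names are just for illustration.

```python
# Figures quoted in the comment for the 2005 story.
with_points = 458 * 52   # occurrences of "with" times its weight
not_points = 682 * 27    # occurrences of "not" times its weight
female_total = 84981     # overall female score reported by the genie

# Share of the female score contributed by these two words alone.
share = (with_points + not_points) / female_total

print(with_points, not_points, round(share, 3))
```

    Under that reading, the two words together account for just under half of the female score, which matches the "roughly half" claim.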

  54. Bryan says:

    The Gender Genie algorithm, which first appeared in the NY Times’ “science” section, is a poor popularization of the algorithm as it appeared in the original academic literature. I have the original paper and that algorithm is meant to be applied to fiction; applied to non-fiction, the authors admit, the algorithm is no better than random chance at detecting an author’s gender. A much better algorithm, the one that has an “80%” chance of detecting an author’s gender correctly, needs to be trained on a large sample to generate a massive statistical measure of male vs. female characteristics in text. Even applied to fiction, the popular algorithm is not much better.

  55. Julia says:

    I tried out one of these with my mom awhile back. We both found samples of our writing online for the purpose. It identified her writings as done by a woman, but just barely, and mine as done by a man, by a bit more of a margin.

    Given the context in which most of my writing was done, maybe that had something to do with it. Maybe I should feed it some of my LiveJournal stuff and see what it does.
