Scraping Part 4: THE FINAL CHAPTER

By Shamus Posted Thursday May 7, 2020

Filed under: Programming 63 comments

My bot is now (well, not RIGHT now; this series was written after the bot was completed) downloading pages from Metacritic, one at a time, at the rate of a page every couple of seconds. This would be painfully slow if we were trying to read something large-scale, but right now we’re just scraping for PC games that scored above 30 over the last 19 years. That’s well under 1,000 games.
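For the curious, a one-page-at-a-time download loop with a pause between requests might look roughly like the Python sketch below. This is not the bot's actual code (that was built around a C# library); the names `fetch_page` and `polite_download` and the two-second default are assumptions made for illustration.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_page(url: str) -> str:
    """Download one page and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def polite_download(
    urls: Iterable[str],
    delay_seconds: float = 2.0,  # assumed value for "a page every couple of seconds"
    fetch: Callable[[str], str] = fetch_page,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between consecutive requests."""
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages
```

Passing `fetch` and `sleep` in as parameters is only a convenience: it lets you try the loop with fake functions before pointing it at a real site.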

Of course, downloading these pages isn’t useful unless I can pull information out of them. Much earlier in this series I mentioned I’m using the Html Agility Pack. This library can parse HTML for me and return the bits I’m interested in.
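To give a feel for what "parse HTML and return the bits I'm interested in" means, here is a small sketch that swaps in Python's standard `html.parser` module for the Html Agility Pack. It collects the text inside every element that carries a given CSS class. The helper names and the class names in the usage note are hypothetical and are not taken from Metacritic's real markup.

```python
from html.parser import HTMLParser
from typing import List, Optional


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element carrying a given CSS class."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.results: List[str] = []
        self._tag: Optional[str] = None  # tag name of the element being captured
        self._depth = 0                  # nesting depth of same-named tags
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._tag, self._depth, self._buffer = tag, 1, []
        elif tag == self._tag:
            self._depth += 1  # same tag nested inside the captured element

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:  # the captured element just closed
                self.results.append("".join(self._buffer).strip())
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)


def extract_by_class(html: str, target_class: str) -> List[str]:
    """Return the text of each element whose class list includes target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

Under these assumptions, `extract_by_class(page, "score")` would return the text of every element with a (made-up) `score` class. A real scraper would more likely use a proper query language such as XPath, which is the kind of thing a full HTML library provides.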

One of the funny things about this project is that I’m so far out of my comfort zone / area of expertise that I don’t even know what I don’t know. Not only am I likely making lots of hilarious blunders, but I don’t even know that I’m making them.

This is strangely liberating. When I know what I’m doing, then every cut corner makes me feel vaguely guilty. But when you don’t know what you’re doing, you’re free of the obligations to do things the Right Way(tm) because you don’t know what the right way is! As far as I know, I’ve just written the best web scraper in the history of scraping. (Despite the lack of proof, I’m fairly confident that I have not actually written the best web scraper in the history of scraping.)

Continue reading ⟩⟩ “Scraping Part 4: THE FINAL CHAPTER”

 


 

Please Help I Can’t Stop Playing Cities: Skylines

By Shamus Posted Tuesday May 5, 2020

Filed under: Column 124 comments

I LOVE Cities Skylines. I know it seems like I spend more time complaining about games than I spend enjoying them – and that’s probably true in a lot of cases – but Cities Skylines is one of those games that gets more interesting every time I come back to it. 

For the last few weeks I’ve been on a huge Skylines bender. I’m sure you’re familiar with this sort of thing. You start skipping sleep. You stop talking to people. You end up with a game on one monitor and its wiki open on the other. Factorio and Dwarf Fortress players will definitely know what I’m talking about. 

It got really bad this time. I’d download a dozen mods and buildings to add to the game, play for a few hours, then slam down some junk food while reading the Skylines subReddit, then watch some YouTube videos about the game, then download more mods and start the whole thing over again. 


Link (YouTube)

If you’d like to watch the video without sound, we’ve added proper closed captioning. This is much better than the YouTube auto-generated CC, which doesn’t always parse proper nouns and lacks punctuation.

I’m mostly recovered at this point. I think. My rehab counselor said I’m making good progress and I’m allowed supervised access to a computer now. My family is adjusting to having me interact with them again. So I’ve decided to pretend like I was actually working this whole time by making a video about the city-building genre and why I love Skylines so much.

In my work, I’m usually whining about how much better things were in the Good Old Days of gaming, but not this time. This is it. Cities Skylines is the zenith of the city-building genre. City simulation has never been this good. In fact, thanks to DLC and mods, this game is better now than it was when it came out in 2015. And it was already a great game back then.

This isn’t a simple clone of the classic city-building games with a new graphical paint job. This is a deeper and more interesting simulation. I tried going back to old Sim City classics after playing Skylines, and the older games felt shallow and repetitive in comparison.

I want to tell you why I think this game is so good, but before I do that we need to go on…
Continue reading ⟩⟩ “Please Help I Can’t Stop Playing Cities: Skylines”

 


 

Diecast #300: Three Hundred!

By Shamus Posted Monday May 4, 2020

Filed under: Diecast 82 comments

Three hundred is a lot of podcasts. If podcasts were a physical object and not audio files, and if you took all 300 diecasts and stacked them up, then the resulting pile would be tall enough to fall over. Amazing.



Hosts: Paul, Shamus. Episode edited by Issac.


Link (YouTube)

Show notes: Continue reading ⟩⟩ “Diecast #300: Three Hundred!”

 


 

Scraping Part 3: A Well-Behaved Bot

By Shamus Posted Thursday Apr 30, 2020

Filed under: Programming 49 comments

So now I’m done messing around and being silly. It’s time to actually scrape the web for stuff. There are three different sites I’m interested in:

  1. Metacritic, for critic scores.
  2. Wikipedia, for credits regarding director, writer, producer, composer, etc. This information is spotty and I can’t think of how it might be useful right now, but I’m going to include it as part of the exercise. Also, Wikipedia often notes what franchise a game is from, which might be handy if I want to do a search that includes “all Resident Evil games” or somesuch. 
  3. Steam, for PC-specific info like DRM, controller support, multiplayer, etc.

There’s also a bit of information that can come from any of these sources: The URL for the game’s official website might be handy, and we also need to get the publisher, developer, and release date from one of these places.

Of the three sites, it seems like Metacritic is the best one to start with. It has games listed by platform, which is necessary in a structural sense. For the purposes of our database, it’s possible for the same game to have vastly different information depending on platform. For example, maybe a game is released on the Playstation 3 in 2010 by Beloved Developer, but then a year later it gets ported to the PC by Shovelware Games. Metacritic is the only place where we can get this information reliably. Steam obviously isn’t going to have non-PC data, and Wikipedia entries aren’t guaranteed to have all the per-platform data in an easy-to-capture location. (It might be in the info box on the right, or it might be buried in the article text (good luck capturing THAT), or it might not be listed at all.)

Metacritic even has a handy index page that you can go through: Continue reading ⟩⟩ “Scraping Part 3: A Well-Behaved Bot”

 


 

Scraping Part 2: Full Control

By Shamus Posted Tuesday Apr 28, 2020

Filed under: Programming 83 comments

So there are thousands of webpages that have information we want. When faced with this problem, ancient civilizations used to go to these pages using Internet Explorer 6 and copy the data into Notepad. We don’t know what they did with it after that, because they got eaten by Woolly Mammoths or conquered by Mongols or whatever. I’m not a historian so I might be slightly off with my timeline, but you get the basic idea: The past was hard.

But now we have these newfangled web scrapers that can surf the web for you and harvest whatever data you like. The problem is that putting the data into Notepad isn’t terribly helpful. Great, now you have an enormous text file of random facts. Are you going to sit down and read it manually? Probably not. So what do we do? Write another program to read that file? You need to turn this text into data sooner or later, and to do that we need to put it into a database.
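As a rough illustration of "turning text into data", the sketch below stores a few scraped facts in a database and asks a question of them. It uses Python's built-in `sqlite3` module purely as an assumption for the example; the table name, the columns, and the sample rows are all made up and say nothing about which database or schema the real project uses.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# One row per scraped game entry: (title, platform, critic score).
GameRow = Tuple[str, str, Optional[int]]


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the games table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS games (
            title    TEXT NOT NULL,
            platform TEXT NOT NULL,
            score    INTEGER,
            PRIMARY KEY (title, platform)
        )
        """
    )
    return conn


def save_games(conn: sqlite3.Connection, rows: Iterable[GameRow]) -> None:
    """Insert rows, overwriting any earlier entry for the same title and platform."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO games (title, platform, score) VALUES (?, ?, ?)",
            rows,
        )


def games_scoring_above(conn: sqlite3.Connection, minimum: int) -> List[Tuple[str, int]]:
    """Return (title, score) pairs with a score above the minimum, best first."""
    cursor = conn.execute(
        "SELECT title, score FROM games WHERE score > ? ORDER BY score DESC, title",
        (minimum,),
    )
    return cursor.fetchall()
```

The point is the last function: once the facts are rows instead of lines in a text file, "which games scored above 30?" is a one-line query rather than another parsing program.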

Continue reading ⟩⟩ “Scraping Part 2: Full Control”

 


 

Diecast #299: The Dross Cast

By Shamus Posted Monday Apr 27, 2020

Filed under: Diecast 46 comments

It’s the dross cast! We didn’t have a lot of topics and none of them were about Current Events or Hot New Releases. We only managed to answer one mailbag question.

But! Next week is the big 300. We’re going to do an all-mailbag episode, so please send us questions. Email is in the header image.



Hosts: Paul, Shamus. Episode edited by Issac.


Link (YouTube)

Show notes:
Continue reading ⟩⟩ “Diecast #299: The Dross Cast”

 


 

The Other Kind of MMO

By Bob Case Posted Saturday Apr 25, 2020

Filed under: Video Games 82 comments

(Achilles and The Grognard is on temporary hold while my various playthroughs catch up.)

(Also, I know there’s an irritating white line in the header image. I made a mistake copy-pasting.)

People are still somehow playing EVE Online, the internet spaceship MMO that came out in 2003.

Not bad for a seventeen-year-old game.

I played EVE on and off from around 2007 to 2013 or so, and very occasionally since then. It’s by far the best and worst online game I’ve ever played. It’s ancient, and full of the remnants of 2003-vintage game design choices. Both despite and because of this, I enjoyed my time in New Eden. I got to experience the much-discussed metagame: at various points, I was a spy, a scammer, a capital ship pilot, and a member of several different sovholding alliances (that is, player groups that controlled areas of conquerable space).

Continue reading ⟩⟩ “The Other Kind of MMO”