Once a week, I sit down with an hour of raw audio and Audacity and produce our weekly podcast. It’s a really annoying job to me, mostly because it doesn’t seem like it should exist. You record audio, you upload audio. What else is there to do? Why mess with it?
But no, apparently it takes a lot of work just to create the sort of medium-quality audio we have on our show. It’s like when I found out about foley artists, who painstakingly record all the everyday sounds to put in a feature film, because normal boom mics don’t pick them up and the world sounds mute without the footstep, clothing, and door-slamming noises we expect to hear. Tons of work goes into something I always assumed was effortless.
I’m not an audiophile, I have no training in audio production, and I barely know what I’m doing. (I actually call myself an audioslob: I’m not very picky about fine-detail sound quality. I listened to all my music on cassette in the 80’s and the lack of quality never bothered me.) I’ve mostly had to figure this gig out on my own through trial and error, with some guidance from Josh.
Once in a while people ask questions about how this is done, so I thought I’d document the process…
1. Record and Import.
We start with the raw audio inputs of our hosts. This will be three to five separate audio files, each about an hour long. I’ve heard that some people just save the raw output of (say) a Skype session or a conversation held around a single microphone. That’s like uploading a Google Hangout. It’s just a little too raw for my taste and leaves the audience to listen to awkward pauses, throat-clearing, unbalanced audio, and bits of behind-the-scenes conversation that are kind of dull and time-consuming. So what we do on the Diecast represents a sort of medium level of editing: Clean up the audio and trim out the cruft.
In our case, we record a Ventrilo conversation. Vent is the only program that will let one participant record the entire conversation, but keeps each person’s audio in its own file instead of “helpfully” mixing them together. On the other hand, Vent has an annoying bug where the last half-second of every message ends up clipped in the exported audio, which makes it sound like we’re dum-dums who let go of the push-to-talk key too soon. Over the years we’ve learned to compensate for this by holding down the key extra-long. Over a year ago I showed the bug to the Vent team and they were able to reproduce it. I’m still hoping we’ll see a fix for that one of these days.
I drop the raw audio tracks into Audacity, which is free and awesome. There are expensive big-name audio processing tools out there that you can buy if you’re made of money, but Audacity is more than powerful enough for this job and if I had something better I wouldn’t have the knowledge to put its features to use anyway.
2. Balance volume.
Incoming audio can be sort of random. Windows has volume controls for your mic, Vent has volume control for outgoing audio, Windows has volume controls for Vent, and Vent has volume controls for incoming audio. Maybe Rutskarn outputs loud but I manually turn him down and Chris is quiet but I manually boost him. The result is that these different audio tracks can be at wildly different audio levels.
Let’s start with Rutskarn. I select his audio file and hit “Amplify”. By default, this will boost the entire audio file just enough that the loudest part is at max output volume. If I pushed the amplification past this point, it would produce “clipping”. Do that hard enough, and the audio would have the over-boosted sound of someone talking through a megaphone. You don’t want that.
So the default setting makes someone as loud as possible without any clipping. If Rutskarn spent the entire show speaking at the same volume, then this would be good enough. But likely as not he was louder in some parts than others. To fix that we need to…
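For the curious, here is a rough sketch of the idea behind that default boost. It is not Audacity’s actual code. It assumes audio is a plain list of samples between -1.0 and 1.0, and the function names are made up for illustration.

```python
def peak_gain(samples, target_peak=1.0):
    """Gain that puts the loudest sample exactly at target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return 1.0  # pure silence: nothing to boost
    return target_peak / peak


def amplify(samples, gain):
    """Apply gain; anything pushed past full scale is clipped flat."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


track = [0.1, -0.25, 0.5]
safe = amplify(track, peak_gain(track))  # loudest sample lands on 1.0
too_hot = amplify(track, 4.0)            # 0.5 * 4 would be 2.0, so it clips
```

In the second call the loudest sample gets flattened against the ceiling, which is where the megaphone sound comes from.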
3. Compress audio.
In this case, “compression” doesn’t have anything to do with file compression. This is volume compression. See, the previous step just made it so that Rutskarn’s loudest moment is at max volume. But if he shouted at one point in the show and talked normally the rest of the time, then this would result in him being too quiet in the final mix.
I’ll be honest: I barely understand the compressor. I know it takes all the quiet parts of the file and boosts them by some degree. This part is more art than science. If you don’t boost enough, then the quiet parts are still too quiet. If you over-boost, then once again you risk crushing the audio and creating the “megaphone” sound.
Every host has their own quirks that make their audio unique. Mumbles and Josh talk loud but have explosive shouting once in a while. I don’t usually shout, but sometimes I mumble as I finish up a thought and it trails off into annoying inaudibility. Rutskarn is all over the place depending on mood and topic. The only exception is Chris, who is super measured and even, as long as nobody brings up Guy Fieri.
Basically, I’ll compress their audio until the quiet parts sound acceptable.
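As far as I can tell, the gist is something like the sketch below: squash everything above a threshold, then turn the whole track back up, which leaves the quiet parts louder relative to the peaks. This is a deliberately crude version that works one sample at a time. Real compressors follow the loudness over time with attack and release settings, and the `threshold` and `ratio` values here are just assumptions for the example.

```python
def compress(samples, threshold=0.5, ratio=4.0):
    """Crude per-sample compressor with makeup gain (illustrative only)."""
    squashed = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            # Only 1/ratio of the overshoot above the threshold survives.
            level = threshold + (level - threshold) / ratio
        squashed.append(level if s >= 0 else -level)

    # Makeup gain: bring the new, lower peak back up to full scale.
    peak = max((abs(s) for s in squashed), default=0.0)
    gain = 1.0 / peak if peak > 0 else 1.0
    return [s * gain for s in squashed]


# One quiet sample and one shout at full volume.
result = compress([0.1, 1.0])
```

With these settings the shout ends up back at full volume, while the quiet sample goes from 0.1 to about 0.16. Push the ratio higher and the quiet parts come up further, along with everything else that was quiet.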
4. Noise cancellation.
This step is really fiddly. Josh and Chris have really high-end microphones, so they have very little room noise. But Rutskarn and I have low and medium grade equipment, respectively, and we often have fans running near where we record. This noise is usually very slight. But sometimes it’s super hot and I have my fan close to me. Other times we’ve got the fan pointed so that the moving air passes over the microphone. Or perhaps there’s other stuff going on in the apartment that gets picked up. (Rutskarn has a roommate, I have kids.) We try to fix this stuff before recording, but sometimes conditions change during the session and sometimes we forget.
Of course, there’s an upper limit on how much noise you can remove before it starts to kill the voice quality. Last week Rutskarn and I had fans too close to our computers, and the final audio was a disaster.
In any case, the previous step probably boosted all the unwanted background noise quite a bit. The slightly distracting whisper of a fan will become a rushing wind, making it sound like I’m sticking my head out the window of a moving vehicle.
To cancel noise, I need a sample of pure, raw noise with no talking.
So I find a section of the show where Rutskarn left his mic open but wasn’t talking. (As part of our routine, I always ask everyone to give me a couple seconds of open-mic at the start of the show, for this very reason.) I select just that part, open up the “Noise Removal” tool, and hit “Get Noise Profile”. This tells Audacity: “The pitches you see in this little sample of audio are the ones I want you to get rid of.” Then I select the ENTIRE audio file for Rutskarn, open up the same dialog again, but this time I hit the “OK” button at the bottom. Audacity will have a jolly good think for a minute or so, and when it’s done Rutskarn’s audio should have the background noise removed. It’s not perfect, but it’s a massive improvement.
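The two-step “learn the noise, then apply it” workflow can be sketched in a few lines. To be clear, this is a much simpler technique than what Audacity does: Audacity works on pitches, while this sketch is a plain noise gate that only looks at loudness and silences any stretch that is barely louder than the noise sample. The function names, `window`, and `margin` are all assumptions made up for the example.

```python
import math


def rms(chunk):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def noise_floor(noise_sample):
    """Step 1: measure how loud the room is when nobody is talking."""
    return rms(noise_sample) if noise_sample else 0.0


def gate(samples, floor, window=4, margin=2.0):
    """Step 2: silence every window that is not clearly above the floor."""
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        if rms(chunk) > margin * floor:
            out.extend(chunk)                # loud enough to be speech
        else:
            out.extend([0.0] * len(chunk))   # treat as background noise
    return out


open_mic = [0.01, -0.01, 0.01, -0.01]        # fan hum, no talking
speech = [0.5, -0.4, 0.6, -0.5]
cleaned = gate(open_mic + speech, noise_floor(open_mic))
```

A gate like this only removes noise between words. The fan is still there underneath the talking, which is why the pitch-based approach does a better job.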
Once I have Rutskarn’s audio cleaned up, I go back and do steps 2 through 4 for the other hosts.
EDIT: Many people have pointed out in the comments that noise cancellation is much more effective if you do it BEFORE compression. So that will be my procedure from now on.
5. The Editing.
There’s an important tradeoff you have to deal with when recording a podcast among people from the far-flung corners of the world. You can have decent audio quality but intolerable latency, or you can have abysmal audio quality and slightly bad latency. More latency means conversations are more awkward. We can’t see each other, so we’re deprived of all the physical interpersonal cues that people use to signal they would like to talk: Opening your mouth, taking in a loud breath, nodding vigorously, etc. So the only way to get your words out is to dive in and hope other people get out of your way. This results in those tedious bits where three people talk at once and then play ten seconds of, “Go ahead / No YOU go ahead.” Not fun to listen to.
On top of that, lag can still be a problem for individuals even if the server is fine.
So I listen to the entire show and find all the instances of confusion, overlap, and cross-talk. I clean them up as best I can. If three people start talking at once, I’ll edit out the resulting confusion. If there’s some overlap, I’ll either mute the overlapping clip or time-shift it so that it’s no longer overlapping. I also edit out the housekeeping bits where we decide when to move on to the next topic or when to end the show. No need to waste the listener’s time with that. Overall, the show gets about three or four minutes shorter by the time I’m done.
This is a fussy, tedious task, and if I work very hard and do a really great job then you won’t be able to notice I did it at all.
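I do all of this by hand and by ear, but the time-shift fix boils down to simple arithmetic. Here is a minimal sketch, assuming each bit of speech is described as a (start, end) pair in seconds. The function names and the quarter-second `gap` are invented for the example.

```python
def overlaps(a, b):
    """True if two (start, end) clips share any stretch of time."""
    return a[0] < b[1] and b[0] < a[1]


def shift_after(first, second, gap=0.25):
    """Move `second` so it starts `gap` seconds after `first` ends.

    Clips that do not overlap are returned unchanged.
    """
    if not overlaps(first, second):
        return second
    length = second[1] - second[0]
    new_start = first[1] + gap
    return (new_start, new_start + length)


# Someone starts talking one second before the other person finishes.
fixed = shift_after((10.0, 14.0), (13.0, 16.0))
```

Here the second clip ends up at (14.25, 17.25). In a real edit, whatever came after that clip has to slide later too, or it will just collide with the next thing.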
6. Time times three.
This sort of editing takes about three times as long as the recording itself, although this value goes up or down slightly depending on the number of hosts and how much lag was happening. Also, the longer the show is, the more time I spend waiting for bulk audio tasks to complete, so the increase in time is non-linear. This is why I’m such a tyrant with a clock. (I’m always the one pushing to end the show as soon as the hour is up.) An hour-long show will cost me 3 hours of editing time, while a 1.5 hour show will cost me in excess of 5 hours. If there are a lot of hosts and a lot of lag that night, then it might take me 6 hours.
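To put numbers on the non-linear part, here is the arithmetic using only the figures above. The 5 hours is a lower bound (“in excess of”), so the real ratio for the longer show is at least this high.

```python
def editing_ratio(show_hours, editing_hours):
    """Hours of editing spent per hour of recorded show."""
    return editing_hours / show_hours


one_hour_show = editing_ratio(1.0, 3.0)   # 3 hours of editing per hour
long_show = editing_ratio(1.5, 5.0)       # a bit over 3.3, at minimum
```

If the cost were strictly linear, the 1.5 hour show would take 4.5 hours. Half an hour of extra show costs me at least two extra hours of editing.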
So that’s the process.