Audio Editing for Podcasts

By Shamus
on May 28, 2015

Once a week, I sit down with an hour of raw audio and Audacity and produce our weekly podcast. It’s a really annoying job to me, mostly because it doesn’t seem like it should exist. You record audio, you upload audio. What else is there to do? Why mess with it?

But no, apparently it takes a lot of work just to create the sort of medium-quality audio we have on our show. It’s like when I found out about foley artists, who painstakingly record all the everyday sounds to put in a feature film, because normal boom mics don’t pick them up and the world sounds muted without the footstep, clothing, and door-slamming noises we expect to hear. Tons of work goes into something I always assumed was effortless.

I’m not an audiophile[1], I have no training in audio production, and I barely know what I’m doing. I’ve mostly had to figure this gig out on my own through trial-and-error, with some guidance from Josh.

Once in a while people ask questions about how this is done, so I thought I’d document the process…

1. Record and Import.


We start with the raw audio inputs of our hosts. This will be three to five separate audio files, about an hour long. I’ve heard that some people just save the raw output of (say) a Skype session or a conversation held around a single microphone. That’s like uploading a Google hangout. It’s just a little too raw for my taste and leaves the audience to listen to awkward pauses, throat-clearing, unbalanced audio, and bits of behind-the-scenes conversation that are kind of dull and time-consuming. So what we do on the Diecast represents a sort of medium level of editing: Clean up the audio and trim out the cruft.

In our case, we record a Ventrilo conversation. Vent is the only program that will let one participant record the entire conversation while keeping each person’s audio in its own file instead of “helpfully” mixing them together. On the other hand, Vent has an annoying bug where the last half-second of every message ends up clipped in the exported audio, which makes it sound like we’re dum-dums who let go of the push-to-talk key too soon. Over the years we’ve learned to compensate for this by holding down the key extra-long. Over a year ago I showed the bug to the Vent team and they were able to reproduce it. I’m still hoping we’ll see a fix for that one of these days.

I drop the raw audio tracks into Audacity, which is free and awesome. There are expensive big-name audio processing tools out there that you can buy if you’re made of money, but Audacity is more than powerful enough for this job and if I had something better I wouldn’t have the knowledge to put its features to use anyway.

2. Balance volume.


Incoming audio can be sort of random. Windows has volume controls for your mic, Vent has volume control for outgoing audio, Windows has volume controls for Vent, and Vent has volume controls for incoming audio. Maybe Rutskarn outputs loud but I manually turn him down and Chris is quiet but I manually boost him. The result is that these different audio tracks can be at wildly different audio levels.

Let’s start with Rutskarn. I select his audio file and hit “Amplify”. By default, this will boost the entire audio file enough so that the loudest part is at max output volume. If I turned the amplification up past this point, it would produce “clipping”. Do that hard enough, and the audio would have the over-boosted sound of someone talking out of a megaphone. You don’t want that.

By default, the amplify tool will boost someone to make them as loud as possible without doing any clipping. If Rutskarn spent the entire show speaking at the same volume, then this would be good enough. But likely as not he was louder in some parts than others. To fix that we need to…
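What Amplify does by default is peak normalization: scan for the loudest sample, then scale the whole track so that sample lands exactly at full scale. A rough sketch in Python with NumPy (the function name and the optional headroom knob are mine, not Audacity’s):

```python
import numpy as np

def normalize_peak(samples, headroom_db=0.0):
    """Scale the whole track so its loudest sample sits at full scale
    (optionally minus some headroom in dB), without clipping anything."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        return samples  # silent track: nothing to boost
    target = 10 ** (-headroom_db / 20)  # 0 dB headroom -> full scale (1.0)
    return samples * (target / peak)

# A quiet track whose loudest moment is at 25% of full scale gets
# boosted 4x, so that moment lands exactly at full scale:
print(normalize_peak(np.array([0.1, -0.25, 0.2, 0.05])))  # about [0.4, -1.0, 0.8, 0.2]
```

Because a single gain factor is applied to the entire file, nothing about the *relative* loudness of loud and quiet moments changes here, which is why the next step exists.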

3. Compress audio.


In this case, “compression” doesn’t have anything to do with file compression. This is volume compression. See, the previous step just made it so that Rutskarn’s loudest moment is at max volume. But if he shouted at one point in the show and talked normally the rest of the time, then this would result in him being too quiet in the final mix.

I’ll be honest: I barely understand the compressor. I know it takes all the quiet parts of the file and boosts them by some degree. This part is more art than science. If you don’t boost enough, then the quiet parts are still too quiet. If you over-boost, then once again you risk crushing the audio and creating the “megaphone” sound.

Every host has their own quirks that make their audio unique. Mumbles and Josh talk loud but have explosive shouting once in a while. I don’t usually shout, but sometimes I mumble as I finish up a thought and it trails off into annoying inaudibility. Rutskarn is all over the place depending on mood and topic. The only exception is Chris, who is super measured and even, as long as nobody brings up Guy Fieri.

Basically, I’ll compress their audio until the quiet parts sound acceptable.
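In essence, a compressor measures each moment’s level in decibels and knocks down anything above a threshold by some ratio; a later Amplify pass then brings the whole flattened track back up, which is what makes the quiet parts relatively louder. This is only a sketch of the idea, not Audacity’s actual algorithm, and it omits the attack/release smoothing a real compressor uses:

```python
import numpy as np

def compress(samples, threshold_db=-20.0, ratio=4.0):
    # Convert each sample's magnitude to decibels (floor avoids log(0)).
    mags_db = 20 * np.log10(np.maximum(np.abs(samples), 1e-10))
    # How far above the threshold is each sample?
    over_db = np.maximum(mags_db - threshold_db, 0.0)
    # A 4:1 ratio lets only 1 dB through for every 4 dB over the threshold.
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return samples * 10 ** (gain_db / 20)
```

With the loud shouts pulled down toward the threshold, re-running the peak-normalizing Amplify step raises the quiet passages along with everything else.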

4. Noise cancellation.

This step is really fiddly. Josh and Chris have really high-end microphones, so they have very little room noise. But Rutskarn and I have low and medium grade equipment, respectively, and we often have fans running near where we record. This noise is usually very slight. But sometimes it’s super hot and I have my fan close to me. Other times we’ve got the fan pointed so that the moving air passes over the microphone. Or perhaps there’s other stuff going on in the apartment that gets picked up. (Rutskarn has a roommate, I have kids.) We try to fix this stuff before recording, but sometimes conditions change during the session and sometimes we forget.

Of course, there’s an upper limit on how much noise you can remove before it starts to kill the voice quality. Last week Rutskarn and I had fans too close to our computers, and the final audio was a disaster.

In any case, the previous step probably boosted all the unwanted background noise quite a bit. The slightly distracting whisper of a fan will become a rushing wind, making it sound like I’m sticking my head out the window of a moving vehicle.

To cancel noise, I need a sample of pure, raw noise with no talking. It usually looks like this:


So I find a section of the show where Rutskarn left his mic open but wasn’t talking[2]. I select just that part, open up the “Noise Removal” tool, and hit “Get Noise Profile”. This tells Audacity: “The pitches you see in this little sample of audio are the ones I’m interested in.” Then I select the ENTIRE audio file for Rutskarn, open up the same dialog again, but this time I hit the “Ok” button at the bottom. Audacity will have a jolly good think for a minute or so, and when it’s done Rutskarn’s audio should have the background noise removed. It’s not perfect, but it’s a massive improvement.
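Under the hood, this style of noise removal is a form of spectral subtraction: the “noise profile” is the average spectrum of the voice-free sample, and it gets subtracted from the spectrum of every slice of the full track. A bare-bones sketch (Audacity’s real effect adds windowing and smoothing that this toy version skips):

```python
import numpy as np

def reduce_noise(signal, noise_sample, frame=256):
    """Crude spectral subtraction: learn the noise spectrum from a
    voice-free sample (the "noise profile"), then subtract it from the
    spectrum of every frame of the full recording."""
    # 1. "Get Noise Profile": average magnitude spectrum of the sample.
    noise_frames = noise_sample[: len(noise_sample) // frame * frame]
    noise_spec = np.abs(
        np.fft.rfft(noise_frames.reshape(-1, frame), axis=1)
    ).mean(axis=0)

    # 2. Process the whole track frame by frame.
    out = np.zeros(len(signal) // frame * frame)
    for i in range(0, len(out), frame):
        spec = np.fft.rfft(signal[i : i + frame])
        mags = np.maximum(np.abs(spec) - noise_spec, 0.0)  # subtract profile
        phases = np.angle(spec)
        out[i : i + frame] = np.fft.irfft(mags * np.exp(1j * phases), n=frame)
    return out
```

The fan hum mostly vanishes because it sits at or below the profile’s level in every frame, while the voice towers above the profile and survives the subtraction.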

Once I have Rutskarn’s audio cleaned up, I go back and do steps 2 through 4 for the other hosts.

EDIT: Many people have pointed out in the comments that noise cancellation is much more effective if you do it BEFORE compression. So that will be my procedure from now on.

5. The Editing.


There’s an important tradeoff you have to deal with when recording a podcast among people from the far-flung corners of the world. You can have decent audio quality but intolerable latency, or you can have abysmal audio quality and slightly bad latency. More latency means conversations are more awkward. We can’t see each other, so we’re deprived of all the physical interpersonal cues that people use to signal they would like to talk: Opening your mouth, taking in a loud breath, nodding vigorously, etc. So the only way to get your words out is to dive in and hope other people get out of your way. This results in those tedious bits where three people talk at once and then play ten seconds of, “Go ahead / No YOU go ahead.” Not fun to listen to.

On top of that, lag can still be a problem for individuals even if the server is fine.

So I listen to the entire show and find all the instances of confusion, overlap, and cross-talk. I clean them up as best I can. If three people start talking at once, I’ll edit out the resulting confusion. If there’s some overlap, I’ll either mute the overlapping clip or time-shift it so that it’s no longer overlapping. I also edit out the housekeeping bits where we decide when to move on to the next topic or when to end the show. No need to waste the listener’s time with that. Overall, the show gets about three or four minutes shorter by the time I’m done.
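Muting and time-shifting a clip are conceptually just array surgery on the host’s track. A sketch (the silent-destination assumption is mine; a real edit would also crossfade the edges rather than cut hard):

```python
import numpy as np

def shift_clip(track, start, end, delay):
    """Cut the clip at [start:end) out of one host's track and paste it
    `delay` samples later, so it no longer overlaps someone else.
    Assumes the destination region was silence (an open gap)."""
    out = track.copy()
    clip = track[start:end].copy()
    out[start:end] = 0.0                       # mute the original spot
    out[start + delay : end + delay] += clip   # drop it back in later
    return out
```

Muting an overlapping interjection outright is the same operation without the paste-back step.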

This is a fussy, tedious task, and if I work very hard and do a really great job then you won’t be able to notice I did it at all.

6. Time times three.

This sort of editing takes about three times as long as the recording itself, although this value goes up or down slightly depending on the number of hosts and how much lag was happening. Also, the longer the show is, the more time I spend waiting for bulk audio tasks to complete, so the increase in time is non-linear. This is why I’m such a tyrant with a clock. (I’m always the one pushing to end the show as soon as the hour is up.) An hour-long show will cost me 3 hours of editing time, while a 1.5-hour show will cost me in excess of 5 hours. If there are a lot of hosts and a lot of lag that night, then it might take me 6 hours.

So that’s the process.



[1] I actually call myself an audioslob: I’m not very picky about fine-detail sound quality. I listened to all my music on cassette in the 80’s and the lack of quality never bothered me.

[2] As part of our routine, I always ask everyone to give me a couple seconds of open-mic at the start of the show, for this very reason.

There are now 87 comments. Almost a hundred!


  1. James Schend says:

    I kind of have to disagree that Audacity is great software: it’s adequate, but the UI is atrociously bad, it’s 2015 and it’s still single-threaded (I’m sure you’ve noticed that expensive filters like noise removal take about 8 times longer than they should on your computer), it makes some wrong assumptions about Windows audio devices (in particular, it doesn’t understand the difference between the default device and the default communication device), and it has local application-level sliders that make global system changes (those damned volume sliders are my nemesis!). It’s, uh, free. That’s the best I can say about it.

    I got a free pack-in of SoundForge when I bought Sony Movie Studio, and I use that for everything except noise cancellation. (Because Sony’s noise cancellation plug-in is ridiculously pricy.) So each time I have an hour’s worth of audio, I have to suffer through Audacity slooooowly doing its noise cancellation at 1/8th my computer’s actual CPU power…

    If your guys are responsible, you might consider having each person record themselves during the session and send you their files via DropBox or something instead of relying on Ventrilo and just coping with its bugs and latency. Make sure they use a lossless format like FLAC. You run the risk of having to rely on 4 computers now instead of one, but you can still use the Ventrilo file as back-up.

    Also, I have absolutely no experience on the matter, but I find things run smoothly if I run noise cancellation *before* compressing the audio. I try to run the compressor only after all the junk noise is gone, since the compressor can’t tell the difference between a loud background hum and a low voice.


    • Eruanno says:

      I don’t have a huge lot of audio experience, but I find that most audio editors have a horribly convoluted interface. It’s not free either, but I find Adobe Audition’s interface is the least scary and horrible thing to work with.

      • You can actually download Adobe Audition 3 from the Adobe site (with a serial supplied by Adobe themselves).
        Search around and you’ll find it. I won’t link directly as it’s not supposed to be free, but it’s intended for those who have CS 2, if I recall correctly (as compensation for shutting down the licensing servers).
        The download has been there for several years now, and there is also an update and a hotpatch fix available for Audition 3.

        It’s a shame they discontinued the Audition (3.x) line, they redid pretty much all the code for the later releases/CS editions so Audition 3 is the end of the line of the Cooledit legacy (Adobe bought Cooledit way back).

      • James Schend says:

        Oh yeah. There are a LOT of areas of software that are just BEGGING for a competent developer to come in and kick their ass. Audio editing is definitely one of these.

        I agree entirely.

        And honestly, I’d probably love Audacity if it could draw menus correctly, if it didn’t change global system volume based on its own estimation of your mic level, and if it had threading so it could actually run at more than 1/8th the speed of my actual CPU.

      • Bropocalypse says:

        Yeah, very few audio-centric programs I’ve seen have an intuitive interface. Maybe because audio experts aren’t visually-oriented? I dunno.

      • Nevermind says:

        Regarding bad interfaces for audio tools, I remember reading a post from Unity Technologies about designing the audio features in the latest Unity3D engine. I’m no audio professional, so I was pretty surprised. They had a first draft of the UI that looked like this: and apparently, it was bad. The actual audio designers could not use it, and instead much preferred this: I, personally, don’t understand ANYTHING in that second picture.

        Oh, and the link to the actual post is for those who are interested.

        • Cuthalion says:

          The second Unity pic actually IS the visually-oriented one. The first one is just numbers, and you have to try and picture what they’re doing to the volume. The second one shows you: everything louder than that line gets processed so that it gets loud less quickly. That may sound weird, but it’s saying, “Alrighty, anything louder than that 10-below-max-volume line, take it and make it only half as… louder than 10-below-max.” That’s why the hill in that graph gets less steep after “Threshold: -10.0 dB”.

          I probably didn’t explain it well (compression is kind of counter-intuitive), but trust me, the one with the graph is much easier to use once you understand what it’s doing than just going by the numbers like you can in some compression tools.

          Also, they aren’t screenshots of the same tool. Or at least, not entirely. They each have controls the other one doesn’t; I’m surprised that one replaced the other.
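          That “hill” is easy to state as arithmetic: below the threshold the output level equals the input level; above it, each extra dB in only adds 1/ratio dB out. A sketch (the threshold and ratio values here are just illustrative):

```python
def compressed_level(in_db, threshold_db=-10.0, ratio=2.0):
    """The level curve described above: a straight 1:1 line up to the
    threshold, then a shallower slope of 1/ratio past it."""
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio

print(compressed_level(-20.0))  # -20.0 : below threshold, unchanged
print(compressed_level(0.0))    # -5.0  : 10 dB over becomes 5 dB over
```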

      • Trix2000 says:

        Audition is by far the best I’ve worked with (and I work for an audio company, so…), though often we end up using Audacity as well just because it’s so much easier to throw it on any of our computers and use right away. It’s very effective at what it does and I’ve had few issues with performance and UI (though admittedly, it’s not the most intuitive thing to use).

    • Also, I also have absolutely no experience on the matter, but I find things run smoothly if I run noise cancellation *before* compressing the audio.

      *nods* Yeah, take note of this, Shamus.
      Always get a noise profile and then do noise reduction on the raw audio.

      If you amplify or compress and then do noise reduction you will have altered the audio such that the noise floor has already been raised several places so you end up with more hiss and stuff than otherwise.

      Other than that your process seems pretty good overall.

      BTW. It’s not really time times three, if you add in the actual recording time then it’s more or less time times four.

      Better hope you’ll never need to do a 6 hour recording. :P

      • Majromax says:

        > If you amplify or compress and then do noise reduction you will have altered the audio such that the noise floor has already been raised several places so you end up with more hiss and stuff than otherwise.

        Also, compression is nonlinear.

        The noise cancellation works by figuring out which frequencies are noise and filtering those out, but the compression mixes frequencies together. This means that the noise cancellation will be less effective in the louder parts of the audio file.

        • @Majromax
          I did not ask how compression works, so why the lecture I have no clue.
          Also, by non-linear you probably mean logarithmic. However, this is not entirely correct. Depending on the settings and method, you may actually apply a static amount of compression.

          Let me boil it down to the basics: I suggested that Shamus should apply noise reduction first and THEN do amplifying and compression.

          (I assume that Audacity adds no noise on its own if you amplify silence/close to silence.)

          But if you amplify and compress first, then the noise floor has already been increased and a background fan may be as loud as soft speech thus noise reduction may remove part of the speech.

          A good rule of thumb is to always clean up the raw audio first; once that is done, one can start to tweak levels, edit/clamp plosives, apply compression or frequency filtering or EQ or similar.

  2. Da Mage says:

    And by the end of it, Shamus is completely and utterly sick of hearing everyone’s voice.

    I’ve used Audacity for years for simple audio editing and it’s a fantastic tool as far as I’m concerned. If it were more featured, it would simply be harder and more complex to use for the simple tasks that us non-audio people want it for.

  3. Knut says:

    When I was a radio tech, I sometimes had to control the volumes of the different hosts real time while they were talking. Pretty annoying, but most people learned after a while to keep a somewhat level voice. A couple of tips, if I may:
    – If you want to shout, move a bit away from the microphone. This avoids clipping and the volume won’t be too loud, but it will still sound like shouting because the voice changes, so it’s good anyway.
    – Try offsetting the different voices slightly off center, so it sounds like they are sitting around a table or something (but not too much, otherwise it will sound strange through a headset). It makes distinguishing the different persons much easier, and the conversation seems more natural. (I guess this is pretty easy to do post-recording for you)
    – If you are able, get a table microphone. Even a cheap one is better than laptop mics, and because they are fixed in place, you can use your voice in more ways (like moving away when shouting, or moving closer when speaking softly). As I mentioned above, the important thing when speaking in different ways is not the volume, but how the voice sounds. In this way you can have some parts which are softly spoken, and other parts which are shouted, and even though the final volume might be very similar, the difference will still be very clear and effective.
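    The off-center seating trick is stereo panning. A constant-power pan (a standard mapping, not anything specific to Audacity) keeps a voice at the same perceived loudness wherever you place it:

```python
import numpy as np

def pan_mono(samples, position):
    """Place a mono voice in the stereo field.  position = -1.0 is hard
    left, 0.0 is dead center, +1.0 is hard right.  Constant-power
    panning keeps perceived loudness steady as the voice moves."""
    angle = (position + 1) * np.pi / 4   # map [-1, 1] onto [0, pi/2]
    return np.column_stack([samples * np.cos(angle),    # left channel
                            samples * np.sin(angle)])   # right channel

# Seat three hosts loosely around a "table", nobody slammed to one side:
# host_a = pan_mono(a, -0.3); host_b = pan_mono(b, 0.0); host_c = pan_mono(c, 0.3)
```

Keeping the positions modest, as suggested, avoids the broken-headphone feeling of a hard-panned voice.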

    • Eruanno says:

      On the topic of offsetting voices: It’s okay to a degree, but when people end up being super much to the left or super much to the right it sounds really awful to me when using headphones. Like my headphones are broken in one or the other ear, alternating back and forth.

      • AileTheAlien says:

        Agreed. I’ve listened to some podcasts where, intentionally or by accident, the left-right balance was off the charts. The person closest to ‘centre’ still sounded like they were off to the side a bit, and the other two people sounded like the audio file had been slammed full-left or full-right in editing. Honestly a file that had completely balanced L/R (so that it’s effectively mono) would sound better than something this unbalanced. So, I guess my advice to Shamus would be, err on the side of caution*. :)

        * Read: “Err on the side of mono-sound, instead of accidentally slamming the audio to the left and right sides.” :)

  4. Ranneko says:

    At the risk of being obnoxious:
    Mumble will also let you record all users separately, though into separate channels rather than separate files.

    There is more info here.

    I realise that Shamus already has a working setup, which means that he isn’t really likely to use this info, but hopefully this is a helpful addendum for anyone else using this post as a guide to try to achieve the same ends.

    • Interesting. And Mumble supports the Opus audio codec too, which is arguably the best VoIP codec out there. High quality, wide band, dynamic bitrate support, low latency (20ms latency by default on packets).

      So Mumble using Opus codec and recording to multichannel FLAC should be as good as it gets.

      Ventrilo vs Mumble latency test
      (Note! The test with Mumble in the video uses the CELT codec, the Opus codec has even lower latency.)

      • AileTheAlien says:

        Shamus, y u no use this? :P

        Seriously though, Mumble looks like a great piece of software. It’s got low-latency, in-game overlay, and it’s open-source…heck, that’d be enough for me! When I get home tonight, I’ll dump some more money in the internet buckets, so y’all can experiment a bit and/or buy better mics and/or headsets for Shamus and Rutz. Keep up the good work, y’all! :)

        • My housemate uses Mumble for gaming and quite likes it. I definitely like it better than Vent or Teamspeak. I edit a friend’s irregular podcast using Audacity and he’s been pretty pleased with my work. I don’t know what his recording setup is, though–he has a Mac and it just produces an ordinary stereo file that I convert to mono–the different speakers are not in different tracks.

          • Volfram says:

            One of my friends (Karaias, mentioned below, higher in the conversation tree) really likes Mumble for its combination of high audio quality and low latency (lowest of all the VOIP systems he’s used).

    • mhoff12358 says:

      Be careful with Mumble. It’s great and I use it all the time, but it’s still got free-software issues. As of a few months ago there was a bug where, when recording individual users in their own files, dropped UDP packets would just be ignored in the recording, meaning those voice tracks would have many short holes in them. Over an hour of dialogue you’d get everyone’s voices randomly offset by a second or two, and the whole thing would have to be edited back into place. At the time it was fixed in an unstable release, but it was a nasty surprise to find at the end of a long recording session.
      So yeah, Mumble’s great, but given the nature of free software, any rocking of the boat is likely to stir up new bugs.

      • I fail to see how Mumble being open source makes it any worse than, say, Ventrilo or Teamspeak.

        Did the bug suddenly appear in Mumble on its own, or was it introduced in a new version or revision?

        Also if people use unstable releases then it’s their own fault.

        For major releases it may be wise to wait for a revision or two or wait a couple months to see if any issues crop up.

        So when you say “Be careful with Mumble”, does that mean bugs will manifest on their own, unlike in other VoIP software? If it is the latest stable release that has this issue, or if it was a specific release that had this issue, it would be nice if you mentioned that version.

        Shamus has complained about the last few packets being dropped by Ventrilo for years now and it’s still not fixed; that’s even worse IMO.

        “given the nature of free software any rocking the boat is likely to stir up new bugs”
        That’s bull. If “rocking the boat” causes nasty bugs to crop up, then that is bad coding (by the previous or current coder).

        Are you saying the contributors to Mumble are very bad at coding? More so than for other VoIP software?
        Or are you saying that all open-source software is like this? Which is also a very odd and damning claim to make.

    • Volfram says:

      I came here to post basically this. My friend and I use Mumble as a second audio channel recorder in our new (and terrible) Pokemon Insurgence LP series. We’re sitting next to each other, so it’s really just a workaround because Audacity can’t accept signals from several sources at once, but were I doing a series over WAN, Mumble would be my go-to for audio.

      I hate audio cleanup. It usually takes me about 6 hours to do cleanup on the 1 hour of recording we’ve done for a week, I can’t put on music or watch any videos while doing it, and I still get pretty bad echoing and bleedover just because of limitations.

      Mumble also has a nice signal/noise ratio voice activation setting that I don’t think Ventrilo does. It can be made more sensitive than normal noise gate style activation, so even some of your quieter dialogue will get captured, but it won’t capture every single awful mouth noise you make, which is nice.

      • Look into using VoiceMeeter and VoiceMeeter Banana.

        They are free (or rather donationware) and give you a virtual hardware mixer.

        There is also a separate virtual audio cable driver, which gives you virtual audio cables.

        IMO the best virtual audio cables out there.

        With VoiceMeeter you can mix/pan/level and route that to Audacity, mixed or as multichannel (not sure if Audacity lets you record in more than just 1 or 2 channels though).
        With VoiceMeeter Banana you can mix/level/EQ/record on the fly; no need to use Audacity.

        With VoiceMeeter Banana and a few virtual cables (there is the basic virtual cable, then if you donate you get links to the A and B cables), you can go even further; the two mixers themselves come with built-in virtual cables as well.
        So you could hook up and mix like a couch full of people with headsets and game audio if you really wanted to.

        Using such a mixer should avoid the echo issue you get from recording “What U Hear” / “Stereomix”, as you won’t have to feed mic audio into the system mix.
        Also, the game audio can be fed into the VoiceMeeter mixer too.

        Multiple headsets are not an issue either. Depending on how you use/set up the mixer and how you hook up the virtual audio cables, you could feed the game audio to both headsets. That would eliminate any game audio echo due to room speakers being used for game audio.

    • Fast_Fire says:

      Like someone mentioned earlier and at the risk of being annoying: Mumble can be a viable alternative if Vent becomes too much of a problem.

      Local recording can be done by everyone individually, and I’ve found that there are more audio input options to compensate for different kinds of microphones than in Vent and especially Teamspeak. This has the potential to cut editing time and maybe even raise the sound quality.

  5. Josef says:

    Does the order of these steps matter? It seems to me that it would be better to remove noise before starting fiddling with the volume and compression…

  6. kikito says:

    Oh man, that sounds tedious. I’m very grateful for the time you invest in doing all this.

    I don’t know if you guys have a budget available, but it sounds like the kind of task that could be outsourced. I am not offering my services at all (I wouldn’t do that kind of work for money. I literally would prefer washing dishes for 3 hours).

    I organize a local development user group in my city. We invite someone to come to the group and give a talk once per month. The talks are taped, and the intention is to upload them to Vimeo. But they needed some editing, and no one in the community was able to do that.

    We have some sponsors. The group is not rich by any means, but we have *some* money available. One of my best administrative decisions, IMHO, was using some of that money to pay someone to cut, clean up, and upload the videos of the talks. Now the videos happen more often, they have higher quality than before, and I can concentrate on finding unsuspecting victims volunteers to give talks, organizing the venue, etc., instead of fighting video editing software and wanting to kill myself.

    • Shamus says:

      I’m teaching my daughter to do it. She seems interested and it would be a good skill to learn. We’ll see how that goes.

      • Eruanno says:

        If she does pick it up and enjoys it – consider compensating her for the work! Reading about how much work you do, it seems like it could be worth it for both you and her.

        • Heh, I was contemplating volunteering to do it myself–I edit the irregular podcast of another Objectivist blogger friend of mine, so I already have Audacity et al installed on my computer. I also have something incredibly valuable–free time. I could use more productive things to fill it up (even if I’m just volunteering my time) because then I can put them on my CV.

          I like editing work–I had a paid gig editing a novel last year–but I’m godawful at locating potential customers and marketing myself. Absolutely vital skills.

    • Volfram says:

      I ballparked from Shamus’s “how I spend my week” post and estimated that I’m a little faster at audio cleanup than he is (3-6 hours of work for 1 hour of audio), but at the same time I’m only cleaning up audio from 2 people, while he’s got between 3 and 6 usually.

      On the other hand, he doesn’t have problems with noise bleed because no two Diecast members are in the same room. (I just bought a pair of shotgun mics to try and fix this)

      Either way, it is indeed tedious, and my personal least favorite part of video production, but I insist on doing it because it can add so much to the experience. (As one of my friends commented, with the stereo panning, it’s like watching with the two of us sitting on either side of her)

      But I still hate it. My Monday episode has been going up late because I put off cleanup until I absolutely have to.

      By comparison, video cleanup is actually kind of fun(especially when adding visual bells and whistles), and takes about 2-3 hours per hour of content.

  7. Arstan says:

    But Rutskarn and I have low and medium grade equipment, respectively, and we often have fans running near where we record.
    What I pictured at first was tons of the show’s fans running around Shamus’s and Rutskarn’s houses making noises)))))))
    But then I understood my mistake.

  8. Zak McKracken says:

    I agree with James that recording locally to lossless formats and then sending the files to superimpose them should give better quality, and also de-noising before compression seems like a good idea, though I could not say if many people would hear the difference.
    Bonus: since Audacity only uses one thread, you could do this for each audio file in a separate Audacity instance, in parallel.

    What the compressor does is not just make the mid-levels louder while keeping the high levels where they are; it also does this adaptively: quiet passages will be made louder, very loud passages won’t, and the “attack” and “decay” settings determine how smooth the transition between them is. You can hear the effect when listening to any old regular contemporary disco stuff. The bass drum is the loudest instrument, and after every beat, everything else sounds quieter than usual for a quarter of a second. That’s a compressor with a quarter-second decay time.
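    Those “attack” and “decay” settings usually live in an envelope follower with two smoothing speeds, one for rising levels and one for falling. A toy version (the per-sample coefficients are made-up illustration values):

```python
def envelope(levels, attack=0.5, release=0.01):
    """One-pole envelope follower: the detected level chases the input,
    fast on the way up (attack) and slow on the way down (release).
    A slow release is what makes everything sound quieter for a beat
    after the bass drum hits."""
    env, out = 0.0, []
    for x in levels:
        coeff = attack if x > env else release
        env += coeff * (x - env)
        out.append(env)
    return out

# A single loud hit: the envelope jumps up fast, then decays slowly,
# and the compressor keeps ducking everything until it falls back down.
print(envelope([1.0, 0.0, 0.0, 0.0]))
```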

  9. Zak McKracken says:

    I think I mentioned this before, but I know some “professional” casts with worse quality. Which says as much about the Diecast as it says about some of those professional ones…

    => Thanks for putting this much effort in, I appreciate it!

    … and yeah, Chris can teach the rest of you how to talk into a microphone :)

    • OMG you would not BELIEVE how crap most people are at producing audio for anything. I used to work for a transcription firm that provided transcripts for conferences and so forth and I swear they recorded some of them by putting a tape recorder in front of an old-style PA speaker system. You could barely make out any words, it all sounded like Charlie Brown’s Teacher talking.

      And there’s just NO EXCUSE for this kind of crap these days. Ditch that 1970’s system if you actually care about preserving your audio for posterity. :P

      • Speaking of which, I kinda wish there were transcripts for some of these podcasts. I don’t like listening to podcasts, I much prefer to read things whenever possible. And it makes it so the hearing impaired can access your stuff. Considering that podcasts are sometimes close to 50% of the content on the site, this might be a useful thing to get. With audio this good it’s actually very easy to transcribe.

        • Zak McKracken says:

          how reliably can transcription be automated these days? I wouldn’t want to be the person who has to go through and listen to everything at writing speed, twice to make sure I got it right…

          • As far as I know, it still can’t be. Like, at all. Speech-to-text works all right for straight dictation (but even then it’s usually pretty messy) but I haven’t yet run across one that can even start to get a grip on actual conversations, much less break it up accurately into who is speaking when.

      • Trix2000 says:

        It really doesn’t help that audio quality is not something a lot of people consider unless A) They hear something REALLY bad quality, or B) They hear a direct comparison with something of really GOOD quality.

        The former doesn’t happen often and is usually localized (ie: this song sounds bad, not this speaker). The latter doesn’t happen as much as it should, since a lot of common audio equipment (speakers and such) is not very good quality-wise, since (as above) most people won’t notice there is a difference.

        It says something that one of the most popular sound file formats is mp3, which actually is really not all that great for quality. Yet it’s so incredibly prolific and accepted that it’s also what people are used to, so it becomes a sort of cycle of non-improvement. (This is not to say mp3 is completely terrible, but it IS lossy compression, so naturally it loses something over the raw audio).

        But then there comes a time when you hear a nice uncompressed .wav on an actually good set of speakers/amplifier, and it becomes so clear how much audio quality IS a thing we should be aware of.

    • The most sad and ironic thing ever is to go on youtube to look at videos about audio recordings and how to do good mic setups and stuff like that. And the audio in the video is clipping and distorting or way too hot.

      There are also “video” tutorials with blurry/artifacty/washed out colors and stuff which is just as sad.

      And then there are youtube “reviews” which are nothing more than a shaky, one-handed camera unboxing. *sigh*

  10. Zak McKracken says:

    Since I’ve recently become aware of this in myself: It’s not good for your voice if you run out of air mid-sentence but still try to finish on whatever’s left in your lungs. In addition to being much harder to hear, most people hurt their vocal chords while doing so (I certainly have…).

    • Shamus says:

      Yeah. That might be why my voice is inherently “croak-y”. I spent most of the first 20 years of my life with greatly reduced lung capacity because of my asthma. (It’s still not great now, to be honest.) When I was little I’d have to stop mid-sentence to suck in a huge noisy breath. That’s a very stereotypically “dorky” thing to do, and once I became aware of it I tried to avoid it. Which is why I do the trailing off thing now.

      I suppose I could split the difference and switch to Shatner-speak.

  11. Amplify and then compress? No, just Normalize.

    Also, Vent isn’t the only software that lets you record one track per person, on the end of one person. Mumble can do that too. Either way it’s a bad practice. You should have everyone record their Audio with Audacity on their end to get the cleanest signal without the shenanigans the internet induces, and then have them send it to you. Much cleaner that way.

    • AileTheAlien says:

      OK, normalize actually looks like the better tool. It does the same thing as compression plus amplification, but also lets you do multiple tracks at the same time. i.e. All the different cast members’ mics.

      So, why does Audacity have all of these options separated? I can understand needing the amplify if you just want to make something louder, but normalize seems like a strict superset of compress, in terms of what it can do, and the times when you’d want to use it.

      • Ben says:

        As far as I know, Normalization does not Compress, it just does amplification. It’s just a different way of looking at amplifying. Instead of saying “increase the level by x amount”, it increases the volume to a specified level. With the Normalization tool, you can set the maximum level to something lower if you are going to be adding other effects that would amplify the signal.

      • Zak McKracken says:

        normalize does not compress, only adapts the amplitude globally. Compression both adjusts amplitude locally and decreases the relative difference between the loudest and not-as-loud sounds.

        Normalize might be a better way of setting amplitude, but it can be tricked if someone accidentally hits their microphone while speaking or such, and will then normalize everything to that peak. When you’re doing it manually, you can make it so that spoken words get the same amplitude. more work, but better results.
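        For reference, peak normalization is just one global gain applied to the whole track, which is why a single mic bump can throw it off. A minimal sketch (the target of 0.56 of full scale, roughly -5 dB, is an arbitrary illustration):

```python
def normalize(samples, target_peak=0.56):
    """Apply ONE global gain so the loudest sample lands on target_peak.
    A single stray spike (a bumped mic) sets the peak, so everything
    else gets scaled down along with it."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```

        With a track full of quiet speech plus one loud bump, the bump becomes the peak and the speech comes out even quieter than before, which is the failure mode described above.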

        • “more work, but better results.”

          Well yeah, obviously. But Shamus is complaining about the amount of work. And usually you can just LOOK at the waveform and see where ‘oops, I hit the mic’ spikes are, and edit those out beforehand. Still less work for good enough results.

          And it’s saving him a lot of steps. Just edit the spikes, which he should do anyways, and then normalize all tracks in one go, ideally to about -5dB or something thereabouts so he has a bit of headroom for later processing, or -1-ish dB if he doesn’t plan on applying later effects.

          Edit: Also, try giving autoduck a spin. Not for the Podcast, but for Spoiler Warning. It’s quite great to apply that to the game audio once you’re done bouncing all the commenter audio to one track, and automagically get the game sound lowered whenever someone speaks. Play around with it for a while until you get the hang of it, and POOF your time spent editing goes down by a LOT.
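          For anyone curious what auto-ducking boils down to, here’s a bare-bones sketch. Audacity’s Auto Duck effect also fades in and out around each ducked region, which this skips, and the threshold and gain values here are made up for illustration.

```python
def autoduck(game, voice, duck_gain=0.3, threshold=0.05):
    """Attenuate the game track wherever the time-aligned voice track
    is above the threshold, so commentary sits on top of game audio."""
    return [g * duck_gain if abs(v) > threshold else g
            for g, v in zip(game, voice)]
```
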

          • Zak McKracken says:

            autoduck sounds like a really good idea for SW.

            on the normalize/compress front: the Audacity manual doesn’t suggest that normalize compresses, and I never thought it would. I mean, normalize has none of the compressor’s options. And I think using a compressor is incredibly useful for those of us listening in the car, simply because people change volume while speaking.

            …actually, I think Audacity should allow automating much of that process.
            The part that I assume is much more time-consuming is the editing, and I’m afraid there’s no quick solution except accepting that there will be irrelevant cross-talk in the podcast.

            … there really should be a “raise finger” thing in some VoIP software, where others will notice when you press the push-to-talk button and can then make time for you to speak. Ideally there’d be some cue analogous to an expecting look so everyone can tell who’s expected to say something but that’ll probably have to wait until we all can use VR kits for this sort of thing.

      • “So, why does Audacity have all of these options separated?”

        I don’t know, I’ve never seen a normalize option in the DAWs that I worked with (Logic, Cubase and Reaper, fwiw).

        Also, I really want to stress this: Get the folks to record their audio locally, sync up in the beginning with one person going “Three, Two, One” and then all “SYNC”. That way, you’re less likely to get overlapping audio due to lag, which cuts down on the editing time quite a lot.
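        A crude sketch of how you could line locally-recorded tracks up on that “SYNC” shout automatically. This assumes the shout is the single loudest sample in each track; a robust tool would cross-correlate the waveforms instead.

```python
def align_to_sync(tracks):
    """Trim the start of each track so the loudest sample (the 'SYNC'
    shout everyone records at the start) lands at the same index."""
    marks = [max(range(len(t)), key=lambda i: abs(t[i])) for t in tracks]
    earliest = min(marks)
    return [t[m - earliest:] for t, m in zip(tracks, marks)]
```
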

        • Dude says:

          Normalization is called brick wall limiting + compression in music production DAWs.

        • Cuthalion says:

          Reaper has a normalize, but if I recall correctly, you apply it to a track (or rather, a clip or whatever a piece of audio in a track is called) by right-clicking and going to its properties or something instead of as an effect. I think it does the same as people are describing in Audacity: crank it however much is necessary to get the highest volume up to max volume.

    • James Schend says:

      Normalize doesn’t do compression, not in Audacity at least.

  12. James says:

    I’ve done some Foley work, and podcast work, and video editing.

    I love Audacity because it’s free. The UI is shitty, but I learned to use it.

    Editing a 1-hour podcast takes a long time, and when you listen to your own voice for hours you just hate it so very much.

    I actually enjoy Foley work, but it’s hard, long work. Remaking the steps from a 6-7 second scene can take hours. You can’t just have a single step sound and duplicate it; that sounds like shit. You need it to be diverse without wasting too long on it. And with film there’s a convention that you have to have background noise, because in filming this sound is almost always lost. So you get the odd moment when you go to a room, hopefully the same one the scene was shot in, and just record ambient noise for 30 minutes, because people WILL notice if it’s missing. Also, sometimes you need to make new sounds. The work of Skywalker Sound and other groups is remarkable; I thoroughly recommend people look it up. They are the Industrial Light and Magic of sound design.

  13. Trix2000 says:

    It’s clear to me you already understand this (since you mentioned it in the article), but I can’t emphasize enough how much the audio recording equipment matters in a general sense.

    Not just on the hardware side but also in software, as a number of products (and sound cards, for that matter) will do their own processing on the incoming audio stream to make things sound better – like noise cancellation, beam-forming, and AEC (though with headphones this isn’t needed).

    Almost any mic will do to record with, but it helps a lot if said mic can do a lot of the cleanup/processing work for you.

    • James says:

      This, yes. Having listened to sound from both shit-quality mics recorded over Skype and high-end $1000 mics recorded in professional conditions on professional software, the difference is huge. The cleanup needed on the former is a lot harder, and sometimes it’s impossible to completely fix without spending days digging deep into the file and manually adjusting a lot of things.

  14. JackTheStripper says:

    The audio clipping at the end of a message is not a bug, just a mismatch between how quickly the program closes the transmission and how long it takes us humans to stop producing sound at the end of a sentence.

    I use Mumble, which is practically the same thing as Ventrilo, and this program has a slider option called “Voice Hold” which keeps the voice transmission open for a selected extra time after you let go of the button (say, an extra 0.5 seconds), so that your voice transmissions don’t end up inadvertently clipped. This feature is not really that complex, and I suspect that Ventrilo also has a similar option. Do look it up, Shamus.

    • Cuthalion says:

      They’ve mentioned previously that they don’t hear the cutoff during recording. It appears to be an export bug, as if there’s some built-in noise gate during the export process that is overzealous and non-configurable.

      • Shamus says:

        This is correct. In the session, everyone heard me say, “But what do they EAT?” When I listen to the session in Vent later, it sounds the same. But then when I export to wav files, I end up with, “But what do the-“.

        It’s an export bug in Vent, and it seems to only impact longer recordings. You can’t replicate it with a 5-minute conversation.

  15. MikhailBorg says:

    I may be the only person here who uses GarageBand for my podcast, Managlitch City Underground. But even I use Audacity on OS X for some of the weirder tracks, like building muffled background starship audio from actual NASA archives.

    I don’t want to sound like a spammer, but the information in this post really is going to be pretty useful to me. I’m sure I’m doing a million things wrong in my one-person operation, but I’m having lots of fun flailing about.

  16. Dude says:

    Shamus, I’ll explain compressors to you. Use your screenshot for terms.

    Most compressors are reduction compressors: they don’t boost the signal, they reduce the signal volume. There is usually a threshold setting. If the volume of the signal goes above this threshold, then the volume is reduced by a set ratio. So, if the ratio is set to say 4:1, then every time the signal exceeds the threshold by 4db, the compressor reduces the volume so only 1db over the threshold is output. Slightly simplified, this means that 16db over threshold results in only 4db over threshold actually getting output. So, the sound is “compressed”, or reduced. I can explain the attack and release/decay controls if you want, too.

    What actually increases the volume afterwards is the little checkbox you see in your screenshot which says, “Make up gain for 0db after compression”. This takes the compressed audio signal and then just amplifies it till the loudest peak is at 0db again. In effect compression enables you to make quiet parts louder, like you said, but that’s not what a compressor does. It reduces the volume–or the dynamic range–of the signal.
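    Dude’s 4:1 arithmetic as code, for anyone who wants to check it. This is just the static gain curve; real compressors add attack/release smoothing on top, and the “make up gain” checkbox would be one final amplification pass afterwards so the loudest peak sits at 0db again.

```python
import math

def db_to_amp(db):
    """Convert a decibel value to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def amp_to_db(amp):
    """Convert a linear amplitude to decibels."""
    return 20.0 * math.log10(amp)

def compress_sample(amp, threshold_db=-12.0, ratio=4.0):
    """Static compression curve: every `ratio` dB over the threshold
    comes out as 1 dB over it. Below the threshold, nothing changes."""
    level_db = amp_to_db(amp)
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return db_to_amp(level_db)
```

    So with a -12db threshold and a 4:1 ratio, a signal 16db over threshold (+4db) comes out only 4db over threshold (-8db), exactly as described.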

    • Cuthalion says:

      This. Compressors are a little counterintuitive, and it sounds like you’re using it fine, but if you want to know what they’re actually doing, this is it. They are evening out the volume by making the loud parts less loud.

      Many have a checkbox to try to then pull the whole thing up so that it sounds like it’s making the quiet parts louder instead of the loud parts quieter, and this is the end result most people want with a compressor. But the compression itself is the loud -> less loud bit.

  17. evileeyore says:

    In case you didn’t know Shamus, that “few minutes of silence with an open mic” is called taking the room tone.

    And now you can impress everyone by misusing this terminology like a pro.

  18. Scerro says:

    I started hyperventilating when I read you use Ventrilo. Ventrilo is a horrible, horrible, outdated program. It’s really sad if that’s the only program that lets you separate the streams. I’ve run into horrible out-of-sync problems with it in normal use.

  19. Duoae says:

    I used to co-host and edit our podcast and I used audacity for the editing too. Pretty much used a very similar process with the big differences being:

    1) We locally saved our own files and uploaded them to the editor.

    2) I never used the compressor because I didn’t realise that it wasn’t compressing anything… who came up with that name? It actually sounds like what an equaliser is supposed to do – unless my unfamiliarity with audio-terms is bringing me up short.

    Unfortunately, having a really big difference in mic quality really doesn’t help when making a podcast – and you’re right, different people make all sorts of different habitual noises when they’re just sitting around…. and yes, the editing usually took at least twice as long as the podcast recording length (we usually recorded for 1-2 hours which I then edited down a bit).

    I miss actually recording and talking verbally about video games though…

    • Dude says:

      2) People who were mastering and producing music and audio before “computer data” was a thing that you could “compress” to make it smaller.

      Compressors are not equalizers; think of them as an automatic volume slider or an aimbot for volume control (read a couple of posts above yours for my explanation). :)

    • Zak McKracken says:

      A compressor compresses dynamic range. Like HDR tonemapping for pictures, for anyone who knows what that is. It reduces the difference between silent and loud tones.

      If you move towards and away from the microphone, the audio level at the mic can change quite dramatically. Compression is used to reduce that difference a lot. As far as I know, every professional live music event uses a pretty strong compressor on the singer’s voice (with a “low-noise” cut-off to avoid getting the other instruments in as well), because when you’re holding the mic in hand and possibly bouncing across the stage, too, it is impossible to sing with a constant volume, at constant distance to the mic. If you got the raw audio, it would sound all over the place. A compressor will take that input and map it to a much smaller range of volumes, so the singer will remain at reasonably constant volume relative to the other instruments.

      At the same time, if you try to listen to music in a noisy environment, you can’t hear the quiet parts, but if you turn the volume up enough to hear them, the loud parts will damage your ears. Which means that there’s only a limited amplitude range where you will be able to comfortably listen to something and still understand it all.
      That’s why compression is also used for radio broadcasts (and most recorded music, really): the host can talk softly or shout at the mic, but the listeners’ ears will neither explode nor will they have trouble hearing the host, because of the magic of dynamic range compression. It also makes your voice sound sexier :)

  20. Neko says:

    I have a question: How do you import the Ventrilo recording into Audacity?

    I don’t use the official vent client myself, I use Mangler. I’ve noticed I can get it to record … except it makes some odd “.vrf” file. I suppose that might be compatible with the way the official Ventrillo client does it, and it would have to hold everyone’s voices separately, but I’d kinda like to have a normal wave file at the end.

    My specific use-case is recording WoW raid videos; I want to be able to record people talking as well as game audio. Since Mangler doesn’t give me a monitor for my own voice, I’d have to set my own up, but then it would monitor my voice 100% of the time and not just when I’m using push-to-talk. Which sounds bad.

  21. Shinan says:

    There was a program called “Levelator” (the site seems to no longer exist, but I’m sure the program can be found somewhere) that sort of “magically” did compression and amplification and all that stuff. Or at least that’s how it sounded to me.

    So in the podcast I’m part of, I tend to edit the complete thing (very similarly to how you do) and export it, then run it all through the Levelator. (And after that I tend to add the music, because once when I forgot, the music fades didn’t exactly work after the post-work.)

    But if you can find it give Levelator a try. I think I heard about it on some other podcast. (might have been Boardgames to Go)

  22. The Seed Bismuth says:

    This seems like the best post to ask. Since Shamus updated the site I have had to reload to get footnotes. That is to say a page will load everything but footnotes and I will almost always have to reload the page to get them to appear. Has anyone else had this problem?

    P.S. Love these “Making Of” posts, Shamus. Keep them coming.

  23. Steve C says:

    We can’t see each other, so we’re deprived of all the physical interpersonal cues that people use to signal they would like to talk:

    I’ve thought about a solution to that; Use chat alongside voice.

    When someone has a point to add, hit enter into chat. You wouldn’t need to actually type anything. Just a blank line would be enough to alert the person currently talking that they should give a pause. It would help with cases where one person is talking and two people try to break in, and there is this awkward back and forth as everyone tries to figure out who speaks next. The order people got a line-break into chat would also be the order they could speak. Simple one-letter lines like ‘q’ for “I want to ask a question,” ‘a’ for “I have an answer to that,” and ‘!!!’ for “I really want to speak” were all things we also used, but a blank line was the norm.

    This is how we kept discussion organized in WoW raids where there could be 25 people all trying to speak in the same channel.

  24. Steve C says:

    Also the longer the show is, the more time I spend waiting for bulk audio tasks to complete, so the increase in time is non-linear.

    As a listener I’d prefer if Diecasts were cut in half. Same amount of content, just 2 separate files.

    I listen to the Diecast while doing other stuff. Very often (multiple times every Diecast) I’ll stop paying attention to the Diecast in order to do something that needs my attention then reverse back to where I was. The problem is my mp3 player sucks at fast-forward/rewind since it was designed for 3min songs and not for hr+ podcasts. Finding my place again is exponentially more difficult the longer the length of the file. An hour is the very top end where I can listen to it the way I want. After that it’s such a pain in the ass that I have to give up and listen to the Diecast only on my computer. I’m very glad the 100th episode was a one off. I’d probably stop listening if every file was that big.

    Again, it’s just the file length that’s the problem for me. If file length is a problem for you too, please please change it. Talk for as long as you want, but I’d greatly appreciate it if Diecast #106 and beyond was separated out into:


    • Retsam says:

      Splitting the file wouldn’t actually improve things for Shamus, I don’t think; he’d still have to do all the work that he’s currently doing, just with the extra step of then splitting the audio up into three parts at the end. (If he does it at the beginning then that’s going

      Unless the bulk tasks are actually exponential in time based on file length, but that seems very unlikely to me. What exponential work would it be doing if you could split the file in 3 and get the same result?

      Likely I think the “increase in time is non-linear” statement is just not strictly speaking accurate. From Shamus’s description there’s a small fixed component to the time (amount of time it takes to queue up the “once and done” tasks) and a large component of the work that is proportional to the length of the podcast. (processing time for large tasks, individual meddling with specific parts of the show)

      A more mathematically accurate statement would be “linear but with a large coefficient for the non-constant term”… but that hardly rolls off the tongue as well as “non-linear”.

      (And personally, I really don’t want three times as many files to download and mess with)

      • Steve C says:

        It wouldn’t make any sense to split it up at the end. You’d split it up as you record. Which is as far as I can tell exactly what they did for Diecast104 & 105. It was a single session in two separate files. That’s why there were two Diecasts this week. That’s exactly the sort of thing that I prefer.

        I totally get why people wouldn’t want three times as many files to download and mess with. If people didn’t mind having two Diecasts this week though, then it’s not actually a problem.

    • Ideally there should be chapter markers or similar, but neither .mp3 nor .ogg support that normally. Also there’s no guarantee all players do (.ogg should in theory support it).

      I prefer to listen to the Diecast in Firefox, as Chrome’s audio player has a way too short track bar slider.

  25. RCN says:

    Ah, the plight of engineers and maintenance.

    If you do a crappy/mediocre/half-done/only ok job, everybody hates you and/or people might die.

    If you do your job perfectly, nobody can tell if you did your job at all!

    It must be really unnerving to have only silence as the highest form of praise.

Leave a Reply

Comments are moderated and may not be posted immediately. Required fields are marked *


Thanks for joining the discussion. Be nice, don't post angry, and enjoy yourself. This is supposed to be fun.

You can enclose spoilers in <strike> tags like so:
<strike>Darth Vader is Luke's father!</strike>

You can make things italics like this:
Can you imagine having Darth Vader as your <i>father</i>?

You can make things bold like this:
I'm <b>very</b> glad Darth Vader isn't my father.

You can make links like this:
I'm reading about <a href="">Darth Vader</a> on Wikipedia!

You can quote someone like this:
Darth Vader said <blockquote>Luke, I am your father.</blockquote>