Video Compression Gone Wrong

By Shamus Posted Tuesday Apr 17, 2012

Filed under: Movies

The technology behind image and video compression is an amazing thing. It began as an effort to get more of our pixelated scanned photographs onto a single floppy disk and has now grown into a scientific discipline that blends visual perception with information theory. Throw in some compression algorithms with size / CPU time tradeoffs and a dash of obnoxious patent trolling and you have a field of study that can keep you busy for an entire career.

The early compression systems (like .gif) were just focused on ways to pack together redundant pixels. Kind of “hey, the next twenty pixels are all exactly the same, so instead of repeating the same bit of information 20 times, I’ll do it once with a note on how many times to repeat it.” It was actually a bit more complicated than that, but you get the idea.
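To make the "note on how many times to repeat it" idea concrete, here is a minimal sketch of run-length encoding in Python. The function names are my own, and real GIF files use a dictionary scheme called LZW instead of plain run counts, so treat this as an illustration of the idea and not of the file format.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (count, value) pair."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    out = []
    for count, value in pairs:
        out.extend([value] * count)
    return out

# One row of a 16-color image: long runs of the same color.
row = ["blue"] * 20 + ["white"] * 3 + ["blue"] * 5
encoded = rle_encode(row)
print(encoded)  # [(20, 'blue'), (3, 'white'), (5, 'blue')]
assert rle_decode(encoded) == row
```

Twenty-eight pixels shrink to three pairs. Feed it a row where every pixel is slightly different and you get one pair per pixel, which is larger than the original.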

A picture of St. Louis during the paleolithic age.

This worked great when images were made from 16, 64, or 256 really distinct colors. If there are only 16 different colors in your image, there’s going to be a lot of repetition. However, as the number of colors grows, detecting duplicates and repeating patterns becomes useless. At some point a clever engineer noticed that while your average image of 16 million colors had almost no points where the same color pixel was used again and again, there were a lot of situations where the pixels were nearly the same.

Human beings sense changes in brightness much, much more easily than we detect changes in color. So, what if we just reduced the color diversity a bit? Don’t mess with the brightness, but just fiddle with the hue of pixels to reduce the number of distinct colors. This can be tricky. Check out this image:

jpeg3.jpg

This is our example image, reduced to just 16 colors. You could compress the hell out of this sucker. However, it kind of looks a bit horrible. In our 16 color palette, we made room for a lot of blues. The sky came out in decent shape, but the landscape was ruined. There just weren’t enough shades of orange to make it work. We could go the other way, and favor the orange, which would result in…

jpeg2.jpg

The ground came out pretty nice (for a 16 color image) but the sky is ruined. But! What if we didn’t use a one-size-fits-all color palette for the whole image? What if we examined the thing in sections, and just reduced the color locally?

jpeg5.jpg

What if we looked at section A, figured out how much color diversity was there, and made a palette just for that area? Then do the same for section B. What you end up with is a quad tree, which is something I describe in my Terrain Project. Start with a big section of image. Is there a lot of color diversity? If yes, then cut this section into four pieces and examine each quadrant. Keep dividing until you’ve got sections with sufficiently homogeneous color.
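The divide-until-homogeneous loop can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the image is a small square grid of grayscale numbers whose side is a power of two, "diversity" is just the largest value minus the smallest, and the threshold of 16 is arbitrary. It shows the subdivision idea only, not how any real image format stores its data.

```python
def build_quadtree(image, x, y, size, threshold):
    """Describe the square at (x, y) with side `size` as a leaf or four children.

    `image` is a list of rows of grayscale values. Diversity is max minus min,
    a stand-in for a real color-distance measure.
    """
    values = [image[row][col]
              for row in range(y, y + size)
              for col in range(x, x + size)]
    if size == 1 or max(values) - min(values) <= threshold:
        # Homogeneous enough: keep one average value for the whole square.
        return {"x": x, "y": y, "size": size,
                "value": sum(values) // len(values)}
    half = size // 2
    return {"x": x, "y": y, "size": size, "children": [
        build_quadtree(image, x, y, half, threshold),
        build_quadtree(image, x + half, y, half, threshold),
        build_quadtree(image, x, y + half, half, threshold),
        build_quadtree(image, x + half, y + half, half, threshold),
    ]}

def count_leaves(node):
    """Count how many squares the tree ended up storing."""
    if "children" not in node:
        return 1
    return sum(count_leaves(child) for child in node["children"])

# 4x4 image: flat "sky" everywhere except one busy corner.
image = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [200, 200,  90,  10],
    [200, 200,  40, 160],
]
tree = build_quadtree(image, 0, 0, 4, threshold=16)
print(count_leaves(tree))  # 7 squares instead of 16 pixels
```

The three flat quadrants each collapse into a single square, and only the busy corner gets split all the way down to individual pixels.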

However, you will still end up with squares with a lot of color diversity. I’m sure some of you remember the early days of the internet, when people turned JPEG compression way up to make their webpage images load faster on those old 14.4k modems. (Shudder.) The compression works great for those large areas of similar color, but it leaves ugly seams where two things of different hue and brightness come together.

jpeg1.jpg

Note those square areas where the sky touches the landscape. You end up with a single square that needs a lot of oranges and a lot of blues. It tries to split the difference, and you end up with a mess. Check out this up-close view of what it did to the rock arch in this image:

jpeg6.jpg

Shrinking images is a complex business. I’m glossing over a lot of details here. There are a lot of different ways to reduce color, or to try and preserve changes in brightness while allowing hues to drift. There are tradeoffs to the various methods, based on what is the most important about the image in question.

Now take everything we’ve just learned, and add it to the even more convoluted science of moving images. In most videos, a majority of the pixels are the same from one frame to the next.

batty1.jpg

When you have someone talking, there usually isn’t a lot of motion in the scene. Their mouth moves. Their head moves slightly. They blink. From one frame to the next, only a very small percent of the overall image needs to change. Sure, there are times when the camera is moving like crazy, and in those cases it gets harder to get much in the way of savings. But most of the time, 90% (a number I made up, but a reasonable one) of the pixels will be the same between any two adjacent frames.

But even among those pixels that are changing, they’re not changing much. When Rutger Hauer moves his mouth, there are slight shifts in the lighting on his cheek, but it’s still skin-tone under a blue light.

batty2.jpg

A single 1920×1080 Blu-Ray image is comprised of 2,073,600 pixels. At full color, each pixel would take 3 bytes, which means a single uncompressed frame would be 6MB. That’s 144MB of data a second for video, or 8GB a minute. A dual-layer Blu-Ray holds 50GB of data. Which means that on the average Blu-Ray disc, by the time you were done with the FBI warnings, the main menu, and the trailers, the disc would be full. Without compression, you would literally run out of space before you reached the opening credits.
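If you want to check that arithmetic yourself, here it is as a few lines of Python. The 24 frames per second is my assumption, inferred from 6MB per frame turning into 144MB per second, and I am using decimal megabytes and gigabytes. The unrounded figures come out a little higher than the rounded ones above, which only makes the point stronger.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3         # one byte each for red, green, and blue
FRAMES_PER_SECOND = 24      # assumed film frame rate
DISC_BYTES = 50 * 10**9     # dual-layer disc, decimal gigabytes

pixels = WIDTH * HEIGHT
frame_bytes = pixels * BYTES_PER_PIXEL
second_bytes = frame_bytes * FRAMES_PER_SECOND
minute_bytes = second_bytes * 60
minutes_on_disc = DISC_BYTES / minute_bytes

print(f"{pixels:,} pixels")                            # 2,073,600 pixels
print(f"{frame_bytes / 10**6:.1f} MB per frame")       # 6.2 MB per frame
print(f"{second_bytes / 10**6:.1f} MB per second")     # 149.3 MB per second
print(f"{minute_bytes / 10**9:.1f} GB per minute")     # 9.0 GB per minute
print(f"{minutes_on_disc:.1f} minutes per disc")       # 5.6 minutes per disc
```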

So we’re getting a ridiculous amount of compression, is what I’m saying. We’re getting more than people might have thought was mathematically possible 20 years ago. Videos aren’t stored as a series of fixed images, but instead they begin with a reference image / keyframe, and then subsequent frames are built by keeping track of the slight differences from one instant to the next. These changes are cumulative, making pixels brighter or darker as characters move around the scene. Then the camera undergoes a significant change (the lighting changes, or the POV shifts) and we get a new keyframe, and the process begins again.
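Here is a toy version of the keyframe-plus-changes scheme, as a sketch only. I am assuming a "frame" is a flat list of pixel values and a "delta" is a dictionary of just the pixels that changed. Real codecs are far more sophisticated (they track blocks of pixels moving around, for one thing), and the function names are made up for this example.

```python
def encode(frames):
    """Store the first frame whole, then only the pixels that changed."""
    keyframe = list(frames[0])
    deltas = []
    previous = keyframe
    for frame in frames[1:]:
        # Map of pixel index -> new value, for changed pixels only.
        delta = {i: new for i, (old, new) in enumerate(zip(previous, frame))
                 if old != new}
        deltas.append(delta)
        previous = frame
    return keyframe, deltas

def decode(keyframe, deltas):
    """Rebuild every frame by applying the deltas cumulatively."""
    current = list(keyframe)
    frames = [list(current)]
    for delta in deltas:
        for index, value in delta.items():
            current[index] = value
        frames.append(list(current))
    return frames

# Three tiny frames: mostly static, one or two pixels shifting each time.
frames = [
    [10, 10, 10, 10, 50, 50],
    [10, 10, 10, 10, 52, 50],
    [10, 10, 10, 10, 52, 55],
]
keyframe, deltas = encode(frames)
print(deltas)  # [{4: 52}, {5: 55}]
assert decode(keyframe, deltas) == frames
```

Eighteen pixel values get stored as one six-pixel keyframe plus two single-pixel changes. Note that each delta only makes sense on top of the frame before it.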

Now, you don’t want to end up with a situation where a fast-moving scene of flashing lights ends up taking a ton of space. Without some sort of safeguard, a really busy scene could bloat the individual frames. You might end up in a situation where the device is physically incapable of moving data off of the disc rapidly enough to keep the video going without stuttering. (I don’t actually know if this is possible now, but I know it was a concern in the past.) You don’t want some section of shaky-cam and flashing lights to blow your storage, throughput, or CPU budget. So, videos are throttled to only feed so much data a second. If the raw footage is too complex, the compression will reduce the image quality. Effectively, the movie will get blurrier. Since this happens mostly in sequences of flashing lights and rapid camera movement, it’s actually hard to tell.

jpeg7.jpg

Now, this means that some types of things will compress far better than others. For years I wondered why I couldn’t find a decent version of Groove Is in the Heart by Deee-Lite. The YouTube version is pixel soup. You can google around and find slightly better versions, but they’re all basically mush. Perhaps the label just hasn’t seen fit to put out the official video, but I kind of suspect that the problem is the video itself. Made in 1990, it’s almost perfectly engineered to thwart modern compression techniques. Most of the video is people in multicolored clothing, dancing in front of a kaleidoscopic background. There are going to be very few reusable pixels from one frame to the next.

Which brings me, 1,200 words later, to the point of this article. As I said above, as you watch a movie it’s really just altering the on-screen images a little bit at a time. But what happens if some of those keyframes end up missing? What happens if your video is riddled with Swiss-cheese gaps, and you have sections where you’ve got all the changes, but not the base image being changed?
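A toy sketch of that failure, in Python. The setup is entirely hypothetical: frames are short lists of labels, a "delta" is a dictionary of changed pixels, and I am assuming the player simply keeps drawing on top of whatever picture it had last. I don't know how any particular video player actually handles a missing keyframe.

```python
def apply_deltas(base, deltas):
    """Apply each {pixel_index: new_value} delta in order; return the last frame."""
    frame = list(base)
    for delta in deltas:
        for index, value in delta.items():
            frame[index] = value
    return frame

# The last picture from the previous scene, still sitting on screen.
scene_a_last_frame = ["sky", "sky", "rock", "rock"]

# The new scene: its keyframe is the piece that never arrived.
scene_b_keyframe = ["wall", "wall", "face", "wall"]
scene_b_deltas = [{2: "face, mouth open"}, {2: "face, mouth closed"}]

intended = apply_deltas(scene_b_keyframe, scene_b_deltas)
glitched = apply_deltas(scene_a_last_frame, scene_b_deltas)

print(intended)  # ['wall', 'wall', 'face, mouth closed', 'wall']
print(glitched)  # ['sky', 'sky', 'face, mouth closed', 'rock']
```

With the base image gone, the moving parts of the new scene get painted over the leftovers of the old one.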

This is apparently possible for people downloading movies via BitTorrent. The peer-to-peer nature of the thing means that you download sections of a video chaotically, based on what everyone else has, what you need, and which peers are available. Apparently video players will play these fragmented things. Which results in the following video. Note that according to the poster, no editing was performed on this. This is exactly how the incomplete video appeared:

Mad Men, bittorrent edition:

Yes, the music is part of the magic here. But still. This was strangely haunting.