While everyone seems to have heard that JPEG is a lossy format, not everyone knows the technical reasons why this happens. JPEGs do two things when you save them. First, they convert the color space from RGB to YUV. Y is the luminance, which is similar to the image converted to grayscale. U and V represent the chrominance-blue and chrominance-red components. RGB and YUV are just two different ways to represent the same colors. However, the human eye interprets color in a way that is closer to the YUV decomposition than to RGB.
The conversion from RGB to YUV and back should be done with floating-point computations. Otherwise, some colors cannot be reconstructed. Unfortunately, JPEG uses integer math. Saving any image as a JPEG results in a significant modification to the image's colors. However, subsequent resaves really don't alter the colors much.
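To get a feel for the rounding problem, here is a minimal sketch of the color round trip. It assumes the full-range conversion constants from the JFIF specification (where the YUV described above is stored as YCbCr) and simply rounds every channel to an 8-bit integer. A real encoder's fixed-point arithmetic may differ in detail, and the function names are made up for illustration.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def rgb_to_ycbcr(r, g, b):
    """RGB -> YCbCr, with each result rounded to an 8-bit integer."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y, cb, cr):
    """YCbCr -> RGB, again rounded to 8-bit integers."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def round_trip(rgb):
    """Convert one RGB color to YCbCr and back."""
    return ycbcr_to_rgb(*rgb_to_ycbcr(*rgb))


if __name__ == "__main__":
    # Sample the RGB cube on a coarse grid and count the colors that change.
    grid = range(0, 256, 17)
    colors = [(r, g, b) for r in grid for g in grid for b in grid]
    once = [round_trip(c) for c in colors]
    twice = [round_trip(c) for c in once]
    changed_first = sum(a != b for a, b in zip(colors, once))
    changed_second = sum(a != b for a, b in zip(once, twice))
    print(f"changed by first round trip:  {changed_first} of {len(colors)}")
    print(f"changed by second round trip: {changed_second} of {len(colors)}")
```

Comparing the two counts is a quick way to check the claim that the first save does most of the damage to the colors.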
The second thing that happens with a JPEG is the frequency quantization. The image is divided into 8x8 pixel squares. (Technical detail: depending on the JPEG subsampling, chrominance may use 8x8, 8x16, 16x8, or 16x16. But for simplicity, let's ignore this.) The 8x8 pixels are converted into 8x8 frequencies using a discrete cosine transform (DCT). Finally, the DCT values are quantized, or scaled. With JPEG, most of the information is removed from the higher frequencies since the human eye is more sensitive to low frequencies.
To regenerate the image, the stored, scaled frequency values are multiplied by the quantization table values and then converted from 8x8 frequencies to 8x8 pixels. Finally, the 8x8 pixels are converted from YUV to RGB.
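The encode and decode steps for a single 8x8 block can be sketched as follows. This is an illustration under stated assumptions, not a JPEG codec: the quantization table is a made-up one whose step size simply grows with frequency, there is no subsampling or entropy coding, and the function names are hypothetical.

```python
import math

N = 8

# Illustrative quantization table (NOT a real JPEG table): the step size
# grows with frequency, so higher frequencies are stored more coarsely.
QUANT = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def _c(k):
    """DCT normalization factor."""
    return math.sqrt(0.5) if k == 0 else 1.0


def _cos(i, k):
    """Cosine basis value for sample index i and frequency k."""
    return math.cos((2 * i + 1) * k * math.pi / 16)


def dct2(block):
    """Forward 8x8 DCT of pixel values shifted to the range -128..127."""
    return [[0.25 * _c(u) * _c(v) * sum((block[x][y] - 128) * _cos(x, u) * _cos(y, v)
                                        for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(freq):
    """Inverse 8x8 DCT, rounded and clamped back to 0..255 pixels."""
    return [[max(0, min(255, round(128 + 0.25 * sum(
                _c(u) * _c(v) * freq[u][v] * _cos(x, u) * _cos(y, v)
                for u in range(N) for v in range(N)))))
             for y in range(N)] for x in range(N)]


def encode(block):
    """Quantize: divide each frequency by its table entry and round."""
    freq = dct2(block)
    return [[round(freq[u][v] / QUANT[u][v]) for v in range(N)] for u in range(N)]


def decode(stored):
    """Dequantize: multiply by the table entries, then invert the DCT."""
    freq = [[stored[u][v] * QUANT[u][v] for v in range(N)] for u in range(N)]
    return idct2(freq)


if __name__ == "__main__":
    # An arbitrary textured test block.
    block = [[(16 * x + 5 * y * y) % 256 for y in range(N)] for x in range(N)]
    stored = encode(block)
    restored = decode(stored)
    nonzero = sum(1 for row in stored for c in row if c != 0)
    worst = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
    print(f"nonzero stored values: {nonzero} of {N * N}")
    print(f"largest pixel error after decoding: {worst}")
```

The rounding inside `encode` is where information is discarded; `decode` can only restore multiples of the table entries.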
The quantization step is what causes JPEG images to get worse with each resave. Due to harmonics and rounding problems from integer math, going from the decoded values back to encoded values results in more and more values being cut off. Sure, eventually the image will reach an equilibrium where nothing more is lost, but until you reach that "bad quality image that never gets worse," each resave will degrade the JPEG.
Detecting Changes
The amount of image degradation is not linear. The first time you save a JPEG, the values are "original". The image will change the most during the first resave. The second resave changes it a little further, the third a little further still, and so on. If the first image is saved at 90%, then a resave at 90% creates an image equivalent to 81%. The next resave at 90% is equivalent to 72.9% (90% x 90% x 90%). The nth resave at 90% should be equivalent to (90%)^n. (It won't be exact, due to rounding error from integer math and stepwise approximations, but it is good enough for this description.)
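As a quick check of that arithmetic, this sketch multiplies the quality factor once per save, counting the initial save as save 1. The function name is made up for illustration, and the model is only the rough approximation described above, not a measurement of real JPEG output.

```python
def equivalent_quality(quality, saves):
    """Rough equivalent quality after `saves` saves at the same setting.

    `quality` is a fraction (0.90 for 90%) and `saves` counts the
    initial save as 1. This is a simple multiplicative model only.
    """
    return quality ** saves


if __name__ == "__main__":
    for saves in range(1, 6):
        print(f"save {saves}: {equivalent_quality(0.90, saves) * 100:.1f}%")
```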
Knowing this, I developed an error level analysis (ELA) system. By intentionally resaving the image and comparing it with the pre-resave image, the amount of change can be identified. The idea is that the entire image should be at roughly the same potential error level. If an image is digitally modified, then the modified areas will have a different error level potential than the rest of the image. (There are many caveats, including high-contrast edges, frequency impulses, and uniformly colored surfaces, but this is good enough for most images.)
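The resave-and-compare idea can be sketched in a few lines. To keep the example self-contained, the `toy_resave` function below is a stand-in that snaps values to multiples of a step; it is an assumption for illustration and not JPEG compression. In practice, `resave` would be a real JPEG save and reload at a known quality, and the differences are usually scaled up for viewing. All names here are hypothetical.

```python
def error_levels(pixels, resave):
    """Per-pixel absolute change caused by one more resave.

    `pixels` is a list of rows of 0..255 values (a single channel).
    `resave` is any function returning the image as it would look after
    being saved and reloaded.
    """
    resaved = resave(pixels)
    return [[abs(a - b) for a, b in zip(row, new_row)]
            for row, new_row in zip(pixels, resaved)]


def toy_resave(pixels, step=10):
    """Stand-in for a lossy save/load cycle (NOT JPEG): snap each value
    to the nearest multiple of `step`, capped at 255."""
    return [[min(255, round(v / step) * step) for v in row] for row in pixels]


if __name__ == "__main__":
    original = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(8)]
    saved = toy_resave(original)  # the image as first published

    # Paste a 2x2 patch of fresh, never-saved values into the saved image.
    edited = [row[:] for row in saved]
    for y in (3, 4):
        for x in (3, 4):
            edited[y][x] = 123

    # Areas that were already saved barely change; the patch stands out.
    for row in error_levels(edited, toy_resave):
        print(" ".join(f"{e:2d}" for e in row))
```

With this toy codec the untouched area shows no change at all, which is cleaner than anything a real JPEG would produce; the caveats listed above still apply to real images.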
Error Level Analysis
Although I released enough detail for other people to implement the algorithm, I never released code. Instead, I left it for other people to develop their own variations. And while there are a couple of different implementations out there (Noah, SoCo Software, Schlake), I have been really impressed by the Image Error Level Analyser by P. Ringwood. His system allows you to paste in a URL to a JPEG and compute the ELA error potential. Better yet: he caches the result so you can refer to it later.
I'm thrilled with his results (and entertained by his domain name: errorlevelanalysis.com -- I'm jealous!). I think it is wonderful that Ringwood has opened his application for other people to experiment with. On a scale from 1 to 10, this is awesome.
Sounds like you're doing it right!
Speaking generally and not about any specific image:
Images should stabilize after about 25 resaves. (Some stabilize after as few as 5!)
The quality level does alter the rate a little. 75% usually stabilizes faster than 90%. However, 90% makes smaller adjustments, so it may hit a local minimum in the error level much faster. (Big boulder rolling down hill stops when it hits the bottom. Small rock stops when it hits another rock.)
I have a high quality photo (from a digital camera) that I usually play with. At 75%, it stabilizes after 24 resaves. (Stabilize meaning "exact same sha1 on the file".) At 90%, it stabilizes after 17 resaves.
If you're seeing the picture stabilize after 10 resaves, then either the image started off low quality, or you're using a very high quality level for the resaves (like 95% or 98%). Do you mind if I ask what resave quality level you are using?
I found an interesting result: if you save repeatedly with different qualities, say, 99, 97, 95, 93, 91 percent, it will degrade the image more than you would expect at that high quality level. It's like the JPG algorithm is unable to converge.
Anyways, thanks for the interesting topic.
========================
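#!/bin/sh
# Pass the quality levels as arguments, e.g. 99 97 95 93 91.
# Each round resaves a working copy once at every quality given.
cp srcimage.jpg image.jpg
while true
do
    for Q in "$@"
    do
        # Recompress in place at quality $Q (ImageMagick).
        mogrify -quality "$Q" image.jpg
        # Print the quality and the resulting file size in bytes.
        printf '%s %s  ' "$Q" "$(wc -c < image.jpg)"
    done
    echo
    # Wait for Enter before the next round; stop at end of input.
    read DUMMY || break
done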
Cheers for the kind words. I'm actually looking into ways to port that jpegquality.c code over to work on my website, so it's not always running at 95% - but sadly, since my day job beckons, it hasn't happened yet.