The quantization process in modern video encoders tends to make a lot of assumptions. A common one is that of continuity and uniform step size: for example, that if we are quantizing the value 2.5, both 2 and 3 will give equal distortion, each being exactly 0.5 off from the correct value. But this isn’t always true; in reality, we are working with an 8-bit range in each channel, and the inverse transform has to round our high-precision internal values to that small output range.
Normally, this isn’t a problem. Since AC coefficients have (by definition) different output values for each output pixel, they serve to effectively dither the output of the iDCT. But what happens when we don’t have any AC coefficients?
0 1 2 3 4
0 1 2 3 4 X 5 6
The bottom row represents quantization steps and the top row represents actual pixel residual values. Our DC coefficient is ~4.3, which will be quantized to 4. The 4 will then be dequantized and iDCT’d… and rounded down to 2. But our original DC coefficient represented a pixel value of ~2.7, which is closer to 3! At each step we rounded to nearest, yet once the final output rounding is taken into account, the net result is that we rounded away from nearest. In a rounding-aware quantization process, we would have quantized to 5 instead, which comes out of the iDCT as 3.
The same situation can occur in reverse:
0 1 2 3
0 X 1 2 3 4 5 6 7 8
In this case we quantize to 1, and then iDCT… back to zero. We coded a coefficient that had absolutely no effect on the output image. What a waste! A rounding-aware quantization process would of course send this coefficient to zero.
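To make the two cases above concrete, here is a toy sketch of the idea in Python. It is not the actual encoder code: it models a DC-only block as a single scalar, and the `steps_per_pixel` parameter (how many quantization steps fit in one output pixel step) along with all the function names are assumptions made up for illustration. The values 1.65 and 3 are picked only to roughly resemble the diagrams.

```python
import math

def round_half_up(x):
    # Round to nearest integer, with halves going up.
    return math.floor(x + 0.5)

def reconstruct(level, steps_per_pixel):
    # Toy stand-in for dequantization + iDCT + rounding to the output range
    # for a DC-only block: scale the level back to pixel units, then round.
    return round_half_up(level / steps_per_pixel)

def quantize_naive(coef, steps_per_pixel):
    # Round to the nearest quantization step, ignoring output rounding.
    # steps_per_pixel is unused here; it is kept so both functions share a signature.
    return round_half_up(coef)

def quantize_rounding_aware(coef, steps_per_pixel):
    # Try the two neighbouring levels and keep the one whose *rounded output*
    # is closest to the ideal pixel value. On a tie, prefer the level with the
    # smaller magnitude, since it is cheaper to code.
    target = coef / steps_per_pixel
    lo = math.floor(coef)
    return min(
        (lo, lo + 1),
        key=lambda lv: (abs(reconstruct(lv, steps_per_pixel) - target), abs(lv)),
    )

# First case: coefficient ~4.3, assumed 1.65 quantization steps per pixel step.
naive = quantize_naive(4.3, 1.65)            # 4, which reconstructs to 2
aware = quantize_rounding_aware(4.3, 1.65)   # 5, which reconstructs to 3
print(naive, reconstruct(naive, 1.65), aware, reconstruct(aware, 1.65))

# Second case: coefficient ~0.6, assumed 3 quantization steps per pixel step.
naive = quantize_naive(0.6, 3)               # 1, which reconstructs to 0
aware = quantize_rounding_aware(0.6, 3)      # 0, same output, nothing to code
print(naive, reconstruct(naive, 3), aware, reconstruct(aware, 3))
```

A real encoder would have to do this in integer arithmetic against the actual inverse transform and weigh the bit cost properly; the sketch only shows why the choice of level should be judged by the rounded output rather than by the coefficient itself.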
This effect is generally very small unless the quantization step size is less than one output step, in which case it becomes very large very quickly. Fixing quantization to take this into account provided small gains in luma at low quantizers, but significantly larger gains in chroma: up to 10% in one particularly pathological case. The reason for this is that chroma DC coefficients have 2x the precision of normal DC coefficients due to the hierarchical transform used in H.264. Of course, this hierarchical transform also makes it a tad trickier to perform this optimization.
Update: Patch is now committed.