Bin Yu

3. LOSSY DATA COMPRESSION

3.1 Rate-Distortion Theory

Working with MDL for part of my thesis introduced me to information theory, and my research has become more involved in information theory since then. The information theory ideas related to MDL belong to noiseless data compression, whereby redundancy is eliminated without loss of information. Memory space and transmission rate requirements can be reduced if one is willing to incur some information loss. This is called lossy compression and the theory studying its theoretical limits is called rate-distortion theory. Lossy compression is often done in practice, for example, when fetching images on the world wide web, recording audio on a compact disc or transmitting voice with a wireless digital telephone. looks at an old problem in rate-distortion theory: what is the universal redundancy rate for lossy data compression if the data are iid and discrete. It had been known that $n^{-1} \log n$ is the universal redundancy rate for noiseless compression. The question was whether the same rate holds for the lossy case. This rate was asserted (for the distortion-rate function, which is the inverse of the rate-distortion function) in 1968 by Pilc, but his proof contained a major gap. Using the method of types and the parametric representation of the rate-distortion function, we showed in Yu and Speed (1993) that the rate $n^{-1} \log n$ is indeed an upper bound to the optimal rate. In the same paper, we conjectured that this bound cannot be improved. This conjecture has been proved very recently by Zhang, Yang, and Wei.