2. INFORMATION THEORY AND STATISTICS
2.1 Minimum Description Length Principle
The Minimum Description Length (MDL) principle is an approach to statistical modeling based on a source-coding framework. It continues a long tradition of interaction between statistics and information theory, exemplified by Kullback's early book (1959), the more recent book by Cover and Thomas (1991), and the work of Csisz\'{a}r and others on I-divergence, a measure with many applications in statistics. As a general modeling principle, MDL has been recognized and used by researchers facing modern stochastic modeling problems such as model selection in regression and time series, cluster analysis, image segmentation and compression, wavelet signal processing, and neural networks. The MDL, or shortest-description, idea is very natural in statistical classification problems, as in the work of Wallace and Boulton (1968). It has also been applied to order selection problems in time series as a useful alternative to AIC and BIC; in fact, the two-stage form of MDL model selection coincides with BIC. Another form of MDL model selection, predictive least squares, is a natural and valid empirical assessment of prediction error and has been well studied in the time series context by several researchers, including Hannan, Rissanen, Guo and Huang, Wei, and Lai. (A minimal illustrative sketch of the two-stage and predictive criteria follows the reference list below.)

My work on MDL has focused on comparing MDL with other methods in terms of conventional statistical criteria. Asymptotic expansions of AIC and the three forms of MDL model selection criteria are derived for finite-dimensional normal regression models in
T. Speed and B. Yu, ``Model selection and prediction: normal regression,'' Ann. Inst. Statist. Math., 1993, 35--54.
J. Rissanen, T. Speed and B. Yu, ``Density estimation by stochastic complexity,'' IEEE Trans. on Information Theory, 1992, 315--323.
B. Yu and T. Speed, ``Data compression and histograms,'' Probability Theory and Related Fields, 1992, 195--229.
B. Yu, ``Lower bounds on expected redundancy for nonparametric classes,'' IEEE Trans. on Information Theory, 1996, 272--275.
B. Yu, ``Lower bound on the expected redundancy for classes of continuous Markov sources,'' in Statistical Decision Theory and Related Topics V, S. S. Gupta and J. O. Berger (eds), 1994b, 453--466.
B. Yu, ``Assouad, Fano, and Le Cam,'' to appear in Festschrift in Honor of L. Le Cam on His 70th Birthday, 1994c.
J. Rissanen and B. Yu, ``MDL learning,'' in Learning and Geometry: Computational Approaches, Progress in Computer Science and Applied Logic, David Kueker and Carl Smith (eds), Birkh\"{a}user, Boston, 1996, 3--19.
A. R. Barron, Y. Yang and B. Yu, ``Asymptotically optimal function estimation by minimum complexity criteria,'' Proceedings of the 1994 International Symposium on Information Theory, Trondheim, Norway, 1994, p. 38.
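To make the two criteria mentioned above concrete: for a normal linear model with k coefficients fit to n observations, the two-stage MDL description length is the negative maximized log-likelihood plus (k/2) log n, which agrees with BIC up to a factor of two and additive constants, while predictive least squares accumulates one-step-ahead squared prediction errors, each forecast using only past data. The following minimal Python sketch is illustrative only; the function names, the least-squares fitting choice, and the point at which prediction starts are assumptions for the example, not taken from the cited papers.

    import numpy as np

    def two_stage_mdl(y, X):
        # Two-stage MDL description length for a normal linear model:
        # (n/2) log(RSS/n) + (k/2) log n, up to constants that do not
        # depend on the model; minimizing it over k matches BIC's choice.
        n, k = X.shape
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        rss = np.sum((y - X @ beta) ** 2)
        return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

    def predictive_least_squares(y, X):
        # Predictive least squares: sum of squared one-step-ahead
        # prediction errors, with each coefficient estimate computed
        # from past observations only (an honest prediction-error tally).
        n, k = X.shape
        pls = 0.0
        for t in range(k + 1, n):  # start once a least-squares fit is possible
            beta = np.linalg.lstsq(X[:t], y[:t], rcond=None)[0]
            pls += (y[t] - X[t] @ beta) ** 2
        return pls

In either case one would compute the criterion for each candidate model order (each candidate design matrix X) and select the order that minimizes it.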