Neural Network Learning: Theoretical Foundations
This book describes recent theoretical advances in the study of
artificial neural networks. It explores probabilistic models of
supervised learning problems, and addresses the key statistical
and computational questions. The book surveys research on pattern
classification with binary-output networks, discussing the relevance
of the Vapnik-Chervonenkis dimension, and calculating estimates of
the dimension for several neural network models. A model of
classification by real-output networks is developed, and the
usefulness of classification with a 'large margin' is demonstrated.
The book explains the role of scale-sensitive versions of the
Vapnik-Chervonenkis dimension in large margin classification, and in
real-valued prediction. It also discusses the computational complexity of
neural network learning, describing a variety of hardness results,
and outlining two efficient constructive learning algorithms.
It is essentially self-contained, since it introduces the necessary
background material on probability, statistics, combinatorics,
and computational complexity. It is suitable for researchers and
graduate students in computer science, engineering, and mathematics.
Cambridge University Press, 1999 (Cambridge, New York).
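As an informal, at-a-glance reminder of two quantities that recur throughout the book (the precise definitions, together with their scale-sensitive variants, appear in Chapters 3, 9 and 11), here is a sketch in common notation rather than necessarily the book's own:

% VC dimension of a class H of {0,1}-valued functions on a domain X:
\[
  \mathrm{VCdim}(H) \;=\; \max\bigl\{\, m : \exists\, x_1,\dots,x_m \in X
  \text{ such that } |\{(h(x_1),\dots,h(x_m)) : h \in H\}| = 2^m \,\bigr\}.
\]
% Margin of a real-valued function f on a labelled example (x,y), using the
% common convention y in {-1,+1} (the book's own convention may differ):
\[
  \mathrm{margin}_f(x,y) \;=\; y\,f(x), \qquad
  f \text{ classifies } (x,y) \text{ correctly with margin } \gamma
  \text{ if } y\,f(x) \ge \gamma.
\]

Roughly speaking, Part two shows that for large margin classifiers the generalization behaviour is governed by a scale-sensitive (fat-shattering) dimension at a scale related to the margin, rather than by the VC dimension alone.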
Table of Contents
1. Introduction
1.1. Supervised Learning
1.2. Artificial Neural Networks
1.3. Outline of the Book
1.4. Bibliographical Notes
Part one: Pattern Recognition with Binary-Output Neural Networks
2. The pattern recognition problem
2.1. The Learning Problem
2.2. Learning Finite Function Classes
2.3. Applications to Perceptrons
2.4. The Restricted Model
2.5. Remarks
2.6. Bibliographical Notes
3. VC-dimension
3.1. Introduction
3.2. The Growth Function
3.3. The Vapnik-Chervonenkis Dimension
3.4. Bibliographical Notes
4. General upper bounds
4.1. Learning by Minimizing Sample Error
4.2. Uniform Convergence and Learnability
4.3. Proof of Uniform Convergence Result
4.4. Application to the Perceptron
4.5. The Restricted Model
4.6. Remarks
4.7. Bibliographical Notes
5. General lower bounds
5.1. Introduction
5.2. A Lower Bound for Learning
5.3. The Restricted Model
5.4. VC-dimension Quantifies Sample Complexity and Estimation Error
5.5. Remarks
5.6. Bibliographical Notes
6. Linear threshold networks
6.1. Feed-forward Neural Networks
6.2. Upper Bound
6.3. Lower Bounds
6.4. Bibliographical Notes
7. Geometric techniques
7.1. Introduction
7.2. The Need for Conditions on the Activation Functions
7.3. A Bound on the Growth Function
7.4. Proof of the Growth Function Bound
7.5. More on Solution Set Components Bounds
7.6. Bibliographical Notes
8. VC-dimension of neural networks
8.1. Introduction
8.2. Function Classes that are Polynomial in their Parameters
8.3. Piecewise Polynomial Networks
8.4. Standard Sigmoid Networks
8.5. Remarks
8.6. Bibliographical Notes
Part two: Pattern Recognition with Real-Output Networks
9. Classification with real functions
9.1. Introduction
9.2. Large Margin Classifiers
9.3. Remarks
9.4. Bibliographical Notes
10. Covering numbers and convergence
10.1. Introduction
10.2. Covering Numbers
10.3. A Uniform Convergence Result
10.4. Covering Numbers in General
10.5. Remarks
10.6. Bibliographical Notes
11. Pseudo-dimension and fat-shattering
11.1. Introduction
11.2. The Pseudo-dimension
11.3. The Fat-shattering Dimension
11.4. Bibliographical Notes
12. Bounding covering numbers with dimensions
12.1. Introduction
12.2. Packing Numbers
12.3. Bounding with the Pseudo-dimension
12.4. Bounding with the Fat-shattering Dimension
12.5. Comparing the Two Approaches
12.6. Remarks
12.7. Bibliographical Notes
13. Sample complexity of classification learning
13.1. Large Margin SEM Algorithms
13.2. Large Margin SEM Algorithms as Classification Learning Algorithms
13.3. Lower Bounds for Certain Function Classes
13.4. Using the Pseudo-dimension
13.5. Remarks
13.6. Bibliographical Notes
14. Neural network dimensions
14.1. Introduction
14.2. Pseudo-dimension of Neural Networks
14.3. Bounds on the Fat-shattering Dimension in terms of Number of Parameters
14.4. Bounds on the Fat-shattering Dimension in terms of Size of Parameters
14.5. Remarks
14.6. Bibliographical Notes
15. Model selection
15.1. Introduction
15.2. Model Selection Results
15.3. Proofs of the Results
15.4. Remarks
15.5. Bibliographical Notes
Part three: Learning Real-Valued Functions
16. Learning classes of real functions
16.1. Introduction
16.2. The Learning Framework for Real Estimation
16.3. Learning Finite Classes of Real Functions
16.4. A Substitute for Finiteness
16.5. Remarks
16.6. Bibliographical Notes
17. Uniform convergence for real classes
17.1. Uniform Convergence for Real Functions
17.2. Remarks
17.3. Bibliographical Notes
18. Bounding covering numbers
18.1. Introduction
18.2. Bounding with the Fat-shattering Dimension
18.3. Bounding with the Pseudo-dimension
18.4. Comparing the Different Approaches
18.5. Remarks
18.6. Bibliographical Notes
19. Sample complexity of learning function classes
19.1. Introduction
19.2. Classes with Finite Fat-shattering Dimension
19.3. Classes of Finite Pseudo-dimension
19.4. Results for Neural Networks
19.5. Lower Bounds
19.6. Remarks
19.7. Bibliographical Notes
20. Convex classes
20.1. Introduction
20.2. Lower Bounds for Non-Convex Classes
20.3. Upper Bound for Convex Classes
20.4. Remarks
20.5. Bibliographical Notes
21. Other learning problems
21.1. Loss Functions in General
21.2. Convergence for General Loss Functions
21.3. Learning in Multiple-Output Networks
21.4. Interpolation Models
21.5. Remarks
21.6. Bibliographical Notes
Part four: Algorithmics
22. Efficient learning
22.1. Introduction
22.2. Graded Function Classes
22.3. Efficient Learning
22.4. General Classes of Efficient Learning Algorithms
22.5. Efficient Learning in the Restricted Model
22.6. Bibliographical Notes
23. Learning as optimization
23.1. Introduction
23.2. Randomized Algorithms
23.3. Learning as Randomized Optimization
23.4. A Characterization of Efficient Learning
23.5. The 'Hardness' of Learning
23.6. Remarks
23.7. Bibliographical Notes
24. The Boolean perceptron
24.1. Introduction
24.2. Learning is Hard for the Simple Perceptron
24.3. Learning is Easy for Fixed Fan-in
24.4. Perceptron Learning in the Restricted Model
24.5. Remarks
24.6. Bibliographical Notes
25. Hardness results
25.1. Introduction
25.2. Linear Threshold Networks with Binary Inputs
25.3. Linear Threshold Networks with Real Inputs
25.4. Sigmoid Networks
25.5. Remarks
25.6. Bibliographical Notes
26. Constructive algorithms
26.1. Introduction
26.2. Real Estimation with Convex Combinations of Basis Functions
26.3. Real Classification using Boosting
26.4. Bibliographical Notes
Appendix
A1. Useful Results
A1.1. Inequalities
A1.2. Tails of Distributions
A1.3. Sard's Theorem
Bibliography
Author index
Subject index