Neural Network Learning: Theoretical Foundations

Martin Anthony and Peter L. Bartlett

This book describes recent theoretical advances in the study of artificial neural networks. It explores probabilistic models of supervised learning problems, and addresses the key statistical and computational questions. The book surveys research on pattern classification with binary-output networks, discussing the relevance of the Vapnik-Chervonenkis dimension, and calculating estimates of the dimension for several neural network models. A model of classification by real-output networks is developed, and the usefulness of classification with a 'large margin' is demonstrated. The book explains the role of scale-sensitive versions of the Vapnik-Chervonenkis dimension in large margin classification, and in real prediction. It also discusses the computational complexity of neural network learning, describing a variety of hardness results, and outlining two efficient constructive learning algorithms. It is essentially self-contained, since it introduces the necessary background material on probability, statistics, combinatorics and computational complexity. It is suitable for researchers and graduate students in computer science, engineering, and mathematics.
Cambridge University Press, Cambridge and New York, 1999.

Table of Contents

1. Introduction

1.1. Supervised Learning
1.2. Artificial Neural Networks
1.3. Outline of the Book
1.4. Bibliographical Notes

Part one: Pattern Recognition with Binary-Output Neural Networks

2. The pattern recognition problem

2.1. The Learning Problem
2.2. Learning Finite Function Classes
2.3. Applications to Perceptrons
2.4. Restricted Model
2.5. Remarks
2.6. Bibliographical Notes

3. VC-dimension

3.1. Introduction
3.2. The Growth Function
3.3. The Vapnik-Chervonenkis Dimension
3.4. Bibliographical Notes

4. General upper bounds

4.1. Learning by Minimizing Sample Error
4.2. Uniform Convergence and Learnability
4.3. Proof of Uniform Convergence Result
4.4. Application to the Perceptron
4.5. The Restricted Model
4.6. Remarks
4.7. Bibliographical Notes

5. General lower bounds

5.1. Introduction
5.2. A Lower Bound for Learning
5.3. The Restricted Model
5.4. VC-dimension Quantifies Sample Complexity and Estimation Error
5.5. Remarks
5.6. Bibliographical Notes

6. Linear threshold networks

6.1. Feed-forward Neural Networks
6.2. Upper Bound
6.3. Lower Bounds
6.4. Bibliographical Notes

7. Geometric techniques

7.1. Introduction
7.2. The Need for Conditions on the Activation Functions
7.3. A Bound on the Growth Function
7.4. Proof of the Growth Function Bound
7.5. More on Solution Set Components Bounds
7.6. Bibliographical Notes

8. VC-dimension of neural networks

8.1. Introduction
8.2. Function Classes that are Polynomial in their Parameters
8.3. Piecewise Polynomial Networks
8.4. Standard Sigmoid Networks
8.5. Remarks
8.6. Bibliographical Notes

Part two: Pattern Recognition with Real-Output Networks

9. Classification with real functions

9.1. Introduction
9.2. Large Margin Classifiers
9.3. Remarks
9.4. Bibliographical Notes

10. Covering numbers and convergence

10.1. Introduction
10.2. Covering Numbers
10.3. A Uniform Convergence Result
10.4. Covering Numbers in General
10.5. Remarks
10.6. Bibliographical Notes

11. Pseudo-dimension and fat-shattering

11.1. Introduction
11.2. The Pseudo-dimension
11.3. The Fat-shattering Dimension
11.4. Bibliographical Notes

12. Bounding covering numbers with dimensions

12.1. Introduction
12.2. Packing Numbers
12.3. Bounding with the Pseudo-dimension
12.4. Bounding with the Fat-shattering Dimension
12.5. Comparing the Two Approaches
12.6. Remarks
12.7. Bibliographical Notes

13. Sample complexity of classification learning

13.1. Large Margin SEM Algorithms
13.2. Large Margin SEM Algorithms as Classification Learning Algorithms
13.3. Lower Bounds for Certain Function Classes
13.4. Using the Pseudo-dimension
13.5. Remarks
13.6. Bibliographical Notes

14. Neural network dimensions

14.1. Introduction
14.2. Pseudo-dimension of Neural Networks
14.3. Bounds on the Fat-shattering Dimension in terms of Number of Parameters
14.4. Bounds on the Fat-shattering Dimension in terms of Size of Parameters
14.5. Remarks
14.6. Bibliographical Notes

15. Model selection

15.1. Introduction
15.2. Model Selection Results
15.3. Proofs of the Results
15.4. Remarks
15.5. Bibliographical Notes

Part three: Learning Real-Valued Functions

16. Learning classes of real functions

16.1. Introduction
16.2. The Learning Framework for Real Estimation
16.3. Learning Finite Classes of Real Functions
16.4. A Substitute for Finiteness
16.5. Remarks
16.6. Bibliographical Notes

17. Uniform convergence for real classes

17.1. Uniform Convergence for Real Functions
17.2. Remarks
17.3. Bibliographical Notes

18. Bounding covering numbers

18.1. Introduction
18.2. Bounding with the Fat-shattering Dimension
18.3. Bounding with the Pseudo-dimension
18.4. Comparing the Different Approaches
18.5. Remarks
18.6. Bibliographical Notes

19. Sample complexity of learning function classes

19.1. Introduction
19.2. Classes with Finite Fat-shattering Dimension
19.3. Classes of Finite Pseudo-dimension
19.4. Results for Neural Networks
19.5. Lower Bounds
19.6. Remarks
19.7. Bibliographical Notes

20. Convex classes

20.1. Introduction
20.2. Lower Bounds for Non-Convex Classes
20.3. Upper Bound for Convex Classes
20.4. Remarks
20.5. Bibliographical Notes

21. Other learning problems

21.1. Loss Functions in General
21.2. Convergence for General Loss Functions
21.3. Learning in Multiple-Output Networks
21.4. Interpolation Models
21.5. Remarks
21.6. Bibliographical Notes

Part four: Algorithmics

22. Efficient learning

22.1. Introduction
22.2. Graded Function Classes
22.3. Efficient Learning
22.4. General Classes of Efficient Learning Algorithms
22.5. Efficient Learning in the Restricted Model
22.6. Bibliographical Notes

23. Learning as optimization

23.1. Introduction
23.2. Randomized Algorithms
23.3. Learning as Randomized Optimization
23.4. A Characterization of Efficient Learning
23.5. The 'Hardness' of Learning
23.6. Remarks
23.7. Bibliographical Notes

24. The boolean perceptron

24.1. Introduction
24.2. Learning is Hard for the Simple Perceptron
24.3. Learning is Easy for Fixed Fan-in
24.4. Perceptron Learning in the Restricted Model
24.5. Remarks
24.6. Bibliographical Notes

25. Hardness results

25.1. Introduction
25.2. Linear Threshold Networks with Binary Inputs
25.3. Linear Threshold Networks with Real Inputs
25.4. Sigmoid Networks
25.5. Remarks
25.6. Bibliographical Notes

26. Constructive algorithms

26.1. Introduction
26.2. Real Estimation with Convex Combinations of Basis Functions
26.3. Real Classification using Boosting
26.4. Bibliographical Notes


A1. Useful Results

A1.1. Inequalities
A1.2. Tails of Distributions
A1.3. Sard's Theorem


Author index

Subject index

NIPS'98 Tutorial based on the book

Last update: Wed Apr 23 12:39:53 PDT 2003