We now have a rather good understanding of simple two-layer associative networks, in which a set of input patterns arriving at an input layer is mapped directly to a set of output patterns at an output layer. Such networks have no hidden units: they involve only input and output units. In these cases there is no internal representation; the coding provided by the external world must suffice. These networks have proved useful in a wide variety of applications (cf. Chapters 2, 17, and 18). Perhaps the essential character of such networks is that they map similar input patterns to similar output patterns. This is what allows them to make reasonable generalizations and to perform reasonably on patterns that have never before been presented. The similarity of patterns in a PDP system is determined by their overlap. The overlap in such networks is determined outside the learning system itself, by whatever produces …
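To make the similarity-based generalization concrete, here is a minimal sketch (not taken from the text) of a two-layer linear associator: the weights are formed by outer-product (Hebbian) learning, and the overlap between patterns is simply their dot product. The function names and pattern values are illustrative assumptions.

```python
import numpy as np

def train(inputs, targets):
    """Accumulate outer products target * input^T into a weight matrix.

    This is the simple Hebbian rule for a linear associator; it stores
    each input/target pair directly in the weights.
    """
    n_out, n_in = targets.shape[1], inputs.shape[1]
    W = np.zeros((n_out, n_in))
    for x, t in zip(inputs, targets):
        W += np.outer(t, x)
    return W

def recall(W, x):
    """Output is a direct linear map of the input: no hidden units."""
    return W @ x

# Two orthonormal input patterns mapped to two distinct output patterns.
inputs = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
W = train(inputs, targets)

# A novel pattern whose overlap (dot product) with the first stored
# input is large produces an output close to the first target:
# similar inputs yield similar outputs.
novel = np.array([0.9, 0.1, 0.0, 0.0])
out = recall(W, novel)
```

Because the mapping is linear, the response to the novel pattern is just a blend of the stored targets, weighted by the novel pattern's overlap with each stored input; the network cannot re-represent the input, so similarity in, similarity out.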