This paper proposes minimizing the information content of neural network weights to improve generalization, particularly when training data is scarce. It introduces a method in which adaptable Gaussian noise is added to the weights, trading off the expected squared error of the network against the amount of information the weights contain. Leveraging the Minimum Description Length (MDL) principle and a "bits back" argument for communicating the noisy weights, the approach yields exact, efficiently computable derivatives provided the output units are linear. The paper also explores adaptive mixtures of Gaussians as more flexible priors for coding the weights. Preliminary results indicated a slight improvement over simple weight-decay on a high-dimensional task.
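One compact way to state the trade-off being minimized (the notation here is ours, not necessarily the paper's): if each weight is communicated using a Gaussian posterior \(\mathcal{N}(\mu_i,\sigma_i^2)\) and a shared Gaussian prior \(\mathcal{N}(\mu_0,\sigma_0^2)\), the bits-back cost of a weight is the KL divergence between the two, and learning adjusts the means and noise levels to minimize an objective of the form

\[
\mathcal{L} \;=\; \mathbb{E}_{w \sim q}\!\left[\sum_c \lVert y_c - f_w(x_c)\rVert^2\right]
\;+\; \lambda \sum_i D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_i,\sigma_i^2)\,\big\|\,\mathcal{N}(\mu_0,\sigma_0^2)\right),
\]

where the per-weight information cost has the closed form

\[
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_i,\sigma_i^2)\,\big\|\,\mathcal{N}(\mu_0,\sigma_0^2)\right)
= \log\frac{\sigma_0}{\sigma_i} + \frac{\sigma_i^2 + (\mu_i-\mu_0)^2}{2\sigma_0^2} - \frac{1}{2}.
\]

The trade-off coefficient \(\lambda\) is an illustrative stand-in for the paper's weighting of data misfit against description length.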
Supervised neural networks generalize well if there is much less information in the weights than there is in the output vectors of the training cases. So during learning, it is important to keep the weights simple by penalizing the amount of information they contain. The amount of information in a weight can be controlled by adding Gaussian noise and the noise level can be adapted during learning to optimize the trade-off between the expected squared error of the network and the amount of information in the weights. We describe a method of computing the derivatives of the expected squared error and of the amount of information in the noisy weights in a network that contains a layer of non-linear hidden units. Provided the output units are linear, the exact derivatives can be computed efficiently without time-consuming Monte Carlo simulations. The idea of minimizing the amount of information that is required to communicate the weights of a neural network leads to a number of interesting schemes for encoding the weights.
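The sketch below illustrates the kind of objective the abstract describes: a one-hidden-layer network whose weights carry Gaussian noise, trained to balance expected squared error against the information content of the weights. It is not the paper's exact algorithm; the paper computes the expected error and its derivatives exactly when the output units are linear, whereas this sketch uses a simple Monte Carlo estimate, and the names `lambda_`, `prior_sigma`, and `n_samples` are illustrative assumptions.

```python
# Minimal sketch of a noisy-weight objective, assuming a fixed Gaussian prior
# on each weight and a Monte Carlo estimate of the expected squared error.
import numpy as np

rng = np.random.default_rng(0)

def kl_gaussian(mu, sigma, prior_sigma):
    """Information (in nats) needed to communicate one noisy weight:
    KL( N(mu, sigma^2) || N(0, prior_sigma^2) )."""
    return (np.log(prior_sigma / sigma)
            + (sigma**2 + mu**2) / (2.0 * prior_sigma**2) - 0.5)

def expected_squared_error(x, y, mu_h, sig_h, mu_o, sig_o, n_samples=20):
    """Squared error averaged over samples of the weight noise.
    mu_h/sig_h: hidden-layer weight means and noise std devs;
    mu_o/sig_o: output-layer weight means and noise std devs."""
    total = 0.0
    for _ in range(n_samples):
        w_h = mu_h + sig_h * rng.standard_normal(mu_h.shape)
        w_o = mu_o + sig_o * rng.standard_normal(mu_o.shape)
        hidden = np.tanh(x @ w_h)   # non-linear hidden units
        pred = hidden @ w_o         # linear output units
        total += np.mean((pred - y) ** 2)
    return total / n_samples

def objective(x, y, mu_h, sig_h, mu_o, sig_o, prior_sigma=1.0, lambda_=0.01):
    """Trade-off minimized during learning: expected data misfit plus a
    penalty on the amount of information the noisy weights contain."""
    info = (kl_gaussian(mu_h, sig_h, prior_sigma).sum()
            + kl_gaussian(mu_o, sig_o, prior_sigma).sum())
    return expected_squared_error(x, y, mu_h, sig_h, mu_o, sig_o) + lambda_ * info

# Tiny usage example with random data and Gaussian weight posteriors.
x = rng.standard_normal((32, 5))
y = rng.standard_normal((32, 1))
mu_h, sig_h = 0.1 * rng.standard_normal((5, 8)), np.full((5, 8), 0.1)
mu_o, sig_o = 0.1 * rng.standard_normal((8, 1)), np.full((8, 1), 0.1)
print("objective:", objective(x, y, mu_h, sig_h, mu_o, sig_o))
```

In this formulation the noise levels `sig_h` and `sig_o` are free parameters: raising them lowers the information cost of a weight while raising the expected error, which is exactly the trade-off the abstract says is adapted during learning.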