Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • claude-code
  • course
  • deepmind
  • deepseek
  • engineering
  • finance
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-booting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ocr
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • robotics
  • science
  • security
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

Keeping NN Simple by Minimizing the Description Length of the Weights

1993
Geoffrey E. Hinton, Drew van Camp

This paper proposes minimizing the information content in neural network weights to enhance generalization, particularly when training data is scarce. It introduces a method where adaptable Gaussian noise is added to the weights, balancing the expected squared error against the amount of information the weights contain. Leveraging the Minimum Description Length (MDL) principle and a "bits back" argument for communicating these noisy weights, the approach enables efficient derivative computations, especially if output units are linear. The paper also explores using adaptive mixtures of Gaussians for more flexible prior distributions for weight coding. Preliminary results indicated a slight improvement over simple weight-decay on a high-dimensional task.

foundation · 30u30 · paper
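
A minimal sketch of the trade-off this entry describes, assuming a PyTorch-style setup: each weight is a Gaussian with learnable mean and variance, and the training loss adds the expected squared error to a KL "description length" against a fixed Gaussian prior. The names and hyperparameters below are illustrative, not taken from the paper's experiments.

```python
# Illustrative sketch of the "noisy weights" objective (not the paper's code):
# each weight is a Gaussian q(w) = N(mu, sigma^2); the loss trades data misfit
# against the information the weights carry, measured as KL(q || prior).
import torch

def noisy_linear(x, mu, log_sigma):
    """Sample weights w ~ N(mu, sigma^2) via the reparameterization trick."""
    w = mu + log_sigma.exp() * torch.randn_like(mu)
    return x @ w

def description_length(mu, log_sigma, prior_sigma=1.0):
    """KL(N(mu, sigma^2) || N(0, prior_sigma^2)) summed over weights (nats)."""
    sigma2 = (2 * log_sigma).exp()
    return 0.5 * torch.sum(
        sigma2 / prior_sigma**2
        + mu**2 / prior_sigma**2
        - 1.0
        - 2 * log_sigma
        + 2 * torch.log(torch.tensor(prior_sigma))
    )

# toy regression: balance expected squared error against weight information
x, y = torch.randn(64, 10), torch.randn(64, 1)
mu = torch.zeros(10, 1, requires_grad=True)
log_sigma = torch.full((10, 1), -3.0, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    err = ((noisy_linear(x, mu, log_sigma) - y) ** 2).sum()
    loss = err + description_length(mu, log_sigma)  # the trade-off described above
    loss.backward()
    opt.step()
```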

A Tutorial Introduction to the Minimum Description Length Principle

2004
Peter Grünwald

This paper gives a concise tutorial on MDL, unifying its intuitive and formal foundations and inspiring widespread use of MDL in statistics and machine learning.

foundation · 30u30 · paper · math
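
The crude two-part form of MDL that the tutorial builds up from can be stated compactly (standard textbook notation, not a quotation from the paper):

```latex
% Two-part MDL: pick the hypothesis that minimizes the total code length of
% the hypothesis plus the data encoded with the hypothesis's help.
H_{\mathrm{MDL}} \;=\; \arg\min_{H \in \mathcal{H}} \big[\, L(H) + L(D \mid H) \,\big]
```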

Machine Super Intelligence by Shane Legg

2011
Shane Legg

This book develops a formal theory of intelligence, defining it as an agent’s capacity to achieve goals across computable environments and grounding the concept in Kolmogorov complexity, Solomonoff induction and Hutter’s AIXI framework. It shows how these idealised constructs unify prediction, compression and reinforcement learning, yielding a universal intelligence measure while exposing the impracticality of truly optimal agents due to incomputable demands. Finally, it explores how approximate implementations could trigger an intelligence explosion and stresses the profound ethical and existential stakes posed by machines that surpass human capability.

foundation · 30u30 · book
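
The thesis's central definition, the Legg–Hutter universal intelligence measure, weights an agent π's expected value in each computable environment μ by that environment's Kolmogorov complexity K(μ):

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

Here E is the class of computable, reward-bounded environments and V^π_μ is the expected cumulative reward of policy π in environment μ, so simpler environments count for more while truly optimal agents remain incomputable.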

The First Law of Complexodynamics

2011
Scott Aaronson

This post explores why physical systems’ “complexity” rises, peaks, then falls over time, unlike entropy, which always increases. Using Kolmogorov complexity and the notion of “sophistication,” the author proposes a formal way to capture this pattern, introducing the idea of “complextropy” — a complexity measure that’s low in both highly ordered and fully random states but peaks during intermediate, evolving phases. He suggests using computational resource bounds to make the measure meaningful and proposes both theoretical and empirical (e.g., using file compression) approaches to test this idea, acknowledging it as an open problem.

foundation · blog · 30u30 · tutorial

ImageNet Classification with Deep Convolutional Neural Networks

2012
Alex Krizhevsky, Ilya Sutskever +1

The 2012 paper “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, Sutskever, and Hinton introduced AlexNet, a deep CNN that dramatically improved image classification accuracy on ImageNet, cutting the top-5 error rate from roughly 26% to 15%. Its innovations, including ReLU activations, dropout, GPU training, and data augmentation, sparked the deep learning revolution, laying the foundation for modern computer vision and advancing AI across industries.

vision · 30u30 · paper · foundation
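
A minimal PyTorch sketch of two of the ingredients this entry names, ReLU activations and dropout, inside a small convolutional classifier; the layer sizes are illustrative and this is not the actual AlexNet architecture (torchvision's `torchvision.models.alexnet` provides a reference implementation of the real one).

```python
# Minimal convolutional classifier showing ReLU activations and dropout,
# two ingredients popularized by AlexNet (illustrative, not the real network).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),                         # non-saturating activation
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(4),           # keep the classifier head small
    nn.Flatten(),
    nn.Dropout(p=0.5),                 # dropout before the fully connected layer
    nn.Linear(64 * 4 * 4, 1000),       # 1000 ImageNet classes
)

x = torch.randn(8, 3, 224, 224)        # a batch of ImageNet-sized inputs
logits = model(x)                      # shape: (8, 1000)
```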

Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton

2014
Scott Aaronson, Sean M. Carroll +1

This paper proposes a quantitative framework for the rise-and-fall trajectory of complexity in closed systems, showing that a coffee-and-cream cellular automaton exhibits a bell-curve of apparent complexity when particles interact, thereby linking information theory with thermodynamics and self-organization.

foundation · 30u30 · paper · physics · science
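
A toy version of the experiment described here, under illustrative assumptions (grid size, swap rule, and gzip as the compression proxy are my choices, not the paper's exact setup): mix a two-fluid grid by random local exchanges and track the compressed size of a coarse-grained snapshot as a stand-in for apparent complexity, which is expected to rise and then fall.

```python
# Toy "coffee and cream" automaton: two fluids mixed by random local swaps;
# apparent complexity is proxied by the gzip size of a coarse-grained snapshot.
import zlib
import numpy as np

rng = np.random.default_rng(0)
N, BLOCK = 128, 8
grid = np.zeros((N, N), dtype=np.uint8)
grid[: N // 2] = 1                                   # cream on top, coffee below

def mix_sweep(grid, offset):
    """Swap a random half of the disjoint adjacent pairs, vertically then
    horizontally; `offset` alternates the pairing so every bond gets mixed."""
    idx = np.arange(offset, N - 1, 2)
    mask = rng.random((idx.size, N)) < 0.5
    a, b = grid[idx], grid[idx + 1]
    grid[idx], grid[idx + 1] = np.where(mask, b, a), np.where(mask, a, b)
    mask = rng.random((N, idx.size)) < 0.5
    a, b = grid[:, idx], grid[:, idx + 1]
    grid[:, idx], grid[:, idx + 1] = np.where(mask, b, a), np.where(mask, a, b)

def apparent_complexity(grid):
    """gzip size (bytes) of the coarse-grained, three-level picture."""
    coarse = grid.reshape(N // BLOCK, BLOCK, N // BLOCK, BLOCK).mean(axis=(1, 3))
    levels = np.digitize(coarse, [1 / 3, 2 / 3]).astype(np.uint8)
    return len(zlib.compress(levels.tobytes(), 9))

for sweep in range(40_001):
    mix_sweep(grid, offset=sweep % 2)
    if sweep % 4_000 == 0:
        print(sweep, apparent_complexity(grid))     # low, then high, then low again
```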

Neural Machine Translation by Jointly Learning to Align and Translate

2014
Dzmitry Bahdanau, Kyunghyun Cho +1

This paper introduces an attention-based encoder–decoder NMT architecture that learns soft alignments between source and target words while translating, eliminating the fixed-length bottleneck of earlier seq2seq models. The approach substantially improves BLEU, especially on long sentences, and matches phrase-based SMT on English-French without additional hand-engineered features. The attention mechanism it proposes became the foundation for virtually all subsequent NMT systems and inspired attention-centric models like the Transformer, reshaping machine translation and sequence modeling across NLP.

30u30 · paper · NLP · translation
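
A NumPy sketch of the paper's additive attention for a single decoder step: alignment scores between the previous decoder state and each encoder state are softmaxed into weights, which form a context vector. Matrix names and sizes are illustrative; in the real model they are learned.

```python
# Additive (Bahdanau-style) attention for one decoder step, in NumPy.
import numpy as np

rng = np.random.default_rng(0)
T, enc_dim, dec_dim, attn_dim = 12, 16, 16, 32     # illustrative sizes

H = rng.standard_normal((T, enc_dim))              # encoder states h_1..h_T
s = rng.standard_normal(dec_dim)                   # previous decoder state s_{t-1}
W_a = rng.standard_normal((attn_dim, dec_dim))     # learned in the real model
U_a = rng.standard_normal((attn_dim, enc_dim))
v_a = rng.standard_normal(attn_dim)

# e_j = v_a^T tanh(W_a s_{t-1} + U_a h_j)  -- alignment scores
e = np.tanh(W_a @ s + H @ U_a.T) @ v_a             # shape (T,)
alpha = np.exp(e - e.max()); alpha /= alpha.sum()  # softmax over source positions
context = alpha @ H                                # soft-aligned context vector
```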

Recurrent Neural Network Regularization

2014
Wojciech Zaremba, Ilya Sutskever +1

This paper presents a method for applying dropout regularization to LSTMs by restricting it to non-recurrent connections, solving prior issues with overfitting in recurrent networks. It significantly improves generalization across diverse tasks including language modeling, speech recognition, machine translation, and image captioning. The technique allows larger RNNs to be effectively trained without compromising their ability to memorize long-term dependencies. This work helped establish dropout as a viable regularization strategy for RNNs and influenced widespread adoption in sequence modeling applications.

foundation · 30u30 · paper
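
In PyTorch terms, the recipe amounts to applying dropout only between stacked LSTM layers and on the final output, never on the recurrent hidden-to-hidden path; `nn.LSTM`'s `dropout` argument already behaves this way for multi-layer stacks. A sketch with illustrative sizes:

```python
# Dropout on non-recurrent connections only, in the spirit of Zaremba et al.
import torch
import torch.nn as nn

vocab, embed, hidden = 10_000, 256, 512          # illustrative sizes

embedding = nn.Embedding(vocab, embed)
drop_in = nn.Dropout(0.5)                        # input -> first LSTM layer
lstm = nn.LSTM(embed, hidden, num_layers=2, dropout=0.5, batch_first=True)
drop_out = nn.Dropout(0.5)                       # last LSTM layer -> decoder
decoder = nn.Linear(hidden, vocab)

tokens = torch.randint(0, vocab, (32, 35))       # batch of token ids
x = drop_in(embedding(tokens))
h, _ = lstm(x)                                   # recurrent connections untouched
logits = decoder(drop_out(h))                    # shape: (32, 35, vocab)
```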

Neural Turing Machines

2014
Alex Graves, Greg Wayne +1

This paper augments recurrent neural networks with a differentiable external memory addressed by content and location attention. Trained end-to-end, it learns algorithmic tasks like copying, sorting and associative recall from examples, proving that neural nets can induce simple programs. The idea sparked extensive work on memory-augmented models, differentiable computers, neural program synthesis and modern attention mechanisms.

foundation · 30u30 · paper
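
A NumPy sketch of the content-based addressing step this entry alludes to: the controller's key is compared to every memory row by cosine similarity, sharpened by a key strength, and softmaxed into differentiable read weights. Sizes are illustrative; the location-based addressing and write heads are omitted.

```python
# Content-based addressing over an external memory (Neural Turing Machine style).
import numpy as np

rng = np.random.default_rng(0)
N, M = 128, 20                         # memory rows x row width (illustrative)
memory = rng.standard_normal((N, M))
key = rng.standard_normal(M)           # emitted by the controller
beta = 5.0                             # key strength (sharpening)

def cosine(a, B, eps=1e-8):
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + eps)

scores = beta * cosine(key, memory)
w = np.exp(scores - scores.max())
w /= w.sum()                           # attention weights over memory rows
read_vector = w @ memory               # differentiable "read" from memory
```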

CS231n: Deep Learning for Computer Vision

2015
Fei-Fei Li

Stanford’s 10-week CS231n dives from first principles to state-of-the-art vision research, starting with image-classification basics, loss functions and optimization, then building from fully-connected nets to modern CNNs, residual and vision-transformer architectures. Lectures span training tricks, regularization, visualization, transfer learning, detection, segmentation, video, 3-D and generative models. Three hands-on PyTorch assignments guide students from k-NN/SVM through deep CNNs and network visualization, and a capstone project lets teams train large-scale models on a vision task of their choice, graduating with the skills to design, debug and deploy real-world deep-learning pipelines.

foundation · vision · 30u30 · course · tutorial
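
In the spirit of the early assignments mentioned above (loss functions and optimization before deep nets), here is a vectorized softmax cross-entropy loss with its gradient in NumPy; this is a generic illustration, not code from the course materials.

```python
# Assignment-style vectorized softmax cross-entropy loss with L2 regularization.
import numpy as np

def softmax_loss(W, X, y, reg=1e-3):
    """W: (D, C) weights, X: (N, D) inputs, y: (N,) integer class labels."""
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    N = X.shape[0]
    loss = -np.log(probs[np.arange(N), y]).mean() + reg * np.sum(W * W)
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dW = X.T @ dscores / N + 2 * reg * W              # gradient for SGD
    return loss, dW

rng = np.random.default_rng(0)
X, y = rng.standard_normal((64, 3072)), rng.integers(0, 10, 64)
W = 0.001 * rng.standard_normal((3072, 10))
loss, dW = softmax_loss(W, X, y)
```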

The Unreasonable Effectiveness of Recurrent Neural Networks

2015
Andrej Karpathy

This tutorial explores the surprising capabilities of Recurrent Neural Networks (RNNs), particularly in generating coherent text character by character. It delves into how RNNs, especially when implemented with Long Short-Term Memory (LSTM) units, can learn complex patterns and structures in data, enabling them to produce outputs that mimic the style and syntax of the training material. The discussion includes the architecture of RNNs, their ability to handle sequences of varying lengths, and the challenges associated with training them, such as the vanishing gradient problem. Through various examples, the tutorial illustrates the potential of RNNs in tasks like language modeling and sequence prediction.

30u30 · foundation · blog · tutorial
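
A minimal NumPy sketch of the character-level sampling loop the post describes: a vanilla RNN updates a hidden state, produces a distribution over the next character, samples it, and feeds the sample back in. The random, untrained weights below stand in for a trained model.

```python
# Character-level sampling from a vanilla RNN (untrained weights as stand-ins).
import numpy as np

rng = np.random.default_rng(0)
chars = list("abcdefghijklmnopqrstuvwxyz ")
V, H = len(chars), 64

Wxh = rng.standard_normal((H, V)) * 0.01
Whh = rng.standard_normal((H, H)) * 0.01
Why = rng.standard_normal((V, H)) * 0.01
bh, by = np.zeros(H), np.zeros(V)

def sample(seed_ix, n):
    """Generate n characters, feeding each sample back as the next input."""
    x = np.zeros(V); x[seed_ix] = 1
    h = np.zeros(H)
    out = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h + bh)     # recurrent state update
        logits = Why @ h + by
        p = np.exp(logits - logits.max()); p /= p.sum()
        ix = rng.choice(V, p=p)                 # sample the next character
        x = np.zeros(V); x[ix] = 1
        out.append(chars[ix])
    return "".join(out)

print(sample(seed_ix=0, n=80))
```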

Understanding LSTM Networks

2015
Christopher Olah

This tutorial explains how Long Short-Term Memory (LSTM) networks address the limitations of traditional Recurrent Neural Networks (RNNs), particularly their difficulty in learning long-term dependencies due to issues like vanishing gradients. LSTMs introduce a cell state that acts as a conveyor belt, allowing information to flow unchanged, and utilize gates (input, forget, and output) to regulate the addition, removal, and output of information. This architecture enables LSTMs to effectively capture and maintain long-term dependencies in sequential data.

foundation · blog · 30u30 · tutorial
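
The gate structure the post walks through, in its standard form:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{candidate cell state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state (the conveyor belt)} \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{hidden state / output}
\end{aligned}
```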