LogoAIAny
Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • ai-agent
  • ai-coding
  • ai-image
  • ai-tools
  • ai-video
  • AIGC
  • alibaba
  • anthropic
  • audio
  • blog
  • book
  • chatbot
  • chemistry
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • google
  • LLM
  • math
  • NLP
  • openai
  • paper
  • physics
  • plugin
  • RL
  • science
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI

ImageNet Classification with Deep Convolutional Neural Networks

2012
Alex Krizhevsky, Ilya Sutskever +1

The 2012 paper “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, Sutskever, and Hinton introduced AlexNet, a deep CNN that dramatically improved image classification accuracy on ImageNet, cutting the top-5 error rate from ~26% to ~15%. Its innovations, including ReLU activations, dropout, GPU training, and data augmentation, sparked the deep learning revolution, laying the foundation for modern computer vision and advancing AI across industries.

vision, 30u30, paper, foundation
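The ReLU and dropout tricks named above can be sketched in a few lines of plain Python (an illustrative toy, not AlexNet's actual implementation; the inverted-dropout rescaling shown here is the modern variant of the technique):

```python
import random

def relu(xs):
    """ReLU activation: clamp negatives to zero, pass positives unchanged."""
    return [max(0.0, x) for x in xs]

def dropout(xs, p=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero units with probability p during
    training and rescale survivors by 1/(1-p) so the expected activation
    matches test time, when dropout is a no-op."""
    if not training or p == 0.0:
        return list(xs)
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

acts = relu([-2.0, -0.5, 1.0, 3.0])  # negatives become 0.0
```

ReLU avoids the saturation of tanh/sigmoid units, and dropout regularizes the large fully connected layers; both were key to training AlexNet at scale.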

Generative Adversarial Networks

2014
Ian J. Goodfellow, Jean Pouget-Abadie +6

The 2014 paper “Generative Adversarial Nets” (GAN) by Ian Goodfellow et al. introduced a groundbreaking framework where two neural networks — a generator and a discriminator — compete in a minimax game: the generator tries to produce realistic data, while the discriminator tries to distinguish real from fake. This approach avoids Markov chains and approximate inference, relying solely on backpropagation. GANs revolutionized generative modeling, enabling realistic image, text, and audio generation, sparking massive advances in AI creativity, deepfake technology, and research on adversarial training and robustness.

vision, AIGC, paper, foundation
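The minimax game described above can be written down directly. A plain-Python sketch of the two players' objectives (function names are illustrative, not from the paper's code):

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator ascends log D(x) + log(1 - D(G(z)));
    as a loss, we minimize the negation."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss_minimax(d_fake):
    """Generator's original minimax objective: minimize log(1 - D(G(z)))."""
    return math.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    """Non-saturating heuristic from the paper: maximize log D(G(z)),
    which gives stronger gradients early in training."""
    return -math.log(d_fake)
```

At the game's equilibrium the discriminator outputs 0.5 everywhere, giving a discriminator loss of 2·log 2; the non-saturating generator loss falls as the generator fools the discriminator more often.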

CS231n: Deep Learning for Computer Vision

2015
Fei-Fei Li

Stanford’s 10-week CS231n dives from first principles to state-of-the-art vision research, starting with image-classification basics, loss functions and optimization, then building from fully-connected nets to modern CNNs, residual and vision-transformer architectures. Lectures span training tricks, regularization, visualization, transfer learning, detection, segmentation, video, 3-D and generative models. Three hands-on PyTorch assignments guide students from k-NN/SVM through deep CNNs and network visualization, and a capstone project lets teams train large-scale models on a vision task of their choice, graduating with the skills to design, debug and deploy real-world deep-learning pipelines.

foundation, vision, 30u30, course, tutorial

Multi-Scale Context Aggregation by Dilated Convolutions

2015
Fisher Yu, Vladlen Koltun

This paper introduces a novel module for semantic segmentation using dilated convolutions, which enables exponential expansion of the receptive field without losing resolution. By aggregating multi-scale contextual information efficiently, the proposed context module significantly improves dense prediction accuracy when integrated into existing architectures. The work has had a lasting impact on dense prediction and semantic segmentation, laying the foundation for many modern segmentation models.

30u30, paper, vision
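The exponential receptive-field growth is easy to verify with a back-of-the-envelope helper (`receptive_field` is an illustrative name, assuming stride-1 convolutions throughout):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated convolutions (stride 1):
    each layer widens the field by (kernel_size - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Doubling the dilation each layer grows the field exponentially in depth,
# while undilated 3x3 convolutions grow it only linearly.
print(receptive_field(3, [1, 2, 4, 8]))  # 31 = 2**(4+1) - 1
print(receptive_field(3, [1, 1, 1, 1]))  # 9
```

This is the paper's core observation: with dilations 1, 2, 4, ..., 2^(n-1), n layers of 3×3 convolutions cover a (2^(n+1) − 1)-pixel context without any pooling or loss of resolution.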

Deep Residual Learning for Image Recognition

2015
Kaiming He, Xiangyu Zhang +2

The paper “Deep Residual Learning for Image Recognition” (ResNet, 2015) introduced residual networks with shortcut connections, allowing very deep neural networks (over 100 layers) to be effectively trained by reformulating the learning task into residual functions (F(x) = H(x) − x). This innovation solved the degradation problem in deep models, achieving state-of-the-art results on ImageNet (winning ILSVRC 2015) and COCO challenges. Its impact reshaped the design of deep learning architectures across vision and non-vision tasks, becoming a foundational backbone in modern AI systems.

foundation, 30u30, paper, vision
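The residual reformulation y = x + F(x) can be illustrated with a toy block (a plain-Python sketch, not the paper's architecture; elementwise weights stand in for the conv/BN/ReLU stack):

```python
def residual_block(x, weights):
    """y = x + F(x): the block learns only the residual F. If the optimal
    mapping is the identity, driving the weights to zero suffices, which
    is far easier than learning an identity through stacked nonlinearities."""
    fx = [w * v for w, v in zip(weights, x)]  # stand-in for the conv stack
    return [v + r for v, r in zip(x, fx)]

# With zero weights the block is an exact identity mapping.
print(residual_block([1.0, 2.0], [0.0, 0.0]))  # [1.0, 2.0]
```

The shortcut connection is what lets 100+ layer networks train without the degradation that plagued plain deep stacks.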

Identity Mappings in Deep Residual Networks

2016
Kaiming He, Xiangyu Zhang +2

This paper shows that using identity mappings for skip connections and pre-activation in residual blocks allows signals to flow unimpeded, making it easier to train very deep networks. Through theoretical analysis and ablation studies, the authors introduce a pre-activation residual unit that enables successful training of 1000-layer ResNets and improves accuracy on CIFAR-10/100 and ImageNet; this design, commonly known as ResNet-v2, influenced numerous later deep vision architectures.

foundation, 30u30, paper, vision
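The "unimpeded signal flow" claim can be demonstrated numerically: with pure identity skips, the output of a deep stack is just the input plus a sum of residual terms, so information (and gradient) always has a direct additive path. A toy sketch (`forward` and the lambda residuals are illustrative, not the paper's code):

```python
def forward(x0, residual_fns):
    """With pure identity skips, x_{l+1} = x_l + F(x_l), so the final
    output is x_0 plus a sum of residuals: the input reaches any depth
    additively, and backpropagation always retains an identity term."""
    x = x0
    for f in residual_fns:
        x = x + f(x)
    return x

# Even 1000 residual steps leave the original signal intact when the
# residual branches contribute nothing.
print(forward(1.0, [lambda x: 0.0] * 1000))  # 1.0
```

This additive decomposition is exactly why pre-activation units (which keep the skip path free of BN/ReLU) train 1000-layer networks where post-activation variants stall.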

Variational Lossy Autoencoder

2016
Xi Chen, Diederik P. Kingma +6

This paper proposes the Variational Lossy Autoencoder (VLAE), a VAE that uses autoregressive priors and decoders to deliberately discard local detail while retaining global structure. By limiting the receptive field of the PixelCNN decoder and employing autoregressive flows as the prior, the model forces the latent code to capture only high-level information, yielding controllable lossy representations. Experiments on MNIST, Omniglot, Caltech-101 Silhouettes and CIFAR-10 set new likelihood records for VAEs and demonstrate faithful global reconstructions with replaced textures. VLAE influenced research on representation bottlenecks, pixel-VAE hybrids, and state-of-the-art compression and generation benchmarks.

30u30, paper, vision

Midjourney

2022
Midjourney, Inc.

An independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.

ai-tools, ai-image, vision

Stable Diffusion

2022
Stability AI

Experience unparalleled image generation capabilities with SDXL Turbo and Stable Diffusion XL. Our models use shorter prompts and generate descriptive images with enhanced composition and realistic aesthetics.

ai-tools, ai-image, vision

Runway

2023
Runway AI, Inc.

With Runway Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame. Then, regenerate those elements from multiple perspectives and positions within your scenes.

ai-tools, ai-video, vision

Veo

2024
Google DeepMind

Veo is a state-of-the-art video generation model developed by Google DeepMind, designed to empower filmmakers and storytellers.

ai-tools, ai-video, vision

KlingAI

2024
Kuaishou Technology

Kling AI offers tools for creating imaginative images and videos, based on state-of-the-art generative AI methods.

ai-tools, ai-image, ai-video, vision