Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • ai-agent
  • ai-coding
  • ai-image
  • ai-tools
  • ai-video
  • AIGC
  • alibaba
  • anthropic
  • audio
  • blog
  • book
  • chatbot
  • chemistry
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • google
  • LLM
  • math
  • NLP
  • openai
  • paper
  • physics
  • plugin
  • RL
  • science
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI

Attention Is All You Need

2017
Ashish Vaswani, Noam Shazeer +6

The paper “Attention Is All You Need” (2017) introduced the Transformer — a novel neural architecture relying solely on self-attention, removing recurrence and convolutions. It revolutionized machine translation by dramatically improving training speed and translation quality (e.g., achieving 28.4 BLEU on English-German tasks), setting new state-of-the-art benchmarks. Its modular, parallelizable design opened the door to large-scale pretraining and fine-tuning, ultimately laying the foundation for modern large language models like BERT and GPT. This paper reshaped the landscape of NLP and deep learning, making attention-based models the dominant paradigm across many tasks.

Tags: NLP, LLM, AIGC, 30u30, paper +1
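
A minimal PyTorch sketch of the paper's central operation, scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; multi-head projections, masking conventions, and dropout are left out, so this is an illustration rather than the paper's full architecture.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (..., seq_q, seq_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)             # attention weights over keys
        return weights @ v                              # weighted sum of values

    # toy usage: batch of 1, sequence of 3 tokens, model width 8, self-attention
    x = torch.randn(1, 3, 8)
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # torch.Size([1, 3, 8])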

Relational recurrent neural networks

2018
Adam Santoro, Ryan Faulkner +8

This paper introduces a Relational Memory Core that embeds multi-head dot-product attention into recurrent memory to enable explicit relational reasoning. Evaluated on synthetic distance-sorting, program execution, partially-observable reinforcement learning and large-scale language-modeling benchmarks, it consistently outperforms LSTM and memory-augmented baselines, setting state-of-the-art results on WikiText-103, Project Gutenberg and GigaWord. By letting memories interact rather than merely store information, the approach substantially boosts sequential relational reasoning and downstream task performance.

Tags: foundation, 30u30, paper, NLP, LLM
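
A rough sketch (not the authors' code) of the core mechanism: a fixed set of memory slots is updated by multi-head dot-product attention over the concatenation of the memory and the new input, so memories interact instead of merely being stored; the paper's gated, LSTM-style update and normalization are omitted here.

    import torch
    import torch.nn as nn

    num_slots, slot_dim, heads = 4, 32, 4
    mha = nn.MultiheadAttention(embed_dim=slot_dim, num_heads=heads, batch_first=True)

    memory = torch.randn(1, num_slots, slot_dim)   # (batch, slots, dim)
    x_t = torch.randn(1, 1, slot_dim)              # one new input at step t

    # slots attend over [memory; input], reading from each other and the observation
    keys_values = torch.cat([memory, x_t], dim=1)
    attended, _ = mha(query=memory, key=keys_values, value=keys_values)
    memory = memory + attended                     # simplified, gating-free update
    print(memory.shape)                            # torch.Size([1, 4, 32])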

GPT-2: Language Models are Unsupervised Multitask Learners

2019
Alec Radford, Jeffrey Wu +4

This paper introduces GPT-2, showing that large-scale language models trained on diverse internet text can perform a wide range of natural language tasks in a zero-shot setting — without any task-specific training. By scaling up to 1.5 billion parameters and training on WebText, GPT-2 achieves state-of-the-art or competitive results on benchmarks like language modeling, reading comprehension, and question answering. Its impact has been profound, pioneering the trend toward general-purpose, unsupervised language models and paving the way for today’s foundation models in AI.

Tags: LLM, NLP, openai, paper
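
A minimal sketch of the zero-shot framing described above, assuming the Hugging Face transformers library and its public "gpt2" checkpoint are available; the "TL;DR:" prompt trick mirrors the paper's zero-shot summarization example, and the snippet is illustrative rather than a reproduction of the original evaluation setup.

    # Zero-shot summarization: the task is specified purely in the prompt,
    # and the pretrained model continues the text with no task-specific training.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    article = ("The Transformer replaced recurrence with self-attention, "
               "enabling highly parallel training and much larger models.")
    prompt = article + "\nTL;DR:"
    result = generator(prompt, max_new_tokens=20, do_sample=False)
    print(result[0]["generated_text"])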

Scaling Laws for Neural Language Models

2020
Jared Kaplan, Sam McCandlish +8

This paper reveals that language model performance improves predictably as you scale up model size, dataset size, and compute, following smooth power-law relationships. It shows that larger models are more sample-efficient, and that optimally efficient training uses very large models on moderate data, stopping well before convergence. The work provided foundational insights that influenced the development of massive models like GPT-3 and beyond, shaping how the AI community understands trade-offs between size, data, and compute in building ever-stronger models.

Tags: LLM, NLP, openai, 30u30, paper
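
A tiny illustration of the headline power law: the snippet evaluates L(N) = (N_c / N)^alpha_N at a few model sizes. The exponent and constant are the approximate values the paper reports for loss as a function of non-embedding parameter count, quoted here from memory as an assumption rather than an exact reproduction.

    # Approximate constants from the paper: alpha_N ~ 0.076, N_c ~ 8.8e13
    def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
        return (n_c / n_params) ** alpha_n

    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"{n:.0e} params -> predicted loss {loss_vs_params(n):.3f}")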

GPT-3: Language Models are Few-Shot Learners

2020
Tom B. Brown, Benjamin Mann +29

This paper introduces GPT-3, a 175-billion-parameter autoregressive language model that achieves impressive zero-shot, one-shot, and few-shot performance across diverse NLP tasks without task-specific fine-tuning. Its scale allows it to generalize from natural language prompts, rivaling or surpassing prior state-of-the-art models that require fine-tuning. The paper’s impact is profound: it demonstrated the power of scaling laws, reshaped research on few-shot learning, and sparked widespread adoption of large-scale language models, influencing advancements in AI applications, ethical debates, and commercial deployments globally.

Tags: LLM, NLP, openai, paper
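
A minimal sketch of few-shot (in-context) prompting as the paper frames it: a handful of demonstrations are packed into the prompt and the model completes the final query with no gradient updates. The English-to-French pairs mirror the paper's illustrative example; any autoregressive language model could consume the resulting prompt.

    demonstrations = [
        ("sea otter", "loutre de mer"),
        ("cheese", "fromage"),
        ("peppermint", "menthe poivrée"),
    ]
    query = "plush giraffe"

    prompt = "Translate English to French:\n"
    prompt += "\n".join(f"{en} => {fr}" for en, fr in demonstrations)
    prompt += f"\n{query} =>"
    print(prompt)  # feed this prompt to an autoregressive LM for a few-shot answer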

The Annotated Transformer

2022
Alexander Rush

This tutorial offers a detailed, line-by-line PyTorch implementation of the Transformer model introduced in "Attention Is All You Need." It elucidates the model's architecture—comprising encoder-decoder structures with multi-head self-attention and feed-forward layers—enhancing understanding through annotated code and explanations. This resource serves as both an educational tool and a practical guide for implementing and comprehending Transformer-based models.

Tags: NLP, LLM, 30u30, blog, tutorial
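
A short sketch in the spirit of the tutorial: the sinusoidal positional encoding added to token embeddings, written as a standalone PyTorch function rather than the tutorial's module-based, verbatim implementation.

    import math
    import torch

    def positional_encoding(max_len, d_model):
        # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...)
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        return pe  # (max_len, d_model), added to embeddings before the encoder

    print(positional_encoding(50, 16).shape)  # torch.Size([50, 16])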

GPT-4 Technical Report

2024
OpenAI, Josh Achiam +279

This paper introduces GPT-4, a large multimodal model that processes both text and images, achieving human-level performance on many academic and professional benchmarks like the bar exam and GRE. It significantly advances language understanding, multilingual capabilities, and safety alignment over previous models, outperforming GPT-3.5 by wide margins. Its impact is profound, setting new standards for natural language processing, enabling safer and more powerful applications, and driving critical research on scaling laws, safety, bias, and the societal implications of AI deployment.

Tags: LLM, NLP, openai, paper

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

2024
DeepSeek-AI, Aixin Liu +155

This paper presents DeepSeek-V2, a 236B-parameter open-source Mixture-of-Experts (MoE) language model that activates only 21B parameters per token, achieving top-tier bilingual (English and Chinese) performance with remarkable training cost savings (42.5%) and inference efficiency (5.76× throughput) compared to previous models. Its innovations—Multi-head Latent Attention (MLA) and DeepSeekMoE—reduce memory bottlenecks and boost specialization. The paper’s impact lies in advancing economical, efficient large-scale language modeling, pushing open-source models closer to closed-source leaders, and paving the way for future multimodal and AGI-aligned systems.

Tags: LLM, NLP, deepseek, paper
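
A generic top-k routing sketch, not DeepSeek's implementation, just to show why only a fraction of the parameters is active per token: a small router scores the experts, each token is sent to its top-k experts, and their outputs are combined with the gate weights. DeepSeekMoE's shared experts, fine-grained expert segmentation, and MLA are not modeled here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d_model, n_experts, top_k = 64, 8, 2
    router = nn.Linear(d_model, n_experts)
    experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

    x = torch.randn(10, d_model)                      # 10 tokens
    gate_probs = F.softmax(router(x), dim=-1)         # (10, n_experts)
    weights, idx = gate_probs.topk(top_k, dim=-1)     # each token routed to 2 experts

    out = torch.zeros_like(x)
    for t in range(x.size(0)):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])       # only k of n experts run per token
    print(out.shape)                                  # torch.Size([10, 64])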

DeepSeek-V3 Technical Report

2024
DeepSeek-AI, Aixin Liu +198

This paper introduces DeepSeek-V3, a 671B-parameter Mixture-of-Experts (MoE) language model that activates only 37B parameters per token for efficient training and inference. By leveraging innovations like Multi-head Latent Attention, auxiliary-loss-free load balancing, and multi-token prediction, it achieves top-tier performance across math, code, multilingual, and reasoning tasks. Despite its massive scale, DeepSeek-V3 maintains economical training costs and outperforms all other open-source models, achieving results comparable to leading closed-source models like GPT-4o and Claude-3.5, thereby significantly narrowing the open-source vs. closed-source performance gap.

Tags: NLP, LLM, deepseek, paper
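
A deliberately simplified sketch of the multi-token prediction idea, not the paper's sequential MTP modules: an auxiliary head is trained to predict the token two steps ahead alongside the usual next-token objective, densifying the training signal; the 0.3 loss weight is illustrative only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab, d_model, seq = 1000, 64, 16
    hidden = torch.randn(2, seq, d_model)        # stand-in for transformer outputs
    tokens = torch.randint(0, vocab, (2, seq))   # token ids for the same positions

    head_next = nn.Linear(d_model, vocab)        # predicts token t+1
    head_next2 = nn.Linear(d_model, vocab)       # extra head, predicts token t+2

    loss_next = F.cross_entropy(head_next(hidden[:, :-1]).transpose(1, 2), tokens[:, 1:])
    loss_next2 = F.cross_entropy(head_next2(hidden[:, :-2]).transpose(1, 2), tokens[:, 2:])
    loss = loss_next + 0.3 * loss_next2          # weighted auxiliary objective
    print(float(loss))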

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

2025
DeepSeek-AI, Daya Guo +198

This paper introduces DeepSeek-R1, a large language model that improves reasoning purely through reinforcement learning (RL), even without supervised fine-tuning. It shows that reasoning skills like chain-of-thought, self-reflection, and verification can naturally emerge from RL, achieving performance comparable to OpenAI’s top models. Its distilled smaller models outperform many open-source alternatives, democratizing advanced reasoning for smaller systems. The work impacts the field by proving RL-alone reasoning is viable and by open-sourcing both large and distilled models, opening new directions for scalable, cost-effective LLM training and future development in reasoning-focused AI systems.

Tags: NLP, LLM, deepseek, paper
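
A hedged sketch of the kind of rule-based, verifiable reward that drives this RL recipe: the model is rewarded for a checkably correct final answer and for exposing its reasoning in the expected format. The tag and answer formats below are illustrative stand-ins, not the paper's exact specification.

    import re

    def reasoning_reward(completion: str, reference_answer: str) -> float:
        # format reward: reasoning should appear inside <think>...</think>
        has_format = bool(re.search(r"<think>.*</think>", completion, re.DOTALL))
        # accuracy reward: compare the final labelled answer against the reference
        match = re.search(r"Answer:\s*(.+)\s*$", completion.strip())
        correct = bool(match) and match.group(1).strip() == reference_answer.strip()
        return 1.0 * correct + 0.2 * has_format

    sample = "<think>2 + 2 * 3 = 2 + 6 = 8</think>\nAnswer: 8"
    print(reasoning_reward(sample, "8"))   # 1.2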

Deep Dive into LLMs like ChatGPT

2025
Andrej Karpathy

The best introduction in the world to how large language models (LLMs) like ChatGPT work. It covers the three main stages of their training: pre-training on vast amounts of internet text, supervised fine-tuning to become helpful assistants, and reinforcement learning to improve problem-solving skills. The video also discusses LLM psychology, including why they hallucinate, how they use tools, and their limitations. Finally, it looks at future capabilities like multimodality and agent-like behavior.

Tags: LLM, video, ChatGPT, tutorial

How I use LLMs

2025
Andrej Karpathy

The best introduction to how to use LLMs like ChatGPT. It covers the basics of how LLMs work, including concepts like "tokens" and "context windows". The video then demonstrates practical applications, such as using LLMs for knowledge-based queries, and more advanced features like "thinking models" for complex reasoning. It also explores how LLMs can use external tools for internet searches and deep research. Finally, the video delves into the multimodal capabilities of LLMs, including their use of voice, images, and video.

Tags: LLM, video, ChatGPT, tutorial
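
A small illustration of the "tokens" and "context window" concepts the video explains, assuming the tiktoken tokenizer library is installed; the context-window size used below is an arbitrary example figure, not any specific model's limit.

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "LLMs read and write tokens, not characters."
    tokens = enc.encode(text)
    print(len(tokens), tokens[:8])           # token count and first few token ids

    context_window = 8192                    # example cap on tokens per request
    print(f"This prompt uses {len(tokens)} of {context_window} available tokens.")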