This paper (Bahdanau et al., 2015, "Neural Machine Translation by Jointly Learning to Align and Translate") introduces an attention-based encoder–decoder NMT architecture that learns soft alignments between source and target words while translating, eliminating the fixed-length-vector bottleneck of earlier seq2seq models. The approach substantially improves BLEU over the basic encoder–decoder baseline, especially on long sentences, and matches phrase-based SMT on English-to-French without additional hand-engineered features. The attention mechanism it proposes became the foundation for virtually all subsequent NMT systems and inspired attention-centric models such as the Transformer, reshaping machine translation and sequence modeling across NLP.
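For intuition, here is a minimal NumPy sketch of the additive (soft-alignment) attention step this summary describes: score every source annotation against the previous decoder state, normalize the scores into alignment weights, and form a context vector. The function name, parameter names (W_a, U_a, v_a), and single-query shapes are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(prev_decoder_state, encoder_states, W_a, U_a, v_a):
    """Score each source annotation against the previous decoder state,
    normalize into soft alignment weights, and build a context vector."""
    # encoder_states: (T_src, d_enc); prev_decoder_state: (d_dec,)
    # W_a: (d_attn, d_dec); U_a: (d_attn, d_enc); v_a: (d_attn,)
    scores = np.tanh(prev_decoder_state @ W_a.T + encoder_states @ U_a.T) @ v_a  # (T_src,)
    alphas = softmax(scores)              # one alignment weight per source position
    context = alphas @ encoder_states     # (d_enc,) weighted sum of annotations
    return context, alphas
```

The context vector is recomputed at every decoding step, which is what frees the model from squeezing the whole source sentence into one fixed vector.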
The paper “Attention Is All You Need” (Vaswani et al., 2017) introduced the Transformer, a neural architecture that relies solely on self-attention and dispenses with recurrence and convolutions entirely. It revolutionized machine translation by dramatically improving training speed and translation quality, reaching 28.4 BLEU on the WMT 2014 English-to-German task and setting new state-of-the-art benchmarks. Its modular, parallelizable design opened the door to large-scale pretraining and fine-tuning, ultimately laying the foundation for modern large language models like BERT and GPT. This paper reshaped the landscape of NLP and deep learning, making attention-based models the dominant paradigm across many tasks.
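A minimal sketch of the scaled dot-product attention at the core of the architecture, for a single head and a single sequence; the learned query/key/value projections, multi-head split, causal masking, and the rest of the encoder–decoder stack are omitted here.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one head and one sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (T_q, T_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (T_q, d_v) weighted values

# Self-attention: queries, keys, and values all come from the same sequence.
x = np.random.randn(5, 8)                           # 5 tokens, width 8
out = scaled_dot_product_attention(x, x, x)
```

Because every position attends to every other position in one matrix product, the whole sequence can be processed in parallel, which is the source of the training-speed gains the summary mentions.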
This paper (Santoro et al., 2018, "Relational Recurrent Neural Networks") introduces a Relational Memory Core that embeds multi-head dot-product attention into recurrent memory to enable explicit relational reasoning. Evaluated on a synthetic distance-sorting task, program execution, partially observable reinforcement learning, and large-scale language-modeling benchmarks, it consistently outperforms LSTM and memory-augmented baselines, setting state-of-the-art results on WikiText-103, Project Gutenberg, and GigaWord. By letting memories interact rather than merely store information, the approach substantially boosts sequential relational reasoning and downstream task performance.
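A single-head sketch of the attention-over-memory step this summary refers to, under the assumption of square projection matrices: each memory slot queries all slots plus the new input, so memories interact rather than being updated in isolation. The full Relational Memory Core additionally uses multiple heads, a per-slot MLP, layer normalization, and LSTM-style gating, all omitted here.

```python
import numpy as np

def row_softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def relational_memory_step(memory, inputs, Wq, Wk, Wv):
    """Each memory slot attends over all slots plus the new inputs."""
    # memory: (N_slots, d); inputs: (N_in, d); Wq, Wk, Wv: (d, d)
    mem_and_input = np.concatenate([memory, inputs], axis=0)
    Q = memory @ Wq                    # queries come from the memory slots only
    K = mem_and_input @ Wk
    V = mem_and_input @ Wv
    attn = row_softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return attn @ V                    # proposed new memory, (N_slots, d)
```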
The BERT (Bidirectional Encoder Representations from Transformers) paper introduced a powerful pre-trained language model that uses deep bidirectional Transformers and masked language modeling to capture both left and right context. Unlike prior unidirectional models, BERT achieved state-of-the-art performance across 11 NLP tasks (including the GLUE benchmark and SQuAD) while requiring only minimal task-specific adjustments at fine-tuning time. Its impact reshaped NLP by setting a new standard for transfer learning, greatly improving accuracy on tasks such as question answering, sentiment analysis, and natural language inference, and inspiring a wave of follow-up models like RoBERTa, ALBERT, and T5.
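A small sketch of the masked-language-modeling corruption scheme BERT pre-trains on: the 15% selection rate and the 80/10/10 replacement split follow the paper, while the helper name, vocabulary handling, and data layout here are illustrative assumptions.

```python
import random

def mask_for_mlm(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Select ~15% of positions as prediction targets; replace 80% of them
    with [MASK], 10% with a random token, and leave 10% unchanged."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok                 # the model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_token
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: keep the original token (still predicted)
    return corrupted, targets
```

Because the target token is hidden rather than merely to the left of the prediction point, the encoder can condition on both left and right context, which is what distinguishes this objective from the unidirectional language modeling used by earlier models.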