AIAny - AI Agent Papers

Large Language Model Papers · 2022

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao +5 · Google Research, Princeton University

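Interleaves chain-of-thought reasoning traces with tool-using actions in a single LLM loop: the model reasons about what to do, acts (for example, querying a Wikipedia API), and revises its plan from the returned observation. Reduces hallucination relative to reasoning-only prompting on HotpotQA and FEVER, and outperforms imitation- and reinforcement-learning agents on the interactive ALFWorld and WebShop benchmarks by 34 and 10 absolute points of success rate, with only one or two in-context examples.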

paper LLM NLP ai-agent google
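A minimal sketch of the Thought/Action/Observation loop, assuming a scripted stand-in for the LLM and a toy in-memory `search` tool in place of the Wikipedia API. The `Search[...]`/`Finish[...]` action syntax is modeled on the paper's prompts; `scripted_model`, `search`, and `PASSAGES` are hypothetical.

```python
from typing import Callable

# Toy knowledge source standing in for the Wikipedia API (assumption).
PASSAGES = {
    "Apple Remote": "The Apple Remote was originally designed to control the Front Row media program.",
}


def search(entity: str) -> str:
    """Toy tool: return the stored passage for an entity, or a not-found message."""
    return PASSAGES.get(entity, f"Could not find {entity}.")


def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for the LLM: emits one Thought and one Action per call."""
    if "Observation:" not in transcript:
        return "Thought: I need to look up the Apple Remote.\nAction: Search[Apple Remote]"
    return "Thought: The observation names the program.\nAction: Finish[Front Row]"


def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    """Interleave reasoning and acting until the model emits Finish[...] or steps run out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)                        # Thought + Action text
        transcript += step + "\n"
        action = step.rsplit("Action: ", 1)[-1]         # e.g. "Search[Apple Remote]"
        name, _, arg = action.partition("[")
        arg = arg.rstrip("]")
        if name == "Finish":
            return arg                                  # final answer
        if name == "Search":
            transcript += f"Observation: {search(arg)}\n"  # feed the tool result back in
    return "no answer"


print(react("Which program was the Apple Remote designed to control?", scripted_model))
```

Swapping `scripted_model` for a real model call and `search` for a real retrieval API gives the shape of the loop; the transcript grows by one observation per tool call.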

AI Agent Papers · 2024

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

John Yang, Carlos E. Jimenez +5 · Princeton Language and Intelligence, Princeton University

Treats the interface between an LM agent and a computer as a design variable. A custom agent-computer interface (ACI) offers concise commands for editing files, navigating the repository, and running tests, and returns compact feedback; with it, the agent reaches 12.5% pass@1 on SWE-bench and 87.7% on HumanEvalFix.

paper ai-agent LLM ai-coding engineering
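The ACI idea of concise commands with compact feedback can be illustrated with a windowed file viewer and a line-range edit command. This is a sketch assuming a 100-line default window and 1-based inclusive line ranges; `open_window` and `edit_lines` are hypothetical helpers, not SWE-agent's implementation.

```python
def open_window(lines: list[str], first: int, window: int = 100) -> str:
    """Render `window` numbered lines starting near 1-based line `first`, with context hints."""
    total = len(lines)
    # Clamp so the window stays inside the file (and starts at line 1 for short files).
    first = max(1, min(first, total - window + 1))
    last = min(total, first + window - 1)
    out = [f"[File: {total} lines total]"]
    if first > 1:
        out.append(f"({first - 1} more lines above)")
    out += [f"{n}:{lines[n - 1]}" for n in range(first, last + 1)]
    if last < total:
        out.append(f"({total - last} more lines below)")
    return "\n".join(out)


def edit_lines(lines: list[str], start: int, end: int, new: list[str]) -> list[str]:
    """Replace 1-based inclusive lines start..end with `new`; reject out-of-range edits."""
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"invalid range {start}-{end} for a {len(lines)}-line file")
    return lines[: start - 1] + new + lines[end:]


src = [f"line {i}" for i in range(1, 251)]
print(open_window(src, 1, window=3))
```

The design choice being illustrated is that the agent never sees the whole file at once: it gets a bounded, numbered window plus a hint of what lies outside it, and edits are addressed by those line numbers.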

Large Language Model Papers · 2025

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-AI, Aixin Liu +262 · DeepSeek-AI

An open large language model that pairs DeepSeek Sparse Attention (DSA), for cheaper long-context inference, with a scaled-up reinforcement-learning pipeline. The authors report performance comparable to GPT-5; a high-compute variant, DeepSeek-V3.2-Speciale, reportedly surpasses GPT-5 and rivals Gemini-3.0-Pro on reasoning.

deepseek LLM paper RL ai-agent
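The selection idea behind sparse attention of this kind can be sketched as top-k attention: a cheap index score picks which tokens a query attends to, and ordinary softmax attention runs over that subset only. This is a sketch assuming a top-k selection rule; the `index_scores` input is a hypothetical stand-in for the paper's indexer, and causal masking and multiple heads are omitted.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def topk_sparse_attention(q: list[float], keys: list[list[float]], values: list[list[float]],
                          index_scores: list[float], k: int) -> list[float]:
    """Attend from query q to only the k keys with the highest index scores."""
    chosen = sorted(range(len(keys)), key=lambda j: index_scores[j], reverse=True)[:k]
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, keys[j]) * scale for j in chosen])
    dim = len(values[0])
    return [sum(w * values[j][d] for w, j in zip(weights, chosen)) for d in range(dim)]


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 1.0]]
q = [1.0, 0.5]
scores = [0.9, 0.1, 0.8, 0.2]                     # hypothetical indexer output
print(topk_sparse_attention(q, keys, values, scores, k=2))  # uses keys 0 and 2 only
```

The softmax and weighted sum now cost O(k) per query instead of O(L) for a length-L context; with k equal to the number of keys the function reduces to dense attention.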

Reinforcement Learning Papers · 2026

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Jiapeng Zhu, Jianxiang Yu +6

Combines internalizing general skills with task-specific skill utilization via a difficulty-aware router to improve in-distribution and out-of-distribution performance for agentic RL. Uses privileged distillation for hard tasks and diagnostic probing for easy tasks; evaluated on ALFWorld and WebShop.

agent-skills RL ai-agent paper
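A difficulty-aware router can be sketched as a threshold on an estimated per-task difficulty. Here difficulty is assumed to be a smoothed rollout failure rate; the estimator, the 0.5 threshold, and the branch names are illustrative assumptions, not the paper's criteria.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    successes: int   # successful rollouts on this task so far
    attempts: int    # total rollouts on this task so far


def difficulty(stats: TaskStats, prior: float = 1.0) -> float:
    """Smoothed failure rate in [0, 1]; an unseen task starts at 0.5 (assumed proxy)."""
    return 1.0 - (stats.successes + prior) / (stats.attempts + 2.0 * prior)


def route(stats: TaskStats, threshold: float = 0.5) -> str:
    """Send hard tasks to privileged distillation and easy ones to diagnostic probing."""
    return "privileged_distillation" if difficulty(stats) >= threshold else "diagnostic_probing"


batch = {"task-a": TaskStats(0, 8), "task-b": TaskStats(7, 8)}
print({name: route(s) for name, s in batch.items()})
```

With the `>=` comparison, tasks with no rollouts yet default to the hard branch; flipping that tie-break is a one-character change if easy-first routing is preferred.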

AI Agent Papers · 2026

A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks

Tomer Keren, Nitay Calderon +4

Proposes TASTE, an automatic pipeline that synthesizes challenging agent benchmark tasks by sampling and evolving valid tool-sequence patterns; uses an adaptive contrastive n-gram model and LLM validity judgments to produce τ^c-Bench with broader tool-use coverage and higher difficulty.

agent-skills ai-agent paper LLM ai-rank
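One way to picture contrastive n-gram sampling over tool sequences is a bigram model whose transition weights are high when a transition is common among valid sequences and rare in what the benchmark already covers; updating the coverage counts after each sample makes it adaptive. This is a hypothetical interpretation, not TASTE's formulation, and sampled sequences would still need the LLM validity check.

```python
import random
from collections import Counter

BOS, EOS = "<s>", "</s>"


def bigrams(seq: list[str]) -> list[tuple[str, str]]:
    """Tool-call transitions, including start and end markers."""
    padded = [BOS] + seq + [EOS]
    return list(zip(padded, padded[1:]))


class ContrastiveBigramSampler:
    def __init__(self, valid_seqs, covered_seqs, alpha: float = 1.0):
        self.valid = Counter(bg for s in valid_seqs for bg in bigrams(s))      # allowed patterns
        self.covered = Counter(bg for s in covered_seqs for bg in bigrams(s))  # already tested
        self.alpha = alpha

    def weight(self, bg: tuple[str, str]) -> float:
        # High for transitions frequent in valid data but rarely covered so far.
        return self.valid[bg] / (1.0 + self.alpha * self.covered[bg])

    def sample(self, rng: random.Random, max_len: int = 8) -> list[str]:
        seq, prev = [], BOS
        for _ in range(max_len):
            options = [bg for bg in self.valid if bg[0] == prev]
            if not options:
                break
            nxt = rng.choices(options, weights=[self.weight(bg) for bg in options])[0][1]
            if nxt == EOS:
                break
            seq.append(nxt)
            prev = nxt
        self.covered.update(bigrams(seq))  # adaptive step: sampled patterns now count as covered
        return seq


valid = [["search", "open", "buy"], ["search", "compare", "buy"]]
covered = [["search", "open", "buy"]]
sampler = ContrastiveBigramSampler(valid, covered)
print(sampler.sample(random.Random(0)))
```

Because every step follows a transition seen in a valid sequence, the sampler only proposes locally plausible tool chains; whole-sequence validity is left to the judge.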

AI Agent Papers · 2026

Masking Stale Observations Helps Search Agents -- Until It Doesn't: A Regime Map and Its Mechanism

Haoxiang Zhang, Qixin Xu +5

Analyzes when and why masking stale observations improves long-horizon search agents, identifying an asymmetric inverted-U relationship linking masking benefit to retriever quality and model capacity. Explains the effect through a token-for-turn trade-off and releases evaluation scaffolds and trajectories.

paper code github nlp llm
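Observation masking can be sketched as replacing all but the most recent k tool observations with a short placeholder. That shrinks the context on every turn (the token side of the trade-off) at the cost of hiding older evidence. The keep-last-k policy, the placeholder text, and the word-count size proxy are assumptions for illustration.

```python
from dataclasses import dataclass

PLACEHOLDER = "[stale observation masked]"


@dataclass(frozen=True)
class Turn:
    role: str      # "assistant" (reasoning/action) or "tool" (observation)
    content: str


def mask_stale(history: list[Turn], keep_last: int = 2) -> list[Turn]:
    """Keep the newest `keep_last` tool observations verbatim; mask the older ones."""
    tool_positions = [i for i, t in enumerate(history) if t.role == "tool"]
    n_stale = max(len(tool_positions) - keep_last, 0)
    stale = set(tool_positions[:n_stale])
    return [Turn(t.role, PLACEHOLDER) if i in stale else t for i, t in enumerate(history)]


def context_words(history: list[Turn]) -> int:
    """Crude context-size proxy: whitespace-separated words across all turns."""
    return sum(len(t.content.split()) for t in history)


history = [
    Turn("assistant", "search A"), Turn("tool", "long doc about A " * 50),
    Turn("assistant", "search B"), Turn("tool", "doc about B"),
    Turn("assistant", "search C"), Turn("tool", "doc about C"),
]
print(context_words(history), "->", context_words(mask_stale(history)))
```

Reasoning and action turns are never masked, so the agent keeps a record of what it searched even when the retrieved text itself is gone.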

AI Agent Papers · 2026

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

Tianyi Zhou, Dongrui Liu +3

Automates the distillation of heterogeneous traces from a target person or role into versioned, inspectable skill packages for LLM agents, producing separate capability and bounded-behavior tracks that support natural-language corrections, rollback, and cross-host installation. Ships with an open system and a skills gallery.

agent-skills skillkit LLM nlp paper
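The versioning behavior can be sketched with an append-only history: each correction produces a new version of the capability and bounded-behavior tracks, and rollback restores an earlier version as a new entry so nothing is lost. `SkillPackage` and its methods are hypothetical; turning natural-language correction text into the structured change is assumed to happen elsewhere (for example, by an LLM).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SkillVersion:
    version: int
    capability: tuple[str, ...]   # what the agent should be able to do
    boundaries: tuple[str, ...]   # bounded-behavior rules it must respect
    note: str                     # correction text or reason for this version


@dataclass
class SkillPackage:
    name: str
    history: list[SkillVersion] = field(default_factory=list)

    @classmethod
    def create(cls, name: str, capability, boundaries) -> "SkillPackage":
        first = SkillVersion(1, tuple(capability), tuple(boundaries), "initial distillation")
        return cls(name, [first])

    def current(self) -> SkillVersion:
        return self.history[-1]

    def correct(self, note: str, add_capability=(), add_boundaries=()) -> SkillVersion:
        """Apply an already-structured correction as a new version."""
        cur = self.current()
        self.history.append(SkillVersion(cur.version + 1, cur.capability + tuple(add_capability),
                                         cur.boundaries + tuple(add_boundaries), note))
        return self.current()

    def rollback(self, version: int) -> SkillVersion:
        """Restore an earlier version's tracks as a new version; history stays intact."""
        matches = [v for v in self.history if v.version == version]
        if not matches:
            raise ValueError(f"unknown version {version}")
        old = matches[0]
        self.history.append(SkillVersion(self.current().version + 1, old.capability,
                                         old.boundaries, f"rollback to v{version}"))
        return self.current()


pkg = SkillPackage.create("code-review", ["review Python diffs"], ["never approve without tests"])
pkg.correct("also check type hints", add_capability=["flag missing type hints"])
print(pkg.rollback(1))
```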

AI Agent Papers · 2026

Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories

Jiaming Wang, Ziteng Feng +9

Localizes harmful span-level errors inside long research-agent trajectories, showing which segments make final answers unreliable. Introduces TELBench, 1,000 instances with annotated error spans, and DRIFT, a claim-centric auditing method that improves span-level localization and first-error accuracy by up to 30 percentage points.

agent-skills ai-agent LLM NLP paper
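To make the two reported quantities concrete, here is one plausible way to score span predictions on a trajectory of numbered steps: a step-level F1 over flagged steps, and a first-error accuracy that checks whether the earliest predicted error step matches the earliest annotated one. The exact metric definitions in TELBench may differ; these functions are illustrative.

```python
from typing import Optional

Span = tuple[int, int]   # inclusive (first_step, last_step) of an error span


def flagged_steps(spans: list[Span]) -> set[int]:
    return {i for start, end in spans for i in range(start, end + 1)}


def first_error(spans: list[Span]) -> Optional[int]:
    return min((start for start, _ in spans), default=None)


def step_f1(pred: list[Span], gold: list[Span]) -> float:
    """F1 between the sets of steps flagged by predicted and annotated spans."""
    p, g = flagged_steps(pred), flagged_steps(gold)
    if not p and not g:
        return 1.0                      # both say "no error": perfect agreement
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


def first_error_accuracy(preds: list[list[Span]], golds: list[list[Span]]) -> float:
    """Fraction of trajectories whose earliest predicted error step is exactly right."""
    hits = sum(first_error(p) == first_error(g) for p, g in zip(preds, golds))
    return hits / len(golds)


print(step_f1([(4, 6)], [(3, 5)]))
print(first_error_accuracy([[(4, 6)], [(2, 2)]], [[(3, 5)], [(2, 3)]]))
```

Under this reading, a gain of 30 percentage points means the accuracy fraction rises by 0.30, for example from 0.40 to 0.70.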

Reinforcement Learning Papers · 2026

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Pengcheng Jiang, Zhiyi Shi +6

A 20B retrieval subagent trained with reinforcement learning inside a stateful search harness that externalizes recoverable search state (candidate pool, curated evidence, verification records). The harness lets the policy focus on semantic search decisions, improving curated recall and transfer robustness.

RL ai-agent agent-skills vllm huggingface
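The externalized search state can be sketched as a small object that the harness, not the policy, maintains: a candidate pool, a curated evidence list, and verification records, rendered as a compact summary for the next prompt. The field names, the `render` format, and the curated-recall definition (fraction of gold documents that end up curated) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    candidates: dict[str, str] = field(default_factory=dict)   # doc_id -> snippet
    curated: list[str] = field(default_factory=list)           # doc_ids kept as evidence
    verified: dict[str, bool] = field(default_factory=dict)    # doc_id -> passed check?

    def add_results(self, results: dict[str, str]) -> None:
        for doc_id, snippet in results.items():
            self.candidates.setdefault(doc_id, snippet)         # dedupe across queries

    def curate(self, doc_id: str) -> None:
        if doc_id not in self.candidates:
            raise KeyError(f"{doc_id} was never retrieved")
        if doc_id not in self.curated:
            self.curated.append(doc_id)

    def record_check(self, doc_id: str, ok: bool) -> None:
        self.verified[doc_id] = ok

    def render(self) -> str:
        """Compact view handed to the policy instead of the full search history."""
        checks = ", ".join(f"{d}={'ok' if ok else 'fail'}" for d, ok in self.verified.items())
        return (f"candidates: {len(self.candidates)} | curated: {', '.join(self.curated) or '-'}"
                f" | checks: {checks or '-'}")


def curated_recall(state: SearchState, gold: set[str]) -> float:
    return len(set(state.curated) & gold) / len(gold)


state = SearchState()
state.add_results({"d1": "snippet one", "d2": "snippet two"})
state.curate("d1")
state.record_check("d1", True)
print(state.render())
```

Because bookkeeping such as deduplication and the evidence list lives outside the model, a lost or truncated context cannot silently drop curated documents.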

Computer Vision Papers · 2026

Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal +9

Omnimodal world model that jointly processes and generates text, images, video, audio, and action trajectories for physical AI. Uses a mixture-of-transformers to combine autoregressive reasoning and diffusion-based multimodal generation; released open-source with checkpoints, datasets and benchmarks for robotics and simulation.

foundation-model multimodal video image robotics
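Mixture-of-transformers designs generally keep separate weights per modality while running one self-attention over the whole mixed sequence. The sketch below shows only that routing idea, using 2-dimensional toy vectors and hypothetical per-modality Q/K/V projections; the weights are arbitrary, and Cosmos 3's actual layers, its diffusion-based generation, and causal masking are not modeled.

```python
import math

Matrix = list[list[float]]

# Hypothetical per-modality projection weights (2x2) for queries, keys, and values.
WEIGHTS: dict[str, dict[str, Matrix]] = {
    "text":  {"q": [[1.0, 0.0], [0.0, 1.0]], "k": [[1.0, 0.0], [0.0, 1.0]], "v": [[1.0, 0.0], [0.0, 1.0]]},
    "video": {"q": [[0.0, 1.0], [1.0, 0.0]], "k": [[0.0, 1.0], [1.0, 0.0]], "v": [[2.0, 0.0], [0.0, 2.0]]},
}


def matvec(m: Matrix, v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mot_attention(tokens: list[tuple[str, list[float]]]) -> list[list[float]]:
    """Project each token with its own modality's weights, then attend over all tokens."""
    q = [matvec(WEIGHTS[m]["q"], x) for m, x in tokens]
    k = [matvec(WEIGHTS[m]["k"], x) for m, x in tokens]
    v = [matvec(WEIGHTS[m]["v"], x) for m, x in tokens]
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Global attention: every token sees every other token, whatever its modality.
        w = softmax([sum(a * b for a, b in zip(qi, kj)) * scale for kj in k])
        out.append([sum(wj * vj[d] for wj, vj in zip(w, v)) for d in range(len(v[0]))])
    return out


print(mot_attention([("text", [1.0, 0.0]), ("video", [0.0, 1.0])]))
```

Parameters are split by modality, but information still flows across modalities through the shared attention step: changing a video token changes the text token's output.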

AI Agent Papers · 2026

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Junqi Liu, Salena Song +13

Workflow-aware benchmark for autonomous medical-AI research that splits agent execution into five stages (Plan, Setup, Validate, Inference, Submit) and evaluates long-horizon runs across segmentation, image enhancement, VQA, report generation, and lesion detection with stage-level scoring.

vision multimodal ai-agent agent-skills ai-workflow
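Stage-level scoring over the five stages can be sketched as per-stage scores combined into a run score. This sketch assumes that a stage scoring 0 blocks every later stage and that stages are weighted equally by default; both rules are assumptions for illustration, not AutoMedBench's rubric.

```python
from typing import Optional

STAGES = ("Plan", "Setup", "Validate", "Inference", "Submit")


def gated_stage_scores(results: dict[str, float]) -> dict[str, float]:
    """Per-stage scores in [0, 1]; once a stage scores 0, all later stages get 0."""
    scores, blocked = {}, False
    for stage in STAGES:
        score = 0.0 if blocked else float(results.get(stage, 0.0))
        scores[stage] = score
        blocked = blocked or score == 0.0
    return scores


def run_score(results: dict[str, float], weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of gated stage scores (equal weights unless given)."""
    scores = gated_stage_scores(results)
    weights = weights or {stage: 1.0 for stage in STAGES}
    return sum(weights[s] * scores[s] for s in STAGES) / sum(weights.values())


run = {"Plan": 1.0, "Setup": 1.0, "Validate": 0.0, "Inference": 1.0, "Submit": 1.0}
print(gated_stage_scores(run), run_score(run))
```

Reporting the per-stage dictionary alongside the single number is what makes the scoring stage-level: it shows where a long-horizon run broke down, not just whether it succeeded.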

AI Agent Papers · 2026

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Nahyun Lee, Dongkeun Yoon +13

A benchmark for evaluating web-browsing agents in Korean contexts, composed of 400 tasks (300 manually verified by native speakers). Includes a human-verified split and an adversarial synthetic split to probe failure modes; reveals large performance gaps for both frontier and Korean models.

paper NLP ai-agent agent-skills multilingual
