LogoAIAny

Category

Explore by categories

Curated AI Resources for Everyone

Copyright © 2026 All Rights Reserved.
  • All Categories

  • AI Leaderboard

  • AI Agent Tutorials

  • AI Coding Tutorials

  • AI Model

  • AI Agent Papers

  • Chatbot

  • AI Dataset

  • Machine Learning Foundation Books

  • AI Train

  • AI Deploy

  • AI Client

  • Machine Learning Foundation Papers

  • Machine Learning Foundation Tutorials

  • AI Image Demos

  • AI Agent

  • Large Language Model Tutorials

  • Large Language Model Papers

  • Machine Learning Engineering Papers

  • Computer Vision Tutorials

  • Computer Vision Papers

  • Natural Language Processing Papers

  • Reinforcement Learning Papers

  • Speech Technology Papers

  • AI API

  • AI Coding

  • AI Image

  • AI Video

  • MLOps

  • MCP Client

  • MCP Server

  • AI Video Papers

  • AI Audio

  • AI Others

  • AI Infra

  • Embodied AI


DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

2025
DeepSeek-AI, Aixin Liu +262

DeepSeek-V3.2 is an open large language model that aims to combine high computational efficiency with strong reasoning and agent capabilities. Key innovations include DeepSeek Sparse Attention (DSA), which reduces attention complexity in long contexts; a scalable reinforcement learning framework with which the model reportedly performs comparably to GPT-5; and a large-scale agentic task synthesis pipeline that improves generalization in tool-use scenarios.

deepseek, LLM, paper, RL, ai-agent

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

2026
Jiapeng Zhu, Jianxiang Yu +6

Combines internalizing general skills with task-specific skill utilization via a difficulty-aware router to improve in-distribution and out-of-distribution performance for agentic RL. Uses privileged distillation for hard tasks and diagnostic probing for easy tasks; evaluated on ALFWorld and WebShop.

agent-skills, RL, ai-agent, paper

LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

2026
Nianyi Lin, Jiajie Zhang +2

Uses search-agent reading traces and tiered distractors to train LLMs for long-context, multi-hop reasoning, and introduces a rubric reward that supervises entity-level reasoning steps and is applied only when the final answer is correct. Reported to improve evidence-grounded reasoning and resist reward hacking across 4B–30B models.

RL, LLM, NLP, paper, code, +1
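
To make the "applied only when the final answer is correct" idea concrete, here is a minimal sketch of a gated rubric reward. It is an illustration built only from the summary above, not the paper's actual formula: the function name, the `rubric_weight` parameter, and the hit/total scoring are all hypothetical.

```python
def rubric_gated_reward(final_correct, steps_hit, steps_total, rubric_weight=0.5):
    """Toy reward: outcome reward plus a rubric bonus gated on correctness.

    Assumption (not from the paper): the rubric score is the fraction of
    expected entity-level steps that the model's reasoning covered.
    """
    outcome = 1.0 if final_correct else 0.0
    # Gate: a wrong final answer earns no rubric credit, so the policy cannot
    # collect reward by listing plausible entities without solving the task.
    if not final_correct or steps_total == 0:
        return outcome
    return outcome + rubric_weight * (steps_hit / steps_total)
```

The gate is the part that targets reward hacking: intermediate steps only count once the final answer is verified.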

A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL

2026
Lei Yang, Siyu Ding +1

Analyzes how single-domain RL fine-tuning of LLMs induces cross-domain interference and shows that this damage concentrates in a low-dimensional shared conflict subspace. Proposes a local perturbation theory and short domain "refresh" procedures that selectively recover earlier domains with minimal collateral loss.

RL, LLM, paper, NLP, code, +1

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

2026
Pengcheng Jiang, Zhiyi Shi +6

A 20B retrieval subagent trained with reinforcement learning inside a stateful search harness that externalizes recoverable search state (candidate pool, curated evidence, verification records). The harness lets the policy focus on semantic search decisions, improving curated recall and transfer robustness.

RL, ai-agent, agent-skills, vllm, huggingface, +1

Playing Atari with Deep Reinforcement Learning

2013
Volodymyr Mnih, Koray Kavukcuoglu +5

This DeepMind paper introduced Deep Q-Networks (DQN), the first deep learning model to learn control policies directly from raw pixel input using reinforcement learning. By combining Q-learning with convolutional neural networks and experience replay, DQN outperformed previous approaches on six of the seven Atari 2600 games tested and surpassed a human expert on three, without handcrafted features or game-specific tuning. It showed that deep learning could handle control tasks with sparse, delayed rewards, catalyzing the modern wave of deep reinforcement learning research and paving the way for later breakthroughs such as AlphaGo.

RL, deepmind, paper
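
The two ingredients named above, Q-learning and experience replay, can be sketched without a neural network. The example below is a simplified stand-in, not the paper's implementation: it swaps the convolutional network for a plain Q-table, and the five-state chain environment, function name, and hyperparameters are hypothetical choices for illustration.

```python
import random
from collections import deque


def q_learning_with_replay(n_states=5, episodes=300, gamma=0.9, alpha=0.1,
                           epsilon=0.2, batch_size=16, seed=0):
    """Tabular Q-learning with a replay buffer on a toy chain.

    States 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    buffer = deque(maxlen=1000)  # stores (s, a, r, s_next, done)

    for _ in range(episodes):
        s = 0
        for _ in range(50):  # step cap per episode
            # Epsilon-greedy behaviour policy with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            buffer.append((s, a, r, s_next, done))

            # Experience replay: update from a random minibatch of past
            # transitions rather than only the most recent one.
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            for bs, ba, br, bs_next, bdone in batch:
                target = br if bdone else br + gamma * max(q[bs_next])
                q[bs][ba] += alpha * (target - q[bs][ba])

            s = s_next
            if done:
                break
    return q
```

Sampling from the buffer reuses each transition many times and breaks the correlation between consecutive updates, which is the role replay plays in DQN as well.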

Mastering the game of Go with deep neural networks and tree search

2016
David Silver, Aja Huang +18

The paper introduced AlphaGo, the first program to defeat a human professional Go player without handicap. It combined deep neural networks, trained with supervised learning and reinforcement learning, with Monte Carlo tree search (MCTS), enabling efficient move selection and board evaluation in Go's massive search space. AlphaGo's 5–0 victory against European champion Fan Hui marked a historic AI milestone, showing that combining learned policies with search can surpass prior handcrafted methods and reshaping both game AI and broader AI research directions.

RL, deepmind, paper
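
How a learned policy guides tree search can be illustrated with the move-selection step: each candidate move is scored by its average value so far plus an exploration bonus that grows with the network's prior and shrinks with the visit count. The sketch below follows that general form; the dictionary layout, function name, and `c_puct` default are assumptions for illustration rather than the paper's code.

```python
import math


def puct_select(children, c_puct=1.0):
    """Pick the move maximizing Q + U at one tree node.

    children maps move -> (prior, visit_count, total_value), where
    prior comes from a policy network and total_value accumulates
    evaluations backed up through this move.
    """
    total_visits = sum(n for _, n, _ in children.values())

    def score(item):
        prior, n, w = item[1]
        q = w / n if n > 0 else 0.0  # mean value of the move so far
        # Exploration bonus: large for high-prior, rarely visited moves.
        u = c_puct * prior * math.sqrt(total_visits) / (1 + n)
        return q + u

    return max(children.items(), key=score)[0]


stats = {
    "a": (0.6, 10, 5.0),  # well explored, mean value 0.5
    "b": (0.3, 0, 0.0),   # never visited
    "c": (0.1, 2, 1.8),   # few visits, mean value 0.9
}
print(puct_select(stats))              # "b": the bonus favours the unvisited move
print(puct_select(stats, c_puct=0.0))  # "c": pure exploitation of mean value
```

As visits accumulate the bonus decays, so the search gradually shifts from moves the policy network likes to moves that have actually evaluated well.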