This paper introduces ReAct, an approach that integrates reasoning and acting in large language models (LLMs). ReAct enables LLMs to generate both reasoning traces and task-specific actions in an interleaved manner. This synergy allows reasoning to help induce, track, and update action plans, while actions interface with external sources like knowledge bases to gather more information, overcoming issues of hallucination and error propagation in prior methods.
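The interleaving of reasoning traces, actions, and observations described above can be sketched as a simple loop. This is a toy illustration, not the paper's implementation: `scripted_policy` stands in for an LLM, and `lookup`, `KNOWLEDGE_BASE`, and `react_loop` are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (kind, text), where kind is "thought", "action" or "observation"

# Hypothetical toy knowledge base standing in for an external source.
KNOWLEDGE_BASE: Dict[str, str] = {"capital of France": "Paris"}


def lookup(query: str) -> str:
    """Hypothetical tool: return a fact from the toy knowledge base."""
    return KNOWLEDGE_BASE.get(query, "no result")


def scripted_policy(question: str, history: List[Step]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, action_name, action_argument)."""
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return ("I need to look this up.", "lookup", question)
    return ("The observation answers the question.", "finish", observations[-1])


def react_loop(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    """Alternate thought -> action -> observation until the policy finishes."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, history)
        history.append(("thought", thought))
        history.append(("action", f"{action}[{argument}]"))
        if action == "finish":
            return argument, history
        # The action queries an external source; its result feeds the next thought.
        history.append(("observation", tools[action](argument)))
    return "", history  # step budget exhausted without an answer


answer, trace = react_loop("capital of France", scripted_policy, {"lookup": lookup})
```

Because every observation is appended to the history before the next thought is produced, later reasoning steps can condition on retrieved facts rather than on the model's own guesses.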
SWE-agent is a system designed to empower language model (LM) agents to autonomously perform software engineering tasks. It features a custom agent-computer interface (ACI) that enhances the agent's ability to navigate repositories, create and edit code, and execute programs, achieving state-of-the-art results on the SWE-bench and HumanEvalFix benchmarks. [2, 5, 8]
LightRAG is an open-source framework designed for simple and fast Retrieval-Augmented Generation (RAG), integrating knowledge graphs, vector search, and efficient LLM-based processing to enhance question-answering over large document collections.
DeepSeek-V3.2 is an open large language model that balances high computational efficiency with superior reasoning and agent capabilities. Key innovations include DeepSeek Sparse Attention (DSA) for reduced complexity in long contexts, a scalable reinforcement learning framework achieving GPT-5-level performance, and a large-scale agentic task synthesis pipeline for improved generalization in tool-use scenarios.
Provides a conditional memory module that performs O(1) N-gram lookups and fuses static embeddings into transformer hidden states, which enables offloading large embedding tables to host memory with minimal inference overhead.
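A constant-time N-gram lookup of this kind can be illustrated by hashing the most recent token ids into a fixed-size embedding table and adding the retrieved vector to the hidden state. This is a minimal sketch under stated assumptions: the hash function, the scalar `gate`, and the names `ngram_slot` and `fuse` are hypothetical and are not taken from the source.

```python
from typing import List, Sequence


def ngram_slot(tokens: Sequence[int], n: int, table_size: int) -> int:
    """Hash the last n token ids to a table slot.

    The cost depends only on n, not on the table size or sequence length.
    The polynomial hash is an illustrative assumption.
    """
    h = 0
    for t in tokens[-n:]:
        h = (h * 1_000_003 + t) % table_size
    return h


def fuse(hidden: List[float], memory: List[float], gate: float) -> List[float]:
    """Add a gated static embedding to a hidden state (assumed fusion rule)."""
    return [h + gate * m for h, m in zip(hidden, memory)]


# Toy static table; in the setting described above it could live in host memory.
TABLE_SIZE = 8
table = [[float(i), float(i)] for i in range(TABLE_SIZE)]

slot = ngram_slot([5, 7, 9], n=2, table_size=TABLE_SIZE)
fused = fuse([1.0, 1.0], table[slot], gate=0.5)
```

Since the slot depends only on the last `n` tokens, the lookup address is known as soon as those tokens are, which is what makes fetching the row from a slower memory tier plausible.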
Introduces Draft-OPD, an on-policy distillation method for training lightweight draft models used in speculative decoding — it focuses learning on draft-induced errors via target-assisted rollouts and replay, improving acceptance length and enabling >5× lossless LLM inference acceleration.
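To make "acceptance length" concrete, the sketch below shows one step of speculative decoding under greedy verification: a draft model proposes `k` tokens, and the target model keeps the longest prefix it agrees with, then contributes one token of its own. This illustrates generic speculative decoding, not the Draft-OPD training procedure; `toy_target`, `toy_draft`, and `speculative_step` are hypothetical, and a real system would verify all proposed tokens in a single parallel target pass rather than in a Python loop.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Hypothetical target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


def speculative_step(
    prefix: List[int], draft: NextToken, target: NextToken, k: int
) -> Tuple[List[int], int]:
    """Return (tokens emitted this step, number of draft tokens accepted)."""
    # 1) The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target checks each proposed token against its own greedy choice.
    emitted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            emitted.append(expected)  # replace the first mismatch and stop
            return emitted, len(emitted) - 1
        emitted.append(tok)
        ctx.append(tok)

    # 3) Every draft token was accepted; the target adds one bonus token.
    emitted.append(target(ctx))
    return emitted, k


tokens, accepted = speculative_step([1], toy_draft, toy_target, k=4)
```

The emitted tokens are exactly what the target would have generated on its own, which is why the speed-up is lossless; the gain grows with the number of draft tokens accepted per step.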
Automates distillation of heterogeneous traces from a target person or role into versioned, inspectable skill packages for LLM agents — producing separate capability and bounded-behavior tracks that support natural-language corrections, rollback, and cross-host installation. Ships with an open system and a skills gallery.
Uses search-agent reading traces and tiered distractors to train LLMs for long-context, multi-hop reasoning, and introduces a rubric reward that supervises entity-level steps and is applied only when the final answer is correct. Improves evidence-grounded reasoning and resists reward hacking across 4B–30B models.
Proposes TrOPD, a method that restricts token-level on-policy distillation to regions where teacher supervision is reliable to stabilize training under teacher–student distribution mismatch. Adds outlier handling (clipping, masking, forward-KL) and off-policy guidance; shows consistent gains on math reasoning, code generation and general benchmarks.
Studies small trainable adapters (PEFT) used as persistent personal models on top of large foundation models, analyzing three scaling axes—Scale Up, Scale Down, Scale Out—and introducing MinT, an infrastructure for adapter identity, provenance, evaluation, and serving.
Analyzes how single-domain RL fine-tuning of LLMs induces cross-domain interference and shows that this damage concentrates in a low-dimensional shared conflict subspace; proposes a local perturbation theory and short domain "refresh" procedures that selectively recover earlier domains with minimal collateral loss.
Studies when and how to combine visual future rollouts from world models with abstract reasoning in multimodal LLMs. Proposes PF-OPSD, a teacher-student distillation method that uses ground-truth future videos during training, and evaluates it on two human-verified benchmarks, improving accuracy by ≈10% and increasing robustness to noisy rollouts.