AIAny - RL

ReAct: Synergizing Reasoning and Acting in Language Models

2022

Shunyu Yao, Jeffrey Zhao +5

This paper introduces ReAct, an approach that integrates reasoning and acting in large language models (LLMs). ReAct enables LLMs to generate both reasoning traces and task-specific actions in an interleaved manner. This synergy allows reasoning to help induce, track, and update action plans, while actions interface with external sources like knowledge bases to gather more information, overcoming issues of hallucination and error propagation in prior methods.

paper LLM NLP ai-agent google+1
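
The interleaved reasoning-and-acting loop described for ReAct can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `scripted_model` is a hypothetical stand-in for an LLM, `lookup` and the `KB` dictionary are a hypothetical toy knowledge base, and the `Thought:` / `Action: name[arg]` / `Observation:` text format is an assumed convention for this example.

```python
import re
from typing import Callable, Dict

# Hypothetical toy knowledge base standing in for an external source.
KB = {"react authors": "Shunyu Yao, Jeffrey Zhao and colleagues"}


def lookup(query: str) -> str:
    """Hypothetical tool: case-insensitive exact-match lookup in KB."""
    return KB.get(query.lower(), "no result")


def scripted_model(prompt: str) -> str:
    """Stand-in for an LLM: returns one thought and one action per call."""
    if "Observation:" not in prompt:
        return "Thought: I should look this up.\nAction: lookup[ReAct authors]"
    last_obs = prompt.rsplit("Observation:", 1)[1].strip()
    return (
        "Thought: The lookup result answers the question.\n"
        f"Action: finish[{last_obs}]"
    )


# Assumed action syntax: "Action: tool_name[argument]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def react(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate model steps (thought + action) with tool observations."""
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(prompt)           # reasoning trace plus an action
        prompt += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:              # no parsable action: stop early
            break
        name, arg = match.groups()
        if name == "finish":           # the model declares its final answer
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # Feed the observation back so the next thought can use it.
        prompt += f"Observation: {observation}\n"
    return ""


answer = react("Who wrote ReAct?", scripted_model, {"lookup": lookup})
```

In a real setup the scripted function would be replaced by a call to an actual model, and the growing prompt is what lets each new thought condition on earlier actions and observations.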

Agent Lightning

2025

Microsoft Research

Agent Lightning is an open-source framework developed by Microsoft Research for optimizing and training AI agents using reinforcement learning (RL) and other techniques, supporting integration with any agent framework with minimal code changes.

RL LLM ai-agent microsoft ai-train+3

MiniMind

2024

Jingyao Gong

MiniMind is an open-source GitHub project for training a tiny 26M-parameter LLM from scratch in just 2 hours at a cost of 3 RMB. It provides native PyTorch implementations of tokenizer training, pretraining, supervised fine-tuning (SFT), LoRA, DPO, PPO/GRPO reinforcement learning, and an MoE architecture, along with vision multimodal extensions. It includes high-quality open datasets, supports single-GPU training, and is compatible with Transformers, llama.cpp, and other frameworks, which makes it well suited to LLM beginners.

LLM tutorial github ai-train RL

Tianshou

2018

Jiayi Weng, Huayu Chen +8

Tianshou is a high-performance, modular deep reinforcement learning library built on pure PyTorch. It offers both high-level APIs for easy application and detailed procedural APIs for algorithm development. Tianshou supports online/offline RL, multi-agent setups, many standard algorithms (DQN, PPO, SAC, TD3, CQL, etc.), vectorized environments, EnvPool integration, recurrent models, multi-GPU training, and logging integrations (TensorBoard, W&B). It emphasizes software quality and reproducible results.

RL ai-library github ai-train ai-framework+1

CleanRL (Clean Implementation of RL Algorithms)

2019

vwxyzjn (GitHub owner), Shengyi Huang +6

CleanRL is a high-quality, single-file implementation library for deep reinforcement learning (Deep RL). It provides compact, research-friendly standalone implementations of many RL algorithms (PPO, DQN, C51, DDPG, TD3, SAC, PPG, etc.), benchmarks, TensorBoard logging, Weights & Biases integration, and cloud/run tooling. It emphasizes readability, reproducibility, and ease of understanding rather than being a modular importable framework.

RL pytorch github ai-library huggingface+3

Qlib

2020

Microsoft Research

Qlib is an open-source, AI-oriented quantitative investment platform from Microsoft that provides a full pipeline for quant research: data processing, feature engineering, model training, backtesting, and serving. It supports supervised learning, market-dynamics modeling, and reinforcement learning, and integrates tools (e.g., RD-Agent) for automated factor mining and model optimization.

microsoft github ai-library mlops ai-workflow+3

labml.ai Deep Learning Paper Implementations

2020

labml.ai (labmlai)

A curated collection of 60+ concise, well-documented PyTorch implementations of deep learning papers from labml.ai. It provides side-by-side notes and tutorials for transformers, optimizers, GANs, RL, diffusion models, vision models and more, intended as learning and reproduction resources.

pytorch paper github tutorial ai-coding+5

Isaac Lab

2022

NVIDIA (Isaac Sim / Omniverse team)

Isaac Lab is an open-source, GPU-accelerated robotics learning framework built on NVIDIA Isaac Sim. It provides high-fidelity physics and sensor simulation, ready-to-train environments and robot models, and integrations for reinforcement and imitation learning workflows to accelerate sim-to-real research and large-scale robot training.

nvidia RL physics ai-framework ai-train+3

ms-swift (SWIFT: Scalable lightWeight Infrastructure for Fine-Tuning)

2023

ModelScope community, Yuze Zhao +11

ms-swift (SWIFT) is an extensible, lightweight infrastructure from the ModelScope community for fine-tuning, evaluating, quantizing and deploying large language models (LLMs) and multimodal LLMs. It supports hundreds of text and multimodal models, many low-cost fine-tuning and quantized training techniques, Megatron-style model parallelism, RL/GRPO family algorithms for alignment, and multiple inference/deployment backends such as vLLM and LMDeploy. ms-swift provides CLI, Python APIs and a Web UI for end-to-end model workflows.

llm ai-train ai-inference ai-serving github+3

Awesome-ML-SYS-Tutorial

2024

zhaochenyang20

A GitHub repository of learning notes and code dedicated to ML + SYS (machine learning systems). It collects tutorials, code walkthroughs, and engineering notes on RLHF, distributed training (FSDP, Megatron), inference and scheduling (SGLang, vLLM), quantization, CUDA/GPU optimization, system design, and practical engineering.

github mlops ai-train pytorch LLM+6

Verifiers: Environments for LLM Reinforcement Learning

2025

Prime Intellect, William Brown

Verifiers is an open-source library from Prime Intellect providing modular components for building reinforcement-learning environments in which LLM agents are evaluated and trained. It includes SingleTurn/MultiTurn envs, ToolEnv for tool-enabled agents, rubric-based reward functions, parsers, and integrations with prime-rl and common inference stacks for both small-scale evaluation and large-scale RL training.

RL LLM ai-library ai-train github+1

Cua

2025

trycua (GitHub)

Cua is an open-source infrastructure platform for building, benchmarking, and deploying computer-use AI agents. It provides self-hostable sandboxes (Docker, QEMU, Apple Vz), SDKs, and benchmark suites to train and evaluate agents that can control full desktops across macOS, Linux, and Windows.

ai-agent ai-development ai-tools github mlops+2
