Overview
Tianshou is an open-source reinforcement learning (RL) library implemented in PyTorch. Designed to balance usability and flexibility, it provides two complementary API levels: a high-level interface for application development and experiment configuration, and a low-level procedural (algorithmic) API for researchers implementing new RL algorithms.
Key features
- Modular design: a clean separation between Algorithms, Policies, and training logic enables concise, maintainable implementations.
- Dual APIs:
  - High-level API (ExperimentBuilder, pre-configured training loops) for fast prototyping and running experiments.
  - Procedural API for full control over collection, replay buffers, optimizers, and update rules.
- Wide algorithm coverage: implementations of value-based and policy-gradient methods (DQN, Double/Dueling DQN, C51, QRDQN, IQN, PPO, TRPO, A2C, DDPG, TD3, SAC), offline RL algorithms (BCQ, CQL, CRR, TD3+BC), imitation learning (GAIL), and supporting techniques such as prioritized experience replay (PER), generalized advantage estimation (GAE), hindsight experience replay (HER), and the Intrinsic Curiosity Module (ICM).
- Performance and scale:
  - Vectorized environment support (synchronous and asynchronous), plus optional, experimental EnvPool integration for high-throughput sampling (see the sketch after this list).
  - Multi-GPU training support and optimized components (n-step return computation and prioritized replay accelerated with numba/numpy).
- Flexible environment and data types: supports arbitrary observation/action structures (dicts, classes) and recurrent networks for POMDPs.
- Logging and reproducibility: TensorBoard and Weights & Biases (WandB) logging integrations, plus a thorough test suite, including full training runs, that helps ensure reproducible behavior.
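As a concrete taste of the vectorized-environment support mentioned above, here is a minimal sketch (assuming a recent Tianshou release with the Gymnasium-style five-tuple step API) that steps several CartPole instances in lockstep:

```python
import gymnasium as gym
import numpy as np
from tianshou.env import DummyVectorEnv

# Four CartPole instances stepped together in the current process;
# SubprocVectorEnv exposes the same interface with one worker process per env.
envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

obs, info = envs.reset()                    # batched observations, shape (4, 4)
actions = np.random.randint(0, 2, size=4)   # one discrete action per environment
obs, rew, terminated, truncated, info = envs.step(actions)
envs.close()
```

Swapping DummyVectorEnv for SubprocVectorEnv changes only the construction line, which is what keeps the sampling backends interchangeable.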
Installation & requirements
- Hosted on PyPI and conda-forge; requires Python >= 3.11.
- Recommended developer install via Poetry: clone the repository and run `poetry install`, with optional extras for mujoco, envpool, atari, etc.
- Alternatively: `pip install tianshou` (PyPI release) or `pip install git+https://github.com/thu-ml/tianshou.git@master --upgrade` (latest development version).
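For the developer route, a minimal setup might look like the following sketch; it assumes Poetry is installed and that the extras names (e.g. `mujoco`, `atari`) match those declared in the repository's pyproject.toml:

```bash
git clone https://github.com/thu-ml/tianshou.git
cd tianshou
poetry install                          # core library plus development dependencies
poetry install --extras "mujoco atari"  # optional: add benchmark-environment extras
```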
Typical usage
- High-level: configure an ExperimentBuilder (e.g., DQNExperimentBuilder) to declare the environment factory, training configuration, and algorithm parameters, then call `.build().run()` to run the experiment quickly; see the first sketch below.
- Procedural: construct environments, networks, policies, and algorithms manually, use Collectors and ReplayBuffers, then call the algorithm's training API for full control; see the second sketch below.
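For the high-level route, the following sketch trains DQN on CartPole. It is modeled on the library's documented quick-start, but the exact class and parameter names (SamplingConfig, DQNParams, VectorEnvType.DUMMY, etc.) have shifted between releases, so verify them against the version you install:

```python
from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.env import EnvFactoryRegistered, VectorEnvType
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.policy_params import DQNParams

experiment = (
    DQNExperimentBuilder(
        # Factory for a registered Gymnasium task, vectorized in-process
        EnvFactoryRegistered(task="CartPole-v1", seed=0, venv_type=VectorEnvType.DUMMY),
        ExperimentConfig(persistence_enabled=False),
        SamplingConfig(
            num_epochs=10,
            step_per_epoch=10000,
            batch_size=64,
            num_train_envs=10,
            num_test_envs=100,
            buffer_size=20000,
            step_per_collect=10,
        ),
    )
    .with_dqn_params(DQNParams(lr=1e-3, discount_factor=0.9, target_update_freq=320))
    .with_model_factory_default(hidden_sizes=(64, 64))
    .build()
)
experiment.run()
```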
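For the procedural route, here is a condensed sketch in the style of the classic quick-start, where a trainer object is driven directly; newer releases fold the update logic into Algorithm objects, so treat the trainer and policy names below as version-dependent:

```python
import gymnasium as gym
import torch
import tianshou as ts
from tianshou.utils.net.common import Net

# Reference env (for the spaces) plus vectorized training/test envs
env = gym.make("CartPole-v1")
train_envs = ts.env.DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(10)])
test_envs = ts.env.DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(100)])

# Q-network, optimizer, and epsilon-greedy DQN policy
net = Net(state_shape=env.observation_space.shape, action_shape=env.action_space.n,
          hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = ts.policy.DQNPolicy(model=net, optim=optim, action_space=env.action_space,
                             discount_factor=0.9, target_update_freq=320)

# Collectors tie the policy to the environments and a replay buffer
train_collector = ts.data.Collector(policy, train_envs,
                                    ts.data.VectorReplayBuffer(20000, 10),
                                    exploration_noise=True)
test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True)

# Off-policy training loop: collect transitions, update, evaluate each epoch
result = ts.trainer.OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    update_per_step=0.1, episode_per_test=100, batch_size=64,
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),
    test_fn=lambda epoch, env_step: policy.set_eps(0.05),
    stop_fn=lambda mean_rewards: mean_rewards >= 195,
).run()
print(result)
```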
Who maintains it / citation
Tianshou is developed and maintained by contributors from the Tsinghua AI / THU-ML community and collaborators. If it is used in publications, the authors request citation of the JMLR paper: "Tianshou: A Highly Modularized Deep Reinforcement Learning Library" (JMLR 23(267):1–6, 2022).
When to use
Use Tianshou when you need:
- A research-friendly RL codebase that is easy to extend to new algorithms;
- A practical framework for training RL agents with performant vectorized sampling and integrations (EnvPool, MuJoCo, Atari, PyBullet);
- An RL library with strong engineering practices and reproducibility-focused tests.
