Overview
Tianshou is an open-source reinforcement learning (RL) library implemented in PyTorch. Designed to balance usability and flexibility, it provides two complementary API levels: a high-level interface for application development and experiment configuration, and a low-level procedural (algorithmic) API for researchers implementing new RL algorithms.
Key features
- Modular design: a clean separation between Algorithms, Policies, and training logic enables concise, maintainable implementations.
- Dual APIs:
  - High-level API (ExperimentBuilder, pre-configured training loops) for fast prototyping and running experiments.
  - Procedural API for full control over collection, replay buffers, optimizers, and update rules.
- Wide algorithm coverage: implementations of value-based and policy-gradient methods (DQN, Double/Dueling DQN, C51, QRDQN, IQN, PPO, TRPO, A2C, DDPG, TD3, SAC), offline RL algorithms (BCQ, CQL, CRR, TD3+BC), imitation learning (GAIL), and supporting techniques such as prioritized experience replay (PER), generalized advantage estimation (GAE), hindsight experience replay (HER), and the Intrinsic Curiosity Module (ICM).
- Performance and scale:
  - Vectorized environment support (synchronous and asynchronous), plus optional, experimental EnvPool integration for high-throughput sampling (see the sketch after this list).
  - Multi-GPU training support and optimized components (n-step return computation and prioritized replay accelerated with numba/numpy).
- Flexible environment and data types: supports arbitrary observation/action structures (dicts, classes) and recurrent networks for POMDPs.
- Logging and reproducibility: TensorBoard and Weights & Biases (WandB) logging integrations, plus a thorough test suite, including full training runs, that helps ensure reproducible behavior.
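As a concrete taste of the vectorized-environment support mentioned above, here is a minimal sketch (assuming a recent Tianshou release with the Gymnasium-style five-tuple step API) that steps several CartPole instances in lockstep:

```python
import gymnasium as gym
import numpy as np
from tianshou.env import DummyVectorEnv

# Four CartPole instances stepped together in the current process;
# SubprocVectorEnv exposes the same interface with one worker process per env.
envs = DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

obs, info = envs.reset()                    # batched observations, shape (4, 4)
actions = np.random.randint(0, 2, size=4)   # one discrete action per environment
obs, rew, terminated, truncated, info = envs.step(actions)
envs.close()
```

Swapping DummyVectorEnv for SubprocVectorEnv changes only the construction line, which is what keeps the sampling backends interchangeable.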
Installation & requirements
- Hosted on PyPI and conda-forge; requires Python >= 3.11.
- Recommended developer install via Poetry: clone the repository and run `poetry install`, with optional extras for mujoco, envpool, atari, etc.
- Alternatively: `pip install tianshou` (PyPI release) or `pip install git+https://github.com/thu-ml/tianshou.git@master --upgrade` (latest development version).
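For the developer route, a minimal setup might look like the following sketch; it assumes Poetry is installed and that the extras names (e.g. `mujoco`, `atari`) match those declared in the repository's pyproject.toml:

```bash
git clone https://github.com/thu-ml/tianshou.git
cd tianshou
poetry install                          # core library plus development dependencies
poetry install --extras "mujoco atari"  # optional: add benchmark-environment extras
```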
Typical usage
- High-level: configure an ExperimentBuilder (e.g., DQNExperimentBuilder) to declare the environment factory, training configuration, and algorithm parameters, then call `.build().run()` to run the experiment quickly; see the first sketch below.
- Procedural: construct environments, networks, policies, and algorithms manually, use Collectors and ReplayBuffers, then call the algorithm's training API for full control; see the second sketch below.
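For the high-level route, the following sketch trains DQN on CartPole. It is modeled on the library's documented quick-start, but the exact class and parameter names (SamplingConfig, DQNParams, VectorEnvType.DUMMY, etc.) have shifted between releases, so verify them against the version you install:

```python
from tianshou.highlevel.config import SamplingConfig
from tianshou.highlevel.env import EnvFactoryRegistered, VectorEnvType
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.policy_params import DQNParams

experiment = (
    DQNExperimentBuilder(
        # Factory for a registered Gymnasium task, vectorized in-process
        EnvFactoryRegistered(task="CartPole-v1", seed=0, venv_type=VectorEnvType.DUMMY),
        ExperimentConfig(persistence_enabled=False),
        SamplingConfig(
            num_epochs=10,
            step_per_epoch=10000,
            batch_size=64,
            num_train_envs=10,
            num_test_envs=100,
            buffer_size=20000,
            step_per_collect=10,
        ),
    )
    .with_dqn_params(DQNParams(lr=1e-3, discount_factor=0.9, target_update_freq=320))
    .with_model_factory_default(hidden_sizes=(64, 64))
    .build()
)
experiment.run()
```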
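For the procedural route, here is a condensed sketch in the style of the classic quick-start, where a trainer object is driven directly; newer releases fold the update logic into Algorithm objects, so treat the trainer and policy names below as version-dependent:

```python
import gymnasium as gym
import torch
import tianshou as ts
from tianshou.utils.net.common import Net

# Reference env (for the spaces) plus vectorized training/test envs
env = gym.make("CartPole-v1")
train_envs = ts.env.DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(10)])
test_envs = ts.env.DummyVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(100)])

# Q-network, optimizer, and epsilon-greedy DQN policy
net = Net(state_shape=env.observation_space.shape, action_shape=env.action_space.n,
          hidden_sizes=[128, 128])
optim = torch.optim.Adam(net.parameters(), lr=1e-3)
policy = ts.policy.DQNPolicy(model=net, optim=optim, action_space=env.action_space,
                             discount_factor=0.9, target_update_freq=320)

# Collectors tie the policy to the environments and a replay buffer
train_collector = ts.data.Collector(policy, train_envs,
                                    ts.data.VectorReplayBuffer(20000, 10),
                                    exploration_noise=True)
test_collector = ts.data.Collector(policy, test_envs, exploration_noise=True)

# Off-policy training loop: collect transitions, update, evaluate each epoch
result = ts.trainer.OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=10, step_per_epoch=10000, step_per_collect=10,
    update_per_step=0.1, episode_per_test=100, batch_size=64,
    train_fn=lambda epoch, env_step: policy.set_eps(0.1),
    test_fn=lambda epoch, env_step: policy.set_eps(0.05),
    stop_fn=lambda mean_rewards: mean_rewards >= 195,
).run()
print(result)
```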
Who maintains it / citation
Tianshou is developed and maintained by contributors from the Tsinghua AI / THU-ML community and collaborators. If it is used in publications, the authors request citation of the JMLR paper: "Tianshou: A Highly Modularized Deep Reinforcement Learning Library" (JMLR 23(267):1–6, 2022).
When to use
Use Tianshou when you need:
- A research-friendly RL codebase that is easy to extend to new algorithms;
- A practical framework for training RL agents with performant vectorized sampling and integrations (EnvPool, MuJoCo, Atari, PyBullet);
- An RL library with strong engineering practices and reproducibility-focused tests.
