AIAny - Verifiers: Environments for LLM Reinforcement Learning

Verifiers — Overview

Verifiers is a modular Python library for creating environments and evaluation harnesses tailored to training and assessing large language model (LLM) agents with reinforcement learning. It is designed to serve multiple roles: a direct evaluation/evals library for API-based models, a synthetic data / rollout pipeline, and as the environment layer for large-scale RL training (notably with the companion prime-rl trainer).

Key features

Environment primitives: SingleTurnEnv for single-response tasks, MultiTurnEnv for custom multi-turn interaction protocols, ToolEnv/StatefulToolEnv and Sandbox/PythonEnv for tool-using or sandboxed agents.
Rubrics & Rewards: Composable reward functions (sync/async), JudgeRubric integration for LLM-based judges, and weighting of multiple reward components.
Parsers & Format rewards: Parsers to extract structure or enforce formats; utilities to derive format-adherence rewards and manage tool-call parsing.
Integration with training stacks: First-class support for prime-rl for large-scale async training and an included minimal vf.RLTrainer (a lightweight transformers-based trainer) for single-node experiments.
Ecosystem & distribution: Environments are distributable as installable Python modules and can be published/consumed via the Prime Intellect Environments Hub. CLI integrations via the prime CLI and uv tooling are provided for setup, installation, evaluation, and publishing.
vLLM / inference support: Supports OpenAI-compatible inference clients and exposes sampling controls compatible with vLLM server parameters for advanced rollout strategies.

Typical workflows

Quick evaluation: Use vf-eval or the library API to run API-driven evaluations (e.g. with OpenAI-compatible clients) across an HF-style dataset.
Local development: Create environment modules under environments/, implement load_environment and any needed env logic, and test with CPU-based API rollouts.
RL training: Use prime-rl for scalable GPU-based asynchronous RL training or the included vf.RLTrainer for small experiments. The repo provides example configs and setup scripts to wire up trainer, orchestrator, and inference server.

Example snippets

Install and add to a uv project:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv init && uv venv --python 3.12
uv tool install prime
uv add verifiers

Run a quick evaluation with an API model:

uv run vf-eval wordle -m gpt-4.1-mini

Architecture & extensibility

Verifiers separates dataset, rollout logic, rubric, and parser concerns so environments can be reused across evaluation, synthetic data generation, and RL training. Environments expose a load_environment entrypoint, and environment authors can publish modules to the Environments Hub. The library is intentionally trainer-agnostic: any trainer that exposes an OpenAI-compatible inference client can be integrated.

Who maintains it / citation

Originally created by William Brown and maintained under the Prime Intellect organization. The repository README includes citation metadata for academic use.

Release & status notes

The repository is actively maintained (project created 2025-01-22) and publishes incremental versions and release notes in the README. It is intended for research and production-oriented RL work with LLMs; many environment implementations are community-contributed and lightly reviewed, while team-maintained research environments live in a separate repository.

Use cases

Evaluating LLM completions with programmatic reward functions or LLM judges.
Building tool-enabled agent sandboxes for tool-using research.
Large-scale RL training of LLMs using asynchronous rollout/orchestration patterns.

For full documentation and examples, see the official docs: https://docs.primeintellect.ai/verifiers

Verifiers: Environments for LLM Reinforcement Learning

Introduction

Verifiers — Overview

Key features

Typical workflows

Example snippets

Architecture & extensibility

Who maintains it / citation

Release & status notes

Use cases

Information

Categories

Tags

More Items

Grok-1

Tianshou

NautilusTrader