Verifiers: Environments for LLM Reinforcement Learning

Verifiers is an open-source library from Prime Intellect providing modular components for building reinforcement-learning environments and for evaluating and training LLM agents within them. It includes SingleTurnEnv and MultiTurnEnv environments, ToolEnv for tool-enabled agents, rubric-based reward functions, parsers, and integrations with prime-rl and common inference stacks, supporting both small-scale evaluation and large-scale RL training.

Introduction

Overview

Verifiers is a modular Python library for creating environments and evaluation harnesses for training and assessing large language model (LLM) agents with reinforcement learning. It is designed to serve several roles: an evaluation/evals library for API-based models, a synthetic-data and rollout pipeline, and the environment layer for large-scale RL training (notably with the companion prime-rl trainer).

Key features
  • Environment primitives: SingleTurnEnv for single-response tasks, MultiTurnEnv for custom multi-turn interaction protocols, and ToolEnv/StatefulToolEnv plus Sandbox/PythonEnv for tool-using or sandboxed agents (a minimal construction sketch follows this list).
  • Rubrics & Rewards: Composable reward functions (sync/async), JudgeRubric integration for LLM-based judges, and weighting of multiple reward components.
  • Parsers & Format rewards: Parsers to extract structure or enforce formats; utilities to derive format-adherence rewards and manage tool-call parsing.
  • Integration with training stacks: First-class support for prime-rl for large-scale async training and an included minimal vf.RLTrainer (a lightweight transformers-based trainer) for single-node experiments.
  • Ecosystem & distribution: Environments are distributable as installable Python modules and can be published/consumed via the Prime Intellect Environments Hub. CLI integrations via the prime CLI and uv tooling are provided for setup, installation, evaluation, and publishing.
  • vLLM / inference support: Supports OpenAI-compatible inference clients and exposes sampling controls compatible with vLLM server parameters for advanced rollout strategies.
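The sketch below shows how these pieces compose: a small dataset, a programmatic reward function wrapped in a Rubric, and a SingleTurnEnv. It is a minimal sketch rather than the official quickstart; the question/answer column convention and exact constructor arguments should be checked against the verifiers version in use.

import verifiers as vf
from datasets import Dataset

# Toy dataset in Hugging Face Datasets format; verifiers' documented
# convention uses question/answer columns (assumption: check your version).
dataset = Dataset.from_dict({
    "question": ["What is 2 + 2?"],
    "answer": ["4"],
})

# Programmatic reward: 1.0 if the reference answer appears in the completion.
# (Completions may be chat-message lists; str() keeps this toy check simple.)
def exact_match(completion, answer, **kwargs):
    return 1.0 if str(answer) in str(completion) else 0.0

# A Rubric composes and weights one or more reward functions.
rubric = vf.Rubric(funcs=[exact_match], weights=[1.0])

env = vf.SingleTurnEnv(
    dataset=dataset,
    system_prompt="Answer concisely.",
    rubric=rubric,
)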
Typical workflows
  • Quick evaluation: Use vf-eval or the library API to run API-driven evaluations (e.g. with OpenAI-compatible clients) across an HF-style dataset.

  • Local development: Create environment modules under environments/, implement load_environment and any needed environment logic (a module sketch is shown below), and test with CPU-based API rollouts.
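A rough illustration of such a module, with illustrative names and a toy substring reward; the load_environment entrypoint only needs to return a configured environment:

# environments/my_env/my_env.py  (illustrative path and file name)
import verifiers as vf
from datasets import load_dataset

def load_environment(split: str = "train", **kwargs):
    # Build dataset and rubric, then return the environment instance.
    dataset = load_dataset("openai/gsm8k", "main", split=split)

    def correct_answer(completion, answer, **kwargs):
        # Toy check: the reference answer text appears in the completion.
        return 1.0 if str(answer) in str(completion) else 0.0

    rubric = vf.Rubric(funcs=[correct_answer])
    return vf.SingleTurnEnv(dataset=dataset, rubric=rubric, **kwargs)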

  • RL training: Use prime-rl for scalable GPU-based asynchronous RL training or the included vf.RLTrainer for small experiments. The repo provides example configs and setup scripts to wire up trainer, orchestrator, and inference server.

Example snippets
  • Install and add to a uv project:
# install uv (the Python package and project manager used below)
curl -LsSf https://astral.sh/uv/install.sh | sh
# initialize a project and create a Python 3.12 virtual environment
uv init && uv venv --python 3.12
# install the Prime Intellect CLI as a uv tool
uv tool install prime
# add verifiers as a project dependency
uv add verifiers
  • Run a quick evaluation with an API model:
uv run vf-eval wordle -m gpt-4.1-mini
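  • Evaluate from Python via the library API (a minimal sketch; load_environment's identifier and evaluate's exact signature may differ across versions, and the environment module must be installed first):
from openai import OpenAI
import verifiers as vf

client = OpenAI()  # any OpenAI-compatible client can be used

env = vf.load_environment("wordle")
results = env.evaluate(client=client, model="gpt-4.1-mini", num_examples=5)
print(results)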
Architecture & extensibility

Verifiers separates dataset, rollout logic, rubric, and parser concerns so environments can be reused across evaluation, synthetic data generation, and RL training. Environments expose a load_environment entrypoint, and environment authors can publish modules to the Environments Hub. The library is intentionally trainer-agnostic: any trainer that exposes an OpenAI-compatible inference client can be integrated.
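For instance, because rollouts only require an OpenAI-compatible endpoint, a locally hosted vLLM server can stand in for a hosted API; the URL, port, and model name below are placeholders:

from openai import OpenAI

# Point an OpenAI-compatible client at a local vLLM server
# (for example one started with `vllm serve <model>`).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# The client is then passed to the environment's evaluation/rollout calls,
# e.g. env.evaluate(client=client, model="<served-model-name>") as above.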

Who maintains it / citation

Originally created by William Brown and maintained under the Prime Intellect organization. The repository README includes citation metadata for academic use.

Release & status notes

The repository is actively maintained (project created 2025-01-22) and publishes incremental versions and release notes in the README. It is intended for research and production-oriented RL work with LLMs; many environment implementations are community-contributed and lightly reviewed, while team-maintained research environments live in a separate repository.

Use cases
  • Evaluating LLM completions with programmatic reward functions or LLM judges (a judge-based sketch follows this list).
  • Building tool-enabled agent sandboxes for tool-using research.
  • Large-scale RL training of LLMs using asynchronous rollout/orchestration patterns.
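As an example of the first use case, a reward function can call out to a judge model through any OpenAI-compatible client and be composed into a rubric. This is a hedged sketch: the judge prompt, model name, and YES/NO scoring are illustrative, and the library's built-in JudgeRubric integration may be a better fit in practice.

from openai import OpenAI
import verifiers as vf

judge_client = OpenAI()

def judge_reward(prompt, completion, answer, **kwargs):
    # Ask a judge model for a binary verdict (illustrative prompt and parsing).
    verdict = judge_client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {prompt}\nReference answer: {answer}\n"
                f"Candidate answer: {completion}\n"
                "Reply with exactly YES if the candidate is correct, otherwise NO."
            ),
        }],
    ).choices[0].message.content
    return 1.0 if verdict and verdict.strip().upper().startswith("YES") else 0.0

rubric = vf.Rubric(funcs=[judge_reward], weights=[1.0])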

For full documentation and examples, see the official docs: https://docs.primeintellect.ai/verifiers

Information

  • Website: github.com
  • Authors: Prime Intellect, William Brown
  • Published date: 2025/01/22
