Verifiers — Overview
Verifiers is a modular Python library for creating environments and evaluation harnesses used to train and assess large language model (LLM) agents with reinforcement learning. It is designed to serve multiple roles: a direct evaluation/evals library for API-based models, a synthetic data / rollout pipeline, and the environment layer for large-scale RL training (notably with the companion prime-rl trainer).
Key features
- Environment primitives: SingleTurnEnv for single-response tasks, MultiTurnEnv for custom multi-turn interaction protocols, ToolEnv/StatefulToolEnv and Sandbox/PythonEnv for tool-using or sandboxed agents (see the sketch after this list for how these primitives combine with rubrics and parsers).
- Rubrics & Rewards: Composable reward functions (sync/async), JudgeRubric integration for LLM-based judges, and weighting of multiple reward components.
- Parsers & Format rewards: Parsers to extract structure or enforce formats; utilities to derive format-adherence rewards and manage tool-call parsing.
- Integration with training stacks: First-class support for prime-rl for large-scale async training and an included minimal vf.RLTrainer (a lightweight transformers-based trainer) for single-node experiments.
- Ecosystem & distribution: Environments are distributable as installable Python modules and can be published/consumed via the Prime Intellect Environments Hub. CLI integrations via the `prime` CLI and `uv` tooling are provided for setup, installation, evaluation, and publishing.
- vLLM / inference support: Supports OpenAI-compatible inference clients and exposes sampling controls compatible with vLLM server parameters for advanced rollout strategies.
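The reward, parser, and environment pieces above compose directly in Python. Below is a minimal sketch of that composition; the keyword names (`funcs`, `weights`, the `question`/`answer` dataset columns) follow the documented pattern but are not guaranteed verbatim, and the dataset and reward logic are purely illustrative.

```python
# Sketch of composing weighted rewards into a rubric and a single-turn
# environment. Constructor/keyword names follow the documented pattern but
# may differ between versions; the dataset and reward logic are toys.
from datasets import Dataset

import verifiers as vf

# Minimal HF-style dataset with the question/answer columns SingleTurnEnv expects.
dataset = Dataset.from_list([
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is 3 * 3?", "answer": "9"},
])

parser = vf.Parser()  # default parser; structured parsers can enforce output formats

def exact_answer(completion, answer, **kwargs) -> float:
    # Toy correctness reward: 1.0 if the gold answer appears in the parsed output.
    parsed = parser.parse_answer(completion) or ""
    return 1.0 if answer in parsed else 0.0

rubric = vf.Rubric(
    funcs=[exact_answer, parser.get_format_reward_func()],  # correctness + format adherence
    weights=[1.0, 0.2],
)

env = vf.SingleTurnEnv(dataset=dataset, parser=parser, rubric=rubric)
```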
Typical workflows
- Quick evaluation: Use `vf-eval` or the library API to run API-driven evaluations (e.g. with OpenAI-compatible clients) across an HF-style dataset; see the sketch after this list.
- Local development: Create environment modules under `environments/`, implement `load_environment` and any needed env logic, and test with CPU-based API rollouts.
- RL training: Use `prime-rl` for scalable GPU-based asynchronous RL training, or the included `vf.RLTrainer` for small experiments. The repo provides example configs and setup scripts to wire up the trainer, orchestrator, and inference server.
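For the quick-evaluation workflow, the same run shown later with `vf-eval` can be driven from the library API. The sketch below assumes an installed `wordle` environment and an OpenAI-compatible client; the `evaluate(...)` keyword arguments are taken from the docs and may vary across versions.

```python
# API-driven evaluation sketch; evaluate(...) keyword names are assumptions
# based on the docs and may differ in your installed version.
from openai import OpenAI

import verifiers as vf

client = OpenAI()  # any OpenAI-compatible client, e.g. one pointed at a vLLM server

env = vf.load_environment("wordle")  # load an installed environment module by name
results = env.evaluate(
    client,
    model="gpt-4.1-mini",
    num_examples=20,         # assumed kwarg: number of dataset rows to score
    rollouts_per_example=1,  # assumed kwarg: rollouts per row
)
print(results)  # rollout rewards plus any per-metric breakdown
```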
Example snippets
- Install and add to a uv project:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv init && uv venv --python 3.12
uv tool install prime
uv add verifiers

- Run a quick evaluation with an API model:

uv run vf-eval wordle -m gpt-4.1-mini

Architecture & extensibility
Verifiers separates dataset, rollout logic, rubric, and parser concerns so environments can be reused across evaluation, synthetic data generation, and RL training. Environments expose a load_environment entrypoint, and environment authors can publish modules to the Environments Hub. The library is intentionally trainer-agnostic: any trainer that exposes an OpenAI-compatible inference client can be integrated.
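As a rough illustration of that layout (module name, file path, and constructor details here are hypothetical; only the `load_environment` entrypoint is the documented contract), an environment module boils down to a function that assembles a dataset, rubric, and environment class:

```python
# environments/my_env/my_env.py -- hypothetical path and module name.
# Only the load_environment entrypoint is the documented contract; the body
# is an illustrative sketch.
from datasets import Dataset

import verifiers as vf

def load_environment(**kwargs):
    dataset = Dataset.from_list([
        {"question": "Name a prime number greater than 10.", "answer": "11"},
    ])

    def contains_answer(completion, answer, **kw) -> float:
        # Toy reward; real environments usually compose richer rubrics or judges.
        return 1.0 if answer in str(completion) else 0.0

    return vf.SingleTurnEnv(
        dataset=dataset,
        rubric=vf.Rubric(funcs=[contains_answer]),
        **kwargs,
    )
```

Once packaged and installed, the module can be loaded by name (as in the evaluation sketch above) and reused unchanged for evals, synthetic data generation, or RL training.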
Who maintains it / citation
Originally created by William Brown and maintained under the Prime Intellect organization. The repository README includes citation metadata for academic use.
Release & status notes
The repository is actively maintained (project created 2025-01-22) and publishes incremental versions and release notes in the README. It is intended for research and production-oriented RL work with LLMs; many environment implementations are community-contributed and lightly reviewed, while team-maintained research environments live in a separate repository.
Use cases
- Evaluating LLM completions with programmatic reward functions or LLM judges (a judge-based sketch follows this list).
- Building tool-enabled agent sandboxes for tool-using research.
- Large-scale RL training of LLMs using asynchronous rollout/orchestration patterns.
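To make the judge-based use case concrete, here is a hand-rolled sketch that calls an OpenAI-compatible judge from inside an ordinary reward function; the library's JudgeRubric provides a built-in route for this, so treat the prompt and score parsing below as illustrative rather than the library's own mechanism.

```python
# Hand-rolled LLM-judge reward (illustrative stand-in; the built-in
# JudgeRubric is the library's own path for judge-based scoring).
from openai import OpenAI

import verifiers as vf

judge_client = OpenAI()

def judged_correctness(prompt, completion, answer, **kwargs) -> float:
    reply = judge_client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{
            "role": "user",
            "content": (
                "Rate the answer from 0 to 10 for correctness against the reference. "
                "Reply with a single integer.\n"
                f"Question: {prompt}\nReference: {answer}\nAnswer: {completion}"
            ),
        }],
    )
    try:
        return int(reply.choices[0].message.content.strip()) / 10.0
    except ValueError:
        return 0.0  # unparseable judge output scores zero

rubric = vf.Rubric(funcs=[judged_correctness], weights=[1.0])
```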
For full documentation and examples, see the official docs: https://docs.primeintellect.ai/verifiers
