MLOps2023

promptfoo

Declarative CLI and library to evaluate and red-team LLM apps: run test cases against prompts and models, compare providers side-by-side, and scan for jailbreaks, prompt injection, and data leaks — with CI/CD and pull-request code scanning built in.

Visit Website

Introduction

Most teams ship LLM features on vibes — tweak a prompt, eyeball a few outputs, and hope it holds in production. Promptfoo treats prompts and agents the way engineers treat code: as a test suite you can version, diff, and gate a release on. The same declarative config that scores answer quality also drives adversarial attacks, so evaluation and security stop being two separate projects.

What Sets It Apart

Side-by-side model matrix: run identical cases across OpenAI, Anthropic, Bedrock, Ollama, and local models at once — "which model is actually better for us" becomes a comparison table, not a hallway argument.
Red-teaming in the same harness: it generates jailbreak, prompt-injection, PII-leak, and tool-misuse probes, so security testing rides on the eval config you already wrote instead of a separate tool and team.
CI/CD- and PR-native: assertions fail a build and code scanning flags risky LLM changes during review, so regressions get caught before merge rather than after an incident.
Local-first, MIT-licensed: evals run on your machine and sensitive prompts or data never have to leave it — which is what makes it usable inside regulated orgs.

Who It's For

Great fit if you're hardening a customer-facing LLM feature and want repeatable, gate-able checks for both output quality and security in the same pipeline. Its March 2026 acquisition by OpenAI is a signal that red-teaming is becoming table stakes, and the project stays open source and MIT-licensed. Look elsewhere if you want a hosted, zero-config dashboard that observes production traffic — promptfoo is a developer-driven testing harness that assumes you'll write configs and wire it into your own workflow.

Back

Information

Websitegithub.com
OrganizationsPromptfoo, Inc., OpenAI
AuthorsPromptfoo team (now part of OpenAI)
Published date2023/04/28

More Items

AI Infra2026

Knowledge Catalog

Google Cloud (Google LLC), GoogleCloudPlatform (GitHub organization)

Provides tools and samples to build context management, enrichment, and retrieval solutions on Google Cloud Knowledge Catalog — an AI-oriented data catalog that builds a dynamic knowledge graph for structured and unstructured data, suitable for RAG and agent workflows.

google github ai ai-development RAG+5

MLOps2018

Prefect

PrefectHQ

Orchestrates and schedules Python data pipelines and workflows with primitives for retries, caching, parameters, and deployments. Provides either a self-hosted server or managed Prefect Cloud for monitoring, observability, and integrations across common data tools.

mLOps python ai-workflow docker cli+2

AI Agent2026

no-mistakes

kunchenguid

Acts as a local git proxy that runs an AI-driven validation pipeline in a disposable worktree, only forwarding the branch and opening a PR after every check passes. Runs review, tests, docs, and lint in isolation, applies safe auto-fixes, supports multiple agent providers, and pauses for human approval when intent would change.

go cli agent-skills ai-workflow mLOps+2