Scans LLMs for security and safety failures, probing for hallucination, data leakage, prompt injection, jailbreaks, toxicity, and misinformation. This CLI red-teaming kit runs static, dynamic, and adaptive probes across many providers and outputs structured JSONL reports.
Provides a modular Python framework to run standardized evaluations of large language models, including prompt engineering, tool usage, multi-turn dialog, and model-graded scoring. Ships with 100+ pre-built evaluations and extension points for custom elicitation and scoring; intended for model comparison, safety checks, and benchmark automation.
Packs a Git repository into a single AI-friendly file for easy ingestion by LLMs. Offers per-file and total token counts, optional Tree-sitter compression, secret scanning, and multiple interfaces (CLI, web, browser extension, Docker, MCP) for AI-driven code review and analysis.
Runs autonomous LLM-driven penetration-testing agents that discover, exploit, and produce reproducible vulnerability reports in sandboxed Docker environments. Ships with 20+ pentest tools, a knowledge-graph memory, multi-LLM provider support, and REST/GraphQL APIs for self-hosted deployments.
A code-first collection of runnable tutorials for building production-ready generative-AI agents — step-by-step guides covering stateful workflows, vector memory, RAG, tool integrations, Docker/AWS/RunPod deployment, security guardrails, observability, and multi-agent patterns.