VLMEvalKit is an open-source evaluation toolkit for large vision-language models (VLMs/LVLMs). It enables one-command evaluation across many benchmarks, supports generation-based evaluation with optional LLM answer extraction, and provides leaderboards and reproducible pipelines.
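As a rough illustration of generation-based evaluation with optional LLM answer extraction, the sketch below scores free-form responses to multiple-choice questions. It is not VLMEvalKit's implementation: `extract_choice`, `accuracy`, and the `llm_extractor` callback are hypothetical names chosen for this example, and the two regex rules are a simplifying assumption.

```python
import re
from typing import Callable, List, Optional


def extract_choice(
    response: str,
    options: str = "ABCD",
    llm_extractor: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return the option letter chosen in a free-form response, or None.

    Hypothetical helper: rule-based matching first, then an optional
    LLM-backed extractor (any callable mapping text to a letter).
    """
    text = response.strip()
    patterns = [
        # The whole response is just the letter: "B", "(B)", "B."
        rf"^\(?([{options}])\)?[.:]?$",
        # Phrases such as "The answer is B" or "Option: (C)"
        rf"(?i:answer|option)\s*(?i:is)?\s*:?\s*\(?([{options}])\)?\b",
    ]
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    # No rule matched: fall back to the LLM extractor if one was supplied.
    if llm_extractor is not None:
        candidate = llm_extractor(text).strip().upper()
        if len(candidate) == 1 and candidate in options:
            return candidate
    return None


def accuracy(responses: List[str], gold: List[str], **kwargs) -> float:
    """Fraction of responses whose extracted letter equals the gold letter."""
    hits = sum(extract_choice(r, **kwargs) == g for r, g in zip(responses, gold))
    return hits / len(gold) if gold else 0.0
```

Calling `accuracy(["The answer is B.", "(C)"], ["B", "C"])` scores both responses with the rule-based pass alone. Passing `llm_extractor=` only matters for responses the rules cannot resolve, which keeps judge-model calls to a minimum.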
Benchmarks document-parsing systems on real-world enterprise PDFs and images, evaluating tables, charts, content faithfulness, semantic formatting, and visual grounding with human-verified, rule-level tests. Ships with ~2,000 pages, ~169K test rules, and an open evaluation framework for end-to-end pipeline scoring.
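One way to picture rule-level scoring is as a per-category pass rate over small checks applied to a parser's output. The sketch below assumes that reading; the benchmark's actual rule schema is not described here, so `Rule`, `score_by_category`, and the example checks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A single test rule (hypothetical schema for illustration)."""
    category: str                   # e.g. "tables", "faithfulness"
    check: Callable[[str], bool]    # returns True if the parsed output passes


def score_by_category(parsed_output: str, rules: List[Rule]) -> Dict[str, float]:
    """Return the fraction of rules passed within each category."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for rule in rules:
        total[rule.category] = total.get(rule.category, 0) + 1
        passed[rule.category] = passed.get(rule.category, 0) + int(rule.check(parsed_output))
    return {category: passed[category] / total[category] for category in total}


# Illustrative page output and rules; none of this is benchmark data.
page = "| Region | Revenue |\n| EMEA | 4.2 |"
rules = [
    Rule("tables", lambda out: "EMEA" in out),
    Rule("tables", lambda out: "4.2" in out),
    Rule("tables", lambda out: "APAC" in out),
    Rule("faithfulness", lambda out: "lorem ipsum" not in out),
]
scores = score_by_category(page, rules)
```

Reporting a pass rate per category rather than a single aggregate makes it visible when a pipeline is strong on one dimension, such as tables, and weak on another.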