AIAny - Surya

Document OCR and layout parsing are often split across specialized pipelines or dominated by very large models. Surya takes a different tradeoff: it unifies layout analysis, OCR (including inline math), reading-order detection, and table recognition in a single ~650M-parameter vision–language model, aiming for strong end-to-end accuracy while keeping model size and inference cost low.

What Sets It Apart

Pareto-efficient size/quality tradeoff: Surya scores 83.3% on olmOCR-bench with roughly 650M parameters, well under 3B, so you get near state-of-the-art document parsing without a very large model footprint. In practice, that means lower VRAM use and cheaper inference in production pipelines.
Unified VLM for layout, OCR, and table recognition: one model emits either layout JSON or full-page HTML (with <math> tags for equations), depending on the prompt. This simplifies pipelines and reduces format-translation errors, making integration easier and avoiding cascade failures across separate components.
Multilingual and reading-order aware: evaluated on an internal 91-language benchmark with broad pass rates, and emits reading order explicitly. The result is better cross-language robustness and cleaner downstream structure for extraction tasks.
Practical inference choices: runs on vLLM with NVIDIA GPUs or on llama.cpp (llama-server) with CPU or Apple Silicon, and ships a manager that automatically spawns or attaches to the backend. This allows flexible deployment, from a local CPU to a single-GPU server.
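Because full-page output is HTML with inline <math> tags, a downstream step often needs to separate prose from equations. The following is a minimal sketch using only Python's standard `html.parser`; the sample string is hypothetical, and whether Surya places LaTeX or MathML inside its <math> tags is an assumption not confirmed here. The helper names `MathAwareExtractor` and `split_text_and_math` are illustrative, not part of Surya's API.

```python
from html.parser import HTMLParser


class MathAwareExtractor(HTMLParser):
    """Collect prose and <math> contents separately from an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []  # prose outside <math>
        self.math_parts: list[str] = []  # one entry per <math> element
        self._in_math = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "<MATH>" also matches here.
        if tag == "math":
            self._in_math = True
            self.math_parts.append("")

    def handle_endtag(self, tag):
        if tag == "math":
            self._in_math = False

    def handle_data(self, data):
        # Text inside <math> (including MathML children such as <mi>)
        # is appended to the current math span; everything else is prose.
        if self._in_math:
            self.math_parts[-1] += data
        else:
            self.text_parts.append(data)


def split_text_and_math(html: str) -> tuple[str, list[str]]:
    """Return (prose with collapsed whitespace, list of math snippets)."""
    parser = MathAwareExtractor()
    parser.feed(html)
    parser.close()
    prose = " ".join(" ".join(parser.text_parts).split())
    return prose, [m.strip() for m in parser.math_parts]


# Hypothetical sample; real Surya output may use different markup.
sample = "<p>Energy is <math>E = mc^2</math> for rest mass.</p>"
print(split_text_and_math(sample))
```

Keeping math snippets in a separate list lets a pipeline index the prose for search while routing equations to a renderer or LaTeX validator.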

Who It's For and Tradeoffs

Great fit if you need accurate, production-friendly document parsing that balances quality and cost: for example, teams extracting structured text, tables, or semantics from scanned PDFs and multilingual documents who want a single-model stack. Look elsewhere if topping benchmark leaderboards (typically the domain of very large models) is your sole priority, if your use case is natural-scene text such as photos, or if your commercial use falls outside the model's modified OpenRAIL-M license. The weights are free for research, personal use, and small startups; commercial licensing details are on the project site.

Where It Fits

Compared with larger document parsers, Surya occupies the lower-latency / lower-cost part of the accuracy curve: it’s a pragmatic choice when you need structured OCR + layout + table output with modest infrastructure, rather than pursuing the last percent of benchmark performance with multi-billion-parameter models.
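The "Pareto-efficient" framing can be made concrete: a model sits on the size/accuracy Pareto front when no other model is at least as small and at least as accurate while being strictly better on one of those axes. The sketch below computes that front; the model names and numbers are made-up placeholders for illustration, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPoint:
    name: str
    params_b: float  # parameter count in billions (lower is cheaper)
    score: float     # benchmark score in percent (higher is better)


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.params_b <= b.params_b and a.score >= b.score
    strictly_better = a.params_b < b.params_b or a.score > b.score
    return no_worse and strictly_better


def pareto_front(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the points not dominated by any other point, sorted by size."""
    # A point never dominates itself, so no self-exclusion is needed.
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.params_b)


# Hypothetical placeholder models, not real measurements.
candidates = [
    ModelPoint("A", 0.5, 70.0),
    ModelPoint("B", 1.0, 80.0),
    ModelPoint("C", 2.0, 78.0),  # larger and less accurate than B
    ModelPoint("D", 7.0, 85.0),
]
print([p.name for p in pareto_front(candidates)])
```

Here C drops off the front because B is both smaller and more accurate; the remaining points trace the accuracy curve, and choosing among them is a cost decision rather than a quality one.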

More Items

LuffyTheFox/Qwen3.6-35B-A3B-Uncensored-Genesis-Hermes-V3-GGUF

NVIDIA Nemotron-3-Embed-1B-BF16

Qwen3.6-27B-Fable-Fusion-711-Uncensored-Heretic-NM-DAU-NEO-MAX-MTP-GGUF