Docling is an open-source document parsing and understanding library designed for generative-AI workflows. It processes many formats (PDF, DOCX, PPTX, HTML, images, audio, WebVTT), offers advanced PDF layout/table/code/formula understanding, OCR and ASR support, a unified document representation, multiple export formats, local execution for sensitive data, CLI, and integrations with popular agent/LLM frameworks. It also provides an MCP server for agentic usage.
Docling — Document processing and understanding for gen-AI
Docling is an open-source toolkit and library that simplifies preparing documents for generative-AI applications. It focuses on robust parsing, structured representation, and integrations that let downstream systems (LLMs, retrieval systems, agents) consume documents reliably.
Key capabilities
Parsing of many formats: PDF, DOCX, PPTX, XLSX, HTML, images (PNG, TIFF, JPEG, ...), audio (WAV, MP3), WebVTT, and more.
Advanced PDF understanding: page layout & reading order, table structure recovery, code/formula detection, image classification and richer layout parsing.
OCR support: processes scanned PDFs and images with OCR pipelines to extract text and layout.
Audio/ASR support: transcription for audio inputs and support for WebVTT track parsing.
Unified data model: a DoclingDocument representation that expresses structure, annotations and metadata in a consistent format and can be exported to Markdown, HTML, JSON, or DocTags.
Local-first operation: supports local execution and air-gapped environments for sensitive data handling.
MCP server: a lightweight server to connect Docling processing into agentic workflows and pipelines.
Integrations: plug-and-play adapters for LangChain, LlamaIndex, Haystack and others to accelerate RAG and agent setups.
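As a rough illustration of the unified data model, the sketch below writes one converted document out in several of the export formats listed above. It assumes the document object exposes `export_to_markdown()`, `export_to_html()` and `export_to_dict()`, and that the dictionary export is JSON-serializable. The helper name `export_all` and the file naming are hypothetical, not part of Docling.

```python
import json
from pathlib import Path


def export_all(doc, out_dir, stem="document"):
    """Write Markdown, HTML and JSON exports of a DoclingDocument-like object.

    `doc` is duck-typed: any object with export_to_markdown(),
    export_to_html() and export_to_dict() works (assumed method names).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = {
        "md": out / f"{stem}.md",
        "html": out / f"{stem}.html",
        "json": out / f"{stem}.json",
    }

    paths["md"].write_text(doc.export_to_markdown(), encoding="utf-8")
    paths["html"].write_text(doc.export_to_html(), encoding="utf-8")
    # Assumes the dict export contains only JSON-serializable values.
    paths["json"].write_text(
        json.dumps(doc.export_to_dict(), ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return paths
```

With Docling installed, `export_all(result.document, "out")` would be called on the result of a conversion. Keeping the helper duck-typed means it can be exercised with a stub object in tests.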
Highlights & design goals
Developer-friendly CLI and Python API for quick conversion and experimentation.
Extensible pipeline architecture to swap OCR engines, VLMs, or layout models.
Focus on quality of document structure extraction (tables, code blocks, formulas), which improves downstream retrieval and LLM prompting.
Works across major OSes (macOS, Linux, Windows) and CPU architectures (x86_64, arm64).
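The extensible pipeline mentioned above can be sketched as a converter built with explicit PDF pipeline options. The module paths and names used here (`PdfPipelineOptions`, `PdfFormatOption`, `InputFormat`, and the `do_ocr` / `do_table_structure` switches) reflect the Docling Python package as commonly documented, but treat them as assumptions and check them against your installed version. The wrapper `build_pdf_converter` is a hypothetical helper.

```python
def build_pdf_converter(do_ocr: bool = True, do_table_structure: bool = True):
    """Return a DocumentConverter whose PDF pipeline uses the given switches."""
    # Imported lazily so this module can be loaded without Docling installed.
    # These import paths are assumptions; verify them for your Docling version.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = do_ocr                          # run OCR on scanned pages
    options.do_table_structure = do_table_structure  # recover table structure

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=options)
        }
    )
```

Calling `build_pdf_converter(do_ocr=False).convert(source)` would then skip OCR for born-digital PDFs, which is usually faster.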
Typical usage
Convert a PDF or URL to a DoclingDocument and export to Markdown/JSON for indexing into a retriever.
Example (Python):
from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"  # local file path or URL
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
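Before indexing into a retriever, the exported Markdown usually has to be split into chunks. The following is a minimal standard-library sketch of one way to do that, splitting on Markdown headings and capping chunk size. It is a stand-in for illustration, not Docling's own chunking, and the function name `split_markdown_by_heading` and the `max_chars` parameter are hypothetical.

```python
import re


def split_markdown_by_heading(markdown: str, max_chars: int = 2000) -> list[dict]:
    """Split Markdown into chunks, one per heading section, capped at max_chars."""
    chunks: list[dict] = []
    heading = ""            # heading of the section currently being collected
    buffer: list[str] = []  # body lines of the current section

    def flush() -> None:
        text = "\n".join(buffer).strip()
        # Slice long sections into fixed-size pieces; empty sections add nothing.
        for start in range(0, len(text), max_chars):
            chunks.append({"heading": heading, "text": text[start:start + max_chars]})
        buffer.clear()

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()  # close the previous section under its own heading
            heading = match.group(2).strip()
        else:
            buffer.append(line)
    flush()
    return chunks
```

Each chunk keeps its section heading so the retriever can store it as metadata. Note that this simple version would also treat `#` comment lines inside code fences as headings, so a production splitter needs to track fence state.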
CLI example:
docling https://arxiv.org/pdf/2206.01062
CLI flags select the processing pipeline, for example a vision-language model (VLM) pipeline, and toggle OCR and other options.
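To show how such flags might be combined from a script, the sketch below assembles a `docling` command line as an argument list. The flag names (`--to`, `--output`, `--ocr` / `--no-ocr`, `--pipeline`, `--vlm-model`) and the `granite_docling` model value are assumptions based on the Docling CLI as commonly documented; confirm them with `docling --help` for your version. The helper `build_docling_command` is hypothetical.

```python
import shlex


def build_docling_command(source, to="md", output_dir=None, ocr=None,
                          pipeline=None, vlm_model=None):
    """Build an argument list for the docling CLI (flag names are assumptions)."""
    cmd = ["docling", "--to", to]
    if output_dir is not None:
        cmd += ["--output", output_dir]
    if ocr is not None:
        # Only pass an OCR flag when the caller asked for one explicitly.
        cmd.append("--ocr" if ocr else "--no-ocr")
    if pipeline is not None:
        cmd += ["--pipeline", pipeline]
    if vlm_model is not None:
        cmd += ["--vlm-model", vlm_model]
    cmd.append(source)  # the input path or URL goes last
    return cmd


cmd = build_docling_command(
    "https://arxiv.org/pdf/2206.01062",
    pipeline="vlm",
    vlm_model="granite_docling",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The list can be passed to `subprocess.run(cmd, check=True)` once Docling is installed. Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.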
Integrations & ecosystem
Integrates with LangChain, LlamaIndex, Haystack and other agent/LLM frameworks for retrieval-augmented generation and agentic document use.
Supports vision-language models (e.g., GraniteDocling) and can be combined with VLMs hosted on Hugging Face or served from local model runtimes.