Many failures in literature QA trace back to brittle retrieval and missing provenance: models either hallucinate facts or return summaries without citations. PaperQA2 treats the pipeline as evidence engineering. It fetches, scores, and summarizes text chunks while carrying explicit citation metadata, so generated answers stay grounded in the source papers and can be traced back to them.
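To make "explicit citation metadata" concrete, the sketch below models one piece of evidence as a summarized chunk that carries its own citation. The `Evidence` class, its fields, and the `(Key pages N-M)` citation format are illustrative assumptions, not PaperQA2's internal types. The point is only that provenance travels with each claim instead of being reattached after the answer is written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record: a summarized chunk plus what is needed to cite it."""
    doc_key: str   # short citation key (illustrative format)
    doi: str       # made-up example DOI below; real records would carry fetched metadata
    pages: str     # page range the chunk came from
    summary: str   # LLM-written summary of the chunk (hard-coded here)

    def citation(self) -> str:
        # In-text citation string that stays attached to the claim it supports.
        return f"({self.doc_key} pages {self.pages})"


def cite_all(evidence: list[Evidence]) -> list[str]:
    # Every emitted statement gets its citation, so nothing leaves without provenance.
    return [f"{e.summary} {e.citation()}." for e in evidence]


ev = Evidence("Smith2023", "10.1000/example", "3-4", "Compound X inhibits enzyme Y")
print(cite_all([ev])[0])  # → Compound X inhibits enzyme Y (Smith2023 pages 3-4).
```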
What Sets It Apart
- Metadata-aware retrieval: embeddings and indexing incorporate paper metadata (DOI, citation counts, venue), so retrieval can favor authoritative sources and flag retracted papers or low-quality outlets. This keeps noisy context out of answers to scientific queries.
- LLM-based re-ranking and contextual summarization (RCS): the top-k retrieved chunks are re-scored and summarized by an LLM before answer generation, producing concise, citation-linked contexts instead of raw passages. This two-stage design improves answer faithfulness for technical claims.
- Agentic RAG workflows: an agent can iteratively search, gather evidence, and refine answers (or run a faster fixed pipeline). The repo exposes both a CLI (pqa) and a Python API for custom pipelines and evaluation.
- Multimodal and practical: supports images/tables in PDFs, media enrichment prompts, local embedding models or cloud providers, and optional external vector DB backends for scale.
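The retrieval and re-ranking stages described in the list above can be sketched in plain Python. Everything here is a hypothetical stand-in rather than PaperQA2's API: `embed` is a toy bag-of-words vector instead of an embedding model, `llm_score_and_summarize` replaces the LLM call with keyword overlap, and metadata awareness is approximated by embedding each chunk's citation and venue alongside its text and dropping retracted papers before ranking.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    citation: str            # e.g. "Smith2023 pages 3-4" (illustrative format)
    venue: str
    retracted: bool = False  # would come from fetched metadata in a real system


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an embedding model.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int) -> list[Chunk]:
    # Stage 1: metadata-aware retrieval. Citation and venue are embedded with the
    # chunk text, and retracted papers are excluded before ranking.
    q = embed(query)
    candidates = [c for c in chunks if not c.retracted]
    candidates.sort(key=lambda c: cosine(q, embed(f"{c.citation} {c.venue} {c.text}")), reverse=True)
    return candidates[:k]


def llm_score_and_summarize(query: str, chunk: Chunk) -> tuple[int, str]:
    # Hypothetical stand-in for an LLM call that rates relevance 0-10 and writes
    # a query-specific summary. Here: keyword overlap and the first sentence.
    overlap = len(set(query.lower().split()) & set(chunk.text.lower().split()))
    return min(10, 3 * overlap), chunk.text.split(".")[0]


def gather_evidence(query: str, chunks: list[Chunk], k: int = 3, min_score: int = 5) -> list[str]:
    # Stage 2: score and summarize only the top-k chunks, drop low scorers,
    # and return citation-linked summaries in descending score order.
    scored = [(*llm_score_and_summarize(query, c), c) for c in retrieve(query, chunks, k)]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [f"{summary} ({c.citation})." for score, summary, c in scored if score >= min_score]


corpus = [
    Chunk("Compound X inhibits enzyme Y in vitro. Effects were dose dependent.",
          "Smith2023 pages 3-4", "J. Biochem"),
    Chunk("Compound X inhibits enzyme Y strongly.", "Lee2021 pages 2", "Example Journal", retracted=True),
    Chunk("Enzyme Z regulates lipid transport.", "Chen2022 pages 7", "Cell Rep"),
]
print(gather_evidence("does compound x inhibit enzyme y", corpus))
# → ['Compound X inhibits enzyme Y in vitro (Smith2023 pages 3-4).']
```

Swapping the two stubs for real embedding and chat-model calls keeps the structure intact: cheap vector retrieval narrows the corpus, and the more expensive LLM pass runs only on the top-k candidates.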
Who It's For: Tradeoffs
Great fit if you need reproducible, citation-backed answers from a curated PDF corpus (researchers, reviewers, teams building literature QA). It excels when you can supply or index the relevant papers and have access to LLM/embedding resources.
Look elsewhere if you want a drop-in web search across the entire open literature without supplying PDFs (PaperQA2 expects you to index your own documents), or if you need extremely lightweight on-device inference: practical use typically requires an LLM/embedding provider or a reasonably capable local model.
Where It Fits
Use PaperQA2 to build toolchains for literature review, claim verification, summarization of research corpora, or evaluation suites that demand provenance. It’s complementary to general RAG frameworks but opinionated toward scientific metadata, re-ranking, and agentic evidence gathering.
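For evaluation suites that demand provenance, one simple check is to flag answer sentences that carry no citation, along with citations that point to papers outside the retrieved evidence. This sketch assumes a hypothetical `(Key pages N-M)` citation format and a naive regex sentence splitter; adapt both to whatever your pipeline actually emits.

```python
import re

# Hypothetical citation format, e.g. "(Smith2023 pages 3-4)"; adjust to your pipeline's output.
CITATION = re.compile(r"\(([A-Za-z]+\d{4}) pages [\d-]+\)")


def provenance_report(answer: str, retrieved_keys: set[str]) -> dict[str, list[str]]:
    """Flag uncited sentences and citation keys that are not in the retrieved evidence."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION.search(s)]
    cited = CITATION.findall(answer)  # one captured key per citation
    unknown = sorted({k for k in cited if k not in retrieved_keys})
    return {"uncited": uncited, "unknown_keys": unknown}


answer = ("Compound X inhibits enzyme Y (Smith2023 pages 3-4). The effect is dose dependent. "
          "Z is unrelated (Doe2020 pages 1-2).")
print(provenance_report(answer, {"Smith2023"}))
# → {'uncited': ['The effect is dose dependent.'], 'unknown_keys': ['Doe2020']}
```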
