A 57-subject multiple-choice benchmark for measuring broad language understanding in LLMs. Provides per-subject configs and test/dev/auxiliary_train splits for few-shot and zero-shot evaluation, and is widely used for model comparison and academic reporting.
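As an illustration of how the dev split typically supplies few-shot examples for a test question, here is a minimal sketch. It assumes each row is a dict with `question`, `choices` (four options), and `answer` (a 0-based index); these field names, the helper names, the prompt wording, and the demo rows are all assumptions made for illustration, not something this entry specifies.

```python
LETTERS = "ABCD"


def format_example(question, choices, answer=None):
    """Render one question; append the gold letter only when `answer` is given."""
    lines = [question.strip()]
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    suffix = f" {LETTERS[answer]}" if answer is not None else ""
    lines.append("Answer:" + suffix)
    return "\n".join(lines)


def build_prompt(subject, dev_rows, test_row, k=5):
    """Prepend up to k solved dev examples to an unsolved test question."""
    header = (
        "The following are multiple choice questions (with answers) about "
        f"{subject.replace('_', ' ')}."
    )
    shots = [
        format_example(r["question"], r["choices"], r["answer"])
        for r in dev_rows[:k]
    ]
    query = format_example(test_row["question"], test_row["choices"])
    return "\n\n".join([header, *shots, query])


# Hypothetical rows, made up for this sketch (not taken from the benchmark).
dev_rows = [
    {"question": "What is 2 + 3?", "choices": ["4", "5", "6", "7"], "answer": 1},
    {"question": "What is 10 / 2?", "choices": ["5", "2", "20", "8"], "answer": 0},
]
test_row = {"question": "What is 3 * 3?", "choices": ["6", "9", "12", "3"], "answer": 1}

prompt = build_prompt("elementary_mathematics", dev_rows, test_row, k=2)
print(prompt)
```

Setting `k=0` gives the zero-shot variant: the header followed by the unanswered test question only.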
Provides code, pretrained weights, and tooling for protein language models and structure prediction, including ESMC, ESMFold2, sparse autoencoders (SAEs), and the ESM Atlas. Includes model checkpoints, tutorials, and Hugging Face and Biohub integration, and is released under an MIT license.
Provides pre-parsed Parquet snapshots of English and French Wikipedia articles with structured fields (sections, infoboxes, tables, references, images) and credibility signals — optimized for large-scale analysis, retrieval-augmented generation, and model development.