AI Leaderboard (2023)

Arena Leaderboard (formerly LMArena)

Blind side-by-side voting site where users send one prompt to two anonymous chat models, pick the winner, and millions of votes become Elo rankings across text, coding, vision, image, and video. Crowd preference, not static benchmarks, decides the order.


Introduction

Static benchmarks reward models that overfit to known test sets. This leaderboard sidesteps that by scoring models on fresh user prompts and by never telling voters which model they are judging. Each ranking is built from millions of blind pairwise human votes, so a model is hard to game into a high position without actually winning real preferences, which is why frontier labs treat a strong placement here as a launch milestone.
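To make the idea of turning pairwise votes into a rating concrete, here is a minimal sketch of a classic online Elo update. It is an illustration only, not the leaderboard's actual pipeline: the function names, the K-factor of 4, the starting rating of 1000, and the model names in the usage note are all assumptions chosen for the example.

```python
from collections import defaultdict


def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_from_votes(votes, k=4.0, base=1000.0):
    """Fold a sequence of (model_a, model_b, winner) votes into ratings.

    `winner` is the name of the preferred model; any other value
    (for example "tie") is counted as a draw. k and base are
    illustrative defaults, not values taken from the leaderboard.
    """
    ratings = defaultdict(lambda: base)
    for model_a, model_b, winner in votes:
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        if winner == model_a:
            actual_a = 1.0
        elif winner == model_b:
            actual_a = 0.0
        else:
            actual_a = 0.5
        delta = k * (actual_a - exp_a)
        # Zero-sum update: what A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)
```

Calling `elo_from_votes` on three votes for a hypothetical `model_x` and one for `model_y` leaves `model_x` rated above `model_y`. Because the update is order-dependent, a production system would likely fit all votes jointly instead of streaming them one at a time.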

What Sets It Apart

Rankings come from human preference votes, not curated question banks, so they track what people actually prefer rather than what a model memorized.
Models are anonymized during voting, which removes brand bias and makes the Elo scores harder to manipulate.
Coverage spans text, coding, vision, image, and video arenas, letting you compare the same model's standing across very different tasks.
Scores carry confidence intervals, so a one-rank gap with overlapping margins signals a statistical tie rather than a real difference.
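The point about confidence intervals can be sketched as a simple overlap check. This is a rough heuristic under stated assumptions, not the leaderboard's own test: the `Score` type, the `is_statistical_tie` helper, and every number in the usage note are hypothetical.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """A rating with the lower and upper bounds of its confidence interval."""
    name: str
    rating: float
    lower: float
    upper: float


def is_statistical_tie(a: Score, b: Score) -> bool:
    """Return True when the two confidence intervals overlap.

    Overlap is a conservative reading: it says the data cannot
    cleanly separate the two models, not that they are equal.
    """
    return a.lower <= b.upper and b.lower <= a.upper
```

With made-up values, `Score("model_x", 1302.0, 1295.0, 1309.0)` and `Score("model_y", 1299.0, 1291.0, 1306.0)` count as a tie despite the three-point gap, while a model whose interval ends at 1280 would rank clearly below both.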

Who It's For

Great fit if you want a directional, preference-based read on which frontier models people like right now, or a public reference point when picking between similar top-tier models. Look elsewhere if you need reproducible, task-specific scores: votes skew toward conversational style and presentation, ranks shift as new models arrive, and the methodology rewards what feels better, not necessarily what is most correct on your particular workload.


Information

Website: lmarena.ai
Organizations: Arena Intelligence Inc.
Authors: LMSYS Org, Arena
Published date: 2023/05/03

More Items

AI Train (2026)

OpenAI/parameter-golf

OpenAI

A challenge repository for training the best language model that fits inside a 16,000,000‑byte (16MB) submission artifact; provides baseline training code, FineWeb bpb evaluation, a public leaderboard, and compute-grant instructions for short 8×H100 runs.

openai ai-train ai-leaderboard github pytorch+2

AI Leaderboard (2023)

VLMEvalKit

open-compass (OpenCompass community); OpenCompass, Shanghai AI Laboratory

Runs one-command evaluation of vision-language models across 80+ multimodal benchmarks, handling data download, inference, and metric scoring in a single pass. Supports 220+ LMMs; adding a new model means writing one generate_inner() function.

vision ai-leaderboard huggingface github ai-tools+1

AI Leaderboard (2023)

OpenCompass CompassRank

OpenCompass Contributors; Shanghai AI Laboratory

Public leaderboard ranking LLMs and multimodal models across 70+ datasets covering reasoning, knowledge, coding, math, and long-context tasks. Blends open-source and proprietary benchmarks into one comparative view spanning GPT-4, Claude, Qwen, and InternLM.

ai-leaderboard
