AIAny

Tag

Explore by tags

Curated AI Resources for Everyone

Powered by airss.app

Copyright © 2026 All Rights Reserved.
  • All

  • 30u30

  • ASR

  • ChatGPT

  • GNN

  • IDE

  • RAG

  • agent-skills

  • ai

  • ai-agent

  • ai-api

  • ai-api-management

  • ai-client

  • ai-coding

  • ai-demos

  • ai-deploy

  • ai-development

  • ai-framework

  • ai-image

  • ai-image-demos

  • ai-inference

  • ai-leaderboard

  • ai-library

  • ai-rank

  • ai-serving

  • ai-tools

  • ai-train

  • ai-video

  • ai-workflow

  • AIGC

  • algorithms

  • alibaba

  • amazon

  • android

  • anthropic

  • audio

  • aws

  • biology

  • blog

  • book

  • bytedance

  • chatbot

  • chatgpt

  • chemistry

  • claude

  • claude-code

  • cli

  • code

  • codex

  • copilot

  • course

  • cursor

  • deepmind

  • deepseek

  • depth

  • devops

  • diffusers

  • docker

  • drug-discovery

  • electron

  • embeddings

  • engineering

  • facebook

  • finance

  • foundation

  • foundation-model

  • gemini

  • gemini-cli

  • gemma

  • genomics

  • gitHub

  • github

  • go

  • google

  • gradient-boosting

  • grok

  • huggingface

  • image

  • ios

  • java

  • javascript

  • LLM

  • llm

  • math

  • mcp

  • mcp-client

  • mcp-server

  • meta-ai

  • meta-pytorch

  • microsoft

  • mlops

  • mobile

  • multilingual

  • multimodal

  • NLP

  • nlp

  • nodejs

  • nvidia

  • ocr

  • ollama

  • openai

  • opencode

  • pandas

  • paper

  • physics

  • plugin

  • postgres

  • privacy

  • prompt-engineering

  • python

  • pytorch

  • RL

  • robotics

  • rust

  • science

  • security

  • shodan

  • skillkit

  • sora

  • speech

  • ssh

  • tensorrt

  • terminal

  • transformers

  • translation

  • tutorial

  • typescript

  • vibe-coding

  • video

  • vision

  • vllm

  • voice

  • xAI

  • xai

OpenRouter LLM Rankings

2023
OpenRouter, Inc.

Crowd-sourced leaderboard that tracks token usage, popularity trends and market share of large language models on the OpenRouter gateway.

ai-leaderboard
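
As a rough illustration of how a usage-based ranking like this can be derived, the sketch below turns per-model token counts into market-share percentages and a ranking. The model names and token figures are made-up placeholders, not OpenRouter data, and this is not OpenRouter's own methodology.

```python
from typing import Dict, List, Tuple

def market_share(token_counts: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank models by token usage and return (model, share in percent) pairs."""
    total = sum(token_counts.values())
    if total == 0:
        return []
    ranked = sorted(token_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(model, 100.0 * tokens / total) for model, tokens in ranked]

# Hypothetical token counts per model (placeholder names and numbers).
usage = {"model-a": 600_000, "model-b": 300_000, "model-c": 100_000}
for model, share in market_share(usage):
    print(f"{model}: {share:.1f}%")
```

Tracking the same computation over successive time windows is what turns a snapshot like this into a popularity trend.
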

OpenCompass CompassRank

2023
OpenCompass Contributors

Objective benchmark leaderboard from the OpenCompass community that scores large language models (LLMs) and large vision-language models (LVLMs) on 100+ datasets spanning five capability dimensions.

ai-leaderboard
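
A minimal sketch of how per-dataset scores might be rolled up into capability-dimension scores and one overall score, assuming a plain unweighted mean at each level. The dimension names, dataset names and numbers are hypothetical, and only three of the five dimensions are shown; CompassRank's actual grouping and weighting may differ.

```python
from statistics import mean
from typing import Dict

def aggregate(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average dataset scores within each dimension, then average the dimensions."""
    per_dimension = {dim: mean(datasets.values()) for dim, datasets in scores.items()}
    overall = mean(per_dimension.values())
    return {**per_dimension, "overall": overall}

# Hypothetical dimensions and dataset scores on a 0-100 scale.
scores = {
    "knowledge": {"dataset_1": 90.0},
    "reasoning": {"dataset_2": 80.0, "dataset_3": 60.0},
    "math": {"dataset_4": 40.0, "dataset_5": 60.0},
}
print(aggregate(scores))
```

Averaging per dimension first keeps a dimension with many datasets from dominating the overall score.
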
GitHub

Language Model Evaluation Harness

2020
EleutherAI

Unified framework for few-shot evaluation of generative language models across 60+ academic benchmarks. Supports multiple model backends (Hugging Face, vLLM, APIs, local servers), configurable prompts via YAML task configs, and reproducible exports for leaderboards and research comparisons.

llm, ai-leaderboard, huggingface, vllm, github (+3 more)
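
To make "few-shot evaluation" concrete: k solved examples are placed before each test question, and the model's continuation is then scored. The sketch below builds such a prompt in plain Python; the `Q:`/`A:` template and the field names are illustrative assumptions, not the harness's own task format or API.

```python
from typing import Dict, List

def build_fewshot_prompt(examples: List[Dict[str, str]], query: str, k: int) -> str:
    """Concatenate k solved examples, then the unanswered query."""
    shots = [f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples[:k]]
    shots.append(f"Q: {query}\nA:")
    return "\n\n".join(shots)

# Hypothetical training examples used as demonstrations.
train = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "3 + 5?", "answer": "8"},
    {"question": "7 - 1?", "answer": "6"},
]
print(build_fewshot_prompt(train, "6 + 3?", k=2))
```

In the harness itself, the number of shots is chosen with the `--num_fewshot` command-line option rather than assembled by hand.
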

Arena Leaderboard (formerly LMArena)

2023
LMSYS Org, Arena

Publishes real-time, human-voted leaderboards that rank frontier AI models across chat, code, image, video and more. Uses crowdsourced pairwise voting and Elo-style scores with per-arena breakdowns and historical data.

ai-leaderboard, ai-rank, LLM, chatbot, ai-tools
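
The "Elo-style scores" can be illustrated with the classic online Elo update applied to a stream of pairwise votes. This is a simplified sketch: the K-factor of 32 and starting rating of 1000 are conventional values assumed here. The Chatbot Arena team has described moving from online Elo to a Bradley-Terry fit over all votes, so treat this as an intuition aid rather than the leaderboard's exact method.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_ratings(votes: Iterable[Tuple[str, str, float]], k: float = 32.0,
                initial: float = 1000.0) -> Dict[str, float]:
    """votes: (model_a, model_b, outcome), outcome 1.0 = A wins, 0.0 = B wins, 0.5 = tie."""
    ratings: Dict[str, float] = defaultdict(lambda: initial)
    for a, b, outcome in votes:
        e_a = expected_score(ratings[a], ratings[b])
        ratings[a] += k * (outcome - e_a)                  # A gains what B loses
        ratings[b] += k * ((1.0 - outcome) - (1.0 - e_a))
    return dict(ratings)

# Hypothetical votes between three placeholder models.
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]
for name, rating in sorted(elo_ratings(votes).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because online Elo depends on vote order, a batch fit such as Bradley-Terry gives more stable rankings once many votes have accumulated.
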
GitHub

VLMEvalKit

2023
open-compass (OpenCompass community)

VLMEvalKit is an open-source evaluation toolkit for large vision-language models (VLMs/LVLMs). It enables one-command evaluation across many benchmarks, supports generation-based evaluation with optional LLM answer extraction, and provides leaderboards and reproducible pipelines.

vision, ai-leaderboard, huggingface, github, ai-tools (+1 more)
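
Generation-based evaluation of multiple-choice benchmarks needs a step that maps a model's free-form reply to an option letter. The sketch below shows a cheap rule-based extractor; returning `None` marks the cases where a toolkit could fall back to LLM-based answer extraction. The regex heuristic and the function name are assumptions for illustration, not VLMEvalKit's code.

```python
import re
from typing import Optional, Sequence

def extract_choice(reply: str, options: Sequence[str] = ("A", "B", "C", "D")) -> Optional[str]:
    """Return the one option letter a free-form reply commits to, or None if unclear."""
    letters = re.escape("".join(options))
    # A standalone capital option letter: matches "B", "(B)", "B." or "Answer: B".
    # Caveat: a sentence-initial article "A" would also match; real extractors need more care.
    candidates = set(re.findall(rf"(?<![A-Za-z])([{letters}])(?![A-Za-z])", reply))
    if len(candidates) == 1:
        return candidates.pop()
    # Zero or several candidates: this is where an LLM-based extractor could take over.
    return None

for reply in ["The answer is (B).", "Definitely D.", "Either B or C.", "not sure"]:
    print(repr(reply), "->", extract_choice(reply))
```

Running the cheap rule first and calling a judge model only on ambiguous replies keeps evaluation cost low.
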
Hugging Face

hf-audio/open-asr-leaderboard

2024
hf-audio

Provides leaderboard-ready test splits for the Open ASR Leaderboard: it converts datasets that depended on custom (unsafe) loading scripts to Parquet, sorts samples by audio length, and packages eight ESB test sets (LibriSpeech, Common Voice, GigaSpeech, SPGISpeech, etc.) for reproducible ASR benchmarking.

ASR, audio, huggingface, ai-leaderboard, ai-rank
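
The Open ASR Leaderboard ranks systems primarily by word error rate (WER): the word-level edit distance between the reference transcript and the hypothesis, divided by the number of reference words. A minimal sketch follows; real pipelines also normalize text (casing, punctuation, numbers) before scoring, which is omitted here.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Here one substitution ("sat" to "sit") and one deletion ("the") over six reference words give a WER of 2/6.
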
Hugging Face

SWE-bench Verified

2025
SWE-bench

A human-verified subset of 500 SWE-bench instances for evaluating how well models resolve real GitHub issues, with each candidate fix checked by unit tests. Each instance provides the problem statement and the base (pre-fix) commit, enabling reproducible test-based evaluation; suitable for benchmarking code-fix and issue-resolution capabilities.

github, nlp, python, ai-leaderboard, ai-rank (+1 more)
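
In SWE-bench-style evaluation, an instance counts as resolved only if the tests that the reference fix makes pass (`FAIL_TO_PASS`) now pass and the previously passing tests (`PASS_TO_PASS`) still pass. The sketch applies that rule to a dict of per-test outcomes, assuming, as in the Hugging Face release, that both fields are JSON-encoded lists of test identifiers. Producing those outcomes (running the repository's tests in a container) is out of scope, and the `"PASSED"` label and test names are hypothetical.

```python
import json
from typing import Dict, List, Mapping

def is_resolved(instance: Mapping[str, str], test_status: Dict[str, str]) -> bool:
    """Resolved if every FAIL_TO_PASS and PASS_TO_PASS test now passes."""
    fail_to_pass: List[str] = json.loads(instance["FAIL_TO_PASS"])  # tests the fix must make pass
    pass_to_pass: List[str] = json.loads(instance["PASS_TO_PASS"])  # tests that must keep passing
    # A test missing from test_status (crashed or never ran) counts as a failure.
    return all(test_status.get(t) == "PASSED" for t in fail_to_pass + pass_to_pass)

def resolve_rate(resolved_flags: List[bool]) -> float:
    """Fraction of instances resolved, e.g. out of the 500 in SWE-bench Verified."""
    return sum(resolved_flags) / len(resolved_flags)

# Hypothetical instance and test results.
instance = {
    "FAIL_TO_PASS": '["tests/test_parser.py::test_handles_empty_input"]',
    "PASS_TO_PASS": '["tests/test_parser.py::test_basic"]',
}
status = {
    "tests/test_parser.py::test_handles_empty_input": "PASSED",
    "tests/test_parser.py::test_basic": "PASSED",
}
print(is_resolved(instance, status))
```

Requiring `PASS_TO_PASS` as well means a patch that fixes the issue but breaks existing behavior is still scored as unresolved.
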
Hugging Face

ScaleAI/SWE-bench_Pro

2025
ScaleAI

Benchmark dataset for evaluating agents on long-horizon software-engineering tasks (repository-level patches, test-driven fixes). Includes golden patches, the associated tests, and problem statements in Parquet format; aimed at evaluating agents' debugging and code-modification abilities, though running it requires the full test environments.

huggingface, ai-coding, agent-skills, ai-leaderboard, code
GitHub

OpenAI/parameter-golf

2026
OpenAI

A challenge repository for training the best language model that fits inside a 16,000,000-byte (16 MB) submission artifact; it provides baseline training code, bits-per-byte (bpb) evaluation on FineWeb, a public leaderboard, and instructions for compute grants covering short 8×H100 runs.

openai, ai-train, ai-leaderboard, github, pytorch (+2 more)
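
Two quantities drive this challenge: the artifact size limit of 16,000,000 bytes, and bits per byte (bpb), i.e. the model's total negative log-likelihood on the evaluation text converted to bits and divided by the text's UTF-8 byte count. The sketch below computes both under those definitions; the function names are hypothetical, and the official scoring script may count bytes or apply the size limit differently.

```python
import math
from typing import Sequence

MAX_ARTIFACT_BYTES = 16_000_000  # the 16,000,000-byte limit from the challenge description

def fits_budget(artifact_size_bytes: int) -> bool:
    """Assumes the limit is inclusive; the official rules define the exact criterion."""
    return artifact_size_bytes <= MAX_ARTIFACT_BYTES

def bits_per_byte(token_nll_nats: Sequence[float], text: str) -> float:
    """Total NLL in nats, converted to bits, divided by the UTF-8 byte length of the text."""
    total_bits = sum(token_nll_nats) / math.log(2)
    return total_bits / len(text.encode("utf-8"))

# Hypothetical example: two tokens covering "hello world" (11 bytes), 11 bits each.
nll = [11 * math.log(2), 11 * math.log(2)]
print(round(bits_per_byte(nll, "hello world"), 6))
print(fits_budget(15_900_000), fits_budget(16_000_001))
```

Normalizing by bytes rather than tokens lets models with different tokenizers be compared on the same footing.
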
Hugging Face

ParseBench

2026
llamaindex (dataset), Boyang Zhang +5

Benchmarks document-parsing systems on real-world enterprise PDFs and images, evaluating tables, charts, content faithfulness, semantic formatting, and visual grounding with human-verified, rule-level tests. Ships with ~2,000 pages, ~169K test rules, and an open evaluation framework for end-to-end pipeline scoring.

huggingface, github, paper, ocr, vision (+3 more)
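
Rule-level testing means each page carries many small pass/fail checks against the parser's output. A hypothetical sketch of how such results could be rolled up into per-category pass rates; the `Rule` structure, category labels and the toy string-containment checks are assumptions for illustration, not ParseBench's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    category: str                  # hypothetical label, e.g. "tables" or "text"
    check: Callable[[str], bool]   # receives the parser's output, returns pass/fail

def pass_rates(rules: List[Rule], parsed_output: str) -> Dict[str, float]:
    """Fraction of rules passed in each category."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for rule in rules:
        total[rule.category] += 1
        passed[rule.category] += int(rule.check(parsed_output))
    return {cat: passed[cat] / total[cat] for cat in total}

# Hypothetical parser output (Markdown-style table) and three toy rules.
parsed = "| Year | Revenue |\n| 2024 | 12.1 |\nQuarterly summary"
rules = [
    Rule("text", lambda out: "Quarterly summary" in out),
    Rule("text", lambda out: "Footnote 3" in out),
    Rule("tables", lambda out: "| 2024 | 12.1 |" in out),
]
print(pass_rates(rules, parsed))
```

Reporting per category, rather than a single number, shows whether a parser's weakness lies in tables, text fidelity, or elsewhere.
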
Hugging Face

HealthBench Professional

2026
OpenAI

Benchmark dataset for evaluating clinician-facing chat assistants: physician-authored conversations plus rubric items, use-case and difficulty labels, specialty metadata, and a built-in canary to reduce benchmark contamination. Hosted on Hugging Face under an MIT license.

openai, huggingface, ai-rank, ai-leaderboard, nlp (+2 more)
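
Rubric items are typically scored as in the original HealthBench, as we understand it: each item carries a point value (negative for undesirable behavior), a grader marks which items a response meets, a conversation's score is earned points divided by the maximum achievable positive points, and the benchmark score is the mean clipped to [0, 1]. Assuming HealthBench Professional follows the same scheme (the description above does not say), a minimal sketch with a hypothetical rubric:

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class RubricItem:
    criterion: str
    points: float  # negative points penalize undesirable behavior

def conversation_score(items: Sequence[RubricItem], met: Sequence[bool]) -> float:
    """Earned points over maximum achievable (positive) points; can be negative."""
    if len(items) != len(met):
        raise ValueError("need one met/not-met judgment per rubric item")
    possible = sum(item.points for item in items if item.points > 0)
    if possible <= 0:
        raise ValueError("rubric needs at least one positively scored item")
    earned = sum(item.points for item, ok in zip(items, met) if ok)
    return earned / possible

def benchmark_score(scores: Sequence[float]) -> float:
    """Mean per-conversation score, clipped to [0, 1]."""
    return min(1.0, max(0.0, sum(scores) / len(scores)))

# Hypothetical rubric for one conversation.
rubric = [
    RubricItem("Recommends urgent in-person evaluation", 5),
    RubricItem("Names the most likely differential diagnoses", 3),
    RubricItem("Suggests an unsafe medication dose", -4),
]
print(conversation_score(rubric, [True, False, True]))
```

Negative items let a response that is helpful in places but unsafe elsewhere score lower than an incomplete but safe one.
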
Hugging Face

χ-Bench

2026
actava

Evaluates LLM-driven agents on long-horizon, policy-rich U.S. healthcare workflows using 75 clinical task fixtures and a 20-app MCP simulator; the release includes the task fixtures, shared worlds, and leaderboard integration (the Managed-Care handbook is gated).

huggingface, mcp, agent-skills, ai-agent, ai-leaderboard (+3 more)
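
In an MCP simulator, the agent acts on the simulated apps through Model Context Protocol tool calls. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with a tool `name` and an `arguments` object. The sketch below builds such a request; the tool name and arguments are invented placeholders, not tools from χ-Bench's simulator.

```python
import json
from itertools import count
from typing import Any, Dict

_request_ids = count(1)  # JSON-RPC request ids, unique per session

def tool_call_request(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool exposed by a simulated insurance app.
print(tool_call_request("lookup_member_eligibility", {"member_id": "M-0001", "plan": "HMO"}))
```

Because every app speaks the same protocol, a benchmark can swap in 20 simulated apps without changing how the agent issues calls.
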