Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-booting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

physics

pi

plugin

polars

postgres

privacy

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

retrieval

RL

robotics

rust

science

security

segmentation

shodan

skillkit

sora

speech

sqlite

ssh

stt

swe

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

web-search

windsurf

xAI

xai

AI Video2025

Seedance

ByteDance Seed

Generates 1080p videos from text or images, with native multi-shot storytelling that keeps subjects, style, and atmosphere consistent across cuts. Ranked first on Artificial Analysis T2V and I2V leaderboards, ahead of Veo 3 and Kling 2.0.

ai-tools ai-video bytedance

AI Infra2024

AIBrix

vllm-projectByteDance

Cloud-native control plane that scales vLLM on Kubernetes, adding the routing, autoscaling, and fault tolerance single-instance serving lacks. Brings high-density LoRA management, an LLM gateway, distributed KV cache reuse, and SLO-aware GPU serving.

vllm ai-inference ai-serving mlops bytedance+2

AI Train2024

verl: Volcano Engine Reinforcement Learning for LLMs

ByteDance Seed Team, Volcengine +1ByteDance Seed, The University of Hong Kong +1

Open-source HybridFlow implementation for RL post-training of LLMs. Decouples control flow from compute so PPO, GRPO, GSPO and DAPO share one dataflow; pairs FSDP/Megatron with vLLM/SGLang rollout and reports 1.5-20x throughput over prior RLHF stacks.

RL LLM vllm pytorch huggingface+3

AI Train2024

Protenix

ByteDance AI4Science (AML) Team

High-accuracy biomolecular structure prediction suite: open-source models (protenix-v2/v1), a benchmark/evaluation toolkit, and a web server for inference. Targets protein/antibody–antigen and ligand-aware predictions with inference-time sampling and constraint support.

bytedance github foundation-model genomics drug-discovery+2

AI Agent2025

UI-TARS Desktop

ByteDance

Drives your computer from natural language: a vision-language model reads raw screenshots and works the mouse and keyboard like a person, controlling any GUI app without APIs or accessibility hooks. Local or remote operator modes on Windows and macOS.

bytedance github ai-agent vision LLM+7

AI Agent2025

UI-TARS

ByteDance

Reads GUI screenshots and directly outputs desktop, mobile, and browser actions — clicking, typing, navigating — as one end-to-end vision-language model rather than a modular pipeline. Scores 84.8% on WebVoyager and 42.5% on OSWorld.

bytedance github ai-agent vision LLM+3

AI Agent2025

DeerFlow

ByteDance (DeerFlow team), Daniel Walnut +1ByteDance

Orchestrates a lead agent, isolated parallel sub-agents, long-term memory, and sandboxes for long-horizon tasks — minutes to hours of deep research, coding, and content creation. LangChain/LangGraph-based with extensible skills; v2 is a full rewrite.

ai-agent agent-skills bytedance mcp-server docker+3

AI Image2025

Dolphin

ByteDance

Converts document images—scans, photos, born-digital PDFs—into structured text in two stages: first map layout and reading order, then parse each element (text, tables, formulas, figures) in parallel, each guided by its own task prompt.

ocr pytorch huggingface bytedance github+2

AI Client2025

MineContext

Volcengine (ByteDance)

Continuously screenshots your screen, feeds the captures to a vision-language model, then pushes back daily summaries, weekly recaps, and todos on its own. Local-first desktop app: data stays on your machine; runs on OpenAI-format or local LLMs.

ai-client ai-tools bytedance typescript embeddings+1

AI Agent2026

OpenViking

Volcengine

Models an AI agent's context as a file system, unifying memory, resources, and skills instead of flat vector RAG. Uses L0/L1/L2 tiered loading to cut tokens, directory-recursive plus semantic retrieval, and visualized retrieval traces for debugging.

github bytedance ai-agent agent-skills RAG+4

AI Model2026

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Fengyi Fu, Mengqi Huang +12

Delivers image and video generation, editing, and understanding inside a single 3B-parameter multimodal model trained from scratch with a multi-task recipe. Notable for strong unified benchmarks at 3B scale; inference requires large GPU memory (≈40GB+ VRAM).

bytedance multimodal video ai-video ai-image+5

AI Video2026

ByteDance/Bernini-R

ByteDance

Provides the renderer weights and inference code for Bernini’s video renderer, enabling text→video, image→video and video editing inference. Offers a ready diffusers-format bundle or safetensors checkpoints under Apache‑2.0; intended for multi‑GPU/Hopper inference and reproducible research.

bytedance huggingface diffusers video ai-video+3