Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-boosting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

physics

pi

plugin

polars

postgres

privacy

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

retrieval

RL

robotics

rust

science

security

segmentation

shodan

skillkit

sora

speech

sqlite

ssh

stt

swe

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

web-search

windsurf

xAI

xai

AI Others · 2015

IPED

sepinf-inc (Brazilian Federal Police team) · Brazilian Federal Police

Processes and indexes seized digital evidence — disk images, files, timelines — for forensic examiners. Bundles high-speed carving, OCR, named-entity recognition, similar-image and face search, and audio transcription behind scriptable Java parsers.

github nlp ocr ai-image audio +1

Speech Technology Papers · 2015

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Dario Amodei, Rishita Anubhai +32 · Baidu Research

Bet that one neural net, scaled with HPC, could transcribe both English and Mandarin without hand-built pipelines — reaching human-competitive accuracy by training fast enough to iterate on architecture in days, not weeks.

30u30 paper audio ASR

AI Train · 2017

fairseq

Facebook AI Research (FAIR) · Meta AI (formerly Facebook AI Research)

Sequence modeling toolkit for training custom models for translation, summarization, and language modeling. Reference implementation behind RoBERTa, BART, mBART, XLM-R, and wav2vec 2.0, with multi-GPU and mixed-precision training.

pytorch nlp translation ASR audio +5

AI Audio · 2019

NVIDIA NeMo

NVIDIA

Build, fine-tune, and deploy speech AI on NVIDIA GPUs: ASR, text-to-speech, and speech LLMs in one PyTorch stack. Ships pretrained Parakeet/Canary recognition and Magpie TTS checkpoints; broader LLM/multimodal training now lives in v2.7.0.

nvidia pytorch ASR audio huggingface +3

AI Audio · 2020

ultimatevocalremovergui

Anjok07, aufr33

Extracts vocals and instrumentals from audio using an ensemble of models — MDX-Net/MDX23C, Demucs v3/v4, and the VR architecture. Runs locally via a Tkinter GUI with GPU acceleration across Nvidia, AMD, Intel, and Apple chips.

audio pytorch ai-tools github python +1

AI Deploy · 2022

ExecuTorch

PyTorch · Meta

Deploys PyTorch models directly on phones, microcontrollers, and embedded hardware via ahead-of-time compilation to a ~50KB C++ runtime. Delegates subgraphs to 12+ backends (XNNPACK, CoreML, Qualcomm, ARM Ethos-U) with torchao quantization.

pytorch ai-inference ai-serving meta-ai llm +5

AI Audio · 2022

edge-tts

rany2

Python module for Microsoft Edge's online text-to-speech service, usable from Python code or through the edge-tts and edge-playback command-line tools. Requires no Microsoft Edge install, no Windows, and no API key.

github ai-tools audio microsoft ai-library

AI Audio · 2022

Whisper

OpenAI

Multilingual sequence-to-sequence speech model and toolkit for speech recognition, speech-to-text translation, and language identification. Offers several model sizes (tiny → large/turbo) for different speed/accuracy trade-offs and ships with a CLI and Python API for offline transcription workflows.

openai speech ASR multilingual pytorch +4

AI Audio · 2022

Buzz

Chidi Williams

Transcribes and translates audio and video entirely on your own machine with OpenAI's Whisper, so nothing is uploaded. Handles local files, YouTube links, and live mic input; exports TXT, SRT, and VTT, plus a watch folder for batch jobs.

github python audio openai ai-tools

AI Video · 2022

SadTalker

Wenxuan Zhang, Xiaodong Cun +6

Generate a lip-synced talking-head video from a single portrait image and an audio clip using learned 3D motion coefficients for realistic expression and head motion. Offers still/reference modes, Colab/HuggingFace demos, and an Apache-2.0 license.

audio video ai-video pytorch github +3

AI Audio · 2022

FunASR

Alibaba DAMO Academy, Northwestern Polytechnical University (NWPU) +5 · Alibaba DAMO Academy, ModelScope

Bundles ASR, voice activity detection, punctuation, and speaker diarization into one pipeline, with pretrained models like Paraformer and SenseVoice. SenseVoice runs ~17x realtime on CPU; also ships streaming ASR and an OpenAI-compatible API.

ASR audio pytorch ai-library huggingface +4

AI Image · 2023

Comfy.org

Comfy Org

Create and run node-based generative AI workflows for images, video, 3D, and audio — reusable, shareable node graphs with custom nodes, live previews, and local/cloud runtime options. Open-source with Comfy Cloud and Hub for creators.

ai-tools ai-image ai-video audio ai-client +1