Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-booting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

physics

pi

plugin

polars

postgres

privacy

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

retrieval

RL

robotics

rust

science

security

segmentation

shodan

skillkit

sora

speech

sqlite

ssh

stt

swe

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

web-search

windsurf

xAI

xai

AI Video 2026

RuneXX/LTX-2.3-Workflows

RuneXX

ComfyUI workflows that run LTX‑2.3 split models to produce text→video, image→video and audio→video pipelines. The workflows use extracted/split safetensors or GGUF files so that models load more modularly; they require up‑to‑date ComfyUI, KJNodes and ComfyUI‑GGUF.

huggingface ai-video ai-workflow gemma video +1

AI Audio 2026

Dramabox

ResembleAI

Generates expressive, prompt-driven text-to-speech audio with optional 10-second voice cloning; prompts control speaker identity, emotion, pauses and nonverbal sounds. It is an IC‑LoRA fine-tune of LTX‑2.3 that applies an imperceptible Resemble Perth watermark.

audio speech huggingface gemma ai-demos +2

AI Model 2026

Gemma 4 E4B-it Assistant

Google DeepMind

Provides a lightweight assistant (draft) model for Gemma 4 E4B, for use in speculative-decoding pipelines: the assistant drafts tokens that the target model then verifies in parallel, enabling up to ~2× decoding speedups while preserving identical final outputs. Useful for low-latency multimodal assistants and on-device scenarios.

gemma deepmind google multimodal transformers +5
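
The draft-then-verify loop described in the card above can be sketched in a few lines. This is a toy illustration of greedy speculative decoding, not the Gemma implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the target and assistant models, tokens are plain integers, and verification is written as a loop where a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: plain greedy decoding with the target model only."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; returns the same tokens as greedy_decode."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target checks every drafted position (looped here for clarity).
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # 3. First mismatch: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every drafted token matched the target, so accept them all.
            out.extend(proposal)
    return out


# Hypothetical toy models (assumptions for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (sum(ctx) * 31 + 7) % 50


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except when the context length is a multiple of 3.
    t = toy_target(ctx)
    return t if len(ctx) % 3 else (t + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(toy_target, toy_draft, prompt, max_new=10)
    slow = greedy_decode(toy_target, prompt, max_new=10)
    print(fast == slow)
```

Because a drafted token is kept only when it equals what the target would have produced at that position, the output matches plain greedy decoding; any speedup comes from verifying several drafted tokens in a single target pass. Sampling-based variants use a probabilistic accept/reject rule instead of exact matching.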

AI Model 2026

gemma-4-31B-it-DFlash

z-lab

Draft model for speculative decoding that uses a lightweight block-diffusion drafter to propose multiple tokens in parallel; designed to pair with google/gemma-4-31B-it and accelerate autoregressive text generation (official benchmarks report up to ~5.8× throughput).

gemma huggingface vllm transformers llm +3

AI Image 2026

HiDream-O1-Image

HiDream-ai

Generates and edits high-resolution images (up to 2048×2048) from text and reference images, plus subject-driven personalization. Implements a pixel-level unified transformer that encodes raw pixels and text in one token space and includes a reasoning-driven prompt agent for layout and text rendering.

transformers multimodal ai-image huggingface gemma +4

AI Video 2026

SANA-WM (Bidirectional)

Efficient-Large-Model

Generates minute-scale, 720p videos from a single image using a 2.6B image-to-video diffusion transformer with precise 6‑DoF camera control and an optional LTX‑2 refiner; designed for long-context, memory-efficient modeling but requires large refiner checkpoints (~41 GB).

video ai-video ai-image huggingface gemma +2

AI Model 2026

google/gemma-4-12B-it

Google DeepMind

Instruction-tuned, unified Gemma 4 12B multimodal model that accepts text, image and audio inputs and generates text outputs locally. Encoder-free design reduces multimodal latency and fits on consumer devices while offering long-context support and native thinking/system-prompt features.

gemma google deepmind multimodal transformers +5

AI Model 2026

Gemma 4 12B Unified

Google DeepMind

A 12B unified, encoder-free multimodal model that directly ingests text, images and audio and returns text; supports very long contexts (up to 256K tokens), native function-calling/thinking modes, and small-model deployment for local or on-device use.

gemma multimodal transformers google deepmind +8

AI Model 2026

unsloth/gemma-4-12b-it-GGUF

unsloth

A GGUF-quantized, locally runnable build of Gemma 4 12B Unified (image-text-to-text) packaged by unsloth; preserves multimodal (image/audio) input support under an Apache-2.0 license and is compatible with common GGUF runtimes and Unsloth Studio.

gemma google deepmind huggingface multimodal +7

AI Model 2026

google/gemma-4-12B-it-qat-q4_0-gguf

Google DeepMind

Provides a GGUF-ready QAT (Q4_0) quantized build of Gemma 4 12B that preserves near-bfloat16 quality while reducing memory footprint for local inference; compatible with Transformers-based and GGUF runtimes.

gemma google deepmind huggingface transformers +3
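
For a rough sense of the memory reduction mentioned in the card above, the weight footprint can be estimated from bits per weight. The sketch below assumes a nominal 12 billion parameters (read off the "12B" name, not a published exact count) and the ggml Q4_0 layout of 32 four-bit weights plus one fp16 scale per block. Real GGUF files usually keep some tensors at higher precision, and the KV cache and activations add to runtime memory, so treat the numbers as a ballpark only.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given average bit width."""
    return n_params * bits_per_weight / 8


# ggml Q4_0 block: 32 weights stored as a 2-byte fp16 scale + 16 bytes of 4-bit values.
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 2 + 16
Q4_0_BITS = Q4_0_BLOCK_BYTES * 8 / Q4_0_BLOCK_WEIGHTS  # average bits per weight

N_PARAMS = 12e9  # assumption: nominal parameter count taken from the "12B" name

bf16_gb = weight_bytes(N_PARAMS, 16) / 1e9
q4_0_gb = weight_bytes(N_PARAMS, Q4_0_BITS) / 1e9

if __name__ == "__main__":
    print(f"Q4_0 bits per weight: {Q4_0_BITS}")
    print(f"bfloat16 weights: ~{bf16_gb:.2f} GB")
    print(f"Q4_0 weights:     ~{q4_0_gb:.2f} GB")
```

Under these assumptions the quantized weights take a bit over a quarter of the bfloat16 size, which is the kind of saving that makes local inference on consumer hardware plausible.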

AI Model 2026

Gemma-4-12B-OBLITERATED

OBLITERATUS

A surgically modified Gemma 4 (12B) that removes refusal behavior while preserving benchmark parity; released as an uncensored research artifact with GGUF quantizations for local inference and red‑team/alignment evaluation.

gemma transformers huggingface llm ai-inference +5

AI Model 2026

unsloth/gemma-4-12B-it-qat-GGUF

unsloth, Google DeepMind

GGUF-format QAT (quantization-aware training) build of Gemma 4 12B that reduces memory needs for local or lightweight inference while preserving near-bfloat16 quality. Ready for any-to-any conversational pipelines and ecosystem deployment.

gemma huggingface google deepmind transformers +5