Tag

Explore by tags

AI Image Demos · 2025

Nano Banana Demos

Prompt and image demos for Nano Banana.

ai-image ai-demos ai-image-demos

AI Image · 2018

Kornia

Kornia contributors, E. Riba +4 · kornia.ai

Brings classic computer vision into PyTorch as differentiable, GPU-accelerated tensor operators — filters, geometric transforms, feature matching, camera calibration — so each step lives inside autograd and trains end-to-end with neural networks.

vision pytorch ai-library gitHub ai-image +1

AI Image · 2020

YOLOv5

Ultralytics

PyTorch object detector built for shipping: train on your own data, then export to ONNX, CoreML, TFLite, or TensorRT with one command. Comes in five sizes (n/s/m/l/x) and adds instance-segmentation and classification heads beyond bounding-box detection.

vision pytorch github ai-image ai-tools +1

AI Model · 2022

Diffusers (Hugging Face)

Patrick von Platen, Suraj Patil +12 · Hugging Face

Runs pretrained diffusion models for image, video, and audio generation through composable pipelines. It separates pipelines, schedulers, models, adapters, and memory optimizations so teams can prototype quickly without locking into one model family.

huggingface ai-image pytorch vision ai-tools +1

AI Image · 2024

Brush

Arthur Brussee

Produces real-time 3D reconstructions from multi-view images using Gaussian splatting, with on-device training and interactive viewing across native desktops, Android, and the browser. Uses WebGPU and the Burn ML framework to ship dependency-free binaries, a CLI, live training visualization, and streaming .ply support.

vision ai-image ai-train rust github +5
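Brush's own rasterizer is not described in this listing. As a hedged illustration of the general Gaussian-splatting technique it builds on, the sketch below follows the formulation of the original 3D Gaussian Splatting rasterizer: each projected Gaussian's opacity is evaluated at a pixel from its inverse 2D covariance, and depth-sorted splats are alpha-composited front to back, each weighted by the transmittance left over from nearer splats. The function names `splat_alpha` and `composite_pixel` are hypothetical, and the 0.99 opacity clamp and early-termination threshold are assumptions borrowed from the reference 3DGS implementation, not necessarily Brush's values.

```python
import math


def splat_alpha(opacity, dx, dy, conic):
    """Opacity of one projected 2D Gaussian at offset (dx, dy) from its center.

    conic = (a, b, c) holds the inverse 2D covariance [[a, b], [b, c]].
    The 0.99 clamp mirrors the reference 3DGS rasterizer (an assumption here).
    """
    a, b, c = conic
    power = -0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)
    return min(0.99, opacity * math.exp(power))


def composite_pixel(splats, min_transmittance=1e-4):
    """Front-to-back alpha compositing of the splats covering one pixel.

    splats: iterable of (depth, (r, g, b), alpha). Returns (rgb, transmittance).
    """
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _depth, color, alpha in sorted(splats, key=lambda s: s[0]):
        weight = alpha * transmittance  # share of light not yet absorbed by nearer splats
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:  # pixel is effectively opaque; stop early
            break
    return tuple(rgb), transmittance


# Usage: a half-transparent red splat in front of an opaque blue one.
front = (1.0, (1.0, 0.0, 0.0), splat_alpha(0.5, 0.0, 0.0, (1.0, 0.0, 1.0)))
back = (2.0, (0.0, 0.0, 1.0), 1.0)
print(composite_pixel([back, front]))  # ((0.5, 0.0, 0.5), 0.0)
```

Sorting by depth inside `composite_pixel` keeps the example self-contained; a real GPU rasterizer sorts splats once per screen tile instead of per pixel.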

AI Model · 2024

Structured 3D Latents for Scalable and Versatile 3D Generation

Microsoft

Generates high-quality, editable 3D assets from text or images and decodes to radiance fields, 3D Gaussians, or textured meshes. Ships pretrained models of up to 2B parameters, a 500K-asset dataset, and training code; best used with image conditioning and an NVIDIA GPU with at least 16GB of memory.

microsoft gitHub huggingface ai-image ai-image-demos +3

AI Video · 2025

ViMax: Agentic Video Generation

HKUDS · Data Intelligence Lab @ HKU

Turns a raw idea, novel, or screenplay into a complete multi-shot video through a multi-agent pipeline that scripts, storyboards, and renders shots while a vision model checks character and scene consistency across the whole story.

ai-video video ai-agent agent-skills python +6

AI Model · 2026

Anima

CircleStone Labs, Comfy Org

Generates anime-style and other non-photorealistic illustrations from text prompts. Released as a preview of a 2B-parameter base diffusion model, trained on millions of anime images (plus ~800k non-anime artworks) and distributed under a non-commercial license; best used in ComfyUI at around 1MP resolution.

huggingface nvidia ai-image AIGC ai-train +3

AI Image · 2026

Modly

Lightning Pixel

Turns photos into exportable 3D meshes using open-source AI models that run entirely on your GPU. Desktop app for Windows and Linux with an extension system to install local model generators and export common 3D formats.

ai-image image nodejs python github +2

AI Model · 2026

ERNIE-Image

Baidu

An open text-to-image generation model built on an 8B Diffusion Transformer that focuses on layout-sensitive, text-heavy, and instruction-following image synthesis. Notable for accurate text rendering, structured and compositional generation (posters, comics), and the ability to run on consumer 24GB GPUs when paired with prompt enhancement.

vision ai-image huggingface pytorch prompt-engineering +3

AI Model · 2026

HY-World 2.0

Tencent

Generates and reconstructs navigable, editable 3D worlds from text, single images, multi-view photos, or video; outputs meshes and Gaussian Splatting assets and includes WorldMirror 2.0 for fast multi-view reconstruction. Suited for research and production pipelines that import assets into engines; requires substantial GPU resources.

vision ai-image huggingface pytorch ai-demos +3

AI Model · 2026

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

Robbyant Team, Lin-Zhuo Chen +10

Performs feed-forward streaming 3D reconstruction from image sequences, combining coordinate grounding, dense geometric cues, and trajectory memory to correct long-range drift; uses paged KV-cache attention for ~20 FPS inference at 518×378 and supports sequences of more than 10,000 frames.

vision pytorch foundation-model huggingface ai-inference +2
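The listing does not say how LingBot-Map lays out its cache. As a hedged illustration of the general paged KV-cache idea (the scheme popularized by vLLM's PagedAttention), the sketch below stores key/value entries in fixed-size pages drawn from a shared pool and keeps a page table from logical to physical pages, so a cache can keep growing across very long frame sequences without reallocating one large contiguous buffer. The class `PagedKVCache` and its methods are hypothetical; real implementations hold keys and values as GPU tensors and read pages directly inside the attention kernel rather than storing Python tuples.

```python
class PagedKVCache:
    """Toy paged KV cache (hypothetical API, not LingBot-Map's code).

    Entries live in fixed-size pages taken from a shared pool; a page table
    maps each logical page of the sequence to a physical page, so the cache
    grows one page at a time.
    """

    def __init__(self, page_size, num_pages):
        self.page_size = page_size
        self.pool = [[None] * page_size for _ in range(num_pages)]
        self.free_pages = list(range(num_pages - 1, -1, -1))  # pop() hands out page 0 first
        self.page_table = []  # logical page index -> physical page index
        self.length = 0

    def append(self, key, value):
        slot = self.length % self.page_size
        if slot == 0:  # current page is full, or no page allocated yet
            if not self.free_pages:
                raise MemoryError("KV page pool exhausted")
            self.page_table.append(self.free_pages.pop())
        self.pool[self.page_table[-1]][slot] = (key, value)
        self.length += 1

    def get(self, pos):
        if not 0 <= pos < self.length:
            raise IndexError(pos)
        physical = self.page_table[pos // self.page_size]
        return self.pool[physical][pos % self.page_size]

    def keys_values(self):
        """Gather all cached (key, value) pairs in logical order for attention."""
        return [self.get(i) for i in range(self.length)]

    def release(self):
        """Return every page to the pool, e.g. when a sequence ends."""
        self.free_pages.extend(reversed(self.page_table))
        self.page_table.clear()
        self.length = 0


# Usage: six entries with four slots per page occupy two physical pages.
cache = PagedKVCache(page_size=4, num_pages=3)
for t in range(6):
    cache.append(f"k{t}", f"v{t}")
print(cache.page_table)  # [0, 1]
```

Because pages are only mapped, not copied, freeing or reusing memory is a page-table update, which is the property that makes long streaming sequences tractable.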


All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github