LogoAIAny
Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • ai-agent
  • ai-coding
  • ai-image
  • ai-tools
  • ai-video
  • AIGC
  • alibaba
  • anthropic
  • audio
  • blog
  • book
  • chatbot
  • chemistry
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • google
  • LLM
  • math
  • NLP
  • openai
  • paper
  • physics
  • plugin
  • RL
  • science
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI

ImageNet Classification with Deep Convolutional Neural Networks

2012
Alex Krizhevsky, Ilya Sutskever +1

The 2012 paper “ImageNet Classification with Deep Convolutional Neural Networks” by Krizhevsky, Sutskever, and Hinton introduced AlexNet, a deep CNN that dramatically improved image classification accuracy on ImageNet, cutting the top-5 error rate from ~26% to ~15%. Its innovations, including ReLU activations, dropout, GPU training, and data augmentation, sparked the deep learning revolution, laying the foundation for modern computer vision and advancing AI across industries.

vision, 30u30, paper, foundation
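The ReLU and dropout tricks named above can be sketched in a few lines of plain Python (an illustrative toy, not AlexNet's actual implementation; the inverted-dropout rescaling shown here is the modern variant of the technique):

```python
import random

def relu(xs):
    """ReLU activation: clamp negatives to zero, pass positives unchanged."""
    return [max(0.0, x) for x in xs]

def dropout(xs, p=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero units with probability p during
    training and rescale survivors by 1/(1-p) so the expected activation
    matches test time, when dropout is a no-op."""
    if not training or p == 0.0:
        return list(xs)
    rng = rng or random.Random(0)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]

acts = relu([-2.0, -0.5, 1.0, 3.0])  # negatives become 0.0
```

ReLU avoids the saturation of tanh/sigmoid units, and dropout regularizes the large fully connected layers; both were key to training AlexNet at scale.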

Generative Adversarial Networks

2014
Ian J. Goodfellow, Jean Pouget-Abadie +6

The 2014 paper “Generative Adversarial Nets” (GAN) by Ian Goodfellow et al. introduced a groundbreaking framework where two neural networks — a generator and a discriminator — compete in a minimax game: the generator tries to produce realistic data, while the discriminator tries to distinguish real from fake. This approach avoids Markov chains and approximate inference, relying solely on backpropagation. GANs revolutionized generative modeling, enabling realistic image, text, and audio generation, sparking massive advances in AI creativity, deepfake technology, and research on adversarial training and robustness.

vision, AIGC, paper, foundation
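The minimax game described above can be written down directly. A plain-Python sketch of the two players' objectives (function names are illustrative, not from the paper's code):

```python
import math

def d_loss(d_real, d_fake):
    """Discriminator ascends log D(x) + log(1 - D(G(z)));
    as a loss, we minimize the negation."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss_minimax(d_fake):
    """Generator's original minimax objective: minimize log(1 - D(G(z)))."""
    return math.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    """Non-saturating heuristic from the paper: maximize log D(G(z)),
    which gives stronger gradients early in training."""
    return -math.log(d_fake)
```

At the game's equilibrium the discriminator outputs 0.5 everywhere, giving a discriminator loss of 2·log 2; the non-saturating generator loss falls as the generator fools the discriminator more often.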

CS231n: Deep Learning for Computer Vision

2015
Fei-Fei Li

Stanford’s 10-week CS231n dives from first principles to state-of-the-art vision research, starting with image-classification basics, loss functions and optimization, then building from fully-connected nets to modern CNNs, residual and vision-transformer architectures. Lectures span training tricks, regularization, visualization, transfer learning, detection, segmentation, video, 3-D and generative models. Three hands-on PyTorch assignments guide students from k-NN/SVM through deep CNNs and network visualization, and a capstone project lets teams train large-scale models on a vision task of their choice, graduating with the skills to design, debug and deploy real-world deep-learning pipelines.

foundation, vision, 30u30, course, tutorial

Multi-Scale Context Aggregation by Dilated Convolutions

2015
Fisher Yu, Vladlen Koltun

This paper introduces a novel module for semantic segmentation using dilated convolutions, which enables exponential expansion of the receptive field without losing resolution. By aggregating multi-scale contextual information efficiently, the proposed context module significantly improves dense prediction accuracy when integrated into existing architectures. The work has had a lasting impact on dense prediction and semantic segmentation, laying the foundation for many modern segmentation models.

30u30, paper, vision
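The exponential receptive-field growth is easy to verify with a back-of-the-envelope helper (`receptive_field` is an illustrative name, assuming stride-1 convolutions throughout):

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated convolutions (stride 1):
    each layer widens the field by (kernel_size - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Doubling the dilation each layer grows the field exponentially in depth,
# while undilated 3x3 convolutions grow it only linearly.
print(receptive_field(3, [1, 2, 4, 8]))  # 31 = 2**(4+1) - 1
print(receptive_field(3, [1, 1, 1, 1]))  # 9
```

This is the paper's core observation: with dilations 1, 2, 4, ..., 2^(n-1), n layers of 3×3 convolutions cover a (2^(n+1) − 1)-pixel context without any pooling or loss of resolution.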

Deep Residual Learning for Image Recognition

2015
Kaiming He, Xiangyu Zhang +2

The paper “Deep Residual Learning for Image Recognition” (ResNet, 2015) introduced residual networks with shortcut connections, allowing very deep neural networks (over 100 layers) to be effectively trained by reformulating the learning task into residual functions (F(x) = H(x) − x). This innovation solved the degradation problem in deep models, achieving state-of-the-art results on ImageNet (winning ILSVRC 2015) and COCO challenges. Its impact reshaped the design of deep learning architectures across vision and non-vision tasks, becoming a foundational backbone in modern AI systems.

foundation, 30u30, paper, vision
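The residual reformulation y = x + F(x) can be illustrated with a toy block (a plain-Python sketch, not the paper's architecture; elementwise weights stand in for the conv/BN/ReLU stack):

```python
def residual_block(x, weights):
    """y = x + F(x): the block learns only the residual F. If the optimal
    mapping is the identity, driving the weights to zero suffices, which
    is far easier than learning an identity through stacked nonlinearities."""
    fx = [w * v for w, v in zip(weights, x)]  # stand-in for the conv stack
    return [v + r for v, r in zip(x, fx)]

# With zero weights the block is an exact identity mapping.
print(residual_block([1.0, 2.0], [0.0, 0.0]))  # [1.0, 2.0]
```

The shortcut connection is what lets 100+ layer networks train without the degradation that plagued plain deep stacks.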

Identity Mappings in Deep Residual Networks

2016
Kaiming He, Xiangyu Zhang +2

This paper shows that using identity mappings for skip connections and pre-activation in residual blocks allows signals to flow unimpeded, making it easier to train very deep networks. Through theoretical analysis and ablation studies, the authors introduce a pre-activation residual unit that enables successful training of 1000-layer ResNets and improves accuracy on CIFAR-10/100 and ImageNet; this design, commonly known as ResNet-v2, influenced numerous later deep vision architectures.

foundation, 30u30, paper, vision
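The "unimpeded signal flow" claim can be demonstrated numerically: with pure identity skips, the output of a deep stack is just the input plus a sum of residual terms, so information (and gradient) always has a direct additive path. A toy sketch (`forward` and the lambda residuals are illustrative, not the paper's code):

```python
def forward(x0, residual_fns):
    """With pure identity skips, x_{l+1} = x_l + F(x_l), so the final
    output is x_0 plus a sum of residuals: the input reaches any depth
    additively, and backpropagation always retains an identity term."""
    x = x0
    for f in residual_fns:
        x = x + f(x)
    return x

# Even 1000 residual steps leave the original signal intact when the
# residual branches contribute nothing.
print(forward(1.0, [lambda x: 0.0] * 1000))  # 1.0
```

This additive decomposition is exactly why pre-activation units (which keep the skip path free of BN/ReLU) train 1000-layer networks where post-activation variants stall.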

Variational Lossy Autoencoder

2016
Xi Chen, Diederik P. Kingma +6

This paper proposes the Variational Lossy Autoencoder (VLAE), a VAE that uses autoregressive priors and decoders to deliberately discard local detail while retaining global structure. By limiting the receptive field of the PixelCNN decoder and employing autoregressive flows as the prior, the model forces the latent code to capture only high-level information, yielding controllable lossy representations. Experiments on MNIST, Omniglot, Caltech-101 Silhouettes and CIFAR-10 set new likelihood records for VAEs and demonstrate faithful global reconstructions with replaced textures. VLAE influenced research on representation bottlenecks, pixel-VAE hybrids, and state-of-the-art compression and generation benchmarks.

30u30, paper, vision

Midjourney

2022
Midjourney, Inc.

An independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.

ai-tools, ai-image, vision

Stable Diffusion

2022
Stability AI

Experience unparalleled image generation capabilities with SDXL Turbo and Stable Diffusion XL. Our models use shorter prompts and generate descriptive images with enhanced composition and realistic aesthetics.

ai-tools, ai-image, vision

Runway

2023
Runway AI, Inc.

With Runway Gen-4, you are now able to precisely generate consistent characters, locations and objects across scenes. Simply set your look and feel and the model will maintain coherent world environments while preserving the distinctive style, mood and cinematographic elements of each frame. Then, regenerate those elements from multiple perspectives and positions within your scenes.

ai-tools, ai-video, vision

Veo

2024
Google DeepMind

Veo is a state-of-the-art video generation model developed by Google DeepMind, designed to empower filmmakers and storytellers.

ai-tools, ai-video, vision

KlingAI

2024
Kuaishou Technology

Kling AI offers tools for creating imaginative images and videos, based on state-of-the-art generative AI methods.

ai-tools, ai-image, ai-video, vision