DiffSynth-Studio

DiffSynth-Studio is an open-source Diffusion model engine developed and maintained by the ModelScope Community, focusing on image and video generation. It supports mainstream models like FLUX, Wan, and Qwen-Image, offering efficient memory management and flexible training frameworks. Key features include VRAM optimization, low-memory inference, LoRA/ControlNet training, and innovative techniques such as EliGen and Nexus-Gen that push the boundaries of generative models.

Introduction

DiffSynth-Studio: An Open-Source Diffusion Model Engine

Overview

DiffSynth-Studio is a powerful open-source framework designed to harness the magic of Diffusion models for advanced image and video synthesis. Developed and maintained by the ModelScope Community, it serves as the core engine for the ModelScope AIGC zone, enabling both aggressive technical exploration for academia and stable deployment for industry. The project aggregates community power to foster innovation in generative AI, lowering the barrier for developers to experiment with cutting-edge capabilities.

It currently encompasses two complementary projects:

  • DiffSynth-Studio: Emphasizes rapid prototyping and novel research, supporting the latest models and experimental features.
  • DiffSynth-Engine: Focuses on production-ready stability, optimization, and performance for industrial applications.

Users can access productized experiences via the ModelScope AIGC Zone (for Chinese users) at https://modelscope.cn/aigc/home or ModelScope Civision (global) at https://modelscope.ai/civision/home. Comprehensive documentation is available in English and Chinese, covering principles of Diffusion models to guide developers in expanding technological frontiers.

Key Features and Capabilities
Efficient Inference and Training Pipelines

DiffSynth-Studio re-engineers inference and training for models like FLUX, Wan, Qwen-Image, and more. It introduces advanced VRAM management, including layer-level disk offloading to run large models on limited hardware (e.g., 8GB VRAM for complex generations). Features like FP8 precision, sequence parallelism, and split training (separating data processing from gradient computations) reduce memory usage and accelerate workflows.
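
As a rough illustration of how these VRAM options are exposed, the sketch below loads a Qwen-Image pipeline with components offloaded to CPU and non-gradient weights stored in FP8. It follows the pattern of the repo's quick-start examples, but the module path, the ModelConfig fields, and the enable_vram_management() call are assumptions here and should be checked against the current documentation.

import torch
from diffsynth.pipelines.qwen_image import QwenImagePipeline, ModelConfig

# Offload components to CPU and keep non-gradient weights in FP8 so generation
# fits on a small GPU. Field names follow the published quick-start examples
# at the time of writing and may differ in newer releases.
pipe = QwenImagePipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Qwen/Qwen-Image",
                    origin_file_pattern="transformer/diffusion_pytorch_model*.safetensors",
                    offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
        ModelConfig(model_id="Qwen/Qwen-Image",
                    origin_file_pattern="text_encoder/model*.safetensors",
                    offload_device="cpu", offload_dtype=torch.float8_e4m3fn),
        ModelConfig(model_id="Qwen/Qwen-Image",
                    origin_file_pattern="vae/diffusion_pytorch_model.safetensors",
                    offload_device="cpu"),
    ],
)
pipe.enable_vram_management()  # assumed API: activates layer-level offloading

image = pipe(prompt="a snowy mountain village at dusk", seed=0)
image.save("image.jpg")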

  • Image Synthesis: Supports text-to-image, image-to-image, inpainting, and control-guided generation. Models include Z-Image Turbo, FLUX.2-dev, the Qwen-Image series (with variants for editing, distillation, and ControlNet), and the FLUX.1 lineage (including ControlNets, IP-Adapter, and InfiniteYou).
  • Video Synthesis: Handles text-to-video, image-to-video, video continuation, and audio-driven generation. Core support covers the Wan series (e.g., Wan2.1-T2V, Wan2.2-S2V), with extensions for VACE, Fun controls, and real-time models like krea-realtime-video.

Training Innovations

The framework excels in model fine-tuning:

  • LoRA and Full Training: Compatible with differential LoRA, direct distillation (e.g., 5x speedup for Qwen-Image), and blockwise ControlNet training.
  • Advanced Techniques: Includes split training for lower VRAM, FP8 for non-gradient models, and support for high-resolution optimization (the split-training idea is sketched below).
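
The split-training idea can be illustrated with a toy PyTorch sketch. This is a conceptual illustration, not DiffSynth-Studio's actual trainer, and every module name here is a stand-in: the frozen encoders process the data once with gradients disabled, and only a small LoRA-style adapter participates in backpropagation afterwards, so the large frozen weights never have to hold activations for the backward pass.

import torch
import torch.nn as nn

# --- Stage 1: data processing (no gradients, frozen encoders) ---
# Hypothetical stand-ins for the frozen text encoder / VAE.
text_encoder = nn.Embedding(1000, 64).eval()
vae_encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8).eval()

@torch.no_grad()
def preprocess(batch):
    # Cache everything the trainable part needs, so the encoders can be freed.
    return {
        "text_emb": text_encoder(batch["tokens"]),
        "latents": vae_encoder(batch["pixels"]),
    }

cached = [preprocess({"tokens": torch.randint(0, 1000, (1, 8)),
                      "pixels": torch.randn(1, 3, 64, 64)}) for _ in range(4)]

# --- Stage 2: gradient computation (only the small trainable part) ---
class Adapter(nn.Module):
    """Toy LoRA-style low-rank adapter; the only module with trainable weights."""
    def __init__(self, dim=64, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
    def forward(self, x):
        return x + self.up(self.down(x))

adapter = Adapter()
opt = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
for sample in cached:
    pred = adapter(sample["text_emb"])               # stand-in for the denoising step
    loss = (pred - sample["text_emb"].detach()).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()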

Example models trained and open-sourced include Qwen-Image-EliGen for entity-level control, ArtAug LoRA for aesthetic enhancement, and Nexus-GenV2 for unified understanding/generation/editing.

Innovative Achievements

Beyond engineering, DiffSynth-Studio incubates research:

  • EliGen: Entity-level controlled generation with regional attention, extending to inpainting via fusion pipelines (arXiv:2501.01097).
  • Nexus-Gen: Unifies LLM reasoning with Diffusion for seamless image tasks (arXiv:2504.21356).
  • ArtAug: Improves aesthetics through synthesis-understanding interaction (arXiv:2412.12888).
  • AttriCtrl: Fine-grained attribute intensity control (arXiv:2508.02151).
  • AutoLoRA: Automated retrieval and gated fusion of LoRAs (arXiv:2508.02107).
  • Historical roots in ExVideo (video extension, arXiv:2406.14130), Diffutoon (toon shading, arXiv:2401.16224), and DiffSynth (deflickering, arXiv:2308.03463).

Datasets like Qwen-Image-Self-Generated-Dataset and EliGenTrainSet are also open-sourced to fuel community training.

Installation and Usage

Install via source for latest features:

git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .

Or from PyPI: pip install diffsynth.

Environment variables control the model download source (ModelScope by default; international users can switch to www.modelscope.ai). Quick-start guides for models such as Qwen-Image and Wan2.1 are provided in the docs, with examples for inference, low-VRAM runs, and training scripts.
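
As an example of the quick-start flavor, the following sketch generates a short clip with a Wan2.1 text-to-video pipeline. The module path, file patterns, and argument names mirror the repo's published examples as best understood here and should be verified against the current docs before use.

import torch
from diffsynth import save_video
from diffsynth.pipelines.wan_video import WanVideoPipeline, ModelConfig

# Components are fetched by model_id and file pattern from the default
# download source (ModelScope); names may change between releases.
pipe = WanVideoPipeline.from_pretrained(
    torch_dtype=torch.bfloat16,
    device="cuda",
    model_configs=[
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B",
                    origin_file_pattern="diffusion_pytorch_model*.safetensors"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B",
                    origin_file_pattern="models_t5_umt5-xxl-enc-bf16.pth"),
        ModelConfig(model_id="Wan-AI/Wan2.1-T2V-1.3B",
                    origin_file_pattern="Wan2.1_VAE.pth"),
    ],
)

video = pipe(prompt="a paper boat drifting down a rainy street", seed=0, num_frames=33)
save_video(video, "video.mp4", fps=15)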

WebUI support includes brush tools for interactive creation. The project notes that optimization work is ongoing and maintainer bandwidth is limited, so issues may be resolved slowly.

Update History Highlights
  • Dec 2025: Qwen-Image-i2L (Image-to-LoRA); DiffSynth-Studio 2.0 with VRAM upgrades, Z-Image Turbo, FLUX.2-dev.
  • Nov-Oct 2025: Video models like Video-As-Prompt-Wan2.1, LongCat-Video.
  • Sep-Aug 2025: Qwen-Image ecosystem expansions (EliGen-Poster, ControlNets, distillation).
  • Jul-Jun 2025: Wan2.2, Nexus-Gen, FLUX.1 support.
  • Earlier: CogVideoX, ExVideo, Diffutoon origins (2023-2024).

Major updates may deprecate older features; historical versions remain available. With 10k+ stars, it empowers global AI innovation in generative media.

Information

  • Website: github.com
  • Authors: ModelScope Community, Artiprocher
  • Published date: 2023/12/08
