
Z-Image-Turbo: Text-to-Image Generation Tool

Z-Image-Turbo is a web interface for the Tongyi-MAI Z-Image-Turbo model, a 6-billion-parameter model built for fast text-to-image generation.

Z-Image-Turbo also includes an MCP (Model Context Protocol) server that exposes image generation through a standardized protocol, so MCP-compatible clients such as AI assistants and automation tools can call it programmatically.
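To make "programmatic calls" concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification. The tool name `generate_image` and its argument names are hypothetical placeholders, since the source does not list the server's actual tool names.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument names, for illustration only.
request = build_tool_call(
    1,
    "generate_image",
    {"prompt": "a lighthouse at dusk", "width": 1024, "height": 1024, "steps": 8, "seed": 42},
)

print(json.dumps(request, indent=2))
```

In practice an MCP client library usually builds these messages and performs the initialization handshake before any tool call, so hand-written requests like this are mainly useful for debugging.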

Z-Image-Turbo Core Advantages

  • AI Integration: Allows AI assistants like Claude to generate images directly during conversations.

  • Automated Workflows: Enables the creation of automated workflows that include image generation steps.

  • Remote Access: Supports image generation via web clients or remote services (HTTP mode).

  • Standardized API: Uses a single protocol (MCP) across different AI tools and platforms.

Z-Image-Turbo Core Features

Application Level

  • Premium Dark Interface: Features a frosted glass design with intuitive and user-friendly controls.

  • Smart Presets: Provides commonly used aspect ratios (1:1, 3:4, 16:9) and resolutions (480p–1080p).

  • Fine-Grained Control: Adjust size, inference steps, guidance scale, and random seed via sliders.

  • Real-Time Progress: Tracks and displays the generation process in real time.

  • Flexible Deployment: Supports custom model cache directories and optional CPU offloading.
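The aspect-ratio and resolution presets above can be combined into pixel dimensions along the following lines. This is a sketch, not the tool's actual preset logic: the function name `preset_size` is hypothetical, and rounding each side down to a multiple of 16 is an assumption (diffusion pipelines commonly require dimensions divisible by a fixed factor).

```python
def preset_size(aspect_w, aspect_h, short_side, multiple=16):
    """Return (width, height) for an aspect ratio, with the shorter side
    set to `short_side` and both sides rounded down to `multiple`.

    The multiple-of-16 constraint is an assumption, not taken from the source.
    """
    def snap(value):
        # Round down to the nearest multiple, never below one multiple.
        return max(multiple, int(value // multiple) * multiple)

    if aspect_w >= aspect_h:
        # Landscape or square: the height is the shorter side.
        height = snap(short_side)
        width = snap(short_side * aspect_w / aspect_h)
    else:
        # Portrait: the width is the shorter side.
        width = snap(short_side)
        height = snap(short_side * aspect_h / aspect_w)
    return width, height


print(preset_size(16, 9, 720))   # 16:9 at 720p
print(preset_size(3, 4, 480))    # 3:4 at 480p
print(preset_size(1, 1, 1080))   # 1:1 at 1080p
```

Note that 1080 is not divisible by 16, so under this assumption a 1:1 preset at 1080p comes out slightly smaller than 1080×1080.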

MCP Server Level

  • Dual Transport Modes: Supports both stdio (local) and HTTP/SSE (remote) connections simultaneously.

  • AI Assistant Compatibility: Seamlessly integrates with Claude Desktop and other MCP clients.

  • Comprehensive Toolset: Includes image generation, model information querying, configuration management, and example prompt access.

  • Highly Configurable: Customize host, port, and transport mode via mcp_config.json.

  • Production Ready: HTTP mode supports stateless deployment and horizontal scaling.
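As an illustration of configuring host, port, and transport mode through mcp_config.json, the sketch below loads such a file and falls back to defaults for missing entries. The key names (`transport`, `host`, `port`), the allowed transport values, and the default values are all assumptions made for this example. The project's real schema may differ.

```python
import json
from pathlib import Path

# Hypothetical defaults and key names; check the project's own
# mcp_config.json for the actual schema.
DEFAULTS = {"transport": "stdio", "host": "127.0.0.1", "port": 8000}
ALLOWED_TRANSPORTS = {"stdio", "http"}


def load_mcp_config(path="mcp_config.json"):
    """Load the config file if present, overlaying it on the defaults."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))

    # Basic validation of the merged settings.
    if config["transport"] not in ALLOWED_TRANSPORTS:
        raise ValueError(f"unsupported transport: {config['transport']!r}")
    config["port"] = int(config["port"])
    if not 1 <= config["port"] <= 65535:
        raise ValueError(f"port out of range: {config['port']}")
    return config


if __name__ == "__main__":
    print(load_mcp_config())
```

Keeping stdio as the fallback transport in this sketch reflects the local use case described above. A remote deployment would set the transport to HTTP and choose a host and port explicitly.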

Model Level

  • High Speed: Optimized for 8-step inference, achieving sub-second latency on enterprise-grade GPUs.

  • Advanced Architecture: Built on S3-DiT (Scalable Single-Stream Diffusion Transformer).

  • Powerful Encoders: Utilizes Qwen 4B for efficient language understanding and Flux VAE for image decoding.

  • Cutting-Edge Training: Employs DMDR training, which combines Distribution Matching Distillation (DMD) with reinforcement learning, for high semantic alignment.

  • Multilingual Support: Accurately renders English and Chinese text content.

  • Versatile Applications: Generates photorealistic, anime, and various other styles of images with no content restrictions.

  • High-Resolution Output: Natively supports resolutions up to about 2 megapixels (e.g., 1024×1536, 1440×1440).

  • Resource Efficient: With only 6B parameters, the model runs smoothly on GPUs with 16 GB of VRAM (consumer-grade GPU friendly).
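The resolution and memory figures above can be sanity-checked with a little arithmetic. The sketch below assumes the 6B weights are stored at 16 bits (2 bytes) per parameter, which the source does not state. It counts only the main model's weights and ignores the text encoder, the VAE, and activation memory, so it is a lower bound rather than a full VRAM estimate.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def megapixels(width, height):
    """Pixel count of an image in millions of pixels."""
    return width * height / 1_000_000


# Assumption: 16-bit weights (2 bytes per parameter).
print(round(weight_memory_gib(6e9, 2), 1))  # about 11.2 GiB

# The two example resolutions from the list above.
print(round(megapixels(1024, 1536), 2))     # about 1.57 MP
print(round(megapixels(1440, 1440), 2))     # about 2.07 MP
```

Under that assumption the weights alone fit within 16 GB, and the remaining headroom is where options such as CPU offloading become relevant.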

See the Aaryan-Kapoor/z-image-turbo repository for the source code and further details.