YingMusic-SVC is an open-source project for robust, zero-shot singing voice conversion (SVC) on real-world songs. Its goal is to reproduce the timbre of a target singer accurately while preserving the melody and lyrics of the source performance. It targets three problems that existing systems face on real songs: harmonic interference, F0 extraction errors, and a lack of singing-specific inductive bias.
YingMusic-SVC uses a three-stage training framework: continuous pre-training (CPT) built on a singing-trained module, robust supervised fine-tuning (SFT) with F0 perturbation and harmonic enhancement, and multi-reward reinforcement learning with Flow-GRPO to optimize perceptual quality. The framework adds three singing-specific inductive biases: an RVC-trained timbre converter for timbre-content decoupling, an F0-aware fine-grained timbre adapter that captures dynamic vocal expression, and an energy-balanced flow-matching loss that improves high-frequency detail.
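To make the F0 perturbation idea in the SFT stage concrete, here is a minimal sketch of how a clean pitch contour could be corrupted during training so that the model learns to tolerate pitch-tracker errors. It is not the project's implementation: the function name `perturb_f0`, its parameters, and the three noise types (a global shift, per-frame jitter, and occasional octave jumps) are assumptions chosen for illustration.

```python
import random


def perturb_f0(f0_hz, max_shift_semitones=0.5, jitter_cents=20.0,
               octave_error_prob=0.02, rng=None):
    """Return a noisy copy of an F0 contour.

    f0_hz is a list of per-frame pitch values in Hz; 0.0 marks an unvoiced frame.
    All parameter names and defaults are hypothetical, not taken from YingMusic-SVC.
    """
    if rng is None:
        rng = random.Random()

    # One global shift per utterance, drawn uniformly in semitones.
    global_shift = rng.uniform(-max_shift_semitones, max_shift_semitones)

    noisy = []
    for f in f0_hz:
        if f <= 0.0:
            # Leave unvoiced frames untouched.
            noisy.append(0.0)
            continue

        # Per-frame Gaussian jitter, expressed in cents (1 semitone = 100 cents).
        jitter = rng.gauss(0.0, jitter_cents) / 100.0
        value = f * 2.0 ** ((global_shift + jitter) / 12.0)

        # Occasionally simulate an octave error, a common pitch-tracker failure.
        if rng.random() < octave_error_prob:
            value *= rng.choice((0.5, 2.0))

        noisy.append(value)
    return noisy


if __name__ == "__main__":
    clean = [0.0, 220.0, 221.0, 0.0, 440.0]
    print(perturb_f0(clean, rng=random.Random(0)))
```

Working in semitones and cents rather than raw Hz keeps the perturbation perceptually uniform across low and high notes, which is the main design choice in this sketch.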
The project provides a difficulty-graded multi-track benchmark dataset, pre-trained models for full SVC and for accompaniment separation, and Gradio applications. The models are reported to perform well under complex accompaniment and harmonic contamination, which supports practical deployment of singing voice conversion.
More than 100 multi-track studio songs are available for download.