X-Sing: VISinger & VITS-SVC & NSF-BigVGAN

Follows the same three-stage design as the paper Make-A-Voice: Unified Voice Synthesis With Discrete Representation.

Adapt VISinger for stage 1 (S1)
Adapt VITS-SVC for stage 2 (S2)
Adapt NSF-BigVGAN for stage 3 (S3)