
UniSinger: Unified End-to-End Singing Voice Synthesis With Cross-Modality Information Matching

Introduction:
In this work, we propose UniSinger, a unified end-to-end singing voice synthesizer, which integrates three abilities related to singing voice generation: singing voice synthesis (SVS), singing voice conversion (SVC), and singing voice editing (SVE) into a single framework.

All experiments in the paper are conducted on OpenSinger, a large-scale singing voice dataset. The audio samples below demonstrate the performance of UniSinger on the SVS, SVC, and SVE tasks.

1 Singing Voice Synthesis

index | GT | GT (mel + HiFiGAN) | FastSpeech 2 + HiFiGAN | FastSpeech 2s | VISinger | UniSinger
#1
#2
#3
#4
#5

2 Singing Voice Conversion

2.1 Timbre Conversion

index | Source | Reference | Reference (mel + HiFiGAN) | SpeechFlow | UniSinger
#1
#2
#3
#4
#5
#6

2.2 Pitch Conversion

The fundamental frequencies (F0) of the samples presented below are multiplied by a constant factor of 0.8.
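As a rough illustration of what a constant F0 factor means, the sketch below scales a toy pitch contour by 0.8 and reports the equivalent shift in semitones. It is not taken from the UniSinger implementation: the function names `scale_f0` and `semitone_shift`, the convention that unvoiced frames are stored as 0.0, and the example contour are all assumptions made for this example.

```python
import math


def scale_f0(f0_hz, factor=0.8):
    """Multiply voiced F0 values by a constant factor.

    Assumption: unvoiced frames are encoded as 0.0 and are left unchanged.
    """
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


def semitone_shift(factor):
    """Convert a multiplicative F0 factor into a shift in semitones."""
    return 12 * math.log2(factor)


# Hypothetical contour in Hz: voiced, unvoiced, voiced.
contour = [220.0, 0.0, 440.0]
shifted = scale_f0(contour, factor=0.8)
shift = semitone_shift(0.8)
print(shifted, round(shift, 2))
```

A factor of 0.8 lowers the pitch by a little under four semitones, so the converted samples should sound noticeably lower than the references.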

index | Reference | Reference (mel + HiFiGAN) | SpeechFlow | UniSinger
#1
#2
#3
#4
#5

3 Singing Voice Editing

Exp. 1:

original lyrics: 爱可以不问对错 —— ai # k e | y i # b u | w en # d ui | c uo
insertion: 爱怎么可以不问对错 —— ai # z en | m e # k e | y i # b u | w en # d ui | c uo
replacement: 爱怎么(可以)不问对错 —— ai # z en | m e # (k e | y i #) b u | w en # d ui | c uo
deletion: 爱(可以)不问对错 —— ai # ( k e | y i #) b u | w en # d ui | c uo
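The edit lines in each experiment can be read mechanically. The sketch below splits a phoneme line into the phonemes that stay in the edited lyrics and the phonemes inside parentheses. It rests on assumptions that the page does not state explicitly: that `#` and `|` are boundary markers rather than phonemes, and that the parenthesised span is the segment of the original lyrics that is removed (deletion) or overwritten (replacement). The helper name `split_edit` is hypothetical.

```python
# Assumption: "#" and "|" mark boundaries between units and are not phonemes.
BOUNDARIES = {"#", "|"}


def split_edit(phoneme_line):
    """Return (kept, marked) phoneme lists for one edit line.

    `marked` holds the phonemes written inside parentheses, `kept` the rest.
    """
    # Pad parentheses so that tokens such as "(k" or "#)" split cleanly.
    tokens = phoneme_line.replace("(", " ( ").replace(")", " ) ").split()
    kept, marked = [], []
    inside = False
    for tok in tokens:
        if tok == "(":
            inside = True
        elif tok == ")":
            inside = False
        elif tok not in BOUNDARIES:
            (marked if inside else kept).append(tok)
    return kept, marked


# Deletion line of Exp. 1.
kept, marked = split_edit("ai # ( k e | y i #) b u | w en # d ui | c uo")
print(kept)
print(marked)
```

For this deletion line, `marked` contains the four phonemes of the removed word and `kept` contains the phonemes that remain in the edited lyrics.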

GT GT(Mel+PWG) EditSinger(insertion) EditSinger(replacement) EditSinger(deletion) UniSinger(insertion) UniSinger(replacement) UniSinger(deletion)

Exp. 2:

original lyrics: 你何苦非为他等在雨中 —— n i # h e | k u # f ei | w ei # t a # d eng # z ai # y u # zh ong
insertion: 你何苦非为他傻傻等在雨中 —— n i # h e | k u # f ei | w ei # t a # sh a | sh a # d eng # z ai # y u # zh ong
replacement: 你何苦非为他伫立风(等在雨)中 —— n i # h e | k u # f ei | w ei # t a # zh u | l i # f eng | ( d eng # z ai # y u #) zh ong
deletion: 你(何苦非)为他等在雨中 —— n i # ( h e | k u # f ei |) w ei # t a # d eng # z ai # y u # zh ong

GT GT(Mel+PWG) EditSinger(insertion) EditSinger(replacement) EditSinger(deletion) UniSinger(insertion) UniSinger(replacement) UniSinger(deletion)

Exp. 3:

original lyrics: 几朵云在阴天忘了该往哪儿走 —— j i | d uo # y un # z ai # y in | t ian # w ang # l e # g ai # w ang # n a | r # z ou
insertion: 几朵孤独的云在阴天忘了该往哪儿走 —— j i | d uo # g u | d u # d e # y un # z ai # y in | t ian # w ang # l e # g ai # w ang # n a | r # z ou
replacement: 几片叶(朵云)在阴天忘了该往哪儿走 —— j i | p ian # y e | (d uo # y un #) z ai # y in | t ian # w ang # l e # g ai # w ang # n a | r # z ou
deletion: 几朵云(在阴天)忘了该往哪儿走 —— j i | d uo # y un | (z ai # y in | t ian #) w ang # l e # g ai # w ang # n a | r # z ou

GT GT(Mel+PWG) EditSinger(insertion) EditSinger(replacement) EditSinger(deletion) UniSinger(insertion) UniSinger(replacement) UniSinger(deletion)

Exp. 4:

original lyrics: 被吹进了左耳 —— b ei # ch ui | j in # l e # z uo | er
insertion: 被思念吹进了左耳 —— b ei # s i | n ian # ch ui | j in # l e # z uo | er
replacement: 被传递到(吹进了)左耳 —— b ei # ch uan | d i # d ao # (ch ui | j in # l e # ) z uo | er
deletion: 被吹进(了)左耳 —— b ei # ch ui | j in # (l e #) z uo | er

GT GT(Mel+PWG) EditSinger(insertion) EditSinger(replacement) EditSinger(deletion) UniSinger(insertion) UniSinger(replacement) UniSinger(deletion)

Exp. 5:

original lyrics: 在昏暗中的我 —— z ai # h un | an # zh ong # d e # w o
insertion: 在那时昏暗中的我 —— z ai # n a | sh i # h un | an # zh ong # d e # w o
replacement: 在昏暗中与你(的我) —— z ai # h un | an # zh ong # y u # n i (d e # w o)
deletion: 在昏暗(中)的我 —— z ai # h un | an # ( zh ong # ) d e # w o

GT GT(Mel+PWG) EditSinger(insertion) EditSinger(replacement) EditSinger(deletion) UniSinger(insertion) UniSinger(replacement) UniSinger(deletion)