Roughly two types of systems for voice synthesis have been proposed. One is based on the time domain pitch synchronous overlap-add (TD-PSOLA [2]), which synthesizes a voice using the short time waveform directly extracted from the input signal. The other is based on a vocoder [3], which analyzes a voice in terms of its pitch (fundamental frequency; F0) and timbre (spectral envelope) and synthesizes it with the estimated parameters.
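TD-PSOLA synthesizes speech directly from short waveforms extracted from the input audio; a vocoder analyzes the pitch (F0) and timbre (spectral envelope) of the speech and resynthesizes it from those estimated parameters.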
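The overlap-add idea behind TD-PSOLA can be sketched in a few lines. This is a minimal illustration, not the algorithm from [2]: it assumes a constant pitch period, known pitch marks, and two-period Hann windows, and all function names are hypothetical.

```python
import math

def hann(length):
    # Periodic Hann window; copies shifted by length/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def extract_segments(signal, marks, period):
    # Cut one two-period, Hann-windowed segment centred on each pitch mark.
    window = hann(2 * period)
    segments = []
    for m in marks:
        seg = []
        for n in range(2 * period):
            idx = m - period + n
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            seg.append(sample * window[n])
        segments.append(seg)
    return segments

def overlap_add(segments, marks, period, length):
    # Add every segment back into the output, centred on its mark.
    out = [0.0] * length
    for seg, m in zip(segments, marks):
        for n, value in enumerate(seg):
            idx = m - period + n
            if 0 <= idx < length:
                out[idx] += value
    return out

# Toy periodic signal (ASSUMPTION: constant period of 20 samples).
period = 20
signal = [math.sin(2.0 * math.pi * n / period)
          + 0.3 * math.sin(4.0 * math.pi * n / period) for n in range(200)]
marks = list(range(period, 200 - period + 1, period))

segments = extract_segments(signal, marks, period)
resynth = overlap_add(segments, marks, period, len(signal))

# Away from the edges, the overlapping windows sum to 1, so the
# signal is reconstructed from its own short-time waveforms.
max_error = max(abs(a - b) for a, b in
                zip(signal[period:-period], resynth[period:-period]))
```

Placing the same segments at marks with a different spacing would change the pitch, which is why the method works on waveforms rather than on estimated parameters.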
TD-PSOLA and vocoders have trade-offs. TD-PSOLA synthesizes voice with better quality than vocoders; however, vocoders can manipulate pitch and voice timbre independently.
TD-PSOLA gives better synthesis quality, but a vocoder can control pitch and timbre separately.
STRAIGHT [7] and TANDEM-STRAIGHT [8] have been proposed to address the lower quality of vocoders. They use pitch-synchronous analysis [9] to improve the estimation of the spectral envelope.
What is pitch-synchronous analysis? To be continued.
Furthermore, aperiodicity is used as a parameter to represent not only the periodic signal but also the aperiodic signal.
AP (aperiodicity) can represent both periodic and aperiodic signals.
In STRAIGHT and TANDEM-STRAIGHT, the aperiodicity is defined as the spectrum used to synthesize both periodic and aperiodic signals. The periodic and aperiodic spectra are calculated from the spectral envelope and the aperiodicity, and the periodic and aperiodic signals are then calculated individually.
AP is a spectrum from which both periodic and aperiodic signals can be generated. The periodic and aperiodic spectra are computed from the spectral envelope and AP, and the periodic and aperiodic signals are computed independently.
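As a rough illustration of how a spectral envelope and an aperiodicity value can be turned into two spectra, the sketch below uses one common convention. It is an assumption, not the definition from [7] or [8]: `sp` is a power spectrum, `ap` is an amplitude ratio in [0, 1], and the function name is hypothetical.

```python
def split_spectrum(spectral_envelope, aperiodicity):
    """Split a power spectral envelope into periodic and aperiodic parts.

    ASSUMPTION: aperiodicity is an amplitude ratio in [0, 1], so the
    noise share of the power in each bin is ap ** 2.
    """
    periodic, aperiodic = [], []
    for sp, ap in zip(spectral_envelope, aperiodicity):
        noise_fraction = ap * ap
        aperiodic.append(sp * noise_fraction)
        periodic.append(sp * (1.0 - noise_fraction))
    return periodic, aperiodic

# Three toy frequency bins: fully periodic, mixed, fully noisy.
sp = [4.0, 2.0, 1.0]
ap = [0.0, 0.5, 1.0]
periodic, aperiodic = split_spectrum(sp, ap)
```

Under this convention the two parts always add back up to the envelope, and each part would then drive its own signal (a pulse train for the periodic part, noise for the aperiodic part).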
This approach cannot represent the phase of the input voice, because the periodic signal is calculated as the minimum phase response, whereas the vocal tract response $h(t)$ generally includes not only a minimum phase response but also a maximum phase response. To accurately synthesize a voice, it is essential to extract the phase of the input voice. We used a waveform-based parameter as a new parameter instead of aperiodicity.
To be continued.
PLATINUM extracts the waveform-based parameter to reconstruct the input voice.
**PLATINUM extracts a waveform-based parameter and uses it to reconstruct the input voice.**
The proposed system equals vocoder-based systems except that it uses the excitation signal instead of aperiodicity, which therefore suggests that it is possible for the proposed system to independently manipulate the F0 and spectral envelope like vocoder-based systems.
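The system is equivalent to vocoder-based systems, except that it uses an excitation signal in place of AP. This suggests that it can control pitch (F0) and timbre (spectral envelope) independently.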
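The observed spectrum $Y(\omega)$ is defined as the product of the spectral envelope $H(\omega)$ and the target spectrum $X(\omega)$ for reconstructing the waveform, $Y(\omega) = H(\omega) X(\omega)$. The target spectrum $X(\omega)$ is therefore given by $X(\omega) = Y(\omega) / H(\omega)$. Since the phase of $H(\omega)$ for vocoder-based systems is generally the minimum phase, the maximum phase of the input voice is included in $X(\omega)$. The power of $X(\omega)$ is nearly flat, provided that the spectral envelope is accurately estimated. If $H(\omega)$ does not include any zeros, the inverse spectrum can be calculated reliably.
The observed spectrum Y is the product of the spectral envelope H and the target spectrum X used to reconstruct the waveform, so X is obtained by dividing Y by H. Because the phase of the spectral envelope is minimum phase, the maximum-phase part of the input signal ends up in X. The power of X is nearly flat as long as the spectral envelope is estimated accurately. If H contains no zeros, its inverse can be computed.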
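The division of the observed spectrum by the spectral envelope can be tried on a toy example. The sketch below is only an illustration under stated assumptions: it uses a small hand-written DFT, circular convolution, and a made-up decaying response `h` whose spectrum has no zeros; none of the names come from the paper.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def circular_convolve(a, b):
    n = len(a)
    return [sum(a[m] * b[(t - m) % n] for m in range(n)) for t in range(n)]

n = 16
# ASSUMPTION: toy decaying response standing in for the spectral envelope.
h = [0.5 ** t for t in range(n)]
# ASSUMPTION: toy excitation standing in for the target signal.
x = [0.0] * n
x[3] = 1.0
x[4] = -0.4

y = circular_convolve(h, x)          # observed signal: Y = H * X
Y, H = dft(y), dft(h)
X = [Yk / Hk for Yk, Hk in zip(Y, H)]  # target spectrum: X = Y / H
x_recovered = idft(X)

max_error = max(abs(a - b) for a, b in zip(x, x_recovered))
```

The division is safe here only because no bin of `H` is zero; a bin at or near zero would amplify noise or fail outright, which is the reliability condition stated above.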
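To estimate $X(\omega)$, determining the temporal positions for windowing is an important problem. PLATINUM uses the F0 contour and the waveform. First, the voiced section is estimated based on the F0 contour, and the temporal position with the maximum value of $y(t)^2$ is then extracted as the basic temporal position. The other positions are automatically calculated based on the basic position and the F0 contour.
When estimating X, deciding where to place the windows is the key step. PLATINUM uses the F0 contour and the waveform. First, the voiced section is estimated from the F0 contour; then the time with the largest value of y(t)^2 is taken as the basic temporal position, and the remaining positions are computed automatically from the basic position and the F0 contour.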
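The position search described above can be sketched as follows. This is one plausible reading rather than the paper's exact procedure: it assumes a per-sample F0 contour with 0 marking unvoiced samples, handles only the first voiced section, and steps by one rounded pitch period at a time. The function name and the toy data are hypothetical.

```python
def temporal_positions(waveform, f0, fs):
    """Return window centres (sample indices) for the first voiced section.

    ASSUMPTIONS: f0 holds one value in Hz per sample, with 0 for unvoiced
    samples; neighbouring positions are one pitch period (fs / f0) apart.
    """
    start = next((i for i, f in enumerate(f0) if f > 0), None)
    if start is None:
        return []
    end = start
    while end + 1 < len(f0) and f0[end + 1] > 0:
        end += 1

    # Basic temporal position: the sample with the largest y(t)^2.
    base = max(range(start, end + 1), key=lambda i: waveform[i] ** 2)

    positions = [base]
    pos = base
    while True:  # walk forward one period at a time
        pos += int(round(fs / f0[pos]))
        if pos > end:
            break
        positions.append(pos)
    pos = base
    while True:  # walk backward one period at a time
        pos -= int(round(fs / f0[pos]))
        if pos < start:
            break
        positions.append(pos)
    return sorted(positions)

# Toy data: 100 Hz voice at 8 kHz, voiced from sample 100 to 699.
fs = 8000
f0 = [0.0] * 100 + [100.0] * 600 + [0.0] * 100
waveform = [0.0] * 800
for i in range(100, 700, 80):
    waveform[i] = 1.0
waveform[340] = 2.0  # strongest pulse becomes the basic position

positions = temporal_positions(waveform, f0, fs)
```

With a time-varying F0 contour the step size changes along the way, so the positions follow the pitch instead of sitting on a fixed grid.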
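Summary
F0 (the fundamental frequency) sets how high or low the voice sounds: it is typically higher for female speakers and lower for male speakers. sp (the spectral envelope) represents timbre, in the same way that a guitar and a piano sound different. ap (aperiodicity) describes the noise-like, non-periodic part of the voice. For a phrase such as "你好嗎", the four pinyin tones are carried mainly by the F0 contour rather than by ap. According to the paper, extracting an excitation signal and using it in place of ap gives better results.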