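Contents
- Pitch period and voiced sounds
- Sonic source code analysis
- References
- Takeaways
In the previous article we covered the principle of changing audio speed without changing pitch, and the WSOLA (waveform similarity overlap-add) algorithm for time-domain compression and expansion. To find similar frames, Sonic uses AMDF (the average magnitude difference function).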
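I. Pitch period and voiced sounds
Image source: [Unvoiced or voiced sounds]
The human vocal organs can be divided into three regions: the power region, the sound-source region, and the articulation region.
1. Power region: lungs, diaphragm, trachea
The airflow exhaled from the lungs is the driving force of speech. It travels through the bronchi to the larynx and acts on the vocal cords, pharynx, oral cavity, nasal cavity and other vocal organs.
2. Sound-source region: larynx, vocal cords
Touch the front of your neck to find the larynx; the vocal cords sit just behind the spot you can feel.
The vocal cords are two elastic, band-like membranes, and the gap between them is called the glottis.
When airflow from the lungs passes through the closed glottis, it sets the vocal cords vibrating, which produces sound.
If you rest a hand on your throat while speaking, you can feel a slight vibration; that is the vocal cords vibrating.
How high or low, and how thin or thick, a voice sounds is determined by the tension of the vocal cords and the amount of air exhaled.
3. Articulation region: oral cavity, nasal cavity, pharynx
The articulation region consists mainly of the oral cavity, the nasal cavity, and the pharynx; the oral cavity in turn mainly comprises the lips, teeth, and tongue. (The pharynx lies behind the oral cavity; it opens upward into the oral and nasal cavities and connects downward to the larynx.)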
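Quoted from: [Unvoiced or voiced sounds](https://zhuanlan.zhihu.com/p/374857199)
A voiced sound is produced as follows: airflow from the lungs strikes the glottis and makes it open and close repeatedly, producing a train of quasi-periodic airflow pulses. These pulses are shaped by the resonances of the vocal tract (including the oral and nasal cavities) and radiated at the lips and teeth, which yields the speech signal. The waveform of a voiced sound is therefore quasi-periodic.
The pitch period refers to this quasi-periodicity: it reflects the time between two successive openings of the glottis, or equivalently the frequency at which the glottis opens and closes.
The pitch period is one of the most important parameters of a speech signal, but extracting it is fairly difficult.
The main difficulties are:
1. The glottal excitation signal is not a perfectly periodic sequence.
2. The pitch frequency is mostly in the 100-200 Hz range, but a voiced signal often contains dozens of harmonic components, and the fundamental is often not the strongest of them, so pitch detection can mistake a harmonic for the fundamental.
3. The fundamental frequency varies over a wide range, from about 50 Hz for elderly men to about 500 Hz for children and women.
Quoted from: [Speech recognition 08: methods for estimating the pitch period](https://zhuanlan.zhihu.com/p/454283094)
The main pitch detection methods include the autocorrelation method and the average magnitude difference function (AMDF) method. Sonic's implementation uses AMDF, and this is the most important step in how Sonic changes speed without changing pitch.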
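To make the idea concrete, here is a minimal sketch of an AMDF period search in Java. It is an illustration under simplifying assumptions, not Sonic's code: the class `AmdfSketch`, its method names, the synthetic test tone, and the 4000 Hz sample rate are all invented for this example. For each candidate lag it averages the absolute difference between a window and the same window shifted by that lag, then picks the lag with the smallest average.

```java
/** Illustrative AMDF pitch-period search on a mono signal; not Sonic's actual code. */
final class AmdfSketch {

  /** Mean absolute difference between a window of {@code lag} samples and its shifted copy. */
  static double amdf(short[] x, int lag) {
    long sum = 0;
    for (int i = 0; i < lag; i++) {
      sum += Math.abs(x[i] - x[i + lag]);
    }
    return (double) sum / lag;
  }

  /** Returns the lag in [minLag, maxLag] with the smallest AMDF value. */
  static int findPeriod(short[] x, int minLag, int maxLag) {
    int best = minLag;
    double bestValue = Double.MAX_VALUE;
    for (int lag = minLag; lag <= maxLag; lag++) {
      double value = amdf(x, lag);
      if (value < bestValue) {
        bestValue = value;
        best = lag;
      }
    }
    return best;
  }

  /** Builds a test tone that repeats exactly every {@code period} samples. */
  static short[] sine(int period, int length) {
    short[] x = new short[length];
    for (int i = 0; i < length; i++) {
      x[i] = (short) Math.round(10000 * Math.sin(2 * Math.PI * (i % period) / period));
    }
    return x;
  }

  public static void main(String[] args) {
    // A 100 Hz tone at an assumed 4000 Hz sample rate has a period of 40 samples.
    short[] tone = sine(40, 400);
    // Lags 10..61 correspond to roughly 400 Hz down to 65 Hz at 4000 Hz.
    System.out.println(findPeriod(tone, 10, 61)); // prints 40
  }
}
```

The window length here equals the lag, which matches the inner loop of the Sonic code analyzed later; other AMDF variants use a fixed window length instead.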
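II. Sonic source code analysis
Sonic source code: https://github.com/waywardgeek/sonic
The repository contains two implementations, a Java version (Sonic.java) and a C version (sonic.c). Both are fairly small, and according to the author's performance comparison there is little difference between them.
ExoPlayer, the well-known Android player, bases its speed change without pitch change on Sonic.java, so we analyze Sonic through ExoPlayer's implementation.
There are two main classes, SonicAudioProcessor and Sonic. SonicAudioProcessor is a wrapper around Sonic that adapts it to ExoPlayer's framework.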
public final class SonicAudioProcessor {
  private float speed;
  private float pitch;
  private Sonic sonic;
  private ByteBuffer buffer;
  private ShortBuffer shortBuffer;
  private ByteBuffer outputBuffer;

  public void setSpeed(float speed) {
    if (this.speed != speed) {
      this.speed = speed;
      ...
      flush();
    }
  }

  // After the speed changes, re-create the Sonic instance.
  private void flush() {
    ...
    sonic = new Sonic(
        mSampleRate, // input sample rate
        mChannelCount, // number of channels
        speed, // playback speed
        pitch, // pitch factor, 1.0f by default
        mSampleRate // output sample rate, normally unchanged
    );
    ...
  }

  // Audio frames decoded by MediaCodec are passed to Sonic for speed processing
  // before they are given to AudioTrack.write.
  public void queueInput(ByteBuffer inputBuffer) {
    ...
    ShortBuffer shortBuffer = inputBuffer.asShortBuffer();
    ...
    sonic.queueInput(shortBuffer);
    ...
  }

  // The data that Sonic has processed is then fetched and written to AudioTrack.
  public ByteBuffer getOutput() {
    ...
    int outputSize = sonic.getOutputSize();
    buffer = ByteBuffer.allocateDirect(outputSize).order(ByteOrder.nativeOrder());
    shortBuffer = buffer.asShortBuffer();
    sonic.getOutput(shortBuffer);
    outputBuffer = buffer;
    ...
    return outputBuffer;
  }
}
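As the code shows, SonicAudioProcessor is a wrapper layer between AudioTrack and Sonic. Before the audio frames decoded by MediaCodec are passed to AudioTrack.write, they go to Sonic through queueInput for speed processing; the processed data is then fetched with getOutput and handed to AudioTrack.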
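The asShortBuffer calls in the wrapper treat the decoded audio as 16-bit PCM, two bytes per sample. As a small, hedged illustration of that step only, the sketch below uses plain java.nio to view raw bytes as samples. The class name `PcmViewSketch` is hypothetical, and little-endian byte order is an assumption of this example (the wrapper above uses the platform's native order).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

/** Illustrative only: views raw 16-bit PCM bytes as samples, as asShortBuffer() does. */
final class PcmViewSketch {

  /** Interprets {@code pcm} as little-endian 16-bit samples (an assumption of this sketch). */
  static short[] toSamples(byte[] pcm) {
    ShortBuffer view = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
    short[] samples = new short[view.remaining()];
    view.get(samples);
    return samples;
  }

  public static void main(String[] args) {
    byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, 0x40};
    short[] samples = toSamples(pcm);
    // Six bytes become three samples: 1, -1 and 16384.
    System.out.println(samples.length + " samples, first = " + samples[0]);
  }
}
```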
Next we focus on how Sonic implements queueInput and getOutput.
public final class Sonic {
  private static final int MINIMUM_PITCH = 65;
  private static final int MAXIMUM_PITCH = 400;
  private static final int AMDF_FREQUENCY = 4000;
  private static final int BYTES_PER_SAMPLE = 2;

  public Sonic(
      int inputSampleRateHz, int channelCount, float speed, float pitch, int outputSampleRateHz) {
    this.inputSampleRateHz = inputSampleRateHz;
    this.channelCount = channelCount;
    this.speed = speed;
    this.pitch = pitch;
    rate = (float) inputSampleRateHz / outputSampleRateHz;
    minPeriod = inputSampleRateHz / MAXIMUM_PITCH; // shortest pitch period in frames, e.g. 44100 / 400 = 110
    maxPeriod = inputSampleRateHz / MINIMUM_PITCH; // longest pitch period in frames, e.g. 44100 / 65 = 678
    // The AMDF search compares two adjacent windows of up to maxPeriod frames each,
    // so at most 2 * maxPeriod frames are needed at a time.
    maxRequiredFrameCount = 2 * maxPeriod;
    downSampleBuffer = new short[maxRequiredFrameCount]; // buffer for the down-sampled signal
    inputBuffer = new short[maxRequiredFrameCount * channelCount];
    outputBuffer = new short[maxRequiredFrameCount * channelCount];
    pitchBuffer = new short[maxRequiredFrameCount * channelCount];
  }

  public void queueInput(ShortBuffer buffer) {
    ...
    processStreamInput();
  }

  private void processStreamInput() {
    ...
    float s = speed / pitch;
    float r = rate * pitch;
    if (s > 1.00001 || s < 0.99999) {
      changeSpeed(s);
    }
    ...
  }

  private void changeSpeed(float speed) {
    ...
    int frameCount = inputFrameCount;
    int positionFrames = 0;
    do {
      // If input frames are still waiting to be copied through unchanged,
      // copy them from inputBuffer (starting at positionFrames) to outputBuffer.
      if (remainingInputToCopyFrameCount > 0) {
        positionFrames += copyInputToOutput(positionFrames);
      } else {
        // Find the pitch period.
        int period = findPitchPeriod(inputBuffer, positionFrames);
        if (speed > 1.0) {
          // Speeding up: skip over one pitch period.
          positionFrames += period + skipPitchPeriod(inputBuffer, positionFrames, speed, period);
        } else {
          // Slowing down: insert an extra pitch period.
          positionFrames += insertPitchPeriod(inputBuffer, positionFrames, speed, period);
        }
      }
    } while (positionFrames + maxRequiredFrameCount <= frameCount);
    removeProcessedInputFrames(positionFrames);
  }

  private int findPitchPeriod(short[] samples, int position) {
    // Finding the pitch period is the key step of changing speed without changing pitch.
    // Sonic uses AMDF for it.
    int period;
    int retPeriod;
    // For efficiency, first search on a signal down-sampled to about 4 kHz, then repeat the
    // search at full rate over a narrower range. skip is the down-sampling factor,
    // e.g. 44100 / 4000 = 11 (or 22050 / 4000 = 5).
    int skip = inputSampleRateHz > AMDF_FREQUENCY ? inputSampleRateHz / AMDF_FREQUENCY : 1;
    downSampleInput(samples, position, skip);
    period = findPitchPeriodInRange(downSampleBuffer, 0, minPeriod / skip, maxPeriod / skip);
    if (skip != 1) {
      period *= skip;
      int minP = period - (skip * 4);
      int maxP = period + (skip * 4);
      if (minP < minPeriod) {
        minP = minPeriod;
      }
      if (maxP > maxPeriod) {
        maxP = maxPeriod;
      }
      downSampleInput(samples, position, 1);
      period = findPitchPeriodInRange(downSampleBuffer, 0, minP, maxP);
    }
    if (previousPeriodBetter(minDiff, maxDiff)) {
      retPeriod = prevPeriod;
    } else {
      retPeriod = period;
    }
    prevMinDiff = minDiff;
    prevPeriod = period;
    return retPeriod;
  }

  // The actual pitch-period search happens here.
  private int findPitchPeriodInRange(short[] samples, int position, int minPeriod, int maxPeriod) {
    // Find the best frequency match in the range, and given a sample skip multiple. For now, just
    // find the pitch of the first channel.
    int bestPeriod = 0;
    int worstPeriod = 255;
    int minDiff = 1;
    int maxDiff = 0;
    position *= channelCount;
    for (int period = minPeriod; period <= maxPeriod; period++) {
      int diff = 0;
      for (int i = 0; i < period; i++) {
        short sVal = samples[position + i];
        short pVal = samples[position + period + i];
        diff += Math.abs(sVal - pVal);
      }
      // Note that the highest number of samples we add into diff will be less than 256, since we
      // skip samples. Thus, diff is a 24 bit number, and we can safely multiply by numSamples
      // without overflow.
      if (diff * bestPeriod < minDiff * period) {
        minDiff = diff; // smallest difference so far, compared per sample (diff / period)
        bestPeriod = period; // the corresponding best pitch period
      }
      if (diff * worstPeriod > maxDiff * period) {
        maxDiff = diff; // largest difference so far
        worstPeriod = period; // the period with the worst match
      }
    }
    this.minDiff = minDiff / bestPeriod; // average per-sample difference at the best period
    this.maxDiff = maxDiff / worstPeriod; // average per-sample difference at the worst period
    return bestPeriod; // the best pitch period
  }

  // When speeding up, skip over a pitch period.
  private int skipPitchPeriod(short[] samples, int position, float speed, int period) {
    // Skip over a pitch period, and copy period/speed samples to the output.
    int newFrameCount;
    if (speed >= 2.0f) {
      // At 2x or faster, remainingInputToCopyFrameCount is not set: nothing is copied through.
      newFrameCount = (int) (period / (speed - 1.0f));
    } else {
      newFrameCount = period;
      // Below 2x, remainingInputToCopyFrameCount input frames are copied through unchanged
      // after the cross-fade.
      remainingInputToCopyFrameCount = (int) (period * (2.0f - speed) / (speed - 1.0f));
    }
    outputBuffer = ensureSpaceForAdditionalFrames(outputBuffer, outputFrameCount, newFrameCount);
    overlapAdd(
        newFrameCount,
        channelCount,
        outputBuffer,
        outputFrameCount,
        samples,
        position,
        samples,
        position + period);
    outputFrameCount += newFrameCount;
    return newFrameCount;
  }

  // When slowing down (speed below 1.0), insert a pitch period.
  private int insertPitchPeriod(short[] samples, int position, float speed, int period) {
    // Insert a pitch period, and determine how much input to copy directly.
    int newFrameCount;
    if (speed < 0.5f) {
      newFrameCount = (int) (period * speed / (1.0f - speed));
    } else {
      newFrameCount = period;
      remainingInputToCopyFrameCount = (int) (period * (2.0f * speed - 1.0f) / (1.0f - speed));
    }
    outputBuffer =
        ensureSpaceForAdditionalFrames(outputBuffer, outputFrameCount, period + newFrameCount);
    System.arraycopy(
        samples,
        position * channelCount,
        outputBuffer,
        outputFrameCount * channelCount,
        period * channelCount);
    overlapAdd(
        newFrameCount,
        channelCount,
        outputBuffer,
        outputFrameCount + period,
        samples,
        position + period,
        samples,
        position);
    outputFrameCount += period + newFrameCount;
    return newFrameCount;
  }

  // Finally, overlap-add (cross-fade) two segments into the output buffer.
  private static void overlapAdd(
      int frameCount,
      int channelCount,
      short[] out,
      int outPosition,
      short[] rampDown,
      int rampDownPosition,
      short[] rampUp,
      int rampUpPosition) { // in skipPitchPeriod, rampUpPosition = rampDownPosition + period
    for (int i = 0; i < channelCount; i++) {
      int o = outPosition * channelCount + i;
      int u = rampUpPosition * channelCount + i;
      int d = rampDownPosition * channelCount + i;
      for (int t = 0; t < frameCount; t++) {
        // Mix the two segments with linear weights: rampDown fades out while rampUp fades in.
        out[o] = (short) ((rampDown[d] * (frameCount - t) + rampUp[u] * t) / frameCount);
        o += channelCount;
        d += channelCount;
        u += channelCount;
      }
    }
  }
}
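The linear cross-fade in overlapAdd is easy to try in isolation. The following is a simplified mono sketch, assuming a hypothetical class `CrossFadeSketch` and made-up sample values; it keeps the same weighting formula but drops the channel and position handling.

```java
import java.util.Arrays;

/** Illustrative mono version of the linear cross-fade used by overlapAdd. */
final class CrossFadeSketch {

  /** Fades from {@code down} to {@code up} over {@code n} frames using linear weights. */
  static short[] crossFade(short[] down, short[] up, int n) {
    short[] out = new short[n];
    for (int t = 0; t < n; t++) {
      // The weight of `down` falls from n/n to 1/n while the weight of `up` rises from 0 to (n-1)/n.
      out[t] = (short) ((down[t] * (n - t) + up[t] * t) / n);
    }
    return out;
  }

  public static void main(String[] args) {
    short[] down = {1000, 1000, 1000, 1000};
    short[] up = {0, 0, 0, 0};
    System.out.println(Arrays.toString(crossFade(down, up, 4))); // prints [1000, 750, 500, 250]
  }
}
```

Because the two segments are one pitch period apart and therefore look alike, fading between them joins the waveform without an audible click.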
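See the comments in the code above for details. The basic flow is:
- First determine the minimum and maximum pitch period (derived from the sample rate and an empirical pitch range of 65 to 400 Hz).
- Find the pitch period with findPitchPeriod. For efficiency, it first searches on a signal down-sampled to about 4 kHz, then repeats the search at full rate over a narrower range. The search evaluates the AMDF between the starting frame and each candidate offset within the range; the offset with the smallest value is the pitch period.
- Depending on whether playback is sped up or slowed down, either skip part of the pitch-period signal or insert a pitch-period signal.
- Overlap-add the frames and write the result to outputBuffer.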
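The numbers behind the first two steps can be worked out directly from the constants in the Sonic code. The sketch below is only an arithmetic illustration; the class `PitchRangeSketch` and its method are hypothetical, while the three constants are copied from the listing above.

```java
/** Illustrative arithmetic for Sonic's pitch-period search range. */
final class PitchRangeSketch {
  static final int MINIMUM_PITCH = 65;
  static final int MAXIMUM_PITCH = 400;
  static final int AMDF_FREQUENCY = 4000;

  /** Returns {minPeriod, maxPeriod, skip, coarseMinPeriod, coarseMaxPeriod}. */
  static int[] searchRange(int sampleRateHz) {
    int minPeriod = sampleRateHz / MAXIMUM_PITCH;
    int maxPeriod = sampleRateHz / MINIMUM_PITCH;
    int skip = sampleRateHz > AMDF_FREQUENCY ? sampleRateHz / AMDF_FREQUENCY : 1;
    // The coarse pass runs on the down-sampled signal, so its period bounds shrink by skip.
    return new int[] {minPeriod, maxPeriod, skip, minPeriod / skip, maxPeriod / skip};
  }

  public static void main(String[] args) {
    int[] r = searchRange(44100);
    // Full-rate range 110..678 frames; coarse pass checks only lags 10..61.
    System.out.println(r[0] + ".." + r[1] + ", skip " + r[2] + ", coarse " + r[3] + ".." + r[4]);
    // prints 110..678, skip 11, coarse 10..61
  }
}
```

At 44100 Hz a single full-rate search would test 569 candidate periods. The coarse pass tests 52, and the refinement pass tests at most 2 * 4 * skip + 1 = 89 around the coarse result, which is where the efficiency gain comes from.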
Invocation and log output
sonicAudioProcessor.queueInput(audioData);
outData = sonicAudioProcessor.getOutput();
Log.i(TAG, " inputDataLength="+audioData.limit()+ " inputData="+ Arrays.toString(audioData.array()));
Log.i(TAG, " outDataLength="+outData.limit()+ " outData="+ Arrays.toString(outData.array()));
---> at 0.5x speed
inputDataLength=4096
outDataLength=8096 // not constant
---> at 1.5x speed
inputDataLength=4096
outDataLength=2844 // not constant
---> at 2x speed
inputDataLength=4096
outDataLength=2020 // not constant
As the log shows, at 0.5x speed samples are inserted, so the output is longer than the input; at speeds above 1x samples are dropped, so the output is shorter. This is implemented by the following loop:
do {
  // If input frames are still waiting to be copied through unchanged,
  // copy them from inputBuffer (starting at positionFrames) to outputBuffer.
  if (remainingInputToCopyFrameCount > 0) {
    positionFrames += copyInputToOutput(positionFrames);
  } else {
    // Find the pitch period.
    int period = findPitchPeriod(inputBuffer, positionFrames);
    // With the pitch period known, the speed change itself is done by
    // skipPitchPeriod and insertPitchPeriod below.
    if (speed > 1.0) {
      positionFrames += period + skipPitchPeriod(inputBuffer, positionFrames, speed, period);
    } else {
      positionFrames += insertPitchPeriod(inputBuffer, positionFrames, speed, period);
    }
  }
} while (positionFrames + maxRequiredFrameCount <= frameCount);
[Figure: how skipPitchPeriod works]
[Figure: how insertPitchPeriod works]
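The frame bookkeeping in skipPitchPeriod and insertPitchPeriod can also be checked with a little arithmetic. The sketch below mirrors the frame-count formulas from those two methods in a hypothetical helper, `SpeedBookkeepingSketch.framesPerCycle`. It assumes that copyInputToOutput, which is not shown in this article, advances the input and the output by the same number of frames. Under that assumption each cycle consumes about `speed` times as many input frames as it produces output frames, which fits the logged output sizes being close to 4096 / speed.

```java
/** Illustrative per-cycle frame accounting for Sonic's speed change; not the real class. */
final class SpeedBookkeepingSketch {

  /**
   * Returns {inputFramesConsumed, outputFramesProduced} for one pitch-period cycle.
   * The speed must not be 1.0; Sonic does not call changeSpeed in that case.
   */
  static int[] framesPerCycle(float speed, int period) {
    int newFrames;
    int copied = 0; // frames later copied through unchanged (remainingInputToCopyFrameCount)
    if (speed > 1.0f) {
      if (speed >= 2.0f) {
        newFrames = (int) (period / (speed - 1.0f));
      } else {
        newFrames = period;
        copied = (int) (period * (2.0f - speed) / (speed - 1.0f));
      }
      // One period is skipped; the cross-faded frames and copied frames reach the output.
      return new int[] {period + newFrames + copied, newFrames + copied};
    }
    if (speed < 0.5f) {
      newFrames = (int) (period * speed / (1.0f - speed));
    } else {
      newFrames = period;
      copied = (int) (period * (2.0f * speed - 1.0f) / (1.0f - speed));
    }
    // One period is written out in addition to the cross-faded and copied frames.
    return new int[] {newFrames + copied, period + newFrames + copied};
  }

  public static void main(String[] args) {
    int[] fast = framesPerCycle(1.5f, 100);
    System.out.println(fast[0] + " in, " + fast[1] + " out"); // prints 300 in, 200 out
  }
}
```

With a period of 100 frames, 2x speed gives 200 in and 100 out, and 0.5x speed gives 100 in and 200 out, so the input-to-output ratio equals the speed in each branch.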
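As this shows, changing speed without changing pitch is not simply a matter of changing the sample rate. The pitch period must be found first; then, depending on the speed factor, the signal is split into frames, samples are dropped or inserted, the frames are overlap-added, and remainingInputToCopyFrameCount frames are copied through. Sonic uses AMDF to find the pitch period.
So how does SoundTouch do it? We will analyze that in the next article.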
III. References
Audio speed and pitch change: Sonic source code analysis
Speech recognition 08: methods for estimating the pitch period
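IV. Takeaways
In this article we:
- Learned how the human voice is produced and what the pitch period is
- Analyzed how ExoPlayer uses Sonic to change speed without changing pitch
- Analyzed how Sonic finds the pitch period with the average magnitude difference function
- Analyzed how the speed change itself is implemented
Thank you for reading.
In the next article we will continue with a source code analysis of another implementation of speed change without pitch change: SoundTouch. You are welcome to follow the WeChat official account “音视频开发之旅” so we can learn and grow together.
Feedback and discussion are welcome.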