Two libraries can be used to extract Fbank features:
one is python_speech_features, the other is torchaudio from PyTorch.
import python_speech_features as psf
import torchaudio as ta
The two corresponding functions are:
ta.compliance.kaldi.fbank
psf.base.logfbank
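ta.compliance.kaldi follows the feature extraction module in Kaldi (it is written to match Kaldi's output), so the comparison here is effectively Kaldi versus python_speech_features.
Kaldi and python_speech_features differ in how they generate fbank features in the following respects:
Reposted from: https://zhuanlan.zhihu.com/p/55371926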
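1. Pre-emphasis differs:
* Kaldi splits the signal into frames first and then applies pre-emphasis within each frame; python_speech_features applies pre-emphasis to the whole signal first and then splits it into frames.
* Looking at the pre-emphasis itself, take audio data such as 10, 17, 13, 15, ... with a pre-emphasis coefficient of 0.97. python_speech_features leaves the first sample (10) unchanged, so the result is 10, 7.3, -3.49, 2.39, ...; Kaldi also pre-emphasizes the first sample (against itself), so the result is 0.3, 7.3, -3.49, 2.39, ...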
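The difference in how the first sample is handled can be reproduced with a small pure-Python sketch. The function names below are made up for illustration and are not part of either library; the Kaldi-style function is meant to be applied to one frame at a time, since Kaldi pre-emphasizes inside each frame.

```python
def preemphasis_whole_signal(signal, coeff=0.97):
    """python_speech_features style: y[0] = x[0], y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    rest = [signal[i] - coeff * signal[i - 1] for i in range(1, len(signal))]
    return [signal[0]] + rest


def preemphasis_per_frame(frame, coeff=0.97):
    """Kaldi style (one frame): the first sample is emphasized against itself."""
    if not frame:
        return []
    rest = [frame[i] - coeff * frame[i - 1] for i in range(1, len(frame))]
    return [frame[0] - coeff * frame[0]] + rest


samples = [10, 17, 13, 15]
print([round(v, 3) for v in preemphasis_whole_signal(samples)])
print([round(v, 3) for v in preemphasis_per_frame(samples)])
```

Only the first value of each frame differs between the two variants; every later sample gets the same `x[n] - coeff * x[n-1]` treatment.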
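2. The number of frames differs, and the power spectrum computation differs
python_speech_features: if the data left over at the end is shorter than one frame, it is padded and counted as one full frame;
Kaldi: if you choose not to cut off the leftover data at the end (snip-edges = False), you end up with one extra frame (personally I think Kaldi is wrong here and python_speech_features is right); with snip-edges = True, the two agree.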
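The frame counts can be compared with a short sketch. The formulas below are my reading of the framing rules in the two libraries and should be treated as assumptions to check against the installed versions; the function names are hypothetical.

```python
import math


def num_frames_psf(num_samples, frame_len, frame_step):
    """Assumed python_speech_features rule: pad the tail up to a full frame."""
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


def num_frames_kaldi(num_samples, frame_len, frame_shift, snip_edges=True):
    """Assumed Kaldi rule for the two snip-edges settings."""
    if snip_edges:
        if num_samples < frame_len:
            return 0
        # Incomplete trailing frames are dropped.
        return 1 + (num_samples - frame_len) // frame_shift
    # Frame count depends only on the signal length and the shift.
    return (num_samples + frame_shift // 2) // frame_shift


# 1 second at 16 kHz, 25 ms frames (400 samples), 10 ms shift (160 samples).
n, frame_len, shift = 16000, 400, 160
print(num_frames_psf(n, frame_len, shift))
print(num_frames_kaldi(n, frame_len, shift, snip_edges=True))
print(num_frames_kaldi(n, frame_len, shift, snip_edges=False))
```

Under these assumed formulas, the snip-edges = True count coincides with python_speech_features when the samples after the first frame divide evenly by the frame shift; otherwise python_speech_features pads one more frame while Kaldi drops the remainder.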
The power spectrum computation differs:
python_speech_features computes the power spectrum as: 1.0/NFFT * numpy.square(magspec(frames,NFFT))
Kaldi does not multiply by 1.0/NFFT
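To see what the missing 1.0/NFFT factor does, here is a minimal sketch using a naive DFT from the standard library. The function names are illustrative only, and the sketch assumes the frame is no longer than NFFT (it is zero-padded up to NFFT).

```python
import cmath
import math


def dft_magnitudes(frame, nfft):
    """Magnitudes of the first nfft//2 + 1 DFT bins of a zero-padded frame."""
    padded = list(frame) + [0.0] * (nfft - len(frame))
    mags = []
    for k in range(nfft // 2 + 1):
        acc = sum(padded[n] * cmath.exp(-2j * cmath.pi * k * n / nfft)
                  for n in range(nfft))
        mags.append(abs(acc))
    return mags


def power_spectrum_psf(frame, nfft):
    """python_speech_features style: |X[k]|^2 scaled by 1/NFFT."""
    return [m * m / nfft for m in dft_magnitudes(frame, nfft)]


def power_spectrum_kaldi(frame, nfft):
    """Kaldi style as described above: |X[k]|^2 with no 1/NFFT factor."""
    return [m * m for m in dft_magnitudes(frame, nfft)]


frame = [10.0, 17.0, 13.0, 15.0, 9.0, 4.0]
nfft = 8
psf_power = power_spectrum_psf(frame, nfft)
kaldi_power = power_spectrum_kaldi(frame, nfft)
# The log difference is the same constant in every bin.
print([round(math.log(k) - math.log(p), 6) for k, p in zip(kaldi_power, psf_power)])
print(round(math.log(nfft), 6))
```

Because the mel filterbank is a weighted sum of power-spectrum bins, this factor alone shifts every log-fbank coefficient by the constant log(NFFT).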
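3. The mel filterbank is computed differently
Kaldi computes the weights in the mel domain: the index runs over Hz, and each frequency is converted to mel before being compared with the filter edges (personally I think this costs more computation, since every evaluation goes through a conversion);
python_speech_features instead converts the linspace'd mel values back to Hz all at once and then computes the weights; the index is also in the Hz domain, and the computation is cheaper.
The two approaches are essentially the same, but the different computation leads to differences in the resulting coefficients.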
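The two strategies can be sketched side by side in pure Python. This is a simplified illustration, not the code of either library: the function names are hypothetical, both variants use the same mel formula and a low edge of 0 Hz, and the Kaldi-style variant keeps the Nyquist bin only so that both results have the same shape.

```python
import math


def hz_to_mel(hz):
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def mel_edges(num_filters, low_hz, high_hz):
    """num_filters + 2 points, evenly spaced on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (num_filters + 1)
    return [low + i * step for i in range(num_filters + 2)]


def filterbank_hz_bins(num_filters, nfft, sample_rate, low_hz=0.0):
    """python_speech_features style: convert the mel edges to FFT bin
    indices once, then build triangles that are linear in the bin index."""
    edges = [math.floor((nfft + 1) * mel_to_hz(m) / sample_rate)
             for m in mel_edges(num_filters, low_hz, sample_rate / 2)]
    bank = [[0.0] * (nfft // 2 + 1) for _ in range(num_filters)]
    for j in range(num_filters):
        left, center, right = edges[j:j + 3]
        for i in range(left, center):
            bank[j][i] = (i - left) / (center - left)
        for i in range(center, right):
            bank[j][i] = (right - i) / (right - center)
    return bank


def filterbank_mel_domain(num_filters, nfft, sample_rate, low_hz=0.0):
    """Kaldi style: convert every FFT bin frequency to mel and build
    triangles that are linear on the mel scale."""
    edges = mel_edges(num_filters, low_hz, sample_rate / 2)
    bin_width = sample_rate / nfft
    bank = [[0.0] * (nfft // 2 + 1) for _ in range(num_filters)]
    for j in range(num_filters):
        left, center, right = edges[j:j + 3]
        for i in range(nfft // 2 + 1):
            mel = hz_to_mel(bin_width * i)  # one conversion per bin per filter
            if left < mel < right:
                if mel <= center:
                    bank[j][i] = (mel - left) / (center - left)
                else:
                    bank[j][i] = (right - mel) / (right - center)
    return bank


hz_bank = filterbank_hz_bins(26, 512, 16000)
mel_bank = filterbank_mel_domain(26, 512, 16000)
max_diff = max(abs(a - b) for ra, rb in zip(hz_bank, mel_bank)
               for a, b in zip(ra, rb))
print(max_diff)
```

Both functions produce triangular filters over the same mel-spaced edges, but one interpolates linearly in the bin index and the other linearly in mel, so the individual weights do not match exactly.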