Speech Recognition for Long Audio Files with Azure Cognitive Services (CognitiveServices Speech)

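For my work at Rupeng.com I needed to run speech-to-text on large audio files (longer than 5 minutes). I tried the Baidu and iFlytek APIs, but neither handles large audio files well. I then looked at open-source speech recognition projects, but all of them require you to train on your own dataset, which is a hassle, and with too little training the recognition accuracy is poor. Finally I tried Microsoft's Azure Cognitive Services (CognitiveServices Speech) and found it very easy to use.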

1. First, sign up for an Azure account. Azure offers a free trial account, and signing up is simple enough that I won't go through it here.

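2. Create a new .NET project

Azure Cognitive Services supports the mainstream languages, including .NET, Java, C++, Python and JavaScript. I use .NET for the example here; for other languages, see the [Quickstart guide] link shown in the screenshot above.

Install the SDK via NuGet: Install-Package Microsoft.CognitiveServices.Speech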

3. Start with a utility class, Helper.cs

Its job is to turn a large WAV audio file into a pull audio stream (PullAudioInputStreamCallback).

This code is copied from Azure's official samples on GitHub.

using Microsoft.CognitiveServices.Speech.Audio;
using System.Diagnostics;
using System.IO;

namespace Demo
{
    public class Helper
    {
        public static AudioConfig OpenWavFile(string filename)
        {
            BinaryReader reader = new BinaryReader(File.OpenRead(filename));
            return OpenWavFile(reader);
        }

        public static AudioConfig OpenWavFile(BinaryReader reader)
        {
            AudioStreamFormat format = readWaveHeader(reader);
            return AudioConfig.FromStreamInput(new BinaryAudioStreamReader(reader), format);
        }

        public static BinaryAudioStreamReader CreateWavReader(string filename)
        {
            BinaryReader reader = new BinaryReader(File.OpenRead(filename));
            // read the wave header so that it does not end up in the subsequent reads
            AudioStreamFormat format = readWaveHeader(reader);
            return new BinaryAudioStreamReader(reader);
        }

        public static AudioStreamFormat readWaveHeader(BinaryReader reader)
        {
            // Tag "RIFF"
            char[] data = new char[4];
            reader.Read(data, 0, 4);
            Trace.Assert((data[0] == 'R') && (data[1] == 'I') && (data[2] == 'F') && (data[3] == 'F'), "Wrong wav header");

            // Chunk size
            long fileSize = reader.ReadInt32();

            // Subchunk, Wave Header
            // Subchunk, Format
            // Tag: "WAVE"
            reader.Read(data, 0, 4);
            Trace.Assert((data[0] == 'W') && (data[1] == 'A') && (data[2] == 'V') && (data[3] == 'E'), "Wrong wav tag in wav header");

            // Tag: "fmt"
            reader.Read(data, 0, 4);
            Trace.Assert((data[0] == 'f') && (data[1] == 'm') && (data[2] == 't') && (data[3] == ' '), "Wrong format tag in wav header");

            // chunk format size
            var formatSize = reader.ReadInt32();
            var formatTag = reader.ReadUInt16();
            var channels = reader.ReadUInt16();
            var samplesPerSecond = reader.ReadUInt32();
            var avgBytesPerSec = reader.ReadUInt32();
            var blockAlign = reader.ReadUInt16();
            var bitsPerSample = reader.ReadUInt16();

            // Until now we have read 16 bytes in format, the rest is cbSize and is ignored for now.
            if (formatSize > 16)
                reader.ReadBytes((int)(formatSize - 16));

            // Second Chunk, data
            // tag: data.
            reader.Read(data, 0, 4);
            Trace.Assert((data[0] == 'd') && (data[1] == 'a') && (data[2] == 't') && (data[3] == 'a'), "Wrong data tag in wav");

            // data chunk size
            int dataSize = reader.ReadInt32();

            // now, we have the format in the format parameter and the
            // reader set to the start of the body, i.e., the raw sample data
            return AudioStreamFormat.GetWaveFormatPCM(samplesPerSecond, (byte)bitsPerSample, (byte)channels);
        }
    }

    /// <summary>
    /// Adapter class to the native stream api.
    /// </summary>
    public sealed class BinaryAudioStreamReader : PullAudioInputStreamCallback
    {
        private System.IO.BinaryReader _reader;

        /// <summary>
        /// Creates and initializes an instance of BinaryAudioStreamReader.
        /// </summary>
        /// <param name="reader">The underlying stream to read the audio data from. Note: The stream contains the bare sample data, not the container (like wave header data, etc).</param>
        public BinaryAudioStreamReader(System.IO.BinaryReader reader)
        {
            _reader = reader;
        }

        /// <summary>
        /// Creates and initializes an instance of BinaryAudioStreamReader.
        /// </summary>
        /// <param name="stream">The underlying stream to read the audio data from. Note: The stream contains the bare sample data, not the container (like wave header data, etc).</param>
        public BinaryAudioStreamReader(System.IO.Stream stream)
            : this(new System.IO.BinaryReader(stream))
        {
        }

        /// <summary>
        /// Reads binary data from the stream.
        /// </summary>
        /// <param name="dataBuffer">The buffer to fill</param>
        /// <param name="size">The size of data in the buffer.</param>
        /// <returns>The number of bytes filled, or 0 in case the stream hits its end and there is no more data available.
        /// If there is no data immediate available, Read() blocks until the next data becomes available.</returns>
        public override int Read(byte[] dataBuffer, uint size)
        {
            return _reader.Read(dataBuffer, 0, (int)size);
        }

        /// <summary>
        /// This method performs cleanup of resources.
        /// The Boolean parameter <paramref name="disposing"/> indicates whether the method is called from <see cref="IDisposable.Dispose"/> (if <paramref name="disposing"/> is true) or from the finalizer (if <paramref name="disposing"/> is false).
        /// Derived classes should override this method to dispose resource if needed.
        /// </summary>
        /// <param name="disposing">Flag to request disposal.</param>
        protected override void Dispose(bool disposing)
        {
            if (disposed)
            {
                return;
            }

            if (disposing)
            {
                _reader.Dispose();
            }

            disposed = true;
            base.Dispose(disposing);
        }

        private bool disposed = false;
    }
}
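One thing to watch for in the helper above: readWaveHeader asserts that the "data" tag comes immediately after the "fmt " chunk. Some WAV files carry extra chunks (a "LIST" metadata chunk, for example) in between, and for those files the assertion fails. The following is only a minimal sketch of one way to skip such chunks. It is not part of the official sample; the class name WavChunks and the method name SeekToDataChunk are made up for illustration, and it assumes the underlying stream is seekable (a FileStream is).

```csharp
using System;
using System.IO;
using System.Text;

public static class WavChunks
{
    // Hypothetical helper, not from the Azure sample.
    // Assumes the reader is positioned right after the body of the "fmt " chunk.
    // Skips every chunk that is not "data" and returns the size of the data chunk,
    // leaving the reader at the first byte of raw sample data.
    public static int SeekToDataChunk(BinaryReader reader)
    {
        while (true)
        {
            byte[] idBytes = reader.ReadBytes(4);
            if (idBytes.Length < 4)
                throw new InvalidDataException("No data chunk found in wav file");

            string id = Encoding.ASCII.GetString(idBytes);
            int size = reader.ReadInt32();
            if (id == "data")
                return size;

            // RIFF chunk bodies are padded to an even number of bytes.
            reader.BaseStream.Seek(size + (size & 1), SeekOrigin.Current);
        }
    }
}
```

If you wanted to use it, the call would stand in for the last "data" tag check and the dataSize read inside readWaveHeader; everything else in the helper stays as it is.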

4. Write the main recognition code:

using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using System;
using System.Threading.Tasks;

namespace Demo
{
    class SpeechToTestMain
    {
        static void Main(string[] args)
        {
            T1().Wait();
            Console.WriteLine("ok");
            Console.ReadKey();
        }

        static async Task T1()
        {
            var file = @"E:\1.wav";
            var config = SpeechConfig.FromSubscription("paste the key obtained in step 1 here", "westus");
            // Set config.SpeechRecognitionLanguage to choose the recognition language; the default is English.
            // Setting config.OutputFormat = OutputFormat.Detailed; lets recognizer.Recognized call var best = e.Result.Best();
            // to get several recognized forms of one sentence, e.g. whether a number is written as 3 or three.
            var stopRecognition = new TaskCompletionSource<int>();

            // Do not use AudioConfig.FromWavFileInput, because it cannot handle large wav files.
            using (var pushStream = AudioInputStream.CreatePushStream())
            using (var audioInput = AudioConfig.FromStreamInput(pushStream))
            using (var recognizer = new SpeechRecognizer(config, audioInput))
            {
                recognizer.Recognized += (s, e) =>
                {
                    if (e.Result.Reason == ResultReason.RecognizedSpeech)
                    {
                        Console.WriteLine($"RECOGNIZED: Text={e.Result.Text} Duration={e.Result.Duration} OffsetInTicks={e.Result.OffsetInTicks}");
                    }
                    else if (e.Result.Reason == ResultReason.NoMatch)
                    {
                        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
                    }
                };

                recognizer.Canceled += (s, e) =>
                {
                    Console.WriteLine($"CANCELED: Reason={e.Reason}");
                    if (e.Reason == CancellationReason.Error)
                    {
                        Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                        Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                    }
                    stopRecognition.TrySetResult(0);
                };

                recognizer.SessionStarted += (s, e) =>
                {
                    Console.WriteLine("\nSession started event.");
                };

                recognizer.SessionStopped += (s, e) =>
                {
                    Console.WriteLine("\nSession stopped event.");
                    stopRecognition.TrySetResult(0);
                };

                // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
                await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

                // open and read the wave file and push the buffers into the recognizer
                using (BinaryAudioStreamReader reader = Helper.CreateWavReader(file))
                {
                    byte[] buffer = new byte[1000];
                    while (true)
                    {
                        var readSamples = reader.Read(buffer, (uint)buffer.Length);
                        if (readSamples == 0)
                        {
                            break;
                        }
                        pushStream.Write(buffer, readSamples);
                    }
                }
                pushStream.Close();

                // Waits for completion.
                // Use Task.WaitAny to keep the task rooted.
                Task.WaitAny(new[] { stopRecognition.Task });

                // Stops recognition.
                await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
            }
        }
    }
}

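As you can see, the recognizer.Recognized event hands us the recognized speech one segment at a time; a longer recording fires recognizer.Recognized many times. e.Result.Text is the recognized text, e.Result.Duration is the length of the recognized segment, and e.Result.OffsetInTicks is the position of that segment within the whole audio, which you can convert to a TimeSpan with TimeSpan.FromTicks(e.Result.OffsetInTicks).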
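To make the offset and duration values concrete, here is a small sketch that turns one recognized segment into an SRT-style subtitle entry. It uses only the base class library; the class name SegmentFormatter and its methods are hypothetical names for illustration, and the SRT output format is my own choice of example, not something the Speech SDK produces.

```csharp
using System;

public static class SegmentFormatter
{
    // Hypothetical helper: formats one recognized segment as an SRT-style entry.
    // offsetInTicks and duration stand for e.Result.OffsetInTicks and e.Result.Duration.
    public static string ToSubtitleEntry(int index, long offsetInTicks, TimeSpan duration, string text)
    {
        TimeSpan start = TimeSpan.FromTicks(offsetInTicks); // 1 tick = 100 nanoseconds
        TimeSpan end = start + duration;
        return $"{index}\n{Stamp(start)} --> {Stamp(end)}\n{text}\n";
    }

    // Renders a TimeSpan as hours:minutes:seconds,milliseconds.
    private static string Stamp(TimeSpan t)
    {
        return $"{(int)t.TotalHours:00}:{t.Minutes:00}:{t.Seconds:00},{t.Milliseconds:000}";
    }
}
```

Inside the Recognized handler you could call SegmentFormatter.ToSubtitleEntry with a running counter, e.Result.OffsetInTicks, e.Result.Duration and e.Result.Text, and append the result to a file.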

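5. Note: nearly every speech recognition engine, Azure Cognitive Services included, only supports WAV files and not formats such as MP3. Also note that the WAV sample rate must be 16000 Hz, otherwise the OffsetInTicks times will be inaccurate.

You can use the NAudio library to convert MP3 to WAV. NAudio is fully managed code, unlike ffmpeg, which has to run as a separate process and is a nuisance to debug and to work with in general.

Below is the code for converting MP3 to WAV with NAudio. Once more: the sample rate must be 16000.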

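// requires: using NAudio.Wave;
// mp3file is the input path, wavFile is the output path
using (Mp3FileReader reader = new Mp3FileReader(mp3file))
// WaveFormat(16000, 1) means 16000 Hz, 1 channel (mono)
using (WaveStream pcmStream = new WaveFormatConversionStream(new WaveFormat(16000, 1), reader))
{
    WaveFileWriter.CreateWaveFile(wavFile, pcmStream);
}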
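Because a wrong sample rate only shows up later as skewed offsets, it can help to check a WAV file before sending it. The following is a rough sketch using only the base class library. The class name WavCheck and the method name Is16kMonoPcm are made up for illustration, and the code assumes the canonical header layout in which "fmt " is the first chunk after "WAVE". The mono and 16-bit checks simply mirror what WaveFormat(16000, 1) produces in the conversion code; the article itself only insists on the 16000 Hz rate.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavCheck
{
    // Hypothetical helper: returns true when the header describes 16 kHz, mono, 16-bit PCM.
    // Assumes "fmt " is the first chunk after the "WAVE" tag.
    public static bool Is16kMonoPcm(Stream wav)
    {
        using (var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true))
        {
            reader.ReadBytes(20);                    // "RIFF", riff size, "WAVE", "fmt ", fmt size
            ushort formatTag = reader.ReadUInt16();  // 1 means uncompressed PCM
            ushort channels = reader.ReadUInt16();
            uint sampleRate = reader.ReadUInt32();
            reader.ReadBytes(6);                     // avgBytesPerSec (4 bytes) + blockAlign (2 bytes)
            ushort bitsPerSample = reader.ReadUInt16();
            return formatTag == 1 && channels == 1 && sampleRate == 16000 && bitsPerSample == 16;
        }
    }
}
```

A typical use would be to open the converted file with File.OpenRead, pass the stream to WavCheck.Is16kMonoPcm, and refuse to start recognition when it returns false.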

Last edited: 2019.03.06 16:43:51

