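HoloLens supports three forms of voice input:
- Voice Command
- Dictation
- Grammar Recognizer
An earlier post, "HoloLens Development Notes - Speech Recognition (Voice Command)", covered how to use Voice Command. This post covers dictation.
Dictation Recognition
Dictation converts speech to text (Speech to Text). On HoloLens it is mostly used wherever text has to be typed, for example when searching in Edge. HoloLens normally has no conventional physical keyboard, so typing on the virtual keyboard means turning your head to place the gaze cursor on each key and then confirming the letter with an air-tap gesture. This is slow and very inconvenient.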
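Entering text by speech can therefore improve efficiency considerably.
The dictation feature converts the user's speech into text input. It also reports hypotheses (interim guesses at what is being said) and exposes events you can register for. Start() and Stop() enable and disable dictation. When you are done with the recognizer, call Dispose() to release it. The garbage collector will eventually reclaim its resources anyway, but skipping Dispose() adds extra performance overhead.
Things to keep in mind when using dictation:
- The Microphone capability must be enabled for your app. In Unity, go to Edit -> Project Settings -> Player -> Windows Store -> Publishing Settings -> Capabilities and make sure Microphone is checked.
- The HoloLens must be connected to Wi-Fi, otherwise dictation recognition will not work.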
MicrophoneManager.cs
using HoloToolkit;
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using UnityEngine.Windows.Speech;

public class MicrophoneManager : MonoBehaviour
{
    [Tooltip("A text area for the recognizer to display the recognized strings.")]
    public Text DictationDisplay;

    private DictationRecognizer dictationRecognizer;

    // Caches the text currently displayed in the text box.
    // Initialized here so the event handlers never dereference null.
    private StringBuilder textSoFar = new StringBuilder();

    void Awake()
    {
        // Create a new DictationRecognizer and keep a reference to it.
        dictationRecognizer = new DictationRecognizer();

        // Fired while the user is talking. As the recognizer listens,
        // it provides the text of what it has heard so far.
        dictationRecognizer.DictationHypothesis += DictationRecognizer_DictationHypothesis;

        // Fired after the user pauses, typically at the end of a sentence.
        // The full recognized string is returned here.
        dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;

        // Fired when the recognizer stops, whether from Stop() being called,
        // a timeout occurring, or some other error.
        dictationRecognizer.DictationComplete += DictationRecognizer_DictationComplete;

        // Fired when an error occurs, typically because there is no network
        // connection or the connection drops during recognition.
        dictationRecognizer.DictationError += DictationRecognizer_DictationError;

        // PhraseRecognitionSystem controls the KeywordRecognizers.
        // It must be shut down before dictation can start.
        PhraseRecognitionSystem.Shutdown();

        // Start dictation.
        dictationRecognizer.Start();
    }

    /// <summary>
    /// This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
    /// </summary>
    /// <param name="text">The currently hypothesized recognition.</param>
    private void DictationRecognizer_DictationHypothesis(string text)
    {
        // Show textSoFar plus the new hypothesized text.
        // Do not append to textSoFar yet, because the hypothesis may change on the next event.
        DictationDisplay.text = textSoFar.ToString() + " " + text + "...";
    }

    /// <summary>
    /// This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
    /// </summary>
    /// <param name="text">The text that was heard by the recognizer.</param>
    /// <param name="confidence">A representation of how confident (rejected, low, medium, high) the recognizer is of this recognition.</param>
    private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
    {
        // Append the latest result, followed by a separator so consecutive
        // results do not run together (the original appended an empty string).
        textSoFar.Append(text + ". ");

        // Show the accumulated text.
        DictationDisplay.text = textSoFar.ToString();
    }

    /// <summary>
    /// This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
    /// Typically, this will simply return "Complete". In this case, we check to see if the recognizer timed out.
    /// </summary>
    /// <param name="cause">An enumerated reason for the session completing.</param>
    private void DictationRecognizer_DictationComplete(DictationCompletionCause cause)
    {
        // A timeout means the user has been silent for too long.
        // By default the recognizer times out after 5 seconds of initial silence,
        // or after 20 seconds of silence following a recognized result.
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            // Stop any recording on the default microphone (null selects the default device).
            // The original passed an undeclared deviceName variable here.
            Microphone.End(null);

            DictationDisplay.text = "Dictation has timed out. Please press the record button again.";

            // Notifies a ResetAfterTimeout method on this GameObject, if one exists.
            // This script does not define that method, so a receiver is not required.
            SendMessage("ResetAfterTimeout", SendMessageOptions.DontRequireReceiver);
        }
    }

    /// <summary>
    /// This event is fired when an error occurs.
    /// </summary>
    /// <param name="error">The string representation of the error reason.</param>
    /// <param name="hresult">The int representation of the hresult.</param>
    private void DictationRecognizer_DictationError(string error, int hresult)
    {
        // Show the error string and its HRESULT.
        DictationDisplay.text = error + "\nHRESULT: " + hresult;
    }

    void OnDestroy()
    {
        // Stop only if dictation is still running, then unsubscribe and release the recognizer.
        if (dictationRecognizer.Status == SpeechSystemStatus.Running)
        {
            dictationRecognizer.Stop();
        }
        dictationRecognizer.DictationHypothesis -= DictationRecognizer_DictationHypothesis;
        dictationRecognizer.DictationResult -= DictationRecognizer_DictationResult;
        dictationRecognizer.DictationComplete -= DictationRecognizer_DictationComplete;
        dictationRecognizer.DictationError -= DictationRecognizer_DictationError;
        dictationRecognizer.Dispose();
    }
}
HoloLens can run only one speech recognizer at a time, so to use dictation you must first shut down the KeywordRecognizer.
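Because only one recognizer can be active, an app that uses both voice commands and dictation has to hand the microphone back and forth. The following is a minimal sketch of one way to do that, not the method used in the listing above. The class name SpeechModeSwitcher and the methods BeginDictation and EndDictation are hypothetical names introduced for illustration. Restarting keyword recognition from the DictationComplete handler is also an assumption, so verify the timing on a device.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical helper (not part of the original post): switches between
// keyword recognition and dictation so only one is active at a time.
public class SpeechModeSwitcher : MonoBehaviour
{
    private DictationRecognizer dictationRecognizer;

    // Call when the user wants to start dictating.
    public void BeginDictation()
    {
        // Keyword recognition must be off before dictation can start.
        if (PhraseRecognitionSystem.Status == SpeechSystemStatus.Running)
        {
            PhraseRecognitionSystem.Shutdown();
        }

        if (dictationRecognizer == null)
        {
            dictationRecognizer = new DictationRecognizer();
            dictationRecognizer.DictationResult += OnDictationResult;
            dictationRecognizer.DictationComplete += OnDictationComplete;
        }

        dictationRecognizer.Start();
    }

    // Call when the user is done dictating.
    public void EndDictation()
    {
        if (dictationRecognizer != null &&
            dictationRecognizer.Status == SpeechSystemStatus.Running)
        {
            dictationRecognizer.Stop();
        }
    }

    private void OnDictationResult(string text, ConfidenceLevel confidence)
    {
        Debug.Log("Dictated: " + text + " (" + confidence + ")");
    }

    private void OnDictationComplete(DictationCompletionCause cause)
    {
        // Dictation has stopped (Stop(), timeout, or error), so keyword
        // recognition can be brought back. Assumption: restarting here is safe.
        PhraseRecognitionSystem.Restart();
    }

    void OnDestroy()
    {
        if (dictationRecognizer != null)
        {
            dictationRecognizer.DictationResult -= OnDictationResult;
            dictationRecognizer.DictationComplete -= OnDictationComplete;
            dictationRecognizer.Dispose();
        }
    }
}
```

A UI button or a voice command handler could call BeginDictation() to enter dictation mode and EndDictation() to leave it. Any KeywordRecognizer instances created earlier should resume once PhraseRecognitionSystem.Restart() succeeds.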
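DictationRecognizer has two timeouts:
- If the recognizer starts and hears nothing within the first 5 seconds, it times out.
- If the recognizer has returned a result but then hears nothing for 20 seconds, it times out.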
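Both timeouts can be changed through properties on DictationRecognizer before calling Start(). The sketch below is illustrative only. The class name DictationTimeoutExample and the values 10 and 30 seconds are assumptions chosen for the example, and it assumes no KeywordRecognizer is running when it starts.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// Hypothetical example (not part of the original post): lengthens both
// dictation timeouts and logs when a timeout ends the session.
public class DictationTimeoutExample : MonoBehaviour
{
    private DictationRecognizer recognizer;

    void Start()
    {
        recognizer = new DictationRecognizer();

        // Time allowed before the first sound is heard (default 5 seconds).
        recognizer.InitialSilenceTimeoutSeconds = 10f;

        // Time allowed in silence after a recognized result (default 20 seconds).
        recognizer.AutoSilenceTimeoutSeconds = 30f;

        recognizer.DictationComplete += OnDictationComplete;
        recognizer.Start();
    }

    private void OnDictationComplete(DictationCompletionCause cause)
    {
        if (cause == DictationCompletionCause.TimeoutExceeded)
        {
            Debug.Log("Dictation timed out after a period of silence.");
        }
    }

    void OnDestroy()
    {
        recognizer.DictationComplete -= OnDictationComplete;
        recognizer.Dispose();
    }
}
```

Longer timeouts keep the session open for users who pause while thinking, at the cost of holding the microphone and the recognizer for longer.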