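Tencent Cloud Automatic Speech Recognition (ASR) is a PaaS product that converts speech to text, offering enterprises accurate and cost-effective recognition. It is used by many services, including WeChat, Honor of Kings, and Tencent Video, and suits scenarios such as recording quality inspection, real-time meeting transcription, and voice input methods.
Official integration documentation: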
https://cloud.tencent.com/document/product/1093/35722
Download the real-time speech recognition Android SDK and demo from: Integration SDK download.
Integration steps
1. Download the AAR package and add it as a dependency
implementation(name: 'asr-realtime-release', ext: 'aar')
2. Add the following permissions to AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
<uses-permission android:name="android.permission.INTERNET"/>
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
3. Initialize ASR; this only needs to be done once
var aaiClient: AAIClient? = null
fun initAsr() {
    // Global OkHttp configuration
    ClientConfiguration.setAudioRecognizeConnectTimeout(3000)
    ClientConfiguration.setAudioRecognizeWriteTimeout(5000)
    val appid: Int = ASRConstent.APPID
    val projectId = 0 // This parameter is fixed at 0
    val secretId = ASRConstent.SECRETID
    val secretKey = ASRConstent.SECRETKEY
    try {
        // Direct authentication.
        // Signing credential class: the SDK ships a local one (LocalCredentialProvider). You can also implement the CredentialProvider interface yourself and compute the signature on your own server.
        aaiClient = AAIClient(getApplication(),
            appid,
            projectId,
            secretId,
            LocalCredentialProvider(secretKey)
        )
    } catch (e: ClientException) {
        e.printStackTrace()
    }
}
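The comment in the initialization code mentions computing the signature on your own server instead of shipping the secret key inside the app. As a rough sketch of what such a server might compute, the Java snippet below signs a string with HMAC-SHA1 and Base64-encodes the digest, which, as far as I know, is the scheme Tencent Cloud documents for its ASR request signature. The class `AsrSigner` and its `sign` method are hypothetical names that are not part of the SDK, and the exact string to sign must be taken from the official documentation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical server-side helper; not part of the Tencent SDK.
class AsrSigner {
    // Returns Base64(HMAC-SHA1(secretKey, source)).
    static String sign(String secretKey, String source) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(
                    secretKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(source.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            // HmacSHA1 is a standard JDK algorithm, so this is not expected.
            throw new IllegalStateException("HMAC-SHA1 unavailable", e);
        }
    }
}
```

With this approach the app would send the string to be signed to your server and receive only the signature back, so the secret key never leaves the server.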
4. Start speech recognition
fun startAsr() {
    viewModelScope.launch {
        // The try/catch sits inside the coroutine; wrapped around launch it would not catch exceptions thrown in this block
        try {
            // Start speech recognition
            aaiClient?.startAudioRecognize(audioRecognizeRequest,
                audioRecognizeResultlistener,
                recognizeStateListener,
                audioRecognizeConfiguration)
        } catch (e: Exception) {
            e.printStackTrace()
        }
    }
}
Create the request, listeners, and configuration passed to startAudioRecognize
// Initialize the speech recognition request.
// Note: `builder` is assumed to be an AudioRecognizeRequest builder instance created beforehand.
val audioRecognizeRequest: AudioRecognizeRequest =
    builder
        .pcmAudioDataSource(getRecordDataSource())
        .setEngineModelType("16k_zh") // Engine type: "16k_zh" is the general engine (Mandarin Chinese plus English); "16k_en" is the English engine
        .setFilterDirty(0) // 0: default, do not filter profanity; 1: filter profanity
        .setFilterModal(2) // 0: default, do not filter filler words; 1: filter some filler words; 2: strict filtering
        .setFilterPunc(1) // 0: default, keep the sentence-final period; 1: filter the sentence-final period
        .setConvert_num_mode(1) // 1: default, convert to Arabic numerals based on context; 0: convert everything to Chinese numerals
        .setNeedvad(1) // 0: disable VAD; 1: default, enable VAD
        .setWordInfo(1) // Word-level timing information: 0: off; 1: on
        // .setNoiseThreshold(0) // Strict noise filtering: 0: not strict; 1: strict
        .build()
// Initialize the recognition result listener.
val audioRecognizeResultlistener: AudioRecognizeResultListener =
    object : AudioRecognizeResultListener {
        override fun onSliceSuccess(request: AudioRecognizeRequest?,
                                    result: AudioRecognizeResult?,
                                    seq: Int) {
            // Result for one slice. This is an intermediate result that keeps being corrected.
            // Read the real-time recognition result here.
            result?.let {
            }
        }
        override fun onSegmentSuccess(request: AudioRecognizeRequest?,
                                      result: AudioRecognizeResult?,
                                      seq: Int) {
            // Result for one voice segment. This is a stable result that business logic can rely on.
            // Read the final result for a sentence here.
        }
        override fun onSuccess(request: AudioRecognizeRequest?, result: String?) {}
        override fun onFailure(request: AudioRecognizeRequest?,
                               clientException: ClientException?,
                               serverException: ServerException?,
                               response: String?) {
            // Recognition failed
        }
    }
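The two success callbacks differ in stability: slice results are provisional and keep being revised, while segment results are final for a sentence. One way to combine them into a single display string is to keep committed sentences apart from the in-progress text. The sketch below is plain Java with no SDK dependency; `TranscriptBuffer` and its methods are hypothetical names, and it assumes you extract the recognized text from `AudioRecognizeResult` yourself before calling them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for merging intermediate and stable results.
class TranscriptBuffer {
    private final List<String> committed = new ArrayList<>();
    private String pending = "";

    // Call from onSliceSuccess: the new text replaces the previous provisional text.
    void onSlice(String text) {
        pending = (text == null) ? "" : text;
    }

    // Call from onSegmentSuccess: the sentence is final, so commit it and clear the provisional text.
    void onSegment(String text) {
        if (text != null && !text.isEmpty()) {
            committed.add(text);
        }
        pending = "";
    }

    // Everything recognized so far: committed sentences followed by the in-progress text.
    String display() {
        return String.join("", committed) + pending;
    }
}
```

Replacing rather than appending the slice text matters here, because each slice callback carries a corrected version of the same in-progress sentence.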
// Custom recognition configuration
val audioRecognizeConfiguration: AudioRecognizeConfiguration =
    AudioRecognizeConfiguration.Builder()
        // Slice duration defaults to 40 ms and can be set from 40 to 5000; it must be a multiple of 20. If it is not, the SDK adjusts it to a multiple of 20 (for example, 77 becomes 60). Leave it unchanged unless you understand this parameter.
        //.sliceTime(40)
        // Whether to enable silence detection
        .setSilentDetectTimeOut(false)
        // Whether to stop recognition once the silence timeout fires. Defaults to true (stop); only takes effect when setSilentDetectTimeOut is true
        .setSilentDetectTimeOutAutoStop(true)
        .audioFlowSilenceTimeOut(2000)
        .minVolumeCallbackTime(100)
        .isCompress(true).build()
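The slice-duration rule in the comment above (range 40 to 5000, multiples of 20, with 77 adjusted to 60) can be written out as a small calculation. This is only a sketch of the rule as described, not the SDK's actual code: rounding down is inferred from the single 77 to 60 example, clamping out-of-range values to the nearest bound is my own assumption, and `SliceTime.normalize` is a hypothetical name.

```java
// Hypothetical illustration of the documented slice-duration rule.
class SliceTime {
    static final int MIN_MS = 40;
    static final int MAX_MS = 5000;
    static final int STEP_MS = 20;

    static int normalize(int requestedMs) {
        // Assumption: values outside 40..5000 are clamped to the nearest bound.
        int clamped = Math.max(MIN_MS, Math.min(MAX_MS, requestedMs));
        // Round down to a multiple of 20, matching the 77 -> 60 example.
        return clamped - (clamped % STEP_MS);
    }

    public static void main(String[] args) {
        System.out.println(normalize(77)); // prints 60
    }
}
```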
5. Stop speech recognition
fun stopAsr() {
    viewModelScope.launch {
        // Stop speech recognition and wait for the final result; the safe call does nothing if aaiClient is null
        aaiClient?.stopAudioRecognize()
    }
}
6. More settings
6.1 Setting a state listener
AudioRecognizeStateListener can be used to monitor the state of speech recognition.
6.2 Echo cancellation
/**
 * Note: on some Android models this approach fixes echo cancellation not taking effect
 * https://blog.csdn.net/wyw0000/article/details/125195997
 */
// 1. Setting the audio mode to AudioManager.MODE_IN_COMMUNICATION helps with echo cancellation
AudioManager audioManager = (AudioManager)context.getSystemService(Context.AUDIO_SERVICE);
audioManager.setMode(AudioManager.MODE_IN_COMMUNICATION);
// 2. Using MediaRecorder.AudioSource.VOICE_COMMUNICATION as the audio source helps with echo cancellation
int audioSource = MediaRecorder.AudioSource.VOICE_COMMUNICATION;
AudioRecord audioRecord = new AudioRecord(audioSource, sampleRate, channel, audioFormat, bufferSize);
/**
 * Note: the two capabilities below (AcousticEchoCanceler and NoiseSuppressor) depend on the phone's hardware. On some models (for example the Xiaomi 11), echo cancellation has no effect even when isAvailable() == true
 */
// AcousticEchoCanceler: echo cancellation
if (AcousticEchoCanceler.isAvailable()) {
    Log.d(TAG, "AcousticEchoCanceler isAvailable.");
    AcousticEchoCanceler acousticEchoCanceler = AcousticEchoCanceler
            .create(audioRecord.getAudioSessionId());
    // create() can return null, so guard before enabling
    if (acousticEchoCanceler != null) {
        int resultCode = acousticEchoCanceler.setEnabled(true);
        if (AudioEffect.SUCCESS == resultCode) {
            Log.d(TAG, "AcousticEchoCanceler AudioEffect SUCCESS");
        }
    }
}
// NoiseSuppressor: noise suppression
if (NoiseSuppressor.isAvailable()) {
    Log.d(TAG, "NoiseSuppressor isAvailable.");
    NoiseSuppressor noiseSuppressor = NoiseSuppressor
            .create(audioRecord.getAudioSessionId());
    // create() can return null, so guard before enabling
    if (noiseSuppressor != null) {
        int resultCode = noiseSuppressor.setEnabled(true);
        if (AudioEffect.SUCCESS == resultCode) {
            Log.d(TAG, "NoiseSuppressor AudioEffect SUCCESS");
        }
    }
}