Introduction
Keyword Spotting (KWS) aims at detecting predefined keywords in an audio stream, and is a key enabling technique for hands-free voice interfaces [1].
A commonly used technique for keyword spotting is the Keyword/Filler Hidden Markov Model (HMM). Other recent work explores discriminative models for keyword spotting based on large-margin formulations or recurrent neural networks. These systems improve over the HMM approach, but they require processing the entire utterance to find the optimal keyword region, or draw on a long time span to predict the entire keyword, which increases detection latency.
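To reduce this latency, the Deep KWS approach [1] runs a frame-level classifier in streaming fashion: a network emits per-frame posteriors over the keyword's sub-units, the posteriors are smoothed over a short window, and a confidence score is computed over a sliding window, so a detection decision can be made as frames arrive rather than after the whole utterance. A minimal sketch of that scoring idea, assuming illustrative window sizes and toy posteriors (not the paper's exact configuration):

```python
# Sketch of streaming posterior smoothing + confidence scoring in the
# spirit of Deep KWS [1]. Each frame's posteriors are a list
# [p_filler, p_unit1, p_unit2, ...]; window sizes are illustrative.

def smooth(posteriors, w_smooth=3):
    """Average each label's posterior over the last w_smooth frames."""
    smoothed = []
    n_labels = len(posteriors[0])
    for j in range(len(posteriors)):
        window = posteriors[max(0, j - w_smooth + 1):j + 1]
        smoothed.append([sum(f[i] for f in window) / len(window)
                         for i in range(n_labels)])
    return smoothed

def confidence(smoothed, w_max=5):
    """Geometric mean of each keyword unit's maximum smoothed posterior
    over the last w_max frames (label 0 is filler and is ignored)."""
    window = smoothed[-w_max:]
    n_labels = len(window[0])
    prod = 1.0
    for i in range(1, n_labels):
        prod *= max(frame[i] for frame in window)
    return prod ** (1.0 / (n_labels - 1))
```

Because both the smoothing and the confidence score only look at a bounded trailing window, the detector's latency is fixed by the window lengths, not by the utterance length.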
Deep Learning Methods
- Deep KWS [1]
- Honk [2]
- CNN [3]
- ResNet with Dilated Convolutions [4]
- RNN with Attention [5]
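The appeal of dilated convolutions in the ResNet variant [4] is that stacking layers with growing dilation rates expands the receptive field exponentially with depth, so the model sees long audio context with few parameters. A quick sketch of the arithmetic, assuming kernel size 3 and a per-layer doubling schedule (illustrative, not the paper's exact configuration):

```python
# Receptive field of a stack of 1-D convolutions: each layer with
# kernel size k and dilation d adds (k - 1) * d frames of context.

def receptive_field(kernel_size, dilations):
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

dilated   = receptive_field(3, [2 ** i for i in range(6)])  # dilations 1,2,4,8,16,32 -> 127
undilated = receptive_field(3, [1] * 6)                     # -> 13
```

With the same six layers and the same parameter count per layer, the dilated stack covers 127 frames of context versus 13 for the undilated one.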
Paper [6] compares and analyzes several CNN-based KWS architectures and reports the simulated optimal parameters, complexity, and related metrics. The "state of the art" KWS result it cites comes from the RNN with Attention [5] above.
Applying the depthwise separable convolutions of Xception, [7] proposes a simplified, computation-efficient variant suited to mobile devices.
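The saving from this factorization is easy to quantify: a depthwise convolution filters each input channel independently, and a 1×1 pointwise convolution then mixes channels, replacing one dense k×k convolution. A sketch of the parameter counts, with illustrative channel and kernel sizes (not the sizes used in [7]):

```python
# Parameters of a standard conv vs. a depthwise separable conv
# (the Xception-style factorization applied for mobile KWS in [7]).

def standard_conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixing channels to c_out
    return depthwise + pointwise

c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)   # 73728
sep = separable_conv_params(c_in, c_out, k)  # 576 + 8192 = 8768
```

Here the separable version needs roughly 8× fewer parameters (and proportionally fewer multiply-adds), which is what makes it attractive on mobile hardware.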
References
- [1] Small-footprint keyword spotting using deep neural networks
- [2] Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting
- [3] Speech Command Recognition with Convolutional Neural Network
- [4] Deep residual learning for small-footprint keyword spotting
- [6] Comparison and Analysis of SampleCNN Architectures for Audio Classification
- [7] Temporal convolution for real-time keyword spotting on mobile devices