Using SwiftOCR for Phone-Number and Digit Recognition in an App
數(shù)字識(shí)別的基本過程

After the first two steps, image acquisition and preprocessing:
Step 3: segment the image into individual characters using connected-component labeling. This is the key step before recognition.
Step 4: scale each character image down and convert it into a 16x20 binary array, the input format the neural network expects.
Step 5: run the neural network on that array.
Step 6: output the result, e.g. "the digit 1 with 98.99% probability".
SwiftOCR簡(jiǎn)介
SwiftOCR是一個(gè)由Swift編寫的快速簡(jiǎn)單的光學(xué)字符識(shí)別庫,使用FFNN神經(jīng)網(wǎng)絡(luò)進(jìn)行圖像識(shí)別.針對(duì)0~9 A~Z的目標(biāo)字符進(jìn)行識(shí)別.
SwiftOCR的作者對(duì)為什么使用SwiftOCR代替經(jīng)典的Tesseract 有如下論述:
If you want to recognize normal text like a poem or a news article, go with Tesseract, but if you want to recognize short, alphanumeric codes (e.g. gift cards), I would advise you to choose SwiftOCR because that's where it exceeds.
Tesseract is written in C++ and over 30 years old. To use it you first have to write a Objective-C++ wrapper for it. The main issue that's slowing down Tesseract is the way memory is managed. Too many memory allocations and releases slow it down.
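For reference, the library's public API is a single asynchronous call. The sketch below follows SwiftOCR's README; `myImage` is a `UIImage` you supply:

```swift
import SwiftOCR
import UIKit

let swiftOCR = SwiftOCR()
let myImage = UIImage() // in practice, a cropped photo of the characters

// Recognition runs on a background queue; the completion handler
// receives the recognized string.
swiftOCR.recognize(myImage) { recognizedString in
    print(recognizedString)
}
```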
改造過程
首先業(yè)務(wù)目標(biāo): 識(shí)別印刷體的手機(jī)號(hào).
Step 1: retrain the network. Programmatically render the digits 0-9 in a variety of system fonts as training data, and output a new OCR-Network file.
Step 2: switch the image input to a video stream.
Step 3: optimize to raise the frame rate.
優(yōu)化邏輯與具體代碼
1?? 經(jīng)過測(cè)試 識(shí)別算法運(yùn)行中,仍然會(huì)有數(shù)據(jù)的到達(dá).如果算法正在運(yùn)行,則直接返回.
2?? 識(shí)別算法在對(duì)數(shù)字串進(jìn)行分割處理時(shí),會(huì)調(diào)用比較重的由GPUImage庫提供的Connected-component labeling,性能消耗較大.
所以首先調(diào)用iOS 9提供的文字檢測(cè)API,判斷輸入圖像中是否有文字.
有則識(shí)別文字,無則返回.
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    // Recognition already in progress: drop this frame.
    if self.viewModel.isOCRRecognizing {
        return
    }
    // Crop the region of interest out of the frame.
    self.imgToRecognize = XGCameraScanWrapper.cropImageFromSampleBuffer(using: sampleBuffer, croppedSizeInScreen: (self.qRScanView?.getRetangeSize())!)
    // Use the iOS 9 text-detection API as a cheap pre-check: if no text
    // is detected, skip the expensive digit-recognition pass entirely.
    if #available(iOS 9.0, *) {
        let ciDetector = CIDetector(ofType: CIDetectorTypeText, context: nil, options: nil)
        guard let ciImgToRecognize = self.convertUIImageToCIImage(uiImage: self.imgToRecognize!) else {
            return
        }
        let features = ciDetector?.features(in: ciImgToRecognize)
        if features?.isEmpty ?? true {
            self.recognizedImgView?.isHidden = true
            return
        } else {
            self.recognizedImgView?.image = self.imgToRecognize
            self.recognizedImgView?.isHidden = false
        }
    }
    // Run digit recognition on the cropped image.
    self.viewModel.isOCRRecognizing = true
    self.xgDigitalRecognizeService?.recognize(self.imgToRecognize!) { recognizedString in
        // A simple length check on the result; in practice a regular
        // expression could constrain what gets reported.
        if recognizedString.utf16.count >= 11 {
            DispatchQueue.main.async {
                self.viewModel.phoneNumStr = recognizedString
                self.resultLablePhoneNum?.text = "Phone number: " + self.viewModel.phoneNumStr
            }
        }
        self.viewModel.isOCRRecognizing = false
    }
}
Other
手寫字體的機(jī)器識(shí)別是一個(gè)很久遠(yuǎn)經(jīng)典的問題.
也有一個(gè)近30年的標(biāo)準(zhǔn)訓(xùn)練集MNIST
可否支持手寫字體的識(shí)別? 單個(gè)手寫字體的識(shí)別不難,有興趣可參考下面Tensorflow on iOS這篇文章.
難點(diǎn)是如何處理連筆書寫的數(shù)字,此時(shí)通過 Connected-component labeling 技術(shù)進(jìn)行文字的分割已經(jīng)失效.