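OCR Toolbox
Optical character recognition (OCR) is now widely used across many industries: document recognition and data entry in finance, invoice recognition in catering,
ticket recognition in transportation, form recognition in enterprises, and recognition of ID cards, driver's licenses, and passports in everyday work and life.
OCR is one of the most commonly used AI capabilities today.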
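OCR toolbox features:
- Direction detection
  - 0 degrees
  - 90 degrees
  - 180 degrees
  - 270 degrees
detect_direction
- Image rotation
- Text recognition (three models provided)
  - mobile model
  - light model
  - server-side model
- Layout analysis (supports 5 classes; used together with text recognition and table recognition in a processing pipeline)
  - Text
  - Title
  - List
  - Table
  - Figure
- Table recognition
  - Generates an HTML table
  - Generates an Excel file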
Running the OCR examples
1.1 Text direction detection:
- Example code: OcrDetectionExample.java
- After a successful run, the command line should show the following:
[INFO ] - Result image has been saved in: build/output/detect_result.png
[INFO ] - [
class: "0", probability: 1.00000, bounds: [x=0.073, y=0.069, width=0.275, height=0.026]
class: "0", probability: 1.00000, bounds: [x=0.652, y=0.158, width=0.222, height=0.040]
class: "0", probability: 1.00000, bounds: [x=0.143, y=0.252, width=0.144, height=0.026]
class: "0", probability: 1.00000, bounds: [x=0.628, y=0.328, width=0.168, height=0.026]
class: "0", probability: 1.00000, bounds: [x=0.064, y=0.330, width=0.450, height=0.023]
]
- The output image looks like this:
detect_result
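The bounds in the log above all lie between 0 and 1, which suggests they are fractions of the image width and height rather than pixel values. Under that assumption, a minimal sketch for converting one box to pixel coordinates could look like this. The class name `BoundsToPixels`, the method `toPixels`, and the 1000x800 image size are hypothetical and not part of the toolbox.

```java
import java.awt.Rectangle;

final class BoundsToPixels {
    /**
     * Scales a box given as fractions of the image size to pixel coordinates.
     * ASSUMPTION: x, y, width and height are normalized to the range 0..1.
     */
    static Rectangle toPixels(double x, double y, double width, double height,
                              int imageWidth, int imageHeight) {
        int px = (int) Math.round(x * imageWidth);
        int py = (int) Math.round(y * imageHeight);
        int pw = (int) Math.round(width * imageWidth);
        int ph = (int) Math.round(height * imageHeight);
        return new Rectangle(px, py, pw, ph);
    }

    public static void main(String[] args) {
        // First box from the log, placed on a hypothetical 1000x800 image.
        Rectangle box = toPixels(0.073, 0.069, 0.275, 0.026, 1000, 800);
        System.out.println(box);
    }
}
```

Rounding each value separately keeps the sketch short; for tight crops it may be preferable to round the two corners and derive the width and height from them.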
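1.2 Text direction detection helper class (also displays confidence values, which helps with debugging):
- Example code: OcrDetectionHelperExample.java
- After a successful run, the command line should show the following: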
[INFO ] - Result image has been saved in: build/output/detect_result_helper.png
[INFO ] - [
class: "0 :1.0", probability: 1.00000, bounds: [x=0.073, y=0.069, width=0.275, height=0.026]
class: "0 :1.0", probability: 1.00000, bounds: [x=0.652, y=0.158, width=0.222, height=0.040]
class: "0 :1.0", probability: 1.00000, bounds: [x=0.143, y=0.252, width=0.144, height=0.026]
class: "0 :1.0", probability: 1.00000, bounds: [x=0.628, y=0.328, width=0.168, height=0.026]
class: "0 :1.0", probability: 1.00000, bounds: [x=0.064, y=0.330, width=0.450, height=0.023]
]
- The output image looks like this:
detect_result_helper
2. Image rotation:
Each call to the rotateImg method rotates the image 90 degrees counterclockwise.
- Example code: RotationExample.java
- Image before rotation:
ticket_0
- Image after rotation:
rotate_result
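To illustrate what a single 90 degree counterclockwise rotation does to the pixels, here is a minimal sketch in plain Java. It is not the toolbox's rotateImg implementation; the class `RotateSketch` and the method `rotate90Ccw` are hypothetical names that use only `java.awt.image.BufferedImage`.

```java
import java.awt.image.BufferedImage;

final class RotateSketch {
    /** Returns a copy of src rotated 90 degrees counterclockwise. */
    static BufferedImage rotate90Ccw(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // Width and height swap after a quarter turn.
        BufferedImage dst = new BufferedImage(h, w, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // The top-right pixel of src lands on the top-left pixel of dst.
                dst.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

Applying `rotate90Ccw` four times in a row brings every pixel back to its original position, which mirrors the "one call, one quarter turn" behavior described above.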
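3. Text recognition:
Before using this method, call the methods above so that the text in the image is horizontal (0 degrees).
- Example code: LightOcrRecognitionExample.java
- After a successful run, the command line should show the following: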
[INFO ] - [
class: "你", probability: -1.0e+00, bounds: [x=0.319, y=0.164, width=0.050, height=0.057]
class: "永遠都", probability: -1.0e+00, bounds: [x=0.329, y=0.349, width=0.206, height=0.044]
class: "無法叫醒一個", probability: -1.0e+00, bounds: [x=0.328, y=0.526, width=0.461, height=0.044]
class: "裝睡的人", probability: -1.0e+00, bounds: [x=0.330, y=0.708, width=0.294, height=0.043]
]
- The output image looks like this:
ocr_result
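The requirement that text be horizontal before recognition can be sketched as a small decision step: take the angle classes reported by direction detection, pick the most frequent one, and derive how many 90 degree counterclockwise rotations are needed. This is only a sketch under a stated assumption: that the detected class is the clockwise angle by which the text is currently rotated. The toolbox may use the opposite convention, so check it against the detection output first. `DirectionFix` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DirectionFix {
    /** Returns the most frequent angle label, e.g. "0", "90", "180" or "270". */
    static String majorityAngle(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = "0"; // default when nothing was detected
        int bestCount = 0;
        for (String label : labels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        return best;
    }

    /**
     * Number of 90 degree counterclockwise rotations that bring the text to 0 degrees.
     * ASSUMPTION: the label is the clockwise rotation of the text in degrees.
     */
    static int quarterTurnsNeeded(String angleLabel) {
        int angle = Integer.parseInt(angleLabel.trim());
        int normalized = ((angle % 360) + 360) % 360;
        return normalized / 90;
    }
}
```

With the result in hand, the caller would invoke the rotation step that many times before running text recognition.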
4. Layout analysis:
- After a successful run, the command line should show the following:
[INFO ] - [
class: "Text", probability: 0.98750, bounds: [x=0.081, y=0.620, width=0.388, height=0.183]
class: "Text", probability: 0.98698, bounds: [x=0.503, y=0.464, width=0.388, height=0.167]
class: "Text", probability: 0.98333, bounds: [x=0.081, y=0.465, width=0.387, height=0.121]
class: "Figure", probability: 0.97186, bounds: [x=0.074, y=0.091, width=0.815, height=0.304]
class: "Table", probability: 0.96995, bounds: [x=0.506, y=0.712, width=0.382, height=0.143]
]
- The output image looks like this:
layout
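Since layout analysis is meant to feed the text recognition and table recognition steps, one possible way to route each detected region by its class is sketched below. The mapping is an assumption for illustration (for example, Figure regions are simply kept as images); `LayoutRouter`, `Step`, and `route` are hypothetical names, not toolbox APIs.

```java
final class LayoutRouter {
    /** Follow-up processing steps a region can be sent to. */
    enum Step { TEXT_RECOGNITION, TABLE_RECOGNITION, KEEP_AS_IMAGE }

    /** Maps one of the five layout classes to a follow-up step. */
    static Step route(String layoutClass) {
        switch (layoutClass) {
            case "Text":
            case "Title":
            case "List":
                return Step.TEXT_RECOGNITION;
            case "Table":
                return Step.TABLE_RECOGNITION;
            case "Figure":
                return Step.KEEP_AS_IMAGE; // assumption: figures are not recognized further
            default:
                throw new IllegalArgumentException("Unknown layout class: " + layoutClass);
        }
    }

    public static void main(String[] args) {
        // Classes as they appear in the log above.
        for (String cls : new String[] {"Text", "Figure", "Table"}) {
            System.out.println(cls + " -> " + route(cls));
        }
    }
}
```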
5. Table recognition:
- After a successful run, the command line should show the following:
<html>
<body>
<table>
<thead>
<tr>
<td>Methods</td>
<td>R</td>
<td>P</td>
<td>F</td>
<td>FPS</td>
</tr>
</thead>
<tbody>
<tr>
<td>SegLink[26]</td>
<td>70.0</td>
<td>86.0</td>
<td>770</td>
<td>89</td>
</tr>
<tr>
<td>PixelLink[4j</td>
<td>73.2</td>
<td>83.0</td>
<td>77.8</td>
<td></td>
</tr>
...
</tbody>
</table>
</body>
</html>
- The output image looks like this:
table
- The generated Excel file looks like this:
excel
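To show how the HTML output above relates to spreadsheet rows, here is a dependency-free sketch that extracts the `<tr>`/`<td>` cells with regular expressions and writes them as CSV, which Excel can open. This is not how the toolbox produces its Excel file; CSV is swapped in to avoid a spreadsheet library, and the regex approach assumes simple tags without attributes or nesting, as in the sample output. `HtmlTableToCsv` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlTableToCsv {
    private static final Pattern ROW = Pattern.compile("<tr>(.*?)</tr>", Pattern.DOTALL);
    private static final Pattern CELL = Pattern.compile("<td>(.*?)</td>", Pattern.DOTALL);

    /** Extracts the text of every cell, row by row, from simple table HTML. */
    static List<List<String>> parseRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1).trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    /** Renders rows as CSV, quoting every cell and doubling embedded quotes. */
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            List<String> quoted = new ArrayList<>();
            for (String cell : row) {
                quoted.add("\"" + cell.replace("\"", "\"\"") + "\"");
            }
            sb.append(String.join(",", quoted)).append("\n");
        }
        return sb.toString();
    }
}
```

Empty cells such as the missing FPS value in the sample are preserved as empty strings, so column alignment survives the conversion.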