tesseract

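Tesseract is an open-source OCR engine originally developed at HP Labs and later maintained under Google's sponsorship. It is free, supports many languages, and runs on multiple platforms.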

https://github.com/tesseract-ocr/tesseract.git
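
As a minimal sketch of calling the engine from a script, the following Python wraps the `tesseract` command-line tool with `subprocess`. It assumes Tesseract 4 or later is installed and on `PATH` (older releases spell the option `-psm`) and that the requested language data is available. The helper names `build_command` and `run_ocr` are illustrative, not part of Tesseract.

```python
import shutil
import subprocess


def build_command(image_path, lang="eng", psm=3):
    """Build the argument list for the tesseract CLI.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, lang="eng", psm=3):
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

For example, `run_ocr("scan.png", lang="chi_sim")` would run Simplified Chinese recognition on a hypothetical file `scan.png`, provided the `chi_sim` language data is installed.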

tesseract.js

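A JavaScript port of the Tesseract OCR engine that supports more than 100 languages. It is simple to use: install it with npm, or include the script directly in a web page.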

https://github.com/naptha/tesseract.js.git

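PaddleOCR is an open-source OCR toolkit from Baidu. It aims to provide a rich, leading, and practical set of OCR tools that help developers train better models and put them into production.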

https://github.com/PaddlePaddle/PaddleOCR.git

EasyOCR is an OCR library written in Python and built on PyTorch. It extracts text from images and currently supports more than 80 languages.

https://github.com/JaidedAI/EasyOCR.git
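
A short sketch of typical EasyOCR usage follows, assuming the `easyocr` package is installed (`pip install easyocr`) and that its `Reader.readtext` call returns a list of (bounding box, text, confidence) tuples as described in the project README. The helper names `read_text` and `keep_confident` and the 0.5 threshold are illustrative choices.

```python
def keep_confident(results, min_confidence=0.5):
    """Keep only the text of detections at or above min_confidence.

    `results` is expected to be a list of (bounding_box, text, confidence)
    tuples, the shape EasyOCR's readtext returns by default.
    """
    return [text for _box, text, conf in results if conf >= min_confidence]


def read_text(image_path, languages=("ch_sim", "en"), min_confidence=0.5):
    """Run EasyOCR on an image and return the confident text strings."""
    import easyocr  # third-party; imported lazily so the helper above works without it

    # Creating a Reader loads the models (downloaded on first use).
    reader = easyocr.Reader(list(languages))
    return keep_confident(reader.readtext(image_path), min_confidence)
```

Filtering on the confidence score is one simple way to drop spurious detections; the right threshold depends on the images.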

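MMOCR is an open-source toolbox based on PyTorch and mmdetection. It focuses on text detection, text recognition, and related downstream tasks such as key information extraction.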

https://github.com/open-mmlab/mmocr.git

An open-source OCR engine built on OpenCV and NumPy.

https://github.com/goncalopp/simple-ocr-opencv.git

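OCRmyPDF is an open-source project built on tesseract-ocr. It adds an OCR text layer to scanned PDFs so that their text can be searched and extracted.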

https://github.com/ocrmypdf/OCRmyPDF.git
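
The sketch below shows one way to drive the `ocrmypdf` command-line tool from Python, assuming it is installed and on `PATH`. The `-l` and `--skip-text` options are part of the OCRmyPDF CLI; the helper names and default values are illustrative.

```python
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, languages=("eng",), skip_text=True):
    """Build the argument list for the ocrmypdf CLI.

    Multiple languages are joined with "+", for example "eng+chi_sim".
    --skip-text leaves pages that already contain text untouched.
    """
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if skip_text:
        cmd.append("--skip-text")
    cmd += [input_pdf, output_pdf]
    return cmd


def ocr_pdf(input_pdf, output_pdf, languages=("eng",), skip_text=True):
    """Run ocrmypdf and raise CalledProcessError if it fails."""
    subprocess.run(
        build_ocrmypdf_command(input_pdf, output_pdf, languages, skip_text),
        check=True,
    )
```

For example, `ocr_pdf("scan.pdf", "scan_ocr.pdf", languages=("eng", "chi_sim"))` would write a searchable copy of a hypothetical `scan.pdf`.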

An open-source text recognition tool built on PaddleOCR.

guanshuicheng/invoice

An open-source model specialized in recognizing VAT invoices.

https://github.com/guanshuicheng/invoice