TY - GEN
T1 - GPGPU-based High Throughput Image Pre-processing Towards Large-Scale Optical Character Recognition
AU - Gener, Serhan
AU - Dattilo, Parker
AU - Gajaria, Dhruv
AU - Fusco, Alexander
AU - Akoglu, Ali
N1 - Funding Information:
This work is partly supported by National Science Foundation (NSF) research project NSF CNS-1624668. This material is based upon High Performance Computing (HPC) resources supported by the University of Arizona TRIF, UITS, and Research, Innovation, and Impact (RII) and maintained by the UArizona Research Technologies department.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Studies have shown that pre-processing digital images through scaling, rotation, and blurring operations allows optical character recognition (OCR) to focus on the key features in the image and improves recognition accuracy. We leverage the open-source Tesseract OCR engine and show that its accuracy can be improved through a pre-processing flow that includes thresholding, rotation, rescaling, erosion, dilation, and noise-removal steps, evaluated on a dataset of 560 phone screen images. However, the serial CPU-based implementation of this flow introduces a latency of 48.32 ms per image on average. Although this time scale is low in the context of a single image, the latency becomes a barrier when processing millions of images with OCR. To address this, we parallelize the entire pre-processing flow on the Nvidia P100 GPU, implement a streaming-based execution, and reduce the latency to 0.846 ms. This streaming-enabled implementation enables setting up a GPU-based OCR engine to process large-scale workloads.
AB - Studies have shown that pre-processing digital images through scaling, rotation, and blurring operations allows optical character recognition (OCR) to focus on the key features in the image and improves recognition accuracy. We leverage the open-source Tesseract OCR engine and show that its accuracy can be improved through a pre-processing flow that includes thresholding, rotation, rescaling, erosion, dilation, and noise-removal steps, evaluated on a dataset of 560 phone screen images. However, the serial CPU-based implementation of this flow introduces a latency of 48.32 ms per image on average. Although this time scale is low in the context of a single image, the latency becomes a barrier when processing millions of images with OCR. To address this, we parallelize the entire pre-processing flow on the Nvidia P100 GPU, implement a streaming-based execution, and reduce the latency to 0.846 ms. This streaming-enabled implementation enables setting up a GPU-based OCR engine to process large-scale workloads.
KW - CUDA
KW - GPU
KW - Image Processing
KW - Leptonica
KW - Optical Character Recognition (OCR)
KW - Tesseract
UR - http://www.scopus.com/inward/record.url?scp=85147038887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147038887&partnerID=8YFLogxK
U2 - 10.1109/AICCSA56895.2022.10017481
DO - 10.1109/AICCSA56895.2022.10017481
M3 - Conference contribution
AN - SCOPUS:85147038887
T3 - Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
BT - 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications, AICCSA 2022 - Proceedings
PB - IEEE Computer Society
T2 - 19th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2022
Y2 - 5 December 2022 through 7 December 2022
ER -
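
Note: the abstract describes, but this record does not contain, the streaming GPU implementation. As a rough illustration only (not the authors' code), a minimal CUDA sketch of the stream-per-image pattern the abstract refers to is given below; the 4-stream count, image size, and threshold value are assumptions for illustration, and a single thresholding kernel stands in for the full thresholding/rotation/rescaling/erosion/dilation/noise-removal flow.

// Hypothetical sketch of the streaming pattern described in the abstract:
// each image gets its own CUDA stream, so host-to-device copies, kernels,
// and device-to-host copies from different streams overlap, hiding the
// per-image pre-processing latency at scale. Values are illustrative.
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

__global__ void binarize(const uint8_t* in, uint8_t* out, int n, uint8_t t) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] > t ? 255 : 0;  // threshold one pixel
}

int main() {
    const int kStreams = 4;                 // assumed stream count
    const int kPixels  = 1920 * 1080;       // assumed image size
    cudaStream_t s[kStreams];
    uint8_t *h[kStreams], *d_in[kStreams], *d_out[kStreams];

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamCreate(&s[i]);
        cudaMallocHost(&h[i], kPixels);     // pinned host memory enables async copies
        memset(h[i], 64 * i, kPixels);      // dummy image data for the sketch
        cudaMalloc(&d_in[i], kPixels);
        cudaMalloc(&d_out[i], kPixels);
    }

    // Issue copy-kernel-copy per image on independent streams.
    for (int i = 0; i < kStreams; ++i) {
        cudaMemcpyAsync(d_in[i], h[i], kPixels, cudaMemcpyHostToDevice, s[i]);
        binarize<<<(kPixels + 255) / 256, 256, 0, s[i]>>>(d_in[i], d_out[i], kPixels, 128);
        cudaMemcpyAsync(h[i], d_out[i], kPixels, cudaMemcpyDeviceToHost, s[i]);
    }
    cudaDeviceSynchronize();

    for (int i = 0; i < kStreams; ++i) {
        cudaStreamDestroy(s[i]);
        cudaFreeHost(h[i]);
        cudaFree(d_in[i]);
        cudaFree(d_out[i]);
    }
    printf("pre-processed %d images on %d overlapping streams\n", kStreams, kStreams);
    return 0;
}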