OCR system for Myanmar language

OCR SYSTEM FOR MYANMAR LANGUAGE PDF

Each individual was asked to write down all the Manipuri characters on a single A4-size sheet of paper.

Manipuri is the most widely spoken language in Northeast India after Bengali and Assamese. In this work, we introduce a handwritten Manipuri Meetei-Mayek character dataset of more than 5000 samples. The samples were collected during March and April 2019 from a diverse population spanning different age groups (from 4 to 60 years), genders, educational backgrounds, occupations and communities in three districts of Manipur, India (Imphal East District, Thoubal District and Kangpokpi District).

Manipuri, also referred to as Meeteilon or sometimes Meiteilon, is a Sino-Tibetan language and one of the languages listed in the Eighth Schedule of the Indian Constitution. It is the official language and lingua franca of Manipur, a southeastern Himalayan state in northeastern India. A significant number of people across north-east India, and in some parts of Bangladesh and Myanmar, also use it as their language of communication. To the best of our knowledge, no benchmark dataset for handwritten character recognition of the Manipuri Meetei-Mayek script exists in the public domain so far.

Myanmar language is the official language and widely used in. OCR, which is used to enhance the accuracy in recognition levels. In the first place, we have our own script, evolved over thousands of years, which they want us to replace with Bharati script, and their OCR works for Bharati script only.


A benchmark dataset is always required for any classification or recognition system.

An initial study is described to create comparable data for Tesseract training and evaluation based on two approaches to character segmentation of Indic scripts logical vs. A free online OCR service can convert scanned images, faxes, screenshots, PDF documents and ebooks to text, and can process 122 languages. While OCR invoice processing has radically improved speed and accuracy for accounts payable (AP) teams, it remains an incomplete solution to the needs of modern AP.

To obtain a more accurate system, the authors propose a method for isolating character images that uses not only projection methods but also structural analysis of wrongly segmented characters. To demonstrate the effectiveness of the segmentation technique, the authors apply a new hybrid feature extraction method and choose an SVM classifier to recognize the character images. The proposed algorithms have been tested on a variety of Myanmar printed documents, and the experimental results indicate that the methods can increase segmentation accuracy as well as recognition rates.

Automatic machine-printed optical character or text recognizers (OCR) are highly desirable for a multitude of modern IT applications, including digital library software. However, state-of-the-art OCR systems cannot handle Myanmar script, as the language poses many challenges for document understanding. Little effort has so far gone into OCR systems for the Myanmar language. In addition, there is no system that can recognize documents written in both Myanmar and English. Therefore, the authors design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. Language data for the Tesseract OCR system currently supports recognition of a number of languages written in Indic writing scripts.
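As a rough illustration of how Tesseract language data is selected, the sketch below builds a command line that requests the Myanmar and English models together. It rests on several assumptions: that the `tesseract` executable and the `mya` and `eng` trained data files are installed, and that page segmentation mode 6 (a single uniform block of text) suits the input. The helper names are hypothetical. Nothing in the text above establishes how well this combination performs on mixed Myanmar and English documents.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tesseract_command(
    image_path: str,
    languages: Sequence[str] = ("mya", "eng"),
    psm: int = 6,
) -> List[str]:
    """Build a Tesseract command that prints recognized text to stdout.

    Languages are joined with '+', which is how the -l option accepts
    more than one language model.
    """
    return [
        "tesseract",
        image_path,
        "stdout",                   # write the result to standard output
        "-l", "+".join(languages),  # e.g. "mya+eng"
        "--psm", str(psm),          # assumption: one uniform block of text
    ]


def recognize(image_path: str, languages: Sequence[str] = ("mya", "eng")) -> str:
    """Run Tesseract on an image and return the recognized text as UTF-8."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return result.stdout.decode("utf-8")
```

Calling `recognize("page.png")`, where `page.png` is a placeholder file name, would return the recognized text for that image. Listing the Myanmar model first is a choice made for this sketch, not a documented requirement.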