New models for handwriting recognition in online Latin and Arabic scripts

The architecture of OnHSR-LSTM. Credit: Akouaydi et al.

Researchers at the University of Sfax, in Tunisia, have recently developed a new method to recognize handwritten characters and symbols in online scripts. Their technique, presented in a paper pre-published on arXiv, has already achieved remarkable performance on texts written in both the Latin and Arabic alphabets.

In recent years, researchers have created deep learning-based architectures that can tackle a variety of tasks, including image classification, natural language processing (NLP), and many more. Handwriting recognition systems are computer tools that are specifically designed to recognize characters and other handwritten symbols in a similar way to humans.

In their early years of life, in fact, human beings innately develop the ability to understand different types of handwriting by identifying specific characters both individually and when grouped together. Over the past decade or so, many studies have tried to replicate this ability in machines, as this would ultimately enable more advanced and automatic analyses of handwritten texts.

"Our paper handles the problem of online handwritten script recognition based on an extraction features system and deep approach system for sequence classification," the researchers wrote in their paper. "We used an existent method combined with new classifiers in order to attain a flexible system."

In their paper, the researchers at the University of Sfax present two systems based on deep learning: an online handwriting segmentation and recognition system that uses a long short-term memory network (OnHSR-LSTM) and an online handwriting recognition system composed of a convolutional long short-term memory network (OnHR-convLSTM).

The architecture of (a) OnHR-convLSTM, (b) the convLSTM cell. Credit: Akouaydi et al.

Their first model, dubbed OnHSR-LSTM, is based on a theory that describes the human perceptual system as a means of transforming language from graphical marks into symbolic representations. It works by detecting common properties of symbols or characters and then arranging them according to specific perceptual laws, for instance, based on proximity, similarity, etc.

"Finally, it [the model] attempts to build a representation of the handwritten form based on the assumption that the perception of form is the identification of basic features that are arranged until we identify an object," the researchers explained in their paper. "Therefore, the representation of handwriting is a combination of primitive strokes. Handwriting is a sequence of basic codes that are grouped together to define a character or a shape."

The first technique proposed by the researchers essentially divides a handwritten script into individual elliptic strokes using a model of handwriting generation. Subsequently, these strokes are classified into primitive codes, which are used by the neural architecture to recognize words in online handwritten scripts.
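
As a rough illustration of this segment-then-encode idea, the Python sketch below splits a pen trajectory at local minima of pen speed and assigns each resulting stroke one of eight direction codes. It is not the authors' method: the paper's own stroke model and primitive codes are not reproduced here, and the function names `segment_at_speed_minima` and `direction_code`, the speed-minimum rule, and the eight-direction coding are all assumptions made for illustration.

```python
import math


def speeds(points):
    """Pen speed between consecutive (x, y) samples, assuming unit time steps."""
    return [math.hypot(q[0] - p[0], q[1] - p[1])
            for p, q in zip(points, points[1:])]


def segment_at_speed_minima(points):
    """Split a trajectory into strokes at local minima of pen speed.

    Assumption: a slowdown of the pen marks a stroke boundary. Neighbouring
    strokes share their boundary sample.
    """
    v = speeds(points)
    cuts = [0]
    for i in range(1, len(v) - 1):
        # v[i] is the speed on the interval points[i] -> points[i + 1];
        # cut at the end of the slowest interval.
        if v[i] < v[i - 1] and v[i] <= v[i + 1]:
            cuts.append(i + 1)
    cuts.append(len(points) - 1)
    return [points[a:b + 1] for a, b in zip(cuts, cuts[1:]) if b > a]


def direction_code(stroke, n_codes=8):
    """Hypothetical primitive code: quantized direction from start to end."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
    return int(round(angle / (2 * math.pi / n_codes))) % n_codes


def encode(points):
    """Turn a raw pen trajectory into a sequence of primitive codes."""
    return [direction_code(s) for s in segment_at_speed_minima(points)]


# Example trajectory: a fast move to the right, a slowdown, then a move upward.
trajectory = [(0, 0), (1, 0), (3, 0), (4, 0), (4, 0.2), (4, 1), (4, 3), (4, 4)]
codes = encode(trajectory)
```

In a complete system, a code sequence like this would then be passed to a sequence classifier such as an LSTM, which is the role the neural architecture plays in the description above.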

The second system proposed by the researchers, OnHR-convLSTM, is a generative model that uses a script's online signal as input and is trained to predict both characters and words. This second technique is particularly useful for sequence learning tasks (i.e. tasks that involve the processing and classification of long sequences of characters and symbols).
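
To give a sense of what a convLSTM cell computes, here is a minimal single-channel, one-dimensional sketch in plain Python. It follows the commonly used convLSTM formulation without peephole connections, in which the matrix products of an ordinary LSTM are replaced by convolutions. The authors' cell may differ, and `conv1d_same`, `convlstm_step`, and the layout of `params` are hypothetical names chosen for this example.

```python
import math


def conv1d_same(signal, kernel):
    """Zero-padded 1-D cross-correlation; output has the input's length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def convlstm_step(x, h, c, params):
    """One time step of a single-channel 1-D convLSTM cell (no peepholes).

    x: input features at this time step (list of floats)
    h, c: previous hidden and cell states, same length as x
    params: maps each gate name ('i', 'f', 'o', 'g') to a tuple
            (input kernel, hidden kernel, scalar bias)
    """
    pre = {}
    for gate, (wx, wh, b) in params.items():
        ax = conv1d_same(x, wx)   # convolution over the input
        ah = conv1d_same(h, wh)   # convolution over the previous hidden state
        pre[gate] = [a + r + b for a, r in zip(ax, ah)]
    i = [sigmoid(z) for z in pre["i"]]    # input gate
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell update
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_sequence(frames, params):
    """Feed a sequence of per-time-step feature vectors through the cell."""
    h = [0.0] * len(frames[0])
    c = [0.0] * len(frames[0])
    for x in frames:
        h, c = convlstm_step(x, h, c, params)
    return h
```

Because the gates are computed with convolutions rather than full matrix products, the cell keeps the local layout of its input features while still carrying state from one time step to the next.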

The researchers trained and evaluated both their systems using five different databases containing handwritten scripts in the Arabic and Latin alphabets. Their tests yielded remarkable results, with both systems achieving recognition rates of over 98 percent. Interestingly, the researchers found that the performance of both techniques is comparable to that typically achieved by human subjects in similar tasks.

"We now plan to build on and test our proposed recognition systems on a large-scale database and other scripts," the researchers wrote.

More information: Neural architecture based on fuzzy perceptual representation for online multilingual handwriting recognition. arXiv:1908.00634 [cs.CV].

© 2019 Science X Network

Citation: New models for handwriting recognition in online Latin and Arabic scripts (2019, August 20)