In a significant development, a research team from IIT Madras has created a unified script for nine Indian languages.
According to reports, the script takes its cue from European languages, where several of them share the same (Roman letter–based) script. The team, led by Srinivasa Chakravathy at IIT Madras, has named the unified script Bharati. Since developing the script, the team has gone a step further: it has devised a method for reading documents in Bharati script using a multi-lingual optical character recognition (OCR) scheme. The team has also created a finger-spelling method that can be used to generate a sign language for hearing-impaired persons. In collaboration with TCS Mumbai, the researchers have found a way for persons with hearing disability to generate signatures using this finger-spelling technique.
Though scripts including Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Telugu, Kannada, Malayalam and Tamil were integrated, English and Urdu have not been integrated so far. It should be noted that the Urdu and English alphabet systems have a very different phonetic organisation. That does not rule out a mapping, however; one remains possible.
In general, optical character recognition schemes first separate (or segment) the document into text and non-text. The text is then segmented into paragraphs, sentences, words and letters. Each letter has to be recognised as a character in a standard encoding such as ASCII or Unicode. A letter has various components, such as the basic consonant, consonant modifiers, vowels etc.
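The idea that a letter is built from components can be seen in how Unicode already encodes Indian scripts. The Python sketch below is only an illustration on encoded Devanagari text, not the IIT Madras team's OCR method, which works on document images. The function names `split_clusters` and `describe` are hypothetical, and the grouping rule (a base letter plus any following marks, with a virama joining the next consonant) is a simplifying assumption that ignores many edge cases.

```python
import unicodedata

VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA, which joins consonants into a conjunct


def split_clusters(word):
    """Group a Devanagari word into letter-like clusters (simplified rule)."""
    clusters = []
    for ch in word:
        # Vowel signs and modifiers are combining marks (categories Mn, Mc).
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A consonant right after a virama belongs to the same conjunct.
        joins_conjunct = bool(clusters) and clusters[-1].endswith(VIRAMA)
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def describe(cluster):
    """List the code point and Unicode name of each component in a cluster."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in cluster]


if __name__ == "__main__":
    word = "\u092d\u093e\u0930\u0924\u0940"  # the word "Bharati" in Devanagari
    for cluster in split_clusters(word):
        print(cluster, describe(cluster))
```

Running the script prints each cluster of the word alongside its components; the first cluster, for example, consists of the consonant BHA and the vowel sign AA. An OCR system has to recover this kind of structure from pixels before it can emit the corresponding code points.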