Syed Faraz Ali 1/8

Issues in English to Indian Sign Language Generation and Translation, and a Corpus-Based Translation System Developed to Tackle Them

Syed Faraz Ali

Computer Science and Engineering

Sharda University

Greater Noida, India


Abstract: Sign language is used by deaf and hard of hearing people throughout the world. The sign language used in India is Indian Sign Language (ISL). This paper explores the application of a data-driven sign generation model to ISL. An ISL generation system can facilitate communication for deaf and hard of hearing people by translating information into their native and preferred language. We have developed an ISL generation system that displays animated signs corresponding to the input text. The system allows users to generate signs from text even if they have no knowledge of sign language. We explain the system in detail, describe the modules we developed, and present our approach to building the ISL translation system.

1.  Introduction

Human interaction is not possible without communication. Hearing people face no difficulty, since they communicate through spoken languages. People who are deaf or unable to speak, however, face a communication barrier: they cannot interact through speech and rely on sign language instead. Sign language is a language that uses manual communication (physical body movements) and facial gestures to convey messages and thoughts.

Sign language is also used by people who can hear but cannot physically speak. Sign language exists wherever there are deaf people or a deaf community. Based on the size of the deaf population in a region, sign languages can be categorized as follows:

Home sign language: Where only one person in a family is deaf or hard of hearing, the language that person uses to communicate with other family members is a home sign language.

Village sign language: Where more than one person in a village is deaf or hard of hearing, the sign language these people use to communicate is a village sign language.

Deaf community sign language: When deaf people from different places come together and settle on standardized signs for communicating, the resulting language is a deaf community sign language, since the deaf community itself has developed it.

The signs of a sign language consist of two kinds of features:

Manual features: Manual communication involves the movement of the hands and fingers. The signer conveys the message through these movements.

Non-manual features: This communication involves facial expressions and body gestures. The signer's facial expression tells the viewer what the signer is trying to say.

We have made an effort to help deaf and speech-impaired people in India by developing a system that helps them communicate with the hearing world. This paper discusses our approach to English text to Indian Sign Language translation and sign generation. It gives an overview of previously proposed systems and describes the detailed structure and interface of our system. It also discusses the issues that arose during development and the obstacles that must be overcome to build a complete system.

1.1  INDIAN SIGN LANGUAGE

Deaf communities exist all around the world, and their languages differ from one community to another. Just as there are many spoken languages, such as English, French, and Urdu, there are different sign languages and different expressions used by hearing-impaired people worldwide. American Sign Language (ASL) is used in the USA, British Sign Language (BSL) in Britain, and Indian Sign Language (ISL) in India.

Interactive systems have already been developed for several sign languages, for example ASL and BSL. To help hearing-impaired people in India interact with others, we are developing a system that translates English text into Indian Sign Language text, which can then be rendered as ISL signs. Since it is difficult to generate signs for every word or phrase in a vocabulary or dictionary, we limit our experiments to a single domain: railways. We will collect the possible conversations at railway enquiry and reservation counters, analyse them, and find the corresponding signs used in ISL.

India is a large country with a population of 1,241,491,960 (Google Public Data). India has 30 states, and most states use their own local language; for example, Kashmiri is spoken in Kashmir and Punjabi in Punjab. Similarly, sign language differs slightly between different parts of India.

2.  Related work

Since we are dealing with a translation model for Indian Sign Language, we discuss the models proposed for it. For spoken languages, Machine Translation is a booming area of research and development. This can be inferred from the proliferation of Machine Translation products for sale, such as Systran and Language Weaver, as well as freely available online Machine Translation tools such as AltaVista's Babel Fish and Google Translate. The funding of large research projects such as the Global Autonomous Language Exploitation (GALE) project, the TC-STAR project, and the most recent Centre for Science, Engineering and Technology (CSET) project on Next Generation Localisation further demonstrates the importance given to this research in Europe and the US. The same level of activity is not seen in Sign Language Machine Translation, where little more than a dozen systems have tackled the problem. Most papers describe prototype systems that focus primarily on sign language generation rather than on applying Machine Translation techniques to these visual-gestural languages.

According to Dorr et al. [1], machine translation systems can be grouped into three basic designs:

·  Direct

·  Transfer

·  Inter-lingual

In the direct design, conversion is word for word, and no other aspect of the sentence is taken into consideration. The rules that perform this conversion therefore depend entirely on the source language. Transfer systems analyse the input text at the syntactic and semantic levels, and their transfer rules depend on both the source and the target language. In the interlingual architecture, analysis of the source text should produce a representation that is independent of the source language. Existing systems fall into two categories: rule-based and data-based.
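The direct design can be illustrated with a minimal sketch. The dictionary entries, the gloss labels, and the function name `direct_translate` are hypothetical and chosen only for illustration; they are not taken from any of the systems discussed in this paper.

```python
# Hypothetical English-to-sign-gloss dictionary (illustrative entries only).
GLOSS_DICTIONARY = {
    "train": "TRAIN",
    "ticket": "TICKET",
    "when": "WHEN",
    "arrive": "ARRIVE",
}


def direct_translate(sentence, dictionary=GLOSS_DICTIONARY):
    """Translate word by word, ignoring syntax and semantics."""
    glosses = []
    for word in sentence.lower().split():
        word = word.strip(".,?!")
        if not word:
            continue
        # Words missing from the dictionary are kept and marked as unknown.
        glosses.append(dictionary.get(word, f"<UNK:{word}>"))
    return glosses


print(direct_translate("When will the train arrive?"))
# ['WHEN', '<UNK:will>', '<UNK:the>', 'TRAIN', 'ARRIVE']
```

The output keeps the English word order and has no way of handling words outside the dictionary, which is why such rules depend entirely on the source language.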

2.1  RULE BASED APPROACHES

Rule-based approaches came into existence in 1976 and established themselves in the research field. They may be sub-classified into transfer and interlingua-based methodologies. Interlingua sits at the top level of the machine translation pyramid of Dorr et al. [1]. In a transfer approach, analysis of the source language sentence is usually shallow when compared with interlingual approaches (though not when compared with a direct methodology) and takes place at a syntactic level, often producing constituent-structure parse trees. Interlingual approaches tend to perform a deeper analysis of the source language sentence, creating structures of a more semantic nature. Some rule-based systems are summarised below.

Purushottam Kar et al. [2] developed a system named INGIT, a cross-modal translation system from Hindi strings to Indian Sign Language intended for use at Indian Railways reservation counters. The system translates input from the reservation clerk into Indian Sign Language, which can then be displayed to the ISL user. They used Fluid Construction Grammar (FCG) [3] to construct the grammar for sign language: a domain-specific construction grammar for Hindi is implemented in FCG. This grammar converts the input into a thin semantic structure, which is passed to ellipsis resolution to obtain a saturated semantic structure. Depending on the type of utterance (statement, query, negation, etc.), the ISL generator produces a suitable ISL-tag structure. This is then passed to a HamNoSys [4] [5] converter to generate the graphical simulation.

To validate the system, they collected a small corpus on six different days, based on interactions with speaking clients at a computer reservation counter. On evaluation, they found that the interactions comprised 230 words, many of them repeated. The vocabulary of 90 words included 10 verbs in various morphological forms (e.g. work, worked, working), 9 words related to time, and 12 domain-specific words (e.g. ticket, tatkal). The other words were numerals (15), names of months (12), cities (4), and trains (4), as well as digits, particles, etc. The INGIT system has three main modules:

·  Input parser

·  Ellipsis Resolution Module

·  ISL Generator (including ISL lexicon with HamNoSys [4] [5] phonetic descriptions)

Their system cannot show non-manual features such as facial expressions and gestures. It is restricted to a single domain, the railway system, and its sign vocabulary is therefore very small.

2.2  DATA BASED APPROACH

This is also known as the corpus-based or example-based approach. Here direct mapping takes place, as at the lowest level of the machine translation pyramid of Dorr et al. [1]. Data-driven approaches came into existence in the 1990s and now dominate the research field. This approach, often termed 'corpus-based', can be subdivided into statistical machine translation and example-based machine translation. Both data-driven processes differ fundamentally from rule-based approaches, yet they remain inherently similar to each other. In general, linguistic information and rules are eschewed in favour of probabilistic models learned from a large parallel corpus.

In data-based approaches, a large dataset is built and words are mapped directly between the two languages. This approach is gaining ground: because no dictionary exists for sign languages, it might help to create one, and if the corpora collected for all the languages are taken into consideration, they might help to build a standardised sign language in future.

No system using this technique has yet been developed for Indian Sign Language; we have taken this approach in developing our system.
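As a rough illustration of corpus-based direct mapping, the sketch below builds a phrase table from a tiny parallel corpus and translates by greedy longest-match lookup. The corpus entries, the gloss labels and their ordering, and the function names are all hypothetical; this is a sketch under those assumptions, not the implementation of our system.

```python
# Hypothetical parallel corpus: English phrases paired with sign gloss sequences.
PARALLEL_CORPUS = [
    ("what time", ["TIME", "WHAT"]),
    ("train", ["TRAIN"]),
    ("ticket", ["TICKET"]),
    ("how much", ["HOW-MUCH"]),
]


def build_phrase_table(corpus):
    """Index each source phrase by its tuple of lowercase words."""
    return {tuple(src.lower().split()): glosses for src, glosses in corpus}


def translate(sentence, phrase_table):
    """Greedy longest-match lookup of corpus phrases, left to right."""
    words = [w.strip(".,?!") for w in sentence.lower().split()]
    words = [w for w in words if w]
    max_len = max((len(key) for key in phrase_table), default=1)
    output, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in phrase_table:
                output.extend(phrase_table[key])
                i += n
                break
        else:
            # No corpus example covers this word, so it is skipped.
            i += 1
    return output


table = build_phrase_table(PARALLEL_CORPUS)
print(translate("What time is the train?", table))
# ['TIME', 'WHAT', 'TRAIN']
```

Coverage here depends entirely on the corpus: words with no matching example are dropped, so the quality of such a system grows with the size of the collected dataset.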

3.  Issues For Sign Language Translation

For Indian Sign Language, only one system has been developed so far: INGIT. In many other countries, work on sign language is under way to help deaf and speech-impaired people. To help such people in our own country, I am taking an initiative towards building this system. It will help people who have been left behind by today's fast-moving world to communicate with us.

As mentioned above, India is a very large country, the second largest by population. In proportion to its population, it may therefore have the largest number of deaf and speech-impaired people. It is for these people that we are developing this system. Some of the issues in developing the system are described here.

3.1  LOCAL VARIATION

India is a large country with a population of 1,241,491,960 (Google Public Data). India has 30 states, and most states use their own local language; for example, Kashmiri is spoken in Kashmir and Punjabi in Punjab. Language thus varies as we move from one state to another, and in some states it also varies within the state: in Jammu and Kashmir, Kashmiri is spoken in Kashmir and Dogri in Jammu. This variation is not limited to spoken languages; it can also be seen in sign languages, which differ slightly between different parts of India.

Sign language, like spoken language, varies from place to place. Sign languages fall into three categories (home, village, and deaf community sign languages, as described in the Introduction). This variation is a barrier to building an efficient translation system. There is also no standard for Indian Sign Language. This is an important issue, because without a standard it is difficult to design the system: a single word may have several different signs, and it is unclear which one to use.

To tackle this problem, we use the most commonly used sign for each word to be translated and use that same sign consistently whenever the word occurs.
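Choosing the most commonly used sign can be sketched as a simple frequency count over observed usage. The observation data, the variant labels, and the function name `most_used_signs` are hypothetical placeholders for illustration, not collected data.

```python
from collections import Counter

# Hypothetical observations: (English word, observed sign variant) pairs.
OBSERVATIONS = [
    ("ticket", "TICKET-variant-A"),
    ("ticket", "TICKET-variant-B"),
    ("ticket", "TICKET-variant-A"),
    ("train", "TRAIN-variant-A"),
]


def most_used_signs(observations):
    """Map each word to the sign variant observed most often for it."""
    counts = {}
    for word, sign in observations:
        counts.setdefault(word, Counter())[sign] += 1
    # On a tie, Counter.most_common keeps the variant that was seen first.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


print(most_used_signs(OBSERVATIONS))
# {'ticket': 'TICKET-variant-A', 'train': 'TRAIN-variant-A'}
```

The resulting mapping gives one sign per word, so the same word is always rendered with the same sign regardless of regional variants.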