USU Conference Systems, The 2015 International Conference on Electrical, Telecommunication and Computer Engineering

Utilization of Self-Organizing Map for Recognizing Image as Input to the Digital Dictionary Based on Android
Dani Gunawan, Dedy Arisandi, Romi Fadillah Rahmat, Fajar Matius Ginting

Building: Arya Duta
Room: Main Hall
Date: 2015-11-27 02:00 PM – 03:30 PM
Last modified: 2015-11-24

Abstract


1. Introduction

Searching for the translation of a Russian word in a digital dictionary on a smartphone can be as difficult as searching in a printed dictionary. The reason is that Russian letters are unfamiliar to most visitors: like several of its neighboring countries, Russia uses the Cyrillic alphabet rather than the Latin one. This makes it hard for visitors to translate the words they encounter.

2. Exposition

As stated above, the difficulty in translating a Russian word lies in supplying that word as input to the digital dictionary. Looking the word up in a printed dictionary is difficult, and so is typing it into a digital one. What if we change the way users enter the word? We propose supplying the input as an image captured by the smartphone’s camera. With this approach, we expect the application to make translating Russian words easier for the 28 million visitors who visit Russia (UNWTO, 2014).

However, using an image from the smartphone’s camera raises a new problem. Because a computer cannot read text directly from an image, a preprocessing phase is needed first. Several studies have been conducted in this area. Limbing used a self-organizing map (SOM) to recognize Latin letters in an image and found that SOM could recognize 70.43% of sentences and 55.62% of paragraphs in fonts it had not been trained on (Limbing, 2007). Prarian built an application that uses SOM to recognize Lampung letters in images obtained from a web camera and from screen captures on a computer; it recognized 75% of the Lampung letters (Prarian, 2010). Fauziah likewise used SOM to recognize Korean letters obtained from screen captures, and SOM recognized 97.31% of them (Fauziah, 2010).

3. Results

Source images are taken from the smartphone’s camera or image gallery. Because an image may contain shapes other than the Russian letters, it is cropped to the word of interest. After cropping, the image is preprocessed to make the letters clearer, as shown in Figure 1. The result of preprocessing is a set of isolated characters from the selected word. These characters are then converted to binary form for feature extraction.
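The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed threshold of 128, and the use of blank-column splitting to isolate characters are all assumptions made for the example.

```python
from typing import List, Tuple

Gray = List[List[int]]    # rows of 0-255 grayscale pixels
Binary = List[List[int]]  # rows of 0/1 pixels, where 1 means ink


def binarize(gray: Gray, threshold: int = 128) -> Binary:
    """Mark pixels darker than the (assumed) threshold as ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def split_columns(binary: Binary) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of ink separated by blank columns."""
    width = len(binary[0]) if binary else 0
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                  # a character begins here
        elif not ink and start is not None:
            spans.append((start, x))   # a blank column ends the character
            start = None
    if start is not None:
        spans.append((start, width))   # character touching the right edge
    return spans


def isolate_characters(gray: Gray, threshold: int = 128) -> List[Binary]:
    """Binarize a cropped word image and cut it into per-character bitmaps."""
    binary = binarize(gray, threshold)
    return [[row[a:b] for row in binary] for a, b in split_columns(binary)]


# Tiny 2-row "image": one 1-pixel-wide glyph and one 2-pixel-wide glyph.
word = [[255, 0, 255, 0, 0, 255],
        [255, 0, 255, 255, 0, 255]]
print(len(isolate_characters(word)))
```

Splitting on blank columns is a simplification: a letter made of two separate strokes, such as Ы, would be cut in two, so a real system would need a more careful segmentation step.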

The SOM algorithm is used to recognize the word from the extracted features. The system must be trained before it can recognize words, as shown in Figure 1. Training uses 11 different Cyrillic fonts, each containing 33 uppercase and 33 lowercase letters. For testing, we took 350 photos of different words. The system recognized 292 words correctly and failed to recognize the other 58.
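With 11 fonts and 66 letters per font, the training set described above would contain 11 × 66 = 726 glyphs. The train-then-recognize cycle can be illustrated with a very small SOM. The sketch below is an assumption-laden toy, not the system evaluated here: the class name `TinySOM`, the 2×2 grid, the learning-rate and radius schedules, the sample-based weight initialisation, and the two 3×3 bitmaps are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (binary feature vector, letter label)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


class TinySOM:
    def __init__(self, rows: int, cols: int, samples: List[Sample]) -> None:
        self.samples = samples
        self.coords = [(r, c) for r in range(rows) for c in range(cols)]
        # Assumed initialisation: copy training vectors cyclically into the grid.
        self.weights = [list(samples[i % len(samples)][0])
                        for i in range(len(self.coords))]
        self.labels: List[str] = []

    def winner(self, x: Sequence[float]) -> int:
        """Index of the best-matching unit (closest weight vector)."""
        return min(range(len(self.weights)),
                   key=lambda i: sq_dist(self.weights[i], x))

    def train(self, epochs: int = 50, lr0: float = 0.5,
              radius0: float = 1.5) -> None:
        for epoch in range(epochs):
            decay = 1.0 - epoch / epochs
            lr = lr0 * decay                    # shrinking learning rate
            radius = max(radius0 * decay, 0.5)  # shrinking neighbourhood
            for x, _ in self.samples:
                br, bc = self.coords[self.winner(x)]
                for w, (r, c) in zip(self.weights, self.coords):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])  # pull node toward x
        # Label every node with the letter of its nearest training sample.
        self.labels = [min(self.samples, key=lambda s: sq_dist(s[0], w))[1]
                       for w in self.weights]

    def recognize(self, x: Sequence[float]) -> str:
        return self.labels[self.winner(x)]


# Hypothetical 3x3 bitmaps of two Cyrillic letters, flattened row by row.
GHE = [1, 1, 1, 1, 0, 0, 1, 0, 0]  # Г
PE = [1, 1, 1, 1, 0, 1, 1, 0, 1]   # П
som = TinySOM(2, 2, [(GHE, "Г"), (PE, "П")])
som.train()
print(som.recognize(GHE), som.recognize(PE))
```

A realistic recognizer would use larger bitmaps, one sample per letter per font, and a grid with many more nodes than letters; the toy only shows how the best-matching unit drives both training and recognition.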

4. Conclusion

The system is able to recognize the patterns of Cyrillic letters after being trained with the Self-Organizing Map algorithm. When tested on 350 different Russian words, it recognized 292 of them, or 83.43%, correctly.
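As a quick check, the recognition rate can be recomputed from the word counts given in the Results section. The variable names below are illustrative only.

```python
# Counts reported in the Results section.
recognized = 292
missed = 58

total = recognized + missed      # 350 test words
accuracy = recognized / total    # fraction of words recognized correctly

print(f"{total} words, accuracy = {accuracy:.4%}")  # 350 words, accuracy = 83.4286%
```

The exact ratio is 292/350 ≈ 0.834286, which is 83.43% when rounded to two decimal places and 83.42% when truncated.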

References

Fauziah, F. (2010). Sistem Penerjemah Huruf Korea ke Huruf Latin dan Bahasa Indonesia Berbasis Pengolahan Citra Digital dan Jaringan Syaraf Tiruan Self-Organizing Map (SOM).

Limbing, S. S. (2007). Pembuatan Aplikasi Perangkat Lunak Pengenalan Huruf Cetak pada File Text Hasil Scanning dengan Menggunakan Metode Kohonen Neural Network.

Prarian, C. (2010). Desain dan Implementasi Sistem Penerjemah Aksara Lampung ke Huruf Latin Berbasis Pengolahan Citra Digital dan Jaringan Syaraf Tiruan Self-organizing Map (SOM).

UNWTO. (2014). UNWTO Tourism Highlights, 2014 Edition. United Nations World Tourism Organization.