A Seq to Seq Machine Translation from Urdu to Chinese

Journal: Journal of Autonomous Intelligence DOI: 10.32629/jai.v4i1.359

zeshan ali ali

Xinjiang University

Abstract

Machine translation (MT) is a subfield of computational linguistics concerned with translating between natural languages (NL). Simple word-for-word substitution is not enough to produce the desired result. Neural machine translation (NMT) is now one of the standard machine learning approaches to the task and has brought large improvements in recent years, especially for local and some national languages. However, translation for these languages is still inadequate and deserves more attention. In this research we translate Urdu into Chinese using neural machine translation, a deep learning method. First we build monolingual corpora of Urdu and Chinese; we then train our model using NMT; finally we compare the model's output on the test data with accurate translations using the BLEU score.
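As an illustration of the evaluation step, the following is a minimal sketch of corpus-level BLEU, not the scoring code used in this work. It assumes one reference per hypothesis, uniform weights over 1- to 4-grams, no smoothing, and character-level tokenisation of the Chinese output; the function names `ngrams` and `corpus_bleu` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU, one reference per hypothesis, uniform weights."""
    matches = [0] * max_n  # clipped n-gram matches, one slot per order
    totals = [0] * max_n   # hypothesis n-gram counts, one slot per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            hyp_counts = ngrams(hyp, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Assumed tokenisation: split Chinese text into single characters.
reference = list("我喜欢学习中文")
hypothesis = list("我喜欢学习")
score = corpus_bleu([reference], [hypothesis])
```

Here every n-gram of the shorter hypothesis appears in the reference, so the score is lowered only by the brevity penalty. An established tool with a fixed tokenisation scheme is generally preferable when scores need to be comparable across papers.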

Keywords

Machine Translation; Deep Learning; Neural Machine Translation; Urdu Language; Chinese Language

Copyright © 2021 zeshan ali ali

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License