The International Arab Journal of Information Technology (IAJIT)



An Artificial Neural Network Approach for

Sentence boundary identification is an important step for text processing tasks such as machine translation, POS tagging, and text summarization. In this paper, we present an approach that combines a Feed Forward Neural Network (FFNN) with part-of-speech information for the words in a corpus. The proposed adaptive system was tested after training it with varying sizes of data and threshold values. The best results our system produced are 93.05% precision, 99.53% recall, and 96.18% f-measure.
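The three reported scores can be checked against each other. The sketch below assumes the f-measure is the balanced F1 score, i.e., the harmonic mean of precision and recall; the abstract does not state the formula, so this is an assumption, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Best scores reported in the abstract, in percent.
precision, recall = 93.05, 99.53
print(f"{f_measure(precision, recall):.2f}")  # prints 96.18
```

Under that assumption, the computed value agrees with the reported 96.18% f-measure.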


The International Arab Journal of Information Technology, Vol. 12, No. 4, July 2015

Shazia Raj received her BS degree in computer science from COMSATS Institute of Information Technology, Pakistan, in 2012. Currently, she is an MS scholar at the same institute. Her area of research is natural language processing.

Zobia Rehman received her MS degree in computer science from COMSATS Institute of Information Technology, Pakistan, in 2009. Currently, she is a PhD scholar at Lucian Blaga University of Sibiu, Romania. Her area of research is natural language processing.

Sonia Rauf received her MS degree in computer science from COMSATS Institute of Information Technology, Pakistan, in 2010. Currently, she is a lecturer at the same institute. Her areas of research are artificial intelligence, machine learning, and medical image processing.

Rehana Siddique received her BS degree in computer science from COMSATS Institute of Information Technology, Pakistan, in 2012. Her area of research is natural language processing.

Waqas Anwar received his PhD degree in computer science from Harbin Institute of Technology, Harbin, China, in 2008. Currently, he is an Associate Professor in the Department of Computer Science at COMSATS Institute of Information Technology, Pakistan. His areas of research are natural language processing and computational intelligence.