Received date September 18, 2013

Accepted date February 28, 2014

Prediction of Part of Speech Tags for Punjabi using Support Vector Machines

Keywords POS tagging, SVM, feature set, vectorization, machine learning, tagger, Punjabi, Indian languages

Abstract Part-of-speech (POS) tagging is the task of assigning the appropriate POS or lexical category to each word in a natural language sentence. In this paper, we work on automated annotation of POS tags for Punjabi. We collected a corpus of around 27,000 words, drawn from stories, essays, day-to-day conversations, poems, etc., and divided it into files of different sizes for training and testing. Our approach uses a Support Vector Machine (SVM) to tag Punjabi sentences. To the best of our knowledge, SVMs have never been used for tagging Punjabi text. The results show that the SVM-based tagger outperforms the existing taggers. In the existing POS taggers for Punjabi, tagging accuracy for unknown words is lower than for known words, whereas our proposed tagger achieves high accuracy for unknown and ambiguous words. The average accuracy of our tagger is 89.86%, which is better than the existing approaches.
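The abstract reports accuracy separately for known and unknown words. As an illustration only, the following minimal Python sketch shows one way such a split could be computed, assuming a word is "known" when it occurs in the training vocabulary. The function name `accuracy_by_word_type`, the romanized toy tokens, and the tag labels are hypothetical and are not taken from the paper or its tagset.

```python
from typing import Dict, List, Set, Tuple

TaggedWord = Tuple[str, str]  # (word, gold tag)


def accuracy_by_word_type(
    train_vocab: Set[str],
    gold: List[TaggedWord],
    predicted_tags: List[str],
) -> Dict[str, float]:
    """Return tagging accuracy overall and split into known/unknown words.

    Assumption: a word is "known" if it appears in the training vocabulary.
    """
    if len(gold) != len(predicted_tags):
        raise ValueError("gold and predicted_tags must have the same length")

    correct = {"known": 0, "unknown": 0}
    total = {"known": 0, "unknown": 0}
    for (word, gold_tag), pred_tag in zip(gold, predicted_tags):
        kind = "known" if word in train_vocab else "unknown"
        total[kind] += 1
        if pred_tag == gold_tag:
            correct[kind] += 1

    # Per-group accuracy; NaN when a group has no words at all.
    result = {
        kind: correct[kind] / total[kind] if total[kind] else float("nan")
        for kind in total
    }
    n = sum(total.values())
    result["overall"] = sum(correct.values()) / n if n else float("nan")
    return result


# Toy data: hypothetical romanized tokens and made-up tag labels.
train_vocab = {"munda", "kitab", "hai"}
gold = [("munda", "NN"), ("kitab", "NN"), ("parhda", "VB"), ("hai", "AUX")]
predicted = ["NN", "NN", "NN", "AUX"]

scores = accuracy_by_word_type(train_vocab, gold, predicted)
print(scores)
```

In this toy run the single unknown word is mistagged, so the unknown-word accuracy falls below the known-word accuracy, which is the pattern the abstract attributes to the existing Punjabi taggers.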

The International Arab Journal of Information Technology, Vol. 13, No. 6, November 2016

Dinesh Kumar is an Associate Professor in the Department of Information Technology at DAV Institute of Engineering and Technology, Jalandhar, Punjab, India. He holds a BTech degree in Computer Science and Engineering and an MTech degree in Information Technology, and is currently pursuing a PhD degree in Computer Engineering at Punjabi University, Patiala. He is a member of IEEE, ISTE, and CSI (Computer Society of India). He has more than 12 years of teaching and research experience. He has supervised more than 10 MTech students in natural language processing, machine learning, computer networks, and image processing.

Gurpreet Josan is an Assistant Professor in the Department of Computer Science at Punjabi University, Patiala, India. He holds a PhD degree in Computer Science from Punjabi University in addition to an MTech degree in Computer Engineering. He has more than 12 years of teaching and research experience. He has supervised many MTech students and is supervising five PhD students in natural language processing, machine learning, and computer networks. He also leads and teaches modules at both BTech and MTech levels in computer science.