The International Arab Journal of Information Technology (IAJIT)


Prediction of Part of Speech Tags for Punjabi using Support Vector Machines

Part-of-Speech (POS) tagging is the task of assigning the appropriate POS, or lexical category, to each word in a natural language sentence. In this paper, we address the automated annotation of POS tags for Punjabi. We collected a corpus of around 27,000 words, drawn from stories, essays, day-to-day conversations, poems, etc., and divided it into files of different sizes for training and testing. Our approach uses a Support Vector Machine (SVM) to tag Punjabi sentences. To the best of our knowledge, SVMs have not previously been used for tagging Punjabi text. The results show that the SVM-based tagger outperforms the existing taggers. In the existing POS taggers for Punjabi, tagging accuracy for unknown words is lower than for known words, whereas our proposed tagger achieves high accuracy for unknown and ambiguous words. The average accuracy of our tagger is 89.86%, which is better than the existing approaches.
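The distinction between known and unknown words can be made concrete with a small evaluation sketch. It assumes that a word counts as "known" if it occurs in the training vocabulary and as "unknown" otherwise; the function name `accuracy_by_word_type` and the toy tokens and tags are hypothetical placeholders, not data or code from this paper.

```python
from typing import Dict, List, Optional, Set, Tuple

TaggedWord = Tuple[str, str]  # (word, gold POS tag)


def accuracy_by_word_type(
    train_vocab: Set[str],
    gold: List[TaggedWord],
    predicted: List[str],
) -> Dict[str, Optional[float]]:
    """Return tagging accuracy for known words, unknown words, and overall.

    Assumption: a word is "known" if it appears in train_vocab.
    A group with no words gets None instead of an accuracy.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Per group: [number of correct tags, number of words]
    counts = {"known": [0, 0], "unknown": [0, 0]}
    for (word, gold_tag), pred_tag in zip(gold, predicted):
        kind = "known" if word in train_vocab else "unknown"
        counts[kind][1] += 1
        if gold_tag == pred_tag:
            counts[kind][0] += 1

    result: Dict[str, Optional[float]] = {
        kind: (correct / total if total else None)
        for kind, (correct, total) in counts.items()
    }
    all_correct = sum(correct for correct, _ in counts.values())
    all_total = sum(total for _, total in counts.values())
    result["overall"] = all_correct / all_total if all_total else None
    return result


# Toy illustration with placeholder tokens (not corpus data).
toy_vocab = {"w1", "w2", "w3"}
toy_gold = [("w1", "N"), ("w2", "V"), ("w4", "N"), ("w5", "ADJ")]
toy_pred = ["N", "N", "N", "ADJ"]
toy_scores = accuracy_by_word_type(toy_vocab, toy_gold, toy_pred)
```

Reporting the two groups separately, rather than only the overall figure, is what allows a comparison of the kind made above between taggers on unknown words.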

[15] Zribi C., Torjmen A., and Ben Ahmed M., Multi-Agent System for POS-Tagging Vocalized Arabic Texts, The International Arab Journal of Information Technology, vol. 4, no. 4, pp. 322-329, 2007.

Dinesh Kumar is an Associate Professor in the Department of Information Technology at DAV Institute of Engineering and Technology, Jalandhar, Punjab, India. He holds a BTech degree in Computer Science and Engineering and an MTech degree in Information Technology, and is currently pursuing a PhD degree in Computer Engineering at Punjabi University, Patiala. He is a member of IEEE, ISTE, and CSI (Computer Society of India). He has more than 12 years of teaching and research experience. He has supervised more than 10 MTech students in natural language processing, machine learning, computer networks, and image processing.

Gurpreet Josan is an Assistant Professor in the Department of Computer Science at Punjabi University, Patiala, India. He holds a PhD degree in Computer Science from Punjabi University in addition to an MTech degree in Computer Engineering. He has more than 12 years of teaching and research experience. He has supervised many MTech students and is supervising five PhD students in natural language processing, machine learning, and computer networks. He also leads and teaches modules at both BTech and MTech levels in computer science.