A Hybrid Approach for Urdu Sentence Boundary Disambiguation

Sentence boundary identification is a preliminary step in preparing a text document for natural language processing tasks such as machine translation, POS tagging, and text summarization. We present a hybrid approach to Urdu sentence boundary disambiguation that combines a unigram statistical model with a rule-based algorithm. With this approach we obtained 99.48% precision, 86.35% recall, and 92.45% F1-measure when the training and testing data were different, and 99.36% precision, 96.45% recall, and 97.89% F1-measure when the same data were used for training and testing.
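The F1-measure quoted above is the harmonic mean of precision and recall. As a minimal sketch, the snippet below recomputes it from the precision and recall figures given in the abstract. The function name `f1_measure` and the dictionary labels are illustrative choices, not taken from the paper.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract, in percent.
# The setting labels are our own shorthand for the two experiments.
reported = {
    "different training and testing data": (99.48, 86.35),
    "same training and testing data": (99.36, 96.45),
}

for setting, (p, r) in reported.items():
    print(f"{setting}: F1 = {f1_measure(p, r):.2f}%")
```

Recomputing from the rounded figures gives about 92.45% for the first setting and about 97.88% for the second. The second value differs from the reported 97.89% by 0.01, which may simply reflect precision and recall having been rounded before publication.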

Zobia Rehman has been a lecturer at COMSATS Institute of Information Technology, Pakistan, since October 2009. She received her MS in computer science from COMSATS in 2009. Her areas of interest are natural language processing and artificial neural networks.

Waqas Anwar has been an assistant professor at COMSATS Institute of Information Technology, Pakistan, since April 2008. He received his PhD in computer application technology from Harbin Institute of Technology, PR China, in 2008, and his Master's degree in computer science from Hamdard University, Pakistan, in 2001. He is an active researcher whose areas of interest are natural language processing and computational intelligence.