The International Arab Journal of Information Technology (IAJIT)


A Bi-Level Text Classification Approach for SMS Spam Filtering and Identifying Priority Messages

Short Message Service (SMS) traffic continues to grow, with trillions of messages exchanged by billions of users, and spam messages are increasing in the same proportion. A number of recent advances have been made in SMS spam detection and filtering. The objective of this work is twofold: first, to identify and separate spam messages from a collection of SMS messages, and second, to identify the priority (important) messages among the filtered non-spam messages, so that SMS messages can be managed and handled effectively. The work is organized as two levels of binary classification. At the first level, SMS messages are categorized into two classes, spam and non-spam, using popular binary classifiers; at the second level, the non-spam messages are further categorized into priority and normal messages. Four popular, state-of-the-art text classification techniques, namely Naïve Bayes (NB), Support Vector Machine (SVM), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), are used to categorize the SMS text messages at the different levels of classification. The proposed bi-level classification model is evaluated using accuracy and F-measure. Combinations of classifiers at both levels are compared, and the experiments show that the SVM algorithm performs better both for filtering spam messages and for categorizing priority messages.
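The two-level cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a small hand-written multinomial Naive Bayes at both levels (the paper also evaluates SVM, LDA, and NMF), and the class names `TinyNaiveBayes` and `BiLevelClassifier`, the label strings, and the six training messages are all hypothetical, invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class BiLevelClassifier:
    """Level 1: spam vs. non-spam. Level 2: priority vs. normal (non-spam only)."""

    def __init__(self, level1, level2):
        self.level1, self.level2 = level1, level2

    def fit(self, texts, labels):
        # Level 1 sees every message, with labels collapsed to two classes.
        level1_labels = ["spam" if y == "spam" else "non-spam" for y in labels]
        self.level1.fit(texts, level1_labels)
        # Level 2 is trained only on the non-spam messages.
        ham = [(t, y) for t, y in zip(texts, labels) if y != "spam"]
        self.level2.fit([t for t, _ in ham], [y for _, y in ham])
        return self

    def predict(self, text):
        if self.level1.predict(text) == "spam":
            return "spam"
        return self.level2.predict(text)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy training data (invented for this sketch, not taken from the paper).
train = [
    ("win a free prize call now", "spam"),
    ("free cash offer claim your prize", "spam"),
    ("urgent meeting moved to 3pm today", "priority"),
    ("your bank otp code is 4821", "priority"),
    ("see you at lunch tomorrow", "normal"),
    ("happy birthday have a great day", "normal"),
]
texts, labels = zip(*train)
model = BiLevelClassifier(TinyNaiveBayes(), TinyNaiveBayes()).fit(texts, labels)

for message in ["claim your free prize now", "meeting moved to today urgent"]:
    print(message, "->", model.predict(message))
```

Because each level is an independent binary classifier, any of the four techniques named in the abstract could in principle be substituted at either level, which is what makes comparing classifier combinations possible.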


The International Arab Journal of Information Technology, Vol. 14, No. 4, July 2017

Naresh Kumar Nagwani graduated in Computer Science and Engineering in 2001 from G. G. Central University, Bilaspur. He completed his Master of Technology in Information Technology at ABV-Indian Institute of Information Technology, Gwalior in 2005 and his Ph.D. in Computer Science and Engineering at National Institute of Technology Raipur, India in 2013. His areas of interest are data mining, text mining, mining software repositories, and information retrieval. His employment experience includes Software Developer and Team Lead at Persistent Systems Limited, and he is presently an Assistant Professor at NIT Raipur. He has published more than 20 research papers in various journals and conferences.