The International Arab Journal of Information Technology (IAJIT), Vol. 6, No. 1, January 2009


2009 Filtering Spam E-Mail from Mixed Arabic and

Alaa El-Halees

Alaa El-Halees is an assistant professor in computing and dean of faculty of the Information Technology Department at the Islamic University of Gaza, Palestine. He received a PhD degree in data mining in 2004 and an MSc degree in software development in 1998 from Leeds Metropolitan University, UK. He received his BSc degree in computer engineering in 1989 from the University of Arizona, USA. His research activities are in the area of data mining, in particular text mining, machine learning, and e-learning.