Automatic Classification and Filtering of Electronic Information: Knowledge-Based Filtering Approach

Authors Omar Nouali, Philippe Blache

Keywords #information filtering #expert systems #machine learning #neural networks #relevance feedback #genetic algorithms

Abstract

In this paper we propose an artificial intelligence approach to the information filtering problem. First, we give an overview of the information filtering process and survey different models of textual information filtering. Second, we present our e-mail filtering tool. It consists of an expert system that drives the filtering process in cooperation with a knowledge-based model. Neural networks are used to model all of the system's knowledge. The system relies on machine learning techniques to learn continuously and improve its knowledge throughout its life cycle. The tool assists the user in managing, selecting, and classifying messages, and in discarding undesirable ones, in a professional or non-professional context. Its modular structure makes it portable and easy to adapt to other filtering applications such as web browsing. Finally, the performance of the system is discussed.
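To make the combination of rule-driven filtering and continuous learning more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hand-written rule (a blocked-sender list) standing in for the expert system, and a single sigmoid unit over word features standing in for the neural network. The class name `FeedbackFilter`, its methods, the learning rate, and the example addresses are all hypothetical.

```python
import math


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


class FeedbackFilter:
    """Hypothetical filter: explicit rules first, then a one-neuron model."""

    def __init__(self, blocked_senders=(), learning_rate=0.5):
        self.blocked_senders = set(blocked_senders)  # rule knowledge (assumed)
        self.weights = {}                            # learned per-word weights
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, text):
        """Sigmoid output in (0, 1): estimated chance the message is unwanted."""
        z = self.bias + sum(self.weights.get(t, 0.0) for t in set(tokenize(text)))
        z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def classify(self, sender, text):
        """Apply the rule first; otherwise fall back to the learned model."""
        if sender in self.blocked_senders:
            return "discard"
        return "discard" if self.score(text) > 0.5 else "keep"

    def feedback(self, text, unwanted):
        """Relevance feedback: one gradient step toward the user's judgment."""
        error = (1.0 if unwanted else 0.0) - self.score(text)
        self.bias += self.learning_rate * error
        for t in set(tokenize(text)):
            self.weights[t] = self.weights.get(t, 0.0) + self.learning_rate * error


if __name__ == "__main__":
    f = FeedbackFilter(blocked_senders={"ads@example.com"})
    f.feedback("win a free prize now", unwanted=True)
    f.feedback("meeting agenda for the project", unwanted=False)
    print(f.classify("colleague@example.org", "free prize inside"))
    print(f.classify("colleague@example.org", "project meeting tomorrow"))
```

In this sketch the rule and the learned model are kept separate, so the rule set can be edited by hand while the weights change only through user feedback. A real system would likely use richer rules and a larger network.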

The International Arab Journal of Information Technology, Vol. 1, No. 1, January 2004

Omar Nouali received his Engineer degree in computer science in 1988 from Houari Boumediene University of Science and Technology (USTHB), and his Master's degree (Magister) in computer science in 1991 from the Advanced Technology Center, Algiers, Algeria. He currently holds the position of “Responsible of research” in the basic software laboratory. His research interests include artificial intelligence, expert systems, neural networks, natural language processing, information filtering, and human-computer interfaces.

Philippe Blache is a “Research Director” at the CNRS (Laboratoire Parole et Langage, Université de Provence). His work concerns the implementation of linguistic theories and the development of NLP applications, especially parsing, dialogue, and alternative communication. He also holds several international responsibilities in associations and foundations in the field of computational linguistics (board member of the EACL, ESSLLI, ATALA, etc.).