The International Arab Journal of Information Technology (IAJIT)


Pruning Based Interestingness of Mined Classification Patterns

Classification is an important problem in data mining. Decision tree induction is one of the most common techniques applied to solve the classification problem, and many induction algorithms have been proposed based on different attribute-selection and pruning strategies. Although the patterns induced by decision trees are easier to interpret and comprehend than those induced by other classification algorithms, a constructed tree may contain hundreds or thousands of nodes, which are difficult for the user who examines the patterns to comprehend and interpret. For this reason, the questions of appropriate tree construction and of good pruning criteria have long been topics of considerable debate. The main objective of such criteria is to create a tree whose classification accuracy on unseen data is maximized while its size is minimized. Most decision tree algorithms first apply a splitting criterion to construct a tree and then prune it to obtain an accurate, simple, and comprehensible model. Even after pruning, however, the resulting tree may be extremely large and may reflect patterns that are not interesting from the user's point of view. In many scenarios, users want to obtain only the interesting patterns; a simple, interpretable, but approximate decision tree may therefore serve them far better than an accurate tree burdened with detail. In this paper, we propose a pruning approach that captures the user's subjectivity in order to discover interesting patterns. The approach computes the subjective interestingness of patterns and uses it as a pruning criterion to prune away uninteresting ones. The proposed framework helps to reduce the size of the induced model and to maintain the model. One notable feature of the approach is that it captures the user's background knowledge, which is monotonically augmented.
The experimental results are quite promising.
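The abstract does not give the pruning algorithm itself, so the following is only a minimal sketch of the general idea of interestingness-based post-pruning: collapse any subtree whose subjective interestingness falls below a user-supplied threshold into a leaf. The `Node` structure, the per-node `interestingness` score, and the `majority_label` helper are all hypothetical illustrations, not the authors' actual method or measure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A decision tree node; a node with no children is a leaf."""
    label: str                        # class label at a leaf, or split attribute
    children: list = field(default_factory=list)
    interestingness: float = 1.0      # hypothetical subjective score in [0, 1]

    def is_leaf(self) -> bool:
        return not self.children

def majority_label(node: Node) -> str:
    # Hypothetical helper: most common class label among the subtree's leaves.
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if n.is_leaf():
            leaves.append(n.label)
        stack.extend(n.children)
    return max(set(leaves), key=leaves.count)

def prune_uninteresting(node: Node, threshold: float) -> Node:
    """Replace any subtree whose interestingness is below `threshold`
    with a single leaf carrying the subtree's majority class label."""
    if node.is_leaf():
        return node
    if node.interestingness < threshold:
        return Node(label=majority_label(node))
    node.children = [prune_uninteresting(c, threshold) for c in node.children]
    return node

def tree_size(node: Node) -> int:
    return 1 + sum(tree_size(c) for c in node.children)
```

Raising the threshold prunes more aggressively, trading accuracy for the smaller, more comprehensible model the abstract argues for; a real implementation would derive the per-node score from the user's stated background knowledge rather than assign it directly.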

Ahmed Al-Hegami received his BSc degree in computer science from King Abdul Aziz University, Saudi Arabia, and his Master's and PhD in computer applications from Jawaharlal Nehru University, New Delhi, India.