Classification is an important problem in data mining. Decision tree induction is one of the most common
techniques applied to solve the classification problem, and many decision tree induction algorithms have been proposed
based on different attribute selection and pruning strategies. Although the patterns induced by decision trees are easy to
interpret and comprehend compared to the patterns induced by other classification algorithms, the constructed decision trees
may contain hundreds or thousands of nodes, which are difficult for the user who examines the patterns to comprehend and
interpret. For these reasons, the questions of appropriate construction and of good pruning criteria have long been
topics of considerable debate. The main objective of such criteria is to create a tree whose classification accuracy, when
used on unseen data, is maximized while the tree size is minimized. Most decision tree algorithms first apply splitting
criteria to construct a tree and then prune it to obtain an accurate, simple, and comprehensible tree. Even after pruning,
the constructed decision tree may be extremely large and may reflect patterns that are not interesting from the user's point of
view. In many scenarios, users are interested only in obtaining patterns that are interesting to them; thus, users may prefer
a simple and interpretable, though only approximate, decision tree over an accurate tree that involves a lot of detail.
In this paper, we propose a pruning approach that captures the user's subjectivity to discover interesting patterns. The
approach computes the subjective interestingness and uses it as a pruning criterion to prune away uninteresting patterns. The
proposed framework helps in reducing the size of the induced model and maintaining the model. One of the features of the
proposed approach is that it captures the user's background knowledge, which is monotonically augmented. The experimental results
are quite promising.
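The core idea of the abstract, using an interestingness score as a pruning criterion, can be illustrated with a minimal sketch. Note this is not the paper's actual algorithm: the `Node` class, the precomputed per-node scores, and the fixed threshold are all hypothetical stand-ins for whatever the proposed framework defines.

```python
# Illustrative sketch only: pruning a decision tree by a per-node
# interestingness score. Scores here are assumed to be precomputed
# values in [0, 1]; the real framework derives them from user knowledge.

class Node:
    def __init__(self, label, children=None, interestingness=1.0):
        self.label = label
        self.children = children or []
        self.interestingness = interestingness

def prune(node, threshold):
    """Recursively drop subtrees whose score falls below the threshold."""
    node.children = [prune(c, threshold) for c in node.children
                     if c.interestingness >= threshold]
    return node

def size(node):
    """Count the nodes in the (sub)tree rooted at node."""
    return 1 + sum(size(c) for c in node.children)

# A toy tree: one uninteresting branch (score 0.3) and one interesting one.
root = Node("age<30", [
    Node("income=high", interestingness=0.9),
    Node("income=low", [Node("student=yes", interestingness=0.2)],
         interestingness=0.3),
])
print(size(root))              # 4 nodes before pruning
print(size(prune(root, 0.5)))  # 2 nodes after pruning
```

Pruning whole subtrees whenever the subtree root scores below the threshold is what shrinks the induced model while keeping only branches the user would find interesting.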
Ahmed Al-Hegami received his BSc degree in computer science from King Abdul Aziz University, Saudi Arabia, and his Master's and PhD in computer applications from Jawaharlal Nehru University, New Delhi, India.
Cite this
Department of Computer Science, University of Sana'a, Yemen, "Pruning Based Interestingness of Mined Classification Patterns", The International Arab Journal of Information Technology (IAJIT), vol. 6, no. 4, pp. 81-88, October 2009.
@ARTICLE{4434,
author={Department of Computer Science, University of Sana'a, Yemen},
journal={The International Arab Journal of Information Technology (IAJIT)},
title={Pruning Based Interestingness of Mined Classification Patterns},
volume={6},
number={4},
pages={81-88},
month={Oct},
year={2009}
}
TY - JOUR
TI - Pruning Based Interestingness of Mined Classification Patterns
AU - Department of Computer Science, University of Sana'a, Yemen
JO - The International Arab Journal of Information Technology (IAJIT)
VL - 6
IS - 4
SP - 81
EP - 88
SN - 2413-9351
PY - 2009
Y1 - 2009/10
ER -
URL: https://iajit.org/paper/4434