(2) Combining Instance Weighting and Fine Tuning for

Author Khalil El Hindi

Keywords #Naïve Bayesian algorithm #classification #machine learning #noisy data sets #instance weighting

Abstract This work addresses the problem of training a Naïve Bayesian classifier on limited data. It first presents an improved instance-weighting algorithm that is accurate and robust to noise, and then shows how to combine it with a fine-tuning algorithm to achieve even better classification accuracy. Empirical results on 49 benchmark data sets show that the improved instance-weighting method outperforms the original algorithm on both noisy and noise-free data sets. A second set of empirical results indicates that combining the instance-weighting algorithm with the fine-tuning algorithm gives better classification accuracy than using either one alone.
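
The abstract does not say how the instance weights are computed, so the sketch below only illustrates the general idea of instance weighting in a naive Bayes classifier: each training instance contributes its weight, rather than a count of one, to the frequency estimates. The function names, the Laplace smoothing, and the assumption that the weights are supplied by the caller are illustrative choices and should not be read as the paper's algorithm.

```python
from collections import defaultdict
import math


def train_weighted_nb(X, y, weights):
    """Collect weighted frequency counts for a categorical naive Bayes model.

    X: list of tuples of categorical attribute values
    y: list of class labels
    weights: non-negative instance weights (hypothetical input; how to
             choose them is exactly what a weighting algorithm decides)
    """
    class_w = defaultdict(float)   # total weight per class
    value_w = defaultdict(float)   # weight per (attribute index, value, class)
    values = defaultdict(set)      # distinct values seen per attribute
    for xs, c, w in zip(X, y, weights):
        class_w[c] += w
        for j, v in enumerate(xs):
            value_w[(j, v, c)] += w
            values[j].add(v)
    return class_w, value_w, values


def predict(model, xs):
    """Return the class with the highest smoothed log-posterior score."""
    class_w, value_w, values = model
    total = sum(class_w.values())
    n_classes = len(class_w)
    best, best_score = None, -math.inf
    for c, cw in class_w.items():
        # Laplace-smoothed weighted prior
        score = math.log((cw + 1.0) / (total + n_classes))
        for j, v in enumerate(xs):
            # Laplace-smoothed weighted conditional probability
            num = value_w.get((j, v, c), 0.0) + 1.0
            score += math.log(num / (cw + len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data (made up for illustration): one attribute, three instances.
X = [("a",), ("a",), ("a",)]
y = ["pos", "neg", "neg"]

uniform = train_weighted_nb(X, y, [1.0, 1.0, 1.0])
weighted = train_weighted_nb(X, y, [3.0, 1.0, 1.0])

print(predict(uniform, ("a",)))   # neg: the majority class wins
print(predict(weighted, ("a",)))  # pos: the up-weighted instance dominates
```

With uniform weights this reduces to ordinary naive Bayes; raising the weight of a single instance is enough to change the prediction on this toy data, which is the lever any instance-weighting scheme relies on.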

The International Arab Journal of Information Technology, Vol. 15, No. 6, November 2018

Khalil El Hindi is a Professor at the Department of Computer Science, King Saud University. His research interests include machine learning and data mining. He is particularly interested in improving the classification accuracy of Bayesian classifiers and developing new similarity metrics for instance-based learning. He received his B.Sc. degree from Yarmouk University and his M.Sc. and Ph.D. degrees from the University of Exeter, UK.