Received date October 9, 2015

Accepted date August 24, 2016

Rough Set-Based Reduction of Incomplete Medical Datasets by Reducing the Number of Missing Values

Author Luai Al Shalabi

Keywords #Data mining #rough set theory #missing values #reduct

Abstract This paper proposes a model with two aims: first, dimensionality reduction of noisy medical datasets by minimizing the number of missing values, which is achieved by splitting the original dataset; and second, a high-quality generated reduct. The original dataset is split into two subsets: the first contains the complete records, and the second contains imputed records that previously had missing values. The reducts of the two subsets are computed using rough set theory and then merged. The reduct of the merged attributes is constructed and tested using Rule Based and Decomposition Tree classifiers. The Hepdata dataset, in which 59% of the tuples have one or more missing values, is the main dataset used throughout this article. The proposed algorithm performs effectively and the results are as expected. The dimension of the reduct generated by the Proposed Model (PM) is 10% smaller than that of the reduct generated by the Rough Set Model (RSM). The proposed model was also tested on other incomplete medical datasets. Significant and insignificant differences between RSM and PM are shown in Tables 1-5.
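The pipeline summarized in the abstract (split, impute, reduce each subset, merge, reduce again) can be illustrated with a small sketch. This is not the author's implementation: the abstract does not specify the imputation method or the reduct search, so the sketch assumes mode imputation and a greedy, dependency-degree-based forward selection. All function names, the `MISSING` marker, and the toy data are hypothetical, and the decision attribute is assumed to be always present.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for a missing value


def split_by_completeness(rows):
    """Split records into those without and those with missing values."""
    complete = [r for r in rows if MISSING not in r.values()]
    incomplete = [r for r in rows if MISSING in r.values()]
    return complete, incomplete


def impute_mode(incomplete, reference, attrs):
    """Assumed imputation: fill each gap with the attribute's most common known value."""
    modes = {}
    for a in attrs:
        known = [r[a] for r in reference if r[a] is not MISSING]
        modes[a] = Counter(known).most_common(1)[0][0]
    return [
        {k: (modes[k] if (k in modes and v is MISSING) else v) for k, v in r.items()}
        for r in incomplete
    ]


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: share of rows in decision-consistent classes."""
    if not rows:
        return 0.0
    decisions = defaultdict(set)
    sizes = Counter()
    for r in rows:
        key = tuple(r[a] for a in attrs)
        decisions[key].add(r[decision])
        sizes[key] += 1
    consistent = sum(sizes[k] for k, d in decisions.items() if len(d) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, attrs, decision):
    """Greedy forward selection; may return a superset of a minimal reduct."""
    target = dependency(rows, attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max(
            (a for a in attrs if a not in reduct),
            key=lambda a: dependency(rows, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct


def proposed_model_sketch(rows, attrs, decision):
    """Reduce each subset, merge the selected attributes, then reduce once more."""
    complete, incomplete = split_by_completeness(rows)
    imputed = impute_mode(incomplete, rows, attrs)
    r1 = greedy_reduct(complete, attrs, decision)
    r2 = greedy_reduct(imputed, attrs, decision)
    merged = [a for a in attrs if a in r1 or a in r2]
    return greedy_reduct(complete + imputed, merged, decision)


# Toy data (hypothetical): three condition attributes and a decision "d".
toy_rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": None, "d": 1},
    {"a": None, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": None, "c": 0, "d": 0},
]
print(proposed_model_sketch(toy_rows, ["a", "b", "c"], "d"))
```

On the toy data the sketch keeps two of the three condition attributes. A greedy search is used only to keep the example short; exhaustive or genetic reduct searches, such as those offered by rough set toolkits, may find smaller reducts.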

The International Arab Journal of Information Technology, Vol. 16, No. 2, March 2019

Luai Al Shalabi was born in Jordan in 1971. He received the B.S. degree in computer science from Yarmouk University, Jordan, in 1992, the M.S. degree in image interpretation from Universiti Sains Malaysia, Malaysia, in 1996, and the Ph.D. degree in data mining from University Putra Malaysia, Malaysia, in 2000. He works in the Information Technology Department at Arab Open University in Kuwait. His research interests include data mining, knowledge discovery, and machine learning.