The International Arab Journal of Information Technology (IAJIT)


Data Streams Oriented Outlier Detection Method: A Fast Minimal Infrequent Pattern Mining

Outlier detection is a common method for analyzing data streams. Most existing outlier detection methods compute distances between points to solve specific outlier detection problems. However, these methods are computationally expensive and cannot process data streams quickly. Outlier detection methods based on pattern mining address these issues, but the existing ones are inefficient and cannot meet the requirement of mining data streams quickly. To improve efficiency, this paper proposes a new outlier detection method. First, a fast minimal infrequent pattern mining method is proposed to mine minimal infrequent patterns from data streams. Second, an efficient outlier detection algorithm is proposed that detects outliers in data streams based on the mined minimal infrequent patterns. The proposed algorithm is evaluated on real telemetry data from a satellite in orbit. The experimental results show that the proposed method can be applied to satellite outlier detection and outperforms the existing methods.

The International Arab Journal of Information Technology, Vol. 18, No. 6, November 2021

ZhongYu Zhou received a BSc degree from Anhui University of Science and Technology in 2017. He is currently a Ph.D. candidate at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics. He has published in national key journals. His research interests include data mining, outlier detection, and big data analysis.

DeChang Pi received a Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2002. He is a full professor in the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, where he teaches and supervises M.S. and Ph.D. students. He has published more than 100 academic papers, obtained 20 computer software copyrights, and been granted 10 invention patents. His research interests include data mining and big data management and analysis. He is a senior member of the China Computer Federation (CCF).