An Anomaly Detection Method for Weighted Data Based on Feature Association Analysis
In recent years, weighted data have appeared increasingly often in many applications, and anomalies in such data reduce the accuracy of data-based operations, so anomaly detection is needed to improve data quality. However, existing anomaly detection methods for weighted data consider either Weighted Frequent Itemsets (WFIs) or Weighted Rare Itemsets (WRIs) alone, which makes their detection accuracy heavily dependent on the preset minimal weighted support (min_wsup) value. To address this issue, we propose ADWD, an anomaly detection method for weighted data based on feature association analysis, which detects anomalies accurately under different min_wsup values by considering both WFIs and WRIs. ADWD first deletes infrequent 1-itemsets while constructing the Weighted Frequent Itemset-based Tree (WFI-Tree), which reduces the time spent querying extensible itemsets. Next, ADWD defines three deviation metrics that jointly account for the possible influencing factors and uses them to calculate each transaction's abnormal score. Finally, the transactions with the top-ranked abnormal scores are judged to be anomalies. Extensive experiments on three datasets show that ADWD detects anomalies in weighted data more accurately and with less running time, and that it scales well.
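The pipeline summarized above (weighted support, pruning of infrequent 1-itemsets before tree construction, WFI/WRI split, top-ranked scoring) can be illustrated with a minimal Python sketch. It assumes a common transaction-weight definition of weighted support: tw(T) is the mean weight of the items in T, and wsup(X) is the share of the total transaction weight carried by transactions that contain X. ADWD's actual formula may differ. All function names are hypothetical, the WFI-Tree itself is not built (the sketch stops at the pruned, support-ordered transactions that would be inserted into it), and `placeholder_score` is a stand-in, not ADWD's three deviation metrics.

```python
from collections import defaultdict
import heapq

# ASSUMED weighted-support definition (not necessarily the paper's):
#   tw(T)   = mean weight of the items in transaction T
#   wsup(X) = sum of tw(T) over T containing X / sum of tw(T) over all T
# Under this definition wsup is anti-monotone, so pruning a weighted-infrequent
# 1-itemset cannot discard any weighted-frequent superset.

def transaction_weight(transaction, item_weights):
    """Mean item weight of a non-empty transaction."""
    return sum(item_weights[i] for i in transaction) / len(transaction)

def item_wsup(transactions, item_weights):
    """Weighted support of every 1-itemset."""
    tws = [transaction_weight(t, item_weights) for t in transactions]
    total = sum(tws)
    acc = defaultdict(float)
    for t, tw in zip(transactions, tws):
        for item in set(t):
            acc[item] += tw
    return {item: s / total for item, s in acc.items()}

def split_wfi_wri(wsup, min_wsup):
    """Separate weighted-frequent from weighted-rare 1-itemsets."""
    wfis = {i for i, s in wsup.items() if s >= min_wsup}
    return wfis, set(wsup) - wfis

def prune_infrequent_items(transactions, wsup, min_wsup):
    """Drop weighted-infrequent items, then order the rest by descending wsup
    (the usual insertion order for an FP-tree-style prefix tree)."""
    keep = {i for i, s in wsup.items() if s >= min_wsup}
    return [sorted((i for i in t if i in keep), key=lambda i: (-wsup[i], i))
            for t in transactions]

def placeholder_score(transaction, wsup):
    """HYPOTHETICAL stand-in for ADWD's deviation metrics: transactions made
    of low-wsup (rare) items receive higher scores."""
    return 1.0 - sum(wsup[i] for i in transaction) / len(transaction)

def top_k_anomalies(transactions, wsup, k):
    """Indices of the k transactions with the highest scores."""
    scored = ((placeholder_score(t, wsup), idx) for idx, t in enumerate(transactions))
    return [idx for _, idx in heapq.nlargest(k, scored)]

if __name__ == "__main__":
    transactions = [["a", "b"], ["a", "b", "c"], ["a", "b"], ["d"]]
    item_weights = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3}
    wsup = item_wsup(transactions, item_weights)
    print(split_wfi_wri(wsup, 0.3))
    print(prune_infrequent_items(transactions, wsup, 0.3))
    print(top_k_anomalies(transactions, wsup, 2))  # -> [3, 1]
```

In this toy run the transaction consisting only of the rare item `d` ranks first. Because the pruning step removes items before the tree is built, any real scoring stage that relies on WRIs has to keep the rare items returned by `split_wfi_wri` rather than reading them back from the tree.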
The International Arab Journal of Information Technology, Vol. 21, No. 1, January 2024