Based on Correlation Analysis and K-Means: An Anomaly Detection Algorithm for Seasonal Time-Series Data
Anomaly detection is widely used in fields such as data processing, intrusion detection, and financial fraud prevention, where it helps avoid accidents and economic losses. For time series anomaly detection, which deals with numerical sequences ordered in time (e.g., urban temperatures, sales data, stock market trends), the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is a well-suited choice. This paper presents an improved anomaly detection algorithm tailored to seasonal time series data. Autocorrelation coefficients are combined with the K-means algorithm to produce clustering results that are precise to the date level; DBSCAN is then applied for detection, which allows the enhanced algorithm to capture more local anomalies. In experiments on daily temperature data from Beijing and Sanya in 2023, the enhanced algorithm detected 11.6% and 78% more anomalies, respectively, than the original algorithm, which supports the feasibility of the approach.
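The idea summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the autocorrelation coefficients feed into K-means, so here the coefficient is only computed as a check that the series is strongly time-dependent, and the clustering step is assumed to be K-means on a centered moving average of the daily values. All function names, the synthetic series, and the parameters (`k`, `half_width`, `eps`, `min_pts`) are hypothetical choices made for illustration, and the code uses only the Python standard library.

```python
import math

def autocorr(x, lag):
    """Sample autocorrelation coefficient of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

def moving_average(x, half_width):
    """Centered moving average; the window is truncated at both ends."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's K-means on scalars; returns one cluster label per value."""
    srt = sorted(values)
    # Deterministic initialization: spread the centres over the sorted values.
    centres = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = sum(members) / len(members)
    return labels

def dbscan_noise_1d(values, eps, min_pts):
    """Indices that DBSCAN labels as noise: points that are not core points
    and have no core point within eps. Brute-force O(n^2) neighbor search."""
    n = len(values)
    neigh = [[j for j in range(n) if abs(values[i] - values[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]  # a point counts itself
    return [i for i in range(n)
            if not core[i] and not any(core[j] for j in neigh[i])]

def seasonal_dbscan_anomalies(series, k=3, half_width=7, eps=1.0, min_pts=5):
    """Assumed pipeline: label each date by K-means on the smoothed series,
    then run DBSCAN separately on the raw values of each cluster."""
    labels = kmeans_1d(moving_average(series, half_width), k)
    anomalies = []
    for c in range(k):
        days = [d for d, lab in enumerate(labels) if lab == c]
        noise = dbscan_noise_1d([series[d] for d in days], eps, min_pts)
        anomalies.extend(days[i] for i in noise)
    return sorted(anomalies)

def synthetic_year():
    """Hypothetical daily temperatures: one seasonal cycle plus a small
    deterministic wiggle, with a spring-like value injected in midsummer."""
    temps = [15 - 15 * math.cos(2 * math.pi * d / 365) + 0.5 * math.sin(1.7 * d)
             for d in range(365)]
    temps[200] = 15.0  # local anomaly: ordinary over the year, unusual in summer
    return temps

if __name__ == "__main__":
    temps = synthetic_year()
    print("lag-1 autocorrelation:", round(autocorr(temps, 1), 3))
    print("global DBSCAN noise:", dbscan_noise_1d(temps, 1.0, 5))
    print("per-cluster DBSCAN noise:", seasonal_dbscan_anomalies(temps))
```

In this toy series the value injected at day 200 sits at a typical spring level, so a single DBSCAN pass over the whole year treats it as part of a dense region, while the pass restricted to the warm cluster isolates it. This is the kind of local anomaly the abstract refers to, though the real algorithm and its gains on the Beijing and Sanya data may rest on different design choices.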
The International Arab Journal of Information Technology, Vol. 21, No. 6, November 2024