The International Arab Journal of Information Technology (IAJIT)


FAAD: A Self-Optimizing Algorithm for Anomaly Detection

Anomaly (outlier) detection is the process of finding abnormal data points in a dataset or data stream. Most anomaly detection algorithms require parameters to be set, and these parameters significantly affect performance. They are generally set by trial and error, so performance is compromised when default or arbitrary values are used. In this paper, the authors propose a self-optimizing algorithm for anomaly detection based on the firefly meta-heuristic, named the Firefly Algorithm for Anomaly Detection (FAAD). The proposed solution is a non-clustering, unsupervised learning approach to anomaly detection. The algorithm is implemented on Apache Spark for scalability, so the solution can also handle big data. Experiments were conducted on various datasets, and the results show that the proposed solution is much more accurate than standard anomaly detection algorithms.
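The abstract does not spell out how FAAD applies the firefly meta-heuristic, so the following is only a minimal sketch of the generic firefly search it builds on, not the authors' method. Each firefly is a candidate parameter vector, and a firefly moves toward any other firefly with a lower loss using the standard update x_i <- x_i + beta0 * exp(-gamma * r^2) * (x_j - x_i) + random step. The function names, the two parameters (`k`, `threshold`), their bounds, and the toy loss are all hypothetical stand-ins for whatever quality measure a real detector would expose.

```python
import math
import random


def firefly_minimize(loss, bounds, n_fireflies=15, n_iter=50,
                     alpha=0.2, beta0=1.0, gamma=0.01, seed=0):
    """Generic firefly search over a box; returns (best_position, best_loss)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box constraints.
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    losses = [loss(x) for x in swarm]
    best_idx = min(range(n_fireflies), key=losses.__getitem__)
    best_x, best_loss = list(swarm[best_idx]), losses[best_idx]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if losses[j] < losses[i]:  # firefly j is "brighter" than i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction fades with distance
                    for d in range(dim):
                        lo, hi = bounds[d]
                        step = alpha * (rng.random() - 0.5) * (hi - lo)
                        moved = swarm[i][d] + beta * (swarm[j][d] - swarm[i][d]) + step
                        swarm[i][d] = min(hi, max(lo, moved))  # clamp to the box
                    losses[i] = loss(swarm[i])
                    if losses[i] < best_loss:  # remember the best point ever seen
                        best_x, best_loss = list(swarm[i]), losses[i]
        alpha *= 0.97  # gradually shrink the random step
    return best_x, best_loss


def toy_detector_loss(params):
    """Hypothetical stand-in for a detector's error as a function of its parameters."""
    k, threshold = params
    return (k - 10.0) ** 2 / 100.0 + (threshold - 2.5) ** 2


# Hypothetical parameter ranges: neighbourhood size k and a score threshold.
bounds = [(1.0, 50.0), (0.0, 5.0)]
best_params, best_loss = firefly_minimize(toy_detector_loss, bounds)
print("best parameters:", best_params, "loss:", best_loss)
```

In a real setting the toy loss would be replaced by a measure computed from the data, and on Apache Spark the per-firefly loss evaluations are the natural part to distribute; both choices are assumptions here rather than details taken from the paper.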

The International Arab Journal of Information Technology, Vol. 17, No. 2, March 2020

Adeel Hashmi is a PhD scholar at Jamia Millia Islamia. He received his BTech and MTech from IP University, Delhi. He has over 8 years of teaching and research experience. His research interests are machine learning, data mining, and big data.

Tanvir Ahmad is a Professor in the Department of Computer Engineering at the Faculty of Engineering and Technology. He has over 20 years of teaching and research experience. He has multiple publications in reputed international journals and conferences.