The International Arab Journal of Information Technology (IAJIT)

..............................
..............................
..............................


Efficient Multiple Pattern Matching Algorithm Based on BMH: MP-BMH

String matching is playing a key role in a wide range of applications in the information computing. Due to its importance, large numbers of different algorithms have been developed over the last 50 years. There are various standards of single and multiple pattern string matching algorithms like Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Aho-Corasick etc. Multiple pattern string matching is very useful in real world applications. Standard benchmark multiple pattern algorithm Aho-Corasick and many other multiple pattern string matching algorithms are not memory and time efficient for wider length and large number of patterns set on large text data set because they are using the concept like DFA and require full scan of text data. Many string matching tools which are currently popular for string matching are based on these algorithms. Using the bad character principle of BMH, a method for multiple pattern string matching is being developed. Using this method a time and memory efficient exact multiple pattern string matching algorithm Multiple Pattern BMH (MP-BMH) is proposed. Unlike Aho-Corasick, the proffered MP-BMH algorithm provides skipping of unnecessary matching of characters in text while searching like BMH Algorithm. It also provides the skipping of characters in patterns using the similarity between patterns. Along with the shifts while searching, this algorithm also provides shrewd switches among the patterns for efficacious matching.In this paper, the aforesaid method, MP-BMH algorithm with its time, memory and experimental analysis are described.


[1] Abou-Assaleh T. and Ai W., “Survey of Global Regular Expression Print (Grep) Tools,” in Proceedings of Citeseer, Topics in Program Comprehension, Nova Scotia, pp. 1-8, 2004.

[2] Aho A. and Corasick M., “Efficient String Matching: An Aid to Bibliographic Search,” Communications of the ACM, vol. 18, no. 6, pp. 333-340, 1975.

[3] Alzahrani S., Salim N., and Abraham A., “Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 2, pp. 133-149, 2012.

[4] Baeza-Yates R. and Gonnet G., “A New Approach to Text Searching,” Communications of the ACM, vol. 35, no. 10, pp. 74-82, 1992.

[5] Baeza-Yates R. and Navarro G., “A Faster Algorithm for Approximate String Matching,” in Proceedings of Annual Symposium on Combinatorial Pattern Matching, Berlin, pp. 1- 23, 1996.

[6] Boyer R. and Moore J., “A Fast String Searching Algorithm,” Communications of the ACM, vol. 20, no. 10, pp. 762-772, 1977.

[7] Brođanac P., Budin L., and Jakobović D., “Parallelized Rabin-Karp Method for Exact String Matching,” in Proceedings of 33rd International Conference on of Information Technology Interfaces, Dubrovnik, pp. 585-590, 2011.

[8] Commentz-Walter B., “A String Matching Algorithm Fast on the Average,” in Proceedings of 6th International Colloquium on Automata, Languages, and Programming, Berlin, pp. 118- 132, 1979.

[9] Dolmiki B., “A Universal Compiler System Based on Production Rules,” BIT Numerical Mathematics, vol. 8, no. 4, pp. 262-275, 1968.

[10] Domolki B., an Algorithm for Syntactical Analysis, Computational Linguistics, 1964.

[11] Fethallah H. and Amine C., “Automated Retrieval of Semantic Web Services: A Matching Based on Conceptual Indexation,” The International Arab Journal of Information Technology, vol. 10, no. 1, pp. 61-66, 2013.

[12] Han Y. and Xu G., “Improved Algorithm of Pattern Matching Based on BMHS,” in Proceedings of IEEE International Conference on Information Theory and Information Security, Beijing, pp. 238-241, 2010.

[13] Horspool R., “Practical Fast Searching in Strings,” Software Practice and Experience, vol. 10, pp. 501-506, 1980. 0 2 4 6 8 10 12 14 16 18 2345678910 MP-BMH running time Time in Seconds Minimum Pattern Size 1128 The International Arab Journal of Information Technology, Vol. 16, No. 6, November 2019

[14] Hume A., “A Tale of Two Greps,” Software Practice and Experience, vol. 18, no. 11, pp. 1063-1072, 1988.

[15] Ke X., Yong C., and Hong Y., “An Improved Wu-Manber Multiple Patterns Matching Algorithm,” in Proceedings of 25th IEEE International Performance, Computing, and Communications Conference, Phoenix, 2006.

[16] Khancome C. and Boonjing V., “New Hashing- Based Multiple String Pattern Matching Algorithms,” in Proceedings of 9th International Conference on Information Technology: New Generations (ITNG), Las Vegas, pp. 195-200, 2012.

[17] Kim B. and Kim I., “Kernel Based Intrusion Detection System,” in Proceedings of 4th Annual ACIS International Conference on Computer and Information Science, Jeju Island, pp. 13-18, 2005.

[18] Kim S. and Kim Y., “A Fast Multiple String- Pattern Matching Algorithm,” in Proceedings of 17th AoM/IAoM Conference on Computer Science, pp. 1-8, 1999.

[19] Knuth D., Morris J., and Pratt V., “Fast Pattern Matching in Strings,” SIAM Journal on Computing, vol. 6, no. 2, pp. 323-350, 1977.

[20] Kouzinopoulos C. and Margaritis K., “A Performance Evaluation of the Pre-processing Phase of Multiple Keyword Matching Algorithms,” in Proceedings of 15th Panhellenic Conference on Informatics, Kastonia, pp. 85-89, 2011.

[21] Lee T., “Generalized Aho-Corasick Algorithm for Signature Based Anti-Virus Applications,” in Proceedings of 16th International Conference on Computer Communications and Networks, Honolulu, pp. 792-797, 2007.

[22] Marturana F., Me G., and Tacconi S., “A Case Study on Digital Forensics in the Cloud,” in Proceedings of International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, Sanya, pp. 111-116, 2012.

[23] Miao C., Chang G., and Wang X., “Filtering Based Multiple String Matching Algorithm Combining q-Grams and BNDM,” in Proceedings of 4th International Conference on Genetic and Evolutionary Computing, Shenzhen, pp. 582-585, 2010.

[24] Muth R. and Manber U., “Approximate Multiple String Search,” in Proceedings of Combinatorial Pattern Matching, Lecture Notes in Computer Science, Berlin, pp. 75-86, 1996.

[25] Myers G., “A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming,” Journal of the ACM, vol. 46, no. 3, pp. 395-415, 1999.

[26] Navarro G. and Raffinot M., “Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata,” Journal of Experimental Algorithmics, vol. 5, no. 4, 2000.

[27] Peltola H. and Tarhio J., “Alternative Algorithms for Bit-Parallel String Matching,” in Proceedings of International Symposium on String Processing and Information Retrieval, Berlin, pp. 80-93, 2003.

[28] Qiang Z., “An Improved Multiple Patterns Matching Algorithm for Intrusion Detection,” in Proceedings of IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, pp. 124-127, 2010.

[29] Raita T., “Tuning the Boyer-Moore-Horspool String Searching Algorithm,” Software Practice and Experience, vol. 22, no. 10, pp. 879-884, 1992.

[30] Salmela L., Tarhio J., and Kytojoki J., “Multi Pattern String Matching with Q-Grams,” Journal of Experimental Algorithmic, vol. 11, pp. 1-19, 2006.

[31] Sánchez D., Martín-Bautista M., Blanco I., and de la Torre C., “Text Knowledge Mining: An Alternative to Text Data Mining,” in Proceedings of IEEE International Conference on Data Mining Workshops, Pisa, pp. 664-672, 2008.

[32] Shyamasundar R., “Precedence Parsing Using Domolki's Algorithm,” International Journal of Computer Mathematics, vol. 6, no. 2, pp. 105- 114, 1977.

[33] Sridhar M., Efficient Algorithms for Multiple Pattern Matching, Doctoral Dissertations, the University of Wisconsin, 1986.

[34] Sunday D., “A Very Fast Substring Search Algorithm,” Communications of the ACM, vol. 33, no. 8, pp. 132-142, 1990.

[35] Tao T. and Mukherjee A., “Multiple-Pattern Matching in LZW Compressed Files Using Aho- Corasick Algorithm,” in Proceedings of International Conference on Information Technology: Coding and Computing, Las Vegas, pp. 482, 2005.

[36] The Bible-Pdf E-Book Version of the Bible, www.holybooks.com/download-bible, Last Visited, 2017.

[37] Thompson K., “Programming Techniques: Regular Expression Search Algorithm,” Communications of the ACM, vol. 11, no. 6, pp. 419-422, 1968.

[38] Wang Y., “A New Method to Obtain The Shift- Table in Boyer-Moore’s String Matching Algorithm,” in Proceedings of 19th International Conference on Pattern Recognition, Tampa, pp. 1-4, 2008.

[39] Watson B. and Zwaan G. “A Taxonomy Of Sub Linear Multiple Keyword Pattern Matching Algorithms,” Science of Computer Programming, vol. 27, no. 2, pp. 85-118, 1996. Efficient Multiple Pattern Matching Algorithm Based on BMH: MP-BMH 1129

[40] Watson B., “Taxonomies and Toolkits of Regular Language Algorithms,” Ph.D. Dissertation, Faculty of Computer Science, Eindhoven University of Technology, 1995.

[41] Watson B., “The Performance of Single- Keyword and Multiple-Keyword Pattern Matching Algorithms,” Eindhoven University of Technology, Computing Science Section, 1994.

[42] Wu P. and Shen H., “The Research and Amelioration of Pattern-matching Algorithm in Intrusion Detection System,” in Proceedings of IEEE 14th International Conference on High Performance Computing and Communication and IEEE 9th International Conference on Embedded Software and Systems, Liverpool, pp. 1712-1715, 2012.

[43] Wu S. and Manber U., “Fast Text Searching: Allowing Errors,” Communications of the ACM, vol. 35, no. 10, pp. 83-91, 1992.

[44] Wu S. and Manber U., “Agrep-A Fast Approximate Pattern- Matching Tool,” in Proceedings of Usenix Winter Technical Conference, San Francisco, pp.153-162, 1992.

[45] Wu S. and Manber U., “Fast Text Searching With Errors,” Technical Report, Department of Computer Science, University of Arizona, Tucson, 1991.

[46] Wu S. and Manber U., “A Fast Algorithm for Multi-Pattern Searching,” Technical Report, Department of Computer Science, 1994.

[47] Xiangyan F., Tinggang X., Yidong D., and Youguang Y., “The Research And Improving For Multi-Pattern String Matching Algorithm,” in Proceedings of IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, pp. 266-270, 2010.

[48] Xie L., Liu X., and Yue G., “Improved Pattern Matching Algorithm of BMHS,” in Proceedings of 3rd International Symposium on Information Science and Engineering, Shanghai, pp. 616-619, 2010.

[49] Xiong Z., “A Composite Boyer-Moore Algorithm for the String Matching Problem,” in Proceedings of International Conference on Parallel and Distributed Computing, Applications and Technologies, Wuhan, pp. 492- 496, 2010.

[50] Yuan J., Yang J., and Ding S., “An Improved Pattern Matching Algorithm Based on BMHS,” in Proceedings of 11th International Symposium on Distributed Computing and Applications to Business, Engineering and Science, Guilin, 2012.

[51] Yuan J., Zheng J., and Ding S., “An Improved Pattern Matching Algorithm,” in Proceedings of 3rd International Symposium on Intelligent Information Technology and Security Informatics, Jinggangshan, pp. 599-603,2010.

[52] Zha X. and Sahni S., “GPU-to-GPU and Host-to- Host Multipattern String Matching on a GPU,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1156-1169, 2013.

[53] Zhang B., Chen X., Ping L., and Wu Z., “Address Filtering Based Wu-Manber Multiple Patterns Matching Algorithm,” in Proceedings of 2nd International Workshop on Computer Science and Engineering, Qingdao, pp. 406-412, 2009. 1130 The International Arab Journal of Information Technology, Vol. 16, No. 6, November 2019 Akhtar Rasool is an Assistant Professor, at Maulana Azad National Institute of Technology. He did M.Tech and PhD in Computer Science and Engineering from Maulana Azad National Institute of Technology. He has published more than 20 research papers in international/national journals and conferences. His research areas include String Matching Algorithms, Parallel Computing, Artificial Intelligence, Big Data Analysis, Software Engineering, Analysis and Design of Algorithms, Cluster and Grid Computing. Gulfishan Firdose Ahmed is an Assistant Professor at College of Agriculture, Powarkheda, affiliated by JNKVV Jabalpur. She did M.Tech from RGPV Bhopal and she received PhD in Computer Science and Engineering from Maulana Azad National Institute of Technology. She has published more than 20 research papers in international/national journals and conferences. Her research areas include wireless network, string Matching Algorithms and high performance computing. Raju Barskar is an Assistant Professor at UIT RGPV Bhopal. He did M.Tech and pursuing PhD in Computer Science and Engineering from Maulana Azad National Institute of Technology. He has published more than 20 research papers in international/national journals and conferences. His research areas include wireless network, VANET, string Matching Algorithms and image processing. Nilay Khare is an associate professor at Maulana Azad National Institute of Technology. He did M.Tech. in Computer Science and Engineering from IIT Delhi. He received Ph.D degree in Computer Science and Engineering. He has been professor in Department of Technical Education, Government of Madhya Pradesh, India. He has also worked as head of State Project Facilitation Unit, Government of Madhya Pradesh, India and as head of department of Computer Science and Engineering, Rajiv Gandhi Technological University, Bhopal, Madhya Pradesh, India. He has published more than 50 research papers in national and international journals. He is also member of Indian Society of Technical Education (ISTE) and Computer Society of India (CSI). His research areas include Wireless Networks, High performance computing and Theoretical Computer Science.