The International Arab Journal of Information Technology (IAJIT)


Constraint-Based Sequential Pattern Mining: A Pattern Growth Algorithm Incorporating

Sequential pattern mining is useful in several applications. For example, it can uncover the sequential purchasing behavior of the majority of customers from a large number of customer transactions. However, existing research on discovering sequential patterns is based on the concept of frequency and presumes that customer purchasing sequences do not fluctuate with changes in time, purchasing cost, and other parameters. To adapt sequential patterns to these changes, constraints are integrated with the traditional sequential pattern mining approach. Integrating certain constraints with the sequential mining process makes it possible to discover more user-centered patterns. Thus, in this paper, monetary and compactness constraints, in addition to frequency and length, are included in the sequential mining process to discover pertinent sequential patterns from sequential databases. A CFML-PrefixSpan algorithm is also proposed by integrating these constraints with the original PrefixSpan algorithm, which allows all CFML sequential patterns to be discovered from the sequential database. The proposed CFML-PrefixSpan algorithm has been validated on synthetic sequential databases. The experimental results indicate that the efficacy of the sequential pattern mining process is further enhanced when purchasing cost, time duration, and length are integrated into it.
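To make the four constraints concrete, the following minimal Python sketch checks whether a single candidate pattern meets frequency, compactness, monetary, and length thresholds. It is an illustration only, not the CFML-PrefixSpan algorithm: it tests one pattern by brute force instead of growing patterns from projected databases. The abstract does not define the constraints formally, so every definition here is an assumption: events are taken to be `(item, timestamp, cost)` triples, compactness is read as a maximum time span between the first and last matched event, monetary as a minimum total cost of the matched events, and all function and parameter names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

# Assumed record layout (not from the paper): (item, timestamp, cost).
Event = Tuple[str, int, float]


def occurrences(seq: Sequence[Event], pattern: Sequence[str],
                start: int = 0) -> Iterator[List[Event]]:
    """Yield every ordered match of `pattern` inside `seq`."""
    if not pattern:
        yield []
        return
    for i in range(start, len(seq)):
        if seq[i][0] == pattern[0]:
            for rest in occurrences(seq, pattern[1:], i + 1):
                yield [seq[i]] + rest


def supports(seq: Sequence[Event], pattern: Sequence[str],
             max_span: int, min_money: float) -> bool:
    """True if some occurrence is compact enough and costly enough."""
    for occ in occurrences(seq, pattern):
        span = occ[-1][1] - occ[0][1]        # assumed compactness measure
        money = sum(cost for _, _, cost in occ)  # assumed monetary measure
        if span <= max_span and money >= min_money:
            return True
    return False


def is_cfml_pattern(db: Sequence[Sequence[Event]], pattern: Sequence[str],
                    min_support: int, max_span: int,
                    min_money: float, min_length: int) -> bool:
    """Check length first, then count the sequences that support the pattern."""
    if not pattern or len(pattern) < min_length:
        return False
    count = sum(supports(seq, pattern, max_span, min_money) for seq in db)
    return count >= min_support


# Toy customer sequences, invented for illustration.
db = [
    [("a", 1, 10.0), ("b", 3, 25.0), ("c", 9, 5.0)],
    [("a", 2, 10.0), ("c", 4, 5.0), ("b", 5, 25.0)],
    [("b", 1, 25.0), ("a", 2, 10.0)],
]

loose = is_cfml_pattern(db, ("a", "b"), min_support=2, max_span=3,
                        min_money=30.0, min_length=2)
tight = is_cfml_pattern(db, ("a", "b"), min_support=2, max_span=2,
                        min_money=30.0, min_length=2)
print(loose, tight)
```

With a time span of at most 3, two of the three toy sequences support the pattern `("a", "b")`, so it passes; tightening the span to 2 leaves only one supporting sequence, and the pattern is rejected. A pattern-growth miner would apply such checks while extending prefixes rather than after enumerating candidates.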

The International Arab Journal of Information Technology, Vol. 11, No. 1, January 2014

Bhawna Mallick received her B.Tech in computer technology from Nagpur University, India, and her M.Tech in information technology from Punjabi University, Patiala, India. She is currently pursuing a PhD degree and working as Head of the Department of Computer Science & Engineering at Galgotias College of Engineering & Technology, Greater Noida, affiliated to UP Technical University, India. She has 13 years of industry and academic experience with organizations such as Infosys Technologies Ltd, India, and NIIT Technologies Ltd, India. She is a member of IEEE. Her research interest is data mining, focusing on sequential mining of progressive databases.

Deepak Garg received his PhD in the area of efficient algorithm design from Thapar University. He is certified in the latest technologies by Sun for Java products, IBM for web services, and Brainbench for programming concepts. He is a senior member of IEEE, USA, an executive member of the IEEE Delhi Section, and secretary of the IEEE Computer Society, Delhi Section. He is a life member of ISTE, CSI, IETE, ISC, British Computer Society, and ACM, UK. He started his career as a software engineer at IBM Corporation, Southbury, CT, USA, and then worked with IBM Global Services India Pvt Ltd, India. He is presently working as a professor at Thapar University. He has 37 publications in international journals and conferences. He is a member of the editorial boards of seven international journals. His active research areas are data structures, algorithms, and data mining.

Preetam Singh Grover received his MS degree and PhD from Delhi University, India. He is widely travelled and has delivered invited talks and keynote addresses at many national and international conferences, seminars, and workshops. He is on the editorial boards of four international journals. He has written 9 books, and many of his articles have appeared in books published by IEEE, USA. He has published more than 100 research papers in international and national journals and conferences, including those published by IEEE, ACM, and Springer. He is presently Director General at Guru Tegh Bahadur Institute of Technology, GGS Indraprastha University, India. Formerly, he was dean and head of the Computer Science Department, Delhi, India. Prof. Grover is a member of the IEEE Computer Society. His current research interests are component-based and aspect-oriented software engineering and autonomic embedded systems.