The International Arab Journal of Information Technology (IAJIT)


Lean Database: An Interdisciplinary Perspective Combining Lean Thinking and Technology

A continuous improvement approach is key to achieving a sustainable competitive advantage for organizations in their business processes. Today, organizational business processes are largely automated under the umbrella of organizational information systems. The large volume of automated business processes generates data that is partly messy and can lead to corrupt data. This study integrates the lean thinking concept with a data cleaning approach to reduce data waste according to business requirements and to enhance continuous improvement as part of a data defect reduction strategy. A new approach to improving and cleaning data waste is proposed by combining a data cleaning algorithm with lean thinking concepts. After the quality and scalability of the algorithm were tested and a corrupt dataset was evaluated, the results showed an improvement in the reduction of corrupt data, leading to higher organizational performance in business processes. This integration can help researchers and technologists to fully understand and benefit from interdisciplinary capabilities while building bridges between different fields.

Jamil Razmak is an Assistant Professor in the Department of Management in the College of Business. He received his Master's in Business Administration and PhD in Interdisciplinary Studies of Business Technology Management from Laurentian University in Canada. His primary research interests are in the field of business and technology management. Specifically, he is interested in e-health innovative technology and change management, DSS, and business analytics.

Samir Al-Janabi received his Master's and PhD degrees in Software Engineering from McMaster University, Hamilton, Canada. His research is motivated by the tremendous value in data. His primary research interests are broadly in the area of data management, with a focus on data quality, databases, software engineering, and machine learning. He has extensive experience in software development, from analysis and design to implementation and testing.

Faten Kharbat is an Associate Professor in Computer Science and received her PhD in Artificial Intelligence from the University of the West of England, Bristol, UK. Her main research interests are learning classifier systems, cancer care, knowledge-based systems, applying data mining techniques to marketing, information systems, and enterprise social networking. She has recently been involved in e-learning systems and quality of higher education.

Charles Bélanger is currently a Senior Business Professor with front-line experience at the executive and middle management levels in complex organizations as well as in the private sector. He holds a PhD in Institutional Assessment and Quantitative Analysis from Florida State University. He has published extensively and received national and international awards for his achievements. He has consulted widely across the world in health management, vocational training, and organizational audit.