The International Arab Journal of Information Technology (IAJIT)



A Review of SDLCs for Big Data Analytics Systems in the Context of Very Small Entities Using the ISO/IEC 29110 Standard-Basic Profile

Context: A Systems Development Life Cycle (SDLC) is a model of phases, activities, roles, and products used systematically to develop software with the expected functionality and quality. Although SDLCs are widely applied to many types of software, their use remains unusual in Big Data Analytics Systems (BDAS). Objective: To address this issue, several SDLCs for BDAS have been proposed, along with comparative studies, to guide interested organizations in adapting them. This research seeks a lightweight, balanced SDLC that is feasible for small development teams or organizations, taking advantage of the favorable characteristics of the international ISO/IEC 29110 standard. Method and Materials: This study describes the knowledge gap by reporting a comparative analysis of four relevant SDLCs (CRISP-DM, TDSP, BDPL, and DDSL), chosen through a selective research method, focusing on their alignment with the recent ISO/IEC 29110 Basic Profile standard. The goal was to identify which SDLC contributes most and fits best from a lightweight perspective. Results: Among the rigorous approaches, the Cross Industry Standard Process for Data Mining (CRISP-DM) showed the highest alignment with the standard; among the agile approaches, the Domino Data Science Lifecycle (DDSL) was the closest of the four. The Team Data Science Process (TDSP) stood out as the most agile of those analyzed but fell short of the required results. The Big Data Project Lifecycle (BDPL), which follows another standard, was too rigorous and more distant from the Basic Profile. Conclusions: Research on new SDLCs for BDAS has been practically nonexistent in software engineering from 2000 to 2023. Only BDPL was found in the academic literature, while the other three came from gray literature. Despite the relevance of this topic for BDAS organizations, no adequate SDLC was identified.
