The International Arab Journal of Information Technology (IAJIT)


Data Deduplication for Efficient Cloud Storage and Retrieval

Cloud services provide uninterrupted service to clients by replicating data across geographic locations, which increases its availability. This replication, however, introduces a large amount of redundancy and requires a large amount of space to store the data. Data compression techniques can reduce the space needed to store the data at the various sites, and compressing the data in this way does not sacrifice availability or consistency at any site. Because demand for cloud services and storage is growing, the required investment is growing as well. Compression reduces this investment, along with the physical space and the number of data centers needed to store the data. Various security protocols can be incorporated to protect the compressed files at each site. We present a reliable technique for storing deduplicated data and managing it securely, achieving high consistency as well as high availability.
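To make the idea of storing duplicates only once concrete, the sketch below shows generic chunk-level deduplication. It is an illustration under stated assumptions, not the technique proposed in this paper. It assumes fixed-size chunking, SHA-1 fingerprints, and zlib compression of each unique chunk. The class name `DedupStore`, its methods, and the `chunk_size` parameter are hypothetical. Each file is kept as a "recipe", an ordered list of chunk fingerprints, so identical chunks across or within files are stored once.

```python
import hashlib
import zlib


class DedupStore:
    """Hypothetical in-memory deduplicating store (illustrative sketch only)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # fixed-size chunking is an assumption
        self.chunks = {}              # fingerprint -> zlib-compressed chunk bytes
        self.recipes = {}             # file name -> ordered list of fingerprints

    def put(self, name, data):
        """Split data into chunks and store only chunks not seen before."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            fp = hashlib.sha1(chunk).hexdigest()  # fingerprint identifies the chunk
            if fp not in self.chunks:             # duplicate chunks are skipped
                self.chunks[fp] = zlib.compress(chunk)
            recipe.append(fp)
        self.recipes[name] = recipe

    def get(self, name):
        """Rebuild a file by decompressing its chunks in recipe order."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.recipes[name])


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.put("a.txt", b"ABCDABCDEFGH")
    store.put("b.txt", b"ABCDEFGH")
    print(len(store.chunks))  # 2 unique chunks stored for 5 chunk references
```

Fixed-size chunks keep the sketch short. Content-defined chunking, for example with Rabin fingerprints, is more robust when data is inserted or shifted. SHA-1 is used here only for illustration. Because SHA-1 collisions are now practical, a real system would more likely use a stronger hash such as SHA-256.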

Rishikesh Misal graduated from the University of Mumbai with a bachelor's degree in Computer Engineering in 2015. He completed his Master's in Computer Science and Engineering at VIT University, Vellore. He has been working at General Electric for the past year as a Software Engineering Specialist, where his work focuses on building cloud applications for IoT-based scenarios. His research interests include distributed systems, cloud computing, system programming, and compiler construction.

Boominathan Perumal is an Associate Professor at VIT University, Vellore, India. He received his B.E. in Computer Science and Engineering from Bharathidasan University, Tiruchirappalli, India, his M.E. in Computer Science and Engineering from Anna University, India, and his Ph.D. from VIT University, Vellore, India. He has 12 years of teaching experience and a good number of publications in reputed conference proceedings and journals. His research interests include cloud computing, network security, and evolutionary optimization.