The International Arab Journal of Information Technology (IAJIT)



Phishing Detection using RDF and Random Forests

Vamsee Muppavarapu, Archanaa Rajendran, and Shriram Vasudevan

Phishing is one of the major threats of the internet era. In a phishing attack, a legitimate website is cloned and victims are lured to the fake site to disclose personal and confidential information, which can prove costly. Although most websites display a disclaimer warning users about phishing, users tend to ignore it. A disclaimer alone is not a complete response on the websites' part, but there is little more that they can do. Because phishing has persisted for a long time, many approaches have been proposed to detect phishing websites, but few, if any, accurately identify the target websites of these attacks. Our proposed method is novel and extends our previous work: we identify phishing websites with a combined approach that constructs Resource Description Framework (RDF) models and uses ensemble learning algorithms to classify websites. The system is trained with supervised learning techniques. The approach achieves a true positive rate of 98.8%. Because we use a random forest classifier, which can handle missing values in the dataset, we were able to reduce the false positive rate of the system to 1.5%. By combining the strengths of RDF and ensemble learning, the system achieves an accuracy of 98.68%.
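
For readers unfamiliar with the three figures quoted above, the following minimal sketch shows how a true positive rate, a false positive rate, and an accuracy are conventionally derived from the confusion-matrix counts of a binary classifier, with phishing treated as the positive class. The class name `ConfusionCounts`, the function names, and the counts in the usage lines are hypothetical and chosen only for illustration; they are not the authors' code or the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (phishing = positive class)."""
    tp: int  # phishing sites correctly flagged as phishing
    fn: int  # phishing sites missed (classified as legitimate)
    fp: int  # legitimate sites wrongly flagged as phishing
    tn: int  # legitimate sites correctly classified as legitimate


def true_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of actual phishing sites that were detected."""
    return c.tp / (c.tp + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of actual legitimate sites that were wrongly flagged."""
    return c.fp / (c.fp + c.tn)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sites that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.fp + c.tn)


# Hypothetical counts, NOT taken from the paper's experiments.
example = ConfusionCounts(tp=90, fn=10, fp=5, tn=95)
print(f"TPR={true_positive_rate(example):.3f}")
print(f"FPR={false_positive_rate(example):.3f}")
print(f"accuracy={accuracy(example):.3f}")
```

The true positive rate and the false positive rate are computed over different subsets (actual phishing sites and actual legitimate sites, respectively), so they need not sum to one, and accuracy depends on the class balance of the test set as well as on both rates.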


The International Arab Journal of Information Technology, Vol. 15, No. 5, September 2018

Vamsee Muppavarapu is an Assistant Professor in the Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham University. His primary research interests include anti-phishing, semantic web and recommender systems. He received his Master's degree in Computer Science from Amrita Vishwa Vidyapeetham University in Coimbatore, India.

Archanaa Rajendran is an Assistant Professor in the Department of Computer Science and Engineering, Amrita Vishwa Vidyapeetham University. Her research interests include machine learning, recommender systems and data mining. She received her Master's degree in Computer Science from Amrita Vishwa Vidyapeetham University in Coimbatore, India.

Shriram Vasudevan is an Embedded System Engineer with about ten years of experience in IT and academia. He has authored 28 books for various reputed publishers across the globe and has written many research articles. He has received awards from Intel, IEI (India), Wipro, Infosys, ICTACT, CII, Computer Society of India, and VIT University, among others, for his technical contributions. He received his Master's and Doctorate degrees in Embedded Systems. He is currently associated with Amrita Vishwa Vidyapeetham, India, and was previously associated with Wipro Technologies, Aricent Technologies and VIT University.