The International Arab Journal of Information Technology (IAJIT)


A Statistical Framework for Identification of

This work describes a statistical approach to detecting applications that run inside application layer tunnels. Application layer tunnels are a significant threat, enabling network abuse and violation of an organisation's acceptable Internet usage policy. In tunnelling, the packets of a prohibited application are encapsulated as the payload of an allowed protocol's packets. Identifying tunnelling with conventional methods is very difficult, for example in the case of encrypted HTTPS tunnels. Hence, a machine learning based approach is presented in this work, in which statistical packet stream features are used to identify the application inside a tunnel. The Packet Size Distribution (PSD), expressed as discrete bins, is an important feature that is shown to be indicative of the respective application. This work combines other features with the PSD bins for better identification of the applications, and shows that tunnelled applications are identifiable using these statistical traffic parameters. A comparison of the classification accuracy of five machine learning algorithms for application detection using this feature set is also given.
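The feature pipeline outlined in the abstract (PSD bins combined with other stream statistics and passed to a classifier) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The 100-byte bin width, the 1500-byte size ceiling, the three extra statistics, the function names `psd_bins`, `flow_features`, `train_centroids` and `classify`, and the nearest-centroid rule are all hypothetical choices made for this example. The abstract does not say which five algorithms were compared.

```python
import math
from statistics import mean, pstdev

MAX_SIZE = 1500  # assumed packet size ceiling in bytes (typical Ethernet MTU)
BIN_WIDTH = 100  # hypothetical bin width; not specified in the abstract


def psd_bins(sizes, bin_width=BIN_WIDTH, max_size=MAX_SIZE):
    """Packet Size Distribution: fraction of packets falling in each discrete bin."""
    if not sizes:
        raise ValueError("need at least one packet")
    n_bins = math.ceil(max_size / bin_width)
    counts = [0] * n_bins
    for s in sizes:
        # Sizes 0..99 go to bin 0, 100..199 to bin 1, etc.; oversize packets
        # are clamped into the last bin.
        idx = min(int(s // bin_width), n_bins - 1)
        counts[idx] += 1
    return [c / len(sizes) for c in counts]


def flow_features(sizes, timestamps):
    """PSD bins followed by a few hypothetical extra stream statistics."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    extras = [
        mean(sizes) / MAX_SIZE,       # mean packet size, scaled to [0, 1]
        pstdev(sizes) / MAX_SIZE,     # spread of packet sizes, same scale
        mean(gaps) if gaps else 0.0,  # mean inter-arrival time in seconds
    ]
    return psd_bins(sizes) + extras


def train_centroids(samples):
    """samples: list of (feature_vector, label); returns label -> mean vector."""
    grouped = {}
    for vec, label in samples:
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def classify(centroids, vec):
    """Nearest-centroid rule (a simple stand-in, not one of the paper's algorithms)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


if __name__ == "__main__":
    # Toy, made-up streams: small periodic packets versus bulk full-size packets.
    voip_like = flow_features([160, 172, 160, 168], [0.00, 0.02, 0.04, 0.06])
    bulk_like = flow_features([1500, 1500, 1448, 1500], [0.000, 0.001, 0.002, 0.003])
    model = train_centroids([(voip_like, "voip"), (bulk_like, "bulk")])
    print(classify(model, flow_features([164, 170, 158], [0.0, 0.02, 0.04])))
```

Each PSD vector sums to 1, so streams with the same size profile land close together regardless of how many packets were captured. The size statistics are divided by the assumed 1500-byte ceiling so that they stay on roughly the same 0 to 1 scale as the bin fractions and do not dominate the Euclidean distance.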

The International Arab Journal of Information Technology, Vol. 12, No. 6A, 2015

Ghulam Mujtaba received a BSc degree in Computer Systems Engineering from GIKIEST, Pakistan, in 2003. He received a Postgraduate Diploma and, in 2011, a PhD in Electrical Engineering from Loughborough University, where he was based at the High Speed Networks Laboratory. Currently, he is an Assistant Professor in the Electrical Engineering Department of CIIT, Abbottabad. His research interests include network security and machine learning.

David Parish is Professor of Communication Networks in the School of Electronic, Electrical and Systems Engineering, Loughborough University, and Head of the High Speed Networks Group. He has been active in communication network research for over 25 years, has published over 100 papers and has held in excess of 2.5M of research funding. He has extensive experience in performance measurement and abuse detection for communication networks.