The International Arab Journal of Information Technology (IAJIT)



HierarchicalRank: Webpage Rank Improvement Using HTML Tag-Level Similarity

Previous research has introduced two types of ranking algorithms: query-dependent algorithms, which work online, and query-independent algorithms, which work offline. The PageRank algorithm works offline, independent of the query, while the Hyperlink-Induced Topic Search (HITS) algorithm works online and depends on the query. One problem with these algorithms is that rank is divided according to the number of inlinks and outlinks and other parameters used in hyperlink analysis, which may or may not depend on webpage content, and this leads to topic drift. Previous research focused on solving this problem using the popularity of the outlinked webpages. This paper proposes a novel algorithm for measuring popularity based on the similarity between the query and hierarchical text extracted from the source and target webpages, using a Hyper Text Markup Language (HTML) tag importance parameter. The results of the proposed method are compared with those of the PageRank algorithm and of Topic Distillation with Query Dependent Link Connections and Page Characteristics.
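The similarity between a query and tag-weighted page text can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the tag weights in `TAG_WEIGHTS`, the class `TagWeightedText`, and the function `query_similarity` are hypothetical names and values chosen for illustration, and cosine similarity is assumed as the similarity measure. The sketch covers only the scoring of a single page against a query, not the distribution of rank between source and target webpages.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag importance weights (assumed; not taken from the paper).
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "a": 1.5, "p": 1.0,
               "script": 0.0, "style": 0.0}
DEFAULT_WEIGHT = 1.0
# Tags that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TagWeightedText(HTMLParser):
    """Accumulate term weights according to the innermost enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open tags
        self.weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else None
        weight = TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT)
        for term in tokenize(data):
            self.weights[term] += weight


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def query_similarity(query, html):
    """Score a page against a query using tag-weighted term vectors."""
    parser = TagWeightedText()
    parser.feed(html)
    parser.close()
    return cosine(Counter(tokenize(query)), parser.weights)


page_a = "<html><head><title>web ranking</title></head><body><p>some text</p></body></html>"
page_b = "<html><head><title>some text</title></head><body><p>web ranking</p></body></html>"
score_a = query_similarity("web ranking", page_a)
score_b = query_similarity("web ranking", page_b)
```

Under these assumed weights, `page_a` scores higher than `page_b` for the query "web ranking", because the query terms appear in its title rather than only in a paragraph.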


[1] Brin S. and Page L., The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks and ISDN Systems, vol. 30, no. 1-7, pp. 107-117, 1998.

[2] Caverlee J., Webb S., Liu L., and Rouse W., A Parameterized Approach to Spam-Resilient Link Analysis of the Web, IEEE Transactions on Parallel and Distributed Systems, vol. 20, no. 10, pp. 1422-1438, 2009.

[3] Chakrabarti S., Dom B., Raghavan P., and Rajagopalan S., Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, in Proceedings of the 7th International Conference on World Wide Web, Brisbane, pp. 65-74, 1998.

[4] Dubey H. and Roy B., An Improved Page Rank Algorithm Based on Optimized Normalization Technique, International Journal of Computer Science and Information Techniques, vol. 2, no. 5, pp. 2183-2188, 2011.

[5] Duhan N., Sharma A., and Bhatia K., Page Ranking Algorithms: A Survey, in Proceedings IEEE International Conference on Advance Computing, Patiala, pp. 1530-1537, 2009.

[6] Haveliwala T., Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search, IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, 2003.

[7] Henzinger M., Hyperlink Analysis for the Web, IEEE Internet Computing, vol. 5, no. 1, pp. 45-50, 2001.

[8] Hijikata Y., Hung B., Otsubo M., and Nishida S., HITS Algorithm Improvement using Anchor-Related Text Extracted by DOM Structure Analysis, in Proceedings of the ACM Symposium on Applied Computing, Honolulu, pp. 1691-1698, 2009.

[9] Kleinberg J., Authoritative Sources in a Hyperlinked Environment, Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.

[10] Kumar G., Duhan N., and Sharma A., Page Ranking Based on Number of Visits of Webpages, in Proceedings International Conference on Computer and Communication Technology, Allahabad, pp. 11-14, 2011.

[11] Liu X., An Improved HITS Algorithm Based on Page Query Similarity and Page Popularity, Journal of Computers, vol. 7, no. 1, pp. 130-134, 2012.

[12] Mohmmad A., Bidoki Z., and Yazdani N., DistanceRank: An Intelligent Ranking Algorithm for Webpages, Journal on Information Processing and Management, vol. 44, pp. 877-892, 2007.

[13] Nie L., Davison B., and Qi X., Topical Link Analysis for Web Search, in Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, pp. 91-98, 2006.

[14] Noor S. and Bashir S., Evaluating Bias in Retrieval Systems for Recall Oriented Documents Retrieval, The International Arab Journal of Information Technology, vol. 12, no. 1, pp. 53-59, 2015.

[15] Sharma D. and Sharma A., A Comparative Analysis of the Page Ranking Algorithms, International Journal of Computer Science and Engineering, vol. 2, no. 8, pp. 2670-2776, 2010.

[16] Tao W. and Zuo W., Query-Sensitive Self-Adaptable Webpage Ranking Algorithm, in Proceedings International Conference on Machine Learning and Cybernetics, Xi'an, pp. 413-418, 2003.

[17] Tyagi N. and Sharma S., Weighted PageRanking Based on Number of Visits of links of Webpage, International Journal of Soft Computing and Engineering, vol. 2, no. 3, pp. 441-446, 2012.

The International Arab Journal of Information Technology, Vol. 15, No. 3, May 2018

[18] Varadarajan R., Hristidis V., and Li T., Beyond Single-Page Web Search, IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 3, pp. 411-424, 2008.

[19] Wu M., Scholer F., and Turpin A., Topic Distillation with Query Dependent Link Connections and Page Characteristics, ACM Transactions on the Web, vol. 5, no. 2, pp. 6:1-6:25, 2011.

[20] Xing W. and Ghorbani A., Weighted Page Rank Algorithm, in Proceedings Second Annual Conference on Communication Networks and Services Research, Fredericton, pp. 305-314, 2004.

[21] Zhang Y., Xiao L., and Fan B., The Research about Web Page Ranking Based on the A-PageRank and the Extended VSM, in Proceedings of 5th IEEE International Conference on Fuzzy Systems and Knowledge Discovery, Shandong, pp. 223-227, 2008.

Dilip Sharma holds a B.E. (CSE), an M.Tech. (CSE), and a Ph.D. in Computer Engineering. He is a Senior Member of IEEE, IEEE-CS, and IEEE-WIE, a Member of ACM and CSTA, USA, and a life member of CSI, IETE, ISTE, IE, ISCA, and SSI. He has published 72 research papers in international journals and conferences of repute and participated in 3 international/national conferences. He was conferred the Significant Contribution Award by the Computer Society of India at the 47th and 48th CSI National Conventions, held at Science City, Kolkata, and Visakhapatnam, India. Presently he is working as Programme Coordinator (CSE) and Associate Professor in the Department of Computer Engineering & Applications, GLA University, Mathura, U.P., India. He is Joint Secretary of the IEEE Uttar Pradesh Section and Vice Chairman of the Computer Society of India Mathura Chapter. His research interests are Web Information Retrieval and Software Engineering.

Deepak Ganeshiya received his B.Tech. degree in Computer Science and Engineering from UPTU Lucknow, India, in 2009 and his M.Tech. degree in Computer Science and Engineering from GLA University, Mathura, India, in 2014. During his M.Tech., his active area of research was Web information retrieval. He has three years of experience in the development of various e-governance projects.