The International Arab Journal of Information Technology (IAJIT), Vol. 20, No. 1, January 2023


A Novel Architecture for Search Engine using Domain Based Web Log Data

Search engines, as information retrieval tools, are nowadays the main source of information for users. For every query, the search engine explores its repository and/or indexer to find the documents/URLs relevant to that query. Page ranking algorithms rank the Uniform Resource Locators (URLs) according to their relevance to the user's query. Analysis shows that many of the queries submitted by users to search engines are duplicates, so there is scope to improve search engine performance by reducing the effort spent on duplicate queries. In this paper, a proxy server is created that stores the search results of user queries in a web log. The proposed proxy server uses this web log to find results faster when a duplicate query is submitted again. The proposed architecture was tested on ten duplicate user queries and gave promising results. If a query is found in the web log at the proxy server, the architecture returns all relevant web pages for that duplicate query from a particular domain instead of from the entire database. It reduces the perceived latency for duplicate queries and also improves precision and accuracy, to as much as 81.8% and 99% respectively, for all duplicate user queries.
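
The lookup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `QueryLogProxy`, the `normalize` helper, and the `backend` callable are hypothetical names introduced here, and the web log is modelled as an in-memory dictionary. The domain-based partitioning of the log is omitted because the abstract does not specify how it is organised.

```python
from typing import Callable, Dict, List


def normalize(query: str) -> str:
    """Lower-case the query and collapse whitespace (an assumed notion of
    'duplicate'; the paper may define duplicates differently)."""
    return " ".join(query.lower().split())


class QueryLogProxy:
    """Hypothetical proxy that answers repeated queries from a web log."""

    def __init__(self, backend: Callable[[str], List[str]]) -> None:
        # backend stands in for the full search engine (assumption).
        self._backend = backend
        # The web log maps a normalized query to its stored result URLs.
        self._web_log: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        key = normalize(query)
        if key in self._web_log:
            # Duplicate query: serve stored results, skip the search engine.
            self.hits += 1
            return list(self._web_log[key])
        # First occurrence: ask the search engine and record the results.
        self.misses += 1
        results = self._backend(query)
        self._web_log[key] = list(results)
        return list(results)
```

Under these assumptions, only the first occurrence of a query reaches the search engine. Later duplicates are answered from the log, which is where the reduction in perceived latency would come from.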
