The International Arab Journal of Information Technology (IAJIT)


An Efficient Web Search Engine for Noisy Free

The vast growth, dynamic nature, and low quality of the World Wide Web make it very difficult to retrieve relevant information from the internet during a query search. To resolve this issue, various web mining techniques are used. The biggest challenge in web mining is removing noisy or unwanted information from a web page, such as banners, video, audio, images, and hyperlinks that are not associated with the user query. To overcome these issues, this paper proposes a novel custom search engine with an efficient algorithm. The proposed Uniform Resource Locator (URL) Pattern Extractor (UPE) algorithm extracts all relevant index pages from the web and ranks them based on the user query. Then, the Noisy Data Cleaner (NDC) algorithm is applied to remove unwanted content from the retrieved web pages. The results show that the proposed UPE+NDC algorithm gives very promising results on different datasets, with high precision and recall compared with existing algorithms.
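The abstract does not give the details of the UPE or NDC algorithms, so the following is only a minimal illustrative sketch of the general idea: strip noisy elements from retrieved pages, then rank the pages against the user query. The tag list `NOISE_TAGS`, the functions `clean_page` and `rank_pages`, and the term-count scoring are all assumptions made for illustration, not the authors' method.

```python
from html.parser import HTMLParser

# Hypothetical noise tags (assumption): the paper's actual NDC rules are not
# stated in the abstract. Only non-void elements are listed, because void
# elements such as <img> carry no text and are dropped automatically.
NOISE_TAGS = {"script", "style", "video", "audio", "iframe", "a", "nav", "aside"}


class NoiseStripper(HTMLParser):
    """Collects text that is not nested inside any noise element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many noise elements we are currently inside
        self.parts = []  # text fragments kept as content

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth == 0 and text:
            self.parts.append(text)


def clean_page(html):
    """Return the page text with noise elements removed."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def rank_pages(pages, query):
    """Rank URLs by how often query terms occur in the cleaned text.

    `pages` maps URL -> raw HTML. Simple term counting is a stand-in
    (assumption) for whatever ranking the proposed engine uses.
    """
    terms = set(query.lower().split())
    cleaned = {url: clean_page(html) for url, html in pages.items()}

    def score(url):
        return sum(1 for word in cleaned[url].lower().split() if word in terms)

    return sorted(cleaned, key=score, reverse=True)
```

Calling `rank_pages` with a dictionary of URL-to-HTML pairs and a query string returns the URLs ordered from most to least query-term matches in the cleaned text. Precision and recall, as reported in the abstract, would be measured on the cleaned output against labeled relevant content.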

Pradeep Sahoo received his B.Tech from The Institution of Engineers (India), Calcutta, India in 2000. He completed his M.Tech in Computer Science and Engineering at Anna University Chennai, Tamilnadu, India, where he is currently pursuing his Doctorate degree. He serves as Associate Professor in the Computer Science & Engineering Department at Sai Ram Engineering College, Chennai, Tamilnadu, India. He has participated in a total of 8 national conferences, workshops, and seminars at various institutions in India. He has published 4 papers in international journals and 3 in national journals. His research areas are Data Mining in Pattern Recognition and Content Extraction, and Software Engineering. He holds memberships in ISTE, CSI, IAENG, and IACSIT.

Rajagopalagn Parthasarthy received his Master's degree in Applied Mathematics from IIT Madras and obtained his PhD in Computer Science from the University of Madras. He has 40 years of teaching experience in various institutions in India. He is a recognized research supervisor for Anna University, Dr. MGR University, Vels University, Mother Terasa University, and the University of Madras, Tamilnadu, India. He has successfully guided around 25 scholars to their PhD degrees and more than 169 scholars to their M.Phil degrees. He has served as Faculty, Visiting Professor, Project Co-ordinator, Resource Person, and Panel Member for various academic institutions and government organizations in India. Currently, he is serving as a veteran at the Research and Development Cell in the Department of CSE at GKM College of Engineering and Technology, Chennai, India. He has held many administrative positions, including Chairman, Principal, President, Director, Chief Guest, Dean, Head of Department, Advisor, Member, Convener, and Subject Expert, in various organizations in India. He has written 13 books and published 84 papers in international and national journals in his area. He has received the Life Time Achievement Award, Seva Ratna Award, Distinguished Educationist and A Person of Eminence, Seer Seyai Maamani Award, A Living Legend, Educationalist Born-Noble, and Best Teacher Award from various government and private organizations and institutions. His fields of interest and specialization are Quantitative Techniques, Data Processing and Project Management, Management Information Systems, Programming Languages, Simulation, Text Generation, Cryptography, and Data Mining.