The International Arab Journal of Information Technology (IAJIT)



Enhanced Graph Based Approach for Multi Document Summarization

Summarizing documents to meet a user's needs is tricky and challenging. Although a variety of approaches exist, graph-based methods have been widely investigated for summarizing document contents. This paper focuses on two graph-based methods, LexRank (threshold) and LexRank (continuous), proposed by Erkan and Radev. It proposes two enhancements to that earlier work by adding two features to the existing ones. First, a discounting approach is introduced to form a summary with less redundancy among sentences. Second, a position weight mechanism is adopted to preserve the importance of sentences based on the position they occupy. Intrinsic evaluation was carried out on two data sets. Data set 1 was created manually from newspaper documents that we collected for the experiments. Data set 2 is the DUC 2002 data, which is commercially available and distributed or accessed through the National Institute of Standards and Technology (NIST). We show that, based on precision and recall, the results were comprehensively better than those of the earlier algorithms.
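The abstract gives no formulas, so the following is only a minimal Python sketch of how threshold LexRank could be combined with the two enhancements it names. The threshold and damping values, the position weight 1 / (1 + decay * position), and the discount factor (1 - similarity to the sentence just selected) are all illustrative assumptions, not the definitions used in the paper. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank_threshold(sentences, threshold=0.1, damping=0.85, iters=50):
    """Threshold LexRank: power iteration over a thresholded similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) for j in range(n)] for i in range(n)]
    # Keep an unweighted edge only where similarity exceeds the threshold.
    adj = [[1.0 if sim[i][j] > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    deg = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / deg[j]
                            for j in range(n) if deg[j])
            for i in range(n)
        ]
    return scores, sim


def summarize(sentences, positions, k=2, position_decay=0.1):
    """Greedy selection with an assumed position weight and redundancy discount."""
    scores, sim = lexrank_threshold(sentences)
    # Assumed position weight: sentences earlier in their document score higher.
    weighted = [s / (1.0 + position_decay * p)
                for s, p in zip(scores, positions)]
    remaining = list(range(len(sentences)))
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda i: weighted[i])
        chosen.append(best)
        remaining.remove(best)
        # Assumed discount: penalise sentences similar to the one just chosen.
        for i in remaining:
            weighted[i] *= max(0.0, 1.0 - sim[best][i])
    return [sentences[i] for i in sorted(chosen)]
```

With two identical sentences taken from different documents, the discount drives the second copy's score to roughly zero once the first is selected, so a two-sentence summary contains only one of them. That is the redundancy-reduction behaviour the abstract describes.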


[1] Cretu B., Chen Z., Uchimoto T., and Miya K., Automatic Summarizing Based on Sentence Extraction: A Statistical Approach, International Journal of Applied Electromagnetics and Mechanics , vol. 13, no. 1-4, pp. 19-23, 2002.

[2] Edmundson H., New Methods in Automatic Extracting, Journal of the ACM , vol. 16, no. 2, pp. 264-285, 1969.

[3] Erkan G. and Radev D., LexRank: Graph-Based Lexical Centrality as Salience in Text Summarization, Journal of Artificial Intelligence Research , vol. 22, pp. 457-479, 2004.

[4] Hariharan S. and Srinivasan R., A Comparison of Similarity Measures for Text Documents, Journal of Information and Knowledge Management , vol. 7, no. 1, pp. 1-8, 2008.

[5] Hariharan S. and Srinivasan R., Enhancements to Graph Based Approaches for Multi Document Summarizations, International Journal of Applied Computer Science and Mathematics , vol. 3, no. 6, pp. 66-72, 2009.

[6] Hariharan S. and Srinivasan R., Studies on Graph Based Approaches for Single and Multi Document Summarizations, International Journal of Computer Theory and Engineering , vol. 1, no. 5, pp. 512-519, 2009.

[7] Hariharan S. and Srinivasan R., Studies on Intrinsic Summary Evaluation, International Journal of Artificial Intelligence and Soft Computing , vol. 2, no. 1-2, pp. 58-76, 2010.

[8] Jones K., Automatic Summarising: The State of the Art, Information Processing and Management , vol. 43, no. 6, pp. 1449-1481, 2007.

[9] Li W., Wu M., Lu Q., Xu W., and Yuan C., Extractive Summarization Using Inter- and Intra- Event Relevance, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the ACL , Sydney, pp. 369-376, 2006.

[10] Lin C., ROUGE: A Package for Automatic Evaluation of Summaries, in Proceedings of the Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL , Spain, pp. 74-81, 2004.

[11] Lin Y., ROUGE: Recall-Oriented Understudy for Gisting Evaluation, available at: http://www.isi.edu/_cyl/ROUGE/, last visited 2003.

[12] Lin C. and Hovy E., Automatic Evaluation of Summaries Using N-gram Co-Occurrence Statistics, in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics on Human Language , USA, pp. 71-78, 2003.

[13] Litvak M. and Last M., Graph-Based Keyword Extraction for Single-Document Summarization, in Proceedings of the Workshop on Multi-Source Multilingual Information Extraction and Summarization, Coling , USA, pp. 17-24, 2008.

[14] Liu Y., Wang X., Zhang J., and Xu H., Personalized PageRank Based Multi-document Summarization, in Proceedings of IEEE International Workshop on Semantic Computing and Systems , Huangshan, pp. 169-173, 2008.

[15] Luhn H., The Automatic Creation of Literature Abstracts, IBM Journal of Research and Development , vol. 2, no. 2, pp. 159-165, 1958.

[16] Mani I. and Maybury M., Advances in Automatic Summarization , MIT Press, Cambridge, 1999.

[17] Mihalcea R. and Tarau P., A Language Independent Algorithm for Single and Multiple Document Summarization, in Proceedings of International Joint Conference on Natural Language Processing , pp. 1-6, 2005.

[18] Mihalcea R. and Tarau P., TextRank: Bringing Order into Texts, in Proceedings of the Conference on Empirical Methods in Natural Language Processing , Spain, pp. 404-411, 2004.

[19] Nik Z. and Fumiyo F., Multi-Document Summarization Using Link Analysis Based on Rhetorical Relations between Sentences, in Proceedings of International Conference on Computational Linguistics and Intelligent Text Processing , Berlin, vol. 6609/2011, pp. 328-338, 2011.

[20] Over P., Dang H., and Harman D., DUC in Context, Information Processing and Management , vol. 43, no. 6, pp. 1506-1520, 2007.

[21] Page L., Brin S., Motwani R., and Winograd T., The PageRank Citation Ranking: Bringing Order to the Web, Technical Report, Stanford InfoLab, 1998.

[22] Patil K. and Brazdil P., Sumgraph: Text Summarization using Centrality in the Pathfinder Network, International Journal on Computer Science and Information Systems , vol. 2, no. 1, pp. 18-32, 2007.

[23] Porter M., An Algorithm for Suffix Stripping, Program: Electronic Library and Information Systems , vol. 14, no. 3, pp. 130-137, 1980.

[24] Quinn T., Christophe C., and Charles D., Applications of Data Mining in Software Engineering, International Journal of Data Analysis Techniques and Strategies , vol. 2, no. 3, pp. 243-257, 2010.

[25] Radev D. and Tam D., Summarization Evaluation using Relative Utility, in Proceedings of the 12th International Conference on Information and Knowledge Management , USA, pp. 508-511, 2003.

[26] Radev D., Jing H., Stys M., and Tam D., Centroid-Based Summarization of Multiple Documents, Information Processing and Management , vol. 40, no. 6, pp. 919-938, 2004.

[27] Sjobergh J., Older Versions of the ROUGEeval Summarization Evaluation System were Easier to Fool, Information Processing and Management , vol. 43, no. 6, pp. 1500-1505, 2007.

[28] Wan X., TimedTextRank: Adding the Temporal Dimension to Multi-Document Summarization, in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval , Amsterdam, pp. 867-868, 2007.

[29] Wan X., An Exploration of Document Impact on Graph-Based Multi-Document Summarization, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics , Hawaii, pp. 755-762, 2008.

[30] Yeh J., Ke H., and Yang W., iSpreadRank: Ranking Sentences for Extraction-Based Summarization using Feature Weight Propagation in the Sentence Similarity Network, Expert Systems with Applications , vol. 35, no. 3, pp. 1451-1462, 2008.

[31] Yaquan X. and Haibo W., A New Feature Selection Method Based on Support Vector Machines for Text Categorisation, International Journal of Data Analysis Techniques and Strategies , vol. 3, no. 1, pp. 1-20, 2011.

[32] Zobia R. and Waqas A., A Hybrid Approach for Urdu Sentence Boundary Disambiguation, The International Arab Journal of Information Technology , vol. 9, no. 3, pp. 250-255, 2012.

Shanmugasundaram Hariharan received his B.E degree in Computer Science and Engineering from Madurai Kammaraj University, India in 2002, and his M.E degree in Computer Science and Engineering from Anna University, Chennai, India in 2004. He holds a PhD degree in the area of information retrieval from Anna University, Chennai, India. He is a member of IAENG, IACSIT, ISTE and CSTA, and has 9 years of experience in teaching. Currently, he is working as an associate professor in the Department of Computer Science and Engineering, TRP Engineering College, Trichy-621105, India. His research interests include information retrieval, data mining, opinion mining and web mining. He has to his credit 80 papers in refereed journals and conferences. He also serves as an editorial board member and as a program committee member for several international conferences and journals.

Thirunavukarasu Ramkumar is currently working as a professor in the Department of Computer Applications, A.V.C College of Engineering, Mayiladuthurai. He received his PhD degree in computer applications in 2010 from Anna University, Chennai. His areas of specialization include knowledge discovery from multiple databases and object computing. He is a fellow member of ISTE.

Rengaramanujam Srinivasan received his BSc degree from the University of Madras, Chennai, India in 1962, his MSc degree from the Indian Institute of Science, India in 1964 and his PhD degree from the Indian Institute of Technology, India in 1971. He is a member of the ISTE and a Fellow of the Institution of Engineers, India. He has over 40 years of experience in teaching and research.