Received date November 9, 2021

Accepted date December 14, 2022

MAPNEWS: A Framework for Aggregating and Organizing Online News Articles

Authors Jeelani Ahmed, Muqeem Ahmed

Keywords #News aggregation #information retrieval #clustering #data aggregation #web scraping

Abstract

In recent years, digital news has become increasingly prevalent, with many people getting their news and information from online sources rather than traditional print or broadcast media. This shift has been driven, in part, by the convenience and accessibility of digital platforms, as well as the ability to personalize and customize news feeds. Digital news also allows for greater interactivity and engagement with readers and can reach a global audience almost instantly. News articles contain a wealth of implicit spatial information that, when shared with readers, increases comprehension of current events. Only a few news aggregation systems make this information available to users, and many stories are not explicitly geotagged with their spatial information. In this work, we propose the MapNews framework, a novel system that gathers, analyzes, and presents news articles on a map interface, allowing users to take advantage of the articles' underlying spatial information. MapNews pulls content from several internet news sources and extracts geographic content from the articles using a custom-built geotagger. A rapid online clustering method organizes the articles into story clusters. By panning and zooming the MapNews map interface, readers receive news selected by geographic location and category importance, so the articles shown change with the region in view. MapNews achieved an Adjusted Rand Index (ARI) score of 0.89 for clustering and an accuracy of 95% in usability testing.
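As an illustration of the two technical ideas named in the abstract, the following minimal Python sketch shows a single-pass (online) clustering of article texts and the Adjusted Rand Index used to score a clustering against reference labels. It is not the MapNews implementation: the bag-of-words vectors, the cosine similarity measure, the 0.3 threshold, and all function names are assumptions chosen for illustration, and the sample headlines are invented.

```python
import math
import re
from collections import Counter
from math import comb


def vectorize(text):
    """Lower-case bag-of-words term counts (assumed stand-in for real term weights)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def online_cluster(articles, threshold=0.3):
    """Single pass: an article joins its most similar cluster if the similarity
    reaches the (hypothetical) threshold, otherwise it starts a new cluster."""
    centroids = []  # one summed term-count vector per cluster
    labels = []
    for text in articles:
        vec = vectorize(text)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # Counter.update adds the counts
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    total = comb(len(true_labels), 2)
    if total == 0:
        return 1.0  # fewer than two items: the labelings trivially agree
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(n, 2) for n in cells.values())
    sum_rows = sum(comb(n, 2) for n in Counter(true_labels).values())
    sum_cols = sum(comb(n, 2) for n in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / total
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Invented sample headlines forming two stories.
articles = [
    "Flood waters rise in Chennai after heavy rain",
    "Heavy rain causes flood in Chennai streets",
    "Stock markets rally as tech shares surge",
    "Tech shares surge and stock markets rally again",
]
predicted = online_cluster(articles)
score = adjusted_rand_index([0, 0, 1, 1], predicted)
```

An ARI of 1 means the predicted clusters match the reference labels exactly, while values near 0 indicate agreement no better than chance. The reported score of 0.89 should be read on that scale.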

The International Arab Journal of Information Technology, Vol. 20, No. 3, May 2023