TempTracker: A Service Oriented Temporal Natural Language Processing Based Tool for Document Data Characterization and Social Network Analysis
With the advent of Web 2.0 technologies, news sites and micro-blogging sites have become popular and attract the attention of people around the world. The textual data captured by these sites is valuable for extracting (a) new information for analysis and (b) the temporal evolution of entities, topics, and sentiment at different time granularities. The study described in this paper demonstrates this. After the data was collected, several directions were investigated to demonstrate its usefulness: entity extraction, topic analysis, and sentiment analysis using Natural Language Processing (NLP) tools; temporal social media analysis; and time-varying trends of entities and of the sentiment associated with them. A service-based architecture is proposed that processes text data with NLP tools and enriches it. Text data is collected and processed with the NLP tools, and the enriched data is retrieved on request for analysis. The reported results illustrate the applicability and effectiveness of the study.
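The time-varying trends of entities and their sentiment can be illustrated with a minimal Python sketch. It assumes that each document has already been enriched by some named entity recognition and sentiment tool, so that it carries a timestamp, a list of entity names, and a single sentiment score in [-1, 1]. The names `AnnotatedDoc`, `bucket`, and `entity_trends` are hypothetical and are not taken from TempTracker's actual implementation. Documents are grouped into day, month, or year buckets, and for each entity the number of documents mentioning it and their mean sentiment are reported per bucket.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class AnnotatedDoc:
    """Hypothetical output of the NLP enrichment step for one document."""
    timestamp: datetime
    entities: list[str]
    sentiment: float  # assumed to lie in [-1, 1]


def bucket(ts: datetime, granularity: str) -> str:
    """Map a timestamp to a bucket label at the requested time granularity."""
    formats = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}
    if granularity not in formats:
        raise ValueError(f"unsupported granularity: {granularity}")
    return ts.strftime(formats[granularity])


def entity_trends(docs, granularity="month"):
    """Return {entity: {bucket: (document_count, mean_sentiment)}}."""
    scores = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        label = bucket(doc.timestamp, granularity)
        # Count each entity at most once per document.
        for ent in set(doc.entities):
            scores[ent][label].append(doc.sentiment)
    return {
        ent: {label: (len(vals), mean(vals)) for label, vals in sorted(buckets.items())}
        for ent, buckets in scores.items()
    }
```

For example, with three documents mentioning a hypothetical entity "UN" on 5 January 2021 (sentiment 0.5), 20 January 2021 (sentiment -0.5), and 3 February 2021 (sentiment 1.0), `entity_trends(docs, "month")["UN"]` yields `{"2021-01": (2, 0.0), "2021-02": (1, 1.0)}`. Switching the granularity to `"day"` or `"year"` changes only the bucket labels, which is one simple way to compare trends at different granularities.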
The International Arab Journal of Information Technology, Vol. 19, No. 3, May 2022