The International Arab Journal of Information Technology (IAJIT), Vol. 17, No. 5, September 2020


A Sparse Topic Model for Bursty Topic Discovery in Social Networks

Bursty topic discovery aims to automatically identify bursty events and to keep track of known events continuously. Existing methods rely on topic models. However, the sparsity of short texts challenges traditional topic models, because each text contains too few words to learn reliable topics from the original corpus. To tackle this problem, we propose a Sparse Topic Model (STM) for bursty topic discovery. First, we model bursty topics and common topics separately in order to detect changes in word usage over time and to discover bursty words. Second, we introduce a “Spike and Slab” prior to decouple the sparsity and smoothness of a distribution. The bursty words are then leveraged to discover bursty topics automatically. Finally, to evaluate the effectiveness of the proposed algorithm, we collect a Sina Weibo dataset and conduct various experiments. Both qualitative and quantitative evaluations demonstrate that the proposed STM algorithm performs favorably against several state-of-the-art methods.
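The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the STM inference procedure, whose formulas are not given here. The function names, the ratio-based burstiness score, the `threshold` and `smoothing` parameters, and the fixed `selected` word set (standing in for the selector variables of a “Spike and Slab” prior) are all assumptions made for illustration.

```python
import random
from collections import Counter


def bursty_words(history, current, threshold=3.0, smoothing=1.0):
    """Score words by how far their current count exceeds their past mean.

    history: list of past time slices, each a list of word tokens.
    current: list of word tokens in the current time slice.
    The ratio score is an illustrative assumption, not the paper's measure.
    """
    past = [Counter(tokens) for tokens in history]
    now = Counter(current)
    scores = {}
    for word, count in now.items():
        # Counter returns 0 for words never seen in a past slice.
        mean_past = sum(c[word] for c in past) / max(len(past), 1)
        scores[word] = count / (mean_past + smoothing)
    return {w: s for w, s in scores.items() if s >= threshold}


def spike_and_slab_topic(vocab, selected, beta=0.5, rng=None):
    """Draw a topic-word distribution with exact zeros outside `selected`.

    Spike: words not in `selected` get probability exactly 0 (sparsity).
    Slab: selected words share a symmetric Dirichlet(beta) draw (smoothness),
    sampled here as normalized Gamma variates.
    """
    if not any(w in selected for w in vocab):
        raise ValueError("at least one vocabulary word must be selected")
    rng = rng or random.Random(0)
    weights = {
        w: rng.gammavariate(beta, 1.0) if w in selected else 0.0
        for w in vocab
    }
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


if __name__ == "__main__":
    history = [["rain", "traffic"], ["rain", "lunch"]]
    current = ["earthquake"] * 4 + ["rain"]
    bursty = bursty_words(history, current)
    topic = spike_and_slab_topic(["earthquake", "rain", "lunch"], set(bursty))
    print(bursty)
    print(topic)
```

Because the set of active words is chosen separately from the Dirichlet parameter `beta`, sparsity (how many words a topic uses) and smoothness (how evenly mass is spread over those words) can be controlled independently, which is the decoupling the abstract refers to.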


Lei Shi was born in 1986. He received the M.S. degree in Control Engineering from Inner Mongolia University of Science and Technology. He is now a Ph.D. candidate in Computer Science and Technology at Beijing University of Posts and Telecommunications. His research interests include social network search, data mining, and cross-media search.

Junping Du was born in 1963. She is now a professor and Ph.D. tutor at the School of Computer Science and Technology, Beijing University of Posts and Telecommunications. Her research interests include artificial intelligence, image processing, and pattern recognition.

Feifei Kou was born in 1989. She received her M.S. degree in Computer Technology from Beijing Technology and Business University. She is now a Ph.D. candidate in Computer Science and Technology at Beijing University of Posts and Telecommunications. Her research interests include social network search, semantic analysis, and semantic learning.