The International Arab Journal of Information Technology (IAJIT)



A Semantic Framework for Extracting Taxonomic Relations from Text Corpus

Ontologies are used in many current applications because of their ability to represent knowledge and infer new knowledge. However, constructing ontologies manually is tedious and time-consuming, so automated ontology construction from text has been investigated. Extracting taxonomic relations between concepts is a crucial step in constructing domain ontologies. To obtain taxonomic relations from a text corpus, especially when the data is deficient, the web is usually used as a source of collective knowledge (the web-based approach). The main challenge of this approach is collecting relevant knowledge from a large number of web pages. To overcome this issue, we propose a framework that combines Word Sense Disambiguation (WSD) with the web-based approach to extract taxonomic relations from a domain-text corpus. The framework consists of two main stages: concept extraction and taxonomic-relation extraction. Concepts acquired in the concept-extraction stage are disambiguated by a WSD module and then passed to the taxonomic-relation extraction stage. To evaluate the efficiency of the proposed framework, we conduct experiments on datasets from two domains, tourism and sport. The results show that the proposed method is efficient on corpora that are insufficient or have no training data. In addition, the proposed method outperforms the state-of-the-art method on corpora with high WSD results.
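The flow described above (disambiguate extracted concepts, then look for taxonomic relations in web text) can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the framework's actual implementation: the abstract does not say which WSD algorithm or extraction patterns are used, so the sketch substitutes a simplified gloss-overlap (Lesk-style) disambiguator and a single "Y such as X" lexical pattern. The sense inventory `SENSES`, the function names, and the hard-coded snippets standing in for web search results are all hypothetical.

```python
import re

# Hypothetical toy sense inventory: term -> {sense_id: gloss}.
# A real system would draw senses from a lexical resource instead.
SENSES = {
    "bass": {
        "bass#fish": "freshwater fish caught by anglers in a lake or river",
        "bass#music": "low pitched musical instrument or voice in a band",
    },
}

STOPWORDS = {"a", "an", "the", "in", "of", "or", "by", "and", "is", "are", "as", "such"}


def tokens(text):
    """Return the set of lowercase word tokens, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(term, context):
    """Simplified gloss overlap: choose the sense whose gloss shares
    the most words with the context. Returns None for unknown terms."""
    senses = SENSES.get(term)
    if not senses:
        return None
    ctx = tokens(context)
    return max(senses, key=lambda s: len(tokens(senses[s]) & ctx))


def hearst_pairs(snippets, concepts):
    """Return (hyponym, hypernym) pairs supported by the lexical
    pattern '<hypernym>(s) such as <hyponym>' in any snippet."""
    pairs = set()
    for hyper in concepts:
        for hypo in concepts:
            if hypo == hyper:
                continue
            pattern = re.compile(
                rf"\b{re.escape(hyper)}s? such as (?:an? )?{re.escape(hypo)}\b",
                re.IGNORECASE,
            )
            if any(pattern.search(s) for s in snippets):
                pairs.add((hypo, hyper))
    return pairs


if __name__ == "__main__":
    # Step 1: disambiguate an extracted concept using its sentence as context.
    print(disambiguate("bass", "We caught a bass in the lake near the river"))

    # Step 2: search snippets (stand-ins for web results) for taxonomic evidence.
    snippets = [
        "Popular sports such as football attract tourists.",
        "Football is played worldwide.",
    ]
    print(hearst_pairs(snippets, ["sport", "football"]))
```

In this sketch the two steps simply run one after the other; how the selected sense steers the collection of web pages is not specified in the abstract and is left out here.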


The International Arab Journal of Information Technology, Vol. 17, No. 3, May 2020

Phuoc Thi Hong Doan received the M.S. degree from Hue University of Sciences, Vietnam, in 2004. She is currently a Ph.D. student in the Department of Computer Science, Faculty of Science, Khon Kaen University, Thailand. Her research interests are data mining, natural language processing and information retrieval.

Ngamnij Arch-int received the PhD degree in computer science from Chulalongkorn University, Thailand in 2003. She is currently an associate professor in the Department of Computer Science at Khon Kaen University, Thailand. Her research interests include the semantic web, web services, semantic web services, and heterogeneous information integration.

Somjit Arch-int received the PhD degree in computer science from the Asian Institute of Technology, Thailand in 2002. He is currently an associate professor in the Department of Computer Science at Khon Kaen University, Thailand. His previous experiences include the development of several industry systems and consulting activities. His research interests are business component-based software development, object-oriented metrics, ontology-based e-business modeling, knowledge-based representation, semantic information integration, data mining, and semantic Web. He is a member of the IEEE Computer Society.