The International Arab Journal of Information Technology (IAJIT), Vol. 19, No. 3A, Special Issue 2022


A Schema-Free Instance Matching Algorithm Based on Virtual Document Similarity

With the continuous development of the Semantic Web, and especially of the Web of Data, knowledge bases expressed as ontologies are created independently and added to the Linked Open Data (LOD) cloud on a daily basis. A major challenge for the LOD paradigm is to discover resources that refer to the same real-world object, in order to interlink web resources and support large-scale data integration and sharing. In this context, instance matching is a promising solution. It aims to link co-referent instances belonging to heterogeneous knowledge bases with owl:sameAs links. Several state-of-the-art approaches to this problem rely on prior schema-level matchings, which does not remove the limitation imposed by heterogeneity at the property level. In this paper, we propose a schema-free, scalable and efficient instance matching approach that is independent of matching results at the schema level. We transform the instance matching problem into a document similarity problem and solve it with a clustering technique that uses an Ascendant Hierarchical Clustering algorithm to group similar instances into the same clusters. Furthermore, we design multiple validation patterns that use structural information to validate the obtained mappings and eliminate wrong ones. Experiments on the instance matching track of the Ontology Alignment Evaluation Initiative (OAEI) show that our approach achieves strong results compared with several systems that participated in OAEI 2019, OAEI 2020 and OAEI 2021.
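The idea of turning instance matching into document similarity followed by ascendant (bottom-up) hierarchical clustering can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the tokenizer, the cosine similarity over raw token counts, the average-linkage criterion, the threshold of 0.5 and the toy instances are all assumptions chosen for illustration, and the validation patterns are not modeled. Property names are deliberately ignored when building each virtual document, which is what makes the sketch schema-free.

```python
import math
import re
from collections import Counter


def virtual_document(instance):
    """Bag of words over all literal values; property names are ignored."""
    tokens = []
    for value in instance.values():
        tokens.extend(re.findall(r"[a-z0-9]+", str(value).lower()))
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_instances(instances, threshold=0.5):
    """Bottom-up clustering with average linkage (assumed criterion).

    Starts with one cluster per instance and repeatedly merges the two
    most similar clusters until the best similarity drops below threshold.
    """
    docs = {key: virtual_document(inst) for key, inst in instances.items()}
    clusters = [[key] for key in docs]

    def linkage(c1, c2):
        total = sum(cosine(docs[x], docs[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical instances from two knowledge bases with different schemas.
instances = {
    "kb1:p1": {"name": "Tim Berners-Lee", "born": "London 1955"},
    "kb2:x9": {"label": "Berners-Lee, Tim", "birthPlace": "London", "year": "1955"},
    "kb1:p2": {"name": "Ora Lassila", "born": "Helsinki"},
}
clusters = cluster_instances(instances)
print(clusters)
```

Each resulting cluster that contains instances from different knowledge bases is a candidate set of owl:sameAs links. The pairwise search above is quadratic per merge step, so a scalable system would need blocking or indexing before clustering.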
