Instance Matching or Link Discovery Task at the OAEI OM 2020 Workshop
Instance Matching or Link Discovery
The number of datasets published in the Web of Data as part of the Linked Data Cloud is constantly increasing. The Linked Data paradigm is based on the unconstrained publication of information by different publishers, and the interlinking of Web resources across knowledge bases. In most cases, the cross-dataset links are not explicit in the dataset and must be automatically determined using Instance Matching (IM) and Link Discovery tools amongst others. The large variety of techniques requires their comparative evaluation to determine which one is best suited for a given context. Performing such an assessment generally requires well-defined and widely accepted benchmarks to determine the weak and strong points of the proposed techniques and/or tools.
A number of real and synthetic benchmarks that address different data linking challenges have been proposed for evaluating the performance of such systems. So far, only a limited number of link discovery benchmarks target the problem of linking geospatial entities. However, some of the largest knowledge bases on the Linked Open Data Web are geospatial knowledge bases (e.g., LinkedGeoData with more than 30 billion triples). Linking spatial resources requires techniques that differ from the classical, mostly string-based approaches. In particular, considering the topology of the spatial resources and the topological relations between them is of central importance to systems driven by spatial data.
We believe that, given the large number of geospatial datasets available in Linked Data and used across several domains, it is critical that benchmarks for geospatial link discovery are developed.
The proposed task, entitled “Instance Matching or Link Discovery Task”, is presented at the OAEI OM 2020 Workshop at ISWC 2020. The OM workshop conducts an extensive and rigorous evaluation of ontology matching and instance matching (link discovery) approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2020 campaign.
The aim of the Task is to test the performance of Link Discovery tools that implement string-based as well as topological approaches for identifying matching instances and spatial entities. The different frameworks will be evaluated for both accuracy (precision, recall and f-measure) and time performance.
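The accuracy measures named above can be sketched as follows. This is a minimal illustration, assuming that both the system output and the reference alignment are available as sets of (source, target) pairs; the function name `evaluate` and the sample identifiers are hypothetical and do not reflect the evaluation platform's actual code.

```python
def evaluate(system, reference):
    """Return (precision, recall, f_measure) for two collections of (source, target) pairs."""
    system, reference = set(system), set(reference)
    true_positives = len(system & reference)  # mappings found by the system that are correct
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical identifiers, for illustration only
reference = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
system = {("s1", "t1"), ("s2", "t2"), ("s3", "t9")}

precision, recall, f_measure = evaluate(system, reference)
print(precision, recall, f_measure)
```

In this example two of the three reported mappings are correct and two of the four reference mappings are found, so precision is 2/3 and recall is 1/2.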
Q & A
|Deadline for the submission of papers||July 1st, 2020|
|Deadline for the notification of acceptance/rejection||August 5th, 2020|
|Workshop camera ready copy submission||September 2nd, 2020|
|OM-2020, The Megaron Athens International Conference Centre (MAICC), Athens, Greece||November 2nd or 3rd, 2020|
|(Preliminary) datasets available||June 15th|
|Preparation phase ends and final datasets||July 15th|
|Participants register their tool (mandatory). Please use this form (requires a google account and a valid email)||July 31st|
|Execution phase ends and participants submit final versions of their tools. SEALS tracks (zip file, e.g., LogMap.zip) using this form. HOBBIT tracks (via platform).||August 31st|
|Evaluation phase ends and results are available. SEALS and HOBBIT tracks.||September 30th|
|Preliminary version of system papers due. Submit PDF paper (e.g., LogMap_prelim.pdf). Please use this form (requires a google account and a valid email).||October 14th|
|Ontology matching workshop||November 2nd or 3rd|
|Final version of system papers due. Submit a PDF (e.g., LogMap_final.pdf) paper. Please use this form (requires a google account and a valid email)||November 15th|
Tasks and Training Data
The goal of the SPIMBENCH task is to determine when two instances describe the same Creative Work. Each dataset is composed of a Tbox (containing the ontology and the instances) and a corresponding Abox (containing only the instances). The datasets share almost the same ontology, with some differences at the property level due to the structure-based transformations.
What we expect from participants
Participants are requested to match instances in the source dataset (Tbox1) against the instances of the target dataset (Tbox2). The goal of the task is to produce a set of mappings between pairs of matching instances that are found to refer to the same real-world entity. An instance in the source dataset (Tbox1) can have zero or one matching counterpart in the target dataset (Tbox2). We ask the participants to map only instances of Creative Works (http://www.bbc.co.uk/ontologies/creativework/NewsItem, http://www.bbc.co.uk/ontologies/creativework/BlogPost and http://www.bbc.co.uk/ontologies/creativework/Programme) and not the instances of the other classes.
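The two constraints stated above (only Creative Work instances are mapped, and a source instance has at most one counterpart) can be sketched as a post-processing step. This is a minimal illustration under stated assumptions: candidate pairs with similarity scores are produced by some matcher not shown here, instance types are available as plain dictionaries, and all names (`select_mappings`, the `example.org` URIs) are hypothetical.

```python
# The three Creative Work classes named in the task description
CREATIVE_WORK_CLASSES = {
    "http://www.bbc.co.uk/ontologies/creativework/NewsItem",
    "http://www.bbc.co.uk/ontologies/creativework/BlogPost",
    "http://www.bbc.co.uk/ontologies/creativework/Programme",
}


def select_mappings(candidates, source_types, target_types):
    """Keep at most one target per source, restricted to Creative Work instances.

    candidates: iterable of (source, target, score) triples from some matcher.
    source_types / target_types: dict mapping an instance URI to its class URI.
    """
    best = {}
    for source, target, score in candidates:
        if source_types.get(source) not in CREATIVE_WORK_CLASSES:
            continue  # source is not a Creative Work
        if target_types.get(target) not in CREATIVE_WORK_CLASSES:
            continue  # target is not a Creative Work
        if source not in best or score > best[source][1]:
            best[source] = (target, score)  # keep the highest-scoring target
    return {(source, target) for source, (target, _) in best.items()}


# Hypothetical instances, for illustration only
source_types = {
    "http://example.org/a1": "http://www.bbc.co.uk/ontologies/creativework/NewsItem",
    "http://example.org/a2": "http://example.org/OtherClass",
}
target_types = {
    "http://example.org/b1": "http://www.bbc.co.uk/ontologies/creativework/NewsItem",
    "http://example.org/b2": "http://www.bbc.co.uk/ontologies/creativework/BlogPost",
}
candidates = [
    ("http://example.org/a1", "http://example.org/b1", 0.92),
    ("http://example.org/a1", "http://example.org/b2", 0.41),
    ("http://example.org/a2", "http://example.org/b2", 0.88),
]

mappings = select_mappings(candidates, source_types, target_types)
print(mappings)
```

Picking the highest-scoring target per source is only one way to enforce the zero-or-one constraint; a matcher may equally apply a threshold and report no counterpart at all.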
We use TomTom and Spaten datasets in order to create the appropriate benchmarks.
- TomTom provides a Synthetic Trace Generator, developed in the context of the HOBBIT Project, which facilitates the creation of an arbitrary volume of data from statistical descriptions of vehicle traffic. More specifically, it generates traces, with a trace being a list of (longitude, latitude) pairs recorded by one device (phone, car, etc.) throughout one day. TomTom was the only data generator in the first version of SPgen.
- Spaten is an open-source configurable spatio-temporal and textual dataset generator that can produce large volumes of data based on realistic user behavior. Spaten extracts GPS traces from realistic routes utilizing the Google Maps API, and combines them with real POIs and relevant user comments crawled from TripAdvisor. Spaten publicly offers GB-size datasets with millions of check-ins and GPS traces.
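Since the traces in this challenge are represented in Well-known text (WKT), it may help to see how a trace, understood as a list of (longitude, latitude) pairs, could be serialized. The sketch below assumes that a trace maps to a WKT LINESTRING; the exact geometry type used in the benchmark datasets is not stated here, and the function name and coordinates are hypothetical.

```python
def trace_to_wkt(trace):
    """Serialize a list of (longitude, latitude) pairs as a WKT LINESTRING.

    WKT lists coordinates as "x y", which corresponds to "longitude latitude".
    """
    if len(trace) < 2:
        raise ValueError("a LINESTRING needs at least two points")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in trace)
    return f"LINESTRING ({coords})"


# Hypothetical trace with three recorded positions
trace = [(23.7275, 37.9838), (23.7301, 37.9852), (23.7348, 37.9871)]
wkt = trace_to_wkt(trace)
print(wkt)
```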
This version of the challenge comprises the Spatial task, which measures how well systems can identify DE-9IM (Dimensionally Extended nine-Intersection Model) topological relations. The supported spatial relations are Equals, Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses, and Overlaps. The traces are represented in Well-known text (WKT) format. For each relation, a different pair of source and target datasets will be given to the participants.
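To make the relations concrete: in DE-9IM, the relation between two geometries is summarized by a nine-character intersection matrix, and each named relation corresponds to one or more patterns over that matrix. The sketch below tests a given matrix string against the standard patterns. It assumes that the matrix has already been computed by a geometry library, and that both geometries are lines, which matters for Crosses and Overlaps because their patterns depend on the dimensions of the inputs. All names in the code are hypothetical.

```python
# Standard DE-9IM patterns: 'T' = non-empty intersection (0, 1 or 2),
# 'F' = empty intersection, '*' = any value, a digit = exact dimension.
PATTERNS = {
    "Equals": ["T*F**FFF*"],
    "Disjoint": ["FF*FF****"],
    "Touches": ["FT*******", "F**T*****", "F***T****"],
    "Contains": ["T*****FF*"],
    "Within": ["T*F**F***"],
    "Covers": ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "CoveredBy": ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
    "Intersects": ["T********", "*T*******", "***T*****", "****T****"],
    "Crosses": ["0********"],   # assumption: line/line case only
    "Overlaps": ["1*T***T**"],  # assumption: line/line case only
}


def matches(matrix, pattern):
    """Check a 9-character DE-9IM matrix against a single pattern."""
    for value, expected in zip(matrix, pattern):
        if expected == "*":
            continue
        if expected == "T":
            if value == "F":
                return False
        elif value != expected:  # 'F' or an exact dimension digit
            return False
    return True


def relations(matrix):
    """Return the set of relation names whose patterns the matrix satisfies."""
    if len(matrix) != 9:
        raise ValueError("a DE-9IM matrix has exactly nine characters")
    return {name for name, patterns in PATTERNS.items()
            if any(matches(matrix, p) for p in patterns)}


# Matrices for three line/line configurations
print(relations("0F1FF0102"))  # two lines crossing at a single interior point
print(relations("FF1FF0102"))  # two lines with no point in common
print(relations("1FFF0FFF2"))  # two identical lines
```

Note that one pair of geometries can satisfy several relations at once: identical lines are Equals, but also Within, Contains, Covers, CoveredBy and Intersects.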
T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, and A. Ngonga-Ngomo. LANCE: Piercing to the Heart of Instance Matching Tools. In ISWC, 2015.
Christian Strobl. Encyclopedia of GIS, chapter Dimensionally Extended Nine-Intersection Model (DE-9IM), pages 240-245. Springer, 2008.
- Pavel Shvaiko (Main contact)
Trentino Digitale, Italy
E-mail: pavel [dot] shvaiko [at] tndigit [dot] it
- Jérôme Euzenat
INRIA & Univ. Grenoble Alpes, France
- Ernesto Jiménez-Ruiz
City, University of London, UK & SIRIUS, University of Oslo, Norway
- Oktie Hassanzadeh
IBM Research, USA
- Cássia Trojahn
Program Committee (to be completed):
- Alsayed Algergawy, Jena University, Germany
- Manuel Atencia, INRIA & Univ. Grenoble Alpes, France
- Zohra Bellahsene, LIRMM, France
- Jiaoyan Chen, University of Oxford, UK
- Valerie Cross, Miami University, USA
- Jérôme David, University Grenoble Alpes & INRIA, France
- Daniel Faria, Instituto Gulbenkian de Ciência, Portugal
- Alfio Ferrara, University of Milan, Italy
- Marko Gulić, University of Rijeka, Croatia
- Wei Hu, Nanjing University, China
- Ryutaro Ichise, National Institute of Informatics, Japan
- Antoine Isaac, Vrije Universiteit Amsterdam & Europeana, Netherlands
- Naouel Karam, Fraunhofer, Germany
- Prodromos Kolyvakis, EPFL, Switzerland
- Patrick Lambrix, Linköpings Universitet, Sweden
- Oliver Lehmberg, University of Mannheim, Germany
- Majeed Mohammadi, TU Delft, Netherlands
- Peter Mork, MITRE, USA
- Andriy Nikolov, Metaphacts GmbH, Germany
- George Papadakis, University of Athens, Greece
- Catia Pesquita, University of Lisbon, Portugal
- Henry Rosales-Méndez, University of Chile, Chile
- Kavitha Srinivas, IBM, USA
- Giorgos Stoilos, Huawei Technologies, Greece
- Pedro Szekely, University of Southern California, USA
- Ludger van Elst, DFKI, Germany
- Xingsi Xue, Fujian University of Technology, China
- Ondřej Zamazal, Prague University of Economics, Czech Republic
- Songmao Zhang, Chinese Academy of Sciences, China
We appreciate support from the Trentino as a Lab initiative of the European Network of the Living Labs at Trentino Digitale, the EU SEALS project, as well as the Pistoia Alliance Ontologies Mapping project and IBM Research.