Instance Matching or Link Discovery Task at OAEI 2019 – OM 2019, ISWC 2019


Instance Matching or Link Discovery

The number of datasets published in the Web of Data as part of the Linked Data Cloud is constantly increasing. The Linked Data paradigm is based on the unconstrained publication of information by different publishers, and the interlinking of Web resources across knowledge bases. In most cases, the cross-dataset links are not explicit in the dataset and must be automatically determined using Instance Matching (IM) and Link Discovery tools amongst others. The large variety of techniques requires their comparative evaluation to determine which one is best suited for a given context. Performing such an assessment generally requires well-defined and widely accepted benchmarks to determine the weak and strong points of the proposed techniques and/or tools.

A number of real and synthetic benchmarks that address different data linking challenges have been proposed for evaluating the performance of such systems. So far, only a limited number of link discovery benchmarks target the problem of linking geospatial entities.

However, some of the largest knowledge bases on the Linked Open Data Web are geospatial knowledge bases (e.g., LinkedGeoData with more than 30 billion triples). Linking spatial resources requires techniques that differ from the classical, mostly string-based approaches. In particular, the topology of the spatial resources and the topological relations between them are of central importance to systems driven by spatial data.

Given the large number of geospatial datasets used in Linked Data and across several domains, we believe it is critical to develop benchmarks for geospatial link discovery.

The proposed Task, entitled “Link Discovery Task”, has been accepted at the OAEI OM 2019 Workshop at ISWC 2019. The OM workshop conducts an extensive and rigorous evaluation of ontology matching and instance matching (link discovery) approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2019 campaign.

Task Overview

The aim of the Task is to test the performance of Link Discovery tools that implement string-based as well as topological approaches for identifying matching spatial entities. The different frameworks will be evaluated for both accuracy (precision, recall and F-measure) and time performance.
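
The accuracy measures named above can be illustrated with a minimal sketch. It assumes that a system's output and the reference alignment are both available as sets of (source, target) pairs; the function name `evaluate` and the identifiers `s1`, `t1`, etc. are hypothetical and are not part of the official evaluation code.

```python
from typing import Set, Tuple

# A link is a (source instance, target instance) pair; identifiers are made up.
Mapping = Tuple[str, str]


def evaluate(found: Set[Mapping], reference: Set[Mapping]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of found links against a reference."""
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


reference = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
found = {("s1", "t1"), ("s2", "t2"), ("s5", "t9")}
precision, recall, f_measure = evaluate(found, reference)
print(precision, recall, f_measure)
```

In this toy case two of the three returned links are correct and two of the four reference links are found, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.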

Q & A

For more information send an e-mail to: Tzanina Saveta (jsaveta@ics.forth.gr) and Irini Fundulaki (fundul@ics.forth.gr).

Important Dates

Workshop related

June 28th: Deadline for the submission of papers
July 24th: Deadline for the notification of acceptance/rejection
August 2nd: Early registration deadline
August 26th: Workshop camera ready copy submission
October 26th: OM-2019, Owen G. Glenn Building, The University of Auckland, Auckland, New Zealand

Challenge related

June 15th: (Preliminary) datasets available.
July 15th: Preparation phase ends and final datasets are available.
July 31st: Participants register their tool (mandatory). Please use this form (requires a Google account and a valid email).
August 31st: Execution phase ends and participants submit final versions of their tools. SEALS tracks (zip file, e.g., LogMap.zip). HOBBIT tracks (via platform).
September 30th: Evaluation phase ends and results are available. SEALS and HOBBIT tracks.
October 7th: Preliminary version of system papers due. Submit a PDF paper (e.g., LogMap_prelim.pdf). Please use this form (requires a Google account and a valid email).
October 26th or 27th: Ontology matching workshop (check ISWC 2019 student travel grants).
November 15th: Final version of system papers due. Submit a PDF paper (e.g., LogMap_final.pdf). Please use this form (requires a Google account and a valid email).

Tasks and Training Data

SPIMBENCH

The goal of the SPIMBENCH task is to determine when two instances describe the same Creative Work. A dataset is composed of a Tbox (contains the ontology and the instances) and a corresponding Abox (contains only the instances). The datasets share almost the same ontology (with some differences at the property level, due to the structure-based transformations).

What we expect from participants: Participants are requested to match instances in the source dataset (Tbox1) against the instances of the target dataset (Tbox2). The goal is to produce a set of mappings between pairs of instances that are found to refer to the same real-world entity. An instance in the source dataset (Tbox1) can have zero or one matching counterpart in the target dataset (Tbox2). We ask participants to map only instances of Creative Works (http://www.bbc.co.uk/ontologies/creativework/NewsItem, http://www.bbc.co.uk/ontologies/creativework/BlogPost and http://www.bbc.co.uk/ontologies/creativework/Programme) and not instances of other classes.
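
The two constraints stated for SPIMBENCH (only Creative Work instances are mapped, and each source instance has at most one counterpart) can be checked with a short sketch. It assumes the instance types have already been loaded into a dictionary; the helper names and every `example.org` URI are hypothetical, and only the three class URIs come from the task description.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# The three Creative Work classes listed in the task description.
CREATIVE_WORK_CLASSES = {
    "http://www.bbc.co.uk/ontologies/creativework/NewsItem",
    "http://www.bbc.co.uk/ontologies/creativework/BlogPost",
    "http://www.bbc.co.uk/ontologies/creativework/Programme",
}


def select_creative_works(instance_types: Dict[str, Set[str]]) -> Set[str]:
    """Keep only the instances typed with one of the Creative Work classes."""
    return {uri for uri, types in instance_types.items() if types & CREATIVE_WORK_CLASSES}


def sources_with_multiple_targets(mappings: Iterable[Tuple[str, str]]) -> List[str]:
    """Return source instances mapped to more than one target (constraint violations)."""
    counts = Counter(source for source, _ in set(mappings))
    return sorted(source for source, n in counts.items() if n > 1)


# Hypothetical data: instance URI -> set of rdf:type values.
instance_types = {
    "http://example.org/source/cw1": {"http://www.bbc.co.uk/ontologies/creativework/NewsItem"},
    "http://example.org/source/cw2": {"http://www.bbc.co.uk/ontologies/creativework/BlogPost"},
    "http://example.org/source/p1": {"http://example.org/ontology/Person"},
}
candidates = select_creative_works(instance_types)

mappings = [
    ("http://example.org/source/cw1", "http://example.org/target/cwA"),
    ("http://example.org/source/cw1", "http://example.org/target/cwB"),
    ("http://example.org/source/cw2", "http://example.org/target/cwC"),
]
violations = sources_with_multiple_targets(mappings)
print(sorted(candidates), violations)
```

Here `cw1` is reported as a violation because it is mapped to two targets, while the instance typed as a person is never considered for matching.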

Link Discovery
We will use TomTom datasets to create the appropriate benchmarks. TomTom datasets contain representations of traces (GPS fixes). Each trace consists of a number of points. Each point has a timestamp, longitude, latitude and speed (value and metric). The points are sorted by the timestamp of the corresponding GPS fix (ascending).
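
The trace structure described above could be modelled roughly as follows. This is only a sketch: the class and field names, the identifier `trace-1`, the coordinates and the `km/h` unit are assumptions for illustration and do not reflect the actual TomTom schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class TracePoint:
    """One GPS fix; field names are illustrative, not the dataset's schema."""
    timestamp: datetime
    longitude: float
    latitude: float
    speed_value: float
    speed_metric: str


@dataclass
class Trace:
    """A trace is a list of points kept in ascending timestamp order."""
    trace_id: str
    points: List[TracePoint] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the ordering stated in the task description.
        self.points = sorted(self.points, key=lambda p: p.timestamp)


# Made-up example values, deliberately given out of order.
trace = Trace("trace-1", [
    TracePoint(datetime(2019, 6, 15, 10, 0, 30), 23.7280, 37.9840, 42.0, "km/h"),
    TracePoint(datetime(2019, 6, 15, 10, 0, 0), 23.7275, 37.9838, 40.0, "km/h"),
])
timestamps = [p.timestamp for p in trace.points]
print(timestamps)
```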

This version of the challenge will comprise the following tasks:

  • Task 1 (Linking) will measure how well the systems can match traces that have been altered using string-based approaches along with the addition and deletion of intermediate points. As the TomTom dataset only contains coordinates, and in order to apply string-based modifications based on LANCE [1], we have replaced a number of those with labels retrieved from the Google Maps API, the Foursquare API and the Nominatim OpenStreetMap API. This task also contains changes to date formats and coordinate formats.
  • Task 2 (Spatial) measures how well the systems can identify DE-9IM (Dimensionally Extended nine-Intersection Model) topological relations. The supported spatial relations are the following: Equals, Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses, Overlaps. The traces are represented in Well-known text (WKT) format. For each relation, a different pair of source and target datasets will be given to the participants.
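
To make the Spatial task more concrete, the sketch below parses two traces given as WKT and tests the Intersects relation (Disjoint is its negation). It is an illustration under strong assumptions: it handles only two-dimensional `LINESTRING` geometries, uses plain floating-point orientation tests, and does not compute the full DE-9IM matrix needed for relations such as Touches or Crosses. All function names are hypothetical; a real system would rely on a geometry library.

```python
import re
from typing import List, Tuple

Point = Tuple[float, float]


def parse_linestring(wkt: str) -> List[Point]:
    """Parse a 2D WKT LINESTRING such as 'LINESTRING (0 0, 2 2)'."""
    match = re.fullmatch(r"\s*LINESTRING\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a 2D LINESTRING: {wkt!r}")
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    if len(points) < 2:
        raise ValueError("a LINESTRING needs at least two points")
    return points


def _orientation(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def _in_box(p: Point, q: Point, r: Point) -> bool:
    """True if r lies inside the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def _segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """Standard orientation-based test for segments ab and cd sharing a point."""
    o1, o2 = _orientation(a, b, c), _orientation(a, b, d)
    o3, o4 = _orientation(c, d, a), _orientation(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and _in_box(a, b, c)) or (o2 == 0 and _in_box(a, b, d))
            or (o3 == 0 and _in_box(c, d, a)) or (o4 == 0 and _in_box(c, d, b)))


def intersects(wkt_a: str, wkt_b: str) -> bool:
    """True if the two linestrings share at least one point."""
    a, b = parse_linestring(wkt_a), parse_linestring(wkt_b)
    return any(_segments_intersect(a[i], a[i + 1], b[j], b[j + 1])
               for i in range(len(a) - 1) for j in range(len(b) - 1))


def disjoint(wkt_a: str, wkt_b: str) -> bool:
    """Disjoint is the negation of Intersects."""
    return not intersects(wkt_a, wkt_b)


print(intersects("LINESTRING (0 0, 2 2)", "LINESTRING (0 2, 2 0)"))
print(disjoint("LINESTRING (0 0, 1 0)", "LINESTRING (0 1, 1 1)"))
```

The first pair of lines crosses at (1, 1), so it intersects; the second pair is parallel and one unit apart, so it is disjoint.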

Read the detailed description of the tasks and training data.

References

[1] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, and A. Ngonga-Ngomo. LANCE: Piercing to the Heart of Instance Matching Tools. In ISWC, 2015.

Registration & Submission

Participants must register their tool using this form (requires a Google account and a valid email).

Participants submit the final versions of their tools. SEALS tracks (zip file, e.g., LogMap.zip) using this form. HOBBIT tracks (via platform).

Submit the preliminary version of the system paper as a PDF (e.g., LogMap_prelim.pdf). Please use this form (requires a Google account and a valid email).

Submit the final version of the system paper as a PDF (e.g., LogMap_final.pdf). Please use this form (requires a Google account and a valid email).

 

Systems

Tool | Institution | Country | Contact person(s) | Task to participate
An Efficient System for Matching Large Ontologies (Lily) | Southeast University | China | Wu Jiang-Heng | SPIMBENCH
FTRL-IM | Tongji University | China | JIANG Yizhi | SPIMBENCH
LogMap | City, University of London | UK | Ernesto Jimenez-Ruiz | SPIMBENCH
AgreementMakerLight (AML) | University of Lisbon / Instituto Gulbenkian de Ciencia / University of Illinois at Chicago | Portugal / USA | Daniel Faria, Catia Pesquita | SPIMBENCH & Linking & Spatial
Rapid Discovery of Topological Relations (RADON) | AKSW | Germany | Mohamed Ahmed Sherif, Kevin Dreßler | Spatial
Silk | National and Kapodistrian University of Athens | Greece | Despina Athanasia Pantazi, Panayiotis Smeros | Spatial

 

Results

SPIMBENCH

SANDBOX (~380 instances, ~10000 triples)
AML Lily FTRL-IM LogMap
Fmeasure 0.864516129 0.9185867896 0.9214175655 0.8413284133
Precision 0.8348909657 0.8494318182 0.8542857143 0.9382716049
Recall 0.8963210702 1 1 0.762541806
Time performance 6223 2032 1474 6919

MAINBOX (~1800 instances, ~50000 triples)
AML Lily FTRL-IM LogMap
Fmeasure 0.8604576217 0.9216224459 0.9214787657 0.790560472
Precision 0.8385678392 0.854638009 0.85584563 0.8925895087
Recall 0.8835208471 1 0.9980145599 0.7094639312
Time performance 39515 3667 2155 26920
The results can be found on the HOBBIT platform: here (login as guest).
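
As a sanity check on the SANDBOX figures above, the reported F-measure of each system can be recomputed from its reported precision and recall as the harmonic mean of the two. The snippet below is a small sketch of that arithmetic; the numbers are copied from the table, while the variable names and the tolerance are our own choices.

```python
# (precision, recall, reported F-measure) from the SPIMBENCH SANDBOX table.
reported = {
    "AML": (0.8348909657, 0.8963210702, 0.864516129),
    "Lily": (0.8494318182, 1.0, 0.9185867896),
    "FTRL-IM": (0.8542857143, 1.0, 0.9214175655),
    "LogMap": (0.9382716049, 0.762541806, 0.8413284133),
}


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Largest gap between the recomputed and the reported F-measure.
max_deviation = max(abs(f_measure(p, r) - f) for p, r, f in reported.values())
print(max_deviation < 1e-8)
```

The recomputed values agree with the table up to rounding of the published digits.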

Link Discovery – Linking task results

SANDBOX (100 instances)
AML
Fmeasure 1
Precision 1
Recall 1
Time performance 9761
MAINBOX (10000 instances)
AML
Fmeasure 1
Precision 1
Recall 1
Time performance 360647
The results can be found on the HOBBIT platform: here (login as guest).

Link Discovery – Spatial task results

 

The results can be found on the HOBBIT platform: here (login as guest).

Organization

Organizing Committee:

  • Pavel Shvaiko (Main contact)
    Trentino Digitale, Italy
    E-mail: pavel [dot] shvaiko [at] tndigit [dot] it
  • Jérôme Euzenat
    INRIA & Univ. Grenoble Alpes, France
  • Ernesto Jiménez-Ruiz
    The Alan Turing Institute, UK & University of Oslo, Norway
  • Oktie Hassanzadeh
    IBM Research, USA
  • Cássia Trojahn
    IRIT, France

Program Committee:

  • Alsayed Algergawy, Jena University, Germany
  • Manuel Atencia, INRIA & Univ. Grenoble Alpes, France
  • Zohra Bellahsene, LIRMM, France
  • Jiaoyan Chen, University of Oxford, UK
  • Valerie Cross, Miami University, USA
  • Jérôme David, University Grenoble Alpes & INRIA, France
  • Gayo Diallo, University of Bordeaux, France
  • Warith Eddine Djeddi, LIPAH & LABGED, Tunisia
  • AnHai Doan, University of Wisconsin, USA
  • Alfio Ferrara, University of Milan, Italy
  • Marko Gulić, University of Rijeka, Croatia
  • Wei Hu, Nanjing University, China
  • Ryutaro Ichise, National Institute of Informatics, Japan
  • Antoine Isaac, Vrije Universiteit Amsterdam & Europeana, Netherlands
  • Marouen Kachroudi, Université de Tunis El Manar, Tunisia
  • Simon Kocbek, University of Melbourne, Australia
  • Prodromos Kolyvakis, EPFL, Switzerland
  • Patrick Lambrix, Linköpings Universitet, Sweden
  • Oliver Lehmberg, University of Mannheim, Germany
  • Vincenzo Maltese, University of Trento, Italy
  • Fiona McNeill, University of Edinburgh, UK
  • Christian Meilicke, University of Mannheim, Germany
  • Peter Mork, MITRE, USA
  • Andriy Nikolov, Metaphacts GmbH, Germany
  • Axel Ngonga, University of Paderborn, Germany
  • George Papadakis, University of Athens, Greece
  • Catia Pesquita, University of Lisbon, Portugal
  • Henry Rosales-Méndez, University of Chile, Chile
  • Juan Sequeda, Capsenta, USA
  • Kavitha Srinivas, IBM, USA
  • Giorgos Stoilos, National Technical University of Athens, Greece
  • Pedro Szekely, University of Southern California, USA
  • Valentina Tamma, University of Liverpool, UK
  • Ludger van Elst, DFKI, Germany
  • Xingsi Xue, Fujian University of Technology, China
  • Ondřej Zamazal, Prague University of Economics, Czech Republic
  • Songmao Zhang, Chinese Academy of Sciences, China