HOBBIT Link Discovery Task at OM 2017 – ISWC 2017

HOBBIT Link Discovery Task at the OAEI OM 2017 Workshop

Task Motivation

The number of datasets published in the Web of Data as part of the Linked Data Cloud is constantly increasing. The Linked Data paradigm is based on the unconstrained publication of information by different publishers, and the interlinking of Web resources across knowledge bases. In most cases, the cross-dataset links are not explicit in the dataset and must be automatically determined using Instance Matching (IM) and Link Discovery tools amongst others. The large variety of techniques requires their comparative evaluation to determine which one is best suited for a given context. Performing such an assessment generally requires well-defined and widely accepted benchmarks to determine the weak and strong points of the proposed techniques and/or tools.

A number of real and synthetic benchmarks that address different data linking challenges have been proposed for evaluating the performance of such systems. So far, only a limited number of link discovery benchmarks target the problem of linking geo-spatial entities.

However, some of the largest knowledge bases on the Linked Open Data Web are geospatial knowledge bases (e.g., LinkedGeoData with more than 30 billion triples). Linking spatial resources requires techniques that differ from the classical, mostly string-based approaches. In particular, considering the topology of the spatial resources and the topological relations between them is of central importance to systems driven by spatial data.

We believe that, given the large number of geospatial datasets available as Linked Data and used across several domains, it is critical to develop benchmarks for geospatial link discovery.

The proposed Task, entitled “HOBBIT Link Discovery Task”, has been accepted at the OAEI OM 2017 Workshop at ISWC 2017. The OM workshop conducts an extensive and rigorous evaluation of ontology matching and instance matching (link discovery) approaches through the OAEI (Ontology Alignment Evaluation Initiative) 2017 campaign.

Task Overview

The aim of the Task is to test the performance of Link Discovery tools that implement string-based as well as topological approaches for identifying matching spatial entities. The different frameworks will be evaluated for both accuracy (precision, recall and F-measure) and time performance.
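
As an illustration of the accuracy measures named above, the following minimal sketch computes precision, recall and F-measure for a set of discovered links against a reference alignment. It assumes the F-measure is the balanced F1 (harmonic mean of precision and recall); the function name `evaluate_links` and the sample link identifiers are hypothetical and not part of the HOBBIT platform.

```python
def evaluate_links(found, reference):
    """Return (precision, recall, f_measure) for discovered links.

    `found` and `reference` are iterables of (source, target) pairs.
    Assumption: F-measure is the balanced F1 score.
    """
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example data: four reference links, three links found by a tool.
reference_links = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
found_links = {("s1", "t1"), ("s2", "t2"), ("s3", "t9")}

precision, recall, f_measure = evaluate_links(found_links, reference_links)
print(f"precision={precision:.2f} recall={recall:.2f} f-measure={f_measure:.2f}")
```

Treating links as a set of pairs keeps the comparison independent of the order in which a tool reports its results.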

Q & A

For more information send an e-mail to: Tzanina Saveta (jsaveta@ics.forth.gr) and Irini Fundulaki (fundul@ics.forth.gr).

Important Dates

Workshop related

Deadline for the submission of papers | July 28th, 2017
Deadline for the notification of acceptance/rejection | August 24th, 2017
Early registration deadline | September 8th, 2017
Workshop camera ready copy submission | September 15th, 2017
Workshop | October 21st, 2017

Challenge related

(Preliminary) datasets available. | June 1st, 2017
Datasets are frozen. | July 15th, 2017
(Still open) Participants register their tool (mandatory). | June 30th, 2017
Submission is open: zipped SEALS packages (e.g., LogMap.zip) can be submitted using this form (requires a Google account and a valid email). | July 15th, 2017
Participants submit preliminary wrapped versions (zip file) of their tools (mandatory). | July 31st, 2017
Participants submit final versions of their tools (zip file). SEALS tracks. | August 31st, 2017
Participants submit final versions of their tools. HOBBIT track. | September 15th, 2017
Evaluation is executed and results are analyzed. SEALS and HOBBIT tracks. | September 30th, 2017
Preliminary version of system papers due. Submit a PDF paper (e.g., LogMap_prelim.pdf) using this form (requires a Google account and a valid email). | October 10th, 2017
Ontology matching workshop | October 21st, 2017
Final version of system papers due. Submit a PDF paper (e.g., LogMap_final.pdf) using this form (requires a Google account and a valid email). | November 15th, 2017

Tasks and Training Data

We will use TomTom datasets to create the benchmarks. The TomTom datasets contain representations of traces (GPS fixes). Each trace consists of a number of points. Each point has a timestamp, longitude, latitude and speed (value and metric). The points are sorted in ascending order by the timestamp of the corresponding GPS fix.
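
To make the structure of a trace concrete, here is a minimal sketch of how such points could be modelled and ordered. The class and field names (`TracePoint`, `speed_value`, `speed_metric`) and the sample values are assumptions chosen for illustration; they are not the actual TomTom schema. The `trace_to_wkt` helper assumes the usual WKT convention of writing coordinates as "x y", i.e. longitude before latitude.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TracePoint:
    """One GPS fix of a trace (field names are illustrative assumptions)."""
    timestamp: datetime
    longitude: float
    latitude: float
    speed_value: float
    speed_metric: str


def build_trace(points):
    """Return the points sorted by timestamp in ascending order."""
    return sorted(points, key=lambda p: p.timestamp)


def trace_to_wkt(trace):
    """Serialize a trace as a WKT LINESTRING (needs at least two points)."""
    if len(trace) < 2:
        raise ValueError("a LINESTRING needs at least two points")
    coords = ", ".join(f"{p.longitude} {p.latitude}" for p in trace)
    return f"LINESTRING ({coords})"


# Hypothetical sample fixes, deliberately given out of order.
points = [
    TracePoint(datetime(2017, 6, 1, 10, 0, 30), 23.6, 38.0, 42.0, "km/h"),
    TracePoint(datetime(2017, 6, 1, 10, 0, 0), 23.5, 37.9, 40.0, "km/h"),
]
trace = build_trace(points)
print(trace_to_wkt(trace))
```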

This version of the challenge will comprise the following tasks:

  • Task 1 (Linking) will measure how well the systems can match traces that have been altered using string-based approaches along with addition and deletion of intermediate points. As the TomTom dataset only contains coordinates, and in order to apply string-based modifications based on LANCE [1], we have replaced a number of those coordinates with labels retrieved from the Google Maps API, Foursquare API and Nominatim OpenStreetMap API. This task also contains changes to date formats and coordinate formats.
  • Task 2 (Spatial) measures how well the systems can identify DE-9IM (Dimensionally Extended nine-Intersection Model) topological relations. The supported spatial relations are the following: Equals, Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses, Overlaps. The traces are represented in Well-known text (WKT) format. For each relation, a different pair of source and target datasets will be given to the participants.
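
The relations in Task 2 can be read off a DE-9IM intersection matrix by matching it against standard patterns. The sketch below shows this matching in plain Python. It assumes the matrix is already available as a 9-character string (for example, computed by a geometry library), and it uses the line/line variants of the Crosses and Overlaps patterns, on the assumption that traces are LineStrings. The names `PATTERNS`, `matches` and `relations_of` are hypothetical and do not come from the benchmark.

```python
# DE-9IM patterns: 'T' = any intersection (0, 1 or 2), 'F' = no intersection,
# '*' = don't care, a digit = intersection of exactly that dimension.
PATTERNS = {
    "Equals": ["T*F**FFF*"],
    "Disjoint": ["FF*FF****"],
    "Touches": ["FT*******", "F**T*****", "F***T****"],
    "Contains": ["T*****FF*"],
    "Within": ["T*F**F***"],
    "Covers": ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "CoveredBy": ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
    "Crosses": ["0********"],   # line/line case only (assumption: traces are lines)
    "Overlaps": ["1*T***T**"],  # line/line case only (assumption: traces are lines)
}


def matches(matrix, pattern):
    """Check a 9-character DE-9IM matrix against a 9-character pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("matrix and pattern must have 9 characters")
    for cell, want in zip(matrix, pattern):
        if want == "*":
            continue
        if want == "T":
            if cell == "F":
                return False
        elif cell != want:
            return False
    return True


def relations_of(matrix):
    """Return the set of relation names that hold for the given matrix."""
    found = {
        name
        for name, patterns in PATTERNS.items()
        if any(matches(matrix, p) for p in patterns)
    }
    # Intersects is defined as the negation of Disjoint.
    if "Disjoint" not in found:
        found.add("Intersects")
    return found


# Two lines whose interiors cross in a single point.
print(sorted(relations_of("0F1FF0102")))
```

Because several relations can hold at once (two equal lines are also within, covering and intersecting each other), the helper returns a set rather than a single label.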

Read the detailed description of the tasks and training data.

References

[1] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, and A. Ngonga-Ngomo. LANCE: Piercing to the Heart of Instance Matching Tools. In ISWC, 2015.

Registration and Submission

Participants may register their tool using this form. The preliminary version of system papers may be submitted as a PDF (e.g., LogMap_prelim.pdf) using this form (requires a Google account and a valid email). The final version of system papers may be submitted as a PDF (e.g., LogMap_final.pdf) using this form (requires a Google account and a valid email).

The following systems have been tested

Tool | Institution | Country | Contact person(s)
AgreementMakerLight (AML) | University of Lisbon / Instituto Gulbenkian de Ciencia / University of Illinois at Chicago | Portugal / USA | Daniel Faria, Catia Pesquita
Rapid Discovery of Topological Relations (RADON) | AKSW | Germany | Mohamed Ahmed Sherif, Kevin Dreßler
Silk | National and Kapodistrian University of Athens | Greece | Despina Athanasia Pantazi, Panayiotis Smeros
OntoideaSpatial, Ontoidea-Hobbit (OntoIdea) | Freie Universität Berlin | Germany | Abderrahmane Khiat, Maximilian Mackeprang

LINKING BENCHMARK – results

SANDBOX (100 instances)
Systems | Precision | Recall | F-measure | Run Time
AML | 1 | 1 | 1 | 11722
OntoIdea | 0.99 | 0.99 | 0.99 | 19806

MAINBOX (5000 instances)
Systems | Precision | Recall | F-measure | Run Time
AML | 1 | 1 | 1 | 134456
OntoIdea | Platform time limit

SPATIAL BENCHMARK – results

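Relation | Systems | Run Time (SANDBOX – 10 instances) | Run Time (MAINBOX – 2000 instances)
EQUALS | AML | 8157 | 10284
EQUALS | OntoIdea | 1531 | 567169
EQUALS | RADON | 2215 | 4680
EQUALS | Silk | 4059 | 125967
DISJOINT | AML | 7173 | Platform time limit
DISJOINT | OntoIdea | not participating | not participating
DISJOINT | RADON | 1558 | 19214
DISJOINT | Silk | 3224 | 257877
TOUCHES | AML | 11207 | 20252
TOUCHES | OntoIdea | 4712 | 473430*
TOUCHES | RADON | 2672 | 485765
TOUCHES | Silk | 4805 | 1777747
CONTAINS | AML | 9191 | 16966
CONTAINS | OntoIdea | 1489 | 223857
CONTAINS | RADON | 2228 | 6937
CONTAINS | Silk | 4160 | 83958
WITHIN | AML | 10186 | 12308
WITHIN | OntoIdea | 4517 | 236506
WITHIN | RADON | 2203 | 5036
WITHIN | Silk | 4037 | 88758
COVERS | AML | 7177 | 11859
COVERS | OntoIdea | 1503 | 313298
COVERS | RADON | 2180 | 6772
COVERS | Silk | not participating | not participating
COVERED BY | AML | 8184 | 14703
COVERED BY | OntoIdea | 1467 | 304509
COVERED BY | RADON | 2132 | 4721
COVERED BY | Silk | not participating | not participating
INTERSECTS | AML | 9269 | 66681
INTERSECTS | OntoIdea | 1505 | 510938
INTERSECTS | RADON | 2737 | 339742
INTERSECTS | Silk | 3582 | 1718035
CROSSES | AML | 8224 | 19385
CROSSES | OntoIdea | 1509 | 461693
CROSSES | RADON | 2131 | 8490
CROSSES | Silk | 3917 | 203763
OVERLAPS | AML | 10223 | 194838
OVERLAPS | OntoIdea | 1486 | 530752*
OVERLAPS | RADON | 2167 | 60801
OVERLAPS | Silk | 4217 | 464382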

All precision, recall and F-measure values are 1.0, except for the cells marked with *, where they are 0.99.

The results can be found on the HOBBIT platform: http://master.project-hobbit.eu/challenges/http%3A%2F%2Fw3id.org%2Fhobbit%2Fchallenges%23487fcdb1-ebd6-4475-ab33-ea3b0cdea6d7/experiments (log in as guest)

Organization

Organizing Committee:

  • Pavel Shvaiko (Main contact)
    Informatica Trentina, Italy
    E-mail: pavel [dot] shvaiko [at] infotn [dot] it
  • Jérôme Euzenat
    INRIA & Univ. Grenoble Alpes, France
  • Ernesto Jiménez-Ruiz
    University of Oslo, Norway
  • Michelle Cheatham
    Wright State University, USA
  • Oktie Hassanzadeh
    IBM Research, USA

Program Committee:

  • Alsayed Algergawy, Jena University, Germany
  • Manuel Atencia, INRIA & Univ. Grenoble Alpes, France
  • Zohra Bellahsene, LIRMM, France
  • Olivier Bodenreider, National Library of Medicine, USA
  • Marco Combetto, Informatica Trentina, Italy
  • Valerie Cross, Miami University, USA
  • Warith Eddine Djeddi, LIPAH & LABGED, Tunisia
  • Jérôme David, University Grenoble Alpes & INRIA, France
  • Gayo Diallo, University of Bordeaux, France
  • Zlatan Dragisic, Linköpings Universitet, Sweden
  • Alfio Ferrara, University of Milan, Italy
  • Wei Hu, Nanjing University, China
  • Valentina Ivanova, Linköpings Universitet, Sweden
  • Antoine Isaac, Vrije Universiteit Amsterdam & Europeana, Netherlands
  • Ryutaro Ichise, National Institute of Informatics, Japan
  • Daniel Faria, Instituto Gulbenkian de Ciência, Portugal
  • Patrick Lambrix, Linköpings Universitet, Sweden
  • Juanzi Li, Tsinghua University, China
  • Vincenzo Maltese, University of Trento, Italy
  • Fiona McNeill, University of Edinburgh, UK
  • Peter Mork, Noblis, USA
  • Andriy Nikolov, Open University, UK
  • Axel Ngonga, University of Leipzig, Germany
  • Catia Pesquita, University of Lisbon, Portugal
  • Dominique Ritze, University of Mannheim, Germany
  • Umberto Straccia, ISTI-C.N.R., Italy
  • Ondrej Svab-Zamazal, Prague University of Economics, Czech Republic
  • Cássia Trojahn, IRIT, France
  • Ludger van Elst, DFKI, Germany