Linking Benchmark for Spatial Data – Presentation at OM 2016

The number of datasets published in the Web of Data as part of the Linked Data Cloud is constantly increasing. The Linked Data paradigm is based on the unconstrained publication of information by different publishers, and the interlinking of Web resources across knowledge bases. In most cases, the cross-dataset links are not explicit in the dataset and must be automatically determined using Instance Matching (IM) tools.

The large variety of IM techniques calls for a comparative evaluation to determine which one is best suited to a given context. Such an assessment generally requires well-defined and widely accepted benchmarks that expose the strengths and weaknesses of the proposed techniques and tools.

Although a fairly large number of benchmarks have been proposed for evaluating Instance Matching systems (IIMB 2012 [1], Sandbox 2012 [2], RDFT 2013 [2], ID-REC 2014 [3], ONTOBI 2010 [4], Author – Task 2015 [5] and LANCE 2015 [6], to mention a few), only a limited number of link discovery benchmarks, such as PABench [7], target the problem of linking geo-spatial entities.

However, some of the largest knowledge bases on the Linked Open Data Web are geo-spatial knowledge bases (e.g., LinkedGeoData with more than 30 billion triples). Systems for linking spatial resources require techniques that differ from the classical, mostly string-based approaches, since the topology of the spatial resources and the topological relations between them are of central importance to systems driven by spatial data.
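To illustrate what a topological relation is, as opposed to a string comparison, here is a minimal sketch in Python. It assumes each spatial resource is reduced to an axis-aligned bounding box, which is a deliberate simplification: real geometries are points, lines and polygons. The names `Box` and `relation` are hypothetical and not part of any benchmark or tool mentioned in this post.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box, a simplified stand-in for a real geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def relation(a: Box, b: Box) -> str:
    """Return a coarse topological relation of box a with respect to box b."""
    if a == b:
        return "equals"
    # The boxes share no point at all.
    if (a.max_x < b.min_x or b.max_x < a.min_x
            or a.max_y < b.min_y or b.max_y < a.min_y):
        return "disjoint"
    # a lies entirely inside b.
    if (a.min_x >= b.min_x and a.max_x <= b.max_x
            and a.min_y >= b.min_y and a.max_y <= b.max_y):
        return "within"
    # b lies entirely inside a.
    if (b.min_x >= a.min_x and b.max_x <= a.max_x
            and b.min_y >= a.min_y and b.max_y <= a.max_y):
        return "contains"
    # Everything else: the boxes touch or partially overlap.
    return "intersects"


park = Box(0, 0, 10, 10)
fountain = Box(4, 4, 5, 5)
print(relation(fountain, park))  # within
```

Two resources can carry entirely different labels and still stand in a relation like this one, which is why purely string-based matching is not enough for spatial data.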

We believe that, given the large number of geo-spatial datasets employed in Linked Data and in several domains, it is critical that benchmarks for geo-spatial link discovery are developed. At OAEI 2016 we presented the challenge for IM systems for spatial data that we proposed for OAEI 2017. The benchmark will be based on LANCE [6], a scalable, schema-agnostic benchmark generator, extended with appropriate transformations to tackle geo-spatial link discovery tasks.

The proposed tasks will focus on the different types of spatial object representations and will be provided with different severity levels for the applied transformations. Under these transformations, objects may keep their representation, change their geometry, type or attributes, merge with other objects, or disappear completely. This scenario stems from the heterogeneity (in structure and semantics) of the datasets used to describe geo-spatial entities. The produced tasks will be used by IM tools that implement string-based as well as topological approaches for identifying matching entities. The IM frameworks will be evaluated for both accuracy (precision, recall and F-measure) and scalability.
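The accuracy measures named above can be sketched as follows. This is a minimal illustration, assuming that both the links reported by an IM tool and the reference links (the gold standard) are given as sets of (source, target) identifier pairs. The function name `evaluate` and the sample identifiers are hypothetical and do not reflect the actual evaluation code of the benchmark.

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (source entity id, target entity id)


def evaluate(found: Set[Link], reference: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) of found links against reference links."""
    true_positives = len(found & reference)
    # Precision: share of reported links that are correct.
    precision = true_positives / len(found) if found else 0.0
    # Recall: share of reference links that were reported.
    recall = true_positives / len(reference) if reference else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up identifiers, for illustration only.
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
reference = {("a1", "b1"), ("a2", "b2"), ("a4", "b4"), ("a5", "b5")}
precision, recall, f_measure = evaluate(found, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f-measure={f_measure:.3f}")
```

In this made-up example, two of the three reported links are correct and two of the four reference links are found, so precision is 2/3, recall is 1/2 and the F-measure is 4/7.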

[1] J. Aguirre, K. Eckert, J. Euzenat, et al. Results of the ontology alignment evaluation initiative 2012. In OM, 2012.

[2] B. Cuenca Grau, Z. Dragisic, K. Eckert, et al. Results of the ontology alignment evaluation initiative 2013. In OM, 2013.

[3] Z. Dragisic, K. Eckert, J. Euzenat, et al. Results of the ontology alignment evaluation initiative 2014. In OM, 2014.

[4] K. Zaiss, S. Conrad, and S. Vater. A Benchmark for Testing Instance-Based Ontology Matching Methods. In KMIS, 2010.

[5] M. Cheatham, Z. Dragisic, J. Euzenat, et al. Results of the ontology alignment evaluation initiative 2015. In OM, 2015.

[6] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, and A. Ngonga-Ngomo. LANCE: Piercing to the Heart of Instance Matching Tools. In ISWC, 2015.

[7] B. Berjawi, F. Duchateau, F. Favetta, M. Miquel, and R. Laurini. PABench: Designing a taxonomy and implementing a benchmark for spatial entity matching. In GeoProcessing, 2015.
