HOBBIT’s Linking and Spatial Benchmarks

The number of datasets published in the Web of Data as part of the Linked Data Cloud is constantly increasing. The Linked Data paradigm is based on the unconstrained publication of information by different publishers, and on the interlinking of Web resources across knowledge bases. In most cases, cross-dataset links are not explicit in the datasets and must be determined automatically using link discovery tools [1]. The large variety of link discovery techniques calls for a comparative evaluation to determine which one is best suited for a given context. Such an assessment generally requires well-defined and widely accepted benchmarks that expose the weak and strong points of the proposed techniques and tools.

A number of real and synthetic link discovery benchmarks that address different challenges have been proposed for evaluating the performance of link discovery systems [2]. So far, only a limited number of them target the problem of linking geospatial entities, even though some of the largest knowledge bases on the Linked Open Data Web are geospatial (e.g., LinkedGeoData, with more than 30 billion triples). Linking spatial resources requires techniques that differ from the classical, mostly string-based approaches. In particular, considering the topology of the spatial resources and the topological relations between them is of central importance to systems that manage spatial data.

We believe that, given the large number of geospatial datasets used in Linked Data and in several application domains, it is critical to develop benchmarks for geospatial link discovery. In this post we discuss the benchmarks developed in the context of HOBBIT for testing the performance of link discovery systems, as well as of tools that compute topological relations between geospatial resources. We propose two benchmark generators that deal with link discovery for spatial data; both work with vehicle trajectories (i.e., sequences of points in a two-dimensional space) provided by TomTom, a partner in the HOBBIT project:

  • The Linking Benchmark generator, based on SPIMBENCH, can be used to test the performance of Instance Matching tools that implement mostly string-based approaches for identifying matching entities.
  • The Spatial Benchmark generator can be used to test the performance of systems that deal with the topological relations defined in the state-of-the-art DE-9IM (Dimensionally Extended nine-Intersection Model) [3].

The Linking Benchmark generator is simple and can be used not only by instance matching tools, but also by SPARQL engines that deal with query answering over geospatial data, such as STRABON [4]. The choke points for this benchmark are a subset of those used for the development of SPIMBENCH, which supports structure-based and semantics-aware transformations in addition to complex value-based ones. The ontologies used to represent trajectories are fairly simple and do not use the complex RDF or OWL schema constructs already supported by SPIMBENCH. The test cases implemented in the benchmark focus on string-based transformations with different (a) levels, (b) types of spatial object representations and (c) types of date representations. Furthermore, the benchmark supports the addition and deletion of ontology (schema) properties, also known as schema transformations. The datasets that implement these test cases can be used by Instance Matching tools to identify matching entities. In a nutshell, the benchmark can be used to check whether two traces, whose points are annotated with place names, designate the same trajectory.
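To make the idea of a date-representation test case concrete, here is a minimal Python sketch. It is an illustration only, not the generator's actual code: the two date layouts and the helper names (`DATE_FORMATS`, `date_variants`, `parse_any`, `same_instant`) are assumptions chosen for the example. It renders one timestamp in several layouts and shows how a matcher could normalise them before comparing.

```python
from datetime import datetime

# Hypothetical date layouts; the layouts used by the real benchmark may differ.
DATE_FORMATS = ["%Y-%m-%dT%H:%M:%S", "%d/%m/%Y %H:%M:%S"]


def date_variants(instant):
    """Render the same instant in every assumed layout."""
    return [instant.strftime(fmt) for fmt in DATE_FORMATS]


def parse_any(text):
    """Parse a date string written in any of the assumed layouts."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date representation: {text!r}")


def same_instant(a, b):
    """Two date strings match if they normalise to the same instant."""
    return parse_any(a) == parse_any(b)


if __name__ == "__main__":
    source, transformed = date_variants(datetime(2016, 5, 1, 10, 30, 0))
    print(source, "|", transformed, "|", same_instant(source, transformed))
```

A purely string-based comparison of the two renderings would score them as quite different, which is the kind of weakness such a test case is meant to expose.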

The Spatial Benchmark generator is more complex and implements all DE-9IM topological relations between trajectories in two-dimensional space. To the best of our knowledge, no such generic benchmark exists that takes trajectories as input (in its first version) and checks the performance of linking systems for spatial data.

For the design of this benchmark, we focused on (a) the correct implementation of all the topological relations of the DE-9IM topological model and (b) producing datasets large enough to stress the systems under test. The supported relations are: Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses and Overlaps. To the best of our knowledge, few systems implement all the topological relations of DE-9IM, so the benchmark already addresses the first set of choke points. Moreover, we produced large synthetic datasets from TomTom's original data, which lets us challenge the systems with respect to scalability.
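The relations listed above are conventionally defined as patterns over the nine cells of the DE-9IM intersection matrix. The Python sketch below shows that pattern matching for two line geometries such as trajectories. It is not the benchmark's implementation: the pattern strings follow the commonly documented line/line definitions, the function names are assumptions made for the example, and computing the matrix from actual geometries is left to a spatial library.

```python
# Commonly documented DE-9IM patterns for two line geometries.
# Matrix cells are read row by row: II, IB, IE, BI, BB, BE, EI, EB, EE
# (I = interior, B = boundary, E = exterior).
LINE_LINE_PATTERNS = {
    "Disjoint": ["FF*FF****"],
    "Touches": ["FT*******", "F**T*****", "F***T****"],
    "Contains": ["T*****FF*"],
    "Within": ["T*F**F***"],
    "Covers": ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "CoveredBy": ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
    "Intersects": ["T********", "*T*******", "***T*****", "****T****"],
    "Crosses": ["0********"],   # line/line case only
    "Overlaps": ["1*T***T**"],  # line/line case only
}


def cell_matches(cell, symbol):
    """'*' accepts anything, 'T' accepts any non-empty intersection (0, 1, 2)."""
    if symbol == "*":
        return True
    if symbol == "T":
        return cell in ("0", "1", "2")
    return cell == symbol  # 'F' or an exact dimension


def matches(matrix, pattern):
    """Check a 9-character DE-9IM matrix against one pattern."""
    return len(matrix) == 9 and all(
        cell_matches(c, s) for c, s in zip(matrix, pattern)
    )


def relations(matrix):
    """Names of all relations whose pattern set accepts the matrix."""
    return sorted(
        name
        for name, patterns in LINE_LINE_PATTERNS.items()
        if any(matches(matrix, p) for p in patterns)
    )


if __name__ == "__main__":
    # Two segments crossing at a single interior point.
    print(relations("0F1FF0102"))
    # Two segments that share no point at all.
    print(relations("FF1FF0102"))
```

For the first matrix the sketch reports Crosses and Intersects, and for the second only Disjoint. A system that supports just a few of these patterns would fail the corresponding benchmark tasks.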

Both benchmark generators are generic in the sense that they are schema agnostic: they can operate on any dataset that contains trajectories, a trajectory being a set of points, that is, a set of (longitude, latitude) pairs. The generators are already integrated in the HOBBIT platform.

[1] Axel-Cyrille Ngonga Ngomo. On link discovery using a hybrid approach. Journal on Data Semantics, 1(4):203–217, 2012.
[2] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, M. Herschel, and A.-C. Ngonga Ngomo. Pushing the limits of instance matching systems: A semantics-aware benchmark for linked data. In WWW, pages 105–106. ACM, 2015. Poster.
[3] Christian Strobl. Encyclopedia of GIS, chapter Dimensionally Extended Nine-Intersection Model (DE-9IM), pages 240–245. Springer, 2008.
[4] Manolis Koubarakis and Kostis Kyzirakos. Modeling and Querying Metadata in the Semantic Sensor Web: the Model stRDF and the Query Language stSPARQL. In ESWC, 2010.
