HOBBIT Spatial Benchmark V2.0

A number of real and synthetic benchmarks have been proposed for evaluating the performance of link discovery systems. So far, only a few of them target the problem of linking geo-spatial entities. However, some of the largest knowledge bases of the Linked Open Data Cloud, such as LinkedGeoData, contain vast amounts of spatial information. Furthermore, several systems have been developed that manage spatial data and take into account the topology of spatial resources and the topological relations between them. To assess how well these systems handle large volumes of spatial data and perform the much-needed data integration in the Linked Geo Data Cloud, benchmarks for geo-spatial link discovery are essential. Thus, in the context of the HOBBIT project, we have developed the Spatial Benchmark Generator (SPgen). SPgen can be used to test the performance of systems that deal with the topological relations defined by the state-of-the-art DE-9IM (Dimensionally Extended nine-Intersection Model) [1].

The supported topological relations of the model are:

  • Equals
  • Disjoint
  • Touches
  • Contains/Within
  • Covers/CoveredBy
  • Intersects
  • Crosses
  • Overlaps
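
These relations are usually tested by matching the 3×3 DE-9IM intersection matrix of two geometries against standard patterns, where T means "non-empty", F means "empty" and * means "anything". The following is a minimal Python sketch, assuming the matrix is given as a 9-character string in row-major order (interior, boundary and exterior of the first geometry against those of the second). The patterns follow the usual OGC Simple Features definitions; the function names are illustrative and not part of SPgen.

```python
# Minimal sketch: decide which DE-9IM relations hold for a given intersection matrix.
# Matrix: 9 characters, row-major, rows = Interior/Boundary/Exterior of geometry a,
# columns = Interior/Boundary/Exterior of geometry b; each entry is 'F' (empty)
# or the dimension of the intersection ('0', '1' or '2').

PATTERNS = {
    "Equals": ["T*F**FFF*"],
    "Disjoint": ["FF*FF****"],
    "Touches": ["FT*******", "F**T*****", "F***T****"],
    "Contains": ["T*****FF*"],
    "Within": ["T*F**F***"],
    "Covers": ["T*****FF*", "*T****FF*", "***T**FF*", "****T*FF*"],
    "CoveredBy": ["T*F**F***", "*TF**F***", "**FT*F***", "**F*TF***"],
    "Intersects": ["T********", "*T*******", "***T*****", "****T****"],
    # Crosses and Overlaps depend on the dimensions of the inputs;
    # these are the LineString/LineString forms only.
    "Crosses": ["0********"],
    "Overlaps": ["1*T***T**"],
}


def cell_matches(value: str, pattern_char: str) -> bool:
    """Match one matrix entry against one pattern character."""
    if pattern_char == "*":
        return True  # any value
    if pattern_char == "T":
        return value in ("0", "1", "2")  # non-empty intersection
    return value == pattern_char  # 'F' or an exact dimension


def matches(matrix: str, pattern: str) -> bool:
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM matrices and patterns have 9 entries")
    return all(cell_matches(v, p) for v, p in zip(matrix, pattern))


def relations(matrix: str) -> set:
    """Return the names of all relations whose pattern(s) the matrix satisfies."""
    return {
        name
        for name, patterns in PATTERNS.items()
        if any(matches(matrix, p) for p in patterns)
    }
```

For example, two LineStrings that share only an endpoint, such as LINESTRING (0 0, 1 0) and LINESTRING (1 0, 2 0), have the matrix FF1F00102, which this sketch maps to Touches and Intersects.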

In the first version of the Spatial Benchmark Generator, we implemented all topological relations of DE-9IM between LineStrings, and in the second version we added the relations between LineStrings and Polygons in a two-dimensional space. SPgen follows the choke point-based approach [2] for benchmark design, i.e., it focuses on the technical difficulties of existing systems and implements tests that address those difficulties in order to “push” systems to resolve them and become better. More specifically, we focus on the following choke points in SPgen:
  • Scalability: produce datasets large enough to stress the systems under test
  • Output quality: compute precision, recall and f-measure
  • Time performance: measure the time the systems need to return the results
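
The output-quality choke point compares the links a system returns with the gold standard. A minimal Python sketch of that computation follows, assuming links are hypothetical (source_id, target_id) pairs found for one DE-9IM relation; the pair format and the zero-score convention for empty sets are assumptions, not SPgen's actual result format.

```python
# Minimal sketch: precision, recall and F-measure of a system's links
# against the gold standard. Links are hypothetical (source_id, target_id)
# pairs found for one DE-9IM relation.

def evaluate(returned_links, gold_links):
    returned, gold = set(returned_links), set(gold_links)
    true_positives = len(returned & gold)
    # Convention for this sketch: an empty set of links scores 0.0.
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
returned = {("s1", "t1"), ("s2", "t2"), ("s5", "t9")}
p, r, f = evaluate(returned, gold)
print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

Here two of the three returned links are correct and two of the four gold links are found, so precision is 2/3, recall is 1/2 and the F-measure is 4/7 (about 0.571).
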
SPgen takes as input traces represented as LineStrings and produces a source and a target dataset in which each source instance and its corresponding target instance are related by a specific DE-9IM relation.

In this version of SPgen, there are three data generators: TomTom, Spaten and DEBS.

  1. TomTom provides a Synthetic Trace Generator, developed in the context of the HOBBIT project, that facilitates the creation of an arbitrary volume of data from statistical descriptions of vehicle traffic. More specifically, it generates traces, where a trace is a list of (longitude, latitude) pairs recorded by one device (phone, car, etc.) throughout one day. TomTom was the only data generator in the first version of SPgen.
  2. Spaten is an open-source, configurable spatio-temporal and textual dataset generator that can produce large volumes of data based on realistic user behavior. Spaten extracts GPS traces from realistic routes using the Google Maps API and combines them with real POIs and relevant user comments crawled from TripAdvisor. Spaten publicly offers GB-size datasets with millions of check-ins and GPS traces.
  3. DEBS provides a selection of AIS data collected from the MarineTraffic coastal network. It has been used in the EU H2020 research project BigDataOcean and in the ACM DEBS Grand Challenge 2018.
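
The trace definition above (the (longitude, latitude) pairs recorded by one device during one day) can be illustrated with a small Python sketch that groups raw timestamped position records into traces. The record layout (device_id, unix_time, lon, lat) and the use of UTC days are assumptions for illustration; none of the three generators necessarily exposes its data this way.

```python
from collections import defaultdict
from datetime import datetime, timezone


def build_traces(records):
    """Group hypothetical (device_id, unix_time, lon, lat) records into daily traces.

    Returns {(device_id, utc_date): [(lon, lat), ...]} with the points of
    each trace in chronological order.
    """
    grouped = defaultdict(list)
    for device_id, unix_time, lon, lat in records:
        day = datetime.fromtimestamp(unix_time, tz=timezone.utc).date()
        grouped[(device_id, day)].append((unix_time, lon, lat))
    return {
        key: [(lon, lat) for _, lon, lat in sorted(points)]
        for key, points in grouped.items()
    }


records = [
    ("car-1", 86_410, 1.0, 1.0),
    ("car-1", 90_000, 2.0, 3.0),
    ("car-1", 10, 0.0, 0.0),  # previous UTC day, so a separate trace
    ("vessel-7", 90_000, 5.0, 5.0),
]
traces = build_traces(records)
```

Sorting by timestamp inside each group keeps the points in the order they were recorded, which matters once a trace is turned into a LineString.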

A user of SPgen can choose which dataset is used as the source dataset. The source dataset is identical to the input traces but is expressed in the Well-Known Text (WKT) format, whereas the target dataset consists of LineStrings or Polygons generated from the source dataset in such a way that each target geometry has a specific DE-9IM topological relation with its source trace.
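
As a rough illustration of the two steps, the Python sketch below serialises a trace as a WKT LINESTRING and derives one possible target for the Disjoint relation. The shift-based transformation and both function names are hypothetical; they show the idea of generating a target with a known relation, not the transformations SPgen actually applies.

```python
def to_wkt_linestring(trace):
    """Serialise a trace of (longitude, latitude) pairs as a WKT LINESTRING."""
    if len(trace) < 2:
        raise ValueError("a LineString needs at least two points")
    # WKT lists coordinates as "x y" (here: longitude latitude), comma-separated.
    coords = ", ".join(f"{lon} {lat}" for lon, lat in trace)
    return f"LINESTRING ({coords})"


def disjoint_target(trace, gap=1e-3):
    """Hypothetical transformation yielding a trace Disjoint from the input.

    Shifting every point east by the trace's own longitude extent plus a gap
    separates the two bounding boxes, and geometries whose bounding boxes do
    not intersect cannot intersect either. Wrap-around at the antimeridian
    is ignored in this sketch.
    """
    lons = [lon for lon, _ in trace]
    shift = (max(lons) - min(lons)) + gap
    return [(lon + shift, lat) for lon, lat in trace]


source = [(0.0, 0.0), (1.0, 1.0)]
print(to_wkt_linestring(source))
print(to_wkt_linestring(disjoint_target(source, gap=0.5)))
```

The two printed lines are LINESTRING (0.0 0.0, 1.0 1.0) and LINESTRING (1.5 0.0, 2.5 1.0). Other relations need geometry-aware transformations; Equals, for instance, can be produced by copying the source unchanged.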

To compute the gold standard, we used an existing system, namely RADON [3]. RADON was selected because it is a novel approach for the rapid discovery of topological relations among geo-spatial resources. Its evaluation on real datasets of various sizes showed that, in addition to being complete and correct, it outperforms the state-of-the-art spatial link discovery systems by up to three orders of magnitude.
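
RADON's own indexing is described in [3]. The Python sketch below only illustrates a common filter-and-refine idea in spatial data processing: a uniform grid over bounding boxes discards source/target pairs that cannot intersect before any exact topological check. The grid, the cell size and all function names are assumptions for illustration, not RADON's implementation.

```python
import math
from collections import defaultdict
from itertools import product


def bbox(trace):
    """Minimum bounding box (min_lon, min_lat, max_lon, max_lat) of a trace."""
    lons = [lon for lon, _ in trace]
    lats = [lat for _, lat in trace]
    return min(lons), min(lats), max(lons), max(lats)


def boxes_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs(sources, targets, cell_size=0.1):
    """Return (source_id, target_id) pairs whose bounding boxes intersect.

    Pairs that are not returned are certainly Disjoint; only the returned
    candidates still need an exact DE-9IM computation.
    """

    def cells(box):
        # Grid cells (in units of cell_size degrees) overlapped by a box.
        x0, y0, x1, y1 = (math.floor(v / cell_size) for v in box)
        return product(range(x0, x1 + 1), range(y0, y1 + 1))

    target_boxes = {tid: bbox(t) for tid, t in targets.items()}
    index = defaultdict(set)  # grid cell -> ids of targets overlapping it
    for tid, box in target_boxes.items():
        for cell in cells(box):
            index[cell].add(tid)

    pairs = set()
    for sid, trace in sources.items():
        sbox = bbox(trace)
        for cell in cells(sbox):
            for tid in index.get(cell, ()):
                if boxes_intersect(sbox, target_boxes[tid]):
                    pairs.add((sid, tid))
    return pairs
```

Any two intersecting bounding boxes share at least one grid cell, so the filter never drops a pair that could satisfy a relation other than Disjoint; the cell size only trades index size against the number of boxes compared per cell.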

SPgen has been integrated into the HOBBIT platform and can be used to benchmark any system that is able to identify topological relations.

[1] C. Strobl. Encyclopedia of GIS, chapter Dimensionally Extended Nine-Intersection Model (DE-9IM), pages 240–245. Springer, 2008.
[2] P. Boncz, T. Neumann, and O. Erling. TPC-H analyzed: Hidden messages and lessons learned from an influential benchmark. In TPC-TC, pages 61–76. Springer, 2013.
[3] M.-A. Sherif, K. Dreßler, P. Smeros, and A.-C. Ngonga Ngomo. RADON – Rapid Discovery of Topological Relations. In AAAI, 2017.
