Benchmark III – Storage & Curation

HOBBIT provides benchmarks for storage and curation systems.

Data Storage Benchmark

The Data Storage Benchmark (DSB) focuses on the typical challenges faced by storage systems. It extends the Social Network Benchmark, developed in the context of the FP7 Linked Data Benchmark Council (LDBC) project, by introducing important modifications to its synthetic data generator and dataset, and by adapting and transforming its SQL queries into SPARQL (an illustrative sketch of such a query is given after the feature list below). This has been carried out while preserving the benchmark’s most relevant features:

  • high insert rate with time-dependent and largely repetitive or cyclic data,
  • exploitation of structure and physical organization adapted to the key dimensions of the data,
  • bulk loading support,
  • interactive complex read queries based on well-defined choke points,
  • simple lookups,
  • concurrency and
  • high throughput.
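
To give an impression of the workload after the SQL-to-SPARQL transformation, the sketch below runs a small interactive-style read query over a toy social-network graph using Python and rdflib. The data and the friends-of-friends query are invented for illustration; the actual DSB queries and the choke points they target are documented in Deliverable 5.1.1.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, FOAF

    # Toy social-network graph; the snb namespace and the data are invented
    # for this illustration and do not correspond to the actual DSB dataset.
    SNB = Namespace("http://example.org/snb/")
    g = Graph()
    alice, bob, carol = SNB.alice, SNB.bob, SNB.carol
    for person, name in [(alice, "Alice"), (bob, "Bob"), (carol, "Carol")]:
        g.add((person, RDF.type, FOAF.Person))
        g.add((person, FOAF.name, Literal(name)))
    g.add((alice, FOAF.knows, bob))
    g.add((bob, FOAF.knows, carol))

    # Interactive-style read query: names of friends-of-friends of Alice,
    # excluding Alice herself and her direct friends.
    query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX snb:  <http://example.org/snb/>
    SELECT DISTINCT ?name WHERE {
        snb:alice foaf:knows/foaf:knows ?fof .
        ?fof foaf:name ?name .
        FILTER (?fof != snb:alice)
        MINUS { snb:alice foaf:knows ?fof }
    }
    """
    for row in g.query(query):
        print(row[0])   # prints "Carol"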

The dataset is available in various sizes (called scale factors) and can be downloaded from our FTP server.

The key performance indicators for the benchmark include:

  • Time compression ratio (TCR): the ratio between the real time and the simulated time for which the system under test was able to answer queries within a reasonable time (a computation sketch follows this list)
  • Throughput: number of queries per second
  • Bulk loading time
  • Average execution time per query type in milliseconds
  • Query failures: the number of queries whose returned results differ from the expected ones
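
A minimal sketch of how these indicators can be derived from per-query measurements is shown below, assuming made-up execution times, a made-up simulated-time span, and a strictly sequential run; it is only an illustration of the definitions, not the HOBBIT evaluation module.

    from collections import defaultdict

    # Hypothetical per-query measurements: (query type, execution time in ms,
    # whether the returned results matched the expected ones).
    runs = [
        ("Q1", 120.0, True), ("Q1", 135.0, True),
        ("Q2",  40.0, True), ("Q2",  38.0, False),
        ("Q3", 310.0, True),
    ]

    # Treat the run as strictly sequential for simplicity, so the real time is
    # the sum of the individual execution times.
    real_time_s = sum(t for _, t, _ in runs) / 1000.0
    simulated_time_s = 3600.0                         # simulated time covered by the workload (assumed)

    tcr = real_time_s / simulated_time_s              # time compression ratio: real time / simulated time
    throughput = len(runs) / real_time_s              # queries per second
    failures = sum(1 for _, _, ok in runs if not ok)  # results differing from the expected ones

    per_type = defaultdict(list)
    for qtype, t, _ in runs:
        per_type[qtype].append(t)
    avg_per_type = {q: sum(ts) / len(ts) for q, ts in per_type.items()}

    print(f"TCR: {tcr:.6f}, throughput: {throughput:.2f} queries/s, failures: {failures}")
    print("average execution time per query type (ms):", avg_per_type)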

All other details about DSB v1, including the queries themselves and the choke points they are based upon, can be found in Deliverable 5.1.1.

Versioning Benchmark

The Versioning Benchmark (SPBv) aims to test the ability of versioning systems to efficiently manage evolving Linked Data datasets and to efficiently evaluate queries across multiple versions of these datasets. The benchmark is based on the Linked Data Benchmark Council (LDBC) Semantic Publishing Benchmark (SPB). It leverages the scenario of the BBC media organization, which makes heavy use of Linked Data technologies such as RDF and SPARQL.

SPBv uses the SPB data generator, which relies on ontologies and reference datasets provided by the BBC to produce sets of creative works. Creative works are metadata, represented in RDF, about real-world events (e.g., sports events or elections). SPBv’s data generator supports the creation of arbitrarily large RDF datasets that evolve through time and mimic the characteristics of the real BBC datasets. In particular, the first version of SPBv, described in detail in Deliverable 5.2.1, extends the SPB generator so that the generated data is stored in different versions according to its creation date (each version is constructed by adding a set of triples to the previous one). A small sample dataset is available here.
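
The sketch below mimics, in a heavily simplified way, how such cumulative versions can be built with Python and rdflib: each version is the previous one plus a delta of newly generated triples. The creative-work namespace and triples are invented and only stand in for the data produced by the generator.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    # Invented namespace standing in for the BBC creative-works ontology.
    CW = Namespace("http://example.org/creative-work/")

    # Deltas: sets of (made-up) creative-work triples, ordered by creation date.
    deltas = [
        [(CW.article1, RDF.type, CW.CreativeWork),
         (CW.article1, CW.title, Literal("Election night coverage"))],
        [(CW.article2, RDF.type, CW.CreativeWork),
         (CW.article2, CW.title, Literal("Cup final report"))],
    ]

    # Each version is constructed by adding one delta to the previous version.
    versions = []
    previous = Graph()
    for delta in deltas:
        version = Graph()
        for triple in previous:       # copy the previous version ...
            version.add(triple)
        for triple in delta:          # ... and add the newly created triples
            version.add(triple)
        versions.append(version)
        previous = version

    for i, v in enumerate(versions, start=1):
        print(f"version {i}: {len(v)} triples")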

In its first version, SPBv tests the ability of a system to answer eight different types of versioning queries, as described in Section 2.2 of Deliverable 5.2.1. These queries are specified in terms of the ontology of the Semantic Publishing Benchmark and are written in SPARQL 1.1.
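
As a rough illustration of how a query spanning several versions can be phrased in SPARQL 1.1, the sketch below keeps two versions as named graphs in an rdflib Dataset and asks which creative works were added between them. The named-graph layout and the query are assumptions made for illustration; this is not one of the eight SPBv query types, which are defined in Deliverable 5.2.1.

    from rdflib import Dataset, Namespace, URIRef
    from rdflib.namespace import RDF

    CW = Namespace("http://example.org/creative-work/")
    V1, V2 = URIRef("urn:x-version:1"), URIRef("urn:x-version:2")

    # Two dataset versions kept as named graphs (one possible layout; the
    # benchmark leaves the actual versioning strategy to the tested system).
    ds = Dataset()
    v1 = ds.graph(V1)
    v1.add((CW.article1, RDF.type, CW.CreativeWork))
    v2 = ds.graph(V2)
    v2.add((CW.article1, RDF.type, CW.CreativeWork))
    v2.add((CW.article2, RDF.type, CW.CreativeWork))

    # Cross-version SPARQL 1.1 query: creative works present in version 2
    # but not in version 1 (illustrative only).
    query = """
    PREFIX cw: <http://example.org/creative-work/>
    SELECT ?work WHERE {
        GRAPH <urn:x-version:2> { ?work a cw:CreativeWork }
        MINUS { GRAPH <urn:x-version:1> { ?work a cw:CreativeWork } }
    }
    """
    for row in ds.query(query):
        print(row[0])   # prints the IRI of cw:article2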

The key performance indicators for the benchmark include:

  • Query failures: The number of queries that failed to be executed. By failure we mean that the returned results are not those that were expected.
  • Throughput (in queries per second): The execution rate per second for all queries.
  • Initial version ingestion speed (in triples per second): The number of triples of the dataset’s initial version that can be loaded per second (see the sketch after this list).
  • Applied changes speed (in triples per second): The average number of changes that the benchmarked system can store per second, measured over the loading of all new versions.
  • Storage space (in KB): This KPI measures the total storage space required to store all versions.
  • Average query execution time (in ms): The average execution time, in milliseconds, for each of the eight versioning query types.
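
The sketch below shows one way the two loading-speed indicators can be measured, using an invented in-memory store as the system under test; the DummyStore class and its load_version() method are assumptions made for illustration and are not part of the benchmark or the HOBBIT API.

    import time

    # Minimal stand-in for a benchmarked system: an in-memory store whose
    # load_version() simply accumulates triples (hypothetical interface).
    class DummyStore:
        def __init__(self):
            self.triples = set()

        def load_version(self, triples):
            self.triples.update(triples)

    def measure_loading(system, initial_version, deltas):
        """Return (initial ingestion speed, applied changes speed) in triples per second."""
        start = time.perf_counter()
        system.load_version(initial_version)
        initial_speed = len(initial_version) / (time.perf_counter() - start)

        changed, elapsed = 0, 0.0
        for delta in deltas:                      # every new version as a set of added triples
            start = time.perf_counter()
            system.load_version(delta)
            elapsed += time.perf_counter() - start
            changed += len(delta)
        applied_changes_speed = changed / elapsed if elapsed else 0.0
        return initial_speed, applied_changes_speed

    store = DummyStore()
    initial = {(f"s{i}", "p", "o") for i in range(10000)}
    deltas = [{(f"s{i}", "p", f"o{k}") for i in range(1000)} for k in (1, 2)]
    print(measure_loading(store, initial, deltas))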