Benchmark III – Storage & Curation

HOBBIT provides benchmarks for storage and curation systems.

Data Storage Benchmark

The Data Storage Benchmark (DSB) focuses on typical challenges faced by storage systems. It extends the Social Network Benchmark, developed in the context of the FP7 Linked Data Benchmark Council (LDBC) project, by introducing important modifications to its synthetic data generator and dataset, and by modifying its SQL queries and translating them to SPARQL. This has been carried out while preserving the benchmark’s most relevant features:

  • high insert rate with time-dependent and largely repetitive or cyclic data,
  • exploitation of structure and physical organization adapted to the key dimensions of the data,
  • bulk loading support,
  • interactive complex read queries based on well-defined choke points,
  • simple lookups (see the sketch after this list),
  • concurrency and
  • high throughput.
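The concrete DSB queries are specified in Deliverable 5.1.1; purely as an illustration of the kind of simple lookup the workload contains, a query against SNB-style social network data might look like the following (the person IRI is made up, and the snvoc vocabulary is assumed from the LDBC SNB RDF serialization):

    PREFIX snvoc: <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/>

    # Illustrative simple lookup: basic attributes of a single person.
    SELECT ?firstName ?lastName ?creationDate
    WHERE {
      <http://example.org/person/933> a snvoc:Person ;
          snvoc:firstName ?firstName ;
          snvoc:lastName ?lastName ;
          snvoc:creationDate ?creationDate .
    }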

The dataset is available in various sizes (called scale factors) and can be downloaded from our FTP server.

The key performance indicators for the benchmark include:

  • Time compression ratio (TCR): the ratio of real time to simulated time for which the system under test was able to answer queries within a reasonable time (see the example after this list)
  • Throughput: number of queries per second
  • Bulk loading time
  • Average execution time per query type in milliseconds
  • Query failures: The number of queries whose returned results differ from the expected ones
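Taken literally, the TCR definition above corresponds to (our notation, not taken from the benchmark):

    \mathrm{TCR} = \frac{t_{\mathrm{real}}}{t_{\mathrm{simulated}}}

For example, a system that replays 90 minutes of simulated social-network activity in 30 minutes of real time, while still answering queries within a reasonable time, would achieve a TCR of 30/90 ≈ 0.33; under this reading, lower values indicate a faster system.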

All other details about DSB v1, including the queries themselves and the choke points they are based upon, can be found in Deliverable 5.1.1.

Versioning Benchmark

The Semantic Publishing Versioning Benchmark (SPVB) aims to test the ability of versioning systems to efficiently manage evolving Linked Data datasets and to answer queries evaluated across multiple versions of these datasets. It acts as a benchmark generator, as it produces both the data and the queries needed to test the performance of versioning systems. SPVB is not tailored to any particular versioning strategy (the way that versions are stored) and can produce data of different sizes that can be altered to create an arbitrary number of versions using configurable insertion and deletion ratios. The data generator of SPVB uses the data generator of the Linked Data Benchmark Council (LDBC) Semantic Publishing Benchmark (SPB) as well as real DBpedia data. The generated SPARQL queries are of different types and are partially based on a subset of the 25 query templates defined in the context of the DBpedia SPARQL Benchmark (DBPSB).
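Because SPVB is strategy-agnostic, the exact shape of a cross-version query depends on how the system under test stores its versions. As a hypothetical sketch only, assuming each version were materialized in its own named graph (the graph IRIs and the use of rdfs:label are illustrative, not prescribed by the benchmark), a delta-style query between two versions could look like this:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Hypothetical: labels present in version 2 but absent from version 1,
    # assuming each version is materialized in its own named graph.
    SELECT ?s ?label
    WHERE {
      GRAPH <http://example.org/version/2> { ?s rdfs:label ?label . }
      FILTER NOT EXISTS {
        GRAPH <http://example.org/version/1> { ?s rdfs:label ?label . }
      }
    }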

A small sample dataset along with a set of SPARQL queries is available here.

SPVB evaluates the correctness and performance of the system under test through the following Key Performance Indicators (KPIs):

  • Query failures: The number of queries that failed to execute.
  • Throughput (in queries per second): The execution rate over all queries.
  • Initial version ingestion speed (in triples per second): The number of triples of the dataset’s initial version that can be loaded per second.
  • Applied changes speed (in triples per second): The average number of changes that the benchmarked system can store per second, measured after all new versions have been loaded (see the formula after this list).
  • Storage space cost (in MB): The total storage space required to store all versions.
  • Average query execution time (in ms): The average execution time, in milliseconds, for each of the eight versioning query types.
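One plausible way to make the applied changes speed KPI concrete (our notation, not defined by the benchmark): if the i-th new version adds a_i triples and deletes d_i triples, and applying it takes t_i seconds, then for a dataset with n versions

    \text{applied changes speed} = \frac{\sum_{i=2}^{n} (a_i + d_i)}{\sum_{i=2}^{n} t_i}

where version 1 is covered separately by the initial version ingestion speed.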