Benchmark I – Generation & Acquisition

Hobbit provides benchmarks that measure the efficiency and completeness of SPARQL query processing systems when faced with streams of data from industrial machinery.

We aim to reflect the real loads placed on triple stores in real applications. We therefore use the following datasets:

  • Public Transport Data
  • Social network data from Twitter
  • Car traffic data derived from sensor data gathered by TomTom
  • Sensor data from plastic injection moulding industrial plants of Weidmüller

We increased the size and velocity of the RDF data used in our benchmarks to evaluate how well a system can store streaming RDF data obtained from industry. The data is generated from one or multiple resources in parallel and inserted using SPARQL INSERT queries; SPARQL SELECT queries are then used to check the system’s ingestion performance and storage abilities. This facet of triple stores has, to the best of our knowledge, never been benchmarked before. The benchmarks test the scalability as well as the accuracy of the systems. The key performance indicators for the benchmarks include:

  • their precision, recall and F-measure (both micro and macro)
  • their average answer time, i.e. the average delay between the moment at which a SELECT query is issued and the point in time at which its results are received, and
  • their triples per second, computed as the total number of triples inserted during a stream divided by the total time needed to insert those triples (see the sketch below)
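
A minimal sketch of how this measurement can be set up is shown below, assuming the system under test exposes standard SPARQL 1.1 query and update endpoints. The endpoint URLs, the batch format (N-Triples) and the helper names are illustrative assumptions rather than part of the Hobbit implementation; the result-correctness check (precision, recall and F-measure over the returned bindings) is omitted for brevity.

    import time
    import requests

    # Hypothetical SPARQL 1.1 endpoints of the system under test.
    UPDATE_URL = "http://localhost:8890/sparql-update"
    QUERY_URL = "http://localhost:8890/sparql"

    def insert_batch(ntriples):
        # Push one micro-batch of the stream with a SPARQL INSERT DATA update.
        update = "INSERT DATA { %s }" % ntriples
        r = requests.post(UPDATE_URL, data=update.encode("utf-8"),
                          headers={"Content-Type": "application/sparql-update"})
        r.raise_for_status()

    def timed_select(query):
        # Issue a SELECT query and measure the delay until its results arrive.
        start = time.monotonic()
        r = requests.post(QUERY_URL, data={"query": query},
                          headers={"Accept": "application/sparql-results+json"})
        r.raise_for_status()
        return r.json(), time.monotonic() - start

    def run_stream(batches, triples_per_batch, check_queries):
        # Insert the stream, then run the SELECT checks and derive two of the KPIs.
        ingest_start = time.monotonic()
        for batch in batches:
            insert_batch(batch)
        ingest_time = time.monotonic() - ingest_start

        delays = [timed_select(q)[1] for q in check_queries]
        return {
            # total triples inserted during the stream / time needed to insert them
            "triples_per_second": sum(triples_per_batch) / ingest_time,
            # average delay between issuing a SELECT query and receiving its results
            "avg_answer_time": sum(delays) / len(delays),
        }

In the benchmark itself the INSERT queries arrive from one or multiple data generators in parallel; the sequential loop above is only meant to show where the two timings are taken.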

Hobbit also provides benchmarks to measure the performance of entity recognition and linking systems on unstructured streams of natural-language data. To this end, Hobbit reuses some of the concepts developed within the Gerbil framework; these concepts were migrated and adapted to the Hobbit architecture. The extraction tasks for the entity recognition and linking systems comprise the recognition of known and unknown entities in the data as well as their linking to different knowledge bases. The following datasets are integrated for these benchmarks:

  • unstructured datasets from real sources, curated by experts
  • unstructured data streams produced by Bengal, a generic data generator

The benchmarks use the following key performance indicators:

  • precision, recall and F-measure
  • the number of F1-measure points a system achieves per second for a given number of documents (see the sketch below)
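
The sketch below gives one plausible reading of the last indicator, assuming each annotation (gold or predicted) is represented as a (document id, mention span, knowledge-base URI) triple; this representation and the helper names are illustrative assumptions and not part of the Gerbil or Hobbit APIs.

    import time

    def micro_prf(gold, predicted):
        # Micro-averaged precision, recall and F-measure over all annotations.
        tp = len(gold & predicted)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(gold) if gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    def f1_points_per_second(annotate, documents, gold):
        # Annotate the documents with the system under test, then relate the
        # achieved F1 (expressed in points, i.e. scaled to 100) to the runtime.
        start = time.monotonic()
        predicted = set()
        for doc_id, text in documents:
            predicted |= annotate(doc_id, text)
        runtime = time.monotonic() - start
        _, _, f1 = micro_prf(gold, predicted)
        return 100 * f1 / runtime

Representing each annotation as a (document, span, URI) triple lets the same set comparison cover both the recognition of entity mentions and their linking to a knowledge base.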