Non-relational data management is starting to dominate the new data economy. Formats like XML, JSON and RDF are becoming the basis for data exchange on the Web and in the enterprise, and systems are constantly being developed to support them. For any such system to gain traction, it has to provide adequate support for its respective format. Standards and benchmarking have traditionally been the main tools used to formally define and demonstrate the level of this adequacy.
A platform for holistic RDF benchmarking, as provided by HOBBIT, is well-timed for two reasons: (a) there is a proliferation of RDF systems, and identifying the strong and weak points of these systems is important to help users decide which system fits their needs; and (b) surprisingly, there is a similar proliferation of RDF benchmarks, a development that adds to the confusion since it is not clear which benchmark one should use (or trust) to evaluate existing or new systems.
In general, benchmarks can be used to inform users of the strengths and weaknesses of competing software products, but more importantly, they encourage the advancement of technology by providing both academia and industry with clear targets for performance and functionality.
A benchmark for RDF engines consists of datasets (including data generators in the case of synthetic benchmarks), query workloads, performance metrics and rules that must be followed when executing the benchmark. Benchmarks fall into three categories: those that use real datasets, those that produce synthetic datasets using special-purpose data generators, and benchmark generators.
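To make the components listed above concrete, here is a minimal Python sketch of a benchmark description and a timing loop. Everything in it is an assumption made for illustration: the `Benchmark` class, its `warmup_runs` and `measured_runs` fields (standing in for execution rules) and the `execute` callable (standing in for a call to whatever engine is under test) are hypothetical names, not part of any benchmark discussed here.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict


@dataclass
class Benchmark:
    """Hypothetical container for the parts of a benchmark."""
    name: str
    queries: Dict[str, str]   # workload: query id -> query text
    warmup_runs: int = 1      # execution rule: runs discarded before measuring
    measured_runs: int = 3    # execution rule: runs that count towards the metric


def run(benchmark: Benchmark, execute: Callable[[str], object]) -> Dict[str, float]:
    """Return the mean wall-clock time in seconds per query.

    `execute` is an assumed callable that sends one query to the engine.
    """
    results = {}
    for query_id, text in benchmark.queries.items():
        for _ in range(benchmark.warmup_runs):
            execute(text)  # warm caches; timing is ignored
        timings = []
        for _ in range(benchmark.measured_runs):
            start = time.perf_counter()
            execute(text)
            timings.append(time.perf_counter() - start)
        results[query_id] = mean(timings)
    return results
```

Keeping the execution rules next to the workload means that two runs of the same benchmark description are measured the same way, which is the point of stating such rules at all.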
Queries in RDF benchmark workloads can be characterised by the presence of complex and simple filters, unbound variables, negation, ordering, regular expressions, duplicate elimination and aggregates, by the query type (select, construct, ask and describe), and by the query operators used (union, optional, joins). Performance metrics include query performance as well as memory consumption.
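As a rough illustration of such a characterisation, the following Python sketch tags a SPARQL query with a few of the features named above. It relies on keyword matching only, which is a deliberate simplification: it can be fooled by keywords inside IRIs, literals or comments, and a real tool would parse the query. The names `FEATURE_PATTERNS`, `query_form` and `query_features` are hypothetical.

```python
import re

# Keyword patterns for a subset of the features; an assumption for illustration.
FEATURE_PATTERNS = {
    "filter": r"\bFILTER\b",
    "regex": r"\bREGEX\s*\(",
    "optional": r"\bOPTIONAL\b",
    "union": r"\bUNION\b",
    "distinct": r"\bDISTINCT\b",
    "order_by": r"\bORDER\s+BY\b",
    "aggregate": r"\b(COUNT|SUM|AVG|MIN|MAX)\s*\(",
    "negation": r"\bNOT\s+EXISTS\b|\bMINUS\b",
}


def query_form(query):
    """Return the first query form keyword found, or None."""
    match = re.search(r"\b(SELECT|CONSTRUCT|ASK|DESCRIBE)\b", query, re.IGNORECASE)
    return match.group(1).upper() if match else None


def query_features(query):
    """Return the query form and the set of features whose keyword occurs."""
    found = {
        name
        for name, pattern in FEATURE_PATTERNS.items()
        if re.search(pattern, query, re.IGNORECASE)
    }
    return {"form": query_form(query), "features": found}


example = """
SELECT DISTINCT ?name WHERE {
  ?person <http://xmlns.com/foaf/0.1/name> ?name .
  OPTIONAL { ?person <http://xmlns.com/foaf/0.1/mbox> ?mbox }
  FILTER regex(?name, "^A")
} ORDER BY ?name
"""
print(query_features(example))
```

For the example query, the sketch reports a SELECT query that uses a filter, a regular expression, an optional pattern, duplicate elimination and ordering.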
A number of benchmarks have been developed in recent years for RDF query engines. Real benchmarks, which use real data and optionally real workloads, include the UniProt KnowledgeBase (UniProtKB) [RU09][UniProtKB], YAGO [SKW07], the Barton Library dataset [Barton] from the MIT Simile Project, the Linked Sensor Dataset [PHS10], which contains expressive descriptions of approximately 20,000 weather stations in the US, and WordNet [WordNet], a large lexical database of English.
Synthetic benchmarks include the Lehigh University Benchmark (LUBM) [GPH05], SP2Bench [SHM+09] and the Berlin SPARQL Benchmark (BSBM) [BS09][BSBM], which aim at evaluating RDF engines with synthetically produced datasets and workloads.
Benchmark generators are the last category of RDF benchmarks. The DBpedia SPARQL Benchmark (DBSB) [MLA+14] proposes a generic, schema-agnostic methodology for SPARQL benchmark creation. It is based on (i) flexible data generation that mimics an input data source, (ii) query-log mining, (iii) clustering of queries and (iv) SPARQL query feature analysis.
The Waterloo SPARQL Diversity Test Suite (WatDiv) [AHO+14] was developed to stress existing RDF engines in order to reveal a wider range of query requirements as established by web applications. Two classes of query features are used to evaluate the variability of workloads and datasets: structural features (e.g., number of triple patterns, types of joins) and data-driven features (e.g., cardinality of the results, the selectivity of triple patterns, the basic graph pattern selectivity and the join-restricted selectivity, as well as specializations thereof).
Finally, FEASIBLE [SMN15] is the most recent benchmark framework. It proposes a feature-based approach to generating benchmarks from real queries that is both structure-based and data-driven, and it reuses some of the insights of WatDiv and DBSB. FEASIBLE proposes a novel sampling approach for queries based on exemplars and medoids. Like WatDiv, FEASIBLE uses a set of structural query features to distinguish between different kinds of queries. The generation of benchmarks is a three-step process. In the first step, the data is cleaned by removing erroneous, zero-result and syntactically incorrect queries from the set of real queries used to generate the benchmark, and seven SPARQL 1.0 features are attached to each of the selected queries. In the second step, the feature vectors are normalized, since the query selection process requires distances between queries to be computed; query representations are normalized so that all queries can be put in a unit hypercube. In the last step, the queries are selected following an approach that is based on the idea of exemplars.
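The normalization and selection steps can be sketched in Python as follows. This is not the FEASIBLE implementation: min-max scaling is assumed as the normalization, and a farthest-first traversal that starts from the query closest to the centre of the data stands in for the exemplar selection. The function names and the three-feature example vectors are hypothetical.

```python
import math


def normalize(vectors):
    """Min-max scale each feature to [0, 1] so every query lies in the unit hypercube."""
    dims = len(vectors[0])
    lows = [min(v[d] for v in vectors) for d in range(dims)]
    highs = [max(v[d] for v in vectors) for d in range(dims)]
    return [
        [
            (v[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        ]
        for v in vectors
    ]


def select_exemplars(points, k):
    """Return the indices of k mutually distant points (assumed stand-in technique)."""
    dims = len(points[0])
    centre = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    # Start from the point closest to the centre of the data.
    chosen = [min(range(len(points)), key=lambda i: math.dist(points[i], centre))]
    while len(chosen) < min(k, len(points)):
        remaining = [i for i in range(len(points)) if i not in chosen]
        # Next exemplar: the point whose nearest chosen exemplar is farthest away.
        chosen.append(
            max(
                remaining,
                key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
            )
        )
    return chosen


# Hypothetical feature vectors: [triple patterns, result size, uses a filter]
features = [[1, 10, 0], [2, 12, 0], [8, 500, 1], [9, 480, 1], [4, 100, 0]]
unit = normalize(features)
exemplars = select_exemplars(unit, 3)
print(exemplars)
```

Without the normalization step, the result-size feature would dominate every distance simply because its values are larger, which is why the features are scaled before any distances are computed.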
More information on RDF benchmarks can be found in the ESWC 2016 Tutorial “Assessing the performance of RDF Engines: Discussing RDF Benchmarks” presented by Irini Fundulaki and Anastasios Kementsietsidis. The slides are available online.
[UniProtKB] UniProtKB Queries. http://www.uniprot.org/help/query-fields
[RU09] N. Redaschi and UniProt Consortium. UniProt in RDF: Tackling Data Integration and Distributed Annotation with the Semantic Web. In Biocuration
[SKW07] F. M. Suchanek, G. Kasneci and G. Weikum. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. In WWW, 2007.
[Barton] The MIT Barton Library dataset. http://simile.mit.edu/rdf-test-data/
[PHS10] H. Patni, C. Henson, and A. Sheth. Linked sensor data. 2010
[WordNet] WordNet: A lexical database for English. Available at http://wordnet.princeton.edu/
[GPH05] Y. Guo, Z. Pan, and J. Heflin. LUBM: A Benchmark for OWL Knowledge Base Systems. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3), October 2005.
[SHM+09] M. Schmidt, T. Hornung, M. Meier, C. Pinkel, G. Lausen. SP2Bench: A SPARQL Performance Benchmark. Semantic Web Information Management,
[BS09] C. Bizer and A. Schultz. The Berlin SPARQL Benchmark. Int. J. Semantic Web and Inf. Sys., 5(2), 2009.
[BSBM] Berlin SPARQL Benchmark (BSBM) Specification – V3.1. Available at http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/index.html
[MLA+14] M. Morsey, J. Lehmann, S. Auer, A-C. Ngonga Ngomo. DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data.
[AHO+14] G. Aluc, O. Hartig, T. Ozsu, K. Daudjee. Diversified Stress Testing of RDF Data Management Systems. In ISWC, 2014.
[SMN15] M. Saleem, Q. Mehmood, and A-C. Ngonga Ngomo. FEASIBLE: A Feature-Based SPARQL Benchmark Generation Framework. In ISWC, 2015.