A number of storage benchmarks test the ability of Linked Data systems to store and query data efficiently, but they do not address the management of data versions. To the best of our knowledge, only a limited number of versioning systems (mostly academic) and benchmarks exist for storing versions and testing the proposed solutions. However, the existence of such systems and benchmarks is of the utmost importance, as dynamicity is an indispensable part of the Linked Open Data (LOD) initiative [4, 5]. In particular, both the data and the schema of LOD datasets are constantly evolving for several reasons, such as the inclusion of new experimental evidence or observations, or the correction of erroneous conceptualizations. The open nature of the Web implies that these changes typically happen without any warning, centralized monitoring, or reliable notification mechanism; this raises the need to keep track of the different versions of the datasets and introduces new challenges related to assuring the quality and traceability of Web data over time.
In this blog post we discuss the benchmark developed in the context of HOBBIT for testing the performance of versioning systems for Linked Data. The benchmark is based on the Linked Data Benchmark Council's (LDBC) Semantic Publishing Benchmark (SPB). SPB, in turn, is based on the scenario of the BBC media organisation, which makes heavy use of Linked Data technologies such as RDF and SPARQL. We extend SPB to produce SPBv, a versioning benchmark that is not tailored to any particular versioning strategy or system.
The first component of the benchmark is the Data Generator, which extends SPB's data generator. The SPB data generator produces RDF descriptions of creative works, which can be defined as metadata about a real entity (or entities) that exists in a set of reference datasets (snapshots of the real datasets provided by the BBC, in addition to the GeoNames and DBpedia datasets). SPB's data generator models the following three types of relations in the data:
- Clustering of data: The clustering effect is produced by generating creative works about a single entity from reference datasets and for a fixed period of time.
- Correlations of entities: This correlation effect is produced by generating creative works about two or three entities from reference data in a fixed period of time.
- Random tagging of entities: Random data distributions are defined with a bias towards popular entities created when the tagging of a creative work is performed.
Creative works, as journalistic assets, are highly dynamic, since in the world of online journalism new articles are constantly being added. Every day plenty of new "creative works" are published, while the already published ones often change. As a result, editors need to keep track of the changes that occur as time goes by. This is the behaviour that the data generator of SPBv tries to simulate. In particular, it extends the generator of SPB in such a way that the generated data is stored in different versions according to its creation date.
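The idea of splitting generated data into versions by creation date can be illustrated with a minimal sketch. It is not SPBv's actual implementation: the function name `assign_versions` and the equal-width time windows are assumptions made purely for illustration (in practice the initial version is usually much larger than the later ones).

```python
from datetime import date


def assign_versions(creation_dates, num_versions):
    """Map each creation date to a version index in [0, num_versions).

    Hypothetical helper: the covered time span is cut into
    num_versions equal-width windows, and a creative work belongs
    to the version whose window contains its creation date.
    """
    if num_versions < 1:
        raise ValueError("num_versions must be at least 1")
    if not creation_dates:
        return []
    start, end = min(creation_dates), max(creation_dates)
    span_days = (end - start).days + 1  # inclusive span in days
    versions = []
    for d in creation_dates:
        offset = (d - start).days
        # Integer arithmetic keeps the index strictly below num_versions.
        versions.append(offset * num_versions // span_days)
    return versions


dates = [date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 3), date(2016, 1, 4)]
version_of = assign_versions(dates, 2)
```

With four consecutive days and two versions, the first two creative works fall into version 0 and the last two into version 1.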
The second component is the Task Generator, which is responsible for generating all the tasks that should be addressed by the system. In more detail, there are three types of tasks:
- Ingestion tasks: These tasks trigger the system to report the time required for loading a new version.
- Storage space task: This task prompts the system to report the total storage space overhead for storing the different versioned datasets.
- Query performance tasks: For each of the eight versioning query types, a set of SPARQL queries is generated that must be executed by the benchmarked systems.
Finally, SPBv includes the Evaluation component, which is responsible for evaluating the results after the generated tasks have been executed. Analogous to the types of tasks, there are three performance metrics by which a versioning system can be evaluated:
- First, we measure the space required to store the different versioned datasets that will be used in the experiments. This metric is essential to understand whether a system can choose the best strategy for storing the versions.
- Next, we measure the time that a system needs for storing a new version. By doing so, we can quantify the possible overhead of complex computations, such as delta computation during data ingestion.
- Finally, we measure the average execution time required to answer a query.
To evaluate how well systems perform with respect to the previously described metrics, we define the following KPIs:
- Initial version ingestion speed (in triples per second): The number of triples that can be loaded per second when storing the first version of the dataset. We distinguish it from the ingestion speed of the other versions because the initial version is in most cases much larger than the subsequent ones, so there is room for optimization.
- Speed of applied changes (average changes per second): Every new version, other than the initial one, contains a set of changes (additions and/or deletions of triples). This KPI measures the average number of such changes that the benchmarked system could store per second, computed after all new versions have been loaded.
- Storage cost (in KB): This KPI measures the total storage space required in order to store the data of all versions.
- Average query execution time (in ms): The average time, in milliseconds, required to execute a query.
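The four KPIs above can be sketched as simple computations over raw measurements. This is only an illustration under stated assumptions: the `VersionLoad` record and all function names are hypothetical, and the speed of applied changes is interpreted here as total changes divided by total loading time over all non-initial versions, which is one possible reading of the definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VersionLoad:
    """Hypothetical record of one ingestion task."""
    triples_changed: int  # all triples for the initial version, else added + deleted
    seconds: float        # loading time reported by the system


def initial_ingestion_speed(initial):
    """Triples loaded per second for the first version."""
    return initial.triples_changed / initial.seconds


def applied_changes_speed(later_loads):
    """Changes stored per second over all non-initial versions.

    Assumption: total changes divided by total loading time.
    """
    total_changes = sum(v.triples_changed for v in later_loads)
    total_seconds = sum(v.seconds for v in later_loads)
    return total_changes / total_seconds


def storage_cost_kb(total_bytes):
    """Total storage for all versions, converted from bytes to KB."""
    return total_bytes / 1024


def average_query_time_ms(times_ms):
    """Mean execution time over the measured query runs, in ms."""
    return mean(times_ms)


initial = VersionLoad(triples_changed=1000, seconds=2.0)
later = [VersionLoad(100, 1.0), VersionLoad(300, 3.0)]
kpis = {
    "initial_ingestion_speed": initial_ingestion_speed(initial),
    "applied_changes_speed": applied_changes_speed(later),
    "storage_cost_kb": storage_cost_kb(2048),
    "avg_query_time_ms": average_query_time_ms([10.0, 20.0, 30.0]),
}
```

The numbers in the usage example are made up for illustration and are not benchmark results.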
Fouad Zablith, Grigoris Antoniou, Mathieu d’Aquin, Giorgos Flouris, Haridimos Kondylakis, Enrico Motta, Dimitris Plexousakis, and Marta Sabou. Ontology evolution: a process-centric survey. KER, 30(1), 2015.
Jürgen Umbrich, Michael Hausenblas, Aidan Hogan, Axel Polleres, and Stefan Decker. Towards Dataset Dynamics: Change Frequency of Linked Open Data Sources. In LDOW, 2010.
Tobias Käfer, Ahmed Abdelrahman, Jürgen Umbrich, Patrick O’Byrne, and Aidan Hogan. Observing Linked Data Dynamics. In ESWC, 2013.
Vassilis Papakonstantinou, Giorgos Flouris, Irini Fundulaki, Kostas Stefanidis, and Giannis Roussakis. Versioning for Linked Data: Archiving Systems and Benchmarks. In BLINK, 2016.