Faceted browsing is a session-based, state-dependent, interactive method for formulating queries over a multi-dimensional information space. It gives users an effective way to explore a search space. Once the initial search space has been defined, i.e., the set of resources of interest to the user, a browsing scenario consists of applying (or removing) filter restrictions on object-valued properties, or of changing the permitted value range of properties of various data types.
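To make the notion of a browsing state concrete, the following Python sketch models a state as a base class plus a set of active facet restrictions, and regenerates a SPARQL query after each transition. The class `FacetState`, its methods and the `http://example.org/transport#` IRIs are hypothetical; they are not part of the benchmark or of the HOBBIT transport dataset. The sketch assumes the common faceted-browsing semantics in which several values selected for the same property are combined with OR, while restrictions on different properties are combined with AND.

```python
from dataclasses import dataclass, field


@dataclass
class FacetState:
    """Hypothetical browsing state: a base class plus active facet restrictions."""

    base_class: str  # IRI defining the initial search space
    # property IRI -> selected object IRIs (values of one property are ORed)
    object_filters: dict[str, set[str]] = field(default_factory=dict)
    # property IRI -> inclusive (low, high) value range
    ranges: dict[str, tuple[float, float]] = field(default_factory=dict)

    def apply_filter(self, prop: str, value: str) -> None:
        """Transition: restrict an object-valued property to one more value."""
        self.object_filters.setdefault(prop, set()).add(value)

    def remove_filter(self, prop: str, value: str) -> None:
        """Transition: drop a previously selected value (no-op if absent)."""
        values = self.object_filters.get(prop)
        if values is None:
            return
        values.discard(value)
        if not values:
            del self.object_filters[prop]

    def set_range(self, prop: str, low: float, high: float) -> None:
        """Transition: change the permitted value range of a property."""
        self.ranges[prop] = (low, high)

    def to_sparql(self) -> str:
        """Build the SPARQL query that returns the resources of this state."""
        lines = ["SELECT DISTINCT ?s WHERE {", f"  ?s a <{self.base_class}> ."]
        for i, (prop, values) in enumerate(sorted(self.object_filters.items())):
            allowed = ", ".join(f"<{v}>" for v in sorted(values))
            lines.append(f"  ?s <{prop}> ?o{i} .")
            lines.append(f"  FILTER(?o{i} IN ({allowed}))")
        for j, (prop, (low, high)) in enumerate(sorted(self.ranges.items())):
            lines.append(f"  ?s <{prop}> ?r{j} .")
            lines.append(f"  FILTER(?r{j} >= {low} && ?r{j} <= {high})")
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    EX = "http://example.org/transport#"  # hypothetical namespace
    state = FacetState(base_class=EX + "Connection")
    state.apply_filter(EX + "departureStop", EX + "stationA")
    state.set_range(EX + "delay", 0, 10)
    print(state.to_sparql())
```

Each method corresponds to one kind of transition named in the text, so a browsing scenario becomes a sequence of method calls, each followed by a freshly generated query.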
The goal of a benchmark for faceted browsing techniques and tools is to give solution providers a way to test how well their software supports faceted browsing over large-scale structured datasets. That is, the benchmark analyses how efficiently systems navigate through large datasets, where the navigation is driven by intelligent, iterative restrictions. Our goal within the HOBBIT project is to provide a platform for benchmarking faceted browsing systems on browsing scenarios over a dataset; these scenarios should reflect an authentic use case and challenge participating systems with respect to different kinds of difficulty.
Benchmarking of faceted browsing
In a browsing scenario, it is how effectively the system performs the transition from one state to the next that determines the user experience. Ideally, a system uses the information about the current state of the browsing scenario to answer the SPARQL query that makes up the desired transition, instead of evaluating the query against the entire dataset in its original form. A good faceted browsing system supports these transitions well, although trade-offs may have to be made between better supporting one kind of transition or another. Such support can be achieved through an intelligent database structure or through suitable precomputations.
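The idea of answering a transition from the current state rather than from the whole dataset can be illustrated with a small in-memory sketch. The hypothetical `IncrementalFacetEngine` below treats records as Python dictionaries and restrictions as named predicates, assuming that each name identifies exactly one predicate. Adding a restriction only rescans the current result set, while removing one reuses a cached earlier state or falls back to a full scan, which mirrors the trade-off between transition types described above. A real system would realise this inside its storage layer or query engine rather than in application code.

```python
from typing import Callable

Record = dict[str, object]
Predicate = Callable[[Record], bool]


class IncrementalFacetEngine:
    """Hypothetical in-memory engine that reuses earlier browsing states.

    Assumption: a restriction's name identifies its predicate uniquely, so a
    set of names can serve as a cache key for a browsing state.
    """

    def __init__(self, dataset: list[Record]) -> None:
        self.dataset = dataset
        self.filters: dict[str, Predicate] = {}
        self.current: list[Record] = list(dataset)
        # browsing state (set of active restriction names) -> its result set
        self._cache: dict[frozenset[str], list[Record]] = {frozenset(): self.current}
        self.records_scanned = 0  # rough cost counter, for illustration only

    def add_filter(self, name: str, pred: Predicate) -> list[Record]:
        """Narrowing transition: only the current result set is scanned."""
        if name in self.filters:
            raise ValueError(f"restriction {name!r} is already active")
        self.filters[name] = pred
        self.records_scanned += len(self.current)
        self.current = [r for r in self.current if pred(r)]
        self._cache[frozenset(self.filters)] = self.current
        return self.current

    def remove_filter(self, name: str) -> list[Record]:
        """Widening transition: reuse a cached state, else rescan the dataset."""
        del self.filters[name]
        key = frozenset(self.filters)
        if key not in self._cache:
            self.records_scanned += len(self.dataset)
            self._cache[key] = [
                r for r in self.dataset if all(p(r) for p in self.filters.values())
            ]
        self.current = self._cache[key]
        return self.current
```

In this sketch, undoing the most recent restriction is served from the cache, whereas removing a restriction whose resulting state was never visited triggers a full rescan; a system that precomputes more states would make a different trade-off.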
Therefore, the choke points of a benchmark on faceted browsing correspond to particular transitions from one state to another during a browsing scenario. Overall, we collected a list of 14 transitions that make up the choke points of our benchmark. This means that systems can be evaluated on a broad range of performance aspects.
We created several ordered lists of SPARQL queries, each of which simulates one browsing scenario. The browsing scenarios were developed to satisfy two requirements. First, the browsing sessions should make sense in a real-world setting. Second, the scenarios should cover all types of transitions specified by the choke points. The overall workload of the benchmark comprises 173 SPARQL queries divided into 11 scenarios, each simulating a single user browsing through the dataset. Every choke point appears at least a few times across all scenarios combined. As the underlying dataset we use the HOBBIT transport dataset, which contains train connections between stations on an artificially created map. Accordingly, our scenarios simulate a user looking for information on train routes, stations, departure times, delays, and the like.
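The coverage requirement, i.e., that every choke point occurs in the workload, can be checked mechanically if each query in a scenario is tagged with the choke point it exercises. The sketch below assumes such a tagging with exactly one choke point per query; `ScenarioQuery`, `choke_point_coverage` and the labels `CP1`, `CP2`, … are hypothetical placeholders, not the benchmark's actual identifiers for its 14 transitions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioQuery:
    """One step of a browsing scenario (hypothetical structure)."""

    choke_point: str  # placeholder label such as "CP3", not an official identifier
    sparql: str       # the query issued at this step


Scenario = list[ScenarioQuery]  # queries in the order a simulated user issues them


def choke_point_coverage(
    scenarios: list[Scenario], all_choke_points: set[str], min_count: int = 1
) -> dict[str, int]:
    """Count how often each choke point occurs over all scenarios combined.

    Raises ValueError if any choke point occurs fewer than min_count times.
    """
    counts = Counter(q.choke_point for scenario in scenarios for q in scenario)
    missing = sorted(cp for cp in all_choke_points if counts[cp] < min_count)
    if missing:
        raise ValueError(f"under-covered choke points: {missing}")
    return {cp: counts[cp] for cp in sorted(all_choke_points)}
```

Raising `min_count` above 1 expresses the stronger requirement that every choke point appears several times across the workload.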
The KPIs of the benchmark are precision, recall and F1-score; in addition, we measure the time between issuing a query and receiving its answer. We record this time in the form of a score measuring the number of queries per second. These four performance values (precision, recall, F1-score and the queries-per-second score) are reported over all queries of all scenarios combined, and additionally for each choke point individually.
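The following sketch shows one plausible way to compute the four KPIs from per-query results, both over the whole workload and per choke point. It assumes that each answer can be compared as a set of result identifiers against a gold-standard set, that precision, recall and F1-score are micro-averaged (true positives, false positives and false negatives summed over all queries), and that the queries-per-second score is the number of queries divided by the total answer time. The benchmark's actual aggregation may differ; `QueryResult`, `kpis` and `report` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    """Outcome of one benchmark query (hypothetical structure)."""

    choke_point: str          # choke point the query is assigned to
    expected: frozenset[str]  # gold-standard answer set
    returned: frozenset[str]  # answer set returned by the system
    seconds: float            # time from issuing the query to receiving the answer


def kpis(results: list[QueryResult]) -> dict[str, float]:
    """Micro-averaged precision, recall and F1, plus queries per second."""
    tp = sum(len(r.expected & r.returned) for r in results)
    fp = sum(len(r.returned - r.expected) for r in results)
    fn = sum(len(r.expected - r.returned) for r in results)
    # Convention for the degenerate case of an empty denominator: report 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total_time = sum(r.seconds for r in results)
    qps = len(results) / total_time if total_time > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "qps": qps}


def report(results: list[QueryResult]) -> dict[str, dict[str, float]]:
    """KPIs over all queries combined and for each choke point individually."""
    by_choke_point: dict[str, list[QueryResult]] = defaultdict(list)
    for r in results:
        by_choke_point[r.choke_point].append(r)
    out = {"overall": kpis(results)}
    for cp in sorted(by_choke_point):
        out[cp] = kpis(by_choke_point[cp])
    return out
```

Micro-averaging weights queries with large answer sets more heavily; a macro-average (the mean of per-query scores) would instead weight every query equally, so the choice noticeably affects the reported numbers.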
Our benchmark is integrated into the HOBBIT platform and can be run directly there. It is also part of the MOCHA Challenge at ESWC 2017, which takes place in Slovenia. As a result, the benchmark already allows system developers to test whether changes to their implementation improve performance on specific faceted browsing choke points.
Although we have successfully built a first version of a benchmark on faceted browsing, there are several extensions we would like to add in a future version. For example, we plan to make the size of the underlying dataset a parameter that can be specified on the platform when a benchmark run is initialized. It will then be possible to conveniently benchmark systems across different dataset sizes. We also plan to add the possibility of simulating parallel users, as well as a streaming scenario in which the underlying dataset grows continuously during the benchmark run.