The idea of creating a platform to host different benchmarking tools for the Big Linked Data lifecycle was born with the creation of the General Entity Annotator Benchmarking Framework (GERBIL). GERBIL is designed to facilitate the benchmarking of named entity recognition (NER), named entity disambiguation (NED) and other semantic tagging approaches. GERBIL’s objective is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. The main pain points for tool developers and end users that motivated the creation of GERBIL were:
Gold standard accessibility. To evaluate an annotation tool, a developer requires access to gold standard datasets. However, the formats and data representations of these gold standards vary across domains. Thus, authors evaluating their tools have to write both a parser and the actual evaluation code before they can use the available datasets.
Comparability of results. A large number of quality measures have been developed and are actively used across the annotation research community to evaluate the same task, making it difficult to compare results across publications on the same topic. For example, while some authors publish macro-F-measures and simply call them F-measures, others publish micro-F-measures for the same purpose, leading to significant discrepancies between the reported scores.
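The discrepancy between the two averaging schemes is easy to demonstrate. The following sketch (illustrative, not GERBIL's actual implementation) computes both variants from the same per-document counts: macro-F1 averages the per-document F1 scores, while micro-F1 pools the true/false positive and false negative counts before computing a single F1.

```java
// Illustrative sketch: how macro- and micro-averaged F-measures diverge
// when computed over the same per-document annotation counts.
public class FMeasureExample {
    // F1 = 2*tp / (2*tp + fp + fn), equivalent to 2PR/(P+R).
    static double f1(double tp, double fp, double fn) {
        return (2 * tp) / (2 * tp + fp + fn);
    }

    public static void main(String[] args) {
        // {true positives, false positives, false negatives} per document
        int[][] docs = { {9, 1, 1}, {1, 1, 3} };

        double macroSum = 0, tp = 0, fp = 0, fn = 0;
        for (int[] d : docs) {
            macroSum += f1(d[0], d[1], d[2]); // F1 per document, then averaged
            tp += d[0]; fp += d[1]; fn += d[2]; // counts pooled for micro-F1
        }
        double macroF1 = macroSum / docs.length;
        double microF1 = f1(tp, fp, fn);

        System.out.printf("macro-F1 = %.3f%n", macroF1); // prints 0.617
        System.out.printf("micro-F1 = %.3f%n", microF1); // prints 0.769
    }
}
```

On this toy input the two scores differ by roughly 0.15, so a paper reporting one variant under the generic label "F-measure" is not comparable to a paper reporting the other.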
Repeatability of experiments. Given these evaluation challenges, recreating published experiments remains a hard task. Moreover, it is difficult to keep track of benchmark configurations and the results achieved. GERBIL therefore provides users with a stable URL for each experiment, containing human- as well as machine-readable metadata about the experiment.
GERBIL proposes to simplify the integration of datasets, benchmarks and annotators by using a common communication format, the NLP Interchange Format (NIF). NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools at the structural, conceptual and access layers. With the GERBIL framework it is therefore easy to:
- Integrate different annotators that provide a REST interface via a Java interface and a NIF library,
- Add different NIF-based corpora from services such as DataHub, and
- Add new quality measures that can be implemented via Java interfaces.
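The integration pattern behind the first point can be sketched as follows. The class name, endpoint URL and payload below are illustrative assumptions, not GERBIL's actual API: the framework talks to an annotator's REST interface by sending a NIF document (here in Turtle) and expects the same document back, enriched with entity annotations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical adapter sketch: the class name, endpoint and payload are
// illustrative only; GERBIL's actual Java/NIF APIs differ in detail.
public class NifRestAnnotator {

    // A minimal NIF context in Turtle: the string to be annotated.
    static final String NIF_REQUEST = """
            @prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
            <http://example.org/doc1#char=0,37>
                a nif:Context, nif:String ;
                nif:isString "Leipzig is a city in Saxony, Germany." .
            """;

    private final HttpClient client = HttpClient.newHttpClient();
    private final URI endpoint;

    public NifRestAnnotator(String endpoint) {
        this.endpoint = URI.create(endpoint);
    }

    // POSTs a NIF document to the annotator's REST endpoint; the response
    // body is expected to be the same document enriched with annotations
    // (e.g. itsrdf:taIdentRef links to a knowledge base).
    public String annotate(String nifDocument) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(endpoint)
                .header("Content-Type", "application/x-turtle")
                .POST(HttpRequest.BodyPublishers.ofString(nifDocument))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative endpoint; a real annotator exposes its own URL.
        NifRestAnnotator annotator = new NifRestAnnotator("http://localhost:8080/annotate");
        System.out.println(annotator.annotate(NIF_REQUEST));
    }
}
```

Because both request and response are plain NIF over HTTP, the same adapter shape works for any annotator, which is exactly what makes the integration uniform.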
The HOBBIT platform will expand upon the mechanisms behind GERBIL to cover the different stages of the Linked Data lifecycle. First, the HOBBIT platform innovates dataset integration by not limiting itself to open datasets, but also providing tools for generating datasets that reflect real industrial (closed) datasets. In addition to classical metrics such as precision, recall, F-measure and runtime, we will collect relevant KPIs from the community and provide reference implementations as well as public performance reports, especially for participating developers and interested parties.