Glossary

The aim of this glossary is to provide succinct definitions of concepts commonly found on the HOBBIT pages. With this glossary, we aim to make the semantics of these concepts more explicit to accelerate the understanding of what HOBBIT is all about.

  • Accessible: Data is considered accessible if it can be viewed using a predefined protocol. In HOBBIT, we rely on HTTP dereferenceability and the SPARQL protocol to make experimental results and configurations accessible.
  • Benchmark: A benchmark aims to measure the performance of a tool by providing standardized means for evaluating this performance. Within the context of HOBBIT, a benchmark commonly consists of a test dataset (e.g., sensor data from industrial machinery), a task (e.g., predict the next event in the output stream of a particular sensor), a target dataset (e.g., a dataset containing the correct predictions) and KPIs. The result of a HOBBIT benchmark is a set of measurements stored within an RDF graph accessible via SPARQL.
  • FAIR: This abbreviation stands for findable, accessible, interoperable and retrievable.
  • Findable: Data is regarded as findable when its location can be determined via descriptive means. HOBBIT data is findable by virtue of being indexed by major search engines such as Google.
  • Interoperable: Data is deemed interoperable when it can be integrated with data from other data sources for the purposes of another application. To ensure that our data is interoperable, we rely on the Semantic Web standards RDF, RDFS and OWL. Mechanisms such as link discovery allow us to connect our results to other knowledge bases for the sake of data integration.
  • Link Discovery: The goal of link discovery is to compute explicit links across entities contained in knowledge graphs. These links are most commonly typed links, where the types are predefined properties. Link discovery should not be confused with link prediction.
  • Link Prediction: The goal of link prediction is to compute links between resources found in a given knowledge base. This is commonly carried out using collective learning algorithms such as tensor factorization or Markov Logic Networks.
  • Mimicking algorithm: Algorithm that simulates a phenomenon, especially for the sake of generating datasets of relevance for a particular domain. For example, an algorithm which simulates the workings of a server to mimic its sensor output.
  • Named Entity Disambiguation: The aim of this family of algorithms is to determine the correct URI for a named entity mentioned in a piece of text, i.e., the entity to which the mention refers.
  • Named Entity Recognition: Given a predefined set of classes (e.g., persons and locations), the goal of a named entity recognition framework is to detect strings which are labels for resources that instantiate at least one of the predefined classes.
  • Question Answering: The goal of question answering is to compute answers to questions expressed in natural language. These systems are commonly stateless and transform the input query into some formal representation (e.g., SPARQL), which is subsequently used to query a structured data source (e.g., a knowledge graph).
  • KPI: Key performance indicator. A measure used to determine how well an algorithm performs on a given task.
  • Prediction: Given an input of some sort (vector, entity, etc.), compute the degree to which the input belongs to a certain category.
  • Relation extraction: Given documents annotated with disambiguated entities, relation extraction aims to determine the possible relations (often from a given set of relations) which hold between the aforementioned entities.
  • Structured data: Data with an (explicit) formal syntactic structure, e.g., a knowledge graph.
  • Unstructured data: Used to mean textual data, i.e., data without a formal syntactic structure accessible to common algorithms.
  • Retrievable: Holds for any data asset that can be materialized onto a storage solution of choice (i.e., downloaded). Within HOBBIT, this concept is implemented via the dereferenceability of URIs, especially for experimental results.
  • Versioning: The goal of versioning is to devise effective means for the storage and querying of different versions of a given dataset. This functionality is particularly important when archiving data that still needs to remain accessible.
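To make the notion of accessibility via the SPARQL protocol more concrete, the sketch below shows how a query for benchmark results could be packaged as an HTTP GET request, as the protocol prescribes. The endpoint URL and the query are placeholders for illustration, not actual HOBBIT addresses or vocabulary.

```python
# Sketch of SPARQL-protocol access (hypothetical endpoint and query):
# the SPARQL protocol allows a query to be sent as the `query`
# parameter of an HTTP GET request to the endpoint.
from urllib.parse import urlencode

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL-protocol GET request URL for the given query."""
    return endpoint + "?" + urlencode({"query": query})

# Placeholder endpoint and query, not real HOBBIT resources.
url = sparql_get_url(
    "https://example.org/sparql",
    "SELECT ?kpi ?value WHERE { ?experiment ?kpi ?value } LIMIT 10",
)
```

Sending an HTTP GET request to the resulting URL would return the query results, e.g., as SPARQL JSON or XML, depending on content negotiation.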
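The link discovery entry above can be illustrated with a minimal sketch: compute typed links (here `owl:sameAs`) between the entities of two toy knowledge bases by comparing their labels with a simple Jaccard token-set similarity. The entity URIs, labels, and the similarity threshold are illustrative assumptions; real link discovery frameworks use far richer similarity measures and search strategies.

```python
# Minimal link discovery sketch over two hypothetical knowledge bases.

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased token sets of two labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def discover_links(source: dict, target: dict, threshold: float = 0.8):
    """Return (source_uri, 'owl:sameAs', target_uri) triples for all
    entity pairs whose label similarity reaches the threshold."""
    return [
        (s_uri, "owl:sameAs", t_uri)
        for s_uri, s_label in source.items()
        for t_uri, t_label in target.items()
        if jaccard(s_label, t_label) >= threshold
    ]

# Hypothetical entities with labels.
kb1 = {"ex:e1": "Berlin", "ex:e2": "New York City"}
kb2 = {"dbr:Berlin": "berlin", "dbr:Paris": "Paris"}

links = discover_links(kb1, kb2)
```

Note how this differs from link prediction: the links are computed across two knowledge bases from explicit similarity evidence, not inferred within a single knowledge base by a learned model.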
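The prediction entry can likewise be sketched in a few lines: a scorer that maps an input vector to a degree of category membership in [0, 1], here using a logistic function over a linear combination. The weights are illustrative assumptions; any scoring model producing a graded membership degree fits the definition.

```python
# Sketch of prediction as graded category membership: a logistic
# score over a linear combination of the input vector (weights are
# hypothetical, not learned from any real dataset).
import math

def predict(x: list[float], weights: list[float]) -> float:
    """Degree in [0, 1] to which input x belongs to the category."""
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

degree = predict([1.0, 2.0], [0.5, -0.25])  # a value strictly between 0 and 1
```

A KPI for such a predictor would then aggregate these degrees against a target dataset, e.g., via accuracy or F-measure.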