OM2020 – Instance Matching or Link Discovery Task – Tasks and Training Data

Tasks and Training Data


The goal of the SPIMBENCH task is to determine when two instances describe the same Creative Work. A dataset is composed of a Tbox (contains the ontology and the instances) and a corresponding Abox (contains only the instances). The datasets share almost the same ontology, with some differences at the property level due to the structure-based transformations. Ontology instances are described through 22 classes, 31 DatatypeProperty and 85 ObjectProperty properties. Among those properties, there are 1 InverseFunctionalProperty and 2 FunctionalProperties.

What we expect from participants: participants are requested to match the instances in the source dataset (Tbox1) against the instances of the target dataset (Tbox2). The goal of the task is to produce a set of mappings between the pairs of matching instances that are found to refer to the same real-world entity. An instance in the source dataset (Tbox1) can have zero or one matching counterpart in the target dataset (Tbox2). We ask the participants to map only the instances of Creative Works, and not the instances of the other classes.

The SPIMBENCH task is composed of two datasets with different scales (i.e., number of instances to match):

  • Sandbox (~380 CWs, ~10,000 triples). It contains two datasets called source (Tbox1) and target (Tbox2), as well as the set of expected mappings (i.e., the reference alignment).
  • Mainbox (~1800 CWs, ~50,000 triples). It contains two datasets called source (Tbox1) and target (Tbox2). This test is blind, meaning that the reference alignment is not given to the participants.

In both datasets, the goal is to discover the matching pairs (i.e., mappings) between the instances in the source dataset (Tbox1) and the instances in the target dataset (Tbox2).

The SPIMBENCH datasets are generated and transformed using SPIMBENCH by altering a set of original data through value-based, structure-based, and semantics-aware transformations (applied in simple combinations).
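To make the idea of a value-based transformation concrete, here is a minimal, self-contained sketch in the spirit of what such a transformation does to a literal value. This is illustrative only and not SPIMBENCH's own code; the class and method names are hypothetical.

```java
// Illustrative sketch of a value-based transformation: injecting a simple
// character-swap "typo" into a literal value. Not SPIMBENCH's actual code.
public class ValueTransform {
    // Swap the characters at positions i and i+1 (a common typo model).
    public static String swapTypo(String value, int i) {
        if (i < 0 || i + 1 >= value.length()) {
            return value; // out of range: leave the literal unchanged
        }
        char[] c = value.toCharArray();
        char tmp = c[i];
        c[i] = c[i + 1];
        c[i + 1] = tmp;
        return new String(c);
    }
}
```

A matcher then has to recognize that the altered literal still describes the same real-world entity as the original.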

Systems addressing this task must implement the corresponding HOBBIT API:

hobbit:implementsAPI bench:spimbenchAPI;
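For context, this triple is part of the system's metadata file on the HOBBIT platform. A minimal sketch is shown below; every IRI except the `hobbit:implementsAPI` property is a placeholder (the `bench:` prefix and the system IRI/image name must be replaced with the real values from the platform):

```turtle
# Sketch only: all IRIs except hobbit:implementsAPI are placeholders.
@prefix hobbit: <http://w3id.org/hobbit/vocab#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix bench:  <http://example.org/bench#> .   # replace with the benchmark's namespace

<http://example.org/mySystem> a hobbit:SystemInstance ;
    rdfs:label "My matching system" ;
    hobbit:implementsAPI bench:spimbenchAPI ;
    hobbit:imageName "git.example.org/my-system-image" .
```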

The training datasets can be found here: Datasets

Link Discovery

The Spatial Benchmark generator can be used to test the performance of systems that deal with the topological relations proposed in the state-of-the-art DE-9IM (Dimensionally Extended nine-Intersection Model) model. This benchmark generator implements all topological relations of DE-9IM between trajectories in two-dimensional space. We used the TomTom and Spaten datasets to create the appropriate benchmarks, supporting the following relations: Equals, Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses, Overlaps.

     The task consists of two subtasks:

          Task 1 : TomTom dataset
                   1.1 : Match LineStrings to LineStrings
                   1.2 : Match LineStrings to Polygons

          Task 2 : Spaten dataset
                   2.1 : Match LineStrings to LineStrings
                   2.2 : Match LineStrings to Polygons
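In practice, systems compute DE-9IM relations with a geometry library (e.g., JTS). As a tiny self-contained illustration of what the Intersects relation tests, the following sketch checks whether two single-segment LineStrings intersect, using the standard orientation test; the class and method names are illustrative, and real benchmark geometries are of course more complex.

```java
// Illustrative sketch: the DE-9IM "Intersects" relation for the simplest case,
// two single-segment LineStrings. Real systems use a geometry library.
public class SegmentIntersects {
    // Orientation of the ordered triple (ax,ay) -> (bx,by) -> (cx,cy):
    // > 0 counter-clockwise, < 0 clockwise, 0 collinear.
    static double orient(double ax, double ay, double bx, double by, double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    // True iff (px,py) lies within the bounding box of segment a-b.
    static boolean onSegment(double ax, double ay, double bx, double by, double px, double py) {
        return Math.min(ax, bx) <= px && px <= Math.max(ax, bx)
            && Math.min(ay, by) <= py && py <= Math.max(ay, by);
    }

    // True iff segment p1-p2 intersects segment p3-p4 (shared points count).
    public static boolean intersects(double x1, double y1, double x2, double y2,
                                     double x3, double y3, double x4, double y4) {
        double d1 = orient(x3, y3, x4, y4, x1, y1);
        double d2 = orient(x3, y3, x4, y4, x2, y2);
        double d3 = orient(x1, y1, x2, y2, x3, y3);
        double d4 = orient(x1, y1, x2, y2, x4, y4);
        if (((d1 > 0 && d2 < 0) || (d1 < 0 && d2 > 0))
            && ((d3 > 0 && d4 < 0) || (d3 < 0 && d4 > 0))) {
            return true; // proper crossing
        }
        // Collinear / touching cases
        if (d1 == 0 && onSegment(x3, y3, x4, y4, x1, y1)) return true;
        if (d2 == 0 && onSegment(x3, y3, x4, y4, x2, y2)) return true;
        if (d3 == 0 && onSegment(x1, y1, x2, y2, x3, y3)) return true;
        if (d4 == 0 && onSegment(x1, y1, x2, y2, x4, y4)) return true;
        return false;
    }
}
```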

Systems addressing this task must implement the corresponding HOBBIT API:

hobbit:implementsAPI bench:SpatialAPI;
The namespaces of the datasets are:
1) TomTom for the Traces (LineStrings) and the namespace for the Regions (Polygons)
2) Spaten for the Traces (LineStrings) and the namespace for the Regions (Polygons)
The training datasets can be found here: Datasets. The zipped file contains two datasets called source and target, as well as the set of expected mappings (i.e., the reference alignment). We provide 20 instances per source and target file.


Participants must:

  • provide their solution as a docker image. First install docker using the instructions found here, and then follow the guide on how to create your own docker image found here.
  • provide a SystemAdapter class in their preferred programming language. The SystemAdapter is the main component that establishes the communication between the other benchmark components and the participant’s system. The functionality of a SystemAdapter is divided into four steps:
    • Initialization of the system
    • Retrieval of the source and target datasets
    • Execution of the system and sending the results to the EvaluationStorage component
    • Shutting down of the system

The following example is a description of HOBBIT’s API for participants that use Java as their programming language.

Firstly, read the article at the following link, which will give you a general idea of how to develop a SystemAdapter in Java.

As explained in the aforementioned article, the SystemAdapter class in Java must extend the abstract class ( ).
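The four steps above can be sketched as the following pseudocode-style Java skeleton. It is not compilable on its own: it assumes the HOBBIT core SDK on the classpath, and the base class name (assumed here to be AbstractSystemAdapter) should be checked against the linked article.

```java
// Pseudocode-style sketch; requires the HOBBIT core SDK on the classpath.
import org.hobbit.core.components.AbstractSystemAdapter;

public class MySystemAdapter extends AbstractSystemAdapter {

    @Override
    public void init() throws Exception {
        super.init();  // must be called first
        // Step 1, initialization: start your matching system
    }

    @Override
    public void receiveGeneratedData(byte[] data) {
        // Step 2a, retrieval: receive the source dataset (queue "source_file")
    }

    @Override
    public void receiveGeneratedTask(String taskId, byte[] data) {
        // Step 2b, retrieval: receive the target dataset (queue "target_file")
        // Step 3, execution: run the matcher, then send the results:
        // sendResultToEvalStorage(taskId, resultBytes);
    }

    @Override
    public void close() throws java.io.IOException {
        // Step 4: shut down the system
        super.close();
    }
}
```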

A SystemAdapter must override the following methods:

  • public void init() throws Exception{} : this method is responsible for initializing the system by executing a command that starts the system’s docker container. This method must call super.init() first.
  • public void receiveGeneratedData(byte[] arg0){} : this method is responsible for receiving the source dataset. arg0 contains the format of the benchmark.

The SystemAdapter must be able to receive the source dataset from the data generator. The source dataset is received using the following code, which can be found here:

SimpleFileReceiver receiver = SimpleFileReceiver.create(this.incomingDataQueueFactory, queueName);
String[] receivedFiles = receiver.receiveData(outputDir);
where queueName is “source_file”.

  • public void receiveGeneratedTask(String arg0, byte[] arg1){} : this method is responsible for receiving the target dataset. arg0 is the task ID and arg1 contains the format of the dataset.

The SystemAdapter must also be able to receive the target dataset. Use the same code as in the receiveGeneratedData(byte[] arg0){} method to retrieve it; here the queueName is “target_file”.

An example of LIMES [1] as a system adapter can be found here:

The system has to use these datasets to produce results containing the matching instances between the source and target datasets. The results have to follow this format:

<sourceURI1> <targetURI1>
<sourceURI2> <targetURI2>
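As a self-contained sketch of building that payload, the mappings can be serialized into one "&lt;sourceURI&gt; &lt;targetURI&gt;" pair per line; the class and method names here are illustrative, not part of the HOBBIT API.

```java
// Minimal sketch: serializing mapping pairs into the expected byte[] payload,
// one "<sourceURI> <targetURI>" pair per line. Names are illustrative.
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ResultSerializer {
    public static byte[] serialize(List<String[]> mappings) {
        StringBuilder sb = new StringBuilder();
        for (String[] pair : mappings) {
            sb.append('<').append(pair[0]).append("> <")
              .append(pair[1]).append(">\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The resulting byte[] is what gets handed to the evaluation storage, as described next.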

Once the system has produced the results, they must be sent to the evaluation storage as a byte[] along with the taskID using the following:

  • sendResultToEvalStorage(String taskID, byte[] data) : this method sends the results to the evaluation storage.
  • receiveCommand(byte command, byte[] data) : this method is responsible for receiving the notification that the datasets are ready to be received.
  • public void close() throws IOException{} : this method is responsible for shutting down the system.

For a further explanation of the benchmark components, please read this tutorial:

How to upload the docker image

When the docker image of your system is ready, you should upload it to the platform following the steps in this file:


The performance metrics in a benchmark determine the effectiveness and efficiency of the systems and tools. In the linking and spatial benchmarks, we focus on the quality of the output in terms of standard metrics such as precision, recall, and F-measure. We also quantify the performance of the systems by measuring the time (in ms) needed to return all the results.
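To make the quality metrics concrete, the sketch below computes precision, recall, and F-measure by treating the system output and the reference alignment as sets of mapping pairs. This is an illustration of the standard definitions, not the platform's evaluation code; the class and method names are hypothetical.

```java
// Illustrative computation of precision, recall, and F-measure, treating the
// system output and the reference alignment as sets of "src target" strings.
import java.util.HashSet;
import java.util.Set;

public class Metrics {
    // Returns { precision, recall, f-measure }.
    public static double[] prf(Set<String> system, Set<String> reference) {
        Set<String> correct = new HashSet<>(system);
        correct.retainAll(reference);  // true positives
        double p = system.isEmpty() ? 0.0 : (double) correct.size() / system.size();
        double r = reference.isEmpty() ? 0.0 : (double) correct.size() / reference.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] { p, r, f };
    }
}
```

For example, a system that returns four mappings of which two are in the reference alignment of four pairs scores 0.5 on all three metrics.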

Java SDK

The standalone software library called the HOBBIT Java SDK provides an easier way to implement a benchmark or a system. It also makes it possible to run local tests before submitting to the platform.


[1] Axel-Cyrille Ngonga Ngomo. On link discovery using a hybrid approach. Journal on Data Semantics, 1(4):203–217, 2012.