OM2017 – Link Discovery Task – Tasks and Training Data


Tasks Summary Description

For the Ontology Alignment Evaluation Initiative 2017 Campaign (OM2017) we propose two benchmark generators that deal with link discovery for spatial data, where the spatial data are represented as trajectories (i.e., sequences of longitude, latitude pairs).

    • The Linking Benchmark generator (Task 1) is based on SPIMBENCH [1], to test the performance of Instance Matching tools that implement mostly string-based approaches for identifying matching entities. This benchmark generator can be used not only by instance matching tools, but also by SPARQL engines that deal with query answering over geospatial data, such as STRABON [2]. For this benchmark generator we used a subset of the transformations implemented in SPIMBENCH. The ontologies used to represent trajectories are fairly simple, and do not consider complex RDF or OWL schema constructs already supported by SPIMBENCH. The test cases implemented in the benchmark focus on string-based transformations with different (a) levels, (b) types of spatial object representations, and (c) types of date representations. Furthermore, the benchmark supports the addition and deletion of ontology (schema) properties, also known as schema transformations. The datasets that implement those test cases can be used by Instance Matching tools to identify matching entities. In a nutshell, the benchmark can be used to check whether two traces whose points are annotated with place names designate the same trajectory.
    • The Spatial Benchmark generator (Task 2) can be used to test the performance of systems that deal with the topological relations proposed in the state-of-the-art Dimensionally Extended nine-Intersection Model (DE-9IM) [3].
      This benchmark generator implements all topological relations of DE-9IM between trajectories in the two-dimensional space. To the best of our knowledge, such a generic benchmark, one that takes trajectories as input and checks the performance of linking systems for spatial data, does not exist. For the design, we focused on (a) the correct implementation of all the topological relations of the DE-9IM model and (b) the generation of datasets large enough to stress the systems under test. The supported relations are: Equals, Disjoint, Touches, Contains/Within, Covers/CoveredBy, Intersects, Crosses, Overlaps.
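To give an intuition for these relations, the following sketch (our own simplified code, not part of the benchmark) evaluates a few DE-9IM predicates for axis-aligned rectangles in pure Java. The benchmark itself operates on trajectories, for which a full geometry library such as JTS would be needed; the rectangle restriction is only for illustration.

```java
// Simplified illustration of a few DE-9IM relations, restricted to
// axis-aligned rectangles. Names and structure are our own; the benchmark
// applies these relations to trajectories (linestrings).
public class Rect {
    final double minX, minY, maxX, maxY;

    Rect(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
    }

    // Equals: identical geometry.
    boolean equalsGeom(Rect o) {
        return minX == o.minX && minY == o.minY && maxX == o.maxX && maxY == o.maxY;
    }

    // Disjoint: the two rectangles share no point at all.
    boolean disjoint(Rect o) {
        return maxX < o.minX || o.maxX < minX || maxY < o.minY || o.maxY < minY;
    }

    boolean intersects(Rect o) { return !disjoint(o); }

    // Within / Contains are converses: this rectangle lies entirely inside the other.
    boolean within(Rect o) {
        return minX >= o.minX && maxX <= o.maxX && minY >= o.minY && maxY <= o.maxY;
    }

    boolean contains(Rect o) { return o.within(this); }

    // Touches: they intersect, but only along a shared edge (interiors disjoint).
    boolean touches(Rect o) {
        return intersects(o)
            && (maxX == o.minX || o.maxX == minX || maxY == o.minY || o.maxY == minY);
    }

    public static void main(String[] args) {
        Rect a = new Rect(0, 0, 2, 2), b = new Rect(2, 0, 4, 2);
        System.out.println(a.touches(b)); // they share only the edge x = 2
    }
}
```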

    Testing and Training Data

    The training datasets can be found here: Datasets. The zipped file contains two datasets, called source and target, as well as the set of expected mappings (i.e., the reference alignment). For Task 1, we provide 100 instances per source and target file; for Task 2, 20 instances.


For Tasks 1 and 2, participants must:

  • provide their solution as a docker image. First install docker using the instructions found here and then follow the guide on how to create your own docker image found here.
  • provide a SystemAdapter class in their preferred programming language. The SystemAdapter is the main component that establishes the communication between the other benchmark components and the participant’s system. The functionality of a SystemAdapter is divided into 4 steps:
    • Initialization of the system
    • Retrieval of source and target datasets
    • Execution of the system and sending of the results to the EvaluationStorage component
    • Shut down of the system

The following example is a description of HOBBIT’s API for participants who use Java as their programming language.

Firstly, read the article in the following link, which will give you a general idea of how to develop a SystemAdapter in Java.

As explained in the aforementioned article, the SystemAdapter class in Java must extend the abstract class ( ).

A SystemAdapter must override the following methods:

  • public void init() throws Exception{} : this method is responsible for initializing the system, by executing a command that starts the system’s docker container. This method must first call super.init().
  • public void receiveGeneratedData(byte[] arg0){} : this method is responsible for receiving the source dataset. arg0 contains the format of the benchmark.

The SystemAdapter must be able to receive the source dataset from the data generator. The source dataset is received using the following code, which can be found here:

SimpleFileReceiver receiver = SimpleFileReceiver.create(this.incomingDataQueueFactory, queueName);
String[] receivedFiles = receiver.receiveData(outputDir);
where queueName is “source_file”.

  • public void receiveGeneratedTask(String arg0, byte[] arg1){}: this method is responsible for receiving the target dataset. arg0 is the task ID and arg1 contains the format of the dataset.

The SystemAdapter must be able to receive the target dataset. You have to use the same code as in the receiveGeneratedData(byte[] arg0){} method in order to retrieve the dataset. Here, queueName is “target_file”.

An example of LIMES [4] as a system adapter can be found here:

The system has to use those datasets in order to produce the results that contain the matching instances between the source and target datasets. The results have to follow this format:

<sourceURI1> <targetURI1>
<sourceURI2> <targetURI2>
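Producing this format can be sketched as follows; the helper class and method names below are our own, not part of the HOBBIT API:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (our own, not part of the benchmark API): serialize
// source->target mappings into the "<sourceURI> <targetURI>" line format
// expected by the evaluation storage.
public class ResultSerializer {
    public static byte[] serialize(Map<String, String> mappings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            sb.append('<').append(e.getKey()).append("> <")
              .append(e.getValue()).append(">\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("http://example.org/src/1", "http://example.org/tgt/1");
        // prints: <http://example.org/src/1> <http://example.org/tgt/1>
        System.out.print(new String(serialize(m), StandardCharsets.UTF_8));
    }
}
```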

Once the system has produced the results, they must be sent to the evaluation storage as a byte[] along with the taskID using:

  • sendResultToEvalStorage(String taskID, byte[] data); method.
  • receiveCommand(byte command, byte[] data) : this method is responsible for notifying the adapter that the datasets are ready to be received.
  • public void close() throws IOException{} : this method is responsible for shutting down the system.
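The methods above can be sketched together as one lifecycle. In a real submission the class extends HOBBIT’s abstract system adapter class; here a minimal mock base class stands in so the sketch is self-contained, and everything other than the four overridden method names is an assumption for illustration only:

```java
// Structural sketch of the SystemAdapter lifecycle (our own mock, not the
// HOBBIT API). MockBaseAdapter stands in for the real abstract base class.
abstract class MockBaseAdapter {
    public void init() throws Exception {}  // stands in for super.init()

    protected void sendResultToEvalStorage(String taskId, byte[] data) {
        System.out.println("sent " + data.length + " bytes for task " + taskId);
    }
}

public class SketchSystemAdapter extends MockBaseAdapter {
    boolean initialized, closed;

    @Override
    public void init() throws Exception {
        super.init();        // always call super.init() first
        initialized = true;  // then start the system's docker container
    }

    public void receiveGeneratedData(byte[] data) {
        // receive the source dataset (queue "source_file")
    }

    public void receiveGeneratedTask(String taskId, byte[] data) {
        // receive the target dataset (queue "target_file"), run the matching,
        // then send the produced mappings to the evaluation storage:
        sendResultToEvalStorage(taskId, new byte[0]);
    }

    public void close() throws java.io.IOException {
        closed = true;       // shut the system down
    }

    public static void main(String[] args) throws Exception {
        SketchSystemAdapter adapter = new SketchSystemAdapter();
        adapter.init();
        adapter.receiveGeneratedTask("task-1", new byte[0]);
        adapter.close();
    }
}
```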

For further explanation of the benchmark components, please read these tutorials:

How to upload the docker image

When the docker image of your system is ready, you should upload the image to the platform following the steps in this file:


The performance metrics in a benchmark determine the effectiveness and efficiency of the systems and tools. In the linking and spatial benchmarks, we focus on the quality of the output in terms of standard metrics such as precision, recall, and f-measure. We also quantify the performance of the systems by measuring the time (in ms) needed to return all the results.
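The quality metrics follow their standard set-based definitions, sketched below over sets of produced and expected links (our own helper code, not the benchmark’s evaluation module):

```java
import java.util.HashSet;
import java.util.Set;

// Standard quality metrics over sets of links, sketched for illustration
// (our own code, not the benchmark's evaluation module).
public class Metrics {
    public static double precision(Set<String> produced, Set<String> expected) {
        if (produced.isEmpty()) return 0.0;
        Set<String> tp = new HashSet<>(produced);
        tp.retainAll(expected);                        // true positives
        return (double) tp.size() / produced.size();
    }

    public static double recall(Set<String> produced, Set<String> expected) {
        if (expected.isEmpty()) return 0.0;
        Set<String> tp = new HashSet<>(produced);
        tp.retainAll(expected);
        return (double) tp.size() / expected.size();
    }

    // Harmonic mean of precision and recall.
    public static double fMeasure(Set<String> produced, Set<String> expected) {
        double p = precision(produced, expected), r = recall(produced, expected);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        Set<String> produced = new HashSet<>(java.util.Arrays.asList("a", "b", "c"));
        Set<String> expected = new HashSet<>(java.util.Arrays.asList("a", "b", "d", "e"));
        System.out.printf("P=%.3f R=%.3f F=%.3f%n",
                precision(produced, expected), recall(produced, expected),
                fMeasure(produced, expected));
    }
}
```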



[1] T. Saveta, E. Daskalaki, G. Flouris, I. Fundulaki, M. Herschel, and A.-C. Ngonga Ngomo. Pushing the limits of instance matching systems: A semantics-aware benchmark for linked data. In WWW, pages 105–106. ACM, 2015. Poster.
[2] Manolis Koubarakis and Kostis Kyzirakos. Modeling and Querying Metadata in the Semantic Sensor Web: The Model stRDF and the Query Language stSPARQL. In ESWC, 2010.
[3] Christian Strobl. Encyclopedia of GIS, chapter Dimensionally Extended Nine-Intersection Model (DE-9IM), pages 240–245. Springer, 2008.
[4] Axel-Cyrille Ngonga Ngomo. On link discovery using a hybrid approach. Journal on Data Semantics, 1(4):203–217, 2012.