OKE2017 Challenge – Tasks

Task 1: Focused NE Identification and Linking

The task comprises the identification of named entities in sentences and the disambiguation of the identified entities to the DBpedia knowledge base.
This task is limited to three DBpedia ontology classes (Person, Place, Organisation) and their associated sub classes.

A competing system is expected to identify entity mentions in a given text by their start and end indices and to generate RDF that formalizes the linking of the identified entities to the DBpedia knowledge base.

Example

Florence May Harding studied at a school in Sydney, and with Douglas Robert Dundas, but in effect had no formal training in either botany or art.

identified named entity   generated URI               indices
Florence May Harding      dbr:Florence_May_Harding    0,20
Sydney                    dbr:Sydney                  44,50
Douglas Robert Dundas     oke:Douglas_Robert_Dundas   61,82

Request data

In the example above, the benchmarked system would receive the following UTF-8 encoded String.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://example.com/example-task1#char=0,146>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "146"^^xsd:nonNegativeInteger ;
        nif:isString          "Florence May Harding studied at a school in Sydney, and with Douglas Robert Dundas , but in effect had no formal training in either botany or art."@en .

Response data

The expected response of a participating system for the example above would be the following UTF-8 encoded String.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://example.com/example-task1#char=0,146>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "146"^^xsd:nonNegativeInteger ;
        nif:isString          "Florence May Harding studied at a school in Sydney, and with Douglas Robert Dundas , but in effect had no formal training in either botany or art."@en .
<http://example.com/example-task1#char=0,20>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Florence May Harding"@en ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "20"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.com/example-task1#char=0,146> ;
        itsrdf:taIdentRef     dbpedia:Florence_May_Harding .
<http://example.com/example-task1#char=44,50>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Sydney"@en ;
        nif:beginIndex        "44"^^xsd:nonNegativeInteger ;
        nif:endIndex          "50"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.com/example-task1#char=0,146> ;
        itsrdf:taIdentRef     dbpedia:Sydney .
<http://example.com/example-task1#char=61,82>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Douglas Robert Dundas"@en ;
        nif:beginIndex        "61"^^xsd:nonNegativeInteger ;
        nif:endIndex          "82"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.com/example-task1#char=0,146> ;
        itsrdf:taIdentRef     <http://aksw.org/notInWiki/Douglas_Robert_Dundas> .
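
To make the expected interaction more concrete, the following Python sketch parses the NIF request with rdflib, calls a placeholder entity linker and serializes the annotations back to Turtle. It is only an illustration of the request/response format above; find_entities() is a hypothetical function yielding (begin, end, URI) tuples and is not part of the challenge API.

# Illustrative sketch of a Task 1 endpoint, not the official implementation.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
ITSRDF = Namespace("http://www.w3.org/2005/11/its/rdf#")

def annotate(request_ttl, find_entities):
    g = Graph()
    g.parse(data=request_ttl, format="turtle")

    # The request contains exactly one nif:Context carrying the document text.
    context = next(g.subjects(RDF.type, NIF.Context))
    text = str(next(g.objects(context, NIF.isString)))
    base = str(context).split("#")[0]

    # find_entities(text) is assumed to yield (begin, end, entity URI) tuples.
    for begin, end, uri in find_entities(text):
        mention = URIRef("{}#char={},{}".format(base, begin, end))
        g.add((mention, RDF.type, NIF.RFC5147String))
        g.add((mention, RDF.type, NIF.String))
        g.add((mention, NIF.anchorOf, Literal(text[begin:end], lang="en")))
        g.add((mention, NIF.beginIndex, Literal(str(begin), datatype=XSD.nonNegativeInteger)))
        g.add((mention, NIF.endIndex, Literal(str(end), datatype=XSD.nonNegativeInteger)))
        g.add((mention, NIF.referenceContext, context))
        g.add((mention, ITSRDF.taIdentRef, URIRef(uri)))

    return g.serialize(format="turtle")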

Task 2: Broader NE Identification and Linking

This task extends the first task towards further entity classes in DBpedia. Besides the three types of the first task, a competing system might have to identify other types of entities.
The first column of the following table provides a complete list of the super classes that are considered, the middle column contains an incomplete list of sub classes (if any), and the last column lists example instances.

super class            sub class examples               instance examples
Activity               Game, Sport                      Chess, Baseball
Agent                  Employer, Organisation, Person   Leipzig_University, Angela_Merkel
Award                  Decoration, NobelPrize           Humanitas_Prize
Disease                                                 Diabetes_mellitus_type_2
EthnicGroup                                             Javanese_people
Event                  Competition, PersonalEvent       Extended_Semantic_Web_Conference
Language               ProgrammingLanguage              English_language, Scala_(programming_language)
MeanOfTransportation   Aircraft, Train                  Airbus_A300
PersonFunction         PoliticalFunction, Profession
Place                                                   Leipzig
Species                Animal                           Cat
Work                   Artwork                          Debian

The request and response data have the same format as the example request and response of Task 1.

Task 3: Focused Musical NE Recognition and Linking

The task is composed of two sub tasks: focused musical NE recognition and linking. A competing system has to fulfill both sub tasks. The domain of this task is music; thus, the knowledge base used will be MusicBrainz as Linked Data (MBL).

Task 3.1: Focused Musical NE Recognition

This sub task consists of the identification and classification of named entities. The task is limited to a subset of the entity types in MBL, which are defined according to the Music Ontology (mo), i.e., entities of the following types: MusicArtist, SignalGroup and MusicalWork.

A competing system is expected to identify entity mentions in a given text by their start and end indices, and further to assign each of them one of these three predefined types.

Task 3.2: Musical NE Linking

In this sub task, a system has to link the entities recognized in the previous sub task to the corresponding resources in MBL.

Example

When Simon & Garfunkel split in 1970, Simon quickly began his solo career with the release of the self-titled album “Paul Simon”. This was followed by “There Goes Rhymin’ Simon” and “Still Crazy After All These Years”, both of which featured chart-topping hits such as “Loves Me Like A Rock” and “Kodachrome”.

identified named entity             classified entity type   generated URI                                        indices
Simon & Garfunkel                   MusicArtist               artist:5d02f264-e225-41ff-83f7-d9b1f0b1874a          5,22
Simon                               MusicArtist               artist:05517043-ff78-4988-9c22-88c68588ebb9          38,43
Paul Simon                          SignalGroup               release-group:a1cc3fbd-609b-323c-95e2-435dfceb51e9   117,127
There Goes Rhymin’ Simon            SignalGroup               release-group:fb1e90a8-4461-382b-9081-183abb3c8997   152,176
Still Crazy After All These Years   SignalGroup               release-group:cd0c17f4-ff8d-3b1d-ac36-397ebbb069e9   183,216
Loves Me Like A Rock                MusicalWork               work:bc76594b-b113-4a57-b929-b9911531108e            270,290
Kodachrome                          MusicalWork               work:c9ad17e6-440e-40b6-b4f9-58b74b006c20            297,307

Request data

For a shortened version of the example above, the benchmarked system would receive the following UTF-8 encoded String.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://example.com/example-task3#char=0,40>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "40"^^xsd:nonNegativeInteger ;
        nif:isString          "When Simon & Garfunkel split in 1970,..."@en .

Response data

The expected response of a participating system for the example above would be the following UTF-8 encoded String.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://example.com/example-task3#char=0,40>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "40"^^xsd:nonNegativeInteger ;
        nif:isString          "When Simon & Garfunkel split in 1970,..."@en .
<http://example.com/example-task3#char=5,22>
        a                     nif:RFC5147String , nif:String ;
        nif:anchorOf          "Simon & Garfunkel"@en ;
        nif:beginIndex        "5"^^xsd:nonNegativeInteger ;
        nif:endIndex          "22"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://example.com/example-task3#char=0,40> ;
        itsrdf:taIdentRef     <http://musicbrainz.org/artist/5d02f264-e225-41ff-83f7-d9b1f0b1874a> ;
        itsrdf:taClassRef     <http://purl.org/ontology/mo/MusicArtist> .

Task 4: Knowledge Extraction

The goal of this task is to extract knowledge from a given text and to formalize it as RDF triples. DBpedia is the knowledge base used in the knowledge extraction task.

The quality is measured with precision, recall and F1-score. Precision and recall are calculated by comparing the received graph, i.e., the triples that have been received from the competing system, with the expected graph. The evaluation module will try to match both graphs to each other in a way that the number of matching nodes and edges is as high as possible. Matching triples are counted as true positives. False positives are triples that have been received from the competing system but are not present in the expected graph, and false negatives are triples that were expected but not received from the system.
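
For illustration, the micro-averaged scores over two sets of triples could be computed as in the following simplified Python sketch; the real evaluation additionally aligns the URIs of emerging entities, as described in the note below.

# Illustrative scoring over two triple sets; not the evaluation module itself.
def scores(received, expected):
    tp = len(received & expected)   # triples present in both graphs
    fp = len(received - expected)   # returned by the system, not expected
    fn = len(expected - received)   # expected, but not returned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1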

Note that the text might contain emerging entities, i.e., the received response may contain the correct triples while the generated URIs of some nodes do not match.
We will handle this case like the matching of blank nodes.

The task is limited to the following owl:ObjectProperty properties:

Example

William Thomson’s father, James Thomson, was a teacher of mathematics and engineering at Royal Belfast Academical Institution and the son of a farmer. James Thomson married Margaret Gardner in 1817 and, of their children, four boys and two girls survived infancy.

subject                                 property   object
dbr:William_Thomson,_1st_Baron_Kelvin   parent     dbr:James_Thomson_(mathematician)
dbr:James_Thomson_(mathematician)       spouse     oke:Margaret_Gardner
dbr:James_Thomson_(mathematician)       employer   dbr:Royal_Belfast_Academical_Institution

Request data

For the example above, the benchmarked system would receive the following UTF-8 encoded String.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
<http://example.com/example-task4#char=0,262>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "262"^^xsd:nonNegativeInteger ;
        nif:isString          "William Thomson’s father, James Thomson, was a teacher of mathematics and engineering at Royal Belfast Academical Institution and the son of a farmer. James Thomson married Margaret Gardner in 1817 and, of their children, four boys and two girls survived infancy."@en .

Response data

The expected response of a participating system for the example above would be the following UTF-8 encoded String.

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
<http://example.com/example-task4#char=0,262>
        a                     nif:RFC5147String , nif:String , nif:Context ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "262"^^xsd:nonNegativeInteger ;
        nif:isString          "William Thomson’s father, James Thomson, was a teacher of mathematics and engineering at Royal Belfast Academical Institution and the son of a farmer. James Thomson married Margaret Gardner in 1817 and, of their children, four boys and two girls survived infancy."@en .
<http://dbpedia.org/resource/William_Thomson,_1st_Baron_Kelvin>
        dbo:parent    <http://dbpedia.org/resource/James_Thomson_(mathematician)> .
<http://dbpedia.org/resource/James_Thomson_(mathematician)>
        dbo:spouse    <http://aksw.org/notInWiki/Margaret_Gardner> .
<http://dbpedia.org/resource/James_Thomson_(mathematician)>
        dbo:employer  <http://dbpedia.org/resource/Royal_Belfast_Academical_Institution> .

Data Creation

The documents might contain emerging entities, i.e., entities that are not part of the KB. These entities have to be marked, and a URI has to be generated for them.

Scenario A

Scenario A offers 100 curated training and test documents. The goal in this scenario is to achieve a high F1-score.

Scenario B

Scenario B offers a large number of synthetically generated documents. The performance of a system is measured by β, which is defined as β = F1-score / runtime.
The goal in this scenario is to achieve a high β.
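
For instance, under this definition a system that reaches an F1-score of 0.8 with a runtime of 2 (in whatever time unit the benchmark reports) would obtain β = 0.8 / 2 = 0.4.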

Training Data

Scenario A
Task 1 training data (last update 13.03.2017)
Task 2 training data (last update 13.01.2017)
Task 3 training data (last update 02.02.2017)
Task 4 training data (last update 13.01.2017)

Evaluation

Overall, there will be two main evaluation approaches: subjective and objective. The subjective evaluation will be based on paper reviews, and the objective evaluation
will be based on computing relevance measures. DBpedia and MBL are used as knowledge bases, and the performance of a system is measured using the F1-score and β. Note that we reuse the ability of the GERBIL project to benchmark systems that link to a KB other than DBpedia, as long as sameAs links exist between the two knowledge bases.

The evaluation will be based on two matching strategies. The strong annotation matching requires the correct position of the entity mention inside the document. The weak annotation matching relaxes the conditions of the strong annotation matching: a correct annotation has to be linked to the same entity and must overlap the annotation of the gold standard.
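
As an illustration of the difference, assuming an annotation is represented as a (beginIndex, endIndex, entity URI) tuple, the two strategies could be sketched as follows; the function names are ours and not part of the evaluator's API.

# Sketch of the two matching strategies over (begin, end, uri) tuples.
def strong_match(sys_ann, gold_ann):
    # strong matching: identical character span and identical entity
    return sys_ann == gold_ann

def weak_match(sys_ann, gold_ann):
    # weak matching: identical entity and overlapping character spans
    (b1, e1, u1), (b2, e2, u2) = sys_ann, gold_ann
    return u1 == u2 and b1 < e2 and b2 < e1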