QALD2017 Challenge – Tasks and Training Data

Task 1: Multilingual question answering over DBpedia

Train dataset: qald-7-train-multilingual.json
Test dataset: qald-7-test-multilingual-withoutanswers.json

Given the diversity of languages used on the web, there is a pressing need to facilitate multilingual access to semantic data. The core task of QALD is thus to retrieve answers from an RDF data repository, given an information need expressed in a variety of natural languages.

The underlying RDF dataset will be DBpedia 2016-04. The training data will consist of more than 500 questions compiled and curated from previous challenges. The questions will be available in eight different languages (English, Spanish, German, Italian, French, Dutch, Romanian, and Farsi), possibly with the addition of three further languages (Korean, Hindi and Brazilian Portuguese). Those questions are general, open-domain factual questions, for example:

(en) Which book has the most pages?
(de) Welches Buch hat die meisten Seiten?
(es) ¿Qué libro tiene el mayor número de páginas?
(it) Quale libro ha il maggior numero di pagine?
(fr) Quel livre a le plus de pages?
(nl) Welk boek heeft de meeste pagina’s?
(ro) Ce carte are cele mai multe pagini?
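
Both the training and test files use the QALD JSON format. The following sketch loads the training file and prints each question's language variants together with its gold SPARQL query; the field names (questions, question, language, string, query, sparql) are assumed from earlier QALD releases and may need adjusting to the actual file.

    import json

    # Hedged sketch: field names are assumed from earlier QALD JSON releases
    # and may differ slightly in qald-7-train-multilingual.json.
    with open("qald-7-train-multilingual.json", encoding="utf-8") as f:
        data = json.load(f)

    for q in data.get("questions", []):
        # Each question carries one natural language string per language.
        for variant in q.get("question", []):
            print(q.get("id"), variant.get("language"), variant.get("string"))
        # The manually specified gold SPARQL query, if present.
        sparql = q.get("query", {}).get("sparql", "")
        if sparql:
            print("   SPARQL:", " ".join(sparql.split())[:100])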

The questions vary with respect to their complexity, including questions with counts (e.g., How many children does Eddie Murphy have?), superlatives (e.g., Which museum in New York has the most visitors?), comparatives (e.g., Is Lake Baikal bigger than the Great Bear Lake?), and temporal aggregators (e.g., How many companies were founded in the same year as Google?). Each question is annotated with a manually specified SPARQL query and answers.
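
For illustration, a query along the following lines answers the first example question ("Which book has the most pages?") against DBpedia. This is only a sketch, not the gold query from the dataset, and the public endpoint serves a newer DBpedia release than the 2016-04 version used in the challenge.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Illustrative query for "Which book has the most pages?"; the gold query
    # in the training data may differ in detail (property choice, modifiers).
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?book WHERE {
            ?book a dbo:Book ;
                  dbo:numberOfPages ?pages .
        }
        ORDER BY DESC(?pages)
        LIMIT 1
    """)
    endpoint.setReturnFormat(JSON)
    for binding in endpoint.query().convert()["results"]["bindings"]:
        print(binding["book"]["value"])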

Data creation: The test dataset will consist of 50 to 100 similar questions. We plan to compile these from existing, real-world question and query logs in order to provide unbiased questions expressing real-world information needs; the questions will then be manually curated to ensure a high quality standard. Existing methodology for selecting queries from query logs has been shown to retrieve prototypical queries. Over the course of the previous QALD challenges we have seen more than 30 submitted systems, covering most of the challenge languages.

Task 2: Hybrid question answering

Train dataset: qald-7-train-hybrid.json
Test dataset: qald-7-test-hybrid-withoutanswers.json

A lot of information is still available only in textual form, both on the web and in the form of labels and abstracts in Linked Data sources. Therefore, approaches are needed that can not only deal with the specific character of structured data but also with finding information in several sources, processing both structured and unstructured information, and combining such gathered information into one answer.

QALD therefore includes a task on hybrid question answering, asking systems to retrieve answers for questions that require integrating data from both RDF and textual sources. In the previous instantiation of the challenge, this task gained significant momentum, attracting seven participating systems.

The task will build on DBpedia 2016-04 as the RDF knowledge base, together with the English Wikipedia as the textual data source. As training data, we will compile more than 100 English questions from past challenges (partly based on questions used in the INEX Linked Data track). The questions are annotated with answers as well as a pseudo query that indicates which information can be obtained from RDF data and which from free text. The pseudo query is like an RDF query but can contain free text as the subject, property, or object of a triple.
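
For illustration, a pseudo query for a question such as "Which anti-apartheid activist was born in Mvezo?" combines a free-text condition with an RDF triple. The sketch below shows how a system might split such a pseudo query into the part to be answered from DBpedia and the part to be matched against Wikipedia text; the text:"..." notation follows earlier QALD hybrid tasks, and the exact serialization in the training file may differ.

    # Hedged sketch: the text:"..." notation follows earlier QALD hybrid tasks;
    # check qald-7-train-hybrid.json for the exact pseudo-query serialization.
    pseudo_query = """
    SELECT ?uri WHERE {
        ?uri text:"anti-apartheid activist" .
        ?uri dbo:birthPlace res:Mvezo .
    }
    """

    rdf_parts, text_parts = [], []
    for line in pseudo_query.splitlines():
        line = line.strip().rstrip(".").strip()
        if not line or line.startswith(("SELECT", "{", "}")):
            continue
        # Triples containing text:"..." need full-text matching over Wikipedia;
        # the remaining triples can be evaluated directly against DBpedia.
        (text_parts if 'text:"' in line else rdf_parts).append(line)

    print("RDF triples: ", rdf_parts)
    print("Text triples:", text_parts)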

Data creation: As test questions, we will provide 50 similar questions, all manually created and checked by at least two data experts. The main goal when devising these questions will not be to take into account the vast amount of data available and the problems arising from noisy, duplicate and conflicting information, but rather to enable a controlled and fair evaluation, given that hybrid question answering is still a very young line of research.

Task 3: Large-scale question answering over RDF

Train dataset: qald-7-train-largescale.json

A new task will be introduced this year, focusing on a large-scale question set. Successful approaches must scale to a large data volume, handle a vast number of questions, and speed up the question answering process through parallelization, so that as many questions as possible are answered as accurately as possible in the shortest possible time. The task will build on DBpedia 2016-04 as the RDF knowledge base.

The focus of this task is to cope with the large data volume while returning correct answers for as many questions as possible. We will provide a benchmark of several thousand automatically generated questions. Successful approaches will be able to deal with this vast amount of data and parallelize the answer retrieval process.

Data creation: The training set will consist of questions compiled from the HOBBIT project. The full-scale question set will be generated by an algorithm that derives a large number of new questions from the training questions by varying both the query desire and the form of the natural language expression. Questions will be annotated with SPARQL queries and answers. Participating systems will be evaluated with respect to both the number of correct answers and the time needed.
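
A rough sketch of the kind of setup this task rewards: answering questions concurrently while tracking both correctness and wall-clock time. The answer_question function below is a placeholder for a participant's own pipeline, and the official HOBBIT evaluation measures correct answers and runtime in its own harness.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def answer_question(question):
        # Placeholder: translate the question to SPARQL, query the endpoint,
        # and return the retrieved answers as a set.
        return set()

    def evaluate(questions, gold_answers, workers=8):
        start = time.time()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            predictions = list(pool.map(answer_question, questions))
        correct = sum(1 for pred, ref in zip(predictions, gold_answers) if pred == ref)
        elapsed = time.time() - start
        print(f"{correct}/{len(questions)} correct in {elapsed:.1f}s "
              f"({len(questions) / elapsed:.1f} questions/s)")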

Task 4: English question answering over Wikidata

Updated! Train dataset: qald-7-train-en-wikidata.json
Test dataset: qald-7-test-en-wikidata-withoutanswers.json

The Wikidata dataset used to create this benchmark can be found on HOBBIT’s ftp server and the Docker image for running this data with Blazegraph can be found in metaphacts’ Docker Hub.

Another new task introduced this year will use the public data source Wikidata (https://www.wikidata.org/) as the target repository. The training data will include 100 open-domain factual questions compiled from the previous iteration of Task 1. In this task, the questions originally formulated for DBpedia should be answered using Wikidata. Your systems will thus have to deal with a different data representation structure. This task will help to evaluate how generic your approach is and how easy it is to adapt to a new data source. Note that the results obtained from Wikidata might differ from the answers to the same queries found in DBpedia.
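
To make the representation gap concrete, here is a sketch of the example question from Task 1 ("Which book has the most pages?") posed against the public Wikidata endpoint: Wikidata models books as items with P31 (instance of) Q571 (book) and the property P1104 (number of pages), where DBpedia uses dbo:Book and dbo:numberOfPages. For the task itself you would query the provided Blazegraph instance rather than the public endpoint, and the gold queries in the training file may differ from this sketch.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Illustrative Wikidata counterpart of the DBpedia query shown in Task 1;
    # the wd:/wdt: prefixes are predefined at query.wikidata.org.
    endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                             agent="qald-example")
    endpoint.setQuery("""
        SELECT ?book ?pages WHERE {
            ?book wdt:P31 wd:Q571 ;
                  wdt:P1104 ?pages .
        }
        ORDER BY DESC(?pages)
        LIMIT 1
    """)
    endpoint.setReturnFormat(JSON)
    for b in endpoint.query().convert()["results"]["bindings"]:
        print(b["book"]["value"], b["pages"]["value"])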

Data creation: This task was designed in the context of the DIESEL project (https://diesel-project.eu/). The training set contains 100 questions taken from Task 1 of the QALD-6 challenge. We formulated the queries to answer these questions from Wikidata and generated the gold standard answers using them. For this task, we use the Wikidata dump from 9 January 2017 (https://dumps.wikimedia.org/wikidatawiki/entities/20170109/).

 

Provided Resources

We will provide novel as well as experienced participants with a large list of resources to use. Based on our experience, we expect participants to come up with a good solution within two months of development, given the rich set of available open source libraries and tools. Running a task will not take longer than an hour per run. Prominent tools for indexing and searching datasets and text collections are:

Building question answering systems is a complex task; it thus helps to exploit high-level tools for component integration as well as existing architectures for question answering systems:

In the remainder of the section we provide a list of resources and tools that can be exploited especially for the linguistic analysis of a question and the matching of natural language expressions with vocabulary elements from a dataset.

For example, lexical resources are:

Well-known text processing tools and frameworks could be:

Dependency parsers include:

Example Named Entity Recognition tools:

To calculate string similarity and semantic relatedness, the following tools are well suited:

To foster the multilingual aspect, translation systems could be used:

Anything missing? If you know of a cool resource or tool that we forgot to include (especially for the challenge languages other than English), please drop us a note!