Benchmark II Analysis & Processing

HOBBIT will provide benchmarks for the data linking and analysis KPIs of the big data value chain. The linking benchmark will test the performance of instance matching methods and tools for Linked Data, and the analytics benchmark will test the performance of supervised and unsupervised machine learning methods for data analytics.

  • Linking: Interlinking datasets requires identifying resources in the datasets that refer to the same real-world entity, a process known as instance matching. The proposed benchmark will extend SPIMBench, the instance matching benchmark developed in the context of the LDBC project, to consider streaming data. We will also work with LDIMBench, a benchmark generator inspired by SPIMBench that implements a set of test cases which exploit schema information and the associated semantics to identify matches between instances. LDIMBench is not bound to any data generator, and its test case generator produces a weighted gold standard that encodes how “close” the matched instances are. Because it is mostly a test case generator and is not bound to any domain, it can be used to test a variety of instance matching systems and algorithms.
  • Machine Learning on Structured Data: Given graph data provided by USU and AGT, we will investigate supervised and unsupervised machine learning methods. These methods will take resources in the graph that represent events as input and learn descriptions of abnormal events and problems. The gold standard for supervised learning methods will be derived from previous abnormal events and problems, e.g. at USU (related to SAKE/DL-Learner). Unsupervised machine learning methods will be based on smart metering data provided by AGT and will access measurement resources in the graph structures to predict energy consumption. Optimal models will be learned for individual households, using additional metadata from the graph structure.
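
To illustrate the weighted gold standard mentioned for the linking benchmark, the sketch below scores a set of predicted matches against gold pairs that carry a closeness weight. The text does not specify how the weights enter the evaluation, so the scoring scheme here is an assumption: each correctly predicted pair earns its gold weight as credit. The function name `weighted_scores`, the weight range, and the sample identifiers are hypothetical and are not taken from SPIMBench or LDIMBench.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (source instance id, target instance id)


def weighted_scores(gold: Dict[Pair, float],
                    predicted: Set[Pair]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under an assumed weighting scheme.

    gold maps each true match to a weight in (0, 1] that encodes how
    close the two instances are (assumption: 1.0 means identical).
    """
    # A predicted pair earns its gold weight; pairs absent from gold earn 0.
    credit = sum(gold.get(pair, 0.0) for pair in predicted)
    # Precision: earned credit relative to the number of predictions made.
    precision = credit / len(predicted) if predicted else 0.0
    # Recall: earned credit relative to the total weight available in gold.
    total_weight = sum(gold.values())
    recall = credit / total_weight if total_weight else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold standard and system output.
gold = {("a1", "b1"): 1.0, ("a2", "b2"): 0.5, ("a3", "b3"): 0.5}
predicted = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}
precision, recall, f1 = weighted_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this scheme a system that finds only the heavily transformed (low-weight) matches scores lower than one that finds the near-identical ones. A benchmark could equally invert the weights to reward harder matches; that choice is left open here.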