Stream Machine Learning (StreaML) Open Challenge 2017-2018 – Details

Challenge Overview

Following the successful organization of the DEBS Grand Challenge 2017 by the HOBBIT project at the DEBS 2017 Conference, HOBBIT is proud to announce a Stream Machine Learning (StreaML) Open Challenge, which will be launched in December 2017. The open challenge will ensure continuous participation and systems evaluation.

The goal of the StreaML Open Challenge is to evaluate event-based systems for real-time analytics over high-velocity, high-volume data streams generated by manufacturing equipment.

A predefined machine learning algorithm should be applied to analyze the RDF streaming data generated by digital and analogue sensors embedded within multiple molding machines. The data produced by each sensor is clustered and the state transitions between the observed clusters are modeled as a Markov chain. Based on this machine learning-based classification, anomalies are detected as sequences of transitions that occur with a probability lower than a given threshold.

Please note that the StreaML Open Challenge reuses the dataset and the task description of the DEBS GC 2017. In contrast to the DEBS Challenge, systems will be evaluated continuously every week and the results will be shown on a leaderboard (the evaluation criteria are presented below).

Participants can use the published benchmark as a reference implementation of the anomaly detection algorithm to pass the correctness checks, and can then focus on the performance and stability of their systems, which are part of the evaluation criteria.

Implemented solutions should be compatible with the HOBBIT platform, i.e., implemented as a system adapter. The HOBBIT-compatible sample system and the accompanying instructions will help participants debug their solutions locally and make them compatible with the online benchmark.


The data set comes from two types of machines: (1) injection molding machines and (2) assembly machines. Injection molding machines are equipped with sensors that measure various parameters of a production process: distance, pressure, time, frequency, volume, temperature, speed and force. All the measurements taken at a certain point in time result in a 120-dimensional vector consisting of values of different types (e.g., text or numerical values). Assembly machines are equipped with 3 energy meters. Each measurement for both types of machines is timestamped and described using the OWL ontology. The OWL ontology is provided as several modules that are available and documented on the HOBBIT CKAN site.

All data is provided as RDF triples. The data is provided as (1) metadata and (2) measurements. Metadata includes information about the machine type, the number of sensors per machine and the number of clusters that must be used in order to detect anomalies in the data. Measurements include the actual sensor data as measured by sensors within the machines. We refer participants to this document for additional information and excerpts of the input data.

We provide two sample input data sets under the following address: .

Measurements and Metadata

For anomaly detection we provide an input stream of RDF tuples plus a file containing the metadata for all the machines (injection molding and assembly) to which the tuples refer. The following is a sample of the input stream containing a value for one dimension. We refer the reader to the additional data description document for information about the RDF format of the input tuples and the metadata provided for the machines.

debs:ObservationGroup_1 rdf:type i40:MoldingMachineObservationGroup.
debs:ObservationGroup_1 ssn:observationResultTime debs:Timestamp_1.
debs:ObservationGroup_1 i40:contains debs:Observation_1.
debs:ObservationGroup_1 i40:machine wmm:MoldingMachine_1.
debs:ObservationGroup_1 i40:observedCycle debs:Cycle_2.
debs:Cycle_2 rdf:type i40:Cycle.
debs:Cycle_2 IoTCore:valueLiteral "2"^^xsd:int.
debs:Timestamp_1 rdf:type IoTCore:Timestamp.
debs:Timestamp_1 IoTCore:valueLiteral "2016-07-18T23:59:58"^^xsd:dateTime.
debs:Observation_1 rdf:type i40:MoldingMachineObservation.
debs:Observation_1 ssn:observationResult debs:Output_2.
debs:Observation_1 ssn:observedProperty wmm:_9.
debs:Output_2 rdf:type ssn:SensorOutput.
debs:Output_2 ssn:hasValue debs:Value_2.
debs:Value_2 rdf:type i40:NumberValue.
debs:Value_2 IoTCore:valueLiteral "-0.01"^^xsd:float.


As with the input stream, the results of the submitted solution should be provided as a stream of RDF tuples. A sample output is shown below. We refer the reader to the additional data description document for information about the RDF format of the output tuples. Please note that the output data stream should be ordered by the application timestamp.

# anomaly in the first machine in the first dimension
debs:Anomaly_1 rdf:type ar:Anomaly.
debs:Anomaly_1 i40:machine debs:Machine_1.
debs:Anomaly_1 ar:inAbnormalDimension debs:ObservedProperty_1.
debs:Anomaly_1 ar:hasTimeStamp debs:TimeStamp_1.
debs:Anomaly_1 ar:hasProbabilityOfObservedAbnormalSequence 0.1.
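
As an illustration only, the five output triples in the sample above could be assembled with a small helper like the one below. The function name anomaly_triples and its arguments are hypothetical and not part of the challenge material; the predicates are copied from the sample, and the authoritative output format is the one in the data description document.

```python
def anomaly_triples(index, machine, dimension, timestamp, probability):
    """Build the five triples of one anomaly, mirroring the sample output.

    All arguments are already-prefixed names or plain values supplied by
    the caller; this helper is a hypothetical illustration.
    """
    subject = f"debs:Anomaly_{index}"
    return [
        f"{subject} rdf:type ar:Anomaly.",
        f"{subject} i40:machine {machine}.",
        f"{subject} ar:inAbnormalDimension {dimension}.",
        f"{subject} ar:hasTimeStamp {timestamp}.",
        f"{subject} ar:hasProbabilityOfObservedAbnormalSequence {probability}.",
    ]


# Reproduces the sample anomaly for the first machine and first dimension.
sample = anomaly_triples(
    1, "debs:Machine_1", "debs:ObservedProperty_1", "debs:TimeStamp_1", 0.1
)
```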

Please note that anomaly detection must be calculated in a streaming fashion, i.e.:

  1. solutions must not make use of any pre-calculated information, such as indices, and
  2. result streams must be updated continuously.


The query has three stages: (1) Finding Clusters, (2) Training a Markov Model and (3) Finding Anomalies. The figure below illustrates the query stages as an NFA. Note that, once started, the activities for each stage are executed continuously and never stop, e.g., cluster centers are continuously re-evaluated while the Markov model is already being used for anomaly detection.

Three stages of the Query

An event passes the sketched stages in sequence. This means that a changed cluster center must be considered in the subsequent stages right after the centers have changed. An event that causes a change of a cluster center first causes the update of the centers, then an update of the Markov model and is finally used in anomaly detection.

Finding Clusters

For each stateful dimension, find and maintain up to K cluster centers, using the numbers 1 to K as seeds for the initial K centroids. The number K is defined in the metadata for each dimension of each individual machine. Use all measurements from the last W time units to find the cluster centers for a given time window.

The initial cluster centers for each dimension of measurements in a given time window are determined by the first K distinct values for that dimension in the given window (K is the upper limit).

The algorithm must compute M (e.g., 50) iterations to find a clustering, unless it terminates earlier.

* If a given window contains fewer than K distinct values, the number of clusters must be equal to the number of distinct values in the window (K is the upper limit).

** If a data point has the exact same distance to more than one cluster center, it must be associated with the cluster that has the highest center value.
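
The clustering rules above can be sketched for a single dimension as follows. This is a minimal illustration, not the reference benchmark: it assumes the values of one window are held in a plain list in arrival order, it seeds the centers with the first K distinct values, and it treats "terminates earlier" as "the centers stopped changing". The function names are hypothetical.

```python
def initial_centers(values, k):
    """First k distinct values in arrival order (fewer if the window has fewer)."""
    centers = []
    for v in values:
        if v not in centers:
            centers.append(v)
            if len(centers) == k:
                break
    return centers


def assign(value, centers):
    """Index of the nearest center; on an exact tie the highest center wins."""
    return min(
        range(len(centers)),
        key=lambda i: (abs(value - centers[i]), -centers[i]),
    )


def kmeans_1d(values, k, max_iter):
    """Run at most max_iter k-means iterations over one window of values."""
    centers = initial_centers(values, k)
    for _ in range(max_iter):
        sums = [0.0] * len(centers)
        counts = [0] * len(centers)
        for v in values:
            i = assign(v, centers)
            sums[i] += v
            counts[i] += 1
        # A center that attracted no points keeps its previous position.
        updated = [
            sums[i] / counts[i] if counts[i] else centers[i]
            for i in range(len(centers))
        ]
        if updated == centers:  # converged, stop before max_iter
            break
        centers = updated
    return centers


window = [1.0, 2.0, 10.0, 11.0]
centers = kmeans_1d(window, k=2, max_iter=50)
```

For the window shown, the two seeds are 1.0 and 2.0, and the centers settle after a few iterations. A window with a single distinct value yields a single cluster, in line with the first note above.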

Training the Markov Model

Determine the transition probabilities by maintaining the count of transitions between all states in the last W time units. For determining a transition at time t, use the cluster centers that are valid at time t, i.e., no remapping of past observations to clusters in retrospect is required. Note also that the current state, which was reached prior to t, does not need to be re-evaluated at t. Please also note that no two tuples for the same dimension have the same timestamp.
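
A minimal sketch of this counting step is shown below. It assumes that the states (cluster indices) observed for one dimension within the last W time units are already available as a list in time order; maintaining and expiring the window is left out. The function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(states):
    """Estimate P(dst | src) from the consecutive state pairs in one window."""
    pair_counts = Counter(zip(states, states[1:]))
    outgoing = Counter()
    for (src, _dst), n in pair_counts.items():
        outgoing[src] += n
    return {
        (src, dst): n / outgoing[src]
        for (src, dst), n in pair_counts.items()
    }


# States 0 and 1 stand for two clusters; the window holds four transitions.
probs = transition_probabilities([0, 0, 1, 0, 1])
```

Here state 0 is left three times (once to itself, twice to state 1) and state 1 is left once, so the estimates are 1/3, 2/3 and 1.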

Finding Anomalies

Output an alert about a machine if any sequence of up to N state transitions observed for that machine has a probability below T.

Please note that time is always defined as application time, i.e., as given by the timestamp of arriving tuples. Please also note that each new event is (1) first used to update the cluster centers, (2) then to update the Markov model, and (3) to compute the probability of the last up to N state transitions.
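
The anomaly check can be sketched as the product of the probabilities of the most recent transitions compared against the threshold. This sketch makes two assumptions that the text does not settle: it only looks at the last up to N transitions ending at the newest event, and it treats a transition that is missing from the model as having probability 0. The names and the example probabilities are illustrative.

```python
def sequence_probability(states, probs, n):
    """Product of the probabilities of the last up to n transitions."""
    recent = states[-(n + 1):]  # n transitions need n + 1 states
    p = 1.0
    for src, dst in zip(recent, recent[1:]):
        p *= probs.get((src, dst), 0.0)
    return p


def is_anomaly(states, probs, n, threshold):
    """True if at least one transition exists and the sequence is too unlikely."""
    return len(states) > 1 and sequence_probability(states, probs, n) < threshold


# Illustrative transition probabilities for two states.
example_probs = {(0, 0): 1 / 3, (0, 1): 2 / 3, (1, 0): 1.0}
history = [0, 0, 1, 0, 1]
flag_n2 = is_anomaly(history, example_probs, n=2, threshold=0.5)
flag_n3 = is_anomaly(history, example_probs, n=3, threshold=0.5)
```

With N = 2 the last two transitions have a combined probability of 2/3, which is above the threshold of 0.5. With N = 3 the product drops to 4/9 and the sequence is flagged.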


All submitted solutions should be able to accommodate the following parameters (they will be passed to the systems by the HOBBIT platform depending on the particular evaluation task):

  • W: window size for finding cluster centers with k-means clustering and for training transition probabilities in the Markov model.
  • N: number of transitions to be used for combined state transition probability.
  • M: maximum number of iterations for the clustering algorithm.
  • Td: the maximum probability for a sequence of N transitions to be considered an anomaly. The value of Td is specified for each dimension d for which the clustering is performed.

The full list of benchmark parameters is given below.

Submission & Evaluation

Participants should submit their solutions to the HOBBIT platform.

In the following section we provide a description of the platform, and detail the registration, submission and evaluation procedure.

Platform Overview

The platform comprises several components. Each component is implemented as an independent container. The communication between these components is done via a message bus. Docker is used as the containerization framework and RabbitMQ as the message bus. Participants must provide all solutions (benchmarked systems) as Docker containers. Each solution must read data from one message queue and output anomalies into another message queue provided by the evaluation platform.

The StreaML sample system should help participants to dockerize their systems and debug them locally before submission to the online platform.

The evaluation cluster has four nodes allocated for solutions. Each node has two 64-bit Intel Xeon E5-2630v3 processors (8 cores, 2.4 GHz, Hyper-Threading, 20 MB cache each), 256 GB RAM and 1 Gb Ethernet.

A detailed description of the platform is available here.

Submission procedure

Submitting your system to the challenge includes two steps:

  1. Upload your system to the HOBBIT platform
  2. Register your system for the challenge

After submitting your system to the HOBBIT platform, you can use the StreaML-Benchmark to test the correctness of your implementation. Once the training phase has started, you can register your system for the challenge and it will be continuously evaluated as described below.

In the following we describe what you need to do for these two steps.

Upload to the HOBBIT platform

Detailed information on how to submit your system to the platform is documented here:

After you have uploaded your system, you can run experiments manually in order to play with the intensity/performance parameters*. Note that the benchmark parameters will change during the challenge; dynamic mode with RDF format will be used. You can use the published benchmark for testing your system. A detailed explanation of how to run benchmark experiments with your system is available here.

* Parameters of the benchmark:

  • benchmark mode: static/dynamic:InitialMachinesCount:datapointsCountBeforeNewMachineJoin
  • machines count – number of machines (impacts stream intensity rate)
  • amount of messages per machine (impacts stream intensity rate)
  • interval between measurements (impacts stream intensity rate)
  • window size (impacts algorithm performance/memory consumption)
  • transitions count (impacts algorithm performance)
  • max clustering iterations (impacts algorithm performance)
  • timeout (for dropping benchmark and system, default is 1 hour)
  • output format (0 - RDF, 1 - CSV)
  • probability threshold

Registration for Open Challenge

When the training phase has started, the “StreaML open challenge” will be available under the “Challenges” tab in the platform GUI, and participants will be able to register their systems for participation. The registration procedure is described in detail here. Participants need to register their systems for all tasks defined in the StreaML Open Challenge at that time.

Evaluation Procedure and Criteria

The registered systems will be evaluated periodically (weekly) against the set of tasks defined for the “StreaML Open Challenge” at that time. In the HOBBIT platform, a task is a benchmark experiment that is executed as part of a challenge.

Evaluation results will be available online on the leaderboards of the “StreaML Open Challenge” tasks (one leaderboard per task), where systems will be sorted by mean latency*.

Once two or more systems have successfully passed the existing tasks (i.e., they appear in the leaderboard of the latest task), a new task with increased performance-sensitive parameter values** will be appended to the challenge. Participating systems will be automatically registered for newly added tasks.

The winner of the latest leaderboard at the cut-off date (the best-performing, stable system with the lowest latency) wins the round of the challenge and gets the prize.

* The latency is calculated as the difference between (1) the system clock time when the output tuple (anomaly tuple) was put by the solution into the output queue and (2) the system clock time when the last contributing input tuple (datapoint) was consumed by the solution from the input queue.

** Example of performance-sensitive parameters: number of machines, delay between tuples, window size.
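
The latency definition in the first note above amounts to a simple subtraction per anomaly, averaged over all anomalies of a run. The sketch below assumes each anomaly is recorded as a pair of clock readings in milliseconds (consumed, emitted); the record layout and the function name are illustrative and do not describe how the platform stores its measurements.

```python
def mean_latency(records):
    """Mean of (emitted - consumed) over all anomaly records.

    Each record is a (consumed_at, emitted_at) pair: the clock time at which
    the last contributing input tuple was consumed and the clock time at
    which the anomaly tuple reached the output queue.
    """
    latencies = [emitted - consumed for consumed, emitted in records]
    return sum(latencies) / len(latencies) if latencies else 0.0


# Two anomalies with latencies of 30 ms and 50 ms.
run = [(100, 130), (200, 250)]
average = mean_latency(run)
```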


Feel free to ask any questions about the challenge under the Issues tab.