The HOBBIT project aims to provide a platform on which technology users, solution providers, and the scientific community can assess the fitness of existing solutions for their purposes based on industrial data. The challenge is to provide a tool that reproducibly generates an arbitrary amount of such data for different benchmarks. The solution adopted within HOBBIT is to mimic industrial data.
To mimic production machine event data, USU has been looking at data that represents events of machines within a production line. The production line is configurable and can perform several individual production jobs. Each job begins with a start event and finishes with an end event, with several additional events in between. Jobs can be of arbitrary length, depending on the concrete job configuration. The events between the start and end events can be expected ones, such as maintenance events or job-specific machine adjustments, but they can also represent faults. USU mimics this event data within HOBBIT to reproducibly generate arbitrary amounts of event data for an arbitrary number of production lines. The event data mimicking algorithm was implemented to anonymize the assets of the machine manufacturer and of the producers using the machines.
The general process for determining the mimicking algorithm used by USU is:
- Analysing the dependencies and patterns within the captured data (finding correlations)
- Validating the findings with experts (finding causalities)
- Generating models for stochastic and deterministic simulation
The main idea behind the simulation model for event data is to generate an event stream consisting of correlated and unrelated events. The correlations between events are found by analysing the time lag between occurrences of different events. From this analysis, distribution functions for the lag times can be estimated. Assume the analysis has revealed that event A precedes event B with a certain time lag t_{A,B}, where t_{A,B} is a random variable with the probability distribution function f(t). This distribution function can then be used to reproduce the event sequence between the two events. In implementing the mimicking algorithm, USU followed a generic approach. The algorithm defines event sequences as a graph in which events are represented as nodes and distribution functions as edges, from which the lag times between the events are calculated. Mimicked events are randomly generated according to these sequence graphs and the estimated distributions.
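The graph-based generation described above can be illustrated with a minimal Python sketch. It is not USU's implementation: the event names, branch probabilities, and lag distributions in `GRAPH` are hypothetical placeholders, and weighting the outgoing edges with probabilities is an assumption made here to choose between several possible follow-up events. A seeded random number generator is used so that the same seed reproduces the same event sequence.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A lag sampler draws one time lag (in seconds) from an estimated distribution.
LagSampler = Callable[[random.Random], float]


@dataclass
class Edge:
    target: str         # event type that follows the source event
    probability: float  # assumed weight for choosing this edge
    lag: LagSampler     # distribution of the time lag between the two events


# Hypothetical sequence graph: nodes are event types, edges carry lag distributions.
GRAPH: Dict[str, List[Edge]] = {
    "JOB_START": [
        Edge("ADJUSTMENT", 0.7, lambda rng: rng.expovariate(1 / 30.0)),
        Edge("FAULT", 0.3, lambda rng: rng.gauss(120.0, 10.0)),
    ],
    "ADJUSTMENT": [Edge("JOB_END", 1.0, lambda rng: rng.uniform(60.0, 90.0))],
    "FAULT": [Edge("MAINTENANCE", 1.0, lambda rng: rng.expovariate(1 / 300.0))],
    "MAINTENANCE": [Edge("JOB_END", 1.0, lambda rng: rng.uniform(30.0, 60.0))],
    "JOB_END": [],  # terminal node: no outgoing edges
}


def generate_job(
    graph: Dict[str, List[Edge]], start: str, seed: int, t0: float = 0.0
) -> List[Tuple[float, str]]:
    """Walk the graph from `start` and return (timestamp, event) pairs."""
    rng = random.Random(seed)  # seeded for reproducible output
    events = [(t0, start)]
    current, now = start, t0
    while graph[current]:
        edges = graph[current]
        # Pick the follow-up event according to the edge weights.
        edge = rng.choices(edges, weights=[e.probability for e in edges])[0]
        # Sample the lag; clamp at zero so timestamps never move backwards.
        now += max(0.0, edge.lag(rng))
        events.append((now, edge.target))
        current = edge.target
    return events


if __name__ == "__main__":
    for timestamp, event in generate_job(GRAPH, "JOB_START", seed=42):
        print(f"{timestamp:10.2f}s  {event}")
```

Calling `generate_job` with a different seed per job or per production line would yield distinct but repeatable streams. Unrelated events, which the text also mentions, are left out of this sketch.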
This general approach can be used to represent arbitrary event networks. It can be applied to other domains focused on event data processing. The main effort lies in finding the correlations and identifying the causalities between related events.