Big Data Analytics: Approaches and Benchmark

“Data is the new gold” is old news. That does not mean we have finished figuring out how best to mine it, but we have become quite good at it by now. Many companies already leverage Big Data Analytics in their value chain.

One of them is us, AGT International, a pioneer in IoT and Social data management, Big Data integration, and advanced analytics. We have deployed our analytics on Social and IoT data and integrated it with traditional Big Data to help solve wide-ranging challenges for businesses and organizations around the world: reducing traffic jams, predicting floods, producing new media content, better managing energy production and consumption, providing new kinds of experiences to audiences at concerts and sporting events, advising farmers on how to grow crops in the desert, and helping make cities smarter.

So, how do we – and others – mine our gold? To answer that, we give a short overview of selected concepts and technologies around Big Data Analytics. Big Data architecture patterns, such as Lambda or Kappa, provide us with blueprints for designing a system. They show us how to handle the trade-off between accuracy and timeliness on a conceptual level. Big Data frameworks, such as Hadoop, Spark, Samza, or Flink, provide us with the technology to realize such a system. They enable us to actually implement it in a Big Data way. Finally, data visualization tools, such as Kibana or Tableau, provide us with a way to graphically represent analytics results. They enable us to consume the insights of Big Data Analytics.
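To make the accuracy-versus-timeliness trade-off of the Lambda pattern a bit more concrete, here is a minimal, purely conceptual Python sketch (not taken from any of the frameworks above; the class and function names are illustrative only). A slow but exact batch layer periodically recomputes results over the full history, a fast but approximate speed layer covers the events the batch run has not seen yet, and a serving layer merges the two views.

```python
from collections import Counter

# Conceptual sketch of the Lambda pattern: a slow-but-exact batch layer,
# a fast-but-approximate speed layer, and a serving layer merging both.
# All names here (BatchLayer, SpeedLayer, serve) are illustrative only.

class BatchLayer:
    """Periodically recomputes exact counts over the full event history."""
    def __init__(self):
        self.view = Counter()

    def recompute(self, all_events):
        self.view = Counter(all_events)   # accurate, but expensive and delayed


class SpeedLayer:
    """Maintains an approximate view for events the batch layer has not seen yet."""
    def __init__(self):
        self.view = Counter()

    def update(self, event):
        self.view[event] += 1             # timely, but may drift

    def reset(self):
        self.view.clear()                 # discarded after each batch run


def serve(batch_view, speed_view, key):
    """Serving layer: merge the exact batch view with the fresh speed view."""
    return batch_view[key] + speed_view[key]


if __name__ == "__main__":
    history = ["sensor_a", "sensor_b", "sensor_a"]
    batch, speed = BatchLayer(), SpeedLayer()

    batch.recompute(history)              # nightly batch job (exact)
    speed.update("sensor_a")              # new event arriving right now

    print(serve(batch.view, speed.view, "sensor_a"))  # -> 3
```

A Kappa-style design would drop the batch layer entirely and reprocess the stream itself when exact results are needed.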

It should go without saying that the world is not that simple. Each of the presented concepts and technologies has its advantages and disadvantages. And more often than not, there is no out-of-the-box solution for a given problem, so you have to resort to either tailoring existing solutions or even building something from scratch. One such situation arises when it comes to evaluating the behavior of Big Data systems.

This is where tools like HOBBIT come in handy. HOBBIT, within its scope of Big Linked Data, allows us to test and compare individual building blocks or fully-fledged systems. It spans a whole range of benchmark types, such as latency, throughput, and accuracy, and provides comparable results for them. This enables us to make informed technology choices, select among the best tools, and even rank existing systems.
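To illustrate what such benchmark results look like, the following sketch computes the three metric types mentioned above from hypothetical timestamped answers. This is not the HOBBIT API; it is a generic, assumed record format of (send time, receive time, correctness) tuples.

```python
import statistics

# Generic illustration of latency, throughput, and accuracy metrics,
# computed from hypothetical (send_time_s, receive_time_s, correct) records.
# This is NOT the HOBBIT platform API, just a sketch of the metric types.

def summarize(results):
    latencies = [recv - sent for sent, recv, _ in results]
    duration = max(recv for _, recv, _ in results) - min(sent for sent, _, _ in results)
    return {
        "mean_latency_ms": statistics.mean(latencies) * 1000,
        "p99_latency_ms": sorted(latencies)[int(0.99 * (len(latencies) - 1))] * 1000,
        "throughput_per_s": len(results) / duration if duration > 0 else float("inf"),
        "accuracy": sum(1 for *_, ok in results if ok) / len(results),
    }


if __name__ == "__main__":
    sample = [(0.00, 0.02, True), (0.01, 0.05, True), (0.02, 0.03, False)]
    print(summarize(sample))
```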

Since the beginning of the HOBBIT project, its benchmarking platform has matured to the point of becoming the main evaluation platform of the ACM DEBS Grand Challenge 2017. The challenge attracts participants from industry and academia interested in distributed event processing systems. This year, the ‘Anomaly Detection Benchmark’ from the HOBBIT benchmark set was chosen to challenge the contestants. In particular, they need to find anomalies in a high-velocity sensor stream. All measurements, initialization, and reporting are handled by the HOBBIT platform, so we are looking forward to the benchmarking results!
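For readers unfamiliar with the task, here is a deliberately simple sketch of anomaly detection on a sensor stream: flag readings that deviate from a rolling window by more than a few standard deviations. The actual DEBS Grand Challenge 2017 benchmark defines the task and the anomalies in its own way; the window size and threshold below are arbitrary, illustrative choices.

```python
from collections import deque
import statistics

# Simple rolling z-score anomaly detector for a sensor stream.
# Illustrative only: the DEBS GC 2017 task defines anomalies differently,
# and window_size / k are arbitrary choices.

def detect_anomalies(stream, window_size=50, k=3.0):
    window = deque(maxlen=window_size)
    for i, value in enumerate(stream):
        if len(window) == window_size:
            mean = statistics.mean(window)
            stdev = statistics.pstdev(window)
            if stdev > 0 and abs(value - mean) > k * stdev:
                yield i, value                 # report index and reading
        window.append(value)


if __name__ == "__main__":
    import random
    random.seed(42)
    readings = [random.gauss(20.0, 0.5) for _ in range(500)]
    readings[300] = 35.0                       # inject an obvious spike
    for index, reading in detect_anomalies(readings):
        print(f"anomaly at {index}: {reading:.2f}")
```

The hard part in the challenge is, of course, doing this at high velocity and in a distributed fashion, which is exactly what the platform measures.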

Authors: Alexander Wiesmaier, Roman Katerinenko
