TRACK: Big Data
Drinking from a Firehose with Apache Spark Streaming and Flink
Scalable near-realtime processing of large amounts of data is a feat that often requires a very specific toolset and mindset to be done right. Apache Spark Streaming and Apache Flink aim to help on this journey, providing similar APIs but conceptually different processing models under the hood. Not too long ago at Playtech, we started building a fault-tolerant near-realtime business event streaming platform for our central business information management solution, the IMS. For us, this was a large step into the unknown, introducing a fresh mindset, different data processing concepts and a new stack of technologies, including Apache Spark and Apache Flink. With what has hopefully been the steepest part of the learning curve behind us, now is a good time to share the good, the bad and the ugly details of this experience with all of you.

In this talk we take a short look at the two streaming frameworks through the eyes of a software developer relatively new to the Big Data and Event Sourcing ecosystem. We will cover some likely pitfalls you might hit when transitioning from traditional request-response communication to an event-driven architecture, and look at tools and tricks that can help you get up to speed when developing Spark Streaming and Flink applications. Is one better than the other, or might you need both to succeed? Let's find out!