Apache Flink

Apache Flink is a stream processing engine for stateful computations over unbounded and bounded data streams. It originated as the Stratosphere research project at TU Berlin in 2010, entered the Apache Incubator in 2014, and graduated as a top-level project the same year. Spark Streaming, and Spark Structured Streaming in its default mode, group incoming records into small batches (micro-batches). Flink instead processes each event individually as it arrives, which allows end-to-end latencies in the millisecond range.
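One way to see what micro-batching costs is to model only the buffering wait, meaning the time an event sits idle before its batch is processed. The sketch below is a simplified model, not a description of how Spark or Flink is implemented. The function name `micro_batch_wait_ms` is hypothetical. The model assumes fixed batch intervals that are processed only after they close, and it ignores processing time. Flink also holds records briefly in network buffers before sending them downstream, so per-event processing is not literally zero-wait.

```python
def micro_batch_wait_ms(arrival_ms, batch_interval_ms):
    """Return how long each event waits before its micro-batch is processed.

    Simplified model (an assumption, not any specific engine's internals):
    batches cover fixed intervals [k * B, (k + 1) * B), and a batch is only
    processed once its interval has closed. Processing time is ignored.
    """
    waits = []
    for t in arrival_ms:
        # End of the interval this event falls into.
        batch_end = (t // batch_interval_ms + 1) * batch_interval_ms
        waits.append(batch_end - t)
    return waits


if __name__ == "__main__":
    arrivals = [0, 250, 999, 1000]
    waits = micro_batch_wait_ms(arrivals, batch_interval_ms=1000)
    print(waits)                    # [1000, 750, 1, 1000]
    print(sum(waits) / len(waits))  # 687.75
```

With a one-second interval, an event can wait up to a full second before processing even starts. For evenly spread arrivals the average wait is about half the interval. Under a per-event model this batching wait is zero, and only processing and network-buffering delay remain.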

As of this writing, the latest major release is Flink 2.0. The engine provides exactly-once state consistency through a checkpointing mechanism that periodically snapshots operator state without stopping the pipeline. End-to-end exactly-once results additionally require replayable sources and transactional or idempotent sinks. Flink's event-time processing uses watermarks to track progress in event time, so results stay correct when events arrive out of order or late. Keyed state can be kept on the JVM heap or, for large state, in an embedded RocksDB state backend, which lets a single application manage terabytes of state. Flink also supports batch processing, treating a batch as a bounded stream, and provides a SQL interface for both streaming and batch queries.
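The interaction between watermarks and keyed state described above can be sketched as a small, self-contained Python simulation. This is not Flink code. The function `event_time_window_counts`, the `(key, event_time)` input format, and the per-key tumbling count are illustrative assumptions. The watermark rule is loosely modeled on a bounded-out-of-orderness strategy (Flink's `WatermarkStrategy.forBoundedOutOfOrderness`). To keep things simple, the sketch advances the watermark after every event, whereas a real engine typically emits watermarks periodically. Late events are only collected here, not handled further.

```python
from collections import defaultdict


def event_time_window_counts(events, window_size, max_out_of_orderness):
    """Count events per key in tumbling event-time windows (illustrative only).

    events: (key, event_time) pairs in *arrival* order.
    The watermark trails the largest event time seen so far by
    max_out_of_orderness. A window [start, start + window_size) fires once
    the watermark reaches its end. Events whose window already fired are late.
    Returns (results, late); results holds (key, window_start, count).
    """
    counts = defaultdict(int)  # keyed state: (key, window_start) -> count
    watermark = float("-inf")  # no event seen yet
    results, late = [], []

    for key, ts in events:
        start = ts - ts % window_size          # tumbling window assignment
        if start + window_size <= watermark:   # window already fired
            late.append((key, ts))
            continue
        counts[(key, start)] += 1

        # Hypothetical watermark rule: trail the max event time; never go back.
        watermark = max(watermark, ts - max_out_of_orderness)

        # Fire (emit and clear) every open window whose end the watermark reached.
        for k, s in sorted(counts):            # sorted() copies, so pop is safe
            if s + window_size <= watermark:
                results.append((k, s, counts.pop((k, s))))

    return results, late


if __name__ == "__main__":
    # ("a", 7) and ("a", 4) arrive out of order.
    events = [("a", 1), ("b", 3), ("a", 12), ("a", 7), ("b", 25), ("a", 4)]
    results, late = event_time_window_counts(events, window_size=10,
                                             max_out_of_orderness=5)
    print(results)  # [('a', 0, 2), ('a', 10, 1), ('b', 0, 1)]
    print(late)     # [('a', 4)]
```

In this run, ("a", 7) arrives after ("a", 12) but is still counted, because the watermark (12 - 5 = 7) has not yet reached the end of window [0, 10). ("a", 4) arrives after the watermark has moved to 20, so it is reported as late. The window starting at 20 for key "b" stays open because no later event pushes the watermark to 30.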

The official documentation covers the DataStream API, Table/SQL API, and deployment. The source code is on GitHub under the Apache 2.0 license.

flink.apache.org

Related technologies

Apache Kafka
Apache Spark