Ref: https://learn.cantrill.io/courses/1820301/lectures/41301411
DISCLAIMER: Amazon Kinesis Analytics Name Change
- ‼️ ‼️ This product used to be called Amazon Kinesis Data Analytics, and SQL was used for the transformations! The product is no longer part of the Kinesis family of products, and Apache Flink is now its main engine for stream processing (though SQL can still be used)
- Ref: https://aws.amazon.com/blogs/aws/announcing-amazon-managed-service-for-apache-flink-renamed-from-amazon-kinesis-data-analytics/
- The lecture and some of my notes predate the rename and are hence outdated… But the product retains most of its functionality. I will attempt to update these notes at some point.
- Old summary: Amazon Kinesis Data Analytics is the easiest way to analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time. It is part of the Kinesis family of products and is capable of operating in real time on high-throughput streaming data. It transforms input data using SQL and (optionally) reference data stored in S3, and streams the output data in real time to the destinations. It is a good fit for scenarios like time-series analytics (e.g. elections, esports…), real-time dashboards (e.g. leaderboards in games) and real-time metrics for security & response teams.
Amazon Managed Service for Apache Flink - Overview
- 🔧 Real-time processing of data streams, using Apache Flink
- Conceptually, product sits between two streams: input stream & output stream

- Supported Sources
    - Kinesis Data Streams
    - Data Firehose
    - Amazon Managed Streaming for Apache Kafka (MSK)
    - …
    - Can optionally pull in static reference data from S3
- Supported Destinations
    - Kinesis Data Streams
    - Data Firehose
        - Firehose destinations hence indirectly supported: HTTP, Splunk, OpenSearch Service, S3, and Redshift
    - Amazon Managed Streaming for Apache Kafka (MSK)
    - Lambda
    - S3
    - Analytics tools
    - …
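The input stream → processing → output stream flow can be sketched as a minimal in-memory simulation (pure Python, no AWS dependencies; the event shapes and window size are invented for illustration). A tumbling-window count per key resembles the kind of aggregation a Flink application might run for a real-time leaderboard:

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds=10):
    """Toy stand-in for a Flink-style aggregation between an input
    stream and an output stream: bucket (timestamp, key) events into
    tumbling windows and count occurrences per key."""
    windows = defaultdict(Counter)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to window boundary
        windows[window_start][key] += 1
    # Emit one record per (window, key) — the simulated "output stream"
    return [
        {"window_start": w, "key": k, "count": c}
        for w in sorted(windows)
        for k, c in sorted(windows[w].items())
    ]

# Simulated "input stream": (epoch_seconds, player) tuples
events = [(1, "alice"), (3, "bob"), (4, "alice"), (12, "alice")]
print(tumbling_window_counts(events))
# → two windows: [0s, 10s) with alice=2, bob=1; [10s, 20s) with alice=1
```

In the real service the input would arrive continuously from Kinesis/MSK and results would be pushed to a destination as each window closes, rather than being returned in one batch.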
Amazon Managed Service for Apache Flink - Architecture

- ❗ The service doesn't modify input sources in any way (they're external); it only delivers processed data to the configured destinations
- Conceptually, in-application input streams are normal data tables that are updated in real time → they mirror the current data from the input stream
- Reference table (from S3) contains static data which can be linked to the input data, enriching it
    - 💡 e.g. a popular esport streams data to Kinesis. The S3 static data (player metadata, in-game stats/metadata…) is injected in real time whenever relevant, i.e. when those players or in-game elements appear on the stream → viewers can see the current kill count of the player featured on the stream
- Core: Application code (originally written in SQL; now typically an Apache Flink application, though SQL is still supported) processes input and produces output
- Generates in-application output streams → output data tables updated in RT
- Processing errors can be sent to in-application error stream
- ❗Billed only for processed data, but it's NOT cheap!
- Use only in scenarios that make sense
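The reference-table enrichment and error-stream routing described above can be sketched as a minimal in-memory simulation (pure Python; all record shapes and field names here are invented for illustration, not the service's real API):

```python
def process(records, reference):
    """Enrich stream records with static reference data; route
    malformed records to a separate list, mirroring the
    in-application error stream."""
    output, errors = [], []
    for rec in records:
        try:
            player = rec["player_id"]          # raises KeyError if missing
            enriched = {**rec, **reference[player]}  # "join" on player_id
            output.append(enriched)
        except KeyError as exc:
            errors.append({"record": rec, "error": f"missing key: {exc}"})
    return output, errors

# Static reference data (would come from S3); stream records (would come
# from Kinesis/MSK) — one well-formed, one malformed
reference = {"p1": {"name": "Alice", "kills": 42}}
records = [{"player_id": "p1", "event": "headshot"},
           {"event": "respawn"}]  # malformed: no player_id
out, err = process(records, reference)
```

Here `out` carries the enriched records to the destination, while `err` plays the role of the in-application error stream that operators can monitor separately.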