Ref: https://learn.cantrill.io/courses/1820301/lectures/41301409
Amazon Kinesis Data Streams (KDS) - Basic Concepts
- 🔧 Real-time data streaming service
- Designed to ingest large volumes of data from many devices or apps
- Serverless, public, and regionally resilient service by design
- 💡 But not completely AWS-managed! In provisioned capacity mode, the customer must manage shard scaling (on-demand mode scales shards automatically)! (see architecture)
- (Kinesis) Data Stream = basic entity (unit of configuration)
- Producers send data into a stream, consumers can read data from streams
- Can scale from low levels of data throughput to near infinite amounts of data
- ❗ Data is stored in a rolling window (24h by default)
- Older/expired data is discarded
- Storage for the window is included in the product (no matter how much data is in the window)
- Window can be increased up to 365 days (at additional cost)
- Amazon Data Firehose can move stream data en masse to another service e.g. S3 → Allows persisting data beyond the stream window
- Multiple producers can send data to a stream
- Multiple consumers can read/access stream data, with whatever granularity they choose
- Great fit for real-time (RT) analytics & dashboards
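
The rolling-window behaviour described above can be modelled with a small in-memory sketch. This is a toy illustration, not the Kinesis API: the class `RollingWindowStream`, its methods, and the sample payloads are hypothetical names made up for this example, and only the 24h default comes from the notes.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

HOUR = 3600  # seconds


@dataclass
class RollingWindowStream:
    """Toy model of a stream's retention window (hypothetical, not the AWS API)."""

    retention_seconds: float = 24 * HOUR  # 24h default, as in the notes
    _records: deque = field(default_factory=deque)  # (timestamp, payload), oldest first

    def put(self, payload: str, now: float) -> None:
        """A producer appends a record stamped with its arrival time."""
        self._records.append((now, payload))

    def expire(self, now: float) -> None:
        """Discard records that have fallen out of the window."""
        cutoff = now - self.retention_seconds
        while self._records and self._records[0][0] < cutoff:
            self._records.popleft()

    def read_since(self, since: float, now: float) -> list[str]:
        """A consumer reads every retained record stamped at or after `since`."""
        self.expire(now)
        return [payload for ts, payload in self._records if ts >= since]


stream = RollingWindowStream()
stream.put("sensor-a:21.5", now=0)
stream.put("sensor-b:19.0", now=20 * HOUR)

# At hour 23 both records are still inside the 24h window.
recent = stream.read_since(since=0, now=23 * HOUR)

# At hour 30 the first record is 30h old and has been discarded;
# the second one is only 10h old and is still readable.
later = stream.read_since(since=0, now=30 * HOUR)
```

Reading does not remove anything here: records only disappear when they age out of the window, which is why something like Firehose is needed to keep data for longer.
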
Kinesis Data Streams - Architecture
Diagram: https://github.com/acantril/aws-sa-associate-saac03/blob/main/1600-SERVERLESS_and_APPLICATION_SERVICES/00_LEARNINGAIDS/Kinesis.png
- Streams ingest data from Producers. Consumers read data from streams.
- 🔧 Shard architecture → scaling → shards added to ingest more data
- Shard capacity:
  - 1MB/s of ingestion capacity
  - 2MB/s of consumption capacity
- More shards → more performance & more cost
- Data stored in Kinesis Data Records (max 1MB) across shards
- Performance scales linearly
- Get billed for:
  - Number of shards (more shards cost more)
  - Size of data window (bigger windows cost more)
- 💡 Kinesis is generally an expensive service! Use it only if you need a RT stream of data!
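
As a rough sizing aid, the per-shard figures above (1MB/s in, 2MB/s out) can be turned into a shard-count estimate. This is a back-of-the-envelope sketch: the function name `shards_needed` and the example workload are made up for illustration, and it assumes all consumers share the per-shard read capacity.

```python
import math

INGEST_MB_PER_SHARD = 1.0   # MB/s written per shard (from the notes)
CONSUME_MB_PER_SHARD = 2.0  # MB/s read per shard (from the notes)


def shards_needed(ingest_mb_per_s: float, consume_mb_per_s: float) -> int:
    """Smallest shard count that covers both ingestion and consumption throughput."""
    for_ingest = math.ceil(ingest_mb_per_s / INGEST_MB_PER_SHARD)
    for_consume = math.ceil(consume_mb_per_s / CONSUME_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, for_ingest, for_consume)


# Hypothetical workload: 4.5 MB/s written, and three consumers that each
# read the full stream (3 * 4.5 = 13.5 MB/s read in total).
example = shards_needed(ingest_mb_per_s=4.5, consume_mb_per_s=13.5)
```

In this example consumption, not ingestion, is the limiting side: ingestion alone would need 5 shards, while the combined reads push the estimate to 7. Since shards are a billing dimension, the same number also drives cost.
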
SQS vs Kinesis
- ‼️ Don't confuse SQS with Kinesis, they're very different!
- If the scenario involves large-scale data ingestion → most likely Kinesis
- For any other scenario → assume SQS by default (much cheaper service)
- Only change your mind if you have strong reasons to do so
- SQS:
  - Generally, 1 production group (WEB tier) & 1 consumption group (WORKER tier)
  - ❗ Not designed for 100s/1000s of sensors sending data into a queue!
  - Generally used for decoupling & asynchronous communication
  - Temporary messages → removed once processed, no persistence (no replayable data window like Kinesis)
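
The difference in consumption semantics can be sketched with two toy classes. Neither is a real AWS client: `ToyQueue` and `ToyStream` are hypothetical names, and the queue collapses SQS's separate receive and delete steps into one call for brevity.

```python
from collections import deque


class ToyQueue:
    """SQS-like: a message is handed to one worker and then removed."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive_and_delete(self):
        """Return the oldest message and remove it, or None if the queue is empty."""
        return self._messages.popleft() if self._messages else None


class ToyStream:
    """Kinesis-like: records stay in place; each consumer tracks its own position."""

    def __init__(self) -> None:
        self._records = []
        self._positions = {}  # consumer name -> index of the next unread record

    def put(self, data: str) -> None:
        self._records.append(data)

    def read(self, consumer: str):
        """Return all records this consumer has not seen yet."""
        start = self._positions.get(consumer, 0)
        self._positions[consumer] = len(self._records)
        return self._records[start:]


queue, stream = ToyQueue(), ToyStream()
for item in ("job-1", "job-2"):
    queue.send(item)
    stream.put(item)

# Queue: each message goes to exactly one worker, then it is gone.
worker_a = queue.receive_and_delete()
worker_b = queue.receive_and_delete()
worker_c = queue.receive_and_delete()  # nothing left for a third worker

# Stream: every consumer sees every record, independently of the others.
dashboard = stream.read("dashboard")
archiver = stream.read("archiver")
dashboard_again = stream.read("dashboard")  # no new records since the last read
```

The queue suits one producer group feeding one worker group; the stream suits many readers working over the same data within the window.
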