Kinesis Data Streams

Kinesis Data Streams - Basic Concepts

🔧 Scalable streaming service
- Designed to ingest lots and lots of data, from lots of devices or apps
Kinesis Data Stream = basic entity (unit of configuration)
- Producers send data into a stream, consumers can read data from streams
- Can scale from low data throughput to near-infinite amounts of data
- 🔧 Data is stored in a moving window (24h by default)
  - Older/expired data is discarded
  - Storage for the default 24h window is included in the price, no matter how much data it holds
  - Window can be increased up to 365 days (at additional cost)
Public service, regionally resilient by design
Multiple producers can send data to a stream
Multiple consumers can read/access stream data, with whatever granularity they choose
- Great fit for analytics & dashboards
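A toy sketch of several producers and consumers sharing one stream, assuming the stream is modelled as an append-only in-memory list. Each hypothetical `ToyConsumer` tracks its own read position (loosely standing in for a shard iterator), so one consumer can read record by record while another reads in large batches, and reading never removes data. None of these names are AWS APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Hypothetical in-memory stand-in for a stream: an append-only log."""
    records: list[tuple[str, bytes]] = field(default_factory=list)

    def put(self, producer: str, data: bytes) -> int:
        """Any number of producers may append; returns the record's position."""
        self.records.append((producer, data))
        return len(self.records) - 1


@dataclass
class ToyConsumer:
    """Each consumer keeps its own read position; reading never removes data."""
    stream: ToyStream
    position: int = 0

    def read(self, max_records: int) -> list[bytes]:
        batch = self.stream.records[self.position:self.position + max_records]
        self.position += len(batch)
        return [data for _, data in batch]


stream = ToyStream()
stream.put("sensor-1", b"t=20")
stream.put("sensor-2", b"t=21")
stream.put("sensor-1", b"t=22")

dashboard = ToyConsumer(stream)  # reads one record at a time
analytics = ToyConsumer(stream)  # reads in larger batches
print(dashboard.read(1))   # [b't=20']
print(analytics.read(10))  # [b't=20', b't=21', b't=22']
print(dashboard.read(10))  # [b't=21', b't=22']
```

Both consumers end up seeing every record, each at its own pace, which is what makes a stream a good fit for dashboards and analytics running side by side.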

Streams ingest data from producers; consumers read data from streams.
🔧 Shard architecture → a stream scales by adding shards to ingest more data
- Shard capacity:
  - 1MBps of ingestion capacity
  - 2MBps of consumption capacity
- More shards → more performance & more cost
  - Performance scales linearly with the number of shards
- Data is stored in Kinesis Data Records (max 1MB each) spread across shards
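A back-of-the-envelope sketch of sizing a stream from the per-shard limits above: the shard count must satisfy the write limit, the read limit and, in addition, the 1,000 records/s per-shard write limit from the AWS documentation, all at the same time. The function names are hypothetical. It assumes load is spread evenly across shards, that `read_mbps` is the combined rate of all consumers sharing the read throughput, and that "1MB" means 1 MiB for the record-size check.

```python
import math

# Per-shard limits for a provisioned stream. The MB/s figures come from the
# notes; the 1,000 records/s write limit comes from the AWS documentation.
WRITE_MB_PER_SHARD = 1.0
READ_MB_PER_SHARD = 2.0
WRITE_RECORDS_PER_SHARD = 1000
MAX_RECORD_BYTES = 1024 * 1024  # assumption: "1MB" read as 1 MiB


def shards_needed(write_mbps: float, read_mbps: float, write_records_per_s: float) -> int:
    """Smallest shard count meeting every per-shard limit at once.

    Assumes load is spread evenly across shards and that read_mbps is the
    combined rate of all consumers sharing the stream's read throughput.
    """
    return max(
        1,
        math.ceil(write_mbps / WRITE_MB_PER_SHARD),
        math.ceil(read_mbps / READ_MB_PER_SHARD),
        math.ceil(write_records_per_s / WRITE_RECORDS_PER_SHARD),
    )


def fits_in_record(payload: bytes) -> bool:
    """True if the payload is within the single-record size limit."""
    return len(payload) <= MAX_RECORD_BYTES


# 5 MB/s in, 12 MB/s out, 3,000 records/s: reads are the bottleneck (12 / 2 = 6).
print(shards_needed(5, 12, 3000))  # 6
```

Each extra shard adds the same fixed slice of write and read capacity, which is the linear scaling (and linear cost) described above.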
Billing:
- Number of shards (more shards cost more)
- Size of data window (bigger windows cost more)
Amazon Data Firehose (formerly Kinesis Data Firehose) can move stream data en masse to another service, e.g. S3
- Allows persisting data beyond the stream window
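Firehose collects incoming data and delivers it in batches once a size or time threshold (its buffering hints) is reached. A toy sketch of that buffer-and-flush idea, assuming an in-memory dict stands in for the S3 bucket. `ToyDeliveryBuffer`, its parameters and the `batch-NNNNN` key format are all hypothetical and are not the Firehose API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyDeliveryBuffer:
    """Hypothetical buffer-and-flush loop in the spirit of Firehose delivery.

    Records accumulate until either max_bytes or max_seconds is reached,
    then they are written to `sink` as a single object.
    """
    max_bytes: int
    max_seconds: float
    sink: dict[str, bytes] = field(default_factory=dict)  # stands in for an S3 bucket
    _buf: list[bytes] = field(default_factory=list)
    _size: int = 0
    _opened_at: float | None = None

    def add(self, data: bytes, now: float) -> None:
        if self._opened_at is None:
            self._opened_at = now  # first record of a new batch
        self._buf.append(data)
        self._size += len(data)
        # Simplification: the time threshold is only checked when a record arrives.
        if self._size >= self.max_bytes or now - self._opened_at >= self.max_seconds:
            self.flush()

    def flush(self) -> None:
        if not self._buf:
            return
        key = f"batch-{len(self.sink):05d}"
        self.sink[key] = b"\n".join(self._buf)
        self._buf, self._size, self._opened_at = [], 0, None


buffer = ToyDeliveryBuffer(max_bytes=10, max_seconds=60)
buffer.add(b"abcd", now=0)
buffer.add(b"efgh", now=1)
buffer.add(b"ij", now=2)  # 10 bytes buffered: size threshold flushes a batch
buffer.add(b"k", now=3)
buffer.add(b"l", now=70)  # 67s since the batch opened: time threshold flushes
print(buffer.sink)  # {'batch-00000': b'abcd\nefgh\nij', 'batch-00001': b'k\nl'}
```

A real delivery service would also flush on a timer even when no new data arrives; the sketch omits that to stay short.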
[Diagram: Kinesis Data Streams architecture]

‼️ Kinesis vs SQS: don't confuse the two services, they're very different!
- If a scenario involves ingesting lots of data (e.g. from many devices or apps) → most likely Kinesis
- Any other scenario → assume SQS by default
  - Only change your mind if you have strong reasons to do so
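A simplified sketch of the core difference behind this rule of thumb, assuming in-memory stand-ins for both services: with a queue, each message goes to one worker and is then deleted, while a stream keeps records for its window so every consumer reads all of them. The function names and the round-robin hand-out are hypothetical simplifications; real SQS does not promise which worker receives which message.

```python
from collections import deque


def queue_delivery(messages: list[str], workers: int) -> list[list[str]]:
    """SQS-style (simplified): each message is received by exactly one worker
    and then deleted, so the workers split the load between them."""
    q = deque(messages)
    received: list[list[str]] = [[] for _ in range(workers)]
    turn = 0
    while q:
        received[turn % workers].append(q.popleft())  # receive + delete
        turn += 1
    return received


def stream_delivery(records: list[str], consumers: int) -> list[list[str]]:
    """Kinesis-style (simplified): records stay in the stream for the window,
    so every consumer reads the full sequence independently."""
    return [list(records) for _ in range(consumers)]


print(queue_delivery(["m1", "m2", "m3"], workers=2))  # [['m1', 'm3'], ['m2']]
print(stream_delivery(["r1", "r2"], consumers=2))     # [['r1', 'r2'], ['r1', 'r2']]
```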