Amazon Kinesis Data Streams [MLA-C01]

💡 NOTE: A couple of sections here were originally in Kane and Maarek's course but were later removed from it. I'm leaving them here for completeness' sake, but feel free to skip the parts that are no longer in the course. For the record, everything here is required knowledge for the DEA-C01 certification, but only part of it is required for the MLA-C01 certification.

Intro to Kinesis Data Streams from SAA-C03 Notes

Amazon Kinesis Data Streams 101

Basic Concepts and Features

Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45356719

Data records up to 1 MB each (typical use case is lots of “small” real-time data)
- Data ingestion:
  - 1 MB/s or 1,000 records/s per shard
- Data consumption (throughput modes):
  - Shared: 2 MB/s per shard, shared across all consumers
  - Enhanced (enhanced fan-out): 2 MB/s per shard per consumer
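The per-shard limits above can be turned into a rough shard-count estimate. This is only a sizing sketch under the limits quoted in these notes; the function name `shards_needed` and its parameters are made up for illustration and are not part of any AWS API.

```python
import math

# Per-shard limits as quoted in the notes above.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(in_mb_per_sec, in_records_per_sec,
                  out_mb_per_sec_per_consumer, consumers,
                  enhanced_fan_out=False):
    """Estimate the minimum shard count for a workload (hypothetical helper)."""
    # Writes are capped both by bytes and by record count per shard.
    by_write_bytes = math.ceil(in_mb_per_sec / WRITE_MB_PER_SEC)
    by_write_count = math.ceil(in_records_per_sec / WRITE_RECORDS_PER_SEC)

    if enhanced_fan_out:
        # Enhanced: each consumer gets its own 2 MB/s per shard.
        read_load = out_mb_per_sec_per_consumer
    else:
        # Shared: all consumers split the same 2 MB/s per shard.
        read_load = out_mb_per_sec_per_consumer * consumers
    by_read = math.ceil(read_load / READ_MB_PER_SEC)

    return max(1, by_write_bytes, by_write_count, by_read)


# 5 MB/s in, 3,000 records/s in, 3 consumers each reading 5 MB/s.
print(shards_needed(5, 3000, 5, 3))                         # shared mode
print(shards_needed(5, 3000, 5, 3, enhanced_fan_out=True))  # enhanced mode
```

In shared mode the read side dominates (three consumers compete for the same 2 MB/s per shard), while in enhanced mode the write side becomes the bottleneck.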
Data records are made of a partition key & a data blob (up to 1 MB)
- When delivered to a consumer, they also have a sequence number (indicates where they were in the stream)
- Records with the same partition key go to the same shard → key-based ordering
  - Ordering is guaranteed only for data with the same partition key
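To see why the same partition key always lands on the same shard: Kinesis hashes the partition key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The sketch below assumes the hash space is split evenly across shards (which is not guaranteed after resharding); `shard_for_key` is a hypothetical helper, not an AWS function.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive this key,
    assuming shards own equal, contiguous hash-key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the leftover top of the hash space maps to the last shard.
    return min(hash_key // range_size, shard_count - 1)


# The mapping is deterministic: a given device ID always hits one shard,
# so its records keep their relative order.
for key in ["device-1", "device-2", "device-1"]:
    print(key, "->", shard_for_key(key, shard_count=4))
```

This is also why a poorly chosen partition key (for example, one constant value) sends all traffic to a single "hot" shard.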
Data Retention
- Retention up to 365 days (default = 1 day)
- Data can’t be deleted from Kinesis (until it expires)
- Consumers can reprocess (replay) data
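Retention is changed in hours through the API, so the limits above become 24 to 8,760 hours. Below is a minimal sketch, assuming you pass in a boto3 Kinesis client and already know the current retention; `set_retention_days` is a hypothetical wrapper, while `increase_stream_retention_period` and `decrease_stream_retention_period` are the boto3 client calls.

```python
MIN_HOURS = 24         # default retention: 1 day
MAX_HOURS = 365 * 24   # maximum retention: 365 days


def set_retention_days(kinesis_client, stream_name, current_hours, days):
    """Move a stream's retention to `days`, picking the right API call."""
    target_hours = days * 24
    if not MIN_HOURS <= target_hours <= MAX_HOURS:
        raise ValueError("retention must be between 1 and 365 days")

    if target_hours > current_hours:
        kinesis_client.increase_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours)
    elif target_hours < current_hours:
        kinesis_client.decrease_stream_retention_period(
            StreamName=stream_name, RetentionPeriodHours=target_hours)
    return target_hours


# Usage (needs boto3 and AWS credentials; the stream name is a placeholder):
#   import boto3
#   set_retention_days(boto3.client("kinesis"), "my-stream", 24, 7)
```

Longer retention is what makes replay useful: a consumer can only reprocess records that are still inside the retention window.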
Security: At-rest KMS encryption, in-flight HTTPS encryption
SW libraries to create custom producers/consumers
- Kinesis Producer Library (KPL) to write an optimized producer application
- Kinesis Client Library (KCL) to write an optimized consumer application

Capacity Modes

Provisioned
- Choose the number of shards
- Scale manually to increase or decrease the number of shards
- Pay per provisioned shard per hour
  - 💡 Use when you can predict capacity beforehand
On-demand
- No need to provision or manage capacity
- Capacity scales automatically based on the observed throughput peak during the last 30 days
- Different pricing model: pay per stream per hour & per GB of data in/out
  - 💡 Use when your capacity is unknown
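As a rough illustration of how the two modes differ at creation time, the sketch below builds the arguments for the boto3 `create_stream` call. `create_stream_params` is a hypothetical helper and the stream names are placeholders; the `StreamModeDetails` / `StreamMode` fields are the ones the Kinesis API uses to select the mode.

```python
def create_stream_params(stream_name, shard_count=None):
    """Build kwargs for kinesis.create_stream.

    shard_count=None -> on-demand (no capacity to manage)
    shard_count=N    -> provisioned with N shards
    """
    if shard_count is None:
        return {
            "StreamName": stream_name,
            "StreamModeDetails": {"StreamMode": "ON_DEMAND"},
        }
    if shard_count < 1:
        raise ValueError("a provisioned stream needs at least one shard")
    return {
        "StreamName": stream_name,
        "ShardCount": shard_count,
        "StreamModeDetails": {"StreamMode": "PROVISIONED"},
    }


print(create_stream_params("clicks-on-demand"))
print(create_stream_params("clicks-provisioned", shard_count=4))

# Usage (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("kinesis").create_stream(**create_stream_params("clicks", 4))
```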

Kinesis Producers

Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45356725