💡 NOTE: A couple of sections here were originally in Kane and Maarek's course but were later removed from it. I'm leaving them here for completeness' sake, but feel free to skip the parts that are no longer in the course.
Intro on Kinesis Data Streams from Cantrill’s SAA-C03
Kinesis Data Streams
Basic Concepts
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45356719
- Kinesis data records are made of a partition key & data blob (up to 1MB)
- When delivered to consumer, they also have a seq number (indicates where they were in the stream)
- Records with same partition key go to same shard → key-based ordering
- Data ingestion:
  - 1 MB/s or 1,000 records/s per shard
- Data consumption (throughput modes):
  - Shared: 2 MB/s per shard, shared across all consumers
  - Enhanced fan-out: 2 MB/s per shard per consumer
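To make the "same partition key → same shard" point concrete, here is a minimal sketch of the routing. Kinesis hashes the partition key with MD5 into a 128-bit integer and sends the record to the shard whose hash key range contains that value. The helper names (`hash_key`, `shard_for`) are made up for illustration, and the sketch assumes the shards split the hash space into equal contiguous ranges, which may not hold after shards have been split or merged.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def hash_key(partition_key: str) -> int:
    """Return the MD5 hash of the partition key as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that would receive this key.

    Assumption: shards own equal-width, contiguous slices of the hash space.
    """
    range_size = HASH_SPACE // shard_count
    # Clamp so the few leftover values at the top land in the last shard.
    return min(hash_key(partition_key) // range_size, shard_count - 1)


if __name__ == "__main__":
    # The same key always maps to the same shard, which is what gives
    # per-key ordering.
    for key in ["device-1", "device-2", "device-1"]:
        print(key, "-> shard", shard_for(key, 4))
```

The practical takeaway: pick a partition key with many distinct values (e.g. a device ID), otherwise a few "hot" keys can push all traffic onto one shard and hit its 1 MB/s limit.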
Capacity Modes
- Provisioned
  - Choose the number of shards; scale manually or using the API
  - Pay per provisioned shard per hour
  - 💡 Use when you can predict capacity beforehand
- On-demand
  - No need to provision or manage capacity
  - Capacity scales automatically based on the peak throughput observed over the last 30 days
  - Different pricing model: pay per stream per hour, plus data in/out per GB
  - 💡 Use when your capacity is unknown
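For provisioned mode, a rough shard count can be worked out from the per-shard limits noted earlier (1 MB/s or 1,000 records/s in, 2 MB/s out). The function below is only a back-of-the-envelope sketch: `shards_needed` is a hypothetical helper, and it assumes every consumer reads the whole stream and that traffic is spread evenly across shards.

```python
import math

INGEST_MB_PER_SHARD = 1.0        # write limit per shard, MB/s
INGEST_RECORDS_PER_SHARD = 1000  # write limit per shard, records/s
EGRESS_MB_PER_SHARD = 2.0        # read limit per shard, MB/s


def shards_needed(ingest_mb_s: float, records_s: float,
                  consumers: int = 1, enhanced: bool = False) -> int:
    """Estimate the number of shards for a provisioned stream."""
    by_bytes = math.ceil(ingest_mb_s / INGEST_MB_PER_SHARD)
    by_records = math.ceil(records_s / INGEST_RECORDS_PER_SHARD)
    if enhanced:
        # Enhanced fan-out: each consumer gets its own 2 MB/s per shard.
        read_mb_s = ingest_mb_s
    else:
        # Shared: all consumers split the same 2 MB/s per shard.
        read_mb_s = ingest_mb_s * consumers
    by_reads = math.ceil(read_mb_s / EGRESS_MB_PER_SHARD)
    return max(1, by_bytes, by_records, by_reads)


if __name__ == "__main__":
    print(shards_needed(5, 2000))                              # write-bound
    print(shards_needed(5, 2000, consumers=3))                 # read-bound (shared)
    print(shards_needed(5, 2000, consumers=3, enhanced=True))  # fan-out removes the read bottleneck
```

With 5 MB/s of ingest and three shared consumers, reads become the bottleneck (15 MB/s of reads against 2 MB/s per shard), which is the usual reason to switch to enhanced fan-out rather than add shards.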
Kinesis Producers
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/45356725
Kinesis Producer SDK