Basic Concepts
- Kinesis data records are made of a partition key & data blob (up to 1MB)
- When delivered to consumer, they also have a seq number (indicates where they were in the stream)
- Records with same partition key go to same shard → key-based ordering
- Data ingestion:
- 1 MB/s or 1,000 records/s per shard
- Data consumption (throughput modes):
- Shared: 2 MB/s per shard, split across all consumers
- Enhanced (fan-out): 2 MB/s per shard per consumer
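
To make the "same partition key → same shard" point concrete: Kinesis MD5-hashes the partition key into a 128-bit integer and routes the record to the shard whose hash key range contains it. A minimal sketch, assuming the hash space is split evenly across shards (true for a freshly created stream, not necessarily after resharding); `hash_key` and `shard_for` are hypothetical helper names, not part of any AWS SDK:

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys are hashed to 128-bit integers


def hash_key(partition_key: str) -> int:
    """MD5-hash the partition key into a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose hash key range would contain the key.

    Assumption: the hash space is divided evenly across shard_count shards.
    """
    return hash_key(partition_key) * shard_count // HASH_SPACE


# The same key always maps to the same shard, which gives per-key ordering.
print(shard_for("device-42", 4) == shard_for("device-42", 4))
```

A skewed set of partition keys therefore produces hot shards, regardless of how many shards exist.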
Capacity Modes
- Provisioned
- Choose the number of shards, scale manually or using API
- Pay per provisioned shard per hour
- 💡 Use when you can predict capacity beforehand
- On-demand
- No need to provision or manage capacity
- Capacity scales automatically based on observed throughput peak during last 30 days
- Different pricing model: pay per stream per hour, plus data in/out per GB
- 💡 Use when your capacity is unknown
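
For provisioned mode, the per-shard limits from these notes can be turned into a rough shard-count estimate. This is only a back-of-the-envelope sketch: `shards_needed` is a hypothetical helper, it assumes shared (non-enhanced) consumers that each read the full stream, and it ignores headroom for traffic spikes.

```python
import math

# Per-shard limits quoted in these notes.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
SHARED_READ_MB_PER_SEC = 2.0  # shared mode: split across all consumers


def shards_needed(in_mb_per_sec: float, in_records_per_sec: float,
                  consumers: int = 1) -> int:
    """Smallest shard count that satisfies every per-shard limit."""
    by_bytes = in_mb_per_sec / WRITE_MB_PER_SEC
    by_records = in_records_per_sec / WRITE_RECORDS_PER_SEC
    # Assumption: every shared-mode consumer reads all ingested data.
    by_reads = in_mb_per_sec * consumers / SHARED_READ_MB_PER_SEC
    return max(1, math.ceil(max(by_bytes, by_records, by_reads)))


print(shards_needed(5, 2000))               # 5: bytes are the bottleneck
print(shards_needed(1, 4500))               # 5: record count is the bottleneck
print(shards_needed(4, 1000, consumers=3))  # 6: shared reads are the bottleneck
```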
Producers
![image.png](https://prod-files-secure.s3.us-west-2.amazonaws.com/6e1bc759-5181-447f-864f-016128e79569/5bb65aff-ba1c-4d44-abd2-45a3b8c7a88a/image.png)
Kinesis Producer SDK
- 🔧 Simple API
- Fits low-throughput use cases that can tolerate higher latency
- Used by AWS Mobile SDK, CWLogs, AWS IoT, Managed Service for Apache Flink… can also be used by Lambda functions
- API calls
PutRecord
→ puts one single record in the data stream
PutRecords
→ batches several records, puts them in one go
- increases throughput → fewer client requests
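
A sketch of how batching with PutRecords might look from Python. `put_records` with `StreamName`/`Records` and the `FailedRecordCount` response field are the boto3 Kinesis client's API; the `batches` and `put_all` helpers are hypothetical, and the per-request limits (500 records, 5 MB) are assumptions to check against the current service quotas.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_RECORDS_PER_CALL = 500            # assumed PutRecords record limit
MAX_BYTES_PER_CALL = 5 * 1024 * 1024  # assumed PutRecords payload limit

Record = Tuple[str, bytes]  # (partition key, data blob)


def batches(records: Iterable[Record]) -> Iterator[List[dict]]:
    """Group records into PutRecords-sized request entries."""
    batch: List[dict] = []
    size = 0
    for key, data in records:
        entry_size = len(key.encode("utf-8")) + len(data)
        full = len(batch) == MAX_RECORDS_PER_CALL
        if batch and (full or size + entry_size > MAX_BYTES_PER_CALL):
            yield batch
            batch, size = [], 0
        batch.append({"PartitionKey": key, "Data": data})
        size += entry_size
    if batch:
        yield batch


def put_all(client, stream_name: str, records: Iterable[Record]) -> int:
    """Send all records with PutRecords; return how many were rejected."""
    failed = 0
    for batch in batches(records):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response["FailedRecordCount"]
    return failed
```

With boto3 installed, `client` would be `boto3.client("kinesis")`. Note that PutRecords can succeed partially, so a real producer would retry the entries that came back with an error code instead of only counting them.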
Kinesis Producer Library (KPL)
- 🔧 Easy to use, highly configurable Java/C++ library
- Build high-performance, long-running producers
- Features
- Automated retry
- Synchronous or Asynchronous API (better performance for async)
- Metrics sent to CW
- Batching → increases throughput, reduces cost
- Can write to multiple shards with same API call
- Aggregate records to improve throughput
- ❗ Latency increases, so be careful → only use if app can tolerate
- Limitations
- Compression must be implemented by the user
- KPL records must be decoded (de-aggregated) with the KCL or a special helper library
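
The batching/latency trade-off above can be illustrated with a toy buffer. This is not the KPL (which is a Java/C++ library with its own background flushing); it is a conceptual Python sketch, loosely modeled on the idea behind the KPL's `RecordMaxBufferedTime` setting. `BufferedProducer` and its parameters are hypothetical names.

```python
import time
from typing import Callable, List, Optional


class BufferedProducer:
    """Toy buffer: flush when full or when the oldest record is too old.

    Simplification: age is only checked when a record is added; a real
    producer would also flush from a background timer.
    """

    def __init__(self, send: Callable[[List[bytes]], None],
                 max_records: int = 500,
                 max_buffered_seconds: float = 0.1,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._max_records = max_records
        self._max_age = max_buffered_seconds
        self._clock = clock
        self._buffer: List[bytes] = []
        self._oldest: Optional[float] = None

    def add(self, record: bytes) -> None:
        if not self._buffer:
            self._oldest = self._clock()  # start the age timer
        self._buffer.append(record)
        too_full = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        if too_full or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one batch."""
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []
            self._oldest = None
```

A larger `max_buffered_seconds` means bigger batches and fewer requests, but each record may wait that long before it is sent, which is the latency cost flagged above.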