Amazon MSK 101
- 🔧 Kafka-aaS in AWS
- Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications
- 💡 No need to know Kafka for the exam, but understanding differences between MSK and Kinesis Data Streams is needed
- Kafka at a high level
Architecture (Provisioned Mode)
- CRUD Kafka clusters
- Private service
- MSK creates & manages Kafka broker nodes & ZooKeeper nodes
- MSK cluster deployed in your VPC
- Data is stored on EBS volumes
- Diagram
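
A minimal sketch of how the pieces above (brokers spread over subnets in your VPC, EBS storage, encryption) map onto a cluster definition. It only builds a dict shaped like the keyword arguments of boto3's `kafka.create_cluster`; no AWS call is made. The helper name, Kafka version, and instance type are illustrative assumptions, not values from these notes.

```python
def build_create_cluster_request(name, subnet_ids, security_group_ids,
                                 brokers_per_az=1, volume_gib=100, kms_key_arn=None):
    """Return kwargs shaped like boto3's kafka.create_cluster(); makes no AWS call."""
    if brokers_per_az < 1:
        raise ValueError("need at least one broker per AZ")
    encryption = {
        # In transit: TLS between clients and brokers, and between brokers.
        "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": True},
    }
    if kms_key_arn:
        # At rest: optional customer-managed KMS key for the EBS volumes.
        encryption["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return {
        "ClusterName": name,
        "KafkaVersion": "3.6.0",  # placeholder version (assumption)
        # One subnet per AZ; the broker count is a multiple of the AZ count.
        "NumberOfBrokerNodes": brokers_per_az * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": "kafka.m5.large",  # placeholder broker size (assumption)
            "ClientSubnets": list(subnet_ids),
            "SecurityGroups": list(security_group_ids),
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_gib}},
        },
        "EncryptionInfo": encryption,
    }


request = build_create_cluster_request(
    "demo", ["subnet-a", "subnet-b", "subnet-c"], ["sg-1"], brokers_per_az=2
)
print(request["NumberOfBrokerNodes"])
```

With boto3 installed and credentials configured, the dict could be passed as `boto3.client("kafka").create_cluster(**request)`.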
Security
- Encryption:
  - In transit (optional): TLS between brokers and/or between clients & brokers
  - At rest: EBS encryption with KMS
- Network: Authorize specific SGs for your Kafka clients
- AuthN & AuthZ: (who can R/W to which topics)
  - Built-in Kafka methods:
    - Mutual TLS (AuthN) + Kafka ACLs (AuthZ)
    - SASL/SCRAM (AuthN) + Kafka ACLs (AuthZ)
  - OR AWS method:
    - IAM Access Control (AuthN + AuthZ)
- Diagram
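
A rough sketch of what the three AuthN options look like from the client side, plus an IAM policy for the AWS method. The property keys are standard Kafka client settings. The IAM classes come from the separate `aws-msk-iam-auth` library, which is assumed to be on the client's classpath. Function names, keystore paths, and the account/cluster values are hypothetical placeholders.

```python
# Broker ports inside the VPC, per auth method.
MSK_PORTS = {"mtls": 9094, "scram": 9096, "iam": 9098}


def client_properties(method, username=None, password=None):
    """Return Kafka client properties for one MSK auth method."""
    if method == "mtls":
        # Mutual TLS: the client presents its own certificate from a keystore.
        return {
            "security.protocol": "SSL",
            "ssl.keystore.location": "/path/to/client.keystore.jks",  # placeholder
            "ssl.truststore.location": "/path/to/truststore.jks",     # placeholder
        }
    if method == "scram":
        # SASL/SCRAM: username and password, sent over TLS.
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                f'username="{username}" password="{password}";'
            ),
        }
    if method == "iam":
        # IAM Access Control: credentials come from the usual AWS credential chain.
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    raise ValueError(f"unknown auth method: {method}")


def producer_policy(region, account_id, cluster, cluster_uuid, topic):
    """IAM policy letting a principal connect and write to a single topic."""
    base = f"arn:aws:kafka:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["kafka-cluster:Connect"],
             "Resource": f"{base}:cluster/{cluster}/{cluster_uuid}"},
            {"Effect": "Allow",
             "Action": ["kafka-cluster:DescribeTopic", "kafka-cluster:WriteData"],
             "Resource": f"{base}:topic/{cluster}/{cluster_uuid}/{topic}"},
        ],
    }
```

With mTLS or SASL/SCRAM, the "which topics" part still lives in Kafka ACLs. With IAM, the policy document carries both AuthN and AuthZ. A consumer would additionally need `kafka-cluster:ReadData` on the topic and group-level permissions.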
Monitoring
- CW Metrics
- Prometheus
- Broker Log Delivery
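
The three monitoring options can be sketched as one settings dict, again shaped like boto3 `kafka.create_cluster` keyword arguments and built without calling AWS. The helper name and log group are hypothetical. Broker logs can go to CloudWatch Logs, S3, or Firehose; only CloudWatch Logs is shown here.

```python
def monitoring_settings(log_group, per_topic=False):
    """Return monitoring-related kwargs for a cluster definition (no AWS call)."""
    return {
        # CW Metrics: finer granularity levels produce more metrics.
        "EnhancedMonitoring": "PER_TOPIC_PER_BROKER" if per_topic else "PER_BROKER",
        # Prometheus: expose JMX and node exporters on each broker.
        "OpenMonitoring": {
            "Prometheus": {
                "JmxExporter": {"EnabledInBroker": True},
                "NodeExporter": {"EnabledInBroker": True},
            }
        },
        # Broker Log Delivery: ship broker logs to a CloudWatch Logs group.
        "LoggingInfo": {
            "BrokerLogs": {
                "CloudWatchLogs": {"Enabled": True, "LogGroup": log_group}
            }
        },
    }


settings = monitoring_settings("/msk/demo/brokers")  # hypothetical log group name
```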
MSK Connect
- Managed Kafka Connect workers in AWS → run connector plugins that move data between Kafka topics & external systems (sinks like S3, Redshift, OpenSearch; sources like Debezium…)
- Auto-scaling capabilities for workers
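
A small sketch of what worker auto-scaling looks like as configuration. The dict follows the `capacity` argument of boto3's `kafkaconnect.create_connector` as far as I know it. The helper name, default thresholds, and sanity checks are assumptions for illustration, not service limits taken from these notes.

```python
def autoscaling_capacity(min_workers, max_workers, mcu_per_worker=1,
                         scale_out_cpu=80, scale_in_cpu=20):
    """Return a capacity dict for an auto-scaled MSK Connect connector (no AWS call)."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    if scale_in_cpu >= scale_out_cpu:
        # Otherwise the connector could scale in and out at the same CPU level.
        raise ValueError("scale-in threshold must be below scale-out threshold")
    return {
        "autoScaling": {
            "minWorkerCount": min_workers,
            "maxWorkerCount": max_workers,
            "mcuCount": mcu_per_worker,  # MSK Connect Units per worker
            # Add workers above this CPU %, remove them below the scale-in %.
            "scaleOutPolicy": {"cpuUtilizationPercentage": scale_out_cpu},
            "scaleInPolicy": {"cpuUtilizationPercentage": scale_in_cpu},
        }
    }


capacity = autoscaling_capacity(min_workers=1, max_workers=4)
```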