Ref: https://learn.cantrill.io/courses/1820301/lectures/41301424
RDS Read-Replicas (RRs) - Architecture
- 🔧 RDS Read-Replica (RR) = a read-only replica of an RDS instance
- Can be used for reads (unlike the standby replica in a Multi-AZ instance deployment)
- 👍 allow read performance scaling
- An instance can have up to 5 direct RRs (course figure; AWS has since raised this to 15 for some engines, e.g. MySQL, MariaDB, PostgreSQL)
- Can be created in the same region or in a different region (cross-region RRs)
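Read scaling only happens if the app actually sends reads to the RRs. Each RR has its own endpoint and apps know nothing about RRs by default, so the routing has to live in the app. Below is a minimal sketch, assuming hypothetical endpoint names and a naive "starts with SELECT" check to spot reads. It is an illustration, not an AWS or driver API.

```python
import itertools

# Hypothetical endpoints for illustration only (not real RDS hostnames).
PRIMARY_ENDPOINT = "mydb.primary.example.internal"
REPLICA_ENDPOINTS = ["mydb-rr1.example.internal", "mydb-rr2.example.internal"]


class ReadWriteRouter:
    """Send writes to the primary and spread reads across RRs round-robin."""

    def __init__(self, primary: str, replicas: list) -> None:
        self.primary = primary
        # itertools.cycle repeats the RR list forever; None when there are no RRs.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Naive read detection (assumption): only SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no RRs exist, go to the primary endpoint.
        return self.primary


router = ReadWriteRouter(PRIMARY_ENDPOINT, REPLICA_ENDPOINTS)
for stmt in ["SELECT * FROM orders", "INSERT INTO orders VALUES (1)", "select 1"]:
    print(router.endpoint_for(stmt))  # rr1, then primary, then rr2
```

Because RR replication is asynchronous, a read that must see a write the app just made is safer sent to the primary. This sketch does not handle that case.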

- ‼️ RRs are separate from the main RDS architecture!
- Each RR has its own endpoint address, independent from RDS instance endpoints
- Requires app support: the app must be adjusted to send reads to the RR endpoint(s)
- ❗ Apps by default know nothing about RRs
- No automatic failover
- ❗ Asynchronous replication
- Data is committed once written to the main instance; only after that is it replicated to its RRs.
- Replication lag (can be noticeable depending on NW conditions & write volume)
- 💡 RRs can have their own RRs, but lag becomes even more noticeable!
- Cross-region RRs allow global performance improvement of read workloads
- Users can read DBs from different regions more efficiently
- Cross-region NWing handled transparently by AWS (data fully encrypted in transit)
- 💡 Multi-AZ cluster deployment is like a combination of Multi-AZ instance deployment + RRs
- ‼️ BUT! The 2 reader instances in a Multi-AZ cluster deployment are part of the main architecture! External RRs should be considered separate!
- For exam: synchronous replication → Multi-AZ; asynchronous → RRs (excluding Aurora)
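The replication points above can be pictured with a toy timing model. It is not how RDS is implemented. `Node`, `delay_s` and the constant-delay assumption are made up for illustration. A synchronous standby never trails an acknowledged commit, while each async RR hop adds its own lag, so an RR of an RR trails the most.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    delay_s: float = 0.0          # assumed constant apply delay behind its source
    synchronous: bool = False     # True = Multi-AZ-style standby, False = RR-style async
    source: Optional[Node] = field(default=None, repr=False, compare=False)
    replicas: list[Node] = field(default_factory=list, repr=False, compare=False)

    def add_replica(self, name: str, delay_s: float = 0.0, synchronous: bool = False) -> Node:
        child = Node(name, delay_s, synchronous, source=self)
        self.replicas.append(child)
        return child

    def visible_at(self, commit_time: float) -> float:
        """Time at which a write acknowledged on the primary at `commit_time` is readable here."""
        if self.source is None:
            return commit_time
        base = self.source.visible_at(commit_time)
        # Sync: the commit is only acknowledged once this node has the write,
        # so it never trails an acknowledged commit. Async: it trails by delay_s,
        # and every extra hop in a chain adds its own delay.
        return base if self.synchronous else base + self.delay_s

    def read(self, log: list[tuple[float, str]], now: float) -> list[str]:
        """Rows from the primary's commit log that this node can serve at time `now`."""
        return [row for t, row in log if self.visible_at(t) <= now]


primary = Node("primary")
standby = primary.add_replica("standby", synchronous=True)
rr1 = primary.add_replica("rr1", delay_s=2.0)
rr2 = rr1.add_replica("rr2", delay_s=3.0)   # an RR of an RR

log = [(0.0, "order-1"), (4.0, "order-2")]
for node in (primary, standby, rr1, rr2):
    print(node.name, node.read(log, now=4.0))
# primary ['order-1', 'order-2']
# standby ['order-1', 'order-2']
# rr1 ['order-1']
# rr2 []
```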
Promotion of RDS Read-Replicas & Disaster Recovery
- 🔧 RRs are read-only until promoted. Upon promotion, they become a normal, standalone (read/write) RDS instance
- Promotion can be done very quickly → 👍 low RTO
- 👍 Improves global availability/resilience
- An RR in a different (failover) region can be quickly promoted if the main region has an outage
- 👍 RRs also offer near 0 RPO
- data synced constantly (asynchronously) from the main DB instance → very little potential for data loss (only writes still within the replication lag)
- 💡 RRs are great for quickly recovering from failure, as long as there's no data corruption
- ‼️ Use RRs ONLY when recovering from failure/outage, NOT from data corruption!!
- 👎 Because data is constantly replicated to RR, data corruption is also replicated to RR!
- 💡 If data corruption → must rely on snapshots & backups
- higher frequency and/or higher quality of snapshots/backups improves RPO
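Both DR cases above can be sketched as a toy Python model. The names and numbers are made up; this is not RDS behavior or an AWS API. Promoting an RR after a primary outage loses only the writes still in flight (the async lag window). Corruption, by contrast, is replicated to the RR like any other write, so recovery falls back to the newest snapshot taken before the corruption, and more frequent snapshots shrink the data lost.

```python
from dataclasses import dataclass, field


@dataclass
class ReadReplica:
    rows: list = field(default_factory=list)
    read_only: bool = True

    def apply(self, row):
        """Replication applies whatever the primary wrote, good or bad."""
        self.rows.append(row)

    def write(self, row):
        if self.read_only:
            raise PermissionError("read replica is read-only until promoted")
        self.rows.append(row)

    def promote(self):
        # In this model, promotion just makes the replica a writable standalone DB.
        self.read_only = False


def outage_then_promote(primary_rows, replicated):
    """Primary fails after `replicated` rows reached the RR.
    Returns (promoted RR, rows lost = writes still in the async lag window)."""
    rr = ReadReplica()
    for row in primary_rows[:replicated]:
        rr.apply(row)
    rr.promote()
    return rr, primary_rows[replicated:]


def corruption_then_restore(total_writes, corrupt_at, snapshot_every):
    """Write number `corrupt_at` is bad data.
    Returns (RR is corrupt?, rows restored from snapshot, good writes lost)."""
    primary, rr, snapshots = [], ReadReplica(), {}
    for i in range(1, total_writes + 1):
        row = "CORRUPT" if i == corrupt_at else f"row-{i}"
        primary.append(row)
        rr.apply(row)                      # the corruption is replicated too
        if i % snapshot_every == 0:
            snapshots[i] = list(primary)   # point-in-time copy
    # Newest snapshot taken before the corrupting write (0 = none, start empty).
    last_good = max((i for i in snapshots if i < corrupt_at), default=0)
    restored = snapshots.get(last_good, [])
    # Good writes between the restore point and the corruption are lost
    # (writes after the corruption would also need re-applying; ignored here).
    good_lost = corrupt_at - 1 - last_good
    return "CORRUPT" in rr.rows, restored, good_lost


rr, lost = outage_then_promote(["w1", "w2", "w3", "w4"], replicated=3)
rr.write("w5")                 # allowed now that the RR is promoted
print(lost, rr.rows)           # ['w4'] ['w1', 'w2', 'w3', 'w5']
print(corruption_then_restore(total_writes=10, corrupt_at=7, snapshot_every=4)[2])  # 2
print(corruption_then_restore(total_writes=10, corrupt_at=7, snapshot_every=2)[2])  # 0
```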