Ref: https://learn.cantrill.io/courses/1820301/lectures/41301416
CAP Theorem
- DB transactions have different properties depending on the DB model and architecture
- CAP = Consistency, Availability, Partition tolerance → DB transaction properties
- Consistency: read operations always return the most recent data, or if that's not immediately possible, they return an error
- 💡 If you write a new item to the DB and read it back immediately afterwards, the read returns the new item (or you get an error and receive nothing)
- ‼️ Non-consistent DBs can return DB states that are NOT the most recent!
- Availability: every request to the DB will always receive a non-error response, but no guarantee that the response will contain the most recent data
- Partition tolerance (or partition resilience): a DB system can be spread across multiple NW partitions, and will continue to operate EVEN IF there are faults in the NW between them
- e.g. errors between NW nodes, or messages between nodes getting dropped
- 🔧 CAP Theorem → A DB system cannot have all 3 CAP properties at the same time, it must CHOOSE 2
- 💡 Every pair of CAP properties brings its own pros, cons and tradeoffs
- Consider a DB cluster in a NW. If some of the NW nodes fail or lose contact with each other, the DB has 2 options for incoming reads:
- Reject read operations and return an error
- Reduces availability, but ensures consistency
- Non-error answers are guaranteed to contain the latest write (i.e. latest DB state)
- Accept read operations and return something
- Improves availability, but risks consistency
- The answer might not contain the latest write
- 💡 If a DB is spread across multiple NW nodes, you can't have both consistency and availability while a NW fault separates those nodes. There's no way around this.
- Additional reference: https://en.wikipedia.org/wiki/CAP_theorem
- ACID and BASE are two DB transaction models that choose different tradeoffs
- ACID → Ensures consistency
- BASE → Opts for high availability
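The two options for a cluster with NW faults can be sketched in a few lines of Python. This is only a toy model, not how any real DB is implemented: the `Replica` class, its `mode` values ("CP" = keep consistency, "AP" = keep availability) and the `partitioned` flag are all names made up for illustration.

```python
class Replica:
    """Toy read replica that can lose contact with the rest of the cluster."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode          # hypothetical setting: which property to keep
        self.data = {}
        self.partitioned = False  # True while the NW link to the primary is down

    def replicate(self, key, value):
        """Apply a write coming from the primary; it is lost if the link is down."""
        if self.partitioned:
            return False
        self.data[key] = value
        return True

    def read(self, key):
        # CP: refuse to answer when the latest state cannot be confirmed.
        if self.partitioned and self.mode == "CP":
            raise ConnectionError("cannot confirm latest state during a partition")
        # AP: always answer, even if the local copy may be stale.
        return self.data.get(key)


def demo():
    """Run the same scenario against a CP replica and an AP replica."""
    results = {}
    for mode in ("CP", "AP"):
        replica = Replica(mode)
        replica.replicate("balance", 100)  # arrives before the fault
        replica.partitioned = True         # NW fault begins
        replica.replicate("balance", 90)   # newer write never arrives
        try:
            results[mode] = replica.read("balance")
        except ConnectionError:
            results[mode] = "error"
    return results


if __name__ == "__main__":
    print(demo())
```

In this sketch the CP replica answers with an error (consistent, less available), while the AP replica answers with the old value 100 (available, but not the latest write).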
ACID Transaction Model
Diagram: https://github.com/acantril/aws-sa-associate-saac03/blob/main/1300-RELATIONAL_DATABASE_SERVICE(RDS)/00_LEARNINGAIDS/ACIDvBASE-1.png
- DB transactions are Atomic, Consistent, Isolated and Durable (ACID)
- Atomic = Either all parts of a transaction are successful, or none are
- Transactions succeed or fail as a whole
- 💡 Consider a $10 transaction from bank account A to bank account B
- Part 1: take out $10 from account A + Part 2: put $10 in account B
- Partial success is not desirable → it could mean money is lost (if only part 1 succeeds) or money is duplicated (if only part 2 succeeds)
- Consistent = Transactions move the DB from one valid state to another valid state
- Nothing in between is allowed, DB can never be in an invalid state
- Valid states determined as per rules of the DB
- 💡 e.g. in a SQL DB, every row must satisfy the rules declared for its table (such as NOT NULL columns or unique keys). DB can't be in a state where that's not the case
- Additional reference: https://en.wikipedia.org/wiki/ACID#Consistency
- Isolated = Different transactions executed in parallel don't interfere with each other
- End state is the same as if all transactions had been sequentially executed
- DB can function with multiple apps or users accessing the DB simultaneously
- Durable = Once committed, transactions are stored on non-volatile storage
- Transactions remain committed, resilient to system failures (outages or crashes)
- 💡 Operation reported as successful by the DB → data stored securely
- If system now fails, or power fails, or a NW/node restarts, data is unaffected
- ‼️ 👎 ACID transactions limit the ability of a DB to scale!
- Transactions have strict and rigid characteristics
- Most SQL DBs (RDBMS) use ACID-based transactions
- Useful for e.g. finances → very rigid structure for data transactions
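The bank transfer example can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under some assumptions: the `accounts` table, the helper names and the `fail_midway` switch are invented for the demo, and an in-memory DB is used, so it illustrates atomicity and consistency but not durability.

```python
import sqlite3


def make_bank():
    """Create an in-memory DB with accounts A ($10) and B ($0)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"  # DB rule: no overdrafts
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 10), ("B", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money as one transaction; returns True only if both parts commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between part 1 and part 2")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        return False


def balances(conn):
    """Return the current balances as a dict, e.g. {'A': 10, 'B': 0}."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "A", "B", 10, fail_midway=True)` leaves both balances untouched, because part 1 is rolled back when part 2 never happens. A transfer that would push account A below zero is also rejected as a whole, since the `CHECK` rule would put the DB in an invalid state.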
BASE Transaction Model
Diagram: https://github.com/acantril/aws-sa-associate-saac03/blob/main/1300-RELATIONAL_DATABASE_SERVICE(RDS)/00_LEARNINGAIDS/ACIDvBASE-2.png
- DB transactions are Basically Available, Soft state and Eventually consistent (BASE)
- Basically Available = R/W operations available “as much as possible”, without consistency guarantees
- Data availability is increased by spreading and replicating data across multiple NW nodes
- The DB does its best to be consistent, but no guarantee (not the main focus)
- 💡 Reads and Writes are “kinda, maybe”
- Soft state = Consistency is not enforced by the DB, it is instead offloaded onto consumers
- Developers must care for consistency and state in the apps that use BASE DBs
- By default, returned data might not be the latest data, so…:
- EITHER apps need to tolerate this fact…
- OR apps must specify use of consistent operations, if the DB allows them
- Eventually consistent = Immediate consistency is not enforced
- Consistency might happen… eventually! 😅
- A read will match the latest write as long as we wait long enough (and no newer write arrives in the meantime)
- 💡 How long must we wait? Unspecified! We only know that eventually the read will be consistent with the latest write!
- Additional reference: https://en.wikipedia.org/wiki/Eventual_consistency
- 💡 All BASE characteristics are related (soft state implies eventual consistency, etc)
- ‼️ Many BASE DBs allow immediately consistent operations with guaranteed latest state!
- Default = Eventually consistent operations with no guarantee for latest state
- To guarantee latest state, app must have awareness and explicitly/specifically ask the DB for immediately consistent reads
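The difference between the default (eventually consistent) read and an explicitly requested consistent read can be sketched as a toy model. Everything here is hypothetical: the `EventuallyConsistentStore` class, the `consistent` flag and the manual `replicate()` step stand in for whatever a real BASE DB offers, where replication runs in the background on its own schedule.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store: writes land on a primary copy and reach the replica later."""

    def __init__(self):
        self.primary = {}       # always holds the latest write
        self.replica = {}       # serves default reads, may lag behind
        self.pending = deque()  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, consistent=False):
        # Default: eventually consistent read from the (possibly stale) replica.
        # consistent=True: the app explicitly asks for the latest state.
        source = self.primary if consistent else self.replica
        return source.get(key)

    def replicate(self):
        """Copy all pending writes to the replica, in the order they were made."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value
```

For example, right after `store.write("item", "v1")` a default `store.read("item")` still returns `None`, while `store.read("item", consistent=True)` returns `"v1"`. Once `store.replicate()` has run, the default read returns `"v1"` as well.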