Database Models

Ref: https://learn.cantrill.io/courses/1820301/lectures/41301414 and https://learn.cantrill.io/courses/1820301/lectures/41301415

Summary:
Contents:

Databases - Basic Concepts

🔧 Database (DB) = a system that stores and manages data
- ‼️ Do not confuse with plain storage systems!!
  - Storage means raw data without intelligence (files, images, videos…)
  - Databases provide structure and computing abilities (query, sort, process, etc). A query language can retrieve data records from a database.
  - 💡 Think of it like a box of receipts (storage) versus a spreadsheet of income, expenses, deductions, etc (database)
  - 💡 Today, the line blurs with cloud services allowing to query cloud storage!
- Many different types of DBs:
  - Crucial differences in disk storage, data management (in disk and in memory) & data retrieval and presentation to end user
  - Broad split of relational (SQL) and non-relational (NoSQL)
Schema = a rigid/fixed structure defined for data records in a DB
- Defines attribute names, valid values, data types, hierarchies, data placement…
- ‼️ Fixed = defined in advance, before any data enters the DB. Difficult to change later!
Key/index = attribute present in all data records of the same group, unique for each record
- Allows to uniquely identify a data record in its group
- Composite key = a unique combination of 2+ keys (or 2+ key parts)
  - A key part can be repeated across composite keys, but combinations must be unique

Relational Databases = SQL Databases = RDBMS

SQL (pronounced “sequel” or “S-Q-L”) = Structured Query Language
- Query language that can be used to query relational DBs
- 💡 Strictly speaking, SQL is only a query language, not a DB model. However, many people use interchangeably: SQL DB = relational DB & NoSQL DB = non-relational DB
🔧 RDBMS = Relational Database Management System = Relational DB = SQL Database
- Usually has SQL features (but not all RDBMS necessarily have all SQL features)
Table = grouping of data records/entries related to each other (have common attributes)
- e.g. in example below, humans stored in the Humans table, animals in the Animals table
- Rigid structure (defined in schemas)
- Rows (entries) + columns (attributes)
  - Each column has a name
  - ‼️ Every row must have a value for each column
    - 💡 This is generally not the case for non-relational or NoSQL databases
  - Every row uniquely identifiable via the Primary Key attribute/column
Join table: defines a relationship between tables
- Also rigid structure (defined in schemas)
- Relationships can be 1-to-1, 1-to-many, many-to-many… → RELATIONAL databases
- Each entry has a unique composite key (combination of primary keys from both tables)
- e.g. Humans and animals have a many-to-many relationship in the example below
  - a human can have many animals
  - an animal can have many human minions 🙂
  - 💡 Notice how two different rows have both A_ID=2, but that's fine because H_ID=2 in one and H_ID=3 in another → composite keys (2,2) and (3,2) are different & unique
Tables, relationships and keys are fixed and defined in advance via schemas
- 👎 Very difficult to change after inserting data
- Hence, NOT a good choice for storing data that has rapidly changing relationships (e.g. social media like Facebook)
An example diagram of a RDBMS with humans and animals:

SQL DBs: Row Databases vs Column Databases

Untitled

Row-based databases (OLTP DBs)
- 🔧 Optimized for data interaction based on table rows
  - Items (rows) are stored on disk together
    - Advantage: efficient CRUD operations on rows (items)
  - Often called OnLine Transactional Processing (OLTP) databases
    - Ideal for DBs that perform transactions: items/rows are constantly accessed, modified & removed
    - Use cases examples: audit DBs, contact DBs, stock DBs…
  - Used by most SQL DBs
- 👎 Disadvantage: inefficient operations on columns (attributes)
  - Without indexes or shortcuts, column operations imply finding their rows first
    - Reading an attribute of an item implies reading the whole row in the DB first
    - 💡 That can mean scanning through rows and rows of data before reaching the ones you want to query
- Example: DB with product orders (rows=orders, attributes=id, product, color, size, price)
  - 👍 Inserting a new order or removing a whole order is easy and efficient
  - 👎 Querying the sizes of all orders in costly and inefficient
    - Implies querying all rows, finding the size of each