Ref: https://learn.cantrill.io/courses/1820301/lectures/41301414 and https://learn.cantrill.io/courses/1820301/lectures/41301415
Databases - Basic Concepts
- đź”§Â Database (DB) = a system that stores and manages data
- ‼️ Do not confuse with plain storage systems!!
- Storage means raw data without intelligence (files, images, videos…)
- Databases provide structure and computing abilities (query, sort, process, etc). A query language can retrieve data records from a database.
- 💡 Think of it like a box of receipts (storage) versus a spreadsheet of income, expenses, deductions, etc (database)
- 💡 Today, the line blurs with cloud services allowing to query cloud storage!
- Many different types of DBs:
- Schema = a rigid/fixed structure defined for data records in a DB
- Defines attribute names, valid values, data types, hierarchies, data placement…
- ‼️ Fixed = defined in advance, before any data enters the DB. Difficult to change later!
- Key/index = attribute present in all data records of the same group, unique for each record
- Allows to uniquely identify a data record in its group
- Composite key = a unique combination of 2+ keys (or 2+ key parts)
- A key part can be repeated across composite keys, but combinations must be unique
Relational Databases = SQL Databases = RDBMS
- SQL (pronounced “sequel” or “S-Q-L”) = Structured Query Language
- Query language that can be used to query relational DBs
- 💡 Strictly speaking, SQL is only a query language, not a DB model. However, many people use interchangeably: SQL DB = relational DB & NoSQL DB = non-relational DB
- đź”§Â RDBMS = Relational Database Management System = Relational DB = SQL Database
- Usually has SQL features (but not all RDBMS necessarily have all SQL features)
- Table = grouping of data records/entries related to each other (have common attributes)
- e.g. in example below, humans stored in the Humans table, animals in the Animals table
- Rigid structure (defined in schemas)
- Rows (entries) + columns (attributes)
- Each column has a name
- ‼️ Every row must have a value for each column
- 💡 This is generally not the case for non-relational or NoSQL databases
- Every row uniquely identifiable via the Primary Key attribute/column
- Join table: defines a relationship between tables
- Also rigid structure (defined in schemas)
- Relationships can be 1-to-1, 1-to-many, many-to-many… → RELATIONAL databases
- Each entry has a unique composite key (combination of primary keys from both tables)
- e.g. Humans and animals have a many-to-many relationship in the example below
- a human can have many animals
- an animal can have many human minions 🙂
- 💡 Notice how two different rows have both A_ID=2, but that's fine because H_ID=2 in one and H_ID=3 in another → composite keys (2,2) and (3,2) are different & unique
- Tables, relationships and keys are fixed and defined in advance via schemas
- 👎 Very difficult to change after inserting data
- Hence, NOT a good choice for storing data that has rapidly changing relationships (e.g. social media like Facebook)
- An example diagram of a RDBMS with humans and animals:
SQL DBs: Row Databases vs Column Databases

- Row-based databases (OLTP DBs)
- đź”§Â Optimized for data interaction based on table rows
- Items (rows) are stored on disk together
- Advantage: efficient CRUD operations on rows (items)
- Often called OnLine Transactional Processing (OLTP) databases
- Ideal for DBs that perform transactions: items/rows are constantly accessed, modified & removed
- Use cases examples: audit DBs, contact DBs, stock DBs…
- Used by most SQL DBs
- 👎 Disadvantage: inefficient operations on columns (attributes)
- Without indexes or shortcuts, column operations imply finding their rows first
- Reading an attribute of an item implies reading the whole row in the DB first
- 💡 That can mean scanning through rows and rows of data before reaching the ones you want to query
- Example: DB with product orders (rows=orders, attributes=id, product, color, size, price)
- 👍 Inserting a new order or removing a whole order is easy and efficient
- 👎 Querying the sizes of all orders in costly and inefficient
- Implies querying all rows, finding the size of each