Ref: https://learn.cantrill.io/courses/1820301/lectures/41301566
Amazon Athena - Core Concepts
- 🔧 Serverless interactive querying service
- Ad-hoc SQL-like queries on data stored in S3
- 💡 Athena can also read data from other sources with federated queries, but for simplicity we assume S3 for most of the Athena section
- ❗ Pay ONLY for data scanned while running queries (plus storage costs in S3, but nothing else!)
- Schema-on-read
- Defined schema is applied to data in flight during the query process → table-like translation
- Schema translates data → relational-like when read
- ‼️ Original data never changed → remains unchanged in S3
- Conceptually, consider S3 data as read-only
- 💡 Like a special lens to enhance text reading
- Original text remains unaltered, but seeing it through lens makes it look different
- 💡 Many other DBs have schema-on-write, where data must be loaded into the defined schemas… but with schema-on-read this isn't necessary
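A minimal sketch of the schema-on-read idea above, in plain Python rather than Athena itself. The sample log lines, the `SCHEMA` mapping and the `read_through_schema` helper are all hypothetical, made up for illustration: the raw text stands in for objects in S3, and the schema is only applied while reading.

```python
import json

# Stand-in for objects stored in S3: raw JSON lines that are never modified.
RAW_LINES = (
    '{"ts": "2024-01-01T00:00:00Z", "status": "200", "bytes": "512"}',
    '{"ts": "2024-01-01T00:00:05Z", "status": "404", "bytes": "0"}',
)

# Hypothetical schema: column name -> converter applied at read time.
SCHEMA = {"ts": str, "status": int, "bytes": int}


def read_through_schema(lines, schema):
    """Yield table-like rows by projecting each raw line through the schema."""
    for line in lines:
        record = json.loads(line)
        yield {column: cast(record[column]) for column, cast in schema.items()}


# "Query" the data: the rows are typed, the source text is untouched.
rows = list(read_through_schema(RAW_LINES, SCHEMA))
errors = [row for row in rows if row["status"] >= 400]
```

Changing `SCHEMA` changes how the same raw lines look when read, which is the "lens" analogy: nothing is loaded or rewritten in advance.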
Athena Architecture
- Source data can be in many formats (XML, JSON, AVRO, Log formats…)
- Schema (which contains tables) defined within Athena
- Defines how to present the source data in a table-like structure
- ❗ Tables don't actually contain data (unlike tables in traditional DBs); they hold directives on how to convert source data so it can be queried
- Data conceptually streamed/projected through schema while being queried
- Output can be sent to other services (e.g. visualization tools like Amazon QuickSight)
- ‼️ No base/upfront cost! Only pay for performing queries!
- Dataset can also be optimized to reduce query costs!
- ‼️ Athena has no infrastructure!
- No DBs, no servers, no need to load data or think about data transformation in advance…
- ❗ Athena Federated Queries → for querying non-S3 data sources
- Use data source connectors that run on Lambda
- Architecture Diagram
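To make "tables are directives, not data" concrete, here is a hedged sketch that builds an external-table definition and the request for a query. The table name, columns, database and bucket paths are placeholders invented for this example; the JSON SerDe choice assumes the source data is JSON lines. The request keys match the parameters of boto3's Athena `start_query_execution`, but `submit` is not exercised here because it needs AWS credentials.

```python
def build_create_table_ddl(table: str, columns: dict, location: str) -> str:
    """Build DDL that only describes how to read files under `location`."""
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


def build_query_request(sql: str, database: str, output_location: str) -> dict:
    """Assemble keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def submit(request: dict) -> str:
    """Send the query to Athena (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders work without AWS access

    response = boto3.client("athena").start_query_execution(**request)
    return response["QueryExecutionId"]


# Placeholder names; nothing below touches AWS.
ddl = build_create_table_ddl(
    "access_logs",
    {"ts": "string", "status": "int", "bytes": "bigint"},
    "s3://example-bucket/logs/",
)
request = build_query_request(
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Note that the DDL carries only column names, types, a format and a location: creating the table moves no data.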
Athena Use Cases
- Queries where data loading/transformation isn't desired
- Occasional/Ad-hoc queries on data in S3
- No need for servers or for thinking about data ETL
- Serverless querying scenarios → cost conscious
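For the cost-conscious scenarios above, the pay-per-scan model can be sketched as simple arithmetic. The price of 5 USD per TB scanned and the 10 MB per-query minimum are assumptions that should be checked against the current Athena pricing page for your region, and the dataset sizes are invented for illustration.

```python
# Assumed pricing values; verify against current Athena pricing for your region.
PRICE_PER_TB_USD = 5.00
MIN_BYTES_PER_QUERY = 10 * 1024**2  # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical sizes: the same dataset scanned as raw JSON versus an
# optimized layout (e.g. columnar or partitioned) where less data is read.
raw_scan_cost = estimate_query_cost(200 * 1024**3)
optimized_scan_cost = estimate_query_cost(20 * 1024**3)
```

Since there is no base cost, an idle month costs nothing for Athena itself, and shrinking the bytes scanned per query is the main lever for reducing the bill.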