Athena and Glue
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/46730253
- Athena can automatically build tables from Glue data catalogs
- 💡 Many analytics tools can read Glue data catalogs: RDS, Redshift, Redshift Spectrum, EMR, Hive metastore…
- 🔧 Glue Data Catalogs + Athena “databases” = unified (global) metadata repository across various services
- Glue crawlers scan the data and can keep the catalog up to date
- useful if underlying data format changes frequently
- Athena provides a SQL interface to the catalog
- Example Diagram
- Glue ETL jobs can be used to transform source data into a columnar format, which lets Athena scan less data per query (faster and cheaper)
- 💡 NOTE: In general, Athena uses a managed Glue Data Catalog to store information and schemas
- Athena falls back to its internal catalog only in regions where Glue is not available
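A minimal sketch of what "SQL interface to the catalog" can look like from Python, assuming boto3 is installed and AWS credentials are configured. The table name `sales_parquet`, the database `analytics_db` and the results bucket are hypothetical placeholders, not from the course; `AwsDataCatalog` is the default name Athena uses for the Glue Data Catalog.

```python
def build_athena_query_request(sql, database, output_location, catalog="AwsDataCatalog"):
    """Build the keyword arguments for the boto3 Athena client's start_query_execution."""
    return {
        "QueryString": sql,
        # The catalog/database pair tells Athena which Glue Data Catalog database to resolve tables in
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        # Athena writes query results to this S3 location
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        **build_athena_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only
request = build_athena_query_request(
    "SELECT * FROM sales_parquet LIMIT 10",
    "analytics_db",
    "s3://example-athena-results/",
)
```

Calling `run_query(...)` with the same arguments would submit the query. It only starts the execution, so results still have to be polled for or read from the S3 output location.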
Athena Fine-Grained Access to Glue Data Catalog
Ref: https://www.udemy.com/course/aws-certified-machine-learning-engineer-associate-mla-c01/learn/lecture/46730269
- 🔧 IAM-based security for Athena databases and tables
- Can limit access to specific operations
- Coarser-grained than data filters in Lake Formation (which give cell-, row- or column-level security)
- ❗ Can NOT restrict to specific table versions
- At a minimum, the IAM policy must grant Athena access to the DB & Glue Data Catalog in each region where it is used
- Permissions can be fine-grained (lock down specific resources or operations)
- e.g. restrict access to create DB/table, drop DB/table, show DB/table… operations
- ❗ Mapping between operations ↔ underlying IAM actions is not always trivial!
- Example: controlling access to the DROP TABLE operation requires permissions on the DB, the table & the table's partition(s)
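The DROP TABLE example can be sketched as an IAM policy built in Python. This is an illustration under assumptions, not a policy from the course: the region, account ID, database and table names are hypothetical, and the action list is my reading of what DROP TABLE maps to, so verify it against the AWS documentation before use. Glue has no partition-level ARN, so the partition actions are granted on the table resource.

```python
import json


def drop_table_policy(region, account_id, database, table):
    """Build an IAM policy document allowing DROP TABLE on a single Glue table."""
    prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed mapping of DROP TABLE to Glue actions (check the AWS docs)
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:DeleteTable",
                    "glue:GetPartitions",
                    "glue:GetPartition",
                    "glue:DeletePartition",
                ],
                # Catalog, database and table must all be listed; partitions
                # are covered by the table ARN
                "Resource": [
                    f"{prefix}:catalog",
                    f"{prefix}:database/{database}",
                    f"{prefix}:table/{database}/{table}",
                ],
            }
        ],
    }


# Hypothetical identifiers, for illustration only
policy = drop_table_policy("us-east-1", "123456789012", "example_db", "example_table")
policy_json = json.dumps(policy, indent=2)
```

One SQL operation fans out into several Glue actions across three resource levels, which is why the operation ↔ IAM action mapping is easy to get wrong.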