‼️ NOTE!! S3 Select and Glacier Select have been DEPRECATED (AWS no longer offers them to new customers). Cantrill still has this lecture in his course, and the feature MIGHT still pop up in an exam, but it SHOULDN'T! I'm leaving these notes here for completeness' sake, but feel free to skip them completely.
Ref: https://learn.cantrill.io/courses/1820301/lectures/41301491
S3/Glacier Select - Overview
Diagram: https://github.com/acantril/aws-sa-associate-saac03/blob/main/0700-SIMPLE_STORAGE_SERVICE(S3)/00_LEARNINGAIDS/S3andGlacierSelect.png
- Retrieving a large object can be costly:
- Downloading a 5TB object takes time
- You get billed for 5TB of data transfer
- ‼️ Filtering data client-side solves nothing!!
- You download the whole object, then discard the unused data…
- Damage is done: you had to download the whole object first, which took time and $$$
- 🔧 S3/Glacier Select retrieves PARTS of S3 objects (instead of the entire objects)
- Pre-filtering done in S3: use SQL-like statements to filter objects server-side
- Features
- 👍 Up to 400% faster & 80% cheaper than client-side filtering
- Supported file formats: CSV, JSON, Parquet; GZIP or BZIP2 compression for CSV & JSON
- Many formats supported → flexible product
- Disabled by default, must be enabled if desired
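
Example: a minimal sketch of the server-side filtering described above, using boto3's `select_object_content` call. The helper names (`build_select_request`, `collect_records`, `select_from_s3`) and the bucket, key, and column names in the usage note are hypothetical, made up for illustration. It assumes an uncompressed CSV object with a header row. Since the feature is closed to new customers, the actual call may well fail on an account that never used it.

```python
def build_select_request(bucket, key, expression):
    """Build the keyword arguments for S3's SelectObjectContent call.

    Assumes the object is an uncompressed CSV whose first row is a header.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # lets the SQL refer to columns by name
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(event_stream):
    """Join the payload bytes of every 'Records' event into one string.

    The response stream also carries other event types (e.g. 'Stats', 'End'),
    which are skipped here.
    """
    chunks = []
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode once at the end so a multi-byte character split across
    # two chunks does not break decoding.
    return b"".join(chunks).decode("utf-8")


def select_from_s3(bucket, key, expression):
    """Run the query in S3 and return only the matching rows as CSV text."""
    import boto3  # imported lazily so the helpers above work without AWS access

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_request(bucket, key, expression))
    return collect_records(response["Payload"])
```

Usage would look something like `select_from_s3("my-bucket", "big-file.csv", "SELECT s.name FROM S3Object s WHERE s.country = 'IT'")`: only the matching rows come back, instead of the whole object being downloaded and filtered locally.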