Amazon Textract

Ref: https://learn.cantrill.io/courses/1820301/lectures/42176882

Amazon Textract - Key Concepts

🔧 Detect, extract and analyze text contained in input documents
- Common documents: receipts, invoices, medical records, identity documents…
- Outputs extracted text with structure and analysis
Functionality
- Detection of text
  - e.g. scanned invoice or receipt → detects prices, products, dates…
  - Text can be computer generated or handwritten
- Detection of relationships within the text
  - e.g. which tax amount belongs to which price
- Metadata generation
  - e.g. where exactly on the receipt the price is shown
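
A minimal sketch of how the relationship and metadata output could be read, assuming the boto3 `analyze_document` call with `FeatureTypes=["FORMS"]` and its `Blocks` response shape. The helper names (`analyze_form`, `block_text`, `key_value_pairs`) are illustrative, not part of any AWS SDK.

```python
def analyze_form(image_bytes, client=None):
    """Call Textract synchronously on raw image bytes (needs AWS credentials)."""
    if client is None:
        import boto3  # imported lazily so the parsing helpers work offline
        client = boto3.client("textract")
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["FORMS"],
    )


def block_text(block, blocks_by_id):
    """Join the text of all WORD children of a block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(response):
    """Pair each KEY block with its VALUE block and keep the key's position."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = []
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":  # relationship from key to its value
                for value_id in rel["Ids"]:
                    value_parts.append(block_text(blocks_by_id[value_id], blocks_by_id))
        pairs.append({
            "key": block_text(block, blocks_by_id),
            "value": " ".join(value_parts),
            # Metadata: bounding box as ratios of page width and height
            "box": block.get("Geometry", {}).get("BoundingBox"),
        })
    return pairs
```

Calling `key_value_pairs(analyze_form(image_bytes))` on a scanned receipt would then yield entries such as a tax label paired with its amount, together with where the label sits on the page.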
Features
- Several supported input formats: JPEG, PNG, PDF, TIFF
- Synchronous (real-time) operation for most documents
  - Large documents (e.g. big PDFs) → asynchronous
- Pay-per-use (with custom pricing available for large volume)
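
The asynchronous path for large documents can be sketched as start-then-poll, assuming the boto3 calls `start_document_text_detection` and `get_document_text_detection` with a document already stored in S3. The function name `detect_text_async`, the polling interval and the retry limit are illustrative assumptions; a production setup would more likely rely on a completion notification than on polling.

```python
import time


def detect_text_async(client, bucket, key, poll_seconds=5.0, max_polls=60):
    """Start an asynchronous text-detection job and collect all result blocks.

    `client` is expected to behave like boto3.client("textract").
    """
    job_id = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or we give up.
    for _ in range(max_polls):
        result = client.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    # Anything other than SUCCEEDED (e.g. FAILED) is treated as an error here.
    if result["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended as {result['JobStatus']}")

    # Results are paginated; follow NextToken until it disappears.
    blocks = list(result["Blocks"])
    while "NextToken" in result:
        result = client.get_document_text_detection(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result["Blocks"])
    return blocks
```

For small images the synchronous `detect_document_text` call returns the blocks directly, so none of this job handling is needed.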
Use cases
- Document analysis (names, addresses, birth dates…)
- Receipt analysis (prices, vendor, line items, dates…)
- Identity documents (fields abstracted across document types, e.g. PassportID…)
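
The receipt and identity use cases map onto dedicated calls. The sketch below assumes the response shapes of boto3's `analyze_expense` (`ExpenseDocuments`) and `analyze_id` (`IdentityDocuments`); the helper names `receipt_summary` and `id_fields` are illustrative, and the field type names in the usage example are placeholders rather than a complete list.

```python
def _fields_to_dict(fields):
    """Map each field's type label to its detected text.

    Fields sharing a type label overwrite each other in this simple sketch.
    """
    return {
        f["Type"]["Text"]: f.get("ValueDetection", {}).get("Text", "")
        for f in fields
    }


def receipt_summary(response):
    """Flatten an analyze_expense-style response into summary and line items."""
    receipts = []
    for doc in response.get("ExpenseDocuments", []):
        line_items = []
        for group in doc.get("LineItemGroups", []):
            for item in group.get("LineItems", []):
                line_items.append(_fields_to_dict(item.get("LineItemExpenseFields", [])))
        receipts.append({
            "summary": _fields_to_dict(doc.get("SummaryFields", [])),
            "line_items": line_items,
        })
    return receipts


def id_fields(response):
    """Flatten an analyze_id-style response into one dict per identity document."""
    return [
        _fields_to_dict(doc.get("IdentityDocumentFields", []))
        for doc in response.get("IdentityDocuments", [])
    ]
```

With a real client, the inputs would come from `client.analyze_expense(Document={"Bytes": image_bytes})` and `client.analyze_id(DocumentPages=[{"Bytes": image_bytes}])` respectively.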