Ref: https://learn.cantrill.io/courses/1820301/lectures/42176882
Amazon Textract - Key Concepts
- 🔧 Detect, extract and analyze text contained in input documents
- Common documents: receipts, invoices, medical records, identity documents…
- Outputs extracted text with structure and analysis
- Functionality
- Detection of text
- e.g. scanned invoice or receipt → detects prices, products, dates…
- Text can be computer generated or handwritten
- Detects relationships in the text
- e.g. which tax amount is associated with which price
- Metadata generation
- e.g. where exactly on the receipt the price is shown
- Features
- Several supported input formats: JPEG, PNG, PDF, TIFF
- Most documents are processed synchronously (near real-time)
- Large documents (e.g. big PDFs) → asynchronous
- Pay-per-use (with custom pricing available for large volume)
- Use cases
- Document analysis (names, addresses, birth dates…)
- Receipt analysis (prices, vendor, line items, dates…)
- Identity documents (abstract fields e.g. PassportID…)
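
The notes above mention text detection, position metadata, and the synchronous/asynchronous split. A minimal Python sketch of how these might look with boto3 follows. The helper names (`extract_lines`, `detect_text_sync`, `start_text_detection_async`) and `SAMPLE_RESPONSE` are hypothetical and not from the lecture; the sample data is hand-written to mimic the shape of a Textract response, not real service output. The two boto3 wrappers assume AWS credentials and a configured region.

```python
from __future__ import annotations

from typing import Any


def extract_lines(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect LINE blocks (text, confidence, position) in top-to-bottom order."""
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        box = block["Geometry"]["BoundingBox"]  # ratios of page width/height
        lines.append({
            "text": block["Text"],
            "confidence": block["Confidence"],
            "left": box["Left"],
            "top": box["Top"],
        })
    # Sort by vertical, then horizontal position to approximate reading order.
    return sorted(lines, key=lambda line: (line["top"], line["left"]))


def detect_text_sync(image_bytes: bytes) -> dict[str, Any]:
    """Synchronous detection for a single-page document passed as raw bytes."""
    import boto3  # lazy import: the parsing helper above works without AWS access

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_bytes})


def start_text_detection_async(bucket: str, key: str) -> str:
    """Start an asynchronous job for a large document stored in S3; returns the job ID."""
    import boto3

    client = boto3.client("textract")
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Results are fetched later with client.get_document_text_detection(JobId=...).
    return job["JobId"]


# Hand-written illustration of the response shape (assumption, not real output).
SAMPLE_RESPONSE = {
    "DocumentMetadata": {"Pages": 1},
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {
            "BlockType": "LINE", "Id": "l2", "Text": "TOTAL 12.50", "Confidence": 98.7,
            "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.8, "Width": 0.4, "Height": 0.05}},
        },
        {
            "BlockType": "WORD", "Id": "w1", "Text": "TOTAL", "Confidence": 99.1,
            "Geometry": {"BoundingBox": {"Left": 0.1, "Top": 0.8, "Width": 0.2, "Height": 0.05}},
        },
        {
            "BlockType": "LINE", "Id": "l1", "Text": "ACME Store", "Confidence": 99.5,
            "Geometry": {"BoundingBox": {"Left": 0.3, "Top": 0.1, "Width": 0.4, "Height": 0.05}},
        },
    ],
}

if __name__ == "__main__":
    for line in extract_lines(SAMPLE_RESPONSE):
        print(f'{line["text"]!r} at top={line["top"]}, left={line["left"]}')
```

For receipts and identity documents, boto3 also exposes `analyze_expense` and `analyze_id` on the same client; their responses have a different structure and would need their own parsing.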