Amazon Transcribe

Amazon Transcribe 101

🔧 Automatic Speech Recognition (ASR) service
- Audio (input) → Text (output)
- Pay-per-use: billed per second of transcribed audio
- 💡 ASR is a deep learning (DL) process
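
A minimal sketch of the audio → text flow as a batch job request, assuming the audio file already sits in S3. The job name, bucket path, and helper function names are hypothetical; `start_transcription_job` and its `TranscriptionJobName`, `Media`, and `LanguageCode` parameters belong to the boto3 Transcribe client.

```python
def build_basic_job(job_name: str, media_uri: str, language_code: str = "en-US") -> dict:
    """Build keyword arguments for boto3's Transcribe start_transcription_job call."""
    return {
        "TranscriptionJobName": job_name,        # must be unique per account and region
        "Media": {"MediaFileUri": media_uri},    # S3 location of the input audio
        "LanguageCode": language_code,
    }


def start_job(request: dict) -> dict:
    """Submit the request. Needs boto3 and AWS credentials, so boto3 is imported lazily."""
    import boto3

    client = boto3.client("transcribe")
    return client.start_transcription_job(**request)


# Hypothetical job name and S3 path, for illustration only.
request = build_basic_job("demo-job", "s3://example-bucket/audio/meeting.mp3")
```

Calling `start_job(request)` would submit the job; the job runs asynchronously, so the transcript is fetched later (for example by polling `get_transcription_job`).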
Features:
- Automatic language identification for multi-lingual audio
- Language customization (custom vocabularies, custom language models)
- Filters for privacy
  - e.g. filter out PII (identify and redact PII)
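- Audience-appropriate language
  - Vocabulary filtering: mask or remove unwanted words (e.g. profanity)
- Speaker identification
  - Speaker diarization: labels who spoke when (spk_0, spk_1, ...), not who the speaker is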
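
A rough sketch of how the features above could map onto batch job parameters. The helper name, filter name, and S3 paths are hypothetical; the parameter names (`IdentifyMultipleLanguages`, `ContentRedaction`, `Settings.VocabularyFilterName`, `Settings.ShowSpeakerLabels`) are from boto3's `start_transcription_job`. Feature availability varies by language, so treat the combinations here as assumptions to check against the docs.

```python
def build_feature_job(job_name, media_uri, *, language_code=None,
                      redact_pii=False, vocabulary_filter=None, max_speakers=None):
    """Build start_transcription_job kwargs with optional Transcribe features."""
    request = {"TranscriptionJobName": job_name, "Media": {"MediaFileUri": media_uri}}

    if language_code:
        request["LanguageCode"] = language_code
    else:
        # No language given: let Transcribe identify the languages spoken.
        request["IdentifyMultipleLanguages"] = True

    settings = {}
    if vocabulary_filter:
        if not language_code:
            # This sketch only handles the fixed-language form of the filter setting.
            raise ValueError("vocabulary_filter requires language_code in this sketch")
        settings["VocabularyFilterName"] = vocabulary_filter
        settings["VocabularyFilterMethod"] = "mask"   # alternatives: "remove", "tag"
    if max_speakers:
        settings["ShowSpeakerLabels"] = True          # speaker diarization
        settings["MaxSpeakerLabels"] = max_speakers
    if settings:
        request["Settings"] = settings

    if redact_pii:
        request["ContentRedaction"] = {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",            # keep only the redacted transcript
        }
    return request


# Hypothetical names throughout.
multilingual = build_feature_job("job-a", "s3://example-bucket/call.wav", max_speakers=2)
private = build_feature_job("job-b", "s3://example-bucket/call.wav", language_code="en-US",
                            redact_pii=True, vocabulary_filter="family-friendly")
```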
Improve accuracy for domain-specific, non-standard terms with:
- Custom vocabularies (words)
- Custom language models (context)
  - Provide domain-specific text for Transcribe to learn context of the domain-specific words
- 💡 Use both for the highest transcription accuracy
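
A small sketch of attaching both accuracy aids to one job request. It assumes a custom vocabulary and a custom language model were already created in the account; their names, the helper name, and the S3 path are hypothetical. `Settings.VocabularyName` and `ModelSettings.LanguageModelName` are parameters of boto3's `start_transcription_job`.

```python
def with_custom_models(request: dict, vocabulary_name=None, language_model_name=None) -> dict:
    """Return a copy of the job request that references custom vocabulary and/or model."""
    updated = dict(request)
    if vocabulary_name:
        # Custom vocabulary: domain-specific words and phrases.
        updated["Settings"] = {**request.get("Settings", {}), "VocabularyName": vocabulary_name}
    if language_model_name:
        # Custom language model: trained on domain-specific text for context.
        updated["ModelSettings"] = {"LanguageModelName": language_model_name}
    return updated


base = {
    "TranscriptionJobName": "cardiology-notes",                   # hypothetical
    "Media": {"MediaFileUri": "s3://example-bucket/consult.wav"},  # hypothetical
    "LanguageCode": "en-US",
}
tuned = with_custom_models(base, vocabulary_name="cardio-terms",
                           language_model_name="cardio-clm")
```

The helper copies the request instead of mutating it, so the same base request can be reused to compare transcripts with and without the custom resources.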
Use cases:
- Full text indexing of audio → allows searching
- Meeting notes
- Subtitles/captions and transcripts
- Amazon Transcribe Call Analytics → Phone call analytics
  - Extracts call characteristics, summaries, categories, and sentiment
- Amazon Transcribe Medical
- Integration with other apps and services

Transcribe can detect and score voice-based toxicity
- Leverages both speech cues (tone, pitch) & text cues
Toxicity categories: sexual harassment, hate speech, threat, abuse, profanity, insult, graphic…
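
A hedged sketch of enabling toxicity detection on a batch job and picking out high-scoring segments. `ToxicityDetection` with `ToxicityCategories: ["ALL"]` is a boto3 `start_transcription_job` parameter. The transcript layout assumed below (a `results.toxicity_detection` list whose items carry `text`, `toxicity`, and per-category scores) is an assumption to check against a real output file. The sample segments, the 0.5 threshold, and all names are made up for illustration.

```python
def build_toxicity_job(job_name: str, media_uri: str) -> dict:
    """Build start_transcription_job kwargs with toxicity detection switched on."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": "en-US",
        "ToxicityDetection": [{"ToxicityCategories": ["ALL"]}],
    }


def flagged_segments(transcript: dict, threshold: float = 0.5) -> list:
    """Return (text, top_category) for segments whose overall score meets the threshold.

    Assumes the transcript JSON layout described in the paragraph above.
    """
    flagged = []
    for segment in transcript.get("results", {}).get("toxicity_detection", []):
        if segment["toxicity"] >= threshold:
            categories = segment["categories"]
            top_category = max(categories, key=categories.get)
            flagged.append((segment["text"], top_category))
    return flagged


# Illustrative data only, not real service output.
sample = {
    "results": {
        "toxicity_detection": [
            {"text": "segment one", "toxicity": 0.9,
             "categories": {"insult": 0.8, "profanity": 0.3}},
            {"text": "segment two", "toxicity": 0.1,
             "categories": {"insult": 0.05, "profanity": 0.02}},
        ]
    }
}
hits = flagged_segments(sample)
```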