Amazon Polly 101

🔧 Text-to-Speech (TTS) service
- ❗ NO translation done in the conversion, same language!
Voice engines:
1. Standard TTS = Concatenative (concatenates phonemes)
  - 💡 Phoneme = smallest unit of sound in a language
2. Neural TTS = Phonemes → spectrograms → vocoder → audio
  - Much more human/natural sounding, but much more complex and computationally heavy
3. Newer engines: long-form, generative (more powerful)
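A minimal sketch of how the engine choice shows up in a request, assuming Python and boto3's `synthesize_speech` call. The helper names `build_request` and `synthesize_to_file` and the voice `Joanna` are illustrative assumptions, not from these notes. Not every voice supports every engine.

```python
# Engine identifiers accepted by the Polly SynthesizeSpeech API.
ENGINES = ("standard", "neural", "long-form", "generative")


def build_request(text, engine="neural", voice_id="Joanna", output_format="mp3"):
    """Build keyword arguments for polly.synthesize_speech (hypothetical helper)."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return {
        "Text": text,
        "Engine": engine,
        "VoiceId": voice_id,  # assumption: any voice available for the chosen engine
        "OutputFormat": output_format,
    }


def synthesize_to_file(text, path, **kwargs):
    """Call Polly and write the audio to disk. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so build_request works without boto3 installed

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_request(text, **kwargs))
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
```

Keeping request construction separate from the network call makes the engine selection easy to check without touching AWS.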
Features:
- Integration with other services and apps
  - e.g. WordPress plugin to read WordPress articles out loud
- Many output formats supported (MP3, PCM, Ogg Vorbis…)
- Supports Speech Synthesis Markup Language (SSML) → markup tags provide additional control over how speech is generated
  - e.g. emphasis, pronunciation, whispering, over-exaggerated “newscaster” speaking style
- Lexicons: define how to read certain specific text
  - e.g. “AWS → Amazon Web Services”
- Speech marks: metadata describing where each sentence/word starts and ends in the audio
  - helpful for e.g. lip-syncing or word highlighting
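The SSML and speech mark features above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, the sample speech mark payload in the usage note is hand-written rather than real Polly output, and tag support varies by engine (the whispered effect, for instance, is not available on every engine).

```python
import json
from xml.sax.saxutils import escape


def build_ssml(plain, emphasized, whispered):
    """Wrap three text fragments in SSML: normal, emphasized, whispered."""
    return (
        "<speak>"
        f"{escape(plain)} "
        f'<emphasis level="strong">{escape(emphasized)}</emphasis> '
        f'<amazon:effect name="whispered">{escape(whispered)}</amazon:effect>'
        "</speak>"
    )


def parse_speech_marks(payload):
    """Parse speech marks, which arrive as one JSON object per line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def word_timings(marks):
    """Return (word, start_ms) pairs, e.g. to drive word highlighting."""
    return [(m["value"], m["time"]) for m in marks if m["type"] == "word"]
```

To send SSML, the request is made with `TextType="ssml"`. To get speech marks instead of audio, the request uses `OutputFormat="json"` together with `SpeechMarkTypes` (for example `["sentence", "word"]`). A hand-written payload such as `{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}` would yield `[("Hello", 6)]` from `word_timings`, where `time` is the offset in milliseconds from the start of the audio.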