Foreword to this Section

Architecting, building, and improving AI systems is far from trivial. This section includes common problems, challenges, and tradeoffs to watch out for in production-level AI systems. First we expose the challenge, then mention cloud-agnostic techniques to tackle it, and finally the AWS-specific architectures and solutions available to us (but it can all still blend together).


Token Efficiency

Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53684395

Token Efficiency Techniques

  1. Count tokens (duh!)
  2. Context window optimization/Context pruning (Input)
  3. Prompt Compression (Input)
  4. Response size controls/Response limiting (Output)

Token Efficiency in AWS


Cost-Effective Model Selection

Ref: https://www.udemy.com/course/ultimate-aws-certified-generative-ai-developer-professional/learn/lecture/53684401