Reinforcement Learning - Basic Concepts

Ref: https://www.udemy.com/course/aws-ai-practitioner-certified/learn/lecture/44886629


Reinforcement Learning from Human Feedback (RLHF)

Ref: https://www.udemy.com/course/aws-ai-practitioner-certified/learn/lecture/45375323

RLHF process (example: internal company knowledge chatbot)

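  1. Data collection: gather human-written prompts and example responses
  2. Supervised fine-tuning: fine-tune a pretrained language model on that data
  3. Build a separate reward model: train it on human preference rankings of responses to the same prompt
  4. Optimize the fine-tuned model using the reward model's scores as the reinforcement learning reward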
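
The four steps above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not how a production system is built: the candidate responses, their two features, and the preference pairs are all hypothetical, the "policy" is just a probability distribution over three canned answers rather than a language model, the supervised fine-tuning step is replaced by a uniform starting policy, and the optimization step uses a plain exact policy-gradient update (real RLHF setups commonly use algorithms such as PPO instead).

```python
import math

# Hypothetical candidate answers to one prompt, with toy features:
# (grounded_in_company_docs, polite). All names and values are illustrative.
RESPONSES = {
    "cites_policy_doc": (1.0, 1.0),
    "vague_guess": (0.0, 1.0),
    "rude_and_wrong": (0.0, 0.0),
}
NAMES = list(RESPONSES)

# Step 1: data collection. Human labels as (preferred, rejected) pairs.
PREFERENCES = [
    ("cites_policy_doc", "vague_guess"),
    ("cites_policy_doc", "rude_and_wrong"),
    ("vague_guess", "rude_and_wrong"),
]


def softmax(logits):
    """Turn logits into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(weights, name):
    """Reward model: a linear score over the toy features of a response."""
    return sum(w * f for w, f in zip(weights, RESPONSES[name]))


def train_reward_model(pairs, epochs=200, lr=0.5):
    """Step 3: fit the reward model with a pairwise (Bradley-Terry) loss."""
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in pairs:
            diff = reward(weights, good) - reward(weights, bad)
            p_good = 1.0 / (1.0 + math.exp(-diff))
            # Gradient ascent on log(sigmoid(diff)).
            for i in range(len(weights)):
                delta = RESPONSES[good][i] - RESPONSES[bad][i]
                weights[i] += lr * (1.0 - p_good) * delta
    return weights


def optimize_policy(logits, weights, steps=200, lr=0.5):
    """Step 4: raise the policy's expected reward under the reward model."""
    logits = list(logits)
    rewards = [reward(weights, n) for n in NAMES]
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward with respect to each logit.
        for j in range(len(logits)):
            logits[j] += lr * probs[j] * (rewards[j] - baseline)
    return logits


# Step 2 stand-in: assume the fine-tuned model starts with no preference.
sft_logits = [0.0] * len(NAMES)

reward_weights = train_reward_model(PREFERENCES)
final_logits = optimize_policy(sft_logits, reward_weights)
final_probs = dict(zip(NAMES, softmax(final_logits)))

for name, prob in final_probs.items():
    print(f"{name}: {prob:.3f}")
```

The reward model never sees the policy and the policy never sees the human labels directly: human preferences shape the reward model, and the reward model then scores whatever the policy produces. That separation is the reason step 3 builds a separate model.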