Pricing modes
On-Demand
- Pay-as-you-go → no long-term commitment
- Great for unpredictable workloads
- ‼️ Works with Base Models only!
- Model charges:
  - Text Models – charged for every input/output token processed
  - Embedding Models – charged for every input token processed
  - Image Models – charged for every image generated
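
The three charging rules above can be sketched as simple arithmetic. This is only an illustration: the class and function names are made up for these notes, and prices are passed in as parameters because real rates vary by model and provider. Quoting prices per 1,000 tokens is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextModelPrice:
    """Hypothetical per-1,000-token prices in USD (placeholders, not real rates)."""
    input_per_1k: float
    output_per_1k: float


def text_cost(price: TextModelPrice, input_tokens: int, output_tokens: int) -> float:
    # Text models: both input and output tokens are billed.
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


def embedding_cost(input_per_1k: float, input_tokens: int) -> float:
    # Embedding models: only input tokens are billed.
    return (input_tokens / 1000) * input_per_1k


def image_cost(price_per_image: float, images: int) -> float:
    # Image models: billed per generated image.
    return price_per_image * images
```

For example, with a made-up price of 0.5 per 1,000 input tokens and 1.5 per 1,000 output tokens, a request with 2,000 input tokens and 1,000 output tokens would cost 2.5 under this sketch.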
Batch
- Multiple predictions at a time; the output is a single file in S3
- Discounts of up to 50%
- 💡 Responses are no longer returned in real time, but the discount can be significant
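
To make the "up to 50%" figure concrete, here is a rough sketch of how a batch discount would apply to an on-demand estimate. The function name and the default discount are assumptions for illustration; the actual discount depends on the model and may be lower than the maximum.

```python
def batch_cost(on_demand_cost: float, discount: float = 0.5) -> float:
    """Apply a batch discount to an on-demand cost estimate.

    `discount` is a fraction between 0.0 and 0.5, since these notes
    state that batch discounts go up to 50%.
    """
    if not 0.0 <= discount <= 0.5:
        raise ValueError("discount must be between 0.0 and 0.5")
    return on_demand_cost * (1.0 - discount)
```

A workload estimated at 100.0 on-demand would come to 50.0 at the maximum discount, or 75.0 at a 25% discount.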
Provisioned Throughput
- Reserves throughput/capacity for a certain time (1 month, 6 months…)
- Throughput = max number of input/output tokens processed per minute
- ‼️ Required for Fine-tuned and Custom Models!
- Base Models can also use Provisioned Throughput, but it is not required
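
Since throughput is defined above as tokens processed per minute, one way to sanity-check a reservation is to compare the workload's expected tokens per minute against the reserved figure. This is a simplified sketch: the function names are hypothetical, and it treats input and output tokens as one combined budget, which may not match how a given model's capacity is actually measured.

```python
def required_tokens_per_minute(
    requests_per_minute: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
) -> int:
    # Total tokens (input + output) the workload processes each minute.
    return requests_per_minute * (avg_input_tokens + avg_output_tokens)


def fits_reserved_throughput(required_tpm: int, reserved_tpm: int) -> bool:
    # True when the workload stays within the reserved tokens-per-minute budget.
    return required_tpm <= reserved_tpm
```

For instance, 10 requests per minute averaging 500 input and 300 output tokens need 8,000 tokens per minute, which fits a reservation of 10,000 but not one of 5,000.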
Other cost considerations
- Number of input & output tokens → main driver of cost
- Model size → usually a smaller model is cheaper
  - Varies by provider
  - Smaller models also have fewer capabilities and less capacity