ML Inference - basic concept and types
- Inference = a trained ML model makes a prediction (output) on new data
Real-Time Inferencing
- The model has to make decisions quickly as data arrives
- Speed (low latency) is preferred over perfect accuracy → synchronous results
- Example: chatbots
- Diagram
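A minimal sketch of synchronous (real-time) inference, using a trivial stand-in chatbot "model"; the function names and intent labels are illustrative assumptions, not from any particular framework:

```python
import time

# Hypothetical stand-in model: classifies a chat message into an intent.
# A real deployment would invoke a trained model here instead.
def predict_intent(message: str) -> str:
    text = message.lower()
    if "refund" in text:
        return "billing"
    if "hello" in text or "hi" in text:
        return "greeting"
    return "other"

def handle_request(message: str) -> dict:
    # Synchronous (real-time) inference: the caller blocks until the
    # prediction comes back, so latency must stay low.
    start = time.perf_counter()
    intent = predict_intent(message)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"intent": intent, "latency_ms": latency_ms}

print(handle_request("hello, I need a refund")["intent"])  # billing
```

The key property is that the result is returned in the same request/response cycle, which is why latency dominates accuracy in this setting.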
Batch Inferencing
- A large amount of data is analyzed all at once
- Speed of the results is usually not a concern (high latency is acceptable), but accuracy is → asynchronous results
- Often used for data analysis
- Diagram
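By contrast, a batch job scores the whole dataset in one pass and stores the results for later analysis instead of returning them to a waiting user. A minimal sketch, with a made-up threshold "model":

```python
# Hypothetical stand-in model: scores one record.
def predict(score: float) -> str:
    return "high" if score > 0.5 else "low"

def run_batch(records: list) -> list:
    # Asynchronous (batch) inference: no user is waiting on each result,
    # so the entire dataset is processed together and results are
    # collected for downstream data analysis.
    return [predict(r) for r in records]

results = run_batch([0.1, 0.9, 0.7])
print(results)  # ['low', 'high', 'high']
```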
Inferencing at the Edge
- Edge device: a device with generally less computing power, located close to where data is generated, often with limited internet connectivity
- Small Language Model (SLM) on an edge device
- Very low latency, low compute footprint
- Offline capability, local inference
- Example: SLM on a Raspberry Pi
- Large Language Model (LLM) on a remote server
- More powerful model
- Higher latency
- Must be online to be accessed, remote inference
- Example: a Raspberry Pi accesses an LLM on a remote server
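The two edge options above can be contrasted in a short sketch. Everything here is an illustrative assumption: the "SLM" is a trivial keyword matcher standing in for a small on-device model, and `ENDPOINT` is a made-up URL with an injectable `send` function standing in for the network call to a remote LLM:

```python
import json

def slm_local_infer(command: str) -> str:
    # On-device "SLM": runs entirely locally, so it works offline
    # with very low latency and a small compute footprint.
    for keyword, action in {"on": "lights_on", "off": "lights_off"}.items():
        if keyword in command.lower():
            return action
    return "unknown"

ENDPOINT = "https://example.com/llm/generate"  # hypothetical remote server

def llm_remote_infer(prompt: str, send=None):
    # Remote LLM: a more powerful model, but the device must be online
    # and every request pays a network round-trip (higher latency).
    if send is None:
        raise RuntimeError("offline: remote LLM unreachable")
    return send(ENDPOINT, json.dumps({"prompt": prompt}))

print(slm_local_infer("turn the lights on"))  # lights_on
reply = llm_remote_infer("hi", send=lambda url, body: "echo: " + json.loads(body)["prompt"])
print(reply)  # echo: hi
```

The trade-off mirrors the notes: the local path keeps working without connectivity but is limited by the device, while the remote path gains model capability at the cost of latency and an online requirement.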