ML Inference - basic concept and types
- Inference = a trained ML model makes a prediction (output) on new data
Real-Time Inferencing
- The model has to make decisions quickly as data arrives
- Speed (low latency) is preferred over perfect accuracy → synchronous results
- Example: chatbots
- Diagram
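A minimal sketch of synchronous (real-time) inference, using a trivial stand-in chatbot "model"; the function names and intent labels are illustrative assumptions, not from any particular framework:

```python
import time

# Hypothetical stand-in model: classifies a chat message into an intent.
# A real deployment would invoke a trained model here instead.
def predict_intent(message: str) -> str:
    text = message.lower()
    if "refund" in text:
        return "billing"
    if "hello" in text or "hi" in text:
        return "greeting"
    return "other"

def handle_request(message: str) -> dict:
    # Synchronous (real-time) inference: the caller blocks until the
    # prediction comes back, so latency must stay low.
    start = time.perf_counter()
    intent = predict_intent(message)
    latency_ms = (time.perf_counter() - start) * 1000
    return {"intent": intent, "latency_ms": latency_ms}

print(handle_request("hello, I need a refund")["intent"])  # billing
```

The key property is that the result is returned in the same request/response cycle, which is why latency dominates accuracy in this setting.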
Batch Inferencing
- A large amount of data is analyzed all at once
- Speed of the results is usually not a concern (high latency is acceptable), but accuracy is → asynchronous results
- Often used for data analysis
- Diagram
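By contrast, a batch job scores the whole dataset in one pass and stores the results for later analysis instead of returning them to a waiting user. A minimal sketch, with a made-up threshold "model":

```python
# Hypothetical stand-in model: scores one record.
def predict(score: float) -> str:
    return "high" if score > 0.5 else "low"

def run_batch(records: list) -> list:
    # Asynchronous (batch) inference: no user is waiting on each result,
    # so the entire dataset is processed together and results are
    # collected for downstream data analysis.
    return [predict(r) for r in records]

results = run_batch([0.1, 0.9, 0.7])
print(results)  # ['low', 'high', 'high']
```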
Inferencing at the Edge
- Edge device: a device with generally less computing power, located close to where data is generated, often with limited internet connectivity
- Small Language Model (SLM) on an edge device
- Very low latency, low compute footprint
- Offline capability, local inference
- Example: SLM on a Raspberry Pi
- Large Language Model (LLM) on a remote server
- More powerful model
- Higher latency
- Must be online to be accessed, remote inference
- Example: a Raspberry Pi accesses an LLM on a remote server
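The two edge options above can be contrasted in a short sketch. Everything here is an illustrative assumption: the "SLM" is a trivial keyword matcher standing in for a small on-device model, and `ENDPOINT` is a made-up URL with an injectable `send` function standing in for the network call to a remote LLM:

```python
import json

def slm_local_infer(command: str) -> str:
    # On-device "SLM": runs entirely locally, so it works offline
    # with very low latency and a small compute footprint.
    for keyword, action in {"on": "lights_on", "off": "lights_off"}.items():
        if keyword in command.lower():
            return action
    return "unknown"

ENDPOINT = "https://example.com/llm/generate"  # hypothetical remote server

def llm_remote_infer(prompt: str, send=None):
    # Remote LLM: a more powerful model, but the device must be online
    # and every request pays a network round-trip (higher latency).
    if send is None:
        raise RuntimeError("offline: remote LLM unreachable")
    return send(ENDPOINT, json.dumps({"prompt": prompt}))

print(slm_local_infer("turn the lights on"))  # lights_on
reply = llm_remote_infer("hi", send=lambda url, body: "echo: " + json.loads(body)["prompt"])
print(reply)  # echo: hi
```

The trade-off mirrors the notes: the local path keeps working without connectivity but is limited by the device, while the remote path gains model capability at the cost of latency and an online requirement.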