Good data
- To train an ML model we must have good training data
- 💡 Garbage in → Garbage out
- ‼️ Most critical stage in building a good model!
- What is good data? It depends on the scenario and requirements!
- There are several ways to model our data, and the choice affects which types of algorithms we can use to train our models
- Labeled vs. Unlabeled Data
- Structured vs. Unstructured Data
Labeled vs Unlabeled Data
Labeled Data
- Data includes both input objects and corresponding output labels
- Example
Unlabeled Data
- Data includes only input objects (without any output labels)
- Example
- 💡 Models trained with labeled data are generally more accurate than those trained with unlabeled data. So why use unlabeled data? Labeling large amounts of data is often slow and expensive, so it can be more efficient to use unsupervised learning, which works without labels!
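To make the distinction concrete, here is a minimal sketch in plain Python. The animal measurements, the names `labeled_data` and `unlabeled_data`, and the helper `predict_nearest` are all hypothetical and chosen only for illustration. The 1-nearest-neighbor lookup is just one simple way to show what the labels make possible, not a recommended method.

```python
# Hypothetical toy data: each input object is (height_cm, weight_kg).
# Labeled data pairs every input object with an output label.
labeled_data = [
    ((30.0, 4.0), "cat"),
    ((25.0, 3.5), "cat"),
    ((60.0, 25.0), "dog"),
    ((55.0, 20.0), "dog"),
]

# Unlabeled data holds only the input objects, with no output labels.
unlabeled_data = [features for features, _label in labeled_data]


def predict_nearest(labeled, query):
    """Return the label of the labeled input closest to `query`.

    Uses squared Euclidean distance (a simple 1-nearest-neighbor lookup).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _features, label = min(labeled, key=lambda pair: sq_dist(pair[0], query))
    return label


# With labels we can predict an output for a new input object.
prediction = predict_nearest(labeled_data, (28.0, 4.2))
print(prediction)
```

With `unlabeled_data` alone there is nothing to return as a prediction. An algorithm can only look for structure among the inputs themselves, such as groups of similar points.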
Structured vs Unstructured Data
Structured Data
- Data follows a predefined structure (like in a database)
- Examples
- Tabular Data: tables with rows and columns
- Time Series Data: sequential datapoints in time
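The two examples above can be sketched in plain Python. The column names, rows, and measurements below are made up for illustration, and real projects would more likely use a dataframe library. The point is only the shape of the data.

```python
from datetime import date

# Tabular data: every row has the same columns (hypothetical schema).
columns = ("name", "age", "city")
table = [
    ("Ana", 34, "Lisbon"),
    ("Ben", 28, "Porto"),
]

# Because the structure is fixed, a column can be selected by name.
age_index = columns.index("age")
ages = [row[age_index] for row in table]

# Time series data: datapoints ordered by their timestamp.
series = [
    (date(2024, 1, 1), 10.5),
    (date(2024, 1, 2), 11.0),
    (date(2024, 1, 3), 10.8),
]

# Check that each timestamp comes strictly after the previous one.
is_chronological = all(a[0] < b[0] for a, b in zip(series, series[1:]))

print(ages)
print(is_chronological)
```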