Fix typos in ML project checklist and requirements
parent ec67af216b
commit 248c9a78d7
@@ -75,7 +75,7 @@ Notes:
   - Fill in missing values (e.g., with zero, mean, median...) or drop their rows (or columns).
 2. Feature selection (optional):
   - Drop the attributes that provide no useful information for the task.
-3. Feature engineering, where appropriates:
+3. Feature engineering, where appropriate:
   - Discretize continuous features.
   - Decompose features (e.g., categorical, date/time, etc.).
   - Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.).
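
The preparation steps in this hunk map directly onto scikit-learn transformers. Below is a minimal sketch of that mapping; the column names ("age", "income", "category") and bin count are illustrative assumptions, not code from this repository.

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, KBinsDiscretizer, OneHotEncoder

preprocessing = ColumnTransformer([
    # Fill in missing values with the median, then discretize into 5 bins.
    ("age", make_pipeline(SimpleImputer(strategy="median"),
                          KBinsDiscretizer(n_bins=5, encode="onehot-dense")),
     ["age"]),
    # Add a promising transformation of a skewed feature: log(1 + x).
    ("income", make_pipeline(SimpleImputer(strategy="median"),
                             FunctionTransformer(np.log1p)),
     ["income"]),
    # Decompose a categorical feature into one-hot columns.
    ("category", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])
# X_prepared = preprocessing.fit_transform(X_train)  # X_train: a DataFrame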
@@ -104,8 +104,8 @@ Notes:
 
 1. Fine-tune the hyperparameters using cross-validation.
   - Treat your data transformation choices as hyperparameters, especially when you are not sure about them (e.g., should I replace missing values with zero or the median value? Or just drop the rows?).
-  - Unless there are very few hyperparamter values to explore, prefer random search over grid search. If training is very long, you may prefer a Bayesian optimization approach (e.g., using a Gaussian process priors, as described by Jasper Snoek, Hugo Larochelle, and Ryan Adams ([https://goo.gl/PEFfGr](https://goo.gl/PEFfGr)))
-2. Try Ensemble methods. Combining your best models will often perform better than running them invdividually.
+  - Unless there are very few hyperparameter values to explore, prefer random search over grid search. If training is very long, you may prefer a Bayesian optimization approach (e.g., using a Gaussian process priors, as described by Jasper Snoek, Hugo Larochelle, and Ryan Adams ([https://goo.gl/PEFfGr](https://goo.gl/PEFfGr)))
+2. Try Ensemble methods. Combining your best models will often perform better than running them individually.
 3. Once you are confident about your final model, measure its performance on the test set to estimate the generalization error.
 
 > Don't tweak your model after measuring the generalization error: you would just start overfitting the test set.
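
Step 1 of this hunk (treating data transformation choices as hyperparameters, and preferring random search over grid search) can be sketched with a pipeline whose imputer strategy is searched alongside the model's hyperparameters. The model choice, parameter ranges, and variable names below are illustrative assumptions.

from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("imputer", SimpleImputer()),
    ("model", RandomForestRegressor(random_state=42)),
])
param_distributions = {
    # The transformation choice itself is a hyperparameter:
    # fill missing values with the mean, the median, or a constant (0)?
    "imputer__strategy": ["mean", "median", "constant"],
    "model__n_estimators": randint(50, 500),
    "model__max_depth": randint(3, 20),
}
search = RandomizedSearchCV(pipeline, param_distributions, n_iter=30, cv=5,
                            scoring="neg_root_mean_squared_error",
                            random_state=42)
# search.fit(X_train, y_train)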
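Steps 2 and 3 (combining your best models into an ensemble, then a single final measurement on the test set) could look like the following sketch; the three base models are placeholders for whichever models performed best in your own experiments.

from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Soft voting averages predicted probabilities, so every base model
# must support predict_proba (hence probability=True for the SVC).
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
], voting="soft")
# ensemble.fit(X_train, y_train)
# Final step, done once: estimate the generalization error on the test set.
# ensemble.score(X_test, y_test)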
@@ -125,5 +125,5 @@ Notes:
 2. Write monitoring code to check your system's live performance at regular intervals and trigger alerts when it drops.
   - Beware of slow degradation too: models tend to "rot" as data evolves.
   - Measuring performance may require a human pipeline (e.g., via a crowdsourcing service).
-  - Also monitor your inputs' quality (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale). This is particulary important for online learning systems.
+  - Also monitor your inputs' quality (e.g., a malfunctioning sensor sending random values, or another team's output becoming stale). This is particularly important for online learning systems.
 3. Retrain your models on a regular basis on fresh data (automate as much as possible).
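
Step 2 of this hunk (checking live performance at regular intervals and alerting on drops) might be sketched as below. The metric, threshold, and send_alert() hook are hypothetical placeholders, not part of this repository.

def send_alert(message: str) -> None:
    """Placeholder alerting hook; wire this to email, Slack, a pager, etc."""
    print(f"ALERT: {message}")

def check_live_performance(live_score: float, reference_score: float,
                           max_drop: float = 0.05) -> None:
    """Alert when the live score falls too far below the reference score.

    Catches sudden drops directly; running it at regular intervals also
    lets slow degradation ("rot") eventually cross the threshold.
    """
    if reference_score - live_score > max_drop:
        send_alert(f"live score {live_score:.3f} is more than {max_drop:.2f} "
                   f"below reference {reference_score:.3f}")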
@@ -16,7 +16,7 @@ scikit-learn~=1.0.2
 # Optional: the XGBoost library is only used in chapter 7
 xgboost~=1.5.0
 
-# Optional: the transformers library is only using in chapter 16
+# Optional: the transformers library is only used in chapter 16
 transformers~=4.16.2
 
 ##### TensorFlow-related packages
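
The `~=` pins in this hunk are compatible-release specifiers: xgboost~=1.5.0 allows any 1.5.x release at or above 1.5.0 but not 1.6, and transformers~=4.16.2 allows 4.16.x at or above 4.16.2 but not 4.17. A quick post-install sanity check (a sketch, assuming both optional packages were installed):

# Confirm the optional packages resolved to versions compatible with the pins.
import xgboost
import transformers

assert xgboost.__version__.startswith("1.5."), xgboost.__version__
assert transformers.__version__.startswith("4.16."), transformers.__version__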