diff --git a/07_ensemble_learning_and_random_forests.ipynb b/07_ensemble_learning_and_random_forests.ipynb
index 4f88ecf..db3f001 100644
--- a/07_ensemble_learning_and_random_forests.ipynb
+++ b/07_ensemble_learning_and_random_forests.ipynb
@@ -682,7 +682,7 @@
     "\n",
     "m = len(X_train)\n",
     "\n",
-    "fix, axes = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)\n",
+    "fig, axes = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)\n",
     "for subplot, learning_rate in ((0, 1), (1, 0.5)):\n",
     "    sample_weights = np.ones(m) / m\n",
     "    plt.sca(axes[subplot])\n",
@@ -773,7 +773,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's create a simple quadratic dataset:"
+    "Let's create a simple quadratic dataset and fit a `DecisionTreeRegressor` to it:"
    ]
   },
   {
@@ -808,7 +808,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Now let's train a decision tree regressor on this dataset:"
+    "Now let's train another decision tree regressor on the residual errors made by the previous predictor:"
    ]
   },
   {
@@ -1045,7 +1045,7 @@
    "source": [
     "# extra code – this cell generates and saves Figure 7–10\n",
     "\n",
-    "fix, axes = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)\n",
+    "fig, axes = plt.subplots(ncols=2, figsize=(10, 4), sharey=True)\n",
     "\n",
     "plt.sca(axes[0])\n",
     "plot_predictions([gbrt], X, y, axes=[-0.5, 0.5, -0.1, 0.8], style=\"r-\",\n",
@@ -1251,8 +1251,8 @@
     "3. It is quite possible to speed up training of a bagging ensemble by distributing it across multiple servers, since each predictor in the ensemble is independent of the others. The same goes for pasting ensembles and Random Forests, for the same reason. However, each predictor in a boosting ensemble is built based on the previous predictor, so training is necessarily sequential, and you will not gain anything by distributing training across multiple servers. Regarding stacking ensembles, all the predictors in a given layer are independent of each other, so they can be trained in parallel on multiple servers. However, the predictors in one layer can only be trained after the predictors in the previous layer have all been trained.\n",
     "4. With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on (they were held out). This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.\n",
     "5. When you are growing a tree in a Random Forest, only a random subset of the features is considered for splitting at each node. This is true as well for Extra-Trees, but they go one step further: rather than searching for the best possible thresholds, like regular Decision Trees do, they use random thresholds for each feature. This extra randomness acts like a form of regularization: if a Random Forest overfits the training data, Extra-Trees might perform better. Moreover, since Extra-Trees don't search for the best possible thresholds, they are much faster to train than Random Forests. However, they are neither faster nor slower than Random Forests when making predictions.\n",
-    "6. If your AdaBoost ensemble underfits the training data, you can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator. You may also try slightly increasing the learning rate.\n",
-    "7. If your Gradient Boosting ensemble overfits the training set, you should try decreasing the learning rate. You could also use early stopping to find the right number of predictors (you probably have too many)."
+    "6. If your AdaBoost ensemble underfits the training data, you can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator. You may also try slightly increasing the learning rate.\n",
+    "7. If your Gradient Boosting ensemble overfits the training set, you should try decreasing the learning rate. You could also use early stopping to find the right number of predictors (you probably have too many)."
    ]
   },
   {
@@ -1348,7 +1348,7 @@
     {
      "data": {
       "text/plain": [
-       "[0.9736, 0.9743, 0.8662, 0.9666]"
+       "[0.9736, 0.9743, 0.8662, 0.966]"
       ]
      },
      "execution_count": 44,
@@ -1440,7 +1440,7 @@
     {
      "data": {
       "text/plain": [
-       "0.9749"
+       "0.9758"
       ]
      },
      "execution_count": 49,
@@ -1502,7 +1502,7 @@
     {
      "data": {
       "text/plain": [
-       "[0.9736, 0.9743, 0.8662, 0.9666]"
+       "[0.9736, 0.9743, 0.8662, 0.966]"
       ]
      },
      "execution_count": 52,
@@ -1662,7 +1662,7 @@
     {
      "data": {
       "text/plain": [
-       "0.9761"
+       "0.9769"
       ]
      },
      "execution_count": 58,
@@ -1698,7 +1698,7 @@
     {
      "data": {
       "text/plain": [
-       "0.9711"
+       "0.9724"
       ]
      },
      "execution_count": 60,
@@ -1732,7 +1732,7 @@
     {
      "data": {
       "text/plain": [
-       "0.973"
+       "0.9727"
       ]
      },
      "execution_count": 61,
@@ -1753,7 +1753,7 @@
     {
      "data": {
       "text/plain": [
-       "[0.968, 0.9703, 0.9641]"
+       "[0.968, 0.9703, 0.965]"
       ]
      },
      "execution_count": 62,
@@ -1855,7 +1855,7 @@
     {
      "data": {
       "text/plain": [
-       "0.9735"
+       "0.9722"
       ]
      },
      "execution_count": 66,
@@ -1910,7 +1910,7 @@
     {
      "data": {
       "text/plain": [
-       "0.9694"
+       "0.9705"
       ]
      },
      "execution_count": 69,
@@ -1926,7 +1926,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This stacking ensemble does not perform as well as the voting classifier we trained earlier, and it's even very slightly worse than the best individual classifier."
+    "This stacking ensemble does not perform as well as the voting classifier we trained earlier."
    ]
   },
   {
@@ -2006,7 +2006,7 @@
     {
      "data": {
       "text/plain": [
-       "0.9785"
+       "0.9784"
       ]
      },
      "execution_count": 72,