Clarify the Decision Tree instability section, fixes #422

main
Aurélien Geron 2021-05-27 17:05:34 +12:00
parent 07bc7aff0a
commit 661d591b04
1 changed file with 12 additions and 17 deletions

@@ -206,7 +206,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# Sensitivity to training set details"
+"# High Variance"
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"We've seen that small changes in the dataset (such as a rotation) may produce a very different Decision Tree.\n",
+"Now let's show that training the same model on the same data may produce a very different model every time, since the CART training algorithm used by Scikit-Learn is stochastic. To show this, we will set `random_state` to a different value than earlier:"
+]
+},
{
@@ -215,7 +223,8 @@
"metadata": {},
"outputs": [],
"source": [
-"X[(X[:, 1]==X[:, 1][y==1].max()) & (y==1)] # widest Iris versicolor flower"
+"tree_clf_tweaked = DecisionTreeClassifier(max_depth=2, random_state=40)\n",
+"tree_clf_tweaked.fit(X, y)"
]
},
{
@@ -223,23 +232,9 @@
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
-"not_widest_versicolor = (X[:, 1]!=1.8) | (y==2)\n",
-"X_tweaked = X[not_widest_versicolor]\n",
-"y_tweaked = y[not_widest_versicolor]\n",
-"\n",
-"tree_clf_tweaked = DecisionTreeClassifier(max_depth=2, random_state=40)\n",
-"tree_clf_tweaked.fit(X_tweaked, y_tweaked)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 9,
-"metadata": {},
-"outputs": [],
-"source": [
"plt.figure(figsize=(8, 4))\n",
-"plot_decision_boundary(tree_clf_tweaked, X_tweaked, y_tweaked, legend=False)\n",
+"plot_decision_boundary(tree_clf_tweaked, X, y, legend=False)\n",
"plt.plot([0, 7.5], [0.8, 0.8], \"k-\", linewidth=2)\n",
"plt.plot([0, 7.5], [1.75, 1.75], \"k--\", linewidth=2)\n",
"plt.text(1.0, 0.9, \"Depth=0\", fontsize=15)\n",
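
The claim in the new markdown cell -- that Scikit-Learn's CART implementation is stochastic, so different `random_state` values may yield different trees while the same value always reproduces the same tree -- can be checked with a short standalone script. This is a sketch, not part of the commit; it assumes the same iris setup the notebook uses (petal length and width as the two features):

```python
# Sketch: Scikit-Learn's DecisionTreeClassifier inspects features in a random
# order, so ties between equally good splits can be broken differently per seed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X = X[:, 2:]  # petal length and width, as in the notebook

# Same seed -> identical tree (training is reproducible)
tree_a = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
tree_b = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
assert (tree_a.tree_.threshold == tree_b.tree_.threshold).all()

# Different seed -> possibly a different tree, which is what the commit's
# new cell with random_state=40 illustrates
tree_c = DecisionTreeClassifier(max_depth=2, random_state=40).fit(X, y)
```

Comparing `tree_a.tree_.threshold` with `tree_c.tree_.threshold` shows whether the two seeds happened to produce different split points on this dataset.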