Rename sparse to sparse_output in OneHotEncoder, set numeric_only=True in corr(), set n_init=10 in KMeans

main
Aurélien Geron 2023-11-14 15:56:52 +13:00
parent 4dc4a21367
commit 1dd8dba21d
1 changed file with 22 additions and 7 deletions

@ -1199,13 +1199,20 @@
"## Looking for Correlations" "## Looking for Correlations"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: since Pandas 2.0.0, the `numeric_only` argument defaults to `False`, so we need to set it explicitly to `True` to avoid an error on the non-numeric `ocean_proximity` column."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 35, "execution_count": 35,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"corr_matrix = housing.corr()" "corr_matrix = housing.corr(numeric_only=True)"
] ]
}, },
{ {
@ -1337,7 +1344,7 @@
} }
], ],
"source": [ "source": [
"corr_matrix = housing.corr()\n", "corr_matrix = housing.corr(numeric_only=True)\n",
"corr_matrix[\"median_house_value\"].sort_values(ascending=False)" "corr_matrix[\"median_house_value\"].sort_values(ascending=False)"
] ]
}, },
@ -2551,7 +2558,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Alternatively, you can set `sparse=False` when creating the `OneHotEncoder`:" "Alternatively, you can set `sparse_output=False` when creating the `OneHotEncoder` (note: the `sparse` hyperparameter was renamed to `sparse_output` in Scikit-Learn 1.2):"
] ]
}, },
{ {
@ -2577,7 +2584,7 @@
} }
], ],
"source": [ "source": [
"cat_encoder = OneHotEncoder(sparse=False)\n", "cat_encoder = OneHotEncoder(sparse_output=False)\n",
"housing_cat_1hot = cat_encoder.fit_transform(housing_cat)\n", "housing_cat_1hot = cat_encoder.fit_transform(housing_cat)\n",
"housing_cat_1hot" "housing_cat_1hot"
] ]
@ -3299,7 +3306,8 @@
" self.random_state = random_state\n", " self.random_state = random_state\n",
"\n", "\n",
" def fit(self, X, y=None, sample_weight=None):\n", " def fit(self, X, y=None, sample_weight=None):\n",
" self.kmeans_ = KMeans(self.n_clusters, random_state=self.random_state)\n", " self.kmeans_ = KMeans(self.n_clusters, n_init=10,\n",
" random_state=self.random_state)\n",
" self.kmeans_.fit(X, sample_weight=sample_weight)\n", " self.kmeans_.fit(X, sample_weight=sample_weight)\n",
" return self # always return self!\n", " return self # always return self!\n",
"\n", "\n",
@ -3310,6 +3318,13 @@
" return [f\"Cluster {i} similarity\" for i in range(self.n_clusters)]" " return [f\"Cluster {i} similarity\" for i in range(self.n_clusters)]"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: The default value for the `n_init` hyperparameter above will change from 10 to `\"auto\"` in Scikit-Learn 1.4, so I'm setting it explicitly to 10 to keep this notebook stable."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 97, "execution_count": 97,
@ -6238,7 +6253,7 @@
], ],
"metadata": { "metadata": {
"kernelspec": { "kernelspec": {
"display_name": "Python 3", "display_name": "Python 3 (ipykernel)",
"language": "python", "language": "python",
"name": "python3" "name": "python3"
}, },
@ -6252,7 +6267,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.6" "version": "3.10.13"
}, },
"nav_menu": { "nav_menu": {
"height": "279px", "height": "279px",