From 346dfe6d1eb7100f619a0476c5f94cb52b944d31 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien=20Geron?= <ageron@users.noreply.github.com>
Date: Tue, 2 Mar 2021 09:29:06 +1300
Subject: [PATCH] Use as_frame=False when calling fetch_openml()

---
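Note for reviewers: a minimal sketch of why the `DataFrame` return type matters for notebook code that indexes instances by position (for example `X[0]`). It does not download MNIST; the small array and the `pixel1` to `pixel4` column names are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the MNIST feature matrix: 3 instances, 4 pixel columns.
X_array = np.arange(12, dtype=np.float64).reshape(3, 4)
X_frame = pd.DataFrame(X_array, columns=[f"pixel{i + 1}" for i in range(4)])

# With a NumPy array, X[0] is the first instance (row).
first_instance = X_array[0]

# With a DataFrame, X[0] is a column lookup by label, and no column is named 0.
try:
    X_frame[0]
    frame_lookup_failed = False
except KeyError:
    frame_lookup_failed = True

# Positional row access on a DataFrame needs .iloc; .to_numpy() converts back.
same_row = np.array_equal(X_frame.iloc[0].to_numpy(), first_instance)
```

With `as_frame=False`, `mnist["data"]` is a NumPy array, so positional indexing like the first case keeps working unchanged.
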
 03_classification.ipynb                       | 11 +++++++++--
 05_support_vector_machines.ipynb              | 11 +++++++++--
 07_ensemble_learning_and_random_forests.ipynb | 11 +++++++++--
 09_unsupervised_learning.ipynb                |  9 ++++++++-
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/03_classification.ipynb b/03_classification.ipynb
index cd5ac1e..26ef8f0 100644
--- a/03_classification.ipynb
+++ b/03_classification.ipynb
@@ -84,6 +84,13 @@
     "# MNIST"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
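+    "**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default for dense datasets such as MNIST (the default value of `as_frame` changed from `False` to `'auto'`). To avoid this and keep the same code as in the book, we use `as_frame=False`."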
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 2,
@@ -91,7 +98,7 @@
    "outputs": [],
    "source": [
     "from sklearn.datasets import fetch_openml\n",
-    "mnist = fetch_openml('mnist_784', version=1)\n",
+    "mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
     "mnist.keys()"
    ]
   },
@@ -2588,7 +2595,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.7.9"
   },
   "nav_menu": {},
   "toc": {
diff --git a/05_support_vector_machines.ipynb b/05_support_vector_machines.ipynb
index 5f68eab..bb9b855 100644
--- a/05_support_vector_machines.ipynb
+++ b/05_support_vector_machines.ipynb
@@ -1381,6 +1381,13 @@
     "First, let's load the dataset and split it into a training set and a test set. We could use `train_test_split()` but people usually just take the first 60,000 instances for the training set, and the last 10,000 instances for the test set (this makes it possible to compare your model's performance with others): "
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
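+    "**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default for dense datasets such as MNIST (the default value of `as_frame` changed from `False` to `'auto'`). To avoid this, we use `as_frame=False`."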
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 47,
@@ -1388,7 +1395,7 @@
    "outputs": [],
    "source": [
     "from sklearn.datasets import fetch_openml\n",
-    "mnist = fetch_openml('mnist_784', version=1, cache=True)\n",
+    "mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)\n",
     "\n",
     "X = mnist[\"data\"]\n",
     "y = mnist[\"target\"].astype(np.uint8)\n",
@@ -1837,7 +1844,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.7.9"
   },
   "nav_menu": {},
   "toc": {
diff --git a/07_ensemble_learning_and_random_forests.ipynb b/07_ensemble_learning_and_random_forests.ipynb
index 63a224e..089f502 100644
--- a/07_ensemble_learning_and_random_forests.ipynb
+++ b/07_ensemble_learning_and_random_forests.ipynb
@@ -452,6 +452,13 @@
     "## Feature importance"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
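+    "**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default for dense datasets such as MNIST (the default value of `as_frame` changed from `False` to `'auto'`). To avoid this and keep the same code as in the book, we use `as_frame=False`."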
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 25,
@@ -460,7 +467,7 @@
    "source": [
     "from sklearn.datasets import fetch_openml\n",
     "\n",
-    "mnist = fetch_openml('mnist_784', version=1)\n",
+    "mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
     "mnist.target = mnist.target.astype(np.uint8)"
    ]
   },
@@ -1395,7 +1402,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.8"
+   "version": "3.7.9"
   },
   "nav_menu": {
    "height": "252px",
diff --git a/09_unsupervised_learning.ipynb b/09_unsupervised_learning.ipynb
index aedfa4b..ad9a3b8 100644
--- a/09_unsupervised_learning.ipynb
+++ b/09_unsupervised_learning.ipynb
@@ -969,6 +969,13 @@
     "If the dataset does not fit in memory, the simplest option is to use the `memmap` class, just like we did for incremental PCA in the previous chapter. First let's load MNIST:"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
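+    "**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default for dense datasets such as MNIST (the default value of `as_frame` changed from `False` to `'auto'`). To avoid this and keep the same code as in the book, we use `as_frame=False`."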
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 46,
@@ -978,7 +985,7 @@
     "import urllib.request\n",
     "from sklearn.datasets import fetch_openml\n",
     "\n",
-    "mnist = fetch_openml('mnist_784', version=1)\n",
+    "mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
     "mnist.target = mnist.target.astype(np.int64)"
    ]
   },