Use as_frame=False when calling fetch_openml()

Branch: main
Aurélien Geron, 2021-03-02 09:29:06 +13:00
Parent: 5663779ae8
Commit: 346dfe6d1e
4 changed files with 35 additions and 7 deletions


@@ -84,6 +84,13 @@
"# MNIST"
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
+]
+},
{
"cell_type": "code",
"execution_count": 2,
@@ -91,7 +98,7 @@
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"mnist = fetch_openml('mnist_784', version=1)\n",
"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
"mnist.keys()"
]
},
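
For readers checking the behavior this commit works around: starting with Scikit-Learn 0.24, `fetch_openml()` returns its data as a pandas `DataFrame` unless `as_frame=False` is passed, which restores the NumPy arrays the book's code expects. A minimal sketch of the difference (variable names are illustrative):

```python
from sklearn.datasets import fetch_openml

# Default in Scikit-Learn >= 0.24: the data comes back as a pandas DataFrame.
mnist_frame = fetch_openml('mnist_784', version=1)
print(type(mnist_frame.data))   # <class 'pandas.core.frame.DataFrame'>

# With as_frame=False the data is a plain NumPy array, as in earlier versions
# and as the book's code assumes.
mnist_array = fetch_openml('mnist_784', version=1, as_frame=False)
print(type(mnist_array.data))   # <class 'numpy.ndarray'>
print(mnist_array.data.shape)   # (70000, 784)
```
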
@@ -2588,7 +2595,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
"version": "3.7.9"
},
"nav_menu": {},
"toc": {


@@ -1381,6 +1381,13 @@
"First, let's load the dataset and split it into a training set and a test set. We could use `train_test_split()` but people usually just take the first 60,000 instances for the training set, and the last 10,000 instances for the test set (this makes it possible to compare your model's performance with others): "
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this, we use `as_frame=False`."
+]
+},
{
"cell_type": "code",
"execution_count": 47,
@@ -1388,7 +1395,7 @@
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"mnist = fetch_openml('mnist_784', version=1, cache=True)\n",
"mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)\n",
"\n",
"X = mnist[\"data\"]\n",
"y = mnist[\"target\"].astype(np.uint8)\n",
@@ -1837,7 +1844,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
"version": "3.7.9"
},
"nav_menu": {},
"toc": {


@@ -452,6 +452,13 @@
"## Feature importance"
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
+]
+},
{
"cell_type": "code",
"execution_count": 25,
@@ -460,7 +467,7 @@
"source": [
"from sklearn.datasets import fetch_openml\n",
"\n",
"mnist = fetch_openml('mnist_784', version=1)\n",
"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
"mnist.target = mnist.target.astype(np.uint8)"
]
},
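
The hunk above only loads MNIST; in the feature-importance section the 784 pixels then serve as features for a forest whose `feature_importances_` can be reshaped into a 28×28 image. A minimal sketch of that idea, assuming a `RandomForestClassifier` as in the book's example (hyperparameters are illustrative):

```python
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
mnist.target = mnist.target.astype(np.uint8)

# Each of the 784 features is one pixel, so the per-feature importances
# learned by the forest can be viewed as a 28x28 importance image.
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rnd_clf.fit(mnist["data"], mnist["target"])
importance_image = rnd_clf.feature_importances_.reshape(28, 28)
```
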
@@ -1395,7 +1402,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
"version": "3.7.9"
},
"nav_menu": {
"height": "252px",


@@ -969,6 +969,13 @@
"If the dataset does not fit in memory, the simplest option is to use the `memmap` class, just like we did for incremental PCA in the previous chapter. First let's load MNIST:"
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
+]
+},
{
"cell_type": "code",
"execution_count": 46,
@@ -978,7 +985,7 @@
"import urllib.request\n",
"from sklearn.datasets import fetch_openml\n",
"\n",
"mnist = fetch_openml('mnist_784', version=1)\n",
"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
"mnist.target = mnist.target.astype(np.int64)"
]
},
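
For context on the memmap approach mentioned above: the data can be written to disk once and then reopened as a `numpy.memmap`, so that an estimator with out-of-core support reads it page by page instead of holding everything in RAM. A minimal sketch, assuming `MiniBatchKMeans` as the consumer (the filename and hyperparameters are illustrative):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X_train = mnist["data"][:60000]

# Dump the training set to disk once, then reopen it memory-mapped so the
# array is paged in on demand rather than kept fully in memory.
filename = "my_mnist.data"  # illustrative filename
X_mm = np.memmap(filename, dtype="float32", mode="write", shape=X_train.shape)
X_mm[:] = X_train
X_mm.flush()

X_mm = np.memmap(filename, dtype="float32", mode="readonly", shape=(60000, 784))
minibatch_kmeans = MiniBatchKMeans(n_clusters=10, batch_size=3072, random_state=42)
minibatch_kmeans.fit(X_mm)
```
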