Use as_frame=False when calling fetch_openml()

main
Aurélien Geron 2021-03-02 09:29:06 +13:00
parent 5663779ae8
commit 346dfe6d1e
4 changed files with 35 additions and 7 deletions

@@ -84,6 +84,13 @@
 "# MNIST"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 2,
@@ -91,7 +98,7 @@
 "outputs": [],
 "source": [
 "from sklearn.datasets import fetch_openml\n",
-"mnist = fetch_openml('mnist_784', version=1)\n",
+"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
 "mnist.keys()"
 ]
 },
@@ -2588,7 +2595,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.8"
+"version": "3.7.9"
 },
 "nav_menu": {},
 "toc": {

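The warning cells depend on a behavioral difference that is easier to see in code. As of Scikit-Learn 0.24, the default became `as_frame='auto'`, which returns a pandas `DataFrame` for a dense dataset such as MNIST. A `DataFrame` does not index the way a NumPy array does. The sketch below uses a tiny made-up array so that nothing has to be downloaded. The array values and column names are hypothetical, and the example only illustrates why code written against arrays can break when it receives a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for MNIST: 3 "images" of 4 pixels each.
pixels = np.arange(12, dtype=np.float64).reshape(3, 4)
columns = [f"pixel{i}" for i in range(1, 5)]  # made-up column names

X_array = pixels                                 # like the as_frame=False result
X_frame = pd.DataFrame(pixels, columns=columns)  # like the 0.24 default for dense data

# On a NumPy array, X[0] selects the first row, i.e. the first image.
first_image = X_array[0]

# On a DataFrame, X[0] looks up a *column* labeled 0, which does not exist here.
try:
    X_frame[0]
    frame_indexing_failed = False
except KeyError:
    frame_indexing_failed = True

# Positional row access on a DataFrame needs .iloc, or a conversion with .to_numpy().
first_row_iloc = X_frame.iloc[0].to_numpy()
first_row_numpy = X_frame.to_numpy()[0]

print(first_image)            # [0. 1. 2. 3.]
print(frame_indexing_failed)  # True
```

Passing `as_frame=False` keeps positional indexing such as `X[0]` working unchanged. Code that already receives a DataFrame can use `.iloc` or `.to_numpy()` for the same effect.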

@@ -1381,6 +1381,13 @@
 "First, let's load the dataset and split it into a training set and a test set. We could use `train_test_split()` but people usually just take the first 60,000 instances for the training set, and the last 10,000 instances for the test set (this makes it possible to compare your model's performance with others): "
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this, we use `as_frame=False`."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 47,
@@ -1388,7 +1395,7 @@
 "outputs": [],
 "source": [
 "from sklearn.datasets import fetch_openml\n",
-"mnist = fetch_openml('mnist_784', version=1, cache=True)\n",
+"mnist = fetch_openml('mnist_784', version=1, cache=True, as_frame=False)\n",
 "\n",
 "X = mnist[\"data\"]\n",
 "y = mnist[\"target\"].astype(np.uint8)\n",
@@ -1837,7 +1844,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.8"
+"version": "3.7.9"
 },
 "nav_menu": {},
 "toc": {

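The notebook cell in this file splits MNIST by position (first 60,000 rows for training, last 10,000 for testing) and casts the labels with `.astype(np.uint8)`. Here is a minimal sketch of those two steps, assuming arrays shaped like the `as_frame=False` output: 784 pixel columns, with labels stored as strings. The data is a hypothetical all-zero stand-in, not MNIST itself.

```python
import numpy as np

# Hypothetical stand-ins shaped like MNIST loaded with as_frame=False:
# 70,000 rows of 784 pixel values, and labels stored as Python strings.
X = np.zeros((70_000, 784), dtype=np.uint8)
y_raw = np.array([str(i % 10) for i in range(70_000)], dtype=object)

# The labels arrive as strings; cast them to small unsigned integers.
y = y_raw.astype(np.uint8)

# Conventional MNIST split: first 60,000 instances for training and
# last 10,000 for testing, so results are comparable with other work.
X_train, X_test = X[:60_000], X[60_000:]
y_train, y_test = y[:60_000], y[60_000:]
```

Basic slicing of a NumPy array returns views, so the split does not copy the pixel data.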

@@ -452,6 +452,13 @@
 "## Feature importance"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 25,
@@ -460,7 +467,7 @@
 "source": [
 "from sklearn.datasets import fetch_openml\n",
 "\n",
-"mnist = fetch_openml('mnist_784', version=1)\n",
+"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
 "mnist.target = mnist.target.astype(np.uint8)"
 ]
 },
@@ -1395,7 +1402,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.8"
+"version": "3.7.9"
 },
 "nav_menu": {
 "height": "252px",

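The third notebook loads MNIST under a "Feature importance" heading. One common use of per-feature importances on MNIST is to reshape the 784 scores back into a 28×28 grid, so that they can be viewed as an image of which pixels matter. The sketch below trains a `RandomForestClassifier` on hypothetical random data with the same 784-column layout. The estimator settings are assumptions, not the notebook's. Because the data is random, the resulting map carries no real signal, and only the shapes and mechanics are meaningful.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical random data with MNIST's layout: 784 pixel features per row.
rng = np.random.default_rng(0)
X = rng.random((500, 784))
y = rng.integers(0, 10, size=500).astype(np.uint8)

# Assumed settings for a quick illustration, not tuned for MNIST.
rnd_clf = RandomForestClassifier(n_estimators=20, random_state=42)
rnd_clf.fit(X, y)

# One importance score per pixel; reshape to 28x28 to view it as an image.
importance_map = rnd_clf.feature_importances_.reshape(28, 28)
```

With real MNIST arrays from `as_frame=False`, the same `reshape(28, 28)` applies directly to `feature_importances_`.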

@@ -969,6 +969,13 @@
 "If the dataset does not fit in memory, the simplest option is to use the `memmap` class, just like we did for incremental PCA in the previous chapter. First let's load MNIST:"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
+]
+},
 {
 "cell_type": "code",
 "execution_count": 46,
@@ -978,7 +985,7 @@
 "import urllib.request\n",
 "from sklearn.datasets import fetch_openml\n",
 "\n",
-"mnist = fetch_openml('mnist_784', version=1)\n",
+"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
 "mnist.target = mnist.target.astype(np.int64)"
 ]
 },
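The context cell in this last file mentions the `memmap` class as the simplest option when a dataset does not fit in memory. `np.memmap` wraps a NumPy array backed by a file on disk, so receiving plain arrays from `fetch_openml()` keeps this step direct. A `DataFrame` would first need `.to_numpy()`. The sketch below writes a hypothetical stand-in for the training set to a memory-mapped file and reopens it read-only. The file name, the `float32` dtype, and the reduced row count are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for X_train (6,000 rows instead of 60,000 to keep it quick).
rng = np.random.default_rng(42)
X_train = rng.integers(0, 256, size=(6_000, 784)).astype(np.float32)

# Hypothetical file name in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "my_mnist.data")

# Write: create a disk-backed array with the same shape and dtype, copy the data in.
X_mm = np.memmap(path, dtype=np.float32, mode="w+", shape=X_train.shape)
X_mm[:] = X_train
X_mm.flush()
del X_mm  # the data stays in the file on disk

# Read back: a raw memmap file stores no shape or dtype metadata, so the dtype
# must be given again; the row count is recovered by reshaping to 784 columns.
X_read = np.memmap(path, dtype=np.float32, mode="r").reshape(-1, 784)
```

Slices of `X_read` are loaded from disk only when accessed. This is what allows batch-wise processing of data larger than RAM.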