From 771dccaca4d8c5cd1c41783df1f19a47c124052e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien=20Geron?= <ageron@users.noreply.github.com>
Date: Mon, 7 May 2018 21:09:08 +0200
Subject: [PATCH] Clarify future encoders in Scikit-Learn 0.20

---
 02_end_to_end_machine_learning_project.ipynb |  4 +-
 03_classification.ipynb                      | 43 +++++++++-----------
 2 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/02_end_to_end_machine_learning_project.ipynb b/02_end_to_end_machine_learning_project.ipynb
index 3545638..b82b519 100644
--- a/02_end_to_end_machine_learning_project.ipynb
+++ b/02_end_to_end_machine_learning_project.ipynb
@@ -798,7 +798,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "**Warning**: earlier versions of the book used the `LabelEncoder` class or Pandas' `Series.factorize()` method to encode string categorical attributes as integers. The `OrdinalEncoder` class that is planned to be introduced in Scikit-Learn 0.20 (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)) is preferable since it is designed for input features (`X` instead of labels `y`) and it plays well with pipelines, as we will see later in this notebook. For now, we will import it from `future_encoders.py`, but when it is available you can change `future_encoders` to `sklearn.preprocessing`."
+    "**Warning**: earlier versions of the book used the `LabelEncoder` class or Pandas' `Series.factorize()` method to encode string categorical attributes as integers. However, the `OrdinalEncoder` class that is planned to be introduced in Scikit-Learn 0.20 (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)) is preferable since it is designed for input features (`X` instead of labels `y`) and it plays well with pipelines (introduced later in this notebook). For now, we will import it from `future_encoders.py`, but once it is available you can import it directly from `sklearn.preprocessing`."
    ]
   },
   {
@@ -834,7 +834,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We can convert each categorical value to a one-hot vector using a `OneHotEncoder`. Right now this class can only handle integer categorical inputs, but in Scikit-Learn 0.20 it will handle string categorical inputs. So for now we import it from `future_encoders.py`, but when Scikit-Learn 0.20 is released, you can import it from `sklearn.preprocessing` instead:"
+    "**Warning**: earlier versions of the book used the `LabelBinarizer` or `CategoricalEncoder` classes to convert each categorical value to a one-hot vector. It is now preferable to use the `OneHotEncoder` class. Currently this class can only handle integer categorical inputs, but in Scikit-Learn 0.20 it will also handle string categorical inputs (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)). So for now we import it from `future_encoders.py`, but once Scikit-Learn 0.20 is released, you can import it from `sklearn.preprocessing` instead:"
    ]
   },
   {
diff --git a/03_classification.ipynb b/03_classification.ipynb
index 6cfb7f6..5a262b5 100644
--- a/03_classification.ipynb
+++ b/03_classification.ipynb
@@ -1513,25 +1513,6 @@
     "The Embarked attribute tells us where the passenger embarked: C=Cherbourg, Q=Queenstown, S=Southampton."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The `OneHotEncoder` class will allow us to convert categorical attributes to one-hot vectors. Since Scikit-Learn 0.20, this class can handle string categorical attributes, which is what we need. In case you are using an older version of Scikit-Learn, we get the latest version of this class from `future_encoders.py`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 110,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "try:\n",
-    "    from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder\n",
-    "except:\n",
-    "    from future_encoders import OrdinalEncoder, OneHotEncoder"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -1541,7 +1522,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 111,
+   "execution_count": 110,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1567,7 +1548,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 112,
+   "execution_count": 111,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1584,7 +1565,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 113,
+   "execution_count": 112,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1600,7 +1581,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 114,
+   "execution_count": 113,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -1614,6 +1595,22 @@
     "        return X.fillna(self.most_frequent_)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can convert each categorical value to a one-hot vector using a `OneHotEncoder`. Right now this class can only handle integer categorical inputs, but in Scikit-Learn 0.20 it will also handle string categorical inputs (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)). So for now we import it from `future_encoders.py`, but when Scikit-Learn 0.20 is released, you can import it from `sklearn.preprocessing` instead:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 114,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from future_encoders import OneHotEncoder"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},