Add hyperparameters to remove warnings, and sync comments with results
parent: d0e489afa4
commit: ad125bbdba
|
@@ -462,7 +462,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I will definitely add it to the book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out."
|
"This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I added paragraph about SELU in the latest release of my book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out."
|
||||||
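For reference, SELU itself is easy to define. A minimal NumPy sketch (the two constants come straight from the paper; the demo at the end is illustrative, not this notebook's exact cell):

```python
import numpy as np

# Fixed-point constants from Klambauer et al. (2017)
alpha = 1.6732632423543772
scale = 1.0507009873554805

def selu(z):
    # scale * z for z > 0, scale * alpha * (exp(z) - 1) otherwise
    return scale * np.where(z > 0.0, z, alpha * (np.exp(z) - 1.0))

# Standardized inputs tend to keep mean ~0 and std ~1 through stacked
# dense layers using this activation, which is the self-normalizing
# property described above
z = np.random.normal(size=1000)
print(selu(z).mean(), selu(z).std())
```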
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -2802,7 +2802,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We get 98.05% accuracy on the test set. That's not too bad, but let's see if we can do better by tuning the hyperparameters."
|
"This test accuracy is not too bad, but let's see if we can do better by tuning the hyperparameters."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3084,11 +3084,11 @@
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
" random_state=42, verbose=2)\n",
|
" cv=3, random_state=42, verbose=2)\n",
|
||||||
"rnd_search.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
"rnd_search.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
||||||
"# fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000}\n",
|
"# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
"# rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"# rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
||||||
"# rnd_search.fit(X_train1, y_train1)\n"
|
"# rnd_search.fit(X_train1, y_train1)\n"
|
||||||
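Two notes on this change: passing `cv` explicitly avoids the warning recent Scikit-Learn versions emit about the changing default number of folds, and since Scikit-Learn 0.19 any extra keyword arguments given to `fit()` are forwarded to the underlying estimator's `fit()` method, so the deprecated `fit_params` constructor argument is no longer needed. A minimal sketch of the same pattern, using `SGDClassifier` as a stand-in for the notebook's custom `DNNClassifier`:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)

param_distribs = {"alpha": [1e-5, 1e-4, 1e-3]}
search = RandomizedSearchCV(SGDClassifier(random_state=42), param_distribs,
                            n_iter=3, cv=3, random_state=42, verbose=2)

# Extra fit keyword arguments are forwarded to SGDClassifier.fit();
# sample_weight here plays the role of X_valid/y_valid/n_epochs above
search.fit(X, y, sample_weight=np.ones(len(y)))
print(search.best_params_)
```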
|
@@ -3117,7 +3117,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Wonderful! Tuning the hyperparameters got us up to 99.32% accuracy! It may not sound like a great improvement to go from 98.05% to 99.32% accuracy, but consider the error rate: it went from roughly 2% to 0.7%. That's a 65% reduction of the number of errors this model will produce!"
|
"Wonderful! Tuning the hyperparameters got us up to 98.91% accuracy! It may not sound like a great improvement to go from 97.26% to 98.91% accuracy, but consider the error rate: it went from roughly 2.6% to 1.1%. That's almost 60% reduction of the number of errors this model will produce!"
|
||||||
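The arithmetic behind that claim, spelled out:

```python
old_err = 1 - 0.9726  # ~2.7% error rate before tuning
new_err = 1 - 0.9891  # ~1.1% error rate after tuning
print((old_err - new_err) / old_err)  # ~0.60, i.e. about 60% fewer errors
```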
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3172,14 +3172,14 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The best loss is reached at epoch 19, but it was already within 10% of that result at epoch 9."
|
"The best loss is reached at epoch 5."
|
||||||
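The "best epoch" here comes from early stopping on the validation set passed to `fit()`. A rough sketch of that logic (illustrative only; the callables, names and patience threshold are placeholders, not the notebook's exact `DNNClassifier` implementation):

```python
import numpy as np

def fit_with_early_stopping(train_one_epoch, valid_loss, n_epochs=1000,
                            max_checks_without_progress=20):
    # Stop once the validation loss has not improved for too many epochs
    best_loss = np.inf
    checks_without_progress = 0
    for epoch in range(n_epochs):
        train_one_epoch()
        loss_val = valid_loss()
        if loss_val < best_loss:
            best_loss = loss_val
            checks_without_progress = 0
        else:
            checks_without_progress += 1
            if checks_without_progress > max_checks_without_progress:
                break  # early stopping
    return best_loss
```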
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Let's check that we do indeed get 99.32% accuracy on the test set:"
|
"Let's check that we do indeed get 98.9% accuracy on the test set:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3215,7 +3215,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The best params are reached during epoch 48, that's actually a slower convergence than earlier. Let's check the accuracy:"
|
"The best params are reached during epoch 20, that's actually a slower convergence than earlier. Let's check the accuracy:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3232,7 +3232,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Well, batch normalization did not improve accuracy. Let's see if we can find a good set of hyperparameters that will work well with batch normalization:"
|
"Great, batch normalization improved accuracy! Let's see if we can find a good set of hyperparameters that will work even better with batch normalization:"
|
||||||
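For context, wiring batch normalization into a TensorFlow 1.x layer looks roughly like this (a sketch, not the notebook's exact `DNNClassifier` internals); `momentum` is the `batch_norm_momentum` hyperparameter sampled below:

```python
import tensorflow as tf  # TensorFlow 1.x, as used in this notebook

X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
training = tf.placeholder_with_default(False, shape=(), name="training")

hidden1 = tf.layers.dense(X, 100, name="hidden1")
bn1 = tf.layers.batch_normalization(hidden1, momentum=0.98, training=training)
act1 = tf.nn.elu(bn1)

# The moving-average update ops must be run along with the training op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
```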
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3254,10 +3254,15 @@
|
||||||
" \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n",
|
" \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50, cv=3,\n",
|
||||||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
|
||||||
" random_state=42, verbose=2)\n",
|
" random_state=42, verbose=2)\n",
|
||||||
"rnd_search_bn.fit(X_train1, y_train1)"
|
"rnd_search_bn.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
|
"\n",
|
||||||
|
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
||||||
|
"# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
|
"# rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
|
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
||||||
|
"# rnd_search_bn.fit(X_train1, y_train1)\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3283,7 +3288,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Slightly better than earlier: 99.4% vs 99.3%. Let's see if dropout can do better."
|
"Slightly better than earlier: 99.49% vs 99.42%. Let's see if dropout can do better."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3304,7 +3309,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Let's go back to the best model we trained earlier and see how it performs on the training set:"
|
"Let's go back to the model we trained earlier and see how it performs on the training set:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3321,7 +3326,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The model performs significantly better on the training set than on the test set (99.91% vs 99.32%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
|
"The model performs significantly better on the training set than on the test set (99.51% vs 99.00%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
|
||||||
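In TensorFlow 1.x this amounts to inserting `tf.layers.dropout` before each dense layer; a minimal sketch (layer sizes are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
training = tf.placeholder_with_default(False, shape=(), name="training")

dropout_rate = 0.5  # drop 50% of the inputs, at training time only
X_drop = tf.layers.dropout(X, rate=dropout_rate, training=training)
hidden1 = tf.layers.dense(X_drop, 100, activation=tf.nn.elu, name="hidden1")
hidden1_drop = tf.layers.dropout(hidden1, rate=dropout_rate, training=training)
```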
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3340,7 +3345,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The best params are reached during epoch 23. Dropout somewhat slowed down convergence."
|
"The best params are reached during epoch 17. Dropout somewhat slowed down convergence."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3364,7 +3369,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We are out of luck, dropout does not seem to help either. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:"
|
"We are out of luck, dropout does not seem to help. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3387,9 +3392,14 @@
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
" cv=3, random_state=42, verbose=2)\n",
|
||||||
" random_state=42, verbose=2)\n",
|
"rnd_search_dropout.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
"rnd_search_dropout.fit(X_train1, y_train1)"
|
"\n",
|
||||||
|
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
||||||
|
"# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
|
"# rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
|
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
||||||
|
"# rnd_search_dropout.fit(X_train1, y_train1)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3422,7 +3432,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"But that's okay, we have ourselves a nice DNN that achieves 99.40% accuracy on the test set using Batch Normalization, or 99.32% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN, since it is almost as good."
|
"But that's okay, we have ourselves a nice DNN that achieves 99.49% accuracy on the test set using Batch Normalization, or 98.91% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4011,7 +4021,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Meh. How disappointing! ;) Transfer learning did not help much (if at all) in this task. At least we tried... Fortunately, the next exercise will get better results."
|
"Transfer learning allowed us to go from 84.8% accuracy to 91.3%. Not too bad!"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4554,7 +4564,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Well, 96.7% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
|
"Well, 96.5% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4620,9 +4630,9 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Only 94.8% accuracy... So transfer learning helped us reduce the error rate from 5.2% to 3.3% (that's over 36% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
|
"Only 94.6% accuracy... So transfer learning helped us reduce the error rate from 5.4% to 3.5% (that's over 35% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Bottom line: transfer learning does not always work (as we saw in exercise 9), but when it does it can make a big difference. So try it out!"
|
"Bottom line: transfer learning does not always work, but when it does it can make a big difference. So try it out!"
|
||||||
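For reference, the transfer-learning pattern used in these exercises boils down to restoring the lower layers from the old checkpoint and optimizing only the new upper layers. A condensed TensorFlow 1.x sketch (layer sizes, scope names and the checkpoint path are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None,), name="y")
hidden1 = tf.layers.dense(X, 100, activation=tf.nn.elu, name="hidden1")
hidden2 = tf.layers.dense(hidden1, 100, activation=tf.nn.elu, name="hidden2")
logits = tf.layers.dense(hidden2, 5, name="outputs")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy)

# Restore only the reused lower layers (hidden1 and hidden2)
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden[12]")
restore_saver = tf.train.Saver(reuse_vars)

# Freeze them by optimizing only the new output layer
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="outputs")
training_op = tf.train.AdamOptimizer().minimize(loss, var_list=train_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    restore_saver.restore(sess, "./my_mnist_model_0_to_4.ckpt")
    # ...then run the usual training loop on digits 5 to 9...
```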
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4649,7 +4659,7 @@
|
||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.5.5"
|
"version": "3.6.6"
|
||||||
},
|
},
|
||||||
"nav_menu": {
|
"nav_menu": {
|
||||||
"height": "360px",
|
"height": "360px",
|
||||||