From ad125bbdba34cbde7144ac6a7c36f83100faac81 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Aur=C3=A9lien=20Geron?= Date: Tue, 25 Dec 2018 15:34:36 +0800 Subject: [PATCH] Add hyperparameters to remove warnings, and sync comments with results --- 11_deep_learning.ipynb | 62 ++++++++++++++++++++++++------------------ 1 file changed, 36 insertions(+), 26 deletions(-) diff --git a/11_deep_learning.ipynb b/11_deep_learning.ipynb index f63c59b..b27c72f 100644 --- a/11_deep_learning.ipynb +++ b/11_deep_learning.ipynb @@ -462,7 +462,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I will definitely add it to the book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out." + "This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I added a paragraph about SELU in the latest release of my book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out." 
] }, { @@ -2802,7 +2802,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We get 98.05% accuracy on the test set. That's not too bad, but let's see if we can do better by tuning the hyperparameters." + "This test accuracy is not too bad, but let's see if we can do better by tuning the hyperparameters." ] }, { @@ -3084,11 +3084,11 @@ "}\n", "\n", "rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n", - " random_state=42, verbose=2)\n", + " cv=3, random_state=42, verbose=2)\n", "rnd_search.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n", "\n", "# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n", - "# fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000}\n", + "# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n", "# rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n", "# fit_params=fit_params, random_state=42, verbose=2)\n", "# rnd_search.fit(X_train1, y_train1)\n" @@ -3117,7 +3117,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Wonderful! Tuning the hyperparameters got us up to 99.32% accuracy! It may not sound like a great improvement to go from 98.05% to 99.32% accuracy, but consider the error rate: it went from roughly 2% to 0.7%. That's a 65% reduction of the number of errors this model will produce!" + "Wonderful! Tuning the hyperparameters got us up to 98.91% accuracy! It may not sound like a great improvement to go from 97.26% to 98.91% accuracy, but consider the error rate: it went from roughly 2.7% to 1.1%. That's about a 60% reduction in the number of errors this model will produce!" ] }, { @@ -3172,14 +3172,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The best loss is reached at epoch 19, but it was already within 10% of that result at epoch 9." + "The best loss is reached at epoch 5." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's check that we do indeed get 99.32% accuracy on the test set:" + "Let's check that we do indeed get 98.9% accuracy on the test set:" ] }, { @@ -3215,7 +3215,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The best params are reached during epoch 48, that's actually a slower convergence than earlier. Let's check the accuracy:" + "The best params are reached during epoch 20, that's actually a slower convergence than earlier. Let's check the accuracy:" ] }, { @@ -3232,7 +3232,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Well, batch normalization did not improve accuracy. Let's see if we can find a good set of hyperparameters that will work well with batch normalization:" + "Great, batch normalization improved accuracy! Let's see if we can find a good set of hyperparameters that will work even better with batch normalization:" ] }, { @@ -3254,10 +3254,15 @@ " \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n", "}\n", "\n", - "rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n", - " fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n", + "rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50, cv=3,\n", " random_state=42, verbose=2)\n", - "rnd_search_bn.fit(X_train1, y_train1)" + "rnd_search_bn.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n", + "\n", + "# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n", + "# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n", + "# rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n", + "# fit_params=fit_params, random_state=42, verbose=2)\n", + "# rnd_search_bn.fit(X_train1, y_train1)\n" ] }, { @@ -3283,7 +3288,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Slightly 
better than earlier: 99.4% vs 99.3%. Let's see if dropout can do better." + "Slightly better than earlier: 99.49% vs 99.42%. Let's see if dropout can do better." ] }, { @@ -3304,7 +3309,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's go back to the best model we trained earlier and see how it performs on the training set:" + "Let's go back to the model we trained earlier and see how it performs on the training set:" ] }, { @@ -3321,7 +3326,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The model performs significantly better on the training set than on the test set (99.91% vs 99.32%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:" + "The model performs significantly better on the training set than on the test set (99.51% vs 99.00%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:" ] }, { @@ -3340,7 +3345,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The best params are reached during epoch 23. Dropout somewhat slowed down convergence." + "The best params are reached during epoch 17. Dropout somewhat slowed down convergence." ] }, { @@ -3364,7 +3369,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We are out of luck, dropout does not seem to help either. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:" + "We are out of luck, dropout does not seem to help. 
Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:" ] }, { @@ -3387,9 +3392,14 @@ "}\n", "\n", "rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n", - " fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n", - " random_state=42, verbose=2)\n", - "rnd_search_dropout.fit(X_train1, y_train1)" + " cv=3, random_state=42, verbose=2)\n", + "rnd_search_dropout.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n", + "\n", + "# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n", + "# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n", + "# rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n", + "# fit_params=fit_params, random_state=42, verbose=2)\n", + "# rnd_search_dropout.fit(X_train1, y_train1)" ] }, { @@ -3422,7 +3432,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "But that's okay, we have ourselves a nice DNN that achieves 99.40% accuracy on the test set using Batch Normalization, or 99.32% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN, since it is almost as good." + "But that's okay, we have ourselves a nice DNN that achieves 99.49% accuracy on the test set using Batch Normalization, or 98.91% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN." ] }, { @@ -4011,7 +4021,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Meh. How disappointing! ;) Transfer learning did not help much (if at all) in this task. At least we tried... Fortunately, the next exercise will get better results." 
+ "Transfer learning allowed us to go from 84.8% accuracy to 91.3%. Not too bad!" ] }, { @@ -4554,7 +4564,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Well, 96.7% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:" + "Well, 96.5% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:" ] }, { @@ -4620,9 +4630,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Only 94.8% accuracy... So transfer learning helped us reduce the error rate from 5.2% to 3.3% (that's over 36% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n", + "Only 94.6% accuracy... So transfer learning helped us reduce the error rate from 5.4% to 3.5% (that's over 35% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n", "\n", - "Bottom line: transfer learning does not always work (as we saw in exercise 9), but when it does it can make a big difference. So try it out!" + "Bottom line: transfer learning does not always work, but when it does it can make a big difference. So try it out!" ] }, { @@ -4649,7 +4659,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.5.5" + "version": "3.6.6" }, "nav_menu": { "height": "360px",