Add hyperparameters to remove warnings, and sync comments with results

main
Aurélien Geron 2018-12-25 15:34:36 +08:00
parent d0e489afa4
commit ad125bbdba
1 changed file with 36 additions and 26 deletions


@@ -462,7 +462,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I will definitely add it to the book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out."
+ "This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I added a paragraph about SELU in the latest release of my book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out."
  ]
 },
 {
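For context, a minimal sketch (not part of this commit; tf.keras names assumed) of the kind of self-normalizing stack the cell describes. SELU only self-normalizes when each dense layer uses LeCun normal initialization and the inputs are standardized:

import tensorflow as tf

# Every hidden layer pairs SELU with LeCun normal initialization, the
# combination required for self-normalization (scale inputs to mean 0, std 1).
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
for _ in range(5):
    model.add(tf.keras.layers.Dense(100, activation="selu",
                                    kernel_initializer="lecun_normal"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))
model.compile(loss="sparse_categorical_crossentropy", optimizer="sgd",
              metrics=["accuracy"])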
@@ -2802,7 +2802,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We get 98.05% accuracy on the test set. That's not too bad, but let's see if we can do better by tuning the hyperparameters."
+ "This test accuracy is not too bad, but let's see if we can do better by tuning the hyperparameters."
  ]
 },
 {
@@ -3084,11 +3084,11 @@
  "}\n",
  "\n",
  "rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
- " random_state=42, verbose=2)\n",
+ " cv=3, random_state=42, verbose=2)\n",
  "rnd_search.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
  "\n",
  "# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
- "# fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000}\n",
+ "# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
  "# rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
  "# fit_params=fit_params, random_state=42, verbose=2)\n",
  "# rnd_search.fit(X_train1, y_train1)\n"
@@ -3117,7 +3117,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Wonderful! Tuning the hyperparameters got us up to 99.32% accuracy! It may not sound like a great improvement to go from 98.05% to 99.32% accuracy, but consider the error rate: it went from roughly 2% to 0.7%. That's a 65% reduction of the number of errors this model will produce!"
+ "Wonderful! Tuning the hyperparameters got us up to 98.91% accuracy! It may not sound like a great improvement to go from 97.26% to 98.91% accuracy, but consider the error rate: it went from roughly 2.6% to 1.1%. That's almost a 60% reduction of the number of errors this model will produce!"
  ]
 },
 {
@@ -3172,14 +3172,14 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "The best loss is reached at epoch 19, but it was already within 10% of that result at epoch 9."
+ "The best loss is reached at epoch 5."
  ]
 },
 {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Let's check that we do indeed get 99.32% accuracy on the test set:"
+ "Let's check that we do indeed get 98.9% accuracy on the test set:"
  ]
 },
 {
@@ -3215,7 +3215,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "The best params are reached during epoch 48, that's actually a slower convergence than earlier. Let's check the accuracy:"
+ "The best params are reached during epoch 20; that's actually slower convergence than earlier. Let's check the accuracy:"
  ]
 },
 {
@@ -3232,7 +3232,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Well, batch normalization did not improve accuracy. Let's see if we can find a good set of hyperparameters that will work well with batch normalization:"
+ "Great, batch normalization improved accuracy! Let's see if we can find a good set of hyperparameters that will work even better with batch normalization:"
  ]
 },
 {
@@ -3254,10 +3254,15 @@
  " \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n",
  "}\n",
  "\n",
- "rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
- " fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
+ "rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50, cv=3,\n",
  " random_state=42, verbose=2)\n",
- "rnd_search_bn.fit(X_train1, y_train1)"
+ "rnd_search_bn.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
+ "\n",
+ "# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
+ "# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
+ "# rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
+ "# fit_params=fit_params, random_state=42, verbose=2)\n",
+ "# rnd_search_bn.fit(X_train1, y_train1)\n"
  ]
 },
 {
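The batch_norm_momentum values being searched here control how slowly the moving statistics decay. A minimal sketch (TensorFlow 1.x graph mode assumed; names invented for this note) of how that parameter maps onto a batch-normalized layer, including the update ops that must run at each training step:

import tensorflow as tf  # TensorFlow 1.x graph-mode API

X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
training = tf.placeholder_with_default(False, shape=(), name="training")

hidden1 = tf.layers.dense(X, 100, name="hidden1")
# momentum is the searched batch_norm_momentum hyperparameter: higher values
# give slower-moving estimates of the batch mean and variance.
bn1 = tf.layers.batch_normalization(hidden1, momentum=0.98, training=training)
act1 = tf.nn.elu(bn1)

# The moving averages are updated through UPDATE_OPS, so the training op
# must depend on them, e.g.:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
# with tf.control_dependencies(update_ops):
#     training_op = optimizer.minimize(loss)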
@@ -3283,7 +3288,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Slightly better than earlier: 99.4% vs 99.3%. Let's see if dropout can do better."
+ "Slightly better than earlier: 99.49% vs 99.42%. Let's see if dropout can do better."
  ]
 },
 {
@@ -3304,7 +3309,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Let's go back to the best model we trained earlier and see how it performs on the training set:"
+ "Let's go back to the model we trained earlier and see how it performs on the training set:"
  ]
 },
 {
@@ -3321,7 +3326,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "The model performs significantly better on the training set than on the test set (99.91% vs 99.32%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
+ "The model performs significantly better on the training set than on the test set (99.51% vs 99.00%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
  ]
 },
 {
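For reference, a minimal sketch (TensorFlow 1.x graph mode assumed; names invented for this note) of adding the 50% dropout the cell describes; dropout is only active while training is True:

import tensorflow as tf  # TensorFlow 1.x graph-mode API

X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
training = tf.placeholder_with_default(False, shape=(), name="training")

# rate=0.5 drops half the activations during training; at test time the
# layer is a no-op, so no rescaling is needed when evaluating.
hidden1 = tf.layers.dense(X, 100, activation=tf.nn.elu, name="hidden1")
hidden1_drop = tf.layers.dropout(hidden1, rate=0.5, training=training)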
@@ -3340,7 +3345,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "The best params are reached during epoch 23. Dropout somewhat slowed down convergence."
+ "The best params are reached during epoch 17. Dropout somewhat slowed down convergence."
  ]
 },
 {
@@ -3364,7 +3369,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "We are out of luck, dropout does not seem to help either. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:"
+ "We are out of luck: dropout does not seem to help. Let's try tuning the hyperparameters; perhaps we can squeeze a bit more performance out of this model:"
  ]
 },
 {
@@ -3387,9 +3392,14 @@
  "}\n",
  "\n",
  "rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
- " fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
- " random_state=42, verbose=2)\n",
- "rnd_search_dropout.fit(X_train1, y_train1)"
+ " cv=3, random_state=42, verbose=2)\n",
+ "rnd_search_dropout.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
+ "\n",
+ "# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
+ "# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
+ "# rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
+ "# fit_params=fit_params, random_state=42, verbose=2)\n",
+ "# rnd_search_dropout.fit(X_train1, y_train1)"
  ]
 },
 {
@@ -3422,7 +3432,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "But that's okay, we have ourselves a nice DNN that achieves 99.40% accuracy on the test set using Batch Normalization, or 99.32% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN, since it is almost as good."
+ "But that's okay: we have ourselves a nice DNN that achieves 99.49% accuracy on the test set using Batch Normalization, or 98.91% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN."
  ]
 },
 {
@@ -4011,7 +4021,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Meh. How disappointing! ;) Transfer learning did not help much (if at all) in this task. At least we tried... Fortunately, the next exercise will get better results."
+ "Transfer learning allowed us to go from 84.8% accuracy to 91.3%. Not too bad!"
  ]
 },
 {
@@ -4554,7 +4564,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Well, 96.7% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
+ "Well, 96.5% accuracy; that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
  ]
 },
 {
@@ -4620,9 +4630,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Only 94.8% accuracy... So transfer learning helped us reduce the error rate from 5.2% to 3.3% (that's over 36% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
+ "Only 94.6% accuracy... So transfer learning helped us reduce the error rate from 5.4% to 3.5% (that's over 35% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
  "\n",
- "Bottom line: transfer learning does not always work (as we saw in exercise 9), but when it does it can make a big difference. So try it out!"
+ "Bottom line: transfer learning does not always work, but when it does it can make a big difference. So try it out!"
  ]
 },
 {
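The transfer learning pattern behind these numbers, as a minimal sketch (TensorFlow 1.x assumed; layer names and the number of frozen layers invented for this note): rebuild the graph, train only the upper layers, and restore the lower-layer weights from the digits-0-to-4 checkpoint before training:

import tensorflow as tf  # TensorFlow 1.x graph-mode API

X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
y = tf.placeholder(tf.int32, shape=(None,), name="y")  # labels shifted to 0-4

# Lower layers to reuse (weights restored from the 0-4 model) plus a fresh
# output layer for the new 5-9 task:
hidden = X
for i in range(1, 6):
    hidden = tf.layers.dense(hidden, 100, activation=tf.nn.elu,
                             name="hidden%d" % i)
logits = tf.layers.dense(hidden, 5, name="new_outputs")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                          logits=logits)
loss = tf.reduce_mean(xentropy)

# Freeze hidden1-3: only train variables whose scope matches this regex
# (tf.get_collection treats the scope argument as a regular expression).
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                               scope="hidden[45]|new_outputs")
training_op = tf.train.AdamOptimizer().minimize(loss, var_list=train_vars)

# restore_saver = tf.train.Saver()  # then restore the 0-4 checkpoint in the session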
@@ -4649,7 +4659,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.5.5"
+ "version": "3.6.6"
 },
 "nav_menu": {
  "height": "360px",