Add hyperparameters to remove warnings, and sync comments with results
parent: d0e489afa4
commit: ad125bbdba
|
@@ -462,7 +462,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I will definitely add it to the book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out."
|
"This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I added paragraph about SELU in the latest release of my book). During training, a neural network composed of a stack of dense layers using the SELU activation function will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out."
|
||||||
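For reference, SELU itself is easy to define. A minimal NumPy sketch (the two constants come straight from the paper; the demo at the end is illustrative, not this notebook's exact cell):

```python
import numpy as np

# Fixed-point constants from Klambauer et al. (2017)
alpha = 1.6732632423543772
scale = 1.0507009873554805

def selu(z):
    # scale * z for z > 0, scale * alpha * (exp(z) - 1) otherwise
    return scale * np.where(z > 0.0, z, alpha * (np.exp(z) - 1.0))

# Standardized inputs tend to keep mean ~0 and std ~1 through stacked
# dense layers using this activation, which is the self-normalizing
# property described above
z = np.random.normal(size=1000)
print(selu(z).mean(), selu(z).std())
```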
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -2802,7 +2802,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We get 98.05% accuracy on the test set. That's not too bad, but let's see if we can do better by tuning the hyperparameters."
|
"This test accuracy is not too bad, but let's see if we can do better by tuning the hyperparameters."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3084,11 +3084,11 @@
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
" random_state=42, verbose=2)\n",
|
" cv=3, random_state=42, verbose=2)\n",
|
||||||
"rnd_search.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
"rnd_search.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
||||||
"# fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000}\n",
|
"# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
"# rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"# rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
||||||
"# rnd_search.fit(X_train1, y_train1)\n"
|
"# rnd_search.fit(X_train1, y_train1)\n"
|
||||||
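Two notes on this change: passing `cv` explicitly avoids the warning recent Scikit-Learn versions emit about the changing default number of folds, and since Scikit-Learn 0.19 any extra keyword arguments given to `fit()` are forwarded to the underlying estimator's `fit()` method, so the deprecated `fit_params` constructor argument is no longer needed. A minimal sketch of the same pattern, using `SGDClassifier` as a stand-in for the notebook's custom `DNNClassifier`:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)

param_distribs = {"alpha": [1e-5, 1e-4, 1e-3]}
search = RandomizedSearchCV(SGDClassifier(random_state=42), param_distribs,
                            n_iter=3, cv=3, random_state=42, verbose=2)

# Extra fit keyword arguments are forwarded to SGDClassifier.fit();
# sample_weight here plays the role of X_valid/y_valid/n_epochs above
search.fit(X, y, sample_weight=np.ones(len(y)))
print(search.best_params_)
```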
|
@@ -3117,7 +3117,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Wonderful! Tuning the hyperparameters got us up to 99.32% accuracy! It may not sound like a great improvement to go from 98.05% to 99.32% accuracy, but consider the error rate: it went from roughly 2% to 0.7%. That's a 65% reduction of the number of errors this model will produce!"
|
"Wonderful! Tuning the hyperparameters got us up to 98.91% accuracy! It may not sound like a great improvement to go from 97.26% to 98.91% accuracy, but consider the error rate: it went from roughly 2.6% to 1.1%. That's almost 60% reduction of the number of errors this model will produce!"
|
||||||
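The arithmetic behind that claim, spelled out:

```python
old_err = 1 - 0.9726  # ~2.7% error rate before tuning
new_err = 1 - 0.9891  # ~1.1% error rate after tuning
print((old_err - new_err) / old_err)  # ~0.60, i.e. about 60% fewer errors
```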
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3172,14 +3172,14 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The best loss is reached at epoch 19, but it was already within 10% of that result at epoch 9."
|
"The best loss is reached at epoch 5."
|
||||||
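The "best epoch" here comes from early stopping on the validation set passed to `fit()`. A rough sketch of that logic (illustrative only; the callables, names and patience threshold are placeholders, not the notebook's exact `DNNClassifier` implementation):

```python
import numpy as np

def fit_with_early_stopping(train_one_epoch, valid_loss, n_epochs=1000,
                            max_checks_without_progress=20):
    # Stop once the validation loss has not improved for too many epochs
    best_loss = np.inf
    checks_without_progress = 0
    for epoch in range(n_epochs):
        train_one_epoch()
        loss_val = valid_loss()
        if loss_val < best_loss:
            best_loss = loss_val
            checks_without_progress = 0
        else:
            checks_without_progress += 1
            if checks_without_progress > max_checks_without_progress:
                break  # early stopping
    return best_loss
```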
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Let's check that we do indeed get 99.32% accuracy on the test set:"
|
"Let's check that we do indeed get 98.9% accuracy on the test set:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3215,7 +3215,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The best params are reached during epoch 48, that's actually a slower convergence than earlier. Let's check the accuracy:"
|
"The best params are reached during epoch 20, that's actually a slower convergence than earlier. Let's check the accuracy:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3232,7 +3232,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Well, batch normalization did not improve accuracy. Let's see if we can find a good set of hyperparameters that will work well with batch normalization:"
|
"Great, batch normalization improved accuracy! Let's see if we can find a good set of hyperparameters that will work even better with batch normalization:"
|
||||||
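For context, wiring batch normalization into a TensorFlow 1.x layer looks roughly like this (a sketch, not the notebook's exact `DNNClassifier` internals); `momentum` is the `batch_norm_momentum` hyperparameter sampled below:

```python
import tensorflow as tf  # TensorFlow 1.x, as used in this notebook

X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
training = tf.placeholder_with_default(False, shape=(), name="training")

hidden1 = tf.layers.dense(X, 100, name="hidden1")
bn1 = tf.layers.batch_normalization(hidden1, momentum=0.98, training=training)
act1 = tf.nn.elu(bn1)

# The moving-average update ops must be run along with the training op
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
```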
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3254,10 +3254,15 @@
|
||||||
" \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n",
|
" \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n",
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50, cv=3,\n",
|
||||||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
|
||||||
" random_state=42, verbose=2)\n",
|
" random_state=42, verbose=2)\n",
|
||||||
"rnd_search_bn.fit(X_train1, y_train1)"
|
"rnd_search_bn.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
|
"\n",
|
||||||
|
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
||||||
|
"# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
|
"# rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
|
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
||||||
|
"# rnd_search_bn.fit(X_train1, y_train1)\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3283,7 +3288,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Slightly better than earlier: 99.4% vs 99.3%. Let's see if dropout can do better."
|
"Slightly better than earlier: 99.49% vs 99.42%. Let's see if dropout can do better."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3304,7 +3309,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Let's go back to the best model we trained earlier and see how it performs on the training set:"
|
"Let's go back to the model we trained earlier and see how it performs on the training set:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3321,7 +3326,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The model performs significantly better on the training set than on the test set (99.91% vs 99.32%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
|
"The model performs significantly better on the training set than on the test set (99.51% vs 99.00%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
|
||||||
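In TensorFlow 1.x this amounts to inserting `tf.layers.dropout` before each dense layer; a minimal sketch (layer sizes are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
training = tf.placeholder_with_default(False, shape=(), name="training")

dropout_rate = 0.5  # drop 50% of the inputs, at training time only
X_drop = tf.layers.dropout(X, rate=dropout_rate, training=training)
hidden1 = tf.layers.dense(X_drop, 100, activation=tf.nn.elu, name="hidden1")
hidden1_drop = tf.layers.dropout(hidden1, rate=dropout_rate, training=training)
```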
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3340,7 +3345,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The best params are reached during epoch 23. Dropout somewhat slowed down convergence."
|
"The best params are reached during epoch 17. Dropout somewhat slowed down convergence."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3364,7 +3369,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"We are out of luck, dropout does not seem to help either. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:"
|
"We are out of luck, dropout does not seem to help. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3387,9 +3392,14 @@
|
||||||
"}\n",
|
"}\n",
|
||||||
"\n",
|
"\n",
|
||||||
"rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
"rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
" cv=3, random_state=42, verbose=2)\n",
|
||||||
" random_state=42, verbose=2)\n",
|
"rnd_search_dropout.fit(X_train1, y_train1, X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
"rnd_search_dropout.fit(X_train1, y_train1)"
|
"\n",
|
||||||
|
"# If you have Scikit-Learn 0.18 or earlier, you should upgrade, or use the fit_params argument:\n",
|
||||||
|
"# fit_params = dict(X_valid=X_valid1, y_valid=y_valid1, n_epochs=1000)\n",
|
||||||
|
"# rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||||||
|
"# fit_params=fit_params, random_state=42, verbose=2)\n",
|
||||||
|
"# rnd_search_dropout.fit(X_train1, y_train1)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -3422,7 +3432,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"But that's okay, we have ourselves a nice DNN that achieves 99.40% accuracy on the test set using Batch Normalization, or 99.32% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN, since it is almost as good."
|
"But that's okay, we have ourselves a nice DNN that achieves 99.49% accuracy on the test set using Batch Normalization, or 98.91% without BN. Let's see if some of this expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9. For the sake of simplicity we will reuse the DNN without BN."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4011,7 +4021,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Meh. How disappointing! ;) Transfer learning did not help much (if at all) in this task. At least we tried... Fortunately, the next exercise will get better results."
|
"Transfer learning allowed us to go from 84.8% accuracy to 91.3%. Not too bad!"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4554,7 +4564,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Well, 96.7% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
|
"Well, 96.5% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4620,9 +4630,9 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Only 94.8% accuracy... So transfer learning helped us reduce the error rate from 5.2% to 3.3% (that's over 36% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
|
"Only 94.6% accuracy... So transfer learning helped us reduce the error rate from 5.4% to 3.5% (that's over 35% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Bottom line: transfer learning does not always work (as we saw in exercise 9), but when it does it can make a big difference. So try it out!"
|
"Bottom line: transfer learning does not always work, but when it does it can make a big difference. So try it out!"
|
||||||
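For reference, the transfer-learning pattern used in these exercises boils down to restoring the lower layers from the old checkpoint and optimizing only the new upper layers. A condensed TensorFlow 1.x sketch (layer sizes, scope names and the checkpoint path are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x

X = tf.placeholder(tf.float32, shape=(None, 784), name="X")
y = tf.placeholder(tf.int32, shape=(None,), name="y")
hidden1 = tf.layers.dense(X, 100, activation=tf.nn.elu, name="hidden1")
hidden2 = tf.layers.dense(hidden1, 100, activation=tf.nn.elu, name="hidden2")
logits = tf.layers.dense(hidden2, 5, name="outputs")
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy)

# Restore only the reused lower layers (hidden1 and hidden2)
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden[12]")
restore_saver = tf.train.Saver(reuse_vars)

# Freeze them by optimizing only the new output layer
train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="outputs")
training_op = tf.train.AdamOptimizer().minimize(loss, var_list=train_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    restore_saver.restore(sess, "./my_mnist_model_0_to_4.ckpt")
    # ...then run the usual training loop on digits 5 to 9...
```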
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@@ -4649,7 +4659,7 @@
|
||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.5.5"
|
"version": "3.6.6"
|
||||||
},
|
},
|
||||||
"nav_menu": {
|
"nav_menu": {
|
||||||
"height": "360px",
|
"height": "360px",
|
||||||