diff --git a/11_training_deep_neural_networks.ipynb b/11_training_deep_neural_networks.ipynb index 5ad951f..067bf9a 100644 --- a/11_training_deep_neural_networks.ipynb +++ b/11_training_deep_neural_networks.ipynb @@ -64,6 +64,8 @@ "from tensorflow import keras\n", "assert tf.__version__ >= \"2.0\"\n", "\n", + "%load_ext tensorboard\n", + "\n", "# Common imports\n", "import numpy as np\n", "import os\n", @@ -1037,7 +1039,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Great! We got quite a bit of transfer: the error rate dropped by a factor of almost 4!" + "Great! We got quite a bit of transfer: the error rate dropped by a factor of 4!" ] }, { @@ -1046,7 +1048,7 @@ "metadata": {}, "outputs": [], "source": [ - "(100 - 97.05) / (100 - 99.25)" + "(100 - 96.95) / (100 - 99.25)" ] }, { @@ -2126,491 +2128,538 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 8. Deep Learning" + "## 8. Deep Learning on CIFAR10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### 8.1." + "### a.\n", + "*Exercise: Build a DNN with 20 hidden layers of 100 neurons each (that's too many, but it's the point of this exercise). Use He initialization and the ELU activation function.*" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "metadata": {}, + "outputs": [], + "source": [ + "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "model = keras.models.Sequential()\n", + "model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n", + "for _ in range(20):\n", + " model.add(keras.layers.Dense(100,\n", + " activation=\"elu\",\n", + " kernel_initializer=\"he_normal\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function._" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 8.2." + "### b.\n", + "*Exercise: Using Nadam optimization and early stopping, train the network on the CIFAR10 dataset. You can load it with `keras.datasets.cifar10.load_data()`. The dataset is composed of 60,000 32 × 32–pixel color images (50,000 for training, 10,000 for testing) with 10 classes, so you'll need a softmax output layer with 10 neurons. Remember to search for the right learning rate each time you change the model's architecture or hyperparameters.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: Using Adam optimization and early stopping, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. 
You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later._" + "Let's add the output layer to the model:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 128, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "### 8.3." + "model.add(keras.layers.Dense(10, activation=\"softmax\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: Tune the hyperparameters using cross-validation and see what precision you can achieve._" + "Let's use a Nadam optimizer with a learning rate of 5e-5. I tried learning rates 1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3 and 1e-2, and I compared their learning curves for 10 epochs each (using the TensorBoard callback, below). The learning rates 3e-5 and 1e-4 were pretty good, so I tried 5e-5, which turned out to be slightly better." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 129, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "### 8.4." + "optimizer = keras.optimizers.Nadam(lr=5e-5)\n", + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + " optimizer=optimizer,\n", + " metrics=[\"accuracy\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: Now try adding Batch Normalization and compare the learning curves: is it converging faster than before? Does it produce a better model?_" + "Let's load the CIFAR10 dataset. We also want to use early stopping, so we need a validation set. Let's use the first 5,000 images of the original training set as the validation set:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 130, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "### 8.5." + "(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.cifar10.load_data()\n", + "\n", + "X_train = X_train_full[5000:]\n", + "y_train = y_train_full[5000:]\n", + "X_valid = X_train_full[:5000]\n", + "y_valid = y_train_full[:5000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: is the model overfitting the training set? Try adding dropout to every layer and try again. 
Does it help?_" + "Now we can create the callbacks we need and train the model:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 131, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)\n", + "model_checkpoint_cb = keras.callbacks.ModelCheckpoint(\"my_cifar10_model.h5\", save_best_only=True)\n", + "run_index = 1 # increment every time you train the model\n", + "run_logdir = os.path.join(os.curdir, \"my_cifar10_logs\", \"run_{:03d}\".format(run_index))\n", + "tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)\n", + "callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 132, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "%tensorboard --logdir=./my_cifar10_logs --port=6006" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 133, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "model.fit(X_train, y_train, epochs=100,\n", + " validation_data=(X_valid, y_valid),\n", + " callbacks=callbacks)" + ] + }, + { + "cell_type": "code", + "execution_count": 134, + "metadata": {}, + "outputs": [], + "source": [ + "model = keras.models.load_model(\"my_cifar10_model.h5\")\n", + "model.evaluate(X_valid, y_valid)" + ] }, { "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model with the lowest validation loss gets about 47% accuracy on the validation set. It took 39 epochs to reach the lowest validation loss, with roughly 10 seconds per epoch on my laptop (without a GPU). Let's see if we can improve performance using Batch Normalization." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### c.\n", + "*Exercise: Now try adding Batch Normalization and compare the learning curves: Is it converging faster than before? Does it produce a better model? How does it affect training speed?*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The code below is very similar to the code above, with a few changes:\n", + "\n", + "* I added a BN layer after every Dense layer (before the activation function), except for the output layer. I also added a BN layer before the first hidden layer.\n", + "* I changed the learning rate to 5e-4. I experimented with 1e-5, 3e-5, 5e-5, 1e-4, 3e-4, 5e-4, 1e-3 and 3e-3, and I chose the one with the best validation performance after 20 epochs.\n", + "* I renamed the run directories to run_bn_* and the model file name to my_cifar10_bn_model.h5." 
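,
    "\n",
    "\n",
    "For reference, the learning-rate comparison described in the second bullet above can be reproduced with a small loop along these lines. This is only a sketch: the `build_bn_model()` helper and the `lr_search_*` log-directory names are illustrative (they are not part of the notebook), and each candidate rate is trained for 20 epochs and compared on the validation set via TensorBoard:\n",
    "\n",
    "```python\n",
    "def build_bn_model():\n",
    "    # Same architecture as the BN model built in the next cell.\n",
    "    model = keras.models.Sequential()\n",
    "    model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n",
    "    model.add(keras.layers.BatchNormalization())\n",
    "    for _ in range(20):\n",
    "        model.add(keras.layers.Dense(100, kernel_initializer=\"he_normal\"))\n",
    "        model.add(keras.layers.BatchNormalization())\n",
    "        model.add(keras.layers.Activation(\"elu\"))\n",
    "    model.add(keras.layers.Dense(10, activation=\"softmax\"))\n",
    "    return model\n",
    "\n",
    "for lr in (1e-5, 3e-5, 5e-5, 1e-4, 3e-4, 5e-4, 1e-3, 3e-3):\n",
    "    keras.backend.clear_session()\n",
    "    tf.random.set_seed(42)\n",
    "    np.random.seed(42)\n",
    "    model = build_bn_model()\n",
    "    model.compile(loss=\"sparse_categorical_crossentropy\",\n",
    "                  optimizer=keras.optimizers.Nadam(lr=lr),\n",
    "                  metrics=[\"accuracy\"])\n",
    "    run_logdir = os.path.join(os.curdir, \"my_cifar10_logs\",\n",
    "                              \"lr_search_{:.0e}\".format(lr))\n",
    "    model.fit(X_train, y_train, epochs=20,\n",
    "              validation_data=(X_valid, y_valid),\n",
    "              callbacks=[keras.callbacks.TensorBoard(run_logdir)])\n",
    "```"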
+ ] + }, + { + "cell_type": "code", + "execution_count": 135, + "metadata": {}, + "outputs": [], + "source": [ + "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "model = keras.models.Sequential()\n", + "model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n", + "model.add(keras.layers.BatchNormalization())\n", + "for _ in range(20):\n", + " model.add(keras.layers.Dense(100, kernel_initializer=\"he_normal\"))\n", + " model.add(keras.layers.BatchNormalization())\n", + " model.add(keras.layers.Activation(\"elu\"))\n", + "model.add(keras.layers.Dense(10, activation=\"softmax\"))\n", + "\n", + "optimizer = keras.optimizers.Nadam(lr=5e-4)\n", + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + " optimizer=optimizer,\n", + " metrics=[\"accuracy\"])\n", + "\n", + "early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)\n", + "model_checkpoint_cb = keras.callbacks.ModelCheckpoint(\"my_cifar10_bn_model.h5\", save_best_only=True)\n", + "run_index = 1 # increment every time you train the model\n", + "run_logdir = os.path.join(os.curdir, \"my_cifar10_logs\", \"run_bn_{:03d}\".format(run_index))\n", + "tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)\n", + "callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]\n", + "\n", + "model.fit(X_train, y_train, epochs=100,\n", + " validation_data=(X_valid, y_valid),\n", + " callbacks=callbacks)\n", + "\n", + "model = keras.models.load_model(\"my_cifar10_bn_model.h5\")\n", + "model.evaluate(X_valid, y_valid)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "* *Is the model converging faster than before?* Much faster! The previous model took 39 epochs to reach the lowest validation loss, while the new model with BN took 18 epochs. That's more than twice as fast as the previous model. The BN layers stabilized training and allowed us to use a much larger learning rate, so convergence was faster.\n", + "* *Does BN produce a better model?* Yes! The final model is also much better, with 55% accuracy instead of 47%. It's still not a very good model, but at least it's much better than before (a Convolutional Neural Network would do much better, but that's a different topic, see chapter 14).\n", + "* *How does BN affect training speed?* Although the model converged twice as fast, each epoch took about 16s instead of 10s, because of the extra computations required by the BN layers. So overall, although the number of epochs was reduced by 50%, the training time (wall time) was shortened by 30%. Which is still pretty significant!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### d.\n", + "*Exercise: Try replacing Batch Normalization with SELU, and make the necessary adjustements to ensure the network self-normalizes (i.e., standardize the input features, use LeCun normal initialization, make sure the DNN contains only a sequence of dense layers, etc.).*" + ] + }, + { + "cell_type": "code", + "execution_count": 136, "metadata": { - "collapsed": true + "scrolled": true }, + "outputs": [], "source": [ - "## 9. 
Transfer learning" + "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "model = keras.models.Sequential()\n", + "model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n", + "for _ in range(20):\n", + " model.add(keras.layers.Dense(100,\n", + " kernel_initializer=\"lecun_normal\",\n", + " activation=\"selu\"))\n", + "model.add(keras.layers.Dense(10, activation=\"softmax\"))\n", + "\n", + "optimizer = keras.optimizers.Nadam(lr=7e-4)\n", + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + " optimizer=optimizer,\n", + " metrics=[\"accuracy\"])\n", + "\n", + "early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)\n", + "model_checkpoint_cb = keras.callbacks.ModelCheckpoint(\"my_cifar10_selu_model.h5\", save_best_only=True)\n", + "run_index = 1 # increment every time you train the model\n", + "run_logdir = os.path.join(os.curdir, \"my_cifar10_logs\", \"run_selu_{:03d}\".format(run_index))\n", + "tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)\n", + "callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]\n", + "\n", + "X_means = X_train.mean(axis=0)\n", + "X_stds = X_train.std(axis=0)\n", + "X_train_scaled = (X_train - X_means) / X_stds\n", + "X_valid_scaled = (X_valid - X_means) / X_stds\n", + "X_test_scaled = (X_test - X_means) / X_stds\n", + "\n", + "model.fit(X_train_scaled, y_train, epochs=100,\n", + " validation_data=(X_valid_scaled, y_valid),\n", + " callbacks=callbacks)\n", + "\n", + "model = keras.models.load_model(\"my_cifar10_selu_model.h5\")\n", + "model.evaluate(X_valid_scaled, y_valid)" + ] + }, + { + "cell_type": "code", + "execution_count": 137, + "metadata": {}, + "outputs": [], + "source": [ + "model = keras.models.load_model(\"my_cifar10_selu_model.h5\")\n", + "model.evaluate(X_valid_scaled, y_valid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### 9.1." + "We get 51.4% accuracy, which is better than the original model, but not quite as good as the model using batch normalization. Moreover, it took 13 epochs to reach the best model, which is much faster than both the original model and the BN model, plus each epoch took only 10 seconds, just like the original model. So it's by far the fastest model to train (both in terms of epochs and wall time)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: create a new DNN that reuses all the pretrained hidden layers of the previous model, freezes them, and replaces the softmax output layer with a new one._" + "### e.\n", + "*Exercise: Try regularizing the model with alpha dropout. Then, without retraining your model, see if you can achieve better accuracy using MC Dropout.*" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 138, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "### 9.2." 
+ "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "model = keras.models.Sequential()\n", + "model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n", + "for _ in range(20):\n", + " model.add(keras.layers.Dense(100,\n", + " kernel_initializer=\"lecun_normal\",\n", + " activation=\"selu\"))\n", + "\n", + "model.add(keras.layers.AlphaDropout(rate=0.1))\n", + "model.add(keras.layers.Dense(10, activation=\"softmax\"))\n", + "\n", + "optimizer = keras.optimizers.Nadam(lr=5e-4)\n", + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + " optimizer=optimizer,\n", + " metrics=[\"accuracy\"])\n", + "\n", + "early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)\n", + "model_checkpoint_cb = keras.callbacks.ModelCheckpoint(\"my_cifar10_alpha_dropout_model.h5\", save_best_only=True)\n", + "run_index = 1 # increment every time you train the model\n", + "run_logdir = os.path.join(os.curdir, \"my_cifar10_logs\", \"run_alpha_dropout_{:03d}\".format(run_index))\n", + "tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)\n", + "callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]\n", + "\n", + "X_means = X_train.mean(axis=0)\n", + "X_stds = X_train.std(axis=0)\n", + "X_train_scaled = (X_train - X_means) / X_stds\n", + "X_valid_scaled = (X_valid - X_means) / X_stds\n", + "X_test_scaled = (X_test - X_means) / X_stds\n", + "\n", + "model.fit(X_train_scaled, y_train, epochs=100,\n", + " validation_data=(X_valid_scaled, y_valid),\n", + " callbacks=callbacks)\n", + "\n", + "model = keras.models.load_model(\"my_cifar10_alpha_dropout_model.h5\")\n", + "model.evaluate(X_valid_scaled, y_valid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: train this new DNN on digits 5 to 9, using only 100 images per digit, and time how long it takes. Despite this small number of examples, can you achieve high precision?_" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 9.3." + "The model reaches 50.8% accuracy on the validation set. That's very slightly worse than without dropout (51.4%). With an extensive hyperparameter search, it might be possible to do better (I tried dropout rates of 5%, 10%, 20% and 40%, and learning rates 1e-4, 3e-4, 5e-4, and 1e-3), but probably not much better in this case." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: try caching the frozen layers, and train the model again: how much faster is it now?_" + "Let's use MC Dropout now. We will need the `MCAlphaDropout` class we used earlier, so let's just copy it here for convenience:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 139, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "### 9.4." 
+ "class MCAlphaDropout(keras.layers.AlphaDropout):\n", + " def call(self, inputs):\n", + " return super().call(inputs, training=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: try again reusing just four hidden layers instead of five. Can you achieve a higher precision?_" + "Now let's create a new model, identical to the one we just trained (with the same weights), but with `MCAlphaDropout` dropout layers instead of `AlphaDropout` layers:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 140, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "### 9.5." + "mc_model = keras.models.Sequential([\n", + " MCAlphaDropout(layer.rate) if isinstance(layer, keras.layers.AlphaDropout) else layer\n", + " for layer in model.layers\n", + "])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "_Exercise: now unfreeze the top two hidden layers and continue training: can you get the model to perform even better?_" + "Then let's add a couple utility functions. The first will run the model many times (10 by default) and it will return the mean predicted class probabilities. The second will use these mean probabilities to predict the most likely class for each instance:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 141, "metadata": {}, "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, "source": [ - "## 10. Pretraining on an auxiliary task" + "def mc_dropout_predict_probas(mc_model, X, n_samples=10):\n", + " Y_probas = [mc_model.predict(X) for sample in range(n_samples)]\n", + " return np.mean(Y_probas, axis=0)\n", + "\n", + "def mc_dropout_predict_classes(mc_model, X, n_samples=10):\n", + " Y_probas = mc_dropout_predict_probas(mc_model, X, n_samples)\n", + " return np.argmax(Y_probas, axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In this exercise you will build a DNN that compares two MNIST digit images and predicts whether they represent the same digit or not. Then you will reuse the lower layers of this network to train an MNIST classifier using very little training data." + "Now let's make predictions for all the instances in the validation set, and compute the accuracy:" + ] + }, + { + "cell_type": "code", + "execution_count": 142, + "metadata": {}, + "outputs": [], + "source": [ + "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "y_pred = mc_dropout_predict_classes(mc_model, X_valid_scaled)\n", + "accuracy = np.mean(y_pred == y_valid[:, 0])\n", + "accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### 10.1.\n", - "Exercise: _Start by building two DNNs (let's call them DNN A and B), both similar to the one you built earlier but without the output layer: each DNN should have five hidden layers of 100 neurons each, He initialization, and ELU activation. Next, add one more hidden layer with 10 units on top of both DNNs. 
You should use the `keras.layers.concatenate()` function to concatenate the outputs of both DNNs, then feed the result to the hidden layer. Finally, add an output layer with a single neuron using the logistic activation function._" + "We only get virtually no accuracy improvement in this case (from 50.8% to 50.9%).\n", + "\n", + "So the best model we got in this exercise is the Batch Normalization model." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### 10.2.\n", - "_Exercise: split the MNIST training set in two sets: split #1 should containing 55,000 images, and split #2 should contain contain 5,000 images. Create a function that generates a training batch where each instance is a pair of MNIST images picked from split #1. Half of the training instances should be pairs of images that belong to the same class, while the other half should be images from different classes. For each pair, the training label should be 0 if the images are from the same class, or 1 if they are from different classes._" + "### f.\n", + "*Exercise: Retrain your model using 1cycle scheduling and see if it improves training speed and model accuracy.*" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 143, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "model = keras.models.Sequential()\n", + "model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n", + "for _ in range(20):\n", + " model.add(keras.layers.Dense(100,\n", + " kernel_initializer=\"lecun_normal\",\n", + " activation=\"selu\"))\n", + "\n", + "model.add(keras.layers.AlphaDropout(rate=0.1))\n", + "model.add(keras.layers.Dense(10, activation=\"softmax\"))\n", + "\n", + "optimizer = keras.optimizers.SGD(lr=1e-3)\n", + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + " optimizer=optimizer,\n", + " metrics=[\"accuracy\"])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 144, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "batch_size = 128\n", + "rates, losses = find_learning_rate(model, X_train_scaled, y_train, epochs=1, batch_size=batch_size)\n", + "plot_lr_vs_loss(rates, losses)\n", + "plt.axis([min(rates), max(rates), min(losses), (losses[0] + min(losses)) / 1.4])" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 145, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "keras.backend.clear_session()\n", + "tf.random.set_seed(42)\n", + "np.random.seed(42)\n", + "\n", + "model = keras.models.Sequential()\n", + "model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))\n", + "for _ in range(20):\n", + " model.add(keras.layers.Dense(100,\n", + " kernel_initializer=\"lecun_normal\",\n", + " activation=\"selu\"))\n", + "\n", + "model.add(keras.layers.AlphaDropout(rate=0.1))\n", + "model.add(keras.layers.Dense(10, activation=\"softmax\"))\n", + "\n", + "optimizer = keras.optimizers.SGD(lr=1e-2)\n", + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + " optimizer=optimizer,\n", + " metrics=[\"accuracy\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "metadata": 
{}, + "outputs": [], + "source": [ + "n_epochs = 15\n", + "onecycle = OneCycleScheduler(len(X_train_scaled) // batch_size * n_epochs, max_rate=0.05)\n", + "history = model.fit(X_train_scaled, y_train, epochs=n_epochs, batch_size=batch_size,\n", + " validation_data=(X_valid_scaled, y_valid),\n", + " callbacks=[onecycle])" + ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### 10.3.\n", - "_Exercise: train the DNN on this training set. For each image pair, you can simultaneously feed the first image to DNN A and the second image to DNN B. The whole network will gradually learn to tell whether two images belong to the same class or not._" + "One cycle allowed us to train the model in just 15 epochs, each taking only 3 seconds (thanks to the larger batch size). This is over 3 times faster than the fastest model we trained so far. Moreover, we improved the model's performance (from 50.8% to 52.8%). The batch normalized model reaches a slightly better performance, but it's much slower to train." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 10.4.\n", - "_Exercise: now create a new DNN by reusing and freezing the hidden layers of DNN A and adding a softmax output layer on top with 10 neurons. Train this network on split #2 and see if you can achieve high performance despite having only 500 images per class._" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, { "cell_type": "code", "execution_count": null,
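   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: the `find_learning_rate()`, `plot_lr_vs_loss()` and `OneCycleScheduler` helpers used in the three cells above are defined earlier in this notebook and are not part of this diff. For readers of this section alone, the cell below is a minimal sketch consistent with how they are called here; the exact definitions earlier in the notebook may differ slightly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch of the helpers assumed above (illustrative, not the notebook's exact code).\n",
    "K = keras.backend\n",
    "\n",
    "class ExponentialLearningRate(keras.callbacks.Callback):\n",
    "    # Grow the learning rate by a constant factor after each batch,\n",
    "    # recording the rate and the batch loss along the way.\n",
    "    def __init__(self, factor):\n",
    "        self.factor = factor\n",
    "        self.rates = []\n",
    "        self.losses = []\n",
    "    def on_batch_end(self, batch, logs=None):\n",
    "        self.rates.append(K.get_value(self.model.optimizer.lr))\n",
    "        self.losses.append(logs[\"loss\"])\n",
    "        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)\n",
    "\n",
    "def find_learning_rate(model, X, y, epochs=1, batch_size=32,\n",
    "                       min_rate=1e-5, max_rate=10):\n",
    "    # Train briefly while increasing the learning rate exponentially,\n",
    "    # then restore the initial weights and learning rate.\n",
    "    init_weights = model.get_weights()\n",
    "    iterations = len(X) // batch_size * epochs\n",
    "    factor = np.exp(np.log(max_rate / min_rate) / iterations)\n",
    "    init_lr = K.get_value(model.optimizer.lr)\n",
    "    K.set_value(model.optimizer.lr, min_rate)\n",
    "    exp_lr = ExponentialLearningRate(factor)\n",
    "    model.fit(X, y, epochs=epochs, batch_size=batch_size, callbacks=[exp_lr])\n",
    "    K.set_value(model.optimizer.lr, init_lr)\n",
    "    model.set_weights(init_weights)\n",
    "    return exp_lr.rates, exp_lr.losses\n",
    "\n",
    "def plot_lr_vs_loss(rates, losses):\n",
    "    # Plot the loss as a function of the (log-scaled) learning rate.\n",
    "    plt.plot(rates, losses)\n",
    "    plt.gca().set_xscale(\"log\")\n",
    "    plt.hlines(min(losses), min(rates), max(rates))\n",
    "    plt.axis([min(rates), max(rates), min(losses), (losses[0] + min(losses)) / 2])\n",
    "    plt.xlabel(\"Learning rate\")\n",
    "    plt.ylabel(\"Loss\")\n",
    "\n",
    "class OneCycleScheduler(keras.callbacks.Callback):\n",
    "    # 1cycle schedule: ramp the rate up to max_rate over the first half of\n",
    "    # training, back down over the second half, then decay to a tiny rate.\n",
    "    def __init__(self, iterations, max_rate, start_rate=None,\n",
    "                 last_iterations=None, last_rate=None):\n",
    "        self.iterations = iterations\n",
    "        self.max_rate = max_rate\n",
    "        self.start_rate = start_rate or max_rate / 10\n",
    "        self.last_iterations = last_iterations or iterations // 10 + 1\n",
    "        self.half_iteration = (iterations - self.last_iterations) // 2\n",
    "        self.last_rate = last_rate or self.start_rate / 1000\n",
    "        self.iteration = 0\n",
    "    def _interpolate(self, iter1, iter2, rate1, rate2):\n",
    "        return (rate2 - rate1) * (self.iteration - iter1) / (iter2 - iter1) + rate1\n",
    "    def on_batch_begin(self, batch, logs=None):\n",
    "        if self.iteration < self.half_iteration:\n",
    "            rate = self._interpolate(0, self.half_iteration,\n",
    "                                     self.start_rate, self.max_rate)\n",
    "        elif self.iteration < 2 * self.half_iteration:\n",
    "            rate = self._interpolate(self.half_iteration, 2 * self.half_iteration,\n",
    "                                     self.max_rate, self.start_rate)\n",
    "        else:\n",
    "            rate = self._interpolate(2 * self.half_iteration, self.iterations,\n",
    "                                     self.start_rate, self.last_rate)\n",
    "            rate = max(rate, self.last_rate)\n",
    "        self.iteration += 1\n",
    "        K.set_value(self.model.optimizer.lr, rate)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,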