{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 11 – Training Deep Neural Networks**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 11._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0-preview."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"0.20\"\n",
"\n",
"# TensorFlow ≥2.0-preview is required\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"assert tf.__version__ >= \"2.0\"\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"deep\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
"    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
"    print(\"Saving figure\", fig_id)\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vanishing/Exploding Gradients Problem"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Despite its name, this computes the logistic (sigmoid) function, not its inverse (the logit)\n",
"def logit(z):\n",
"    return 1 / (1 + np.exp(-z))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"z = np.linspace(-5, 5, 200)\n",
"\n",
"plt.plot([-5, 5], [0, 0], 'k-')\n",
"plt.plot([-5, 5], [1, 1], 'k--')\n",
"plt.plot([0, 0], [-0.2, 1.2], 'k-')\n",
"plt.plot([-5, 5], [-3/4, 7/4], 'g--')\n",
"plt.plot(z, logit(z), \"b-\", linewidth=2)\n",
"props = dict(facecolor='black', shrink=0.1)\n",
"plt.annotate('Saturating', xytext=(3.5, 0.7), xy=(5, 1), arrowprops=props, fontsize=14, ha=\"center\")\n",
"plt.annotate('Saturating', xytext=(-3.5, 0.3), xy=(-5, 0), arrowprops=props, fontsize=14, ha=\"center\")\n",
"plt.annotate('Linear', xytext=(2, 0.2), xy=(0, 0.5), arrowprops=props, fontsize=14, ha=\"center\")\n",
"plt.grid(True)\n",
"plt.title(\"Sigmoid activation function\", fontsize=14)\n",
"plt.axis([-5, 5, -0.2, 1.2])\n",
"\n",
"save_fig(\"sigmoid_saturation_plot\")\n",
"plt.show()"
]
},
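{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough numerical companion to the plot above, the next cell sketches why saturation matters for training: the derivative of the logistic function, σ(z)(1 − σ(z)), peaks at 0.25 for z = 0 and shrinks toward 0 as |z| grows, so very little gradient flows back through a saturated unit. The helper names `sigmoid_sketch` and `sigmoid_grad_sketch` are illustrative assumptions, not part of the chapter's code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid_sketch(z):\n",
"    # Logistic function: same formula as the one plotted above\n",
"    return 1 / (1 + np.exp(-z))\n",
"\n",
"def sigmoid_grad_sketch(z):\n",
"    # Derivative of the logistic function: s * (1 - s)\n",
"    s = sigmoid_sketch(z)\n",
"    return s * (1 - s)\n",
"\n",
"# The gradient is largest at z = 0 and nearly vanishes in the saturated regions\n",
"for z_value in (0.0, 2.5, 5.0):\n",
"    print('z = {:.1f}: gradient = {:.4f}'.format(z_value, sigmoid_grad_sketch(z_value)))"
]
},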
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Xavier and He Initialization"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"[name for name in dir(keras.initializers) if not name.startswith(\"_\")]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"keras.layers.Dense(10, activation=\"relu\", kernel_initializer=\"he_normal\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"init = keras.initializers.VarianceScaling(scale=2., mode='fan_avg',\n",
"                                          distribution='uniform')\n",
"keras.layers.Dense(10, activation=\"relu\", kernel_initializer=init)"
]
},
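{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea behind these initializers more concrete, here is a small NumPy sketch (not the Keras implementation) of He normal initialization: weights are drawn from a zero-mean Gaussian whose variance is 2 / fan_in. The layer sizes and the names `he_normal_sketch`, `sketch_rng` and `W_sketch` are arbitrary assumptions. Keras's `he_normal` draws from a truncated normal distribution, so this plain Gaussian is only an approximation of its behavior."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def he_normal_sketch(fan_in, fan_out, rng):\n",
"    # Zero-mean Gaussian with standard deviation sqrt(2 / fan_in)\n",
"    return rng.normal(loc=0.0, scale=np.sqrt(2 / fan_in), size=(fan_in, fan_out))\n",
"\n",
"sketch_rng = np.random.RandomState(42)\n",
"W_sketch = he_normal_sketch(300, 100, sketch_rng)\n",
"\n",
"# The empirical variance should land close to the target 2 / fan_in\n",
"print('target variance:    {:.5f}'.format(2 / 300))\n",
"print('empirical variance: {:.5f}'.format(W_sketch.var()))"
]
},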
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nonsaturating Activation Functions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Leaky ReLU"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def leaky_relu(z, alpha=0.01):\n",
"    return np.maximum(alpha*z, z)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(z, leaky_relu(z, 0.05), \"b-\", linewidth=2)\n",
"plt.plot([-5, 5], [0, 0], 'k-')\n",
"plt.plot([0, 0], [-0.5, 4.2], 'k-')\n",
"plt.grid(True)\n",
"props = dict(facecolor='black', shrink=0.1)\n",
"plt.annotate('Leak', xytext=(-3.5, 0.5), xy=(-5, -0.2), arrowprops=props, fontsize=14, ha=\"center\")\n",
"plt.title(\"Leaky ReLU activation function\", fontsize=14)\n",
"plt.axis([-5, 5, -0.5, 4.2])\n",
"\n",
"save_fig(\"leaky_relu_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"[m for m in dir(keras.activations) if not m.startswith(\"_\")]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"[m for m in dir(keras.layers) if \"relu\" in m.lower()]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's train a neural network on Fashion MNIST using the Leaky ReLU:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train_full = X_train_full / 255.0\n",
"X_test = X_test / 255.0\n",
"X_valid, X_train = X_train_full[:5000], X_train_full[5000:]\n",
"y_valid, y_train = y_train_full[:5000], y_train_full[5000:]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"model = keras.models.Sequential([\n",
"    keras.layers.Flatten(input_shape=[28, 28]),\n",
"    keras.layers.Dense(300, kernel_initializer=\"he_normal\"),\n",
"    keras.layers.LeakyReLU(),\n",
"    keras.layers.Dense(100, kernel_initializer=\"he_normal\"),\n",
"    keras.layers.LeakyReLU(),\n",
"    keras.layers.Dense(10, activation=\"softmax\")\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
"              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n",
"              metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"history = model.fit(X_train, y_train, epochs=10,\n",
"                    validation_data=(X_valid, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's try PReLU:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"model = keras.models.Sequential([\n",
"    keras.layers.Flatten(input_shape=[28, 28]),\n",
"    keras.layers.Dense(300, kernel_initializer=\"he_normal\"),\n",
"    keras.layers.PReLU(),\n",
"    keras.layers.Dense(100, kernel_initializer=\"he_normal\"),\n",
"    keras.layers.PReLU(),\n",
"    keras.layers.Dense(10, activation=\"softmax\")\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
"              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n",
"              metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"history = model.fit(X_train, y_train, epochs=10,\n",
"                    validation_data=(X_valid, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ELU"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"def elu(z, alpha=1):\n",
"    return np.where(z < 0, alpha * (np.exp(z) - 1), z)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(z, elu(z), \"b-\", linewidth=2)\n",
"plt.plot([-5, 5], [0, 0], 'k-')\n",
"plt.plot([-5, 5], [-1, -1], 'k--')\n",
"plt.plot([0, 0], [-2.2, 3.2], 'k-')\n",
"plt.grid(True)\n",
"plt.title(r\"ELU activation function ($\\alpha=1$)\", fontsize=14)\n",
"plt.axis([-5, 5, -2.2, 3.2])\n",
"\n",
"save_fig(\"elu_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Implementing ELU in TensorFlow is trivial: just specify the activation function when building each layer:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"keras.layers.Dense(10, activation=\"elu\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### SELU"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner, Andreas Mayr and Sepp Hochreiter, published in June 2017. During training, a neural network composed exclusively of a stack of dense layers using the SELU activation function and LeCun initialization will self-normalize: the output of each layer will tend to preserve the same mean and variance during training, which solves the vanishing/exploding gradients problem. As a result, this activation function outperforms the other activation functions very significantly for such neural nets, so you should really try it out. Unfortunately, the self-normalizing property of the SELU activation function is easily broken: you cannot use ℓ<sub>1</sub> or ℓ<sub>2</sub> regularization, regular dropout, max-norm, skip connections or other non-sequential topologies (so recurrent neural networks won't self-normalize). However, in practice it works quite well with sequential CNNs. If you break self-normalization, SELU will not necessarily outperform other activation functions."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"from scipy.special import erfc\n",
"\n",
"# alpha and scale to self-normalize with mean 0 and standard deviation 1\n",
"# (see equation 14 in the paper):\n",
"alpha_0_1 = -np.sqrt(2 / np.pi) / (erfc(1/np.sqrt(2)) * np.exp(1/2) - 1)\n",
"scale_0_1 = (1 - erfc(1 / np.sqrt(2)) * np.sqrt(np.e)) * np.sqrt(2 * np.pi) * (2 * erfc(np.sqrt(2))*np.e**2 + np.pi*erfc(1/np.sqrt(2))**2*np.e - 2*(2+np.pi)*erfc(1/np.sqrt(2))*np.sqrt(np.e)+np.pi+2)**(-1/2)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"def selu(z, scale=scale_0_1, alpha=alpha_0_1):\n",
"    return scale * elu(z, alpha)"
]
},
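{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the self-normalizing idea, the following Monte Carlo sketch (not part of the original notebook) feeds standard normal samples through SELU and measures the mean and standard deviation of the result, which should stay close to 0 and 1. It hard-codes the commonly published SELU constants rather than reusing the values computed above, so that the cell stands on its own. The names `selu_sketch`, `SELU_ALPHA`, `SELU_SCALE` and `selu_rng` are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Commonly published SELU constants (assumed here instead of recomputing them)\n",
"SELU_ALPHA = 1.6732632423543772\n",
"SELU_SCALE = 1.0507009873554805\n",
"\n",
"def selu_sketch(z, scale=SELU_SCALE, alpha=SELU_ALPHA):\n",
"    # np.minimum keeps exp() from overflowing on large positive inputs\n",
"    negative_part = alpha * (np.exp(np.minimum(z, 0)) - 1)\n",
"    return scale * np.where(z < 0, negative_part, z)\n",
"\n",
"selu_rng = np.random.RandomState(42)\n",
"z_samples = selu_rng.normal(size=1000000)  # inputs with mean 0, std 1\n",
"a_samples = selu_sketch(z_samples)\n",
"print('mean {:.3f}, std {:.3f}'.format(a_samples.mean(), a_samples.std()))"
]
},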
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"plt.plot(z, selu(z), \"b-\", linewidth=2)\n",
"plt.plot([-5, 5], [0, 0], 'k-')\n",
"plt.plot([-5, 5], [-1.758, -1.758], 'k--')\n",
"plt.plot([0, 0], [-2.2, 3.2], 'k-')\n",
"plt.grid(True)\n",
"plt.title(\"SELU activation function\", fontsize=14)\n",
"plt.axis([-5, 5, -2.2, 3.2])\n",
"\n",
"save_fig(\"selu_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, the SELU hyperparameters (`scale` and `alpha`) are tuned in such a way that the mean output of each neuron remains close to 0, and the standard deviation remains close to 1 (assuming the inputs are standardized with mean 0 and standard deviation 1 too). Using this activation function, even a 1,000-layer deep neural network preserves roughly mean 0 and standard deviation 1 across all layers, avoiding the exploding/vanishing gradients problem:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"Z = np.random.normal(size=(500, 100)) # standardized inputs\n",
"for layer in range(1000):\n",
"    W = np.random.normal(size=(100, 100), scale=np.sqrt(1 / 100)) # LeCun initialization\n",
"    Z = selu(np.dot(Z, W))\n",
"    means = np.mean(Z, axis=0).mean()\n",
"    stds = np.std(Z, axis=0).mean()\n",
"    if layer % 100 == 0:\n",
"        print(\"Layer {}: mean {:.2f}, std deviation {:.2f}\".format(layer, means, stds))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using SELU is easy:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"keras.layers.Dense(10, activation=\"selu\",\n",
"                   kernel_initializer=\"lecun_normal\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a neural net for Fashion MNIST with 100 hidden layers, using the SELU activation function:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential()\n",
"model.add(keras.layers.Flatten(input_shape=[28, 28]))\n",
"model.add(keras.layers.Dense(300, activation=\"selu\",\n",
"                             kernel_initializer=\"lecun_normal\"))\n",
"for layer in range(99):\n",
"    model.add(keras.layers.Dense(100, activation=\"selu\",\n",
"                                 kernel_initializer=\"lecun_normal\"))\n",
"model.add(keras.layers.Dense(10, activation=\"softmax\"))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
"              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n",
"              metrics=[\"accuracy\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's train it. Do not forget to scale the inputs to mean 0 and standard deviation 1:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"pixel_means = X_train.mean(axis=0, keepdims=True)\n",
"pixel_stds = X_train.std(axis=0, keepdims=True)\n",
"X_train_scaled = (X_train - pixel_means) / pixel_stds\n",
"X_valid_scaled = (X_valid - pixel_means) / pixel_stds\n",
"X_test_scaled = (X_test - pixel_means) / pixel_stds"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"history = model.fit(X_train_scaled, y_train, epochs=5,\n",
"                    validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now look at what happens if we try to use the ReLU activation function instead:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential()\n",
"model.add(keras.layers.Flatten(input_shape=[28, 28]))\n",
"model.add(keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"))\n",
"for layer in range(99):\n",
"    model.add(keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"))\n",
"model.add(keras.layers.Dense(10, activation=\"softmax\"))"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
"              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n",
"              metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"history = model.fit(X_train_scaled, y_train, epochs=5,\n",
"                    validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Not great at all: we suffered from the vanishing/exploding gradients problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Batch Normalization"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential([\n",
"    keras.layers.Flatten(input_shape=[28, 28]),\n",
"    keras.layers.BatchNormalization(),\n",
"    keras.layers.Dense(300, activation=\"relu\"),\n",
"    keras.layers.BatchNormalization(),\n",
"    keras.layers.Dense(100, activation=\"relu\"),\n",
"    keras.layers.BatchNormalization(),\n",
"    keras.layers.Dense(10, activation=\"softmax\")\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"model.summary()"
]
},
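{
"cell_type": "markdown",
"metadata": {},
"source": [
"To help read the summary, here is a small arithmetic sketch of where the Batch Normalization parameters come from. It assumes the default `BatchNormalization` settings, in which each layer keeps four values per input feature: a scale (gamma) and an offset (beta), which are trainable, plus a moving mean and a moving variance, which are not. The variable names are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Number of input features seen by each of the three BatchNormalization layers above\n",
"bn_input_sizes = [784, 300, 100]\n",
"\n",
"# gamma, beta, moving mean and moving variance: four values per feature\n",
"bn_total_params = sum(4 * n for n in bn_input_sizes)\n",
"\n",
"# The moving mean and moving variance are updated during training,\n",
"# but not by backpropagation, so Keras reports them as non-trainable\n",
"bn_non_trainable_params = bn_total_params // 2\n",
"\n",
"print('BN parameters:', bn_total_params)\n",
"print('of which non-trainable:', bn_non_trainable_params)"
]
},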
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"bn1 = model.layers[1]\n",
"[(var.name, var.trainable) for var in bn1.variables]"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"bn1.updates"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
"              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n",
"              metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"history = model.fit(X_train, y_train, epochs=10,\n",
"                    validation_data=(X_valid, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes applying BN before the activation function works better (there's a debate on this topic). Moreover, the layer before a `BatchNormalization` layer does not need bias terms: the `BatchNormalization` layer already includes an offset parameter per input, so the extra bias would be a waste of parameters. You can set `use_bias=False` when creating those layers:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential([\n",
"    keras.layers.Flatten(input_shape=[28, 28]),\n",
"    keras.layers.BatchNormalization(),\n",
"    keras.layers.Dense(300, use_bias=False),\n",
"    keras.layers.BatchNormalization(),\n",
"    keras.layers.Activation(\"relu\"),\n",
"    keras.layers.Dense(100, use_bias=False),\n",
"    keras.layers.BatchNormalization(),\n",
"    keras.layers.Activation(\"relu\"),\n",
"    keras.layers.Dense(10, activation=\"softmax\")\n",
"])"
]
},
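{
"cell_type": "markdown",
"metadata": {},
"source": [
"The claim that a bias term is redundant right before Batch Normalization can be checked with a short NumPy sketch. The function `batch_norm_sketch` below is a simplified stand-in (training-mode standardization only, with no learned scale or offset and no moving averages), not the Keras layer itself, and all names and shapes are illustrative assumptions. Because the per-feature batch mean is subtracted, any constant bias added beforehand cancels out."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def batch_norm_sketch(x, eps=1e-3):\n",
"    # Standardize each feature (column) using the statistics of the batch\n",
"    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)\n",
"\n",
"bn_rng = np.random.RandomState(42)\n",
"X_batch = bn_rng.normal(size=(32, 5))  # a batch of 32 instances, 5 features\n",
"W_dense = bn_rng.normal(size=(5, 3))   # weights of a hypothetical dense layer\n",
"b_dense = bn_rng.normal(size=3)        # its bias terms\n",
"\n",
"with_bias = batch_norm_sketch(X_batch @ W_dense + b_dense)\n",
"without_bias = batch_norm_sketch(X_batch @ W_dense)\n",
"\n",
"# The bias shifts the batch mean by the same amount, so it has no effect\n",
"print(np.allclose(with_bias, without_bias))"
]
},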
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
"              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n",
"              metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"history = model.fit(X_train, y_train, epochs=10,\n",
"                    validation_data=(X_valid, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Gradient Clipping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All Keras optimizers accept `clipnorm` or `clipvalue` arguments:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"optimizer = keras.optimizers.SGD(clipvalue=1.0)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"optimizer = keras.optimizers.SGD(clipnorm=1.0)"
]
},
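{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between the two options is easier to see on a tiny example. The NumPy sketch below mimics, for a single gradient vector, roughly what `clipvalue` and `clipnorm` are meant to do. It is an illustration under that assumption, not the optimizers' actual code, and the function names are hypothetical. Clipping by value caps each component independently, which can change the direction of the gradient, whereas clipping by norm rescales the whole vector and preserves its direction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def clip_by_value_sketch(grad, clip_value):\n",
"    # Cap every component to the range [-clip_value, clip_value]\n",
"    return np.clip(grad, -clip_value, clip_value)\n",
"\n",
"def clip_by_norm_sketch(grad, clip_norm):\n",
"    # Rescale the whole vector only if its L2 norm exceeds clip_norm\n",
"    norm = np.linalg.norm(grad)\n",
"    if norm > clip_norm:\n",
"        return grad * clip_norm / norm\n",
"    return grad\n",
"\n",
"grad_sketch = np.array([0.9, 100.0])\n",
"print('by value:', clip_by_value_sketch(grad_sketch, 1.0))  # direction changes\n",
"print('by norm: ', clip_by_norm_sketch(grad_sketch, 1.0))   # direction preserved"
]
},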
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reusing Pretrained Layers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reusing a Keras model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's split the Fashion MNIST training set in two:\n",
"* `X_train_A`: all images of all items except for sandals and shirts (classes 5 and 6).\n",
"* `X_train_B`: a much smaller training set of just the first 200 images of sandals or shirts.\n",
"\n",
"The validation set and the test set are also split this way, but without restricting the number of images.\n",
"\n",
"We will train a model on set A (classification task with 8 classes), and try to reuse it to tackle set B (binary classification). We hope to transfer a little bit of knowledge from task A to task B, since classes in set A (sneakers, ankle boots, coats, t-shirts, etc.) are somewhat similar to classes in set B (sandals and shirts). However, since we are using `Dense` layers, only patterns that occur at the same location can be reused (in contrast, convolutional layers will transfer much better, since learned patterns can be detected anywhere on the image, as we will see in the CNN chapter)."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"def split_dataset(X, y):\n",
"    y_5_or_6 = (y == 5) | (y == 6) # sandals or shirts\n",
"    y_A = y[~y_5_or_6]\n",
"    y_A[y_A > 6] -= 2 # class indices 7, 8, 9 should be moved to 5, 6, 7\n",
"    y_B = (y[y_5_or_6] == 6).astype(np.float32) # binary classification task: is it a shirt (class 6)?\n",
"    return ((X[~y_5_or_6], y_A),\n",
"            (X[y_5_or_6], y_B))\n",
"\n",
"(X_train_A, y_train_A), (X_train_B, y_train_B) = split_dataset(X_train, y_train)\n",
"(X_valid_A, y_valid_A), (X_valid_B, y_valid_B) = split_dataset(X_valid, y_valid)\n",
"(X_test_A, y_test_A), (X_test_B, y_test_B) = split_dataset(X_test, y_test)\n",
"X_train_B = X_train_B[:200]\n",
"y_train_B = y_train_B[:200]"
]
},
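{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the relabeling in `split_dataset()` does, the sketch below applies the same steps to a tiny made-up label array (`y_toy` is hypothetical data, not taken from Fashion MNIST). Boolean indexing returns a copy, so shifting the class indices of task A does not modify the original labels."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"y_toy = np.array([0, 5, 6, 7, 9, 6, 3, 8])  # made-up labels for illustration\n",
"is_5_or_6 = (y_toy == 5) | (y_toy == 6)\n",
"\n",
"# Task A: drop classes 5 and 6, then shift 7, 8, 9 down to 5, 6, 7\n",
"y_toy_A = y_toy[~is_5_or_6]  # boolean indexing returns a copy\n",
"y_toy_A[y_toy_A > 6] -= 2\n",
"\n",
"# Task B: keep only classes 5 and 6, with label 1.0 for class 6 and 0.0 for class 5\n",
"y_toy_B = (y_toy[is_5_or_6] == 6).astype(np.float32)\n",
"\n",
"print('task A labels:', y_toy_A)\n",
"print('task B labels:', y_toy_B)\n",
"print('original labels (unchanged):', y_toy)"
]
},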
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"X_train_A.shape"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"X_train_B.shape"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"y_train_A[:30]"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"y_train_B[:30]"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"model_A = keras.models.Sequential()\n",
"model_A.add(keras.layers.Flatten(input_shape=[28, 28]))\n",
"for n_hidden in (300, 100, 50, 50, 50):\n",
"    model_A.add(keras.layers.Dense(n_hidden, activation=\"selu\"))\n",
"model_A.add(keras.layers.Dense(8, activation=\"softmax\"))"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-06-10 04:48:00 +02:00
"model_A.compile(loss=\"sparse_categorical_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1e-3),\n",
" metrics=[\"accuracy\"])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 54,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"history = model_A.fit(X_train_A, y_train_A, epochs=20,\n",
" validation_data=(X_valid_A, y_valid_A))"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 55,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model_A.save(\"my_model_A.h5\")"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 56,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model_B = keras.models.Sequential()\n",
"model_B.add(keras.layers.Flatten(input_shape=[28, 28]))\n",
"for n_hidden in (300, 100, 50, 50, 50):\n",
"    model_B.add(keras.layers.Dense(n_hidden, activation=\"selu\"))\n",
"model_B.add(keras.layers.Dense(1, activation=\"sigmoid\"))"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 57,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-06-10 04:48:00 +02:00
"model_B.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1e-3),\n",
" metrics=[\"accuracy\"])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 58,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"history = model_B.fit(X_train_B, y_train_B, epochs=20,\n",
" validation_data=(X_valid_B, y_valid_B))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 59,
2019-02-17 13:31:28 +01:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model_B.summary()"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 60,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model_A = keras.models.load_model(\"my_model_A.h5\")\n",
"model_B_on_A = keras.models.Sequential(model_A.layers[:-1])\n",
"model_B_on_A.add(keras.layers.Dense(1, activation=\"sigmoid\"))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 61,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"model_A_clone = keras.models.clone_model(model_A)\n",
"model_A_clone.set_weights(model_A.get_weights())"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 62,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"for layer in model_B_on_A.layers[:-1]:\n",
"    layer.trainable = False\n",
"\n",
2019-06-10 04:48:00 +02:00
"model_B_on_A.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1e-3),\n",
2019-02-17 13:31:28 +01:00
" metrics=[\"accuracy\"])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 63,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"history = model_B_on_A.fit(X_train_B, y_train_B, epochs=4,\n",
"                           validation_data=(X_valid_B, y_valid_B))\n",
"\n",
"for layer in model_B_on_A.layers[:-1]:\n",
"    layer.trainable = True\n",
"\n",
2019-06-10 04:48:00 +02:00
"model_B_on_A.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1e-3),\n",
2019-02-17 13:31:28 +01:00
" metrics=[\"accuracy\"])\n",
"history = model_B_on_A.fit(X_train_B, y_train_B, epochs=16,\n",
" validation_data=(X_valid_B, y_valid_B))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"So, what's the final verdict?"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 64,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"model_B.evaluate(X_test_B, y_test_B)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 65,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model_B_on_A.evaluate(X_test_B, y_test_B)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"Great! We got quite a bit of transfer: the error rate dropped by a factor of almost 4!"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 66,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"(100 - 97.05) / (100 - 99.25)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"# Faster Optimizers"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Momentum optimization"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 67,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.SGD(lr=0.001, momentum=0.9)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Nesterov Accelerated Gradient"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 68,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.SGD(lr=0.001, momentum=0.9, nesterov=True)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## AdaGrad"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 69,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.Adagrad(lr=0.001)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## RMSProp"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 70,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.RMSprop(lr=0.001, rho=0.9)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Adam Optimization"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 71,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Adamax Optimization"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 72,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.Adamax(lr=0.001, beta_1=0.9, beta_2=0.999)"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Nadam Optimization"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 73,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.Nadam(lr=0.001, beta_1=0.9, beta_2=0.999)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Learning Rate Scheduling"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### Power Scheduling"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"```lr = lr0 / (1 + steps / s)**c```\n",
"* Keras uses `c=1` and `s = 1 / decay`"
2017-06-05 18:48:03 +02:00
]
},
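To make the formula concrete, here is a minimal, framework-free sketch of power scheduling. The helper name `power_schedule` is hypothetical (it is not a Keras function); it simply evaluates the formula above with the Keras conventions `c=1` and `s = 1 / decay`.

```python
def power_schedule(lr0, step, s, c=1.0):
    """Learning rate after `step` training steps (illustrative helper, not a Keras API)."""
    return lr0 / (1 + step / s) ** c

lr0 = 0.01
decay = 1e-4
s = 1 / decay  # Keras convention: s = 1 / decay, i.e. 10,000 steps here

# After s steps the rate is lr0 / 2, after 2*s steps it is lr0 / 3, and so on.
for step in (0, 10_000, 20_000, 30_000):
    print(step, power_schedule(lr0, step, s))
```

The rate drops quickly at first and then more and more slowly: each additional `s` steps produces a smaller relative decrease.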
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 74,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"optimizer = keras.optimizers.SGD(lr=0.01, decay=1e-4)"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 75,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 76,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"n_epochs = 25\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 77,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"learning_rate = 0.01\n",
"decay = 1e-4\n",
"batch_size = 32\n",
"n_steps_per_epoch = len(X_train) // batch_size\n",
"epochs = np.arange(n_epochs)\n",
"lrs = learning_rate / (1 + decay * epochs * n_steps_per_epoch)\n",
"\n",
"plt.plot(epochs, lrs, \"o-\")\n",
"plt.axis([0, n_epochs - 1, 0, 0.01])\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Learning Rate\")\n",
"plt.title(\"Power Scheduling\", fontsize=14)\n",
"plt.grid(True)\n",
"plt.show()"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### Exponential Scheduling"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"```lr = lr0 * 0.1**(epoch / s)```"
2017-06-05 18:48:03 +02:00
]
},
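As a quick sanity check of this formula, the sketch below evaluates it in plain Python. The function name `exponential_schedule` is made up for illustration; the point is that the learning rate shrinks by a factor of 10 every `s` epochs.

```python
def exponential_schedule(lr0, epoch, s):
    """Exponential decay: divide the learning rate by 10 every `s` epochs."""
    return lr0 * 0.1 ** (epoch / s)

# With lr0=0.01 and s=20, the rate is about 0.001 at epoch 20 and 0.0001 at epoch 40.
for epoch in (0, 10, 20, 40):
    print(epoch, exponential_schedule(0.01, epoch, s=20))
```

Unlike power scheduling, the relative drop per epoch is constant here, so the rate keeps shrinking at the same pace throughout training.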
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 78,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"def exponential_decay_fn(epoch):\n",
"    return 0.01 * 0.1**(epoch / 20)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 79,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"def exponential_decay(lr0, s):\n",
"    def exponential_decay_fn(epoch):\n",
"        return lr0 * 0.1**(epoch / s)\n",
"    return exponential_decay_fn\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"exponential_decay_fn = exponential_decay(lr0=0.01, s=20)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 80,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
2019-02-17 13:31:28 +01:00
"n_epochs = 25"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 81,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"lr_scheduler = keras.callbacks.LearningRateScheduler(exponential_decay_fn)\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid),\n",
2019-02-17 13:31:28 +01:00
" callbacks=[lr_scheduler])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 82,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"plt.plot(history.epoch, history.history[\"lr\"], \"o-\")\n",
"plt.axis([0, n_epochs - 1, 0, 0.011])\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Learning Rate\")\n",
"plt.title(\"Exponential Scheduling\", fontsize=14)\n",
"plt.grid(True)\n",
"plt.show()"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"The schedule function can take the current learning rate as a second argument:"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 83,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"def exponential_decay_fn(epoch, lr):\n",
"    return lr * 0.1**(1 / 20)"
2017-06-05 18:48:03 +02:00
]
},
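Note that this two-argument variant starts from whatever learning rate the optimizer currently has, so the result depends on the optimizer's initial rate rather than on a hard-coded `lr0`. The sketch below simulates it without Keras, assuming (as the `LearningRateScheduler` callback does) that the function is called once at the start of every epoch, including the first. The starting value of 0.01 is a hypothetical optimizer setting.

```python
def exponential_decay_fn(epoch, lr):
    # Multiply the current rate by a constant factor each epoch
    return lr * 0.1 ** (1 / 20)

lr = 0.01  # hypothetical initial learning rate set on the optimizer
rates = []
for epoch in range(25):
    lr = exponential_decay_fn(epoch, lr)  # applied at the start of each epoch
    rates.append(lr)

# Under this assumption, the rate used during epoch e is 0.01 * 0.1**((e + 1) / 20).
print(rates[0], rates[19])
```

Because the factor is already applied at the first epoch, this schedule runs one epoch ahead of the closed-form version `lr0 * 0.1**(epoch / s)`.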
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"If you want to update the learning rate at each iteration rather than at each epoch, you must write your own callback class:"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 84,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"K = keras.backend\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"class ExponentialDecay(keras.callbacks.Callback):\n",
"    def __init__(self, s=40000):\n",
"        super().__init__()\n",
"        self.s = s\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"    def on_batch_begin(self, batch, logs=None):\n",
"        # Note: the `batch` argument is reset at each epoch\n",
"        lr = K.get_value(self.model.optimizer.lr)\n",
"        # Use the instance's own `s`, not a global variable of the same name\n",
"        K.set_value(self.model.optimizer.lr, lr * 0.1**(1 / self.s))\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"    def on_epoch_end(self, epoch, logs=None):\n",
"        logs = logs or {}\n",
"        logs['lr'] = K.get_value(self.model.optimizer.lr)\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
"lr0 = 0.01\n",
2019-02-28 12:48:06 +01:00
"optimizer = keras.optimizers.Nadam(lr=lr0)\n",
2019-02-17 13:31:28 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n",
"n_epochs = 25\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"s = 20 * len(X_train) // 32 # number of steps in 20 epochs (batch size = 32)\n",
"exp_decay = ExponentialDecay(s)\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid),\n",
2019-02-17 13:31:28 +01:00
" callbacks=[exp_decay])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 85,
2018-03-24 22:50:29 +01:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"n_steps = n_epochs * len(X_train) // 32\n",
"steps = np.arange(n_steps)\n",
"lrs = lr0 * 0.1**(steps / s)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 86,
2019-02-17 13:31:28 +01:00
"metadata": {
"scrolled": true
},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"plt.plot(steps, lrs, \"-\", linewidth=2)\n",
"plt.axis([0, n_steps - 1, 0, lr0 * 1.1])\n",
"plt.xlabel(\"Batch\")\n",
"plt.ylabel(\"Learning Rate\")\n",
"plt.title(\"Exponential Scheduling (per batch)\", fontsize=14)\n",
"plt.grid(True)\n",
"plt.show()"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### Piecewise Constant Scheduling"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 87,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"def piecewise_constant_fn(epoch):\n",
"    if epoch < 5:\n",
"        return 0.01\n",
"    elif epoch < 15:\n",
"        return 0.005\n",
"    else:\n",
"        return 0.001"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 88,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"def piecewise_constant(boundaries, values):\n",
"    boundaries = np.array([0] + boundaries)\n",
"    values = np.array(values)\n",
"    def piecewise_constant_fn(epoch):\n",
"        return values[np.argmax(boundaries > epoch) - 1]\n",
"    return piecewise_constant_fn\n",
"\n",
"piecewise_constant_fn = piecewise_constant([5, 15], [0.01, 0.005, 0.001])"
2017-06-05 18:48:03 +02:00
]
},
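The lookup in `piecewise_constant()` can be hard to read at first glance. As an alternative sketch (not the notebook's implementation), the same mapping from epoch to learning rate can be written with the standard library's `bisect` module. The helper name `piecewise_constant_lookup` is hypothetical.

```python
from bisect import bisect_right

def piecewise_constant_lookup(boundaries, values, epoch):
    """Return values[i], where i is the number of boundaries that are <= epoch."""
    return values[bisect_right(boundaries, epoch)]

boundaries = [5, 15]
values = [0.01, 0.005, 0.001]

# Epochs 0-4 use 0.01, epochs 5-14 use 0.005, and epoch 15 onward uses 0.001.
for epoch in (0, 4, 5, 14, 15, 24):
    print(epoch, piecewise_constant_lookup(boundaries, values, epoch))
```

This requires `values` to contain exactly one more entry than `boundaries`, which is the same convention used above.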
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 89,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"lr_scheduler = keras.callbacks.LearningRateScheduler(piecewise_constant_fn)\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
2019-02-17 13:31:28 +01:00
"n_epochs = 25\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid),\n",
2019-02-17 13:31:28 +01:00
" callbacks=[lr_scheduler])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 90,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"plt.plot(history.epoch, [piecewise_constant_fn(epoch) for epoch in history.epoch], \"o-\")\n",
"plt.axis([0, n_epochs - 1, 0, 0.011])\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Learning Rate\")\n",
"plt.title(\"Piecewise Constant Scheduling\", fontsize=14)\n",
"plt.grid(True)\n",
"plt.show()"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### Performance Scheduling"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 91,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"tf.random.set_seed(42)\n",
"np.random.seed(42)"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 92,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"lr_scheduler = keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"optimizer = keras.optimizers.SGD(lr=0.02, momentum=0.9)\n",
2019-02-17 13:31:28 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n",
"n_epochs = 25\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid),\n",
2019-02-17 13:31:28 +01:00
" callbacks=[lr_scheduler])"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 93,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"plt.plot(history.epoch, history.history[\"lr\"], \"bo-\")\n",
"plt.xlabel(\"Epoch\")\n",
"plt.ylabel(\"Learning Rate\", color='b')\n",
"plt.tick_params('y', colors='b')\n",
"plt.gca().set_xlim(0, n_epochs - 1)\n",
"plt.grid(True)\n",
"\n",
"ax2 = plt.gca().twinx()\n",
"ax2.plot(history.epoch, history.history[\"val_loss\"], \"r^-\")\n",
"ax2.set_ylabel('Validation Loss', color='r')\n",
"ax2.tick_params('y', colors='r')\n",
"\n",
"plt.title(\"Reduce LR on Plateau\", fontsize=14)\n",
"plt.show()"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### tf.keras schedulers"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 94,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
"s = 20 * len(X_train) // 32 # number of steps in 20 epochs (batch size = 32)\n",
"learning_rate = keras.optimizers.schedules.ExponentialDecay(0.01, s, 0.1)\n",
"optimizer = keras.optimizers.SGD(learning_rate)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n",
"n_epochs = 25\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"For piecewise constant scheduling, try this:"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 95,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"learning_rate = keras.optimizers.schedules.PiecewiseConstantDecay(\n",
2019-05-05 06:42:08 +02:00
" boundaries=[5. * n_steps_per_epoch, 15. * n_steps_per_epoch],\n",
" values=[0.01, 0.005, 0.001])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1Cycle scheduling"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 96,
2019-05-05 06:42:08 +02:00
"metadata": {},
"outputs": [],
"source": [
"K = keras.backend\n",
"\n",
"class ExponentialLearningRate(keras.callbacks.Callback):\n",
"    def __init__(self, factor):\n",
"        self.factor = factor\n",
"        self.rates = []\n",
"        self.losses = []\n",
"    def on_batch_end(self, batch, logs):\n",
"        self.rates.append(K.get_value(self.model.optimizer.lr))\n",
"        self.losses.append(logs[\"loss\"])\n",
"        K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)\n",
"\n",
"def find_learning_rate(model, X, y, epochs=1, batch_size=32, min_rate=10**-5, max_rate=10):\n",
"    init_weights = model.get_weights()\n",
"    iterations = len(X) // batch_size * epochs\n",
"    factor = np.exp(np.log(max_rate / min_rate) / iterations)\n",
"    init_lr = K.get_value(model.optimizer.lr)\n",
"    K.set_value(model.optimizer.lr, min_rate)\n",
"    exp_lr = ExponentialLearningRate(factor)\n",
"    history = model.fit(X, y, epochs=epochs, batch_size=batch_size,\n",
"                        callbacks=[exp_lr])\n",
"    K.set_value(model.optimizer.lr, init_lr)\n",
"    model.set_weights(init_weights)\n",
"    return exp_lr.rates, exp_lr.losses\n",
"\n",
"def plot_lr_vs_loss(rates, losses):\n",
"    plt.plot(rates, losses)\n",
"    plt.gca().set_xscale('log')\n",
"    plt.hlines(min(losses), min(rates), max(rates))\n",
"    plt.axis([min(rates), max(rates), min(losses), (losses[0] + min(losses)) / 2])\n",
"    plt.xlabel(\"Learning rate\")\n",
"    plt.ylabel(\"Loss\")"
]
},
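The `factor` computed in `find_learning_rate()` is chosen so that multiplying the learning rate by it once per iteration sweeps from `min_rate` to `max_rate` over the whole run. Here is a small check of that arithmetic in plain Python. The iteration count of 429 is a hypothetical value (for example, 55,000 training instances with a batch size of 128).

```python
import math

min_rate, max_rate = 1e-5, 10
iterations = 429  # hypothetical: e.g. 55,000 instances // batch size of 128

# Solve min_rate * factor**iterations == max_rate for factor
factor = math.exp(math.log(max_rate / min_rate) / iterations)

rate = min_rate
for _ in range(iterations):
    rate *= factor  # what the callback does after each batch

print(factor, rate)  # `rate` ends up very close to max_rate
```

Since the rate grows geometrically, the sampled rates are evenly spaced on a log scale, which is why the loss curve is plotted with a logarithmic x-axis.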
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 97,
2019-05-05 06:42:08 +02:00
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-06-10 04:48:00 +02:00
"model.compile(loss=\"sparse_categorical_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1e-3),\n",
" metrics=[\"accuracy\"])"
2019-05-05 06:42:08 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 98,
2019-05-05 06:42:08 +02:00
"metadata": {},
"outputs": [],
"source": [
"batch_size = 128\n",
"rates, losses = find_learning_rate(model, X_train_scaled, y_train, epochs=1, batch_size=batch_size)\n",
"plot_lr_vs_loss(rates, losses)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 99,
2019-05-05 06:42:08 +02:00
"metadata": {},
"outputs": [],
"source": [
"class OneCycleScheduler(keras.callbacks.Callback):\n",
"    def __init__(self, iterations, max_rate, start_rate=None,\n",
"                 last_iterations=None, last_rate=None):\n",
"        self.iterations = iterations\n",
"        self.max_rate = max_rate\n",
"        self.start_rate = start_rate or max_rate / 10\n",
"        self.last_iterations = last_iterations or iterations // 10 + 1\n",
"        self.half_iteration = (iterations - self.last_iterations) // 2\n",
"        self.last_rate = last_rate or self.start_rate / 1000\n",
"        self.iteration = 0\n",
"    def _interpolate(self, iter1, iter2, rate1, rate2):\n",
"        # Linear interpolation: returns rate1 at iter1 and rate2 at iter2\n",
"        return ((rate2 - rate1) * (self.iteration - iter1)\n",
"                / (iter2 - iter1) + rate1)\n",
"    def on_batch_begin(self, batch, logs):\n",
"        if self.iteration < self.half_iteration:\n",
"            rate = self._interpolate(0, self.half_iteration, self.start_rate, self.max_rate)\n",
"        elif self.iteration < 2 * self.half_iteration:\n",
"            rate = self._interpolate(self.half_iteration, 2 * self.half_iteration,\n",
"                                     self.max_rate, self.start_rate)\n",
"        else:\n",
"            rate = self._interpolate(2 * self.half_iteration, self.iterations,\n",
"                                     self.start_rate, self.last_rate)\n",
"            rate = max(rate, self.last_rate)\n",
"        self.iteration += 1\n",
"        K.set_value(self.model.optimizer.lr, rate)"
]
},
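The schedule this callback aims for has three linear phases: the rate rises from `start_rate` to `max_rate`, comes back down to `start_rate`, and then drops to `last_rate` over the final iterations. The framework-free sketch below reproduces that shape so it can be inspected without training a model. The function name `one_cycle_rate` is hypothetical, and the defaults mirror those of the callback's constructor.

```python
def one_cycle_rate(iteration, iterations, max_rate, start_rate=None,
                   last_iterations=None, last_rate=None):
    """Target 1cycle learning rate at a given iteration (illustrative helper)."""
    start_rate = start_rate or max_rate / 10
    last_iterations = last_iterations or iterations // 10 + 1
    half = (iterations - last_iterations) // 2
    last_rate = last_rate or start_rate / 1000

    def interpolate(iter1, iter2, rate1, rate2):
        # Straight line through (iter1, rate1) and (iter2, rate2)
        return (rate2 - rate1) * (iteration - iter1) / (iter2 - iter1) + rate1

    if iteration < half:          # phase 1: warm up
        return interpolate(0, half, start_rate, max_rate)
    if iteration < 2 * half:      # phase 2: come back down
        return interpolate(half, 2 * half, max_rate, start_rate)
    # phase 3: final drop, never going below last_rate
    return max(interpolate(2 * half, iterations, start_rate, last_rate), last_rate)

total = 1000  # hypothetical total number of training iterations
for it in (0, 449, 898, 999):
    print(it, one_cycle_rate(it, total, max_rate=0.05))
```

With these numbers the peak is reached at iteration 449, roughly 45% of the way through training, and the last 10% or so is spent annealing to a very small rate.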
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 100,
2019-05-05 06:42:08 +02:00
"metadata": {},
"outputs": [],
"source": [
"n_epochs = 25\n",
"onecycle = OneCycleScheduler(len(X_train) // batch_size * n_epochs, max_rate=0.05)\n",
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs, batch_size=batch_size,\n",
" validation_data=(X_valid_scaled, y_valid),\n",
" callbacks=[onecycle])"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"# Avoiding Overfitting Through Regularization"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## $\\ell_1$ and $\\ell_2$ regularization"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 101,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"layer = keras.layers.Dense(100, activation=\"elu\",\n",
"                           kernel_initializer=\"he_normal\",\n",
"                           kernel_regularizer=keras.regularizers.l2(0.01))\n",
"# or l1(0.1) for ℓ1 regularization with a factor of 0.1\n",
"# or l1_l2(0.1, 0.01) for both ℓ1 and ℓ2 regularization, with factors 0.1 and 0.01 respectively"
2017-06-05 18:48:03 +02:00
]
},
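To see what these regularizers add to the loss, here is a minimal plain-Python sketch. It assumes the penalties are the factor times the sum of squared weights (for ℓ2) and the factor times the sum of absolute weights (for ℓ1), which is how Keras documents `l2` and `l1`. The helper names and the weight values are made up for illustration.

```python
def l2_penalty(weights, factor=0.01):
    """l2 penalty: factor * sum of squared weights."""
    return factor * sum(w * w for w in weights)

def l1_penalty(weights, factor=0.1):
    """l1 penalty: factor * sum of absolute weights."""
    return factor * sum(abs(w) for w in weights)

weights = [0.5, -1.0, 2.0]  # hypothetical kernel weights of a tiny layer

print(l2_penalty(weights))  # 0.01 * (0.25 + 1.0 + 4.0)
print(l1_penalty(weights))  # 0.1 * (0.5 + 1.0 + 2.0)
```

The ℓ2 term penalizes large weights much more heavily than small ones, while the ℓ1 term grows linearly with each weight's magnitude and therefore tends to push small weights all the way to zero.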
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 102,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(300, activation=\"elu\",\n",
" kernel_initializer=\"he_normal\",\n",
" kernel_regularizer=keras.regularizers.l2(0.01)),\n",
" keras.layers.Dense(100, activation=\"elu\",\n",
" kernel_initializer=\"he_normal\",\n",
" kernel_regularizer=keras.regularizers.l2(0.01)),\n",
" keras.layers.Dense(10, activation=\"softmax\",\n",
" kernel_regularizer=keras.regularizers.l2(0.01))\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
2019-02-17 13:31:28 +01:00
"n_epochs = 2\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 103,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"from functools import partial\n",
"\n",
"RegularizedDense = partial(keras.layers.Dense,\n",
" activation=\"elu\",\n",
" kernel_initializer=\"he_normal\",\n",
" kernel_regularizer=keras.regularizers.l2(0.01))\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" RegularizedDense(300),\n",
" RegularizedDense(100),\n",
" RegularizedDense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
2019-02-17 13:31:28 +01:00
"n_epochs = 2\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Dropout"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 104,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dropout(rate=0.2),\n",
" keras.layers.Dense(300, activation=\"elu\", kernel_initializer=\"he_normal\"),\n",
" keras.layers.Dropout(rate=0.2),\n",
" keras.layers.Dense(100, activation=\"elu\", kernel_initializer=\"he_normal\"),\n",
" keras.layers.Dropout(rate=0.2),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
2019-02-17 13:31:28 +01:00
"n_epochs = 2\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
2017-06-05 18:48:03 +02:00
]
},
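{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the effect of `keras.layers.Dropout(rate=0.2)` more concrete, here is a minimal NumPy sketch of dropout at training time. It assumes the usual inverted-dropout formulation, where surviving inputs are scaled by `1 / (1 - rate)`. The function `dropout_forward` is a hypothetical helper, not the Keras implementation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def dropout_forward(inputs, rate=0.2, rng=None):\n",
"    # Drop each input with probability `rate`, then rescale the\n",
"    # survivors by 1 / (1 - rate) so the expected value is unchanged.\n",
"    rng = np.random.default_rng(42) if rng is None else rng\n",
"    keep_mask = rng.random(inputs.shape) >= rate\n",
"    return inputs * keep_mask / (1.0 - rate)\n",
"\n",
"x = np.ones((1000, 100))\n",
"dropped = dropout_forward(x, rate=0.2)\n",
"print(dropped.mean())  # stays near the input mean thanks to the rescaling\n",
"```\n",
"\n",
"At inference time the layer simply passes its inputs through, which is why the rescaling is applied during training."
]
},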
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Alpha Dropout"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 105,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
2019-02-28 12:48:06 +01:00
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 106,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.AlphaDropout(rate=0.2),\n",
" keras.layers.Dense(300, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.AlphaDropout(rate=0.2),\n",
" keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\"),\n",
" keras.layers.AlphaDropout(rate=0.2),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n",
"n_epochs = 20\n",
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 107,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(X_test_scaled, y_test)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 108,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"model.evaluate(X_train_scaled, y_train)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 109,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
2019-05-09 04:39:02 +02:00
"history = model.fit(X_train_scaled, y_train)"
2019-02-28 12:48:06 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## MC Dropout"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 110,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 111,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
2019-05-09 04:39:02 +02:00
"y_probas = np.stack([model(X_test_scaled, training=True)\n",
" for sample in range(100)])\n",
2019-02-28 12:48:06 +01:00
"y_proba = y_probas.mean(axis=0)\n",
"y_std = y_probas.std(axis=0)"
]
},
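{
"cell_type": "markdown",
"metadata": {},
"source": [
"The aggregation step above can be sketched without a trained model. In this hedged example, `noisy_predict` is a hypothetical stand-in for `model(X, training=True)`: it returns random class probabilities, so only the shapes and the averaging logic are meaningful.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"\n",
"def noisy_predict(n_instances, n_classes=10):\n",
"    # Hypothetical stochastic predictor: softmax over random logits\n",
"    logits = rng.normal(size=(n_instances, n_classes))\n",
"    exp = np.exp(logits - logits.max(axis=1, keepdims=True))\n",
"    return exp / exp.sum(axis=1, keepdims=True)\n",
"\n",
"samples = np.stack([noisy_predict(5) for _ in range(100)])  # shape [100, 5, 10]\n",
"mean_proba = samples.mean(axis=0)  # averaged probabilities, shape [5, 10]\n",
"std_proba = samples.std(axis=0)    # per-class spread, shape [5, 10]\n",
"pred = mean_proba.argmax(axis=1)   # one predicted class per instance\n",
"```\n",
"\n",
"Averaging over the first axis keeps each row a valid probability distribution, and the standard deviation gives a rough per-class uncertainty estimate."
]
},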
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 112,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"np.round(model.predict(X_test_scaled[:1]), 2)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 113,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"np.round(y_probas[:, :1], 2)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 114,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"np.round(y_proba[:1], 2)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 115,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"y_std = y_probas.std(axis=0)\n",
"np.round(y_std[:1], 2)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 116,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"y_pred = np.argmax(y_proba, axis=1)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 117,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"accuracy = np.sum(y_pred == y_test) / len(y_test)\n",
"accuracy"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 118,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"class MCDropout(keras.layers.Dropout):\n",
"    def call(self, inputs):\n",
"        return super().call(inputs, training=True)\n",
"\n",
"class MCAlphaDropout(keras.layers.AlphaDropout):\n",
"    def call(self, inputs):\n",
"        return super().call(inputs, training=True)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 119,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 120,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"mc_model = keras.models.Sequential([\n",
" MCAlphaDropout(layer.rate) if isinstance(layer, keras.layers.AlphaDropout) else layer\n",
" for layer in model.layers\n",
"])"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 121,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"mc_model.summary()"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 122,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"optimizer = keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)\n",
"mc_model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 123,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"mc_model.set_weights(model.get_weights())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use the model with MC Dropout:"
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 124,
2019-02-28 12:48:06 +01:00
"metadata": {},
"outputs": [],
"source": [
"np.round(np.mean([mc_model.predict(X_test_scaled[:1]) for sample in range(100)], axis=0), 2)"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## Max norm"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 125,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
"source": [
2019-02-17 13:31:28 +01:00
"layer = keras.layers.Dense(100, activation=\"selu\", kernel_initializer=\"lecun_normal\",\n",
" kernel_constraint=keras.constraints.max_norm(1.))"
2017-06-05 18:48:03 +02:00
]
},
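{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about `keras.constraints.max_norm(1.)`, the NumPy sketch below rescales each neuron's incoming weight vector so that its $\\ell_2$ norm does not exceed the limit. It assumes the constraint acts on columns (`axis=0`) of a Dense kernel. The function `max_norm_rescale` is a hypothetical helper written for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def max_norm_rescale(kernel, max_value=1.0, axis=0, epsilon=1e-7):\n",
"    # Norm of each neuron's incoming weights (one column per neuron)\n",
"    norms = np.sqrt(np.sum(np.square(kernel), axis=axis, keepdims=True))\n",
"    # Columns already within the limit keep their norm; others are clipped\n",
"    desired = np.clip(norms, 0, max_value)\n",
"    return kernel * (desired / (epsilon + norms))\n",
"\n",
"kernel = np.array([[3.0, 0.3], [4.0, 0.4]])  # column norms: 5.0 and 0.5\n",
"constrained = max_norm_rescale(kernel)\n",
"print(np.linalg.norm(constrained, axis=0))\n",
"```\n",
"\n",
"Only the first column is shrunk here, since the second one is already inside the unit ball."
]
},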
{
"cell_type": "code",
2019-06-09 14:08:53 +02:00
"execution_count": 126,
2019-02-17 13:31:28 +01:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"MaxNormDense = partial(keras.layers.Dense,\n",
" activation=\"selu\", kernel_initializer=\"lecun_normal\",\n",
" kernel_constraint=keras.constraints.max_norm(1.))\n",
2017-06-05 18:48:03 +02:00
"\n",
2019-02-17 13:31:28 +01:00
"model = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" MaxNormDense(300),\n",
" MaxNormDense(100),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
2019-02-28 12:48:06 +01:00
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
2019-02-17 13:31:28 +01:00
"n_epochs = 2\n",
2019-02-28 12:48:06 +01:00
"history = model.fit(X_train_scaled, y_train, epochs=n_epochs,\n",
" validation_data=(X_valid_scaled, y_valid))"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2019-02-17 13:31:28 +01:00
"metadata": {
"collapsed": true
},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-28 12:48:06 +01:00
"# Exercises"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## 1. to 7."
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"See appendix A."
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## 8. Deep Learning"
2017-06-05 18:48:03 +02:00
]
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 8.1."
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function._"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-05 18:48:03 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-05 18:48:03 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-05 18:48:03 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 8.2."
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: Using Adam optimization and early stopping, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later._"
2017-06-05 18:48:03 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-05 18:48:03 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-05 18:48:03 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-04-30 10:21:27 +02:00
},
2016-09-27 23:31:21 +02:00
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
2017-06-05 18:48:03 +02:00
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 8.3."
2016-09-27 23:31:21 +02:00
]
},
2017-04-30 10:21:27 +02:00
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-04-30 10:21:27 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: Tune the hyperparameters using cross-validation and see what precision you can achieve._"
2017-04-30 10:21:27 +02:00
]
},
2016-09-27 23:31:21 +02:00
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 8.4."
2016-09-27 23:31:21 +02:00
]
},
{
2017-06-05 18:48:03 +02:00
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: Now try adding Batch Normalization and compare the learning curves: is it converging faster than before? Does it produce a better model?_"
2016-09-27 23:31:21 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 8.5."
2016-09-27 23:31:21 +02:00
]
},
{
2017-06-05 18:48:03 +02:00
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: is the model overfitting the training set? Try adding dropout to every layer and try again. Does it help?_"
2016-09-27 23:31:21 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## 9. Transfer learning"
2016-09-27 23:31:21 +02:00
]
},
2017-04-30 10:21:27 +02:00
{
2019-02-17 13:31:28 +01:00
"cell_type": "markdown",
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-04-30 10:21:27 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 9.1."
2017-04-30 10:21:27 +02:00
]
},
2016-09-27 23:31:21 +02:00
{
2017-06-05 18:48:03 +02:00
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: create a new DNN that reuses all the pretrained hidden layers of the previous model, freezes them, and replaces the softmax output layer with a new one._"
2016-09-27 23:31:21 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 9.2."
2016-09-27 23:31:21 +02:00
]
},
{
2017-06-05 18:48:03 +02:00
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: train this new DNN on digits 5 to 9, using only 100 images per digit, and time how long it takes. Despite this small number of examples, can you achieve high precision?_"
2016-09-27 23:31:21 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 9.3."
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: try caching the frozen layers, and train the model again: how much faster is it now?_"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 9.4."
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: try again reusing just four hidden layers instead of five. Can you achieve a higher precision?_"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 9.5."
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"_Exercise: now unfreeze the top two hidden layers and continue training: can you get the model to perform even better?_"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
2019-02-17 13:31:28 +01:00
"cell_type": "code",
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2019-02-17 13:31:28 +01:00
"outputs": [],
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"## 10. Pretraining on an auxiliary task"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"In this exercise you will build a DNN that compares two MNIST digit images and predicts whether they represent the same digit or not. Then you will reuse the lower layers of this network to train an MNIST classifier using very little training data."
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 10.1.\n",
"_Exercise: start by building two DNNs (let's call them DNN A and B), both similar to the one you built earlier but without the output layer: each DNN should have five hidden layers of 100 neurons each, He initialization, and ELU activation. Next, add one more hidden layer with 10 units on top of both DNNs. You should use the `keras.layers.concatenate()` function to concatenate the outputs of both DNNs, then feed the result to the hidden layer. Finally, add an output layer with a single neuron using the logistic activation function._"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 10.2.\n",
"_Exercise: split the MNIST training set into two sets: split #1 should contain 55,000 images, and split #2 should contain 5,000 images. Create a function that generates a training batch where each instance is a pair of MNIST images picked from split #1. Half of the training instances should be pairs of images that belong to the same class, while the other half should be images from different classes. For each pair, the training label should be 0 if the images are from the same class, or 1 if they are from different classes._"
2017-06-14 09:09:23 +02:00
]
},
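{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to approach the batch generator is sketched below, using rejection sampling on random index pairs. This is only an illustrative starting point, not a reference solution: `generate_pair_batch` is a hypothetical helper, and the random arrays stand in for the images and labels of split #1.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def generate_pair_batch(images, labels, batch_size, rng):\n",
"    # First collect same-class pairs (target 0) for half the batch,\n",
"    # then different-class pairs (target 1), and finally shuffle.\n",
"    half = batch_size // 2\n",
"    idx_a, idx_b, targets = [], [], []\n",
"    while len(targets) < batch_size:\n",
"        i, j = rng.integers(len(images), size=2)\n",
"        want_same = len(targets) < half\n",
"        is_same = bool(labels[i] == labels[j])\n",
"        if i != j and is_same == want_same:\n",
"            idx_a.append(i)\n",
"            idx_b.append(j)\n",
"            targets.append(0 if is_same else 1)\n",
"    order = rng.permutation(batch_size)\n",
"    idx_a, idx_b = np.array(idx_a)[order], np.array(idx_b)[order]\n",
"    return (images[idx_a], images[idx_b]), np.array(targets)[order]\n",
"\n",
"rng = np.random.default_rng(42)\n",
"toy_images = rng.random((200, 28, 28))   # stand-in for split #1 images\n",
"toy_labels = rng.integers(10, size=200)  # stand-in for split #1 labels\n",
"(X1, X2), y = generate_pair_batch(toy_images, toy_labels, batch_size=32, rng=rng)\n",
"print(X1.shape, X2.shape, y.shape)\n",
"```\n",
"\n",
"Rejection sampling is simple but wasteful for same-class pairs. Grouping indices by class first would likely be faster on the real dataset."
]
},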
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 10.3.\n",
"_Exercise: train the DNN on this training set. For each image pair, you can simultaneously feed the first image to DNN A and the second image to DNN B. The whole network will gradually learn to tell whether two images belong to the same class or not._"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "markdown",
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"source": [
2019-02-17 13:31:28 +01:00
"### 10.4.\n",
"_Exercise: now create a new DNN by reusing and freezing the hidden layers of DNN A and adding a softmax output layer on top with 10 neurons. Train this network on split #2 and see if you can achieve high performance despite having only 500 images per class._"
2017-06-14 09:09:23 +02:00
]
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2017-06-14 09:09:23 +02:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2017-06-14 09:09:23 +02:00
},
{
"cell_type": "code",
2019-02-17 13:31:28 +01:00
"execution_count": null,
2017-06-21 15:35:47 +02:00
"metadata": {},
2017-11-03 13:43:56 +01:00
"outputs": [],
2019-02-17 13:31:28 +01:00
"source": []
2016-09-27 23:31:21 +02:00
},
{
"cell_type": "code",
"execution_count": null,
2018-05-08 20:21:23 +02:00
"metadata": {},
2016-09-27 23:31:21 +02:00
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
2019-02-17 13:31:28 +01:00
"version": "3.6.8"
2016-09-27 23:31:21 +02:00
},
"nav_menu": {
"height": "360px",
"width": "416px"
},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
2017-06-21 15:35:47 +02:00
"nbformat_minor": 1
2016-09-27 23:31:21 +02:00
}