4877 lines
160 KiB
Plaintext
4877 lines
160 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Chapter 11 – Deep Learning**"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_This notebook contains all the sample code and solutions to the exercices in chapter 11._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Setup"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"First, let's make sure this notebook works well in both python 2 and 3, import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 1,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# To support both python 2 and python 3\n",
|
||
"from __future__ import division, print_function, unicode_literals\n",
|
||
"\n",
|
||
"# Common imports\n",
|
||
"import numpy as np\n",
|
||
"import os\n",
|
||
"\n",
|
||
"# to make this notebook's output stable across runs\n",
|
||
"def reset_graph(seed=42):\n",
|
||
" tf.reset_default_graph()\n",
|
||
" tf.set_random_seed(seed)\n",
|
||
" np.random.seed(seed)\n",
|
||
"\n",
|
||
"# To plot pretty figures\n",
|
||
"%matplotlib inline\n",
|
||
"import matplotlib\n",
|
||
"import matplotlib.pyplot as plt\n",
|
||
"plt.rcParams['axes.labelsize'] = 14\n",
|
||
"plt.rcParams['xtick.labelsize'] = 12\n",
|
||
"plt.rcParams['ytick.labelsize'] = 12\n",
|
||
"\n",
|
||
"# Where to save the figures\n",
|
||
"PROJECT_ROOT_DIR = \".\"\n",
|
||
"CHAPTER_ID = \"deep\"\n",
|
||
"\n",
|
||
"def save_fig(fig_id, tight_layout=True):\n",
|
||
" path = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID, fig_id + \".png\")\n",
|
||
" print(\"Saving figure\", fig_id)\n",
|
||
" if tight_layout:\n",
|
||
" plt.tight_layout()\n",
|
||
" plt.savefig(path, format='png', dpi=300)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Vanishing/Exploding Gradients Problem"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def logit(z):\n",
|
||
" return 1 / (1 + np.exp(-z))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"z = np.linspace(-5, 5, 200)\n",
|
||
"\n",
|
||
"plt.plot([-5, 5], [0, 0], 'k-')\n",
|
||
"plt.plot([-5, 5], [1, 1], 'k--')\n",
|
||
"plt.plot([0, 0], [-0.2, 1.2], 'k-')\n",
|
||
"plt.plot([-5, 5], [-3/4, 7/4], 'g--')\n",
|
||
"plt.plot(z, logit(z), \"b-\", linewidth=2)\n",
|
||
"props = dict(facecolor='black', shrink=0.1)\n",
|
||
"plt.annotate('Saturating', xytext=(3.5, 0.7), xy=(5, 1), arrowprops=props, fontsize=14, ha=\"center\")\n",
|
||
"plt.annotate('Saturating', xytext=(-3.5, 0.3), xy=(-5, 0), arrowprops=props, fontsize=14, ha=\"center\")\n",
|
||
"plt.annotate('Linear', xytext=(2, 0.2), xy=(0, 0.5), arrowprops=props, fontsize=14, ha=\"center\")\n",
|
||
"plt.grid(True)\n",
|
||
"plt.title(\"Sigmoid activation function\", fontsize=14)\n",
|
||
"plt.axis([-5, 5, -0.2, 1.2])\n",
|
||
"\n",
|
||
"save_fig(\"sigmoid_saturation_plot\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Xavier and He Initialization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note: the book uses `tensorflow.contrib.layers.fully_connected()` rather than `tf.layers.dense()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.dense()`, because anything in the contrib module may change or be deleted without notice. The `dense()` function is almost identical to the `fully_connected()` function. The main differences relevant to this chapter are:\n",
|
||
"* several parameters are renamed: `scope` becomes `name`, `activation_fn` becomes `activation` (and similarly the `_fn` suffix is removed from other parameters such as `normalizer_fn`), `weights_initializer` becomes `kernel_initializer`, etc.\n",
|
||
"* the default `activation` is now `None` rather than `tf.nn.relu`.\n",
|
||
"* it does not support `tensorflow.contrib.framework.arg_scope()` (introduced later in chapter 11).\n",
|
||
"* it does not support regularizer params (introduced later in chapter 11)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"import tensorflow as tf"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"he_init = tf.contrib.layers.variance_scaling_initializer()\n",
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu,\n",
|
||
" kernel_initializer=he_init, name=\"hidden1\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Nonsaturating Activation Functions"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Leaky ReLU"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def leaky_relu(z, alpha=0.01):\n",
|
||
" return np.maximum(alpha*z, z)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plt.plot(z, leaky_relu(z, 0.05), \"b-\", linewidth=2)\n",
|
||
"plt.plot([-5, 5], [0, 0], 'k-')\n",
|
||
"plt.plot([0, 0], [-0.5, 4.2], 'k-')\n",
|
||
"plt.grid(True)\n",
|
||
"props = dict(facecolor='black', shrink=0.1)\n",
|
||
"plt.annotate('Leak', xytext=(-3.5, 0.5), xy=(-5, -0.2), arrowprops=props, fontsize=14, ha=\"center\")\n",
|
||
"plt.title(\"Leaky ReLU activation function\", fontsize=14)\n",
|
||
"plt.axis([-5, 5, -0.5, 4.2])\n",
|
||
"\n",
|
||
"save_fig(\"leaky_relu_plot\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Implementing Leaky ReLU in TensorFlow:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 9,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 10,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def leaky_relu(z, name=None):\n",
|
||
" return tf.maximum(0.01 * z, z, name=name)\n",
|
||
"\n",
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, activation=leaky_relu, name=\"hidden1\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's train a neural network on MNIST using the Leaky ReLU. First let's create the graph:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 11,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 100\n",
|
||
"n_outputs = 10"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 12,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 13,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=leaky_relu, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=leaky_relu, name=\"hidden2\")\n",
|
||
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 14,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 15,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 16,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 17,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's load the data:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 18,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from tensorflow.examples.tutorials.mnist import input_data\n",
|
||
"mnist = input_data.read_data_sets(\"/tmp/data/\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 19,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 40\n",
|
||
"batch_size = 50\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" if epoch % 5 == 0:\n",
|
||
" acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: mnist.validation.images, y: mnist.validation.labels})\n",
|
||
" print(epoch, \"Batch accuracy:\", acc_train, \"Validation accuracy:\", acc_test)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### ELU"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 20,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"def elu(z, alpha=1):\n",
|
||
" return np.where(z < 0, alpha * (np.exp(z) - 1), z)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 21,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plt.plot(z, elu(z), \"b-\", linewidth=2)\n",
|
||
"plt.plot([-5, 5], [0, 0], 'k-')\n",
|
||
"plt.plot([-5, 5], [-1, -1], 'k--')\n",
|
||
"plt.plot([0, 0], [-2.2, 3.2], 'k-')\n",
|
||
"plt.grid(True)\n",
|
||
"plt.title(r\"ELU activation function ($\\alpha=1$)\", fontsize=14)\n",
|
||
"plt.axis([-5, 5, -2.2, 3.2])\n",
|
||
"\n",
|
||
"save_fig(\"elu_plot\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Implementing ELU in TensorFlow is trivial, just specify the activation function when building each layer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 22,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 23,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.elu, name=\"hidden1\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### SELU"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This activation function was proposed in this [great paper](https://arxiv.org/pdf/1706.02515.pdf) by Günter Klambauer, Thomas Unterthiner and Andreas Mayr, published in June 2017 (I will definitely add it to the book). It outperforms the other activation functions very significantly for deep neural networks, so you should really try it out."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def selu(z,\n",
|
||
" scale=1.0507009873554804934193349852946,\n",
|
||
" alpha=1.6732632423543772848170429916717):\n",
|
||
" return scale * elu(z, alpha)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plt.plot(z, selu(z), \"b-\", linewidth=2)\n",
|
||
"plt.plot([-5, 5], [0, 0], 'k-')\n",
|
||
"plt.plot([-5, 5], [-1.758, -1.758], 'k--')\n",
|
||
"plt.plot([0, 0], [-2.2, 3.2], 'k-')\n",
|
||
"plt.grid(True)\n",
|
||
"plt.title(r\"SELU activation function\", fontsize=14)\n",
|
||
"plt.axis([-5, 5, -2.2, 3.2])\n",
|
||
"\n",
|
||
"save_fig(\"selu_plot\")\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"With this activation function, even a 100 layer deep neural network preserves roughly mean 0 and standard deviation 1 across all layers, avoiding the exploding/vanishing gradients problem:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"np.random.seed(42)\n",
|
||
"Z = np.random.normal(size=(500, 100))\n",
|
||
"for layer in range(100):\n",
|
||
" W = np.random.normal(size=(100, 100), scale=np.sqrt(1/100))\n",
|
||
" Z = selu(np.dot(Z, W))\n",
|
||
" means = np.mean(Z, axis=1)\n",
|
||
" stds = np.std(Z, axis=1)\n",
|
||
" if layer % 10 == 0:\n",
|
||
" print(\"Layer {}: {:.2f} < mean < {:.2f}, {:.2f} < std deviation < {:.2f}\".format(\n",
|
||
" layer, means.min(), means.max(), stds.min(), stds.max()))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Here's a TensorFlow implementation (there will almost certainly be a `tf.nn.selu()` function in future TensorFlow versions):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def selu(z,\n",
|
||
" scale=1.0507009873554804934193349852946,\n",
|
||
" alpha=1.6732632423543772848170429916717):\n",
|
||
" return scale * tf.where(z >= 0.0, z, alpha * tf.nn.elu(z))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"SELUs can also be combined with dropout, check out [this implementation](https://github.com/bioinf-jku/SNNs/blob/master/selu.py) by the Institute of Bioinformatics, Johannes Kepler University Linz."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's create a neural net for MNIST using the SELU activation function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 100\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=selu, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=selu, name=\"hidden2\")\n",
|
||
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()\n",
|
||
"n_epochs = 40\n",
|
||
"batch_size = 50"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now let's train it. Do not forget to scale the inputs to mean 0 and standard deviation 1:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"means = mnist.train.images.mean(axis=0, keepdims=True)\n",
|
||
"stds = mnist.train.images.std(axis=0, keepdims=True) + 1e-10\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" X_batch_scaled = (X_batch - means) / stds\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch_scaled, y: y_batch})\n",
|
||
" if epoch % 5 == 0:\n",
|
||
" acc_train = accuracy.eval(feed_dict={X: X_batch_scaled, y: y_batch})\n",
|
||
" X_val_scaled = (mnist.validation.images - means) / stds\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_val_scaled, y: mnist.validation.labels})\n",
|
||
" print(epoch, \"Batch accuracy:\", acc_train, \"Validation accuracy:\", acc_test)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final_selu.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Batch Normalization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note: the book uses `tensorflow.contrib.layers.batch_norm()` rather than `tf.layers.batch_normalization()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.batch_normalization()`, because anything in the contrib module may change or be deleted without notice. Instead of using the `batch_norm()` function as a regularizer parameter to the `fully_connected()` function, we now use `batch_normalization()` and we explicitly create a distinct layer. The parameters are a bit different, in particular:\n",
|
||
"* `decay` is renamed to `momentum`,\n",
|
||
"* `is_training` is renamed to `training`,\n",
|
||
"* `updates_collections` is removed: the update operations needed by batch normalization are added to the `UPDATE_OPS` collection and you need to explicity run these operations during training (see the execution phase below),\n",
|
||
"* we don't need to specify `scale=True`, as that is the default.\n",
|
||
"\n",
|
||
"Also note that in order to run batch norm just _before_ each hidden layer's activation function, we apply the ELU activation function manually, right after the batch norm layer.\n",
|
||
"\n",
|
||
"Note: since the `tf.layers.dense()` function is incompatible with `tf.contrib.layers.arg_scope()` (which is used in the book), we now use python's `functools.partial()` function instead. It makes it easy to create a `my_dense_layer()` function that just calls `tf.layers.dense()` with the desired parameters automatically set (unless they are overridden when calling `my_dense_layer()`). As you can see, the code remains very similar."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 24,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"import tensorflow as tf\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 100\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"\n",
|
||
"training = tf.placeholder_with_default(False, shape=(), name='training')\n",
|
||
"\n",
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, name=\"hidden1\")\n",
|
||
"bn1 = tf.layers.batch_normalization(hidden1, training=training, momentum=0.9)\n",
|
||
"bn1_act = tf.nn.elu(bn1)\n",
|
||
"\n",
|
||
"hidden2 = tf.layers.dense(bn1_act, n_hidden2, name=\"hidden2\")\n",
|
||
"bn2 = tf.layers.batch_normalization(hidden2, training=training, momentum=0.9)\n",
|
||
"bn2_act = tf.nn.elu(bn2)\n",
|
||
"\n",
|
||
"logits_before_bn = tf.layers.dense(bn2_act, n_outputs, name=\"outputs\")\n",
|
||
"logits = tf.layers.batch_normalization(logits_before_bn, training=training,\n",
|
||
" momentum=0.9)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 25,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"training = tf.placeholder_with_default(False, shape=(), name='training')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To avoid repeating the same parameters over and over again, we can use Python's `partial()` function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 26,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from functools import partial\n",
|
||
"\n",
|
||
"my_batch_norm_layer = partial(tf.layers.batch_normalization,\n",
|
||
" training=training, momentum=0.9)\n",
|
||
"\n",
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, name=\"hidden1\")\n",
|
||
"bn1 = my_batch_norm_layer(hidden1)\n",
|
||
"bn1_act = tf.nn.elu(bn1)\n",
|
||
"hidden2 = tf.layers.dense(bn1_act, n_hidden2, name=\"hidden2\")\n",
|
||
"bn2 = my_batch_norm_layer(hidden2)\n",
|
||
"bn2_act = tf.nn.elu(bn2)\n",
|
||
"logits_before_bn = tf.layers.dense(bn2_act, n_outputs, name=\"outputs\")\n",
|
||
"logits = my_batch_norm_layer(logits_before_bn)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's build a neural net for MNIST, using the ELU activation function and Batch Normalization at each layer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 27,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"batch_norm_momentum = 0.9\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"training = tf.placeholder_with_default(False, shape=(), name='training')\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
|
||
"\n",
|
||
" my_batch_norm_layer = partial(\n",
|
||
" tf.layers.batch_normalization,\n",
|
||
" training=training,\n",
|
||
" momentum=batch_norm_momentum)\n",
|
||
"\n",
|
||
" my_dense_layer = partial(\n",
|
||
" tf.layers.dense,\n",
|
||
" kernel_initializer=he_init)\n",
|
||
"\n",
|
||
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
|
||
" bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))\n",
|
||
" hidden2 = my_dense_layer(bn1, n_hidden2, name=\"hidden2\")\n",
|
||
" bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))\n",
|
||
" logits_before_bn = my_dense_layer(bn2, n_outputs, name=\"outputs\")\n",
|
||
" logits = my_batch_norm_layer(logits_before_bn)\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
|
||
" \n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note: since we are using `tf.layers.batch_normalization()` rather than `tf.contrib.layers.batch_norm()` (as in the book), we need to explicitly run the extra update operations needed by batch normalization (`sess.run([training_op, extra_update_ops],...`)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 28,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 200"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 29,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run([training_op, extra_update_ops],\n",
|
||
" feed_dict={training: True, X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"What!? That's not a great accuracy for MNIST. Of course, if you train for longer it will get much better accuracy, but with such a shallow network, Batch Norm and ELU are unlikely to have very positive impact: they shine mostly for much deeper nets."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note that you could also make the training operation depend on the update operations:\n",
|
||
"\n",
|
||
"```python\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
|
||
" with tf.control_dependencies(extra_update_ops):\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"```\n",
|
||
"\n",
|
||
"This way, you would just have to evaluate the `training_op` during training, TensorFlow would automatically run the update operations as well:\n",
|
||
"\n",
|
||
"```python\n",
|
||
"sess.run(training_op, feed_dict={training: True, X: X_batch, y: y_batch})\n",
|
||
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"One more thing: notice that the list of trainable variables is shorter than the list of all global variables. This is because the moving averages are non-trainable variables. If you want to reuse a pretrained neural network (see below), you must not forget these non-trainable variables."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 30,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"[v.name for v in tf.trainable_variables()]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 31,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"[v.name for v in tf.global_variables()]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Gradient Clipping"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's create a simple neural net for MNIST and add gradient clipping. The first part is the same as earlier (except we added a few more layers to demonstrate reusing pretrained models, see below):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 32,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 50\n",
|
||
"n_hidden3 = 50\n",
|
||
"n_hidden4 = 50\n",
|
||
"n_hidden5 = 50\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name=\"hidden2\")\n",
|
||
" hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.relu, name=\"hidden3\")\n",
|
||
" hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name=\"hidden4\")\n",
|
||
" hidden5 = tf.layers.dense(hidden4, n_hidden5, activation=tf.nn.relu, name=\"hidden5\")\n",
|
||
" logits = tf.layers.dense(hidden5, n_outputs, name=\"outputs\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 33,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now we apply gradient clipping. For this, we need to get the gradients, use the `clip_by_value()` function to clip them, then apply them:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 34,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"threshold = 1.0\n",
|
||
"\n",
|
||
"optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
"grads_and_vars = optimizer.compute_gradients(loss)\n",
|
||
"capped_gvs = [(tf.clip_by_value(grad, -threshold, threshold), var)\n",
|
||
" for grad, var in grads_and_vars]\n",
|
||
"training_op = optimizer.apply_gradients(capped_gvs)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The rest is the same as usual:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 35,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 36,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 37,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 200"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 38,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Reusing Pretrained Layers"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Reusing a TensorFlow Model"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"First you need to load the graph's structure. The `import_meta_graph()` function does just that, loading the graph's operations into the default graph, and returning a `Saver` that you can then use to restore the model's state. Note that by default, a `Saver` saves the structure of the graph into a `.meta` file, so that's the file you should load:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 39,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 40,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"saver = tf.train.import_meta_graph(\"./my_model_final.ckpt.meta\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Next you need to get a handle on all the operations you will need for training. If you don't know the graph's structure, you can list all the operations:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 41,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for op in tf.get_default_graph().get_operations():\n",
|
||
" print(op.name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Oops, that's a lot of operations! It's much easier to use TensorBoard to visualize the graph. The following hack will allow you to visualize the graph within Jupyter (if it does not work with your browser, you will need to use a `FileWriter` to save the graph and then visualize it in TensorBoard):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 42,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"from IPython.display import clear_output, Image, display, HTML\n",
|
||
"\n",
|
||
"def strip_consts(graph_def, max_const_size=32):\n",
|
||
" \"\"\"Strip large constant values from graph_def.\"\"\"\n",
|
||
" strip_def = tf.GraphDef()\n",
|
||
" for n0 in graph_def.node:\n",
|
||
" n = strip_def.node.add() \n",
|
||
" n.MergeFrom(n0)\n",
|
||
" if n.op == 'Const':\n",
|
||
" tensor = n.attr['value'].tensor\n",
|
||
" size = len(tensor.tensor_content)\n",
|
||
" if size > max_const_size:\n",
|
||
" tensor.tensor_content = b\"<stripped %d bytes>\"%size\n",
|
||
" return strip_def\n",
|
||
"\n",
|
||
"def show_graph(graph_def, max_const_size=32):\n",
|
||
" \"\"\"Visualize TensorFlow graph.\"\"\"\n",
|
||
" if hasattr(graph_def, 'as_graph_def'):\n",
|
||
" graph_def = graph_def.as_graph_def()\n",
|
||
" strip_def = strip_consts(graph_def, max_const_size=max_const_size)\n",
|
||
" code = \"\"\"\n",
|
||
" <script>\n",
|
||
" function load() {{\n",
|
||
" document.getElementById(\"{id}\").pbtxt = {data};\n",
|
||
" }}\n",
|
||
" </script>\n",
|
||
" <link rel=\"import\" href=\"https://tensorboard.appspot.com/tf-graph-basic.build.html\" onload=load()>\n",
|
||
" <div style=\"height:600px\">\n",
|
||
" <tf-graph-basic id=\"{id}\"></tf-graph-basic>\n",
|
||
" </div>\n",
|
||
" \"\"\".format(data=repr(str(strip_def)), id='graph'+str(np.random.rand()))\n",
|
||
"\n",
|
||
" iframe = \"\"\"\n",
|
||
" <iframe seamless style=\"width:1200px;height:620px;border:0\" srcdoc=\"{}\"></iframe>\n",
|
||
" \"\"\".format(code.replace('\"', '"'))\n",
|
||
" display(HTML(iframe))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 43,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"show_graph(tf.get_default_graph())"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Once you know which operations you need, you can get a handle on them using the graph's `get_operation_by_name()` or `get_tensor_by_name()` methods:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"X = tf.get_default_graph().get_tensor_by_name(\"X:0\")\n",
|
||
"y = tf.get_default_graph().get_tensor_by_name(\"y:0\")\n",
|
||
"\n",
|
||
"accuracy = tf.get_default_graph().get_tensor_by_name(\"eval/accuracy:0\")\n",
|
||
"\n",
|
||
"training_op = tf.get_default_graph().get_operation_by_name(\"GradientDescent\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If you are the author of the original model, you could make things easier for people who will reuse your model by giving operations very clear names and documenting them. Another approach is to create a collection containing all the important operations that people will want to get a handle on:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"for op in (X, y, accuracy, training_op):\n",
|
||
" tf.add_to_collection(\"my_important_ops\", op)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"This way people who reuse your model will be able to simply write:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"X, y, accuracy, training_op = tf.get_collection(\"my_important_ops\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now you can start a session, restore the model's state and continue training on your data:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.Session() as sess:\n",
|
||
" saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
" # continue training the model..."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Actually, let's test this for real!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.Session() as sess:\n",
|
||
" saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_new_model_final.ckpt\") "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Alternatively, if you have access to the Python code that built the original graph, you can use it instead of `import_meta_graph()`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 50\n",
|
||
"n_hidden3 = 50\n",
|
||
"n_hidden4 = 50\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name=\"hidden2\")\n",
|
||
" hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.relu, name=\"hidden3\")\n",
|
||
" hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name=\"hidden4\")\n",
|
||
" hidden5 = tf.layers.dense(hidden4, n_hidden5, activation=tf.nn.relu, name=\"hidden5\")\n",
|
||
" logits = tf.layers.dense(hidden5, n_outputs, name=\"outputs\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"learning_rate = 0.01\n",
|
||
"threshold = 1.0\n",
|
||
"\n",
|
||
"optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
"grads_and_vars = optimizer.compute_gradients(loss)\n",
|
||
"capped_gvs = [(tf.clip_by_value(grad, -threshold, threshold), var)\n",
|
||
" for grad, var in grads_and_vars]\n",
|
||
"training_op = optimizer.apply_gradients(capped_gvs)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And continue training:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.Session() as sess:\n",
|
||
" saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_new_model_final.ckpt\") "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In general you will want to reuse only the lower layers. If you are using `import_meta_graph()` it will load the whole graph, but you can simply ignore the parts you do not need. In this example, we add a new 4th hidden layer on top of the pretrained 3rd layer (ignoring the old 4th hidden layer). We also build a new output layer, the loss for this new output, and a new optimizer to minimize it. We also need another saver to save the whole graph (containing both the entire old graph plus the new operations), and an initialization operation to initialize all the new variables:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_hidden4 = 20 # new layer\n",
|
||
"n_outputs = 10 # new layer\n",
|
||
"\n",
|
||
"saver = tf.train.import_meta_graph(\"./my_model_final.ckpt.meta\")\n",
|
||
"\n",
|
||
"X = tf.get_default_graph().get_tensor_by_name(\"X:0\")\n",
|
||
"y = tf.get_default_graph().get_tensor_by_name(\"y:0\")\n",
|
||
"\n",
|
||
"hidden3 = tf.get_default_graph().get_tensor_by_name(\"dnn/hidden4/Relu:0\")\n",
|
||
"\n",
|
||
"new_hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name=\"new_hidden4\")\n",
|
||
"new_logits = tf.layers.dense(new_hidden4, n_outputs, name=\"new_outputs\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"new_loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=new_logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"new_eval\"):\n",
|
||
" correct = tf.nn.in_top_k(new_logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"new_train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"new_saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And we can train this new model:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = new_saver.save(sess, \"./my_new_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"If you have access to the Python code that built the original graph, you can just reuse the parts you need and drop the rest:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300 # reused\n",
|
||
"n_hidden2 = 50 # reused\n",
|
||
"n_hidden3 = 50 # reused\n",
|
||
"n_hidden4 = 20 # new!\n",
|
||
"n_outputs = 10 # new!\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\") # reused\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name=\"hidden2\") # reused\n",
|
||
" hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.relu, name=\"hidden3\") # reused\n",
|
||
" hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name=\"hidden4\") # new!\n",
|
||
" logits = tf.layers.dense(hidden4, n_outputs, name=\"outputs\") # new!\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"However, you must create one `Saver` to restore the pretrained model (giving it the list of variables to restore, or else it will complain that the graphs don't match), and another `Saver` to save the new model, once it is trained:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 54,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,\n",
|
||
" scope=\"hidden[123]\") # regular expression\n",
|
||
"reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])\n",
|
||
"restore_saver = tf.train.Saver(reuse_vars_dict) # to restore layers 1-3\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs): # not shown in the book\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size): # not shown\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size) # not shown\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch}) # not shown\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images, # not shown\n",
|
||
" y: mnist.test.labels}) # not shown\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val) # not shown\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_new_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Reusing Models from Other Frameworks"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In this example, for each variable we want to reuse, we find its initializer's assignment operation, and we get its second input, which corresponds to the initialization value. When we run the initializer, we replace the initialization values with the ones we want, using a `feed_dict`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 2\n",
|
||
"n_hidden1 = 3"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"original_w = [[1., 2., 3.], [4., 5., 6.]] # Load the weights from the other framework\n",
|
||
"original_b = [7., 8., 9.] # Load the biases from the other framework\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
"# [...] Build the rest of the model\n",
|
||
"\n",
|
||
"# Get a handle on the assignment nodes for the hidden1 variables\n",
|
||
"graph = tf.get_default_graph()\n",
|
||
"assign_kernel = graph.get_operation_by_name(\"hidden1/kernel/Assign\")\n",
|
||
"assign_bias = graph.get_operation_by_name(\"hidden1/bias/Assign\")\n",
|
||
"init_kernel = assign_kernel.inputs[1]\n",
|
||
"init_bias = assign_bias.inputs[1]\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" sess.run(init, feed_dict={init_kernel: original_w, init_bias: original_b})\n",
|
||
" # [...] Train the model on your new task\n",
|
||
" print(hidden1.eval(feed_dict={X: [[10.0, 11.0]]})) # not shown in the book"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note: the weights variable created by the `tf.layers.dense()` function is called `\"kernel\"` (instead of `\"weights\"` when using the `tf.contrib.layers.fully_connected()`, as in the book), and the biases variable is called `bias` instead of `biases`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Another approach (initially used in the book) would be to create dedicated assignment nodes and dedicated placeholders. This is more verbose and less efficient, but you may find this more explicit:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 57,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 2\n",
|
||
"n_hidden1 = 3\n",
|
||
"\n",
|
||
"original_w = [[1., 2., 3.], [4., 5., 6.]] # Load the weights from the other framework\n",
|
||
"original_b = [7., 8., 9.] # Load the biases from the other framework\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
"# [...] Build the rest of the model\n",
|
||
"\n",
|
||
"# Get a handle on the variables of layer hidden1\n",
|
||
"with tf.variable_scope(\"\", default_name=\"\", reuse=True): # root scope\n",
|
||
" hidden1_weights = tf.get_variable(\"hidden1/kernel\")\n",
|
||
" hidden1_biases = tf.get_variable(\"hidden1/bias\")\n",
|
||
"\n",
|
||
"# Create dedicated placeholders and assignment nodes\n",
|
||
"original_weights = tf.placeholder(tf.float32, shape=(n_inputs, n_hidden1))\n",
|
||
"original_biases = tf.placeholder(tf.float32, shape=n_hidden1)\n",
|
||
"assign_hidden1_weights = tf.assign(hidden1_weights, original_weights)\n",
|
||
"assign_hidden1_biases = tf.assign(hidden1_biases, original_biases)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" sess.run(init)\n",
|
||
" sess.run(assign_hidden1_weights, feed_dict={original_weights: original_w})\n",
|
||
" sess.run(assign_hidden1_biases, feed_dict={original_biases: original_b})\n",
|
||
" # [...] Train the model on your new task\n",
|
||
" print(hidden1.eval(feed_dict={X: [[10.0, 11.0]]}))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note that we could also get a handle on the variables using `get_collection()` and specifying the `scope`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 58,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=\"hidden1\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Or we could use the graph's `get_tensor_by_name()` method:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 59,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tf.get_default_graph().get_tensor_by_name(\"hidden1/kernel:0\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 60,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"tf.get_default_graph().get_tensor_by_name(\"hidden1/bias:0\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Freezing the Lower Layers"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 61,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300 # reused\n",
|
||
"n_hidden2 = 50 # reused\n",
|
||
"n_hidden3 = 50 # reused\n",
|
||
"n_hidden4 = 20 # new!\n",
|
||
"n_outputs = 10 # new!\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\") # reused\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name=\"hidden2\") # reused\n",
|
||
" hidden3 = tf.layers.dense(hidden2, n_hidden3, activation=tf.nn.relu, name=\"hidden3\") # reused\n",
|
||
" hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu, name=\"hidden4\") # new!\n",
|
||
" logits = tf.layers.dense(hidden4, n_outputs, name=\"outputs\") # new!\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 62,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"train\"): # not shown in the book\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate) # not shown\n",
|
||
" train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,\n",
|
||
" scope=\"hidden[34]|outputs\")\n",
|
||
" training_op = optimizer.minimize(loss, var_list=train_vars)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 63,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"init = tf.global_variables_initializer()\n",
|
||
"new_saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 64,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,\n",
|
||
" scope=\"hidden[123]\") # regular expression\n",
|
||
"reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])\n",
|
||
"restore_saver = tf.train.Saver(reuse_vars_dict) # to restore layers 1-3\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_new_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 65,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300 # reused\n",
|
||
"n_hidden2 = 50 # reused\n",
|
||
"n_hidden3 = 50 # reused\n",
|
||
"n_hidden4 = 20 # new!\n",
|
||
"n_outputs = 10 # new!\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 66,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu,\n",
|
||
" name=\"hidden1\") # reused frozen\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu,\n",
|
||
" name=\"hidden2\") # reused frozen\n",
|
||
" hidden2_stop = tf.stop_gradient(hidden2)\n",
|
||
" hidden3 = tf.layers.dense(hidden2_stop, n_hidden3, activation=tf.nn.relu,\n",
|
||
" name=\"hidden3\") # reused, not frozen\n",
|
||
" hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu,\n",
|
||
" name=\"hidden4\") # new!\n",
|
||
" logits = tf.layers.dense(hidden4, n_outputs, name=\"outputs\") # new!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 67,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The training code is exactly the same as earlier:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 68,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,\n",
|
||
" scope=\"hidden[123]\") # regular expression\n",
|
||
"reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])\n",
|
||
"restore_saver = tf.train.Saver(reuse_vars_dict) # to restore layers 1-3\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_new_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### Caching the Frozen Layers"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 69,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300 # reused\n",
|
||
"n_hidden2 = 50 # reused\n",
|
||
"n_hidden3 = 50 # reused\n",
|
||
"n_hidden4 = 20 # new!\n",
|
||
"n_outputs = 10 # new!\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu,\n",
|
||
" name=\"hidden1\") # reused frozen\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu,\n",
|
||
" name=\"hidden2\") # reused frozen & cached\n",
|
||
" hidden2_stop = tf.stop_gradient(hidden2)\n",
|
||
" hidden3 = tf.layers.dense(hidden2_stop, n_hidden3, activation=tf.nn.relu,\n",
|
||
" name=\"hidden3\") # reused, not frozen\n",
|
||
" hidden4 = tf.layers.dense(hidden3, n_hidden4, activation=tf.nn.relu,\n",
|
||
" name=\"hidden4\") # new!\n",
|
||
" logits = tf.layers.dense(hidden4, n_outputs, name=\"outputs\") # new!\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 70,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES,\n",
|
||
" scope=\"hidden[123]\") # regular expression\n",
|
||
"reuse_vars_dict = dict([(var.op.name, var) for var in reuse_vars])\n",
|
||
"restore_saver = tf.train.Saver(reuse_vars_dict) # to restore layers 1-3\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 71,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import numpy as np\n",
|
||
"\n",
|
||
"n_batches = mnist.train.num_examples // batch_size\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_model_final.ckpt\")\n",
|
||
" \n",
|
||
" h2_cache = sess.run(hidden2, feed_dict={X: mnist.train.images})\n",
|
||
" h2_cache_test = sess.run(hidden2, feed_dict={X: mnist.test.images}) # not shown in the book\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" shuffled_idx = np.random.permutation(mnist.train.num_examples)\n",
|
||
" hidden2_batches = np.array_split(h2_cache[shuffled_idx], n_batches)\n",
|
||
" y_batches = np.array_split(mnist.train.labels[shuffled_idx], n_batches)\n",
|
||
" for hidden2_batch, y_batch in zip(hidden2_batches, y_batches):\n",
|
||
" sess.run(training_op, feed_dict={hidden2:hidden2_batch, y:y_batch})\n",
|
||
"\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={hidden2: h2_cache_test, # not shown\n",
|
||
" y: mnist.test.labels}) # not shown\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val) # not shown\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_new_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Faster Optimizers"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Momentum optimization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 72,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,\n",
|
||
" momentum=0.9)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Nesterov Accelerated Gradient"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 73,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,\n",
|
||
" momentum=0.9, use_nesterov=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## AdaGrad"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 74,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## RMSProp"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 75,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate,\n",
|
||
" momentum=0.9, decay=0.9, epsilon=1e-10)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Adam Optimization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 76,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Learning Rate Scheduling"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 77,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 50\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name=\"hidden2\")\n",
|
||
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 78,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"train\"): # not shown in the book\n",
|
||
" initial_learning_rate = 0.1\n",
|
||
" decay_steps = 10000\n",
|
||
" decay_rate = 1/10\n",
|
||
" global_step = tf.Variable(0, trainable=False, name=\"global_step\")\n",
|
||
" learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,\n",
|
||
" decay_steps, decay_rate)\n",
|
||
" optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)\n",
|
||
" training_op = optimizer.minimize(loss, global_step=global_step)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 79,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 80,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 5\n",
|
||
"batch_size = 50\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"# Avoiding Overfitting Through Regularization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## $\\ell_1$ and $\\ell_2$ regularization"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's implement $\\ell_1$ regularization manually. First, we create the model, as usual (with just one hidden layer this time, for simplicity):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 81,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
" logits = tf.layers.dense(hidden1, n_outputs, name=\"outputs\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Next, we get a handle on the layer weights, and we compute the total loss, which is equal to the sum of the usual cross entropy loss and the $\\ell_1$ loss (i.e., the absolute values of the weights):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 82,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"W1 = tf.get_default_graph().get_tensor_by_name(\"hidden1/kernel:0\")\n",
|
||
"W2 = tf.get_default_graph().get_tensor_by_name(\"outputs/kernel:0\")\n",
|
||
"\n",
|
||
"scale = 0.001 # l1 regularization hyperparameter\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,\n",
|
||
" logits=logits)\n",
|
||
" base_loss = tf.reduce_mean(xentropy, name=\"avg_xentropy\")\n",
|
||
" reg_losses = tf.reduce_sum(tf.abs(W1)) + tf.reduce_sum(tf.abs(W2))\n",
|
||
" loss = tf.add(base_loss, scale * reg_losses, name=\"loss\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The rest is just as usual:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 83,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 84,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 200\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Alternatively, we can pass a regularization function to the `tf.layers.dense()` function, which will use it to create operations that will compute the regularization loss, and it adds these operations to the collection of regularization losses. The beginning is the same as above:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 85,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 50\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Next, we will use Python's `partial()` function to avoid repeating the same arguments over and over again. Note that we set the `kernel_regularizer` argument:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 86,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"scale = 0.001"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 87,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"my_dense_layer = partial(\n",
|
||
" tf.layers.dense, activation=tf.nn.relu,\n",
|
||
" kernel_regularizer=tf.contrib.layers.l1_regularizer(scale))\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
|
||
" hidden2 = my_dense_layer(hidden1, n_hidden2, name=\"hidden2\")\n",
|
||
" logits = my_dense_layer(hidden2, n_outputs, activation=None,\n",
|
||
" name=\"outputs\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Next we must add the regularization losses to the base loss:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 88,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"loss\"): # not shown in the book\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits( # not shown\n",
|
||
" labels=y, logits=logits) # not shown\n",
|
||
" base_loss = tf.reduce_mean(xentropy, name=\"avg_xentropy\") # not shown\n",
|
||
" reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)\n",
|
||
" loss = tf.add_n([base_loss] + reg_losses, name=\"loss\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And the rest is the same as usual:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 89,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.GradientDescentOptimizer(learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 90,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 200\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" accuracy_val = accuracy.eval(feed_dict={X: mnist.test.images,\n",
|
||
" y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", accuracy_val)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Dropout"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Note: the book uses `tf.contrib.layers.dropout()` rather than `tf.layers.dropout()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.dropout()`, because anything in the contrib module may change or be deleted without notice. The `tf.layers.dropout()` function is almost identical to the `tf.contrib.layers.dropout()` function, except for a few minor differences. Most importantly:\n",
|
||
"* you must specify the dropout rate (`rate`) rather than the keep probability (`keep_prob`), where `rate` is simply equal to `1 - keep_prob`,\n",
|
||
"* the `is_training` parameter is renamed to `training`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 91,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 92,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"training = tf.placeholder_with_default(False, shape=(), name='training')\n",
|
||
"\n",
|
||
"dropout_rate = 0.5 # == 1 - keep_prob\n",
|
||
"X_drop = tf.layers.dropout(X, dropout_rate, training=training)\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X_drop, n_hidden1, activation=tf.nn.relu,\n",
|
||
" name=\"hidden1\")\n",
|
||
" hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training=training)\n",
|
||
" hidden2 = tf.layers.dense(hidden1_drop, n_hidden2, activation=tf.nn.relu,\n",
|
||
" name=\"hidden2\")\n",
|
||
" hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training=training)\n",
|
||
" logits = tf.layers.dense(hidden2_drop, n_outputs, name=\"outputs\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 93,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)\n",
|
||
" training_op = optimizer.minimize(loss) \n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
|
||
" \n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 94,
|
||
"metadata": {
|
||
"scrolled": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 50\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={training: True, X: X_batch, y: y_batch})\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})\n",
|
||
" print(epoch, \"Test accuracy:\", acc_test)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## Max norm"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's go back to a plain and simple neural net for MNIST with just 2 hidden layers:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 95,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 50\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"learning_rate = 0.01\n",
|
||
"momentum = 0.9\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu, name=\"hidden2\")\n",
|
||
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)\n",
|
||
" training_op = optimizer.minimize(loss) \n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Next, let's get a handle on the first hidden layer's weight and create an operation that will compute the clipped weights using the `clip_by_norm()` function. Then we create an assignment operation to assign the clipped weights to the weights variable:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 96,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"threshold = 1.0\n",
|
||
"weights = tf.get_default_graph().get_tensor_by_name(\"hidden1/kernel:0\")\n",
|
||
"clipped_weights = tf.clip_by_norm(weights, clip_norm=threshold, axes=1)\n",
|
||
"clip_weights = tf.assign(weights, clipped_weights)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We can do this as well for the second hidden layer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 97,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"weights2 = tf.get_default_graph().get_tensor_by_name(\"hidden2/kernel:0\")\n",
|
||
"clipped_weights2 = tf.clip_by_norm(weights2, clip_norm=threshold, axes=1)\n",
|
||
"clip_weights2 = tf.assign(weights2, clipped_weights2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's add an initializer and a saver:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 98,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And now we can train the model. It's pretty much as usual, except that right after running the `training_op`, we run the `clip_weights` and `clip_weights2` operations:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 99,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 50"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 100,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.Session() as sess: # not shown in the book\n",
|
||
" init.run() # not shown\n",
|
||
" for epoch in range(n_epochs): # not shown\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size): # not shown\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size) # not shown\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" clip_weights.eval()\n",
|
||
" clip_weights2.eval() # not shown\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: mnist.test.images, # not shown\n",
|
||
" y: mnist.test.labels}) # not shown\n",
|
||
" print(epoch, \"Test accuracy:\", acc_test) # not shown\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\") # not shown"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The implementation above is straightforward and it works fine, but it is a bit messy. A better approach is to define a `max_norm_regularizer()` function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 101,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def max_norm_regularizer(threshold, axes=1, name=\"max_norm\",\n",
|
||
" collection=\"max_norm\"):\n",
|
||
" def max_norm(weights):\n",
|
||
" clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)\n",
|
||
" clip_weights = tf.assign(weights, clipped, name=name)\n",
|
||
" tf.add_to_collection(collection, clip_weights)\n",
|
||
" return None # there is no regularization loss term\n",
|
||
" return max_norm"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Then you can call this function to get a max norm regularizer (with the threshold you want). When you create a hidden layer, you can pass this regularizer to the `kernel_regularizer` argument:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 102,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28\n",
|
||
"n_hidden1 = 300\n",
|
||
"n_hidden2 = 50\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"learning_rate = 0.01\n",
|
||
"momentum = 0.9\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 103,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"max_norm_reg = max_norm_regularizer(threshold=1.0)\n",
|
||
"\n",
|
||
"with tf.name_scope(\"dnn\"):\n",
|
||
" hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.relu,\n",
|
||
" kernel_regularizer=max_norm_reg, name=\"hidden1\")\n",
|
||
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.relu,\n",
|
||
" kernel_regularizer=max_norm_reg, name=\"hidden2\")\n",
|
||
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 104,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"with tf.name_scope(\"loss\"):\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"with tf.name_scope(\"train\"):\n",
|
||
" optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)\n",
|
||
" training_op = optimizer.minimize(loss) \n",
|
||
"\n",
|
||
"with tf.name_scope(\"eval\"):\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Training is as usual, except you must run the weights clipping operations after each training operation:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 105,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 20\n",
|
||
"batch_size = 50"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 106,
|
||
"metadata": {
|
||
"scrolled": false
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"clip_all_weights = tf.get_collection(\"max_norm\")\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" sess.run(clip_all_weights)\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: mnist.test.images, # not shown in the book\n",
|
||
" y: mnist.test.labels}) # not shown\n",
|
||
" print(epoch, \"Test accuracy:\", acc_test) # not shown\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_model_final.ckpt\") # not shown"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"source": [
|
||
"# Exercise solutions"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 1. to 7."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"See appendix A."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 8. Deep Learning"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 8.1."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: Build a DNN with five hidden layers of 100 neurons each, He initialization, and the ELU activation function._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We will need similar DNNs in the next exercises, so let's create a function to build this DNN:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 107,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"he_init = tf.contrib.layers.variance_scaling_initializer()\n",
|
||
"\n",
|
||
"def dnn(inputs, n_hidden_layers=5, n_neurons=100, name=None,\n",
|
||
" activation=tf.nn.elu, initializer=he_init):\n",
|
||
" with tf.variable_scope(name, \"dnn\"):\n",
|
||
" for layer in range(n_hidden_layers):\n",
|
||
" inputs = tf.layers.dense(inputs, n_neurons, activation=activation,\n",
|
||
" kernel_initializer=initializer,\n",
|
||
" name=\"hidden%d\" % (layer + 1))\n",
|
||
" return inputs"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 108,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_outputs = 5\n",
|
||
"\n",
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"dnn_outputs = dnn(X)\n",
|
||
"\n",
|
||
"logits = tf.layers.dense(dnn_outputs, n_outputs, kernel_initializer=he_init, name=\"logits\")\n",
|
||
"Y_proba = tf.nn.softmax(logits, name=\"Y_proba\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 8.2."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: Using Adam optimization and early stopping, try training it on MNIST but only on digits 0 to 4, as we will use transfer learning for digits 5 to 9 in the next exercise. You will need a softmax output layer with five neurons, and as always make sure to save checkpoints at regular intervals and save the final model so you can reuse it later._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's complete the graph with the cost function, the training op, and all the other usual components:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 109,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
"loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"optimizer = tf.train.AdamOptimizer(learning_rate)\n",
|
||
"training_op = optimizer.minimize(loss, name=\"training_op\")\n",
|
||
"\n",
|
||
"correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
"accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's fetch the MNIST dataset:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 110,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from tensorflow.examples.tutorials.mnist import input_data\n",
|
||
"mnist = input_data.read_data_sets(\"/tmp/data/\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now let's create the training set, validation and test set (we need the validation set to implement early stopping):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 111,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"X_train1 = mnist.train.images[mnist.train.labels < 5]\n",
|
||
"y_train1 = mnist.train.labels[mnist.train.labels < 5]\n",
|
||
"X_valid1 = mnist.validation.images[mnist.validation.labels < 5]\n",
|
||
"y_valid1 = mnist.validation.labels[mnist.validation.labels < 5]\n",
|
||
"X_test1 = mnist.test.images[mnist.test.labels < 5]\n",
|
||
"y_test1 = mnist.test.labels[mnist.test.labels < 5]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 112,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 1000\n",
|
||
"batch_size = 20\n",
|
||
"\n",
|
||
"max_checks_without_progress = 20\n",
|
||
"checks_without_progress = 0\n",
|
||
"best_loss = np.infty\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train1))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train1) // batch_size):\n",
|
||
" X_batch, y_batch = X_train1[rnd_indices], y_train1[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" loss_val, acc_val = sess.run([loss, accuracy], feed_dict={X: X_valid1, y: y_valid1})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" save_path = saver.save(sess, \"./my_mnist_model_0_to_4.ckpt\")\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" saver.restore(sess, \"./my_mnist_model_0_to_4.ckpt\")\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test1, y: y_test1})\n",
|
||
" print(\"Final test accuracy: {:.2f}%\".format(acc_test * 100))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We get 98.05% accuracy on the test set. That's not too bad, but let's see if we can do better by tuning the hyperparameters."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 8.3."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: Tune the hyperparameters using cross-validation and see what precision you can achieve._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's create a `DNNClassifier` class, compatible with Scikit-Learn's `RandomizedSearchCV` class, to perform hyperparameter tuning. Here are the key points of this implementation:\n",
|
||
"* the `__init__()` method (constructor) does nothing more than create instance variables for each of the hyperparameters.\n",
|
||
"* the `fit()` method creates the graph, starts a session and trains the model:\n",
|
||
" * it calls the `_build_graph()` method to build the graph (much lile the graph we defined earlier). Once this method is done creating the graph, it saves all the important operations as instance variables for easy access by other methods.\n",
|
||
" * the `_dnn()` method builds the hidden layers, just like the `dnn()` function above, but also with support for batch normalization and dropout (for the next exercises).\n",
|
||
" * if the `fit()` method is given a validation set (`X_valid` and `y_valid`), then it implements early stopping. This implementation does not save the best model to disk, but rather to memory: it uses the `_get_model_params()` method to get all the graph's variables and their values, and the `_restore_model_params()` method to restore the variable values (of the best model found). This trick helps speed up training.\n",
|
||
" * After the `fit()` method has finished training the model, it keeps the session open so that predictions can be made quickly, without having to save a model to disk and restore it for every prediction. You can close the session by calling the `close_session()` method.\n",
|
||
"* the `predict_proba()` method uses the trained model to predict the class probabilities.\n",
|
||
"* the `predict()` method calls `predict_proba()` and returns the class with the highest probability, for each instance."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 113,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.base import BaseEstimator, ClassifierMixin\n",
|
||
"from sklearn.exceptions import NotFittedError\n",
|
||
"\n",
|
||
"class DNNClassifier(BaseEstimator, ClassifierMixin):\n",
|
||
" def __init__(self, n_hidden_layers=5, n_neurons=100, optimizer_class=tf.train.AdamOptimizer,\n",
|
||
" learning_rate=0.01, batch_size=20, activation=tf.nn.elu, initializer=he_init,\n",
|
||
" batch_norm_momentum=None, dropout_rate=None, random_state=None):\n",
|
||
" \"\"\"Initialize the DNNClassifier by simply storing all the hyperparameters.\"\"\"\n",
|
||
" self.n_hidden_layers = n_hidden_layers\n",
|
||
" self.n_neurons = n_neurons\n",
|
||
" self.optimizer_class = optimizer_class\n",
|
||
" self.learning_rate = learning_rate\n",
|
||
" self.batch_size = batch_size\n",
|
||
" self.activation = activation\n",
|
||
" self.initializer = initializer\n",
|
||
" self.batch_norm_momentum = batch_norm_momentum\n",
|
||
" self.dropout_rate = dropout_rate\n",
|
||
" self.random_state = random_state\n",
|
||
" self._session = None\n",
|
||
"\n",
|
||
" def _dnn(self, inputs):\n",
|
||
" \"\"\"Build the hidden layers, with support for batch normalization and dropout.\"\"\"\n",
|
||
" for layer in range(self.n_hidden_layers):\n",
|
||
" if self.dropout_rate:\n",
|
||
" inputs = tf.layers.dropout(inputs, self.dropout_rate, training=self._training)\n",
|
||
" inputs = tf.layers.dense(inputs, self.n_neurons,\n",
|
||
" kernel_initializer=self.initializer,\n",
|
||
" name=\"hidden%d\" % (layer + 1))\n",
|
||
" if self.batch_norm_momentum:\n",
|
||
" inputs = tf.layers.batch_normalization(inputs, momentum=self.batch_norm_momentum,\n",
|
||
" training=self._training)\n",
|
||
" inputs = self.activation(inputs, name=\"hidden%d_out\" % (layer + 1))\n",
|
||
" return inputs\n",
|
||
"\n",
|
||
" def _build_graph(self, n_inputs, n_outputs):\n",
|
||
" \"\"\"Build the same model as earlier\"\"\"\n",
|
||
" if self.random_state is not None:\n",
|
||
" tf.set_random_seed(self.random_state)\n",
|
||
" np.random.seed(self.random_state)\n",
|
||
"\n",
|
||
" X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
" y = tf.placeholder(tf.int32, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
" if self.batch_norm_momentum or self.dropout_rate:\n",
|
||
" self._training = tf.placeholder_with_default(False, shape=(), name='training')\n",
|
||
" else:\n",
|
||
" self._training = None\n",
|
||
"\n",
|
||
" dnn_outputs = self._dnn(X)\n",
|
||
"\n",
|
||
" logits = tf.layers.dense(dnn_outputs, n_outputs, kernel_initializer=he_init, name=\"logits\")\n",
|
||
" Y_proba = tf.nn.softmax(logits, name=\"Y_proba\")\n",
|
||
"\n",
|
||
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,\n",
|
||
" logits=logits)\n",
|
||
" loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
" optimizer = self.optimizer_class(learning_rate=self.learning_rate)\n",
|
||
" training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
" correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
" init = tf.global_variables_initializer()\n",
|
||
" saver = tf.train.Saver()\n",
|
||
"\n",
|
||
" # Make the important operations available easily through instance variables\n",
|
||
" self._X, self._y = X, y\n",
|
||
" self._Y_proba, self._loss = Y_proba, loss\n",
|
||
" self._training_op, self._accuracy = training_op, accuracy\n",
|
||
" self._init, self._saver = init, saver\n",
|
||
"\n",
|
||
" def close_session(self):\n",
|
||
" if self._session:\n",
|
||
" self._session.close()\n",
|
||
"\n",
|
||
" def _get_model_params(self):\n",
|
||
" \"\"\"Get all variable values (used for early stopping, faster than saving to disk)\"\"\"\n",
|
||
" with self._graph.as_default():\n",
|
||
" gvars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)\n",
|
||
" return {gvar.op.name: value for gvar, value in zip(gvars, self._session.run(gvars))}\n",
|
||
"\n",
|
||
" def _restore_model_params(self, model_params):\n",
|
||
" \"\"\"Set all variables to the given values (for early stopping, faster than loading from disk)\"\"\"\n",
|
||
" gvar_names = list(model_params.keys())\n",
|
||
" assign_ops = {gvar_name: self._graph.get_operation_by_name(gvar_name + \"/Assign\")\n",
|
||
" for gvar_name in gvar_names}\n",
|
||
" init_values = {gvar_name: assign_op.inputs[1] for gvar_name, assign_op in assign_ops.items()}\n",
|
||
" feed_dict = {init_values[gvar_name]: model_params[gvar_name] for gvar_name in gvar_names}\n",
|
||
" self._session.run(assign_ops, feed_dict=feed_dict)\n",
|
||
"\n",
|
||
" def fit(self, X, y, n_epochs=100, X_valid=None, y_valid=None):\n",
|
||
" \"\"\"Fit the model to the training set. If X_valid and y_valid are provided, use early stopping.\"\"\"\n",
|
||
" self.close_session()\n",
|
||
"\n",
|
||
" # infer n_inputs and n_outputs from the training set.\n",
|
||
" n_inputs = X.shape[1]\n",
|
||
" self.classes_ = np.unique(y)\n",
|
||
" n_outputs = len(self.classes_)\n",
|
||
" \n",
|
||
" # Translate the labels vector to a vector of sorted class indices, containing\n",
|
||
" # integers from 0 to n_outputs - 1.\n",
|
||
" # For example, if y is equal to [8, 8, 9, 5, 7, 6, 6, 6], then the sorted class\n",
|
||
" # labels (self.classes_) will be equal to [5, 6, 7, 8, 9], and the labels vector\n",
|
||
" # will be translated to [3, 3, 4, 0, 2, 1, 1, 1]\n",
|
||
" self.class_to_index_ = {label: index\n",
|
||
" for index, label in enumerate(self.classes_)}\n",
|
||
" y = np.array([self.class_to_index_[label]\n",
|
||
" for label in y], dtype=np.int32)\n",
|
||
" \n",
|
||
" self._graph = tf.Graph()\n",
|
||
" with self._graph.as_default():\n",
|
||
" self._build_graph(n_inputs, n_outputs)\n",
|
||
"\n",
|
||
" # needed in case of early stopping\n",
|
||
" max_checks_without_progress = 20\n",
|
||
" checks_without_progress = 0\n",
|
||
" best_loss = np.infty\n",
|
||
" best_params = None\n",
|
||
"\n",
|
||
" # extra ops for batch normalization\n",
|
||
" extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
|
||
" \n",
|
||
" # Now train the model!\n",
|
||
" self._session = tf.Session(graph=self._graph)\n",
|
||
" with self._session.as_default() as sess:\n",
|
||
" self._init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X) // self.batch_size):\n",
|
||
" X_batch, y_batch = X[rnd_indices], y[rnd_indices]\n",
|
||
" feed_dict = {self._X: X_batch, self._y: y_batch}\n",
|
||
" if self._training is not None:\n",
|
||
" feed_dict[self._training] = True\n",
|
||
" sess.run(self._training_op, feed_dict=feed_dict)\n",
|
||
" if extra_update_ops:\n",
|
||
" sess.run(extra_update_ops, feed_dict=feed_dict)\n",
|
||
" if X_valid is not None and y_valid is not None:\n",
|
||
" loss_val, acc_val = sess.run([self._loss, self._accuracy],\n",
|
||
" feed_dict={self._X: X_valid,\n",
|
||
" self._y: y_valid})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" best_params = self._get_model_params()\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" else:\n",
|
||
" loss_train, acc_train = sess.run([self._loss, self._accuracy],\n",
|
||
" feed_dict={self._X: X_batch,\n",
|
||
" self._y: y_batch})\n",
|
||
" print(\"{}\\tLast training batch loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_train, acc_train * 100))\n",
|
||
" # If we used early stopping then rollback to the best model found\n",
|
||
" if best_params:\n",
|
||
" self._restore_model_params(best_params)\n",
|
||
" return self\n",
|
||
"\n",
|
||
" def predict_proba(self, X):\n",
|
||
" if not self._session:\n",
|
||
" raise NotFittedError(\"This %s instance is not fitted yet\" % self.__class__.__name__)\n",
|
||
" with self._session.as_default() as sess:\n",
|
||
" return self._Y_proba.eval(feed_dict={self._X: X})\n",
|
||
"\n",
|
||
" def predict(self, X):\n",
|
||
" class_indices = np.argmax(self.predict_proba(X), axis=1)\n",
|
||
" return np.array([[self.classes_[class_index]]\n",
|
||
" for class_index in class_indices], np.int32)\n",
|
||
"\n",
|
||
" def save(self, path):\n",
|
||
" self._saver.save(self._session, path)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's see if we get the exact same accuracy as earlier using this class (without dropout or batch norm):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 114,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_clf = DNNClassifier(random_state=42)\n",
|
||
"dnn_clf.fit(X_train1, y_train1, n_epochs=1000, X_valid=X_valid1, y_valid=y_valid1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The model is trained, let's see if it gets the same accuracy as earlier:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 115,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.metrics import accuracy_score\n",
|
||
"\n",
|
||
"y_pred = dnn_clf.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Yep! Working fine. Now we can use Scikit-Learn's `RandomizedSearchCV` class to search for better hyperparameters (this may take over an hour, depending on your system):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 116,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.model_selection import RandomizedSearchCV\n",
|
||
"\n",
|
||
"def leaky_relu(alpha=0.01):\n",
|
||
" def parametrized_leaky_relu(z, name=None):\n",
|
||
" return tf.maximum(alpha * z, z, name=name)\n",
|
||
" return parametrized_leaky_relu\n",
|
||
"\n",
|
||
"param_distribs = {\n",
|
||
" \"n_neurons\": [10, 30, 50, 70, 90, 100, 120, 140, 160],\n",
|
||
" \"batch_size\": [10, 50, 100, 500],\n",
|
||
" \"learning_rate\": [0.01, 0.02, 0.05, 0.1],\n",
|
||
" \"activation\": [tf.nn.relu, tf.nn.elu, leaky_relu(alpha=0.01), leaky_relu(alpha=0.1)],\n",
|
||
" # you could also try exploring different numbers of hidden layers, different optimizers, etc.\n",
|
||
" #\"n_hidden_layers\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
|
||
" #\"optimizer_class\": [tf.train.AdamOptimizer, partial(tf.train.MomentumOptimizer, momentum=0.95)],\n",
|
||
"}\n",
|
||
"\n",
|
||
"rnd_search = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
||
" random_state=42, verbose=2)\n",
|
||
"rnd_search.fit(X_train1, y_train1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 117,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"rnd_search.best_params_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 118,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = rnd_search.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Wonderful! Tuning the hyperparameters got us up to 99.32% accuracy! It may not sound like a great improvement to go from 98.05% to 99.32% accuracy, but consider the error rate: it went from roughly 2% to 0.7%. That's a 65% reduction of the number of errors this model will produce!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"It's a good idea to save this model:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 119,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"rnd_search.best_estimator_.save(\"./my_best_mnist_model_0_to_4\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 8.4."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: Now try adding Batch Normalization and compare the learning curves: is it converging faster than before? Does it produce a better model?_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's train the best model found, once again, to see how fast it converges (alternatively, you could tweak the code above to make it write summaries for TensorBoard, so you can visualize the learning curve):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 120,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_clf = DNNClassifier(activation=leaky_relu(alpha=0.1), batch_size=500, learning_rate=0.01,\n",
|
||
" n_neurons=140, random_state=42)\n",
|
||
"dnn_clf.fit(X_train1, y_train1, n_epochs=1000, X_valid=X_valid1, y_valid=y_valid1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The best loss is reached at epoch 19, but it was already within 10% of that result at epoch 9."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's check that we do indeed get 99.32% accuracy on the test set:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 121,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = dnn_clf.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Good, now let's use the exact same model, but this time with batch normalization:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 122,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_clf_bn = DNNClassifier(activation=leaky_relu(alpha=0.1), batch_size=500, learning_rate=0.01,\n",
|
||
" n_neurons=90, random_state=42,\n",
|
||
" batch_norm_momentum=0.95)\n",
|
||
"dnn_clf_bn.fit(X_train1, y_train1, n_epochs=1000, X_valid=X_valid1, y_valid=y_valid1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The best params are reached during epoch 2, that's much faster than earlier. Let's check the accuracy:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 123,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = dnn_clf_bn.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Well, batch normalization did not improve accuracy, quite the contrary. Let's see if we can find a good set of hyperparameters that will work well with batch normalization:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 124,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.model_selection import RandomizedSearchCV\n",
|
||
"\n",
|
||
"param_distribs = {\n",
|
||
" \"n_neurons\": [10, 30, 50, 70, 90, 100, 120, 140, 160],\n",
|
||
" \"batch_size\": [10, 50, 100, 500],\n",
|
||
" \"learning_rate\": [0.01, 0.02, 0.05, 0.1],\n",
|
||
" \"activation\": [tf.nn.relu, tf.nn.elu, leaky_relu(alpha=0.01), leaky_relu(alpha=0.1)],\n",
|
||
" # you could also try exploring different numbers of hidden layers, different optimizers, etc.\n",
|
||
" #\"n_hidden_layers\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
|
||
" #\"optimizer_class\": [tf.train.AdamOptimizer, partial(tf.train.MomentumOptimizer, momentum=0.95)],\n",
|
||
" \"batch_norm_momentum\": [0.9, 0.95, 0.98, 0.99, 0.999],\n",
|
||
"}\n",
|
||
"\n",
|
||
"rnd_search_bn = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
||
" random_state=42, verbose=2)\n",
|
||
"rnd_search_bn.fit(X_train1, y_train1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 125,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"rnd_search_bn.best_params_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 126,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = rnd_search_bn.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Oh well! Batch normalization did not help in this case. Let's see if dropout can do better."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 8.5."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: is the model overfitting the training set? Try adding dropout to every layer and try again. Does it help?_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Since batch normalization did not help, let's go back to the best model we trained earlier and see how it performs on the training set:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 127,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = dnn_clf.predict(X_train1)\n",
|
||
"accuracy_score(y_train1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The model performs significantly better on the training set than on the test set (99.91% vs 99.32%), which means it is overfitting the training set. A bit of regularization may help. Let's try adding dropout with a 50% dropout rate:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 128,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_clf_dropout = DNNClassifier(activation=leaky_relu(alpha=0.1), batch_size=500, learning_rate=0.01,\n",
|
||
" n_neurons=90, random_state=42,\n",
|
||
" dropout_rate=0.5)\n",
|
||
"dnn_clf_dropout.fit(X_train1, y_train1, n_epochs=1000, X_valid=X_valid1, y_valid=y_valid1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The best params are reached during epoch 23. Dropout somewhat slowed down convergence."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's check the accuracy:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 129,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = dnn_clf_dropout.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We are out of luck, dropout does not seem to help either. Let's try tuning the hyperparameters, perhaps we can squeeze a bit more performance out of this model:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 130,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.model_selection import RandomizedSearchCV\n",
|
||
"\n",
|
||
"param_distribs = {\n",
|
||
" \"n_neurons\": [10, 30, 50, 70, 90, 100, 120, 140, 160],\n",
|
||
" \"batch_size\": [10, 50, 100, 500],\n",
|
||
" \"learning_rate\": [0.01, 0.02, 0.05, 0.1],\n",
|
||
" \"activation\": [tf.nn.relu, tf.nn.elu, leaky_relu(alpha=0.01), leaky_relu(alpha=0.1)],\n",
|
||
" # you could also try exploring different numbers of hidden layers, different optimizers, etc.\n",
|
||
" #\"n_hidden_layers\": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n",
|
||
" #\"optimizer_class\": [tf.train.AdamOptimizer, partial(tf.train.MomentumOptimizer, momentum=0.95)],\n",
|
||
" \"dropout_rate\": [0.2, 0.3, 0.4, 0.5, 0.6],\n",
|
||
"}\n",
|
||
"\n",
|
||
"rnd_search_dropout = RandomizedSearchCV(DNNClassifier(random_state=42), param_distribs, n_iter=50,\n",
|
||
" fit_params={\"X_valid\": X_valid1, \"y_valid\": y_valid1, \"n_epochs\": 1000},\n",
|
||
" random_state=42, verbose=2)\n",
|
||
"rnd_search_dropout.fit(X_train1, y_train1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 131,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"rnd_search_dropout.best_params_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 132,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = rnd_search_dropout.predict(X_test1)\n",
|
||
"accuracy_score(y_test1, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Oh well, neither batch normalization nor dropout improved the model. Better luck next time! :)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"But that's okay, we have ourselves a nice DNN that achieves 99.32% accuracy on the test set. Now, let's see if some of its expertise on digits 0 to 4 can be transferred to the task of classifying digits 5 to 9."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"source": [
|
||
"## 9. Transfer learning"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 9.1."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: create a new DNN that reuses all the pretrained hidden layers of the previous model, freezes them, and replaces the softmax output layer with a new one._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's load the best model's graph and get a handle on all the important operations we will need. Note that instead of creating a new softmax output layer, we will just reuse the existing one (since it has the same number of outputs as the existing one). We will reinitialize its parameters before training. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 133,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"restore_saver = tf.train.import_meta_graph(\"./my_best_mnist_model_0_to_4.meta\")\n",
|
||
"\n",
|
||
"X = tf.get_default_graph().get_tensor_by_name(\"X:0\")\n",
|
||
"y = tf.get_default_graph().get_tensor_by_name(\"y:0\")\n",
|
||
"loss = tf.get_default_graph().get_tensor_by_name(\"loss:0\")\n",
|
||
"Y_proba = tf.get_default_graph().get_tensor_by_name(\"Y_proba:0\")\n",
|
||
"logits = Y_proba.op.inputs[0]\n",
|
||
"accuracy = tf.get_default_graph().get_tensor_by_name(\"accuracy:0\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"To freeze the lower layers, we will exclude their variables from the optimizer's list of trainable variables, keeping only the output layer's trainable variables:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 134,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"output_layer_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=\"logits\")\n",
|
||
"optimizer = tf.train.AdamOptimizer(learning_rate, name=\"Adam2\")\n",
|
||
"training_op = optimizer.minimize(loss, var_list=output_layer_vars)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 135,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
"accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"five_frozen_saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 9.2."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: train this new DNN on digits 5 to 9, using only 100 images per digit, and time how long it takes. Despite this small number of examples, can you achieve high precision?_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's create the training, validation and test sets. We need to subtract 5 from the labels because TensorFlow expects integers from 0 to `n_classes-1`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 136,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"X_train2_full = mnist.train.images[mnist.train.labels >= 5]\n",
|
||
"y_train2_full = mnist.train.labels[mnist.train.labels >= 5] - 5\n",
|
||
"X_valid2_full = mnist.validation.images[mnist.validation.labels >= 5]\n",
|
||
"y_valid2_full = mnist.validation.labels[mnist.validation.labels >= 5] - 5\n",
|
||
"X_test2 = mnist.test.images[mnist.test.labels >= 5]\n",
|
||
"y_test2 = mnist.test.labels[mnist.test.labels >= 5] - 5"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Also, for the purpose of this exercise, we want to keep only 100 instances per class in the training set (and let's keep only 30 instances per class in the validation set). Let's create a small function to do that:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 137,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def sample_n_instances_per_class(X, y, n=100):\n",
|
||
" Xs, ys = [], []\n",
|
||
" for label in np.unique(y):\n",
|
||
" idx = (y == label)\n",
|
||
" Xc = X[idx][:n]\n",
|
||
" yc = y[idx][:n]\n",
|
||
" Xs.append(Xc)\n",
|
||
" ys.append(yc)\n",
|
||
" return np.concatenate(Xs), np.concatenate(ys)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 138,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"X_train2, y_train2 = sample_n_instances_per_class(X_train2_full, y_train2_full, n=100)\n",
|
||
"X_valid2, y_valid2 = sample_n_instances_per_class(X_valid2_full, y_valid2_full, n=30)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now let's train the model. This is the same training code as earlier, using early stopping, except for the initialization: we first initialize all the variables, then we restore the best model trained earlier (on digits 0 to 4), and finally we reinitialize the output layer variables."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 139,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import time\n",
|
||
"\n",
|
||
"n_epochs = 1000\n",
|
||
"batch_size = 20\n",
|
||
"\n",
|
||
"max_checks_without_progress = 20\n",
|
||
"checks_without_progress = 0\n",
|
||
"best_loss = np.infty\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_best_mnist_model_0_to_4\")\n",
|
||
" for var in output_layer_vars:\n",
|
||
" var.initializer.run()\n",
|
||
"\n",
|
||
" t0 = time.time()\n",
|
||
" \n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" X_batch, y_batch = X_train2[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" loss_val, acc_val = sess.run([loss, accuracy], feed_dict={X: X_valid2, y: y_valid2})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" save_path = five_frozen_saver.save(sess, \"./my_mnist_model_5_to_9_five_frozen\")\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
"\n",
|
||
" t1 = time.time()\n",
|
||
" print(\"Total training time: {:.1f}s\".format(t1 - t0))\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" five_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_five_frozen\")\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test2, y: y_test2})\n",
|
||
" print(\"Final test accuracy: {:.2f}%\".format(acc_test * 100))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Well that's not a great accuracy, is it? Of course with such a tiny training set, and with only one layer to tweak, we should not expect miracles."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 9.3."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: try caching the frozen layers, and train the model again: how much faster is it now?_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's start by getting a handle on the output of the last frozen layer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 140,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"hidden5_out = tf.get_default_graph().get_tensor_by_name(\"hidden5_out:0\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now let's train the model using roughly the same code as earlier. The difference is that we compute the output of the top frozen layer at the beginning (both for the training set and the validation set), and we cache it. This makes training roughly 1.5 to 3 times faster in this example (this may vary greatly, depending on your system): "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 141,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"import time\n",
|
||
"\n",
|
||
"n_epochs = 1000\n",
|
||
"batch_size = 20\n",
|
||
"\n",
|
||
"max_checks_without_progress = 20\n",
|
||
"checks_without_progress = 0\n",
|
||
"best_loss = np.infty\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_best_mnist_model_0_to_4\")\n",
|
||
" for var in output_layer_vars:\n",
|
||
" var.initializer.run()\n",
|
||
"\n",
|
||
" t0 = time.time()\n",
|
||
" \n",
|
||
" hidden5_train = hidden5_out.eval(feed_dict={X: X_train2, y: y_train2})\n",
|
||
" hidden5_valid = hidden5_out.eval(feed_dict={X: X_valid2, y: y_valid2})\n",
|
||
" \n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" h5_batch, y_batch = hidden5_train[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={hidden5_out: h5_batch, y: y_batch})\n",
|
||
" loss_val, acc_val = sess.run([loss, accuracy], feed_dict={hidden5_out: hidden5_valid, y: y_valid2})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" save_path = five_frozen_saver.save(sess, \"./my_mnist_model_5_to_9_five_frozen\")\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
"\n",
|
||
" t1 = time.time()\n",
|
||
" print(\"Total training time: {:.1f}s\".format(t1 - t0))\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" five_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_five_frozen\")\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test2, y: y_test2})\n",
|
||
" print(\"Final test accuracy: {:.2f}%\".format(acc_test * 100))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 9.4."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: try again reusing just four hidden layers instead of five. Can you achieve a higher precision?_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's load the best model again, but this time we will create a new softmax output layer on top of the 4th hidden layer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 142,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_outputs = 5\n",
|
||
"\n",
|
||
"restore_saver = tf.train.import_meta_graph(\"./my_best_mnist_model_0_to_4.meta\")\n",
|
||
"\n",
|
||
"X = tf.get_default_graph().get_tensor_by_name(\"X:0\")\n",
|
||
"y = tf.get_default_graph().get_tensor_by_name(\"y:0\")\n",
|
||
"\n",
|
||
"hidden4_out = tf.get_default_graph().get_tensor_by_name(\"hidden4_out:0\")\n",
|
||
"logits = tf.layers.dense(hidden4_out, n_outputs, kernel_initializer=he_init, name=\"new_logits\")\n",
|
||
"Y_proba = tf.nn.softmax(logits)\n",
|
||
"xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
"loss = tf.reduce_mean(xentropy)\n",
|
||
"correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
"accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And now let's create the training operation. We want to freeze all the layers except for the new output layer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 143,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"output_layer_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=\"new_logits\")\n",
|
||
"optimizer = tf.train.AdamOptimizer(learning_rate, name=\"Adam2\")\n",
|
||
"training_op = optimizer.minimize(loss, var_list=output_layer_vars)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"four_frozen_saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And once again we train the model with the same code as earlier. Note: we could of course write a function once and use it multiple times, rather than copying almost the same training code over and over again, but as we keep tweaking the code slightly, the function would need multiple arguments and `if` statements, and it would have to be at the beginning of the notebook, where it would not make much sense to readers. In short it would be very confusing, so we're better off with copy & paste."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 144,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 1000\n",
|
||
"batch_size = 20\n",
|
||
"\n",
|
||
"max_checks_without_progress = 20\n",
|
||
"checks_without_progress = 0\n",
|
||
"best_loss = np.infty\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_best_mnist_model_0_to_4\")\n",
|
||
" \n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" X_batch, y_batch = X_train2[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" loss_val, acc_val = sess.run([loss, accuracy], feed_dict={X: X_valid2, y: y_valid2})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" save_path = four_frozen_saver.save(sess, \"./my_mnist_model_5_to_9_four_frozen\")\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" four_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_four_frozen\")\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test2, y: y_test2})\n",
|
||
" print(\"Final test accuracy: {:.2f}%\".format(acc_test * 100))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Still not fantastic, but much better."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 9.5."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"_Exercise: now unfreeze the top two hidden layers and continue training: can you get the model to perform even better?_"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 145,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"unfrozen_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=\"hidden[34]|new_logits\")\n",
|
||
"optimizer = tf.train.AdamOptimizer(learning_rate, name=\"Adam3\")\n",
|
||
"training_op = optimizer.minimize(loss, var_list=unfrozen_vars)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"two_frozen_saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 146,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 1000\n",
|
||
"batch_size = 20\n",
|
||
"\n",
|
||
"max_checks_without_progress = 20\n",
|
||
"checks_without_progress = 0\n",
|
||
"best_loss = np.infty\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" four_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_four_frozen\")\n",
|
||
" \n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" X_batch, y_batch = X_train2[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" loss_val, acc_val = sess.run([loss, accuracy], feed_dict={X: X_valid2, y: y_valid2})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" save_path = two_frozen_saver.save(sess, \"./my_mnist_model_5_to_9_two_frozen\")\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" two_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_two_frozen\")\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test2, y: y_test2})\n",
|
||
" print(\"Final test accuracy: {:.2f}%\".format(acc_test * 100))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's check what accuracy we can get by unfreezing all layers:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 147,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"\n",
|
||
"optimizer = tf.train.AdamOptimizer(learning_rate, name=\"Adam4\")\n",
|
||
"training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"no_frozen_saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 148,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 1000\n",
|
||
"batch_size = 20\n",
|
||
"\n",
|
||
"max_checks_without_progress = 20\n",
|
||
"checks_without_progress = 0\n",
|
||
"best_loss = np.infty\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" two_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_two_frozen\")\n",
|
||
" \n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" X_batch, y_batch = X_train2[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" loss_val, acc_val = sess.run([loss, accuracy], feed_dict={X: X_valid2, y: y_valid2})\n",
|
||
" if loss_val < best_loss:\n",
|
||
" save_path = no_frozen_saver.save(sess, \"./my_mnist_model_5_to_9_no_frozen\")\n",
|
||
" best_loss = loss_val\n",
|
||
" checks_without_progress = 0\n",
|
||
" else:\n",
|
||
" checks_without_progress += 1\n",
|
||
" if checks_without_progress > max_checks_without_progress:\n",
|
||
" print(\"Early stopping!\")\n",
|
||
" break\n",
|
||
" print(\"{}\\tValidation loss: {:.6f}\\tBest loss: {:.6f}\\tAccuracy: {:.2f}%\".format(\n",
|
||
" epoch, loss_val, best_loss, acc_val * 100))\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" no_frozen_saver.restore(sess, \"./my_mnist_model_5_to_9_no_frozen\")\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test2, y: y_test2})\n",
|
||
" print(\"Final test accuracy: {:.2f}%\".format(acc_test * 100))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's compare that to a DNN trained from scratch:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 149,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_clf_5_to_9 = DNNClassifier(n_hidden_layers=4, random_state=42)\n",
|
||
"dnn_clf_5_to_9.fit(X_train2, y_train2, n_epochs=1000, X_valid=X_valid2, y_valid=y_valid2)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 150,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = dnn_clf_5_to_9.predict(X_test2)\n",
|
||
"accuracy_score(y_test2, y_pred)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Meh. How disappointing! ;) Transfer learning did not help much (if at all) in this task. At least we tried... Fortunately, the next exercise will get better results."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## 10. Pretraining on an auxiliary task"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In this exercise you will build a DNN that compares two MNIST digit images and predicts whether they represent the same digit or not. Then you will reuse the lower layers of this network to train an MNIST classifier using very little training data."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 10.1.\n",
|
||
"Exercise: _Start by building two DNNs (let's call them DNN A and B), both similar to the one you built earlier but without the output layer: each DNN should have five hidden layers of 100 neurons each, He initialization, and ELU activation. Next, add one more hidden layer with 10 units on top of both DNNs. You should use TensorFlow's `concat()` function with `axis=1` to concatenate the outputs of both DNNs along the horizontal axis, then feed the result to the hidden layer. Finally, add an output layer with a single neuron using the logistic activation function._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Warning**! There was an error in the book for this exercise: there was no instruction to add a top hidden layer. Without it, the neural network generally fails to start learning. If you have the latest version of the book, this error has been fixed."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"You could have two input placeholders, `X1` and `X2`, one for the images that should be fed to the first DNN, and the other for the images that should be fed to the second DNN. It would work fine. However, another option is to have a single input placeholder to hold both sets of images (each row will hold a pair of images), and use `tf.unstack()` to split this tensor into two separate tensors, like this:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 151,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"\n",
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, 2, n_inputs), name=\"X\")\n",
|
||
"X1, X2 = tf.unstack(X, axis=1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We also need the labels placeholder. Each label will be 0 if the images represent different digits, or 1 if they represent the same digit:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 152,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"y = tf.placeholder(tf.int32, shape=[None, 1])"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now let's feed these inputs through two separate DNNs:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 153,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn1 = dnn(X1, name=\"DNN_A\")\n",
|
||
"dnn2 = dnn(X2, name=\"DNN_B\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And let's concatenate their outputs:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 154,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_outputs = tf.concat([dnn1, dnn2], axis=1)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Each DNN outputs 100 activations (per instance), so the shape is `[None, 100]`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 155,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn1.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 156,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn2.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And of course the concatenated outputs have a shape of `[None, 200]`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 157,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"dnn_outputs.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now lets add an extra hidden layer with just 10 neurons, and the output layer, with a single neuron:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 158,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"hidden = tf.layers.dense(dnn_outputs, units=10, activation=tf.nn.elu, kernel_initializer=he_init)\n",
|
||
"logits = tf.layers.dense(hidden, units=1, kernel_initializer=he_init)\n",
|
||
"y_proba = tf.nn.sigmoid(logits)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The whole network predicts `1` if `y_proba >= 0.5` (i.e. the network predicts that the images represent the same digit), or `0` otherwise. We compute instead `logits >= 0`, which is equivalent but faster to compute: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 159,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred = tf.cast(tf.greater_equal(logits, 0), tf.int32)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now let's add the cost function:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 160,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_as_float = tf.cast(y, tf.float32)\n",
|
||
"xentropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_as_float, logits=logits)\n",
|
||
"loss = tf.reduce_mean(xentropy)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And we can now create the training operation using an optimizer:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 161,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"learning_rate = 0.01\n",
|
||
"momentum = 0.95\n",
|
||
"\n",
|
||
"optimizer = tf.train.MomentumOptimizer(learning_rate, momentum, use_nesterov=True)\n",
|
||
"training_op = optimizer.minimize(loss)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We will want to measure our classifier's accuracy."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 162,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_pred_correct = tf.equal(y_pred, y)\n",
|
||
"accuracy = tf.reduce_mean(tf.cast(y_pred_correct, tf.float32))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And the usual `init` and `saver`:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 163,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"init = tf.global_variables_initializer()\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 10.2.\n",
|
||
"_Exercise: split the MNIST training set in two sets: split #1 should containing 55,000 images, and split #2 should contain contain 5,000 images. Create a function that generates a training batch where each instance is a pair of MNIST images picked from split #1. Half of the training instances should be pairs of images that belong to the same class, while the other half should be images from different classes. For each pair, the training label should be 0 if the images are from the same class, or 1 if they are from different classes._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"The MNIST dataset returned by TensorFlow's `input_data()` function is already split into 3 parts: a training set (55,000 instances), a validation set (5,000 instances) and a test set (10,000 instances). Let's use the first set to generate the training set composed image pairs, and we will use the second set for the second phase of the exercise (to train a regular MNIST classifier). We will use the third set as the test set for both phases."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 164,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"X_train1 = mnist.train.images\n",
|
||
"y_train1 = mnist.train.labels\n",
|
||
"\n",
|
||
"X_train2 = mnist.validation.images\n",
|
||
"y_train2 = mnist.validation.labels\n",
|
||
"\n",
|
||
"X_test = mnist.test.images\n",
|
||
"y_test = mnist.test.labels"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's write a function that generates pairs of images: 50% representing the same digit, and 50% representing different digits. There are many ways to implement this. In this implementation, we first decide how many \"same\" pairs (i.e. pairs of images representing the same digit) we will generate, and how many \"different\" pairs (i.e. pairs of images representing different digits). We could just use `batch_size // 2` but we want to handle the case where it is odd (granted, that might be overkill!). Then we generate random pairs and we pick the right number of \"same\" pairs, then we generate the right number of \"different\" pairs. Finally we shuffle the batch and return it:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 165,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def generate_batch(images, labels, batch_size):\n",
|
||
" size1 = batch_size // 2\n",
|
||
" size2 = batch_size - size1\n",
|
||
" if size1 != size2 and np.random.rand() > 0.5:\n",
|
||
" size1, size2 = size2, size1\n",
|
||
" X = []\n",
|
||
" y = []\n",
|
||
" while len(X) < size1:\n",
|
||
" rnd_idx1, rnd_idx2 = np.random.randint(0, len(images), 2)\n",
|
||
" if rnd_idx1 != rnd_idx2 and labels[rnd_idx1] == labels[rnd_idx2]:\n",
|
||
" X.append(np.array([images[rnd_idx1], images[rnd_idx2]]))\n",
|
||
" y.append([1])\n",
|
||
" while len(X) < batch_size:\n",
|
||
" rnd_idx1, rnd_idx2 = np.random.randint(0, len(images), 2)\n",
|
||
" if labels[rnd_idx1] != labels[rnd_idx2]:\n",
|
||
" X.append(np.array([images[rnd_idx1], images[rnd_idx2]]))\n",
|
||
" y.append([0])\n",
|
||
" rnd_indices = np.random.permutation(batch_size)\n",
|
||
" return np.array(X)[rnd_indices], np.array(y)[rnd_indices]"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's test it to generate a small batch of 5 image pairs:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 166,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"batch_size = 5\n",
|
||
"X_batch, y_batch = generate_batch(X_train1, y_train1, batch_size)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Each row in `X_batch` contains a pair of images:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 167,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"X_batch.shape, X_batch.dtype"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's look at these pairs:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 168,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"plt.figure(figsize=(3, 3 * batch_size))\n",
|
||
"plt.subplot(121)\n",
|
||
"plt.imshow(X_batch[:,0].reshape(28 * batch_size, 28), cmap=\"binary\", interpolation=\"nearest\")\n",
|
||
"plt.axis('off')\n",
|
||
"plt.subplot(122)\n",
|
||
"plt.imshow(X_batch[:,1].reshape(28 * batch_size, 28), cmap=\"binary\", interpolation=\"nearest\")\n",
|
||
"plt.axis('off')\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And let's look at the labels (0 means \"different\", 1 means \"same\"):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 169,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"y_batch"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Perfect!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 10.3.\n",
|
||
"_Exercise: train the DNN on this training set. For each image pair, you can simultaneously feed the first image to DNN A and the second image to DNN B. The whole network will gradually learn to tell whether two images belong to the same class or not._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's generate a test set composed of many pairs of images pulled from the MNIST test set:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 170,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"X_test1, y_test1 = generate_batch(X_test, y_test, batch_size=len(X_test))"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"And now, let's train the model. There's really nothing special about this step, except for the fact that we need a fairly large `batch_size`, otherwise the model fails to learn anything and ends up with an accuracy of 50%:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 171,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 100\n",
|
||
"batch_size = 500\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" for iteration in range(mnist.train.num_examples // batch_size):\n",
|
||
" X_batch, y_batch = generate_batch(X_train1, y_train1, batch_size)\n",
|
||
" loss_val, _ = sess.run([loss, training_op], feed_dict={X: X_batch, y: y_batch})\n",
|
||
" print(epoch, \"Train loss:\", loss_val)\n",
|
||
" if epoch % 5 == 0:\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test1, y: y_test1})\n",
|
||
" print(epoch, \"Test accuracy:\", acc_test)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_digit_comparison_model.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"All right, we reach 97.6% accuracy on this digit comparison task. That's not too bad, this model knows a thing or two about comparing handwritten digits!\n",
|
||
"\n",
|
||
"Let's see if some of that knowledge can be useful for the regular MNIST classification task."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"### 10.4.\n",
|
||
"_Exercise: now create a new DNN by reusing and freezing the hidden layers of DNN A and adding a softmax output layer on top with 10 neurons. Train this network on split #2 and see if you can achieve high performance despite having only 500 images per class._"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Let's create the model, it is pretty straightforward. There are many ways to freeze the lower layers, as explained in the book. In this example, we chose to use the `tf.stop_gradient()` function. Note that we need one `Saver` to restore the pretrained DNN A, and another `Saver` to save the final model: "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 172,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int32, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"dnn_outputs = dnn(X, name=\"DNN_A\")\n",
|
||
"frozen_outputs = tf.stop_gradient(dnn_outputs)\n",
|
||
"\n",
|
||
"logits = tf.layers.dense(dnn_outputs, n_outputs, kernel_initializer=he_init)\n",
|
||
"Y_proba = tf.nn.softmax(logits)\n",
|
||
"\n",
|
||
"xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
"loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"optimizer = tf.train.MomentumOptimizer(learning_rate, momentum, use_nesterov=True)\n",
|
||
"training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
"accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"\n",
|
||
"dnn_A_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=\"DNN_A\")\n",
|
||
"restore_saver = tf.train.Saver(var_list={var.op.name: var for var in dnn_A_vars})\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Now on to training! We first initialize all variables (including the variables in the new output layer), then we restore the pretrained DNN A. Next, we just train the model on the small MNIST dataset (containing just 5,000 images):"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 173,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 100\n",
|
||
"batch_size = 50\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
" restore_saver.restore(sess, \"./my_digit_comparison_model.ckpt\")\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" X_batch, y_batch = X_train2[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" if epoch % 10 == 0:\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})\n",
|
||
" print(epoch, \"Test accuracy:\", acc_test)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_mnist_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Well, 96.7% accuracy, that's not the best MNIST model we have trained so far, but recall that we are only using a small training set (just 500 images per digit). Let's compare this result with the same DNN trained from scratch, without using transfer learning:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 174,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"reset_graph()\n",
|
||
"\n",
|
||
"n_inputs = 28 * 28 # MNIST\n",
|
||
"n_outputs = 10\n",
|
||
"\n",
|
||
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
|
||
"y = tf.placeholder(tf.int32, shape=(None), name=\"y\")\n",
|
||
"\n",
|
||
"dnn_outputs = dnn(X, name=\"DNN_A\")\n",
|
||
"\n",
|
||
"logits = tf.layers.dense(dnn_outputs, n_outputs, kernel_initializer=he_init)\n",
|
||
"Y_proba = tf.nn.softmax(logits)\n",
|
||
"\n",
|
||
"xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
|
||
"loss = tf.reduce_mean(xentropy, name=\"loss\")\n",
|
||
"\n",
|
||
"optimizer = tf.train.MomentumOptimizer(learning_rate, momentum, use_nesterov=True)\n",
|
||
"training_op = optimizer.minimize(loss)\n",
|
||
"\n",
|
||
"correct = tf.nn.in_top_k(logits, y, 1)\n",
|
||
"accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
|
||
"\n",
|
||
"init = tf.global_variables_initializer()\n",
|
||
"\n",
|
||
"dnn_A_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=\"DNN_A\")\n",
|
||
"restore_saver = tf.train.Saver(var_list={var.op.name: var for var in dnn_A_vars})\n",
|
||
"saver = tf.train.Saver()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 175,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"n_epochs = 150\n",
|
||
"batch_size = 50\n",
|
||
"\n",
|
||
"with tf.Session() as sess:\n",
|
||
" init.run()\n",
|
||
"\n",
|
||
" for epoch in range(n_epochs):\n",
|
||
" rnd_idx = np.random.permutation(len(X_train2))\n",
|
||
" for rnd_indices in np.array_split(rnd_idx, len(X_train2) // batch_size):\n",
|
||
" X_batch, y_batch = X_train2[rnd_indices], y_train2[rnd_indices]\n",
|
||
" sess.run(training_op, feed_dict={X: X_batch, y: y_batch})\n",
|
||
" if epoch % 10 == 0:\n",
|
||
" acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})\n",
|
||
" print(epoch, \"Test accuracy:\", acc_test)\n",
|
||
"\n",
|
||
" save_path = saver.save(sess, \"./my_mnist_model_final.ckpt\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Only 94.8% accuracy... So transfer learning helped us reduce the error rate from 5.2% to 3.3% (that's over 36% error reduction). Moreover, the model using transfer learning reached over 96% accuracy in less than 10 epochs.\n",
|
||
"\n",
|
||
"Bottom line: transfer learning does not always work (as we saw in exercise 9), but when it does it can make a big difference. So try it out!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {
|
||
"collapsed": true
|
||
},
|
||
"outputs": [],
|
||
"source": []
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.5.3"
|
||
},
|
||
"nav_menu": {
|
||
"height": "360px",
|
||
"width": "416px"
|
||
},
|
||
"toc": {
|
||
"navigate_menu": true,
|
||
"number_sections": true,
|
||
"sideBar": true,
|
||
"threshold": 6,
|
||
"toc_cell": false,
|
||
"toc_section_display": "block",
|
||
"toc_window_display": false
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 1
|
||
}
|