handson-ml/17_autoencoders_and_gans.ipynb

1797 lines
55 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 17 Autoencoders and GANs**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code in chapter 17._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"IS_KAGGLE = \"kaggle_secrets\" in sys.modules\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"0.20\"\n",
"\n",
"# TensorFlow ≥2.0 is required\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"assert tf.__version__ >= \"2.0\"\n",
"\n",
"if not tf.config.list_physical_devices('GPU'):\n",
" print(\"No GPU was detected. LSTMs and CNNs can be very slow without a GPU.\")\n",
" if IS_COLAB:\n",
" print(\"Go to Runtime > Change runtime and select a GPU hardware accelerator.\")\n",
" if IS_KAGGLE:\n",
" print(\"Go to Settings > Accelerator and select GPU.\")\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"autoencoders\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
" path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
" print(\"Saving figure\", fig_id)\n",
" if tight_layout:\n",
" plt.tight_layout()\n",
" plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A couple utility functions to plot grayscale 28x28 image:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def plot_image(image):\n",
" plt.imshow(image, cmap=\"binary\")\n",
" plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PCA with a linear Autoencoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Build 3D dataset:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(4)\n",
"\n",
"def generate_3d_data(m, w1=0.1, w2=0.3, noise=0.1):\n",
" angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5\n",
" data = np.empty((m, 3))\n",
" data[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2\n",
" data[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 2\n",
" data[:, 2] = data[:, 0] * w1 + data[:, 1] * w2 + noise * np.random.randn(m)\n",
" return data\n",
"\n",
"X_train = generate_3d_data(60)\n",
"X_train = X_train - X_train.mean(axis=0, keepdims=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's build the Autoencoder..."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])\n",
"decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])\n",
"autoencoder = keras.models.Sequential([encoder, decoder])\n",
"\n",
"autoencoder.compile(loss=\"mse\", optimizer=keras.optimizers.SGD(lr=1.5))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"history = autoencoder.fit(X_train, X_train, epochs=20)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"codings = encoder.predict(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(4,3))\n",
"plt.plot(codings[:,0], codings[:, 1], \"b.\")\n",
"plt.xlabel(\"$z_1$\", fontsize=18)\n",
"plt.ylabel(\"$z_2$\", fontsize=18, rotation=0)\n",
"plt.grid(True)\n",
"save_fig(\"linear_autoencoder_pca_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Stacked Autoencoders"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use MNIST:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train_full = X_train_full.astype(np.float32) / 255\n",
"X_test = X_test.astype(np.float32) / 255\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train all layers at once"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's build a stacked Autoencoder with 3 hidden layers and 1 output layer (i.e., 2 stacked Autoencoders)."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def rounded_accuracy(y_true, y_pred):\n",
" return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"stacked_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(30, activation=\"selu\"),\n",
"])\n",
"stacked_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])\n",
"stacked_ae.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1.5), metrics=[rounded_accuracy])\n",
"history = stacked_ae.fit(X_train, X_train, epochs=20,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This function processes a few test images through the autoencoder and displays the original images and their reconstructions:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def show_reconstructions(model, images=X_valid, n_images=5):\n",
" reconstructions = model.predict(images[:n_images])\n",
" fig = plt.figure(figsize=(n_images * 1.5, 3))\n",
" for image_index in range(n_images):\n",
" plt.subplot(2, n_images, 1 + image_index)\n",
" plot_image(images[image_index])\n",
" plt.subplot(2, n_images, 1 + n_images + image_index)\n",
" plot_image(reconstructions[image_index])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(stacked_ae)\n",
"save_fig(\"reconstruction_plot\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Visualizing Fashion MNIST"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"\n",
"from sklearn.manifold import TSNE\n",
"\n",
"X_valid_compressed = stacked_encoder.predict(X_valid)\n",
"tsne = TSNE()\n",
"X_valid_2D = tsne.fit_transform(X_valid_compressed)\n",
"X_valid_2D = (X_valid_2D - X_valid_2D.min()) / (X_valid_2D.max() - X_valid_2D.min())"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=\"tab10\")\n",
"plt.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's make this diagram a bit prettier:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# adapted from https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html\n",
"plt.figure(figsize=(10, 8))\n",
"cmap = plt.cm.tab10\n",
"plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=cmap)\n",
"image_positions = np.array([[1., 1.]])\n",
"for index, position in enumerate(X_valid_2D):\n",
" dist = np.sum((position - image_positions) ** 2, axis=1)\n",
" if np.min(dist) > 0.02: # if far enough from other images\n",
" image_positions = np.r_[image_positions, [position]]\n",
" imagebox = mpl.offsetbox.AnnotationBbox(\n",
" mpl.offsetbox.OffsetImage(X_valid[index], cmap=\"binary\"),\n",
" position, bboxprops={\"edgecolor\": cmap(y_valid[index]), \"lw\": 2})\n",
" plt.gca().add_artist(imagebox)\n",
"plt.axis(\"off\")\n",
"save_fig(\"fashion_mnist_visualization_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tying weights"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is common to tie the weights of the encoder and the decoder, by simply using the transpose of the encoder's weights as the decoder weights. For this, we need to use a custom layer."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"class DenseTranspose(keras.layers.Layer):\n",
" def __init__(self, dense, activation=None, **kwargs):\n",
" self.dense = dense\n",
" self.activation = keras.activations.get(activation)\n",
" super().__init__(**kwargs)\n",
" def build(self, batch_input_shape):\n",
" self.biases = self.add_weight(name=\"bias\",\n",
" shape=[self.dense.input_shape[-1]],\n",
" initializer=\"zeros\")\n",
" super().build(batch_input_shape)\n",
" def call(self, inputs):\n",
" z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)\n",
" return self.activation(z + self.biases)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"keras.backend.clear_session()\n",
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"dense_1 = keras.layers.Dense(100, activation=\"selu\")\n",
"dense_2 = keras.layers.Dense(30, activation=\"selu\")\n",
"\n",
"tied_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" dense_1,\n",
" dense_2\n",
"])\n",
"\n",
"tied_decoder = keras.models.Sequential([\n",
" DenseTranspose(dense_2, activation=\"selu\"),\n",
" DenseTranspose(dense_1, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"\n",
"tied_ae = keras.models.Sequential([tied_encoder, tied_decoder])\n",
"\n",
"tied_ae.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=1.5), metrics=[rounded_accuracy])\n",
"history = tied_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"show_reconstructions(tied_ae)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training one Autoencoder at a Time"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"def train_autoencoder(n_neurons, X_train, X_valid, loss, optimizer,\n",
" n_epochs=10, output_activation=None, metrics=None):\n",
" n_inputs = X_train.shape[-1]\n",
" encoder = keras.models.Sequential([\n",
" keras.layers.Dense(n_neurons, activation=\"selu\", input_shape=[n_inputs])\n",
" ])\n",
" decoder = keras.models.Sequential([\n",
" keras.layers.Dense(n_inputs, activation=output_activation),\n",
" ])\n",
" autoencoder = keras.models.Sequential([encoder, decoder])\n",
" autoencoder.compile(optimizer, loss, metrics=metrics)\n",
" autoencoder.fit(X_train, X_train, epochs=n_epochs,\n",
" validation_data=(X_valid, X_valid))\n",
" return encoder, decoder, encoder(X_train), encoder(X_valid)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"K = keras.backend\n",
"X_train_flat = K.batch_flatten(X_train) # equivalent to .reshape(-1, 28 * 28)\n",
"X_valid_flat = K.batch_flatten(X_valid)\n",
"enc1, dec1, X_train_enc1, X_valid_enc1 = train_autoencoder(\n",
" 100, X_train_flat, X_valid_flat, \"binary_crossentropy\",\n",
" keras.optimizers.SGD(lr=1.5), output_activation=\"sigmoid\",\n",
" metrics=[rounded_accuracy])\n",
"enc2, dec2, _, _ = train_autoencoder(\n",
" 30, X_train_enc1, X_valid_enc1, \"mse\", keras.optimizers.SGD(lr=0.05),\n",
" output_activation=\"selu\")"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"stacked_ae_1_by_1 = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" enc1, enc2, dec2, dec1,\n",
" keras.layers.Reshape([28, 28])\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(stacked_ae_1_by_1)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"stacked_ae_1_by_1.compile(loss=\"binary_crossentropy\",\n",
" optimizer=keras.optimizers.SGD(lr=0.1), metrics=[rounded_accuracy])\n",
"history = stacked_ae_1_by_1.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(stacked_ae_1_by_1)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Convolutional Layers Instead of Dense Layers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's build a stacked Autoencoder with 3 hidden layers and 1 output layer (i.e., 2 stacked Autoencoders)."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"conv_encoder = keras.models.Sequential([\n",
" keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),\n",
" keras.layers.Conv2D(16, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n",
" keras.layers.MaxPool2D(pool_size=2),\n",
" keras.layers.Conv2D(32, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n",
" keras.layers.MaxPool2D(pool_size=2),\n",
" keras.layers.Conv2D(64, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n",
" keras.layers.MaxPool2D(pool_size=2)\n",
"])\n",
"conv_decoder = keras.models.Sequential([\n",
" keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding=\"VALID\", activation=\"selu\",\n",
" input_shape=[3, 3, 64]),\n",
" keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding=\"SAME\", activation=\"selu\"),\n",
" keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding=\"SAME\", activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])\n",
"\n",
"conv_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n",
" metrics=[rounded_accuracy])\n",
"history = conv_ae.fit(X_train, X_train, epochs=5,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"conv_encoder.summary()\n",
"conv_decoder.summary()"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(conv_ae)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Recurrent Autoencoders"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"recurrent_encoder = keras.models.Sequential([\n",
" keras.layers.LSTM(100, return_sequences=True, input_shape=[28, 28]),\n",
" keras.layers.LSTM(30)\n",
"])\n",
"recurrent_decoder = keras.models.Sequential([\n",
" keras.layers.RepeatVector(28, input_shape=[30]),\n",
" keras.layers.LSTM(100, return_sequences=True),\n",
" keras.layers.TimeDistributed(keras.layers.Dense(28, activation=\"sigmoid\"))\n",
"])\n",
"recurrent_ae = keras.models.Sequential([recurrent_encoder, recurrent_decoder])\n",
"recurrent_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(0.1),\n",
" metrics=[rounded_accuracy])"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"history = recurrent_ae.fit(X_train, X_train, epochs=10, validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(recurrent_ae)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Stacked denoising Autoencoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using Gaussian noise:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"denoising_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.GaussianNoise(0.2),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(30, activation=\"selu\")\n",
"])\n",
"denoising_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])\n",
"denoising_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n",
" metrics=[rounded_accuracy])\n",
"history = denoising_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"noise = keras.layers.GaussianNoise(0.2)\n",
"show_reconstructions(denoising_ae, noise(X_valid, training=True))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using dropout:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"dropout_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(30, activation=\"selu\")\n",
"])\n",
"dropout_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"dropout_ae = keras.models.Sequential([dropout_encoder, dropout_decoder])\n",
"dropout_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n",
" metrics=[rounded_accuracy])\n",
"history = dropout_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"dropout = keras.layers.Dropout(0.5)\n",
"show_reconstructions(dropout_ae, dropout(X_valid, training=True))\n",
"save_fig(\"dropout_denoising_plot\", tight_layout=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sparse Autoencoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's build a simple stacked autoencoder, so we can compare it to the sparse autoencoders we will build. This time we will use the sigmoid activation function for the coding layer, to ensure that the coding values range from 0 to 1:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"simple_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(30, activation=\"sigmoid\"),\n",
"])\n",
"simple_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"simple_ae = keras.models.Sequential([simple_encoder, simple_decoder])\n",
"simple_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.),\n",
" metrics=[rounded_accuracy])\n",
"history = simple_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(simple_ae)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a couple functions to print nice activation histograms:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"def plot_percent_hist(ax, data, bins):\n",
" counts, _ = np.histogram(data, bins=bins)\n",
" widths = bins[1:] - bins[:-1]\n",
" x = bins[:-1] + widths / 2\n",
" ax.bar(x, counts / len(data), width=widths*0.8)\n",
" ax.xaxis.set_ticks(bins)\n",
" ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(\n",
" lambda y, position: \"{}%\".format(int(np.round(100 * y)))))\n",
" ax.grid(True)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"def plot_activations_histogram(encoder, height=1, n_bins=10):\n",
" X_valid_codings = encoder(X_valid).numpy()\n",
" activation_means = X_valid_codings.mean(axis=0)\n",
" mean = activation_means.mean()\n",
" bins = np.linspace(0, 1, n_bins + 1)\n",
"\n",
" fig, [ax1, ax2] = plt.subplots(figsize=(10, 3), nrows=1, ncols=2, sharey=True)\n",
" plot_percent_hist(ax1, X_valid_codings.ravel(), bins)\n",
" ax1.plot([mean, mean], [0, height], \"k--\", label=\"Overall Mean = {:.2f}\".format(mean))\n",
" ax1.legend(loc=\"upper center\", fontsize=14)\n",
" ax1.set_xlabel(\"Activation\")\n",
" ax1.set_ylabel(\"% Activations\")\n",
" ax1.axis([0, 1, 0, height])\n",
" plot_percent_hist(ax2, activation_means, bins)\n",
" ax2.plot([mean, mean], [0, height], \"k--\")\n",
" ax2.set_xlabel(\"Neuron Mean Activation\")\n",
" ax2.set_ylabel(\"% Neurons\")\n",
" ax2.axis([0, 1, 0, height])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use these functions to plot histograms of the activations of the encoding layer. The histogram on the left shows the distribution of all the activations. You can see that values close to 0 or 1 are more frequent overall, which is consistent with the saturating nature of the sigmoid function. The histogram on the right shows the distribution of mean neuron activations: you can see that most neurons have a mean activation close to 0.5. Both histograms tell us that each neuron tends to either fire close to 0 or 1, with about 50% probability each. However, some neurons fire almost all the time (right side of the right histogram)."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"plot_activations_histogram(simple_encoder, height=0.35)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's add $\\ell_1$ regularization to the coding layer:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"sparse_l1_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(300, activation=\"sigmoid\"),\n",
" keras.layers.ActivityRegularization(l1=1e-3) # Alternatively, you could add\n",
" # activity_regularizer=keras.regularizers.l1(1e-3)\n",
" # to the previous layer.\n",
"])\n",
"sparse_l1_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[300]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"sparse_l1_ae = keras.models.Sequential([sparse_l1_encoder, sparse_l1_decoder])\n",
"sparse_l1_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n",
" metrics=[rounded_accuracy])\n",
"history = sparse_l1_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(sparse_l1_ae)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"plot_activations_histogram(sparse_l1_encoder, height=1.)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use the KL Divergence loss instead to ensure sparsity, and target 10% sparsity rather than 0%:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"p = 0.1\n",
"q = np.linspace(0.001, 0.999, 500)\n",
"kl_div = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))\n",
"mse = (p - q)**2\n",
"mae = np.abs(p - q)\n",
"plt.plot([p, p], [0, 0.3], \"k:\")\n",
"plt.text(0.05, 0.32, \"Target\\nsparsity\", fontsize=14)\n",
"plt.plot(q, kl_div, \"b-\", label=\"KL divergence\")\n",
"plt.plot(q, mae, \"g--\", label=r\"MAE ($\\ell_1$)\")\n",
"plt.plot(q, mse, \"r--\", linewidth=1, label=r\"MSE ($\\ell_2$)\")\n",
"plt.legend(loc=\"upper left\", fontsize=14)\n",
"plt.xlabel(\"Actual sparsity\")\n",
"plt.ylabel(\"Cost\", rotation=0)\n",
"plt.axis([0, 1, 0, 0.95])\n",
"save_fig(\"sparsity_loss_plot\")"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"K = keras.backend\n",
"kl_divergence = keras.losses.kullback_leibler_divergence\n",
"\n",
"class KLDivergenceRegularizer(keras.regularizers.Regularizer):\n",
" def __init__(self, weight, target=0.1):\n",
" self.weight = weight\n",
" self.target = target\n",
" def __call__(self, inputs):\n",
" mean_activities = K.mean(inputs, axis=0)\n",
" return self.weight * (\n",
" kl_divergence(self.target, mean_activities) +\n",
" kl_divergence(1. - self.target, 1. - mean_activities))"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"kld_reg = KLDivergenceRegularizer(weight=0.05, target=0.1)\n",
"sparse_kl_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(300, activation=\"sigmoid\", activity_regularizer=kld_reg)\n",
"])\n",
"sparse_kl_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[300]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"sparse_kl_ae = keras.models.Sequential([sparse_kl_encoder, sparse_kl_decoder])\n",
"sparse_kl_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n",
" metrics=[rounded_accuracy])\n",
"history = sparse_kl_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(sparse_kl_ae)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"plot_activations_histogram(sparse_kl_encoder)\n",
"save_fig(\"sparse_autoencoder_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Variational Autoencoder"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"class Sampling(keras.layers.Layer):\n",
" def call(self, inputs):\n",
" mean, log_var = inputs\n",
" return K.random_normal(tf.shape(log_var)) * K.exp(log_var / 2) + mean "
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"codings_size = 10\n",
"\n",
"inputs = keras.layers.Input(shape=[28, 28])\n",
"z = keras.layers.Flatten()(inputs)\n",
"z = keras.layers.Dense(150, activation=\"selu\")(z)\n",
"z = keras.layers.Dense(100, activation=\"selu\")(z)\n",
"codings_mean = keras.layers.Dense(codings_size)(z)\n",
"codings_log_var = keras.layers.Dense(codings_size)(z)\n",
"codings = Sampling()([codings_mean, codings_log_var])\n",
"variational_encoder = keras.models.Model(\n",
" inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])\n",
"\n",
"decoder_inputs = keras.layers.Input(shape=[codings_size])\n",
"x = keras.layers.Dense(100, activation=\"selu\")(decoder_inputs)\n",
"x = keras.layers.Dense(150, activation=\"selu\")(x)\n",
"x = keras.layers.Dense(28 * 28, activation=\"sigmoid\")(x)\n",
"outputs = keras.layers.Reshape([28, 28])(x)\n",
"variational_decoder = keras.models.Model(inputs=[decoder_inputs], outputs=[outputs])\n",
"\n",
"_, _, codings = variational_encoder(inputs)\n",
"reconstructions = variational_decoder(codings)\n",
"variational_ae = keras.models.Model(inputs=[inputs], outputs=[reconstructions])\n",
"\n",
"latent_loss = -0.5 * K.sum(\n",
" 1 + codings_log_var - K.exp(codings_log_var) - K.square(codings_mean),\n",
" axis=-1)\n",
"variational_ae.add_loss(K.mean(latent_loss) / 784.)\n",
"variational_ae.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[rounded_accuracy])\n",
"history = variational_ae.fit(X_train, X_train, epochs=25, batch_size=128,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"show_reconstructions(variational_ae)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Fashion Images"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"def plot_multiple_images(images, n_cols=None):\n",
" n_cols = n_cols or len(images)\n",
" n_rows = (len(images) - 1) // n_cols + 1\n",
" if images.shape[-1] == 1:\n",
" images = np.squeeze(images, axis=-1)\n",
" plt.figure(figsize=(n_cols, n_rows))\n",
" for index, image in enumerate(images):\n",
" plt.subplot(n_rows, n_cols, index + 1)\n",
" plt.imshow(image, cmap=\"binary\")\n",
" plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's generate a few random codings, decode them and plot the resulting images:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"\n",
"codings = tf.random.normal(shape=[12, codings_size])\n",
"images = variational_decoder(codings).numpy()\n",
"plot_multiple_images(images, 4)\n",
"save_fig(\"vae_generated_images_plot\", tight_layout=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's perform semantic interpolation between these images:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"codings_grid = tf.reshape(codings, [1, 3, 4, codings_size])\n",
"larger_grid = tf.image.resize(codings_grid, size=[5, 7])\n",
"interpolated_codings = tf.reshape(larger_grid, [-1, codings_size])\n",
"images = variational_decoder(interpolated_codings).numpy()\n",
"\n",
"plt.figure(figsize=(7, 5))\n",
"for index, image in enumerate(images):\n",
" plt.subplot(5, 7, index + 1)\n",
" if index%7%2==0 and index//7%2==0:\n",
" plt.gca().get_xaxis().set_visible(False)\n",
" plt.gca().get_yaxis().set_visible(False)\n",
" else:\n",
" plt.axis(\"off\")\n",
" plt.imshow(image, cmap=\"binary\")\n",
"save_fig(\"semantic_interpolation_plot\", tight_layout=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Generative Adversarial Networks"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"codings_size = 30\n",
"\n",
"generator = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[codings_size]),\n",
" keras.layers.Dense(150, activation=\"selu\"),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"discriminator = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(150, activation=\"selu\"),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.Dense(1, activation=\"sigmoid\")\n",
"])\n",
"gan = keras.models.Sequential([generator, discriminator])"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n",
"discriminator.trainable = False\n",
"gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 32\n",
"dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)\n",
"dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):\n",
" generator, discriminator = gan.layers\n",
" for epoch in range(n_epochs):\n",
" print(\"Epoch {}/{}\".format(epoch + 1, n_epochs)) # not shown in the book\n",
" for X_batch in dataset:\n",
" # phase 1 - training the discriminator\n",
" noise = tf.random.normal(shape=[batch_size, codings_size])\n",
" generated_images = generator(noise)\n",
" X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)\n",
" y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)\n",
" discriminator.trainable = True\n",
" discriminator.train_on_batch(X_fake_and_real, y1)\n",
" # phase 2 - training the generator\n",
" noise = tf.random.normal(shape=[batch_size, codings_size])\n",
" y2 = tf.constant([[1.]] * batch_size)\n",
" discriminator.trainable = False\n",
" gan.train_on_batch(noise, y2)\n",
" plot_multiple_images(generated_images, 8) # not shown\n",
" plt.show() # not shown"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"noise = tf.random.normal(shape=[batch_size, codings_size])\n",
"generated_images = generator(noise)\n",
"plot_multiple_images(generated_images, 8)\n",
"save_fig(\"gan_generated_images_plot\", tight_layout=False)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"train_gan(gan, dataset, batch_size, codings_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deep Convolutional GAN"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"codings_size = 100\n",
"\n",
"generator = keras.models.Sequential([\n",
" keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),\n",
" keras.layers.Reshape([7, 7, 128]),\n",
" keras.layers.BatchNormalization(),\n",
" keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding=\"SAME\",\n",
" activation=\"selu\"),\n",
" keras.layers.BatchNormalization(),\n",
" keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding=\"SAME\",\n",
" activation=\"tanh\"),\n",
"])\n",
"discriminator = keras.models.Sequential([\n",
" keras.layers.Conv2D(64, kernel_size=5, strides=2, padding=\"SAME\",\n",
" activation=keras.layers.LeakyReLU(0.2),\n",
" input_shape=[28, 28, 1]),\n",
" keras.layers.Dropout(0.4),\n",
" keras.layers.Conv2D(128, kernel_size=5, strides=2, padding=\"SAME\",\n",
" activation=keras.layers.LeakyReLU(0.2)),\n",
" keras.layers.Dropout(0.4),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dense(1, activation=\"sigmoid\")\n",
"])\n",
"gan = keras.models.Sequential([generator, discriminator])"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n",
"discriminator.trainable = False\n",
"gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"X_train_dcgan = X_train.reshape(-1, 28, 28, 1) * 2. - 1. # reshape and rescale"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"batch_size = 32\n",
"dataset = tf.data.Dataset.from_tensor_slices(X_train_dcgan)\n",
"dataset = dataset.shuffle(1000)\n",
"dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"train_gan(gan, dataset, batch_size, codings_size)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"noise = tf.random.normal(shape=[batch_size, codings_size])\n",
"generated_images = generator(noise)\n",
"plot_multiple_images(generated_images, 8)\n",
"save_fig(\"dcgan_generated_images_plot\", tight_layout=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extra Material"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hashing Using a Binary Autoencoder"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load the Fashion MNIST dataset again:"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train_full = X_train_full.astype(np.float32) / 255\n",
"X_test = X_test.astype(np.float32) / 255\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's train an autoencoder where the encoder has a 16-neuron output layer, using the sigmoid activation function, and heavy Gaussian noise just before it. During training, the noise layer will encourage the previous layer to learn to output large values, since small values will just be crushed by the noise. In turn, this means that the output layer will output values close to 0 or 1, thanks to the sigmoid activation function. Once we round the output values to 0s and 1s, we get a 16-bit \"semantic\" hash. If everything works well, images that look alike will have the same hash. This can be very useful for search engines: for example, if we store each image on a server identified by the image's semantic hash, then all similar images will end up on the same server. Users of the search engine can then provide an image to search for, and the search engine will compute the image's hash using the encoder, and quickly return all the images on the server identified by that hash."
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"hashing_encoder = keras.models.Sequential([\n",
" keras.layers.Flatten(input_shape=[28, 28]),\n",
" keras.layers.Dense(100, activation=\"selu\"),\n",
" keras.layers.GaussianNoise(15.),\n",
" keras.layers.Dense(16, activation=\"sigmoid\"),\n",
"])\n",
"hashing_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(100, activation=\"selu\", input_shape=[16]),\n",
" keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n",
" keras.layers.Reshape([28, 28])\n",
"])\n",
"hashing_ae = keras.models.Sequential([hashing_encoder, hashing_decoder])\n",
"hashing_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.Nadam(),\n",
" metrics=[rounded_accuracy])\n",
"history = hashing_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_valid, X_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The autoencoder compresses the information so much (down to 16 bits!) that it's quite lossy, but that's okay, we're using it to produce semantic hashes, not to perfectly reconstruct the images:"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"show_reconstructions(hashing_ae)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that the outputs are indeed very close to 0 or 1 (left graph):"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"plot_activations_histogram(hashing_encoder)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's see what the hashes look like for the first few images in the validation set:"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"hashes = np.round(hashing_encoder.predict(X_valid)).astype(np.int32)\n",
"hashes *= np.array([[2**bit for bit in range(16)]])\n",
"hashes = hashes.sum(axis=1)\n",
"for h in hashes[:5]:\n",
" print(\"{:016b}\".format(h))\n",
"print(\"...\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's find the most common image hashes in the validation set, and display a few images for each hash. In the following image, all the images on a given row have the same hash:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"n_hashes = 10\n",
"n_images = 8\n",
"\n",
"top_hashes = Counter(hashes).most_common(n_hashes)\n",
"\n",
"plt.figure(figsize=(n_images, n_hashes))\n",
"for hash_index, (image_hash, hash_count) in enumerate(top_hashes):\n",
" indices = (hashes == image_hash)\n",
" for index, image in enumerate(X_valid[indices][:n_images]):\n",
" plt.subplot(n_hashes, n_images, hash_index * n_images + index + 1)\n",
" plt.imshow(image, cmap=\"binary\")\n",
" plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise Solutions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8.\n",
"\n",
"See Appendix A."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9.\n",
"_Exercise: Try using a denoising autoencoder to pretrain an image classifier. You can use MNIST (the simplest option), or a more complex image dataset such as [CIFAR10](https://homl.info/122) if you want a bigger challenge. Regardless of the dataset you're using, follow these steps:_\n",
"* Split the dataset into a training set and a test set. Train a deep denoising autoencoder on the full training set.\n",
"* Check that the images are fairly well reconstructed. Visualize the images that most activate each neuron in the coding layer.\n",
"* Build a classification DNN, reusing the lower layers of the autoencoder. Train it using only 500 images from the training set. Does it perform better with or without pretraining?"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"[X_train, y_train], [X_test, y_test] = keras.datasets.cifar10.load_data()\n",
"X_train = X_train / 255\n",
"X_test = X_test / 255"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"denoising_encoder = keras.models.Sequential([\n",
" keras.layers.GaussianNoise(0.1, input_shape=[32, 32, 3]),\n",
" keras.layers.Conv2D(32, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.MaxPool2D(),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dense(512, activation=\"relu\"),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"denoising_encoder.summary()"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"denoising_decoder = keras.models.Sequential([\n",
" keras.layers.Dense(16 * 16 * 32, activation=\"relu\", input_shape=[512]),\n",
" keras.layers.Reshape([16, 16, 32]),\n",
" keras.layers.Conv2DTranspose(filters=3, kernel_size=3, strides=2,\n",
" padding=\"same\", activation=\"sigmoid\")\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"denoising_decoder.summary()"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])\n",
"denoising_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.Nadam(),\n",
" metrics=[\"mse\"])\n",
"history = denoising_ae.fit(X_train, X_train, epochs=10,\n",
" validation_data=(X_test, X_test))"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"n_images = 5\n",
"new_images = X_test[:n_images]\n",
"new_images_noisy = new_images + np.random.randn(n_images, 32, 32, 3) * 0.1\n",
"new_images_denoised = denoising_ae.predict(new_images_noisy)\n",
"\n",
"plt.figure(figsize=(6, n_images * 2))\n",
"for index in range(n_images):\n",
" plt.subplot(n_images, 3, index * 3 + 1)\n",
" plt.imshow(new_images[index])\n",
" plt.axis('off')\n",
" if index == 0:\n",
" plt.title(\"Original\")\n",
" plt.subplot(n_images, 3, index * 3 + 2)\n",
" plt.imshow(np.clip(new_images_noisy[index], 0., 1.))\n",
" plt.axis('off')\n",
" if index == 0:\n",
" plt.title(\"Noisy\")\n",
" plt.subplot(n_images, 3, index * 3 + 3)\n",
" plt.imshow(new_images_denoised[index])\n",
" plt.axis('off')\n",
" if index == 0:\n",
" plt.title(\"Denoised\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.\n",
"_Exercise: Train a variational autoencoder on the image dataset of your choice, and use it to generate images. Alternatively, you can try to find an unlabeled dataset that you are interested in and see if you can generate new samples._\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11.\n",
"_Exercise: Train a DCGAN to tackle the image dataset of your choice, and use it to generate images. Add experience replay and see if this helps. Turn it into a conditional GAN where you can control the generated class._\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
},
"nav_menu": {
"height": "381px",
"width": "453px"
},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}