{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Chapter 17 – Autoencoders and GANs**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_This notebook contains all the sample code in chapter 17._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Python ≥3.5 is required\n", "import sys\n", "assert sys.version_info >= (3, 5)\n", "\n", "# Scikit-Learn ≥0.20 is required\n", "import sklearn\n", "assert sklearn.__version__ >= \"0.20\"\n", "\n", "try:\n", " # %tensorflow_version only exists in Colab.\n", " %tensorflow_version 2.x\n", " IS_COLAB = True\n", "except Exception:\n", " IS_COLAB = False\n", "\n", "# TensorFlow ≥2.0 is required\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "assert tf.__version__ >= \"2.0\"\n", "\n", "if not tf.config.list_physical_devices('GPU'):\n", " print(\"No GPU was detected. 
LSTMs and CNNs can be very slow without a GPU.\")\n", "    if IS_COLAB:\n", "        print(\"Go to Runtime > Change runtime and select a GPU hardware accelerator.\")\n", "\n", "# Common imports\n", "import numpy as np\n", "import os\n", "\n", "# to make this notebook's output stable across runs\n", "np.random.seed(42)\n", "tf.random.set_seed(42)\n", "\n", "# To plot pretty figures\n", "%matplotlib inline\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "mpl.rc('axes', labelsize=14)\n", "mpl.rc('xtick', labelsize=12)\n", "mpl.rc('ytick', labelsize=12)\n", "\n", "# Where to save the figures\n", "PROJECT_ROOT_DIR = \".\"\n", "CHAPTER_ID = \"autoencoders\"\n", "IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n", "os.makedirs(IMAGES_PATH, exist_ok=True)\n", "\n", "def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n", "    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n", "    print(\"Saving figure\", fig_id)\n", "    if tight_layout:\n", "        plt.tight_layout()\n", "    plt.savefig(path, format=fig_extension, dpi=resolution)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A utility function to plot a grayscale 28x28 image:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def plot_image(image):\n", "    plt.imshow(image, cmap=\"binary\")\n", "    plt.axis(\"off\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PCA with a linear Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a 3D dataset:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "np.random.seed(4)\n", "\n", "def generate_3d_data(m, w1=0.1, w2=0.3, noise=0.1):\n", "    angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5\n", "    data = np.empty((m, 3))\n", "    data[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2\n", "    data[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 
2\n", "    data[:, 2] = data[:, 0] * w1 + data[:, 1] * w2 + noise * np.random.randn(m)\n", "    return data\n", "\n", "X_train = generate_3d_data(60)\n", "X_train = X_train - X_train.mean(axis=0, keepdims=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's build the Autoencoder..." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "tf.random.set_seed(42)\n", "\n", "encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])\n", "decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])\n", "autoencoder = keras.models.Sequential([encoder, decoder])\n", "\n", "autoencoder.compile(loss=\"mse\", optimizer=keras.optimizers.SGD(learning_rate=1.5))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "history = autoencoder.fit(X_train, X_train, epochs=20)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "codings = encoder.predict(X_train)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(4,3))\n", "plt.plot(codings[:,0], codings[:, 1], \"b.\")\n", "plt.xlabel(\"$z_1$\", fontsize=18)\n", "plt.ylabel(\"$z_2$\", fontsize=18, rotation=0)\n", "plt.grid(True)\n", "save_fig(\"linear_autoencoder_pca_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stacked Autoencoders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use Fashion MNIST:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n", "X_train_full = X_train_full.astype(np.float32) / 255\n", "X_test = X_test.astype(np.float32) / 255\n", "X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n", "y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "## Train all layers at once" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a stacked Autoencoder with 3 hidden layers and 1 output layer (i.e., 2 stacked Autoencoders)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def rounded_accuracy(y_true, y_pred):\n", " return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "stacked_encoder = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " keras.layers.Dense(100, activation=\"selu\"),\n", " keras.layers.Dense(30, activation=\"selu\"),\n", "])\n", "stacked_decoder = keras.models.Sequential([\n", " keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", " keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", " keras.layers.Reshape([28, 28])\n", "])\n", "stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])\n", "stacked_ae.compile(loss=\"binary_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=1.5), metrics=[rounded_accuracy])\n", "history = stacked_ae.fit(X_train, X_train, epochs=20,\n", " validation_data=(X_valid, X_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function processes a few test images through the autoencoder and displays the original images and their reconstructions:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def show_reconstructions(model, images=X_valid, n_images=5):\n", " reconstructions = model.predict(images[:n_images])\n", " fig = plt.figure(figsize=(n_images * 1.5, 3))\n", " for image_index in range(n_images):\n", " plt.subplot(2, n_images, 1 + image_index)\n", " plot_image(images[image_index])\n", " plt.subplot(2, n_images, 1 + n_images + image_index)\n", " 
       plot_image(reconstructions[image_index])" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(stacked_ae)\n", "save_fig(\"reconstruction_plot\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Fashion MNIST" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "\n", "from sklearn.manifold import TSNE\n", "\n", "X_valid_compressed = stacked_encoder.predict(X_valid)\n", "tsne = TSNE()\n", "X_valid_2D = tsne.fit_transform(X_valid_compressed)\n", "X_valid_2D = (X_valid_2D - X_valid_2D.min()) / (X_valid_2D.max() - X_valid_2D.min())" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=\"tab10\")\n", "plt.axis(\"off\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make this diagram a bit prettier:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# adapted from https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html\n", "plt.figure(figsize=(10, 8))\n", "cmap = plt.cm.tab10\n", "plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=cmap)\n", "image_positions = np.array([[1., 1.]])\n", "for index, position in enumerate(X_valid_2D):\n", "    dist = np.sum((position - image_positions) ** 2, axis=1)\n", "    if np.min(dist) > 0.02: # if far enough from other images\n", "        image_positions = np.r_[image_positions, [position]]\n", "        imagebox = mpl.offsetbox.AnnotationBbox(\n", "            mpl.offsetbox.OffsetImage(X_valid[index], cmap=\"binary\"),\n", "            position, bboxprops={\"edgecolor\": cmap(y_valid[index]), \"lw\": 2})\n", "        plt.gca().add_artist(imagebox)\n", "plt.axis(\"off\")\n", "save_fig(\"fashion_mnist_visualization_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 
Tying weights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is common to tie the weights of the encoder and the decoder by simply using the transpose of the encoder's weights as the decoder's weights. For this, we need to use a custom layer." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "class DenseTranspose(keras.layers.Layer):\n", "    def __init__(self, dense, activation=None, **kwargs):\n", "        self.dense = dense\n", "        self.activation = keras.activations.get(activation)\n", "        super().__init__(**kwargs)\n", "    def build(self, batch_input_shape):\n", "        self.biases = self.add_weight(name=\"bias\",\n", "                                      shape=[self.dense.input_shape[-1]],\n", "                                      initializer=\"zeros\")\n", "        super().build(batch_input_shape)\n", "    def call(self, inputs):\n", "        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)\n", "        return self.activation(z + self.biases)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "keras.backend.clear_session()\n", "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "dense_1 = keras.layers.Dense(100, activation=\"selu\")\n", "dense_2 = keras.layers.Dense(30, activation=\"selu\")\n", "\n", "tied_encoder = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    dense_1,\n", "    dense_2\n", "])\n", "\n", "tied_decoder = keras.models.Sequential([\n", "    DenseTranspose(dense_2, activation=\"selu\"),\n", "    DenseTranspose(dense_1, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "\n", "tied_ae = keras.models.Sequential([tied_encoder, tied_decoder])\n", "\n", "tied_ae.compile(loss=\"binary_crossentropy\",\n", "                optimizer=keras.optimizers.SGD(learning_rate=1.5), metrics=[rounded_accuracy])\n", "history = tied_ae.fit(X_train, X_train, epochs=10,\n", "                      validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": true }, "outputs": [], "source": [ 
"show_reconstructions(tied_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training one Autoencoder at a Time" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def train_autoencoder(n_neurons, X_train, X_valid, loss, optimizer,\n", " n_epochs=10, output_activation=None, metrics=None):\n", " n_inputs = X_train.shape[-1]\n", " encoder = keras.models.Sequential([\n", " keras.layers.Dense(n_neurons, activation=\"selu\", input_shape=[n_inputs])\n", " ])\n", " decoder = keras.models.Sequential([\n", " keras.layers.Dense(n_inputs, activation=output_activation),\n", " ])\n", " autoencoder = keras.models.Sequential([encoder, decoder])\n", " autoencoder.compile(optimizer, loss, metrics=metrics)\n", " autoencoder.fit(X_train, X_train, epochs=n_epochs,\n", " validation_data=(X_valid, X_valid))\n", " return encoder, decoder, encoder(X_train), encoder(X_valid)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "K = keras.backend\n", "X_train_flat = K.batch_flatten(X_train) # equivalent to .reshape(-1, 28 * 28)\n", "X_valid_flat = K.batch_flatten(X_valid)\n", "enc1, dec1, X_train_enc1, X_valid_enc1 = train_autoencoder(\n", " 100, X_train_flat, X_valid_flat, \"binary_crossentropy\",\n", " keras.optimizers.SGD(lr=1.5), output_activation=\"sigmoid\",\n", " metrics=[rounded_accuracy])\n", "enc2, dec2, _, _ = train_autoencoder(\n", " 30, X_train_enc1, X_valid_enc1, \"mse\", keras.optimizers.SGD(lr=0.05),\n", " output_activation=\"selu\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "stacked_ae_1_by_1 = keras.models.Sequential([\n", " keras.layers.Flatten(input_shape=[28, 28]),\n", " enc1, enc2, dec2, dec1,\n", " keras.layers.Reshape([28, 28])\n", "])" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ 
"show_reconstructions(stacked_ae_1_by_1)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "stacked_ae_1_by_1.compile(loss=\"binary_crossentropy\",\n", " optimizer=keras.optimizers.SGD(lr=0.1), metrics=[rounded_accuracy])\n", "history = stacked_ae_1_by_1.fit(X_train, X_train, epochs=10,\n", " validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(stacked_ae_1_by_1)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Convolutional Layers Instead of Dense Layers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a stacked Autoencoder with 3 hidden layers and 1 output layer (i.e., 2 stacked Autoencoders)." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "conv_encoder = keras.models.Sequential([\n", " keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),\n", " keras.layers.Conv2D(16, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", " keras.layers.MaxPool2D(pool_size=2),\n", " keras.layers.Conv2D(32, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", " keras.layers.MaxPool2D(pool_size=2),\n", " keras.layers.Conv2D(64, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", " keras.layers.MaxPool2D(pool_size=2)\n", "])\n", "conv_decoder = keras.models.Sequential([\n", " keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding=\"VALID\", activation=\"selu\",\n", " input_shape=[3, 3, 64]),\n", " keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding=\"SAME\", activation=\"selu\"),\n", " keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding=\"SAME\", activation=\"sigmoid\"),\n", " keras.layers.Reshape([28, 28])\n", "])\n", "conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])\n", "\n", 
"conv_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n", " metrics=[rounded_accuracy])\n", "history = conv_ae.fit(X_train, X_train, epochs=5,\n", " validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "conv_encoder.summary()\n", "conv_decoder.summary()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(conv_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Recurrent Autoencoders" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "recurrent_encoder = keras.models.Sequential([\n", " keras.layers.LSTM(100, return_sequences=True, input_shape=[28, 28]),\n", " keras.layers.LSTM(30)\n", "])\n", "recurrent_decoder = keras.models.Sequential([\n", " keras.layers.RepeatVector(28, input_shape=[30]),\n", " keras.layers.LSTM(100, return_sequences=True),\n", " keras.layers.TimeDistributed(keras.layers.Dense(28, activation=\"sigmoid\"))\n", "])\n", "recurrent_ae = keras.models.Sequential([recurrent_encoder, recurrent_decoder])\n", "recurrent_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(0.1),\n", " metrics=[rounded_accuracy])" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "history = recurrent_ae.fit(X_train, X_train, epochs=10, validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(recurrent_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stacked denoising Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using Gaussian noise:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "denoising_encoder = 
keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.GaussianNoise(0.2),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.Dense(30, activation=\"selu\")\n", "])\n", "denoising_decoder = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])\n", "denoising_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=1.0),\n", "                     metrics=[rounded_accuracy])\n", "history = denoising_ae.fit(X_train, X_train, epochs=10,\n", "                           validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "noise = keras.layers.GaussianNoise(0.2)\n", "show_reconstructions(denoising_ae, noise(X_valid, training=True))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using dropout:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "dropout_encoder = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.Dropout(0.5),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.Dense(30, activation=\"selu\")\n", "])\n", "dropout_decoder = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "dropout_ae = keras.models.Sequential([dropout_encoder, dropout_decoder])\n", "dropout_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=1.0),\n", "                   metrics=[rounded_accuracy])\n", "history = dropout_ae.fit(X_train, X_train, 
epochs=10,\n", "                         validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "dropout = keras.layers.Dropout(0.5)\n", "show_reconstructions(dropout_ae, dropout(X_valid, training=True))\n", "save_fig(\"dropout_denoising_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sparse Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a simple stacked autoencoder, so we can compare it to the sparse autoencoders we will build. This time we will use the sigmoid activation function for the coding layer, to ensure that the coding values range from 0 to 1:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "simple_encoder = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.Dense(30, activation=\"sigmoid\"),\n", "])\n", "simple_decoder = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "simple_ae = keras.models.Sequential([simple_encoder, simple_decoder])\n", "simple_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=1.),\n", "                  metrics=[rounded_accuracy])\n", "history = simple_ae.fit(X_train, X_train, epochs=10,\n", "                        validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(simple_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a couple of functions to plot nice activation histograms:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], 
"source": [ "def plot_percent_hist(ax, data, bins):\n", " counts, _ = np.histogram(data, bins=bins)\n", " widths = bins[1:] - bins[:-1]\n", " x = bins[:-1] + widths / 2\n", " ax.bar(x, counts / len(data), width=widths*0.8)\n", " ax.xaxis.set_ticks(bins)\n", " ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(\n", " lambda y, position: \"{}%\".format(int(np.round(100 * y)))))\n", " ax.grid(True)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "def plot_activations_histogram(encoder, height=1, n_bins=10):\n", " X_valid_codings = encoder(X_valid).numpy()\n", " activation_means = X_valid_codings.mean(axis=0)\n", " mean = activation_means.mean()\n", " bins = np.linspace(0, 1, n_bins + 1)\n", "\n", " fig, [ax1, ax2] = plt.subplots(figsize=(10, 3), nrows=1, ncols=2, sharey=True)\n", " plot_percent_hist(ax1, X_valid_codings.ravel(), bins)\n", " ax1.plot([mean, mean], [0, height], \"k--\", label=\"Overall Mean = {:.2f}\".format(mean))\n", " ax1.legend(loc=\"upper center\", fontsize=14)\n", " ax1.set_xlabel(\"Activation\")\n", " ax1.set_ylabel(\"% Activations\")\n", " ax1.axis([0, 1, 0, height])\n", " plot_percent_hist(ax2, activation_means, bins)\n", " ax2.plot([mean, mean], [0, height], \"k--\")\n", " ax2.set_xlabel(\"Neuron Mean Activation\")\n", " ax2.set_ylabel(\"% Neurons\")\n", " ax2.axis([0, 1, 0, height])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use these functions to plot histograms of the activations of the encoding layer. The histogram on the left shows the distribution of all the activations. You can see that values close to 0 or 1 are more frequent overall, which is consistent with the saturating nature of the sigmoid function. The histogram on the right shows the distribution of mean neuron activations: you can see that most neurons have a mean activation close to 0.5. Both histograms tell us that each neuron tends to either fire close to 0 or 1, with about 50% probability each. 
However, some neurons fire almost all the time (right side of the right histogram)." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(simple_encoder, height=0.35)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's add $\\ell_1$ regularization to the coding layer:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "sparse_l1_encoder = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.Dense(300, activation=\"sigmoid\"),\n", "    keras.layers.ActivityRegularization(l1=1e-3)  # Alternatively, you could add\n", "                                                  # activity_regularizer=keras.regularizers.l1(1e-3)\n", "                                                  # to the previous layer.\n", "])\n", "sparse_l1_decoder = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[300]),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "sparse_l1_ae = keras.models.Sequential([sparse_l1_encoder, sparse_l1_decoder])\n", "sparse_l1_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=1.0),\n", "                     metrics=[rounded_accuracy])\n", "history = sparse_l1_ae.fit(X_train, X_train, epochs=10,\n", "                           validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(sparse_l1_ae)" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(sparse_l1_encoder, height=1.)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the KL Divergence loss instead to ensure sparsity, and target 10% sparsity rather than 0%:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], 
"source": [ "p = 0.1\n", "q = np.linspace(0.001, 0.999, 500)\n", "kl_div = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))\n", "mse = (p - q)**2\n", "mae = np.abs(p - q)\n", "plt.plot([p, p], [0, 0.3], \"k:\")\n", "plt.text(0.05, 0.32, \"Target\\nsparsity\", fontsize=14)\n", "plt.plot(q, kl_div, \"b-\", label=\"KL divergence\")\n", "plt.plot(q, mae, \"g--\", label=r\"MAE ($\\ell_1$)\")\n", "plt.plot(q, mse, \"r--\", linewidth=1, label=r\"MSE ($\\ell_2$)\")\n", "plt.legend(loc=\"upper left\", fontsize=14)\n", "plt.xlabel(\"Actual sparsity\")\n", "plt.ylabel(\"Cost\", rotation=0)\n", "plt.axis([0, 1, 0, 0.95])\n", "save_fig(\"sparsity_loss_plot\")" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "K = keras.backend\n", "kl_divergence = keras.losses.kullback_leibler_divergence\n", "\n", "class KLDivergenceRegularizer(keras.regularizers.Regularizer):\n", " def __init__(self, weight, target=0.1):\n", " self.weight = weight\n", " self.target = target\n", " def __call__(self, inputs):\n", " mean_activities = K.mean(inputs, axis=0)\n", " return self.weight * (\n", " kl_divergence(self.target, mean_activities) +\n", " kl_divergence(1. - self.target, 1. 
- mean_activities))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "kld_reg = KLDivergenceRegularizer(weight=0.05, target=0.1)\n", "sparse_kl_encoder = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.Dense(300, activation=\"sigmoid\", activity_regularizer=kld_reg)\n", "])\n", "sparse_kl_decoder = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[300]),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "sparse_kl_ae = keras.models.Sequential([sparse_kl_encoder, sparse_kl_decoder])\n", "sparse_kl_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=1.0),\n", "                     metrics=[rounded_accuracy])\n", "history = sparse_kl_ae.fit(X_train, X_train, epochs=10,\n", "                           validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(sparse_kl_ae)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(sparse_kl_encoder)\n", "save_fig(\"sparse_autoencoder_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Variational Autoencoder" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "class Sampling(keras.layers.Layer):\n", "    def call(self, inputs):\n", "        mean, log_var = inputs\n", "        return K.random_normal(tf.shape(log_var)) * K.exp(log_var / 2) + mean" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "codings_size = 10\n", "\n", "inputs = keras.layers.Input(shape=[28, 28])\n", "z = keras.layers.Flatten()(inputs)\n", "z = 
keras.layers.Dense(150, activation=\"selu\")(z)\n", "z = keras.layers.Dense(100, activation=\"selu\")(z)\n", "codings_mean = keras.layers.Dense(codings_size)(z)\n", "codings_log_var = keras.layers.Dense(codings_size)(z)\n", "codings = Sampling()([codings_mean, codings_log_var])\n", "variational_encoder = keras.models.Model(\n", "    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])\n", "\n", "decoder_inputs = keras.layers.Input(shape=[codings_size])\n", "x = keras.layers.Dense(100, activation=\"selu\")(decoder_inputs)\n", "x = keras.layers.Dense(150, activation=\"selu\")(x)\n", "x = keras.layers.Dense(28 * 28, activation=\"sigmoid\")(x)\n", "outputs = keras.layers.Reshape([28, 28])(x)\n", "variational_decoder = keras.models.Model(inputs=[decoder_inputs], outputs=[outputs])\n", "\n", "_, _, codings = variational_encoder(inputs)\n", "reconstructions = variational_decoder(codings)\n", "variational_ae = keras.models.Model(inputs=[inputs], outputs=[reconstructions])\n", "\n", "latent_loss = -0.5 * K.sum(\n", "    1 + codings_log_var - K.exp(codings_log_var) - K.square(codings_mean),\n", "    axis=-1)\n", "variational_ae.add_loss(K.mean(latent_loss) / 784.)\n", "variational_ae.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[rounded_accuracy])\n", "history = variational_ae.fit(X_train, X_train, epochs=25, batch_size=128,\n", "                             validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "scrolled": true }, "outputs": [], "source": [ "show_reconstructions(variational_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Fashion Images" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "def plot_multiple_images(images, n_cols=None):\n", "    n_cols = n_cols or len(images)\n", "    n_rows = (len(images) - 1) // n_cols + 1\n", "    if images.shape[-1] == 1:\n", "        images = np.squeeze(images, axis=-1)\n", "    plt.figure(figsize=(n_cols, 
n_rows))\n", "    for index, image in enumerate(images):\n", "        plt.subplot(n_rows, n_cols, index + 1)\n", "        plt.imshow(image, cmap=\"binary\")\n", "        plt.axis(\"off\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate a few random codings, decode them and plot the resulting images:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "\n", "codings = tf.random.normal(shape=[12, codings_size])\n", "images = variational_decoder(codings).numpy()\n", "plot_multiple_images(images, 4)\n", "save_fig(\"vae_generated_images_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's perform semantic interpolation between these images:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "codings_grid = tf.reshape(codings, [1, 3, 4, codings_size])\n", "larger_grid = tf.image.resize(codings_grid, size=[5, 7])\n", "interpolated_codings = tf.reshape(larger_grid, [-1, codings_size])\n", "images = variational_decoder(interpolated_codings).numpy()\n", "\n", "plt.figure(figsize=(7, 5))\n", "for index, image in enumerate(images):\n", "    plt.subplot(5, 7, index + 1)\n", "    if index%7%2==0 and index//7%2==0:\n", "        plt.gca().get_xaxis().set_visible(False)\n", "        plt.gca().get_yaxis().set_visible(False)\n", "    else:\n", "        plt.axis(\"off\")\n", "    plt.imshow(image, cmap=\"binary\")\n", "save_fig(\"semantic_interpolation_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generative Adversarial Networks" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "tf.random.set_seed(42)\n", "\n", "codings_size = 30\n", "\n", "generator = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[codings_size]),\n", " 
keras.layers.Dense(150, activation=\"selu\"),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "discriminator = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.Dense(150, activation=\"selu\"),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.Dense(1, activation=\"sigmoid\")\n", "])\n", "gan = keras.models.Sequential([generator, discriminator])" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n", "discriminator.trainable = False\n", "gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "batch_size = 32\n", "dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)\n", "dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):\n", "    generator, discriminator = gan.layers\n", "    for epoch in range(n_epochs):\n", "        print(\"Epoch {}/{}\".format(epoch + 1, n_epochs))  # not shown in the book\n", "        for X_batch in dataset:\n", "            # phase 1 - training the discriminator\n", "            noise = tf.random.normal(shape=[batch_size, codings_size])\n", "            generated_images = generator(noise)\n", "            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)\n", "            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)\n", "            discriminator.trainable = True\n", "            discriminator.train_on_batch(X_fake_and_real, y1)\n", "            # phase 2 - training the generator\n", "            noise = tf.random.normal(shape=[batch_size, codings_size])\n", "            y2 = tf.constant([[1.]] * batch_size)\n", "            discriminator.trainable = False\n", "            gan.train_on_batch(noise, y2)\n",
"        plot_multiple_images(generated_images, 8)  # not shown\n", "        plt.show()  # not shown" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "noise = tf.random.normal(shape=[batch_size, codings_size])\n", "generated_images = generator(noise)\n", "plot_multiple_images(generated_images, 8)\n", "save_fig(\"gan_generated_images_plot\", tight_layout=False)" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "train_gan(gan, dataset, batch_size, codings_size)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Deep Convolutional GAN" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "codings_size = 100\n", "\n", "generator = keras.models.Sequential([\n", "    keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),\n", "    keras.layers.Reshape([7, 7, 128]),\n", "    keras.layers.BatchNormalization(),\n", "    keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding=\"SAME\",\n", "                                 activation=\"selu\"),\n", "    keras.layers.BatchNormalization(),\n", "    keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding=\"SAME\",\n", "                                 activation=\"tanh\"),\n", "])\n", "discriminator = keras.models.Sequential([\n", "    keras.layers.Conv2D(64, kernel_size=5, strides=2, padding=\"SAME\",\n", "                        activation=keras.layers.LeakyReLU(0.2),\n", "                        input_shape=[28, 28, 1]),\n", "    keras.layers.Dropout(0.4),\n", "    keras.layers.Conv2D(128, kernel_size=5, strides=2, padding=\"SAME\",\n", "                        activation=keras.layers.LeakyReLU(0.2)),\n", "    keras.layers.Dropout(0.4),\n", "    keras.layers.Flatten(),\n", "    keras.layers.Dense(1, activation=\"sigmoid\")\n", "])\n", "gan = 
keras.models.Sequential([generator, discriminator])" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n", "discriminator.trainable = False\n", "gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "X_train_dcgan = X_train.reshape(-1, 28, 28, 1) * 2. - 1.  # reshape and rescale to [-1, 1] (matches the generator's tanh output)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [], "source": [ "batch_size = 32\n", "dataset = tf.data.Dataset.from_tensor_slices(X_train_dcgan)\n", "dataset = dataset.shuffle(1000)\n", "dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [], "source": [ "train_gan(gan, dataset, batch_size, codings_size)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "noise = tf.random.normal(shape=[batch_size, codings_size])\n", "generated_images = generator(noise)\n", "plot_multiple_images(generated_images, 8)\n", "save_fig(\"dcgan_generated_images_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unsupervised pretraining" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a small neural network for Fashion MNIST classification:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "X_train_small = X_train[:500]\n", "y_train_small = y_train[:500]\n", "\n", "classifier = keras.models.Sequential([\n", "    keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),\n", "    keras.layers.Conv2D(16, kernel_size=3, padding=\"SAME\", 
activation=\"selu\"),\n", "    keras.layers.MaxPool2D(pool_size=2),\n", "    keras.layers.Conv2D(32, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", "    keras.layers.MaxPool2D(pool_size=2),\n", "    keras.layers.Conv2D(64, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", "    keras.layers.MaxPool2D(pool_size=2),\n", "    keras.layers.Flatten(),\n", "    keras.layers.Dense(20, activation=\"selu\"),\n", "    keras.layers.Dense(10, activation=\"softmax\")\n", "])\n", "classifier.compile(loss=\"sparse_categorical_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=0.02),\n", "                   metrics=[\"accuracy\"])\n", "history = classifier.fit(X_train_small, y_train_small, epochs=20, validation_data=(X_valid, y_valid))" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "pd.DataFrame(history.history).plot()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "conv_encoder_clone = keras.models.clone_model(conv_encoder)\n", "# clone_model() creates freshly initialized layers, so copy the trained weights explicitly\n", "conv_encoder_clone.set_weights(conv_encoder.get_weights())\n", "\n", "pretrained_clf = keras.models.Sequential([\n", "    conv_encoder_clone,\n", "    keras.layers.Flatten(),\n", "    keras.layers.Dense(20, activation=\"selu\"),\n", "    keras.layers.Dense(10, activation=\"softmax\")\n", "])" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "conv_encoder_clone.trainable = False\n", "pretrained_clf.compile(loss=\"sparse_categorical_crossentropy\",\n", "                       optimizer=keras.optimizers.SGD(learning_rate=0.02),\n", "                       metrics=[\"accuracy\"])\n", "history = pretrained_clf.fit(X_train_small, y_train_small, epochs=30,\n", "                             validation_data=(X_valid, y_valid))" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "scrolled": true }, "outputs": [], "source": [ "conv_encoder_clone.trainable = True\n", "pretrained_clf.compile(loss=\"sparse_categorical_crossentropy\",\n", "                       optimizer=keras.optimizers.SGD(learning_rate=0.02),\n", "                       
metrics=[\"accuracy\"])\n", "history = pretrained_clf.fit(X_train_small, y_train_small, epochs=20,\n", "                             validation_data=(X_valid, y_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hashing Using a Binary Autoencoder" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "hashing_encoder = keras.models.Sequential([\n", "    keras.layers.Flatten(input_shape=[28, 28]),\n", "    keras.layers.Dense(100, activation=\"selu\"),\n", "    keras.layers.GaussianNoise(15.),\n", "    keras.layers.Dense(16, activation=\"sigmoid\"),\n", "])\n", "hashing_decoder = keras.models.Sequential([\n", "    keras.layers.Dense(100, activation=\"selu\", input_shape=[16]),\n", "    keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    keras.layers.Reshape([28, 28])\n", "])\n", "hashing_ae = keras.models.Sequential([hashing_encoder, hashing_decoder])\n", "hashing_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(learning_rate=1.0),\n", "                   metrics=[rounded_accuracy])\n", "history = hashing_ae.fit(X_train, X_train, epochs=10,\n", "                         validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(hashing_ae)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(hashing_encoder)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "hashes = np.round(hashing_encoder.predict(X_valid)).astype(np.int32)\n", "hashes *= np.array([[2**bit for bit in range(16)]])\n", "hashes = hashes.sum(axis=1)\n", "for h in hashes[:5]:\n", "    print(\"{:016b}\".format(h))\n", "print(\"...\")" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "scrolled": true }, "outputs": [], "source": [ "n_bits = 4\n", "n_images = 8\n", "plt.figure(figsize=(n_images, n_bits))\n", 
"for bit_index in range(n_bits):\n", "    in_bucket = (hashes & 2**bit_index) != 0\n", "    for index, image in zip(range(n_images), X_valid[in_bucket]):\n", "        plt.subplot(n_bits, n_images, bit_index * n_images + index + 1)\n", "        plt.imshow(image, cmap=\"binary\")\n", "        plt.axis(\"off\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. to 8.\n", "\n", "See Appendix A." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9.\n", "_Exercise: Try using a denoising autoencoder to pretrain an image classifier. You can use MNIST (the simplest option), or a more complex image dataset such as [CIFAR10](https://homl.info/122) if you want a bigger challenge. Regardless of the dataset you're using, follow these steps:_\n", "* Split the dataset into a training set and a test set. Train a deep denoising autoencoder on the full training set.\n", "* Check that the images are fairly well reconstructed. Visualize the images that most activate each neuron in the coding layer.\n", "* Build a classification DNN, reusing the lower layers of the autoencoder. Train it using only 500 images from the training set. Does it perform better with or without pretraining?" 
] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [], "source": [ "[X_train, y_train], [X_test, y_test] = keras.datasets.cifar10.load_data()\n", "X_train = X_train / 255\n", "X_test = X_test / 255" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "denoising_encoder = keras.models.Sequential([\n", " keras.layers.GaussianNoise(0.1, input_shape=[32, 32, 3]),\n", " keras.layers.Conv2D(32, kernel_size=3, padding=\"same\", activation=\"relu\"),\n", " keras.layers.MaxPool2D(),\n", " keras.layers.Flatten(),\n", " keras.layers.Dense(512, activation=\"relu\"),\n", "])" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "denoising_encoder.summary()" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "denoising_decoder = keras.models.Sequential([\n", " keras.layers.Dense(16 * 16 * 32, activation=\"relu\", input_shape=[512]),\n", " keras.layers.Reshape([16, 16, 32]),\n", " keras.layers.Conv2DTranspose(filters=3, kernel_size=3, strides=2,\n", " padding=\"same\", activation=\"sigmoid\")\n", "])" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [], "source": [ "denoising_decoder.summary()" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [], "source": [ "denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])\n", "denoising_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.Nadam(),\n", " metrics=[\"mse\"])\n", "history = denoising_ae.fit(X_train, X_train, epochs=10,\n", " validation_data=(X_test, X_test))" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "n_images = 5\n", "new_images = X_test[:n_images]\n", "new_images_noisy = new_images + np.random.randn(n_images, 32, 32, 3) * 0.1\n", "new_images_denoised = 
denoising_ae.predict(new_images_noisy)\n", "\n", "plt.figure(figsize=(6, n_images * 2))\n", "for index in range(n_images):\n", "    plt.subplot(n_images, 3, index * 3 + 1)\n", "    plt.imshow(new_images[index])\n", "    plt.axis('off')\n", "    if index == 0:\n", "        plt.title(\"Original\")\n", "    plt.subplot(n_images, 3, index * 3 + 2)\n", "    plt.imshow(np.clip(new_images_noisy[index], 0., 1.))\n", "    plt.axis('off')\n", "    if index == 0:\n", "        plt.title(\"Noisy\")\n", "    plt.subplot(n_images, 3, index * 3 + 3)\n", "    plt.imshow(new_images_denoised[index])\n", "    plt.axis('off')\n", "    if index == 0:\n", "        plt.title(\"Denoised\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10.\n", "_Exercise: Train a variational autoencoder on the image dataset of your choice, and use it to generate images. Alternatively, you can try to find an unlabeled dataset that you are interested in and see if you can generate new samples._\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11.\n", "_Exercise: Train a DCGAN to tackle the image dataset of your choice, and use it to generate images. Add experience replay and see if this helps. 
Turn it into a conditional GAN where you can control the generated class._\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "nav_menu": { "height": "381px", "width": "453px" }, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }