{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "**Chapter 17 – Autoencoders and GANs**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_This notebook contains all the sample code and solutions to the exercises in chapter 17._" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " \n", " \n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# WORK IN PROGRESS\n", "\n", "\n", "**I'm still working on updating this chapter to the 3rd edition. Please come back in a few weeks.**" ] }, { "cell_type": "markdown", "metadata": { "id": "dFXIv9qNpKzt", "tags": [] }, "source": [ "# Setup" ] }, { "cell_type": "markdown", "metadata": { "id": "8IPbJEmZpKzu" }, "source": [ "This project requires Python 3.7 or above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "TFSU3FCOpKzu" }, "outputs": [], "source": [ "import sys\n", "\n", "assert sys.version_info >= (3, 7)" ] }, { "cell_type": "markdown", "metadata": { "id": "TAlKky09pKzv" }, "source": [ "It also requires Scikit-Learn ≥ 1.0.1:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YqCwW7cMpKzw" }, "outputs": [], "source": [ "from packaging import version\n", "import sklearn\n", "\n", "# compare parsed versions: a plain string comparison would rank \"2.10\" below \"2.6\"\n", "assert version.parse(sklearn.__version__) >= version.parse(\"1.0.1\")" ] }, { "cell_type": "markdown", "metadata": { "id": "GJtVEqxfpKzw" }, "source": [ "And TensorFlow ≥ 2.6:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0Piq5se2pKzx" }, "outputs": [], "source": [ "from packaging import version\n", "import tensorflow as tf\n", "\n", "assert version.parse(tf.__version__) >= version.parse(\"2.6.0\")" ] }, { "cell_type": "markdown", "metadata": { "id": "DDaDoLQTpKzx" }, "source": [ "As we did in earlier chapters, let's define the default font sizes to make the figures prettier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8d4TH3NbpKzx" }, "outputs": [], "source": [ "import matplotlib as mpl  # used later as mpl.offsetbox and mpl.ticker\n", "import matplotlib.pyplot as plt\n", "\n", "plt.rc('font', size=14)\n", "plt.rc('axes', labelsize=14, titlesize=14)\n", "plt.rc('legend', fontsize=14)\n", "plt.rc('xtick', labelsize=10)\n", "plt.rc('ytick', labelsize=10)" ] }, { "cell_type": "markdown", "metadata": { "id": "RcoUIRsvpKzy" }, "source": [ "And let's create the `images/autoencoders` folder (if it doesn't already exist), and define the `save_fig()` function which is used throughout this notebook to save the 
figures in high-res for the book:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PQFH5Y9PpKzy" }, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "IMAGES_PATH = Path() / \"images\" / \"autoencoders\"\n", "IMAGES_PATH.mkdir(parents=True, exist_ok=True)\n", "\n", "def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n", "    path = IMAGES_PATH / f\"{fig_id}.{fig_extension}\"\n", "    if tight_layout:\n", "        plt.tight_layout()\n", "    plt.savefig(path, format=fig_extension, dpi=resolution)" ] }, { "cell_type": "markdown", "metadata": { "id": "YTsawKlapKzy" }, "source": [ "This chapter can be very slow without a GPU, so let's make sure there's one, or else issue a warning:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ekxzo6pOpKzy" }, "outputs": [], "source": [ "if not tf.config.list_physical_devices('GPU'):\n", "    print(\"No GPU was detected. Neural nets can be very slow without a GPU.\")\n", "    if \"google.colab\" in sys.modules:\n", "        print(\"Go to Runtime > Change runtime and select a GPU hardware \"\n", "              \"accelerator.\")\n", "    if \"kaggle_secrets\" in sys.modules:\n", "        print(\"Go to Settings > Accelerator and select GPU.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PCA with a linear Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Build a 3D dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "np.random.seed(4)\n", "\n", "def generate_3d_data(m, w1=0.1, w2=0.3, noise=0.1):\n", "    angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5\n", "    data = np.empty((m, 3))\n", "    data[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2\n", "    data[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 2\n", "    data[:, 2] = data[:, 0] * w1 + data[:, 1] * w2 + noise * np.random.randn(m)\n", "    return data\n", "\n", "X_train = generate_3d_data(60)\n", "X_train = X_train - 
X_train.mean(axis=0, keepdims=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's build the linear autoencoder:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "tf.random.set_seed(42)\n", "\n", "encoder = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=[3])])\n", "decoder = tf.keras.Sequential([tf.keras.layers.Dense(3, input_shape=[2])])\n", "autoencoder = tf.keras.Sequential([encoder, decoder])\n", "\n", "autoencoder.compile(loss=\"mse\", optimizer=tf.keras.optimizers.SGD(learning_rate=1.5))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "history = autoencoder.fit(X_train, X_train, epochs=20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "codings = encoder.predict(X_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(4,3))\n", "plt.plot(codings[:,0], codings[:, 1], \"b.\")\n", "plt.xlabel(\"$z_1$\", fontsize=18)\n", "plt.ylabel(\"$z_2$\", fontsize=18, rotation=0)\n", "plt.grid(True)\n", "save_fig(\"linear_autoencoder_pca_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stacked Autoencoders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use Fashion MNIST:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n", "X_train_full = X_train_full.astype(np.float32) / 255\n", "X_test = X_test.astype(np.float32) / 255\n", "X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n", "y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train all layers at once" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a 
stacked Autoencoder with 3 hidden layers and 1 output layer (i.e., 2 stacked Autoencoders)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def rounded_accuracy(y_true, y_pred):\n", "    return tf.keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "stacked_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    tf.keras.layers.Dense(100, activation=\"selu\"),\n", "    tf.keras.layers.Dense(30, activation=\"selu\"),\n", "])\n", "stacked_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", "    tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "stacked_ae = tf.keras.Sequential([stacked_encoder, stacked_decoder])\n", "stacked_ae.compile(loss=\"binary_crossentropy\",\n", "                   optimizer=tf.keras.optimizers.SGD(learning_rate=1.5), metrics=[rounded_accuracy])\n", "history = stacked_ae.fit(X_train, X_train, epochs=20,\n", "                         validation_data=(X_valid, X_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following functions run a few validation images through the autoencoder and display the original images above their reconstructions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_image(image):\n", "    # helper assumed here (it is not defined elsewhere in this draft):\n", "    # draws one grayscale image without axes\n", "    plt.imshow(image, cmap=\"binary\")\n", "    plt.axis(\"off\")\n", "\n", "def show_reconstructions(model, images=X_valid, n_images=5):\n", "    reconstructions = model.predict(images[:n_images])\n", "    fig = plt.figure(figsize=(n_images * 1.5, 3))\n", "    for image_index in range(n_images):\n", "        plt.subplot(2, n_images, 1 + image_index)\n", "        plot_image(images[image_index])\n", "        plt.subplot(2, n_images, 1 + n_images + image_index)\n", "        plot_image(reconstructions[image_index])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "show_reconstructions(stacked_ae)\n", "save_fig(\"reconstruction_plot\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Fashion MNIST" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "\n", "from sklearn.manifold import TSNE\n", "\n", "X_valid_compressed = stacked_encoder.predict(X_valid)\n", "tsne = TSNE()\n", "X_valid_2D = tsne.fit_transform(X_valid_compressed)\n", "X_valid_2D = (X_valid_2D - X_valid_2D.min()) / (X_valid_2D.max() - X_valid_2D.min())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=\"tab10\")\n", "plt.axis(\"off\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make this diagram a bit prettier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# adapted from https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html\n", "plt.figure(figsize=(10, 8))\n", "cmap = plt.cm.tab10\n", "plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=cmap)\n", "image_positions = np.array([[1., 1.]])\n", "for index, position in enumerate(X_valid_2D):\n", "    dist = ((position - image_positions) ** 2).sum(axis=1)\n", "    if dist.min() > 0.02: # if far enough from other images\n", "        image_positions = np.r_[image_positions, [position]]\n", "        imagebox = mpl.offsetbox.AnnotationBbox(\n", "            mpl.offsetbox.OffsetImage(X_valid[index], cmap=\"binary\"),\n", "            position, bboxprops={\"edgecolor\": cmap(y_valid[index]), \"lw\": 2})\n", "        plt.gca().add_artist(imagebox)\n", "plt.axis(\"off\")\n", "save_fig(\"fashion_mnist_visualization_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tying weights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is common to tie the weights of the encoder 
and the decoder, by simply using the transpose of the encoder's weights as the decoder weights. This roughly halves the number of weights in the model. For this, we need to use a custom layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class DenseTranspose(tf.keras.layers.Layer):\n", "    def __init__(self, dense, activation=None, **kwargs):\n", "        super().__init__(**kwargs)\n", "        self.dense = dense\n", "        self.activation = tf.keras.activations.get(activation)\n", "    def build(self, batch_input_shape):\n", "        # the kernel is shared with self.dense, only the biases are new\n", "        self.biases = self.add_weight(name=\"bias\",\n", "                                      shape=[self.dense.input_shape[-1]],\n", "                                      initializer=\"zeros\")\n", "        super().build(batch_input_shape)\n", "    def call(self, inputs):\n", "        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)\n", "        return self.activation(z + self.biases)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.keras.backend.clear_session()\n", "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "dense_1 = tf.keras.layers.Dense(100, activation=\"selu\")\n", "dense_2 = tf.keras.layers.Dense(30, activation=\"selu\")\n", "\n", "tied_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    dense_1,\n", "    dense_2\n", "])\n", "\n", "tied_decoder = tf.keras.Sequential([\n", "    DenseTranspose(dense_2, activation=\"selu\"),\n", "    DenseTranspose(dense_1, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "\n", "tied_ae = tf.keras.Sequential([tied_encoder, tied_decoder])\n", "\n", "tied_ae.compile(loss=\"binary_crossentropy\",\n", "                optimizer=tf.keras.optimizers.SGD(learning_rate=1.5), metrics=[rounded_accuracy])\n", "history = tied_ae.fit(X_train, X_train, epochs=10,\n", "                      validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "show_reconstructions(tied_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"## Training one Autoencoder at a Time" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_autoencoder(n_neurons, X_train, X_valid, loss, optimizer,\n", "                      n_epochs=10, output_activation=None, metrics=None):\n", "    n_inputs = X_train.shape[-1]\n", "    encoder = tf.keras.Sequential([\n", "        tf.keras.layers.Dense(n_neurons, activation=\"selu\", input_shape=[n_inputs])\n", "    ])\n", "    decoder = tf.keras.Sequential([\n", "        tf.keras.layers.Dense(n_inputs, activation=output_activation),\n", "    ])\n", "    autoencoder = tf.keras.Sequential([encoder, decoder])\n", "    autoencoder.compile(optimizer, loss, metrics=metrics)\n", "    autoencoder.fit(X_train, X_train, epochs=n_epochs,\n", "                    validation_data=(X_valid, X_valid))\n", "    return encoder, decoder, encoder(X_train), encoder(X_valid)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "K = tf.keras.backend\n", "X_train_flat = K.batch_flatten(X_train) # equivalent to .reshape(-1, 28 * 28)\n", "X_valid_flat = K.batch_flatten(X_valid)\n", "enc1, dec1, X_train_enc1, X_valid_enc1 = train_autoencoder(\n", "    100, X_train_flat, X_valid_flat, \"binary_crossentropy\",\n", "    tf.keras.optimizers.SGD(learning_rate=1.5), output_activation=\"sigmoid\",\n", "    metrics=[rounded_accuracy])\n", "enc2, dec2, _, _ = train_autoencoder(\n", "    30, X_train_enc1, X_valid_enc1, \"mse\", tf.keras.optimizers.SGD(learning_rate=0.05),\n", "    output_activation=\"selu\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stacked_ae_1_by_1 = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    enc1, enc2, dec2, dec1,\n", "    tf.keras.layers.Reshape([28, 28])\n", "])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(stacked_ae_1_by_1)\n", "plt.show()" ] }, { "cell_type": 
"code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stacked_ae_1_by_1.compile(loss=\"binary_crossentropy\",\n", "                          optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), metrics=[rounded_accuracy])\n", "history = stacked_ae_1_by_1.fit(X_train, X_train, epochs=10,\n", "                                validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(stacked_ae_1_by_1)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Convolutional Layers Instead of Dense Layers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a convolutional autoencoder: the encoder is a regular CNN made of 3 convolutional layers, each followed by a max pooling layer, and the decoder upsamples the codings back to 28 × 28 images using 3 transposed convolutional layers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "conv_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),\n", "    tf.keras.layers.Conv2D(16, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", "    tf.keras.layers.MaxPool2D(pool_size=2),\n", "    tf.keras.layers.Conv2D(32, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", "    tf.keras.layers.MaxPool2D(pool_size=2),\n", "    tf.keras.layers.Conv2D(64, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", "    tf.keras.layers.MaxPool2D(pool_size=2)\n", "])\n", "conv_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding=\"VALID\", activation=\"selu\",\n", "                                    input_shape=[3, 3, 64]),\n", "    tf.keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding=\"SAME\", activation=\"selu\"),\n", "    tf.keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding=\"SAME\", activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "conv_ae = tf.keras.Sequential([conv_encoder, conv_decoder])\n", "\n", 
"conv_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),\n", " metrics=[rounded_accuracy])\n", "history = conv_ae.fit(X_train, X_train, epochs=5,\n", " validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "conv_encoder.summary()\n", "conv_decoder.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(conv_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Recurrent Autoencoders" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "recurrent_encoder = tf.keras.Sequential([\n", " tf.keras.layers.LSTM(100, return_sequences=True, input_shape=[28, 28]),\n", " tf.keras.layers.LSTM(30)\n", "])\n", "recurrent_decoder = tf.keras.Sequential([\n", " tf.keras.layers.RepeatVector(28, input_shape=[30]),\n", " tf.keras.layers.LSTM(100, return_sequences=True),\n", " tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(28, activation=\"sigmoid\"))\n", "])\n", "recurrent_ae = tf.keras.Sequential([recurrent_encoder, recurrent_decoder])\n", "recurrent_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.SGD(0.1),\n", " metrics=[rounded_accuracy])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "history = recurrent_ae.fit(X_train, X_train, epochs=10, validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(recurrent_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stacked denoising Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using Gaussian noise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", 
"np.random.seed(42)\n", "\n", "denoising_encoder = tf.keras.Sequential([\n", " tf.keras.layers.Flatten(input_shape=[28, 28]),\n", " tf.keras.layers.GaussianNoise(0.2),\n", " tf.keras.layers.Dense(100, activation=\"selu\"),\n", " tf.keras.layers.Dense(30, activation=\"selu\")\n", "])\n", "denoising_decoder = tf.keras.Sequential([\n", " tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", " tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", " tf.keras.layers.Reshape([28, 28])\n", "])\n", "denoising_ae = tf.keras.Sequential([denoising_encoder, denoising_decoder])\n", "denoising_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),\n", " metrics=[rounded_accuracy])\n", "history = denoising_ae.fit(X_train, X_train, epochs=10,\n", " validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "noise = tf.keras.layers.GaussianNoise(0.2)\n", "show_reconstructions(denoising_ae, noise(X_valid, training=True))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using dropout:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "dropout_encoder = tf.keras.Sequential([\n", " tf.keras.layers.Flatten(input_shape=[28, 28]),\n", " tf.keras.layers.Dropout(0.5),\n", " tf.keras.layers.Dense(100, activation=\"selu\"),\n", " tf.keras.layers.Dense(30, activation=\"selu\")\n", "])\n", "dropout_decoder = tf.keras.Sequential([\n", " tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", " tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", " tf.keras.layers.Reshape([28, 28])\n", "])\n", "dropout_ae = tf.keras.Sequential([dropout_encoder, dropout_decoder])\n", "dropout_ae.compile(loss=\"binary_crossentropy\", 
optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),\n", "                   metrics=[rounded_accuracy])\n", "history = dropout_ae.fit(X_train, X_train, epochs=10,\n", "                         validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "dropout = tf.keras.layers.Dropout(0.5)\n", "show_reconstructions(dropout_ae, dropout(X_valid, training=True))\n", "save_fig(\"dropout_denoising_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sparse Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's build a simple stacked autoencoder, so we can compare it to the sparse autoencoders we will build. This time we will use the sigmoid activation function for the coding layer, to ensure that the coding values range from 0 to 1:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "simple_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    tf.keras.layers.Dense(100, activation=\"selu\"),\n", "    tf.keras.layers.Dense(30, activation=\"sigmoid\"),\n", "])\n", "simple_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[30]),\n", "    tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "simple_ae = tf.keras.Sequential([simple_encoder, simple_decoder])\n", "simple_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.SGD(learning_rate=1.),\n", "                  metrics=[rounded_accuracy])\n", "history = simple_ae.fit(X_train, X_train, epochs=10,\n", "                        validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(simple_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "Let's create a couple of functions to plot nice activation histograms:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_percent_hist(ax, data, bins):\n", "    counts, _ = np.histogram(data, bins=bins)\n", "    widths = bins[1:] - bins[:-1]\n", "    x = bins[:-1] + widths / 2\n", "    ax.bar(x, counts / len(data), width=widths*0.8)\n", "    ax.xaxis.set_ticks(bins)\n", "    ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(\n", "        lambda y, position: \"{}%\".format(round(100 * y))))\n", "    ax.grid(True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_activations_histogram(encoder, height=1, n_bins=10):\n", "    X_valid_codings = encoder(X_valid).numpy()\n", "    activation_means = X_valid_codings.mean(axis=0)\n", "    mean = activation_means.mean()\n", "    bins = np.linspace(0, 1, n_bins + 1)\n", "\n", "    fig, [ax1, ax2] = plt.subplots(figsize=(10, 3), nrows=1, ncols=2, sharey=True)\n", "    plot_percent_hist(ax1, X_valid_codings.ravel(), bins)\n", "    ax1.plot([mean, mean], [0, height], \"k--\", label=\"Overall Mean = {:.2f}\".format(mean))\n", "    ax1.legend(loc=\"upper center\", fontsize=14)\n", "    ax1.set_xlabel(\"Activation\")\n", "    ax1.set_ylabel(\"% Activations\")\n", "    ax1.axis([0, 1, 0, height])\n", "    plot_percent_hist(ax2, activation_means, bins)\n", "    ax2.plot([mean, mean], [0, height], \"k--\")\n", "    ax2.set_xlabel(\"Neuron Mean Activation\")\n", "    ax2.set_ylabel(\"% Neurons\")\n", "    ax2.axis([0, 1, 0, height])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use these functions to plot histograms of the activations of the encoding layer. The histogram on the left shows the distribution of all the activations. You can see that values close to 0 or 1 are more frequent overall, which is consistent with the saturating nature of the sigmoid function. 
The histogram on the right shows the distribution of mean neuron activations: you can see that most neurons have a mean activation close to 0.5. Both histograms tell us that each neuron tends to fire either close to 0 or close to 1, with about 50% probability each. However, some neurons fire almost all the time (right side of the right histogram)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(simple_encoder, height=0.35)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's add $\\ell_1$ regularization to the coding layer:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "sparse_l1_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    tf.keras.layers.Dense(100, activation=\"selu\"),\n", "    tf.keras.layers.Dense(300, activation=\"sigmoid\"),\n", "    tf.keras.layers.ActivityRegularization(l1=1e-3)  # Alternatively, you could add\n", "                                                     # activity_regularizer=tf.keras.regularizers.l1(1e-3)\n", "                                                     # to the previous layer.\n", "])\n", "sparse_l1_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[300]),\n", "    tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "sparse_l1_ae = tf.keras.Sequential([sparse_l1_encoder, sparse_l1_decoder])\n", "sparse_l1_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),\n", "                     metrics=[rounded_accuracy])\n", "history = sparse_l1_ae.fit(X_train, X_train, epochs=10,\n", "                           validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(sparse_l1_ae)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"plot_activations_histogram(sparse_l1_encoder, height=1.)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the KL Divergence loss instead to ensure sparsity, and target 10% sparsity rather than 0%:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = 0.1\n", "q = np.linspace(0.001, 0.999, 500)\n", "kl_div = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))\n", "mse = (p - q)**2\n", "mae = np.abs(p - q)\n", "plt.plot([p, p], [0, 0.3], \"k:\")\n", "plt.text(0.05, 0.32, \"Target\\nsparsity\", fontsize=14)\n", "plt.plot(q, kl_div, \"b-\", label=\"KL divergence\")\n", "plt.plot(q, mae, \"g--\", label=r\"MAE ($\\ell_1$)\")\n", "plt.plot(q, mse, \"r--\", linewidth=1, label=r\"MSE ($\\ell_2$)\")\n", "plt.legend(loc=\"upper left\", fontsize=14)\n", "plt.xlabel(\"Actual sparsity\")\n", "plt.ylabel(\"Cost\", rotation=0)\n", "plt.axis([0, 1, 0, 0.95])\n", "save_fig(\"sparsity_loss_plot\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K = tf.keras.backend\n", "kl_divergence = tf.keras.losses.kullback_leibler_divergence\n", "\n", "class KLDivergenceRegularizer(tf.keras.regularizers.Regularizer):\n", "    def __init__(self, weight, target=0.1):\n", "        self.weight = weight\n", "        self.target = target\n", "    def __call__(self, inputs):\n", "        mean_activities = K.mean(inputs, axis=0)\n", "        return self.weight * (\n", "            kl_divergence(self.target, mean_activities) +\n", "            kl_divergence(1. - self.target, 1. - mean_activities))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "kld_reg = KLDivergenceRegularizer(weight=0.05, target=0.1)\n", "sparse_kl_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    tf.keras.layers.Dense(100, activation=\"selu\"),\n", "    tf.keras.layers.Dense(300, activation=\"sigmoid\", activity_regularizer=kld_reg)\n", "])\n", "sparse_kl_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[300]),\n", "    tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "sparse_kl_ae = tf.keras.Sequential([sparse_kl_encoder, sparse_kl_decoder])\n", "sparse_kl_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.SGD(learning_rate=1.0),\n", "                     metrics=[rounded_accuracy])\n", "history = sparse_kl_ae.fit(X_train, X_train, epochs=10,\n", "                           validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(sparse_kl_ae)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(sparse_kl_encoder)\n", "save_fig(\"sparse_autoencoder_plot\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Variational Autoencoder" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Sampling(tf.keras.layers.Layer):\n", "    def call(self, inputs):\n", "        mean, log_var = inputs\n", "        return K.random_normal(tf.shape(log_var)) * K.exp(log_var / 2) + mean" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "codings_size = 10\n", "\n", "inputs = tf.keras.layers.Input(shape=[28, 28])\n", "z = 
tf.keras.layers.Flatten()(inputs)\n", "z = tf.keras.layers.Dense(150, activation=\"selu\")(z)\n", "z = tf.keras.layers.Dense(100, activation=\"selu\")(z)\n", "codings_mean = tf.keras.layers.Dense(codings_size)(z)\n", "codings_log_var = tf.keras.layers.Dense(codings_size)(z)\n", "codings = Sampling()([codings_mean, codings_log_var])\n", "variational_encoder = tf.keras.Model(\n", "    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])\n", "\n", "decoder_inputs = tf.keras.layers.Input(shape=[codings_size])\n", "x = tf.keras.layers.Dense(100, activation=\"selu\")(decoder_inputs)\n", "x = tf.keras.layers.Dense(150, activation=\"selu\")(x)\n", "x = tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\")(x)\n", "outputs = tf.keras.layers.Reshape([28, 28])(x)\n", "variational_decoder = tf.keras.Model(inputs=[decoder_inputs], outputs=[outputs])\n", "\n", "_, _, codings = variational_encoder(inputs)\n", "reconstructions = variational_decoder(codings)\n", "variational_ae = tf.keras.Model(inputs=[inputs], outputs=[reconstructions])\n", "\n", "latent_loss = -0.5 * K.sum(\n", "    1 + codings_log_var - K.exp(codings_log_var) - K.square(codings_mean),\n", "    axis=-1)\n", "variational_ae.add_loss(K.mean(latent_loss) / 784.)\n", "variational_ae.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\", metrics=[rounded_accuracy])\n", "history = variational_ae.fit(X_train, X_train, epochs=25, batch_size=128,\n", "                             validation_data=(X_valid, X_valid))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "show_reconstructions(variational_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate Fashion Images" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_multiple_images(images, n_cols=None):\n", "    n_cols = n_cols or len(images)\n", "    n_rows = (len(images) - 1) // n_cols + 1\n", "    if images.shape[-1] == 1:\n", "        # np.squeeze() also accepts TensorFlow tensors, unlike the .squeeze() method\n", "        images = np.squeeze(images, axis=-1)\n", "    plt.figure(figsize=(n_cols, n_rows))\n", "    for index, image in enumerate(images):\n", "        plt.subplot(n_rows, n_cols, index + 1)\n", "        plt.imshow(image, cmap=\"binary\")\n", "        plt.axis(\"off\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's generate a few random codings, decode them and plot the resulting images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "\n", "codings = tf.random.normal(shape=[12, codings_size])\n", "images = variational_decoder(codings).numpy()\n", "plot_multiple_images(images, 4)\n", "save_fig(\"vae_generated_images_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's perform semantic interpolation between these images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "codings_grid = tf.reshape(codings, [1, 3, 4, codings_size])\n", "larger_grid = tf.image.resize(codings_grid, size=[5, 7])\n", "interpolated_codings = tf.reshape(larger_grid, [-1, codings_size])\n", "images = variational_decoder(interpolated_codings).numpy()\n", "\n", "plt.figure(figsize=(7, 5))\n", "for index, image in enumerate(images):\n", "    plt.subplot(5, 7, index + 1)\n", "    if index%7%2==0 and index//7%2==0:\n", "        plt.gca().get_xaxis().set_visible(False)\n", "        plt.gca().get_yaxis().set_visible(False)\n", "    else:\n", "        plt.axis(\"off\")\n", "    plt.imshow(image, cmap=\"binary\")\n", "save_fig(\"semantic_interpolation_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Generative Adversarial Networks" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(42)\n", "tf.random.set_seed(42)\n", "\n", "codings_size = 30\n", "\n", "generator = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(100, 
activation=\"selu\", input_shape=[codings_size]),\n", "    tf.keras.layers.Dense(150, activation=\"selu\"),\n", "    tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "discriminator = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    tf.keras.layers.Dense(150, activation=\"selu\"),\n", "    tf.keras.layers.Dense(100, activation=\"selu\"),\n", "    tf.keras.layers.Dense(1, activation=\"sigmoid\")\n", "])\n", "gan = tf.keras.Sequential([generator, discriminator])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n", "discriminator.trainable = False\n", "gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_size = 32\n", "dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)\n", "dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):\n", "    generator, discriminator = gan.layers\n", "    for epoch in range(n_epochs):\n", "        print(f\"Epoch {epoch + 1}/{n_epochs}\")  # not shown in the book\n", "        for X_batch in dataset:\n", "            # phase 1 - training the discriminator\n", "            noise = tf.random.normal(shape=[batch_size, codings_size])\n", "            generated_images = generator(noise)\n", "            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)\n", "            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)\n", "            discriminator.trainable = True\n", "            discriminator.train_on_batch(X_fake_and_real, y1)\n", "            # phase 2 - training the generator\n", "            noise = tf.random.normal(shape=[batch_size, codings_size])\n", "            y2 = tf.constant([[1.]] * batch_size)\n", "            discriminator.trainable = False\n", "            gan.train_on_batch(noise, y2)\n", "        # plot the last batch of generated images once per epoch\n", "        plot_multiple_images(generated_images.numpy(), 8)  # not shown\n", "        plt.show()  # not shown" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "noise = tf.random.normal(shape=[batch_size, codings_size])\n", "generated_images = generator(noise)\n", "plot_multiple_images(generated_images.numpy(), 8)\n", "save_fig(\"gan_generated_images_plot\", tight_layout=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_gan(gan, dataset, batch_size, codings_size)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Deep Convolutional GAN" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "codings_size = 100\n", "\n", "generator = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(7 * 7 * 128, input_shape=[codings_size]),\n", "    tf.keras.layers.Reshape([7, 7, 128]),\n", "    tf.keras.layers.BatchNormalization(),\n", "    tf.keras.layers.Conv2DTranspose(64, kernel_size=5, strides=2, padding=\"SAME\",\n", "                                    activation=\"selu\"),\n", "    tf.keras.layers.BatchNormalization(),\n", "    tf.keras.layers.Conv2DTranspose(1, kernel_size=5, strides=2, padding=\"SAME\",\n", "                                    activation=\"tanh\"),\n", "])\n", "discriminator = tf.keras.Sequential([\n", "    tf.keras.layers.Conv2D(64, kernel_size=5, strides=2, padding=\"SAME\",\n", "                           activation=tf.keras.layers.LeakyReLU(0.2),\n", "                           input_shape=[28, 28, 1]),\n", "    tf.keras.layers.Dropout(0.4),\n", "    tf.keras.layers.Conv2D(128, kernel_size=5, strides=2, padding=\"SAME\",\n", "                           activation=tf.keras.layers.LeakyReLU(0.2)),\n", "    tf.keras.layers.Dropout(0.4),\n", "    
tf.keras.layers.Flatten(),\n", "    tf.keras.layers.Dense(1, activation=\"sigmoid\")\n", "])\n", "gan = tf.keras.Sequential([generator, discriminator])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "discriminator.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")\n", "discriminator.trainable = False\n", "gan.compile(loss=\"binary_crossentropy\", optimizer=\"rmsprop\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train_dcgan = X_train.reshape(-1, 28, 28, 1) * 2. - 1. # reshape and rescale" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "batch_size = 32\n", "dataset = tf.data.Dataset.from_tensor_slices(X_train_dcgan)\n", "dataset = dataset.shuffle(1000)\n", "dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_gan(gan, dataset, batch_size, codings_size)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "noise = tf.random.normal(shape=[batch_size, codings_size])\n", "generated_images = generator(noise)\n", "plot_multiple_images(generated_images.numpy(), 8)\n", "save_fig(\"dcgan_generated_images_plot\", tight_layout=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extra Material" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hashing Using a Binary Autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load the Fashion MNIST dataset again:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n", "X_train_full = X_train_full.astype(np.float32) / 255\n", "X_test = X_test.astype(np.float32) / 255\n", 
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n", "y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's train an autoencoder where the encoder has a 16-neuron output layer, using the sigmoid activation function, and heavy Gaussian noise just before it. During training, the noise layer will encourage the previous layer to learn to output large values, since small values will just be crushed by the noise. In turn, this means that the output layer will output values close to 0 or 1, thanks to the sigmoid activation function. Once we round the output values to 0s and 1s, we get a 16-bit \"semantic\" hash. If everything works well, images that look alike will have the same hash. This can be very useful for search engines: for example, if we store each image on a server identified by the image's semantic hash, then all similar images will end up on the same server. Users of the search engine can then provide an image to search for, and the search engine will compute the image's hash using the encoder, and quickly return all the images on the server identified by that hash." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "hashing_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.Flatten(input_shape=[28, 28]),\n", "    tf.keras.layers.Dense(100, activation=\"selu\"),\n", "    tf.keras.layers.GaussianNoise(15.),\n", "    tf.keras.layers.Dense(16, activation=\"sigmoid\"),\n", "])\n", "hashing_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(100, activation=\"selu\", input_shape=[16]),\n", "    tf.keras.layers.Dense(28 * 28, activation=\"sigmoid\"),\n", "    tf.keras.layers.Reshape([28, 28])\n", "])\n", "hashing_ae = tf.keras.Sequential([hashing_encoder, hashing_decoder])\n", "hashing_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.Nadam(),\n", "                   metrics=[rounded_accuracy])\n", "history = hashing_ae.fit(X_train, X_train, epochs=10,\n", "                         validation_data=(X_valid, X_valid))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The autoencoder compresses the information so much (down to 16 bits!) that it's quite lossy, but that's okay, since we're using it to produce semantic hashes, not to perfectly reconstruct the images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_reconstructions(hashing_ae)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the outputs are indeed very close to 0 or 1 (left graph):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_activations_histogram(hashing_encoder)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's see what the hashes look like for the first few images in the validation set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hashes = hashing_encoder.predict(X_valid).round().astype(np.int32)\n", "hashes *= np.array([[2**bit for bit in range(16)]])\n", "hashes = hashes.sum(axis=1)\n", "for h in hashes[:5]:\n", "    print(\"{:016b}\".format(h))\n", "print(\"...\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's find the most common image hashes in the validation set, and display a few images for each hash. 
In the following figure, all the images in a given row have the same hash:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "\n", "n_hashes = 10\n", "n_images = 8\n", "\n", "top_hashes = Counter(hashes).most_common(n_hashes)\n", "\n", "plt.figure(figsize=(n_images, n_hashes))\n", "for hash_index, (image_hash, hash_count) in enumerate(top_hashes):\n", "    indices = (hashes == image_hash)\n", "    for index, image in enumerate(X_valid[indices][:n_images]):\n", "        plt.subplot(n_hashes, n_images, hash_index * n_images + index + 1)\n", "        plt.imshow(image, cmap=\"binary\")\n", "        plt.axis(\"off\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. to 8." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Here are some of the main tasks that autoencoders are used for:\n", "    * Feature extraction\n", "    * Unsupervised pretraining\n", "    * Dimensionality reduction\n", "    * Generative models\n", "    * Anomaly detection (an autoencoder is generally bad at reconstructing outliers)\n", "2. If you want to train a classifier and you have plenty of unlabeled training data but only a few thousand labeled instances, then you could first train a deep autoencoder on the full dataset (labeled + unlabeled), then reuse its lower half for the classifier (i.e., reuse the layers up to and including the codings layer) and train the classifier using the labeled data. If you have little labeled data, you probably want to freeze the reused layers when training the classifier.\n", "3. The fact that an autoencoder perfectly reconstructs its inputs does not necessarily mean that it is a good autoencoder; perhaps it is simply an overcomplete autoencoder that learned to copy its inputs to the codings layer and then to the outputs. 
In fact, even if the codings layer contained a single neuron, it would be possible for a very deep autoencoder to learn to map each training instance to a different coding (e.g., the first instance could be mapped to 0.001, the second to 0.002, the third to 0.003, and so on), and it could learn \"by heart\" to reconstruct the right training instance for each coding. It would perfectly reconstruct its inputs without really learning any useful pattern in the data. In practice such a mapping is unlikely to happen, but it illustrates the fact that perfect reconstructions are not a guarantee that the autoencoder learned anything useful. However, if it produces very bad reconstructions, then it is almost guaranteed to be a bad autoencoder. To evaluate the performance of an autoencoder, one option is to measure the reconstruction loss (e.g., compute the MSE, i.e., the mean of the squared differences between the outputs and the inputs). Again, a high reconstruction loss is a strong sign that the autoencoder is bad, but a low reconstruction loss is not a guarantee that it is good. You should also evaluate the autoencoder according to what it will be used for. For example, if you are using it for unsupervised pretraining of a classifier, then you should also evaluate the classifier's performance.\n", "4. An undercomplete autoencoder is one whose codings layer is smaller than the input and output layers. If it is larger, then it is an overcomplete autoencoder. The main risk of an excessively undercomplete autoencoder is that it may fail to reconstruct the inputs. The main risk of an overcomplete autoencoder is that it may just copy the inputs to the outputs, without learning any useful features.\n", "5. To tie the weights of an encoder layer and its corresponding decoder layer, you simply make the decoder weights equal to the transpose of the encoder weights. 
This roughly halves the number of parameters in the model (only the weights are tied, not the biases), often making training converge faster with less training data and reducing the risk of overfitting the training set.\n", "6. A generative model is a model capable of randomly generating outputs that resemble the training instances. For example, once trained successfully on the MNIST dataset, a generative model can be used to randomly generate realistic images of digits. The output distribution is typically similar to the training data. For example, since MNIST contains many images of each digit, the generative model would output roughly the same number of images of each digit. Some generative models can be parametrized, for example to generate only some kinds of outputs. An example of a generative autoencoder is the variational autoencoder.\n", "7. A generative adversarial network is a neural network architecture composed of two parts, the generator and the discriminator, which have opposing objectives. The generator's goal is to generate instances similar to those in the training set, to fool the discriminator. The discriminator must distinguish the real instances from the generated ones. At each training iteration, the discriminator is trained like a normal binary classifier, then the generator is trained to maximize the discriminator's error. GANs are used for advanced image processing tasks such as super resolution, colorization, image editing (replacing objects with realistic background), turning a simple sketch into a photorealistic image, or predicting the next frames in a video. They are also used to augment a dataset (to train other models), to generate other types of data (such as text, audio, and time series), and to identify the weaknesses in other models and strengthen them.\n", "8. Training GANs is notoriously difficult, because of the complex dynamics between the generator and the discriminator. 
The biggest difficulty is mode collapse, where the generator produces outputs with very little diversity. Moreover, training can be terribly unstable: it may start out fine and then suddenly start oscillating or diverging, without any apparent reason. GANs are also very sensitive to the choice of hyperparameters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9.\n", "_Exercise: Try using a denoising autoencoder to pretrain an image classifier. You can use MNIST (the simplest option), or a more complex image dataset such as [CIFAR10](https://homl.info/122) if you want a bigger challenge. Regardless of the dataset you're using, follow these steps:_\n", "* Split the dataset into a training set and a test set. Train a deep denoising autoencoder on the full training set.\n", "* Check that the images are fairly well reconstructed. Visualize the images that most activate each neuron in the coding layer.\n", "* Build a classification DNN, reusing the lower layers of the autoencoder. Train it using only 500 images from the training set. Does it perform better with or without pretraining?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[X_train, y_train], [X_test, y_test] = tf.keras.datasets.cifar10.load_data()\n", "X_train = X_train / 255\n", "X_test = X_test / 255" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tf.random.set_seed(42)\n", "np.random.seed(42)\n", "\n", "denoising_encoder = tf.keras.Sequential([\n", "    tf.keras.layers.GaussianNoise(0.1, input_shape=[32, 32, 3]),\n", "    tf.keras.layers.Conv2D(32, kernel_size=3, padding=\"same\", activation=\"relu\"),\n", "    tf.keras.layers.MaxPool2D(),\n", "    tf.keras.layers.Flatten(),\n", "    tf.keras.layers.Dense(512, activation=\"relu\"),\n", "])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "denoising_encoder.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "denoising_decoder = tf.keras.Sequential([\n", "    tf.keras.layers.Dense(16 * 16 * 32, activation=\"relu\", input_shape=[512]),\n", "    tf.keras.layers.Reshape([16, 16, 32]),\n", "    tf.keras.layers.Conv2DTranspose(filters=3, kernel_size=3, strides=2,\n", "                                    padding=\"same\", activation=\"sigmoid\")\n", "])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "denoising_decoder.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "denoising_ae = tf.keras.Sequential([denoising_encoder, denoising_decoder])\n", "denoising_ae.compile(loss=\"binary_crossentropy\", optimizer=tf.keras.optimizers.Nadam(),\n", "                     metrics=[\"mse\"])\n", "history = denoising_ae.fit(X_train, X_train, epochs=10,\n", "                           validation_data=(X_test, X_test))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n_images = 5\n", "new_images = X_test[:n_images]\n", "new_images_noisy = new_images + np.random.randn(n_images, 32, 32, 3) * 0.1\n", "new_images_denoised = denoising_ae.predict(new_images_noisy)\n", "\n", "plt.figure(figsize=(6, n_images * 2))\n", "for index in range(n_images):\n", "    plt.subplot(n_images, 3, index * 3 + 1)\n", "    plt.imshow(new_images[index])\n", "    plt.axis('off')\n", "    if index == 0:\n", "        plt.title(\"Original\")\n", "    plt.subplot(n_images, 3, index * 3 + 2)\n", "    plt.imshow(new_images_noisy[index].clip(0., 1.))\n", "    plt.axis('off')\n", "    if index == 0:\n", "        plt.title(\"Noisy\")\n", "    plt.subplot(n_images, 3, index * 3 + 3)\n", "    plt.imshow(new_images_denoised[index])\n", "    plt.axis('off')\n", "    if index == 0:\n", "        plt.title(\"Denoised\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10.\n", "_Exercise: Train a variational autoencoder on the image dataset of your choice, and use it to generate images. Alternatively, you can try to find an unlabeled dataset that you are interested in and see if you can generate new samples._\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11.\n", "_Exercise: Train a DCGAN to tackle the image dataset of your choice, and use it to generate images. Add experience replay and see if this helps. 
Turn it into a conditional GAN where you can control the generated class._\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "nav_menu": { "height": "381px", "width": "453px" }, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }