diff --git a/17_autoencoders_and_gans.ipynb b/17_autoencoders_and_gans.ipynb index af967c2..40d6bbe 100644 --- a/17_autoencoders_and_gans.ipynb +++ b/17_autoencoders_and_gans.ipynb @@ -1411,110 +1411,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Exercise Solutions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Unsupervised pretraining" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's create a small neural network for MNIST classification:" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [], - "source": [ - "tf.random.set_seed(42)\n", - "np.random.seed(42)\n", - "\n", - "X_train_small = X_train[:500]\n", - "y_train_small = y_train[:500]\n", - "\n", - "classifier = keras.models.Sequential([\n", - " keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),\n", - " keras.layers.Conv2D(16, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", - " keras.layers.MaxPool2D(pool_size=2),\n", - " keras.layers.Conv2D(32, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", - " keras.layers.MaxPool2D(pool_size=2),\n", - " keras.layers.Conv2D(64, kernel_size=3, padding=\"SAME\", activation=\"selu\"),\n", - " keras.layers.MaxPool2D(pool_size=2),\n", - " keras.layers.Flatten(),\n", - " keras.layers.Dense(20, activation=\"selu\"),\n", - " keras.layers.Dense(10, activation=\"softmax\")\n", - "])\n", - "classifier.compile(loss=\"sparse_categorical_crossentropy\", optimizer=keras.optimizers.SGD(lr=0.02),\n", - " metrics=[\"accuracy\"])\n", - "history = classifier.fit(X_train_small, y_train_small, epochs=20, validation_data=(X_valid, y_valid))" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "pd.DataFrame(history.history).plot()\n", - "plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "metadata": {}, - "outputs": [], - "source": [ - 
"tf.random.set_seed(42)\n", - "np.random.seed(42)\n", - "\n", - "conv_encoder_clone = keras.models.clone_model(conv_encoder)\n", - "\n", - "pretrained_clf = keras.models.Sequential([\n", - " conv_encoder_clone,\n", - " keras.layers.Flatten(),\n", - " keras.layers.Dense(20, activation=\"selu\"),\n", - " keras.layers.Dense(10, activation=\"softmax\")\n", - "])" - ] - }, - { - "cell_type": "code", - "execution_count": 70, - "metadata": {}, - "outputs": [], - "source": [ - "conv_encoder_clone.trainable = False\n", - "pretrained_clf.compile(loss=\"sparse_categorical_crossentropy\",\n", - " optimizer=keras.optimizers.SGD(lr=0.02),\n", - " metrics=[\"accuracy\"])\n", - "history = pretrained_clf.fit(X_train_small, y_train_small, epochs=30,\n", - " validation_data=(X_valid, y_valid))" - ] - }, - { - "cell_type": "code", - "execution_count": 71, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "conv_encoder_clone.trainable = True\n", - "pretrained_clf.compile(loss=\"sparse_categorical_crossentropy\",\n", - " optimizer=keras.optimizers.SGD(lr=0.02),\n", - " metrics=[\"accuracy\"])\n", - "history = pretrained_clf.fit(X_train_small, y_train_small, epochs=20,\n", - " validation_data=(X_valid, y_valid))" + "# Extra Material" ] }, { @@ -1524,9 +1421,36 @@ "## Hashing Using a Binary Autoencoder" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's load the Fashion MNIST dataset again:" + ] + }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n", + "X_train_full = X_train_full.astype(np.float32) / 255\n", + "X_test = X_test.astype(np.float32) / 255\n", + "X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n", + "y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's train 
an autoencoder where the encoder has a 16-neuron output layer, using the sigmoid activation function, and heavy Gaussian noise just before it. During training, the noise layer will encourage the previous layer to learn to output large values, since small values will just be crushed by the noise. In turn, this means that the output layer will output values close to 0 or 1, thanks to the sigmoid activation function. Once we round the output values to 0s and 1s, we get a 16-bit \"semantic\" hash. If everything works well, images that look alike will have the same hash. This can be very useful for search engines: for example, if we store each image on a server identified by the image's semantic hash, then all similar images will end up on the same server. Users of the search engine can then provide an image to search for, and the search engine will compute the image's hash using the encoder, and quickly return all the images on the server identified by that hash." + ] + }, + { + "cell_type": "code", + "execution_count": 68, "metadata": {}, "outputs": [], "source": [ @@ -1545,15 +1469,22 @@ " keras.layers.Reshape([28, 28])\n", "])\n", "hashing_ae = keras.models.Sequential([hashing_encoder, hashing_decoder])\n", - "hashing_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.SGD(lr=1.0),\n", + "hashing_ae.compile(loss=\"binary_crossentropy\", optimizer=keras.optimizers.Nadam(),\n", " metrics=[rounded_accuracy])\n", "history = hashing_ae.fit(X_train, X_train, epochs=10,\n", " validation_data=(X_valid, X_valid))" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The autoencoder compresses the information so much (down to 16 bits!) 
that it's quite lossy, but that's okay since we're using it to produce semantic hashes, not to perfectly reconstruct the images:" + ] + }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 69, "metadata": {}, "outputs": [], "source": [ @@ -1561,9 +1492,16 @@ "plt.show()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that the outputs are indeed very close to 0 or 1 (left graph):" + ] + }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 70, "metadata": {}, "outputs": [], "source": [ @@ -1571,9 +1509,16 @@ "plt.show()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's see what the hashes look like for the first few images in the validation set:" + ] + }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 71, "metadata": {}, "outputs": [], "source": [ @@ -1585,21 +1530,31 @@ "print(\"...\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's find the most common image hashes in the validation set, and display a few images for each hash. 
In the following image, all the images on a given row have the same hash:" + ] + }, { "cell_type": "code", - "execution_count": 76, - "metadata": { - "scrolled": true - }, + "execution_count": 72, + "metadata": {}, "outputs": [], "source": [ - "n_bits = 4\n", + "from collections import Counter\n", + "\n", + "n_hashes = 10\n", "n_images = 8\n", - "plt.figure(figsize=(n_images, n_bits))\n", - "for bit_index in range(n_bits):\n", - " in_bucket = (hashes & 2**bit_index != 0)\n", - " for index, image in zip(range(n_images), X_valid[in_bucket]):\n", - " plt.subplot(n_bits, n_images, bit_index * n_images + index + 1)\n", + "\n", + "top_hashes = Counter(hashes).most_common(n_hashes)\n", + "\n", + "plt.figure(figsize=(n_images, n_hashes))\n", + "for hash_index, (image_hash, hash_count) in enumerate(top_hashes):\n", + " indices = (hashes == image_hash)\n", + " for index, image in enumerate(X_valid[indices][:n_images]):\n", + " plt.subplot(n_hashes, n_images, hash_index * n_images + index + 1)\n", " plt.imshow(image, cmap=\"binary\")\n", " plt.axis(\"off\")" ] @@ -1633,7 +1588,7 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 73, "metadata": {}, "outputs": [], "source": [ @@ -1644,7 +1599,7 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 74, "metadata": {}, "outputs": [], "source": [ @@ -1662,7 +1617,7 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 75, "metadata": {}, "outputs": [], "source": [ @@ -1671,7 +1626,7 @@ }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 76, "metadata": {}, "outputs": [], "source": [ @@ -1685,7 +1640,7 @@ }, { "cell_type": "code", - "execution_count": 81, + "execution_count": 77, "metadata": {}, "outputs": [], "source": [ @@ -1694,7 +1649,7 @@ }, { "cell_type": "code", - "execution_count": 82, + "execution_count": 78, "metadata": {}, "outputs": [], "source": [ @@ -1707,7 +1662,7 @@ }, { "cell_type": "code", - "execution_count": 83, + 
"execution_count": 79, "metadata": {}, "outputs": [], "source": [