{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 13: Deep Computer Vision Using Convolutional Neural Networks**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 13._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml2/blob/master/14_deep_computer_vision_with_cnns.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml2/blob/master/14_deep_computer_vision_with_cnns.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Python ≥3.7 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 7)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"IS_KAGGLE = \"kaggle_secrets\" in sys.modules\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"from pathlib import Path\n",
"\n",
"# Scikit-Learn ≥1.0 is required\n",
"import sklearn\n",
"from packaging import version  # compare versions numerically, not as strings\n",
"assert version.parse(sklearn.__version__) >= version.parse(\"1.0\")\n",
"\n",
"# TensorFlow ≥2.6 is required\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"assert version.parse(tf.__version__) >= version.parse(\"2.6\")\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"if not tf.config.list_physical_devices('GPU'):\n",
"    print(\"No GPU was detected. Neural nets can be very slow without a GPU.\")\n",
"    if IS_COLAB:\n",
"        print(\"Go to Runtime > Change runtime type and select a GPU hardware accelerator.\")\n",
"    if IS_KAGGLE:\n",
"        print(\"Go to Settings > Accelerator and select GPU.\")\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"IMAGES_PATH = Path() / \"images\" / \"cnn\"\n",
"IMAGES_PATH.mkdir(parents=True, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
"    path = IMAGES_PATH / f\"{fig_id}.{fig_extension}\"\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A couple of utility functions to plot grayscale and RGB images:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def plot_image(image):\n",
"    plt.imshow(image, cmap=\"gray\", interpolation=\"nearest\")\n",
"    plt.axis(\"off\")\n",
"\n",
"def plot_color_image(image):\n",
"    plt.imshow(image, interpolation=\"nearest\")\n",
"    plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is a Convolution?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_sample_image\n",
"\n",
"# Load sample images\n",
"china = load_sample_image(\"china.jpg\") / 255\n",
"flower = load_sample_image(\"flower.jpg\") / 255\n",
"images = np.array([china, flower])\n",
"batch_size, height, width, channels = images.shape\n",
"\n",
"# Create 2 filters\n",
"filters = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)\n",
"filters[:, 3, :, 0] = 1 # vertical line\n",
"filters[3, :, :, 1] = 1 # horizontal line\n",
"\n",
"outputs = tf.nn.conv2d(images, filters, strides=1, padding=\"SAME\")\n",
"\n",
"plt.imshow(outputs[0, :, :, 1], cmap=\"gray\") # plot 1st image's 2nd feature map\n",
"plt.axis(\"off\") # Not shown in the book\n",
"plt.show()"
]
},
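{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the computation above concrete, here is a minimal NumPy sketch of what `tf.nn.conv2d()` computes for a single image with `strides=1` and `\"SAME\"` zero padding, assuming an odd kernel size. Like TensorFlow, it computes a cross-correlation: the kernel is not flipped, and each output value sums over the kernel window and all input channels. `conv2d_same()` is a hypothetical helper written for illustration only, and it is far slower than the TensorFlow op.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def conv2d_same(image, kernel):\n",
"    # image: (height, width, in_channels); kernel: (k, k, in_channels, out_channels), k odd\n",
"    k = kernel.shape[0]\n",
"    pad = k // 2\n",
"    # zero-pad both spatial dimensions so the output keeps the input's height and width\n",
"    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))\n",
"    height, width, _ = image.shape\n",
"    out = np.zeros((height, width, kernel.shape[3]))\n",
"    for i in range(height):\n",
"        for j in range(width):\n",
"            window = padded[i:i + k, j:j + k, :]\n",
"            # sum over the k x k window and all input channels, for every filter\n",
"            out[i, j, :] = np.tensordot(window, kernel, axes=3)\n",
"    return out\n",
"\n",
"# a 5x5 one-channel image containing a vertical line in column 2\n",
"image = np.zeros((5, 5, 1))\n",
"image[:, 2, 0] = 1\n",
"# a 3x3 vertical-line filter, similar to the 7x7 one defined above\n",
"kernel = np.zeros((3, 3, 1, 1))\n",
"kernel[:, 1, 0, 0] = 1\n",
"out = conv2d_same(image, kernel)\n",
"print(out[:, 2, 0])  # [2. 3. 3. 3. 2.]\n",
"```\n",
"\n",
"Only column 2 of the output is non-zero. The filter responds where the vertical line is, and the top and bottom rows get smaller values because part of their window falls on the zero padding."
]
},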
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"for image_index in (0, 1):\n",
"    for feature_map_index in (0, 1):\n",
"        plt.subplot(2, 2, image_index * 2 + feature_map_index + 1)\n",
"        plot_image(outputs[image_index, :, :, feature_map_index])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def crop(images):\n",
"    return images[150:220, 130:250]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(images[0, :, :, 0]))\n",
"save_fig(\"china_original\", tight_layout=False)\n",
"plt.show()\n",
"\n",
"for feature_map_index, filename in enumerate([\"china_vertical\", \"china_horizontal\"]):\n",
"    plot_image(crop(outputs[0, :, :, feature_map_index]))\n",
"    save_fig(filename, tight_layout=False)\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"plot_image(filters[:, :, 0, 0])\n",
"plt.show()\n",
"plot_image(filters[:, :, 0, 1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create a 2D convolutional layer, using `keras.layers.Conv2D()`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"conv = keras.layers.Conv2D(filters=2, kernel_size=7, strides=1,\n",
"                           padding=\"SAME\", activation=\"relu\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's call this layer, passing it the two test images:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"conv_outputs = conv(images)\n",
"conv_outputs.shape "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output is a 4D tensor. The dimensions are: batch size, height, width, channels. The first dimension (batch size) is 2 since there are 2 input images. The next two dimensions are the height and width of the output feature maps: since `padding=\"SAME\"` and `strides=1`, the output feature maps have the same height and width as the input images (in this case, 427×640). Lastly, this convolutional layer has 2 filters, so the last dimension is 2: there are 2 output feature maps per input image."
]
},
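{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a quick worked check of these numbers and of the layer's size. With `\"SAME\"` padding, each spatial dimension becomes ceil(input size / strides), and each of the 2 filters has 7 × 7 × 3 weights plus one bias term. The pure-Python helper `conv2d_output_and_params()` is hypothetical, written only for this illustration.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def conv2d_output_and_params(height, width, in_channels, filters, kernel_size, strides=1):\n",
"    # 'SAME' padding: each spatial dimension becomes ceil(size / strides)\n",
"    out_height = math.ceil(height / strides)\n",
"    out_width = math.ceil(width / strides)\n",
"    # each filter has kernel_size x kernel_size x in_channels weights, plus one bias\n",
"    n_params = (kernel_size * kernel_size * in_channels + 1) * filters\n",
"    return (out_height, out_width, filters), n_params\n",
"\n",
"shape, n_params = conv2d_output_and_params(427, 640, 3, filters=2, kernel_size=7)\n",
"print(shape, n_params)  # (427, 640, 2) 296\n",
"```\n",
"\n",
"With `strides=2`, the same call would give 214 × 320 feature maps. The parameter count would not change, since the stride does not affect the kernel's size."
]
},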
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the filters are initialized randomly, they'll initially detect random patterns. Let's take a look at the 2 output feature maps for each image:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10,6))\n",
"for image_index in (0, 1):\n",
"    for feature_map_index in (0, 1):\n",
"        plt.subplot(2, 2, image_index * 2 + feature_map_index + 1)\n",
"        plot_image(crop(conv_outputs[image_index, :, :, feature_map_index]))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Although the filters were initialized randomly, the second filter happens to act like an edge detector. Randomly initialized filters often act this way, which is fortunate, since edge detection is very useful in image processing."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want, we can set the filters to the ones we defined manually earlier and set the biases to zero. In practice, we will almost never need to set filters or biases manually, since the convolutional layer learns appropriate filters and biases during training:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"conv.set_weights([filters, np.zeros(2)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's call this layer again on the same two images, and let's check that the output feature maps do highlight vertical lines and horizontal lines, respectively (as earlier):"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"conv_outputs = conv(images)\n",
"conv_outputs.shape "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(10,6))\n",
"for image_index in (0, 1):\n",
"    for feature_map_index in (0, 1):\n",
"        plt.subplot(2, 2, image_index * 2 + feature_map_index + 1)\n",
"        plot_image(crop(conv_outputs[image_index, :, :, feature_map_index]))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## VALID vs SAME padding"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def feature_map_size(input_size, kernel_size, strides=1, padding=\"SAME\"):\n",
"    if padding == \"SAME\":\n",
"        return (input_size - 1) // strides + 1\n",
"    else:\n",
"        return (input_size - kernel_size) // strides + 1"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def pad_before_and_padded_size(input_size, kernel_size, strides=1):\n",
"    fmap_size = feature_map_size(input_size, kernel_size, strides)\n",
"    padded_size = max((fmap_size - 1) * strides + kernel_size, input_size)\n",
"    pad_before = (padded_size - input_size) // 2\n",
"    return pad_before, padded_size"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def manual_same_padding(images, kernel_size, strides=1):\n",
"    if kernel_size == 1:\n",
"        return images.astype(np.float32)\n",
"    batch_size, height, width, channels = images.shape\n",
"    top_pad, padded_height = pad_before_and_padded_size(height, kernel_size, strides)\n",
"    left_pad, padded_width = pad_before_and_padded_size(width, kernel_size, strides)\n",
"    padded_shape = [batch_size, padded_height, padded_width, channels]\n",
"    padded_images = np.zeros(padded_shape, dtype=np.float32)\n",
"    padded_images[:, top_pad:height+top_pad, left_pad:width+left_pad, :] = images\n",
"    return padded_images"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `\"SAME\"` padding is equivalent to padding manually using `manual_same_padding()` then using `\"VALID\"` padding (confusingly, `\"VALID\"` padding means no padding at all):"
]
},
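{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked example for the 427 × 640 images with `kernel_size=7` and `strides=2` used below, this standalone sketch repeats the formulas of `feature_map_size()` and `pad_before_and_padded_size()`. It also reports the padding added after the image: any odd leftover row or column goes at the bottom or right, because the padding before is rounded down. The names `same_padding()` and `valid_output_size()` are hypothetical.\n",
"\n",
"```python\n",
"def same_padding(input_size, kernel_size, strides):\n",
"    # output size with 'SAME' padding is ceil(input_size / strides)\n",
"    out_size = (input_size - 1) // strides + 1\n",
"    pad_total = max((out_size - 1) * strides + kernel_size - input_size, 0)\n",
"    pad_before = pad_total // 2\n",
"    pad_after = pad_total - pad_before  # gets the extra row/column if pad_total is odd\n",
"    return out_size, pad_before, pad_after\n",
"\n",
"def valid_output_size(input_size, kernel_size, strides):\n",
"    # 'VALID' means no padding: only windows that fit entirely inside the input\n",
"    return (input_size - kernel_size) // strides + 1\n",
"\n",
"for size in (427, 640):\n",
"    print(size, same_padding(size, 7, 2), valid_output_size(size, 7, 2))\n",
"# 427 (214, 3, 3) 211\n",
"# 640 (320, 2, 3) 317\n",
"```"
]
},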
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"kernel_size = 7\n",
"strides = 2\n",
"\n",
"conv_valid = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"VALID\")\n",
"conv_same = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"SAME\")\n",
"\n",
"valid_output = conv_valid(manual_same_padding(images, kernel_size, strides))\n",
"\n",
"# Need to call build() so conv_same's weights get created\n",
"conv_same.build(tf.TensorShape(images.shape))\n",
"\n",
"# Copy the weights from conv_valid to conv_same\n",
"conv_same.set_weights(conv_valid.get_weights())\n",
"\n",
"same_output = conv_same(images.astype(np.float32))\n",
"\n",
"assert np.allclose(valid_output.numpy(), same_output.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pooling layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Max pooling"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"max_pool = keras.layers.MaxPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"cropped_images = np.array([crop(image) for image in images], dtype=np.float32)\n",
"output = max_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"save_fig(\"china_max_pooling\")\n",
"plt.show()"
]
},
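{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal NumPy sketch of what `MaxPool2D(pool_size=2)` does to one feature map. By default, its `strides` equal `pool_size` and it uses `\"valid\"` padding, so it keeps the largest value in each non-overlapping 2 × 2 window and halves the height and width. The reshape trick assumes the height and width are divisible by the pool size, and `max_pool_2d()` is a hypothetical helper.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def max_pool_2d(feature_map, pool_size=2):\n",
"    # feature_map: (height, width), both assumed divisible by pool_size\n",
"    h, w = feature_map.shape\n",
"    windows = feature_map.reshape(h // pool_size, pool_size, w // pool_size, pool_size)\n",
"    # keep the largest value in each pool_size x pool_size window\n",
"    return windows.max(axis=(1, 3))\n",
"\n",
"fmap = np.array([[1, 3, 2, 0],\n",
"                 [4, 2, 1, 1],\n",
"                 [0, 1, 5, 6],\n",
"                 [2, 2, 7, 8]])\n",
"print(max_pool_2d(fmap).tolist())  # [[4, 2], [2, 8]]\n",
"```\n",
"\n",
"Applied to the 70 × 120 crops above, the max pooling layer produces 35 × 60 outputs, with each of the 3 color channels pooled independently."
]
},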
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Depth-wise pooling"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"class DepthMaxPool(keras.layers.Layer):\n",
"    def __init__(self, pool_size, strides=None, padding=\"VALID\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        if strides is None:\n",
"            strides = pool_size\n",
"        self.pool_size = pool_size\n",
"        self.strides = strides\n",
"        self.padding = padding\n",
"\n",
"    def call(self, inputs):\n",
"        # pool along the channel (depth) axis only\n",
"        return tf.nn.max_pool(inputs,\n",
"                              ksize=(1, 1, 1, self.pool_size),\n",
"                              strides=(1, 1, 1, self.strides),\n",
"                              padding=self.padding)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = DepthMaxPool(3)\n",
"with tf.device(\"/cpu:0\"):  # there is no GPU kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or just use a `Lambda` layer:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = keras.layers.Lambda(lambda X: tf.nn.max_pool(\n",
"    X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3), padding=\"VALID\"))\n",
"with tf.device(\"/cpu:0\"):  # there is no GPU kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 8))\n",
"plt.subplot(1, 2, 1)\n",
"plt.title(\"Input\", fontsize=14)\n",
"plot_color_image(cropped_images[0]) # plot the 1st image\n",
"plt.subplot(1, 2, 2)\n",
"plt.title(\"Output\", fontsize=14)\n",
"plot_image(depth_output[0, ..., 0]) # plot the output for the 1st image\n",
"plt.axis(\"off\")\n",
"plt.show()"
]
},
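{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depth-wise max pooling can also be sketched in NumPy. It groups consecutive channels into blocks of `pool_size` and keeps the maximum of each block, leaving the spatial dimensions untouched. This assumes the number of channels is divisible by `pool_size`, and `depth_max_pool()` is a hypothetical name.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def depth_max_pool(images, pool_size):\n",
"    # images: (batch, height, width, channels); channels assumed divisible by pool_size\n",
"    batch, height, width, channels = images.shape\n",
"    grouped = images.reshape(batch, height, width, channels // pool_size, pool_size)\n",
"    # max over each group of pool_size consecutive channels\n",
"    return grouped.max(axis=-1)\n",
"\n",
"pixels = np.array([[[[0.2, 0.9, 0.4],\n",
"                     [0.7, 0.1, 0.3]]]])  # shape (1, 1, 2, 3): two RGB pixels\n",
"result = depth_max_pool(pixels, pool_size=3)\n",
"print(result.shape, result[0, 0, :, 0].tolist())  # (1, 1, 2, 1) [0.9, 0.7]\n",
"```\n",
"\n",
"With the RGB crops above and `pool_size=3`, the output has a single channel holding each pixel's largest color value. This is why the result is plotted as a grayscale image."
]
},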
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Average pooling"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"avg_pool = keras.layers.AvgPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"output_avg = avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output_avg[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Global Average Pooling"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"global_avg_pool = keras.layers.GlobalAvgPool2D()\n",
"global_avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"output_global_avg2 = keras.layers.Lambda(lambda X: tf.reduce_mean(X, axis=[1, 2]))\n",
"output_global_avg2(cropped_images)"
]
},
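{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both averaging layers can be sketched the same way in NumPy. `AvgPool2D(pool_size=2)` averages each non-overlapping 2 × 2 window. `GlobalAvgPool2D` averages over the whole height and width, leaving one value per channel, which is what the `tf.reduce_mean(X, axis=[1, 2])` version computes. Both helper names are hypothetical, and the window reshape assumes sizes divisible by the pool size.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def avg_pool_2d(images, pool_size=2):\n",
"    # images: (batch, height, width, channels), height and width divisible by pool_size\n",
"    b, h, w, c = images.shape\n",
"    windows = images.reshape(b, h // pool_size, pool_size, w // pool_size, pool_size, c)\n",
"    return windows.mean(axis=(2, 4))\n",
"\n",
"def global_avg_pool(images):\n",
"    # average over the spatial axes: (batch, height, width, channels) -> (batch, channels)\n",
"    return images.mean(axis=(1, 2))\n",
"\n",
"x = np.arange(16, dtype=float).reshape(1, 4, 4, 1)\n",
"print(avg_pool_2d(x)[0, :, :, 0].tolist())  # [[2.5, 4.5], [10.5, 12.5]]\n",
"print(global_avg_pool(x).tolist())          # [[7.5]]\n",
"```"
]
},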
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tackling Fashion MNIST With a CNN"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_mean = X_train.mean(axis=0, keepdims=True)\n",
"X_std = X_train.std(axis=0, keepdims=True) + 1e-7\n",
"X_train = (X_train - X_mean) / X_std\n",
"X_valid = (X_valid - X_mean) / X_std\n",
"X_test = (X_test - X_mean) / X_std\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"from functools import partial\n",
"\n",
"DefaultConv2D = partial(keras.layers.Conv2D,\n",
"                        kernel_size=3, activation='relu', padding=\"SAME\")\n",
"\n",
"model = keras.models.Sequential([\n",
"    DefaultConv2D(filters=64, kernel_size=7, input_shape=[28, 28, 1]),\n",
"    keras.layers.MaxPooling2D(pool_size=2),\n",
"    DefaultConv2D(filters=128),\n",
"    DefaultConv2D(filters=128),\n",
"    keras.layers.MaxPooling2D(pool_size=2),\n",
"    DefaultConv2D(filters=256),\n",
"    DefaultConv2D(filters=256),\n",
"    keras.layers.MaxPooling2D(pool_size=2),\n",
"    keras.layers.Flatten(),\n",
"    keras.layers.Dense(units=128, activation='relu'),\n",
"    keras.layers.Dropout(0.5),\n",
"    keras.layers.Dense(units=64, activation='relu'),\n",
"    keras.layers.Dropout(0.5),\n",
"    keras.layers.Dense(units=10, activation='softmax'),\n",
"])"
]
},
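{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tracing the shapes through this model is a useful sanity check. The `\"SAME\"` convolutions keep the spatial size. Each `MaxPooling2D(pool_size=2)` uses `\"valid\"` padding by default and divides the size by 2, rounding down: 28 → 14 → 7 → 3. So `Flatten` outputs 3 × 3 × 256 = 2,304 features. This pure-Python sketch mirrors the layer list above by hand (the `layers` list is a hypothetical encoding of it) and also totals the convolutional layers' parameters.\n",
"\n",
"```python\n",
"# hypothetical encoding of the conv/pool layers above:\n",
"# (filters, kernel_size) for a 'SAME' convolution, 'pool' for 2x2 max pooling\n",
"layers = [(64, 7), 'pool', (128, 3), (128, 3), 'pool', (256, 3), (256, 3), 'pool']\n",
"\n",
"size, channels, conv_params = 28, 1, 0\n",
"for layer in layers:\n",
"    if layer == 'pool':\n",
"        size //= 2  # 'valid' pooling with stride 2 rounds down\n",
"    else:\n",
"        filters, kernel_size = layer\n",
"        conv_params += (kernel_size * kernel_size * channels + 1) * filters\n",
"        channels = filters  # 'SAME' padding keeps the spatial size\n",
"\n",
"flattened = size * size * channels\n",
"print(size, channels, flattened)  # 3 256 2304\n",
"print(conv_params)  # 1109888\n",
"```"
]
},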
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
"history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"score = model.evaluate(X_test, y_test)\n",
"X_new = X_test[:10] # pretend we have new images\n",
"y_pred = model.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ResNet-34"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, strides=1,\n",
"                        padding=\"SAME\", use_bias=False)\n",
"\n",
"class ResidualUnit(keras.layers.Layer):\n",
"    def __init__(self, filters, strides=1, activation=\"relu\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.activation = keras.activations.get(activation)\n",
"        self.main_layers = [\n",
"            DefaultConv2D(filters, strides=strides),\n",
"            keras.layers.BatchNormalization(),\n",
"            self.activation,\n",
"            DefaultConv2D(filters),\n",
"            keras.layers.BatchNormalization()]\n",
"        self.skip_layers = []\n",
"        if strides > 1:\n",
"            self.skip_layers = [\n",
"                DefaultConv2D(filters, kernel_size=1, strides=strides),\n",
"                keras.layers.BatchNormalization()]\n",
"\n",
"    def call(self, inputs):\n",
"        Z = inputs\n",
"        for layer in self.main_layers:\n",
"            Z = layer(Z)\n",
"        skip_Z = inputs\n",
"        for layer in self.skip_layers:\n",
"            skip_Z = layer(skip_Z)\n",
"        return self.activation(Z + skip_Z)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential()\n",
"model.add(DefaultConv2D(64, kernel_size=7, strides=2,\n",
"                        input_shape=[224, 224, 3]))\n",
"model.add(keras.layers.BatchNormalization())\n",
"model.add(keras.layers.Activation(\"relu\"))\n",
"model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding=\"SAME\"))\n",
"prev_filters = 64\n",
"for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:\n",
"    strides = 1 if filters == prev_filters else 2\n",
"    model.add(ResidualUnit(filters, strides=strides))\n",
"    prev_filters = filters\n",
"model.add(keras.layers.GlobalAvgPool2D())\n",
"model.add(keras.layers.Flatten())\n",
"model.add(keras.layers.Dense(10, activation=\"softmax\"))"
]
},
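{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the loop above, this pure-Python sketch traces the feature map size and counts the weighted layers. The 7 × 7 convolution and the max pooling layer (both with stride 2 and `\"SAME\"` padding) take 224 down to 56. The first residual unit of each group with more filters then halves it again: 56 → 28 → 14 → 7. The count of 34 covers the first convolution, the two convolutions in each of the 16 residual units, and the final dense layer. The 1 × 1 skip convolutions are not counted; this sketch assumes that convention, since the notebook does not spell it out.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"size = 224\n",
"size = math.ceil(size / 2)  # 7x7 conv, strides=2, 'SAME' padding -> 112\n",
"size = math.ceil(size / 2)  # 3x3 max pool, strides=2, 'SAME' padding -> 56\n",
"\n",
"prev_filters = 64\n",
"weighted_layers = 1  # the initial 7x7 convolution\n",
"for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:\n",
"    strides = 1 if filters == prev_filters else 2\n",
"    size = math.ceil(size / strides)\n",
"    weighted_layers += 2  # two 3x3 convolutions in the main path\n",
"    prev_filters = filters\n",
"weighted_layers += 1  # the final Dense layer\n",
"\n",
"print(size, prev_filters, weighted_layers)  # 7 512 34\n",
"```\n",
"\n",
"`GlobalAvgPool2D` then reduces the final 7 × 7 × 512 feature maps to 512 values, so the `Flatten` layer after it does not change anything."
]
},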
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using a Pretrained Model"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"model = keras.applications.resnet50.ResNet50(weights=\"imagenet\")"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize(images, [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_pad(images, 224, 224, antialias=True)\n",
"plot_color_image(images_resized[0])"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_crop_or_pad(images, 224, 224)\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"china_box = [0, 0.03, 1, 0.68]\n",
"flower_box = [0.19, 0.26, 0.86, 0.7]\n",
"images_resized = tf.image.crop_and_resize(images, [china_box, flower_box], [0, 1], [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()\n",
"plot_color_image(images_resized[1])\n",
"plt.show()"
]
},
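{
"cell_type": "markdown",
"metadata": {},
"source": [
"The boxes passed to `tf.image.crop_and_resize()` are `[y1, x1, y2, x2]` in normalized coordinates. According to the TensorFlow documentation, a normalized value y maps to the pixel coordinate y × (image_height - 1), and likewise for x. This sketch converts the two boxes above to pixel coordinates. It assumes both sample images are 427 × 640, the size stated earlier, and `box_to_pixels()` is a hypothetical helper.\n",
"\n",
"```python\n",
"def box_to_pixels(box, height, width):\n",
"    # box = [y1, x1, y2, x2] in normalized coordinates (0 to 1)\n",
"    y1, x1, y2, x2 = box\n",
"    # a normalized coordinate c maps to pixel coordinate c * (size - 1)\n",
"    return [y1 * (height - 1), x1 * (width - 1), y2 * (height - 1), x2 * (width - 1)]\n",
"\n",
"china_box = [0, 0.03, 1, 0.68]\n",
"flower_box = [0.19, 0.26, 0.86, 0.7]\n",
"print(box_to_pixels(china_box, 427, 640))\n",
"print(box_to_pixels(flower_box, 427, 640))\n",
"```\n",
"\n",
"So the china crop keeps the full height and roughly columns 19 to 435. The flower crop spans roughly rows 81 to 366 and columns 166 to 447."
]
},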
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"inputs = keras.applications.resnet50.preprocess_input(images_resized * 255)\n",
"Y_proba = model.predict(inputs)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"Y_proba.shape"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"top_K = keras.applications.resnet50.decode_predictions(Y_proba, top=3)\n",
"for image_index in range(len(images)):\n",
"    print(\"Image #{}\".format(image_index))\n",
"    for class_id, name, y_proba in top_K[image_index]:\n",
"        print(\"  {} - {:12s} {:.2f}%\".format(class_id, name, y_proba * 100))\n",
"    print()"
]
},
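{
"cell_type": "markdown",
"metadata": {},
"source": [
"`decode_predictions()` picks the `top` highest-probability classes for each image and maps each class index to an ImageNet class ID and name. The index-selection part can be sketched with NumPy's `argsort()`. The lookup of IDs and names is left out here, and `top_k_indices()` is a hypothetical helper.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def top_k_indices(y_proba, k=3):\n",
"    # sort each row by ascending probability, keep the last k, highest first\n",
"    return np.argsort(y_proba, axis=1)[:, -k:][:, ::-1]\n",
"\n",
"y_proba = np.array([[0.1, 0.6, 0.05, 0.25],\n",
"                    [0.3, 0.1, 0.4, 0.2]])\n",
"print(top_k_indices(y_proba, k=3).tolist())  # [[1, 3, 0], [2, 0, 3]]\n",
"```"
]
},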
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pretrained Models for Transfer Learning"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow_datasets as tfds\n",
"\n",
"dataset, info = tfds.load(\"tf_flowers\", as_supervised=True, with_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"info.splits"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"info.splits[\"train\"]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"class_names = info.features[\"label\"].names\n",
"class_names"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"n_classes = info.features[\"label\"].num_classes"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"dataset_size = info.splits[\"train\"].num_examples\n",
"dataset_size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:** TFDS's split API has evolved since the book was published. The [new split API](https://www.tensorflow.org/datasets/splits) (called S3) is much simpler to use:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"test_set_raw, valid_set_raw, train_set_raw = tfds.load(\n",
"    \"tf_flowers\",\n",
"    split=[\"train[:10%]\", \"train[10%:25%]\", \"train[25%:]\"],\n",
"    as_supervised=True)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 10))\n",
"index = 0\n",
"for image, label in train_set_raw.take(9):\n",
"    index += 1\n",
"    plt.subplot(3, 3, index)\n",
"    plt.imshow(image)\n",
"    plt.title(\"Class: {}\".format(class_names[label]))\n",
"    plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Basic preprocessing:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"def preprocess(image, label):\n",
"    resized_image = tf.image.resize(image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Slightly fancier preprocessing (but you could add much more data augmentation):"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"def central_crop(image):\n",
"    # Trim a quarter of the excess from each side of the longer dimension:\n",
"    # the result is closer to square, but not necessarily square\n",
"    shape = tf.shape(image)\n",
"    min_dim = tf.reduce_min([shape[0], shape[1]])\n",
"    top_crop = (shape[0] - min_dim) // 4\n",
"    bottom_crop = shape[0] - top_crop\n",
"    left_crop = (shape[1] - min_dim) // 4\n",
"    right_crop = shape[1] - left_crop\n",
"    return image[top_crop:bottom_crop, left_crop:right_crop]\n",
"\n",
"def random_crop(image):\n",
"    # random square crop whose side is 90% of the smaller dimension\n",
"    shape = tf.shape(image)\n",
"    min_dim = tf.reduce_min([shape[0], shape[1]]) * 90 // 100\n",
"    return tf.image.random_crop(image, [min_dim, min_dim, 3])\n",
"\n",
"def preprocess(image, label, randomize=False):\n",
"    if randomize:\n",
"        cropped_image = random_crop(image)\n",
"        cropped_image = tf.image.random_flip_left_right(cropped_image)\n",
"    else:\n",
"        cropped_image = central_crop(image)\n",
"    resized_image = tf.image.resize(cropped_image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label\n",
"\n",
"batch_size = 32\n",
"train_set = train_set_raw.shuffle(1000).repeat()\n",
"train_set = train_set.map(partial(preprocess, randomize=True)).batch(batch_size).prefetch(1)\n",
"valid_set = valid_set_raw.map(preprocess).batch(batch_size).prefetch(1)\n",
"test_set = test_set_raw.map(preprocess).batch(batch_size).prefetch(1)"
]
},
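{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `central_crop()` does not produce a square image. It trims only a quarter of the excess from each side of the longer dimension, so half of the excess remains. This pure-Python sketch mirrors its integer arithmetic for a 427 × 640 landscape photo (the size of the sample images used earlier). `central_crop_size()` is a hypothetical helper.\n",
"\n",
"```python\n",
"def central_crop_size(height, width):\n",
"    # same integer arithmetic as central_crop(), without TensorFlow\n",
"    min_dim = min(height, width)\n",
"    top_crop = (height - min_dim) // 4\n",
"    left_crop = (width - min_dim) // 4\n",
"    return height - 2 * top_crop, width - 2 * left_crop\n",
"\n",
"print(central_crop_size(427, 640))  # (427, 534)\n",
"```\n",
"\n",
"`tf.image.resize()` then stretches this 427 × 534 crop to 224 × 224, so the aspect ratio changes slightly."
]
},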
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in train_set.take(1):\n",
"    for index in range(9):\n",
"        plt.subplot(3, 3, index + 1)\n",
"        plt.imshow(X_batch[index] / 2 + 0.5)  # rescale from [-1, 1] to [0, 1] for display\n",
"        plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
"        plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in test_set.take(1):\n",
"    for index in range(9):\n",
"        plt.subplot(3, 3, index + 1)\n",
"        plt.imshow(X_batch[index] / 2 + 0.5)  # rescale from [-1, 1] to [0, 1] for display\n",
"        plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
"        plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
"                                                  include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"model = keras.models.Model(inputs=base_model.input, outputs=output)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"for index, layer in enumerate(base_model.layers):\n",
"    print(index, layer.name)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"for layer in base_model.layers:\n",
"    layer.trainable = False\n",
"\n",
"# `decay` is no longer accepted by the default optimizers in TF 2.11+;\n",
"# InverseTimeDecay computes the same lr0 / (1 + decay * step)\n",
"lr_schedule = keras.optimizers.schedules.InverseTimeDecay(\n",
"    initial_learning_rate=0.2, decay_steps=1, decay_rate=0.01)\n",
"optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
"              metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
"                    steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
"                    validation_data=valid_set,\n",
"                    validation_steps=int(0.15 * dataset_size / batch_size),\n",
"                    epochs=5)"
]
},
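{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `decay=0.01`, the learning rate after t training steps is 0.2 / (1 + 0.01 × t). This is how the legacy Keras optimizers apply `decay`, and it shrinks the learning rate quickly. The sketch below works through the numbers, assuming `dataset_size` is 3,670 (the size of the `tf_flowers` train split) and `batch_size` is 32 as above. `decayed_lr()` is a hypothetical helper.\n",
"\n",
"```python\n",
"def decayed_lr(initial_lr, decay, step):\n",
"    # inverse time decay: initial_lr / (1 + decay * step)\n",
"    return initial_lr / (1 + decay * step)\n",
"\n",
"dataset_size, batch_size = 3670, 32  # assumed values (see above)\n",
"steps_per_epoch = int(0.75 * dataset_size / batch_size)\n",
"for epoch in range(6):\n",
"    step = epoch * steps_per_epoch\n",
"    print(epoch, step, round(decayed_lr(0.2, 0.01, step), 4))\n",
"```\n",
"\n",
"Under these assumptions there are 86 steps per epoch. After the 5 epochs (430 steps), the learning rate is down to about 0.038, less than a fifth of its initial value."
]
},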
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"for layer in base_model.layers:\n",
"    layer.trainable = True\n",
"\n",
"# `decay` is no longer accepted by the default optimizers in TF 2.11+;\n",
"# InverseTimeDecay computes the same lr0 / (1 + decay * step)\n",
"lr_schedule = keras.optimizers.schedules.InverseTimeDecay(\n",
"    initial_learning_rate=0.01, decay_steps=1, decay_rate=0.001)\n",
"optimizer = keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9,\n",
"                                 nesterov=True)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
"              metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
"                    steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
"                    validation_data=valid_set,\n",
"                    validation_steps=int(0.15 * dataset_size / batch_size),\n",
"                    epochs=40)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classification and Localization"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
"                                                  include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"class_output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"loc_output = keras.layers.Dense(4)(avg)\n",
"model = keras.models.Model(inputs=base_model.input,\n",
"                           outputs=[class_output, loc_output])\n",
"# Use a fresh optimizer: one already used to train another model may fail\n",
"# to update this new model's variables (TF 2.11+)\n",
"optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)\n",
"model.compile(loss=[\"sparse_categorical_crossentropy\", \"mse\"],\n",
"              loss_weights=[0.8, 0.2],  # depends on what you care most about\n",
"              optimizer=optimizer, metrics=[\"accuracy\"])"
]
},
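{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `loss_weights=[0.8, 0.2]`, Keras minimizes the weighted sum 0.8 × (sparse categorical cross-entropy) + 0.2 × (bounding box MSE), with each loss averaged over the batch. The NumPy sketch below works this out for a single example. The probabilities and boxes are made-up values.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"class_proba = np.array([0.7, 0.2, 0.1])    # predicted class probabilities (made up)\n",
"true_class = 0\n",
"pred_box = np.array([0.1, 0.2, 0.5, 0.6])  # predicted bounding box (made up)\n",
"true_box = np.array([0.1, 0.3, 0.5, 0.4])\n",
"\n",
"ce_loss = -np.log(class_proba[true_class])     # sparse categorical cross-entropy\n",
"mse_loss = np.mean((pred_box - true_box) ** 2)  # mean squared error over the 4 box values\n",
"total_loss = 0.8 * ce_loss + 0.2 * mse_loss\n",
"print(round(ce_loss, 4), round(mse_loss, 4), round(total_loss, 4))\n",
"```\n",
"\n",
"Here the classification term dominates the total. Changing `loss_weights` shifts the balance between the two tasks."
]
},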
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"def add_random_bounding_boxes(images, labels):\n",
"    fake_bboxes = tf.random.uniform([tf.shape(images)[0], 4])\n",
"    return images, (labels, fake_bboxes)\n",
"\n",
"fake_train_set = train_set.take(5).repeat(2).map(add_random_bounding_boxes)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"model.fit(fake_train_set, steps_per_epoch=5, epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mean Average Precision (mAP)"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"def maximum_precisions(precisions):\n",
"    return np.flip(np.maximum.accumulate(np.flip(precisions)))"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"recalls = np.linspace(0, 1, 11)\n",
"\n",
"precisions = [0.91, 0.94, 0.96, 0.94, 0.95, 0.92, 0.80, 0.60, 0.45, 0.20, 0.10]\n",
"max_precisions = maximum_precisions(precisions)\n",
"mAP = max_precisions.mean()\n",
"plt.plot(recalls, precisions, \"ro--\", label=\"Precision\")\n",
"plt.plot(recalls, max_precisions, \"bo-\", label=\"Max Precision\")\n",
"plt.xlabel(\"Recall\")\n",
"plt.ylabel(\"Precision\")\n",
"plt.plot([0, 1], [mAP, mAP], \"g:\", linewidth=3, label=\"mAP\")\n",
"plt.grid(True)\n",
"plt.axis([0, 1, 0, 1])\n",
"plt.legend(loc=\"lower center\", fontsize=14)\n",
"plt.show()"
]
},
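{
"cell_type": "markdown",
"metadata": {},
"source": [
"`maximum_precisions()` replaces each precision with the highest precision reached at that recall or any higher recall. The dotted line is the mean of these 11 interpolated precisions. Strictly speaking, this is the average precision for a single class; mAP is the mean of this value over all classes. Here is the same computation in pure Python, without `np.maximum.accumulate()`:\n",
"\n",
"```python\n",
"precisions = [0.91, 0.94, 0.96, 0.94, 0.95, 0.92, 0.80, 0.60, 0.45, 0.20, 0.10]\n",
"\n",
"# scan from the highest recall down, keeping the running maximum\n",
"max_precisions = []\n",
"running_max = 0.0\n",
"for p in reversed(precisions):\n",
"    running_max = max(running_max, p)\n",
"    max_precisions.append(running_max)\n",
"max_precisions.reverse()\n",
"\n",
"average_precision = sum(max_precisions) / len(max_precisions)\n",
"print(max_precisions)\n",
"print(round(average_precision, 4))  # 0.7136\n",
"```"
]
},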
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transpose convolutions:"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"X = images_resized.numpy()\n",
"\n",
"conv_transpose = keras.layers.Conv2DTranspose(filters=5, kernel_size=3, strides=2, padding=\"VALID\")\n",
"output = conv_transpose(X)\n",
"output.shape"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"def normalize(X):\n",
"    return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[1, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(normalize(output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"def upscale_images(images, stride, kernel_size):\n",
"    batch_size, height, width, channels = images.shape\n",
"    upscaled = np.zeros((batch_size,\n",
"                         (height - 1) * stride + 2 * kernel_size - 1,\n",
"                         (width - 1) * stride + 2 * kernel_size - 1,\n",
"                         channels))\n",
"    upscaled[:,\n",
"             kernel_size - 1:(height - 1) * stride + kernel_size:stride,\n",
"             kernel_size - 1:(width - 1) * stride + kernel_size:stride,\n",
"             :] = images\n",
"    return upscaled"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
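"# Rebuild the transposed convolution by hand: zero-insertion upscaling, then a regular convolution\n",
"upscaled = upscale_images(X, stride=2, kernel_size=3)\n",
"weights, biases = conv_transpose.weights  # the biases are still zero-initialized, so they are ignored\n",
"# Conv2DTranspose kernels have shape [height, width, output_channels, input_channels]:\n",
"# flip them spatially and swap the last two axes to get regular conv2d filters\n",
"reversed_filters = np.flip(weights.numpy(), axis=(0, 1))\n",
"reversed_filters = np.transpose(reversed_filters, [0, 1, 3, 2])\n",
"manual_output = tf.nn.conv2d(upscaled, reversed_filters, strides=1, padding=\"VALID\")"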
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=3, width_ratios=[1, 2, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Upscaled\", fontsize=14)\n",
"ax2.imshow(upscaled[0], interpolation=\"bicubic\")\n",
"ax2.axis(\"off\")\n",
"ax3 = fig.add_subplot(gs[0, 2])\n",
"ax3.set_title(\"Output\", fontsize=14)\n",
"ax3.imshow(normalize(manual_output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax3.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"np.allclose(output, manual_output.numpy(), atol=1e-7)"
]
},
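{
"cell_type": "markdown",
"metadata": {},
"source": [
"The manual computation above relies on the fact that a transposed convolution is equivalent to inserting `stride - 1` zeros between the inputs, padding with `kernel_size - 1` zeros on each side, and running a regular (cross-correlation) convolution with the spatially flipped kernel. Here is a minimal 1-D NumPy sketch of that equivalence, assuming a single channel and no bias. The function names are hypothetical. The direct version adds a scaled copy of the kernel to the output for each input value, and the two results match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def transposed_conv1d_direct(x, kernel, stride):\n",
"    # Each input value adds a scaled copy of the kernel, shifted by `stride` per position\n",
"    k = len(kernel)\n",
"    out = np.zeros((len(x) - 1) * stride + k)\n",
"    for i, value in enumerate(x):\n",
"        out[i * stride:i * stride + k] += value * kernel\n",
"    return out\n",
"\n",
"def transposed_conv1d_via_upscaling(x, kernel, stride):\n",
"    # Insert stride - 1 zeros between inputs and pad k - 1 zeros on each side\n",
"    k = len(kernel)\n",
"    upscaled = np.zeros((len(x) - 1) * stride + 2 * k - 1)\n",
"    upscaled[k - 1:(len(x) - 1) * stride + k:stride] = x\n",
"    # Valid cross-correlation (what conv layers compute) with the flipped kernel\n",
"    flipped = kernel[::-1]\n",
"    return np.array([upscaled[j:j + k] @ flipped\n",
"                     for j in range(len(upscaled) - k + 1)])\n",
"\n",
"x_1d = np.array([1.0, 2.0, 3.0])\n",
"kernel_1d = np.array([1.0, 10.0, 100.0])\n",
"direct = transposed_conv1d_direct(x_1d, kernel_1d, stride=2)\n",
"via_upscaling = transposed_conv1d_via_upscaling(x_1d, kernel_1d, stride=2)\n",
"print(direct, via_upscaling, np.allclose(direct, via_upscaling))"
]
},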
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See appendix A."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. High-Accuracy CNN for MNIST\n",
"_Exercise: Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following model uses two convolutional layers, followed by a max-pooling layer, a flatten layer, 25% dropout, a dense layer, another dropout layer with 50% dropout, and finally the output layer. It reaches about 99.2% accuracy on the test set, which places it roughly in the top 20% of the [MNIST Kaggle competition](https://www.kaggle.com/c/digit-recognizer/) (ignoring the models with an accuracy above 99.79%, which were most likely trained on the test set, as Chris Deotte explains in [this post](https://www.kaggle.com/c/digit-recognizer/discussion/61480)). Can you do better? To reach 99.5% to 99.7% accuracy on the test set, you need to add image augmentation and batch normalization, use a learning schedule such as 1-cycle, and possibly create an ensemble."
]
},
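{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 1-cycle schedule mentioned above works in three phases. It increases the learning rate linearly from a starting value up to a maximum during about the first half of training, then decreases it linearly back down during the second half. It finishes with a steep linear drop over the last few iterations. Below is a minimal sketch of such a schedule as a plain function of the iteration number. The name `one_cycle_lr()` and its defaults are assumptions chosen for illustration, not tuned values: a starting rate of `max_rate / 10`, the last 10% of iterations for the final drop, and a final rate 1,000 times smaller than the starting rate. To use it during training, you could, for example, call it from a custom `keras.callbacks.Callback` at the start of each batch and set the optimizer's learning rate with `keras.backend.set_value()`; that wiring is not shown here."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def one_cycle_lr(iteration, total_iterations, max_rate,\n",
"                 start_rate=None, last_rate=None, last_fraction=0.1):\n",
"    # Hypothetical helper; the default values below are illustrative assumptions\n",
"    if start_rate is None:\n",
"        start_rate = max_rate / 10\n",
"    if last_rate is None:\n",
"        last_rate = start_rate / 1000\n",
"    last_iterations = round(total_iterations * last_fraction)\n",
"    half = (total_iterations - last_iterations) // 2\n",
"\n",
"    def interpolate(it1, it2, rate1, rate2):\n",
"        # Linear interpolation between (it1, rate1) and (it2, rate2)\n",
"        return rate1 + (rate2 - rate1) * (iteration - it1) / (it2 - it1)\n",
"\n",
"    if iteration < half:  # phase 1: ramp up from start_rate to max_rate\n",
"        return interpolate(0, half, start_rate, max_rate)\n",
"    if iteration < 2 * half:  # phase 2: ramp back down to start_rate\n",
"        return interpolate(half, 2 * half, max_rate, start_rate)\n",
"    # phase 3 (last iterations): drop linearly toward last_rate\n",
"    return interpolate(2 * half, total_iterations, start_rate, last_rate)\n",
"\n",
"rates = [one_cycle_lr(it, 1000, max_rate=0.01) for it in range(1000)]\n",
"print(rates[0], max(rates), rates[-1])"
]
},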
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()\n",
"X_train_full = X_train_full / 255.\n",
"X_test = X_test / 255.\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"keras.backend.clear_session()\n",
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Conv2D(32, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.Conv2D(64, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.MaxPool2D(),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dropout(0.25),\n",
" keras.layers.Dense(128, activation=\"relu\"),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\",\n",
" metrics=[\"accuracy\"])\n",
"\n",
"model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"model.evaluate(X_test, y_test)"
]
},
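{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check on the architecture above, the parameter counts can be worked out by hand, assuming 28 × 28 × 1 inputs. Each `Conv2D` layer has one 3 × 3 × (input channels) kernel plus one bias per filter. Same padding keeps the 28 × 28 size, and `MaxPool2D()` (default pool size 2) halves it to 14 × 14 before `Flatten()`. The helper names below are hypothetical. Nearly all the parameters sit in the first `Dense` layer, and after training, `model.summary()` should report the same total."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def conv2d_params(kernel_size, in_channels, filters):\n",
"    # One kernel_size x kernel_size x in_channels kernel per filter, plus one bias per filter\n",
"    return kernel_size * kernel_size * in_channels * filters + filters\n",
"\n",
"def dense_params(inputs, units):\n",
"    # Full weight matrix plus one bias per unit\n",
"    return inputs * units + units\n",
"\n",
"flattened = 14 * 14 * 64  # MaxPool2D() halves 28x28 to 14x14, with 64 feature maps\n",
"layer_params = {\n",
"    'conv1': conv2d_params(3, 1, 32),\n",
"    'conv2': conv2d_params(3, 32, 64),\n",
"    'dense1': dense_params(flattened, 128),\n",
"    'output': dense_params(128, 10),\n",
"}\n",
"total = sum(layer_params.values())\n",
"print(layer_params, total)"
]
},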
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Use transfer learning for large image classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Exercise: Use transfer learning for large image classification, going through these steps:_\n",
"\n",
"* _Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets)._\n",
"* _Split it into a training set, a validation set, and a test set._\n",
"* _Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation._\n",
"* _Fine-tune a pretrained model on this dataset._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the Flowers example above."
]
},
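{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the splitting step, one simple approach is to shuffle the example indices once with a fixed seed and slice them into disjoint subsets. The sketch below assumes a 70% / 15% / 15% split and uses a hypothetical `split_indices()` helper. If the data comes from TensorFlow Datasets, `tfds.load()` also accepts split slicing strings such as `train[:10%]`, which avoids shuffling indices by hand."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def split_indices(n_examples, valid_ratio=0.15, test_ratio=0.15, seed=42):\n",
"    # Shuffle once so every split gets a random, non-overlapping subset of the examples\n",
"    rng = np.random.default_rng(seed)\n",
"    indices = rng.permutation(n_examples)\n",
"    n_test = round(n_examples * test_ratio)\n",
"    n_valid = round(n_examples * valid_ratio)\n",
"    test_idx = indices[:n_test]\n",
"    valid_idx = indices[n_test:n_test + n_valid]\n",
"    train_idx = indices[n_test + n_valid:]\n",
"    return train_idx, valid_idx, test_idx\n",
"\n",
"train_idx, valid_idx, test_idx = split_indices(1000)\n",
"print(len(train_idx), len(valid_idx), len(test_idx))"
]
},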
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11. Style Transfer\n",
"_Exercise: Go through TensorFlow's [Style Transfer tutorial](https://homl.info/styletuto). It is a fun way to generate art using Deep Learning._\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simply open the Colab and follow its instructions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
},
"nav_menu": {},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}