{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 14 Deep Computer Vision Using Convolutional Neural Networks**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code in chapter 14._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0-preview."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"0.20\"\n",
"\n",
"# TensorFlow ≥2.0-preview is required\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"assert tf.__version__ >= \"2.0\"\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"cnn\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
"    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
"    print(\"Saving figure\", fig_id)\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A couple of utility functions to plot grayscale and RGB images:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def plot_image(image):\n",
"    plt.imshow(image, cmap=\"gray\", interpolation=\"nearest\")\n",
"    plt.axis(\"off\")\n",
"\n",
"def plot_color_image(image):\n",
"    plt.imshow(image, interpolation=\"nearest\")\n",
"    plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is a Convolution?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_sample_image\n",
"\n",
"# Load sample images\n",
"china = load_sample_image(\"china.jpg\") / 255\n",
"flower = load_sample_image(\"flower.jpg\") / 255\n",
"images = np.array([china, flower])\n",
"batch_size, height, width, channels = images.shape\n",
"\n",
"# Create 2 filters\n",
"filters = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)\n",
"filters[:, 3, :, 0] = 1 # vertical line\n",
"filters[3, :, :, 1] = 1 # horizontal line\n",
"\n",
"outputs = tf.nn.conv2d(images, filters, strides=1, padding=\"SAME\")\n",
"\n",
"plt.imshow(outputs[0, :, :, 1], cmap=\"gray\") # plot 1st image's 2nd feature map\n",
"plt.axis(\"off\") # Not shown in the book\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"for image_index in (0, 1):\n",
"    for feature_map_index in (0, 1):\n",
"        plt.subplot(2, 2, image_index * 2 + feature_map_index + 1)\n",
"        plot_image(outputs[image_index, :, :, feature_map_index])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def crop(images):\n",
"    return images[150:220, 130:250]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(images[0, :, :, 0]))\n",
"save_fig(\"china_original\", tight_layout=False)\n",
"plt.show()\n",
"\n",
"for feature_map_index, filename in enumerate([\"china_vertical\", \"china_horizontal\"]):\n",
"    plot_image(crop(outputs[0, :, :, feature_map_index]))\n",
"    save_fig(filename, tight_layout=False)\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"plot_image(filters[:, :, 0, 0])\n",
"plt.show()\n",
"plot_image(filters[:, :, 0, 1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `keras.layers.Conv2D()`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"conv = keras.layers.Conv2D(filters=32, kernel_size=3, strides=1,\n",
" padding=\"SAME\", activation=\"relu\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(outputs[0, :, :, 0]))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## VALID vs SAME padding"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def feature_map_size(input_size, kernel_size, strides=1, padding=\"SAME\"):\n",
"    if padding == \"SAME\":\n",
"        return (input_size - 1) // strides + 1\n",
"    else:\n",
"        return (input_size - kernel_size) // strides + 1"
]
},
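{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check of the formulas in `feature_map_size()`, the cell below evaluates both padding modes for one configuration. The input size of 224 with a kernel size of 7 and a stride of 2 is only an illustrative assumption (it is not taken from the sample images loaded above):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative values only: a 224-pixel dimension, kernel size 7, stride 2\n",
"for padding in (\"SAME\", \"VALID\"):\n",
"    size = feature_map_size(224, kernel_size=7, strides=2, padding=padding)\n",
"    print(padding, size)  # prints \"SAME 112\", then \"VALID 109\""
]
},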
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def pad_before_and_padded_size(input_size, kernel_size, strides=1):\n",
"    fmap_size = feature_map_size(input_size, kernel_size, strides)\n",
"    padded_size = max((fmap_size - 1) * strides + kernel_size, input_size)\n",
"    pad_before = (padded_size - input_size) // 2\n",
"    return pad_before, padded_size"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def manual_same_padding(images, kernel_size, strides=1):\n",
"    if kernel_size == 1:\n",
"        return images.astype(np.float32)\n",
"    batch_size, height, width, channels = images.shape\n",
"    top_pad, padded_height = pad_before_and_padded_size(height, kernel_size, strides)\n",
"    left_pad, padded_width = pad_before_and_padded_size(width, kernel_size, strides)\n",
"    padded_shape = [batch_size, padded_height, padded_width, channels]\n",
"    padded_images = np.zeros(padded_shape, dtype=np.float32)\n",
"    padded_images[:, top_pad:height+top_pad, left_pad:width+left_pad, :] = images\n",
"    return padded_images"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `\"SAME\"` padding is equivalent to padding manually using `manual_same_padding()` then using `\"VALID\"` padding (confusingly, `\"VALID\"` padding means no padding at all):"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"kernel_size = 7\n",
"strides = 2\n",
"\n",
"conv_valid = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"VALID\")\n",
"conv_same = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"SAME\")\n",
"\n",
"valid_output = conv_valid(manual_same_padding(images, kernel_size, strides))\n",
"\n",
"# Need to call build() so conv_same's weights get created\n",
"conv_same.build(tf.TensorShape(images.shape))\n",
"\n",
"# Copy the weights from conv_valid to conv_same\n",
"conv_same.set_weights(conv_valid.get_weights())\n",
"\n",
"same_output = conv_same(images.astype(np.float32))\n",
"\n",
"assert np.allclose(valid_output.numpy(), same_output.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pooling layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Max pooling"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"max_pool = keras.layers.MaxPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"cropped_images = np.array([crop(image) for image in images])\n",
"output = max_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"save_fig(\"china_max_pooling\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Depth-wise pooling"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"class DepthMaxPool(keras.layers.Layer):\n",
"    def __init__(self, pool_size, strides=None, padding=\"VALID\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        if strides is None:\n",
"            strides = pool_size\n",
"        self.pool_size = pool_size\n",
"        self.strides = strides\n",
"        self.padding = padding\n",
"    def call(self, inputs):\n",
"        return tf.nn.max_pool(inputs,\n",
"                              ksize=(1, 1, 1, self.pool_size),\n",
"                              strides=(1, 1, 1, self.pool_size),\n",
"                              padding=self.padding)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = DepthMaxPool(3)\n",
"with tf.device(\"/cpu:0\"): # there is no GPU-kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or just use a `Lambda` layer:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = keras.layers.Lambda(lambda X: tf.nn.max_pool(\n",
"    X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3), padding=\"VALID\"))\n",
"with tf.device(\"/cpu:0\"): # there is no GPU-kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 8))\n",
"plt.subplot(1, 2, 1)\n",
"plt.title(\"Input\", fontsize=14)\n",
"plot_color_image(cropped_images[0]) # plot the 1st image\n",
"plt.subplot(1, 2, 2)\n",
"plt.title(\"Output\", fontsize=14)\n",
"plot_image(depth_output[0, ..., 0]) # plot the output for the 1st image\n",
"plt.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Average pooling"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"avg_pool = keras.layers.AvgPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"output_avg = avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output_avg[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Global Average Pooling"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"global_avg_pool = keras.layers.GlobalAvgPool2D()\n",
"global_avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"output_global_avg2 = keras.layers.Lambda(lambda X: tf.reduce_mean(X, axis=[1, 2]))\n",
"output_global_avg2(cropped_images)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tackling Fashion MNIST With a CNN"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_mean = X_train.mean(axis=0, keepdims=True)\n",
"X_std = X_train.std(axis=0, keepdims=True) + 1e-7\n",
"X_train = (X_train - X_mean) / X_std\n",
"X_valid = (X_valid - X_mean) / X_std\n",
"X_test = (X_test - X_mean) / X_std\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"from functools import partial\n",
"\n",
"DefaultConv2D = partial(keras.layers.Conv2D,\n",
" kernel_size=3, activation='relu', padding=\"SAME\")\n",
"\n",
"model = keras.models.Sequential([\n",
" DefaultConv2D(filters=64, kernel_size=7, input_shape=[28, 28, 1]),\n",
" keras.layers.MaxPooling2D(pool_size=2),\n",
" DefaultConv2D(filters=128),\n",
" DefaultConv2D(filters=128),\n",
" keras.layers.MaxPooling2D(pool_size=2),\n",
" DefaultConv2D(filters=256),\n",
" DefaultConv2D(filters=256),\n",
" keras.layers.MaxPooling2D(pool_size=2),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dense(units=128, activation='relu'),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(units=64, activation='relu'),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(units=10, activation='softmax'),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
"history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"score = model.evaluate(X_test, y_test)\n",
"X_new = X_test[:10] # pretend we have new images\n",
"y_pred = model.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ResNet-34"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, strides=1,\n",
"                        padding=\"SAME\", use_bias=False)\n",
"\n",
"class ResidualUnit(keras.layers.Layer):\n",
"    def __init__(self, filters, strides=1, activation=\"relu\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.activation = keras.activations.get(activation)\n",
"        self.main_layers = [\n",
"            DefaultConv2D(filters, strides=strides),\n",
"            keras.layers.BatchNormalization(),\n",
"            self.activation,\n",
"            DefaultConv2D(filters),\n",
"            keras.layers.BatchNormalization()]\n",
"        self.skip_layers = []\n",
"        if strides > 1:\n",
"            self.skip_layers = [\n",
"                DefaultConv2D(filters, kernel_size=1, strides=strides),\n",
"                keras.layers.BatchNormalization()]\n",
"\n",
"    def call(self, inputs):\n",
"        Z = inputs\n",
"        for layer in self.main_layers:\n",
"            Z = layer(Z)\n",
"        skip_Z = inputs\n",
"        for layer in self.skip_layers:\n",
"            skip_Z = layer(skip_Z)\n",
"        return self.activation(Z + skip_Z)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential()\n",
"model.add(DefaultConv2D(64, kernel_size=7, strides=2,\n",
" input_shape=[224, 224, 3]))\n",
"model.add(keras.layers.BatchNormalization())\n",
"model.add(keras.layers.Activation(\"relu\"))\n",
"model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding=\"SAME\"))\n",
"prev_filters = 64\n",
"for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:\n",
"    strides = 1 if filters == prev_filters else 2\n",
"    model.add(ResidualUnit(filters, strides=strides))\n",
"    prev_filters = filters\n",
"model.add(keras.layers.GlobalAvgPool2D())\n",
"model.add(keras.layers.Flatten())\n",
"model.add(keras.layers.Dense(10, activation=\"softmax\"))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using a Pretrained Model"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"model = keras.applications.resnet50.ResNet50(weights=\"imagenet\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize(images, [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_pad(images, 224, 224, antialias=True)\n",
"plot_color_image(images_resized[0])"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_crop_or_pad(images, 224, 224)\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"china_box = [0, 0.03, 1, 0.68]\n",
"flower_box = [0.19, 0.26, 0.86, 0.7]\n",
"images_resized = tf.image.crop_and_resize(images, [china_box, flower_box], [0, 1], [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()\n",
"plot_color_image(images_resized[1])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"inputs = keras.applications.resnet50.preprocess_input(images_resized * 255)\n",
"Y_proba = model.predict(inputs)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"Y_proba.shape"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"top_K = keras.applications.resnet50.decode_predictions(Y_proba, top=3)\n",
"for image_index in range(len(images)):\n",
"    print(\"Image #{}\".format(image_index))\n",
"    for class_id, name, y_proba in top_K[image_index]:\n",
"        print(\" {} - {:12s} {:.2f}%\".format(class_id, name, y_proba * 100))\n",
"    print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pretrained Models for Transfer Learning"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow_datasets as tfds\n",
"\n",
"dataset, info = tfds.load(\"tf_flowers\", as_supervised=True, with_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"info.splits"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"info.splits[\"train\"]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"class_names = info.features[\"label\"].names\n",
"class_names"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"n_classes = info.features[\"label\"].num_classes"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"dataset_size = info.splits[\"train\"].num_examples\n",
"dataset_size"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"# Slice the single \"train\" split into 10% test, 15% validation and 75% training.\n",
"# (The older tfds.Split.TRAIN.subsplit() API is not available in recent TFDS releases.)\n",
"test_set_raw, valid_set_raw, train_set_raw = tfds.load(\n",
"    \"tf_flowers\",\n",
"    split=[\"train[:10%]\", \"train[10%:25%]\", \"train[25%:]\"],\n",
"    as_supervised=True)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 10))\n",
"index = 0\n",
"for image, label in train_set_raw.take(9):\n",
"    index += 1\n",
"    plt.subplot(3, 3, index)\n",
"    plt.imshow(image)\n",
"    plt.title(\"Class: {}\".format(class_names[label]))\n",
"    plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Basic preprocessing:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"def preprocess(image, label):\n",
"    resized_image = tf.image.resize(image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Slightly fancier preprocessing (but you could add much more data augmentation):"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"def central_crop(image):\n",
"    shape = tf.shape(image)\n",
"    min_dim = tf.reduce_min([shape[0], shape[1]])\n",
"    top_crop = (shape[0] - min_dim) // 4\n",
"    bottom_crop = shape[0] - top_crop\n",
"    left_crop = (shape[1] - min_dim) // 4\n",
"    right_crop = shape[1] - left_crop\n",
"    return image[top_crop:bottom_crop, left_crop:right_crop]\n",
"\n",
"def random_crop(image):\n",
"    shape = tf.shape(image)\n",
"    min_dim = tf.reduce_min([shape[0], shape[1]]) * 90 // 100\n",
"    return tf.image.random_crop(image, [min_dim, min_dim, 3])\n",
"\n",
"def preprocess(image, label, randomize=False):\n",
"    if randomize:\n",
"        cropped_image = random_crop(image)\n",
"        cropped_image = tf.image.random_flip_left_right(cropped_image)\n",
"    else:\n",
"        cropped_image = central_crop(image)\n",
"    resized_image = tf.image.resize(cropped_image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label\n",
"\n",
"batch_size = 32\n",
"train_set = train_set_raw.shuffle(1000).repeat()\n",
"train_set = train_set.map(partial(preprocess, randomize=True)).batch(batch_size).prefetch(1)\n",
"valid_set = valid_set_raw.map(preprocess).batch(batch_size).prefetch(1)\n",
"test_set = test_set_raw.map(preprocess).batch(batch_size).prefetch(1)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in train_set.take(1):\n",
"    for index in range(9):\n",
"        plt.subplot(3, 3, index + 1)\n",
"        plt.imshow(X_batch[index] / 2 + 0.5)\n",
"        plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
"        plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in test_set.take(1):\n",
"    for index in range(9):\n",
"        plt.subplot(3, 3, index + 1)\n",
"        plt.imshow(X_batch[index] / 2 + 0.5)\n",
"        plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
"        plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
" include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"model = keras.models.Model(inputs=base_model.input, outputs=output)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"for index, layer in enumerate(base_model.layers):\n",
"    print(index, layer.name)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"for layer in base_model.layers:\n",
"    layer.trainable = False\n",
"\n",
"optimizer = keras.optimizers.SGD(learning_rate=0.2, momentum=0.9, decay=0.01)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
"              metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
"                    steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
"                    validation_data=valid_set,\n",
"                    validation_steps=int(0.15 * dataset_size / batch_size),\n",
"                    epochs=5)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"for layer in base_model.layers:\n",
"    layer.trainable = True\n",
"\n",
"optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9,\n",
" nesterov=True, decay=0.001)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
" metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
" steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
" validation_data=valid_set,\n",
" validation_steps=int(0.15 * dataset_size / batch_size),\n",
" epochs=40)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classification and Localization"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
" include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"class_output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"loc_output = keras.layers.Dense(4)(avg)\n",
"model = keras.models.Model(inputs=base_model.input,\n",
" outputs=[class_output, loc_output])\n",
"model.compile(loss=[\"sparse_categorical_crossentropy\", \"mse\"],\n",
" loss_weights=[0.8, 0.2], # depends on what you care most about\n",
" optimizer=optimizer, metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def add_random_bounding_boxes(images, labels):\n",
"    fake_bboxes = tf.random.uniform([tf.shape(images)[0], 4])\n",
"    return images, (labels, fake_bboxes)\n",
"\n",
"fake_train_set = train_set.take(5).repeat(2).map(add_random_bounding_boxes)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"model.fit(fake_train_set, steps_per_epoch=5, epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mean Average Precision (mAP)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"def maximum_precisions(precisions):\n",
"    return np.flip(np.maximum.accumulate(np.flip(precisions)))"
]
},
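{
"cell_type": "markdown",
"metadata": {},
"source": [
"`maximum_precisions()` replaces each precision with the highest precision found at that recall level or any higher one, which removes the dips in the precision/recall curve before averaging. A minimal sketch on made-up values (the four precisions below are hypothetical, not measured results):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical precisions, ordered by increasing recall\n",
"toy_precisions = np.array([0.9, 0.7, 0.8, 0.5])\n",
"maximum_precisions(toy_precisions)  # array([0.9, 0.8, 0.8, 0.5])"
]
},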
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"recalls = np.linspace(0, 1, 11)\n",
"\n",
"precisions = [0.91, 0.94, 0.96, 0.94, 0.95, 0.92, 0.80, 0.60, 0.45, 0.20, 0.10]\n",
"max_precisions = maximum_precisions(precisions)\n",
"mAP = max_precisions.mean()\n",
"plt.plot(recalls, precisions, \"ro--\", label=\"Precision\")\n",
"plt.plot(recalls, max_precisions, \"bo-\", label=\"Max Precision\")\n",
"plt.xlabel(\"Recall\")\n",
"plt.ylabel(\"Precision\")\n",
"plt.plot([0, 1], [mAP, mAP], \"g:\", linewidth=3, label=\"mAP\")\n",
"plt.grid(True)\n",
"plt.axis([0, 1, 0, 1])\n",
"plt.legend(loc=\"lower center\", fontsize=14)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transpose convolutions:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"X = images_resized.numpy()\n",
"\n",
"conv_transpose = keras.layers.Conv2DTranspose(filters=5, kernel_size=3, strides=2, padding=\"VALID\")\n",
"output = conv_transpose(X)\n",
"output.shape"
]
},
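{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `\"VALID\"` padding, and assuming the kernel size is at least as large as the stride, each spatial dimension of the output above should equal `(input_size - 1) * strides + kernel_size`. The helper below uses a hypothetical name introduced only for this check (it is not part of Keras), and the input size of 224 assumes `X` still holds the 224-pixel resized images:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: expected output size of a \"VALID\" transposed convolution,\n",
"# assuming kernel_size >= strides\n",
"def conv_transpose_valid_size(input_size, kernel_size, strides):\n",
"    return (input_size - 1) * strides + kernel_size\n",
"\n",
"conv_transpose_valid_size(224, kernel_size=3, strides=2)  # 449"
]
},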
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"def normalize(X):\n",
"    return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[1, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(normalize(output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"def upscale_images(images, stride, kernel_size):\n",
"    batch_size, height, width, channels = images.shape\n",
"    upscaled = np.zeros((batch_size,\n",
"                         (height - 1) * stride + 2 * kernel_size - 1,\n",
"                         (width - 1) * stride + 2 * kernel_size - 1,\n",
"                         channels))\n",
"    upscaled[:,\n",
"             kernel_size - 1:(height - 1) * stride + kernel_size:stride,\n",
"             kernel_size - 1:(width - 1) * stride + kernel_size:stride,\n",
"             :] = images\n",
"    return upscaled"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"upscaled = upscale_images(X, stride=2, kernel_size=3)\n",
"weights, biases = conv_transpose.weights\n",
"reversed_filters = np.flip(weights.numpy(), axis=[0, 1])\n",
"reversed_filters = np.transpose(reversed_filters, [0, 1, 3, 2])\n",
"manual_output = tf.nn.conv2d(upscaled, reversed_filters, strides=1, padding=\"VALID\")"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"def normalize(X):\n",
"    return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=3, width_ratios=[1, 2, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Upscaled\", fontsize=14)\n",
"ax2.imshow(upscaled[0], interpolation=\"bicubic\")\n",
"ax2.axis(\"off\")\n",
"ax3 = fig.add_subplot(gs[0, 2])\n",
"ax3.set_title(\"Output\", fontsize=14)\n",
"ax3.imshow(normalize(manual_output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax3.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"np.allclose(output, manual_output.numpy(), atol=1e-7)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See appendix A."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. High Accuracy CNN for MNIST\n",
"Exercise: Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## 10. Use transfer learning for large image classification\n",
"\n",
"### 10.1)\n",
"Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can just use an existing dataset (e.g., from TensorFlow Datasets)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.2)\n",
"Split it into a training set, a validation set and a test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3)\n",
"Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.4)\n",
"Fine-tune a pretrained model on this dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11.\n",
"Exercise: Go through TensorFlow's [DeepDream tutorial](https://goo.gl/4b2s6g). It is a fun way to familiarize yourself with various ways of visualizing the patterns learned by a CNN, and to generate art using Deep Learning.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Simply download the notebook and follow its instructions. For extra fun, you can produce a series of images, by repeatedly zooming in and running the DeepDream algorithm: using a tool such as [ffmpeg](https://ffmpeg.org/) you can then create a video from these images. For example, here is a [DeepDream video](https://www.youtube.com/watch?v=l6i_fDg30p0) I made... as you will see, it quickly turns into a nightmare. ;-) You can find hundreds of [similar videos](https://www.youtube.com/results?search_query=+deepdream) (often much more artistic) on the web."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"nav_menu": {},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}