{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 14 Deep Computer Vision Using Convolutional Neural Networks**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code in chapter 14._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/ageron/handson-ml2/blob/master/14_deep_computer_vision_with_cnns.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated, so we strongly recommend using Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"IS_KAGGLE = \"kaggle_secrets\" in sys.modules\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"0.20\"\n",
"\n",
"# TensorFlow ≥2.0 is required\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"assert tf.__version__ >= \"2.0\"\n",
"\n",
"if not tf.config.list_physical_devices('GPU'):\n",
"    print(\"No GPU was detected. CNNs can be very slow without a GPU.\")\n",
"    if IS_COLAB:\n",
"        print(\"Go to Runtime > Change runtime and select a GPU hardware accelerator.\")\n",
"    if IS_KAGGLE:\n",
"        print(\"Go to Settings > Accelerator and select GPU.\")\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"cnn\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
"    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
"    print(\"Saving figure\", fig_id)\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A couple of utility functions to plot grayscale and RGB images:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def plot_image(image):\n",
"    plt.imshow(image, cmap=\"gray\", interpolation=\"nearest\")\n",
"    plt.axis(\"off\")\n",
"\n",
"def plot_color_image(image):\n",
"    plt.imshow(image, interpolation=\"nearest\")\n",
"    plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is a Convolution?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_sample_image\n",
"\n",
"# Load sample images\n",
"china = load_sample_image(\"china.jpg\") / 255\n",
"flower = load_sample_image(\"flower.jpg\") / 255\n",
"images = np.array([china, flower])\n",
"batch_size, height, width, channels = images.shape\n",
"\n",
"# Create 2 filters\n",
"filters = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)\n",
"filters[:, 3, :, 0] = 1 # vertical line\n",
"filters[3, :, :, 1] = 1 # horizontal line\n",
"\n",
"outputs = tf.nn.conv2d(images, filters, strides=1, padding=\"SAME\")\n",
"\n",
"plt.imshow(outputs[0, :, :, 1], cmap=\"gray\") # plot 1st image's 2nd feature map\n",
"plt.axis(\"off\") # Not shown in the book\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"for image_index in (0, 1):\n",
"    for feature_map_index in (0, 1):\n",
"        plt.subplot(2, 2, image_index * 2 + feature_map_index + 1)\n",
"        plot_image(outputs[image_index, :, :, feature_map_index])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def crop(images):\n",
"    return images[150:220, 130:250]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(images[0, :, :, 0]))\n",
"save_fig(\"china_original\", tight_layout=False)\n",
"plt.show()\n",
"\n",
"for feature_map_index, filename in enumerate([\"china_vertical\", \"china_horizontal\"]):\n",
"    plot_image(crop(outputs[0, :, :, feature_map_index]))\n",
"    save_fig(filename, tight_layout=False)\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"plot_image(filters[:, :, 0, 0])\n",
"plt.show()\n",
"plot_image(filters[:, :, 0, 1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `keras.layers.Conv2D()`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"conv = keras.layers.Conv2D(filters=32, kernel_size=3, strides=1,\n",
"                           padding=\"SAME\", activation=\"relu\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(outputs[0, :, :, 0]))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## VALID vs SAME padding"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def feature_map_size(input_size, kernel_size, strides=1, padding=\"SAME\"):\n",
"    if padding == \"SAME\":\n",
"        return (input_size - 1) // strides + 1\n",
"    else:\n",
"        return (input_size - kernel_size) // strides + 1"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def pad_before_and_padded_size(input_size, kernel_size, strides=1):\n",
"    fmap_size = feature_map_size(input_size, kernel_size, strides)\n",
"    padded_size = max((fmap_size - 1) * strides + kernel_size, input_size)\n",
"    pad_before = (padded_size - input_size) // 2\n",
"    return pad_before, padded_size"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def manual_same_padding(images, kernel_size, strides=1):\n",
"    if kernel_size == 1:\n",
"        return images.astype(np.float32)\n",
"    batch_size, height, width, channels = images.shape\n",
"    top_pad, padded_height = pad_before_and_padded_size(height, kernel_size, strides)\n",
"    left_pad, padded_width = pad_before_and_padded_size(width, kernel_size, strides)\n",
"    padded_shape = [batch_size, padded_height, padded_width, channels]\n",
"    padded_images = np.zeros(padded_shape, dtype=np.float32)\n",
"    padded_images[:, top_pad:height+top_pad, left_pad:width+left_pad, :] = images\n",
"    return padded_images"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `\"SAME\"` padding is equivalent to padding manually using `manual_same_padding()` then using `\"VALID\"` padding (confusingly, `\"VALID\"` padding means no padding at all):"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"kernel_size = 7\n",
"strides = 2\n",
"\n",
"conv_valid = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"VALID\")\n",
"conv_same = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"SAME\")\n",
"\n",
"valid_output = conv_valid(manual_same_padding(images, kernel_size, strides))\n",
"\n",
"# Need to call build() so conv_same's weights get created\n",
"conv_same.build(tf.TensorShape(images.shape))\n",
"\n",
"# Copy the weights from conv_valid to conv_same\n",
"conv_same.set_weights(conv_valid.get_weights())\n",
"\n",
"same_output = conv_same(images.astype(np.float32))\n",
"\n",
"assert np.allclose(valid_output.numpy(), same_output.numpy())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pooling layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Max pooling"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"max_pool = keras.layers.MaxPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"cropped_images = np.array([crop(image) for image in images], dtype=np.float32)\n",
"output = max_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"save_fig(\"china_max_pooling\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Depth-wise pooling"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"class DepthMaxPool(keras.layers.Layer):\n",
"    def __init__(self, pool_size, strides=None, padding=\"VALID\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        if strides is None:\n",
"            strides = pool_size\n",
"        self.pool_size = pool_size\n",
"        self.strides = strides\n",
"        self.padding = padding\n",
"    def call(self, inputs):\n",
"        # Pool along the channel (depth) axis only, honoring the configured stride\n",
"        return tf.nn.max_pool(inputs,\n",
"                              ksize=(1, 1, 1, self.pool_size),\n",
"                              strides=(1, 1, 1, self.strides),\n",
"                              padding=self.padding)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = DepthMaxPool(3)\n",
"with tf.device(\"/cpu:0\"): # there is no GPU-kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or just use a `Lambda` layer:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = keras.layers.Lambda(lambda X: tf.nn.max_pool(\n",
"    X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3), padding=\"VALID\"))\n",
"with tf.device(\"/cpu:0\"): # there is no GPU-kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 8))\n",
"plt.subplot(1, 2, 1)\n",
"plt.title(\"Input\", fontsize=14)\n",
"plot_color_image(cropped_images[0]) # plot the 1st image\n",
"plt.subplot(1, 2, 2)\n",
"plt.title(\"Output\", fontsize=14)\n",
"plot_image(depth_output[0, ..., 0]) # plot the output for the 1st image\n",
"plt.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Average pooling"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"avg_pool = keras.layers.AvgPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"output_avg = avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output_avg[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Global Average Pooling"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"global_avg_pool = keras.layers.GlobalAvgPool2D()\n",
"global_avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"output_global_avg2 = keras.layers.Lambda(lambda X: tf.reduce_mean(X, axis=[1, 2]))\n",
"output_global_avg2(cropped_images)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tackling Fashion MNIST With a CNN"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_mean = X_train.mean(axis=0, keepdims=True)\n",
"X_std = X_train.std(axis=0, keepdims=True) + 1e-7\n",
"X_train = (X_train - X_mean) / X_std\n",
"X_valid = (X_valid - X_mean) / X_std\n",
"X_test = (X_test - X_mean) / X_std\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"from functools import partial\n",
"\n",
"DefaultConv2D = partial(keras.layers.Conv2D,\n",
"                        kernel_size=3, activation='relu', padding=\"SAME\")\n",
"\n",
"model = keras.models.Sequential([\n",
"    DefaultConv2D(filters=64, kernel_size=7, input_shape=[28, 28, 1]),\n",
"    keras.layers.MaxPooling2D(pool_size=2),\n",
"    DefaultConv2D(filters=128),\n",
"    DefaultConv2D(filters=128),\n",
"    keras.layers.MaxPooling2D(pool_size=2),\n",
"    DefaultConv2D(filters=256),\n",
"    DefaultConv2D(filters=256),\n",
"    keras.layers.MaxPooling2D(pool_size=2),\n",
"    keras.layers.Flatten(),\n",
"    keras.layers.Dense(units=128, activation='relu'),\n",
"    keras.layers.Dropout(0.5),\n",
"    keras.layers.Dense(units=64, activation='relu'),\n",
"    keras.layers.Dropout(0.5),\n",
"    keras.layers.Dense(units=10, activation='softmax'),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
"history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"score = model.evaluate(X_test, y_test)\n",
"X_new = X_test[:10] # pretend we have new images\n",
"y_pred = model.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ResNet-34"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, strides=1,\n",
"                        padding=\"SAME\", use_bias=False)\n",
"\n",
"class ResidualUnit(keras.layers.Layer):\n",
"    def __init__(self, filters, strides=1, activation=\"relu\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.activation = keras.activations.get(activation)\n",
"        self.main_layers = [\n",
"            DefaultConv2D(filters, strides=strides),\n",
"            keras.layers.BatchNormalization(),\n",
"            self.activation,\n",
"            DefaultConv2D(filters),\n",
"            keras.layers.BatchNormalization()]\n",
"        self.skip_layers = []\n",
"        if strides > 1:\n",
"            self.skip_layers = [\n",
"                DefaultConv2D(filters, kernel_size=1, strides=strides),\n",
"                keras.layers.BatchNormalization()]\n",
"\n",
"    def call(self, inputs):\n",
"        Z = inputs\n",
"        for layer in self.main_layers:\n",
"            Z = layer(Z)\n",
"        skip_Z = inputs\n",
"        for layer in self.skip_layers:\n",
"            skip_Z = layer(skip_Z)\n",
"        return self.activation(Z + skip_Z)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential()\n",
"model.add(DefaultConv2D(64, kernel_size=7, strides=2,\n",
"                        input_shape=[224, 224, 3]))\n",
"model.add(keras.layers.BatchNormalization())\n",
"model.add(keras.layers.Activation(\"relu\"))\n",
"model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding=\"SAME\"))\n",
"prev_filters = 64\n",
"for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:\n",
"    strides = 1 if filters == prev_filters else 2\n",
"    model.add(ResidualUnit(filters, strides=strides))\n",
"    prev_filters = filters\n",
"model.add(keras.layers.GlobalAvgPool2D())\n",
"model.add(keras.layers.Flatten())\n",
"model.add(keras.layers.Dense(10, activation=\"softmax\"))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using a Pretrained Model"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"model = keras.applications.resnet50.ResNet50(weights=\"imagenet\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize(images, [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_pad(images, 224, 224, antialias=True)\n",
"plot_color_image(images_resized[0])"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_crop_or_pad(images, 224, 224)\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"china_box = [0, 0.03, 1, 0.68]\n",
"flower_box = [0.19, 0.26, 0.86, 0.7]\n",
"images_resized = tf.image.crop_and_resize(images, [china_box, flower_box], [0, 1], [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()\n",
"plot_color_image(images_resized[1])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"inputs = keras.applications.resnet50.preprocess_input(images_resized * 255)\n",
"Y_proba = model.predict(inputs)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"Y_proba.shape"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"top_K = keras.applications.resnet50.decode_predictions(Y_proba, top=3)\n",
"for image_index in range(len(images)):\n",
"    print(\"Image #{}\".format(image_index))\n",
"    for class_id, name, y_proba in top_K[image_index]:\n",
"        print(\" {} - {:12s} {:.2f}%\".format(class_id, name, y_proba * 100))\n",
"    print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pretrained Models for Transfer Learning"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow_datasets as tfds\n",
"\n",
"dataset, info = tfds.load(\"tf_flowers\", as_supervised=True, with_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"info.splits"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"info.splits[\"train\"]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"class_names = info.features[\"label\"].names\n",
"class_names"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"n_classes = info.features[\"label\"].num_classes"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"dataset_size = info.splits[\"train\"].num_examples\n",
"dataset_size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:** TFDS's split API has evolved since the book was published. The [new split API](https://www.tensorflow.org/datasets/splits) (called S3) is much simpler to use:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"test_set_raw, valid_set_raw, train_set_raw = tfds.load(\n",
"    \"tf_flowers\",\n",
"    split=[\"train[:10%]\", \"train[10%:25%]\", \"train[25%:]\"],\n",
"    as_supervised=True)"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 10))\n",
"index = 0\n",
"for image, label in train_set_raw.take(9):\n",
"    index += 1\n",
"    plt.subplot(3, 3, index)\n",
"    plt.imshow(image)\n",
"    plt.title(\"Class: {}\".format(class_names[label]))\n",
"    plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Basic preprocessing:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"def preprocess(image, label):\n",
"    resized_image = tf.image.resize(image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Slightly fancier preprocessing (but you could add much more data augmentation):"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"def central_crop(image):\n",
"    shape = tf.shape(image)\n",
"    min_dim = tf.reduce_min([shape[0], shape[1]])\n",
"    # Trims a quarter of the excess length from each side of the longer axis,\n",
"    # so the result is closer to square but not necessarily square\n",
"    top_crop = (shape[0] - min_dim) // 4\n",
"    bottom_crop = shape[0] - top_crop\n",
"    left_crop = (shape[1] - min_dim) // 4\n",
"    right_crop = shape[1] - left_crop\n",
"    return image[top_crop:bottom_crop, left_crop:right_crop]\n",
"\n",
"def random_crop(image):\n",
"    shape = tf.shape(image)\n",
"    min_dim = tf.reduce_min([shape[0], shape[1]]) * 90 // 100\n",
"    return tf.image.random_crop(image, [min_dim, min_dim, 3])\n",
"\n",
"def preprocess(image, label, randomize=False):\n",
"    if randomize:\n",
"        cropped_image = random_crop(image)\n",
"        cropped_image = tf.image.random_flip_left_right(cropped_image)\n",
"    else:\n",
"        cropped_image = central_crop(image)\n",
"    resized_image = tf.image.resize(cropped_image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label\n",
"\n",
"batch_size = 32\n",
"train_set = train_set_raw.shuffle(1000).repeat()\n",
"train_set = train_set.map(partial(preprocess, randomize=True)).batch(batch_size).prefetch(1)\n",
"valid_set = valid_set_raw.map(preprocess).batch(batch_size).prefetch(1)\n",
"test_set = test_set_raw.map(preprocess).batch(batch_size).prefetch(1)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in train_set.take(1):\n",
"    for index in range(9):\n",
"        plt.subplot(3, 3, index + 1)\n",
"        plt.imshow(X_batch[index] / 2 + 0.5)\n",
"        plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
"        plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in test_set.take(1):\n",
"    for index in range(9):\n",
"        plt.subplot(3, 3, index + 1)\n",
"        plt.imshow(X_batch[index] / 2 + 0.5)\n",
"        plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
"        plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
"                                                  include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"model = keras.models.Model(inputs=base_model.input, outputs=output)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"for index, layer in enumerate(base_model.layers):\n",
"    print(index, layer.name)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
2019-03-24 02:06:29 +01:00
"for layer in base_model.layers:\n",
" layer.trainable = False\n",
2017-05-05 15:22:45 +02:00
"\n",
2019-03-24 02:06:29 +01:00
"optimizer = keras.optimizers.SGD(lr=0.2, momentum=0.9, decay=0.01)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
" metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
" steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
" validation_data=valid_set,\n",
" validation_steps=int(0.15 * dataset_size / batch_size),\n",
" epochs=5)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"# Unfreeze all layers and fine-tune the whole model with a lower learning rate\n",
"for layer in base_model.layers:\n",
"    layer.trainable = True\n",
"\n",
"optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9,\n",
" nesterov=True, decay=0.001)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
" metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
" steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
" validation_data=valid_set,\n",
" validation_steps=int(0.15 * dataset_size / batch_size),\n",
" epochs=40)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classification and Localization"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
" include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"class_output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"loc_output = keras.layers.Dense(4)(avg)\n",
"model = keras.models.Model(inputs=base_model.input,\n",
" outputs=[class_output, loc_output])\n",
"model.compile(loss=[\"sparse_categorical_crossentropy\", \"mse\"],\n",
" loss_weights=[0.8, 0.2], # depends on what you care most about\n",
" optimizer=optimizer, metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def add_random_bounding_boxes(images, labels):\n",
" fake_bboxes = tf.random.uniform([tf.shape(images)[0], 4])\n",
" return images, (labels, fake_bboxes)\n",
"\n",
"fake_train_set = train_set.take(5).repeat(2).map(add_random_bounding_boxes)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"model.fit(fake_train_set, steps_per_epoch=5, epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mean Average Precision (mAP)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"def maximum_precisions(precisions):\n",
"    # For each recall level, keep the highest precision reached at that recall or higher\n",
"    return np.flip(np.maximum.accumulate(np.flip(precisions)))"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"recalls = np.linspace(0, 1, 11)\n",
"\n",
"precisions = [0.91, 0.94, 0.96, 0.94, 0.95, 0.92, 0.80, 0.60, 0.45, 0.20, 0.10]\n",
"max_precisions = maximum_precisions(precisions)\n",
"mAP = max_precisions.mean()\n",
"plt.plot(recalls, precisions, \"ro--\", label=\"Precision\")\n",
"plt.plot(recalls, max_precisions, \"bo-\", label=\"Max Precision\")\n",
"plt.xlabel(\"Recall\")\n",
"plt.ylabel(\"Precision\")\n",
"plt.plot([0, 1], [mAP, mAP], \"g:\", linewidth=3, label=\"mAP\")\n",
"plt.grid(True)\n",
"plt.axis([0, 1, 0, 1])\n",
"plt.legend(loc=\"lower center\", fontsize=14)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transposed convolutions:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"X = images_resized.numpy()\n",
"\n",
"conv_transpose = keras.layers.Conv2DTranspose(filters=5, kernel_size=3, strides=2, padding=\"VALID\")\n",
"output = conv_transpose(X)\n",
"output.shape"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"def normalize(X):\n",
" return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[1, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(normalize(output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"def upscale_images(images, stride, kernel_size):\n",
"    # Insert (stride - 1) zeros between pixels and pad each side with (kernel_size - 1) zeros\n",
" batch_size, height, width, channels = images.shape\n",
" upscaled = np.zeros((batch_size,\n",
" (height - 1) * stride + 2 * kernel_size - 1,\n",
" (width - 1) * stride + 2 * kernel_size - 1,\n",
" channels))\n",
" upscaled[:,\n",
" kernel_size - 1:(height - 1) * stride + kernel_size:stride,\n",
" kernel_size - 1:(width - 1) * stride + kernel_size:stride,\n",
" :] = images\n",
" return upscaled"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"upscaled = upscale_images(X, stride=2, kernel_size=3)\n",
"weights, biases = conv_transpose.weights\n",
"reversed_filters = np.flip(weights.numpy(), axis=[0, 1])\n",
"reversed_filters = np.transpose(reversed_filters, [0, 1, 3, 2])\n",
"manual_output = tf.nn.conv2d(upscaled, reversed_filters, strides=1, padding=\"VALID\")"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"def normalize(X):\n",
" return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=3, width_ratios=[1, 2, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Upscaled\", fontsize=14)\n",
"ax2.imshow(upscaled[0], interpolation=\"bicubic\")\n",
"ax2.axis(\"off\")\n",
"ax3 = fig.add_subplot(gs[0, 2])\n",
"ax3.set_title(\"Output\", fontsize=14)\n",
"ax3.imshow(normalize(manual_output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax3.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"np.allclose(output, manual_output.numpy(), atol=1e-7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See Appendix A."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. High Accuracy CNN for MNIST\n",
"_Exercise: Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following model uses 2 convolutional layers, followed by 1 pooling layer, then a dropout layer with a 25% rate, a dense layer, another dropout layer with a 50% rate, and finally the output layer. It reaches about 99.2% accuracy on the test set. This places the model roughly in the top 20% of the [MNIST Kaggle competition](https://www.kaggle.com/c/digit-recognizer/) (if we ignore the models with an accuracy greater than 99.79%, which were most likely trained on the test set, as explained by Chris Deotte in [this post](https://www.kaggle.com/c/digit-recognizer/discussion/61480)). Can you do better? To reach 99.5% to 99.7% accuracy on the test set, you need to add image augmentation and batch normalization, use a learning schedule such as 1-cycle, and possibly create an ensemble."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()\n",
"X_train_full = X_train_full / 255.\n",
"X_test = X_test / 255.\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"keras.backend.clear_session()\n",
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Conv2D(32, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.Conv2D(64, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.MaxPool2D(),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dropout(0.25),\n",
" keras.layers.Dense(128, activation=\"relu\"),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\",\n",
" metrics=[\"accuracy\"])\n",
"\n",
"model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"model.evaluate(X_test, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Use transfer learning for large image classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Exercise: Use transfer learning for large image classification, going through these steps:_\n",
"\n",
"* _Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets)._\n",
"* _Split it into a training set, a validation set, and a test set._\n",
"* _Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation._\n",
"* _Fine-tune a pretrained model on this dataset._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the Flowers example above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11.\n",
"_Exercise: Go through TensorFlow's [Style Transfer tutorial](https://homl.info/styletuto). It is a fun way to generate art using Deep Learning._\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simply open the Colab and follow its instructions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
},
"nav_menu": {},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}