{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 14 – Deep Computer Vision Using Convolutional Neural Networks**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code in chapter 14._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/ageron/handson-ml2/blob/master/14_deep_computer_vision_with_cnns.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (Python 2.x is deprecated and fails the version check in the next cell, so please use Python 3), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Python ≥3.5 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 5)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"IS_KAGGLE = \"kaggle_secrets\" in sys.modules\n",
"\n",
"# Scikit-Learn ≥0.20 is required\n",
"import sklearn\n",
"from packaging import version  # parse versions: plain string comparison misorders e.g. \"0.100\" and \"0.20\"\n",
"assert version.parse(sklearn.__version__) >= version.parse(\"0.20\")\n",
"\n",
"# TensorFlow ≥2.0 is required\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"assert version.parse(tf.__version__) >= version.parse(\"2.0\")\n",
"\n",
"if not tf.config.list_physical_devices('GPU'):\n",
"    print(\"No GPU was detected. CNNs can be very slow without a GPU.\")\n",
"    if IS_COLAB:\n",
"        print(\"Go to Runtime > Change runtime and select a GPU hardware accelerator.\")\n",
"    if IS_KAGGLE:\n",
"        print(\"Go to Settings > Accelerator and select GPU.\")\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"cnn\"\n",
"IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID)\n",
"os.makedirs(IMAGES_PATH, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
"    path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n",
"    print(\"Saving figure\", fig_id)\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A couple of utility functions to plot grayscale and RGB images:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def plot_image(image):\n",
"    plt.imshow(image, cmap=\"gray\", interpolation=\"nearest\")\n",
"    plt.axis(\"off\")\n",
"\n",
"def plot_color_image(image):\n",
"    plt.imshow(image, interpolation=\"nearest\")\n",
"    plt.axis(\"off\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is a Convolution?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_sample_image\n",
"\n",
"# Load sample images\n",
"china = load_sample_image(\"china.jpg\") / 255\n",
"flower = load_sample_image(\"flower.jpg\") / 255\n",
"images = np.array([china, flower])\n",
"batch_size, height, width, channels = images.shape\n",
"\n",
"# Create 2 filters\n",
"filters = np.zeros(shape=(7, 7, channels, 2), dtype=np.float32)\n",
"filters[:, 3, :, 0] = 1 # vertical line\n",
"filters[3, :, :, 1] = 1 # horizontal line\n",
"\n",
"outputs = tf.nn.conv2d(images, filters, strides=1, padding=\"SAME\")\n",
"\n",
"plt.imshow(outputs[0, :, :, 1], cmap=\"gray\") # plot 1st image's 2nd feature map\n",
"plt.axis(\"off\") # Not shown in the book\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"for image_index in (0, 1):\n",
"    for feature_map_index in (0, 1):\n",
"        plt.subplot(2, 2, image_index * 2 + feature_map_index + 1)\n",
"        plot_image(outputs[image_index, :, :, feature_map_index])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def crop(images):\n",
"    return images[150:220, 130:250]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(images[0, :, :, 0]))\n",
"save_fig(\"china_original\", tight_layout=False)\n",
"plt.show()\n",
"\n",
"for feature_map_index, filename in enumerate([\"china_vertical\", \"china_horizontal\"]):\n",
"    plot_image(crop(outputs[0, :, :, feature_map_index]))\n",
"    save_fig(filename, tight_layout=False)\n",
"    plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"plot_image(filters[:, :, 0, 0])\n",
"plt.show()\n",
"plot_image(filters[:, :, 0, 1])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Convolutional Layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `keras.layers.Conv2D()`:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"conv = keras.layers.Conv2D(filters=32, kernel_size=3, strides=1,\n",
" padding=\"SAME\", activation=\"relu\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"plot_image(crop(outputs[0, :, :, 0]))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## VALID vs SAME padding"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"def feature_map_size(input_size, kernel_size, strides=1, padding=\"SAME\"):\n",
"    if padding == \"SAME\":\n",
"        return (input_size - 1) // strides + 1\n",
"    else:\n",
"        return (input_size - kernel_size) // strides + 1"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def pad_before_and_padded_size(input_size, kernel_size, strides=1):\n",
"    fmap_size = feature_map_size(input_size, kernel_size, strides)\n",
"    padded_size = max((fmap_size - 1) * strides + kernel_size, input_size)\n",
"    pad_before = (padded_size - input_size) // 2\n",
"    return pad_before, padded_size"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def manual_same_padding(images, kernel_size, strides=1):\n",
"    if kernel_size == 1:\n",
"        return images.astype(np.float32)\n",
"    batch_size, height, width, channels = images.shape\n",
"    top_pad, padded_height = pad_before_and_padded_size(height, kernel_size, strides)\n",
"    left_pad, padded_width = pad_before_and_padded_size(width, kernel_size, strides)\n",
"    padded_shape = [batch_size, padded_height, padded_width, channels]\n",
"    padded_images = np.zeros(padded_shape, dtype=np.float32)\n",
"    padded_images[:, top_pad:height+top_pad, left_pad:width+left_pad, :] = images\n",
"    return padded_images"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `\"SAME\"` padding is equivalent to padding manually using `manual_same_padding()` then using `\"VALID\"` padding (confusingly, `\"VALID\"` padding means no padding at all):"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"kernel_size = 7\n",
"strides = 2\n",
"\n",
"conv_valid = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"VALID\")\n",
"conv_same = keras.layers.Conv2D(filters=1, kernel_size=kernel_size, strides=strides, padding=\"SAME\")\n",
"\n",
"valid_output = conv_valid(manual_same_padding(images, kernel_size, strides))\n",
"\n",
"# Need to call build() so conv_same's weights get created\n",
"conv_same.build(tf.TensorShape(images.shape))\n",
"\n",
"# Copy the weights from conv_valid to conv_same\n",
"conv_same.set_weights(conv_valid.get_weights())\n",
"\n",
"same_output = conv_same(images.astype(np.float32))\n",
"\n",
"assert np.allclose(valid_output.numpy(), same_output.numpy())"
]
},
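{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sanity check of the arithmetic behind this equivalence, the sketch below recomputes the sizes in plain Python for an assumed 427 × 640 input with a kernel size of 7 and a stride of 2. The helper name `same_padding_amounts()` is hypothetical: it only restates the formulas used by `feature_map_size()` and `pad_before_and_padded_size()`, and it is not part of TensorFlow's API. Applying a `\"VALID\"` convolution to the padded input should give the same output size as `\"SAME\"` padding:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def same_padding_amounts(input_size, kernel_size, strides=1):\n",
"    # hypothetical helper: restates the SAME-padding formulas used above\n",
"    fmap_size = (input_size - 1) // strides + 1  # ceil(input_size / strides)\n",
"    padded_size = max((fmap_size - 1) * strides + kernel_size, input_size)\n",
"    total_pad = padded_size - input_size\n",
"    pad_before = total_pad // 2\n",
"    pad_after = total_pad - pad_before  # any odd leftover goes at the end\n",
"    return fmap_size, pad_before, pad_after\n",
"\n",
"for axis_name, axis_size in [(\"height\", 427), (\"width\", 640)]:  # assumed input size\n",
"    fmap, before, after = same_padding_amounts(axis_size, kernel_size=7, strides=2)\n",
"    # output size of a VALID convolution applied to the manually padded input\n",
"    valid_size = (axis_size + before + after - 7) // 2 + 1\n",
"    print(axis_name, fmap, before, after, valid_size)"
]
},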
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pooling layer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Max pooling"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"max_pool = keras.layers.MaxPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"cropped_images = np.array([crop(image) for image in images], dtype=np.float32)\n",
"output = max_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"save_fig(\"china_max_pooling\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Depth-wise pooling"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"class DepthMaxPool(keras.layers.Layer):\n",
"    def __init__(self, pool_size, strides=None, padding=\"VALID\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        if strides is None:\n",
"            strides = pool_size\n",
"        self.pool_size = pool_size\n",
"        self.strides = strides\n",
"        self.padding = padding\n",
"    def call(self, inputs):\n",
"        # pool along the channel axis only; use self.strides (which defaults\n",
"        # to pool_size) so that a custom stride is actually honored\n",
"        return tf.nn.max_pool(inputs,\n",
"                              ksize=(1, 1, 1, self.pool_size),\n",
"                              strides=(1, 1, 1, self.strides),\n",
"                              padding=self.padding)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = DepthMaxPool(3)\n",
"with tf.device(\"/cpu:0\"): # there is no GPU-kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Or just use a `Lambda` layer:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"depth_pool = keras.layers.Lambda(lambda X: tf.nn.max_pool(\n",
"    X, ksize=(1, 1, 1, 3), strides=(1, 1, 1, 3), padding=\"VALID\"))\n",
"with tf.device(\"/cpu:0\"): # there is no GPU-kernel yet\n",
"    depth_output = depth_pool(cropped_images)\n",
"depth_output.shape"
]
},
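{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what depth-wise max pooling computes, here is a minimal NumPy sketch. It assumes the number of channels is a multiple of the pool size and that the stride equals the pool size (the `\"VALID\"` case used above). The function name `depth_max_pool_np()` is hypothetical: it groups consecutive channels and keeps the maximum of each group, which should match the layers above for such inputs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def depth_max_pool_np(inputs, pool_size):\n",
"    # hypothetical helper: assumes n_channels % pool_size == 0\n",
"    n_images, n_rows, n_cols, n_channels = inputs.shape\n",
"    grouped = inputs.reshape(n_images, n_rows, n_cols, n_channels // pool_size, pool_size)\n",
"    return grouped.max(axis=-1)  # max over each group of consecutive channels\n",
"\n",
"demo_input = np.arange(2 * 2 * 6, dtype=np.float32).reshape(1, 2, 2, 6)\n",
"demo_pooled = depth_max_pool_np(demo_input, pool_size=3)\n",
"print(demo_pooled.shape)     # 6 channels pooled by 3 leaves 2 channels\n",
"print(demo_pooled[0, 0, 0])  # max of channels 0-2 and of channels 3-5 at pixel (0, 0)"
]
},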
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 8))\n",
"plt.subplot(1, 2, 1)\n",
"plt.title(\"Input\", fontsize=14)\n",
"plot_color_image(cropped_images[0]) # plot the 1st image\n",
"plt.subplot(1, 2, 2)\n",
"plt.title(\"Output\", fontsize=14)\n",
"plot_image(depth_output[0, ..., 0]) # plot the output for the 1st image\n",
"plt.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Average pooling"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"avg_pool = keras.layers.AvgPool2D(pool_size=2)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"output_avg = avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[2, 1])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(cropped_images[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(output_avg[0]) # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Global Average Pooling"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"global_avg_pool = keras.layers.GlobalAvgPool2D()\n",
"global_avg_pool(cropped_images)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"output_global_avg2 = keras.layers.Lambda(lambda X: tf.reduce_mean(X, axis=[1, 2]))\n",
"output_global_avg2(cropped_images)"
]
},
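{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, the same reduction can be sketched in NumPy alone: global average pooling is just the mean over the height and width axes, leaving one value per image and per channel. The array `demo_maps` below is a made-up input used purely for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# made-up batch: 2 images, 2x2 pixels, 3 channels\n",
"demo_maps = np.arange(2 * 2 * 2 * 3, dtype=np.float32).reshape(2, 2, 2, 3)\n",
"demo_global_avg = demo_maps.mean(axis=(1, 2))  # average over height and width\n",
"print(demo_global_avg.shape)  # one row per image, one column per channel\n",
"print(demo_global_avg)"
]
},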
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tackling Fashion MNIST With a CNN"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_mean = X_train.mean(axis=0, keepdims=True)\n",
"X_std = X_train.std(axis=0, keepdims=True) + 1e-7\n",
"X_train = (X_train - X_mean) / X_std\n",
"X_valid = (X_valid - X_mean) / X_std\n",
"X_test = (X_test - X_mean) / X_std\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"from functools import partial\n",
"\n",
"DefaultConv2D = partial(keras.layers.Conv2D,\n",
" kernel_size=3, activation='relu', padding=\"SAME\")\n",
"\n",
"model = keras.models.Sequential([\n",
" DefaultConv2D(filters=64, kernel_size=7, input_shape=[28, 28, 1]),\n",
" keras.layers.MaxPooling2D(pool_size=2),\n",
" DefaultConv2D(filters=128),\n",
" DefaultConv2D(filters=128),\n",
" keras.layers.MaxPooling2D(pool_size=2),\n",
" DefaultConv2D(filters=256),\n",
" DefaultConv2D(filters=256),\n",
" keras.layers.MaxPooling2D(pool_size=2),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dense(units=128, activation='relu'),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(units=64, activation='relu'),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(units=10, activation='softmax'),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\", metrics=[\"accuracy\"])\n",
"history = model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"score = model.evaluate(X_test, y_test)\n",
"X_new = X_test[:10] # pretend we have new images\n",
"y_pred = model.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ResNet-34"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"DefaultConv2D = partial(keras.layers.Conv2D, kernel_size=3, strides=1,\n",
"                        padding=\"SAME\", use_bias=False)\n",
"\n",
"class ResidualUnit(keras.layers.Layer):\n",
"    def __init__(self, filters, strides=1, activation=\"relu\", **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.activation = keras.activations.get(activation)\n",
"        self.main_layers = [\n",
"            DefaultConv2D(filters, strides=strides),\n",
"            keras.layers.BatchNormalization(),\n",
"            self.activation,\n",
"            DefaultConv2D(filters),\n",
"            keras.layers.BatchNormalization()]\n",
"        self.skip_layers = []\n",
"        if strides > 1:\n",
"            self.skip_layers = [\n",
"                DefaultConv2D(filters, kernel_size=1, strides=strides),\n",
"                keras.layers.BatchNormalization()]\n",
"\n",
"    def call(self, inputs):\n",
"        Z = inputs\n",
"        for layer in self.main_layers:\n",
"            Z = layer(Z)\n",
"        skip_Z = inputs\n",
"        for layer in self.skip_layers:\n",
"            skip_Z = layer(skip_Z)\n",
"        return self.activation(Z + skip_Z)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"model = keras.models.Sequential()\n",
"model.add(DefaultConv2D(64, kernel_size=7, strides=2,\n",
" input_shape=[224, 224, 3]))\n",
"model.add(keras.layers.BatchNormalization())\n",
"model.add(keras.layers.Activation(\"relu\"))\n",
"model.add(keras.layers.MaxPool2D(pool_size=3, strides=2, padding=\"SAME\"))\n",
"prev_filters = 64\n",
"for filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:\n",
"    strides = 1 if filters == prev_filters else 2\n",
"    model.add(ResidualUnit(filters, strides=strides))\n",
"    prev_filters = filters\n",
"model.add(keras.layers.GlobalAvgPool2D())\n",
"model.add(keras.layers.Flatten())\n",
"model.add(keras.layers.Dense(10, activation=\"softmax\"))"
]
},
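{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above can be traced by hand. The sketch below uses plain Python (no Keras calls) to follow the stride and feature map size of each residual unit, assuming a 224 × 224 input and `\"SAME\"` padding everywhere, so that each stride-2 layer halves the size, rounding up. The variable names are illustrative only. It also counts the layers in the usual way (the two convolutions on each unit's main path, plus the first convolution and the final dense layer, ignoring the 1 × 1 skip convolutions), which is where the 34 comes from:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fmap_size = 224                        # assumed input height and width\n",
"fmap_size = (fmap_size - 1) // 2 + 1   # 7x7 convolution with stride 2\n",
"fmap_size = (fmap_size - 1) // 2 + 1   # 3x3 max pooling with stride 2\n",
"\n",
"unit_trace = []                        # (filters, strides, output size) per residual unit\n",
"n_counted_layers = 1                   # the initial 7x7 convolution\n",
"last_filters = 64\n",
"for unit_filters in [64] * 3 + [128] * 4 + [256] * 6 + [512] * 3:\n",
"    unit_strides = 1 if unit_filters == last_filters else 2\n",
"    fmap_size = (fmap_size - 1) // unit_strides + 1\n",
"    unit_trace.append((unit_filters, unit_strides, fmap_size))\n",
"    n_counted_layers += 2              # two 3x3 convolutions on the main path\n",
"    last_filters = unit_filters\n",
"n_counted_layers += 1                  # the final dense layer\n",
"\n",
"for unit in unit_trace:\n",
"    print(unit)\n",
"print(\"Counted layers:\", n_counted_layers)"
]
},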
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using a Pretrained Model"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"model = keras.applications.resnet50.ResNet50(weights=\"imagenet\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize(images, [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_pad(images, 224, 224, antialias=True)\n",
"plot_color_image(images_resized[0])"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"images_resized = tf.image.resize_with_crop_or_pad(images, 224, 224)\n",
"plot_color_image(images_resized[0])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"china_box = [0, 0.03, 1, 0.68]\n",
"flower_box = [0.19, 0.26, 0.86, 0.7]\n",
"images_resized = tf.image.crop_and_resize(images, [china_box, flower_box], [0, 1], [224, 224])\n",
"plot_color_image(images_resized[0])\n",
"plt.show()\n",
"plot_color_image(images_resized[1])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"inputs = keras.applications.resnet50.preprocess_input(images_resized * 255)\n",
"Y_proba = model.predict(inputs)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"Y_proba.shape"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"top_K = keras.applications.resnet50.decode_predictions(Y_proba, top=3)\n",
"for image_index in range(len(images)):\n",
"    print(\"Image #{}\".format(image_index))\n",
"    for class_id, name, y_proba in top_K[image_index]:\n",
"        print(\" {} - {:12s} {:.2f}%\".format(class_id, name, y_proba * 100))\n",
"    print()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pretrained Models for Transfer Learning"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow_datasets as tfds\n",
"\n",
"dataset, info = tfds.load(\"tf_flowers\", as_supervised=True, with_info=True)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"info.splits"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"info.splits[\"train\"]"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"class_names = info.features[\"label\"].names\n",
"class_names"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"n_classes = info.features[\"label\"].num_classes"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"dataset_size = info.splits[\"train\"].num_examples\n",
"dataset_size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:** TFDS's split API has evolved since the book was published. The [new split API](https://www.tensorflow.org/datasets/splits) (called S3) is much simpler to use:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"test_set_raw, valid_set_raw, train_set_raw = tfds.load(\n",
" \"tf_flowers\",\n",
" split=[\"train[:10%]\", \"train[10%:25%]\", \"train[25%:]\"],\n",
" as_supervised=True)"
]
},
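{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the three percentage slices concrete, here is a plain-Python sketch of how they partition a dataset into test, validation and training subsets. The dataset size of 1,000 is hypothetical (it is not the size of `tf_flowers`), and the conversion from percentages to indices uses simple floor division, which is an assumption: TFDS may round boundaries differently. The point is only that the three slices are disjoint and together cover every example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n_examples = 1000  # hypothetical dataset size, for illustration only\n",
"percent_ranges = {\"test\": (0, 10), \"valid\": (10, 25), \"train\": (25, 100)}\n",
"\n",
"subset_indices = {}\n",
"for split_name, (start_pct, end_pct) in percent_ranges.items():\n",
"    # assumed conversion from percentages to index boundaries (floor division)\n",
"    start = n_examples * start_pct // 100\n",
"    end = n_examples * end_pct // 100\n",
"    subset_indices[split_name] = range(start, end)\n",
"\n",
"subset_sizes = {split_name: len(indices) for split_name, indices in subset_indices.items()}\n",
"print(subset_sizes)"
]
},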
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 10))\n",
"index = 0\n",
"for image, label in train_set_raw.take(9):\n",
"    index += 1\n",
"    plt.subplot(3, 3, index)\n",
"    plt.imshow(image)\n",
"    plt.title(\"Class: {}\".format(class_names[label]))\n",
"    plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Basic preprocessing:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"def preprocess(image, label):\n",
"    resized_image = tf.image.resize(image, [224, 224])\n",
"    final_image = keras.applications.xception.preprocess_input(resized_image)\n",
"    return final_image, label"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Slightly fancier preprocessing (but you could add much more data augmentation):"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
2019-03-24 02:06:29 +01:00
"def central_crop(image):\n",
" shape = tf.shape(image)\n",
" min_dim = tf.reduce_min([shape[0], shape[1]])\n",
" top_crop = (shape[0] - min_dim) // 4\n",
" bottom_crop = shape[0] - top_crop\n",
" left_crop = (shape[1] - min_dim) // 4\n",
" right_crop = shape[1] - left_crop\n",
" return image[top_crop:bottom_crop, left_crop:right_crop]\n",
2017-05-05 15:22:45 +02:00
"\n",
2019-03-24 02:06:29 +01:00
"def random_crop(image):\n",
" shape = tf.shape(image)\n",
" min_dim = tf.reduce_min([shape[0], shape[1]]) * 90 // 100\n",
" return tf.image.random_crop(image, [min_dim, min_dim, 3])\n",
"\n",
"def preprocess(image, label, randomize=False):\n",
" if randomize:\n",
" cropped_image = random_crop(image)\n",
" cropped_image = tf.image.random_flip_left_right(cropped_image)\n",
" else:\n",
" cropped_image = central_crop(image)\n",
" resized_image = tf.image.resize(cropped_image, [224, 224])\n",
" final_image = keras.applications.xception.preprocess_input(resized_image)\n",
" return final_image, label\n",
"\n",
"batch_size = 32\n",
"train_set = train_set_raw.shuffle(1000).repeat()\n",
"train_set = train_set.map(partial(preprocess, randomize=True)).batch(batch_size).prefetch(1)\n",
"valid_set = valid_set_raw.map(preprocess).batch(batch_size).prefetch(1)\n",
"test_set = test_set_raw.map(preprocess).batch(batch_size).prefetch(1)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in train_set.take(1):\n",
" for index in range(9):\n",
" plt.subplot(3, 3, index + 1)\n",
" plt.imshow(X_batch[index] / 2 + 0.5)\n",
" plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
" plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 12))\n",
"for X_batch, y_batch in test_set.take(1):\n",
" for index in range(9):\n",
" plt.subplot(3, 3, index + 1)\n",
" plt.imshow(X_batch[index] / 2 + 0.5)\n",
" plt.title(\"Class: {}\".format(class_names[y_batch[index]]))\n",
" plt.axis(\"off\")\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
" include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"model = keras.models.Model(inputs=base_model.input, outputs=output)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"for index, layer in enumerate(base_model.layers):\n",
" print(index, layer.name)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"for layer in base_model.layers:\n",
" layer.trainable = False\n",
"\n",
"optimizer = keras.optimizers.SGD(learning_rate=0.2, momentum=0.9, decay=0.01)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
" metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
" steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
" validation_data=valid_set,\n",
" validation_steps=int(0.15 * dataset_size / batch_size),\n",
" epochs=5)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"for layer in base_model.layers:\n",
" layer.trainable = True\n",
"\n",
"optimizer = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9,\n",
" nesterov=True, decay=0.001)\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer,\n",
" metrics=[\"accuracy\"])\n",
"history = model.fit(train_set,\n",
" steps_per_epoch=int(0.75 * dataset_size / batch_size),\n",
" validation_data=valid_set,\n",
" validation_steps=int(0.15 * dataset_size / batch_size),\n",
" epochs=40)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Classification and Localization"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"base_model = keras.applications.xception.Xception(weights=\"imagenet\",\n",
" include_top=False)\n",
"avg = keras.layers.GlobalAveragePooling2D()(base_model.output)\n",
"class_output = keras.layers.Dense(n_classes, activation=\"softmax\")(avg)\n",
"loc_output = keras.layers.Dense(4)(avg)\n",
"model = keras.models.Model(inputs=base_model.input,\n",
" outputs=[class_output, loc_output])\n",
"model.compile(loss=[\"sparse_categorical_crossentropy\", \"mse\"],\n",
" loss_weights=[0.8, 0.2], # depends on what you care most about\n",
" optimizer=optimizer, metrics=[\"accuracy\"])"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"def add_random_bounding_boxes(images, labels):\n",
" fake_bboxes = tf.random.uniform([tf.shape(images)[0], 4])\n",
" return images, (labels, fake_bboxes)\n",
"\n",
"fake_train_set = train_set.take(5).repeat(2).map(add_random_bounding_boxes)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"model.fit(fake_train_set, steps_per_epoch=5, epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Mean Average Precision (mAP)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"def maximum_precisions(precisions):\n",
" return np.flip(np.maximum.accumulate(np.flip(precisions)))"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"recalls = np.linspace(0, 1, 11)\n",
"\n",
"precisions = [0.91, 0.94, 0.96, 0.94, 0.95, 0.92, 0.80, 0.60, 0.45, 0.20, 0.10]\n",
"max_precisions = maximum_precisions(precisions)\n",
"mAP = max_precisions.mean()\n",
"plt.plot(recalls, precisions, \"ro--\", label=\"Precision\")\n",
"plt.plot(recalls, max_precisions, \"bo-\", label=\"Max Precision\")\n",
"plt.xlabel(\"Recall\")\n",
"plt.ylabel(\"Precision\")\n",
"plt.plot([0, 1], [mAP, mAP], \"g:\", linewidth=3, label=\"mAP\")\n",
"plt.grid(True)\n",
"plt.axis([0, 1, 0, 1])\n",
"plt.legend(loc=\"lower center\", fontsize=14)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transposed convolutions:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"X = images_resized.numpy()\n",
"\n",
"conv_transpose = keras.layers.Conv2DTranspose(filters=5, kernel_size=3, strides=2, padding=\"VALID\")\n",
"output = conv_transpose(X)\n",
"output.shape"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"def normalize(X):\n",
" return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=2, width_ratios=[1, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Output\", fontsize=14)\n",
"ax2.imshow(normalize(output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax2.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"def upscale_images(images, stride, kernel_size):\n",
" batch_size, height, width, channels = images.shape\n",
" upscaled = np.zeros((batch_size,\n",
" (height - 1) * stride + 2 * kernel_size - 1,\n",
" (width - 1) * stride + 2 * kernel_size - 1,\n",
" channels))\n",
" upscaled[:,\n",
" kernel_size - 1:(height - 1) * stride + kernel_size:stride,\n",
" kernel_size - 1:(width - 1) * stride + kernel_size:stride,\n",
" :] = images\n",
" return upscaled"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"upscaled = upscale_images(X, stride=2, kernel_size=3)\n",
"weights, biases = conv_transpose.weights\n",
"reversed_filters = np.flip(weights.numpy(), axis=[0, 1])\n",
"reversed_filters = np.transpose(reversed_filters, [0, 1, 3, 2])\n",
"manual_output = tf.nn.conv2d(upscaled, reversed_filters, strides=1, padding=\"VALID\")"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"def normalize(X):\n",
" return (X - tf.reduce_min(X)) / (tf.reduce_max(X) - tf.reduce_min(X))\n",
"\n",
"fig = plt.figure(figsize=(12, 8))\n",
"gs = mpl.gridspec.GridSpec(nrows=1, ncols=3, width_ratios=[1, 2, 2])\n",
"\n",
"ax1 = fig.add_subplot(gs[0, 0])\n",
"ax1.set_title(\"Input\", fontsize=14)\n",
"ax1.imshow(X[0]) # plot the 1st image\n",
"ax1.axis(\"off\")\n",
"ax2 = fig.add_subplot(gs[0, 1])\n",
"ax2.set_title(\"Upscaled\", fontsize=14)\n",
"ax2.imshow(upscaled[0], interpolation=\"bicubic\")\n",
"ax2.axis(\"off\")\n",
"ax3 = fig.add_subplot(gs[0, 2])\n",
"ax3.set_title(\"Output\", fontsize=14)\n",
"ax3.imshow(normalize(manual_output[0, ..., :3]), interpolation=\"bicubic\") # plot the output for the 1st image\n",
"ax3.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"np.allclose(output, manual_output.numpy(), atol=1e-7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See Appendix A."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. High Accuracy CNN for MNIST\n",
"_Exercise: Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following model uses 2 convolutional layers, followed by 1 max pooling layer, then a dropout layer with a 25% dropout rate, a dense layer, another dropout layer with a 50% dropout rate, and finally the output layer. It reaches about 99.2% accuracy on the test set. This places the model roughly in the top 20% of the [MNIST Kaggle competition](https://www.kaggle.com/c/digit-recognizer/) (ignoring the models with an accuracy greater than 99.79%, which were most likely trained on the test set, as explained by Chris Deotte in [this post](https://www.kaggle.com/c/digit-recognizer/discussion/61480)). Can you do better? To reach 99.5% to 99.7% accuracy on the test set, you need to add image augmentation and batch normalization, use a learning schedule such as 1-cycle, and possibly create an ensemble."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()\n",
"X_train_full = X_train_full / 255.\n",
"X_test = X_test / 255.\n",
"X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]\n",
"y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]\n",
"\n",
"X_train = X_train[..., np.newaxis]\n",
"X_valid = X_valid[..., np.newaxis]\n",
"X_test = X_test[..., np.newaxis]"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"keras.backend.clear_session()\n",
"tf.random.set_seed(42)\n",
"np.random.seed(42)\n",
"\n",
"model = keras.models.Sequential([\n",
" keras.layers.Conv2D(32, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.Conv2D(64, kernel_size=3, padding=\"same\", activation=\"relu\"),\n",
" keras.layers.MaxPool2D(),\n",
" keras.layers.Flatten(),\n",
" keras.layers.Dropout(0.25),\n",
" keras.layers.Dense(128, activation=\"relu\"),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(10, activation=\"softmax\")\n",
"])\n",
"model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"nadam\",\n",
" metrics=[\"accuracy\"])\n",
"\n",
"model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid))\n",
"model.evaluate(X_test, y_test)"
]
},
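{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 1-cycle schedule mentioned in the solution above can be sketched in plain Python. This is only a minimal sketch under stated assumptions: `one_cycle_rate` is a hypothetical helper (it is not a Keras function), and the linear ramps, the 10x ratio between the maximum and starting rates, and the final 10% cool-down are illustrative defaults rather than tuned values. To use it during training, a custom callback would need to set the optimizer's learning rate before each batch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def one_cycle_rate(iteration, total_iterations, max_rate,\n",
"                   start_rate=None, last_fraction=0.1, last_rate=None):\n",
"    # Hypothetical helper (not a Keras API): learning rate for one training\n",
"    # iteration under a piecewise-linear 1-cycle schedule.\n",
"    if start_rate is None:\n",
"        start_rate = max_rate / 10     # assumed default\n",
"    if last_rate is None:\n",
"        last_rate = start_rate / 1000  # assumed default\n",
"    if iteration >= total_iterations:\n",
"        return last_rate\n",
"    last_iterations = int(total_iterations * last_fraction)\n",
"    half = (total_iterations - last_iterations) // 2\n",
"\n",
"    def interpolate(i, i1, i2, rate1, rate2):\n",
"        return rate1 + (rate2 - rate1) * (i - i1) / (i2 - i1)\n",
"\n",
"    if iteration < half:      # first half: ramp up to max_rate\n",
"        return interpolate(iteration, 0, half, start_rate, max_rate)\n",
"    if iteration < 2 * half:  # second half: ramp back down to start_rate\n",
"        return interpolate(iteration, half, 2 * half, max_rate, start_rate)\n",
"    # last few iterations: cool down to last_rate\n",
"    return interpolate(iteration, 2 * half, total_iterations,\n",
"                       start_rate, last_rate)\n",
"\n",
"# Learning rates for a hypothetical run of 100 iterations\n",
"rates = [one_cycle_rate(i, 100, max_rate=0.05) for i in range(100)]"
]
},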
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Use transfer learning for large image classification"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_Exercise: Use transfer learning for large image classification, going through these steps:_\n",
"\n",
"* _Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets)._\n",
"* _Split it into a training set, a validation set, and a test set._\n",
"* _Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation._\n",
"* _Fine-tune a pretrained model on this dataset._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the Flowers example above."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11.\n",
"_Exercise: Go through TensorFlow's [Style Transfer tutorial](https://homl.info/styletuto). It is a fun way to generate art using Deep Learning._\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simply open the Colab and follow its instructions."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
},
"nav_menu": {},
"toc": {
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}