From 9abeffd21c2ddcc67029dc6628f8217c727ba031 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Aur=C3=A9lien=20Geron?= Date: Tue, 2 Mar 2021 22:47:09 +1300 Subject: [PATCH] Remove work_in_progress directory --- work_in_progress/extra_capsnets-cn.ipynb | 2086 ----------------- work_in_progress/extra_capsnets.ipynb | 2066 ---------------- .../extra_tensorflow_reproducibility.ipynb | 842 ------- 3 files changed, 4994 deletions(-) delete mode 100644 work_in_progress/extra_capsnets-cn.ipynb delete mode 100644 work_in_progress/extra_capsnets.ipynb delete mode 100644 work_in_progress/extra_tensorflow_reproducibility.ipynb diff --git a/work_in_progress/extra_capsnets-cn.ipynb b/work_in_progress/extra_capsnets-cn.ipynb deleted file mode 100644 index 32b4753..0000000 --- a/work_in_progress/extra_capsnets-cn.ipynb +++ /dev/null @@ -1,2086 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 胶囊网络(CapsNets) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "基于论文:[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829),作者:Sara Sabour, Nicholas Frosst and Geoffrey E. Hinton (NIPS 2017)。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "部分启发来自于Huadong Liao的实现[CapsNet-TensorFlow](https://github.com/naturomics/CapsNet-Tensorflow)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 简介" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "观看 [视频](https://youtu.be/pPN8d0E3900)来理解胶囊网络背后的关键想法(大家可能看不到,因为youtube被墙了):" - ] - }, - { - "cell_type": "code", - "execution_count": 157, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import HTML\n", - "HTML(\"\"\"\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "你或许也需要观看[视频](https://youtu.be/2Kawrd5szHE),其展示了这个notebook的难点(大家可能看不到,因为youtube被墙了):" - ] - }, - { - "cell_type": "code", - "execution_count": 158, - "metadata": {}, - "outputs": [], - "source": [ - "HTML(\"\"\"\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Imports" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "为了绘制好看的图:" - ] - }, - { - "cell_type": "code", - "execution_count": 79, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们会用到 NumPy 和 TensorFlow:" - ] - }, - { - "cell_type": "code", - "execution_count": 80, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 可重复性" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "为了能够在不重新启动Jupyter Notebook Kernel的情况下重新运行本notebook,我们需要重置默认的计算图。" - ] - }, - { - "cell_type": "code", - "execution_count": 81, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "设置随机种子,以便于本notebook总是可以输出相同的输出:" - ] - }, - { - "cell_type": "code", - "execution_count": 82, - "metadata": {}, - "outputs": [], - "source": [ - "np.random.seed(42)\n", - "tf.set_random_seed(42)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 装载MNIST" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "是的,我知道,又是MNIST。但我们希望这个极具威力的想法可以工作在更大的数据集上,时间会说明一切。(译注:因为是Hinton吗,因为他老是对;-)?)" - ] - }, - { - 
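As a small aside (this helper is a sketch added here for convenience and is not a cell from the original notebook; the name `reset_graph` is made up), the two reproducibility steps above can be bundled into one function so the notebook can be re-run from the top in a single call:

```python
import numpy as np
import tensorflow as tf

def reset_graph(seed=42):
    # Clear the default TF 1.x graph and re-seed NumPy and TensorFlow,
    # so repeated runs of the notebook produce the same output.
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)

reset_graph()
```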
"cell_type": "code", - "execution_count": 83, - "metadata": {}, - "outputs": [], - "source": [ - "from tensorflow.examples.tutorials.mnist import input_data\n", - "\n", - "mnist = input_data.read_data_sets(\"/tmp/data/\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们看一下这些手写数字图像是什么样的:" - ] - }, - { - "cell_type": "code", - "execution_count": 84, - "metadata": {}, - "outputs": [], - "source": [ - "n_samples = 5\n", - "\n", - "plt.figure(figsize=(n_samples * 2, 3))\n", - "for index in range(n_samples):\n", - " plt.subplot(1, n_samples, index + 1)\n", - " sample_image = mnist.train.images[index].reshape(28, 28)\n", - " plt.imshow(sample_image, cmap=\"binary\")\n", - " plt.axis(\"off\")\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "以及相应的标签:" - ] - }, - { - "cell_type": "code", - "execution_count": 85, - "metadata": {}, - "outputs": [], - "source": [ - "mnist.train.labels[:n_samples]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们建立一个胶囊网络来区分这些图像。这里有一个其总体的架构,享受一下ASCII字符的艺术吧! ;-)\n", - "注意:为了可读性,我摒弃了两种箭头:标签 → 掩盖,以及 输入的图像 → 重新构造损失。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```\n", - " 损 失\n", - " ↑\n", - " ┌─────────┴─────────┐\n", - " 标 签 → 边 际 损 失 重 新 构 造 损 失\n", - " ↑ ↑\n", - " 模 长 解 码 器\n", - " ↑ ↑ \n", - " 数 字 胶 囊 们 ────遮 盖─────┘\n", - " ↖↑↗ ↖↑↗ ↖↑↗\n", - " 主 胶 囊 们\n", - " ↑ \n", - " 输 入 的 图 像\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们打算从底层开始构建该计算图,然后逐步上移,左侧优先。让我们开始!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 输入图像" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们通过为输入图像创建一个占位符作为起步,该输入图像具有28×28个像素,1个颜色通道=灰度。" - ] - }, - { - "cell_type": "code", - "execution_count": 86, - "metadata": {}, - "outputs": [], - "source": [ - "X = tf.placeholder(shape=[None, 28, 28, 1], dtype=tf.float32, name=\"X\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 主胶囊" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "第一层由32个特征映射组成,每个特征映射为6$\\times$6个胶囊,其中每个胶囊输出8维的激活向量:" - ] - }, - { - "cell_type": "code", - "execution_count": 87, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_n_maps = 32\n", - "caps1_n_caps = caps1_n_maps * 6 * 6 # 1152 主胶囊们\n", - "caps1_n_dims = 8" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "为了计算它们的输出,我们首先应用两个常规的卷积层:" - ] - }, - { - "cell_type": "code", - "execution_count": 88, - "metadata": {}, - "outputs": [], - "source": [ - "conv1_params = {\n", - " \"filters\": 256,\n", - " \"kernel_size\": 9,\n", - " \"strides\": 1,\n", - " \"padding\": \"valid\",\n", - " \"activation\": tf.nn.relu,\n", - "}\n", - "\n", - "conv2_params = {\n", - " \"filters\": caps1_n_maps * caps1_n_dims, # 256 个卷积滤波器\n", - " \"kernel_size\": 9,\n", - " \"strides\": 2,\n", - " \"padding\": \"valid\",\n", - " \"activation\": tf.nn.relu\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": 89, - "metadata": {}, - "outputs": [], - "source": [ - "conv1 = tf.layers.conv2d(X, name=\"conv1\", **conv1_params)\n", - "conv2 = tf.layers.conv2d(conv1, name=\"conv2\", **conv2_params)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "注意:由于我们使用一个尺寸为9的核,并且没有使用填充(出于某种原因,这就是`\"valid\"`的含义),该图像每经历一个卷积层就会缩减 $9-1=8$ 个像素(从 $28\\times 28$ 到 $20 \\times 20$,再从 $20\\times 20$ 到 $12\\times 12$),并且由于在第二个卷积层中使用了大小为2的步幅,那么该图像的大小就被除以2。这就是为什么我们最后会得到 $6\\times 
6$ 的特征映射(feature map)。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "接着,我们重塑该输出以获得一组8D向量,用来表示主胶囊的输出。`conv2`的输出是一个数组,包含对于每个实例都有32×8=256个特征映射(feature map),其中每个特征映射为6×6。所以该输出的形状为 (_batch size_, 6, 6, 256)。我们想要把256分到32个8维向量中,可以通过使用重塑 (_batch size_, 6, 6, 32, 8)来达到目的。然而,由于首个胶囊层会被完全连接到下一个胶囊层,那么我们就可以简单地把它扁平成6×6的网格。这意味着我们只需要把它重塑成 (_batch size_, 6×6×32, 8) 即可。" - ] - }, - { - "cell_type": "code", - "execution_count": 90, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_raw = tf.reshape(conv2, [-1, caps1_n_caps, caps1_n_dims],\n", - " name=\"caps1_raw\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在我们需要压缩这些向量。让我们来定义`squash()`函数,基于论文中的公式(1):\n", - "\n", - "$\\operatorname{squash}(\\mathbf{s}) = \\dfrac{\\|\\mathbf{s}\\|^2}{1 + \\|\\mathbf{s}\\|^2} \\dfrac{\\mathbf{s}}{\\|\\mathbf{s}\\|}$\n", - "\n", - "该`squash()`函数将会压缩所有的向量到给定的数组中,沿给定轴(默认情况为最后一个轴)。\n", - "\n", - "**当心**,这里有一个很讨厌的bug在等着你:当 $\\|\\mathbf{s}\\|=0$时,$\\|\\mathbf{s}\\|$ 为 undefined,这让我们不能直接使用 `tf.norm()`,否则会在训练过程中失败:如果一个向量为0,那么梯度就会是 `nan`,所以当优化器更新变量时,这些变量也会变为 `nan`,从那个时刻起,你就止步在 `nan` 那里了。解决的方法是手工实现norm,在计算的时候加上一个很小的值 epsilon:$\\|\\mathbf{s}\\| \\approx \\sqrt{\\sum\\limits_i{{s_i}^2}\\,\\,+ \\epsilon}$" - ] - }, - { - "cell_type": "code", - "execution_count": 91, - "metadata": {}, - "outputs": [], - "source": [ - "def squash(s, axis=-1, epsilon=1e-7, name=None):\n", - " with tf.name_scope(name, default_name=\"squash\"):\n", - " squared_norm = tf.reduce_sum(tf.square(s), axis=axis,\n", - " keep_dims=True)\n", - " safe_norm = tf.sqrt(squared_norm + epsilon)\n", - " squash_factor = squared_norm / (1. + squared_norm)\n", - " unit_vector = s / safe_norm\n", - " return squash_factor * unit_vector" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们应用这个函数以获得每个主胶囊$\\mathbf{u}_i$的输出:" - ] - }, - { - "cell_type": "code", - "execution_count": 92, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_output = squash(caps1_raw, name=\"caps1_output\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "太棒了!我们有了首个胶囊层的输出了。不是很难,对吗?然后,计算下一层才是真正乐趣的开始(译注:好戏刚刚开始)。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 数字胶囊们" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "要计算数字胶囊们的输出,我们必须首先计算预测的输出向量(每个对应一个主胶囊/数字胶囊的对)。接着,我们就可以通过协议算法来运行路由。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 计算预测输出向量" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "该数字胶囊层包含10个胶囊(每个代表一个数字),每个胶囊16维:" - ] - }, - { - "cell_type": "code", - "execution_count": 93, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_n_caps = 10\n", - "caps2_n_dims = 16" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "对于在第一层里的每个胶囊 $i$,我们会在第二层中预测出每个胶囊 $j$ 的输出。为此,我们需要一个变换矩阵 $\\mathbf{W}_{i,j}$(每一对就是胶囊($i$, $j$) 中的一个),接着我们就可以计算预测的输出$\\hat{\\mathbf{u}}_{j|i} = \\mathbf{W}_{i,j} \\, \\mathbf{u}_i$(论文中的公式(2)的右半部分)。由于我们想要将8维向量变形为16维向量,因此每个变换向量$\\mathbf{W}_{i,j}$必须具备(16, 8)形状。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "要为每对胶囊 ($i$, $j$) 计算 $\\hat{\\mathbf{u}}_{j|i}$,我们会利用 `tf.matmul()` 函数的一个特点:你可能知道它可以让你进行两个矩阵相乘,但你可能不知道它可以让你进行更高维度的数组相乘。它将这些数组视作为数组矩阵,并且它会执行每项的矩阵相乘。例如,设有两个4D数组,每个包含2×3网格的矩阵。第一个包含矩阵为:$\\mathbf{A}, \\mathbf{B}, \\mathbf{C}, \\mathbf{D}, \\mathbf{E}, \\mathbf{F}$,第二个包含矩阵为:$\\mathbf{G}, \\mathbf{H}, \\mathbf{I}, \\mathbf{J}, \\mathbf{K}, \\mathbf{L}$。如果你使用 `tf.matmul`函数 对这两个4D数组进行相乘,你就会得到:\n", - "\n", - 
"$\n", - "\\pmatrix{\n", - "\\mathbf{A} & \\mathbf{B} & \\mathbf{C} \\\\\n", - "\\mathbf{D} & \\mathbf{E} & \\mathbf{F}\n", - "} \\times\n", - "\\pmatrix{\n", - "\\mathbf{G} & \\mathbf{H} & \\mathbf{I} \\\\\n", - "\\mathbf{J} & \\mathbf{K} & \\mathbf{L}\n", - "} = \\pmatrix{\n", - "\\mathbf{AG} & \\mathbf{BH} & \\mathbf{CI} \\\\\n", - "\\mathbf{DJ} & \\mathbf{EK} & \\mathbf{FL}\n", - "}\n", - "$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们可以把这个函数用来计算每对胶囊 ($i$, $j$) 的 $\\hat{\\mathbf{u}}_{j|i}$,就像这样(回忆一下,有 6×6×32=1152 个胶囊在第一层,还有10个在第二层):\n", - "\n", - "$\n", - "\\pmatrix{\n", - " \\mathbf{W}_{1,1} & \\mathbf{W}_{1,2} & \\cdots & \\mathbf{W}_{1,10} \\\\\n", - " \\mathbf{W}_{2,1} & \\mathbf{W}_{2,2} & \\cdots & \\mathbf{W}_{2,10} \\\\\n", - " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - " \\mathbf{W}_{1152,1} & \\mathbf{W}_{1152,2} & \\cdots & \\mathbf{W}_{1152,10}\n", - "} \\times\n", - "\\pmatrix{\n", - " \\mathbf{u}_1 & \\mathbf{u}_1 & \\cdots & \\mathbf{u}_1 \\\\\n", - " \\mathbf{u}_2 & \\mathbf{u}_2 & \\cdots & \\mathbf{u}_2 \\\\\n", - " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - " \\mathbf{u}_{1152} & \\mathbf{u}_{1152} & \\cdots & \\mathbf{u}_{1152}\n", - "}\n", - "=\n", - "\\pmatrix{\n", - "\\hat{\\mathbf{u}}_{1|1} & \\hat{\\mathbf{u}}_{2|1} & \\cdots & \\hat{\\mathbf{u}}_{10|1} \\\\\n", - "\\hat{\\mathbf{u}}_{1|2} & \\hat{\\mathbf{u}}_{2|2} & \\cdots & \\hat{\\mathbf{u}}_{10|2} \\\\\n", - "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - "\\hat{\\mathbf{u}}_{1|1152} & \\hat{\\mathbf{u}}_{2|1152} & \\cdots & \\hat{\\mathbf{u}}_{10|1152}\n", - "}\n", - "$\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "第一个数组的形状为 (1152, 10, 16, 8),第二个数组的形状为 (1152, 10, 8, 1)。注意到第二个数组必须包含10个对于向量$\\mathbf{u}_1$ 到 $\\mathbf{u}_{1152}$ 的完全拷贝。为了要创建这样的数组,我们将使用好用的 `tf.tile()` 函数,它可以让你创建包含很多基数组拷贝的数组,并且根据你想要的进行平铺。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "哦,稍等!我们还忘了一个维度:_batch size(批量/批次的大小)_。假设我们要给胶囊网络提供50张图片,那么该网络需要同时作出这50张图片的预测。所以第一个数组的形状为 (50, 1152, 10, 16, 8),而第二个数组的形状为 (50, 1152, 10, 8, 1)。第一层的胶囊实际上已经对于所有的50张图像作出预测,所以第二个数组没有问题,但对于第一个数组,我们需要使用 `tf.tile()` 让其具有50个拷贝的变换矩阵。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "好了,让我们开始,创建一个可训练的变量,形状为 (1, 1152, 10, 16, 8) 可以用来持有所有的变换矩阵。第一个维度的大小为1,可以让这个数组更容易的平铺。我们使用标准差为0.1的常规分布,随机初始化这个变量。" - ] - }, - { - "cell_type": "code", - "execution_count": 94, - "metadata": {}, - "outputs": [], - "source": [ - "init_sigma = 0.1\n", - "\n", - "W_init = tf.random_normal(\n", - " shape=(1, caps1_n_caps, caps2_n_caps, caps2_n_dims, caps1_n_dims),\n", - " stddev=init_sigma, dtype=tf.float32, name=\"W_init\")\n", - "W = tf.Variable(W_init, name=\"W\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在我们可以通过每个实例重复一次`W`来创建第一个数组:" - ] - }, - { - "cell_type": "code", - "execution_count": 95, - "metadata": {}, - "outputs": [], - "source": [ - "batch_size = tf.shape(X)[0]\n", - "W_tiled = tf.tile(W, [batch_size, 1, 1, 1, 1], name=\"W_tiled\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "就是这样!现在转到第二个数组。如前所述,我们需要创建一个数组,形状为 (_batch size_, 1152, 10, 8, 1),包含第一层胶囊的输出,重复10次(一次一个数字,在第三个维度,即axis=2)。 `caps1_output` 数组的形状为 (_batch size_, 1152, 8),所以我们首先需要展开两次来获得形状 (_batch size_, 1152, 1, 8, 1) 的数组,接着在第三维度重复它10次。" - ] - }, - { - "cell_type": "code", - "execution_count": 96, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_output_expanded = tf.expand_dims(caps1_output, -1,\n", - " 
name=\"caps1_output_expanded\")\n", - "caps1_output_tile = tf.expand_dims(caps1_output_expanded, 2,\n", - " name=\"caps1_output_tile\")\n", - "caps1_output_tiled = tf.tile(caps1_output_tile, [1, 1, caps2_n_caps, 1, 1],\n", - " name=\"caps1_output_tiled\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们检查以下第一个数组的形状:" - ] - }, - { - "cell_type": "code", - "execution_count": 97, - "metadata": {}, - "outputs": [], - "source": [ - "W_tiled" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "很好,现在第二个:" - ] - }, - { - "cell_type": "code", - "execution_count": 98, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_output_tiled" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "好!现在,为了要获得所有的预测好的输出向量 $\\hat{\\mathbf{u}}_{j|i}$,我们只需要将这两个数组使用`tf.malmul()`函数进行相乘,就像前面解释的那样:" - ] - }, - { - "cell_type": "code", - "execution_count": 99, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_predicted = tf.matmul(W_tiled, caps1_output_tiled,\n", - " name=\"caps2_predicted\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们检查一下形状:" - ] - }, - { - "cell_type": "code", - "execution_count": 100, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_predicted" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "非常好,对于在该批次(我们还不知道批次的大小,使用 \"?\" 替代)中的每个实例以及对于每对第一和第二层的胶囊(1152×10),我们都有一个16D预测的输出列向量 (16×1)。我们已经准备好应用 根据协议算法的路由 了!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 根据协议的路由" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "首先,让我们初始化原始的路由权重 $b_{i,j}$ 到0:" - ] - }, - { - "cell_type": "code", - "execution_count": 101, - "metadata": {}, - "outputs": [], - "source": [ - "raw_weights = tf.zeros([batch_size, caps1_n_caps, caps2_n_caps, 1, 1],\n", - " dtype=np.float32, name=\"raw_weights\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们马上将会看到为什么我们需要最后两维大小为1的维度。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 第一轮" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "首先,让我们应用 sofmax 函数来计算路由权重,$\\mathbf{c}_{i} = \\operatorname{softmax}(\\mathbf{b}_i)$ (论文中的公式(3)):" - ] - }, - { - "cell_type": "code", - "execution_count": 102, - "metadata": {}, - "outputs": [], - "source": [ - "routing_weights = tf.nn.softmax(raw_weights, dim=2, name=\"routing_weights\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们为每个第二层胶囊计算其预测输出向量的加权,$\\mathbf{s}_j = \\sum\\limits_{i}{c_{i,j}\\hat{\\mathbf{u}}_{j|i}}$ (论文公式(2)的左半部分):" - ] - }, - { - "cell_type": "code", - "execution_count": 103, - "metadata": {}, - "outputs": [], - "source": [ - "weighted_predictions = tf.multiply(routing_weights, caps2_predicted,\n", - " name=\"weighted_predictions\")\n", - "weighted_sum = tf.reduce_sum(weighted_predictions, axis=1, keep_dims=True,\n", - " name=\"weighted_sum\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "这里有几个重要的细节需要注意:\n", - "* 要执行元素级别矩阵相乘(也称为Hadamard积,记作$\\circ$),我们需要使用`tf.multiply()` 函数。它要求 `routing_weights` 和 `caps2_predicted` 具有相同的秩,这就是为什么前面我们在 `routing_weights` 上添加了两个额外的维度。\n", - "* `routing_weights`的形状为 (_batch size_, 1152, 10, 1, 1) 而 `caps2_predicted` 的形状为 (_batch size_, 1152, 10, 16, 1)。由于它们在第四个维度上不匹配(1 _vs_ 16),`tf.multiply()` 自动地在 `routing_weights` 该维度上 _广播_ 了16次。如果你不熟悉广播,这里有一个简单的例子,也许可以帮上忙:\n", - "\n", - " $ \\pmatrix{1 & 2 & 3 \\\\ 4 & 5 & 6} \\circ \\pmatrix{10 & 100 & 1000} = 
\\pmatrix{1 & 2 & 3 \\\\ 4 & 5 & 6} \\circ \\pmatrix{10 & 100 & 1000 \\\\ 10 & 100 & 1000} = \\pmatrix{10 & 200 & 3000 \\\\ 40 & 500 & 6000} $" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "最后,让我们应用squash函数到在协议算法的第一次迭代迭代结束时获取第二层胶囊的输出上,$\\mathbf{v}_j = \\operatorname{squash}(\\mathbf{s}_j)$:" - ] - }, - { - "cell_type": "code", - "execution_count": 104, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1 = squash(weighted_sum, axis=-2,\n", - " name=\"caps2_output_round_1\")" - ] - }, - { - "cell_type": "code", - "execution_count": 105, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "好!我们对于每个实例有了10个16D输出向量,就像我们期待的那样。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 第二轮" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "首先,让我们衡量一下,每个预测向量 $\\hat{\\mathbf{u}}_{j|i}$ 对于实际输出向量 $\\mathbf{v}_j$ 之间到底有多接近,这是通过它们的标量乘积 $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$来完成的。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* 快速数学上的提示:如果 $\\vec{a}$ and $\\vec{b}$ 是长度相等的向量,并且 $\\mathbf{a}$ 和 $\\mathbf{b}$ 是相应的列向量(如,只有一列的矩阵),那么 $\\mathbf{a}^T \\mathbf{b}$ (即 $\\mathbf{a}$的转置和 $\\mathbf{b}$的矩阵相乘)为一个1×1的矩阵,包含两个向量$\\vec{a}\\cdot\\vec{b}$的标量积。在机器学习中,我们通常将向量表示为列向量,所以当我们探讨关于计算标量积 $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$的时候,其实意味着计算 ${\\hat{\\mathbf{u}}_{j|i}}^T \\mathbf{v}_j$。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "由于我们需要对每个实例和每个第一和第二层的胶囊对$(i, j)$,计算标量积 $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$ ,我们将再次利用`tf.matmul()`可以同时计算多个矩阵相乘的特点。这就要求使用 `tf.tile()`来使得所有维度都匹配(除了倒数第二个),就像我们之前所作的那样。所以让我们查看`caps2_predicted`的形状,因为它持有对每个实例和每个胶囊对的所有预测输出向量$\\hat{\\mathbf{u}}_{j|i}$。" - ] - }, - { - "cell_type": "code", - "execution_count": 106, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_predicted" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们查看 `caps2_output_round_1` 的形状,它有10个输出向量,每个16D,对应每个实例:" - ] - }, - { - "cell_type": "code", - "execution_count": 107, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "为了让这些形状相匹配,我们只需要在第二个维度平铺 `caps2_output_round_1` 1152次(一次一个主胶囊):" - ] - }, - { - "cell_type": "code", - "execution_count": 108, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1_tiled = tf.tile(\n", - " caps2_output_round_1, [1, caps1_n_caps, 1, 1, 1],\n", - " name=\"caps2_output_round_1_tiled\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在我们已经准备好可以调用 `tf.matmul()`(注意还需要告知它在第一个数组中的矩阵进行转置,让${\\hat{\\mathbf{u}}_{j|i}}^T$ 来替代 $\\hat{\\mathbf{u}}_{j|i}$):" - ] - }, - { - "cell_type": "code", - "execution_count": 109, - "metadata": {}, - "outputs": [], - "source": [ - "agreement = tf.matmul(caps2_predicted, caps2_output_round_1_tiled,\n", - " transpose_a=True, name=\"agreement\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们现在可以通过对于刚计算的标量积$\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$进行简单相加,来进行原始路由权重 $b_{i,j}$ 的更新:$b_{i,j} \\gets b_{i,j} + \\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$ (参见论文过程1中第7步)" - ] - }, - { - "cell_type": "code", - "execution_count": 110, - "metadata": {}, - "outputs": [], - "source": [ - "raw_weights_round_2 = tf.add(raw_weights, agreement,\n", - " name=\"raw_weights_round_2\")" - ] - }, - { - "cell_type": 
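To make the routing-by-agreement update above concrete, here is a small self-contained NumPy sketch (toy sizes picked arbitrarily for illustration; this is not the notebook's TensorFlow graph code) that runs two rounds of routing over a handful of predicted vectors:

```python
import numpy as np

n_caps1, n_caps2, n_dims2 = 4, 3, 5                       # toy sizes (really 1152, 10, 16)
u_hat = np.random.randn(n_caps1, n_caps2, n_dims2)        # predicted vectors u_hat_{j|i}
b = np.zeros((n_caps1, n_caps2))                          # raw routing weights b_{i,j}

def squash(s, axis=-1, epsilon=1e-7):
    squared_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    return squared_norm / (1. + squared_norm) * s / np.sqrt(squared_norm + epsilon)

for _ in range(2):                                        # two routing iterations
    c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over digit capsules
    s = (c[..., np.newaxis] * u_hat).sum(axis=0)          # weighted sum over primary capsules
    v = squash(s)                                         # candidate outputs v_j
    b = b + (u_hat * v).sum(axis=-1)                      # agreement update: u_hat . v

print(v.shape)                                            # (3, 5): one vector per "digit" capsule
```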
"markdown", - "metadata": {}, - "source": [ - "第二轮的其余部分和第一轮相同:" - ] - }, - { - "cell_type": "code", - "execution_count": 111, - "metadata": {}, - "outputs": [], - "source": [ - "routing_weights_round_2 = tf.nn.softmax(raw_weights_round_2,\n", - " dim=2,\n", - " name=\"routing_weights_round_2\")\n", - "weighted_predictions_round_2 = tf.multiply(routing_weights_round_2,\n", - " caps2_predicted,\n", - " name=\"weighted_predictions_round_2\")\n", - "weighted_sum_round_2 = tf.reduce_sum(weighted_predictions_round_2,\n", - " axis=1, keep_dims=True,\n", - " name=\"weighted_sum_round_2\")\n", - "caps2_output_round_2 = squash(weighted_sum_round_2,\n", - " axis=-2,\n", - " name=\"caps2_output_round_2\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们可以继续更多轮,只需要重复第二轮中相同的步骤,但为了保持简洁,我们就到这里:" - ] - }, - { - "cell_type": "code", - "execution_count": 112, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output = caps2_output_round_2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### 静态还是动态循环?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "在上面的代码中,我们在TensorFlow计算图中为协调算法的每一轮路由创建了不同的操作。换句话说,它是一个静态循环。\n", - "\n", - "当然,与其拷贝/粘贴这些代码几次,通常在python中,我们可以写一个 `for` 循环,但这不会改变这样一个事实,那就是在计算图中最后对于每个路由迭代都会有不同的操作。这其实是可接受的,因为我们通常不会具有超过5次路由迭代,所以计算图不会成长得太大。\n", - "\n", - "然而,你可能更倾向于在TensorFlow计算图自身实现路由循环,而不是使用Python的`for`循环。为了要做到这点,将需要使用TensorFlow的 `tf.while_loop()` 函数。这种方式,所有的路由循环都可以重用在该计算图中的相同的操作,这被称为动态循环。\n", - "\n", - "例如,这里是如何构建一个小循环用来计算1到100的平方和:" - ] - }, - { - "cell_type": "code", - "execution_count": 113, - "metadata": {}, - "outputs": [], - "source": [ - "def condition(input, counter):\n", - " return tf.less(counter, 100)\n", - "\n", - "def loop_body(input, counter):\n", - " output = tf.add(input, tf.square(counter))\n", - " return output, tf.add(counter, 1)\n", - "\n", - "with tf.name_scope(\"compute_sum_of_squares\"):\n", - " counter = tf.constant(1)\n", - " sum_of_squares = tf.constant(0)\n", - "\n", - " result = tf.while_loop(condition, loop_body, [sum_of_squares, counter])\n", - " \n", - "\n", - "with tf.Session() as sess:\n", - " print(sess.run(result))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "如你所见, `tf.while_loop()` 函数期望的循环条件和循环体由两个函数来提供。这些函数仅会被TensorFlow调用一次,在构建计算图阶段,_不_ 在执行计算图的时候。 `tf.while_loop()` 函数将由 `condition()` 和 `loop_body()` 创建的计算图碎片同一些用来创建循环的额外操作缝制在一起。\n", - "\n", - "还注意到在训练的过程中,TensorFlow将自动地通过循环处理反向传播,因此你不需要担心这些事情。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "当然,我们也可以一行代码搞定!;)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sum([i**2 for i in range(1, 100 + 1)])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "开个玩笑,抛开缩减计算图的大小不说,使用动态循环而不是静态循环能够帮助减少很多的GPU RAM的使用(如果你使用GPU的话)。事实上,如果但调用 `tf.while_loop()` 函数时,你设置了 `swap_memory=True` ,TensorFlow会在每个循环的迭代上自动检查GPU RAM使用情况,并且它会照顾到在GPU和CPU之间swapping内存时的需求。既然CPU的内存便宜量又大,相对GPU RAM而言,这就很有意义了。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 估算的分类概率(模长)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "输出向量的模长代表了分类的概率,所以我们就可以使用`tf.norm()`来计算它们,但由于我们在讨论`squash`函数时看到的那样,可能会有风险,所以我们创建了自己的 `safe_norm()` 函数来进行替代:" - ] - }, - { - "cell_type": "code", - "execution_count": 114, - "metadata": {}, - "outputs": [], - "source": [ - "def safe_norm(s, axis=-1, epsilon=1e-7, keep_dims=False, name=None):\n", - " with tf.name_scope(name, default_name=\"safe_norm\"):\n", 
- " squared_norm = tf.reduce_sum(tf.square(s), axis=axis,\n", - " keep_dims=keep_dims)\n", - " return tf.sqrt(squared_norm + epsilon)" - ] - }, - { - "cell_type": "code", - "execution_count": 115, - "metadata": {}, - "outputs": [], - "source": [ - "y_proba = safe_norm(caps2_output, axis=-2, name=\"y_proba\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "要预测每个实例的分类,我们只需要选择那个具有最高估算概率的就可以了。要做到这点,让我们通过使用 `tf.argmax()` 来达到我们的目的:" - ] - }, - { - "cell_type": "code", - "execution_count": 116, - "metadata": {}, - "outputs": [], - "source": [ - "y_proba_argmax = tf.argmax(y_proba, axis=2, name=\"y_proba\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们检查一下 `y_proba_argmax` 的形状:" - ] - }, - { - "cell_type": "code", - "execution_count": 117, - "metadata": {}, - "outputs": [], - "source": [ - "y_proba_argmax" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "这正好是我们想要的:对于每一个实例,我们现在有了最长的输出向量的索引。让我们用 `tf.squeeze()` 来移除后两个大小为1的维度。这就给出了该胶囊网络对于每个实例的预测分类:" - ] - }, - { - "cell_type": "code", - "execution_count": 118, - "metadata": {}, - "outputs": [], - "source": [ - "y_pred = tf.squeeze(y_proba_argmax, axis=[1,2], name=\"y_pred\")" - ] - }, - { - "cell_type": "code", - "execution_count": 119, - "metadata": {}, - "outputs": [], - "source": [ - "y_pred" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "好了,我们现在准备好开始定义训练操作,从损失开始。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 标签" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "首先,我们将需要一个对于标签的占位符:" - ] - }, - { - "cell_type": "code", - "execution_count": 120, - "metadata": {}, - "outputs": [], - "source": [ - "y = tf.placeholder(shape=[None], dtype=tf.int64, name=\"y\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 边际损失" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "论文使用了一个特殊的边际损失,来使得在每个图像中侦测多于两个以上的数字成为可能:\n", - "\n", - "$ L_k = T_k \\max(0, m^{+} - \\|\\mathbf{v}_k\\|)^2 + \\lambda (1 - T_k) \\max(0, \\|\\mathbf{v}_k\\| - m^{-})^2$\n", - "\n", - "* $T_k$ 等于1,如果分类$k$的数字出现,否则为0.\n", - "* 在论文中,$m^{+} = 0.9$, $m^{-} = 0.1$,并且$\\lambda = 0.5$\n", - "* 注意在视频15:47秒处有个错误:应该是最大化操作,而不是norms,被平方。不好意思。" - ] - }, - { - "cell_type": "code", - "execution_count": 121, - "metadata": {}, - "outputs": [], - "source": [ - "m_plus = 0.9\n", - "m_minus = 0.1\n", - "lambda_ = 0.5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "既然 `y` 将包含数字分类,从0到9,要对于每个实例和每个分类获取 $T_k$ ,我们只需要使用 `tf.one_hot()` 函数即可:" - ] - }, - { - "cell_type": "code", - "execution_count": 122, - "metadata": {}, - "outputs": [], - "source": [ - "T = tf.one_hot(y, depth=caps2_n_caps, name=\"T\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "一个小例子应该可以说明这到底做了什么:" - ] - }, - { - "cell_type": "code", - "execution_count": 123, - "metadata": {}, - "outputs": [], - "source": [ - "with tf.Session():\n", - " print(T.eval(feed_dict={y: np.array([0, 1, 2, 3, 9])}))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们对于每个输出胶囊和每个实例计算输出向量。首先,让我们验证 `caps2_output` 形状:" - ] - }, - { - "cell_type": "code", - "execution_count": 124, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "这些16D向量位于第二到最后的维度,因此让我们在 `axis=-2` 使用 `safe_norm()` 函数:" - ] - }, - { - "cell_type": "code", - "execution_count": 125, - "metadata": {}, - "outputs": [], - 
"source": [ - "caps2_output_norm = safe_norm(caps2_output, axis=-2, keep_dims=True,\n", - " name=\"caps2_output_norm\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们计算 $\\max(0, m^{+} - \\|\\mathbf{v}_k\\|)^2$,并且重塑其结果以获得一个简单的具有形状(_batch size_, 10)的矩阵:" - ] - }, - { - "cell_type": "code", - "execution_count": 126, - "metadata": {}, - "outputs": [], - "source": [ - "present_error_raw = tf.square(tf.maximum(0., m_plus - caps2_output_norm),\n", - " name=\"present_error_raw\")\n", - "present_error = tf.reshape(present_error_raw, shape=(-1, 10),\n", - " name=\"present_error\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "接下来让我们计算 $\\max(0, \\|\\mathbf{v}_k\\| - m^{-})^2$ 并且重塑成(_batch size_,10):" - ] - }, - { - "cell_type": "code", - "execution_count": 127, - "metadata": {}, - "outputs": [], - "source": [ - "absent_error_raw = tf.square(tf.maximum(0., caps2_output_norm - m_minus),\n", - " name=\"absent_error_raw\")\n", - "absent_error = tf.reshape(absent_error_raw, shape=(-1, 10),\n", - " name=\"absent_error\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们准备好为每个实例和每个数字计算损失:" - ] - }, - { - "cell_type": "code", - "execution_count": 128, - "metadata": {}, - "outputs": [], - "source": [ - "L = tf.add(T * present_error, lambda_ * (1.0 - T) * absent_error,\n", - " name=\"L\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在我们可以把对于每个实例的数字损失进行相加($L_0 + L_1 + \\cdots + L_9$),并且在所有的实例中计算均值。这给予我们最后的边际损失:" - ] - }, - { - "cell_type": "code", - "execution_count": 129, - "metadata": {}, - "outputs": [], - "source": [ - "margin_loss = tf.reduce_mean(tf.reduce_sum(L, axis=1), name=\"margin_loss\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 重新构造" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们添加一个解码器网络,其位于胶囊网络之上。它是一个常规的3层全连接神经网络,其将基于胶囊网络的输出,学习重新构建输入图像。这将强制胶囊网络保留所有需要重新构造数字的信息,贯穿整个网络。该约束正则化了模型:它减少了训练数据集过拟合的风险,并且它有助于泛化到新的数字。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 遮盖" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "论文中提及了在训练的过程中,与其发送所有的胶囊网络的输出到解码器网络,不如仅发送与目标数字对应的胶囊输出向量。所有其余输出向量必须被遮盖掉。在推断的时候,我们必须遮盖所有输出向量,除了最长的那个。即,预测的数字相关的那个。你可以查看论文中的图2(视频中的18:15):所有的输出向量都被遮盖掉了,除了那个重新构造目标的输出向量。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们需要一个占位符来告诉TensorFlow,是否我们想要遮盖这些输出向量,根据标签 (`True`) 或 预测 (`False`, 默认):" - ] - }, - { - "cell_type": "code", - "execution_count": 130, - "metadata": {}, - "outputs": [], - "source": [ - "mask_with_labels = tf.placeholder_with_default(False, shape=(),\n", - " name=\"mask_with_labels\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们使用 `tf.cond()` 来定义重新构造的目标,如果 `mask_with_labels` 为 `True` 就是标签 `y`,否则就是 `y_pred`。" - ] - }, - { - "cell_type": "code", - "execution_count": 131, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_targets = tf.cond(mask_with_labels, # 条件\n", - " lambda: y, # if True\n", - " lambda: y_pred, # if False\n", - " name=\"reconstruction_targets\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "注意到 `tf.cond()` 函数期望的是通过函数传递而来的if-True 和 if-False张量:这些函数会在计算图构造阶段(而非执行阶段)被仅调用一次,和`tf.while_loop()`类似。这可以允许TensorFlow添加必要操作,以此处理if-True 和 if-False 张量的条件评估。然而,在这里,张量 `y` 和 `y_pred` 已经在我们调用 `tf.cond()` 时被创建,不幸地是TensorFlow会认为 `y` 和 `y_pred` 是 `reconstruction_targets` 张量的依赖项。虽然,`reconstruction_targets` 张量最终是会计算出正确值,但是:\n", - "1. 
无论何时,我们评估某个依赖于 `reconstruction_targets` 的张量,`y_pred` 张量也会被评估(即便 `mask_with_layers` 为 `True`)。这不是什么大问题,因为,在训练阶段计算`y_pred` 张量不会添加额外的开销,而且不管怎么样我们都需要它来计算边际损失。并且在测试中,如果我们做的是分类,我们就不需要重新构造,所以`reconstruction_grpha`根本不会被评估。\n", - "2. 我们总是需要为`y`占位符递送一个值(即使`mask_with_layers`为`False`)。这就有点讨厌了,当然我们可以传递一个空数组,因为TensorFlow无论如何都不会用到它(就是当检查依赖项的时候还不知道)。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在我们有了重新构建的目标,让我们创建重新构建的遮盖。对于目标类型它应该为1.0,对于其他类型应该为0.0。为此我们就可以使用`tf.one_hot()`函数:" - ] - }, - { - "cell_type": "code", - "execution_count": 132, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_mask = tf.one_hot(reconstruction_targets,\n", - " depth=caps2_n_caps,\n", - " name=\"reconstruction_mask\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们检查一下 `reconstruction_mask`的形状:" - ] - }, - { - "cell_type": "code", - "execution_count": 133, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_mask" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "和 `caps2_output` 的形状比对一下:" - ] - }, - { - "cell_type": "code", - "execution_count": 134, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "嗯,它的形状是 (_batch size_, 1, 10, 16, 1)。我们想要将它和 `reconstruction_mask` 进行相乘,但 `reconstruction_mask`的形状是(_batch size_, 10)。我们必须对此进行reshape成 (_batch size_, 1, 10, 1, 1) 来满足相乘的要求:" - ] - }, - { - "cell_type": "code", - "execution_count": 135, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_mask_reshaped = tf.reshape(\n", - " reconstruction_mask, [-1, 1, caps2_n_caps, 1, 1],\n", - " name=\"reconstruction_mask_reshaped\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "最终我们可以应用 遮盖 了!" 
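As a quick illustration of what the upcoming multiplication does (a toy NumPy sketch with made-up sizes, not the graph code itself): multiplying the capsule outputs by a one-hot mask keeps only the vector of the target class and zeroes out all the others.

```python
import numpy as np

toy_output = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # 2 instances, 3 "classes", 4D vectors
targets = np.array([0, 2])                           # target class of each instance
mask = np.eye(3)[targets][:, :, np.newaxis]          # one-hot mask, shape (2, 3, 1)
masked = toy_output * mask                           # broadcasting zeroes the other vectors
print(masked[0])                                     # only row 0 is non-zero
print(masked[1])                                     # only row 2 is non-zero
```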
- ] - }, - { - "cell_type": "code", - "execution_count": 136, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_masked = tf.multiply(\n", - " caps2_output, reconstruction_mask_reshaped,\n", - " name=\"caps2_output_masked\")" - ] - }, - { - "cell_type": "code", - "execution_count": 137, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_masked" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "最后还有一个重塑操作被用来扁平化解码器的输入:" - ] - }, - { - "cell_type": "code", - "execution_count": 138, - "metadata": {}, - "outputs": [], - "source": [ - "decoder_input = tf.reshape(caps2_output_masked,\n", - " [-1, caps2_n_caps * caps2_n_dims],\n", - " name=\"decoder_input\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "这给予我们一个形状是 (_batch size_, 160) 的数组:" - ] - }, - { - "cell_type": "code", - "execution_count": 139, - "metadata": {}, - "outputs": [], - "source": [ - "decoder_input" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 解码器" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们来构建该解码器。它非常简单:两个密集(全连接)ReLU 层紧跟这一个密集输出sigmoid层:" - ] - }, - { - "cell_type": "code", - "execution_count": 140, - "metadata": {}, - "outputs": [], - "source": [ - "n_hidden1 = 512\n", - "n_hidden2 = 1024\n", - "n_output = 28 * 28" - ] - }, - { - "cell_type": "code", - "execution_count": 141, - "metadata": {}, - "outputs": [], - "source": [ - "with tf.name_scope(\"decoder\"):\n", - " hidden1 = tf.layers.dense(decoder_input, n_hidden1,\n", - " activation=tf.nn.relu,\n", - " name=\"hidden1\")\n", - " hidden2 = tf.layers.dense(hidden1, n_hidden2,\n", - " activation=tf.nn.relu,\n", - " name=\"hidden2\")\n", - " decoder_output = tf.layers.dense(hidden2, n_output,\n", - " activation=tf.nn.sigmoid,\n", - " name=\"decoder_output\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 重新构造的损失" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们计算重新构造的损失。它不过是输入图像和重新构造过的图像的平方差。" - ] - }, - { - "cell_type": "code", - "execution_count": 142, - "metadata": {}, - "outputs": [], - "source": [ - "X_flat = tf.reshape(X, [-1, n_output], name=\"X_flat\")\n", - "squared_difference = tf.square(X_flat - decoder_output,\n", - " name=\"squared_difference\")\n", - "reconstruction_loss = tf.reduce_mean(squared_difference,\n", - " name=\"reconstruction_loss\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 最终损失" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "最终损失为边际损失和重新构造损失(使用放大因子0.0005确保边际损失在训练过程中处于支配地位)的和:" - ] - }, - { - "cell_type": "code", - "execution_count": 143, - "metadata": {}, - "outputs": [], - "source": [ - "alpha = 0.0005\n", - "\n", - "loss = tf.add(margin_loss, alpha * reconstruction_loss, name=\"loss\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 最后润色" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 精度" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "为了衡量模型的精度,我们需要计算实例被正确分类的数量。为此,我们可以简单地比较`y`和`y_pred`,并将比较结果的布尔值转换成float32(0.0代表False,1.0代表True),并且计算所有实例的均值:" - ] - }, - { - "cell_type": "code", - "execution_count": 144, - "metadata": {}, - "outputs": [], - "source": [ - "correct = tf.equal(y, y_pred, name=\"correct\")\n", - "accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 训练操作" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "论文中提到作者使用Adam优化器,使用了TensorFlow的默认参数:" - ] - }, - { - "cell_type": "code", - "execution_count": 145, - "metadata": {}, - "outputs": [], - "source": [ - "optimizer = tf.train.AdamOptimizer()\n", - "training_op = optimizer.minimize(loss, name=\"training_op\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 初始化和Saver" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们来添加变量初始器,还要加一个 `Saver`:" - ] - }, - { - "cell_type": "code", - "execution_count": 146, - "metadata": {}, - "outputs": [], - "source": [ - "init = tf.global_variables_initializer()\n", - "saver = tf.train.Saver()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "还有... 我们已经完成了构造阶段!花点时间可以庆祝🎉一下。:)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 训练" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "训练我们的胶囊网络是非常标准的。为了简化,我们不需要作任何花哨的超参调整、丢弃等,我们只是一遍又一遍运行训练操作,显示损失,并且在每个epoch结束的时候,根据验证集衡量一下精度,显示出来,并且保存模型,当然,验证损失是目前为止最低的模型才会被保存(这是一种基本的实现早停的方法,而不需要实际上打断训练的进程)。我们希望代码能够自释,但这里应该有几个细节值得注意:\n", - "* 如果某个checkpoint文件已经存在,那么它会被恢复(这可以让训练被打断,再从最新的checkpoint中进行恢复成为可能),\n", - "* 我们不要忘记在训练的时候传递`mask_with_labels=True`,\n", - "* 在测试的过程中,我们可以让`mask_with_labels`默认为`False`(但是我们仍然需要传递标签,因为它们在计算精度的时候会被用到),\n", - "* 通过 `mnist.train.next_batch()`装载的图片会被表示为类型 `float32` 数组,其形状为\\[784\\],但输入的占位符`X`期望的是一个`float32`数组,其形状为 \\[28, 28, 1\\],所以在我们把送到模型之前,必须把这些图像进行重塑,\n", - "* 我们在整个完整的验证集上对模型的损失和精度进行评估。为了能够看到进度和支持那些并没有太多RAM的系统,评估损失和精度的代码在一个批次上执行一次,并且最后再计算平均损失和平均精度。\n", - "\n", - "*警告*:如果你没有GPU,训练将会非常漫长(至少几个小时)。当使用GPU,它应该对于每个epoch只需要几分钟(如,在NVidia GeForce GTX 1080Ti上只需要6分钟)。" - ] - }, - { - "cell_type": "code", - "execution_count": 147, - "metadata": {}, - "outputs": [], - "source": [ - "n_epochs = 10\n", - "batch_size = 50\n", - "restore_checkpoint = True\n", - "\n", - "n_iterations_per_epoch = mnist.train.num_examples // batch_size\n", - "n_iterations_validation = mnist.validation.num_examples // batch_size\n", - "best_loss_val = np.infty\n", - "checkpoint_path = \"./my_capsule_network\"\n", - "\n", - "with tf.Session() as sess:\n", - " if restore_checkpoint and tf.train.checkpoint_exists(checkpoint_path):\n", - " saver.restore(sess, checkpoint_path)\n", - " else:\n", - " init.run()\n", - "\n", - " for epoch in range(n_epochs):\n", - " for iteration in range(1, n_iterations_per_epoch + 1):\n", - " X_batch, y_batch = mnist.train.next_batch(batch_size)\n", - " # 运行训练操作并且评估损失:\n", - " _, loss_train = sess.run(\n", - " [training_op, loss],\n", - " feed_dict={X: X_batch.reshape([-1, 28, 28, 1]),\n", - " y: y_batch,\n", - " mask_with_labels: True})\n", - " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", - " iteration, n_iterations_per_epoch,\n", - " iteration * 100 / n_iterations_per_epoch,\n", - " loss_train),\n", - " end=\"\")\n", - "\n", - " # 在每个epoch之后,\n", - " # 衡量验证损失和精度:\n", - " loss_vals = []\n", - " acc_vals = []\n", - " for iteration in range(1, n_iterations_validation + 1):\n", - " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", - " loss_val, acc_val = sess.run(\n", - " [loss, accuracy],\n", - " feed_dict={X: X_batch.reshape([-1, 28, 28, 1]),\n", - " y: y_batch})\n", - " loss_vals.append(loss_val)\n", - " acc_vals.append(acc_val)\n", - " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(\n", - " iteration, n_iterations_validation,\n", - " iteration * 100 / n_iterations_validation),\n", - " end=\" \" * 10)\n", - " loss_val = 
np.mean(loss_vals)\n", - " acc_val = np.mean(acc_vals)\n", - " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}{}\".format(\n", - " epoch + 1, acc_val * 100, loss_val,\n", - " \" (improved)\" if loss_val < best_loss_val else \"\"))\n", - "\n", - " # 如果有进步就保存模型:\n", - " if loss_val < best_loss_val:\n", - " save_path = saver.save(sess, checkpoint_path)\n", - " best_loss_val = loss_val" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们在训练结束后,在验证集上达到了99.32%的精度,只用了5个epoches,看上去不错。现在让我们将模型运用到测试集上。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 评估" - ] - }, - { - "cell_type": "code", - "execution_count": 148, - "metadata": {}, - "outputs": [], - "source": [ - "n_iterations_test = mnist.test.num_examples // batch_size\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, checkpoint_path)\n", - "\n", - " loss_tests = []\n", - " acc_tests = []\n", - " for iteration in range(1, n_iterations_test + 1):\n", - " X_batch, y_batch = mnist.test.next_batch(batch_size)\n", - " loss_test, acc_test = sess.run(\n", - " [loss, accuracy],\n", - " feed_dict={X: X_batch.reshape([-1, 28, 28, 1]),\n", - " y: y_batch})\n", - " loss_tests.append(loss_test)\n", - " acc_tests.append(acc_test)\n", - " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(\n", - " iteration, n_iterations_test,\n", - " iteration * 100 / n_iterations_test),\n", - " end=\" \" * 10)\n", - " loss_test = np.mean(loss_tests)\n", - " acc_test = np.mean(acc_tests)\n", - " print(\"\\rFinal test accuracy: {:.4f}% Loss: {:.6f}\".format(\n", - " acc_test * 100, loss_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我们在测试集上达到了99.21%的精度。相当棒!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 预测" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们进行一些预测!首先从测试集确定一些图片,接着开始一个session,恢复已经训练好的模型,评估`cap2_output`来获得胶囊网络的输出向量,`decoder_output`来重新构造,用`y_pred`来获得类型预测:" - ] - }, - { - "cell_type": "code", - "execution_count": 149, - "metadata": {}, - "outputs": [], - "source": [ - "n_samples = 5\n", - "\n", - "sample_images = mnist.test.images[:n_samples].reshape([-1, 28, 28, 1])\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, checkpoint_path)\n", - " caps2_output_value, decoder_output_value, y_pred_value = sess.run(\n", - " [caps2_output, decoder_output, y_pred],\n", - " feed_dict={X: sample_images,\n", - " y: np.array([], dtype=np.int64)})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "注意:我们传递的`y`使用了一个空的数组,不过TensorFlow并不会用到它,前面已经解释过了。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们把这些图片和它们的标签绘制出来,同时绘制出来的还有相应的重新构造和预测:" - ] - }, - { - "cell_type": "code", - "execution_count": 150, - "metadata": {}, - "outputs": [], - "source": [ - "sample_images = sample_images.reshape(-1, 28, 28)\n", - "reconstructions = decoder_output_value.reshape([-1, 28, 28])\n", - "\n", - "plt.figure(figsize=(n_samples * 2, 3))\n", - "for index in range(n_samples):\n", - " plt.subplot(1, n_samples, index + 1)\n", - " plt.imshow(sample_images[index], cmap=\"binary\")\n", - " plt.title(\"Label:\" + str(mnist.test.labels[index]))\n", - " plt.axis(\"off\")\n", - "\n", - "plt.show()\n", - "\n", - "plt.figure(figsize=(n_samples * 2, 3))\n", - "for index in range(n_samples):\n", - " plt.subplot(1, n_samples, index + 1)\n", - " plt.title(\"Predicted:\" + str(y_pred_value[index]))\n", - " plt.imshow(reconstructions[index], cmap=\"binary\")\n", - " 
plt.axis(\"off\")\n", - " \n", - "plt.show()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "预测都正确,而且重新构造的图片看上去很棒。阿弥陀佛!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 理解输出向量" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们调整一下输出向量,对它们的姿态参数表示进行查看。" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "首先让我们检查`cap2_output_value` NumPy数组的形状:" - ] - }, - { - "cell_type": "code", - "execution_count": 151, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_value.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们创建一个函数,该函数在所有的输出向量里对于每个 16(维度)姿态参数进行调整。每个调整过的输出向量将和原来的输出向量相同,除了它的 姿态参数 中的一个会加上一个-0.5到0.5之间变动的值。默认的会有11个步数(-0.5, -0.4, ..., +0.4, +0.5)。这个函数会返回一个数组,其形状为(_调整过的姿态参数_=16, _步数_=11, _batch size_=5, 1, 10, 16, 1):" - ] - }, - { - "cell_type": "code", - "execution_count": 152, - "metadata": {}, - "outputs": [], - "source": [ - "def tweak_pose_parameters(output_vectors, min=-0.5, max=0.5, n_steps=11):\n", - " steps = np.linspace(min, max, n_steps) # -0.25, -0.15, ..., +0.25\n", - " pose_parameters = np.arange(caps2_n_dims) # 0, 1, ..., 15\n", - " tweaks = np.zeros([caps2_n_dims, n_steps, 1, 1, 1, caps2_n_dims, 1])\n", - " tweaks[pose_parameters, :, 0, 0, 0, pose_parameters, 0] = steps\n", - " output_vectors_expanded = output_vectors[np.newaxis, np.newaxis]\n", - " return tweaks + output_vectors_expanded" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们计算所有的调整过的输出向量并且重塑结果到 (_parameters_×_steps_×_instances_, 1, 10, 16, 1) 以便于我们能够传递该数组到解码器中:" - ] - }, - { - "cell_type": "code", - "execution_count": 153, - "metadata": {}, - "outputs": [], - "source": [ - "n_steps = 11\n", - "\n", - "tweaked_vectors = tweak_pose_parameters(caps2_output_value, n_steps=n_steps)\n", - "tweaked_vectors_reshaped = tweaked_vectors.reshape(\n", - " [-1, 1, caps2_n_caps, caps2_n_dims, 1])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "现在让我们递送这些调整过的输出向量到解码器并且获得重新构造,它会产生:" - ] - }, - { - "cell_type": "code", - "execution_count": 154, - "metadata": {}, - "outputs": [], - "source": [ - "tweak_labels = np.tile(mnist.test.labels[:n_samples], caps2_n_dims * n_steps)\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, checkpoint_path)\n", - " decoder_output_value = sess.run(\n", - " decoder_output,\n", - " feed_dict={caps2_output: tweaked_vectors_reshaped,\n", - " mask_with_labels: True,\n", - " y: tweak_labels})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "让我们重塑解码器的输出以便于我们能够在输出维度,调整步数,和实例之上进行迭代:" - ] - }, - { - "cell_type": "code", - "execution_count": 155, - "metadata": {}, - "outputs": [], - "source": [ - "tweak_reconstructions = decoder_output_value.reshape(\n", - " [caps2_n_dims, n_steps, n_samples, 28, 28])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "最后,让我们绘制所有的重新构造,对于前三个输出维度,对于每个调整中的步数(列)和每个数字(行):" - ] - }, - { - "cell_type": "code", - "execution_count": 156, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "for dim in range(3):\n", - " print(\"Tweaking output dimension #{}\".format(dim))\n", - " plt.figure(figsize=(n_steps / 1.2, n_samples / 1.5))\n", - " for row in range(n_samples):\n", - " for col in range(n_steps):\n", - " plt.subplot(n_samples, n_steps, row * n_steps + col + 1)\n", - " plt.imshow(tweak_reconstructions[dim, col, row], cmap=\"binary\")\n", - " plt.axis(\"off\")\n", - " plt.show()" - ] - 
}, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 小结" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "我试图让这个notebook中的代码尽量的扁平和线性,为了让大家容易跟上,当然在实践中大家可能想要包装这些代码成可重用的函数和类。例如,你可以尝试实现你自己的`PrimaryCapsuleLayer`,和`DeseRoutingCapsuleLayer` 类,其参数可以是胶囊的数量,路由迭代的数量,是使用动态循环还是静态循环,诸如此类。对于基于TensorFlow模块化的胶囊网络的实现,可以参考[CapsNet-TensorFlow](https://github.com/naturomics/CapsNet-Tensorflow) 项目。\n", - "\n", - "这就是今天所有的内容,我希望你们喜欢这个notebook!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - }, - "toc": { - "base_numbering": 1, - "nav_menu": {}, - "number_sections": true, - "sideBar": true, - "skip_h1_title": false, - "title_cell": "Table of Contents", - "title_sidebar": "Contents", - "toc_cell": false, - "toc_position": { - "height": "calc(100% - 180px)", - "left": "10px", - "top": "150px", - "width": "336px" - }, - "toc_section_display": true, - "toc_window_display": true - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/work_in_progress/extra_capsnets.ipynb b/work_in_progress/extra_capsnets.ipynb deleted file mode 100644 index b951392..0000000 --- a/work_in_progress/extra_capsnets.ipynb +++ /dev/null @@ -1,2066 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Capsule Networks (CapsNets)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Based on the paper: [Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829), by Sara Sabour, Nicholas Frosst and Geoffrey E. Hinton (NIPS 2017)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Inspired in part from Huadong Liao's implementation: [CapsNet-TensorFlow](https://github.com/naturomics/CapsNet-Tensorflow)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Introduction" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Watch [this video](https://youtu.be/pPN8d0E3900) to understand the key ideas behind Capsule Networks:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "from IPython.display import HTML\n", - "HTML(\"\"\"\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You may also want to watch [this video](https://youtu.be/2Kawrd5szHE), which presents the main difficulties in this notebook:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "HTML(\"\"\"\"\"\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Imports" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To plot pretty figures:" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will need NumPy and TensorFlow:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Reproducibility" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's reset the default graph, in case you re-run this notebook without restarting the kernel:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's set the random seeds so that this notebook always produces the same output:" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "np.random.seed(42)\n", - "tf.set_random_seed(42)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Load MNIST" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Yes, I know, it's MNIST again. But hopefully this powerful idea will work as well on larger datasets, time will tell." 
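One practical aside (an assumption about your environment, not something the notebook relies on): `tensorflow.examples.tutorials.mnist`, used in the next cell, is deprecated in later TensorFlow 1.x releases. If it is unavailable in your install, the same images can be pulled through `tf.keras.datasets` instead, for example:

```python
import numpy as np
import tensorflow as tf

# Alternative MNIST loading; the notebook itself uses input_data.read_data_sets below.
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28 * 28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28 * 28) / 255.0
print(X_train.shape, y_train.shape)   # (60000, 784) (60000,)
```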
- ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "from tensorflow.examples.tutorials.mnist import input_data\n", - "\n", - "mnist = input_data.read_data_sets(\"/tmp/data/\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at what these hand-written digit images look like:" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "n_samples = 5\n", - "\n", - "plt.figure(figsize=(n_samples * 2, 3))\n", - "for index in range(n_samples):\n", - " plt.subplot(1, n_samples, index + 1)\n", - " sample_image = mnist.train.images[index].reshape(28, 28)\n", - " plt.imshow(sample_image, cmap=\"binary\")\n", - " plt.axis(\"off\")\n", - "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And these are the corresponding labels:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "mnist.train.labels[:n_samples]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's build a Capsule Network to classify these images. Here's the overall architecture, enjoy the ASCII art! ;-)\n", - "Note: for readability, I left out two arrows: Labels → Mask, and Input Images → Reconstruction Loss." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```\n", - " Loss\n", - " ↑\n", - " ┌─────────┴─────────┐\n", - " Labels → Margin Loss Reconstruction Loss\n", - " ↑ ↑\n", - " Length Decoder\n", - " ↑ ↑ \n", - " Digit Capsules ────Mask────┘\n", - " ↖↑↗ ↖↑↗ ↖↑↗\n", - " Primary Capsules\n", - " ↑ \n", - " Input Images\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are going to build the graph starting from the bottom layer, and gradually move up, left side first. Let's go!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Input Images" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's start by creating a placeholder for the input images (28×28 pixels, 1 color channel = grayscale)." 
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "X = tf.placeholder(shape=[None, 28, 28, 1], dtype=tf.float32, name=\"X\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Primary Capsules" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The first layer will be composed of 32 maps of 6×6 capsules each, where each capsule will output an 8D activation vector:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_n_maps = 32\n", - "caps1_n_caps = caps1_n_maps * 6 * 6 # 1152 primary capsules\n", - "caps1_n_dims = 8" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To compute their outputs, we first apply two regular convolutional layers:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "conv1_params = {\n", - " \"filters\": 256,\n", - " \"kernel_size\": 9,\n", - " \"strides\": 1,\n", - " \"padding\": \"valid\",\n", - " \"activation\": tf.nn.relu,\n", - "}\n", - "\n", - "conv2_params = {\n", - " \"filters\": caps1_n_maps * caps1_n_dims, # 256 convolutional filters\n", - " \"kernel_size\": 9,\n", - " \"strides\": 2,\n", - " \"padding\": \"valid\",\n", - " \"activation\": tf.nn.relu\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "conv1 = tf.layers.conv2d(X, name=\"conv1\", **conv1_params)\n", - "conv2 = tf.layers.conv2d(conv1, name=\"conv2\", **conv2_params)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: since we used a kernel size of 9 and no padding (for some reason, that's what `\"valid\"` means), the image shrunk by 9-1=8 pixels after each convolutional layer (28×28 to 20×20, then 20×20 to 12×12), and since we used a stride of 2 in the second convolutional layer, the image size was divided by 2. This is how we end up with 6×6 feature maps." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we reshape the output to get a bunch of 8D vectors representing the outputs of the primary capsules. The output of `conv2` is an array containing 32×8=256 feature maps for each instance, where each feature map is 6×6. So the shape of this output is (_batch size_, 6, 6, 256). We want to chop the 256 into 32 vectors of 8 dimensions each. We could do this by reshaping to (_batch size_, 6, 6, 32, 8). However, since this first capsule layer will be fully connected to the next capsule layer, we can simply flatten the 6×6 grids. This means we just need to reshape to (_batch size_, 6×6×32, 8)." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_raw = tf.reshape(conv2, [-1, caps1_n_caps, caps1_n_dims],\n", - " name=\"caps1_raw\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we need to squash these vectors. 
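A quick numeric aside before squashing (a standalone sketch, not a cell from the original notebook): the "valid" convolution arithmetic described above can be double-checked in a few lines, confirming the 6×6 grid and the 1152 primary capsules.

```python
def valid_conv_size(size, kernel_size, stride):
    # Output size along one dimension of a "valid" (unpadded) convolution.
    return (size - kernel_size) // stride + 1

after_conv1 = valid_conv_size(28, 9, 1)            # 28 -> 20
after_conv2 = valid_conv_size(after_conv1, 9, 2)   # 20 -> 6
print(after_conv1, after_conv2)                    # 20 6
print(after_conv2 * after_conv2 * 32)              # 1152 primary capsules
```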
Let's define the `squash()` function, based on equation (1) from the paper:\n", - "\n", - "$\\operatorname{squash}(\\mathbf{s}) = \\dfrac{\\|\\mathbf{s}\\|^2}{1 + \\|\\mathbf{s}\\|^2} \\dfrac{\\mathbf{s}}{\\|\\mathbf{s}\\|}$\n", - "\n", - "The `squash()` function will squash all vectors in the given array, along the given axis (by default, the last axis).\n", - "\n", - "**Caution**, a nasty bug is waiting to bite you: the derivative of $\\|\\mathbf{s}\\|$ is undefined when $\\|\\mathbf{s}\\|=0$, so we can't just use `tf.norm()`, or else it will blow up during training: if a vector is zero, the gradients will be `nan`, so when the optimizer updates the variables, they will also become `nan`, and from then on you will be stuck in `nan` land. The solution is to implement the norm manually by computing the square root of the sum of squares plus a tiny epsilon value: $\\|\\mathbf{s}\\| \\approx \\sqrt{\\sum\\limits_i{{s_i}^2}\\,\\,+ \\epsilon}$." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - "source": [ - "def squash(s, axis=-1, epsilon=1e-7, name=None):\n", - " with tf.name_scope(name, default_name=\"squash\"):\n", - " squared_norm = tf.reduce_sum(tf.square(s), axis=axis,\n", - " keep_dims=True)\n", - " safe_norm = tf.sqrt(squared_norm + epsilon)\n", - " squash_factor = squared_norm / (1. + squared_norm)\n", - " unit_vector = s / safe_norm\n", - " return squash_factor * unit_vector" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's apply this function to get the output $\\mathbf{u}_i$ of each primary capsules $i$ :" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_output = squash(caps1_raw, name=\"caps1_output\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Great! We have the output of the first capsule layer. It wasn't too hard, was it? However, computing the next layer is where the fun really begins." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Digit Capsules" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To compute the output of the digit capsules, we must first compute the predicted output vectors (one for each primary / digit capsule pair). Then we can run the routing by agreement algorithm." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Compute the Predicted Output Vectors" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The digit capsule layer contains 10 capsules (one for each digit) of 16 dimensions each:" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_n_caps = 10\n", - "caps2_n_dims = 16" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For each capsule $i$ in the first layer, we want to predict the output of every capsule $j$ in the second layer. For this, we will need a transformation matrix $\\mathbf{W}_{i,j}$ (one for each pair of capsules ($i$, $j$)), then we can compute the predicted output $\\hat{\\mathbf{u}}_{j|i} = \\mathbf{W}_{i,j} \\, \\mathbf{u}_i$ (equation (2)-right in the paper). Since we want to transform an 8D vector into a 16D vector, each transformation matrix $\\mathbf{W}_{i,j}$ must have a shape of (16, 8)." 
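Just to make these shapes concrete, here is a tiny NumPy sketch of a single prediction $\hat{\mathbf{u}}_{j|i} = \mathbf{W}_{i,j} \, \mathbf{u}_i$ (the names `W_ij` and `u_i` are only for illustration and are not used anywhere else):

```python
# One (16, 8) transformation matrix times one (8, 1) primary capsule output
# gives one (16, 1) predicted digit capsule output.
W_ij = np.random.rand(16, 8)
u_i = np.random.rand(8, 1)
u_hat_ji = W_ij.dot(u_i)
print(u_hat_ji.shape)  # (16, 1)
```

The next cells do exactly this, but for all 1152×10 pairs of capsules and for every instance in the batch at once.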
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To compute $\\hat{\\mathbf{u}}_{j|i}$ for every pair of capsules ($i$, $j$), we will use a nice feature of the `tf.matmul()` function: you probably know that it lets you multiply two matrices, but you may not know that it also lets you multiply higher dimensional arrays. It treats the arrays as arrays of matrices, and it performs itemwise matrix multiplication. For example, suppose you have two 4D arrays, each containing a 2×3 grid of matrices. The first contains matrices $\\mathbf{A}, \\mathbf{B}, \\mathbf{C}, \\mathbf{D}, \\mathbf{E}, \\mathbf{F}$ and the second contains matrices $\\mathbf{G}, \\mathbf{H}, \\mathbf{I}, \\mathbf{J}, \\mathbf{K}, \\mathbf{L}$. If you multiply these two 4D arrays using the `tf.matmul()` function, this is what you get:\n", - "\n", - "$\n", - "\\pmatrix{\n", - "\\mathbf{A} & \\mathbf{B} & \\mathbf{C} \\\\\n", - "\\mathbf{D} & \\mathbf{E} & \\mathbf{F}\n", - "} \\times\n", - "\\pmatrix{\n", - "\\mathbf{G} & \\mathbf{H} & \\mathbf{I} \\\\\n", - "\\mathbf{J} & \\mathbf{K} & \\mathbf{L}\n", - "} = \\pmatrix{\n", - "\\mathbf{AG} & \\mathbf{BH} & \\mathbf{CI} \\\\\n", - "\\mathbf{DJ} & \\mathbf{EK} & \\mathbf{FL}\n", - "}\n", - "$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can apply this function to compute $\\hat{\\mathbf{u}}_{j|i}$ for every pair of capsules ($i$, $j$) like this (recall that there are 6×6×32=1152 capsules in the first layer, and 10 in the second layer):\n", - "\n", - "$\n", - "\\pmatrix{\n", - " \\mathbf{W}_{1,1} & \\mathbf{W}_{1,2} & \\cdots & \\mathbf{W}_{1,10} \\\\\n", - " \\mathbf{W}_{2,1} & \\mathbf{W}_{2,2} & \\cdots & \\mathbf{W}_{2,10} \\\\\n", - " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - " \\mathbf{W}_{1152,1} & \\mathbf{W}_{1152,2} & \\cdots & \\mathbf{W}_{1152,10}\n", - "} \\times\n", - "\\pmatrix{\n", - " \\mathbf{u}_1 & \\mathbf{u}_1 & \\cdots & \\mathbf{u}_1 \\\\\n", - " \\mathbf{u}_2 & \\mathbf{u}_2 & \\cdots & \\mathbf{u}_2 \\\\\n", - " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - " \\mathbf{u}_{1152} & \\mathbf{u}_{1152} & \\cdots & \\mathbf{u}_{1152}\n", - "}\n", - "=\n", - "\\pmatrix{\n", - "\\hat{\\mathbf{u}}_{1|1} & \\hat{\\mathbf{u}}_{2|1} & \\cdots & \\hat{\\mathbf{u}}_{10|1} \\\\\n", - "\\hat{\\mathbf{u}}_{1|2} & \\hat{\\mathbf{u}}_{2|2} & \\cdots & \\hat{\\mathbf{u}}_{10|2} \\\\\n", - "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", - "\\hat{\\mathbf{u}}_{1|1152} & \\hat{\\mathbf{u}}_{2|1152} & \\cdots & \\hat{\\mathbf{u}}_{10|1152}\n", - "}\n", - "$\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The shape of the first array is (1152, 10, 16, 8), and the shape of the second array is (1152, 10, 8, 1). Note that the second array must contain 10 identical copies of the vectors $\\mathbf{u}_1$ to $\\mathbf{u}_{1152}$. To create this array, we will use the handy `tf.tile()` function, which lets you create an array containing many copies of a base array, tiled in any way you want." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Oh, wait a second! We forgot one dimension: _batch size_. Say we feed 50 images to the capsule network, it will make predictions for these 50 images simultaneously. So the shape of the first array must be (50, 1152, 10, 16, 8), and the shape of the second array must be (50, 1152, 10, 8, 1). 
The first layer capsules actually already output predictions for all 50 images, so the second array will be fine, but for the first array, we will need to use `tf.tile()` to have 50 copies of the transformation matrices." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Okay, let's start by creating a trainable variable of shape (1, 1152, 10, 16, 8) that will hold all the transformation matrices. The first dimension of size 1 will make this array easy to tile. We initialize this variable randomly using a normal distribution with a standard deviation to 0.1." - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "init_sigma = 0.1\n", - "\n", - "W_init = tf.random_normal(\n", - " shape=(1, caps1_n_caps, caps2_n_caps, caps2_n_dims, caps1_n_dims),\n", - " stddev=init_sigma, dtype=tf.float32, name=\"W_init\")\n", - "W = tf.Variable(W_init, name=\"W\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can create the first array by repeating `W` once per instance:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "batch_size = tf.shape(X)[0]\n", - "W_tiled = tf.tile(W, [batch_size, 1, 1, 1, 1], name=\"W_tiled\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "That's it! On to the second array, now. As discussed earlier, we need to create an array of shape (_batch size_, 1152, 10, 8, 1), containing the output of the first layer capsules, repeated 10 times (once per digit, along the third dimension, which is axis=2). The `caps1_output` array has a shape of (_batch size_, 1152, 8), so we first need to expand it twice, to get an array of shape (_batch size_, 1152, 1, 8, 1), then we can repeat it 10 times along the third dimension:" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_output_expanded = tf.expand_dims(caps1_output, -1,\n", - " name=\"caps1_output_expanded\")\n", - "caps1_output_tile = tf.expand_dims(caps1_output_expanded, 2,\n", - " name=\"caps1_output_tile\")\n", - "caps1_output_tiled = tf.tile(caps1_output_tile, [1, 1, caps2_n_caps, 1, 1],\n", - " name=\"caps1_output_tiled\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's check the shape of the first array:" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "W_tiled" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Good, and now the second:" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "caps1_output_tiled" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Yes! 
Now, to get all the predicted output vectors $\\hat{\\mathbf{u}}_{j|i}$, we just need to multiply these two arrays using `tf.matmul()`, as explained earlier: " - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_predicted = tf.matmul(W_tiled, caps1_output_tiled,\n", - " name=\"caps2_predicted\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's check the shape:" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_predicted" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Perfect, for each instance in the batch (we don't know the batch size yet, hence the \"?\") and for each pair of first and second layer capsules (1152×10) we have a 16D predicted output column vector (16×1). We're ready to apply the routing by agreement algorithm!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Routing by agreement" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First let's initialize the raw routing weights $b_{i,j}$ to zero:" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "raw_weights = tf.zeros([batch_size, caps1_n_caps, caps2_n_caps, 1, 1],\n", - " dtype=np.float32, name=\"raw_weights\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will see why we need the last two dimensions of size 1 in a minute." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Round 1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, let's apply the softmax function to compute the routing weights, $\\mathbf{c}_{i} = \\operatorname{softmax}(\\mathbf{b}_i)$ (equation (3) in the paper):" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "routing_weights = tf.nn.softmax(raw_weights, dim=2, name=\"routing_weights\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's compute the weighted sum of all the predicted output vectors for each second-layer capsule, $\\mathbf{s}_j = \\sum\\limits_{i}{c_{i,j}\\hat{\\mathbf{u}}_{j|i}}$ (equation (2)-left in the paper):" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [], - "source": [ - "weighted_predictions = tf.multiply(routing_weights, caps2_predicted,\n", - " name=\"weighted_predictions\")\n", - "weighted_sum = tf.reduce_sum(weighted_predictions, axis=1, keep_dims=True,\n", - " name=\"weighted_sum\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are a couple important details to note here:\n", - "* To perform elementwise matrix multiplication (also called the Hadamard product, noted $\\circ$), we use the `tf.multiply()` function. It requires `routing_weights` and `caps2_predicted` to have the same rank, which is why we added two extra dimensions of size 1 to `routing_weights`, earlier.\n", - "* The shape of `routing_weights` is (_batch size_, 1152, 10, 1, 1) while the shape of `caps2_predicted` is (_batch size_, 1152, 10, 16, 1). Since they don't match on the fourth dimension (1 _vs_ 16), `tf.multiply()` automatically _broadcasts_ the `routing_weights` 16 times along that dimension. 
If you are not familiar with broadcasting, a simple example might help:\n", - "\n", - " $ \\pmatrix{1 & 2 & 3 \\\\ 4 & 5 & 6} \\circ \\pmatrix{10 & 100 & 1000} = \\pmatrix{1 & 2 & 3 \\\\ 4 & 5 & 6} \\circ \\pmatrix{10 & 100 & 1000 \\\\ 10 & 100 & 1000} = \\pmatrix{10 & 200 & 3000 \\\\ 40 & 500 & 6000} $" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And finally, let's apply the squash function to get the outputs of the second layer capsules at the end of the first iteration of the routing by agreement algorithm, $\\mathbf{v}_j = \\operatorname{squash}(\\mathbf{s}_j)$ :" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1 = squash(weighted_sum, axis=-2,\n", - " name=\"caps2_output_round_1\")" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Good! We have ten 16D output vectors for each instance, as expected." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Round 2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, let's measure how close each predicted vector $\\hat{\\mathbf{u}}_{j|i}$ is to the actual output vector $\\mathbf{v}_j$ by computing their scalar product $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* Quick math reminder: if $\\vec{a}$ and $\\vec{b}$ are two vectors of equal length, and $\\mathbf{a}$ and $\\mathbf{b}$ are their corresponding column vectors (i.e., matrices with a single column), then $\\mathbf{a}^T \\mathbf{b}$ (i.e., the matrix multiplication of the transpose of $\\mathbf{a}$, and $\\mathbf{b}$) is a 1×1 matrix containing the scalar product of the two vectors $\\vec{a}\\cdot\\vec{b}$. In Machine Learning, we generally represent vectors as column vectors, so when we talk about computing the scalar product $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$, this actually means computing ${\\hat{\\mathbf{u}}_{j|i}}^T \\mathbf{v}_j$." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Since we need to compute the scalar product $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$ for each instance, and for each pair of first and second level capsules $(i, j)$, we will once again take advantage of the fact that `tf.matmul()` can multiply many matrices simultaneously. This will require playing around with `tf.tile()` to get all dimensions to match (except for the last 2), just like we did earlier. 
So let's look at the shape of `caps2_predicted`, which holds all the predicted output vectors $\\hat{\\mathbf{u}}_{j|i}$ for each instance and each pair of capsules:" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_predicted" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And now let's look at the shape of `caps2_output_round_1`, which holds 10 outputs vectors of 16D each, for each instance:" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To get these shapes to match, we just need to tile the `caps2_output_round_1` array 1152 times (once per primary capsule) along the second dimension:" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_round_1_tiled = tf.tile(\n", - " caps2_output_round_1, [1, caps1_n_caps, 1, 1, 1],\n", - " name=\"caps2_output_round_1_tiled\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And now we are ready to call `tf.matmul()` (note that we must tell it to transpose the matrices in the first array, to get ${\\hat{\\mathbf{u}}_{j|i}}^T$ instead of $\\hat{\\mathbf{u}}_{j|i}$):" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [], - "source": [ - "agreement = tf.matmul(caps2_predicted, caps2_output_round_1_tiled,\n", - " transpose_a=True, name=\"agreement\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can now update the raw routing weights $b_{i,j}$ by simply adding the scalar product $\\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$ we just computed: $b_{i,j} \\gets b_{i,j} + \\hat{\\mathbf{u}}_{j|i} \\cdot \\mathbf{v}_j$ (see Procedure 1, step 7, in the paper)." - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [], - "source": [ - "raw_weights_round_2 = tf.add(raw_weights, agreement,\n", - " name=\"raw_weights_round_2\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The rest of round 2 is the same as in round 1:" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [], - "source": [ - "routing_weights_round_2 = tf.nn.softmax(raw_weights_round_2,\n", - " dim=2,\n", - " name=\"routing_weights_round_2\")\n", - "weighted_predictions_round_2 = tf.multiply(routing_weights_round_2,\n", - " caps2_predicted,\n", - " name=\"weighted_predictions_round_2\")\n", - "weighted_sum_round_2 = tf.reduce_sum(weighted_predictions_round_2,\n", - " axis=1, keep_dims=True,\n", - " name=\"weighted_sum_round_2\")\n", - "caps2_output_round_2 = squash(weighted_sum_round_2,\n", - " axis=-2,\n", - " name=\"caps2_output_round_2\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We could go on for a few more rounds, by repeating exactly the same steps as in round 2, but to keep things short, we will stop here:" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output = caps2_output_round_2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Static or Dynamic Loop?" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the code above, we created different operations in the TensorFlow graph for each round of the routing by agreement algorithm. In other words, it's a static loop.\n", - "\n", - "Sure, instead of copy/pasting the code several times, we could have written a `for` loop in Python, but this would not change the fact that the graph would end up containing different operations for each routing iteration. It's actually okay since we generally want less than 5 routing iterations, so the graph won't grow too big.\n", - "\n", - "However, you may prefer to implement the routing loop within the TensorFlow graph itself rather than using a Python `for` loop. To do this, you would need to use TensorFlow's `tf.while_loop()` function. This way, all routing iterations would reuse the same operations in the graph, it would be a dynamic loop.\n", - "\n", - "For example, here is how to build a small loop that computes the sum of squares from 1 to 100:" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [], - "source": [ - "def condition(input, counter):\n", - " return tf.less(counter, 100)\n", - "\n", - "def loop_body(input, counter):\n", - " output = tf.add(input, tf.square(counter))\n", - " return output, tf.add(counter, 1)\n", - "\n", - "with tf.name_scope(\"compute_sum_of_squares\"):\n", - " counter = tf.constant(1)\n", - " sum_of_squares = tf.constant(0)\n", - "\n", - " result = tf.while_loop(condition, loop_body, [sum_of_squares, counter])\n", - " \n", - "\n", - "with tf.Session() as sess:\n", - " print(sess.run(result))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "As you can see, the `tf.while_loop()` function expects the loop condition and body to be provided _via_ two functions. These functions will be called only once by TensorFlow, during the graph construction phase, _not_ while executing the graph. The `tf.while_loop()` function stitches together the graph fragments created by `condition()` and `loop_body()` with some additional operations to create the loop.\n", - "\n", - "Also note that during training, TensorFlow will automagically handle backpropogation through the loop, so you don't need to worry about that." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Of course, we could have used this one-liner instead! ;-)" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [], - "source": [ - "sum([i**2 for i in range(1, 100 + 1)])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Joke aside, apart from reducing the graph size, using a dynamic loop instead of a static loop can help reduce how much GPU RAM you use (if you are using a GPU). Indeed, if you set `swap_memory=True` when calling the `tf.while_loop()` function, TensorFlow will automatically check GPU RAM usage at each loop iteration, and it will take care of swapping memory between the GPU and the CPU when needed. Since CPU memory is much cheaper and abundant than GPU RAM, this can really make a big difference." 
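If you do want to try the dynamic approach for the routing itself, here is one possible sketch (it reuses `caps2_predicted`, `caps1_n_caps`, `caps2_n_caps`, `batch_size` and `squash()` defined above; this is only an illustration, and the rest of this notebook keeps the static two-round version):

```python
def routing_condition(raw_weights, counter):
    # Stop after two updates of the routing logits.
    return tf.less(counter, 2)

def routing_body(raw_weights, counter):
    # Same math as one round of the static version above.
    routing_weights = tf.nn.softmax(raw_weights, dim=2)
    weighted_predictions = tf.multiply(routing_weights, caps2_predicted)
    weighted_sum = tf.reduce_sum(weighted_predictions, axis=1, keep_dims=True)
    caps2_output_loop = squash(weighted_sum, axis=-2)
    caps2_output_tiled = tf.tile(caps2_output_loop, [1, caps1_n_caps, 1, 1, 1])
    agreement = tf.matmul(caps2_predicted, caps2_output_tiled, transpose_a=True)
    return tf.add(raw_weights, agreement), tf.add(counter, 1)

with tf.name_scope("dynamic_routing"):
    raw_weights_loop = tf.zeros([batch_size, caps1_n_caps, caps2_n_caps, 1, 1],
                                dtype=tf.float32)
    final_raw_weights, _ = tf.while_loop(
        routing_condition, routing_body,
        [raw_weights_loop, tf.constant(0)],
        swap_memory=True)
    # One last pass to turn the final routing weights into capsule outputs.
    final_routing_weights = tf.nn.softmax(final_raw_weights, dim=2)
    final_weighted_sum = tf.reduce_sum(
        tf.multiply(final_routing_weights, caps2_predicted),
        axis=1, keep_dims=True)
    caps2_output_dynamic = squash(final_weighted_sum, axis=-2,
                                  name="caps2_output_dynamic")
```

The math is exactly the same as in the static version, but the graph now contains a single copy of the routing operations, and `swap_memory=True` lets TensorFlow move intermediate tensors to the CPU if GPU RAM gets tight.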
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Estimated Class Probabilities (Length)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The lengths of the output vectors represent the class probabilities, so we could just use `tf.norm()` to compute them, but as we saw when discussing the squash function, it would be risky, so instead let's create our own `safe_norm()` function:" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [], - "source": [ - "def safe_norm(s, axis=-1, epsilon=1e-7, keep_dims=False, name=None):\n", - " with tf.name_scope(name, default_name=\"safe_norm\"):\n", - " squared_norm = tf.reduce_sum(tf.square(s), axis=axis,\n", - " keep_dims=keep_dims)\n", - " return tf.sqrt(squared_norm + epsilon)" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [], - "source": [ - "y_proba = safe_norm(caps2_output, axis=-2, name=\"y_proba\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To predict the class of each instance, we can just select the one with the highest estimated probability. To do this, let's start by finding its index using `tf.argmax()`:" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [], - "source": [ - "y_proba_argmax = tf.argmax(y_proba, axis=2, name=\"y_proba\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at the shape of `y_proba_argmax`:" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [], - "source": [ - "y_proba_argmax" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "That's what we wanted: for each instance, we now have the index of the longest output vector. Let's get rid of the last two dimensions by using `tf.squeeze()` which removes dimensions of size 1. This gives us the capsule network's predicted class for each instance:" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [], - "source": [ - "y_pred = tf.squeeze(y_proba_argmax, axis=[1,2], name=\"y_pred\")" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [], - "source": [ - "y_pred" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Okay, we are now ready to define the training operations, starting with the losses." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Labels" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we will need a placeholder for the labels:" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [], - "source": [ - "y = tf.placeholder(shape=[None], dtype=tf.int64, name=\"y\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Margin loss" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The paper uses a special margin loss to make it possible to detect two or more different digits in each image:\n", - "\n", - "$ L_k = T_k \\max(0, m^{+} - \\|\\mathbf{v}_k\\|)^2 + \\lambda (1 - T_k) \\max(0, \\|\\mathbf{v}_k\\| - m^{-})^2$\n", - "\n", - "* $T_k$ is equal to 1 if the digit of class $k$ is present, or 0 otherwise.\n", - "* In the paper, $m^{+} = 0.9$, $m^{-} = 0.1$ and $\\lambda = 0.5$.\n", - "* Note that there was an error in the video (at 15:47): the max operations are squared, not the norms. 
Sorry about that." - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [], - "source": [ - "m_plus = 0.9\n", - "m_minus = 0.1\n", - "lambda_ = 0.5" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Since `y` will contain the digit classes, from 0 to 9, to get $T_k$ for every instance and every class, we can just use the `tf.one_hot()` function:" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [], - "source": [ - "T = tf.one_hot(y, depth=caps2_n_caps, name=\"T\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A small example should make it clear what this does:" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [], - "source": [ - "with tf.Session():\n", - " print(T.eval(feed_dict={y: np.array([0, 1, 2, 3, 9])}))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's compute the norm of the output vector for each output capsule and each instance. First, let's verify the shape of `caps2_output`:" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The 16D output vectors are in the second to last dimension, so let's use the `safe_norm()` function with `axis=-2`:" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_norm = safe_norm(caps2_output, axis=-2, keep_dims=True,\n", - " name=\"caps2_output_norm\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's compute $\\max(0, m^{+} - \\|\\mathbf{v}_k\\|)^2$, and reshape the result to get a simple matrix of shape (_batch size_, 10):" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [], - "source": [ - "present_error_raw = tf.square(tf.maximum(0., m_plus - caps2_output_norm),\n", - " name=\"present_error_raw\")\n", - "present_error = tf.reshape(present_error_raw, shape=(-1, 10),\n", - " name=\"present_error\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next let's compute $\\max(0, \\|\\mathbf{v}_k\\| - m^{-})^2$ and reshape it:" - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [], - "source": [ - "absent_error_raw = tf.square(tf.maximum(0., caps2_output_norm - m_minus),\n", - " name=\"absent_error_raw\")\n", - "absent_error = tf.reshape(absent_error_raw, shape=(-1, 10),\n", - " name=\"absent_error\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are ready to compute the loss for each instance and each digit:" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [], - "source": [ - "L = tf.add(T * present_error, lambda_ * (1.0 - T) * absent_error,\n", - " name=\"L\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can sum the digit losses for each instance ($L_0 + L_1 + \\cdots + L_9$), and compute the mean over all instances. 
This gives us the final margin loss:" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [], - "source": [ - "margin_loss = tf.reduce_mean(tf.reduce_sum(L, axis=1), name=\"margin_loss\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Reconstruction" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's add a decoder network on top of the capsule network. It is a regular 3-layer fully connected neural network which will learn to reconstruct the input images based on the output of the capsule network. This will force the capsule network to preserve all the information required to reconstruct the digits, across the whole network. This constraint regularizes the model: it reduces the risk of overfitting the training set, and it helps generalize to new digits." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Mask" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The paper mentions that during training, instead of sending all the outputs of the capsule network to the decoder network, we must send only the output vector of the capsule that corresponds to the target digit. All the other output vectors must be masked out. At inference time, we must mask all output vectors except for the longest one, i.e., the one that corresponds to the predicted digit. You can see this in the paper's figure 2 (at 18:15 in the video): all output vectors are masked out, except for the reconstruction target's output vector." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We need a placeholder to tell TensorFlow whether we want to mask the output vectors based on the labels (`True`) or on the predictions (`False`, the default):" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [], - "source": [ - "mask_with_labels = tf.placeholder_with_default(False, shape=(),\n", - " name=\"mask_with_labels\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's use `tf.cond()` to define the reconstruction targets as the labels `y` if `mask_with_labels` is `True`, or `y_pred` otherwise." - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_targets = tf.cond(mask_with_labels, # condition\n", - " lambda: y, # if True\n", - " lambda: y_pred, # if False\n", - " name=\"reconstruction_targets\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that the `tf.cond()` function expects the if-True and if-False tensors to be passed _via_ functions: these functions will be called just once during the graph construction phase (not during the execution phase), similar to `tf.while_loop()`. This allows TensorFlow to add the necessary operations to handle the conditional evaluation of the if-True or if-False tensors. However, in our case, the tensors `y` and `y_pred` are already created by the time we call `tf.cond()`, so unfortunately TensorFlow will consider both `y` and `y_pred` to be dependencies of the `reconstruction_targets` tensor. The `reconstruction_targets` tensor will end up with the correct value, but:\n", - "1. whenever we evaluate a tensor that depends on `reconstruction_targets`, the `y_pred` tensor will be evaluated (even if `mask_with_layers` is `True`). 
This is not a big deal because computing `y_pred` adds no computing overhead during training, since we need it anyway to compute the margin loss. And during testing, if we are doing classification, we won't need reconstructions, so `reconstruction_targets` won't be evaluated at all.\n", - "2. we will always need to feed a value for the `y` placeholder (even if `mask_with_layers` is `False`). This is a bit annoying, but we can pass an empty array, because TensorFlow won't use it anyway (it just does not know it yet when it checks for dependencies)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that we have the reconstruction targets, let's create the reconstruction mask. It should be equal to 1.0 for the target class, and 0.0 for the other classes, for each instance. For this we can just use the `tf.one_hot()` function:" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_mask = tf.one_hot(reconstruction_targets,\n", - " depth=caps2_n_caps,\n", - " name=\"reconstruction_mask\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's check the shape of `reconstruction_mask`:" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_mask" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's compare this to the shape of `caps2_output`:" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Mmh, its shape is (_batch size_, 1, 10, 16, 1). We want to multiply it by the `reconstruction_mask`, but the shape of the `reconstruction_mask` is (_batch size_, 10). We must reshape it to (_batch size_, 1, 10, 1, 1) to make multiplication possible:" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [], - "source": [ - "reconstruction_mask_reshaped = tf.reshape(\n", - " reconstruction_mask, [-1, 1, caps2_n_caps, 1, 1],\n", - " name=\"reconstruction_mask_reshaped\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "At last! We can apply the mask:" - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_masked = tf.multiply(\n", - " caps2_output, reconstruction_mask_reshaped,\n", - " name=\"caps2_output_masked\")" - ] - }, - { - "cell_type": "code", - "execution_count": 63, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_masked" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "One last reshape operation to flatten the decoder's inputs:" - ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": {}, - "outputs": [], - "source": [ - "decoder_input = tf.reshape(caps2_output_masked,\n", - " [-1, caps2_n_caps * caps2_n_dims],\n", - " name=\"decoder_input\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This gives us an array of shape (_batch size_, 160):" - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": {}, - "outputs": [], - "source": [ - "decoder_input" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Decoder" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's build the decoder. 
It's quite simple: two dense (fully connected) ReLU layers followed by a dense output sigmoid layer:" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": {}, - "outputs": [], - "source": [ - "n_hidden1 = 512\n", - "n_hidden2 = 1024\n", - "n_output = 28 * 28" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [], - "source": [ - "with tf.name_scope(\"decoder\"):\n", - " hidden1 = tf.layers.dense(decoder_input, n_hidden1,\n", - " activation=tf.nn.relu,\n", - " name=\"hidden1\")\n", - " hidden2 = tf.layers.dense(hidden1, n_hidden2,\n", - " activation=tf.nn.relu,\n", - " name=\"hidden2\")\n", - " decoder_output = tf.layers.dense(hidden2, n_output,\n", - " activation=tf.nn.sigmoid,\n", - " name=\"decoder_output\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Reconstruction Loss" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's compute the reconstruction loss. It is just the squared difference between the input image and the reconstructed image:" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": {}, - "outputs": [], - "source": [ - "X_flat = tf.reshape(X, [-1, n_output], name=\"X_flat\")\n", - "squared_difference = tf.square(X_flat - decoder_output,\n", - " name=\"squared_difference\")\n", - "reconstruction_loss = tf.reduce_mean(squared_difference,\n", - " name=\"reconstruction_loss\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Final Loss" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The final loss is the sum of the margin loss and the reconstruction loss (scaled down by a factor of 0.0005 to ensure the margin loss dominates training):" - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "metadata": {}, - "outputs": [], - "source": [ - "alpha = 0.0005\n", - "\n", - "loss = tf.add(margin_loss, alpha * reconstruction_loss, name=\"loss\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Final Touches" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accuracy" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To measure our model's accuracy, we need to count the number of instances that are properly classified. 
For this, we can simply compare `y` and `y_pred`, convert the boolean value to a float32 (0.0 for False, 1.0 for True), and compute the mean over all the instances:" - ] - }, - { - "cell_type": "code", - "execution_count": 70, - "metadata": {}, - "outputs": [], - "source": [ - "correct = tf.equal(y, y_pred, name=\"correct\")\n", - "accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name=\"accuracy\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training Operations" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The paper mentions that the authors used the Adam optimizer with TensorFlow's default parameters:" - ] - }, - { - "cell_type": "code", - "execution_count": 71, - "metadata": {}, - "outputs": [], - "source": [ - "optimizer = tf.train.AdamOptimizer()\n", - "training_op = optimizer.minimize(loss, name=\"training_op\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Init and Saver" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And let's add the usual variable initializer, as well as a `Saver`:" - ] - }, - { - "cell_type": "code", - "execution_count": 72, - "metadata": {}, - "outputs": [], - "source": [ - "init = tf.global_variables_initializer()\n", - "saver = tf.train.Saver()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And... we're done with the construction phase! Please take a moment to celebrate. :)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Training" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Training our capsule network is pretty standard. For simplicity, we won't do any fancy hyperparameter tuning, dropout or anything, we will just run the training operation over and over again, displaying the loss, and at the end of each epoch, measure the accuracy on the validation set, display it, and save the model if the validation loss is the lowest seen found so far (this is a basic way to implement early stopping, without actually stopping). Hopefully the code should be self-explanatory, but here are a few details to note:\n", - "* if a checkpoint file exists, it will be restored (this makes it possible to interrupt training, then restart it later from the last checkpoint),\n", - "* we must not forget to feed `mask_with_labels=True` during training,\n", - "* during testing, we let `mask_with_labels` default to `False` (but we still feed the labels since they are required to compute the accuracy),\n", - "* the images loaded _via_ `mnist.train.next_batch()` are represented as `float32` arrays of shape \\[784\\], but the input placeholder `X` expects a `float32` array of shape \\[28, 28, 1\\], so we must reshape the images before we feed them to our model,\n", - "* we evaluate the model's loss and accuracy on the full validation set (5,000 instances). To view progress and support systems that don't have a lot of RAM, the code evaluates the loss and accuracy on one batch at a time, and computes the mean loss and mean accuracy at the end.\n", - "\n", - "*Warning*: if you don't have a GPU, training will take a very long time (at least a few hours). With a GPU, it should take just a few minutes per epoch (e.g., 6 minutes on an NVidia GeForce GTX 1080Ti)." 
- ] - }, - { - "cell_type": "code", - "execution_count": 73, - "metadata": {}, - "outputs": [], - "source": [ - "n_epochs = 10\n", - "batch_size = 50\n", - "restore_checkpoint = True\n", - "\n", - "n_iterations_per_epoch = mnist.train.num_examples // batch_size\n", - "n_iterations_validation = mnist.validation.num_examples // batch_size\n", - "best_loss_val = np.infty\n", - "checkpoint_path = \"./my_capsule_network\"\n", - "\n", - "with tf.Session() as sess:\n", - " if restore_checkpoint and tf.train.checkpoint_exists(checkpoint_path):\n", - " saver.restore(sess, checkpoint_path)\n", - " else:\n", - " init.run()\n", - "\n", - " for epoch in range(n_epochs):\n", - " for iteration in range(1, n_iterations_per_epoch + 1):\n", - " X_batch, y_batch = mnist.train.next_batch(batch_size)\n", - " # Run the training operation and measure the loss:\n", - " _, loss_train = sess.run(\n", - " [training_op, loss],\n", - " feed_dict={X: X_batch.reshape([-1, 28, 28, 1]),\n", - " y: y_batch,\n", - " mask_with_labels: True})\n", - " print(\"\\rIteration: {}/{} ({:.1f}%) Loss: {:.5f}\".format(\n", - " iteration, n_iterations_per_epoch,\n", - " iteration * 100 / n_iterations_per_epoch,\n", - " loss_train),\n", - " end=\"\")\n", - "\n", - " # At the end of each epoch,\n", - " # measure the validation loss and accuracy:\n", - " loss_vals = []\n", - " acc_vals = []\n", - " for iteration in range(1, n_iterations_validation + 1):\n", - " X_batch, y_batch = mnist.validation.next_batch(batch_size)\n", - " loss_val, acc_val = sess.run(\n", - " [loss, accuracy],\n", - " feed_dict={X: X_batch.reshape([-1, 28, 28, 1]),\n", - " y: y_batch})\n", - " loss_vals.append(loss_val)\n", - " acc_vals.append(acc_val)\n", - " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(\n", - " iteration, n_iterations_validation,\n", - " iteration * 100 / n_iterations_validation),\n", - " end=\" \" * 10)\n", - " loss_val = np.mean(loss_vals)\n", - " acc_val = np.mean(acc_vals)\n", - " print(\"\\rEpoch: {} Val accuracy: {:.4f}% Loss: {:.6f}{}\".format(\n", - " epoch + 1, acc_val * 100, loss_val,\n", - " \" (improved)\" if loss_val < best_loss_val else \"\"))\n", - "\n", - " # And save the model if it improved:\n", - " if loss_val < best_loss_val:\n", - " save_path = saver.save(sess, checkpoint_path)\n", - " best_loss_val = loss_val" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Training is finished, we reached over 99.4% accuracy on the validation set after just 5 epochs, things are looking good. Now let's evaluate the model on the test set." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Evaluation" - ] - }, - { - "cell_type": "code", - "execution_count": 74, - "metadata": {}, - "outputs": [], - "source": [ - "n_iterations_test = mnist.test.num_examples // batch_size\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, checkpoint_path)\n", - "\n", - " loss_tests = []\n", - " acc_tests = []\n", - " for iteration in range(1, n_iterations_test + 1):\n", - " X_batch, y_batch = mnist.test.next_batch(batch_size)\n", - " loss_test, acc_test = sess.run(\n", - " [loss, accuracy],\n", - " feed_dict={X: X_batch.reshape([-1, 28, 28, 1]),\n", - " y: y_batch})\n", - " loss_tests.append(loss_test)\n", - " acc_tests.append(acc_test)\n", - " print(\"\\rEvaluating the model: {}/{} ({:.1f}%)\".format(\n", - " iteration, n_iterations_test,\n", - " iteration * 100 / n_iterations_test),\n", - " end=\" \" * 10)\n", - " loss_test = np.mean(loss_tests)\n", - " acc_test = np.mean(acc_tests)\n", - " print(\"\\rFinal test accuracy: {:.4f}% Loss: {:.6f}\".format(\n", - " acc_test * 100, loss_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We reach 99.53% accuracy on the test set. Pretty nice. :)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Predictions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's make some predictions! We first fix a few images from the test set, then we start a session, restore the trained model, evaluate `caps2_output` to get the capsule network's output vectors, `decoder_output` to get the reconstructions, and `y_pred` to get the class predictions:" - ] - }, - { - "cell_type": "code", - "execution_count": 75, - "metadata": {}, - "outputs": [], - "source": [ - "n_samples = 5\n", - "\n", - "sample_images = mnist.test.images[:n_samples].reshape([-1, 28, 28, 1])\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, checkpoint_path)\n", - " caps2_output_value, decoder_output_value, y_pred_value = sess.run(\n", - " [caps2_output, decoder_output, y_pred],\n", - " feed_dict={X: sample_images,\n", - " y: np.array([], dtype=np.int64)})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note: we feed `y` with an empty array, but TensorFlow will not use it, as explained earlier." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And now let's plot the images and their labels, followed by the corresponding reconstructions and predictions:" - ] - }, - { - "cell_type": "code", - "execution_count": 76, - "metadata": {}, - "outputs": [], - "source": [ - "sample_images = sample_images.reshape(-1, 28, 28)\n", - "reconstructions = decoder_output_value.reshape([-1, 28, 28])\n", - "\n", - "plt.figure(figsize=(n_samples * 2, 3))\n", - "for index in range(n_samples):\n", - " plt.subplot(1, n_samples, index + 1)\n", - " plt.imshow(sample_images[index], cmap=\"binary\")\n", - " plt.title(\"Label:\" + str(mnist.test.labels[index]))\n", - " plt.axis(\"off\")\n", - "\n", - "plt.show()\n", - "\n", - "plt.figure(figsize=(n_samples * 2, 3))\n", - "for index in range(n_samples):\n", - " plt.subplot(1, n_samples, index + 1)\n", - " plt.title(\"Predicted:\" + str(y_pred_value[index]))\n", - " plt.imshow(reconstructions[index], cmap=\"binary\")\n", - " plt.axis(\"off\")\n", - " \n", - "plt.show()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The predictions are all correct, and the reconstructions look great. Hurray!" 
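If you are curious how confident the network is, you can also evaluate `y_proba` (the lengths of the ten output vectors) for the same sample images. This is just an optional sketch; nothing below depends on it:

```python
# Estimated class probabilities (output vector lengths) for the sample images.
with tf.Session() as sess:
    saver.restore(sess, checkpoint_path)
    y_proba_value = sess.run(
        y_proba,
        feed_dict={X: sample_images.reshape([-1, 28, 28, 1])})

# y_proba has shape (batch size, 1, 10, 1), so squeeze it down to (batch size, 10).
print(np.round(y_proba_value.squeeze(), 2))
```

Remember that these are vector lengths, not softmax outputs, so they do not need to sum to 1.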
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Interpreting the Output Vectors" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's tweak the output vectors to see what their pose parameters represent." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, let's check the shape of the `cap2_output_value` NumPy array:" - ] - }, - { - "cell_type": "code", - "execution_count": 77, - "metadata": {}, - "outputs": [], - "source": [ - "caps2_output_value.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's create a function that will tweak each of the 16 pose parameters (dimensions) in all output vectors. Each tweaked output vector will be identical to the original output vector, except that one of its pose parameters will be incremented by a value varying from -0.5 to 0.5. By default there will be 11 steps (-0.5, -0.4, ..., +0.4, +0.5). This function will return an array of shape (_tweaked pose parameters_=16, _steps_=11, _batch size_=5, 1, 10, 16, 1):" - ] - }, - { - "cell_type": "code", - "execution_count": 78, - "metadata": {}, - "outputs": [], - "source": [ - "def tweak_pose_parameters(output_vectors, min=-0.5, max=0.5, n_steps=11):\n", - " steps = np.linspace(min, max, n_steps) # -0.25, -0.15, ..., +0.25\n", - " pose_parameters = np.arange(caps2_n_dims) # 0, 1, ..., 15\n", - " tweaks = np.zeros([caps2_n_dims, n_steps, 1, 1, 1, caps2_n_dims, 1])\n", - " tweaks[pose_parameters, :, 0, 0, 0, pose_parameters, 0] = steps\n", - " output_vectors_expanded = output_vectors[np.newaxis, np.newaxis]\n", - " return tweaks + output_vectors_expanded" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's compute all the tweaked output vectors and reshape the result to (_parameters_×_steps_×_instances_, 1, 10, 16, 1) so we can feed the array to the decoder:" - ] - }, - { - "cell_type": "code", - "execution_count": 79, - "metadata": {}, - "outputs": [], - "source": [ - "n_steps = 11\n", - "\n", - "tweaked_vectors = tweak_pose_parameters(caps2_output_value, n_steps=n_steps)\n", - "tweaked_vectors_reshaped = tweaked_vectors.reshape(\n", - " [-1, 1, caps2_n_caps, caps2_n_dims, 1])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's feed these tweaked output vectors to the decoder and get the reconstructions it produces:" - ] - }, - { - "cell_type": "code", - "execution_count": 80, - "metadata": {}, - "outputs": [], - "source": [ - "tweak_labels = np.tile(mnist.test.labels[:n_samples], caps2_n_dims * n_steps)\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, checkpoint_path)\n", - " decoder_output_value = sess.run(\n", - " decoder_output,\n", - " feed_dict={caps2_output: tweaked_vectors_reshaped,\n", - " mask_with_labels: True,\n", - " y: tweak_labels})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's reshape the decoder's output so we can easily iterate on the output dimension, the tweak steps, and the instances:" - ] - }, - { - "cell_type": "code", - "execution_count": 81, - "metadata": {}, - "outputs": [], - "source": [ - "tweak_reconstructions = decoder_output_value.reshape(\n", - " [caps2_n_dims, n_steps, n_samples, 28, 28])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Lastly, let's plot all the reconstructions, for the first 3 output dimensions, for each tweaking step (column) and each digit (row):" - ] - }, - { - "cell_type": "code", - 
"execution_count": 82, - "metadata": {}, - "outputs": [], - "source": [ - "for dim in range(3):\n", - " print(\"Tweaking output dimension #{}\".format(dim))\n", - " plt.figure(figsize=(n_steps / 1.2, n_samples / 1.5))\n", - " for row in range(n_samples):\n", - " for col in range(n_steps):\n", - " plt.subplot(n_samples, n_steps, row * n_steps + col + 1)\n", - " plt.imshow(tweak_reconstructions[dim, col, row], cmap=\"binary\")\n", - " plt.axis(\"off\")\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Conclusion" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "I tried to make the code in this notebook as flat and linear as possible, to make it easier to follow, but of course in practice you would want to wrap the code in nice reusable functions and classes. For example, you could try implementing your own `PrimaryCapsuleLayer`, and `DenseRoutingCapsuleLayer` classes, with parameters for the number of capsules, the number of routing iterations, whether to use a dynamic loop or a static loop, and so on. For an example a modular implementation of Capsule Networks based on TensorFlow, take a look at the [CapsNet-TensorFlow](https://github.com/naturomics/CapsNet-Tensorflow) project.\n", - "\n", - "That's all for today, I hope you enjoyed this notebook!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.3" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/work_in_progress/extra_tensorflow_reproducibility.ipynb b/work_in_progress/extra_tensorflow_reproducibility.ipynb deleted file mode 100644 index 99f06ac..0000000 --- a/work_in_progress/extra_tensorflow_reproducibility.ipynb +++ /dev/null @@ -1,842 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# TensorFlow Reproducibility" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import tensorflow as tf\n", - "from tensorflow import keras" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Checklist" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "1. Do not run TensorFlow on the GPU.\n", - "2. Beware of multithreading, and make TensorFlow single-threaded.\n", - "3. Set all the random seeds.\n", - "4. Eliminate any other source of variability." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Do Not Run TensorFlow on the GPU" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Some operations (like `tf.reduce_sum()`) have favor performance over precision, and their outputs may vary slightly across runs. 
To get reproducible results, make sure TensorFlow runs on the CPU:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "os.environ[\"CUDA_VISIBLE_DEVICES\"]=\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Beware of Multithreading" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Because floats have limited precision, the order of execution matters:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "2. * 5. / 7." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "2. / 7. * 5." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You should make sure TensorFlow runs your ops on a single thread:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "config = tf.ConfigProto(intra_op_parallelism_threads=1,\n", - " inter_op_parallelism_threads=1)\n", - "\n", - "with tf.Session(config=config) as sess:\n", - " #... this will run single threaded\n", - " pass" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The thread pools for all sessions are created when you create the first session, so all sessions in the rest of this notebook will be single-threaded:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "with tf.Session() as sess:\n", - " #... also single-threaded!\n", - " pass" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set all the random seeds!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Python's built-in `hash()` function" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "print(set(\"Try restarting the kernel and running this again\"))\n", - "print(set(\"Try restarting the kernel and running this again\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Since Python 3.3, the result will be different every time, unless you start Python with the `PYTHONHASHSEED` environment variable set to `0`:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```shell\n", - "PYTHONHASHSEED=0 python\n", - "```\n", - "\n", - "```pycon\n", - ">>> print(set(\"Now the output is stable across runs\"))\n", - "{'n', 'b', 'h', 'o', 'i', 'a', 'r', 't', 'p', 'N', 's', 'c', ' ', 'l', 'e', 'w', 'u'}\n", - ">>> exit()\n", - "```\n", - "\n", - "```shell\n", - "PYTHONHASHSEED=0 python\n", - "```\n", - "```pycon\n", - ">>> print(set(\"Now the output is stable across runs\"))\n", - "{'n', 'b', 'h', 'o', 'i', 'a', 'r', 't', 'p', 'N', 's', 'c', ' ', 'l', 'e', 'w', 'u'}\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you could set this environment variable system-wide, but that's probably not a good idea, because this automatic randomization was [introduced for security reasons](http://ocert.org/advisories/ocert-2011-003.html)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Unfortunately, setting the environment variable from within Python (e.g., using `os.environ[\"PYTHONHASHSEED\"]=\"0\"`) will not work, because Python reads it upon startup. 
For Jupyter notebooks, you have to start the Jupyter server like this:\n", - "\n", - "```shell\n", - "PYTHONHASHSEED=0 jupyter notebook\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "if os.environ.get(\"PYTHONHASHSEED\") != \"0\":\n", - " raise Exception(\"You must set PYTHONHASHSEED=0 when starting the Jupyter server to get reproducible results.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Python Random Number Generators (RNGs)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "import random\n", - "\n", - "random.seed(42)\n", - "print(random.random())\n", - "print(random.random())\n", - "\n", - "print()\n", - "\n", - "random.seed(42)\n", - "print(random.random())\n", - "print(random.random())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### NumPy RNGs" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "np.random.seed(42)\n", - "print(np.random.rand())\n", - "print(np.random.rand())\n", - "\n", - "print()\n", - "\n", - "np.random.seed(42)\n", - "print(np.random.rand())\n", - "print(np.random.rand())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### TensorFlow RNGs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "TensorFlow's behavior is more complex because of two things:\n", - "* you create a graph, and then you execute it. The random seed must be set before you create the random operations.\n", - "* there are two seeds: one at the graph level, and one at the individual random operation level." 
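Since the graph-level seed must be set before any random op is created, and resetting the default graph discards it, it can be convenient to bundle the two steps into a tiny helper. This is just a sketch, and the helper name is mine, not something defined in this notebook:

```python
import numpy as np
import tensorflow as tf

# Hypothetical convenience helper: reset the default graph and reseed, so the
# graph-level seed is always in place before any random op gets created.
def reset_graph(seed=42):
    tf.reset_default_graph()
    tf.set_random_seed(seed)
    np.random.seed(seed)
```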
- ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "\n", - "tf.set_random_seed(42)\n", - "rnd = tf.random_uniform(shape=[])\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd.eval())\n", - " print(rnd.eval())\n", - "\n", - "print()\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd.eval())\n", - " print(rnd.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Every time you reset the graph, you need to set the seed again:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "\n", - "tf.set_random_seed(42)\n", - "rnd = tf.random_uniform(shape=[])\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd.eval())\n", - " print(rnd.eval())\n", - "\n", - "print()\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd.eval())\n", - " print(rnd.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you create your own graph, it will ignore the default graph's seed:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "tf.set_random_seed(42)\n", - "\n", - "graph = tf.Graph()\n", - "with graph.as_default():\n", - " rnd = tf.random_uniform(shape=[])\n", - "\n", - "with tf.Session(graph=graph):\n", - " print(rnd.eval())\n", - " print(rnd.eval())\n", - "\n", - "print()\n", - "\n", - "with tf.Session(graph=graph):\n", - " print(rnd.eval())\n", - " print(rnd.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You must set its own seed:" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "graph = tf.Graph()\n", - "with graph.as_default():\n", - " tf.set_random_seed(42)\n", - " rnd = tf.random_uniform(shape=[])\n", - "\n", - "with tf.Session(graph=graph):\n", - " print(rnd.eval())\n", - " print(rnd.eval())\n", - "\n", - "print()\n", - "\n", - "with tf.Session(graph=graph):\n", - " print(rnd.eval())\n", - " print(rnd.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you set the seed after the random operation is created, the seed has no effet:" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "\n", - "rnd = tf.random_uniform(shape=[])\n", - "\n", - "tf.set_random_seed(42) # BAD, NO EFFECT!\n", - "with tf.Session() as sess:\n", - " print(rnd.eval())\n", - " print(rnd.eval())\n", - "\n", - "print()\n", - "\n", - "tf.set_random_seed(42) # BAD, NO EFFECT!\n", - "with tf.Session() as sess:\n", - " print(rnd.eval())\n", - " print(rnd.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### A note about operation seeds" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also set a seed for each individual random operation. When you do, it is combined with the graph seed into the final seed used by that op. 
The following table summarizes how this works:\n", - "\n", - "| Graph seed | Op seed | Resulting seed |\n", - "|------------|---------|--------------------------------|\n", - "| None | None | Random |\n", - "| graph_seed | None | f(graph_seed, op_index) |\n", - "| None | op_seed | f(default_graph_seed, op_seed) |\n", - "| graph_seed | op_seed | f(graph_seed, op_seed) |\n", - "\n", - "* `f()` is a deterministic function.\n", - "* `op_index = graph._last_id` when there is a graph seed, different random ops without op seeds will have different outputs. However, each of them will have the same sequence of outputs at every run.\n", - "\n", - "In eager mode, there is a global seed instead of graph seed (since there is no graph in eager mode)." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "\n", - "rnd1 = tf.random_uniform(shape=[], seed=42)\n", - "rnd2 = tf.random_uniform(shape=[], seed=42)\n", - "rnd3 = tf.random_uniform(shape=[])\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())\n", - "\n", - "print()\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the following example, you may think that all random ops will have the same random seed, but `rnd3` will actually have a different seed:" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [], - "source": [ - "tf.reset_default_graph()\n", - "\n", - "tf.set_random_seed(42)\n", - "\n", - "rnd1 = tf.random_uniform(shape=[], seed=42)\n", - "rnd2 = tf.random_uniform(shape=[], seed=42)\n", - "rnd3 = tf.random_uniform(shape=[])\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())\n", - "\n", - "print()\n", - "\n", - "with tf.Session() as sess:\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())\n", - " print(rnd1.eval())\n", - " print(rnd2.eval())\n", - " print(rnd3.eval())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Estimators API" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip**: in a Jupyter notebook, you probably want to set the random seeds regularly so that you can come back and run the notebook from there (instead of from the beginning) and still get reproducible outputs." 
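One way to follow this tip is to wrap the three seed calls in a small helper that you call at the top of each section you may want to re-run on its own. Again a sketch, and the helper name is mine rather than something used later in the notebook:

```python
import random
import numpy as np
import tensorflow as tf

# Hypothetical helper: reseed Python's, NumPy's and TensorFlow's RNGs in one call.
def set_all_seeds(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    tf.set_random_seed(seed)
```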
- ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [], - "source": [ - "random.seed(42)\n", - "np.random.seed(42)\n", - "tf.set_random_seed(42)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you use the Estimators API, make sure to create a `RunConfig` and set its `tf_random_seed`, then pass it to the constructor of your estimator:" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [], - "source": [ - "my_config = tf.estimator.RunConfig(tf_random_seed=42)\n", - "\n", - "feature_cols = [tf.feature_column.numeric_column(\"X\", shape=[28 * 28])]\n", - "dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300, 100], n_classes=10,\n", - " feature_columns=feature_cols,\n", - " config=my_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's try it on MNIST:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()\n", - "X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0\n", - "y_train = y_train.astype(np.int32)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Unfortunately, the `numpy_input_fn` does not allow us to set the seed when `shuffle=True`, so we must shuffle the data ourself and set `shuffle=False`." - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "indices = np.random.permutation(len(X_train))\n", - "X_train_shuffled = X_train[indices]\n", - "y_train_shuffled = y_train[indices]\n", - "\n", - "input_fn = tf.estimator.inputs.numpy_input_fn(\n", - " x={\"X\": X_train_shuffled}, y=y_train_shuffled, num_epochs=10, batch_size=32, shuffle=False)\n", - "dnn_clf.train(input_fn=input_fn)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The final loss should be exactly 0.46282205." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Instead of using the `numpy_input_fn()` function (which cannot reproducibly shuffle the dataset at each epoch), you can create your own input function using the Data API and set its shuffling seed:" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [], - "source": [ - "def create_dataset(X, y=None, n_epochs=1, batch_size=32,\n", - " buffer_size=1000, seed=None):\n", - " dataset = tf.data.Dataset.from_tensor_slices(({\"X\": X}, y))\n", - " dataset = dataset.repeat(n_epochs)\n", - " dataset = dataset.shuffle(buffer_size, seed=seed)\n", - " return dataset.batch(batch_size)\n", - "\n", - "input_fn=lambda: create_dataset(X_train, y_train, seed=42)" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "random.seed(42)\n", - "np.random.seed(42)\n", - "tf.set_random_seed(42)\n", - "\n", - "my_config = tf.estimator.RunConfig(tf_random_seed=42)\n", - "\n", - "feature_cols = [tf.feature_column.numeric_column(\"X\", shape=[28 * 28])]\n", - "dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300, 100], n_classes=10,\n", - " feature_columns=feature_cols,\n", - " config=my_config)\n", - "dnn_clf.train(input_fn=input_fn)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The final loss should be exactly 1.0556093." 
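You can convince yourself that the Data API's shuffling really is reproducible once a seed is passed to `shuffle()`. The following minimal check is not part of the original notebook, but it should print the same shuffled order every time you run it:

```python
# Minimal check that Dataset.shuffle() is deterministic once a seed is given.
tf.reset_default_graph()

dataset = tf.data.Dataset.range(10).shuffle(buffer_size=10, seed=42)
next_item = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    print([sess.run(next_item) for _ in range(10)])  # same order at every run
```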
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "```python\n", - "indices = np.random.permutation(len(X_train))\n", - "X_train_shuffled = X_train[indices]\n", - "y_train_shuffled = y_train[indices]\n", - "\n", - "input_fn = tf.estimator.inputs.numpy_input_fn(\n", - " x={\"X\": X_train_shuffled}, y=y_train_shuffled,\n", - " num_epochs=10, batch_size=32, shuffle=False)\n", - "dnn_clf.train(input_fn=input_fn)\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Keras API" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you use the Keras API, all you need to do is set the random seed any time you clear the session:" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "keras.backend.clear_session()\n", - "\n", - "random.seed(42)\n", - "np.random.seed(42)\n", - "tf.set_random_seed(42)\n", - "\n", - "model = keras.models.Sequential([\n", - " keras.layers.Dense(300, activation=\"relu\"),\n", - " keras.layers.Dense(100, activation=\"relu\"),\n", - " keras.layers.Dense(10, activation=\"softmax\"),\n", - "])\n", - "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"sgd\",\n", - " metrics=[\"accuracy\"])\n", - "model.fit(X_train, y_train, epochs=10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You should get exactly 97.16% accuracy on the training set at the end of training." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Eliminate other sources of variability" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For example, `os.listdir()` returns file names in an order that depends on how the files were indexed by the file system:" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(10):\n", - " with open(\"my_test_foo_{}\".format(i), \"w\"):\n", - " pass\n", - "\n", - "[f for f in os.listdir() if f.startswith(\"my_test_foo_\")]" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "for i in range(10):\n", - " with open(\"my_test_bar_{}\".format(i), \"w\"):\n", - " pass\n", - "\n", - "[f for f in os.listdir() if f.startswith(\"my_test_bar_\")]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You should sort the file names before you use them:" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "filenames = os.listdir()\n", - "filenames.sort()" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [], - "source": [ - "[f for f in filenames if f.startswith(\"my_test_foo_\")]" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [], - "source": [ - "for f in os.listdir():\n", - " if f.startswith(\"my_test_foo_\") or f.startswith(\"my_test_bar_\"):\n", - " os.remove(f)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "I hope you enjoyed this notebook. If you do not get reproducible results, or if they are different than mine, then please [file an issue](https://github.com/ageron/handson-ml2/issues) on github, specifying what version of Python, TensorFlow, and NumPy you are using, as well as your O.S. version. Thank you!" 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you want to learn more about Deep Learning and TensorFlow, check out my book [Hands-On Machine Learning with Scitkit-Learn and TensorFlow](http://homl.info/amazon), O'Reilly. You can also follow me on twitter [@aureliengeron](https://twitter.com/aureliengeron) or watch my videos on YouTube at [youtube.com/c/AurelienGeron](https://www.youtube.com/c/AurelienGeron)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -}