Inception model was trained with color range -1 to 1, not 0 to 1

main
Aurélien Geron 2017-09-17 21:15:55 +02:00
parent db529298d8
commit 0bf0e475ab
1 changed file with 176 additions and 104 deletions


@ -31,7 +31,9 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 1, "execution_count": 1,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"# To support both python 2 and python 3\n", "# To support both python 2 and python 3\n",
@ -152,7 +154,9 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 6, "execution_count": 6,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"reset_graph()\n", "reset_graph()\n",
@ -267,7 +271,9 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 13, "execution_count": 13,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"reset_graph()\n", "reset_graph()\n",
@ -430,7 +436,9 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 21, "execution_count": 21,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"height = 28\n", "height = 28\n",
@ -566,7 +574,9 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 24, "execution_count": 24,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import tensorflow as tf\n", "import tensorflow as tf\n",
@ -749,7 +759,7 @@
"## 8. Classifying large images using Inception v3.\n", "## 8. Classifying large images using Inception v3.\n",
"\n", "\n",
"### 8.1.\n", "### 8.1.\n",
"Exercise: Download some images of various animals. Load them in Python, for example using the `matplotlib.image.imread()` function. Resize and/or crop them to 299 × 299 pixels, and ensure that they have just three channels (RGB), with no transparency channel." "Exercise: Download some images of various animals. Load them in Python, for example using the `matplotlib.image.imread()` function or the `scipy.misc.imread()` function. Resize and/or crop them to 299 × 299 pixels, and ensure that they have just three channels (RGB), with no transparency channel. The images that the Inception model was trained on were preprocessed so that their values range from -1.0 to 1.0, so you must ensure that your images do too."
] ]
}, },
{ {
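The updated exercise asks for 299 × 299 images with exactly three channels and values in [-1, 1]. Below is a minimal NumPy-only sketch of those steps. `prepare_for_inception()` is a hypothetical helper, not the notebook's solution (the notebook's own `prepare_image()` resizes with `imresize()`). Nearest-neighbor resizing is a deliberately crude stand-in for proper interpolation, and the sketch assumes an H × W × C array whose integer pixel values, if any, use the 0 to 255 range.

```python
import numpy as np

def prepare_for_inception(image, size=299):
    """Hypothetical helper: crop, resize and rescale an H x W x C image for Inception v3."""
    image = np.asarray(image)
    is_integer = np.issubdtype(image.dtype, np.integer)
    # Keep only the first three channels (R, G, B), dropping any alpha channel.
    image = image[:, :, :3]
    # Center-crop to a square so that resizing does not distort the aspect ratio.
    height, width = image.shape[:2]
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    image = image[top:top + side, left:left + side]
    # Nearest-neighbor resize to size x size by picking evenly spaced rows and columns.
    idx = np.arange(size) * side // size
    image = image[idx][:, idx]
    # Convert to float32 in [0, 1]; integer images are assumed to use the 0 to 255 range.
    image = image.astype(np.float32)
    if is_integer:
        image /= 255.0
    # Finally map [0, 1] to [-1, 1], the range the pretrained Inception model expects.
    return 2.0 * image - 1.0
```

For example, a 400 × 600 RGBA PNG loaded as `uint8` comes out as a `(299, 299, 3)` `float32` array with every value in [-1, 1].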
@ -782,8 +792,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 8.2.\n", "Ensure that the values are in the range [-1, 1] (as expected by the pretrained Inception model), instead of [0, 1]:"
"Exercise: Download the latest pretrained Inception v3 model: the checkpoint is available at https://goo.gl/nxSQvl[].\n"
] ]
}, },
{ {
@ -793,6 +802,25 @@
"collapsed": true "collapsed": true
}, },
"outputs": [], "outputs": [],
"source": [
"test_image = 2 * test_image - 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8.2.\n",
"Exercise: Download the latest pretrained Inception v3 model: the checkpoint is available at https://goo.gl/nxSQvl. The list of class names is available at https://goo.gl/brXRtZ, but you must insert a \"background\" class at the beginning.\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [ "source": [
"import sys\n", "import sys\n",
"import tarfile\n", "import tarfile\n",
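The new cell `test_image = 2 * test_image - 1` is the affine map x → 2x - 1, which sends 0 to -1 and 1 to 1. It must be applied exactly once to data already in [0, 1]. The inverse map is handy when displaying a rescaled image, because matplotlib's `imshow()` expects RGB floats in [0, 1]. The function names below are illustrative only.

```python
import numpy as np

def to_inception_range(image01):
    """Map pixel values from [0, 1] to [-1, 1] (same as `2 * test_image - 1`)."""
    return 2 * image01 - 1

def to_display_range(image_pm1):
    """Inverse mapping from [-1, 1] back to [0, 1], e.g. before plotting with imshow()."""
    return (image_pm1 + 1) / 2

pixels = np.array([0.0, 0.25, 0.5, 1.0], dtype=np.float32)
scaled = to_inception_range(pixels)    # -1.0, -0.5, 0.0, 1.0
restored = to_display_range(scaled)    # back to the original values
```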
@ -822,8 +850,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 31, "execution_count": 32,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"fetch_pretrained_inception_v3()" "fetch_pretrained_inception_v3()"
@ -831,7 +861,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 32, "execution_count": 33,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@ -849,8 +879,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 33, "execution_count": 34,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"class_names = [\"background\"] + load_class_names()" "class_names = [\"background\"] + load_class_names()"
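The notebook prepends `"background"` so that list indices line up with the model's outputs: the pretrained TF-Slim ImageNet checkpoints use 1,001 classes, with index 0 reserved for background. Without the extra entry, every prediction would be reported one class off. The class names and probabilities below are hypothetical placeholders used only to show the index alignment.

```python
import numpy as np

# Hypothetical stand-in for the 1,000 ImageNet class names read from the class list file.
imagenet_names = ["imagenet_class_{}".format(i) for i in range(1000)]

# Prepend the extra class so that list indices line up with the model's 1,001 outputs.
class_names = ["background"] + imagenet_names

# Hypothetical probabilities from the 1,001-way output layer: output 1 is the most likely.
probabilities = np.zeros(len(class_names), dtype=np.float32)
probabilities[1] = 1.0

predicted_index = int(np.argmax(probabilities))
predicted_name = class_names[predicted_index]  # "imagenet_class_0", not "imagenet_class_1"
```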
@ -858,7 +890,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 34, "execution_count": 35,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -875,8 +907,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 35, "execution_count": 36,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from tensorflow.contrib.slim.nets import inception\n", "from tensorflow.contrib.slim.nets import inception\n",
@ -902,7 +936,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 36, "execution_count": 37,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -916,12 +950,12 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### 8.5.\n", "### 8.5.\n",
"Run the model to classify the images you prepared. Display the top five predictions for each image, along with the estimated probability (the list of class names is available at https://goo.gl/brXRtZ[]). How accurate is the model?\n" "Run the model to classify the images you prepared. Display the top five predictions for each image, along with the estimated probability (the list of class names is available at https://goo.gl/brXRtZ). How accurate is the model?\n"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 37, "execution_count": 38,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
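Exercise 8.5 asks for the top five predictions per image with their estimated probabilities. A minimal sketch of that selection step, assuming a 1-D array of probabilities (e.g. the softmax output for one image) and a matching list of class names; the six-class vocabulary is made up for illustration.

```python
import numpy as np

def top_k_predictions(probabilities, class_names, k=5):
    """Return the k most likely (class name, probability) pairs, most likely first."""
    probabilities = np.asarray(probabilities)
    top_indices = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in top_indices]

# Hypothetical output for one image over a tiny six-class vocabulary.
names = ["background", "cat", "dog", "fox", "wolf", "lynx"]
probs = [0.01, 0.60, 0.20, 0.10, 0.05, 0.04]
for name, p in top_k_predictions(probs, names):
    print("{:>10}: {:.2f}%".format(name, 100 * p))
```

For a whole batch of shape `(n_images, n_classes)`, the same idea works row by row with `np.argsort(probas, axis=1)[:, ::-1][:, :k]`.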
@ -934,7 +968,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 38, "execution_count": 39,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -944,7 +978,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 39, "execution_count": 40,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -953,7 +987,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 40, "execution_count": 41,
"metadata": { "metadata": {
"scrolled": true "scrolled": true
}, },
@ -991,8 +1025,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 41, "execution_count": 42,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"import sys\n", "import sys\n",
@ -1016,8 +1052,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 42, "execution_count": 43,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"fetch_flowers()" "fetch_flowers()"
@ -1032,7 +1070,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 43, "execution_count": 44,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1051,7 +1089,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 44, "execution_count": 45,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@ -1077,7 +1115,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 45, "execution_count": 46,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@ -1096,7 +1134,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 46, "execution_count": 47,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1147,7 +1185,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 47, "execution_count": 48,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@ -1189,8 +1227,8 @@
" # Now, let's resize the image to the target dimensions.\n", " # Now, let's resize the image to the target dimensions.\n",
" image = imresize(image, (target_width, target_height))\n", " image = imresize(image, (target_width, target_height))\n",
" \n", " \n",
" # Finally, the Convolution Neural Network expects colors represented as\n", " # Finally, let's ensure that the colors are represented as\n",
" # 32-bit floats ranging from 0.0 to 1.0:\n", " # 32-bit floats ranging from 0.0 to 1.0 (for now):\n",
" return image.astype(np.float32) / 255" " return image.astype(np.float32) / 255"
] ]
}, },
@ -1210,7 +1248,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 48, "execution_count": 49,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1230,7 +1268,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 49, "execution_count": 50,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1252,7 +1290,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 50, "execution_count": 51,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1285,8 +1323,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 51, "execution_count": 52,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"def prepare_image_with_tensorflow(image, target_width = 299, target_height = 299, max_zoom = 0.2):\n", "def prepare_image_with_tensorflow(image, target_width = 299, target_height = 299, max_zoom = 0.2):\n",
@ -1339,7 +1379,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 52, "execution_count": 53,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1382,8 +1422,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 53, "execution_count": 54,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"from tensorflow.contrib.slim.nets import inception\n", "from tensorflow.contrib.slim.nets import inception\n",
@ -1408,7 +1450,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 54, "execution_count": 55,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1424,7 +1466,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 55, "execution_count": 56,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1440,7 +1482,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 56, "execution_count": 57,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1456,7 +1498,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 57, "execution_count": 58,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1472,7 +1514,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 58, "execution_count": 59,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1488,7 +1530,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 59, "execution_count": 60,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@ -1506,8 +1548,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 60, "execution_count": 61,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"n_outputs = len(flower_classes)\n", "n_outputs = len(flower_classes)\n",
@ -1534,8 +1578,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 61, "execution_count": 62,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"y = tf.placeholder(tf.int32, shape=[None])\n", "y = tf.placeholder(tf.int32, shape=[None])\n",
@ -1558,7 +1604,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 62, "execution_count": 63,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1589,7 +1635,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 63, "execution_count": 64,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1606,8 +1652,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 64, "execution_count": 65,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"flower_paths_and_classes = []\n", "flower_paths_and_classes = []\n",
@ -1625,8 +1673,10 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 65, "execution_count": 66,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"test_ratio = 0.2\n", "test_ratio = 0.2\n",
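Only the first line of the split cell (`test_ratio = 0.2`) is visible in this diff. The general idea can be sketched as below: shuffle the (path, class) pairs, then hold out 20% as a test set. If the list is built one class at a time, shuffling first keeps every class represented in both sets. `split_train_test()` and its fixed seed are assumptions for illustration, not the notebook's code.

```python
import random

def split_train_test(paths_and_classes, test_ratio=0.2, seed=42):
    """Shuffle (path, class) pairs and hold out test_ratio of them as a test set."""
    items = list(paths_and_classes)
    random.Random(seed).shuffle(items)           # fixed seed for a reproducible split
    test_size = int(len(items) * test_ratio)
    return items[test_size:], items[:test_size]  # (train, test)
```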
@ -1647,7 +1697,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 66, "execution_count": 67,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
@ -1663,7 +1713,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 67, "execution_count": 68,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@ -1675,36 +1725,29 @@
" batch_paths_and_classes = sample(flower_paths_and_classes, batch_size)\n", " batch_paths_and_classes = sample(flower_paths_and_classes, batch_size)\n",
" images = [mpimg.imread(path)[:, :, :channels] for path, labels in batch_paths_and_classes]\n", " images = [mpimg.imread(path)[:, :, :channels] for path, labels in batch_paths_and_classes]\n",
" prepared_images = [prepare_image(image) for image in images]\n", " prepared_images = [prepare_image(image) for image in images]\n",
" X_batch = np.stack(prepared_images)\n", " X_batch = 2 * np.stack(prepared_images) - 1 # Inception expects colors ranging from -1 to 1\n",
" y_batch = np.array([labels for path, labels in batch_paths_and_classes], dtype=np.int32)\n", " y_batch = np.array([labels for path, labels in batch_paths_and_classes], dtype=np.int32)\n",
" return X_batch, y_batch" " return X_batch, y_batch"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 68, "execution_count": 69,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"X_batch, y_batch = prepare_batch(flower_paths_and_classes_train, batch_size=4)" "X_batch, y_batch = prepare_batch(flower_paths_and_classes_train, batch_size=4)"
] ]
}, },
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"X_batch.shape"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 70, "execution_count": 70,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X_batch.dtype" "X_batch.shape"
] ]
}, },
{ {
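The updated `prepare_batch()` samples a random subset, stacks the prepared images, and rescales the whole batch to [-1, 1] in one vectorized step. The sketch below reproduces that logic on images already held in memory, so that it runs without the flowers dataset. `prepare_batch_in_memory()` is a hypothetical variant, not the notebook's function. The batch stays `float32`, which is what the `X_batch.dtype` cell checks, because NumPy does not upcast a `float32` array when it is combined with Python integer scalars.

```python
import numpy as np
from random import sample

def prepare_batch_in_memory(images, labels, batch_size):
    """Hypothetical in-memory variant of prepare_batch(): images are H x W x 3 float32 arrays in [0, 1]."""
    indices = sample(range(len(images)), batch_size)           # random subset, no repeats
    X_batch = 2 * np.stack([images[i] for i in indices]) - 1   # [0, 1] -> [-1, 1]
    y_batch = np.array([labels[i] for i in indices], dtype=np.int32)
    return X_batch, y_batch

# Ten tiny uniform "images" whose pixel value encodes their label.
levels = np.linspace(0.0, 1.0, 10)
images = [np.full((8, 8, 3), level, dtype=np.float32) for level in levels]
labels = list(range(10))
X_batch, y_batch = prepare_batch_in_memory(images, labels, batch_size=4)
```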
@ -1713,7 +1756,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"y_batch.shape" "X_batch.dtype"
] ]
}, },
{ {
@ -1721,6 +1764,15 @@
"execution_count": 72, "execution_count": 72,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [
"y_batch.shape"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [ "source": [
"y_batch.dtype" "y_batch.dtype"
] ]
@ -1734,48 +1786,52 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 73, "execution_count": 74,
"metadata": {}, "metadata": {
"collapsed": true
},
"outputs": [], "outputs": [],
"source": [ "source": [
"X_test, y_test = prepare_batch(flower_paths_and_classes_test, batch_size=len(flower_paths_and_classes_test))" "X_test, y_test = prepare_batch(flower_paths_and_classes_test, batch_size=len(flower_paths_and_classes_test))"
] ]
}, },
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"X_test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could prepare the training set in much the same way, but it would only generate one variant for each image. Instead, it's preferable to generate the training batches on the fly during training, so that we can really benefit from data augmentation, with many variants of each image."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now, we are ready to train the network (or more precisely, the output layer we just added, since all the other layers are frozen). Be aware that this may take a (very) long time."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 75, "execution_count": 75,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [
"X_test.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could prepare the training set in much the same way, but it would only generate one variant for each image. Instead, it's preferable to generate the training batches on the fly during training, so that we can really benefit from data augmentation, with many variants of each image."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now, we are ready to train the network (or more precisely, the output layer we just added, since all the other layers are frozen). Be aware that this may take a (very) long time."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [ "source": [
"X_test, y_test = prepare_batch(flower_paths_and_classes_test, batch_size=len(flower_paths_and_classes_test))" "X_test, y_test = prepare_batch(flower_paths_and_classes_test, batch_size=len(flower_paths_and_classes_test))"
] ]
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 77,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
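Generating training batches on the fly pays off only if each call can produce a different variant of the same image. Below is a sketch of one such random transformation in NumPy. It is loosely modeled on the random zoom idea behind the `max_zoom` argument of `prepare_image_with_tensorflow()`, but it is not that function. The crops it returns vary in size and would still need resizing to 299 × 299. It requires NumPy 1.17 or later for `np.random.default_rng()`.

```python
import numpy as np

def random_variant(image, rng, max_zoom=0.2):
    """Return a random crop (zoomed in by up to max_zoom) of image, flipped horizontally half of the time."""
    height, width = image.shape[:2]
    zoom = rng.uniform(1.0, 1.0 + max_zoom)
    crop_height = int(height / zoom)
    crop_width = int(width / zoom)
    top = rng.integers(0, height - crop_height + 1)
    left = rng.integers(0, width - crop_width + 1)
    crop = image[top:top + crop_height, left:left + crop_width]
    if rng.random() < 0.5:
        crop = crop[:, ::-1]  # horizontal flip
    return crop

rng = np.random.default_rng(42)
image = np.arange(100 * 120 * 3).reshape(100, 120, 3)
variants = [random_variant(image, rng) for _ in range(20)]
```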
@ -1798,12 +1854,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": 78,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"n_epochs = 10\n", "n_epochs = 10\n",
"batch_size = 50\n", "batch_size = 40\n",
"n_iterations_per_epoch = len(flower_paths_and_classes_train) // batch_size\n", "n_iterations_per_epoch = len(flower_paths_and_classes_train) // batch_size\n",
"\n", "\n",
"with tf.Session() as sess:\n", "with tf.Session() as sess:\n",
@ -1820,10 +1876,26 @@
" acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})\n", " acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})\n",
" print(\" Train accuracy:\", acc_train)\n", " print(\" Train accuracy:\", acc_train)\n",
"\n", "\n",
" save_path = saver.save(sess, \"./my_flowers_model\")\n", " save_path = saver.save(sess, \"./my_flowers_model\")"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"n_test_batches = 10\n",
"X_test_batches = np.array_split(X_test, n_test_batches)\n",
"y_test_batches = np.array_split(y_test, n_test_batches)\n",
"\n",
"with tf.Session() as sess:\n",
" saver.restore(sess, \"./my_flowers_model\")\n",
"\n", "\n",
" print(\"Computing final accuracy on the test set (this will take a while)...\")\n", " print(\"Computing final accuracy on the test set (this will take a while)...\")\n",
" acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})\n", " acc_test = np.mean([\n",
" accuracy.eval(feed_dict={X: X_test_batch, y: y_test_batch})\n",
" for X_test_batch, y_test_batch in zip(X_test_batches, y_test_batches)])\n",
" print(\"Test accuracy:\", acc_test)" " print(\"Test accuracy:\", acc_test)"
] ]
}, },
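The new test cell averages per-batch accuracies with `np.mean`. `np.array_split` can produce batches whose sizes differ by one, so that unweighted mean may differ slightly from the accuracy over the whole test set. Weighting each batch by its size recovers the exact figure. In the notebook, the per-batch accuracies come from `accuracy.eval(...)`. Here, predictions are precomputed so that the sketch is self-contained. It assumes `n_batches` does not exceed the number of samples, because an empty batch would yield NaN.

```python
import numpy as np

def batched_accuracy(y_true, y_pred, n_batches=10):
    """Accuracy computed batch by batch, weighting each batch by its size."""
    true_batches = np.array_split(np.asarray(y_true), n_batches)
    pred_batches = np.array_split(np.asarray(y_pred), n_batches)
    batch_accuracies = [np.mean(t == p) for t, p in zip(true_batches, pred_batches)]
    batch_sizes = [len(t) for t in true_batches]
    return float(np.average(batch_accuracies, weights=batch_sizes))

# 11 samples split into 10 batches: the first batch holds 2 samples, the others 1.
y_true = np.zeros(11, dtype=np.int32)
y_pred = y_true.copy()
y_pred[0] = 1  # one wrong prediction, in the larger first batch
```

With these inputs the weighted result is exactly 10/11, while a plain mean of the ten batch accuracies gives 0.95.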
@ -1831,7 +1903,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Okay, 72.3% accuracy is not great (in fact, it's really bad), but this is only after 10 epochs, and freezing all layers except for the output layer. If you have a GPU, you can try again and let training run for much longer (e.g., using early stopping to decide when to stop). You can also improve the image preprocessing function to make more tweaks to the image (e.g., changing the brightness and hue, rotating the image slightly). You can reach above 95% accuracy on this task. If you want to dig deeper, this [great blog post](https://kwotsin.github.io/tech/2017/02/11/transfer-learning.html) goes into more detail and reaches 96% accuracy." "Okay, 70.58% accuracy is not great (in fact, it's really bad), but this is only after 10 epochs, and freezing all layers except for the output layer. If you have a GPU, you can try again and let training run for much longer (e.g., using early stopping to decide when to stop). You can also improve the image preprocessing function to make more tweaks to the image (e.g., changing the brightness and hue, rotating the image slightly). You can reach above 95% accuracy on this task. If you want to dig deeper, this [great blog post](https://kwotsin.github.io/tech/2017/02/11/transfer-learning.html) goes into more detail and reaches 96% accuracy."
] ]
}, },
{ {
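The closing note suggests using early stopping to decide when to stop training. Below is a framework-agnostic sketch with a patience counter. `train_one_epoch` and `evaluate` are hypothetical callbacks. In the notebook, they would wrap the training loop's `sess.run(...)` calls and an accuracy evaluation on a held-out validation set, not on the test set, so that the final test accuracy stays unbiased.

```python
def train_with_early_stopping(train_one_epoch, evaluate, max_epochs=100, patience=5):
    """Train until the validation score fails to improve for `patience` consecutive epochs."""
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate()
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
            # This is where you would save a checkpoint, e.g. with saver.save(sess, ...).
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_score
```

After stopping, restoring the checkpoint saved at `best_epoch` gives the best model seen so far rather than the last one trained.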