{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 12: Custom Models and Training with TensorFlow**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 12, as well as code examples from Appendix C_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml3/blob/main/12_custom_models_and_training_with_tensorflow.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml3/blob/main/12_custom_models_and_training_with_tensorflow.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This project requires Python 3.8 or above:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"assert sys.version_info >= (3, 8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And TensorFlow ≥ 2.6:"
]
},
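{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# compare (major, minor) as integers: comparing version strings would rank \"2.10.0\" below \"2.6.0\"\n",
"assert tuple(int(part) for part in tf.__version__.split(\".\")[:2]) >= (2, 6)"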
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using TensorFlow like NumPy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tensors and Operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tensors"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"t = tf.constant([[1., 2., 3.], [4., 5., 6.]]) # matrix\n",
"t"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"t.shape"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"t.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Indexing"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"t[:, 1:]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"t[..., 1, tf.newaxis]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Ops"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"t + 10"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"tf.square(t)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"t @ tf.transpose(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Scalars"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"tf.constant(42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Keras's low-level API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may still run across code that uses Keras's low-level API:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"K = tf.keras.backend\n",
"K.square(K.transpose(t)) + 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But since Keras does not support multiple backends anymore, you should instead use TF's low-level API directly:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"tf.square(tf.transpose(t)) + 10"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tensors and NumPy"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"a = np.array([2., 4., 5.])\n",
"tf.constant(a)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"t.numpy()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"np.array(t)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"tf.square(a)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"np.square(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Type Conversions"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
"    tf.constant(2.0) + tf.constant(40)\n",
"except tf.errors.InvalidArgumentError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
"    tf.constant(2.0) + tf.constant(40., dtype=tf.float64)\n",
"except tf.errors.InvalidArgumentError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"t2 = tf.constant(40., dtype=tf.float64)\n",
"tf.constant(2.0) + tf.cast(t2, tf.float32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Variables"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])\n",
"v"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"v.assign(2 * v)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"v[0, 1].assign(42)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"v[:, 2].assign([0., 1.])"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"v.scatter_nd_update(\n",
" indices=[[0, 0], [1, 2]], updates=[100., 200.])"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to use scatter_update()\n",
"sparse_delta = tf.IndexedSlices(values=[[1., 2., 3.], [4., 5., 6.]],\n",
"                                indices=[1, 0])\n",
"v.scatter_update(sparse_delta)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
"    v[1] = [7., 8., 9.]\n",
"except TypeError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Strings"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code in this section and all the following sections is in Appendix C."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
"tf.constant(b\"hello world\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"tf.constant(\"café\")"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"u = tf.constant([ord(c) for c in \"café\"])\n",
"u"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [],
"source": [
"b = tf.strings.unicode_encode(u, \"UTF-8\")\n",
"tf.strings.length(b, unit=\"UTF8_CHAR\")"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [],
"source": [
"tf.strings.unicode_decode(b, \"UTF-8\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Other Data Structures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The code in this section is in Appendix C."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### String arrays"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
"tf.constant(b\"hello world\")"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
"tf.constant(\"café\")"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
"u = tf.constant([ord(c) for c in \"café\"])\n",
"u"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"b = tf.strings.unicode_encode(u, \"UTF-8\")\n",
"b"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
"tf.strings.length(b, unit=\"UTF8_CHAR\")"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"tf.strings.unicode_decode(b, \"UTF-8\")"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
"p = tf.constant([\"Café\", \"Coffee\", \"caffè\", \"咖啡\"])"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
"tf.strings.length(p, unit=\"UTF8_CHAR\")"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"r = tf.strings.unicode_decode(p, \"UTF8\")\n",
"r"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Ragged tensors"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"r[1]"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"r[1:3]  # extra code: a slice of a ragged tensor is a ragged tensor"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
"r2 = tf.ragged.constant([[65, 66], [], [67]])\n",
"tf.concat([r, r2], axis=0)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"r3 = tf.ragged.constant([[68, 69, 70], [71], [], [72, 73]])\n",
"print(tf.concat([r, r3], axis=1))"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
"r.to_tensor()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparse tensors"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
"s = tf.SparseTensor(indices=[[0, 1], [1, 0], [2, 3]],\n",
" values=[1., 2., 3.],\n",
" dense_shape=[3, 4])"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"tf.sparse.to_dense(s)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
"s * 42.0"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
"    s + 42.0\n",
"except TypeError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to multiply a sparse tensor and a dense tensor\n",
"s4 = tf.constant([[10., 20.], [30., 40.], [50., 60.], [70., 80.]])\n",
"tf.sparse.sparse_dense_matmul(s, s4)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [],
"source": [
"# extra code: when creating a sparse tensor, values must be given in \"reading\n",
"# order\", or else `to_dense()` will fail.\n",
"s5 = tf.SparseTensor(indices=[[0, 2], [0, 1]],  # WRONG ORDER!\n",
"                     values=[1., 2.],\n",
"                     dense_shape=[3, 4])\n",
"try:\n",
"    tf.sparse.to_dense(s5)\n",
"except tf.errors.InvalidArgumentError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to fix the sparse tensor s5 by reordering its values\n",
"s6 = tf.sparse.reorder(s5)\n",
"tf.sparse.to_dense(s6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Tensor Arrays"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [],
"source": [
"array = tf.TensorArray(dtype=tf.float32, size=3)\n",
"array = array.write(0, tf.constant([1., 2.]))\n",
"array = array.write(1, tf.constant([3., 10.]))\n",
"array = array.write(2, tf.constant([5., 7.]))\n",
"tensor1 = array.read(1) # returns (and zeros out!) tf.constant([3., 10.])"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"array.stack()"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to disable clear_after_read\n",
"array2 = tf.TensorArray(dtype=tf.float32, size=3, clear_after_read=False)\n",
"array2 = array2.write(0, tf.constant([1., 2.]))\n",
"array2 = array2.write(1, tf.constant([3., 10.]))\n",
"array2 = array2.write(2, tf.constant([5., 7.]))\n",
"tensor2 = array2.read(1) # returns tf.constant([3., 10.])\n",
"array2.stack()"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to create and use a tensor array with a dynamic size\n",
"array3 = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)\n",
"array3 = array3.write(0, tf.constant([1., 2.]))\n",
"array3 = array3.write(1, tf.constant([3., 10.]))\n",
"array3 = array3.write(2, tf.constant([5., 7.]))\n",
"tensor3 = array3.read(1)\n",
"array3.stack()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sets"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"a = tf.constant([[1, 5, 9]])\n",
"b = tf.constant([[5, 6, 9, 11]])\n",
"u = tf.sets.union(a, b)\n",
"u"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [],
"source": [
"tf.sparse.to_dense(u)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"a = tf.constant([[1, 5, 9], [10, 0, 0]])\n",
"b = tf.constant([[5, 6, 9, 11], [13, 0, 0, 0]])\n",
"u = tf.sets.union(a, b)\n",
"tf.sparse.to_dense(u)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to use a different default value: -1 in this case\n",
"a = tf.constant([[1, 5, 9], [10, -1, -1]])\n",
"b = tf.constant([[5, 6, 9, 11], [13, -1, -1, -1]])\n",
"u = tf.sets.union(a, b)\n",
"tf.sparse.to_dense(u, default_value=-1)"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to use `tf.sets.difference()`\n",
"set1 = tf.constant([[2, 3, 5, 7], [7, 9, 0, 0]])\n",
"set2 = tf.constant([[4, 5, 6], [9, 10, 0]])\n",
"tf.sparse.to_dense(tf.sets.difference(set1, set2))"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to use `tf.sets.intersection()`\n",
"tf.sparse.to_dense(tf.sets.intersection(set1, set2))"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [],
"source": [
"# extra code: check whether set1[0] contains 5\n",
"tf.sets.size(tf.sets.intersection(set1[:1], tf.constant([[5, 0, 0, 0]]))) > 0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Queues"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [],
"source": [
"q = tf.queue.FIFOQueue(3, [tf.int32, tf.string], shapes=[(), ()])\n",
"q.enqueue([10, b\"windy\"])\n",
"q.enqueue([15, b\"sunny\"])\n",
"q.size()"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"q.dequeue()"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"q.enqueue_many([[13, 16], [b'cloudy', b'rainy']])"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [],
"source": [
"q.dequeue_many(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom loss function"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"def huber_fn(y_true, y_pred):\n",
"    error = y_true - y_pred\n",
"    is_small_error = tf.abs(error) < 1\n",
"    squared_loss = tf.square(error) / 2\n",
"    linear_loss = tf.abs(error) - 0.5\n",
"    return tf.where(is_small_error, squared_loss, linear_loss)"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows what the Huber loss looks like\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.figure(figsize=(8, 3.5))\n",
"z = np.linspace(-4, 4, 200)\n",
"z_center = np.linspace(-1, 1, 200)\n",
"plt.plot(z, huber_fn(0, z), \"b-\", linewidth=2, label=\"huber($z$)\")\n",
"plt.plot(z, z ** 2 / 2, \"r:\", linewidth=1)\n",
"plt.plot(z_center, z_center ** 2 / 2, \"r\", linewidth=2)\n",
"plt.plot([-1, -1], [0, huber_fn(0., -1.)], \"k--\")\n",
"plt.plot([1, 1], [0, huber_fn(0., 1.)], \"k--\")\n",
"plt.gca().axhline(y=0, color='k')\n",
"plt.gca().axvline(x=0, color='k')\n",
"plt.text(2.1, 3.5, r\"$\\frac{1}{2}z^2$\", color=\"r\", fontsize=15)\n",
"plt.text(3.0, 2.2, r\"$|z| - \\frac{1}{2}$\", color=\"b\", fontsize=15)\n",
"plt.axis([-4, 4, 0, 4])\n",
"plt.grid(True)\n",
"plt.xlabel(\"$z$\")\n",
"plt.legend(fontsize=14)\n",
"plt.title(\"Huber loss\", fontsize=14)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To test our custom loss function, let's create a basic Keras model and train it on the California housing dataset:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# extra code: loads, splits and scales the California housing dataset, then\n",
"# creates a simple Keras model\n",
"\n",
"from sklearn.datasets import fetch_california_housing\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"housing = fetch_california_housing()\n",
"X_train_full, X_test, y_train_full, y_test = train_test_split(\n",
" housing.data, housing.target.reshape(-1, 1), random_state=42)\n",
"X_train, X_valid, y_train, y_valid = train_test_split(\n",
" X_train_full, y_train_full, random_state=42)\n",
"\n",
"scaler = StandardScaler()\n",
"X_train_scaled = scaler.fit_transform(X_train)\n",
"X_valid_scaled = scaler.transform(X_valid)\n",
"X_test_scaled = scaler.transform(X_test)\n",
"\n",
"input_shape = X_train.shape[1:]\n",
"\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
" input_shape=input_shape),\n",
" tf.keras.layers.Dense(1),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=huber_fn, optimizer=\"nadam\", metrics=[\"mae\"])"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Saving/Loading Models with Custom Objects"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"model.save(\"my_model_with_a_custom_loss\")  # extra code: saving works fine"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.models.load_model(\"my_model_with_a_custom_loss\",\n",
" custom_objects={\"huber_fn\": huber_fn})"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"def create_huber(threshold=1.0):\n",
"    def huber_fn(y_true, y_pred):\n",
"        error = y_true - y_pred\n",
"        is_small_error = tf.abs(error) < threshold\n",
"        squared_loss = tf.square(error) / 2\n",
"        linear_loss = threshold * tf.abs(error) - threshold ** 2 / 2\n",
"        return tf.where(is_small_error, squared_loss, linear_loss)\n",
"    return huber_fn"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=create_huber(2.0), optimizer=\"nadam\", metrics=[\"mae\"])"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [],
"source": [
"model.save(\"my_model_with_a_custom_loss_threshold_2\")"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.models.load_model(\"my_model_with_a_custom_loss_threshold_2\",\n",
" custom_objects={\"huber_fn\": create_huber(2.0)})"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [],
"source": [
"class HuberLoss(tf.keras.losses.Loss):\n",
"    def __init__(self, threshold=1.0, **kwargs):\n",
"        self.threshold = threshold\n",
"        super().__init__(**kwargs)\n",
"\n",
"    def call(self, y_true, y_pred):\n",
"        error = y_true - y_pred\n",
"        is_small_error = tf.abs(error) < self.threshold\n",
"        squared_loss = tf.square(error) / 2\n",
"        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2\n",
"        return tf.where(is_small_error, squared_loss, linear_loss)\n",
"\n",
"    def get_config(self):\n",
"        base_config = super().get_config()\n",
"        return {**base_config, \"threshold\": self.threshold}"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [],
"source": [
"# extra code: creates another basic Keras model\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
" input_shape=input_shape),\n",
" tf.keras.layers.Dense(1),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=HuberLoss(2.), optimizer=\"nadam\", metrics=[\"mae\"])"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {},
"outputs": [],
"source": [
"model.save(\"my_model_with_a_custom_loss_class\")  # extra code: saving works"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.models.load_model(\"my_model_with_a_custom_loss_class\",\n",
" custom_objects={\"HuberLoss\": HuberLoss})"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows that loading worked fine and the model can be used normally\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [],
"source": [
"model.loss.threshold  # extra code: the threshold was loaded correctly"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other Custom Functions"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [],
"source": [
"def my_softplus(z):\n",
"    return tf.math.log(1.0 + tf.exp(z))\n",
"\n",
"def my_glorot_initializer(shape, dtype=tf.float32):\n",
"    stddev = tf.sqrt(2. / (shape[0] + shape[1]))\n",
"    return tf.random.normal(shape, stddev=stddev, dtype=dtype)\n",
"\n",
"def my_l1_regularizer(weights):\n",
"    return tf.reduce_sum(tf.abs(0.01 * weights))\n",
"\n",
"def my_positive_weights(weights):  # return value is just tf.nn.relu(weights)\n",
"    return tf.where(weights < 0., tf.zeros_like(weights), weights)"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [],
"source": [
"layer = tf.keras.layers.Dense(1, activation=my_softplus,\n",
" kernel_initializer=my_glorot_initializer,\n",
" kernel_regularizer=my_l1_regularizer,\n",
" kernel_constraint=my_positive_weights)"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows that building, training, saving, loading, and training again\n",
"# work fine with a model containing many custom parts\n",
"\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
" input_shape=input_shape),\n",
" tf.keras.layers.Dense(1, activation=my_softplus,\n",
" kernel_initializer=my_glorot_initializer,\n",
" kernel_regularizer=my_l1_regularizer,\n",
" kernel_constraint=my_positive_weights)\n",
"])\n",
"model.compile(loss=\"mse\", optimizer=\"nadam\", metrics=[\"mae\"])\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))\n",
"model.save(\"my_model_with_many_custom_parts\")\n",
"model = tf.keras.models.load_model(\n",
" \"my_model_with_many_custom_parts\",\n",
" custom_objects={\n",
" \"my_l1_regularizer\": my_l1_regularizer,\n",
" \"my_positive_weights\": my_positive_weights,\n",
" \"my_glorot_initializer\": my_glorot_initializer,\n",
" \"my_softplus\": my_softplus,\n",
" }\n",
")\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [],
"source": [
"class MyL1Regularizer(tf.keras.regularizers.Regularizer):\n",
"    def __init__(self, factor):\n",
"        self.factor = factor\n",
"\n",
"    def __call__(self, weights):\n",
"        return tf.reduce_sum(tf.abs(self.factor * weights))\n",
"\n",
"    def get_config(self):\n",
"        return {\"factor\": self.factor}"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {},
"outputs": [],
"source": [
"# extra code: again, shows that everything works fine, this time using our\n",
"# custom regularizer class\n",
"\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
" input_shape=input_shape),\n",
" tf.keras.layers.Dense(1, activation=my_softplus,\n",
" kernel_regularizer=MyL1Regularizer(0.01),\n",
" kernel_constraint=my_positive_weights,\n",
" kernel_initializer=my_glorot_initializer),\n",
"])\n",
"model.compile(loss=\"mse\", optimizer=\"nadam\", metrics=[\"mae\"])\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))\n",
"model.save(\"my_model_with_many_custom_parts\")\n",
"model = tf.keras.models.load_model(\n",
" \"my_model_with_many_custom_parts\",\n",
" custom_objects={\n",
" \"MyL1Regularizer\": MyL1Regularizer,\n",
" \"my_positive_weights\": my_positive_weights,\n",
" \"my_glorot_initializer\": my_glorot_initializer,\n",
" \"my_softplus\": my_softplus,\n",
" }\n",
")\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
" validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Metrics"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [],
"source": [
"# extra code: once again, let's create a basic Keras model\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
" input_shape=input_shape),\n",
" tf.keras.layers.Dense(1),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=\"mse\", optimizer=\"nadam\", metrics=[create_huber(2.0)])"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [],
"source": [
"# extra code: train the model with our custom metric\n",
"model.fit(X_train_scaled, y_train, epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: if you use the same function as the loss and as a metric, you may be surprised to see slightly different results. This is partly because the operations are not computed in exactly the same order, so there may be tiny floating-point differences. More importantly, if you use sample weights or class weights, the two quantities are computed differently:\n",
"* the loss reported by the `fit()` method is the mean of all batch losses seen so far since the start of the epoch. Each batch loss is the sum of the weighted instance losses divided by the _batch size_ (not by the sum of weights, so the batch loss is _not_ the weighted mean of the losses).\n",
"* the metric reported since the start of the epoch is the sum of the weighted instance losses divided by the sum of all weights seen so far. In other words, it is the weighted mean of all the instance losses, which is not the same thing."
]
},
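{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the note above, the following NumPy-only sketch applies the two formulas to made-up instance losses and sample weights. The arrays and variable names are hypothetical, and no Keras code is involved: it only shows that the two averages can differ, not how Keras computes them internally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# made-up instance losses and sample weights for two batches of size 2\n",
"batch_losses = [np.array([1., 3.]), np.array([2., 6.])]\n",
"batch_weights = [np.array([1., 3.]), np.array([0.5, 0.5])]\n",
"\n",
"# weighted sum of instance losses in each batch: 10.0 and 4.0\n",
"weighted_sums = [np.sum(l * w) for l, w in zip(batch_losses, batch_weights)]\n",
"\n",
"# loss-style average: mean of (weighted sum / batch size) = (5.0 + 2.0) / 2 = 3.5\n",
"loss_style = np.mean([s / len(l) for s, l in zip(weighted_sums, batch_losses)])\n",
"\n",
"# metric-style average: total weighted sum / total weight = 14.0 / 5.0 = 2.8\n",
"metric_style = np.sum(weighted_sums) / np.sum(np.concatenate(batch_weights))\n",
"\n",
"loss_style, metric_style"
]
},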
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Streaming metrics"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [],
"source": [
"precision = tf.keras.metrics.Precision()\n",
"precision([0, 1, 1, 1, 0, 1, 0, 1], [1, 1, 0, 1, 0, 1, 0, 1])"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [],
"source": [
"precision([0, 1, 0, 0, 1, 0, 1, 1], [1, 0, 1, 1, 0, 0, 0, 0])"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [],
"source": [
"precision.result()"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [],
"source": [
"precision.variables"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [],
"source": [
"precision.reset_states()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating a streaming metric:"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [],
"source": [
"class HuberMetric(tf.keras.metrics.Metric):\n",
"    def __init__(self, threshold=1.0, **kwargs):\n",
"        super().__init__(**kwargs)  # handles base args (e.g., dtype)\n",
"        self.threshold = threshold\n",
"        self.huber_fn = create_huber(threshold)\n",
"        self.total = self.add_weight(name=\"total\", initializer=\"zeros\")\n",
"        self.count = self.add_weight(name=\"count\", initializer=\"zeros\")\n",
"\n",
"    def update_state(self, y_true, y_pred, sample_weight=None):\n",
"        sample_metrics = self.huber_fn(y_true, y_pred)\n",
"        self.total.assign_add(tf.reduce_sum(sample_metrics))\n",
"        self.count.assign_add(tf.cast(tf.size(y_true), tf.float32))\n",
"\n",
"    def result(self):\n",
"        return self.total / self.count\n",
"\n",
"    def get_config(self):\n",
"        base_config = super().get_config()\n",
"        return {**base_config, \"threshold\": self.threshold}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Extra material**: the rest of this section tests the `HuberMetric` class and shows another implementation subclassing `tf.keras.metrics.Mean`."
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [],
"source": [
"m = HuberMetric(2.)\n",
"\n",
"# total = 2 * |10 - 2| - 2²/2 = 14\n",
"# count = 1\n",
"# result = 14 / 1 = 14\n",
"m(tf.constant([[2.]]), tf.constant([[10.]]))"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [],
"source": [
"# total = total + (|1 - 0|² / 2) + (2 * |9.25 - 5| - 2² / 2) = 14 + 7 = 21\n",
"# count = count + 2 = 3\n",
"# result = total / count = 21 / 3 = 7\n",
"m(tf.constant([[0.], [5.]]), tf.constant([[1.], [9.25]]))"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [],
"source": [
"m.result()"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [],
"source": [
"m.variables"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [],
"source": [
"m.reset_states()\n",
"m.variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check that the `HuberMetric` class works well:"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
"    tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
"                          input_shape=input_shape),\n",
"    tf.keras.layers.Dense(1),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=create_huber(2.0), optimizer=\"nadam\",\n",
"              metrics=[HuberMetric(2.0)])"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [],
"source": [
"model.save(\"my_model_with_a_custom_metric\")"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.models.load_model(\n",
"    \"my_model_with_a_custom_metric\",\n",
"    custom_objects={\n",
"        \"huber_fn\": create_huber(2.0),\n",
"        \"HuberMetric\": HuberMetric\n",
"    }\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`model.metrics` contains the model's loss followed by the model's metric(s), so the `HuberMetric` is `model.metrics[-1]`:"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [],
"source": [
"model.metrics[-1].threshold"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks like it works fine! More simply, we could have created the class like this:"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [],
"source": [
"class HuberMetric(tf.keras.metrics.Mean):\n",
"    def __init__(self, threshold=1.0, name='HuberMetric', dtype=None):\n",
"        self.threshold = threshold\n",
"        self.huber_fn = create_huber(threshold)\n",
"        super().__init__(name=name, dtype=dtype)\n",
"\n",
"    def update_state(self, y_true, y_pred, sample_weight=None):\n",
"        metric = self.huber_fn(y_true, y_pred)\n",
"        super().update_state(metric, sample_weight)\n",
"\n",
"    def get_config(self):\n",
"        base_config = super().get_config()\n",
"        return {**base_config, \"threshold\": self.threshold}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This class handles shapes better, and it also supports sample weights."
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
"    tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
"                          input_shape=input_shape),\n",
"    tf.keras.layers.Dense(1),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=tf.keras.losses.Huber(2.0), optimizer=\"nadam\",\n",
"              weighted_metrics=[HuberMetric(2.0)])"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"sample_weight = np.random.rand(len(y_train))\n",
"history = model.fit(X_train_scaled, y_train, epochs=2,\n",
"                    sample_weight=sample_weight)"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [],
"source": [
"(history.history[\"loss\"][0],\n",
" history.history[\"HuberMetric\"][0] * sample_weight.mean())"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [],
"source": [
"model.save(\"my_model_with_a_custom_metric_v2\")"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.models.load_model(\"my_model_with_a_custom_metric_v2\",\n",
"                                   custom_objects={\"HuberMetric\": HuberMetric})"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2)"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"model.metrics[-1].threshold"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Layers"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [],
"source": [
"exponential_layer = tf.keras.layers.Lambda(lambda x: tf.exp(x))"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [],
"source": [
"# extra code: like all layers, it can be used as a function\n",
"exponential_layer([-1., 0., 1.])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Adding an exponential layer at the output of a regression model can be useful if the values to predict are positive and with very different scales (e.g., 0.001, 10., 10000)."
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
"    tf.keras.layers.Dense(30, activation=\"relu\", input_shape=input_shape),\n",
"    tf.keras.layers.Dense(1),\n",
"    exponential_layer\n",
"])\n",
"model.compile(loss=\"mse\", optimizer=\"sgd\")\n",
"model.fit(X_train_scaled, y_train, epochs=5,\n",
"          validation_data=(X_valid_scaled, y_valid))\n",
"model.evaluate(X_test_scaled, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, it's often preferable to replace the targets with the logarithm of the targets (and use no activation function in the output layer)."
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [],
"source": [
"class MyDense(tf.keras.layers.Layer):\n",
"    def __init__(self, units, activation=None, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.units = units\n",
"        self.activation = tf.keras.activations.get(activation)\n",
"\n",
"    def build(self, batch_input_shape):\n",
"        self.kernel = self.add_weight(\n",
"            name=\"kernel\", shape=[batch_input_shape[-1], self.units],\n",
"            initializer=\"he_normal\")\n",
"        self.bias = self.add_weight(\n",
"            name=\"bias\", shape=[self.units], initializer=\"zeros\")\n",
"        super().build(batch_input_shape)  # must be at the end\n",
"\n",
"    def call(self, X):\n",
"        return self.activation(X @ self.kernel + self.bias)\n",
"\n",
"    def compute_output_shape(self, batch_input_shape):\n",
"        return tf.TensorShape(batch_input_shape.as_list()[:-1] + [self.units])\n",
"\n",
"    def get_config(self):\n",
"        base_config = super().get_config()\n",
"        return {**base_config, \"units\": self.units,\n",
"                \"activation\": tf.keras.activations.serialize(self.activation)}"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows that a custom layer can be used normally\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
"    MyDense(30, activation=\"relu\", input_shape=input_shape),\n",
"    MyDense(1)\n",
"])\n",
"model.compile(loss=\"mse\", optimizer=\"nadam\")\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
"          validation_data=(X_valid_scaled, y_valid))\n",
"model.evaluate(X_test_scaled, y_test)\n",
"model.save(\"my_model_with_a_custom_layer\")"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to load a model with a custom layer\n",
"model = tf.keras.models.load_model(\"my_model_with_a_custom_layer\",\n",
"                                   custom_objects={\"MyDense\": MyDense})\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
"          validation_data=(X_valid_scaled, y_valid))"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {},
"outputs": [],
"source": [
"class MyMultiLayer(tf.keras.layers.Layer):\n",
"    def call(self, X):\n",
"        X1, X2 = X\n",
"        print(\"X1.shape: \", X1.shape, \" X2.shape: \", X2.shape)  # extra code\n",
"        return X1 + X2, X1 * X2, X1 / X2\n",
"\n",
"    def compute_output_shape(self, batch_input_shape):\n",
"        batch_input_shape1, batch_input_shape2 = batch_input_shape\n",
"        return [batch_input_shape1, batch_input_shape1, batch_input_shape1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our custom layer can be called using the functional API like this:"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [],
"source": [
"# extra code tests MyMultiLayer with symbolic inputs\n",
"inputs1 = tf.keras.layers.Input(shape=[2])\n",
"inputs2 = tf.keras.layers.Input(shape=[2])\n",
"MyMultiLayer()((inputs1, inputs2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the `call()` method receives symbolic inputs, and it returns symbolic outputs. The shapes are only partially specified at this stage: we don't know the batch size, which is why the first dimension is `None`.\n",
"\n",
"We can also pass actual data to the custom layer:"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [],
"source": [
"# extra code tests MyMultiLayer with actual data \n",
"X1, X2 = np.array([[3., 6.], [2., 7.]]), np.array([[6., 12.], [4., 3.]]) \n",
"MyMultiLayer()((X1, X2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's create a layer with a different behavior during training and testing:"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {},
"outputs": [],
"source": [
"class MyGaussianNoise(tf.keras.layers.Layer):\n",
"    def __init__(self, stddev, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.stddev = stddev\n",
"\n",
"    def call(self, X, training=None):\n",
"        if training:\n",
"            noise = tf.random.normal(tf.shape(X), stddev=self.stddev)\n",
"            return X + noise\n",
"        else:\n",
"            return X\n",
"\n",
"    def compute_output_shape(self, batch_input_shape):\n",
"        return batch_input_shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a simple model that uses this custom layer:"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {},
"outputs": [],
"source": [
"# extra code: tests MyGaussianNoise\n",
"tf.random.set_seed(42)\n",
"model = tf.keras.Sequential([\n",
"    MyGaussianNoise(stddev=1.0, input_shape=input_shape),\n",
"    tf.keras.layers.Dense(30, activation=\"relu\",\n",
"                          kernel_initializer=\"he_normal\"),\n",
"    tf.keras.layers.Dense(1)\n",
"])\n",
"model.compile(loss=\"mse\", optimizer=\"nadam\")\n",
"model.fit(X_train_scaled, y_train, epochs=2,\n",
"          validation_data=(X_valid_scaled, y_valid))\n",
"model.evaluate(X_test_scaled, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Models"
]
},
{
"cell_type": "code",
"execution_count": 138,
"metadata": {},
"outputs": [],
"source": [
"class ResidualBlock(tf.keras.layers.Layer):\n",
"    def __init__(self, n_layers, n_neurons, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.hidden = [tf.keras.layers.Dense(n_neurons, activation=\"relu\",\n",
"                                             kernel_initializer=\"he_normal\")\n",
"                       for _ in range(n_layers)]\n",
"\n",
"    def call(self, inputs):\n",
"        Z = inputs\n",
"        for layer in self.hidden:\n",
"            Z = layer(Z)\n",
"        return inputs + Z"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {},
"outputs": [],
"source": [
"class ResidualRegressor(tf.keras.Model):\n",
"    def __init__(self, output_dim, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.hidden1 = tf.keras.layers.Dense(30, activation=\"relu\",\n",
"                                             kernel_initializer=\"he_normal\")\n",
"        self.block1 = ResidualBlock(2, 30)\n",
"        self.block2 = ResidualBlock(2, 30)\n",
"        self.out = tf.keras.layers.Dense(output_dim)\n",
"\n",
"    def call(self, inputs):\n",
"        Z = self.hidden1(inputs)\n",
"        for _ in range(1 + 3):\n",
"            Z = self.block1(Z)\n",
"        Z = self.block2(Z)\n",
"        return self.out(Z)"
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {},
"outputs": [],
"source": [
"# extra code shows that the model can be used normally\n",
"tf.random.set_seed(42)\n",
"model = ResidualRegressor(1)\n",
"model.compile(loss=\"mse\", optimizer=\"nadam\")\n",
"history = model.fit(X_train_scaled, y_train, epochs=2)\n",
"score = model.evaluate(X_test_scaled, y_test)\n",
"model.save(\"my_custom_model\")"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {},
"outputs": [],
"source": [
"# extra code: the model can be loaded, and you can then continue training it\n",
"# or use it to make predictions\n",
"model = tf.keras.models.load_model(\"my_custom_model\")\n",
"history = model.fit(X_train_scaled, y_train, epochs=2)\n",
"model.predict(X_test_scaled[:3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could have defined the model using the sequential API instead:"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)\n",
"block1 = ResidualBlock(2, 30)\n",
"model = tf.keras.Sequential([\n",
"    tf.keras.layers.Dense(30, activation=\"relu\",\n",
"                          kernel_initializer=\"he_normal\"),\n",
"    block1, block1, block1, block1,\n",
"    ResidualBlock(2, 30),\n",
"    tf.keras.layers.Dense(1)\n",
"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Losses and Metrics Based on Model Internals"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: due to an issue introduced in TF 2.2 ([#46858](https://github.com/tensorflow/tensorflow/issues/46858)), `super().build()` fails. We can work around this issue by setting `self.built = True` instead."
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {},
"outputs": [],
"source": [
"class ReconstructingRegressor(tf.keras.Model):\n",
"    def __init__(self, output_dim, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.hidden = [tf.keras.layers.Dense(30, activation=\"relu\",\n",
"                                             kernel_initializer=\"he_normal\")\n",
"                       for _ in range(5)]\n",
"        self.out = tf.keras.layers.Dense(output_dim)\n",
"        self.reconstruction_mean = tf.keras.metrics.Mean(\n",
"            name=\"reconstruction_error\")\n",
"\n",
"    def build(self, batch_input_shape):\n",
"        n_inputs = batch_input_shape[-1]\n",
"        self.reconstruct = tf.keras.layers.Dense(n_inputs)\n",
"        self.built = True  # WORKAROUND for super().build(batch_input_shape)\n",
"\n",
"    def call(self, inputs, training=None):\n",
"        Z = inputs\n",
"        for layer in self.hidden:\n",
"            Z = layer(Z)\n",
"        reconstruction = self.reconstruct(Z)\n",
"        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))\n",
"        self.add_loss(0.05 * recon_loss)\n",
"        if training:\n",
"            result = self.reconstruction_mean(recon_loss)\n",
"            self.add_metric(result)\n",
"        return self.out(Z)"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {},
"outputs": [],
"source": [
"# extra code\n",
"tf.random.set_seed(42)\n",
"model = ReconstructingRegressor(1)\n",
"model.compile(loss=\"mse\", optimizer=\"nadam\")\n",
"history = model.fit(X_train_scaled, y_train, epochs=5)\n",
"y_pred = model.predict(X_test_scaled)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing Gradients Using Autodiff"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {},
"outputs": [],
"source": [
"def f(w1, w2):\n",
"    return 3 * w1 ** 2 + 2 * w1 * w2"
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {},
"outputs": [],
"source": [
"w1, w2 = 5, 3\n",
"eps = 1e-6\n",
"(f(w1 + eps, w2) - f(w1, w2)) / eps"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {},
"outputs": [],
"source": [
"(f(w1, w2 + eps) - f(w1, w2)) / eps"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {},
"outputs": [],
"source": [
"w1, w2 = tf.Variable(5.), tf.Variable(3.)\n",
"with tf.GradientTape() as tape:\n",
"    z = f(w1, w2)\n",
"\n",
"gradients = tape.gradient(z, [w1, w2])"
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {},
"outputs": [],
"source": [
"gradients"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [],
"source": [
"with tf.GradientTape() as tape:\n",
"    z = f(w1, w2)\n",
"\n",
"dz_dw1 = tape.gradient(z, w1)  # returns tensor 36.0\n",
"try:\n",
"    dz_dw2 = tape.gradient(z, w2)  # raises a RuntimeError!\n",
"except RuntimeError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {},
"outputs": [],
"source": [
"with tf.GradientTape(persistent=True) as tape:\n",
"    z = f(w1, w2)\n",
"\n",
"dz_dw1 = tape.gradient(z, w1)  # returns tensor 36.0\n",
"dz_dw2 = tape.gradient(z, w2)  # returns tensor 10.0, works fine now!\n",
"del tape"
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {},
"outputs": [],
"source": [
"dz_dw1, dz_dw2"
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {},
"outputs": [],
"source": [
"c1, c2 = tf.constant(5.), tf.constant(3.)\n",
"with tf.GradientTape() as tape:\n",
"    z = f(c1, c2)\n",
"\n",
"gradients = tape.gradient(z, [c1, c2])"
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {},
"outputs": [],
"source": [
"gradients"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {},
"outputs": [],
"source": [
"with tf.GradientTape() as tape:\n",
"    tape.watch(c1)\n",
"    tape.watch(c2)\n",
"    z = f(c1, c2)\n",
"\n",
"gradients = tape.gradient(z, [c1, c2])"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [],
"source": [
"gradients"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [],
"source": [
"# extra code: if given a vector, tape.gradient() will compute the gradient of\n",
"# the vector's sum.\n",
"with tf.GradientTape() as tape:\n",
"    z1 = f(w1, w2 + 2.)\n",
"    z2 = f(w1, w2 + 5.)\n",
"    z3 = f(w1, w2 + 7.)\n",
"\n",
"tape.gradient([z1, z2, z3], [w1, w2])"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows that we get the same result as the previous cell\n",
"with tf.GradientTape() as tape:\n",
"    z1 = f(w1, w2 + 2.)\n",
"    z2 = f(w1, w2 + 5.)\n",
"    z3 = f(w1, w2 + 7.)\n",
"    z = z1 + z2 + z3\n",
"\n",
"tape.gradient(z, [w1, w2])"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to compute the jacobians and the hessians\n",
"with tf.GradientTape(persistent=True) as hessian_tape:\n",
"    with tf.GradientTape() as jacobian_tape:\n",
"        z = f(w1, w2)\n",
"    jacobians = jacobian_tape.gradient(z, [w1, w2])\n",
"hessians = [hessian_tape.gradient(jacobian, [w1, w2])\n",
"            for jacobian in jacobians]\n",
"del hessian_tape"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [],
"source": [
"jacobians"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {},
"outputs": [],
"source": [
"hessians"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [],
"source": [
"def f(w1, w2):\n",
"    return 3 * w1 ** 2 + tf.stop_gradient(2 * w1 * w2)\n",
"\n",
"with tf.GradientTape() as tape:\n",
"    z = f(w1, w2)  # the forward pass is not affected by stop_gradient()\n",
"\n",
"gradients = tape.gradient(z, [w1, w2])"
]
},
{
"cell_type": "code",
"execution_count": 163,
"metadata": {},
"outputs": [],
"source": [
"gradients"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [],
"source": [
"x = tf.Variable(1e-50)\n",
"with tf.GradientTape() as tape:\n",
"    z = tf.sqrt(x)\n",
"\n",
"tape.gradient(z, [x])"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [],
"source": [
"tf.math.log(tf.exp(tf.constant(30., dtype=tf.float32)) + 1.)"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [],
"source": [
"x = tf.Variable([1.0e30])\n",
"with tf.GradientTape() as tape:\n",
"    z = my_softplus(x)\n",
"\n",
"tape.gradient(z, [x])"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {},
"outputs": [],
"source": [
"def my_softplus(z):\n",
"    return tf.math.log(1 + tf.exp(-tf.abs(z))) + tf.maximum(0., z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the proof that this equation is equal to log(1 + exp(_z_)):\n",
"* softplus(_z_) = log(1 + exp(_z_))\n",
"* softplus(_z_) = log(1 + exp(_z_)) - log(exp(_z_)) + log(exp(_z_)) ; **just adding and subtracting the same value**\n",
"* softplus(_z_) = log\\[(1 + exp(_z_)) / exp(_z_)\\] + log(exp(_z_)) ; **since log(_a_) - log(_b_) = log(_a_ / _b_)**\n",
"* softplus(_z_) = log\\[(1 + exp(_z_)) / exp(_z_)\\] + _z_ ; **since log(exp(_z_)) = _z_**\n",
"* softplus(_z_) = log\\[1 / exp(_z_) + exp(_z_) / exp(_z_)\\] + _z_ ; **since (1 + _a_) / _b_ = 1 / _b_ + _a_ / _b_**\n",
"* softplus(_z_) = log\\[exp(-_z_) + 1\\] + _z_ ; **since 1 / exp(_z_) = exp(-_z_), and exp(_z_) / exp(_z_) = 1**\n",
"* softplus(_z_) = softplus(-_z_) + _z_ ; **we recognize the definition at the top, but with -_z_**\n",
"* softplus(_z_) = softplus(-|_z_|) + max(0, _z_) ; **if you consider both cases, _z_ < 0 or _z_ ≥ 0, you will see that this works**"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {},
"outputs": [],
"source": [
"@tf.custom_gradient\n",
"def my_softplus(z):\n",
"    def my_softplus_gradients(grads):  # grads = backprop'ed from upper layers\n",
"        return grads * (1 - 1 / (1 + tf.exp(z)))  # stable grads of softplus\n",
"\n",
"    result = tf.math.log(1 + tf.exp(-tf.abs(z))) + tf.maximum(0., z)\n",
"    return result, my_softplus_gradients"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows that the function is now stable, as well as its gradients\n",
"x = tf.Variable([1000.])\n",
"with tf.GradientTape() as tape:\n",
"    z = my_softplus(x)\n",
"\n",
"z, tape.gradient(z, [x])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Training Loops"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)  # extra code to ensure reproducibility\n",
"l2_reg = tf.keras.regularizers.l2(0.05)\n",
"model = tf.keras.models.Sequential([\n",
"    tf.keras.layers.Dense(30, activation=\"relu\", kernel_initializer=\"he_normal\",\n",
"                          kernel_regularizer=l2_reg),\n",
"    tf.keras.layers.Dense(1, kernel_regularizer=l2_reg)\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 171,
"metadata": {},
"outputs": [],
"source": [
"def random_batch(X, y, batch_size=32):\n",
"    idx = np.random.randint(len(X), size=batch_size)\n",
"    return X[idx], y[idx]"
]
},
{
"cell_type": "code",
"execution_count": 172,
"metadata": {},
"outputs": [],
"source": [
"def print_status_bar(step, total, loss, metrics=None):\n",
"    metrics = \" - \".join([f\"{m.name}: {m.result():.4f}\"\n",
"                          for m in [loss] + (metrics or [])])\n",
"    end = \"\" if step < total else \"\\n\"\n",
"    print(f\"\\r{step}/{total} - \" + metrics, end=end)"
]
},
{
"cell_type": "code",
"execution_count": 173,
"metadata": {},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 174,
"metadata": {},
"outputs": [],
"source": [
"n_epochs = 5\n",
"batch_size = 32\n",
"n_steps = len(X_train) // batch_size\n",
"optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)\n",
"loss_fn = tf.keras.losses.mean_squared_error\n",
"mean_loss = tf.keras.metrics.Mean()\n",
"metrics = [tf.keras.metrics.MeanAbsoluteError()]"
]
},
{
"cell_type": "code",
"execution_count": 175,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"for epoch in range(1, n_epochs + 1):\n",
" print(f\"Epoch {epoch}/{n_epochs}\")\n",
" for step in range(1, n_steps + 1):\n",
" X_batch, y_batch = random_batch(X_train_scaled, y_train)\n",
" with tf.GradientTape() as tape:\n",
" y_pred = model(X_batch, training=True)\n",
" main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))\n",
" loss = tf.add_n([main_loss] + model.losses)\n",
"\n",
" gradients = tape.gradient(loss, model.trainable_variables)\n",
" optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n",
"\n",
" # extra code if your model has variable constraints\n",
" for variable in model.variables:\n",
" if variable.constraint is not None:\n",
" variable.assign(variable.constraint(variable))\n",
"\n",
" mean_loss(loss)\n",
" for metric in metrics:\n",
" metric(y_batch, y_pred)\n",
"\n",
" print_status_bar(step, n_steps, mean_loss, metrics)\n",
"\n",
" for metric in [mean_loss] + metrics:\n",
" metric.reset_states()"
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to use the tqdm package to display nice progress bars\n",
"\n",
"from tqdm.notebook import trange\n",
"from collections import OrderedDict\n",
"with trange(1, n_epochs + 1, desc=\"All epochs\") as epochs:\n",
"    for epoch in epochs:\n",
"        with trange(1, n_steps + 1, desc=f\"Epoch {epoch}/{n_epochs}\") as steps:\n",
"            for step in steps:\n",
"                X_batch, y_batch = random_batch(X_train_scaled, y_train)\n",
"                with tf.GradientTape() as tape:\n",
"                    y_pred = model(X_batch, training=True)\n",
"                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))\n",
"                    loss = tf.add_n([main_loss] + model.losses)\n",
"\n",
"                gradients = tape.gradient(loss, model.trainable_variables)\n",
"                optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n",
"\n",
"                for variable in model.variables:\n",
"                    if variable.constraint is not None:\n",
"                        variable.assign(variable.constraint(variable))\n",
"\n",
"                status = OrderedDict()\n",
"                mean_loss(loss)\n",
"                status[\"loss\"] = mean_loss.result().numpy()\n",
"                for metric in metrics:\n",
"                    metric(y_batch, y_pred)\n",
"                    status[metric.name] = metric.result().numpy()\n",
"\n",
"                steps.set_postfix(status)\n",
"\n",
"        for metric in [mean_loss] + metrics:\n",
"            metric.reset_states()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## TensorFlow Functions"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [],
"source": [
"def cube(x):\n",
"    return x ** 3"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [],
"source": [
"cube(2)"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [],
"source": [
"cube(tf.constant(2.0))"
]
},
{
"cell_type": "code",
"execution_count": 180,
"metadata": {},
"outputs": [],
"source": [
"tf_cube = tf.function(cube)\n",
"tf_cube"
]
},
{
"cell_type": "code",
"execution_count": 181,
"metadata": {},
"outputs": [],
"source": [
"tf_cube(2)"
]
},
{
"cell_type": "code",
"execution_count": 182,
"metadata": {},
"outputs": [],
"source": [
"tf_cube(tf.constant(2.0))"
]
},
{
"cell_type": "code",
"execution_count": 183,
"metadata": {},
"outputs": [],
"source": [
"@tf.function\n",
"def tf_cube(x):\n",
"    return x ** 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** the rest of the code in this section is in appendix D."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TF Functions and Concrete Functions"
]
},
{
"cell_type": "code",
"execution_count": 184,
"metadata": {},
"outputs": [],
"source": [
"concrete_function = tf_cube.get_concrete_function(tf.constant(2.0))\n",
"concrete_function"
]
},
{
"cell_type": "code",
"execution_count": 185,
"metadata": {},
"outputs": [],
"source": [
"concrete_function(tf.constant(2.0))"
]
},
{
"cell_type": "code",
"execution_count": 186,
"metadata": {},
"outputs": [],
"source": [
"concrete_function is tf_cube.get_concrete_function(tf.constant(2.0))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploring Function Definitions and Graphs"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [],
"source": [
"concrete_function.graph"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [],
"source": [
"ops = concrete_function.graph.get_operations()\n",
"ops"
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [],
"source": [
"pow_op = ops[2]\n",
"list(pow_op.inputs)"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [],
"source": [
"pow_op.outputs"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [],
"source": [
"concrete_function.graph.get_operation_by_name('x')"
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [],
"source": [
"concrete_function.graph.get_tensor_by_name('Identity:0')"
]
},
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [],
"source": [
"concrete_function.function_def.signature"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How TF Functions Trace Python Functions to Extract Their Computation Graphs"
]
},
{
"cell_type": "code",
"execution_count": 194,
"metadata": {},
"outputs": [],
"source": [
"@tf.function\n",
"def tf_cube(x):\n",
"    print(f\"x = {x}\")\n",
"    return x ** 3"
]
},
{
"cell_type": "code",
"execution_count": 195,
"metadata": {},
"outputs": [],
"source": [
"result = tf_cube(tf.constant(2.0))"
]
},
{
"cell_type": "code",
"execution_count": 196,
"metadata": {},
"outputs": [],
"source": [
"result"
]
},
{
"cell_type": "code",
"execution_count": 197,
"metadata": {},
"outputs": [],
"source": [
"result = tf_cube(2)"
]
},
{
"cell_type": "code",
"execution_count": 198,
"metadata": {},
"outputs": [],
"source": [
"result = tf_cube(3)"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {},
"outputs": [],
"source": [
"result = tf_cube(tf.constant([[1., 2.]])) # New shape: trace!"
]
},
{
"cell_type": "code",
"execution_count": 200,
"metadata": {},
"outputs": [],
"source": [
"result = tf_cube(tf.constant([[3., 4.], [5., 6.]])) # New shape: trace!"
]
},
{
"cell_type": "code",
"execution_count": 201,
"metadata": {},
"outputs": [],
"source": [
"result = tf_cube(tf.constant([[7., 8.], [9., 10.]])) # Same shape: no trace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is also possible to specify a particular input signature:"
]
},
{
"cell_type": "code",
"execution_count": 202,
"metadata": {},
"outputs": [],
"source": [
"@tf.function(input_signature=[tf.TensorSpec([None, 28, 28], tf.float32)])\n",
"def shrink(images):\n",
"    print(\"Tracing\", images)  # extra code to show when tracing happens\n",
"    return images[:, ::2, ::2]  # drop half the rows and columns"
]
},
{
"cell_type": "code",
"execution_count": 203,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 204,
"metadata": {},
"outputs": [],
"source": [
"img_batch_1 = tf.random.uniform(shape=[100, 28, 28])\n",
"img_batch_2 = tf.random.uniform(shape=[50, 28, 28])\n",
"preprocessed_images = shrink(img_batch_1) # Works fine, traces the function\n",
"preprocessed_images = shrink(img_batch_2) # Works fine, same concrete function"
]
},
{
"cell_type": "code",
"execution_count": 205,
"metadata": {},
"outputs": [],
"source": [
"img_batch_3 = tf.random.uniform(shape=[2, 2, 2])\n",
"try:\n",
"    preprocessed_images = shrink(img_batch_3)  # ValueError! Incompatible inputs\n",
"except ValueError as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using AutoGraph to Capture Control Flow"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A \"static\" `for` loop using `range()`:"
]
},
{
"cell_type": "code",
"execution_count": 206,
"metadata": {},
"outputs": [],
"source": [
"@tf.function\n",
"def add_10(x):\n",
"    for i in range(10):\n",
"        x += 1\n",
"    return x"
]
},
{
"cell_type": "code",
"execution_count": 207,
"metadata": {},
"outputs": [],
"source": [
"add_10(tf.constant(5))"
]
},
{
"cell_type": "code",
"execution_count": 208,
"metadata": {},
"outputs": [],
"source": [
"add_10.get_concrete_function(tf.constant(5)).graph.get_operations()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A \"dynamic\" loop using `tf.while_loop()`:"
]
},
{
"cell_type": "code",
"execution_count": 209,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to use tf.while_loop (usually @tf.function is simpler)\n",
"@tf.function\n",
"def add_10(x):\n",
"    condition = lambda i, x: tf.less(i, 10)\n",
"    body = lambda i, x: (tf.add(i, 1), tf.add(x, 1))\n",
"    final_i, final_x = tf.while_loop(condition, body, [tf.constant(0), x])\n",
"    return final_x"
]
},
{
"cell_type": "code",
"execution_count": 210,
"metadata": {},
"outputs": [],
"source": [
"add_10(tf.constant(5))"
]
},
{
"cell_type": "code",
"execution_count": 211,
"metadata": {},
"outputs": [],
"source": [
"add_10.get_concrete_function(tf.constant(5)).graph.get_operations()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A \"dynamic\" `for` loop using `tf.range()` (captured by AutoGraph):"
]
},
{
"cell_type": "code",
"execution_count": 212,
"metadata": {},
"outputs": [],
"source": [
"@tf.function\n",
"def add_10(x):\n",
"    for i in tf.range(10):\n",
"        x = x + 1\n",
"    return x"
]
},
{
"cell_type": "code",
"execution_count": 213,
"metadata": {},
"outputs": [],
"source": [
"add_10.get_concrete_function(tf.constant(0)).graph.get_operations()"
]
},
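{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare the static and dynamic loops side by side, the following sketch traces both versions and lists the operation types found in each graph. The names `add_10_static` and `add_10_dynamic` are introduced here for illustration only, and the exact operation types may differ across TensorFlow versions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"@tf.function\n",
"def add_10_static(x):  # hypothetical name, for illustration\n",
"    for i in range(10):     # Python range: the loop is unrolled during tracing\n",
"        x += 1\n",
"    return x\n",
"\n",
"@tf.function\n",
"def add_10_dynamic(x):  # hypothetical name, for illustration\n",
"    for i in tf.range(10):  # tf.range: autograph adds a loop operation to the graph\n",
"        x += 1\n",
"    return x\n",
"\n",
"for tf_func in (add_10_static, add_10_dynamic):\n",
"    graph = tf_func.get_concrete_function(tf.constant(0)).graph\n",
"    op_types = [op.type for op in graph.get_operations()]\n",
"    # Print the function name, the number of operations and the distinct op types\n",
"    print(tf_func.python_function.__name__, len(op_types), sorted(set(op_types)))"
]
},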
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling Variables and Other Resources in TF Functions"
]
},
{
"cell_type": "code",
"execution_count": 214,
"metadata": {},
"outputs": [],
"source": [
"counter = tf.Variable(0)\n",
"\n",
"@tf.function\n",
"def increment(counter, c=1):\n",
"    return counter.assign_add(c)\n",
"\n",
"increment(counter)  # counter is now equal to 1\n",
"increment(counter)  # counter is now equal to 2"
]
},
{
"cell_type": "code",
"execution_count": 215,
"metadata": {},
"outputs": [],
"source": [
"function_def = increment.get_concrete_function(counter).function_def\n",
"function_def.signature.input_arg[0]"
]
},
{
"cell_type": "code",
"execution_count": 216,
"metadata": {},
"outputs": [],
"source": [
"counter = tf.Variable(0)\n",
"\n",
"@tf.function\n",
"def increment(c=1):\n",
"    return counter.assign_add(c)"
]
},
{
"cell_type": "code",
"execution_count": 217,
"metadata": {},
"outputs": [],
"source": [
"increment()\n",
"increment()"
]
},
{
"cell_type": "code",
"execution_count": 218,
"metadata": {},
"outputs": [],
"source": [
"function_def = increment.get_concrete_function().function_def\n",
"function_def.signature.input_arg[0]"
]
},
{
"cell_type": "code",
"execution_count": 219,
"metadata": {},
"outputs": [],
"source": [
"class Counter:\n",
"    def __init__(self):\n",
"        self.counter = tf.Variable(0)\n",
"\n",
"    @tf.function\n",
"    def increment(self, c=1):\n",
"        return self.counter.assign_add(c)"
]
},
{
"cell_type": "code",
"execution_count": 220,
"metadata": {},
"outputs": [],
"source": [
"c = Counter()\n",
"c.increment()\n",
"c.increment()"
]
},
{
"cell_type": "code",
"execution_count": 221,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"@tf.function\n",
"def add_10(x):\n",
"    for i in tf.range(10):\n",
"        x += 1\n",
"    return x\n",
"\n",
"print(tf.autograph.to_code(add_10.python_function))"
]
},
{
"cell_type": "code",
"execution_count": 222,
"metadata": {},
"outputs": [],
"source": [
"# extra code: shows how to display the autograph code with syntax highlighting\n",
"def display_tf_code(func):\n",
"    from IPython.display import display, Markdown\n",
"    if hasattr(func, \"python_function\"):\n",
"        func = func.python_function\n",
"    code = tf.autograph.to_code(func)\n",
"    display(Markdown(f'```python\\n{code}\\n```'))"
]
},
{
"cell_type": "code",
"execution_count": 223,
"metadata": {},
"outputs": [],
"source": [
"display_tf_code(add_10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using TF Functions with tf.keras (or Not)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, tf.keras automatically converts your custom code into TF Functions, so there is no need to use `tf.function()`:"
]
},
{
"cell_type": "code",
"execution_count": 224,
"metadata": {},
"outputs": [],
"source": [
"# Custom loss function\n",
"def my_mse(y_true, y_pred):\n",
"    print(\"Tracing loss my_mse()\")\n",
"    return tf.reduce_mean(tf.square(y_pred - y_true))"
]
},
{
"cell_type": "code",
"execution_count": 225,
"metadata": {},
"outputs": [],
"source": [
"# Custom metric function\n",
"def my_mae(y_true, y_pred):\n",
"    print(\"Tracing metric my_mae()\")\n",
"    return tf.reduce_mean(tf.abs(y_pred - y_true))"
]
},
{
"cell_type": "code",
"execution_count": 226,
"metadata": {},
"outputs": [],
"source": [
"# Custom layer\n",
"class MyDense(tf.keras.layers.Layer):\n",
"    def __init__(self, units, activation=None, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.units = units\n",
"        self.activation = tf.keras.activations.get(activation)\n",
"\n",
"    def build(self, input_shape):\n",
"        self.kernel = self.add_weight(name='kernel',\n",
"                                      shape=(input_shape[1], self.units),\n",
"                                      initializer='uniform',\n",
"                                      trainable=True)\n",
"        self.biases = self.add_weight(name='bias',\n",
"                                      shape=(self.units,),\n",
"                                      initializer='zeros',\n",
"                                      trainable=True)\n",
"        super().build(input_shape)\n",
"\n",
"    def call(self, X):\n",
"        print(\"Tracing MyDense.call()\")\n",
"        return self.activation(X @ self.kernel + self.biases)"
]
},
{
"cell_type": "code",
"execution_count": 227,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 228,
"metadata": {},
"outputs": [],
"source": [
"# Custom model\n",
"class MyModel(tf.keras.Model):\n",
"    def __init__(self, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.hidden1 = MyDense(30, activation=\"relu\")\n",
"        self.hidden2 = MyDense(30, activation=\"relu\")\n",
"        self.output_ = MyDense(1)\n",
"\n",
"    def call(self, input):\n",
"        print(\"Tracing MyModel.call()\")\n",
"        hidden1 = self.hidden1(input)\n",
"        hidden2 = self.hidden2(hidden1)\n",
"        concat = tf.keras.layers.concatenate([input, hidden2])\n",
"        output = self.output_(concat)\n",
"        return output\n",
"\n",
"model = MyModel()"
]
},
{
"cell_type": "code",
"execution_count": 229,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=my_mse, optimizer=\"nadam\", metrics=[my_mae])"
]
},
{
"cell_type": "code",
"execution_count": 230,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled, y_train, epochs=2,\n",
"          validation_data=(X_valid_scaled, y_valid))\n",
"model.evaluate(X_test_scaled, y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can turn this off by creating the model with `dynamic=True` (or calling `super().__init__(dynamic=True, **kwargs)` in the model's constructor):"
]
},
{
"cell_type": "code",
"execution_count": 231,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 232,
"metadata": {},
"outputs": [],
"source": [
"model = MyModel(dynamic=True)"
]
},
{
"cell_type": "code",
"execution_count": 233,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=my_mse, optimizer=\"nadam\", metrics=[my_mae])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the custom code will be called at each iteration. Let's fit, validate and evaluate with tiny datasets to avoid getting too much output:"
]
},
{
"cell_type": "code",
"execution_count": 234,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled[:64], y_train[:64], epochs=1,\n",
"          validation_data=(X_valid_scaled[:64], y_valid[:64]), verbose=0)\n",
"model.evaluate(X_test_scaled[:64], y_test[:64], verbose=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can compile a model with `run_eagerly=True`:"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 236,
"metadata": {},
"outputs": [],
"source": [
"model = MyModel()"
]
},
{
"cell_type": "code",
"execution_count": 237,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss=my_mse, optimizer=\"nadam\", metrics=[my_mae], run_eagerly=True)"
]
},
{
"cell_type": "code",
"execution_count": 238,
"metadata": {},
"outputs": [],
"source": [
"model.fit(X_train_scaled[:64], y_train[:64], epochs=1,\n",
"          validation_data=(X_valid_scaled[:64], y_valid[:64]), verbose=0)\n",
"model.evaluate(X_test_scaled[:64], y_test[:64], verbose=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extra Material: Custom Optimizers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Defining custom optimizers is not very common, but in case you are one of the happy few who get to write one, here is an example:"
]
},
{
"cell_type": "code",
"execution_count": 239,
"metadata": {},
"outputs": [],
"source": [
"# Note: this class relies on the optimizer API used before TF 2.11 (_set_hyper(),\n",
"# _create_slots(), etc.). In TF 2.11 to 2.15, that API should be available as\n",
"# tf.keras.optimizers.legacy.Optimizer instead.\n",
"class MyMomentumOptimizer(tf.keras.optimizers.Optimizer):\n",
"    def __init__(self, learning_rate=0.001, momentum=0.9, name=\"MyMomentumOptimizer\", **kwargs):\n",
"        \"\"\"Call super().__init__() and use _set_hyper() to store hyperparameters\"\"\"\n",
"        super().__init__(name, **kwargs)\n",
"        self._set_hyper(\"learning_rate\", kwargs.get(\"lr\", learning_rate))  # handle lr=learning_rate\n",
"        self._set_hyper(\"decay\", self._initial_decay)\n",
"        self._set_hyper(\"momentum\", momentum)\n",
"\n",
"    def _create_slots(self, var_list):\n",
"        \"\"\"For each model variable, create the optimizer variable associated with it.\n",
"        TensorFlow calls these optimizer variables \"slots\".\n",
"        For momentum optimization, we need one momentum slot per model variable.\n",
"        \"\"\"\n",
"        for var in var_list:\n",
"            self.add_slot(var, \"momentum\")\n",
"\n",
"    @tf.function\n",
"    def _resource_apply_dense(self, grad, var):\n",
"        \"\"\"Update the slots and perform one optimization step for one model variable\n",
"        \"\"\"\n",
"        var_dtype = var.dtype.base_dtype\n",
"        lr_t = self._decayed_lr(var_dtype)  # handle learning rate decay\n",
"        momentum_var = self.get_slot(var, \"momentum\")\n",
"        momentum_hyper = self._get_hyper(\"momentum\", var_dtype)\n",
"        momentum_var.assign(momentum_var * momentum_hyper - (1. - momentum_hyper) * grad)\n",
"        var.assign_add(momentum_var * lr_t)\n",
"\n",
"    def _resource_apply_sparse(self, grad, var):\n",
"        raise NotImplementedError\n",
"\n",
"    def get_config(self):\n",
"        base_config = super().get_config()\n",
"        return {\n",
"            **base_config,\n",
"            \"learning_rate\": self._serialize_hyperparameter(\"learning_rate\"),\n",
"            \"decay\": self._serialize_hyperparameter(\"decay\"),\n",
"            \"momentum\": self._serialize_hyperparameter(\"momentum\"),\n",
"        }"
]
},
{
"cell_type": "code",
"execution_count": 240,
"metadata": {},
"outputs": [],
"source": [
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 241,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=[8])])\n",
"model.compile(loss=\"mse\", optimizer=MyMomentumOptimizer())\n",
"model.fit(X_train_scaled, y_train, epochs=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 11."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. TensorFlow is an open-source library for numerical computation, particularly well suited and fine-tuned for large-scale Machine Learning. Its core is similar to NumPy, but it also features GPU support, support for distributed computing, computation graph analysis and optimization capabilities (with a portable graph format that allows you to train a TensorFlow model in one environment and run it in another), an optimization API based on reverse-mode autodiff, and several powerful APIs such as tf.keras, tf.data, tf.image, tf.signal, and more. Other popular Deep Learning libraries include PyTorch, MXNet, Microsoft Cognitive Toolkit, Theano, Caffe2, and Chainer.\n",
"2. Although TensorFlow offers most of the functionalities provided by NumPy, it is not a drop-in replacement, for a few reasons. First, the names of the functions are not always the same (for example, `tf.reduce_sum()` versus `np.sum()`). Second, some functions do not behave in exactly the same way (for example, `tf.transpose()` creates a transposed copy of a tensor, while NumPy's `T` attribute creates a transposed view, without actually copying any data). Lastly, NumPy arrays are mutable, while TensorFlow tensors are not (but you can use a `tf.Variable` if you need a mutable object).\n",
"3. Both `tf.range(10)` and `tf.constant(np.arange(10))` return a one-dimensional tensor containing the integers 0 to 9. However, the former uses 32-bit integers while the latter uses 64-bit integers. Indeed, TensorFlow defaults to 32 bits, while NumPy defaults to 64 bits.\n",
"4. Beyond regular tensors, TensorFlow offers several other data structures, including sparse tensors, tensor arrays, ragged tensors, queues, string tensors, and sets. The last two are actually represented as regular tensors, but TensorFlow provides special functions to manipulate them (in `tf.strings` and `tf.sets`).\n",
"5. When you want to define a custom loss function, in general you can just implement it as a regular Python function. However, if your custom loss function must support some hyperparameters (or any other state), then you should subclass the `keras.losses.Loss` class and implement the `__init__()` and `call()` methods. If you want the loss function's hyperparameters to be saved along with the model, then you must also implement the `get_config()` method.\n",
"6. Much like custom loss functions, most metrics can be defined as regular Python functions. But if you want your custom metric to support some hyperparameters (or any other state), then you should subclass the `keras.metrics.Metric` class. Moreover, if computing the metric over a whole epoch is not equivalent to computing the mean metric over all batches in that epoch (e.g., as for the precision and recall metrics), then you should subclass the `keras.metrics.Metric` class and implement the `__init__()`, `update_state()`, and `result()` methods to keep track of a running metric during each epoch. You should also implement the `reset_states()` method unless all it needs to do is reset all variables to 0.0. If you want the state to be saved along with the model, then you should implement the `get_config()` method as well.\n",
"7. You should distinguish the internal components of your model (i.e., layers or reusable blocks of layers) from the model itself (i.e., the object you will train). The former should subclass the `keras.layers.Layer` class, while the latter should subclass the `keras.models.Model` class.\n",
"8. Writing your own custom training loop is fairly advanced, so you should only do it if you really need to. Keras provides several tools to customize training without having to write a custom training loop: callbacks, custom regularizers, custom constraints, custom losses, and so on. You should use these instead of writing a custom training loop whenever possible: writing a custom training loop is more error-prone, and it will be harder to reuse the custom code you write. However, in some cases writing a custom training loop is necessary—for example, if you want to use different optimizers for different parts of your neural network, like in the [Wide & Deep paper](https://homl.info/widedeep). A custom training loop can also be useful when debugging, or when trying to understand exactly how training works.\n",
"9. Custom Keras components should be convertible to TF Functions, which means they should stick to TF operations as much as possible and respect all the rules listed in Chapter 12 (in the _TF Function Rules_ section). If you absolutely need to include arbitrary Python code in a custom component, you can either wrap it in a `tf.py_function()` operation (but this will reduce performance and limit your model's portability) or set `dynamic=True` when creating the custom layer or model (or set `run_eagerly=True` when calling the model's `compile()` method).\n",
"10. Please refer to Chapter 12 for the list of rules to respect when creating a TF Function (in the _TF Function Rules_ section).\n",
"11. Creating a dynamic Keras model can be useful for debugging, as it will not compile any custom component to a TF Function, and you can use any Python debugger to debug your code. It can also be useful if you want to include arbitrary Python code in your model (or in your training code), including calls to external libraries. To make a model dynamic, you must set `dynamic=True` when creating it. Alternatively, you can set `run_eagerly=True` when calling the model's `compile()` method. Making a model dynamic prevents Keras from using any of TensorFlow's graph features, so it will slow down training and inference, and you will not have the possibility to export the computation graph, which will limit your model's portability."
]
},
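{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check of answers 2 and 3, this sketch compares the default integer types and the two ways of transposing. NumPy's default integer type is platform-dependent (usually 64 bits, but 32 bits on some platforms), so treat the printed dtypes as indicative. The names `a` and `t` are introduced here for illustration only:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"print(tf.range(10).dtype)                # TensorFlow's default integer type\n",
"print(tf.constant(np.arange(10)).dtype)  # inherits NumPy's default integer type\n",
"\n",
"a = np.arange(6).reshape(2, 3)\n",
"print(np.shares_memory(a, a.T))          # True: NumPy's T attribute is a view on the same data\n",
"\n",
"t = tf.constant(a)\n",
"print(tf.transpose(t).shape)             # a new tensor of shape (3, 2)"
]
},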
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12. Implement a custom layer that performs _Layer Normalization_\n",
"_We will use this type of layer in Chapter 15 when using Recurrent Neural Networks._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a.\n",
"_Exercise: The `build()` method should define two trainable weights *α* and *β*, both of shape `input_shape[-1:]` and data type `tf.float32`. *α* should be initialized with 1s, and *β* with 0s._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Solution: see below."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b.\n",
"_Exercise: The `call()` method should compute the mean_ μ _and standard deviation_ σ _of each instance's features. For this, you can use `tf.nn.moments(inputs, axes=-1, keepdims=True)`, which returns the mean μ and the variance σ<sup>2</sup> of all instances (compute the square root of the variance to get the standard deviation). Then the function should compute and return *α*⊗(*X* - μ)/(σ + ε) + *β*, where ⊗ represents itemwise multiplication (`*`) and ε is a smoothing term (small constant to avoid division by zero, e.g., 0.001)._"
]
},
{
"cell_type": "code",
"execution_count": 242,
"metadata": {},
"outputs": [],
"source": [
"class LayerNormalization(tf.keras.layers.Layer):\n",
"    def __init__(self, eps=0.001, **kwargs):\n",
"        super().__init__(**kwargs)\n",
"        self.eps = eps\n",
"\n",
"    def build(self, batch_input_shape):\n",
"        self.alpha = self.add_weight(\n",
"            name=\"alpha\", shape=batch_input_shape[-1:],\n",
"            initializer=\"ones\")\n",
"        self.beta = self.add_weight(\n",
"            name=\"beta\", shape=batch_input_shape[-1:],\n",
"            initializer=\"zeros\")\n",
"        super().build(batch_input_shape)  # must be at the end\n",
"\n",
"    def call(self, X):\n",
"        mean, variance = tf.nn.moments(X, axes=-1, keepdims=True)\n",
"        return self.alpha * (X - mean) / (tf.sqrt(variance + self.eps)) + self.beta\n",
"\n",
"    def compute_output_shape(self, batch_input_shape):\n",
"        return batch_input_shape\n",
"\n",
"    def get_config(self):\n",
"        base_config = super().get_config()\n",
"        return {**base_config, \"eps\": self.eps}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that making _ε_ a hyperparameter (`eps`) was not compulsory. Also note that it's preferable to compute `tf.sqrt(variance + self.eps)` rather than `tf.sqrt(variance) + self.eps`. Indeed, the derivative of sqrt(z) is undefined at z = 0 (it tends to infinity), so the gradients would no longer be finite, and training would fail, whenever the variance vector has at least one component equal to 0. Adding _ε_ inside the square root guarantees that this never happens."
]
},
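{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of why the position of _ε_ matters, using a scalar `z` (a name introduced here for illustration) in place of the variance and comparing the gradients of both formulations at z = 0:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"z = tf.constant(0.0)  # stands in for a variance component equal to 0\n",
"eps = 0.001\n",
"\n",
"with tf.GradientTape(persistent=True) as tape:\n",
"    tape.watch(z)               # z is a constant, so it must be watched explicitly\n",
"    outside = tf.sqrt(z) + eps  # ε added after the square root\n",
"    inside = tf.sqrt(z + eps)   # ε added inside the square root\n",
"\n",
"print(tape.gradient(outside, z))  # not finite at z = 0\n",
"print(tape.gradient(inside, z))   # finite: 0.5 / sqrt(eps)\n",
"del tape"
]
},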
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### c.\n",
"_Exercise: Ensure that your custom layer produces the same (or very nearly the same) output as the `tf.keras.layers.LayerNormalization` layer._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create one instance of each class, apply them to some data (e.g., the training set), and ensure that the difference is negligible."
]
},
{
"cell_type": "code",
"execution_count": 243,
"metadata": {},
"outputs": [],
"source": [
"X = X_train.astype(np.float32)\n",
"\n",
"custom_layer_norm = LayerNormalization()\n",
"keras_layer_norm = tf.keras.layers.LayerNormalization()\n",
"\n",
"tf.reduce_mean(tf.keras.losses.mean_absolute_error(\n",
"    keras_layer_norm(X), custom_layer_norm(X)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yep, that's close enough. To be extra sure, let's make alpha and beta completely random and compare again:"
]
},
{
"cell_type": "code",
"execution_count": 244,
"metadata": {},
"outputs": [],
"source": [
"random_alpha = np.random.rand(X.shape[-1])\n",
"random_beta = np.random.rand(X.shape[-1])\n",
"\n",
"custom_layer_norm.set_weights([random_alpha, random_beta])\n",
"keras_layer_norm.set_weights([random_alpha, random_beta])\n",
"\n",
"tf.reduce_mean(tf.keras.losses.mean_absolute_error(\n",
"    keras_layer_norm(X), custom_layer_norm(X)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Still a negligible difference! Our custom layer works fine."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 13. Train a model using a custom training loop to tackle the Fashion MNIST dataset\n",
"_The Fashion MNIST dataset was introduced in Chapter 10._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### a.\n",
"_Exercise: Display the epoch, iteration, mean training loss, and mean accuracy over each epoch (updated at each iteration), as well as the validation loss and accuracy at the end of each epoch._"
]
},
{
"cell_type": "code",
"execution_count": 245,
"metadata": {},
"outputs": [],
"source": [
"(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n",
"X_train_full = X_train_full.astype(np.float32) / 255.\n",
"X_valid, X_train = X_train_full[:5000], X_train_full[5000:]\n",
"y_valid, y_train = y_train_full[:5000], y_train_full[5000:]\n",
"X_test = X_test.astype(np.float32) / 255."
]
},
{
"cell_type": "code",
"execution_count": 246,
"metadata": {},
"outputs": [],
"source": [
"tf.keras.backend.clear_session()\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Flatten(input_shape=[28, 28]),\n",
" tf.keras.layers.Dense(100, activation=\"relu\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 248,
"metadata": {},
"outputs": [],
"source": [
"n_epochs = 5\n",
"batch_size = 32\n",
"n_steps = len(X_train) // batch_size\n",
"optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)\n",
"loss_fn = tf.keras.losses.sparse_categorical_crossentropy\n",
"mean_loss = tf.keras.metrics.Mean()\n",
"metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]"
]
},
{
"cell_type": "code",
"execution_count": 249,
"metadata": {},
"outputs": [],
"source": [
"with trange(1, n_epochs + 1, desc=\"All epochs\") as epochs:\n",
"    for epoch in epochs:\n",
"        with trange(1, n_steps + 1, desc=f\"Epoch {epoch}/{n_epochs}\") as steps:\n",
"            for step in steps:\n",
"                X_batch, y_batch = random_batch(X_train, y_train)\n",
"                with tf.GradientTape() as tape:\n",
"                    y_pred = model(X_batch)\n",
"                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))\n",
"                    loss = tf.add_n([main_loss] + model.losses)\n",
"                gradients = tape.gradient(loss, model.trainable_variables)\n",
"                optimizer.apply_gradients(zip(gradients, model.trainable_variables))\n",
"                for variable in model.variables:\n",
"                    if variable.constraint is not None:\n",
"                        variable.assign(variable.constraint(variable))\n",
"                status = OrderedDict()\n",
"                mean_loss(loss)\n",
"                status[\"loss\"] = mean_loss.result().numpy()\n",
"                for metric in metrics:\n",
"                    metric(y_batch, y_pred)\n",
"                    status[metric.name] = metric.result().numpy()\n",
"                steps.set_postfix(status)\n",
"            y_pred = model(X_valid)\n",
"            status[\"val_loss\"] = np.mean(loss_fn(y_valid, y_pred))\n",
"            status[\"val_accuracy\"] = np.mean(tf.keras.metrics.sparse_categorical_accuracy(\n",
"                tf.constant(y_valid, dtype=np.float32), y_pred))\n",
"            steps.set_postfix(status)\n",
"        for metric in [mean_loss] + metrics:\n",
"            metric.reset_states()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### b.\n",
"_Exercise: Try using a different optimizer with a different learning rate for the upper layers and the lower layers._"
]
},
{
"cell_type": "code",
"execution_count": 250,
"metadata": {},
"outputs": [],
"source": [
"tf.keras.backend.clear_session()\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 251,
"metadata": {},
"outputs": [],
"source": [
"lower_layers = tf.keras.Sequential([\n",
"    tf.keras.layers.Flatten(input_shape=[28, 28]),\n",
"    tf.keras.layers.Dense(100, activation=\"relu\"),\n",
"])\n",
"upper_layers = tf.keras.Sequential([\n",
"    tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
"])\n",
"model = tf.keras.Sequential([\n",
"    lower_layers, upper_layers\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": 252,
"metadata": {},
"outputs": [],
"source": [
"lower_optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4)\n",
"upper_optimizer = tf.keras.optimizers.Nadam(learning_rate=1e-3)"
]
},
{
"cell_type": "code",
"execution_count": 253,
"metadata": {},
"outputs": [],
"source": [
"n_epochs = 5\n",
"batch_size = 32\n",
"n_steps = len(X_train) // batch_size\n",
"loss_fn = tf.keras.losses.sparse_categorical_crossentropy\n",
"mean_loss = tf.keras.metrics.Mean()\n",
"metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]"
]
},
{
"cell_type": "code",
"execution_count": 254,
"metadata": {},
"outputs": [],
"source": [
"with trange(1, n_epochs + 1, desc=\"All epochs\") as epochs:\n",
"    for epoch in epochs:\n",
"        with trange(1, n_steps + 1, desc=f\"Epoch {epoch}/{n_epochs}\") as steps:\n",
"            for step in steps:\n",
"                X_batch, y_batch = random_batch(X_train, y_train)\n",
"                with tf.GradientTape(persistent=True) as tape:\n",
"                    y_pred = model(X_batch)\n",
"                    main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))\n",
"                    loss = tf.add_n([main_loss] + model.losses)\n",
"                for layers, optimizer in ((lower_layers, lower_optimizer),\n",
"                                          (upper_layers, upper_optimizer)):\n",
"                    gradients = tape.gradient(loss, layers.trainable_variables)\n",
"                    optimizer.apply_gradients(zip(gradients, layers.trainable_variables))\n",
"                del tape\n",
"                for variable in model.variables:\n",
"                    if variable.constraint is not None:\n",
"                        variable.assign(variable.constraint(variable))\n",
"                status = OrderedDict()\n",
"                mean_loss(loss)\n",
"                status[\"loss\"] = mean_loss.result().numpy()\n",
"                for metric in metrics:\n",
"                    metric(y_batch, y_pred)\n",
"                    status[metric.name] = metric.result().numpy()\n",
"                steps.set_postfix(status)\n",
"            y_pred = model(X_valid)\n",
"            status[\"val_loss\"] = np.mean(loss_fn(y_valid, y_pred))\n",
"            status[\"val_accuracy\"] = np.mean(tf.keras.metrics.sparse_categorical_accuracy(\n",
"                tf.constant(y_valid, dtype=np.float32), y_pred))\n",
"            steps.set_postfix(status)\n",
"        for metric in [mean_loss] + metrics:\n",
"            metric.reset_states()"
]
},
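{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training loop for part b calls `tape.gradient()` once per optimizer, which is why the tape is created with `persistent=True`: a non-persistent tape releases its resources after the first `gradient()` call. Here is a toy sketch of the same pattern, where the variables `w1` and `w2` are hypothetical stand-ins for the lower and upper layers' weights:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"w1 = tf.Variable(2.0)  # stands in for the lower layers' weights\n",
"w2 = tf.Variable(3.0)  # stands in for the upper layers' weights\n",
"\n",
"with tf.GradientTape(persistent=True) as tape:\n",
"    loss = w1 * w2 + w2 ** 2\n",
"\n",
"grad_w1 = tape.gradient(loss, w1)  # d(loss)/d(w1) = w2\n",
"grad_w2 = tape.gradient(loss, w2)  # d(loss)/d(w2) = w1 + 2 * w2\n",
"del tape  # release the tape's resources once all gradients are computed\n",
"\n",
"print(grad_w1.numpy(), grad_w2.numpy())  # 3.0 8.0"
]
},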
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}