Work in progress for chapters 16 to 19

main
Aurélien Geron 2022-02-19 22:09:28 +13:00
parent 4a0a353b5b
commit bbc1113951
4 changed files with 1085 additions and 1649 deletions

File diff suppressed because it is too large

@ -4,14 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 16 Autoencoders and GANs**"
"**Chapter 17 Autoencoders and GANs**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 16._"
"_This notebook contains all the sample code and solutions to the exercises in chapter 17._"
]
},
{
@ -20,74 +20,147 @@
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml3/blob/main/17_autoencoders_and_gans.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml2/blob/master/17_autoencoders_and_gans.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml3/blob/main/17_autoencoders_and_gans.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"# WORK IN PROGRESS\n",
"<img src=\"https://freesvg.org/img/Lavori-in-corso.png\" width=\"200\" />\n",
"\n",
"**I'm still working on updating this chapter to the 3rd edition. Please come back in a few weeks.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dFXIv9qNpKzt",
"tags": []
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "8IPbJEmZpKzu"
},
"source": [
"First, let's import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures."
"This project requires Python 3.8 or above:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"execution_count": null,
"metadata": {
"id": "TFSU3FCOpKzu"
},
"outputs": [],
"source": [
"# Python ≥3.8 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 8)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"IS_KAGGLE = \"kaggle_secrets\" in sys.modules\n",
"assert sys.version_info >= (3, 8)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TAlKky09pKzv"
},
"source": [
"It also requires Scikit-Learn ≥ 1.0.1:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YqCwW7cMpKzw"
},
"outputs": [],
"source": [
"import sklearn\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"assert sklearn.__version__ >= \"1.0.1\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GJtVEqxfpKzw"
},
"source": [
"And TensorFlow ≥ 2.6:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0Piq5se2pKzx"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"assert tf.__version__ >= \"2.6.0\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DDaDoLQTpKzx"
},
"source": [
"As we did in earlier chapters, let's define the default font sizes to make the figures prettier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8d4TH3NbpKzx"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rc('font', size=14)\n",
"plt.rc('axes', labelsize=14, titlesize=14)\n",
"plt.rc('legend', fontsize=14)\n",
"plt.rc('xtick', labelsize=10)\n",
"plt.rc('ytick', labelsize=10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RcoUIRsvpKzy"
},
"source": [
"And let's create the `images/autoencoders` folder (if it doesn't already exist), and define the `save_fig()` function which is used through this notebook to save the figures in high-res for the book:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PQFH5Y9PpKzy"
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Scikit-Learn ≥1.0 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"1.0\"\n",
"\n",
"# TensorFlow ≥2.6 is required\n",
"import tensorflow as tf\n",
"assert tf.__version__ >= \"2.6\"\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"if not tf.config.list_physical_devices('GPU'):\n",
" print(\"No GPU was detected. Neural nets can be very slow without a GPU.\")\n",
" if IS_COLAB:\n",
" print(\"Go to Runtime > Change runtime and select a GPU hardware accelerator.\")\n",
" if IS_KAGGLE:\n",
" print(\"Go to Settings > Accelerator and select GPU.\")\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"IMAGES_PATH = Path() / \"images\" / \"autoencoders\"\n",
"IMAGES_PATH.mkdir(parents=True, exist_ok=True)\n",
"\n",
@ -100,20 +173,28 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "YTsawKlapKzy"
},
"source": [
"A couple utility functions to plot grayscale 28x28 image:"
"This chapter can be very slow without a GPU, so let's make sure there's one, or else issue a warning:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"execution_count": null,
"metadata": {
"id": "Ekxzo6pOpKzy"
},
"outputs": [],
"source": [
"def plot_image(image):\n",
" plt.imshow(image, cmap=\"binary\")\n",
" plt.axis(\"off\")"
"if not tf.config.list_physical_devices('GPU'):\n",
" print(\"No GPU was detected. Neural nets can be very slow without a GPU.\")\n",
" if \"google.colab\" in sys.modules:\n",
" print(\"Go to Runtime > Change runtime and select a GPU hardware \"\n",
" \"accelerator.\")\n",
" if \"kaggle_secrets\" in sys.modules:\n",
" print(\"Go to Settings > Accelerator and select GPU.\")"
]
},
{
@ -132,7 +213,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -159,7 +240,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -175,7 +256,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -184,7 +265,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -193,7 +274,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -222,7 +303,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -249,7 +330,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -259,7 +340,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -292,7 +373,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -308,7 +389,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -325,7 +406,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -341,7 +422,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -359,7 +440,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -369,8 +450,8 @@
"plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=cmap)\n",
"image_positions = np.array([[1., 1.]])\n",
"for index, position in enumerate(X_valid_2D):\n",
" dist = np.sum((position - image_positions) ** 2, axis=1)\n",
" if np.min(dist) > 0.02: # if far enough from other images\n",
" dist = ((position - image_positions) ** 2).sum(axis=1)\n",
" if dist.min() > 0.02: # if far enough from other images\n",
" image_positions = np.r_[image_positions, [position]]\n",
" imagebox = mpl.offsetbox.AnnotationBbox(\n",
" mpl.offsetbox.OffsetImage(X_valid[index], cmap=\"binary\"),\n",
@ -397,7 +478,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -418,7 +499,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -451,7 +532,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {
"scrolled": true
},
@ -470,7 +551,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -492,7 +573,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -513,7 +594,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -526,7 +607,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -536,7 +617,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -548,7 +629,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -572,7 +653,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -605,7 +686,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -615,7 +696,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -632,7 +713,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -652,7 +733,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -661,7 +742,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -685,7 +766,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -712,7 +793,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -733,7 +814,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -760,7 +841,7 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -788,7 +869,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -814,7 +895,7 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -831,7 +912,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -842,13 +923,13 @@
" ax.bar(x, counts / len(data), width=widths*0.8)\n",
" ax.xaxis.set_ticks(bins)\n",
" ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(\n",
" lambda y, position: \"{}%\".format(int(np.round(100 * y)))))\n",
" lambda y, position: \"{}%\".format(round(100 * y))))\n",
" ax.grid(True)"
]
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -881,7 +962,7 @@
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -898,7 +979,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -927,7 +1008,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -936,7 +1017,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -953,7 +1034,7 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -976,7 +1057,7 @@
},
{
"cell_type": "code",
"execution_count": 44,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -996,7 +1077,7 @@
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1023,7 +1104,7 @@
},
{
"cell_type": "code",
"execution_count": 46,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1032,7 +1113,7 @@
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1050,7 +1131,7 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1062,7 +1143,7 @@
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1103,7 +1184,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": null,
"metadata": {
"scrolled": true
},
@ -1122,7 +1203,7 @@
},
{
"cell_type": "code",
"execution_count": 51,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1130,7 +1211,7 @@
" n_cols = n_cols or len(images)\n",
" n_rows = (len(images) - 1) // n_cols + 1\n",
" if images.shape[-1] == 1:\n",
" images = np.squeeze(images, axis=-1)\n",
" images = images.squeeze(axis=-1)\n",
" plt.figure(figsize=(n_cols, n_rows))\n",
" for index, image in enumerate(images):\n",
" plt.subplot(n_rows, n_cols, index + 1)\n",
@ -1147,7 +1228,7 @@
},
{
"cell_type": "code",
"execution_count": 52,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1168,7 +1249,7 @@
},
{
"cell_type": "code",
"execution_count": 53,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1201,7 +1282,7 @@
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1227,7 +1308,7 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1238,7 +1319,7 @@
},
{
"cell_type": "code",
"execution_count": 56,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1249,7 +1330,7 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1276,7 +1357,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1285,7 +1366,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1300,7 +1381,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1316,7 +1397,7 @@
},
{
"cell_type": "code",
"execution_count": 61,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1351,7 +1432,7 @@
},
{
"cell_type": "code",
"execution_count": 62,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1362,7 +1443,7 @@
},
{
"cell_type": "code",
"execution_count": 63,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1371,7 +1452,7 @@
},
{
"cell_type": "code",
"execution_count": 64,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1383,7 +1464,7 @@
},
{
"cell_type": "code",
"execution_count": 65,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1392,7 +1473,7 @@
},
{
"cell_type": "code",
"execution_count": 66,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1428,7 +1509,7 @@
},
{
"cell_type": "code",
"execution_count": 67,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1448,7 +1529,7 @@
},
{
"cell_type": "code",
"execution_count": 68,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1482,7 +1563,7 @@
},
{
"cell_type": "code",
"execution_count": 69,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1499,7 +1580,7 @@
},
{
"cell_type": "code",
"execution_count": 70,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1516,11 +1597,11 @@
},
{
"cell_type": "code",
"execution_count": 71,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hashes = np.round(hashing_encoder.predict(X_valid)).astype(np.int32)\n",
"hashes = hashing_encoder.predict(X_valid).round().astype(np.int32)\n",
"hashes *= np.array([[2**bit for bit in range(16)]])\n",
"hashes = hashes.sum(axis=1)\n",
"for h in hashes[:5]:\n",
@ -1537,7 +1618,7 @@
},
{
"cell_type": "code",
"execution_count": 72,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1568,9 +1649,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8.\n",
"\n",
"See Appendix A."
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Here are some of the main tasks that autoencoders are used for:\n",
" * Feature extraction\n",
" * Unsupervised pretraining\n",
" * Dimensionality reduction\n",
" * Generative models\n",
" * Anomaly detection (an autoencoder is generally bad at reconstructing outliers)\n",
"2. If you want to train a classifier and you have plenty of unlabeled training data but only a few thousand labeled instances, then you could first train a deep autoencoder on the full dataset (labeled + unlabeled), then reuse its lower half for the classifier (i.e., reuse the layers up to the codings layer, included) and train the classifier using the labeled data. If you have little labeled data, you probably want to freeze the reused layers when training the classifier.\n",
"3. The fact that an autoencoder perfectly reconstructs its inputs does not necessarily mean that it is a good autoencoder; perhaps it is simply an overcomplete autoencoder that learned to copy its inputs to the codings layer and then to the outputs. In fact, even if the codings layer contained a single neuron, it would be possible for a very deep autoencoder to learn to map each training instance to a different coding (e.g., the first instance could be mapped to 0.001, the second to 0.002, the third to 0.003, and so on), and it could learn \"by heart\" to reconstruct the right training instance for each coding. It would perfectly reconstruct its inputs without really learning any useful pattern in the data. In practice such a mapping is unlikely to happen, but it illustrates the fact that perfect reconstructions are not a guarantee that the autoencoder learned anything useful. However, if it produces very bad reconstructions, then it is almost guaranteed to be a bad autoencoder. To evaluate the performance of an autoencoder, one option is to measure the reconstruction loss (e.g., compute the MSE, or the mean square of the outputs minus the inputs). Again, a high reconstruction loss is a good sign that the autoencoder is bad, but a low reconstruction loss is not a guarantee that it is good. You should also evaluate the autoencoder according to what it will be used for. For example, if you are using it for unsupervised pretraining of a classifier, then you should also evaluate the classifier's performance.\n",
"4. An undercomplete autoencoder is one whose codings layer is smaller than the input and output layers. If it is larger, then it is an overcomplete autoencoder. The main risk of an excessively undercomplete autoencoder is that it may fail to reconstruct the inputs. The main risk of an overcomplete autoencoder is that it may just copy the inputs to the outputs, without learning any useful features.\n",
"5. To tie the weights of an encoder layer and its corresponding decoder layer, you simply make the decoder weights equal to the transpose of the encoder weights. This reduces the number of parameters in the model by half, often making training converge faster with less training data and reducing the risk of overfitting the training set.\n",
"6. A generative model is a model capable of randomly generating outputs that resemble the training instances. For example, once trained successfully on the MNIST dataset, a generative model can be used to randomly generate realistic images of digits. The output distribution is typically similar to the training data. For example, since MNIST contains many images of each digit, the generative model would output roughly the same number of images of each digit. Some generative models can be parametrized—for example, to generate only some kinds of outputs. An example of a generative autoencoder is the variational autoencoder.\n",
"7. A generative adversarial network is a neural network architecture composed of two parts, the generator and the discriminator, which have opposing objectives. The generator's goal is to generate instances similar to those in the training set, to fool the discriminator. The discriminator must distinguish the real instances from the generated ones. At each training iteration, the discriminator is trained like a normal binary classifier, then the generator is trained to maximize the discriminator's error. GANs are used for advanced image processing tasks such as super resolution, colorization, image editing (replacing objects with realistic background), turning a simple sketch into a photorealistic image, or predicting the next frames in a video. They are also used to augment a dataset (to train other models), to generate other types of data (such as text, audio, and time series), and to identify the weaknesses in other models and strengthen them.\n",
"8. Training GANs is notoriously difficult, because of the complex dynamics between the generator and the discriminator. The biggest difficulty is mode collapse, where the generator produces outputs with very little diversity. Moreover, training can be terribly unstable: it may start out fine and then suddenly start oscillating or diverging, without any apparent reason. GANs are also very sensitive to the choice of hyperparameters."
]
},
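For answer 5, here is a minimal sketch of weight tying via a custom Keras layer, along the lines of the chapter's `DenseTranspose` layer; it assumes the encoder's `Dense` layers have already been built (e.g., by calling the encoder on some data first), since it reads `dense.input_shape`:

```python
import tensorflow as tf

class DenseTranspose(tf.keras.layers.Layer):
    """Decoder layer whose kernel is the transpose of a given Dense layer's kernel."""
    def __init__(self, dense, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense = dense  # the encoder layer whose kernel is reused
        self.activation = tf.keras.activations.get(activation)

    def build(self, batch_input_shape):
        # Only the biases are new parameters; the kernel itself is shared (tied).
        self.biases = self.add_weight(name="bias",
                                      shape=[self.dense.input_shape[-1]],
                                      initializer="zeros")
        super().build(batch_input_shape)

    def call(self, inputs):
        Z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(Z + self.biases)
```

And for answers 7 and 8, a sketch of one GAN training iteration. The `generator` and `discriminator` models are hypothetical, with `gan = tf.keras.Sequential([generator, discriminator])`; it assumes the discriminator was compiled on its own, then frozen (`discriminator.trainable = False`) before `gan` was compiled:

```python
def train_gan_step(generator, discriminator, gan, X_batch,
                   batch_size=32, codings_size=30):
    # Phase 1: train the discriminator on half fake, half real images.
    noise = tf.random.normal(shape=[batch_size, codings_size])
    fake_images = generator(noise)  # must match X_batch's image shape
    X_fake_and_real = tf.concat([fake_images, X_batch], axis=0)
    y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
    discriminator.train_on_batch(X_fake_and_real, y1)
    # Phase 2: train the generator to make the frozen discriminator say "real".
    noise = tf.random.normal(shape=[batch_size, codings_size])
    y2 = tf.constant([[1.]] * batch_size)
    gan.train_on_batch(noise, y2)
```

Mode collapse (answer 8) typically shows up here as `fake_images` drifting toward a few near-identical outputs, whatever the `noise` input.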
{
@ -1586,7 +1684,7 @@
},
{
"cell_type": "code",
"execution_count": 73,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1597,7 +1695,7 @@
},
{
"cell_type": "code",
"execution_count": 74,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1615,7 +1713,7 @@
},
{
"cell_type": "code",
"execution_count": 75,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1624,7 +1722,7 @@
},
{
"cell_type": "code",
"execution_count": 76,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1638,7 +1736,7 @@
},
{
"cell_type": "code",
"execution_count": 77,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1647,7 +1745,7 @@
},
{
"cell_type": "code",
"execution_count": 78,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1660,7 +1758,7 @@
},
{
"cell_type": "code",
"execution_count": 79,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1677,7 +1775,7 @@
" if index == 0:\n",
" plt.title(\"Original\")\n",
" plt.subplot(n_images, 3, index * 3 + 2)\n",
" plt.imshow(np.clip(new_images_noisy[index], 0., 1.))\n",
" plt.imshow(new_images_noisy[index].clip(0., 1.))\n",
" plt.axis('off')\n",
" if index == 0:\n",
" plt.title(\"Noisy\")\n",
@ -1757,7 +1855,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},

File diff suppressed because it is too large

@ -4,14 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 18 Training and Deploying TensorFlow Models at Scale**"
"**Chapter 19 Training and Deploying TensorFlow Models at Scale**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 18._"
"_This notebook contains all the sample code and solutions to the exercises in chapter 19._"
]
},
{
@ -20,81 +20,147 @@
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml2/blob/master/19_training_and_deploying_at_scale.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml3/blob/main/19_training_and_deploying_at_scale.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml2/blob/master/19_training_and_deploying_at_scale.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml3/blob/main/19_training_and_deploying_at_scale.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"# WORK IN PROGRESS\n",
"<img src=\"https://freesvg.org/img/Lavori-in-corso.png\" width=\"200\" />\n",
"\n",
"**I'm still working on updating this chapter to the 3rd edition. Please come back in a few weeks.**"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dFXIv9qNpKzt",
"tags": []
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"id": "8IPbJEmZpKzu"
},
"source": [
"First, let's import a few common modules, ensure MatplotLib plots figures inline and prepare a function to save the figures."
"This project requires Python 3.8 or above:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"execution_count": null,
"metadata": {
"id": "TFSU3FCOpKzu"
},
"outputs": [],
"source": [
"# Python ≥3.8 is required\n",
"import sys\n",
"assert sys.version_info >= (3, 8)\n",
"\n",
"# Is this notebook running on Colab or Kaggle?\n",
"IS_COLAB = \"google.colab\" in sys.modules\n",
"IS_KAGGLE = \"kaggle_secrets\" in sys.modules\n",
"assert sys.version_info >= (3, 8)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TAlKky09pKzv"
},
"source": [
"It also requires Scikit-Learn ≥ 1.0.1:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YqCwW7cMpKzw"
},
"outputs": [],
"source": [
"import sklearn\n",
"\n",
"if IS_COLAB or IS_KAGGLE:\n",
" !echo \"deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal\" > /etc/apt/sources.list.d/tensorflow-serving.list\n",
" !curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -\n",
" !apt update && apt-get install -y tensorflow-model-server\n",
" %pip install -q -U tensorflow-serving-api\n",
"assert sklearn.__version__ >= \"1.0.1\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "GJtVEqxfpKzw"
},
"source": [
"And TensorFlow ≥ 2.6:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0Piq5se2pKzx"
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"# Common imports\n",
"import os\n",
"import numpy as np\n",
"assert tf.__version__ >= \"2.6.0\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DDaDoLQTpKzx"
},
"source": [
"As we did in earlier chapters, let's define the default font sizes to make the figures prettier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8d4TH3NbpKzx"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rc('font', size=14)\n",
"plt.rc('axes', labelsize=14, titlesize=14)\n",
"plt.rc('legend', fontsize=14)\n",
"plt.rc('xtick', labelsize=10)\n",
"plt.rc('ytick', labelsize=10)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RcoUIRsvpKzy"
},
"source": [
"And let's create the `images/deploy` folder (if it doesn't already exist), and define the `save_fig()` function which is used through this notebook to save the figures in high-res for the book:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PQFH5Y9PpKzy"
},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Scikit-Learn ≥1.0 is required\n",
"import sklearn\n",
"assert sklearn.__version__ >= \"1.0\"\n",
"\n",
"# TensorFlow ≥2.6 is required\n",
"import tensorflow as tf\n",
"assert tf.__version__ >= \"2.6\"\n",
"\n",
"# to make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"tf.random.set_seed(42)\n",
"\n",
"if not tf.config.list_physical_devices('GPU'):\n",
" print(\"No GPU was detected. Neural nets can be very slow without a GPU.\")\n",
" if IS_COLAB:\n",
" print(\"Go to Runtime > Change runtime and select a GPU hardware accelerator.\")\n",
" if IS_KAGGLE:\n",
" print(\"Go to Settings > Accelerator and select GPU.\")\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib as mpl\n",
"import matplotlib.pyplot as plt\n",
"mpl.rc('axes', labelsize=14)\n",
"mpl.rc('xtick', labelsize=12)\n",
"mpl.rc('ytick', labelsize=12)\n",
"\n",
"# Where to save the figures\n",
"IMAGES_PATH = Path() / \"images\" / \"deploy\"\n",
"IMAGES_PATH.mkdir(parents=True, exist_ok=True)\n",
"\n",
@ -105,6 +171,52 @@
" plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
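Most of `save_fig()`'s body falls inside the suppressed hunk above; only its final `plt.savefig(path, format=fig_extension, dpi=resolution)` line is visible. Based on that line and the `IMAGES_PATH` defined earlier, a minimal sketch of what such a helper presumably looks like (the parameter defaults and naming scheme are assumptions):

```python
def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = IMAGES_PATH / f"{fig_id}.{fig_extension}"  # assumed naming scheme
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)
```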
{
"cell_type": "markdown",
"metadata": {
"id": "YTsawKlapKzy"
},
"source": [
"This chapter can be very slow without a GPU, so let's make sure there's one, or else issue a warning:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ekxzo6pOpKzy"
},
"outputs": [],
"source": [
"if not tf.config.list_physical_devices('GPU'):\n",
" print(\"No GPU was detected. Neural nets can be very slow without a GPU.\")\n",
" if \"google.colab\" in sys.modules:\n",
" print(\"Go to Runtime > Change runtime and select a GPU hardware \"\n",
" \"accelerator.\")\n",
" if \"kaggle_secrets\" in sys.modules:\n",
" print(\"Go to Settings > Accelerator and select GPU.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are running this notebook in Colab or Kaggle, let's install TensorFlow Server:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if \"google.colab\" in sys.modules or \"kaggle_secrets\" in sys.modules:\n",
" !echo \"deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal\" > /etc/apt/sources.list.d/tensorflow-serving.list\n",
" !curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -\n",
" !apt update && apt-get install -y tensorflow-model-server\n",
" %pip install -q -U tensorflow-serving-api"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -122,7 +234,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -136,7 +248,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -156,16 +268,16 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.round(model.predict(X_new), 2)"
"model.predict(X_new).round(2)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -177,7 +289,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -186,7 +298,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -202,7 +314,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -223,7 +335,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -232,7 +344,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -241,7 +353,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -250,7 +362,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -260,7 +372,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -276,7 +388,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -285,7 +397,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -302,7 +414,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -313,7 +425,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -363,7 +475,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -372,7 +484,7 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -385,7 +497,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -394,7 +506,7 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -408,7 +520,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -424,7 +536,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -438,7 +550,7 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -447,7 +559,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -464,7 +576,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -479,7 +591,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -493,7 +605,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -509,7 +621,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": null,
"metadata": {
"scrolled": true
},
@ -530,7 +642,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -550,7 +662,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": null,
"metadata": {
"scrolled": true
},
@ -573,7 +685,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -585,7 +697,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -594,7 +706,7 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -610,7 +722,7 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -625,7 +737,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -634,7 +746,7 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -658,7 +770,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -667,7 +779,7 @@
},
{
"cell_type": "code",
"execution_count": 38,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -682,7 +794,7 @@
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -698,12 +810,12 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Y_probas = predict(X_new)\n",
"np.round(Y_probas, 2)"
"Y_probas.round(2)"
]
},
{
@ -722,7 +834,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -732,7 +844,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -741,7 +853,7 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -750,7 +862,7 @@
},
{
"cell_type": "code",
"execution_count": 44,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -769,7 +881,7 @@
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -780,7 +892,7 @@
},
{
"cell_type": "code",
"execution_count": 46,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -803,7 +915,7 @@
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -818,7 +930,7 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -856,7 +968,7 @@
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -867,7 +979,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -883,7 +995,7 @@
},
{
"cell_type": "code",
"execution_count": 51,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -954,7 +1066,7 @@
},
{
"cell_type": "code",
"execution_count": 52,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -978,7 +1090,7 @@
},
{
"cell_type": "code",
"execution_count": 53,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1007,7 +1119,7 @@
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1019,7 +1131,7 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1028,7 +1140,7 @@
},
{
"cell_type": "code",
"execution_count": 56,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1048,7 +1160,7 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1115,7 +1227,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1131,7 +1243,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1158,7 +1270,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@ -1175,13 +1287,13 @@
},
{
"cell_type": "code",
"execution_count": 61,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = tf.keras.models.load_model(\"my_mnist_multiworker_model.h5\")\n",
"Y_pred = model.predict(X_new)\n",
"np.argmax(Y_pred, axis=-1)"
"Y_pred.argmax(axis=-1)"
]
},
{
@ -1202,9 +1314,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8.\n",
"\n",
"See Appendix A."
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. A SavedModel contains a TensorFlow model, including its architecture (a computation graph) and its weights. It is stored as a directory containing a _saved_model.pb_ file, which defines the computation graph (represented as a serialized protocol buffer), and a _variables_ subdirectory containing the variable values. For models containing a large number of weights, these variable values may be split across multiple files. A SavedModel also includes an _assets_ subdirectory that may contain additional data, such as vocabulary files, class names, or some example instances for this model. To be more accurate, a SavedModel can contain one or more _metagraphs_. A metagraph is a computation graph plus some function signature definitions (including their input and output names, types, and shapes). Each metagraph is identified by a set of tags. To inspect a SavedModel, you can use the command-line tool `saved_model_cli` or just load it using `tf.saved_model.load()` and inspect it in Python.\n",
"2. TF Serving allows you to deploy multiple TensorFlow models (or multiple versions of the same model) and make them accessible to all your applications easily via a REST API or a gRPC API. Using your models directly in your applications would make it harder to deploy a new version of a model across all applications. Implementing your own microservice to wrap a TF model would require extra work, and it would be hard to match TF Serving's features. TF Serving has many features: it can monitor a directory and autodeploy the models that are placed there, and you won't have to change or even restart any of your applications to benefit from the new model versions; it's fast, well tested, and scales very well; and it supports A/B testing of experimental models and deploying a new model version to just a subset of your users (in this case the model is called a _canary_). TF Serving is also capable of grouping individual requests into batches to run them jointly on the GPU. To deploy TF Serving, you can install it from source, but it is much simpler to install it using a Docker image. To deploy a cluster of TF Serving Docker images, you can use an orchestration tool such as Kubernetes, or use a fully hosted solution such as Google Cloud AI Platform.\n",
"3. To deploy a model across multiple TF Serving instances, all you need to do is configure these TF Serving instances to monitor the same _models_ directory, and then export your new model as a SavedModel into a subdirectory.\n",
"4. The gRPC API is more efficient than the REST API. However, its client libraries are not as widely available, and if you activate compression when using the REST API, you can get almost the same performance. So, the gRPC API is most useful when you need the highest possible performance and the clients are not limited to the REST API.\n",
"5. To reduce a model's size so it can run on a mobile or embedded device, TFLite uses several techniques:\n",
" * It provides a converter which can optimize a SavedModel: it shrinks the model and reduces its latency. To do this, it prunes all the operations that are not needed to make predictions (such as training operations), and it optimizes and fuses operations whenever possible.\n",
" * The converter can also perform post-training quantization: this technique dramatically reduces the models size, so its much faster to download and store.\n",
" * It saves the optimized model using the FlatBuffer format, which can be loaded to RAM directly, without parsing. This reduces the loading time and memory footprint.\n",
"6. Quantization-aware training consists in adding fake quantization operations to the model during training. This allows the model to learn to ignore the quantization noise; the final weights will be more robust to quantization.\n",
"7. Model parallelism means chopping your model into multiple parts and running them in parallel across multiple devices, hopefully speeding up the model during training or inference. Data parallelism means creating multiple exact replicas of your model and deploying them across multiple devices. At each iteration during training, each replica is given a different batch of data, and it computes the gradients of the loss with regard to the model parameters. In synchronous data parallelism, the gradients from all replicas are then aggregated and the optimizer performs a Gradient Descent step. The parameters may be centralized (e.g., on parameter servers) or replicated across all replicas and kept in sync using AllReduce. In asynchronous data parallelism, the parameters are centralized and the replicas run independently from each other, each updating the central parameters directly at the end of each training iteration, without having to wait for the other replicas. To speed up training, data parallelism turns out to work better than model parallelism, in general. This is mostly because it requires less communication across devices. Moreover, it is much easier to implement, and it works the same way for any model, whereas model parallelism requires analyzing the model to determine the best way to chop it into pieces.\n",
"8. When training a model across multiple servers, you can use the following distribution strategies:\n",
" * The `MultiWorkerMirroredStrategy` performs mirrored data parallelism. The model is replicated across all available servers and devices, and each replica gets a different batch of data at each training iteration and computes its own gradients. The mean of the gradients is computed and shared across all replicas using a distributed AllReduce implementation (NCCL by default), and all replicas perform the same Gradient Descent step. This strategy is the simplest to use since all servers and devices are treated in exactly the same way, and it performs fairly well. In general, you should use this strategy. Its main limitation is that it requires the model to fit in RAM on every replica.\n",
" * The `ParameterServerStrategy` performs asynchronous data parallelism. The model is replicated across all devices on all workers, and the parameters are sharded across all parameter servers. Each worker has its own training loop, running asynchronously with the other workers; at each training iteration, each worker gets its own batch of data and fetches the latest version of the model parameters from the parameter servers, then it computes the gradients of the loss with regard to these parameters, and it sends them to the parameter servers. Lastly, the parameter servers perform a Gradient Descent step using these gradients. This strategy is generally slower than the previous strategy, and a bit harder to deploy, since it requires managing parameter servers. However, it can be useful in some situations, especially when you can take advantage of the asynchronous updates, for example to reduce I/O bottlenecks. This depends on many factors, including hardware, network topology, number of servers, model size, and more, so your mileage may vary."
]
},
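For answer 1, a minimal sketch of loading and inspecting a SavedModel in Python (the "my_model" path is hypothetical, standing in for wherever your model was exported):

```python
import tensorflow as tf

saved_model = tf.saved_model.load("my_model")  # hypothetical export path
serving_fn = saved_model.signatures["serving_default"]
print(serving_fn.structured_input_signature)  # input names, dtypes, shapes
print(serving_fn.structured_outputs)          # output names, dtypes, shapes
```

For answer 5, a sketch of converting that same SavedModel with TFLite's post-training quantization enabled:

```python
converter = tf.lite.TFLiteConverter.from_saved_model("my_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()  # returns the FlatBuffer as bytes
with open("my_model.tflite", "wb") as f:
    f.write(tflite_model)
```

And for answer 8, a skeleton of mirrored multiworker training; `build_model()` and the training set are hypothetical, and each worker must have an appropriate `TF_CONFIG` environment variable describing the cluster:

```python
strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():  # variables created in this scope are mirrored
    model = build_model()  # hypothetical model-building function
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="nadam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10)  # hypothetical training set
```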
{
@ -1262,7 +1391,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},