diff --git a/10_neural_nets_with_keras.ipynb b/10_neural_nets_with_keras.ipynb index 3f622b6..28881c5 100644 --- a/10_neural_nets_with_keras.ipynb +++ b/10_neural_nets_with_keras.ipynb @@ -567,7 +567,7 @@ "metadata": {}, "outputs": [], "source": [ - "keras.utils.plot_model(model, \"my_mnist_model.png\", show_shapes=True)" + "keras.utils.plot_model(model, \"my_fashion_mnist_model.png\", show_shapes=True)" ] }, { @@ -1580,7 +1580,9 @@ { "cell_type": "code", "execution_count": 106, - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [], "source": [ "model.evaluate(X_test, y_test)" @@ -1622,7 +1624,364 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "TODO" + "*Exercise: Train a deep MLP on the MNIST dataset (you can load it using `keras.datasets.mnist.load_data()`). See if you can get over 98% accuracy. Try searching for the optimal learning rate by using the approach presented in this chapter (i.e., by growing the learning rate exponentially, plotting the loss, and finding the point where the loss shoots up). 
Try adding all the bells and whistles—save checkpoints, use early stopping, and plot learning curves using TensorBoard.*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's load the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "metadata": {}, + "outputs": [], + "source": [ + "(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Just like for the Fashion MNIST dataset, the MNIST training set contains 60,000 grayscale images, each 28x28 pixels:" + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "metadata": {}, + "outputs": [], + "source": [ + "X_train_full.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each pixel intensity is also represented as a byte (0 to 255):" + ] + }, + { + "cell_type": "code", + "execution_count": 109, + "metadata": {}, + "outputs": [], + "source": [ + "X_train_full.dtype" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's split the full training set into a validation set and a (smaller) training set. We also scale the pixel intensities down to the 0-1 range and convert them to floats, by dividing by 255, just like we did for Fashion MNIST:" + ] + }, + { + "cell_type": "code", + "execution_count": 110, + "metadata": {}, + "outputs": [], + "source": [ + "X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.\n", + "y_valid, y_train = y_train_full[:5000], y_train_full[5000:]\n", + "X_test = X_test / 255." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's plot an image using Matplotlib's `imshow()` function, with a `'binary'`\n", + " color map:" + ] + }, + { + "cell_type": "code", + "execution_count": 111, + "metadata": {}, + "outputs": [], + "source": [ + "plt.imshow(X_train[0], cmap=\"binary\")\n", + "plt.axis('off')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The labels are the class IDs (represented as uint8), from 0 to 9. Conveniently, the class IDs correspond to the digits represented in the images, so we don't need a `class_names` array:" + ] + }, + { + "cell_type": "code", + "execution_count": 112, + "metadata": {}, + "outputs": [], + "source": [ + "y_train" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The validation set contains 5,000 images, and the test set contains 10,000 images:" + ] + }, + { + "cell_type": "code", + "execution_count": 113, + "metadata": {}, + "outputs": [], + "source": [ + "X_valid.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 114, + "metadata": {}, + "outputs": [], + "source": [ + "X_test.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's take a look at a sample of the images in the dataset:" + ] + }, + { + "cell_type": "code", + "execution_count": 115, + "metadata": {}, + "outputs": [], + "source": [ + "n_rows = 4\n", + "n_cols = 10\n", + "plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))\n", + "for row in range(n_rows):\n", + "    for col in range(n_cols):\n", + "        index = n_cols * row + col\n", + "        plt.subplot(n_rows, n_cols, index + 1)\n", + "        plt.imshow(X_train[index], cmap=\"binary\", interpolation=\"nearest\")\n", + "        plt.axis('off')\n", + "        plt.title(y_train[index], fontsize=12)\n", + "plt.subplots_adjust(wspace=0.2, hspace=0.5)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's build a simple dense network and find the 
optimal learning rate. We will need a callback to grow the learning rate at each iteration. It will also record the learning rate and the loss at each iteration:" + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "metadata": {}, + "outputs": [], + "source": [ + "K = keras.backend\n", + "\n", + "class ExponentialLearningRate(keras.callbacks.Callback):\n", + "    def __init__(self, factor):\n", + "        self.factor = factor\n", + "        self.rates = []\n", + "        self.losses = []\n", + "    def on_batch_end(self, batch, logs):\n", + "        self.rates.append(K.get_value(self.model.optimizer.learning_rate))\n", + "        self.losses.append(logs[\"loss\"])\n", + "        K.set_value(self.model.optimizer.learning_rate, self.model.optimizer.learning_rate * self.factor)" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "metadata": {}, + "outputs": [], + "source": [ + "keras.backend.clear_session()\n", + "np.random.seed(42)\n", + "tf.random.set_seed(42)" + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "metadata": {}, + "outputs": [], + "source": [ + "model = keras.models.Sequential([\n", + "    keras.layers.Flatten(input_shape=[28, 28]),\n", + "    keras.layers.Dense(300, activation=\"relu\"),\n", + "    keras.layers.Dense(100, activation=\"relu\"),\n", + "    keras.layers.Dense(10, activation=\"softmax\")\n", + "])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will start with a small learning rate of 1e-3 and grow it by 0.5% at each iteration:" + ] + }, + { + "cell_type": "code", + "execution_count": 119, + "metadata": {}, + "outputs": [], + "source": [ + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + "              optimizer=keras.optimizers.SGD(learning_rate=1e-3),\n", + "              metrics=[\"accuracy\"])\n", + "expon_lr = ExponentialLearningRate(factor=1.005)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's train the model for just 1 epoch:" + ] + }, + { + "cell_type": "code", + "execution_count": 120, + "metadata": {}, + "outputs": [], + "source": [ + 
"history = model.fit(X_train, y_train, epochs=1,\n", + "                    validation_data=(X_valid, y_valid),\n", + "                    callbacks=[expon_lr])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can now plot the loss as a function of the learning rate:" + ] + }, + { + "cell_type": "code", + "execution_count": 121, + "metadata": {}, + "outputs": [], + "source": [ + "plt.plot(expon_lr.rates, expon_lr.losses)\n", + "plt.gca().set_xscale('log')\n", + "plt.hlines(min(expon_lr.losses), min(expon_lr.rates), max(expon_lr.rates))\n", + "plt.axis([min(expon_lr.rates), max(expon_lr.rates), 0, expon_lr.losses[0]])\n", + "plt.xlabel(\"Learning rate\")\n", + "plt.ylabel(\"Loss\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The loss starts shooting back up violently around 3e-1, so let's try using 2e-1 as our learning rate:" + ] + }, + { + "cell_type": "code", + "execution_count": 122, + "metadata": {}, + "outputs": [], + "source": [ + "keras.backend.clear_session()\n", + "np.random.seed(42)\n", + "tf.random.set_seed(42)" + ] + }, + { + "cell_type": "code", + "execution_count": 123, + "metadata": {}, + "outputs": [], + "source": [ + "model = keras.models.Sequential([\n", + "    keras.layers.Flatten(input_shape=[28, 28]),\n", + "    keras.layers.Dense(300, activation=\"relu\"),\n", + "    keras.layers.Dense(100, activation=\"relu\"),\n", + "    keras.layers.Dense(10, activation=\"softmax\")\n", + "])" + ] + }, + { + "cell_type": "code", + "execution_count": 124, + "metadata": {}, + "outputs": [], + "source": [ + "model.compile(loss=\"sparse_categorical_crossentropy\",\n", + "              optimizer=keras.optimizers.SGD(learning_rate=2e-1),\n", + "              metrics=[\"accuracy\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 125, + "metadata": {}, + "outputs": [], + "source": [ + "run_index = 1  # increment this at every run\n", + "run_logdir = os.path.join(os.curdir, \"my_mnist_logs\", \"run_{:03d}\".format(run_index))\n", + "run_logdir" + ] + }, + { + "cell_type": "code", + 
"execution_count": 126, + "metadata": {}, + "outputs": [], + "source": [ + "early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)\n", + "checkpoint_cb = keras.callbacks.ModelCheckpoint(\"my_mnist_model.h5\", save_best_only=True)\n", + "tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)\n", + "\n", + "history = model.fit(X_train, y_train, epochs=100,\n", + "                    validation_data=(X_valid, y_valid),\n", + "                    callbacks=[early_stopping_cb, checkpoint_cb, tensorboard_cb])" + ] + }, + { + "cell_type": "code", + "execution_count": 127, + "metadata": {}, + "outputs": [], + "source": [ + "model = keras.models.load_model(\"my_mnist_model.h5\")  # roll back to the best model\n", + "model.evaluate(X_test, y_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We got over 98% accuracy. Finally, let's look at the learning curves using TensorBoard:" + ] + }, + { + "cell_type": "code", + "execution_count": 128, + "metadata": {}, + "outputs": [], + "source": [ + "%tensorboard --logdir=./my_mnist_logs --port=6006" ] }, {
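To put numbers on the learning-rate sweep in this exercise: with Keras's default `batch_size` of 32, one epoch over the 55,000 training images runs 1,719 batches. Growing the rate by 0.5% per batch from 1e-3 therefore ends the sweep a little above 5. The sketch below computes that range. It also adds a hypothetical `pick_lr()` helper that automates "finding the point where the loss shoots up". Both function names are assumptions for illustration, as are the helper's 4x threshold and halving rule. None of them come from the notebook, which reads the value off the plot by eye.

```python
import math


def lr_sweep_range(n_samples, batch_size=32, start_lr=1e-3, factor=1.005):
    """Return (n_batches, last_lr) for a one-epoch exponential learning-rate sweep.

    Assumes the callback behaves like ExponentialLearningRate: after each batch
    it records the rate that batch used, then multiplies the rate by `factor`.
    """
    # Keras also runs the final, partial batch, hence the ceiling
    n_batches = math.ceil(n_samples / batch_size)
    # The rate recorded on the last batch has been multiplied n_batches - 1 times
    last_lr = start_lr * factor ** (n_batches - 1)
    return n_batches, last_lr


def pick_lr(rates, losses, blowup=4.0, shrink=0.5):
    """Hypothetical heuristic, not the notebook's method.

    Scan the sweep in order and track the lowest loss seen so far. Stop at the
    first rate whose loss exceeds `blowup` times that minimum and return
    `shrink` times that rate. Return None if the loss never shoots up.
    """
    best_loss = math.inf
    for rate, loss in zip(rates, losses):
        best_loss = min(best_loss, loss)
        if loss > blowup * best_loss:
            return rate * shrink
    return None


if __name__ == "__main__":
    n_batches, last_lr = lr_sweep_range(55_000)
    print(n_batches, round(last_lr, 2))  # 1719 5.26
```

After the one-epoch sweep, `pick_lr(expon_lr.rates, expon_lr.losses)` would give one candidate rate to compare against the value read off the plot. The threshold and shrink factor are tunable guesses, not established constants.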