logs["loss"] is the mean loss, not the batch loss anymore (since TF 2.2), fixes #188

main
Aurélien Geron 2021-03-19 10:50:13 +13:00
parent e9b5dce122
commit 9d33d65ef0
1 changed file with 23 additions and 0 deletions

@@ -1652,6 +1652,29 @@
" plt.ylabel(\"Loss\")" " plt.ylabel(\"Loss\")"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: In the `on_batch_end()` method, `logs[\"loss\"]` used to contain the batch loss, but in TensorFlow 2.2.0 it was replaced with the mean loss (since the start of the epoch). This explains why the graph below is much smoother than in the book (if you are using TF 2.2 or above). It also means that there is a lag between the moment the batch loss starts exploding and the moment the explosion becomes clear in the graph. So you should choose a slightly smaller learning rate than you would have chosen with the \"noisy\" graph. Alternatively, you can tweak the `ExponentialLearningRate` callback above so it computes the batch loss (based on the current mean loss and the previous mean loss):\n",
"\n",
"```python\n",
"class ExponentialLearningRate(keras.callbacks.Callback):\n",
" def __init__(self, factor):\n",
" self.factor = factor\n",
" self.rates = []\n",
" self.losses = []\n",
" def on_epoch_begin(self, epoch, logs=None):\n",
" self.prev_loss = 0\n",
" def on_batch_end(self, batch, logs=None):\n",
" batch_loss = logs[\"loss\"] * (batch + 1) - self.prev_loss * batch\n",
" self.prev_loss = logs[\"loss\"]\n",
" self.rates.append(K.get_value(self.model.optimizer.lr))\n",
" self.losses.append(batch_loss)\n",
" K.set_value(self.model.optimizer.lr, self.model.optimizer.lr * self.factor)\n",
"```"
]
},
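{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check of this recovery formula (a minimal sketch with made-up numbers, not part of the original notebook): if `mean_loss` is the mean loss over the first `batch + 1` batches, then `mean_loss * (batch + 1)` is the running sum of batch losses, so subtracting the previous running sum yields the latest batch loss exactly:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"batch_losses = [2.0, 1.0, 4.0, 3.0]  # hypothetical per-batch losses\n",
"# running mean after each batch, as reported in logs[\"loss\"] since TF 2.2\n",
"running_means = np.cumsum(batch_losses) / np.arange(1, len(batch_losses) + 1)\n",
"\n",
"prev_loss = 0  # as in on_epoch_begin()\n",
"for batch, mean_loss in enumerate(running_means):\n",
"    batch_loss = mean_loss * (batch + 1) - prev_loss * batch\n",
"    prev_loss = mean_loss\n",
"    print(round(batch_loss, 6))  # prints 2.0, 1.0, 4.0, 3.0\n",
"```"
]
},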
{
"cell_type": "code",
"execution_count": 97,