From dc2b38fc8376550d7d4a1c84f3977658e2b7c2aa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien=20Geron?=
Date: Fri, 18 May 2018 17:09:59 +0200
Subject: [PATCH] Add a comment about why Deep Learning prefers reverse mode
 autodiff rather than forward mode

---
 extra_autodiff.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/extra_autodiff.ipynb b/extra_autodiff.ipynb
index ebc40bd..cdcc42c 100644
--- a/extra_autodiff.ipynb
+++ b/extra_autodiff.ipynb
@@ -431,7 +431,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Of course both approaches give the same result (except for rounding errors), and with a single input and output they involve the same number of computations. But when there are several inputs or outputs, they can have very different performance. Indeed, if there are many inputs, the right-most terms will be needed to compute the partial derivatives with regards to each input, so it is a good idea to compute these right-most terms first. That means using reverse-mode autodiff. This way, the right-most terms can be computed just once and used to compute all the partial derivatives. Conversely, if there are many outputs, forward-mode is generally preferable because the left-most terms can be computed just once to compute the partial derivatives of the different outputs. In Deep Learning, there are typically thousands of model parameters, meaning there are lots of inputs, but few outputs. In fact, there is generally just one output during training: the loss."
+    "Of course both approaches give the same result (except for rounding errors), and with a single input and output they involve the same number of computations. But when there are several inputs or outputs, they can have very different performance. Indeed, if there are many inputs, the right-most terms will be needed to compute the partial derivatives with regards to each input, so it is a good idea to compute these right-most terms first. That means using reverse-mode autodiff. This way, the right-most terms can be computed just once and used to compute all the partial derivatives. Conversely, if there are many outputs, forward-mode is generally preferable because the left-most terms can be computed just once to compute the partial derivatives of the different outputs. In Deep Learning, there are typically thousands of model parameters, meaning there are lots of inputs, but few outputs. In fact, there is generally just one output during training: the loss. This is why reverse mode autodiff is used in TensorFlow and all major Deep Learning libraries."
   ]
  },
  {
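
The sentence added by this patch points at reverse-mode autodiff as the mode used by TensorFlow. As a minimal sketch of the many-inputs/one-output situation it describes (not part of the patched notebook, and assuming TensorFlow 2.x eager execution with tf.GradientTape), a single reverse pass yields the partial derivative of the scalar loss with respect to every parameter at once:

    import tensorflow as tf

    # Toy setup: many inputs (the "model parameters"), one scalar output (the loss).
    w = tf.Variable(tf.random.normal([1000]))   # 1000 parameters
    x = tf.random.normal([1000])                # fixed data

    with tf.GradientTape() as tape:
        loss = tf.reduce_sum(tf.square(w * x))  # single scalar output: the loss

    # Reverse-mode autodiff: one backward pass computes all 1000 partial derivatives.
    grads = tape.gradient(loss, w)
    print(grads.shape)  # (1000,)

    # Forward-mode autodiff would instead need one pass per input, i.e. 1000 passes here.

Forward mode would only become the cheaper choice in the opposite situation, with few inputs and many outputs, which is rare during Deep Learning training.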