Fix Hessian notation

main
Aurélien Geron 2020-04-12 20:28:34 +12:00
parent 6285443c78
commit 500c53756b
1 changed files with 7 additions and 7 deletions

@ -1014,7 +1014,7 @@
"source": [ "source": [
"# Higher order derivatives\n", "# Higher order derivatives\n",
"\n", "\n",
"What happens if we try to differentiate the function $f'(x)$? Well, we get the so-called second order derivative, noted $f''(x)$, or $\\dfrac{\\mathrm{d}f}{\\mathrm{d}^2x}$. If we repeat the process by differentiating $f''(x)$, we get the third-order derivative $f'''(x)$, or $\\dfrac{\\mathrm{d}f}{\\mathrm{d}^3x}$. And we could go on to get higher order derivatives.\n", "What happens if we try to differentiate the function $f'(x)$? Well, we get the so-called second order derivative, noted $f''(x)$, or $\\dfrac{\\mathrm{d}^2f}{\\mathrm{d}x^2}$. If we repeat the process by differentiating $f''(x)$, we get the third-order derivative $f'''(x)$, or $\\dfrac{\\mathrm{d}^3f}{\\mathrm{d}x^3}$. And we could go on to get higher order derivatives.\n",
"\n", "\n",
"What's the intuition behind second order derivatives? Well, since the (first order) derivative represents the instantaneous rate of change of $f$ at each point, the second order derivative represents the instantaneous rate of change of the rate of change itself, in other words, you can think of it as the **acceleration** of the curve: if $f''(x) < 0$, then the curve is accelerating \"downwards\", if $f''(x) > 0$ then the curve is accelerating \"upwards\", and if $f''(x) = 0$, then the curve is locally a straight line. Note that a curve could be going upwards (i.e., $f'(x)>0$) but also be accelerating downwards (i.e., $f''(x) < 0$): for example, imagine the path of a stone thrown upwards, as it is being slowed down by gravity (which constantly accelerates the stone downwards).\n", "What's the intuition behind second order derivatives? Well, since the (first order) derivative represents the instantaneous rate of change of $f$ at each point, the second order derivative represents the instantaneous rate of change of the rate of change itself, in other words, you can think of it as the **acceleration** of the curve: if $f''(x) < 0$, then the curve is accelerating \"downwards\", if $f''(x) > 0$ then the curve is accelerating \"upwards\", and if $f''(x) = 0$, then the curve is locally a straight line. Note that a curve could be going upwards (i.e., $f'(x)>0$) but also be accelerating downwards (i.e., $f''(x) < 0$): for example, imagine the path of a stone thrown upwards, as it is being slowed down by gravity (which constantly accelerates the stone downwards).\n",
"\n", "\n",
@ -1236,14 +1236,14 @@
"# Hessians\n", "# Hessians\n",
"\n", "\n",
"Let's come back to a function $f(\\mathbf{x})$ which takes an $n$-dimensional vector as input and outputs a scalar. If you determine the equation of the partial derivative of $f$ with regards to $x_i$ (the $i^\\text{th}$ component of $\\mathbf{x}$), you will get a new function of $\\mathbf{x}$: $\\dfrac{\\partial f}{\\partial x_i}$. You can then compute the partial derivative of this function with regards to $x_j$ (the $j^\\text{th}$ component of $\\mathbf{x}$). The result is a partial derivative of a partial derivative: in other words, it is a **second order partial derivatives**, also called a **Hessian**. It is noted $\\mathbf{x}$: $\\dfrac{\\partial^2 f}{\\partial x_jx_i}$. If $i\\neq j$ then it is called a **mixed second order partial derivative**.\n", "Let's come back to a function $f(\\mathbf{x})$ which takes an $n$-dimensional vector as input and outputs a scalar. If you determine the equation of the partial derivative of $f$ with regards to $x_i$ (the $i^\\text{th}$ component of $\\mathbf{x}$), you will get a new function of $\\mathbf{x}$: $\\dfrac{\\partial f}{\\partial x_i}$. You can then compute the partial derivative of this function with regards to $x_j$ (the $j^\\text{th}$ component of $\\mathbf{x}$). The result is a partial derivative of a partial derivative: in other words, it is a **second order partial derivatives**, also called a **Hessian**. It is noted $\\mathbf{x}$: $\\dfrac{\\partial^2 f}{\\partial x_jx_i}$. If $i\\neq j$ then it is called a **mixed second order partial derivative**.\n",
"Or else, if $j=i$, it is noted $\\dfrac{\\partial^2 f}{\\partial^2 x_i}$\n", "Or else, if $j=i$, it is noted $\\dfrac{\\partial^2 f}{\\partial {x_i}^2}$\n",
"\n", "\n",
"Let's look at an example: $f(x, y)=\\sin(xy)$. As we showed earlier, the first order partial derivatives of $f$ are: $\\dfrac{\\partial f}{\\partial x}=y\\cos(xy)$ and $\\dfrac{\\partial f}{\\partial y}=x\\cos(xy)$. So we can now compute all the Hessians (using the derivative rules we discussed earlier):\n", "Let's look at an example: $f(x, y)=\\sin(xy)$. As we showed earlier, the first order partial derivatives of $f$ are: $\\dfrac{\\partial f}{\\partial x}=y\\cos(xy)$ and $\\dfrac{\\partial f}{\\partial y}=x\\cos(xy)$. So we can now compute all the Hessians (using the derivative rules we discussed earlier):\n",
"\n", "\n",
"* $\\dfrac{\\partial^2 f}{\\partial^2 x} = \\dfrac{\\partial f}{\\partial x}\\left[y\\cos(xy)\\right] = -y^2\\sin(xy)$\n", "* $\\dfrac{\\partial^2 f}{\\partial x^2} = \\dfrac{\\partial f}{\\partial x}\\left[y\\cos(xy)\\right] = -y^2\\sin(xy)$\n",
"* $\\dfrac{\\partial^2 f}{\\partial y\\,\\partial x} = \\dfrac{\\partial f}{\\partial y}\\left[y\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n", "* $\\dfrac{\\partial^2 f}{\\partial y\\,\\partial x} = \\dfrac{\\partial f}{\\partial y}\\left[y\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n",
"* $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial f}{\\partial x}\\left[x\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n", "* $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial f}{\\partial x}\\left[x\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n",
"* $\\dfrac{\\partial^2 f}{\\partial^2 y} = \\dfrac{\\partial f}{\\partial y}\\left[x\\cos(xy)\\right] = -x^2\\sin(xy)$\n", "* $\\dfrac{\\partial^2 f}{\\partial y^2} = \\dfrac{\\partial f}{\\partial y}\\left[x\\cos(xy)\\right] = -x^2\\sin(xy)$\n",
"\n", "\n",
"Note that $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial^2 f}{\\partial y\\,\\partial x}$. This is the case whenever all the partial derivatives are defined and continuous in a neighborhood around the point at which we differentiate.\n", "Note that $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial^2 f}{\\partial y\\,\\partial x}$. This is the case whenever all the partial derivatives are defined and continuous in a neighborhood around the point at which we differentiate.\n",
"\n", "\n",
@ -1251,19 +1251,19 @@
"\n", "\n",
"$\n", "$\n",
"\\mathbf{H}_f(\\mathbf{x}_\\mathbf{A}) = \\begin{pmatrix}\n", "\\mathbf{H}_f(\\mathbf{x}_\\mathbf{A}) = \\begin{pmatrix}\n",
"\\dfrac{\\partial^2 f}{\\partial^2 x_1}(\\mathbf{x}_\\mathbf{A})\n", "\\dfrac{\\partial^2 f}{\\partial {x_1}^2}(\\mathbf{x}_\\mathbf{A})\n",
"&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_2}(\\mathbf{x}_\\mathbf{A})\n", "&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_2}(\\mathbf{x}_\\mathbf{A})\n",
"&& \\dots\n", "&& \\dots\n",
"&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n", "&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
"\\dfrac{\\partial^2 f}{\\partial x_2\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n", "\\dfrac{\\partial^2 f}{\\partial x_2\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n",
"&& \\dfrac{\\partial^2 f}{\\partial^2 x_2}(\\mathbf{x}_\\mathbf{A})\n", "&& \\dfrac{\\partial^2 f}{\\partial {x_2}^2}(\\mathbf{x}_\\mathbf{A})\n",
"&& \\dots\n", "&& \\dots\n",
"&& \\dfrac{\\partial^2 f}{\\partial x_2\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n", "&& \\dfrac{\\partial^2 f}{\\partial x_2\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
"\\vdots && \\vdots && \\ddots && \\vdots \\\\\n", "\\vdots && \\vdots && \\ddots && \\vdots \\\\\n",
"\\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n", "\\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n",
"&& \\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_2}(\\mathbf{x}_\\mathbf{A})\n", "&& \\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_2}(\\mathbf{x}_\\mathbf{A})\n",
"&& \\dots\n", "&& \\dots\n",
"&& \\dfrac{\\partial^2 f}{\\partial^2 x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n", "&& \\dfrac{\\partial^2 f}{\\partial {x_n}^2}(\\mathbf{x}_\\mathbf{A})\\\\\n",
"\\end{pmatrix}\n", "\\end{pmatrix}\n",
"$" "$"
] ]
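
The four second order partial derivatives of $f(x, y)=\sin(xy)$ listed in the hunk above can be sanity-checked numerically. The following is a minimal sketch and not part of the commit: it assumes only the Python standard library, and the helper names (`df_dx`, `df_dy`, `central_diff`, `analytic_hessian`), the step size `h` and the test point `(0.7, 1.3)` are all hypothetical choices made for illustration. It differentiates the first order partial derivatives with central differences and compares the result with the closed-form expressions.

```python
import math

# First order partial derivatives of f(x, y) = sin(xy), as stated in the text.
def df_dx(x, y):
    return y * math.cos(x * y)

def df_dy(x, y):
    return x * math.cos(x * y)

def central_diff(g, x, y, wrt, h=1e-5):
    """Central-difference estimate of the partial derivative of g at (x, y).

    wrt selects the variable to differentiate with respect to ("x" or "y").
    The step size h is an illustrative choice, not a tuned value.
    """
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

def analytic_hessian(x, y):
    """Closed-form second order partial derivatives from the bullet list."""
    s, c = math.sin(x * y), math.cos(x * y)
    mixed = c - x * y * s
    return [[-y**2 * s, mixed],
            [mixed, -x**2 * s]]

x0, y0 = 0.7, 1.3  # arbitrary test point

# Row i, column j holds the derivative with respect to x_i of df/dx_j,
# matching the layout of the Hessian matrix in the text.
H_num = [
    [central_diff(df_dx, x0, y0, "x"), central_diff(df_dy, x0, y0, "x")],
    [central_diff(df_dx, x0, y0, "y"), central_diff(df_dy, x0, y0, "y")],
]
H_ana = analytic_hessian(x0, y0)

max_error = max(abs(H_num[i][j] - H_ana[i][j])
                for i in range(2) for j in range(2))
print("largest absolute difference:", max_error)
print("difference between the mixed partials:", abs(H_num[0][1] - H_num[1][0]))
```

Because the two mixed partials are estimated independently (one from `df_dx`, one from `df_dy`), their agreement is a small numerical illustration of the symmetry noted in the text, not a proof of it.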