Fix Hessian notation
parent
6285443c78
commit
500c53756b
|
@ -1014,7 +1014,7 @@
|
||||||
"source": [
|
"source": [
|
||||||
"# Higher order derivatives\n",
|
"# Higher order derivatives\n",
|
||||||
"\n",
|
"\n",
|
||||||
"What happens if we try to differentiate the function $f'(x)$? Well, we get the so-called second order derivative, noted $f''(x)$, or $\\dfrac{\\mathrm{d}f}{\\mathrm{d}^2x}$. If we repeat the process by differentiating $f''(x)$, we get the third-order derivative $f'''(x)$, or $\\dfrac{\\mathrm{d}f}{\\mathrm{d}^3x}$. And we could go on to get higher order derivatives.\n",
|
"What happens if we try to differentiate the function $f'(x)$? Well, we get the so-called second order derivative, noted $f''(x)$, or $\\dfrac{\\mathrm{d}^2f}{\\mathrm{d}x^2}$. If we repeat the process by differentiating $f''(x)$, we get the third-order derivative $f'''(x)$, or $\\dfrac{\\mathrm{d}^3f}{\\mathrm{d}x^3}$. And we could go on to get higher order derivatives.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"What's the intuition behind second order derivatives? Well, since the (first order) derivative represents the instantaneous rate of change of $f$ at each point, the second order derivative represents the instantaneous rate of change of the rate of change itself, in other words, you can think of it as the **acceleration** of the curve: if $f''(x) < 0$, then the curve is accelerating \"downwards\", if $f''(x) > 0$ then the curve is accelerating \"upwards\", and if $f''(x) = 0$, then the curve is locally a straight line. Note that a curve could be going upwards (i.e., $f'(x)>0$) but also be accelerating downwards (i.e., $f''(x) < 0$): for example, imagine the path of a stone thrown upwards, as it is being slowed down by gravity (which constantly accelerates the stone downwards).\n",
|
"What's the intuition behind second order derivatives? Well, since the (first order) derivative represents the instantaneous rate of change of $f$ at each point, the second order derivative represents the instantaneous rate of change of the rate of change itself, in other words, you can think of it as the **acceleration** of the curve: if $f''(x) < 0$, then the curve is accelerating \"downwards\", if $f''(x) > 0$ then the curve is accelerating \"upwards\", and if $f''(x) = 0$, then the curve is locally a straight line. Note that a curve could be going upwards (i.e., $f'(x)>0$) but also be accelerating downwards (i.e., $f''(x) < 0$): for example, imagine the path of a stone thrown upwards, as it is being slowed down by gravity (which constantly accelerates the stone downwards).\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
@ -1236,14 +1236,14 @@
|
||||||
"# Hessians\n",
|
"# Hessians\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Let's come back to a function $f(\\mathbf{x})$ which takes an $n$-dimensional vector as input and outputs a scalar. If you determine the equation of the partial derivative of $f$ with regards to $x_i$ (the $i^\\text{th}$ component of $\\mathbf{x}$), you will get a new function of $\\mathbf{x}$: $\\dfrac{\\partial f}{\\partial x_i}$. You can then compute the partial derivative of this function with regards to $x_j$ (the $j^\\text{th}$ component of $\\mathbf{x}$). The result is a partial derivative of a partial derivative: in other words, it is a **second order partial derivatives**, also called a **Hessian**. It is noted $\\mathbf{x}$: $\\dfrac{\\partial^2 f}{\\partial x_jx_i}$. If $i\\neq j$ then it is called a **mixed second order partial derivative**.\n",
|
"Let's come back to a function $f(\\mathbf{x})$ which takes an $n$-dimensional vector as input and outputs a scalar. If you determine the equation of the partial derivative of $f$ with regards to $x_i$ (the $i^\\text{th}$ component of $\\mathbf{x}$), you will get a new function of $\\mathbf{x}$: $\\dfrac{\\partial f}{\\partial x_i}$. You can then compute the partial derivative of this function with regards to $x_j$ (the $j^\\text{th}$ component of $\\mathbf{x}$). The result is a partial derivative of a partial derivative: in other words, it is a **second order partial derivatives**, also called a **Hessian**. It is noted $\\mathbf{x}$: $\\dfrac{\\partial^2 f}{\\partial x_jx_i}$. If $i\\neq j$ then it is called a **mixed second order partial derivative**.\n",
|
||||||
"Or else, if $j=i$, it is noted $\\dfrac{\\partial^2 f}{\\partial^2 x_i}$\n",
|
"Or else, if $j=i$, it is noted $\\dfrac{\\partial^2 f}{\\partial {x_i}^2}$\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Let's look at an example: $f(x, y)=\\sin(xy)$. As we showed earlier, the first order partial derivatives of $f$ are: $\\dfrac{\\partial f}{\\partial x}=y\\cos(xy)$ and $\\dfrac{\\partial f}{\\partial y}=x\\cos(xy)$. So we can now compute all the Hessians (using the derivative rules we discussed earlier):\n",
|
"Let's look at an example: $f(x, y)=\\sin(xy)$. As we showed earlier, the first order partial derivatives of $f$ are: $\\dfrac{\\partial f}{\\partial x}=y\\cos(xy)$ and $\\dfrac{\\partial f}{\\partial y}=x\\cos(xy)$. So we can now compute all the Hessians (using the derivative rules we discussed earlier):\n",
|
||||||
"\n",
|
"\n",
|
||||||
"* $\\dfrac{\\partial^2 f}{\\partial^2 x} = \\dfrac{\\partial f}{\\partial x}\\left[y\\cos(xy)\\right] = -y^2\\sin(xy)$\n",
|
"* $\\dfrac{\\partial^2 f}{\\partial x^2} = \\dfrac{\\partial f}{\\partial x}\\left[y\\cos(xy)\\right] = -y^2\\sin(xy)$\n",
|
||||||
"* $\\dfrac{\\partial^2 f}{\\partial y\\,\\partial x} = \\dfrac{\\partial f}{\\partial y}\\left[y\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n",
|
"* $\\dfrac{\\partial^2 f}{\\partial y\\,\\partial x} = \\dfrac{\\partial f}{\\partial y}\\left[y\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n",
|
||||||
"* $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial f}{\\partial x}\\left[x\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n",
|
"* $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial f}{\\partial x}\\left[x\\cos(xy)\\right] = \\cos(xy) - xy\\sin(xy)$\n",
|
||||||
"* $\\dfrac{\\partial^2 f}{\\partial^2 y} = \\dfrac{\\partial f}{\\partial y}\\left[x\\cos(xy)\\right] = -x^2\\sin(xy)$\n",
|
"* $\\dfrac{\\partial^2 f}{\\partial y^2} = \\dfrac{\\partial f}{\\partial y}\\left[x\\cos(xy)\\right] = -x^2\\sin(xy)$\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Note that $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial^2 f}{\\partial y\\,\\partial x}$. This is the case whenever all the partial derivatives are defined and continuous in a neighborhood around the point at which we differentiate.\n",
|
"Note that $\\dfrac{\\partial^2 f}{\\partial x\\,\\partial y} = \\dfrac{\\partial^2 f}{\\partial y\\,\\partial x}$. This is the case whenever all the partial derivatives are defined and continuous in a neighborhood around the point at which we differentiate.\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
@ -1251,19 +1251,19 @@
|
||||||
"\n",
|
"\n",
|
||||||
"$\n",
|
"$\n",
|
||||||
"\\mathbf{H}_f(\\mathbf{x}_\\mathbf{A}) = \\begin{pmatrix}\n",
|
"\\mathbf{H}_f(\\mathbf{x}_\\mathbf{A}) = \\begin{pmatrix}\n",
|
||||||
"\\dfrac{\\partial^2 f}{\\partial^2 x_1}(\\mathbf{x}_\\mathbf{A})\n",
|
"\\dfrac{\\partial^2 f}{\\partial {x_1}^2}(\\mathbf{x}_\\mathbf{A})\n",
|
||||||
"&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_2}(\\mathbf{x}_\\mathbf{A})\n",
|
"&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_2}(\\mathbf{x}_\\mathbf{A})\n",
|
||||||
"&& \\dots\n",
|
"&& \\dots\n",
|
||||||
"&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
|
"&& \\dfrac{\\partial^2 f}{\\partial x_1\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
|
||||||
"\\dfrac{\\partial^2 f}{\\partial x_2\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n",
|
"\\dfrac{\\partial^2 f}{\\partial x_2\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n",
|
||||||
"&& \\dfrac{\\partial^2 f}{\\partial^2 x_2}(\\mathbf{x}_\\mathbf{A})\n",
|
"&& \\dfrac{\\partial^2 f}{\\partial {x_2}^2}(\\mathbf{x}_\\mathbf{A})\n",
|
||||||
"&& \\dots\n",
|
"&& \\dots\n",
|
||||||
"&& \\dfrac{\\partial^2 f}{\\partial x_2\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
|
"&& \\dfrac{\\partial^2 f}{\\partial x_2\\, \\partial x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
|
||||||
"\\vdots && \\vdots && \\ddots && \\vdots \\\\\n",
|
"\\vdots && \\vdots && \\ddots && \\vdots \\\\\n",
|
||||||
"\\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n",
|
"\\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_1}(\\mathbf{x}_\\mathbf{A})\n",
|
||||||
"&& \\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_2}(\\mathbf{x}_\\mathbf{A})\n",
|
"&& \\dfrac{\\partial^2 f}{\\partial x_n\\,\\partial x_2}(\\mathbf{x}_\\mathbf{A})\n",
|
||||||
"&& \\dots\n",
|
"&& \\dots\n",
|
||||||
"&& \\dfrac{\\partial^2 f}{\\partial^2 x_n}(\\mathbf{x}_\\mathbf{A})\\\\\n",
|
"&& \\dfrac{\\partial^2 f}{\\partial {x_n}^2}(\\mathbf{x}_\\mathbf{A})\\\\\n",
|
||||||
"\\end{pmatrix}\n",
|
"\\end{pmatrix}\n",
|
||||||
"$"
|
"$"
|
||||||
]
|
]
|
||||||
|
|
Loading…
Reference in New Issue