Update differential calculus tutorial

2022-05-20 13:17:31 +09:00 · 2022-05-20 13:17:31 +09:00 · 42525c610c
parent 609ca0bcb0
commit 42525c610c
1 changed files with 26 additions and 29 deletions
--- a/math_differential_calculus.ipynb
+++ b/math_differential_calculus.ipynb
@ -49,7 +49,6 @@
   "outputs": [],
   "source": [
    "#@title\n",
    "%matplotlib inline\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
@ -167,7 +166,7 @@
    "id": "gcb7eqkmGGXf"
   },
   "source": [
-    "But what if you want to know the slope of something else than a straight line? For example, let's consider the curve defined by $y = f(x) = x^2$:"
+    "But what if you want to know the slope of something other than a straight line? For example, let's consider the curve defined by $y = f(x) = x^2$:"
   ]
  },
  {
@ -226,7 +225,7 @@
    "id": "4qCXg9nQSp6S"
   },
   "source": [
-    "How can we put numbers on these intuitions? Well, say we want to estimate the slope of the curve at a point $\\mathrm{A}$, we can do this by taking another point $\\mathrm{B}$ on the curve, not too far away, and then computing the slope between these two points:\n"
+    "How can we put numbers on these intuitions? Well, say we want to estimate the slope of the curve at a point $\\mathrm{A}$. We can do this by taking another point $\\mathrm{B}$ on the curve, not too far away, and then computing the slope between these two points:\n"
   ]
  },
  {
@ -963,7 +962,7 @@
   "source": [
    "# Differentiability\n",
    "\n",
-    "Note that some functions are not quite as well-behaved as $x^2$: for example, consider the function $f(x)=|x|$, the absolute value of $x$:"
+    "Note that some functions are not quite as well-behaved as $x^2$. For example, consider the function $f(x)=|x|$, the absolute value of $x$:"
   ]
  },
  {
@ -2231,7 +2230,7 @@
    "& = \\underset{x_\\mathrm{B} \\to x_\\mathrm{A}}\\lim x_\\mathrm{B} \\, + \\underset{x_\\mathrm{B} \\to x_\\mathrm{A}}\\lim x_\\mathrm{A}\\quad && \\text{since the limit of a sum is the sum of the limits}\\\\\n",
    "& = x_\\mathrm{A} \\, + \\underset{x_\\mathrm{B} \\to x_\\mathrm{A}}\\lim x_\\mathrm{A} \\quad && \\text{since } x_\\mathrm{B}\\text{ approaches } x_\\mathrm{A} \\\\\n",
    "& = x_\\mathrm{A} + x_\\mathrm{A} \\quad && \\text{since } x_\\mathrm{A} \\text{ remains constant when } x_\\mathrm{B}\\text{ approaches } x_\\mathrm{A} \\\\\n",
-    "& = 2 x_\\mathrm{A}\n",
+    "& = 2x_\\mathrm{A} &&\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
@ -2307,7 +2306,7 @@
    "& = \\underset{\\epsilon \\to 0}\\lim\\dfrac{{x}^2 + 2x\\epsilon + \\epsilon^2 - {x}^2}{\\epsilon}\\quad && \\text{since } (x + \\epsilon)^2 = {x}^2 + 2x\\epsilon + \\epsilon^2\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim\\dfrac{2x\\epsilon + \\epsilon^2}{\\epsilon}\\quad && \\text{since the two } {x}^2 \\text{ cancel out}\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim \\, (2x + \\epsilon)\\quad && \\text{since } 2x\\epsilon \\text{ and } \\epsilon^2 \\text{ can both be divided by } \\epsilon\\\\\n",
-    "& = 2 x\n",
+    "& = 2x &&\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
@ -2343,7 +2342,7 @@
    "\n",
    "The $f'$ notation is Lagrange's notation, while $\\dfrac{\\mathrm{d}f}{\\mathrm{d}x}$ is Leibniz's notation.\n",
    "\n",
-    "There are also other less common notations, such as Newton's notation $\\dot y$ (assuming $y = f(x)$) or Euler's notation $\\mathrm{D}f$."
+    "There are other less common notations, such as Newton's notation $\\dot y$ (assuming $y = f(x)$) or Euler's notation $\\mathrm{D}f$."
   ]
  },
  {
@ -4164,7 +4163,7 @@
    "\n",
    "It is possible to chain many functions. For example, if $f(x)=g(h(i(x)))$, and we define $y=i(x)$ and $z=h(y)$, then $\\dfrac{\\mathrm{d}f}{\\mathrm{d}x} = \\dfrac{\\mathrm{d}f}{\\mathrm{d}z} \\dfrac{\\mathrm{d}z}{\\mathrm{d}y} \\dfrac{\\mathrm{d}y}{\\mathrm{d}x}$. Using Lagrange's notation, we get $f'(x)=g'(z)\\,h'(y)\\,i'(x)=g'(h(i(x)))\\,h'(i(x))\\,i'(x)$\n",
    "\n",
-    "The chain rule is crucial in Deep Learning, as a neural network is basically as a long composition of functions. For example, a 3-layer dense neural network corresponds to the following function: $f(\\mathbf{x})=\\operatorname{Dense}_3(\\operatorname{Dense}_2(\\operatorname{Dense}_1(\\mathbf{x})))$ (in this example, $\\operatorname{Dense}_3$ is the output layer).\n"
+    "The chain rule is crucial in Deep Learning, as a neural network is basically a long composition of functions. For example, a 3-layer dense neural network corresponds to the following function: $f(\\mathbf{x})=\\operatorname{Dense}_3(\\operatorname{Dense}_2(\\operatorname{Dense}_1(\\mathbf{x})))$ (in this example, $\\operatorname{Dense}_3$ is the output layer).\n"
   ]
  },
  {
@ -4296,7 +4295,7 @@
    "\n",
    "At each iteration, the step size is proportional to the slope, so the process naturally slows down as it approaches a local minimum. Each step is also proportional to the learning rate: a parameter of the Gradient Descent algorithm itself (since it is not a parameter of the function we are optimizing, it is called a **hyperparameter**).\n",
    "\n",
-    "Here is an animation of this process on the function $f(x)=\\dfrac{1}{4}x^4 - x^2 + \\dfrac{1}{2}$:"
+    "Here is an animation of this process for the function $f(x)=\\dfrac{1}{4}x^4 - x^2 + \\dfrac{1}{2}$:"
   ]
  },
  {
@ -5253,8 +5252,6 @@
   ],
   "source": [
    "#@title\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "\n",
    "def plot_3d(f, title):\n",
    "    fig = plt.figure(figsize=(8, 5))\n",
    "    ax = fig.add_subplot(111, projection='3d')\n",
@ -5367,7 +5364,7 @@
    "$\\nabla f(\\mathbf{x}_\\mathrm{A}) = \\begin{pmatrix}\n",
    "\\dfrac{\\partial f}{\\partial x_1}(\\mathbf{x}_\\mathrm{A})\\\\\n",
    "\\dfrac{\\partial f}{\\partial x_2}(\\mathbf{x}_\\mathrm{A})\\\\\n",
-    "\\vdots\\\\\\\n",
+    "\\vdots\\\\\n",
    "\\dfrac{\\partial f}{\\partial x_n}(\\mathbf{x}_\\mathrm{A})\\\\\n",
    "\\end{pmatrix}$"
   ]
@ -5407,7 +5404,7 @@
   "source": [
    "# Jacobians\n",
    "\n",
-    "Until now we have only considered functions that output a scalar, but it is possible to output vectors instead. For example, a classification neural network typically outputs one probability for each class, so if there are $m$ classes, the neural network will output an $d$-dimensional vector for each input.\n",
+    "Until now, we have only considered functions that output a scalar, but it is possible to output vectors instead. For example, a classification neural network typically outputs one probability for each class, so if there are $m$ classes, the neural network will output a $d$-dimensional vector for each input.\n",
    "\n",
    "In Deep Learning we generally only need to differentiate the loss function, which almost always outputs a single scalar number. But suppose for a second that you want to differentiate a function $\\mathbf{f}(\\mathbf{x})$ which outputs $d$-dimensional vectors. The good news is that you can treat each _output_ dimension independently of the others. This will give you a partial derivative for each input dimension and each output dimension. If you put them all in a single matrix, with one column per input dimension and one row per output dimension, you get the so-called **Jacobian matrix**.\n",
    "\n",
@ -5532,7 +5529,7 @@
    "& = \\underset{\\epsilon \\to 0}\\lim\\dfrac{g(x+\\epsilon)h(x+\\epsilon) - g(x)h(x+\\epsilon)}{\\epsilon} + \\underset{\\epsilon \\to 0}\\lim\\dfrac{g(x)h(x + \\epsilon) - g(x)h(x)}{\\epsilon} && \\quad \\text{since the limit of a sum is the sum of the limits}\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{g(x+\\epsilon) - g(x)}{\\epsilon}h(x+\\epsilon)\\right]} \\,+\\, \\underset{\\epsilon \\to 0}\\lim{\\left[g(x)\\dfrac{h(x + \\epsilon) - h(x)}{\\epsilon}\\right]} && \\quad \\text{factorizing }h(x+\\epsilon) \\text{ and } g(x)\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{g(x+\\epsilon) - g(x)}{\\epsilon}h(x+\\epsilon)\\right]} \\,+\\, g(x)\\underset{\\epsilon \\to 0}\\lim{\\dfrac{h(x + \\epsilon) - h(x)}{\\epsilon}} && \\quad \\text{taking } g(x) \\text{ out of the limit since it does not depend on }\\epsilon\\\\\n",
-    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{g(x+\\epsilon) - g(x)}{\\epsilon}h(x+\\epsilon)\\right]} \\,+\\, g(x)h'(x) && \\quad \\text{using the definition of h'(x)}\\\\\n",
+    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{g(x+\\epsilon) - g(x)}{\\epsilon}h(x+\\epsilon)\\right]} \\,+\\, g(x)h'(x) && \\quad \\text{using the definition of }h'(x)\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{g(x+\\epsilon) - g(x)}{\\epsilon}\\right]}\\underset{\\epsilon \\to 0}\\lim{h(x+\\epsilon)} + g(x)h'(x) && \\quad \\text{since the limit of a product is the product of the limits}\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{g(x+\\epsilon) - g(x)}{\\epsilon}\\right]}h(x) + h(x)g'(x) && \\quad \\text{since } h(x) \\text{ is continuous}\\\\\n",
    "& = g'(x)h(x) + g(x)h'(x) && \\quad \\text{using the definition of }g'(x)\n",
@ -5620,7 +5617,7 @@
    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{1}{\\epsilon} \\, \\ln\\left(1 + \\dfrac{\\epsilon}{x}\\right)\\right]} && \\quad \\text{just moving things around a bit}\\\\\n",
    "& = \\underset{\\epsilon \\to 0}\\lim{\\left[\\dfrac{1}{xu} \\, \\ln\\left(1 + u\\right)\\right]} && \\quad \\text{defining }u=\\dfrac{\\epsilon}{x} \\text{ and thus } \\epsilon=xu\\\\\n",
    "& = \\underset{u \\to 0}\\lim{\\left[\\dfrac{1}{xu} \\, \\ln\\left(1 + u\\right)\\right]} && \\quad \\text{replacing } \\underset{\\epsilon \\to 0}\\lim \\text{ with } \\underset{u \\to 0}\\lim \\text{ since }\\underset{\\epsilon \\to 0}\\lim u=0\\\\\n",
-    "& = \\underset{u \\to 0}\\lim{\\left[\\dfrac{1}{x} \\, \\ln\\left((1 + u)^{1/u}\\right)\\right]} && \\quad \\text{since }a\\ln(b)=\\ln(a^b)\\\\\n",
+    "& = \\underset{u \\to 0}\\lim{\\left[\\dfrac{1}{x} \\, \\ln\\left((1 + u)^{1/u}\\right)\\right]} && \\quad \\text{since }a\\ln(b)=\\ln(b^a)\\\\\n",
    "& = \\dfrac{1}{x}\\underset{u \\to 0}\\lim{\\left[\\ln\\left((1 + u)^{1/u}\\right)\\right]} && \\quad \\text{taking }\\dfrac{1}{x} \\text{ out since it does not depend on }\\epsilon\\\\\n",
    "& = \\dfrac{1}{x}\\ln\\left(\\underset{u \\to 0}\\lim{(1 + u)^{1/u}}\\right) && \\quad \\text{taking }\\ln\\text{ out since it is a continuous function}\\\\\n",
    "& = \\dfrac{1}{x}\\ln(e) && \\quad \\text{since }e=\\underset{u \\to 0}\\lim{(1 + u)^{1/u}}\\\\\n",
@ -5644,9 +5641,9 @@
    "\n",
    "We know the derivative of the exponential: $g'(x)=e^x$. We also know the derivative of the natural logarithm: $\\ln'(x)=\\dfrac{1}{x}$ so $h'(x)=\\dfrac{r}{x}$. Therefore:\n",
    "\n",
-    "$f'(x) = \\dfrac{r}{x}\\exp\\left({\\ln(x^r)}\\right)$\n",
+    "$f'(x) = \\dfrac{r}{x} e^{\\ln(x^r)}$\n",
    "\n",
-    "Since $a = \\exp(\\ln(a))$, this equation simplifies to:\n",
+    "Since $e^{\\ln(a)} = a$, this equation simplifies to:\n",
    "\n",
    "$f'(x) = \\dfrac{r}{x} x^r$\n",
    "\n",
@ -5657,7 +5654,7 @@
    "Note that the power rule works for any $r \\neq 0$, including negative numbers and real numbers. For example:\n",
    "\n",
    "* if $f(x) = \\dfrac{1}{x} = x^{-1}$, then $f'(x)=-x^{-2}=-\\dfrac{1}{x^2}$.\n",
-    "* if $f(x) = \\sqrt(x) = x^{1/2}$, then $f'(x)=\\dfrac{1}{2}x^{-1/2}=\\dfrac{1}{2\\sqrt{x}}$"
+    "* if $f(x) = \\sqrt{x} = x^{1/2}$, then $f'(x)=\\dfrac{1}{2}x^{-1/2}=\\dfrac{1}{2\\sqrt{x}}$"
   ]
  },
  {
@ -5800,17 +5797,17 @@
   "source": [
    "The circle is the unit circle (radius=1).\n",
    "\n",
-    "Assuming $0 < \\theta < \\dfrac{\\pi}{2}$, the area of the blue triangle (area $\\mathrm{A}$) is equal to its height ($\\sin(\\theta)$), times its base ($\\cos(\\theta)$), divided by 2. So $\\mathrm{A} = \\dfrac{1}{2}\\sin(\\theta)\\cos(\\theta)$.\n",
+    "Assuming $0 < \\theta < \\dfrac{\\pi}{2}$, the area of the blue triangle (area $\\mathrm{A}$) is equal to its height ($\\sin(\\theta)$) times its base ($\\cos(\\theta)$) divided by 2. So $\\mathrm{A} = \\dfrac{1}{2}\\sin(\\theta)\\cos(\\theta)$.\n",
    "\n",
    "The unit circle has an area of $\\pi$, so the circular sector (in the shape of a pizza slice) has an area of A + B = $\\pi\\dfrac{\\theta}{2\\pi} = \\dfrac{\\theta}{2}$.\n",
    "\n",
-    "Next, the large triangle (A + B + C) has an area equal to its height ($\\tan(\\theta)$) multiplied by its base (1) divided by 2, so A + B + C = $\\dfrac{\\tan(\\theta)}{2}$.\n",
+    "Next, the large triangle (A + B + C) has an area equal to its height ($\\tan(\\theta)$) multiplied by its base (of length 1) divided by 2, so A + B + C = $\\dfrac{\\tan(\\theta)}{2}$.\n",
    "\n",
    "When $0 < \\theta < \\dfrac{\\pi}{2}$, we have $\\mathrm{A} < \\mathrm{A} + \\mathrm{B} < \\mathrm{A} + \\mathrm{B} + \\mathrm{C}$, therefore:\n",
    "\n",
    "$\\dfrac{1}{2}\\sin(\\theta)\\cos(\\theta) < \\dfrac{\\theta}{2} < \\dfrac{\\tan(\\theta)}{2}$\n",
    "\n",
-    "We can multiply all the terms by 2 to get rid of the $\\dfrac{1}{2}$ factors. We can also divide by $\\sin(\\theta)$, which is stricly positive (assuming $0 < \\theta < \\dfrac{\\pi}{2}$), so the inequalities still hold:\n",
+    "We can multiply all the terms by 2 to get rid of the $\\dfrac{1}{2}$ factors. We can also divide by $\\sin(\\theta)$, which is strictly positive (assuming $0 < \\theta < \\dfrac{\\pi}{2}$), so the inequalities still hold:\n",
    "\n",
    "$cos(\\theta) < \\dfrac{\\theta}{\\sin(\\theta)} < \\dfrac{\\tan(\\theta)}{\\sin(\\theta)}$\n",
    "\n",
@ -5843,7 +5840,7 @@
    "\n",
    "$\\dfrac{1}{cos(\\theta)} > \\dfrac{\\sin(\\theta)}{\\theta} > \\cos(\\theta)$\n",
    "\n",
-    "assuming $-\\dfrac{\\theta}{2} < \\theta < \\dfrac{\\pi}{2}$ and $\\theta \\neq 0$\n",
+    "assuming $-\\dfrac{\\pi}{2} < \\theta < \\dfrac{\\pi}{2}$ and $\\theta \\neq 0$\n",
    "<hr />\n",
    "\n",
    "Since $\\cos$ is a continuous function, $\\underset{\\theta \\to 0}\\lim\\cos(\\theta)=\\cos(0)=1$. Similarly, $\\underset{\\theta \\to 0}\\lim\\dfrac{1}{cos(\\theta)}=\\dfrac{1}{\\cos(0)}=1$.\n",
@ -5872,12 +5869,12 @@
    "\\begin{align*}\n",
    "\\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(\\theta) - 1}{\\theta} & =  \\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(\\theta) - 1}{\\theta}\\frac{\\cos(\\theta) + 1}{\\cos(\\theta) + 1} && \\quad \\text{ multiplying and dividing by }\\cos(\\theta)+1\\\\\n",
    "& =  \\underset{\\theta \\to 0}\\lim\\dfrac{\\cos^2(\\theta) - 1}{\\theta(\\cos(\\theta) + 1)} && \\quad \\text{ since }(a-1)(a+1)=a^2-1\\\\\n",
-    "& =  \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin^2(\\theta)}{\\theta(\\cos(\\theta) + 1)} && \\quad \\text{ since }\\cos^2(\\theta) - 1 = \\sin^2(\\theta)\\\\\n",
+    "& =  \\underset{\\theta \\to 0}\\lim\\dfrac{-\\sin^2(\\theta)}{\\theta(\\cos(\\theta) + 1)} && \\quad \\text{ since }\\cos^2(\\theta) - 1 = -\\sin^2(\\theta)\\\\\n",
-    "& =  \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta}\\dfrac{\\sin(\\theta)}{\\cos(\\theta) + 1} && \\quad \\text{ just rearranging the terms}\\\\\n",
+    "& =  -\\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta}\\dfrac{\\sin(\\theta)}{\\cos(\\theta) + 1} && \\quad \\text{ just rearranging the terms}\\\\\n",
-    "& =  \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta} \\, \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\cos(\\theta) + 1} && \\quad \\text{ since the limit of a product is the product of the limits}\\\\\n",
+    "& =  -\\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta} \\, \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\cos(\\theta) + 1} && \\quad \\text{ since the limit of a product is the product of the limits}\\\\\n",
-    "& =  \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\cos(\\theta) + 1} && \\quad \\text{ since } \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta}=1\\\\\n",
+    "& =  -\\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\cos(\\theta) + 1} && \\quad \\text{ since } \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta}=1\\\\\n",
-    "& =  \\dfrac{0}{1+1} && \\quad \\text{ since } \\underset{\\theta \\to 0}\\lim\\sin(\\theta)=0 \\text{ and } \\underset{\\theta \\to 0}\\lim\\cos(\\theta)=1\\\\\n",
+    "& =  -\\dfrac{0}{1+1} && \\quad \\text{ since } \\underset{\\theta \\to 0}\\lim\\sin(\\theta)=0 \\text{ and } \\underset{\\theta \\to 0}\\lim\\cos(\\theta)=1\\\\\n",
-    "& =  0\\\\\n",
+    "& =  0 &&\n",
    "\\end{align*}\n",
    "$\n",
    "\n",
@ -5911,7 +5908,7 @@
    "\\begin{align*}\n",
    "f'(x) & = \\underset{\\theta \\to 0}\\lim\\dfrac{f(x+\\theta) - f(x)}{\\theta} && \\quad\\text{by definition}\\\\\n",
    "& = \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(x+\\theta) - \\sin(x)}{\\theta} && \\quad \\text{using }f(x) = \\sin(x)\\\\\n",
-    "& = \\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(x)\\sin(\\theta) + \\sin(x)\\cos(\\theta) - \\sin(x)}{\\theta} && \\quad \\text{since } cos(a+b)=\\cos(a)\\sin(b)+\\sin(a)\\cos(b)\\\\\n",
+    "& = \\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(x)\\sin(\\theta) + \\sin(x)\\cos(\\theta) - \\sin(x)}{\\theta} && \\quad \\text{since } sin(a+b)=\\cos(a)\\sin(b)+\\sin(a)\\cos(b)\\\\\n",
    "& = \\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(x)\\sin(\\theta)}{\\theta} + \\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(x)\\cos(\\theta) - \\sin(x)}{\\theta} && \\quad \\text{since the limit of a sum is the sum of the limits}\\\\\n",
    "& = \\cos(x)\\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta} + \\sin(x)\\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(\\theta) - 1}{\\theta} && \\quad \\text{bringing out } \\cos(x) \\text{ and } \\sin(x) \\text{ since they don't depend on }\\theta\\\\\n",
    "& = \\cos(x)\\underset{\\theta \\to 0}\\lim\\dfrac{\\sin(\\theta)}{\\theta} && \\quad \\text{since }\\underset{\\theta \\to 0}\\lim\\dfrac{\\cos(\\theta) - 1}{\\theta}=0\\\\\n",