handson-ml/math_linear_algebra.ipynb

4062 lines
108 KiB
Plaintext
Raw Normal View History

2016-03-03 18:29:41 +01:00
{
"cells": [
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**Math - Linear Algebra**\n",
"\n",
"*Linear Algebra is the branch of mathematics that studies [vector spaces](https://en.wikipedia.org/wiki/Vector_space) and linear transformations between vector spaces, such as rotating a shape, scaling it up or down, translating it (ie. moving it), etc.*\n",
"\n",
"*Machine Learning relies heavily on Linear Algebra, so it is essential to understand what vectors and matrices are, what operations you can perform with them, and how they can be useful.*"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Before we start, let's ensure that this notebook works well in both Python 2 and 3:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"from __future__ import division, print_function, unicode_literals"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"# Vectors\n",
"## Definition\n",
"A vector is a quantity defined by a magnitude and a direction. For example, a rocket's velocity is a 3-dimensional vector: its magnitude is the speed of the rocket, and its direction is (hopefully) up. A vector can be represented by an array of numbers called *scalars*. Each scalar corresponds to the magnitude of the vector with regards to each dimension.\n",
"\n",
"For example, say the rocket is going up at a slight angle: it has a vertical speed of 5,000 m/s, and also a slight speed towards the East at 10 m/s, and a slight speed towards the North at 50 m/s. The rocket's velocity may be represented by the following vector:\n",
"\n",
"**velocity** $= \\begin{pmatrix}\n",
"10 \\\\\n",
"50 \\\\\n",
"5000 \\\\\n",
"\\end{pmatrix}$\n",
"\n",
"Note: by convention vectors are generally presented in the form of columns. Also, vector names are generally lowercase to distinguish them from matrices (which we will discuss below) and in bold (when possible) to distinguish them from simple scalar values such as ${meters\\_per\\_second} = 5026$.\n",
"\n",
"A list of N numbers may also represent the coordinates of a point in an N-dimensional space, so it is quite frequent to represent vectors as simple points instead of arrows. A vector with 1 element may be represented as an arrow or a point on an axis, a vector with 2 elements is an arrow or a point on a plane, a vector with 3 elements is an arrow or point in space, and a vector with N elements is an arrow or a point in an N-dimensional space… which most people find hard to imagine.\n",
"\n",
"\n",
"## Purpose\n",
"Vectors have many purposes in Machine Learning, most notably to represent observations and predictions. For example, say we built a Machine Learning system to classify videos into 3 categories (good, spam, clickbait) based on what we know about them. For each video, we would have a vector representing what we know about it, such as:\n",
"\n",
"**video** $= \\begin{pmatrix}\n",
"10.5 \\\\\n",
"5.2 \\\\\n",
"3.25 \\\\\n",
"7.0\n",
"\\end{pmatrix}$\n",
"\n",
"This vector could represent a video that lasts 10.5 minutes, but only 5.2% viewers watch for more than a minute, it gets 3.25 views per day on average, and it was flagged 7 times as spam. As you can see, each axis may have a different meaning.\n",
"\n",
"Based on this vector our Machine Learning system may predict that there is an 80% probability that it is a spam video, 18% that it is clickbait, and 2% that it is a good video. This could be represented as the following vector:\n",
"\n",
"**class_probabilities** $= \\begin{pmatrix}\n",
"0.80 \\\\\n",
"0.18 \\\\\n",
"0.02\n",
"\\end{pmatrix}$"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Vectors in python\n",
"In python, a vector can be represented in many ways, the simplest being a regular python list of numbers:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"[10.5, 5.2, 3.25, 7.0]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Since we plan to do quite a lot of scientific calculations, it is much better to use NumPy's `ndarray`, which provides a lot of convenient and optimized implementations of essential mathematical operations on vectors (for more details about NumPy, check out the [NumPy tutorial](tools_numpy.ipynb)). For example:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"import numpy as np\n",
"video = np.array([10.5, 5.2, 3.25, 7.0])\n",
"video"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The size of a vector can be obtained using the `size` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"video.size"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The $i^{th}$ element (also called *entry* or *item*) of a vector $\\textbf{v}$ is noted $\\textbf{v}_i$.\n",
"\n",
"Note that indices in mathematics generally start at 1, but in programming they usually start at 0. So to access $\\textbf{video}_3$ programmatically, we would write:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"video[2] # 3rd element"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Plotting vectors\n",
"To plot vectors we will use matplotlib, so let's start by importing it (for details about matplotlib, check the [matplotlib tutorial](tools_matplotlib.ipynb)):"
]
},
{
"cell_type": "code",
"execution_count": 6,
2016-03-03 18:29:41 +01:00
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### 2D vectors\n",
"Let's create a couple very simple 2D vectors to plot:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u = np.array([2, 5])\n",
"v = np.array([3, 1])"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"These vectors each have 2 elements, so they can easily be represented graphically on a 2D graph, for example as points:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"x_coords, y_coords = zip(u, v)\n",
"plt.scatter(x_coords, y_coords, color=[\"r\",\"b\"])\n",
"plt.axis([0, 9, 0, 6])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Vectors can also be represented as arrows. Let's create a small convenience function to draw nice arrows:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def plot_vector2d(vector2d, origin=[0, 0], **options):\n",
" return plt.arrow(origin[0], origin[1], vector2d[0], vector2d[1],\n",
" head_width=0.2, head_length=0.3, length_includes_head=True,\n",
" **options)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Now let's draw the vectors **u** and **v** as arrows:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plot_vector2d(u, color=\"r\")\n",
"plot_vector2d(v, color=\"b\")\n",
"plt.axis([0, 9, 0, 6])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### 3D vectors\n",
"Plotting 3D vectors is also relatively straightforward. First let's create two 3D vectors:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"a = np.array([1, 2, 8])\n",
"b = np.array([5, 6, 3])"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Now let's plot them using matplotlib's `Axes3D`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"from mpl_toolkits.mplot3d import Axes3D\n",
"\n",
"subplot3d = plt.subplot(111, projection='3d')\n",
"x_coords, y_coords, z_coords = zip(a,b)\n",
"subplot3d.scatter(x_coords, y_coords, z_coords)\n",
"subplot3d.set_zlim3d([0, 9])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"It is a bit hard to visualize exactly where in space these two points are, so let's add vertical lines. We'll create a small convenience function to plot a list of 3d vectors with vertical lines attached:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def plot_vectors3d(ax, vectors3d, z0, **options):\n",
" for v in vectors3d:\n",
" x, y, z = v\n",
" ax.plot([x,x], [y,y], [z0, z], color=\"gray\", linestyle='dotted', marker=\".\")\n",
" x_coords, y_coords, z_coords = zip(*vectors3d)\n",
" ax.scatter(x_coords, y_coords, z_coords, **options)\n",
"\n",
"subplot3d = plt.subplot(111, projection='3d')\n",
"subplot3d.set_zlim([0, 9])\n",
"plot_vectors3d(subplot3d, [a,b], 0, color=(\"r\",\"b\"))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Norm\n",
"The norm of a vector $\\textbf{u}$, noted $\\left \\Vert \\textbf{u} \\right \\|$, is a measure of the length (a.k.a. the magnitude) of $\\textbf{u}$. There are multiple possible norms, but the most common one (and the only one we will discuss here) is the Euclidian norm, which is defined as:\n",
"\n",
"$\\left \\Vert \\textbf{u} \\right \\| = \\sqrt{\\sum_{i}{\\textbf{u}_i}^2}$\n",
"\n",
"We could implement this easily in pure python, recalling that $\\sqrt x = x^{\\frac{1}{2}}$"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def vector_norm(vector):\n",
" squares = [element**2 for element in vector]\n",
" return sum(squares)**0.5\n",
"\n",
"print(\"||\", u, \"|| =\")\n",
"vector_norm(u)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"However, it is much more efficient to use NumPy's `norm` function, available in the `linalg` (**Lin**ear **Alg**ebra) module:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"import numpy.linalg as LA\n",
"LA.norm(u)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's plot a little diagram to confirm that the length of vector $\\textbf{v}$ is indeed $\\approx5.4$:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"radius = LA.norm(u)\n",
"plt.gca().add_artist(plt.Circle((0,0), radius, color=\"#DDDDDD\"))\n",
"plot_vector2d(u, color=\"red\")\n",
"plt.axis([0, 8.7, 0, 6])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Looks about right!"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Addition\n",
"Vectors of same size can be added together. Addition is performed *elementwise*:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"print(\" \", u)\n",
"print(\"+\", v)\n",
"print(\"-\"*10)\n",
"u + v"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's look at what vector addition looks like graphically:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
2017-04-20 21:45:58 +02:00
"deletable": true,
"editable": true,
2016-03-03 18:29:41 +01:00
"scrolled": true
},
"outputs": [],
"source": [
"plot_vector2d(u, color=\"r\")\n",
"plot_vector2d(v, color=\"b\")\n",
"plot_vector2d(v, origin=u, color=\"b\", linestyle=\"dotted\")\n",
"plot_vector2d(u, origin=v, color=\"r\", linestyle=\"dotted\")\n",
"plot_vector2d(u+v, color=\"g\")\n",
"plt.axis([0, 9, 0, 7])\n",
"plt.text(0.7, 3, \"u\", color=\"r\", fontsize=18)\n",
"plt.text(4, 3, \"u\", color=\"r\", fontsize=18)\n",
"plt.text(1.8, 0.2, \"v\", color=\"b\", fontsize=18)\n",
"plt.text(3.1, 5.6, \"v\", color=\"b\", fontsize=18)\n",
"plt.text(2.4, 2.5, \"u+v\", color=\"g\", fontsize=18)\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Vector addition is **commutative**, meaning that $\\textbf{u} + \\textbf{v} = \\textbf{v} + \\textbf{u}$. You can see it on the previous image: following $\\textbf{u}$ *then* $\\textbf{v}$ leads to the same point as following $\\textbf{v}$ *then* $\\textbf{u}$.\n",
"\n",
"Vector addition is also **associative**, meaning that $\\textbf{u} + (\\textbf{v} + \\textbf{w}) = (\\textbf{u} + \\textbf{v}) + \\textbf{w}$."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"If you have a shape defined by a number of points (vectors), and you add a vector $\\textbf{v}$ to all of these points, then the whole shape gets shifted by $\\textbf{v}$. This is called a [geometric translation](https://en.wikipedia.org/wiki/Translation_%28geometry%29):"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"t1 = np.array([2, 0.25])\n",
"t2 = np.array([2.5, 3.5])\n",
"t3 = np.array([1, 2])\n",
"\n",
"x_coords, y_coords = zip(t1, t2, t3, t1)\n",
"plt.plot(x_coords, y_coords, \"c--\", x_coords, y_coords, \"co\")\n",
"\n",
"plot_vector2d(v, t1, color=\"r\", linestyle=\":\")\n",
"plot_vector2d(v, t2, color=\"r\", linestyle=\":\")\n",
"plot_vector2d(v, t3, color=\"r\", linestyle=\":\")\n",
"\n",
"t1b = t1 + v\n",
"t2b = t2 + v\n",
"t3b = t3 + v\n",
"\n",
"x_coords_b, y_coords_b = zip(t1b, t2b, t3b, t1b)\n",
"plt.plot(x_coords_b, y_coords_b, \"b-\", x_coords_b, y_coords_b, \"bo\")\n",
"\n",
"plt.text(4, 4.2, \"v\", color=\"r\", fontsize=18)\n",
"plt.text(3, 2.3, \"v\", color=\"r\", fontsize=18)\n",
"plt.text(3.5, 0.4, \"v\", color=\"r\", fontsize=18)\n",
"\n",
"plt.axis([0, 6, 0, 5])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Finally, substracting a vector is like adding the opposite vector."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Multiplication by a scalar\n",
"Vectors can be multiplied by scalars. All elements in the vector are multiplied by that number, for example:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"print(\"1.5 *\", u, \"=\")\n",
"\n",
"1.5 * u"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Graphically, scalar multiplication results in changing the scale of a figure, hence the name *scalar*. The distance from the origin (the point at coordinates equal to zero) is also multiplied by the scalar. For example, let's scale up by a factor of `k = 2.5`:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"k = 2.5\n",
"t1c = k * t1\n",
"t2c = k * t2\n",
"t3c = k * t3\n",
"\n",
"plt.plot(x_coords, y_coords, \"c--\", x_coords, y_coords, \"co\")\n",
"\n",
"plot_vector2d(t1, color=\"r\")\n",
"plot_vector2d(t2, color=\"r\")\n",
"plot_vector2d(t3, color=\"r\")\n",
"\n",
"x_coords_c, y_coords_c = zip(t1c, t2c, t3c, t1c)\n",
"plt.plot(x_coords_c, y_coords_c, \"b-\", x_coords_c, y_coords_c, \"bo\")\n",
"\n",
"plot_vector2d(k * t1, color=\"b\", linestyle=\":\")\n",
"plot_vector2d(k * t2, color=\"b\", linestyle=\":\")\n",
"plot_vector2d(k * t3, color=\"b\", linestyle=\":\")\n",
"\n",
"plt.axis([0, 9, 0, 9])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"As you might guess, dividing a vector by a scalar is equivalent to multiplying by its inverse:\n",
"\n",
"$\\dfrac{\\textbf{u}}{\\lambda} = \\dfrac{1}{\\lambda} \\times \\textbf{u}$"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Scalar multiplication is **commutative**: $\\lambda \\times \\textbf{u} = \\textbf{u} \\times \\lambda$.\n",
"\n",
"It is also **associative**: $\\lambda_1 \\times (\\lambda_2 \\times \\textbf{u}) = (\\lambda_1 \\times \\lambda_2) \\times \\textbf{u}$.\n",
"\n",
"Finally, it is **distributive** over addition of vectors: $\\lambda \\times (\\textbf{u} + \\textbf{v}) = \\lambda \\times \\textbf{u} + \\lambda \\times \\textbf{v}$."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Zero, unit and normalized vectors\n",
"* A **zero-vector ** is a vector full of 0s.\n",
"* A **unit vector** is a vector with a norm equal to 1.\n",
"* The **normalized vector** of a non-null vector $\\textbf{u}$, noted $\\hat{\\textbf{u}}$, is the unit vector that points in the same direction as $\\textbf{u}$. It is equal to: $\\hat{\\textbf{u}} = \\dfrac{\\textbf{u}}{\\left \\Vert \\textbf{u} \\right \\|}$\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plt.gca().add_artist(plt.Circle((0,0),1,color='c'))\n",
"plt.plot(0, 0, \"ko\")\n",
"plot_vector2d(v / LA.norm(v), color=\"k\")\n",
"plot_vector2d(v, color=\"b\", linestyle=\":\")\n",
"plt.text(0.3, 0.3, \"$\\hat{u}$\", color=\"k\", fontsize=18)\n",
"plt.text(1.5, 0.7, \"$u$\", color=\"b\", fontsize=18)\n",
"plt.axis([-1.5, 5.5, -1.5, 3.5])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Dot product\n",
"### Definition\n",
"The dot product (also called *scalar product* or *inner product* in the context of the Euclidian space) of two vectors $\\textbf{u}$ and $\\textbf{v}$ is a useful operation that comes up fairly often in linear algebra. It is noted $\\textbf{u} \\cdot \\textbf{v}$, or sometimes $⟨\\textbf{u}|\\textbf{v}⟩$ or $(\\textbf{u}|\\textbf{v})$, and it is defined as:\n",
"\n",
"$\\textbf{u} \\cdot \\textbf{v} = \\left \\Vert \\textbf{u} \\right \\| \\times \\left \\Vert \\textbf{v} \\right \\| \\times cos(\\theta)$\n",
"\n",
"where $\\theta$ is the angle between $\\textbf{u}$ and $\\textbf{v}$.\n",
"\n",
"Another way to calculate the dot product is:\n",
"\n",
"$\\textbf{u} \\cdot \\textbf{v} = \\sum_i{\\textbf{u}_i \\times \\textbf{v}_i}$\n",
"\n",
"### In python\n",
"The dot product is pretty simple to implement:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def dot_product(v1, v2):\n",
" return sum(v1i * v2i for v1i, v2i in zip(v1, v2))\n",
"\n",
"dot_product(u, v)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"But a *much* more efficient implementation is provided by NumPy with the `dot` function:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"np.dot(u,v)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Equivalently, you can use the `dot` method of `ndarray`s:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u.dot(v)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**Caution**: the `*` operator will perform an *elementwise* multiplication, *NOT* a dot product:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"print(\" \",u)\n",
"print(\"* \",v, \"(NOT a dot product)\")\n",
"print(\"-\"*10)\n",
"\n",
"u * v"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Main properties\n",
"* The dot product is **commutative**: $\\textbf{u} \\cdot \\textbf{v} = \\textbf{v} \\cdot \\textbf{u}$.\n",
"* The dot product is only defined between two vectors, not between a scalar and a vector. This means that we cannot chain dot products: for example, the expression $\\textbf{u} \\cdot \\textbf{v} \\cdot \\textbf{w}$ is not defined since $\\textbf{u} \\cdot \\textbf{v}$ is a scalar and $\\textbf{w}$ is a vector.\n",
"* This also means that the dot product is **NOT associative**: $(\\textbf{u} \\cdot \\textbf{v}) \\cdot \\textbf{w} ≠ \\textbf{u} \\cdot (\\textbf{v} \\cdot \\textbf{w})$ since neither are defined.\n",
"* However, the dot product is **associative with regards to scalar multiplication**: $\\lambda \\times (\\textbf{u} \\cdot \\textbf{v}) = (\\lambda \\times \\textbf{u}) \\cdot \\textbf{v} = \\textbf{u} \\cdot (\\lambda \\times \\textbf{v})$\n",
"* Finally, the dot product is **distributive** over addition of vectors: $\\textbf{u} \\cdot (\\textbf{v} + \\textbf{w}) = \\textbf{u} \\cdot \\textbf{v} + \\textbf{u} \\cdot \\textbf{w}$."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Calculating the angle between vectors\n",
"One of the many uses of the dot product is to calculate the angle between two non-zero vectors. Looking at the dot product definition, we can deduce the following formula:\n",
"\n",
"$\\theta = \\arccos{\\left ( \\dfrac{\\textbf{u} \\cdot \\textbf{v}}{\\left \\Vert \\textbf{u} \\right \\| \\times \\left \\Vert \\textbf{v} \\right \\|} \\right ) }$\n",
"\n",
"Note that if $\\textbf{u} \\cdot \\textbf{v} = 0$, it follows that $\\theta = \\dfrac{π}{4}$. In other words, if the dot product of two non-null vectors is zero, it means that they are orthogonal.\n",
2016-03-03 18:29:41 +01:00
"\n",
"Let's use this formula to calculate the angle between $\\textbf{u}$ and $\\textbf{v}$ (in radians):"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def vector_angle(u, v):\n",
" cos_theta = u.dot(v) / LA.norm(u) / LA.norm(v)\n",
" return np.arccos(np.clip(cos_theta, -1, 1))\n",
"\n",
"theta = vector_angle(u, v)\n",
"print(\"Angle =\", theta, \"radians\")\n",
"print(\" =\", theta * 180 / np.pi, \"degrees\")"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Note: due to small floating point errors, `cos_theta` may be very slightly outside of the $[-1, 1]$ interval, which would make `arccos` fail. This is why we clipped the value within the range, using NumPy's `clip` function."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Projecting a point onto an axis\n",
"The dot product is also very useful to project points onto an axis. The projection of vector $\\textbf{v}$ onto $\\textbf{u}$'s axis is given by this formula:\n",
"\n",
"$\\textbf{proj}_{\\textbf{u}}{\\textbf{v}} = \\dfrac{\\textbf{u} \\cdot \\textbf{v}}{\\left \\Vert \\textbf{u} \\right \\| ^2} \\times \\textbf{u}$\n",
"\n",
"Which is equivalent to:\n",
"\n",
"$\\textbf{proj}_{\\textbf{u}}{\\textbf{v}} = (\\textbf{v} \\cdot \\hat{\\textbf{u}}) \\times \\hat{\\textbf{u}}$\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u_normalized = u / LA.norm(u)\n",
"proj = v.dot(u_normalized) * u_normalized\n",
"\n",
"plot_vector2d(u, color=\"r\")\n",
"plot_vector2d(v, color=\"b\")\n",
"\n",
"plot_vector2d(proj, color=\"k\", linestyle=\":\")\n",
"plt.plot(proj[0], proj[1], \"ko\")\n",
"\n",
"plt.plot([proj[0], v[0]], [proj[1], v[1]], \"b:\")\n",
"\n",
"plt.text(1, 2, \"$proj_u v$\", color=\"k\", fontsize=18)\n",
"plt.text(1.8, 0.2, \"$v$\", color=\"b\", fontsize=18)\n",
"plt.text(0.8, 3, \"$u$\", color=\"r\", fontsize=18)\n",
"\n",
"plt.axis([0, 8, 0, 5.5])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"# Matrices\n",
"A matrix is a rectangular array of scalars (ie. any number: integer, real or complex) arranged in rows and columns, for example:\n",
"\n",
"\\begin{bmatrix} 10 & 20 & 30 \\\\ 40 & 50 & 60 \\end{bmatrix}\n",
"\n",
"You can also think of a matrix as a list of vectors: the previous matrix contains either 2 horizontal 3D vectors or 3 vertical 2D vectors.\n",
"\n",
"Matrices are convenient and very efficient to run operations on many vectors at a time. We will also see that they are great at representing and performing linear transformations such rotations, translations and scaling."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Matrices in python\n",
"In python, a matrix can be represented in various ways. The simplest is just a list of python lists:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"[\n",
" [10, 20, 30],\n",
" [40, 50, 60]\n",
"]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"A much more efficient way is to use the NumPy library which provides optimized implementations of many matrix operations:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A = np.array([\n",
" [10,20,30],\n",
" [40,50,60]\n",
"])\n",
"A"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"By convention matrices generally have uppercase names, such as $A$.\n",
"\n",
"In the rest of this tutorial, we will assume that we are using NumPy arrays (type `ndarray`) to represent matrices."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Size\n",
"The size of a matrix is defined by its number of rows and number of columns. It is noted $rows \\times columns$. For example, the matrix $A$ above is an example of a $2 \\times 3$ matrix: 2 rows, 3 columns. Caution: a $3 \\times 2$ matrix would have 3 rows and 2 columns.\n",
"\n",
"To get a matrix's size in NumPy:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.shape"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**Caution**: the `size` attribute represents the number of elements in the `ndarray`, not the matrix's size:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.size"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Element indexing\n",
"The number located in the $i^{th}$ row, and $j^{th}$ column of a matrix $X$ is sometimes noted $X_{i,j}$ or $X_{ij}$, but there is no standard notation, so people often prefer to explicitely name the elements, like this: \"*let $X = (x_{i,j})_{1 ≤ i ≤ m, 1 ≤ j ≤ n}$*\". This means that $X$ is equal to:\n",
"\n",
"$X = \\begin{bmatrix}\n",
" x_{1,1} & x_{1,2} & x_{1,3} & \\cdots & x_{1,n}\\\\\n",
" x_{2,1} & x_{2,2} & x_{2,3} & \\cdots & x_{2,n}\\\\\n",
" x_{3,1} & x_{3,2} & x_{3,3} & \\cdots & x_{3,n}\\\\\n",
" \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
" x_{m,1} & x_{m,2} & x_{m,3} & \\cdots & x_{m,n}\\\\\n",
"\\end{bmatrix}$\n",
"\n",
"However in this notebook we will use the $X_{i,j}$ notation, as it matches fairly well NumPy's notation. Note that in math indices generally start at 1, but in programming they usually start at 0. So to access $A_{2,3}$ programmatically, we need to write this:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A[1,2] # 2nd row, 3rd column"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The $i^{th}$ row vector is sometimes noted $M_i$ or $M_{i,*}$, but again there is no standard notation so people often prefer to explicitely define their own names, for example: \"*let **x**$_{i}$ be the $i^{th}$ row vector of matrix $X$*\". We will use the $M_{i,*}$, for the same reason as above. For example, to access $A_{2,*}$ (ie. $A$'s 2nd row vector):"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A[1, :] # 2nd row vector (as a 1D array)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Similarly, the $j^{th}$ column vector is sometimes noted $M^j$ or $M_{*,j}$, but there is no standard notation. We will use $M_{*,j}$. For example, to access $A_{*,3}$ (ie. $A$'s 3rd column vector):"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A[:, 2] # 3rd column vector (as a 1D array)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Note that the result is actually a one-dimensional NumPy array: there is no such thing as a *vertical* or *horizontal* one-dimensional array. If you need to actually represent a row vector as a one-row matrix (ie. a 2D NumPy array), or a column vector as a one-column matrix, then you need to use a slice instead of an integer when accessing the row or column, for example:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A[1:2, :] # rows 2 to 3 (excluded): this returns row 2 as a one-row matrix"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A[:, 2:3] # columns 3 to 4 (excluded): this returns column 3 as a one-column matrix"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Square, triangular, diagonal and identity matrices\n",
"A **square matrix** is a matrix that has the same number of rows and columns, for example a $3 \\times 3$ matrix:\n",
"\n",
"\\begin{bmatrix}\n",
" 4 & 9 & 2 \\\\\n",
" 3 & 5 & 7 \\\\\n",
" 8 & 1 & 6\n",
"\\end{bmatrix}"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"An **upper triangular matrix** is a special kind of square matrix where all the elements *below* the main diagonal (top-left to bottom-right) are zero, for example:\n",
"\n",
"\\begin{bmatrix}\n",
" 4 & 9 & 2 \\\\\n",
" 0 & 5 & 7 \\\\\n",
" 0 & 0 & 6\n",
"\\end{bmatrix}"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Similarly, a **lower triangular matrix** is a square matrix where all elements *above* the main diagonal are zero, for example:\n",
"\n",
"\\begin{bmatrix}\n",
" 4 & 0 & 0 \\\\\n",
" 3 & 5 & 0 \\\\\n",
" 8 & 1 & 6\n",
"\\end{bmatrix}"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"A **triangular matrix** is one that is either lower triangular or upper triangular."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"A matrix that is both upper and lower triangular is called a **diagonal matrix**, for example:\n",
"\n",
"\\begin{bmatrix}\n",
" 4 & 0 & 0 \\\\\n",
" 0 & 5 & 0 \\\\\n",
" 0 & 0 & 6\n",
"\\end{bmatrix}\n",
"\n",
"You can construct a diagonal matrix using NumPy's `diag` function:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"np.diag([4, 5, 6])"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"If you pass a matrix to the `diag` function, it will happily extract the diagonal values:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"D = np.array([\n",
" [1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9],\n",
" ])\n",
"np.diag(D)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Finally, the **identity matrix** of size $n$, noted $I_n$, is a diagonal matrix of size $n \\times n$ with $1$'s in the main diagonal, for example $I_3$:\n",
"\n",
"\\begin{bmatrix}\n",
" 1 & 0 & 0 \\\\\n",
" 0 & 1 & 0 \\\\\n",
" 0 & 0 & 1\n",
"\\end{bmatrix}\n",
"\n",
"Numpy's `eye` function returns the identity matrix of the desired size:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"np.eye(3)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The identity matrix is often noted simply $I$ (instead of $I_n$) when its size is clear given the context. It is called the *identity* matrix because multiplying a matrix with it leaves the matrix unchanged as we will see below."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Adding matrices\n",
"If two matrices $Q$ and $R$ have the same size $m \\times n$, they can be added together. Addition is performed *elementwise*: the result is also a $m \\times n$ matrix $S$ where each element is the sum of the elements at the corresponding position: $S_{i,j} = Q_{i,j} + R_{i,j}$\n",
"\n",
"$S =\n",
"\\begin{bmatrix}\n",
" Q_{11} + R_{11} & Q_{12} + R_{12} & Q_{13} + R_{13} & \\cdots & Q_{1n} + R_{1n} \\\\\n",
" Q_{21} + R_{21} & Q_{22} + R_{22} & Q_{23} + R_{23} & \\cdots & Q_{2n} + R_{2n} \\\\\n",
" Q_{31} + R_{31} & Q_{32} + R_{32} & Q_{33} + R_{33} & \\cdots & Q_{3n} + R_{3n} \\\\\n",
" \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
" Q_{m1} + R_{m1} & Q_{m2} + R_{m2} & Q_{m3} + R_{m3} & \\cdots & Q_{mn} + R_{mn} \\\\\n",
"\\end{bmatrix}$\n",
"\n",
"For example, let's create a $2 \\times 3$ matric $B$ and compute $A + B$:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"B = np.array([[1,2,3], [4, 5, 6]])\n",
"B"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A + B"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**Addition is *commutative***, meaning that $A + B = B + A$:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"B + A"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**It is also *associative***, meaning that $A + (B + C) = (A + B) + C$:"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"C = np.array([[100,200,300], [400, 500, 600]])\n",
"\n",
"A + (B + C)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"(A + B) + C"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Scalar multiplication\n",
"A matrix $M$ can be multiplied by a scalar $\\lambda$. The result is noted $\\lambda M$, and it is a matrix of the same size as $M$ with all elements multiplied by $\\lambda$:\n",
2016-03-03 18:29:41 +01:00
"\n",
"$\\lambda M =\n",
"\\begin{bmatrix}\n",
" \\lambda \\times M_{11} & \\lambda \\times M_{12} & \\lambda \\times M_{13} & \\cdots & \\lambda \\times M_{1n} \\\\\n",
" \\lambda \\times M_{21} & \\lambda \\times M_{22} & \\lambda \\times M_{23} & \\cdots & \\lambda \\times M_{2n} \\\\\n",
" \\lambda \\times M_{31} & \\lambda \\times M_{32} & \\lambda \\times M_{33} & \\cdots & \\lambda \\times M_{3n} \\\\\n",
" \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
" \\lambda \\times M_{m1} & \\lambda \\times M_{m2} & \\lambda \\times M_{m3} & \\cdots & \\lambda \\times M_{mn} \\\\\n",
"\\end{bmatrix}$\n",
"\n",
"A more concise way of writing this is:\n",
"\n",
"$(\\lambda M)_{i,j} = \\lambda (M)_{i,j}$\n",
"\n",
2016-03-03 18:29:41 +01:00
"In NumPy, simply use the `*` operator to multiply a matrix by a scalar. For example:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"2 * A"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Scalar multiplication is also defined on the right hand side, and gives the same result: $M \\lambda = \\lambda M$. For example:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A * 2"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This makes scalar multiplication **commutative**.\n",
"\n",
"It is also **associative**, meaning that $\\alpha (\\beta M) = (\\alpha \\times \\beta) M$, where $\\alpha$ and $\\beta$ are scalars. For example:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"2 * (3 * A)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"(2 * 3) * A"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Finally, it is **distributive over addition** of matrices, meaning that $\\lambda (Q + R) = \\lambda Q + \\lambda R$:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"2 * (A + B)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"2 * A + 2 * B"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Matrix multiplication\n",
2016-03-03 18:29:41 +01:00
"So far, matrix operations have been rather intuitive. But multiplying matrices is a bit more involved.\n",
"\n",
"A matrix $Q$ of size $m \\times n$ can be multiplied by a matrix $R$ of size $n \\times q$. It is noted simply $QR$ without multiplication sign or dot. The result $P$ is an $m \\times q$ matrix where each element is computed as a sum of products:\n",
2016-03-03 18:29:41 +01:00
"\n",
"$P_{i,j} = \\sum_{k=1}^n{Q_{i,k} \\times R_{k,j}}$\n",
"\n",
"The element at position $i,j$ in the resulting matrix is the sum of the products of elements in row $i$ of matrix $Q$ by the elements in column $j$ of matrix $R$.\n",
"\n",
"$P =\n",
"\\begin{bmatrix}\n",
"Q_{11} R_{11} + Q_{12} R_{21} + \\cdots + Q_{1n} R_{n1} &\n",
" Q_{11} R_{12} + Q_{12} R_{22} + \\cdots + Q_{1n} R_{n2} &\n",
" \\cdots &\n",
" Q_{11} R_{1q} + Q_{12} R_{2q} + \\cdots + Q_{1n} R_{nq} \\\\\n",
"Q_{21} R_{11} + Q_{22} R_{21} + \\cdots + Q_{2n} R_{n1} &\n",
" Q_{21} R_{12} + Q_{22} R_{22} + \\cdots + Q_{2n} R_{n2} &\n",
" \\cdots &\n",
" Q_{21} R_{1q} + Q_{22} R_{2q} + \\cdots + Q_{2n} R_{nq} \\\\\n",
" \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"Q_{m1} R_{11} + Q_{m2} R_{21} + \\cdots + Q_{mn} R_{n1} &\n",
" Q_{m1} R_{12} + Q_{m2} R_{22} + \\cdots + Q_{mn} R_{n2} &\n",
" \\cdots &\n",
" Q_{m1} R_{1q} + Q_{m2} R_{2q} + \\cdots + Q_{mn} R_{nq}\n",
"\\end{bmatrix}$\n",
"\n",
"You may notice that each element $P_{i,j}$ is the dot product of the row vector $Q_{i,*}$ and the column vector $R_{*,j}$:\n",
"\n",
"$P_{i,j} = Q_{i,*} \\cdot R_{*,j}$\n",
"\n",
"So we can rewrite $P$ more concisely as:\n",
"\n",
"$P =\n",
"\\begin{bmatrix}\n",
"Q_{1,*} \\cdot R_{*,1} & Q_{1,*} \\cdot R_{*,2} & \\cdots & Q_{1,*} \\cdot R_{*,q} \\\\\n",
"Q_{2,*} \\cdot R_{*,1} & Q_{2,*} \\cdot R_{*,2} & \\cdots & Q_{2,*} \\cdot R_{*,q} \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"Q_{m,*} \\cdot R_{*,1} & Q_{m,*} \\cdot R_{*,2} & \\cdots & Q_{m,*} \\cdot R_{*,q}\n",
"\\end{bmatrix}$\n"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's multiply two matrices in NumPy, using `ndarray`'s `dot` method:\n",
"\n",
"$E = AD = \\begin{bmatrix}\n",
2016-03-03 18:29:41 +01:00
" 10 & 20 & 30 \\\\\n",
" 40 & 50 & 60\n",
"\\end{bmatrix} \n",
2016-03-03 18:29:41 +01:00
"\\begin{bmatrix}\n",
" 2 & 3 & 5 & 7 \\\\\n",
" 11 & 13 & 17 & 19 \\\\\n",
" 23 & 29 & 31 & 37\n",
"\\end{bmatrix} = \n",
"\\begin{bmatrix}\n",
" 930 & 1160 & 1320 & 1560 \\\\\n",
" 2010 & 2510 & 2910 & 3450\n",
"\\end{bmatrix}$"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"D = np.array([\n",
" [ 2, 3, 5, 7],\n",
" [11, 13, 17, 19],\n",
" [23, 29, 31, 37]\n",
" ])\n",
"E = A.dot(D)\n",
"E"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's check this result by looking at one element, just to be sure: looking at $E_{2,3}$ for example, we need to multiply elements in $A$'s $2^{nd}$ row by elements in $D$'s $3^{rd}$ column, and sum up these products:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"40*5 + 50*17 + 60*31"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"E[1,2] # row 2, column 3"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Looks good! You can check the other elements until you get used to the algorithm.\n",
"\n",
"We multiplied a $2 \\times 3$ matrix by a $3 \\times 4$ matrix, so the result is a $2 \\times 4$ matrix. The first matrix's number of columns has to be equal to the second matrix's number of rows. If we try to multiple $D$ by $A$, we get an error because D has 4 columns while A has 2 rows:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"try:\n",
" D.dot(A)\n",
"except ValueError as e:\n",
" print(\"ValueError:\", e)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This illustrates the fact that **matrix multiplication is *NOT* commutative**: in general $QR ≠ RQ$\n",
2016-03-03 18:29:41 +01:00
"\n",
"In fact, $QR$ and $RQ$ are only *both* defined if $Q$ has size $m \\times n$ and $R$ has size $n \\times m$. Let's look at an example where both *are* defined and show that they are (in general) *NOT* equal:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F = np.array([\n",
" [5,2],\n",
" [4,1],\n",
" [9,3]\n",
" ])\n",
"A.dot(F)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F.dot(A)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"On the other hand, **matrix multiplication *is* associative**, meaning that $Q(RS) = (QR)S$. Let's create a $4 \\times 5$ matrix $G$ to illustrate this:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"G = np.array([\n",
" [8, 7, 4, 2, 5],\n",
" [2, 5, 1, 0, 5],\n",
" [9, 11, 17, 21, 0],\n",
" [0, 1, 0, 1, 2]])\n",
"A.dot(D).dot(G) # (AB)G"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.dot(D.dot(G)) # A(BG)"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"It is also ***distributive* over addition** of matrices, meaning that $(Q + R)S = QS + RS$. For example:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"(A + B).dot(D)"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.dot(D) + B.dot(D)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The product of a matrix $M$ by the identity matrix (of matching size) results in the same matrix $M$. More formally, if $M$ is an $m \\times n$ matrix, then:\n",
2016-03-03 18:29:41 +01:00
"\n",
"$M I_n = I_m M = M$\n",
2016-03-03 18:29:41 +01:00
"\n",
"This is generally written more concisely (since the size of the identity matrices is unambiguous given the context):\n",
"\n",
"$MI = IM = M$\n",
"\n",
"For example:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.dot(np.eye(3))"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"np.eye(2).dot(A)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**Caution**: NumPy's `*` operator performs elementwise multiplication, *NOT* a matrix multiplication:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false,
2017-04-20 21:45:58 +02:00
"deletable": true,
"editable": true,
2016-03-03 18:29:41 +01:00
"scrolled": true
},
"outputs": [],
"source": [
"A * B # NOT a matrix multiplication"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"**The @ infix operator**\n",
"\n",
"Python 3.5 [introduced](https://docs.python.org/3/whatsnew/3.5.html#pep-465-a-dedicated-infix-operator-for-matrix-multiplication) the `@` infix operator for matrix multiplication, and NumPy 1.10 added support for it. If you are using Python 3.5+ and NumPy 1.10+, you can simply write `A @ D` instead of `A.dot(D)`, making your code much more readable (but less portable). This operator also works for vector dot products."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"import sys\n",
"print(\"Python version: {}.{}.{}\".format(*sys.version_info))\n",
"print(\"Numpy version:\", np.version.version)\n",
"\n",
"# Uncomment the following line if your Python version is ≥3.5\n",
"# and your NumPy version is ≥1.10:\n",
"\n",
"#A @ D"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Note: `Q @ R` is actually equivalent to `Q.__matmul__(R)` which is implemented by NumPy as `np.matmul(Q, R)`, not as `Q.dot(R)`. The main difference is that `matmul` does not support scalar multiplication, while `dot` does, so you can write `Q.dot(3)`, which is equivalent to `Q * 3`, but you cannot write `Q @ 3` ([more details](http://stackoverflow.com/a/34142617/38626))."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Matrix transpose\n",
"The transpose of a matrix $M$ is a matrix noted $M^T$ such that the $i^{th}$ row in $M^T$ is equal to the $i^{th}$ column in $M$:\n",
"\n",
"$ A^T =\n",
"\\begin{bmatrix}\n",
" 10 & 20 & 30 \\\\\n",
" 40 & 50 & 60\n",
"\\end{bmatrix}^T =\n",
"\\begin{bmatrix}\n",
" 10 & 40 \\\\\n",
" 20 & 50 \\\\\n",
" 30 & 60\n",
"\\end{bmatrix}$\n",
"\n",
"In other words, ($A^T)_{i,j}$ = $A_{j,i}$\n",
"\n",
"Obviously, if $M$ is an $m \\times n$ matrix, then $M^T$ is an $n \\times m$ matrix.\n",
2016-03-03 18:29:41 +01:00
"\n",
"Note: there are a few other notations, such as $M^t$, $M$, or ${^t}M$.\n",
"\n",
"In NumPy, a matrix's transpose can be obtained simply using the `T` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.T"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"As you might expect, transposing a matrix twice returns the original matrix:"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.T.T"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Transposition is distributive over addition of matrices, meaning that $(Q + R)^T = Q^T + R^T$. For example:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"(A + B).T"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"A.T + B.T"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Moreover, $(Q \\cdot R)^T = R^T \\cdot Q^T$. Note that the order is reversed. For example:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"(A.dot(D)).T"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"D.T.dot(A.T)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"A **symmetric matrix** $M$ is defined as a matrix that is equal to its transpose: $M^T = M$. This definition implies that it must be a square matrix whose elements are symmetric relative to the main diagonal, for example:\n",
"\n",
"\\begin{bmatrix}\n",
" 17 & 22 & 27 & 49 \\\\\n",
" 22 & 29 & 36 & 0 \\\\\n",
" 27 & 36 & 45 & 2 \\\\\n",
" 49 & 0 & 2 & 99\n",
"\\end{bmatrix}\n",
"\n",
"The product of a matrix by its transpose is always a symmetric matrix, for example:"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"D.dot(D.T)"
]
},
{
"cell_type": "markdown",
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"source": [
"## Converting 1D arrays to 2D arrays in NumPy\n",
"As we mentionned earlier, in NumPy (as opposed to Matlab, for example), 1D really means 1D: there is no such thing as a vertical 1D-array or a horizontal 1D-array. So you should not be surprised to see that transposing a 1D array does not do anything:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u.T"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"We want to convert $\\textbf{u}$ into a row vector before transposing it. There are a few ways to do this:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u_row = np.array([u])\n",
"u_row"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Notice the extra square brackets: this is a 2D array with just one row (ie. a 1x2 matrix). In other words it really is a **row vector**."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u[np.newaxis, :]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This quite explicit: we are asking for a new vertical axis, keeping the existing data as the horizontal axis."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u[np.newaxis]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This is equivalent, but a little less explicit."
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u[None]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This is the shortest version, but you probably want to avoid it because it is unclear. The reason it works is that `np.newaxis` is actually equal to `None`, so this is equivalent to the previous version.\n",
"\n",
"Ok, now let's transpose our row vector:"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u_row.T"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Great! We now have a nice **column vector**.\n",
"\n",
"Rather than creating a row vector then transposing it, it is also possible to convert a 1D array directly into a column vector:"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"u[:, np.newaxis]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Plotting a matrix\n",
"We have already seen that vectors can been represented as points or arrows in N-dimensional space. Is there a good graphical representation of matrices? Well you can simply see a matrix as a list of vectors, so plotting a matrix results in many points or arrows. For example, let's create a $2 \\times 4$ matrix `P` and plot it as points:"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"P = np.array([\n",
" [3.0, 4.0, 1.0, 4.6],\n",
" [0.2, 3.5, 2.0, 0.5]\n",
" ])\n",
"x_coords_P, y_coords_P = P\n",
"plt.scatter(x_coords_P, y_coords_P)\n",
"plt.axis([0, 5, 0, 4])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Of course we could also have stored the same 4 vectors as row vectors instead of column vectors, resulting in a $4 \\times 2$ matrix (the transpose of $P$, in fact). It is really an arbitrary choice.\n",
"\n",
"Since the vectors are ordered, you can see the matrix as a path and represent it with connected dots:"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plt.plot(x_coords_P, y_coords_P, \"bo\")\n",
"plt.plot(x_coords_P, y_coords_P, \"b--\")\n",
"plt.axis([0, 5, 0, 4])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Or you can represent it as a polygon: matplotlib's `Polygon` class expects an $n \\times 2$ NumPy array, not a $2 \\times n$ array, so we just need to give it $P^T$:"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"from matplotlib.patches import Polygon\n",
"plt.gca().add_artist(Polygon(P.T))\n",
"plt.axis([0, 5, 0, 4])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Geometric applications of matrix operations\n",
"We saw earlier that vector addition results in a geometric translation, vector multiplication by a scalar results in rescaling (zooming in or out, centered on the origin), and vector dot product results in projecting a vector onto another vector, rescaling and measuring the resulting coordinate.\n",
"\n",
"Similarly, matrix operations have very useful geometric applications."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Addition = multiple geometric translations\n",
"First, adding two matrices together is equivalent to adding all their vectors together. For example, let's create a $2 \\times 4$ matrix $H$ and add it to $P$, and look at the result:"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"H = np.array([\n",
" [ 0.5, -0.2, 0.2, -0.1],\n",
" [ 0.4, 0.4, 1.5, 0.6]\n",
" ])\n",
"P_moved = P + H\n",
"\n",
"plt.gca().add_artist(Polygon(P.T, alpha=0.2))\n",
"plt.gca().add_artist(Polygon(P_moved.T, alpha=0.3, color=\"r\"))\n",
"for vector, origin in zip(H.T, P.T):\n",
" plot_vector2d(vector, origin=origin)\n",
"\n",
"plt.text(2.2, 1.8, \"$P$\", color=\"b\", fontsize=18)\n",
"plt.text(2.0, 3.2, \"$P+H$\", color=\"r\", fontsize=18)\n",
"plt.text(2.5, 0.5, \"$H_{*,1}$\", color=\"k\", fontsize=18)\n",
"plt.text(4.1, 3.5, \"$H_{*,2}$\", color=\"k\", fontsize=18)\n",
"plt.text(0.4, 2.6, \"$H_{*,3}$\", color=\"k\", fontsize=18)\n",
"plt.text(4.4, 0.2, \"$H_{*,4}$\", color=\"k\", fontsize=18)\n",
"\n",
"plt.axis([0, 5, 0, 4])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"If we add a matrix full of identical vectors, we get a simple geometric translation:"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"H2 = np.array([\n",
" [-0.5, -0.5, -0.5, -0.5],\n",
" [ 0.4, 0.4, 0.4, 0.4]\n",
" ])\n",
"P_translated = P + H2\n",
"\n",
"plt.gca().add_artist(Polygon(P.T, alpha=0.2))\n",
"plt.gca().add_artist(Polygon(P_translated.T, alpha=0.3, color=\"r\"))\n",
"for vector, origin in zip(H2.T, P.T):\n",
" plot_vector2d(vector, origin=origin)\n",
"\n",
"plt.axis([0, 5, 0, 4])\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Although matrices can only be added together if they have the same size, NumPy allows adding a row vector or a column vector to a matrix: this is called *broadcasting* and is explained in further details in the [NumPy tutorial](tools_numpy.ipynb). We could have obtained the same result as above with:"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"P + [[-0.5], [0.4]] # same as P + H2, thanks to NumPy broadcasting"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Scalar multiplication\n",
"Multiplying a matrix by a scalar results in all its vectors being multiplied by that scalar, so unsurprisingly, the geometric result is a rescaling of the entire figure. For example, let's rescale our polygon by a factor of 60% (zooming out, centered on the origin):"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def plot_transformation(P_before, P_after, text_before, text_after, axis = [0, 5, 0, 4], arrows=False):\n",
" if arrows:\n",
" for vector_before, vector_after in zip(P_before.T, P_after.T):\n",
" plot_vector2d(vector_before, color=\"blue\", linestyle=\"--\")\n",
" plot_vector2d(vector_after, color=\"red\", linestyle=\"-\")\n",
" plt.gca().add_artist(Polygon(P_before.T, alpha=0.2))\n",
" plt.gca().add_artist(Polygon(P_after.T, alpha=0.3, color=\"r\"))\n",
" plt.text(P_before[0].mean(), P_before[1].mean(), text_before, fontsize=18, color=\"blue\")\n",
" plt.text(P_after[0].mean(), P_after[1].mean(), text_after, fontsize=18, color=\"red\")\n",
" plt.axis(axis)\n",
" plt.grid()\n",
"\n",
"P_rescaled = 0.60 * P\n",
"plot_transformation(P, P_rescaled, \"$P$\", \"$0.6 P$\", arrows=True)\n",
2016-03-03 18:29:41 +01:00
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Matrix multiplication Projection onto an axis\n",
"Matrix multiplication is more complex to visualize, but it is also the most powerful tool in the box.\n",
2016-03-03 18:29:41 +01:00
"\n",
"Let's start simple, by defining a $1 \\times 2$ matrix $U = \\begin{bmatrix} 1 & 0 \\end{bmatrix}$. This row vector is just the horizontal unit vector."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"U = np.array([[1, 0]])"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Now let's look at the dot product $P \\cdot U$:"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"U.dot(P)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"These are the horizontal coordinates of the vectors in $P$. In other words, we just projected $P$ onto the horizontal axis:"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"def plot_projection(U, P):\n",
" U_P = U.dot(P)\n",
" \n",
" axis_end = 100 * U\n",
" plot_vector2d(axis_end[0], color=\"black\")\n",
"\n",
" plt.gca().add_artist(Polygon(P.T, alpha=0.2))\n",
" for vector, proj_coordinate in zip(P.T, U_P.T):\n",
" proj_point = proj_coordinate * U\n",
" plt.plot(proj_point[0][0], proj_point[0][1], \"ro\")\n",
" plt.plot([vector[0], proj_point[0][0]], [vector[1], proj_point[0][1]], \"r--\")\n",
"\n",
" plt.axis([0, 5, 0, 4])\n",
" plt.grid()\n",
" plt.show()\n",
"\n",
"plot_projection(U, P)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"We can actually project on any other axis by just replacing $U$ with any other unit vector. For example, let's project on the axis that is at a 30° angle above the horizontal axis:"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"angle30 = 30 * np.pi / 180 # angle in radians\n",
"U_30 = np.array([[np.cos(angle30), np.sin(angle30)]])\n",
"\n",
"plot_projection(U_30, P)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Good! Remember that the dot product of a unit vector and a matrix basically performs a projection on an axis and gives us the coordinates of the resulting points on that axis."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Matrix multiplication Rotation\n",
2016-03-03 18:29:41 +01:00
"Now let's create a $2 \\times 2$ matrix $V$ containing two unit vectors that make 30° and 120° angles with the horizontal axis:\n",
"\n",
"$V = \\begin{bmatrix} \\cos(30°) & \\sin(30°) \\\\ \\cos(120°) & \\sin(120°) \\end{bmatrix}$"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"angle120 = 120 * np.pi / 180\n",
"V = np.array([\n",
" [np.cos(angle30), np.sin(angle30)],\n",
" [np.cos(angle120), np.sin(angle120)]\n",
" ])\n",
"V"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's look at the product $VP$:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"V.dot(P)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The first row is equal to $V_{1,*} P$, which is the coordinates of the projection of $P$ onto the 30° axis, as we have seen above. The second row is $V_{2,*} P$, which is the coordinates of the projection of $P$ onto the 120° axis. So basically we obtained the coordinates of $P$ after rotating the horizontal and vertical axes by 30° (or equivalently after rotating the polygon by -30° around the origin)! Let's plot $VP$ to see this:"
2016-03-03 18:29:41 +01:00
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"P_rotated = V.dot(P)\n",
"plot_transformation(P, P_rotated, \"$P$\", \"$VP$\", [-2, 6, -2, 4], arrows=True)\n",
2016-03-03 18:29:41 +01:00
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Matrix $V$ is called a **rotation matrix**."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"### Matrix multiplication Other linear transformations\n",
2016-03-03 18:29:41 +01:00
"More generally, any linear transformation $f$ that maps n-dimensional vectors to m-dimensional vectors can be represented as an $m \\times n$ matrix. For example, say $\\textbf{u}$ is a 3-dimensional vector:\n",
"\n",
"$\\textbf{u} = \\begin{pmatrix} x \\\\ y \\\\ z \\end{pmatrix}$\n",
"\n",
"and $f$ is defined as:\n",
"\n",
"$f(\\textbf{u}) = \\begin{pmatrix}\n",
"ax + by + cz \\\\\n",
"dx + ey + fz\n",
"\\end{pmatrix}$\n",
"\n",
"This transormation $f$ maps 3-dimensional vectors to 2-dimensional vectors in a linear way (ie. the resulting coordinates only involve sums of multiples of the original coordinates). We can represent this transformation as matrix $F$:\n",
"\n",
"$F = \\begin{bmatrix}\n",
"a & b & c \\\\\n",
"d & e & f\n",
"\\end{bmatrix}$\n",
"\n",
"Now, to compute $f(\\textbf{u})$ we can simply do a matrix multiplication:\n",
2016-03-03 18:29:41 +01:00
"\n",
"$f(\\textbf{u}) = F \\textbf{u}$\n",
2016-03-03 18:29:41 +01:00
"\n",
"If we have a matric $G = \\begin{bmatrix}\\textbf{u}_1 & \\textbf{u}_2 & \\cdots & \\textbf{u}_q \\end{bmatrix}$, where each $\\textbf{u}_i$ is a 3-dimensional column vector, then $FG$ results in the linear transformation of all vectors $\\textbf{u}_i$ as defined by the matrix $F$:\n",
2016-03-03 18:29:41 +01:00
"\n",
"$FG = \\begin{bmatrix}f(\\textbf{u}_1) & f(\\textbf{u}_2) & \\cdots & f(\\textbf{u}_q) \\end{bmatrix}$\n",
2016-03-03 18:29:41 +01:00
"\n",
"To summarize, the matrix on the left hand side of a dot product specifies what linear transormation to apply to the right hand side vectors. We have already shown that this can be used to perform projections and rotations, but any other linear transformation is possible. For example, here is a transformation known as a *shear mapping*:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false,
2017-04-20 21:45:58 +02:00
"deletable": true,
"editable": true,
2016-03-03 18:29:41 +01:00
"scrolled": true
},
"outputs": [],
"source": [
"F_shear = np.array([\n",
" [1, 1.5],\n",
" [0, 1]\n",
" ])\n",
"plot_transformation(P, F_shear.dot(P), \"$P$\", \"$F_{shear} P$\",\n",
2016-03-03 18:29:41 +01:00
" axis=[0, 10, 0, 7])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's look at how this transformation affects the **unit square**: "
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"Square = np.array([\n",
" [0, 0, 1, 1],\n",
" [0, 1, 1, 0]\n",
" ])\n",
"plot_transformation(Square, F_shear.dot(Square), \"$Square$\", \"$F_{shear} Square$\",\n",
2016-03-03 18:29:41 +01:00
" axis=[0, 2.6, 0, 1.8])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Now let's look at a **squeeze mapping**:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_squeeze = np.array([\n",
" [1.4, 0],\n",
" [0, 1/1.4]\n",
" ])\n",
"plot_transformation(P, F_squeeze.dot(P), \"$P$\", \"$F_{squeeze} P$\",\n",
2016-03-03 18:29:41 +01:00
" axis=[0, 7, 0, 5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The effect on the unit square is:"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plot_transformation(Square, F_squeeze.dot(Square), \"$Square$\", \"$F_{squeeze} Square$\",\n",
2016-03-03 18:29:41 +01:00
" axis=[0, 1.8, 0, 1.2])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Let's show a last one: reflection through the horizontal axis:"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_reflect = np.array([\n",
" [1, 0],\n",
" [0, -1]\n",
" ])\n",
"plot_transformation(P, F_reflect.dot(P), \"$P$\", \"$F_{reflect} P$\",\n",
2016-03-03 18:29:41 +01:00
" axis=[-2, 9, -4.5, 4.5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Matrix inverse\n",
"Now that we understand that a matrix can represent any linear transformation, a natural question is: can we find a transformation matrix that reverses the effect of a given transformation matrix $F$? The answer is yes… sometimes! When it exists, such a matrix is called the **inverse** of $F$, and it is noted $F^{-1}$.\n",
"\n",
"For example, the rotation, the shear mapping and the squeeze mapping above all have inverse transformations. Let's demonstrate this on the shear mapping:"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_inv_shear = np.array([\n",
" [1, -1.5],\n",
" [0, 1]\n",
"])\n",
"P_sheared = F_shear.dot(P)\n",
"P_unsheared = F_inv_shear.dot(P_sheared)\n",
"plot_transformation(P_sheared, P_unsheared, \"$P_{sheared}$\", \"$P_{unsheared}$\",\n",
" axis=[0, 10, 0, 7])\n",
"plt.plot(P[0], P[1], \"b--\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"We applied a shear mapping on $P$, just like we did before, but then we applied a second transformation to the result, and *lo and behold* this had the effect of coming back to the original $P$ (we plotted the original $P$'s outline to double check). The second transformation is the inverse of the first one.\n",
"\n",
"We defined the inverse matrix $F_{shear}^{-1}$ manually this time, but NumPy provides an `inv` function to compute a matrix's inverse, so we could have written instead:"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_inv_shear = LA.inv(F_shear)\n",
"F_inv_shear"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Only square matrices can be inversed. This makes sense when you think about it: if you have a transformation that reduces the number of dimensions, then some information is lost and there is no way that you can get it back. For example say you use a $2 \\times 3$ matrix to project a 3D object onto a plane. The result may look like this:"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plt.plot([0, 0, 1, 1, 0, 0.1, 0.1, 0, 0.1, 1.1, 1.0, 1.1, 1.1, 1.0, 1.1, 0.1],\n",
" [0, 1, 1, 0, 0, 0.1, 1.1, 1.0, 1.1, 1.1, 1.0, 1.1, 0.1, 0, 0.1, 0.1],\n",
" \"r-\")\n",
"plt.axis([-0.5, 2.1, -0.5, 1.5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Looking at this image, it is impossible to tell whether this is the projection of a cube or the projection of a narrow rectangular object. Some information has been lost in the projection.\n",
"\n",
"Even square transformation matrices can lose information. For example, consider this transformation matrix:"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_project = np.array([\n",
" [1, 0],\n",
" [0, 0]\n",
" ])\n",
"plot_transformation(P, F_project.dot(P), \"$P$\", \"$F_{project} \\cdot P$\",\n",
" axis=[0, 6, -1, 4])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This transformation matrix performs a projection onto the horizontal axis. Our polygon gets entirely flattened out so some information is entirely lost and it is impossible to go back to the original polygon using a linear transformation. In other words, $F_{project}$ has no inverse. Such a square matrix that cannot be inversed is called a **singular matrix** (aka degenerate matrix). If we ask NumPy to calculate its inverse, it raises an exception:"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"try:\n",
" LA.inv(F_project)\n",
"except LA.LinAlgError as e:\n",
" print(\"LinAlgError:\", e)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Here is another example of a singular matrix. This one performs a projection onto the axis at a 30° angle above the horizontal axis:"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"angle30 = 30 * np.pi / 180\n",
"F_project_30 = np.array([\n",
" [np.cos(angle30)**2, np.sin(2*angle30)/2],\n",
" [np.sin(2*angle30)/2, np.sin(angle30)**2]\n",
" ])\n",
"plot_transformation(P, F_project_30.dot(P), \"$P$\", \"$F_{project\\_30} \\cdot P$\",\n",
" axis=[0, 6, -1, 4])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"But this time, due to floating point rounding errors, NumPy manages to calculate an inverse (notice how large the elements are, though):"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.inv(F_project_30)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"As you might expect, the dot product of a matrix by its inverse results in the identity matrix:\n",
"\n",
"$M \\cdot M^{-1} = M^{-1} \\cdot M = I$\n",
"\n",
"This makes sense since doing a linear transformation followed by the inverse transformation results in no change at all."
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_shear.dot(LA.inv(F_shear))"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Another way to express this is that the inverse of the inverse of a matrix $M$ is $M$ itself:\n",
"\n",
"$((M)^{-1})^{-1} = M$"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.inv(LA.inv(F_shear))"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Also, the inverse of scaling by a factor of $\\lambda$ is of course scaling by a factor or $\\frac{1}{\\lambda}$:\n",
"\n",
"$ (\\lambda \\times M)^{-1} = \\frac{1}{\\lambda} \\times M^{-1}$\n",
"\n",
"Once you understand the geometric interpretation of matrices as linear transformations, most of these properties seem fairly intuitive.\n",
"\n",
"A matrix that is its own inverse is called an **involution**. The simplest examples are reflection matrices, or a rotation by 180°, but there are also more complex involutions, for example imagine a transformation that squeezes horizontally, then reflects over the vertical axis and finally rotates by 90° clockwise. Pick up a napkin and try doing that twice: you will end up in the original position. Here is the corresponding involutory matrix:"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_involution = np.array([\n",
" [0, -2],\n",
" [-1/2, 0]\n",
" ])\n",
"plot_transformation(P, F_involution.dot(P), \"$P$\", \"$F_{involution} \\cdot P$\",\n",
" axis=[-8, 5, -4, 4])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Finally, a square matrix $H$ whose inverse is its own transpose is an **orthogonal matrix**:\n",
"\n",
"$H^{-1} = H^T$\n",
"\n",
"Therefore:\n",
"\n",
"$H \\cdot H^T = H^T \\cdot H = I$\n",
"\n",
"It corresponds to a transformation that preserves distances, such as rotations and reflections, and combinations of these, but not rescaling, shearing or squeezing. Let's check that $F_{reflect}$ is indeed orthogonal:"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_reflect.dot(F_reflect.T)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Determinant\n",
"The determinant of a square matrix $M$, noted $\\det(M)$ or $\\det M$ or $|M|$ is a value that can be calculated from its elements $(M_{i,j})$ using various equivalent methods. One of the simplest methods is this recursive approach:\n",
"\n",
"$|M| = M_{1,1}\\times|M^{(1,1)}| - M_{2,1}\\times|M^{(2,1)}| + M_{3,1}\\times|M^{(3,1)}| - M_{4,1}\\times|M^{(4,1)}| + \\cdots ± M_{n,1}\\times|M^{(n,1)}|$\n",
"\n",
"* Where $M^{(i,j)}$ is the matrix $M$ without row $i$ and column $j$.\n",
"\n",
"For example, let's calculate the determinant of the following $3 \\times 3$ matrix:\n",
"\n",
"$M = \\begin{bmatrix}\n",
" 1 & 2 & 3 \\\\\n",
" 4 & 5 & 6 \\\\\n",
" 7 & 8 & 0\n",
"\\end{bmatrix}$\n",
"\n",
"Using the method above, we get:\n",
"\n",
"$|M| = 1 \\times \\left | \\begin{bmatrix} 5 & 6 \\\\ 8 & 0 \\end{bmatrix} \\right |\n",
" - 2 \\times \\left | \\begin{bmatrix} 4 & 6 \\\\ 7 & 0 \\end{bmatrix} \\right |\n",
" + 3 \\times \\left | \\begin{bmatrix} 4 & 5 \\\\ 7 & 8 \\end{bmatrix} \\right |$\n",
"\n",
"Now we need to compute the determinant of each of these $2 \\times 2$ matrices (these determinants are called **minors**):\n",
"\n",
"$\\left | \\begin{bmatrix} 5 & 6 \\\\ 8 & 0 \\end{bmatrix} \\right | = 5 \\times 0 - 6 \\times 8 = -48$\n",
"\n",
"$\\left | \\begin{bmatrix} 4 & 6 \\\\ 7 & 0 \\end{bmatrix} \\right | = 4 \\times 0 - 6 \\times 7 = -42$\n",
"\n",
"$\\left | \\begin{bmatrix} 4 & 5 \\\\ 7 & 8 \\end{bmatrix} \\right | = 4 \\times 8 - 5 \\times 7 = -3$\n",
"\n",
"Now we can calculate the final result:\n",
"\n",
"$|M| = 1 \\times (-48) - 2 \\times (-42) + 3 \\times (-3) = 27$"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"To get the determinant of a matrix, you can call NumPy's `det` function in the `numpy.linalg` module:"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"M = np.array([\n",
" [1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 0]\n",
" ])\n",
"LA.det(M)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"One of the main uses of the determinant is to *determine* whether a square matrix can be inversed or not: if the determinant is equal to 0, then the matrix *cannot* be inversed (it is a singular matrix), and if the determinant is not 0, then it *can* be inversed.\n",
"\n",
"For example, let's compute the determinant for the $F_{project}$, $F_{project\\_30}$ and $F_{shear}$ matrices that we defined earlier:"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.det(F_project)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"That's right, $F_{project}$ is singular, as we saw earlier."
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.det(F_project_30)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"This determinant is suspiciously close to 0: it really should be 0, but it's not due to tiny floating point errors. The matrix is actually singular."
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.det(F_shear)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Perfect! This matrix *can* be inversed as we saw earlier. Wow, math really works!"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The determinant can also be used to measure how much a linear transformation affects surface areas: for example, the projection matrices $F_{project}$ and $F_{project\\_30}$ completely flatten the polygon $P$, until its area is zero. This is why the determinant of these matrices is 0. The shear mapping modified the shape of the polygon, but it did not affect its surface area, which is why the determinant is 1. You can try computing the determinant of a rotation matrix, and you should also find 1. What about a scaling matrix? Let's see:"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_scale = np.array([\n",
" [0.5, 0],\n",
" [0, 0.5]\n",
" ])\n",
"plot_transformation(P, F_scale.dot(P), \"$P$\", \"$F_{scale} \\cdot P$\",\n",
" axis=[0, 6, -1, 4])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"We rescaled the polygon by a factor of 1/2 on both vertical and horizontal axes so the surface area of the resulting polygon is 1/4$^{th}$ of the original polygon. Let's compute the determinant and check that:"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.det(F_scale)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Correct!\n",
"\n",
"The determinant can actually be negative, when the transformation results in a \"flipped over\" version of the original polygon (eg. a left hand glove becomes a right hand glove). For example, the determinant of the `F_reflect` matrix is -1 because the surface area is preserved but the polygon gets flipped over:"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.det(F_reflect)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Composing linear transformations\n",
"Several linear transformations can be chained simply by performing multiple dot products in a row. For example, to perform a squeeze mapping followed by a shear mapping, just write:"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"P_squeezed_then_sheared = F_shear.dot(F_squeeze.dot(P))"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Since the dot product is associative, the following code is equivalent:"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"P_squeezed_then_sheared = (F_shear.dot(F_squeeze)).dot(P)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Note that the order of the transformations is the reverse of the dot product order.\n",
"\n",
"If we are going to perform this composition of linear transformations more than once, we might as well save the composition matrix like this:"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_squeeze_then_shear = F_shear.dot(F_squeeze)\n",
"P_squeezed_then_sheared = F_squeeze_then_shear.dot(P)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"From now on we can perform both transformations in just one dot product, which can lead to a very significant performance boost."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"What if you want to perform the inverse of this double transformation? Well, if you squeezed and then you sheared, and you want to undo what you have done, it should be obvious that you should unshear first and then unsqueeze. In more mathematical terms, given two invertible (aka nonsingular) matrices $Q$ and $R$:\n",
"\n",
"$(Q \\cdot R)^{-1} = R^{-1} \\cdot Q^{-1}$\n",
"\n",
"And in NumPy:"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"LA.inv(F_shear.dot(F_squeeze)) == LA.inv(F_squeeze).dot(LA.inv(F_shear))"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Singular Value Decomposition\n",
"It turns out that any $m \\times n$ matrix $M$ can be decomposed into the dot product of three simple matrices:\n",
"* a rotation matrix $U$ (an $m \\times m$ orthogonal matrix)\n",
"* a scaling & projecting matrix $\\Sigma$ (an $m \\times n$ diagonal matrix)\n",
"* and another rotation matrix $V^T$ (an $n \\times n$ orthogonal matrix)\n",
"\n",
"$M = U \\cdot \\Sigma \\cdot V^{T}$\n",
"\n",
"For example, let's decompose the shear transformation:"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"U, S_diag, V_T = LA.svd(F_shear) # note: in python 3 you can rename S_diag to Σ_diag\n",
"U"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"S_diag"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Note that this is just a 1D array containing the diagonal values of Σ. To get the actual matrix Σ, we can use NumPy's `diag` function:"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"S = np.diag(S_diag)\n",
"S"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Now let's check that $U \\cdot \\Sigma \\cdot V^T$ is indeed equal to `F_shear`:"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"U.dot(np.diag(S_diag)).dot(V_T)"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"F_shear"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"It worked like a charm. Let's apply these transformations one by one (in reverse order) on the unit square to understand what's going on. First, let's apply the first rotation $V^T$:"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plot_transformation(Square, V_T.dot(Square), \"$Square$\", \"$V^T \\cdot Square$\",\n",
" axis=[-0.5, 3.5 , -1.5, 1.5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Now let's rescale along the vertical and horizontal axes using $\\Sigma$:"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plot_transformation(V_T.dot(Square), S.dot(V_T).dot(Square), \"$V^T \\cdot Square$\", \"$\\Sigma \\cdot V^T \\cdot Square$\",\n",
" axis=[-0.5, 3.5 , -1.5, 1.5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Finally, we apply the second rotation $U$:"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"plot_transformation(S.dot(V_T).dot(Square), U.dot(S).dot(V_T).dot(Square),\"$\\Sigma \\cdot V^T \\cdot Square$\", \"$U \\cdot \\Sigma \\cdot V^T \\cdot Square$\",\n",
" axis=[-0.5, 3.5 , -1.5, 1.5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"And we can see that the result is indeed a shear mapping of the original unit square."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Eigenvectors and eigenvalues\n",
"An **eigenvector** of a square matrix $M$ (also called a **characteristic vector**) is a non-zero vector that remains on the same line after transformation by the linear transformation associated with $M$. A more formal definition is any vector $v$ such that:\n",
"\n",
"$M \\cdot v = \\lambda \\times v$\n",
"\n",
"Where $\\lambda$ is a scalar value called the **eigenvalue** associated to the vector $v$.\n",
"\n",
"For example, any horizontal vector remains horizontal after applying the shear mapping (as you can see on the image above), so it is an eigenvector of $M$. A vertical vector ends up tilted to the right, so vertical vectors are *NOT* eigenvectors of $M$.\n",
"\n",
"If we look at the squeeze mapping, we find that any horizontal or vertical vector keeps its direction (although its length changes), so all horizontal and vertical vectors are eigenvectors of $F_{squeeze}$.\n",
"\n",
"However, rotation matrices have no eigenvectors at all (except if the rotation angle is 0° or 180°, in which case all non-zero vectors are eigenvectors).\n",
"\n",
"NumPy's `eig` function returns the list of unit eigenvectors and their corresponding eigenvalues for any square matrix. Let's look at the eigenvectors and eigenvalues of the squeeze mapping matrix $F_{squeeze}$:"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"eigenvalues, eigenvectors = LA.eig(F_squeeze)\n",
"eigenvalues # [λ0, λ1, …]"
]
},
{
"cell_type": "code",
"execution_count": 133,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"eigenvectors # [v0, v1, …]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Indeed the horizontal vectors are stretched by a factor of 1.4, and the vertical vectors are shrunk by a factor of 1/1.4=0.714…, so far so good. Let's look at the shear mapping matrix $F_{shear}$:"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"eigenvalues2, eigenvectors2 = LA.eig(F_shear)\n",
"eigenvalues2 # [λ0, λ1, …]"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"eigenvectors2 # [v0, v1, …]"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"Wait, what!? We expected just one unit eigenvector, not two. The second vector is almost equal to $\\begin{pmatrix}-1 \\\\ 0 \\end{pmatrix}$, which is on the same line as the first vector $\\begin{pmatrix}1 \\\\ 0 \\end{pmatrix}$. This is due to floating point errors. We can safely ignore vectors that are (almost) colinear (ie. on the same line)."
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"## Trace\n",
"The trace of a square matrix $M$, noted $tr(M)$ is the sum of the values on its main diagonal. For example:"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"D = np.array([\n",
" [100, 200, 300],\n",
" [ 10, 20, 30],\n",
" [ 1, 2, 3],\n",
" ])\n",
"np.trace(D)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"The trace does not have a simple geometric interpretation (in general), but it has a number of properties that make it useful in many areas:\n",
"* $tr(A + B) = tr(A) + tr(B)$\n",
"* $tr(A \\cdot B) = tr(B \\cdot A)$\n",
"* $tr(A \\cdot B \\cdot \\cdots \\cdot Y \\cdot Z) = tr(Z \\cdot A \\cdot B \\cdot \\cdots \\cdot Y)$\n",
"* $tr(A^T \\cdot B) = tr(A \\cdot B^T) = tr(B^T \\cdot A) = tr(B \\cdot A^T) = \\sum_{i,j}X_{i,j} \\times Y_{i,j}$\n",
"* …\n",
"\n",
"It does, however, have a useful geometric interpretation in the case of projection matrices (such as $F_{project}$ that we discussed earlier): it corresponds to the number of dimensions after projection. For example:"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {
2017-04-20 21:45:58 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-03-03 18:29:41 +01:00
},
"outputs": [],
"source": [
"np.trace(F_project)"
]
},
{
"cell_type": "markdown",
2017-04-20 21:45:58 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-03-03 18:29:41 +01:00
"source": [
"# What next?\n",
2017-04-20 21:45:58 +02:00
"This concludes this introduction to Linear Algebra. Although these basics cover most of what you will need to know for Machine Learning, if you wish to go deeper into this topic there are many options available: Linear Algebra [books](http://linear.axler.net/), [Khan Academy](https://www.khanacademy.org/math/linear-algebra) lessons, or just [Wikipedia](https://en.wikipedia.org/wiki/Linear_algebra) pages. "
2016-03-03 18:29:41 +01:00
]
2017-04-20 21:45:58 +02:00
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
2016-03-03 18:29:41 +01:00
}
],
"metadata": {
"kernelspec": {
2017-04-20 21:45:58 +02:00
"display_name": "Python 3",
2016-03-03 18:29:41 +01:00
"language": "python",
2017-04-20 21:45:58 +02:00
"name": "python3"
2016-03-03 18:29:41 +01:00
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
2017-04-20 21:45:58 +02:00
"version": 3
2016-03-03 18:29:41 +01:00
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
2017-04-20 21:45:58 +02:00
"pygments_lexer": "ipython3",
"version": "3.5.3"
2016-03-03 18:29:41 +01:00
},
"toc": {
"toc_cell": false,
"toc_number_sections": true,
"toc_threshold": 6,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 0
}