handson-ml/tools_numpy.ipynb

1433 lines
30 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tools - NumPy\n",
"*NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.*\n",
"\n",
"## Creating arrays\n",
"First let's import `numpy`:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `zeros` function creates an array containing any number of zeros:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.zeros(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's just as easy to create a 2D array (ie. a matrix) by providing a tuple with the desired number of rows and columns. For example, here's a 3x4 matrix:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.zeros((3,4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some vocabulary:\n",
"\n",
"* In NumPy, each dimension is called an **axis**.\n",
"* The number of axes is called the **rank**.\n",
" * For example, the above 3x4 matrix is an array of rank 2 (it is 2-dimensional).\n",
" * The first axis has length 3, the second has length 4.\n",
"* An array's list of axis lengths is called the **shape** of the array.\n",
" * For example, the above matrix's shape is `(3, 4)`.\n",
" * The rank is equal to the shape's length.\n",
"* The **size** of an array is the total number of elements, which is the product of all axis lengths (eg. 3*4=12)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.zeros((3,4))\n",
"print a\n",
"print \"Shape:\", a.shape\n",
"print \"Rank:\", a.ndim\n",
"print \"Size:\", a.size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also create an N-dimensional array of arbitrary rank. For example, here's a 3D array (rank=3), with shape `(2,3,4)`:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.zeros((2,3,4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the type of these arrays:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print type(np.zeros((3,4)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Many other NumPy functions create `ndarrays`.\n",
"\n",
"Here's a 3x4 matrix full of ones:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.ones((3,4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An uninitialized 2x3 array (its content is not predictable, as it is whatever is in memory at that point):"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"print np.empty((2,3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course you can initialize an `ndarray` using a regular python array (or any iterable). Just call the `array` function:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.array([[1,2,3,4], [10, 20, 30, 40]])\n",
"print type(a)\n",
"print a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can create an `ndarray` using NumPy's `range` function, which is similar to python's built-in `range` function:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"print np.arange(1, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It also works with floats:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.arange(1.0, 5.0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course you can provide a step parameter:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.arange(1, 5, 0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, when dealing with floats, the exact number of elements in the array is not always predictible. For example, consider this:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.arange(0, 5/3.0, 1/3.0) # depending on floating point errors, the max value is 4/3.0 or 5/3.0.\n",
"print np.arange(0, 5/3.0, 0.333333333)\n",
"print np.arange(0, 5/3.0, 0.333333334)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For this reason, it is generally preferable to use the `linspace` function instead of `arange` when working with floats. The `linspace` function returns an array containing a specific number of points evenly distributed between two values (note that the maximum value is *included*, contrary to `arange`):"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.linspace(0, 5/3.0, 6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A number of functions are available in NumPy's `random` module to create `ndarray`s initialized with random values.\n",
"For example, here is a 3x4 matrix initialized with random floats between 0 and 1 (uniform distribution):"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.random.rand(3,4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a 3x4 matrix containing random floats sampled from a univariate [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) (Gaussian distribution) of mean 0 and variance 1:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print np.random.randn(3,4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To give you a feel of what these distributions look like, let's use matplotlib (see the [matplotlib tutorial](tools_matplotlib.ipynb) for more details):"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.hist(np.random.rand(100000), normed=True, bins=100, histtype=\"step\", color=\"blue\", label=\"rand\")\n",
"plt.hist(np.random.randn(100000), normed=True, bins=100, histtype=\"step\", color=\"red\", label=\"randn\")\n",
"plt.axis([-2.5, 2.5, 0, 1.1])\n",
"plt.legend(loc = \"upper left\")\n",
"plt.title(\"Random distributions\")\n",
"plt.xlabel(\"Value\")\n",
"plt.ylabel(\"Density\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also initialize an `ndarray` using a function:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def my_function(z, y, x):\n",
" return x * y + z\n",
"\n",
"print np.fromfunction(my_function, (3, 2, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"NumPy first creates three `ndarrays` (one per dimension), each of shape `(3, 2, 10)`. Each array has values equal to the coordinate along a specific axis. For example, all elements in the `z` array are equal to the z-coordinate:\n",
"\n",
" [[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n",
" [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]\n",
" \n",
" [[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]\n",
" [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]\n",
" \n",
" [[ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]\n",
" [ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]]]\n",
"\n",
"This means that `my_function` is only called once, so the initialization is very efficient."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Array data\n",
"NumPy's `ndarray`s are very efficient in part because all their elements must have the same type (usually numbers).\n",
"You can check what the data type is by looking at the `dtype` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"a = np.arange(1, 5)\n",
"print a.dtype, a\n",
"\n",
"b = np.arange(1.0, 5.0)\n",
"print b.dtype, b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of letting NumPy guess what data type to use, you can set it explicitly when creating an array by setting the `dtype` parameter:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.arange(1, 5, dtype=np.complex64)\n",
"print a.dtype, a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Available data types include int8, int16, int32, int64, uint8/16/32/64, float16/32/64 and complex64/128. Check out [the documentation](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html) for the full list.\n",
"\n",
"The `itemsize` attribute returns the size (in bytes) of each item:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.arange(1, 5, dtype=np.complex64)\n",
"print a.itemsize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An array's data is actually stored in memory as a flat (one dimensional) byte buffer. It is available *via* the `data` attribute (you will rarely need it, though)."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"a = np.array([[1,2],[1000, 2000]], dtype=np.int32)\n",
"print \"Array:\"\n",
"print a\n",
"print \"Raw data:\"\n",
"print [ord(c) for c in a.data]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several `ndarrays` can share the same data buffer, meaning that modifying one will also modify the others. We will see an example in a minute."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reshaping an array\n",
"Changing the shape of an `ndarray` is as simple as setting its `shape` attribute. However, the array's size must remain the same."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.arange(24)\n",
"print a\n",
"print \"Rank:\", a.ndim"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a.shape = (6, 4)\n",
"print a\n",
"print \"Rank:\", a.ndim"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"a.shape = (2, 3, 4)\n",
"print a\n",
"print \"Rank:\", a.ndim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `reshape` function returns a new `ndarray` object pointing to the *same* data. This means that modifying one array will also modify the other."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"a2 = a.reshape(4,6)\n",
"print a2\n",
"print \"Rank:\", a2.ndim"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set item at row 1, col 2 to 999 (more about indexing below)."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a2[1, 2] = 999\n",
"print a2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The corresponding element in a has been modified."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, the `ravel` function returns a new one-dimensional `ndarray` that also points to the same data:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a3 = a.ravel()\n",
"print a3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Array indexing\n",
"One-dimensional NumPy arrays can be accessed more or less like regular python arrays:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.array([1, 5, 3, 19, 13, 7, 3])\n",
"print a[3]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a[2:5]"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a[2:-1]"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a[:2]"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a[2::2]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a[::-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Of course, you can modify elements:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a[3]=999\n",
"print a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also modify an `ndarray` slice:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a[2:5] = [997, 998, 999]\n",
"print a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And if you assign a single value, it is copied across the whole slice (this is called *broadcasting*, more on this below):"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a[2:5] = -1\n",
"print a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Contrary to regular python arrays, you cannot grow or shrink `ndarray`s this way:"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"try:\n",
" a[2:5] = [1,2,3,4,5,6] # too long\n",
"except ValueError, e:\n",
" print e"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You cannot delete elements either:"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"try:\n",
" del a[2:5]\n",
"except ValueError, e:\n",
" print e"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multi-dimensional arrays can be accessed in a similar way by providing an index or slice for each axis, separated by commas:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"b = np.arange(48).reshape(4, 12)\n",
"print b"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print b[1, 2] # row 1, col 2"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print b[1, :] # row 1, all columns"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print b[:, 1] # all rows, column 1"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [],
"source": [
"print b[(0,2), 2:5] # rows 0 and 2, columns 2 to 4 (5-1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also provide an `ndarray` of boolean values to specify the indices that you want to access. This will come in handy later:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"bools = np.array([True, False, True, False])\n",
"print b[bools, :] # Rows 0 and 2, all columns. Equivalent to b[(1, 3), :]"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print b[b % 3 == 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the subtle difference between these two expressions: "
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print b[1, :]\n",
"print b[1:2, :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first expression returns row 1 as a 1D array of shape `(12,)`, while the second returns that same row as a 2D array of shape `(1, 12)`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Everything works just as well with higher dimension arrays:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"c = b.reshape(4,2,6)\n",
"print c"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print c[2, 1, 4] # matrix 2, row 1, col 4"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print c[2, :, 3] # matrix 2, all rows, col 3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you omit coordinates for some axes, then all elements in these axes are returned:"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print c[2, 1] # Return matrix 2, row 1, all columns. This is equivalent to c[2, 1, :]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may also write an ellipsis (`...`) to specify that all non-specified axes must be entirely included."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print c[2, ...] # matrix 2, all rows, all columns. This is equivalent to c[2, :, :]"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print c[2, 1, ...] # matrix 2, row 1, all columns. This is equivalent to c[2, 1, :]"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print c[2, ..., 3] # matrix 2, all rows, column 3. This is equivalent to c[2, :, 3]"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"print c[..., 3] # all matrices, all rows, column 3. This is equivalent to c[:, :, 3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Broadcasting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we discussed above, assigning to an `ndarray` slice requires an `ndarray` of the same shape as the slice. In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called *broadcasting* rules:\n",
"\n",
"**First rule**: if the arrays do not have the same rank, then a 1 will be prepended to the smaller ranking arrays until their ranks match.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a = np.arange(10).reshape(1, 1, 10)\n",
"print a"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print \"Slice:\", a[..., 2:4]\n",
"print \"Shape:\", a[..., 2:4].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's try to assign a 1D array of shape `(2,)` to this 3D array of shape `(1,1,2)`. Applying the first rule of broadcasting!"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"a[..., 2:4] = [55, 66] # acts as [[[55, 56]]]\n",
"print a"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Second rule**: arrays with a 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. The value of the array element is repeated along that dimension."
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"b = np.arange(10).reshape(2, 5)\n",
"print b"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print \"Slice:\"\n",
"print b[..., 1:4]\n",
"print \"Shape:\", b[..., 1:4].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try to assign a 2D array of shape `(2,1)` to this slice of shape `(2, 3)`. NumPy will apply the second rule of broadcasting:"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"b[..., 1:4] = [[44], [55]] # acts as [[44, 44, 44], [55, 55, 55]]\n",
"print b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Combining rules 1 & 2, we can do this:"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"b[..., 1:4] = [66, 77, 88] # after rule 1: [[66, 77, 88]], and after rule 2: [[66, 77, 88], [66, 77, 88]]\n",
"print b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And also, very simply:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"b[..., 1:4] = 99 # after rule 1: [[99]], and after rule 2: [[99, 99, 99], [99, 99, 99]]\n",
"print b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Third rule**: after rules 1 & 2, the sizes of all arrays must match."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"try:\n",
" b[..., 1:4] = [33, 44]\n",
"except ValueError, e:\n",
" print e"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting rules are used in many NumPy operations, not just assignment, as we will see below.\n",
"For more details about broadcasting, check out [the documentation](https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Iterating\n",
"Iterating over `ndarray`s is very similar to iterating over regular python arrays. Note that iterating over multidimensional arrays is done with respect to the first axis."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"c = np.arange(24).reshape(2, 2, 6) # A 3D array (composed of two 2x6 matrices)\n",
"print c"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for m in c:\n",
" print \"Item:\"\n",
" print m"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for i in range(len(c)): # Note that len(c) == c.shape[0]\n",
" print \"Item:\"\n",
" print c[i]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to iterate on *all* elements in the `ndarray`, simply iterate over the `flat` attribute:"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for i in c.flat:\n",
" print \"Item:\", i"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Arithmetic operations\n",
"All the usual arithmetic operators (`+`, `-`, `*`, `/`, `**`, etc.) can be used with `ndarray`s. They apply elementwise:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"a = np.array([14, 23, 32, 41])\n",
"b = np.array([5, 4, 3, 2])\n",
"print \"a + b =\", a + b\n",
"print \"a - b =\", a - b\n",
"print \"a * b =\", a * b\n",
"print \"a / b =\", a / b\n",
"print \"a % b =\", a % b\n",
"print \"a ** b =\", a ** b"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that the multiplication is *not* a matrix multiplication. We will discuss matrix operations below.\n",
"\n",
"The arrays must have the same shape. If they do not, NumPy will apply the broadcasting rules, as discussed above."
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a * 3 # thanks to broadcasting, this is equivalent to: a * [3, 3, 3, 3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The conditional operators also apply elementwise:"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a < [15, 16, 35, 36]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And using broadcasting:"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a < 25 # equivalent to a < [25, 25, 25, 25]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is most useful in conjunction with boolean indexing:"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print a[a < 25]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that all matching elements are returned as a 1D array, no matter the original array's shape:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"p = np.fromfunction(lambda row, col: row*col, (3,6))\n",
"print p"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print p[p%3 == 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is possible (and quite convenient) to use boolean indexing and assignment jointly:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"p[p%3 == 1] = 99\n",
"print p"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## To be continued..."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}