{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Chapter 8 – Dimensionality Reduction**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 8._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<table align=\"left\">\n",
" <td>\n",
" <a href=\"https://colab.research.google.com/github/ageron/handson-ml3/blob/main/08_dimensionality_reduction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://kaggle.com/kernels/welcome?src=https://github.com/ageron/handson-ml3/blob/main/08_dimensionality_reduction.ipynb\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" /></a>\n",
" </td>\n",
"</table>"
]
},
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This project requires Python 3.7 or above:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"assert sys.version_info >= (3, 7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It also requires Scikit-Learn ≥ 1.0.1:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from packaging import version\n",
"import sklearn\n",
"\n",
"# compare parsed versions rather than raw strings, which only compare lexicographically\n",
"assert version.parse(sklearn.__version__) >= version.parse(\"1.0.1\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we did in previous chapters, let's define the default font sizes to make the figures prettier:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.rc('font', size=14)\n",
"plt.rc('axes', labelsize=14, titlesize=14)\n",
"plt.rc('legend', fontsize=14)\n",
"plt.rc('xtick', labelsize=10)\n",
"plt.rc('ytick', labelsize=10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And let's create the `images/dim_reduction` folder (if it doesn't already exist), and define the `save_fig()` function, which is used throughout this notebook to save the figures in high-res for the book:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"IMAGES_PATH = Path() / \"images\" / \"dim_reduction\"\n",
"IMAGES_PATH.mkdir(parents=True, exist_ok=True)\n",
"\n",
"def save_fig(fig_id, tight_layout=True, fig_extension=\"png\", resolution=300):\n",
"    path = IMAGES_PATH / f\"{fig_id}.{fig_extension}\"\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format=fig_extension, dpi=resolution)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"# PCA"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This chapter starts with several figures to explain the concepts of PCA and Manifold Learning. Below is the code to generate these figures. You can skip directly to the [Principal Components](#Principal-Components) section below if you want."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's generate a small 3D dataset. It's an oval shape, rotated in 3D space, with points distributed unevenly, and with quite a lot of noise:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# extra code\n",
"\n",
"import numpy as np\n",
"from scipy.spatial.transform import Rotation\n",
"\n",
"m = 60\n",
"X = np.zeros((m, 3)) # initialize 3D dataset\n",
"np.random.seed(42)\n",
"angles = (np.random.rand(m) ** 3 + 0.5) * 2 * np.pi # uneven distribution\n",
"X[:, 0], X[:, 1] = np.cos(angles), np.sin(angles) * 0.5 # oval\n",
"X += 0.28 * np.random.randn(m, 3) # add more noise\n",
"X = Rotation.from_rotvec([np.pi / 29, -np.pi / 20, np.pi / 4]).apply(X)\n",
"X += [0.2, 0, 0.2] # shift a bit"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Plot the 3D dataset, with the projection plane."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAhMAAAH3CAYAAAABnt0tAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOy9eXgkV3n2fVf13lKru6XROtLs+4ykkWY0NjAYGy/wGTzGLAaDX8fm5QsmgO2YOGxO+EhwyBsnDhicQAKYxbEJkLyMDcbBGByCDR7bs2ib0Wi07xqpu6Xeu6vqfH+Uqrq6Vb1XbzPnd11zzUwvVae6q+vc9ZznuR+GEAIKhUKhUCiUXGFLPQAKhUKhUCiVDRUTFAqFQqFQ8oKKCQqFQqFQKHlBxQSFQqFQKJS8oGKCQqFQKBRKXlAxQaFQKBQKJS/0aZ6ndaMUCoVCoZQnTKkHIEEjExQKhUKhUPKCigkKhUKhUCh5QcUEhUKhUCiUvKBigkKhUCgUSl5QMUGhUCgUCiUvqJigUCgUCoWSF1RMUCgUCoVCyQsqJigUCoVCoeQFFRMUCoVCoVDygooJCoVCoVAoeUHFBIVCoVAolLygYoJCoVAoFEpeUDFBoVAoFAolL6iYoFAoFAqFkhdUTFAoFAqFQskLKiYoFAqFQqHkBRUTFAqFQqFQ8oKKCQqFQqFQKHlBxQSFQqFQKJS8oGKCQqFQKBRKXlAxQaFQKBQKJS+omKBQKBQKhZIXVExQKBQKhULJCyomKBQKhUKh5AUVExQKhUKhUPKCigkKhUKhUCh5QcUEhUKhUCiUvKBigkKhUCgUSl5QMUGhUCgUCiUvqJigUCgUCoWSF1RMUCgUCoVCyQsqJigUCoVCoeQFFRMUCoVCoVDygooJCoVCoVAoeUHFBIVCoVAolLygYoJCyRBCSKmHQKFQKGWJvtQDoFDKHUIIeJ6H3+8Hy7IwGAzQ6/XQ6XRgGKbUw6NQKJSSw6S526K3YpTLGkIIIpEIBEFAJBKRH5PQ6XRUXFAolFJRNhccKiYolCRIAoIQAoZhEIlE4sQCIUT+I0HFBYVCKSJlc4GhYoJCSYAQAo7jwHEcGIYBwzByhCKVOJCEhSAI8uuouKBQKAWkbC4oVExQKAoEQUA0GpUFgTT5S2ICQMaCgIoLCoVSYMrmAkLFBIWCWJJlNBoFgDghIaFc8sh1H4niQq/Xy3+ouKBQKFlSNhcMWs1BuexRW9YoBNK2WZaV98vzPDiOk1+j1+vlyAXLslRcUCiUioCKCcplTbJljWKQuL9EccEwTFzkgooLCoVSrtBlDspliXJZI1MRke8yR7YkVotQcUGhUBIomwsAjUxQLjuU3hHFjkZkg1rkguO4uLwOKi4oFEo5QCMTlMuKRO+IbCbfYkcm0iElc0oIggCe52G326HT6ai4oFAufcrmB04jE5TLgsQkSykJspJhGAY6nU7+fyAQwOTkJPbs2SM/LyVz6vX6so7CUCiUyoaKCcolj1bLGuXe6Es6NklgSMcdDocBQO4rYjAY5DJUKi4oFIoWUDFBuaRJzDHIdfKcn5/H+fPnodPpYLfb4XQ64XA4YDAYtByupiQKCwCyuJCiM1IpqrQsQqFQKLlAxQTlkkQr7wie53Hu3DlEIhEcOnQIDMPA6/XC7XZjcnIShJA4caHXl+4nJdl+J3sOgKq4kJw9qbigUCi5QsUE5ZJDK+8In8+Hvr4+bNy4EW1tbYhGoyCEwOl0wul0AhAjHysrK3C73RgfHwfDMHA4HHA6nXIiZDlCxQWFQtESKiYolwyJlti5Tn6EEMzMzGBychLt7e2w2WwA1O/89Xo96urqUFdXBwCIRqPweDxYXl7GyMgIdDpdnLgo1wlZTVxIORdKcaHsK1Kux0KhUIoPLQ2lXBIQQhCNRsHzfF7RCI7jMDAwAJZlsXfv3rhlC2W0I1MikQg8Hg/cbjdWV1eh1+vlyIbNZtN0Qvb7/RgfH8
f+/fs126aEWrt1pbiQqkUoFEpRKZsfHY1MUCoeQRAwMTEBh8MBi8WS86S2urqK/v5+bNmyBS0tLZqMzWg0oqGhAQ0NDQCAcDgMt9uN2dlZeL1emEwmWVxUV1eX7YSsZqBFCEE4HJarRaSOqDqdjooLCuUyg4oJSsWiTLJ0uVyoqqqC1WrNaTsTExOYn59HZ2cnqqqqCjBaEZPJhKamJjQ1NQEAgsEgPB4Ppqam4PP5YDabZXFRVVWV1YScKgFTa9TEhSAICIVC8mO03TqFcvlAxQSlIkn0jmBZNs4NMlMikQj6+/thsVhw5MiRoucBWCwWWCwWNDc3gxCCYDAoJ3P6/X5UVVXJ4iKfqEuhoeKCQrm8oWKCUnFISZZKS+xc7srdbjcGBwexc+dOeRmilDAMA6vVCqvVio0bN4IQgkAgALfbjZGREQQCAVRXV8eJi3IlmbgIBoNxyZ5UXFAolwZUTFAqhlSW2NmICUIIRkdHsby8jO7u7rKdlBmGQVVVFaqqqtDa2gpCCHw+H9xuN86fP49wOAybzRZXqlquSOJC+s6U4mJoaAh79uyh4oJCqWComKBUBOm8IzIVE6FQCH19fXA4HDh8+HBFlTcyDAObzQabzYZNmzZBEATZQGtwcBDhcBiEECwuLsLhcMBoNJZ6yElRiotAICAvUykjF8qOqFRcUCjlDRUTlLIm0TsiWdlnJjkTFy9exPnz57Fnzx7ZF6KSYVkWdrsddrsdW7Zsgd/vx9DQEPx+P6anp+UOopVi/Z0YuZC+d6W4kCIXtCMqhVJeUDFBKVuy8Y5IFZkQBAHDw8Pwer04fPgwTCZToYZcUiTfh61bt2Lr1q3geV5255Ssv5UGWqW0/k5HMnHBcZz8vHJZhIoLCqW0lO/VhHJZIwgCIpFIXJJlKpKJiUAggL6+PjQ0NMi9NXKl0iYrnU6H2tpa1NbWAoi3/h4bG6sY629APaEzUVwol0WouKBQigsVE5SyIjG8nWlOg5qYmJ+fx8jICPbv3w+Hw5H32Cp9ckpm/b20tCRbf0vJnDU1NUXJJ8nVF0NNXCR2iKXigkIpHlRMUMqGRO+IXA2blJ0+jxw5olmuQLEMoYqFwWBAfX096uvrAcSsvxcWFjA8PFxQ62+tURMX0Wh0nbhQNi2j4oJC0Q4qJihlQbbLGolICZiJnT7phJE5pbD+lr5vrZFyKpT7SRQXiU3L6LlCoeQOFROUkpLKOyJbXC7Xuk6flxNa22mrWX9LyZx+vx8Wi0WuFMnW+rvYqImLSCQi9xVRa1pWzsdDoZQbVExQSkY674hM4TgOs7OzAIAjR44UtEqhUHfSlYBk/d3S0qKZ9XepPs9U4kI6Fw0Gg7wsQsUFhZIaKiYoRSdT74hMkDp91tTUwGazlXW546WEmvW33++H2+3GhQsXEAqFZOtvqZtrOaMUF1J0JxKJIBKJABAjF4k5FxQKJQa98lKKSuKyRq4ighCCyclJzM7OorOzEysrK/KFn1J8GIZBdXU1qqur0dbWltb6u5y9PpS9Q4Dk4oLneVRXV1NxQaGAiglKEdFqWUPZ6fOKK64Ay7JYXV295KotsqWcwvDprL85jkNNTQ1qamrK/ntTExeEEJw8eRKHDh0CEJ9zQcUF5XKEiglKwcnVO0KNZJ0+GYbJqQX5pUa5TsyJ1t+CIGBlZQUulwuBQACvvvpqRVl/S3/rdDpZXITDYdWETiouKJcDVExQCko+3hGJ2xkdHcXS0pJqp0+tKxkohYVlWbnEdHV1FR0dHRVl/a1MHFXzuEgUF5L1t06nk6tFKJRLifL6hVIuKfL1jpBQdvrs6elRvcsrhpigE0DhULP+9ng8cLlcsvW3FLUoB+vvVFUoauJCEASEQiH5fbTdOuVSg4oJiuZIyXeBQAB2uz2vEG+mnT7pMkdlkkwA6vV6bNiwARs2bABQHtbfSrIpaU0lLiSouKBUOlRMUDRFWtbweDxwu90598TIttMny7KX/TJHpU5AmY
xbzfrb7XZjfn4e58+fh9FojHPnLLS4yMcfg4oLyqUIFRMUzVA2Wspncpc6fdbX12fc6ZPmTIhU4meQy0RpNBrR2Ni
"text/plain": [
"<Figure size 648x648 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# extra code – this cell generates and saves Figure 8–2\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=2)\n",
"X2D = pca.fit_transform(X) # dataset reduced to 2D\n",
"X3D_inv = pca.inverse_transform(X2D) # 3D position of the projected samples\n",
"X_centered = X - X.mean(axis=0)\n",
"U, s, Vt = np.linalg.svd(X_centered)\n",
"\n",
"axes = [-1.4, 1.4, -1.4, 1.4, -1.1, 1.1]\n",
"x1, x2 = np.meshgrid(np.linspace(axes[0], axes[1], 10),\n",
"                     np.linspace(axes[2], axes[3], 10))\n",
"w1, w2 = np.linalg.solve(Vt[:2, :2], Vt[:2, 2]) # projection plane coefs\n",
"# the plane passes through the mean: z - mean_z = w1 * (x1 - mean_x) + w2 * (x2 - mean_y)\n",
"z = w1 * (x1 - pca.mean_[0]) + w2 * (x2 - pca.mean_[1]) + pca.mean_[2] # plane\n",
"X3D_above = X[X[:, 2] >= X3D_inv[:, 2]] # samples above plane\n",
"X3D_below = X[X[:, 2] < X3D_inv[:, 2]] # samples below plane\n",
"\n",
"fig = plt.figure(figsize=(9, 9))\n",
"ax = fig.add_subplot(111, projection=\"3d\")\n",
"\n",
"# plot samples and projection lines below plane first\n",
"ax.plot(X3D_below[:, 0], X3D_below[:, 1], X3D_below[:, 2], \"ro\", alpha=0.3)\n",
"for i in range(m):\n",
"    if X[i, 2] < X3D_inv[i, 2]:\n",
"        ax.plot([X[i][0], X3D_inv[i][0]],\n",
"                [X[i][1], X3D_inv[i][1]],\n",
"                [X[i][2], X3D_inv[i][2]], \":\", color=\"#F88\")\n",
"\n",
"ax.plot_surface(x1, x2, z, alpha=0.1, color=\"b\") # projection plane\n",
"ax.plot(X3D_inv[:, 0], X3D_inv[:, 1], X3D_inv[:, 2], \"b+\") # projected samples\n",
"ax.plot(X3D_inv[:, 0], X3D_inv[:, 1], X3D_inv[:, 2], \"b.\")\n",
"\n",
"# now plot projection lines and samples above plane\n",
"for i in range(m):\n",
"    if X[i, 2] >= X3D_inv[i, 2]:\n",
"        ax.plot([X[i][0], X3D_inv[i][0]],\n",
"                [X[i][1], X3D_inv[i][1]],\n",
"                [X[i][2], X3D_inv[i][2]], \"r--\")\n",
"\n",
"ax.plot(X3D_above[:, 0], X3D_above[:, 1], X3D_above[:, 2], \"ro\")\n",
"\n",
"def set_xyz_axes(ax, axes):\n",
"    ax.xaxis.set_rotate_label(False)\n",
"    ax.yaxis.set_rotate_label(False)\n",
"    ax.zaxis.set_rotate_label(False)\n",
"    ax.set_xlabel(\"$x_1$\", labelpad=8, rotation=0)\n",
"    ax.set_ylabel(\"$x_2$\", labelpad=8, rotation=0)\n",
"    ax.set_zlabel(\"$x_3$\", labelpad=8, rotation=0)\n",
"    ax.set_xlim(axes[0:2])\n",
"    ax.set_ylim(axes[2:4])\n",
"    ax.set_zlim(axes[4:6])\n",
"\n",
"set_xyz_axes(ax, axes)\n",
"ax.set_zticks([-1, -0.5, 0, 0.5, 1])\n",
"\n",
"save_fig(\"dataset_3d_plot\", tight_layout=False)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAaAAAAEBCAYAAAA+dnESAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAAYcElEQVR4nO3df6xfdX3H8debVrpSqfwoXCoCZaaZo1nQ0jAu6rikIsgfwxndICSthKwqdrpYQyCN3ouR6JY0y1idSExDlQy2LFGbUUXseofSutESqlxntTSV1sLKL2lv211sfe+P8/2u3377/XW/33PO53zOeT6Sb+73x+m3788995zXOZ/P53u+5u4CACBvp4UuAABQTQQQACAIAggAEAQBBAAIggACAARBAAEAgpgZuoC0zJs3zxcsWBC6jK4OHz6sOXPmhC4jM2VuX5nbJtG+2BW1fdu3b3/Z3c9r9VppAmjBggXatm1b6DK6Gh8f18jISOgyMlPm9pW5bRLti11R22dmv2r3Gl1wAIAgCCAAQBAEEAAgCAIIABBE7gFkZuvM7ICZPdvmdTOz+8xsl5n9xMwW510jACB7Ic6AHpR0Q4fXPyBpYe22QtJXc6gJAJCz3API3Z+Q9GqHRW6S9A1P/FjSWWY2P5/qAAB5KeLngC6UtLfh8b7acy80L2hmK5ScJWloaEjj4+N51DeQycnJKOrsV1nb9+tfz9Yddwzr4Yd/qDPOOB66nEyUdd3V0b7iKWIAWYvnWn5rnrs/IOkBSVqyZIkX8UNYzYr6YbG0lLV9DzwgHTwonXbae1XC5kkq77qro33FU8RZcPskXdTw+G2S9geqBZAkPfqoJLkefzx0JUB5FDGANkhaVpsNd5Wk1939lO43IC/u0hNPSJJp48bQ1QDlkXsXnJk9LGlE0jwz2ydpVNKbJMnd75e0UdKNknZJOiLptrxrBBrt2iW98UZy/7nnpEOHpDPPDFsTUAa5B5C739LldZf0yZzKAbravPnE/dmzpSeflG7o9EECAD0pYhccUCgbN0pHjiT3JyfFOBCQEgII6ODE+E/id78T40BASgggoIPnnpOmpk597tChMPUAZUIAAR00jv/U1ceBAAyGAAI6ePTRE+M/dYwDAeko4pUQgMI45xzp3HOTsZ/XXkvuS8lZEIDBEEBAB+vWJT/37ZMuukh6+eWw9QBlQhccACAIAggAEAQBBAAIggACAARBAAEAgiCAAABBEEAAgCAIIERlbCx0BQDSQgAhGlu3Svfck/wEED8CCFHYulVaujS5v3QpIQSUAQGEwhsbk66+Wjp6NHl89GjymO44IG4EEApvbEzasuXEBUBnz04eE0BA3AggRGF4WNq0Kbm/aVPyGEDcCCBEY3hYGh0lfICyIIAQFbrdgPIggAAAQRBAAIAgCCAAQBAEEAAgCAIIABAEAQQACIIAAgAEQQABAIIggAAAQRBAAIAgCCAAQBAEEAAgCAIIKKlWF27lYq4oEgIIKKGtW6V77jn5q8tbPQeERAABJbN1q7R0aXJ/6dLkcavngNAIIKCA+u0qGxuTrr5aOno0eXz0aPK41XN0xyE0AggomEG6ysbGpC1bpNmzk8ezZyePWz1HACE0AggokDS6yoaHpU2bkvubNiWPWz0HhEYAAQXRrvusnzOV4WFpdPTkoGn1HBASAQQURLvus0HGg3p5DgiFAAIKhK4yVAkBBBQMXWWoCgIIhVT1rqKqtx/VQAChcPjEPlANQQLIzG4ws51mtsvM7mrx+oiZvW5mz9Runw9RJ/LHJ/aB6sg9gMxshqSvSPqApMsk3WJml7VY9Ifu/s7a7Qu5Fokg0pyGDKD4QpwBXSlpl7vvdvc3JD0i6aYAdaBg0p6GDKDYQgTQhZL2NjzeV3uu2bCZ7TCz75rZonxKQ2hMQwaqY2aA/9NaPOdNj5+WdIm7T5rZjZK+LWnhKW9ktkLSCkkaGhrS+Ph4upVmYHJyMoo6+5
VW+5YvX6CpqT0qyq/qpZdmSRpm3UWM9hWPuTfv+zP+D82GJY25+/W1x3dLkrt/qcO/2SNpibu/3G6ZJUuW+LZt21KuNn3j4+MaGRkJXUZmytq+ffukiy6Sct5cclXWdVdH+8Iws+3uvqTVayG64J6StNDMLjWz0yXdLGlD4wJmdoGZWe3+lUrqfCX3SgEAmcm9C87dj5nZSkmPSZohaZ27T5jZx2uv3y/pw5I+YWbHJB2VdLPnfaoGAMhUiDEguftGSRubnru/4f5aSWvzrgsAkB+uhAAACIIAAgAEQQABAIIggAAgsKpe7YMAAoCAqnz1dwIIAAKp+tXfCSAgIlXtqikjrv5OAAHRqHJXTRlx9XcCCIhC1btqyirrq78XPcwIIFRW0TfOOrpqym14WBodTT98YjhjJoBQSTFsnHV01ZRf2usyljNmAgiVE8vG2Ygv6kOvYjpjJoBQKTFtnM2y6qpBucR0xkwAoVJi2jhbaVVnLLUjP7GcMRNAqJxYNs5exDSWhXzFcMZMAEWCo9x0xbBxdhPjWBay0W7/UPT9BgEUAY5ys1H0jbOT6Y5lxdxWdFbfP0xMzA1dyrQRQAXHUS5amc5YFgcw5dW4f1i16vLo1jEBVGAxz9hC9noZy+IAprya9w9TUzOi2z8QQAUW+4ytbsrSjpA6jWVxAFNuzfuHWbOOR7d/IIAKrpej3Jj+4OroFkpPp3GfMh/A4OT9w5o1O6KbVNNTAJnZnWbmLW5fyLpAdD7KDb0j72dnRrdQfso05Ryt1fcPixYdDF3KtPV6BvRVSfMbbmskvSjpGxnVhSbtBpdD7sj7CT+6hfKX5pRz1lMxxbpeegogdz/k7i+6+4uSlku6RdKIpCkzGzezn5nZDjP7UIa1okHoHXm38KNbqFjS+P2GPttG+UxrDMjM7pb0KUnXuvtOScck/bW7XybpOkl/b2ZnpF8mmoXckXcLv247KrqF4hP6bBvl1HMAmdlqSXdIusbdfyFJ7v6Cuz9Tu39A0muS5mVQJ1oItSPvFH4TE3N72lGV4UoEVRH6bBvl1eskhM9J+pikEXff1WaZJZLeJGlveuWhm7R25NPdmbQKv7ExaeXKxZX8dH6Z2tKMblNkpWsA1c58Pi3pZkmHzeyC2u33GpY5V8mEhNvd3TOrFi0NuiPot2+/OfzGxqS1a5+u3I6qCmMjdJsiCx0DyMxM0p2SzpX0pKQXGm7vri0zS9K3JH3J3bdkWi1SN2jffnO4LFp0MJMdVVFDrEpjI3SbIm0dA8gTb3F3a3HbVAuoByX9u7t/M5eKkZqs+vbT3lEV9QyjimMjZW4b8jfolRDeLekvJH3QzJ6p3f4ohbqQgm47iyz79tPaURX5DIOxEWAwAwWQu//I3U9z93c23H6aVnHoX69nDUXu24/hDCOr31+R2ghkhWvBldB0zxqK2rcfyxlGjF2ORfsdopoIoJLp96yhqDukIp+hNYqpy7GoY2qoHgKoZGI5a5iOop6hpS2PLscij6mhegigEkrjKxyKFlhFqycLWR88xDCmhmohgEpqkK9woIsmnCy7HMt4doy4EUAl1s9XONBFE16WXY6xjKmhGgigCunWBRNrF03R6+tHlm2qypgaio8AqpBuXTAxdtHQXdifIq9T9OfBBxeELmHaCKCK6dYFE1MXDd2FQGLrVmn9+gXRbQMEUAV164KJoYsm1u5CIG0xH4gRQBUV2zTsZjF2FwJpi/1AjABCtGLqLgSyEPuBGAGEqMXQXQhkKeYDMQKoAmI5GupX2duHkx0+LH3zm9LevaErKY7hYWn58j1RhY9EAJUe05TzQxDm48gRadky6eKLpbe/Xbr9dgJJkj760T2hS5i2IAFkZjeY2U4z22Vmd7V43czsvtrrPzGzxSHqTEuoHVPMs2NiQ9CHsXu3tG5d60B6/vnQ1aGbmXn/h2Y2Q9JXJF0naZ+kp8xsg7v/rGGxD0haWLv9saSv1n5Gp75juv76fPtmx8aS/7
euPjtmdLR1II6NcQTfzn33SV/8YnL//POlW2+VrrnmxOs//3nye5WkkZHk9/6Od+Re5sCefXaefvOb0FV099JL7V/
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# extra code – this cell generates and saves Figure 8–3\n",
"\n",
"fig = plt.figure()\n",
"ax = fig.add_subplot(1, 1, 1, aspect='equal')\n",
"ax.plot(X2D[:, 0], X2D[:, 1], \"b+\")\n",
"ax.plot(X2D[:, 0], X2D[:, 1], \"b.\")\n",
"ax.plot([0], [0], \"bo\")\n",
"ax.arrow(0, 0, 1, 0, head_width=0.05, length_includes_head=True,\n",
"         head_length=0.1, fc='b', ec='b', linewidth=4)\n",
"ax.arrow(0, 0, 0, 1, head_width=0.05, length_includes_head=True,\n",
"         head_length=0.1, fc='b', ec='b', linewidth=1)\n",
"ax.set_xlabel(\"$z_1$\")\n",
"ax.set_yticks([-0.5, 0, 0.5, 1])\n",
"ax.set_ylabel(\"$z_2$\", rotation=0)\n",
"ax.set_axisbelow(True)\n",
"ax.grid(True)\n",
"save_fig(\"dataset_2d_plot\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_swiss_roll\n",
"\n",
"X_swiss, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAFYCAYAAACYtq08AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOydd5wdVd3/3+fMzK17t+9mN7vZ9B5IKAFClSIgVUVReB7Fil0sWHhUUHlU1MfCT0QRFUFsIAKKdDBAAIEQCCkkhNTtvd425ZzfH3PvZjfZ9A3ZxXm/Xvva3Vtmzsyd+znf+Z5vEVprAgICAgLGB/JQDyAgICAgYO8JRDsgICBgHBGIdkBAQMA4IhDtgICAgHFEINoBAQEB44hAtAMCAgLGEeYeng/iAQMCAgJ2j3gjdxZY2gEBAQHjiEC0AwICAsYRgWgHBAQEjCMC0Q4ICAgYRwSiHRAQEDCOCEQ7ICAgYBwRiHZAQEDAOCIQ7YCAgIBxRCDaAQEBAeOIQLQDAgICxhGBaAcEBASMIwLRDggICBhHBKIdEBAQMI4IRDsgICBgHBGIdkBAQMA4IhDtgICAgHFEINoBAQEB44hAtAMCAgLGEYFoBwQEBIwjAtEOCAgIGEcEoh0QEBAwjghEOyAgIGAcEYh2QEBAwDgiEO2AgICAcUQg2gEBAQHjiEC0AwICAsYRgWgHBAQEjCMC0Q4ICAgYRwSiHRAQEDCOCEQ7ICAgYBwRiHZAQEDAOCIQ7YCAgIBxRCDaAQEBAeOIQLQDAgICxhGBaAcEBASMIwLRDggICBhHBKIdEBAQMI4IRDsgICBgHBGIdkBAQMA4IhDtgICAgHFEINoBAQEB44hAtAMCAgLGEYFoBwQEBIwjAtEOCAgIGEcEoh0QEBAwjghEOyAgIGAcEYh2QEBAwDgiEO2AgICAcUQg2gEBAQHjiEC0AwICAsYRgWgHBAQEjCMC0Q4ICAgYRwSiHRAQEDCOCEQ7ICAgYBwRiHZAQEDAOCIQ7YCAgIBxRCDaAQEBAeOIQLQDAgICxhGBaAcEBASMIwLRDggICBhHBKIdEBAQMI4IRDsgICBgHBGIdkBAQMA4wjzUAwgYPbTWKKXIZrNIKZHSn5OFELv8nf976OMBAQFjl0C0xzFaa7TWeJ6HUgqlFADZbBYhBFrrYb93h+u69Pb2Ul5ePkzMhwr/jj/5x4c+v+NjQ38HBAQcOIFojzOGCrXneWitge2iqbXGMIx9FkrbtmlpaaGiogJgcALI/87vJ//3vm5/R1HfUfzzk0NwdxAQsHsC0R4H5N0eeYt6qFDnRQ58a7mtrY2+vj4Mw9jrn91ZyaMx9h3/33EC2PF5IQSrVq3isMMO2+P2RxLz4O4g4M1MINpjlLy7Iy/UW7duxbIsampqhgl1JpOhra2N1tZWXNelrKyMgoKCwfe7rks2mx20zEf6yQtpJpPh+eefR0q5T6I/0k/ep76j+O2tGDqOg2EYu33NjoI/9O5gV5PBvpAXdcdxsG2bRCKx27uEfZkUAgL2l0C0xwh7cnsMFYD+/n7a2tpoa2vDNE0qKiqYP38+sVgMpRSO4+yzOGSzWdatW8fChQsHJ4vd/eztZDAUIcRei77runR3d+9yQshvb8ftjwY7jjuZTNLW1sbMmTNHfH5fJ4RduYry/+94pzDS37v7HfDmJhDtQ0je7ZEXyZGEGnzLMZVKkUwm2bJlC/F4nMrKSo4++mgsyxpxuwfyBc5bySNt+0AYeufguu7g7x0fs20bz/Po6OgYcTLIW9R59mUy2Bs30UiTwY6uqP1lV3cHIz23q89Ra82mTZuYPn36Ts/tbu0guDt4cxCI9hvMUP+053mDj4/kn+7o6Bj0UZumSVFREbNnz96jeIzVL9nQMMRwOLzb17a3tw9atnti6B3K7n4cx9nt8yNNBlJKtNY4jsPatWv321W0KwHcn89KKUVvb+8w99H+rh3sLXu6O8izpzWSYE
I4cALRPsgMdXv09fXhui6FhYXAzpEQmUyG9vZ22trayGazlJeXM3nyZAoLC6mvrx81a28k8pEn4xEhBKZpYpqjeznnP7fu7m7a29uZNGnSTncJ+TuD/N+7mgyGntv853ggdwT58Y3mZLCrc7AjI90ddHd309XVxbRp00Y1smh/7g7e7BNBINoHgV35p3t6enBdl+Li4sHXDQwM0NraSnt7O1JKKisrmTt3LrFYbL/2/Wa/YN9I8pOBZVmYpkkikRiV7e54t7Wrn7ybaKSfVCrF8uXLdxLVHScD0zT3a4LYnSW8q8eklPs0ce7L3cFQ1+HuuP7661m0aBEXXnjhXo9jvBGI9iixu7C8oT5RrTVdXV20tbXR2dlJLBajsrKSI488klAotMvtj2dLeLwz2ud9qA9+f/A8j5dffpmjjjpq2ON7Oxm4rksmk9nta/c0Gez4k8lkBiOZ9hRRNPQ87HheDpRUKrWTm+vNRiDaB8BQa3rohbKjGyO/qFZfX08ymaSyspLKykpmzpy531/cg0EwKeya8XAHc6CTwe7Y02SQzWbRWpNKpfYqoihvhOSFfH/vCHaMKPI8b0x9pw4GgWjvA3sTlpcnm83S3t5Oa2sr2WyWsrIyysrKqKioYNq0afu8byHEQbUgxoMoBRw69hRRlHfFTJkyZZ+2u7vJYGh0Ud5VNHQtYehr29vb+drXvoZt29xzzz1ce+21xONxPvnJT3LxxRfvdgwf+tCHuO+++6isrGT16tUAfPOb3+Tmm28ezBD+7ne/yznnnLNPx3awCER7D+wYlrdu3Tpmz549olAPDAzQ1tZGe3s7AJWVlcyZM4d4PA5Ac3Mz6XT6kBxHwJuHAw3pPBjs75hGM7z0bW97G//zP//DeeedxxlnnEEymdwrH/sHPvABPv3pT/P+979/2OOf//znufLKKw94XKNNINojsLuwvO7u7sFbMa013d3dg/7pSCRCZWUlixYtGjGk7UD80m+ETztwjwTsL2Pl2lFKDbpM8lFae+Lkk09my5YtB3dgo0gg2mx3ewxdSMyzozWttR7MRuzp6aGoqIjKykpmzJgxrn1pY81yC9g1Y9HShrFxDXmeN2qhnzfccAO33XYbRx99ND/60Y8oKSkZle0eKP+xTRDyIu04Dtlslmw2i+M4g1+IobUzbNumsbGRFStWkEwm6erqoqamhhNOOIHDDjuMCRMm7JVgj3VL+82BQogmpNwKZA5gO31IuSG3nYO1luAiRDcw/l1mY2UicV13VIynT3ziE2zcuJGXX36Z6upqvvjFL47C6EaH/yhLe2/TxmF7vYn29na01lRUVDBr1izS6TRz5szZr/2PhYv6zY2HZd2FYbwKSLROYNuXofWeLKQMhrEaSKLUVMAiFPojYCOEIh6fREfHsaM6UiG6MM2/IkQvQoDjvBWlFo3qPt5Ixopoj1b0yIQJEwb//uhHP8p55513wNscLd70oj002iOZTNLc3DyYtTU0LE9rTU9PD21tbXR0dBAOh6msrGThwoXD/NMHemGOZUt7vFvyhrEWw1iD1lWAQIhOTPNBHOeSnV4rRDdSvoIQWaRci5R9aG0ixBMoFQIiaF2F1ppQ6HVisYnA7BH3K0QLpvkM4OB5i1BqNpDGMF5EiD6Umo5Ss4Dt145p/h0hUrl9OLlxTkTryj0e51gRyLGIUmpU3CPNzc1UV1cDcPfdd7NgwYID3uZo8aYT7d2F5SmlSCaTg2KtlKKzs5O2tja6u7spLCyksrKSadOmjXpKdH4MY1UY3wwiIEQXYJAXR63jSNkO5BeTjdzrOgmHbwCSCJFEiBZc9wSgEK3TmOYKXHdJfqu5CT67i322Ewr9GtCAiZSrcJyLMM2nkbIZrUMYxlO47jvwvGNy7/KQsgWlJub+t3J3ej17JdpjkbEykeyPe+SSSy5h6dKldHR0UFtby7e+9S2WLl3Kyy+/jBCCKVOmcNNNNx2kEe87bwrR3lu3h5QSz/Noamqitb
WVVCpFaWkp1dXVzJ0796DV9RgNxrLgjxV8EfRyPzInghaRyDcAcN2TcN2zMIx/A2m0rgY6EaIFKbei1GGAhdYFSNm
"text/plain": [
"<Figure size 432x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# extra code – this cell generates and saves Figure 8–4\n",
"\n",
"from matplotlib.colors import ListedColormap\n",
"\n",
"darker_hot = ListedColormap(plt.cm.hot(np.linspace(0, 0.8, 256)))\n",
"\n",
"axes = [-11.5, 14, -2, 23, -12, 15]\n",
"\n",
"fig = plt.figure(figsize=(6, 5))\n",
"ax = fig.add_subplot(111, projection='3d')\n",
"\n",
"ax.scatter(X_swiss[:, 0], X_swiss[:, 1], X_swiss[:, 2], c=t, cmap=darker_hot)\n",
"ax.view_init(10, -70)\n",
"set_xyz_axes(ax, axes)\n",
"save_fig(\"swiss_roll_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAsAAAAEQCAYAAAC++cJdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOyddZxU1fvH3/dOzxax1C4dkqKUdBmEYlAGolKKgRiogH5tLAQLxMIvYYIIgkELIt3dLLnAdk/Pvb8/zszuxJ1lQYzvz/m8XgM7N8495869z3nOE59HUlWVKKKIIooooogiiiii+LdA/rs7EEUUUUQRRRRRRBFFFH8logpwFFFEEUUUUUQRRRT/KkQV4CiiiCKKKKKIIooo/lWIKsBRRBFFFFFEEUUUUfyrEFWAo4giiiiiiCKKKKL4VyGqAEcRRRRRRBFFFFFE8a+C/u/uwKUgMTFRrV279h9up6ioiJiYmD/eof8h/NvG/G8bL/z7xvy/Pt5t27Zlqqpa6e/uB0C5cuXU+vXr/93d+NPxv/7MlAXRMf7/QHSMfwylydf/SQW4du3abN269Q+3s3r1arp16/bHO/Q/hH/bmP9t44V/35j/18crSdLJv7sPflSpUuWyyNZ/Ov7Xn5myIDrG/x+IjvGPoTT5Gg2BiCKKKKKIIooooojiX4WoAhxFFFFEEUUUUUQRxb8KUQU4iiiiiCKKKKKIIop/FaIKcBRRRBFFFFFEEUUU/ypEFeAooogiiiiiiCKKKP5ViCrAUUQRRRRRRBFFFFH8qxBVgKOIIooooogiiiii+FchqgBHEUUUUUQRRRRRRPGvQlQBjiKKKKKIIooooojiX4WoAhxFFFFEEUUUUUQRxb8KUQU4iiii+HfDewrc60HJ/bt7EkUUUUTx90PJEzLR+4+p0v6nIKoARxFFFP9OKAWQeyNkNYSc3pBeFQqeBVX9u3sWRRRRRBEZihtU7+VvV1Wh8EXIrOqTjY0gpwco+Zf/Wv8ARBXg/wdwb9xITqdOZMTGklW/PvaZM1H/6CSuKHB4PexeBo5C7WO8HijMFsdGERGqqrJr82a+/ugj1ixZgtdbRsFly4Slo2BKMnxUHzZNAsXz53b234SCEeD6FTIdsDsfdjph0xuQMuTv7tm/Hg67ncXffce3n3zCiSNH/u7u/Cvh/Oknsps3JyMmhuyrr8a5ePHf3aXLC1se7FoKRzeB2wlFOX988et1wbrXYFodmFIdVjwBjtzL0l0ACvfDxs6wzAzLLLBrELhzLl/7zm/ANhlwgJon/nevgfwhl+8a/yDo/+4ORHERcDvh9AGIrQCVa4pN27aRe911YLMBoBw7RuEjj6Cmp2N95pmg010HDqDk5+M5cwZ99eqRr3NqD7zVG+z5IElC0R0yFboNFfsVBRa+Cksmg8cJ5ngY+Dp0u/9PGfafibTz53n/rbdYuXQpVatVY/Qzz3Bdz56XrX1bURF3dezIsQMHQJIwGI2Ur1iRr9eupWpycuQTXYUwozUUnhWrfYA1L0LqRug3r/gwxevl5OLFpG/ZQlytWtS//XYA7EVFuB0O4itWvGxjuSxQVcANkvEvvq4NPLtBrgy6usL661wImU44A/jXcB7g2Bdg6gLVh/+1ffx/DFVVWThvHh+//z55ubn06duXUWPGkFCuXNixu7dsYWiPHng9HjwuF6qqcvOgQbwxYwaSJJX5moW5uUiyTEx8/GUcyR9D0blzHJkzB1d+PrV69aLKNdf83V0qhufcOVw7dqCvUQPl8GEK7r23eF7x7tpFfv/+xM+Zg+nmm//mnl4EXA4xZ8YnQqUaJduXTIFvxoLeAC67MCwY9FAhCe79EK6+6dKu993NcPp38NjF9+3T4NgvMHw36E1/cCyZsLEjePIAFVQFzn8PRYeg/VYxV/9R2N4GikI2OsH1swgRk8tpn6d6RZ+k/y2VMmoB/l/BiplwdyUY1wUebAhjO0NeBkXPP18spI
phs2GbMAHV6QRAycvjXJcunGvVCk9KCqkNGpA5bBiqliXS64HXr4ecVHAUCCXYZYOZj8Cp3eKYRRPgl4liv8cFhZmoXz2OY8aL2DZuRP0fsQinnT9Pp+bNmT5tGocPHGDNr79yb79+fDJlymVp/3RKCp2rVePQrl24XS7cTie2ggLOnjrF0/fco3lOzoEDbHvlFbaMup2slADlF8BjE8I08wAArsJC5rZpw7K77mLLK6+wZvRoZtasybnDh+lXoQJ3JCUxtFEj9q1fj+Lx4HW5IvZVVVX2LF3KlH79eLtnT36fORNPKcdfNFQvqP8BEgALqPVB/YssSvYPIbsyFPSE3GaQ1wG8KYAM5yhRfv1QVDjywl/Tt38JXho7lkeGDmXTunUc3LePKW+/TdeWLSkoKAg6zuv1MrJPHwpyc7EXFuJ2ufC43fwwaxZjBg0q07VOHzrEqLZtGVi5MgMSE3miSxfOnzgR8XhVVdk6fz7v9unDOzfeyOa5c1EuowxTvF48TifHf/yRL+rVY8P48Wx++WUWdOvG3Nat2ThuHKm//vrHvXZlgPvcOQqWL8cZYFVXVZWsUaM4U6cOGXfdxbl27cgfNCh8XrHbKXrqqT+9j5cNSz6FwZVgfFcYeQU82x3ys+DQOvh2HLjtYn7zusXC3OWGzJPw4e1wbPPFX+/sFjizrkT5BWERLjgLB+dFPq+sOP05KE4g4DlRXVB0GHI3/vH2AZSMkr/lwI9bxASHQj0P6q2AWXzUHqAej9B4EfA+cD1wI/Al8CeEcVwE/rfU9X8r9q+Djx8BZ4BAOrQJXr0Zz85TmqeoioJy7hy62rXJvP9+nJs3g9MJXi+qw0HRnDkYmjUj4cknQ661CtyO8AbdTlj5Cdz3ASyeJJTiAEguG/w4geOPvYscG0vtn37CkrkGlrwNhVlQswXc+Q7Ub6/d9q4fIHU3VGkILQeA0XqRN+ni8cHbb5OXm4vbXaJk2mw2Xhk/nnuGD8dq/WN9GD1gAEW+Cd6/NlcBRVHYvm4dBXl5xCUkFB+/c+JEtr70EorbDYqH3Tpo3hnaXB/QqCTDua2Q2Jhtr40hZ/9uvE4hRDxFRahFRcQXFBQrr2mHDjGrc2cq+awDyR070mv6dJBlTm/ZQvl69ajZpg3znn2W5VOm4CoSq/8j69axduZMnlmxAp3+coiJJ4DPAf9zcwwYAOpykDr8wbYLwN4b1FTQ9QDDUyBXFbtcK8D2jLiuf97wbIWih4U1w23XbtJ5TkyKl8Oq8i9H2vnzfDJlCk5HiVxxOp2kp6Xx1YwZPDh6dPH2XZs2YbfZkCh5Z/xY8t133PXQQ7Tp0iXitWwFBTzesSMF2dnFCuX+tWt5tE0bZqWkYI2Lw15YiNfjQVVVJEnis/vuY9v8+Th9z/6hNWvY/N13PDJ37kVZnBVFIXXTJhy5uVRv3x6dwcDKxx5j/5dforjdSKqKQVXR+Y732O2kb9tG1rZt7J48mcrXXMMtv/2GfFnet2CoikLqww+TO3MmktmM6nJhbd+eWgsWUDRnDoUzZoDTWWw0iTRq77Fjl71vfwp2r4LpTwTPmfvXwet9ITlZWH21ICEMDZOug3YDoebV0PouiKt04Wue2yKssqFwFwrFuNndlzKSEhTuASVCv22HobzG3HqxMN4AjtkgK8EPgapC4UDwXgXYQX8HGB4COgKnEK4zgJVAO1BTQIrxbcsEJgFzEAqvBOiAN4ENwId/vN+XiKgC/L+AHyaHv7BeNxzfja56EzznzoWfoyjIlSuj2GzYFi6EEGuearNRMGWKUIAVBY5vB8ULBZkErTCLT1AgPwOcRdoKMmAwKygFBSgFBRSO7oi5mSwUY4CUjTDpenh2nRAqfhRkwMS2UJgBzkIwxcIPY+HpjVCxVtnv0SVg9bJlQcqvHzqdjkP799OidetLbvv8mTOkHDgQtj1Qpng8JfG8+SkpbH3xRbwBSoJHgV1roN6VUKGKf6sXNb4mKXsWsue/nx
crv4Hty4ARcAFtAIuiFP+iZ9as4fMmTSjweFB8x3tjrKQ5nXg9JW25ioo4sXUrO3/8kVZ9+17qbRBQC4DPgJDnRrW
"text/plain": [
"<Figure size 720x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# extra code – this cell generates and saves plots for Figure 8–5\n",
"\n",
"plt.figure(figsize=(10, 4))\n",
"\n",
"plt.subplot(121)\n",
"plt.scatter(X_swiss[:, 0], X_swiss[:, 1], c=t, cmap=darker_hot)\n",
"plt.axis(axes[:4])\n",
"plt.xlabel(\"$x_1$\")\n",
"plt.ylabel(\"$x_2$\", labelpad=10, rotation=0)\n",
"plt.grid(True)\n",
"\n",
"plt.subplot(122)\n",
"plt.scatter(t, X_swiss[:, 1], c=t, cmap=darker_hot)\n",
"plt.axis([4, 14.8, axes[2], axes[3]])\n",
"plt.xlabel(\"$z_1$\")\n",
"plt.grid(True)\n",
"\n",
"save_fig(\"squished_swiss_roll_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAFYCAYAAACYtq08AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAACx4ElEQVR4nOydd3wcZ53/3zNb1Lu0smTZkovc45LYSZyEQCD0duE4yt1RAxc4yhHgF3KUS+AOCC0cHNzBhZYCJLQQCJBQLoXYie3Ece9Wr1tUtu9OeX5/rGa1ZXa1KrZXybxfL9nS7pRnd2c/z3e+z7dIQggsLCwsLBYH8oUegIWFhYVF4ViibWFhYbGIsETbwsLCYhFhibaFhYXFIsISbQsLC4tFhCXaFhYWFosI+wzPW/GAFhYWFvmRzufJLEvbwsLCYhFhibaFhYXFIsISbQsLC4tFhCXaFhYWFosIS7QtLCwsFhGWaFtYWFgsIizRtrCwsFhEWKJtYWFhsYiwRNvCwsJiEWGJtoWFhcUiwhJtCwsLi0WEJdoWFhYWiwhLtC0sLCwWEZZoW1hYWCwiLNG2sLCwWERYom1hYWGxiLBE28LCwmIRYYm2hYWFxSLCEm0LCwuLRYQl2hYWFhaLCEu0LSwsLBYRlmhbWFhYLCIs0bawsLBYRFiibWFhYbGIsETbwsLCYhFhibaFhYXFIsISbQsLC4tFhCXaFhYWFosIS7QtLCwsFhGWaFtYWFgsIizRtrCwsFhEWKJtYWFhsYiwRNvCwsJiEWGJtoWFhcUiwhJtCwsLi0WEJdoWFhYWiwhLtC0sLCwWEZZoW1hYWCwiLNG2sLCwWERYom1hYWGxiLBE28LCwmIRYYm2hYWFxSLCEm0LCwuLRYQl2hYWFhaLCEu0LSwsLBYRlmhbWFhYLCIs0bawsLBYRFiibWFhYbGIsETbwsLCYhFhibaFhYXFIsISbQsLC4tFhCXaFhYWFosIS7QtLCwsFhGWaFtYWFgsIizRtrCwsFhEWKJtYWFhsYiwRNvCwsJiEWGJtoWFhcUiwhJtCwsLi0WEJdoWFhYWiwhLtC0sLCwWEZZoW1hYWCwiLNG2sLCwWETYL/QALBYOIQS6rhOLxZBlGVlOzMmSJOX83/g99XELC4vixRLtRYwQAiEEmqah6zq6rgMQi8WQJAkhRNr/+VBVlcnJSRobG9PEPFX4M3+Mx1Ofz3ws9X8LC4v5Y4n2IiNVqDVNQwgBTIumEAKbzTZroYzH44yMjNDU1ASQnACM/43zGL/P9viZop4p/sbkYN0dWFjkxxLtRYDh9jAs6lShNkQOEtay2+3G7/djs9kK/slnJS/E2DP/zpwAMp+XJInDhw9z0UUXzXh8MzG37g4snstYol2kGO4OQ6h7e3txOBwsXbo0Taij0Shut5vR0VFUVaWhoYHKysrk/qqqEovFkpa52Y8hpNFolL179yLL8qxE3+zH8Klnil+hYqgoCjabLe82mYKfeneQazKYDYaoK4pCPB6nqqoq713CbCYFC4u5Yol2kTCT2yNVAAKBAG63G7fbjd1up6mpiY0bN1JeXo6u6yiKMmtxiMVinDhxgi1btiQni3w/hU4GqUiSVLDoq6rK+Ph4zgnBOF7m8ReCzHGHQiHcbjednZ2mz892QsjlKjL+zrxTMPs93/8Wz20s0b6AGG4PQyTNhBoSlmM4HCYUCtHT00NFRQUul4vt27fjcDhMjzufL7BhJZsdez6k3jmoqpr8P/OxeDyOpml4vV7TycCwqA1mMxkU4iYymwwyXVFzJdfdgdlzuT5HIQRdXV2sWrUq67l8awfW3cFzA0u0zzOp/mlN05KPm/mnvV5v0kdtt9upqalh7dq1M4pHsX7JUsMQS0pK8m7r8XiSlu1MpN6h5PtRFCXv82aTgSzLCCFQFIVjx47N2VWUSwDn8lnpus7k5GSa+2iuaweFMtPdgcFMayTWhDB/LNE+x6S6Pfx+P6
qqUl1dDWRHQkSjUTweD263m1gsRmNjI+3t7VRXV9Pf379g1p4ZRuTJYkSSJOx2O3b7wl7Oxuc2Pj6Ox+Nh2bJlWXcJxp2B8XuuySD1vTU+x/ncERjjW8jJINd7kInZ3cH4+DhjY2OsXLlyQSOL5nJ38FyfCCzRPgfk8k9PTEygqiq1tbXJ7YLBIKOjo3g8HmRZxuVysX79esrLy+d07uf6BXs+MSYDh8OB3W6nqqpqQY6bebeV68dwE2maRkxR6ZuIc9oXZ3OjjBQL8/TTT2eJauZkYLfb5zRB5LOEcz0my/KsJs7Z3B2kug7z8Y1vfIOtW7fy+te/vuBxLDYs0V4g8oXlpfpEhRCMjY3hdrvx+XyUl5fjcrm4+OKLcTqdOY+/mC3hxc5Cv++pPviZzjsaiHNyNMjpsRBxrRS3iHJCKWVHpeCSSy7J2r7QReRoNJp325kmg8yfaDSajGSaKaIo9X3IfF/mSzgcznJzPdewRHsepFrTqRdKphvDWFTr7+8nFArhcrlwuVx0dnbO+MU9n1iTQm7O5x1MOK5xyh3k+EiI8XAcuyyzqrGctUsqODYcxO2P5hxjIZPBXJhpMojFYgghCIfDBUUUGUaIIeRzvSPIjCjSNK2ovlPnAku0Z0EhYXkGsVgMj8fD6OgosViMhoYGGhoaaGpqYuXKlbM+tyRJ59SCsNwqFxZNF/SORTgxGuTQoJ/xsMKO9hpe2NnA6qYKSuwJUTo2HLwg45sposhwxXR0dMzquPkmg9ToIsNVlLqWkLqtx+PhU5/6FPF4nF//+tf8+7//OxUVFfzzP/8zb3rTm/KO4d3vfjcPPvggLpeLI0eOAHDrrbdyxx13JDOEv/CFL/CqV71qVq/tXGGJ9gxkhuWdOHGCtWvXmgp1MBjE7Xbj8XgAcLlcrFu3joqKCgCGh4eJRCIX5HVYFCe+UJwTI0FOuUNEFI1yp40l1aU4ZInrtrRgk9MnU+NeqNgm2bmGmS5keOkrX/lKPvnJT/Ka17yGa6+9llAoVJCP/Z3vfCcf/OAHefvb3572+I033sjHP/7xeY9robFE24R8YXnj4+PJWzEhBOPj40n/dGlpKS6Xi61bt5qGtM3HL30+fNqWe+T8EFN1zrhDHB8N4g4kinutaChjXXMly+vLeKZvkn29KmYaKASmj19oiuXa0XU96TIxorRm4uqrr6anp+fcDmwBsUSbabdH6kKiQaY1LYRIZiNOTExQU1ODy+Vi9erVi9qXVmyW23MNIQQDE1FOjAbp8oYZmogyGVF522VtrF9SSblzdtdOMX5exTAmTdMWLPTzW9/6FnfddRfbt2/na1/7GnV1dQty3PnyvG2CYIi0oijEYjFisRiKoiRv81JrZ8TjcQYHB9m/fz+hUIixsTGWLl3KlVdeyUUXXURzc3NBgl3slvZzBUUZ4eTJV6IoowtyHFV1z/kYkxGFvT0T3L13kN8eHqV3LML65kpesLqeTlcFW9uqcwq2uQQW5zUw3yzchUJV1QUxnt7//vdz9uxZDhw4QEtLCx/72McWYHQLw/PK0i40bRym6014PB6EEDQ1NbFmzRoikQjr1q2b0/mL4aJ+PjA09CWCwScZGvoS7e23F7SPoozQ1fUuVq78EQ5Hc9pxJOkbSNKNBZ9f0XS6vGEODEwyMBGl3GGnra6UnSvqWNFYjl2WeLpvAklKnPdMz7vTzmvMzeYp7LnE/MJSLKK9UNEjzc3Nyd/f+9738prXvGbex1wonvOWthAiWdxofHycEydOoKoqQJo1DYnkl1OnTrF7925OnDiB3W5ny5YtXHbZZaxcuZLKysp5X5jFbGk/Fyx5RRnB5/sxoOPz3ZPX2k61yFOFPvM4weAvEMKX97zx+DC7nn0Tfz5+ljufGuAvJ70cGvAw6vk1b77YwWsvaqbTVYFdNrIZE/sND3857bwzUawLkcWCrusL4h4ZHh5O/n7//fezadOmeR9zoXjOWdr5wvJ0XScUCiUXEnVdx+
fz4Xa7GR8fp7q6GpfLxcqVKxc8JdoYQ7EK43NFBBLiZ6xJ6HmtbUOoBwdvYWzsVxhC39r6ibTjCKERi90BXJ51jFB
"text/plain": [
"<Figure size 432x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 360x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 432x360 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"<Figure size 360x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – this cell generates and saves plots for Figure 8–6\n",
2021-11-21 06:04:07 +01:00
" \n",
2019-01-15 05:36:29 +01:00
"axes = [-11.5, 14, -2, 23, -12, 15]\n",
"x2s = np.linspace(axes[2], axes[3], 10)\n",
"x3s = np.linspace(axes[4], axes[5], 10)\n",
"x2, x3 = np.meshgrid(x2s, x3s)\n",
"\n",
2021-11-19 11:36:04 +01:00
"positive_class = X_swiss[:, 0] > 5\n",
"X_pos = X_swiss[positive_class]\n",
"X_neg = X_swiss[~positive_class]\n",
2021-11-19 06:03:48 +01:00
"\n",
"fig = plt.figure(figsize=(6, 5))\n",
"ax = plt.subplot(1, 1, 1, projection='3d')\n",
2019-01-15 05:36:29 +01:00
"ax.view_init(10, -70)\n",
"ax.plot(X_neg[:, 0], X_neg[:, 1], X_neg[:, 2], \"y^\")\n",
"ax.plot_wireframe(5, x2, x3, alpha=0.5)\n",
"ax.plot(X_pos[:, 0], X_pos[:, 1], X_pos[:, 2], \"gs\")\n",
2021-11-19 06:03:48 +01:00
"set_xyz_axes(ax, axes)\n",
2019-01-15 05:36:29 +01:00
"save_fig(\"manifold_decision_boundary_plot1\")\n",
"plt.show()\n",
"\n",
"fig = plt.figure(figsize=(5, 4))\n",
2021-11-19 06:03:48 +01:00
"ax = plt.subplot(1, 1, 1)\n",
2021-11-19 11:36:04 +01:00
"ax.plot(t[positive_class], X_swiss[positive_class, 1], \"gs\")\n",
"ax.plot(t[~positive_class], X_swiss[~positive_class, 1], \"y^\")\n",
2021-11-19 06:03:48 +01:00
"ax.axis([4, 15, axes[2], axes[3]])\n",
"ax.set_xlabel(\"$z_1$\")\n",
"ax.set_ylabel(\"$z_2$\", rotation=0, labelpad=8)\n",
"ax.grid(True)\n",
2019-01-15 05:36:29 +01:00
"save_fig(\"manifold_decision_boundary_plot2\")\n",
"plt.show()\n",
"\n",
2021-11-19 11:36:04 +01:00
"positive_class = 2 * (t[:] - 4) > X_swiss[:, 1]\n",
"X_pos = X_swiss[positive_class]\n",
"X_neg = X_swiss[~positive_class]\n",
2021-11-19 06:03:48 +01:00
"\n",
"fig = plt.figure(figsize=(6, 5))\n",
"ax = plt.subplot(1, 1, 1, projection='3d')\n",
2019-01-15 05:36:29 +01:00
"ax.view_init(10, -70)\n",
"ax.plot(X_neg[:, 0], X_neg[:, 1], X_neg[:, 2], \"y^\")\n",
"ax.plot(X_pos[:, 0], X_pos[:, 1], X_pos[:, 2], \"gs\")\n",
2021-11-19 06:03:48 +01:00
"ax.xaxis.set_rotate_label(False)\n",
"ax.yaxis.set_rotate_label(False)\n",
"ax.zaxis.set_rotate_label(False)\n",
"ax.set_xlabel(\"$x_1$\", rotation=0)\n",
"ax.set_ylabel(\"$x_2$\", rotation=0)\n",
"ax.set_zlabel(\"$x_3$\", rotation=0)\n",
2019-01-15 05:36:29 +01:00
"ax.set_xlim(axes[0:2])\n",
"ax.set_ylim(axes[2:4])\n",
"ax.set_zlim(axes[4:6])\n",
"save_fig(\"manifold_decision_boundary_plot3\")\n",
"plt.show()\n",
"\n",
"fig = plt.figure(figsize=(5, 4))\n",
2021-11-19 06:03:48 +01:00
"ax = plt.subplot(1, 1, 1)\n",
2021-11-19 11:36:04 +01:00
"ax.plot(t[positive_class], X_swiss[positive_class, 1], \"gs\")\n",
"ax.plot(t[~positive_class], X_swiss[~positive_class, 1], \"y^\")\n",
2021-11-19 06:03:48 +01:00
"ax.plot([4, 15], [0, 22], \"b-\", linewidth=2)\n",
"ax.axis([4, 15, axes[2], axes[3]])\n",
"ax.set_xlabel(\"$z_1$\")\n",
"ax.set_ylabel(\"$z_2$\", rotation=0, labelpad=8)\n",
"ax.grid(True)\n",
2019-01-15 05:36:29 +01:00
"save_fig(\"manifold_decision_boundary_plot4\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 12,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 576x288 with 4 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – this cell generates and saves Figure 8–7\n",
2021-11-19 11:36:04 +01:00
"\n",
2019-01-15 05:36:29 +01:00
"angle = np.pi / 5\n",
"stretch = 5\n",
"m = 200\n",
"\n",
"np.random.seed(3)\n",
2021-11-19 11:36:04 +01:00
"X_line = np.random.randn(m, 2) / 10\n",
"X_line = X_line @ np.array([[stretch, 0], [0, 1]]) # stretch\n",
"X_line = X_line @ [[np.cos(angle), np.sin(angle)],\n",
"                   [-np.sin(angle), np.cos(angle)]]  # rotate\n",
2019-01-15 05:36:29 +01:00
"\n",
"u1 = np.array([np.cos(angle), np.sin(angle)])\n",
2021-11-19 06:03:48 +01:00
"u2 = np.array([np.cos(angle - 2 * np.pi / 6), np.sin(angle - 2 * np.pi / 6)])\n",
"u3 = np.array([np.cos(angle - np.pi / 2), np.sin(angle - np.pi / 2)])\n",
"\n",
2021-11-19 11:36:04 +01:00
"X_proj1 = X_line @ u1.reshape(-1, 1)\n",
"X_proj2 = X_line @ u2.reshape(-1, 1)\n",
"X_proj3 = X_line @ u3.reshape(-1, 1)\n",
2021-11-19 06:03:48 +01:00
"\n",
"plt.figure(figsize=(8, 4))\n",
"plt.subplot2grid((3, 2), (0, 0), rowspan=3)\n",
"plt.plot([-1.4, 1.4], [-1.4 * u1[1] / u1[0], 1.4 * u1[1] / u1[0]], \"k-\",\n",
" linewidth=2)\n",
"plt.plot([-1.4, 1.4], [-1.4 * u2[1] / u2[0], 1.4 * u2[1] / u2[0]], \"k--\",\n",
" linewidth=2)\n",
"plt.plot([-1.4, 1.4], [-1.4 * u3[1] / u3[0], 1.4 * u3[1] / u3[0]], \"k:\",\n",
" linewidth=2)\n",
2021-11-19 11:36:04 +01:00
"plt.plot(X_line[:, 0], X_line[:, 1], \"ro\", alpha=0.5)\n",
2021-11-19 06:03:48 +01:00
"plt.arrow(0, 0, u1[0], u1[1], head_width=0.1, linewidth=4, alpha=0.9,\n",
" length_includes_head=True, head_length=0.1, fc=\"b\", ec=\"b\", zorder=10)\n",
"plt.arrow(0, 0, u3[0], u3[1], head_width=0.1, linewidth=1, alpha=0.9,\n",
" length_includes_head=True, head_length=0.1, fc=\"b\", ec=\"b\", zorder=10)\n",
2021-11-27 11:03:26 +01:00
"plt.text(u1[0] + 0.1, u1[1] - 0.05, r\"$\\mathbf{c_1}$\", color=\"blue\")\n",
"plt.text(u3[0] + 0.1, u3[1], r\"$\\mathbf{c_2}$\", color=\"blue\")\n",
2021-11-19 06:03:48 +01:00
"plt.xlabel(\"$x_1$\")\n",
"plt.ylabel(\"$x_2$\", rotation=0)\n",
2019-01-15 05:36:29 +01:00
"plt.axis([-1.4, 1.4, -1.4, 1.4])\n",
2021-11-19 06:03:48 +01:00
"plt.grid()\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"plt.subplot2grid((3, 2), (0, 1))\n",
"plt.plot([-2, 2], [0, 0], \"k-\", linewidth=2)\n",
"plt.plot(X_proj1[:, 0], np.zeros(m), \"ro\", alpha=0.3)\n",
2019-01-15 05:36:29 +01:00
"plt.gca().get_yaxis().set_ticks([])\n",
"plt.gca().get_xaxis().set_ticklabels([])\n",
"plt.axis([-2, 2, -1, 1])\n",
2021-11-19 06:03:48 +01:00
"plt.grid()\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"plt.subplot2grid((3, 2), (1, 1))\n",
"plt.plot([-2, 2], [0, 0], \"k--\", linewidth=2)\n",
"plt.plot(X_proj2[:, 0], np.zeros(m), \"ro\", alpha=0.3)\n",
2019-01-15 05:36:29 +01:00
"plt.gca().get_yaxis().set_ticks([])\n",
"plt.gca().get_xaxis().set_ticklabels([])\n",
"plt.axis([-2, 2, -1, 1])\n",
2021-11-19 06:03:48 +01:00
"plt.grid()\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"plt.subplot2grid((3, 2), (2, 1))\n",
2019-01-15 05:36:29 +01:00
"plt.plot([-2, 2], [0, 0], \"k:\", linewidth=2)\n",
2021-11-19 06:03:48 +01:00
"plt.plot(X_proj3[:, 0], np.zeros(m), \"ro\", alpha=0.3)\n",
2019-01-15 05:36:29 +01:00
"plt.gca().get_yaxis().set_ticks([])\n",
"plt.axis([-2, 2, -1, 1])\n",
2021-11-19 06:03:48 +01:00
"plt.xlabel(\"$z_1$\")\n",
"plt.grid()\n",
2019-01-15 05:36:29 +01:00
"\n",
2019-05-06 07:15:01 +02:00
"save_fig(\"pca_best_projection_plot\")\n",
2019-01-15 05:36:29 +01:00
"plt.show()"
]
},
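To make the idea behind this figure concrete, here is a minimal NumPy sketch that measures how much variance a stretched, rotated 2D cloud keeps after projecting it onto three candidate axes. The data, the seed, and the names `X_demo`, `R`, `directions`, and `variances` are illustrative assumptions rather than the notebook's own variables, and the rotation matrix is written for row vectors.

```python
import numpy as np

angle = np.pi / 5
rng = np.random.default_rng(3)  # arbitrary seed, used only for reproducibility

# Hypothetical 2D cloud: stretched 5x along the first axis, then rotated by `angle`
X_demo = rng.normal(size=(200, 2)) / 10
X_demo = X_demo @ np.array([[5.0, 0.0], [0.0, 1.0]])
R = np.array([[np.cos(angle), np.sin(angle)],
              [-np.sin(angle), np.cos(angle)]])  # rotation applied to row vectors
X_demo = X_demo @ R

# Three unit vectors: along the cloud, 60 degrees off, and orthogonal to it
offsets = [0.0, np.pi / 3, np.pi / 2]
directions = [np.array([np.cos(angle - o), np.sin(angle - o)]) for o in offsets]

# Variance of the 1D projection onto each axis
variances = [np.var(X_demo @ u) for u in directions]

# The first principal component should be (anti)parallel to the first axis
X_demo_centered = X_demo - X_demo.mean(axis=0)
_, _, Vt = np.linalg.svd(X_demo_centered, full_matrices=False)
alignment = abs(Vt[0] @ directions[0])
```

The axis aligned with the cloud should keep the most variance, and `alignment` should be close to 1, since that is the axis PCA selects as the first principal component.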
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Principal Components"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 13,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"import numpy as np\n",
2019-01-15 05:36:29 +01:00
"\n",
2022-06-14 07:47:11 +02:00
"# X = [...] # the small 3D dataset was created earlier in this notebook\n",
2021-11-19 06:03:48 +01:00
"X_centered = X - X.mean(axis=0)\n",
"U, s, Vt = np.linalg.svd(X_centered)\n",
"c1 = Vt[0]\n",
"c2 = Vt[1]"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Note: in principle, the SVD factorization algorithm returns three matrices, **U**, **Σ** and **V**, such that **X** = **UΣV**<sup>⊺</sup>, where **U** is an _m_ × _m_ matrix, **Σ** is an _m_ × _n_ matrix, and **V** is an _n_ × _n_ matrix. But NumPy's `svd()` function returns **U**, **s** and **V**<sup>⊺</sup> instead. **s** is the vector containing all the values on the main diagonal of the top _n_ rows of **Σ**. Since **Σ** is full of zeros elsewhere, you can easily reconstruct it from **s**, like this:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 14,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – shows how to construct Σ from s\n",
2021-11-19 06:03:48 +01:00
"m, n = X.shape\n",
"Σ = np.zeros_like(X_centered)\n",
"Σ[:n, :n] = np.diag(s)\n",
"assert np.allclose(X_centered, U @ Σ @ Vt)"
2019-01-15 05:36:29 +01:00
]
},
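As a self-contained variant of the check above, the sketch below runs on a small synthetic matrix (`X_demo` is an assumption, not the notebook's dataset) and also shows the economy-size decomposition obtained with `full_matrices=False`, where **U** has only _n_ columns and no padded **Σ** is needed.

```python
import numpy as np

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(6, 3))  # hypothetical dataset with m=6 rows, n=3 columns
X_demo_centered = X_demo - X_demo.mean(axis=0)
m, n = X_demo_centered.shape

# Full SVD: U is m x m, s holds n singular values, Vt is n x n
U, s, Vt = np.linalg.svd(X_demo_centered)
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)  # singular values on the diagonal of the top n rows
full_ok = np.allclose(X_demo_centered, U @ Sigma @ Vt)

# Economy SVD: U_r is m x n, so scaling its columns by s replaces U @ Sigma
U_r, s_r, Vt_r = np.linalg.svd(X_demo_centered, full_matrices=False)
reduced_ok = np.allclose(X_demo_centered, (U_r * s_r) @ Vt_r)
```

The economy form is usually preferable when _m_ is much larger than _n_, because the full _m_ × _m_ matrix **U** can be large and its extra columns are multiplied by zeros anyway.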
2021-10-03 12:05:49 +02:00
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Projecting Down to d Dimensions"
2021-10-03 12:05:49 +02:00
]
},
2019-01-15 05:36:29 +01:00
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 15,
2019-01-18 16:08:37 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"W2 = Vt[:2].T\n",
"X2D = X_centered @ W2"
2019-01-18 16:08:37 +01:00
]
},
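The projection can be sanity-checked by mapping the 2D points back to the original space and measuring what was lost. The following is a sketch on synthetic data (the names `X_demo`, `X_recovered`, and `squared_error` are assumptions): the total squared reconstruction error should equal the sum of the squared singular values of the discarded components.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3D data with very different spreads along each axis
X_demo = rng.normal(size=(50, 3)) @ np.diag([3.0, 1.0, 0.1])
X_demo_centered = X_demo - X_demo.mean(axis=0)

U, s, Vt = np.linalg.svd(X_demo_centered, full_matrices=False)

W2 = Vt[:2].T                  # first two principal components as columns
X2D = X_demo_centered @ W2     # project down to 2D
X_recovered = X2D @ W2.T       # map back to 3D (still centered)

# Squared error of the round trip; only the third component was dropped
squared_error = np.sum((X_demo_centered - X_recovered) ** 2)
```

Here `squared_error` matches `s[2] ** 2`, and `X2D` equals the first two columns of **U** scaled by the first two singular values, which is another way to compute the same projection.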
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Using Scikit-Learn"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"With Scikit-Learn, PCA takes only a couple of lines. The `PCA` class also takes care of mean centering for you:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 16,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=2)\n",
"X2D = pca.fit_transform(X)"
2019-01-15 05:36:29 +01:00
]
},
2021-10-03 12:05:49 +02:00
{
2021-11-19 06:03:48 +01:00
"cell_type": "code",
"execution_count": 17,
2021-10-03 12:05:49 +02:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.67857588, 0.70073508, 0.22023881],\n",
" [ 0.72817329, -0.6811147 , -0.07646185]])"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
2021-10-03 12:05:49 +02:00
"source": [
2021-11-19 06:03:48 +01:00
"pca.components_"
2021-10-03 12:05:49 +02:00
]
},
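To see how the Scikit-Learn result relates to the manual SVD approach, the sketch below fits both on the same synthetic data (`X_demo` and the `same_*` flags are assumptions for illustration). The principal axes are compared up to sign, because each component is only defined up to a factor of ±1 and the two routes may pick opposite directions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical 3D dataset with clearly separated variances per axis
X_demo = rng.normal(size=(100, 3)) * np.array([3.0, 1.5, 0.2])

# Scikit-Learn route: centering happens inside fit_transform()
pca_demo = PCA(n_components=2)
X2D_sklearn = pca_demo.fit_transform(X_demo)

# Manual route: center explicitly, then take the top two right singular vectors
X_demo_centered = X_demo - X_demo.mean(axis=0)
_, _, Vt = np.linalg.svd(X_demo_centered, full_matrices=False)
X2D_svd = X_demo_centered @ Vt[:2].T

# Compare absolute values so that sign flips do not matter
same_axes = np.allclose(np.abs(pca_demo.components_), np.abs(Vt[:2]), atol=1e-6)
same_projection = np.allclose(np.abs(X2D_sklearn), np.abs(X2D_svd), atol=1e-6)
same_mean = np.allclose(pca_demo.mean_, X_demo.mean(axis=0))
```

Each row of `components_` is one principal component, so `components_` plays the role of `Vt[:2]` rather than `W2`.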
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Explained Variance Ratio"
2021-10-03 12:05:49 +02:00
]
},
2019-01-15 05:36:29 +01:00
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Now let's look at the explained variance ratio:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 18,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"array([0.7578477 , 0.15186921])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"pca.explained_variance_ratio_"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 11:36:04 +01:00
"The first dimension explains about 76% of the variance, while the second explains about 15%."
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 11:36:04 +01:00
"By projecting down to 2D, we lost about 9% of the variance:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 19,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"0.09028309326742046"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
2021-10-03 12:05:49 +02:00
"source": [
2022-02-19 06:17:36 +01:00
"1 - pca.explained_variance_ratio_.sum() # extra code"
2021-10-03 12:05:49 +02:00
]
},
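The explained variance ratio can also be derived by hand from the singular values, which may help clarify what the attribute measures. This is a sketch on synthetic data (all names below are assumptions): the variance along component _i_ is taken as _s_<sub>i</sub>² / (_m_ − 1), and the ratios are those variances divided by their sum.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 3D data whose axes have very different spreads
X_demo = rng.normal(size=(200, 3)) * np.array([4.0, 2.0, 0.5])
X_demo_centered = X_demo - X_demo.mean(axis=0)

_, s, Vt = np.linalg.svd(X_demo_centered, full_matrices=False)

# Variance carried by each principal component, then normalized to ratios
explained_variance = s ** 2 / (len(X_demo_centered) - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Fraction of variance lost when keeping only the first two components
lost_2d = 1 - explained_variance_ratio[:2].sum()

# Cross-check: sample variance of the data projected onto every component
Z = X_demo_centered @ Vt.T
variance_check = Z.var(axis=0, ddof=1)
```

Since the ratios sum to 1 when all components are kept, the variance lost by a 2D projection of 3D data is simply the ratio of the third component.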
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Choosing the Right Number of Dimensions"
2021-10-03 12:05:49 +02:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 20,
2021-10-03 12:05:49 +02:00
"metadata": {},
"outputs": [],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.datasets import fetch_openml\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"mnist = fetch_openml('mnist_784', as_frame=False)\n",
"X_train, y_train = mnist.data[:60_000], mnist.target[:60_000]\n",
"X_test, y_test = mnist.data[60_000:], mnist.target[60_000:]\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"pca = PCA()\n",
"pca.fit(X_train)\n",
"cumsum = np.cumsum(pca.explained_variance_ratio_)\n",
2021-11-19 11:36:04 +01:00
"d = np.argmax(cumsum >= 0.95) + 1 # d equals 154"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 21,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"154"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"d"
2021-10-03 12:05:49 +02:00
]
},
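The `np.argmax(cumsum >= 0.95) + 1` idiom works because `argmax` on a Boolean array returns the index of the first `True`. The toy example below uses made-up ratios (an assumption, not MNIST values) so the arithmetic is easy to follow, and shows `np.searchsorted` as an equivalent alternative on the sorted cumulative sum.

```python
import numpy as np

# Hypothetical explained variance ratios for five components (they sum to 1)
ratios = np.array([0.5, 0.3, 0.17, 0.02, 0.01])
cumsum = np.cumsum(ratios)  # approximately [0.5, 0.8, 0.97, 0.99, 1.0]

target = 0.95

# Index of the first component count whose cumulative ratio reaches the target
d = int(np.argmax(cumsum >= target)) + 1

# Same answer via binary search on the (non-decreasing) cumulative sum
d_alt = int(np.searchsorted(cumsum, target)) + 1
```

One caveat: if no entry reaches the target, the Boolean array is all `False` and `argmax` returns 0, so `d` would silently become 1. That cannot happen when all components are kept and the target is comfortably below 1, but it is worth guarding against if the cumulative sum is truncated.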
2019-01-15 05:36:29 +01:00
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 22,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"pca = PCA(n_components=0.95)\n",
"X_reduced = pca.fit_transform(X_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 06:03:48 +01:00
"execution_count": 23,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"154"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"pca.n_components_"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 24,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"0.9501960192613035"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"pca.explained_variance_ratio_.sum() # extra code"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 25,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAaAAAAEQCAYAAAD2/KAsAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAA37klEQVR4nO3deXxU9fX/8dfJTkhYg4BsIpugIqLi1spgsUWpUlErVbRqLbhW6lK3Vq12oa226M8VLcWCFaziV1DEgjW44AJakF2QXQQMIBACWc/vj3szmYRMZkJm5t5JzvPxmEfuvXNz553bOofPvZ/7+YiqYowxxiRaitcBjDHGNE1WgIwxxnjCCpAxxhhPWAEyxhjjCStAxhhjPGEFyBhjjCcSVoBEZJKI7BCRZWHeFxF5TETWisjnIjIwUdmMMcYkXiJbQJOBYXW8fy7Qy32NAZ5KQCZjjDEeSVgBUtV3gV117DIC+Kc6PgJaiUjHxKQzxhiTaH66B9QJ2ByyvsXdZowxphFK8zpACKllW63jBInIGJzLdGRlZZ3UtWvXeOaKqYqKClJS/FT362Z548vyxlcs8ipQOWJZ5XK1n8FlZ4PWsR9hfxcqh0Wr/NKr3H7Ico08XinZtrZAVds15Bh+KkBbgC4h652BrbXtqKoTgYkAffr00dWrV8c/XYzk5+cTCAS8jhE1yxtfljc6FRVKcVkFB0vLOVhWTnFpBQfLyjlY6m4rLQ++X/VeOSu/+JIju3QN2a/C/f2q/SvfKy6roKSsgpLyCkrLKigud9YjEWr/13Njt/FPP9zY0GP4qQDNBG4SkWnAqcAeVf3a40zGmCiUVyhFJWUcKCmnyH0dKC1zfpaUc6A0ZHtJWchyOUWllftU7V/k/k5xaTkHy6IrBGGt/TJ2f6hPpaUIaalCekoK6WkppKUI6akppKVKcLlyPT3F3Z6aQnrIfumpKe5xUkhNgbSUFFJTJPhKSxFSxP2ZItz0pxjkbvghoiMiLwIBIE9EtgD3A+kAqvo0MBs4D1gLFAFXJyqbMU2JqtOaKCwuo/BgGYXFZewvLmN/SRmFxeUUHnTWD91eytYdB3h46XuHFJYGFQifS0sRMtJSyEhzvsQzUlPIdNcz0pz12pYza6ynp1btU/33U8lIS2Hl8qWcdOKAYBFITxXSUpyftRWJjJACI5L4NthNMThGwgqQqv4kwvsK3JigOMYkpYoKZX9JGXsPlrH3QKnzOljGvoNVy87PUvYeqCweZdWLSkk55RUNuHuwe2/s/qB6yExLISs9laz0FDLTnJ9Z6alkpaWSWXObu7596xb69OhebVtmcJ9UstJSyAx5L7SIZLpFIzUlMV/u6TtWcmbPvIR8ll/46RKcMU1CeYWy50Apu4tKWLO7nLIV2/k2WEycwrH3YNX6voNlwe37DpbSkNoRLyLQLD2V7IxUmmWkkp2e5vzMqNyWRna6857zvvszI63qdyr3Ta/aVlVcUg7rX/n5+TsIBHrF4S82sWAFyJgGOFhazu6iEnbvL+XbohJ2F5W6686ysy102Skq1eaB/HhRwnNnpKbQPDOV5plp5Liv5sGf4bansXblMk4fdJJbKCqLRhpZ6YdXIEzTZgXImBCqyt4DZRTsL6ZgXzE795dQUFhMQaHzc6e7XPmzsLgs4RmzM1JpkZVOi2Zp7s90WmSl0aJZOrlZVdtysw4tIpXLGWmH1zU5fcdKBnRpFds/yDRZVoBMo/PCCy9w7733smnTJrp27crvf/97fvKTy9hVVMK2PQfZse8g2/YUs33vQbbvPci2vQedIrOvhJ37iyktj/81rhZZabRunkFq2UG6dmhL6+wMWrqFJPeQ4lK1npOVRnpq8jyHY0xdrACZpFdRoRQUFrPl2wM8/e+3ePXvEygpPgjAxo0bufLqn3HrS4tp1jcQ889OSxHaNM+gdXYGrbLTaZ2dQevm6bTKzqB1duXP0OV0WjZLJ80tIs5zNYNins
uYZJD0/5TavHkzkydPBqC0tJRAIMDUqVMBKCoqIhAIMH36dAD27NlDIBBgxowZABQUFBAIBJg1axYA27ZtIxAIMGfOnOCxA4EA8+bNA2DdunUEAgHmz58PwOrVqwkEAixYsACAZcuWEQgEWLhwIQCLFy8mEAiwePFiABYuXMi4ceNYtswZEHzBggUEAgEqH6SdP38+gUCAdevWATBv3jwCgQCbNzsjFM2ZM4dAIMC2bdsAmDVrFoFAgIKCAgBmzJhBIBBgz549AEyfPp1AIEBRUREAU6dOJRAIUFpaCsDkyZOrPVT47LPPMnTo0OD6k08+yZ133hlcf/TRR7nggguC6w8//DAXXXRRcH38+PGMGjUquP7QQw8xevTo4Pp9993H1VdX9a6/++67GTNmTHD99ttv58YbqzpCjhs3jnHjxlFWXsGW3UVcfMXPGHHFWP7f22u48+XP6XXWj+h29uUcc98cBv3hbc4a/mNe+vtjweJTqaK0mJ35zxOtnMw0urXN5qRurfl+v/ZcdmpXfnF2Tx4ccSxPXDaQ6WNOY96tg1ly3/dZ8/tz+eTeobz1y7OYPvZ0nr7iJP44sj93DjuGMWf14Mcnd+Gcfu05+ag29Dwih7Y5mcHiY0xTZy0g4wtl5RXsPVDKjm8PMPmD9WzYWcR/lm9j78FSZv1mDuUVys7l20lJy2DJ3C8A2Ln3ICnN0mkd8gyKFhfVevzyvQW0bJZO+xaZtG+RRYcWWbRvkUX7llm0z83kiBZZ5OVkkJeTSVZ6akL+ZmOaOlH1YZ/OerCheOIr1nn3HSxlzY5C1mzfx5rthawv2M/6nfvZvKuoQfdeWjZLp3PrZrzzwMUc2L3jkPe7dO3Kpo0NHjkk5pr6/x/iLZnyJlNWABH5VFVPbsgxrAVk4qKopIw12wv5Yvs+1uwoZPW2fazZvo+tew5G/uVatMvNpFOrZnRq3YzOlT9bN6NTq2w6tW5GTqbzf+Vf7/g5f/vb34KXHQGys7P54x/+EJO/yxgTO1aATIPt2l/C8q17WL51r/vaw/qC/dS3cX1EbiZH5TWne9vmzs+8bLrn5dCtbXbUl8WGDh1K3759ueOOO9i2bVuwF9zll19+GH+ZMSaerACZetl3sJTFm7/ls43fsvSrPazYuqderZq0FOHods3p1T6X3kfk0uOI5hzlFpzKVkxDXX755Tz77LP07t2b/Pz8mBzTGBN7VoBMWKrK14UV/HvRZj7b9C2fbdzNFzv2RdWySRHontec3u1zQ145HJXXPCHPsfzlL3+J+2cYYxrGCpAJUlU27izigy8LWPDlTj76cic795cAn9f5exlpKfTtkEu/I1ty7JEtOPbIFhzToQXNMrzrTXbKKad49tnGmOhYAWriduw9yPtrnYKzYG1BxMtpKQJ9O7ZgYNfWDOjSiuM6taRHu+a+e7al8tmrAQMGeJrDGBOeFaAmpqJCWfrVHt5etYP/rtrOsq/qHlo/Jx1O7XEEA7u15sSurTihcyuax+heTTyNGzcOwO4BGeNj/v8mMQ1WWl7BB2sLeHPpNv67egff7CsOu29OZhqndm/DGT3zOLNnW7au/JSzhyTf5awJEyZ4HcEYE4EVoEaqrLyCj9fv4vXPtzJn2TZ2F5XWul9ainDyUa35Ts88zuiZR/9OLatdTtu2KjmH2LdLb8b4nxWgRkRV+d/mb3n1s694c9nXFBSW1Lpfm+YZBPq0Y2jf9nynVx4tstITnDT+Ksfjs84IxviXFaBGoKCwmFc/+4qXFm1mzY7CWvfp2DKL4cd35NzjOzCgS+uETTPslTvuuAOwe0DG+JkVoCSlqizauJtJ769n7ortlNUyT3O73EyGH9+RH/bvyMCurUlp5EUn1OOPP+51BGNMBFaAkkxpeQWzl37N399fz+db9hzyfnZGKj/s35ELT+zMoO5tGn1LJ5zjjjvO6wjGmAisACWJAyXlvPjJJia+u45tew99Vufkbq358cldGN6/Y1J0k4
63yjmazjjjDI+TGGPCsW8qnysqKeOFjzbxzLvrKCis3n06Iy2FkSd24uozu9OnQ65HCf3pnnvuAewekDF+ZgXIp0r
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-06-14 07:47:11 +02:00
"# extra code – this cell generates and saves Figure 8– 8\n",
"\n",
2022-02-19 06:17:36 +01:00
"plt.figure(figsize=(6, 4))\n",
2021-11-19 06:03:48 +01:00
"plt.plot(cumsum, linewidth=3)\n",
"plt.axis([0, 400, 0, 1])\n",
"plt.xlabel(\"Dimensions\")\n",
"plt.ylabel(\"Explained Variance\")\n",
"plt.plot([d, d], [0, 0.95], \"k:\")\n",
"plt.plot([0, d], [0.95, 0.95], \"k:\")\n",
"plt.plot(d, 0.95, \"ko\")\n",
"plt.annotate(\"Elbow\", xy=(65, 0.85), xytext=(70, 0.7),\n",
" arrowprops=dict(arrowstyle=\"->\"))\n",
"plt.grid(True)\n",
"save_fig(\"explained_variance_plot\")\n",
"plt.show()"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 26,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"RandomizedSearchCV(cv=3,\n",
" estimator=Pipeline(steps=[('pca', PCA(random_state=42)),\n",
" ('randomforestclassifier',\n",
" RandomForestClassifier(random_state=42))]),\n",
" param_distributions={'pca__n_components': array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,\n",
" 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,\n",
" 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,\n",
" 6...\n",
" 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426,\n",
" 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439,\n",
" 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452,\n",
" 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465,\n",
" 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478,\n",
" 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491,\n",
" 492, 493, 494, 495, 496, 497, 498, 499])},\n",
" random_state=42)"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.model_selection import RandomizedSearchCV\n",
"from sklearn.pipeline import make_pipeline\n",
"\n",
"clf = make_pipeline(PCA(random_state=42),\n",
" RandomForestClassifier(random_state=42))\n",
"param_distrib = {\n",
" \"pca__n_components\": np.arange(10, 80),\n",
" \"randomforestclassifier__n_estimators\": np.arange(50, 500)\n",
"}\n",
"rnd_search = RandomizedSearchCV(clf, param_distrib, n_iter=10, cv=3,\n",
" random_state=42)\n",
"rnd_search.fit(X_train[:1000], y_train[:1000])"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 27,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'randomforestclassifier__n_estimators': 465, 'pca__n_components': 23}\n"
]
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"print(rnd_search.best_params_)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 28,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"GridSearchCV(cv=3,\n",
" estimator=Pipeline(steps=[('pca', PCA(random_state=42)),\n",
" ('sgdclassifier', SGDClassifier())]),\n",
" param_grid={'pca__n_components': array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,\n",
" 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,\n",
" 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,\n",
" 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,\n",
" 78, 79])})"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.linear_model import SGDClassifier\n",
"from sklearn.model_selection import GridSearchCV\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"clf = make_pipeline(PCA(random_state=42), SGDClassifier())\n",
"param_grid = {\"pca__n_components\": np.arange(10, 80)}\n",
"grid_search = GridSearchCV(clf, param_grid, cv=3)\n",
"grid_search.fit(X_train[:1000], y_train[:1000])"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 29,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"{'pca__n_components': 67}"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"grid_search.best_params_"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## PCA for Compression"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 30,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"pca = PCA(0.95)\n",
"X_reduced = pca.fit_transform(X_train, y_train)"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 31,
2019-01-15 05:36:29 +01:00
"metadata": {},
2021-11-19 06:03:48 +01:00
"outputs": [],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"X_recovered = pca.inverse_transform(X_reduced)"
2019-01-15 05:36:29 +01:00
]
},
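{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the compression and decompression steps a little more concrete, here is a minimal sketch that reproduces them with a plain NumPy SVD rather than Scikit-Learn's `PCA` class. The toy dataset, its sizes and the variable names are assumptions made for illustration, and the two projection lines only mirror what `transform()` and `inverse_transform()` compute when whitening is disabled:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"# Hypothetical toy data: 200 samples in 10-D lying close to a 3-D subspace\n",
"Z = rng.normal(size=(200, 3))\n",
"W = rng.normal(size=(3, 10))\n",
"X = Z @ W + 0.01 * rng.normal(size=(200, 10))\n",
"\n",
"# PCA via SVD of the centered data\n",
"mean = X.mean(axis=0)\n",
"U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)\n",
"k = 3\n",
"components = Vt[:k]  # top-k principal axes, shape (k, 10)\n",
"\n",
"X_reduced = (X - mean) @ components.T        # compression (projection)\n",
"X_recovered = X_reduced @ components + mean  # decompression (back-projection)\n",
"\n",
"mse = np.mean((X - X_recovered) ** 2)\n",
"explained = (s[:k] ** 2).sum() / (s ** 2).sum()\n",
"print(f'reconstruction MSE: {mse:.6f}, explained variance ratio: {explained:.4f}')\n",
"```\n",
"\n",
"The reconstruction error here is just the variance left in the dropped components, which is why keeping 95% of the variance on MNIST still gives recognizable digits."
]
},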
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 32,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAegAAAEECAYAAADj1qf1AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOz9d3xk13Umin6nUDlXoRJyIacG0OjIJpvdzDRJS5RkySM9S7LH9sz12JY9ntHcGb+JV+96rHlz7ecZS7J9NbZ1bQVaHmokUSRFNUOTbHZGo5EzCqkAVKEKlXN6f6DX5qmDAhoZaLq+3w8/ABXO2Wefs/dK31qLy+VyKKKIIooooogijhZEhz2AIooooogiiihiPYoCuogiiiiiiCKOIIoCuogiiiiiiCKOIIoCuogiiiiiiCKOIIoCuogiiiiiiCKOIIoCuogiiiiiiCKOIIoC+gEHx3EzHMd9eZvfyXEc9+k9Hsd/4jhucC+PWUQRRTyY4DjuyxzHzRz2OB50FAX0EQDHcRUcx/3fHMctcByX5DjOyXHcNzmOq9zC108D+MY2T1kG4JXtj7SIIj764DjOynHcf+M4borjuMS99fg6x3HPH/bYiviHBfFhD+AfOjiOqwVwFYADwC8DmABQD+APANziOO5cLpebKfA9aS6XS+ZyuZXtnjOXyy3vbtRFFPHRBMdxdgAfAAgB+H0AfVgzZJ4E8OcAqg9tcAVA+8Bhj6OI/UHRgj58fB1AFsBTuVzurVwuN5fL5d4B8NS9178OABzHXeY47s84jvu/OI5bwdomss7FzXFcE8dx73IcF+c4bozjuOc5jgtzHPcrvM8wFzfHcfZ7//8Cx3GXOI6Lchw3zHHc07zPl3Ac95ccxzk4jotxHDfBcdz/znFc8fkp4qOGbwDgAJzK5XLfz+VyY7lcbiSXy30NQBcAcBxXzXHc/+I4LnTv5wd8bxeFeziO++V76zPMcdxfcxwn5TjuNzmOm+c4zstx3B/z19C9z/4njuO+fe87y8Lw1b21+lv3zhkB8J/vvf4xjuN67q17B8dxf8BxnJT3vU9xHNd/b/2u3tsjrPfeq+I47kf3Xo9yHDfKcdxned+t4DjuJY7jfPd+XuU4rlEwrv/93njDHMf9DQD1Ht6Tf7AobrCHCI7jjAB+DsDXc7lclP/evf+/AeA5juMM917+PNY2j0cBfLHA8UQA/heANICHAPwKgP8IQLaF4fwBgP+OtU3oFoCXOI6jRSYC4ATwiwBaAfxbAP9vAP94i5daRBFHHrz1+LVcLhcWvp/L5Xwcx3EAfgjACuAJAI8DKAfww3vvEewAXgTw8wB+AcBnAPwIayGpZwD8OoAvAfik4DT/AsAIgBNYW7v/meO4Twk+8x8BvAagA8DXOY57FsB3AHwNQDuAXwXwaXwovG0AXgLw/2Bt/V4A8Le8430DgPLetbQD+OcA/Pe+qwTwDoA4gIsAzgFYAvDmvffAcdwvAvg/743rBICxe9dRxG6Ry+WKP4f0A+AsgByAT27w/ifvvX8GwGUA/QU+MwPgy/f+fhZrwrmC9/7D947xK7zXcgA+fe9v+73//zfe+xX3Xju/ydi/CuBN3v//CcDgYc9p8af4s9Ofe+tsw/V47zNPA8gAsPNeq8OHXjBaCzEAOt5n/ieAFQBS3muXsaYM0P8zAC4Jzvc/AFzh/Z8D8KeCz7wH4N8LXvsEgDDWFPoT975Xs8E19QP4jxu896tYC7txvNdKAHgB/OK9/68C+Kbge28CmDnse/qg/xQt6KOBjTqWcIL3e+5znBYAi7lczsl77RbWNo/7oZ/39+K93xY2EI77DY7jbnMct8JxXBjA7+GIxeOKKGKX4O7/EbRibY3N0Au5XG4aa2umjfe5uVwuF+D97wIwnsuPF7vAW2P3cK3A/22C124L/j8J4N/ecy+H763P7wJQAbBhLY7+JoBBjuNe5jjun3EcZ+Z9/78B+Hccx13jOO
7/5DjupODYtQBCvGMHABiwxpWhOSk07iJ2iaKAPlxMYE34tm/wfuu996fu/R+5z/E4bCzs74cU/ZG7pwLj3vPBcdw/AvAnAL6FNSv9ONbcYlIUUcRHB7QeWzf5zGZrjP96qsB7hV7byR4s3AdEAP4PrK1L+ukE0AhgJZfLZbDmVn8Ga4r4rwGY4DiuCwByudxfYk0I/zWAJgBXOY77T7xj3xUc+/i9z/3FDsZexDZQFNCHiFwutwrgDQC/SfEcwr3/fwvA6/c+txWMAKjgOK6c99op7P4+nwdwI5fLfS2Xy93J5XKT+FB7LqKIjwR46/G3efwLBo7j9ACGsbbG7LzX67AWhx7eg2E8VOD/kft85w6AllwuN1ngJw2sKd25XO5aLpf7P7AWB18E8I/oALlcbiGXy/3fuVzuFwH8BwD/lHfsBgCeAsemfWlkg3EXsUsUBfTh47exlu72JsdxT9xjVD4G4BLWtPXf3saxLmGNoPH/cBzXxXHcQwD+GGtx6d00/h4HcILjuOc4jmvkOO7fY40wUkQRHzX8JtbW3W2O4z7DcVwzx3EtHMf9M6xZn29izWX8HY7jTnIcdwprBK07AN7eg/M/xHHc799bZ/8Ea2TQ/999vvMVAP8vjuO+wnHcsXvj/TTHcf9fAOA47iGO4/4dx3GnOY6rBvBxAFW4p1BwaznfP8dxXB3HccexRpQjZeM7WHPF/4jjuIscx9VyHHeB47g/4jG5/xuAX+Y47p/cG/fvY41fU8QuURTQh4xcLjeFNSt3CGvMymmsxY9GAJzO5XKObRwrizVimQzATayxNv8Aa8I5voth/gWA798b1y2sEcv+aBfHK6KII4l76+0E1pTd/4I1ofw21oTa/3Yv/PMJrBG+LmON4bwM4BO80NBu8MdYc0/3Yo0Z/R9yudz/vM+Y3wDwAtZY2Dfv/fwbAHP3PhIA8AiAn2DNjf9HAP4/uVzu2/feFwH4U6wJ5UtYE8i/fO/YUayxvqcB/D2AUaztKwYAvnuf+TusEeP+4N64O+5dRxG7BLc3z1QRRxX34kx3sZbXeT+SWRFFFHFI4NZKY34tl8v9X4c9liKOBoqVxD5i4Djuk1gjkUxgzdL9Y6y55O4c4rCKKKKIIorYJooC+qMHDdZcc1VYc0FdBvB7e+R+K6KIIooo4oBQdHEXUUQRRRRRxBFEkSRWRBFFFFFEEUcQ93NxF83rIoo4ethKxauNUFzTRRRxtLDhei5a0EUUUcQ/aBTDfEUcVRRJYkUUUcS2wBdo+Q2cHizQdXxUrudBRXH+N0ZRQBdRRBFbRi6XQzabRS6XA8dxEIlEG26qQsv0KG2+uVwOmUwm71roB8gfK/0tEhUdjtsBzetWIPwcdXPiv7ffzw9fYRMqDYd174sCuogiitgyaPPKZrP3Fc70c1Ab7HbB34j5Qpr+579+VIWzUIgdJWxlbBuFF4TPz0EK6ULP7mGhKKCLKKKILSObzSKdTm+6saZSKWaZikQiiEQiiMVHb6shwSsWiwtuxNlsFslkEhKJ5EgK6Gw2yxQl4GgJaZo7EnJyuTzv/XA4zN7X6XTrng++d4MUwZKSEvb+flwrX/lk/ZgLKKGZTAbpdBrxeJw9PwqFYs/HAxQFdBFHALFYDLFYDE6nE+l0mr2eyWSwvLyMYDCIlZUVKJVKqFQqtLe3w2q1wmazHeKot47R0VEsLS1haGgICoUCBoMBFy9eRGlp6WEPbdvIZDLIZDIFN0ja1MRicd4GxxdumUwGyWQSIpEIJSUlhya4OY6DWCxGNptl18MXAIRsNotUKsWu66ggm82yaxC+tpGykc2utYUnobNfQi4cDiMajcLtdiORSCAej7O59fv9CAaDcLlckEgkUCqVOHbsGMrKylBe/mETPolEgnQ6nbcfEPZLERGJROyZ3ehciUQCExMTWFlZwejoKORyOXQ6HU6cOAGLxQKlUik87K6w70/cbt0Emz10Bw1hbIJwVMb3oIIE8JUrVx
CNRtnryWQSt2/fxvT0NPr6+mCxWFBWVoZf/dVfxcmTJ2G1Wh+Ieb99+zZu3LiB//E//gfMZjMaGxvR1NT0wAloEma
"text/plain": [
"<Figure size 504x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – this cell generates and saves Figure 8– 9\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"plt.figure(figsize=(7, 4))\n",
"for idx, X in enumerate((X_train[::2100], X_recovered[::2100])):\n",
" plt.subplot(1, 2, idx + 1)\n",
" plt.title([\"Original\", \"Compressed\"][idx])\n",
" for row in range(5):\n",
" for col in range(5):\n",
" plt.imshow(X[row * 5 + col].reshape(28, 28), cmap=\"binary\",\n",
" vmin=0, vmax=255, extent=(row, row + 1, col, col + 1))\n",
" plt.axis([0, 5, 0, 5])\n",
" plt.axis(\"off\")\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"save_fig(\"mnist_compression_plot\")"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Randomized PCA"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 33,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"rnd_pca = PCA(n_components=154, svd_solver=\"randomized\", random_state=42)\n",
"X_reduced = rnd_pca.fit_transform(X_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"## Incremental PCA"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 34,
2021-11-19 06:03:48 +01:00
"metadata": {},
2019-01-15 05:36:29 +01:00
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.decomposition import IncrementalPCA\n",
"\n",
"n_batches = 100\n",
"inc_pca = IncrementalPCA(n_components=154)\n",
"for X_batch in np.array_split(X_train, n_batches):\n",
" inc_pca.partial_fit(X_batch)\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"X_reduced = inc_pca.transform(X_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2022-06-14 07:47:11 +02:00
"**Using NumPy's `memmap` class – a memory-map to an array stored in a binary file on disk.**"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2022-06-14 07:47:11 +02:00
"Let's create the `memmap` instance, copy the MNIST training set into it, and call `flush()` which ensures that any data still in cache is saved to disk. This would typically be done by a first program:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 35,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"filename = \"my_mnist.mmap\"\n",
"X_mmap = np.memmap(filename, dtype='float32', mode='write', shape=X_train.shape)\n",
"X_mmap[:] = X_train # could be a loop instead, saving the data chunk by chunk\n",
"X_mmap.flush()"
2019-01-15 05:36:29 +01:00
]
},
2021-10-03 12:05:49 +02:00
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Next, another program would load the data and use it for training:"
2021-10-03 12:05:49 +02:00
]
},
2019-01-15 05:36:29 +01:00
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 36,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"IncrementalPCA(batch_size=600, n_components=154)"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"X_mmap = np.memmap(filename, dtype=\"float32\", mode=\"readonly\").reshape(-1, 784)\n",
"batch_size = X_mmap.shape[0] // n_batches\n",
"inc_pca = IncrementalPCA(n_components=154, batch_size=batch_size)\n",
"inc_pca.fit(X_mmap)"
2019-01-15 05:36:29 +01:00
]
},
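{
"cell_type": "markdown",
"metadata": {},
"source": [
"The write-then-read pattern can be tried on a tiny array first. The following is only a sketch under stated assumptions: the array, the temporary directory and the file name `toy.mmap` are hypothetical, and it uses the short mode strings `'w+'` and `'r'`:\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"import numpy as np\n",
"\n",
"data = np.arange(12, dtype=np.float32).reshape(4, 3)  # hypothetical small array\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp_dir:\n",
"    path = os.path.join(tmp_dir, 'toy.mmap')  # hypothetical file name\n",
"\n",
"    # 'w+' creates (or overwrites) the file; the shape must be given when writing\n",
"    writer = np.memmap(path, dtype='float32', mode='w+', shape=data.shape)\n",
"    writer[:] = data\n",
"    writer.flush()  # push pending changes to disk\n",
"    del writer      # drop the reference so the mapping can be closed\n",
"\n",
"    # The raw file stores neither shape nor dtype, so the reader supplies both\n",
"    reader = np.memmap(path, dtype='float32', mode='r').reshape(-1, 3)\n",
"    round_trip_ok = bool(np.array_equal(reader, data))\n",
"    n_rows = reader.shape[0]\n",
"    del reader  # release the mapping before the directory is removed\n",
"\n",
"print(round_trip_ok, n_rows)\n",
"```\n",
"\n",
"Because the file is just raw bytes, the reading program has to know the dtype and the number of columns in advance, which is why the cell above reshapes to `(-1, 784)`."
]
},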
2021-10-03 12:05:49 +02:00
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"# Random Projection"
2021-10-03 12:05:49 +02:00
]
},
2019-01-15 05:36:29 +01:00
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"**Warning**: this sections will use close to 2.5 GB of RAM. If your computer runs out of memory, just reduce _m_ and _n_:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 37,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"7300"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.random_projection import johnson_lindenstrauss_min_dim\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"m, ε = 5_000, 0.1\n",
"d = johnson_lindenstrauss_min_dim(m, eps=ε)\n",
"d"
2019-01-15 05:36:29 +01:00
]
},
2021-10-03 12:05:49 +02:00
{
2021-11-19 06:03:48 +01:00
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 38,
2021-10-03 12:05:49 +02:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"7300"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
2021-10-03 12:05:49 +02:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – show the equation computed by johnson_lindenstrauss_min_dim\n",
2021-11-19 06:03:48 +01:00
"d = int(4 * np.log(m) / (ε ** 2 / 2 - ε ** 3 / 3))\n",
"d"
2021-10-03 12:05:49 +02:00
]
},
2019-01-15 05:36:29 +01:00
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 39,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"n = 20_000\n",
"np.random.seed(42)\n",
"P = np.random.randn(d, n) / np.sqrt(d) # std dev = square root of variance\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"X = np.random.randn(m, n) # generate a fake dataset\n",
"X_reduced = X @ P.T"
2019-01-15 05:36:29 +01:00
]
},
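{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for how well such a projection preserves distances, here is a small sketch that uses much smaller, hypothetical values for _m_, _n_ and ε (so it runs in a fraction of a second) and the same Johnson-Lindenstrauss formula used earlier in this section. It compares squared pairwise distances before and after projection. The bound holds with high probability rather than with certainty, so the printed range should be read as an empirical check, not a proof:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"m, n, eps = 50, 2_000, 0.5  # hypothetical toy sizes\n",
"d = int(4 * np.log(m) / (eps ** 2 / 2 - eps ** 3 / 3))\n",
"\n",
"X = rng.normal(size=(m, n))               # fake dataset\n",
"P = rng.normal(size=(d, n)) / np.sqrt(d)  # Gaussian random projection matrix\n",
"X_reduced = X @ P.T\n",
"\n",
"\n",
"def pairwise_sq_dists(A):\n",
"    # Squared Euclidean distances between all rows of A\n",
"    sq_norms = (A ** 2).sum(axis=1)\n",
"    return sq_norms[:, None] + sq_norms[None, :] - 2 * A @ A.T\n",
"\n",
"\n",
"iu = np.triu_indices(m, k=1)  # each pair of distinct instances once\n",
"ratios = pairwise_sq_dists(X_reduced)[iu] / pairwise_sq_dists(X)[iu]\n",
"print(f'd = {d}, ratio range: [{ratios.min():.3f}, {ratios.max():.3f}]')\n",
"```\n",
"\n",
"Ratios close to 1 mean the projection roughly preserved the squared distances; a smaller ε tightens the range but requires a larger _d_."
]
},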
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 40,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.random_projection import GaussianRandomProjection\n",
"\n",
"gaussian_rnd_proj = GaussianRandomProjection(eps=ε, random_state=42)\n",
"X_reduced = gaussian_rnd_proj.fit_transform(X) # same result as above"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2022-06-14 07:47:11 +02:00
"**Warning**, the following cell may take several minutes to run:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 41,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"components_pinv = np.linalg.pinv(gaussian_rnd_proj.components_)\n",
"X_recovered = X_reduced @ components_pinv.T"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 42,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GaussianRandomProjection fit\n",
"4.05 s ± 327 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"SparseRandomProjection fit\n",
"3.85 s ± 647 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"GaussianRandomProjection transform\n",
"11.1 s ± 507 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"SparseRandomProjection transform\n",
"5.37 s ± 640 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – performance comparison between Gaussian and Sparse RP\n",
2021-11-19 06:03:48 +01:00
"\n",
"from sklearn.random_projection import SparseRandomProjection\n",
"\n",
"print(\"GaussianRandomProjection fit\")\n",
"%timeit GaussianRandomProjection(random_state=42).fit(X)\n",
"print(\"SparseRandomProjection fit\")\n",
"%timeit SparseRandomProjection(random_state=42).fit(X)\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"gaussian_rnd_proj = GaussianRandomProjection(random_state=42).fit(X)\n",
"sparse_rnd_proj = SparseRandomProjection(random_state=42).fit(X)\n",
"print(\"GaussianRandomProjection transform\")\n",
"%timeit gaussian_rnd_proj.transform(X)\n",
"print(\"SparseRandomProjection transform\")\n",
"%timeit sparse_rnd_proj.transform(X)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# LLE"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 43,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.datasets import make_swiss_roll\n",
2019-01-15 05:36:29 +01:00
"from sklearn.manifold import LocallyLinearEmbedding\n",
"\n",
2021-11-19 06:03:48 +01:00
"X_swiss, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)\n",
2019-01-15 05:36:29 +01:00
"lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)\n",
2021-11-19 06:03:48 +01:00
"X_unrolled = lle.fit_transform(X_swiss)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 44,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAaAAAAEhCAYAAAA52nQkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAADdOUlEQVR4nOydd5zURBvHv7N9r3D03rtIx4IiCGIBARWxFxRFRMX6WrB3LGABRLCAFUEURCyAgoCg9N57O/rRrmzfnfePyd5md7PHYQGVfD+fwCaZJJPc7jyZZ575PUJKiYmJiYmJyYnGcrIrYGJiYmJyamIaIBMTExOTk4JpgExMTExMTgqmATIxMTExOSmYBsjExMTE5KRgGiATExMTk5OCaYBM/hEIIT4WQnyfav0PnrOsEEIKIdr/2foV83ozhRDvnIhrFRchxHNCiFWp1v+ma/7pv53JqYFpgExSNpxCiFuFEPkno07/Uq4EHj/ZlfgHcD9w0999kWN9P49lCIUQ27QXlMTl1b+nxiaJ2E52BUz+2wghHFLKwMmux4lASnnoRF3rn/xcpZRHT3YdjoMXgOEJ28yXrhOE2QMyKTbRN0ohxP1CiF1CiMNCiI+EEGm6MjOFEMOFEIOEEAeA37Tt7YQQ84UQPiHEPiHEW0IIx3FcWwghHhVCbBZCeIUQK4UQNyWUOVMIsVi7xlLg7GKct50QYp4QIl8IcVSrY2Nt314hxLW6sr8JIfKEEDZtvZ72xlxFd+/v6MpfKYRYodX3kBBilhCigravmhDiW227RwixTghxXRH1jD77x4QQ2UC2tr2JEGKa7hofCyGyivtcDa5TU7unMxK2SyHEVbr1Z4QQ24UQfu05fZpYV936TCHEu0KIAUKIHCHEfu37YdGVqSCEmKTdx3YhRC8hxCohxHN/9F6KSZ6Ucm/CYhqgE4TZAzI5XtoCe4ALgWrAOGAD8IquzE3A+1pZoTXQk4HPgFuBOsCHQAT4XzGv+xJwFXAPsB44B/hACHFYSvmDECId+AGYBdwCVAHeLuqEmiH5FhgJ3AjYgZZAWCsyC+gAfKkZ2TOAPO3/eUB7YJOUcpfBuSsCY1EuufFABtBaV+RdwKWdPxdoUIxncD5wFOikLiHSgCnAQuAsoDTwATAK6FGM8/0hhBA9gIeB64GVQHni782IG4HBwLlAc+ALYDEwRtv/CVAJuADwAm8ANf7iqpv8wzANkMnxkgvcJaUMAWuFEF8BHYk3QFullIWGRQjxMspo3S2ljGjH9QfeE0I8LaX0FHVBzbg8BFwspZwdvYYQ4iyUQfoB1cA5gF7aG+wq7bqfFXHqEkBJ4Dsp5WZt2zrd/pnAA9rnNsAWYAHKaEQN0MwU566MMmhfSym3a9v0g/81gPFSyuXR+yminlF8wG1SSj+AEOIOlGG7WUqZp23rA8wQQtSVUm4qxjn/CDVQf8+fpJRBYAew6BjHrJFSPqN93qDVvSMwRgjRALgEOEdKOQ/U+A6w7W+oeyIvG/SyrpNSmkEUJwDTBWdyvKzRjE+U3ag3YD2LE9ZPA+ZqxifKHJTBqFuMazZC9RamaK6yfKEGn+9C9aai11iR4D6ZW9RJtTGbj4GpQogfhBAPCSGq6YrMBOoLISqjjM0MbVt7bf/5pDZAy4FpKEM4XghxlxCinG7/YOApIcRcIcRLQohWRdVVY1XU+GhE7zlPt+13VM+yUTHO90f5CvX32CqEGCmEuFoI4TzGMSsS1vXfm4aoOhcaMSnlTq3M382bqB6ZfplxAq5rgmmATBS5gNG4QUmUy0dPMGFdkvw9KkhYF1o5I4ojxx49fzfiG4rTgYt11zhupJS9UGNFvwKXod7OL9H2rQX2oQxOe1TDNANoI4RohHLzzUxx3rBWt4tRje/twEYhRDNt/0igFvARUB/4vRjjHX/1czUi+pJQ+DyFEPa4Eyvj0AC4E/XdeQNYrPVUU1HU9+YP/e3+Ig5KKTclLInP2e
RvwjRAJqDGVFoKIRIbgpbavj/LGuAc/aAzcB4QADYbH5J0vB+oYdBYbNeVaZLQCB5rXAIAKeVyKeVrUsr2KINyi273LKALatxnlpRyG5ADPEqK8R/deaWUcq6U8nngTNQb/bW6/dlSyvellNcAzwB9ilNfHWuAZkKITN22c1G/67XHea4oB7T/K+m2NU8sJKX0SSl/kFI+iLq301Fuyj/CWlSdC3uBQoiqKDemyX8YcwzIBFQYaj9gqBDiA9RYw6WoQebL/4Lzv4saS3lXCDEYqA28CrxzrPEfACllnhBiEDBIM5K/EhvUj0gp30cNar8MjBJCvIBqvJ4s6rxCiFqot/hJwC6tXk2JD8udCQwF1kkp92vbZqECLT4q4tytUYEaU1G9qBaooI012v7BqMCMDaixqE7RfcfBaOB54FMhxDNAKeA9YMIfHf+RUnqFEPOAx4QQm1E9Y/34XnR8xgbMR4UsX4vq4Wz8g9dcL4SYCowQQtyF+v4NBDwcuydnEUI0T9gWklJGx9tKGOw/or1IAGRqASN6vP+yUPJ/LaYBMkFKuUUI0Q4VafYTyr+/DrhaSvnjX3D+XUKIzqhGZRlwBGUwnjiO0zyNasgfRhmIXO1cr2vXyBdCdNX2LdHq/xjKuKTCg3J/fQWU1c4/GnhNV2YGYCXe1TYD6Enq8R9Qrss2wL0oV+ZO4EUp5efafgvKsFVDRdZNp/gRgQBIKT2au/BtVHCEDxXVd//xnMeA21BRigtRPdS7UUY/yhHUsx2ECrRYA1wppSxOIEUqbkVF8M0E9qN6hLVR91QUbmBpwraDqL8nqEjMxP3jURGVaNd5JmH/aE7ARFoTEGZGVBMTk38aQoiyKJfl9VLK8Se7PiZ/D2YPyMTE5KQjhLgAyCQ2r+hl1FjblJNZL5O/F9MAmZiY/BOwo1zAtVGu0flAOzMi7b+N6YIzMTExMTkpmGHYJiYmJiYnBdMAmZiYmJicFP7zY0Bly5aVNWvWPNnVOC4KCgpITy9qUvl/h1PpXuHUul/zXv+7LF68OEdKWe7YJYvmP2+AatasyaJFx9JJ/Gcxc+ZM2rdvf7KrcUI4le4VTq37Ne/1v4sQYvuxSx0b0wVnYmJiYnJSMA2QiYmJiclJwTRAJiYmJiYnBdMAmZiYmJicFEwDZGJiYmJyUjANkImJiYnJScE0QCYmJiYmJwXTAJmYmJiYnBROigESQnQSQqwXQmwSQvQ32C+EEEO0/SuEEC11+x4UQqwWQqwSQowRQrhObO1NTExMTP4KTrgBEkJYgWFAZ6ARcL0QolFCsc5APW3pg5YiWQhRBbgPOENK2RiVqfK6E1R1ExMTE5O/kJPRAzoL2CSl3CKlDABjgcsTylwOfCoV84CSQohK2j4b4BZC2IA0VNZEExMTE5N/GSfDAFUBdurWs7VtxywjpdyFykO/A9gDHJVS/vQ31tXE5NRg10ZY8jMc3HOya2JyCnEyxEiFwbbErHiGZYQQpVC9o1rAEeArIcRNUsrP4w4Wog/KdUeFChWYOXPmn63zCSU/P/9fV+c/yql0r3CS7zfvEBzaB+EQuDOgfHUQAvZsAm+B+rxkJZQoq/b9SU6lv+2pdK9/JSfDAGUD1XTrVUl2o6UqcyGwVUp5AEAIMQE4F4gzQFLK94H3Ac444wz5b1OpPZWUdU+le4WTeL+v9oYpH4MMq3UBWCxwRgdYOweC/lhZuwMu7Qt931ZG6Q9yKv1tT6V7/Ss5GS64hUA9IUQtIYQDFUQwKaHMJKCnFg3XGuVq24NyvbUWQqQJIQTQEVh7IitvYnJCyTkAm9ZDKARSwryZ8MaTMPJN2F8Md1nOHrjnfPhuJATDEEH96gUgI7BwOvj8ygchtCUUgO+GwtUlYdB1sGqmuraJyV/MCe8BSSlDQoh+wFRUFNsoKeVqIURfbf8I4EfgUmAT4AF6afvmCyG+BpYAIWApWk/HxOQ/xZHDcPeNMOcXsNnB4YA6tWDHOvAUgNMJbz
4Nw76G9p2NzxEOw11tYI8udYuNZAd3WFtsgF3bLyV4cmHGl7D4e+h4O/Qe/DfcqMmpzElJSCel/BFlZPTbRug+S+C
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – this cell generates and saves Figure 8– 10\n",
2021-11-21 06:04:07 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"plt.scatter(X_unrolled[:, 0], X_unrolled[:, 1],\n",
" c=t, cmap=darker_hot)\n",
"plt.xlabel(\"$z_1$\")\n",
"plt.ylabel(\"$z_2$\", rotation=0)\n",
"plt.axis([-0.055, 0.060, -0.070, 0.090])\n",
2019-01-15 05:36:29 +01:00
"plt.grid(True)\n",
"\n",
"save_fig(\"lle_unrolling_plot\")\n",
2021-11-27 11:03:26 +01:00
"plt.title(\"Unrolled swiss roll using LLE\")\n",
2019-01-15 05:36:29 +01:00
"plt.show()"
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 45,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXsAAAEdCAYAAADtk8dMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAA1uElEQVR4nO3deZxcVZ338c+veu+ELCQhLEECKDiALKElKFtiEMIiKIIIOmzDIowsI8iOKKgjwgzi4zIPKioSCIIwIvCoIIQENECCJCJbwpaEJSQhW6+1neePW6frVnVVb1XV1dX1fb9e9eqqe2/dOocOvzp9lt8x5xwiIjKyRcpdABERKT0FexGRKqBgLyJSBRTsRUSqgIK9iEgVULAXEakCCvYiIlVAwV5EpAoo2IsMITO7ycz+WO5ySPVRsBcZWh8Hnil3IaT6KNjLiGNml5qZy/G4bgD3OMfMVptZbdbxO83s96nnB5vZQjNrNbONZva0me2R5351ZhYFDgauSZXnn4XUU2QgFOxlJPopsE3o8V/Ae8DtA7jHb4FxwKH+gJmNAo4F7kh9CfweeBLYC5gO3AIk8twvAXwi9Xx6qlwHDqA8IgWp7fsSkcrinNsMbAYws8uAk4AZzrnlZvYAcBDwF+fc8b3cY72ZPQx8CfB97J8D4sAfgDEEXwZ/cM69ljr/ci/3S5rZNqlyPeuUgVCGmFr2MmKZ2RXABcBM59wrqcM3A6f08xZ3AJ81s+bU6y8B9zrnOp1zHwC/Av5kZg+Z2dfMbPs+7rcPsESBXspBwV5GJDO7CjgPOMQ596o/7px7nFSrvx8eJGjJH2tmWxF06dwRutfpBF0y84FjgFfN7PBe7rc38PcBVEOkaBTsZcQxs2uAc0h13Qz2Ps65LuBeghb9iQT9/k9kXbPEOXeDc24GMA84tZdb7gUsHWx5RAqhPnsZUVIt+gsJWtptZrZ16tQG51znIG55B/AosCNwp3MumfqcHQm+UB4A3gZ2AvYkGBzOpxb4qJltC7Q75zYMojwig6KWvYwYZmbApcAE4Cng3dDjgEHedj5BMN+NUBcO0A7sAtwDvAr8GpgD3NDLva4CvgisAv5zkOURGRTTWJFUGzObAXy1t9k4IiONgr1UFTN7lKDvfBTwAXCCc+5v5S2VSOkp2IuIVAH12YuIVAEFexGRKqBgLyJSBYbVPPuJEye6qVOnlrsYA9bW1saoUaPKXYySUf0qm+pX2fpTv8WLF691zk3q7ZphFeynTp3KokWLyl2MAZs3bx4zZswodzFKRvWrbKpfZetP/czsrb7uo24cEZEqoGAvIlIFFOxFRKqAgr2ISBUYVgO0IiIjXxx4iSBh6iaCbYmnE2x8VrqQrGAvIlJyMeBPwJ3A60A4Tc08wIBm4N+Ak1Ovi0vBXkSkZF4EfggsJh3A/U8Xeu2ANuBWoBH4fNFLUnCfvZndZmbvm9kLOc5dYmbOzCYW+jkiIpXluwR73f8NiAKdoUcydU12C74TuK0kpSnGAO2vgNnZB1ObL38aWFGEzxARqSB3AncTtNjDLXhSr6O9vPeDkpSo4GDvnJtP7tLdTLBrkHIoi0gVWAWcT7Cv/A30DPLh5731yX+o6CWDEvXZm9kxwNvOuSXBTnEiIiPRO8DPCLYpfid0vJEg2OcKsb6PPknQ3s6OkRcWv5gUafMSM5sKPOic28PMmoHHgcOccxvN7E2gxTm3Ns97zwbOBpg8efK+c+fOLbg8Q621tZXRo0eXuxglo/pVNtWvFLqAteTu1Ai33ntr7OZq8Y8Bts24qj/1mzlz5mLnXEtv15SiZb8zsCPgW/VTgOfMbD/n3HvZFzvnbiUYgqalpcVVYkIjJWKqbKpfZRv6+l1FsL+851vqXk3qYQS7X2YHfN+qryXdk14DHA58k+ywXKz6FT3YO+f+AWzlX/fVshcRqRxzgV8CidTrOnoOS/pzNQSzaxqzzvtB2zjQBBwGnAdsV4Lyph
Uc7M3sLmAGMNHMVgHXOud+Ueh9RUSGl4eAfycdrI0goPvWeZIggPu++MbUNW2kW/Ex4KPABQSTFYduTLPgYO+cO6mP81ML/QwRkfJZB1wC3J913AHtQD1BKA1Pp0ymzjUTtP7jpL8gbge2LG2Rc9AKWhGRnF4HTgWeJwjS+WaqR8k/w7yDzDB7DOUI9KBgLyIS4oCnCfLVXENmEK8FGsjd9ZIv2PvB2BqCL46ri1XQAVOwFxFx7xKkN7iNYFplKnhnrBOKp35mD7j2ZhTwDDCJoeyfz0XBXkSqm3sN+DiwgSDIR7KCfFi47737BgRdOXVZx5sIlhBtxXCgzUtEpHq59yFxGiQ3AA6cQdJBb4tNXTL8gqCbJhp6DcGg7cnAZcUu8aCpZS8i1cc5iH0NEj8i3T0D4ILGeR2pTAZZLXwHuHjqDwAjPdUSgjeeC3yJYM78FqWswYAp2ItI9XBJSDwKsSvAPZd/rDXmoMGCLwUf8J2DhINkJ9REwKXCp9UBCwgSoA3fXGAK9iJSHRKvQNs0iLTnzj8W5oCuJNRacK0DksnQpJtk6jEbmEOQ02Z4U7AXkZHLOUgshsR6aD8SLNVl05/RSt/C9xG+JnTOJgJ/AduzqMUtJQV7ERmZ4i/CpsPAvZ15PEE600F/e10yElQeBPY4WE3ey4cjzcYRkZElmYS2n8AHH4P425mbRXmx1M/s47muhVSwb4LIVVA3v+ICPahlLyIjSexVWL8XQbbJkOwWvBHMlgxnGfZ67CnSADX3Q+0RxS7tkFKwF5GRIRmFddOC2TI+gOfrpvGt93jWdUmCVnvDMRCZDJFdoPZksMmlKvWQUbAXkcrX8RCsOYb0nPcQn5Qye9Fr9mv/qNkBGm8DG1eq0paF+uxFpHLF10FyE6z+TNBXnySz390RpLpJkL9/PiPQHwFbLB1xgR4U7EWkEsXfhpUt8OZEiC0jnbiMnj+NoAs/+4sgSfAlkASSBo3Xw5iHwEYNTR2GmLpxRKSyuCSs/DjE3w1e5xp8zaWdoHlbRxD5khFoPAwaT4X648DqS1XiYUHBXkQqR/w9WH0xdL07uPcnCaZdxhpgzNdh7PXFLN2wpmAvIsNffC1svAPevxysK/91vosm1+Brtwao3R7GXFz0Yg5nCvYiMnwlo/D2mbDxt+BSQd7vEOj748P7f3su67mfO99wAIz+Mow6BSLNQ1CB4UPBXkSGr5Wnwwdzg356n+IA0jv9Qc+We85ZNsCEX8IWpw1JsYcjBXsRGT46l8Gmv0BiE6y9FaKvpc/5FnpdjvdFyb8neO12sM3D0Fg5SctKQcFeRMpvw5/h9ZMg/kHwOiPxWNbrBEFLPXvevM86bEBNHTR+AsZdAKOP62WbweqhYC8i5bV5ASw7Gojl30wktTUskO7C8blt/DU+m6U1wS7rINJU2nJXGAV7ERlarc/BxiehfjKMmQkvHQaxWHqQNVdyMsgchPVZEfz+312p9zT9C0ydp0Cfg4K9iAyNZAyWfgpanw42FYk0QLIt8xpH0GKvp/f1/T2mWNbCzs/CqL2LW+YRRMFeREqv/XV4bj9Irgu1zuM9p0x6cYKAHxbOd+O7dWq2hJpxsNtKqN+6FCUfMZQbR0RKa9X/wDO7ZgZ6L9+4aY7klZlTLJth3Gmw5zqo31mBvh/UsheR4nMOEp2w/Bvw5k3BsQjQkPqZJOiuaaDvnPM+xUF3XpsJsPXXYZtLSlf+EUjBXkSKxyXh+RNh9e/oMTcyCXQAjaQ3koqTP9e8H3wNvz8+BlregcjITlpWCgr2IlI8i4+Gtf8veJ6vPz6c2sYvhgpv6RpLvY6QXkhlBAd3/4MC/SAp2ItIcSQ6+w70/lxYF+mFU76v3gf4iAWzdsYdBrvcBvUTil7saqFgLyKD5xx0rYZII3S904/r6X3hVFjjdJi+sPAyClCEYG9mtwFHA+875/ZIHbsR+AzBH2mvAac75zYU+lkiMoys/jMsPhO63gOS0Lh9ekqkzz
KZK7D7dAe9iWwB+zxc3PJWuWJMvfwVMDvr2CPAHs65PYFXgSuK8DkiMhy4JCw6C+YfDm0rIR6DeALa3kxv/we5s0/
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – shows how well correlated z1 is to t: LLE worked fine\n",
2021-11-19 06:03:48 +01:00
"plt.title(\"$z_1$ vs $t$\")\n",
"plt.scatter(X_unrolled[:, 0], t, c=t, cmap=darker_hot)\n",
"plt.xlabel(\"$z_1$\")\n",
"plt.ylabel(\"$t$\", rotation=0)\n",
"plt.grid(True)\n",
"plt.show()"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 46,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
"from sklearn.manifold import MDS\n",
"\n",
"mds = MDS(n_components=2, random_state=42)\n",
2021-11-19 06:03:48 +01:00
"X_reduced_mds = mds.fit_transform(X_swiss)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 47,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
"from sklearn.manifold import Isomap\n",
"\n",
"isomap = Isomap(n_components=2)\n",
2021-11-19 06:03:48 +01:00
"X_reduced_isomap = isomap.fit_transform(X_swiss)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 48,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
"from sklearn.manifold import TSNE\n",
"\n",
2021-11-19 06:03:48 +01:00
"tsne = TSNE(n_components=2, init=\"random\", learning_rate=\"auto\",\n",
" random_state=42)\n",
"X_reduced_tsne = tsne.fit_transform(X_swiss)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 49,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAwgAAAEQCAYAAAAUH1PIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOydd5jU1N6A35OpW+gdFgFBqgUF7Cj2goJ6bdfee7/Yrn4o9oIVC6JeEcUCKqggFoRFVLCggFRBei/bd3ZqzvfHSWYyM5nZXXrJ+zyzO5OcJCczycn5dSGlxMHBwcHBwcHBwcHBAUDb2R1wcHBwcHBwcHBwcNh1cAQEBwcHBwcHBwcHB4c4joDg4ODg4ODg4ODg4BDHERAcHBwcHBwcHBwcHOI4AoKDg4ODg4ODg4ODQxxHQHBwcHBwcHBwcHBwiOMICA4ODg4ODg4ODg4OcRwBwaFGCCGGCyGkEOItm3XPGOvGGZ8fNj5LIURUCFEkhPhZCHG/ECI/ZdsmQojXhBDLhBAhIcR6IcT3QoiTdtS5OTjsrRj39bid3Q8HB4ddAyFEoRDilRq0u0YI8acQokIIUSqEmC2EeMyy/gpjDjDRZlsphDjX8nmZZc5gfT217c7Moba4d3YHHHYrVgIXCCFul1JWAggh3MClwIqUtguBPoAAGgJHA/cDVwkheksp1xntPgVygauBxUBT4Fig0fY9FQcHBwcHB4faIoS4CngZuBP4HvAC3YAjUprGgGOFEKdIKb+pZrePAK+nLKvYBt112EIcC4JDbZgNLALOtyzrCwSBwpS2USnlOinlWinlXCnlG6jBoyHwNIAQoj7QG7hPSvm9lHK5lPI3KeVgKeVH2/lcHBwcLAghDjCsd2VCiHIhxCwhxHGW9ccIIX4RQgQNS98LQgivZX2hEOJ1IcRzhtVwoxDidiGETwjxqhCiRAixQghxacpxnxJCLBRCVBmaxGeEEH7L+oeFEHMMjeUKo91YIUTjHfPNODjsHQghhqMUdDdbtPhtbZr2Az6TUr4hpVwspZwnpRwtpbwrpV0QGAY8LYSobr5ZbswZrC9HQNiJOAKCQ215G7jK8vkq4B1AVrehlHItMBI4yxgsKoxXP+uEwMHBYafwAbAWOBQ4GHgY9YBHCNEKmAD8aay7Gvg38GTKPi4GyoHDgKeAF4GxwN9AT+Bd4C0hREvLNpWocaQLcBNwIfBAyn7bApcA/YETgf2A/23FuTo4OKRzOzAN9UxvYbxW2rRbBxwqhNi3BvscBLRHjQ0OuxGOgOBQWz4Aegoh9hNCNAdOBYbXYvt5QF2gsZQyClyBevCXCCGmCSEGCyEO28Z9dnBwqJ42wHdSygWGVnCMlHKase4mlPBwk5RyvpRyHHAfcIsQIteyj7lSyoellIuA54FNQERK+ZKUcjHKjUAAR5obSCkflVL+JKVcJqX8CngCJXxYyQEuk1L+KaX8CbgeOFMIsd82/xYcHPZSpJSlQBgIWLT4MZumg4DNwD9CiEVCiPeFEJcJITw2+9wADAYeFUL4shz+cSOewfo6Y1ucl8OW4QgIDrVCSlkMjEFp/C4HCqWUqfEH2RDmroz9fQq0BM5EaSiPBKYLIf67zTrt4OBQE55HafcnCSEeEEJ0tqzrAkyTUuqWZT+ifI87WJbNNt9IKSWwAfjLsiwCFKNijQAQQpwrhPhRCLFOCFEBvADsk9K31SnjzC+AbvTLwcFhOyGEmGuZsE8A5Q0gpTwCOABlJRTAG8CvKQoDk+cAP3BzlkM9D3RPeU3eJifhsEU4AoLDlvA/4DKUkFBbM39XoAylfQBAShmUUn4npXxESnkkyo3pYat/s4ODw/ZFSvkw6v4cixLUZxvBiKAmAJncCK3LIzbr7JZpAEKIw4GPgG9QSoKDgQeBNE2kg4PDTuF0EhP2a6wrpJRzpJSvSikvBk4y2pyfsj1GLMEjwANG7KEdmw3LpfVVuc3OwqHWOAKCw5bwPc
oM2Rg1magRQogWwEWo4CY9S9N5qAxbTlyCg8MOREq5SEr5spSyL0pQNycE84AjUgINj0aNA/9sxSGPQlkHHjUSFCxCuTql0koI0dry+VDU82v+VhzbwcEhnTDgMj8YyUPMCfvqLNvNM/7nZ1g/DKUYvG/bdNNhe+OkOXWoNVJKKYQ4EBBSylCGZm4jRsFMc3oU8F+gCJXuFCFEI2A0ygoxGxXc2BO4B/heSlm2XU/EwcEBACFEDspPeDSwDGiGEgB+MZq8BtwBvCaEeAnYFxWE/IqUMrAVh/4bNfm/GBUceQrp8QcAVcC7Qoi7UPEIQ4HxhkDh4OCw7ViGCkBui0oiUpSq0BNCvA6sASYBq1DBzA8CAeBbu51KKaOG6/CIDMetY8wZrFQZcREOOwHHguCwRUgpy6uZwHdCBTWuQvkqX4nSIBxiqYFQAUxHZU6YAsxFBSh+AFywnbru4OCQTgxogMoytBAVZzQNuAvA0ByehnIBmokS6j9ECf1bjJTyS+BZlB/zbJSbwkCbpstQrkhfoiYlS1BjioODw7ZlMMqKMA/YSHo8EMB3qExlo1BC/hhj+UlSyr8z7VhK+QmWOKUUBqLmDNbXq1vQf4dthFBxZA4ODg4ODrseQoiHgXOllPvv7L44ODg47C04FgQHBwcHBwcHBwcHhziOgODg4ODg4ODg4ODgEMdxMXJwcHBwcHBwcHBwiONYEBwcHBwcHBwcHBwc4jgCgoODg4ODg4ODg4NDnN26DkLjxo1l27Ztd3Y3akxlZSV5eXk7uxvblb3hHGHPO88ZM2ZsklI22dn9qClbc+/vqr+d06/a4fSrdmTr1+50/+9uz32TXfW6qC17wnnsCecA2+Y8st37u7WA0LZtW37//fed3Y0aU1hYSJ8+fXZ2N7Yre8M5wp53nkKI5Tu7D7Vha+79XfW3c/pVO5x+1Y5s/dqd7v/d7blvsqteF7VlTziPPeEcYNucR7Z733ExcnBwcHBwcHBwcHCI4wgIDg4ODg4ODg4ODg5xHAHBwcHBwcHBwcHBwSGOIyA4bDl6GKLl9uvC6yGyecf2x8HBwWFXQUqIrga9dGf3xGGPRQcCgLWeVTmwMWWZg0Pt2a2DlB12EtFKmH8rrP0AZAxy20O3N6DhsaBXwYz9oWoxICG/J3T+APxtdnavHRwcHLYv4b8htgE23QHB0SCLQOqQcwo0ehe0Bju7hw57BCuAt4AxKAGhPnAk8CsQBQTQALgTOM347OBQO3a4BUEI8T8hxAYhxBzLsoeFEKuFEDON1+k7ul8OtWDmebD2Q9BDIKNQuRB+Px1KpkHVQgjMBRkCGYbyX2DW0aqdg4ODw57K5vth1UEQXQWlL0FoDcSCQBiqvoENZ+zsHjrs9mwE+gKHAq8Ba4EqoBL4CYgY7TSgFBgE9AMW7vCeOuz+7AwXo+HAqTbLX5BSdjdeX+3gPjnUlMBSKJoMejB5uQzBgjtJN2vGIFYKRRN2VA8dHBwcdixVP0Lpi0AQhEw8WWMYQ2IYIjMhMm8nddBhz+BsElYCkzDqQgNlKXAZ/02rwQbgBpQg4eBQc3a4gCCl/AEo2tHHddhGVC0FzZe+XMagarkyp6eti0AoJdWuHobginRBw2GPxbEeOuyxlL4CIph4oprzNA2LzsQD0d2m3IDDLkUIOAtYgH1sQaXxP5MrUQDlkuTgUHN2pSDlW4QQs41JhOOouauS39V+Ui+8UPdgEHaXlAvye6m3UsKKx2F6I5jRBaY1gqX/tRcsHPY0huNYDx32RMI/2y+3ztdkCLzdd0RvHPY47gWmkjnw2GpBsEMCI4D127hfDnsyu0qQ8uvAo6ir+FHgOeAqu4ZCiOuA6wCaNWtGYWHhDuri1lNRUbFb9Tcj+hsQLkJlUDAQLijtSkWslMLS54gPZEIDLR/+qAIKIbIRwrkgH05sW6LBolHgbb7DTmFr2WN+yx2IlPIHIUTbnd0PB4dtTmR1YjiUqPmay7Je5ELeFeBqsc
O75rAbIoOouIGmIBqgJvcBIC/DBm5U/IFpurIjBDwCvLpt++qwx7JLCAhSyrhYK4R4ExiXpe0wYBhAz5495e5ULnt
"text/plain": [
"<Figure size 792x288 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2022-02-19 06:17:36 +01:00
"# extra code – this cell generates and saves Figure 8– 11\n",
2021-11-21 06:04:07 +01:00
"\n",
2019-01-15 05:36:29 +01:00
"titles = [\"MDS\", \"Isomap\", \"t-SNE\"]\n",
"\n",
2022-02-19 06:17:36 +01:00
"plt.figure(figsize=(11, 4))\n",
2019-01-15 05:36:29 +01:00
"\n",
"for subplot, title, X_reduced in zip((131, 132, 133), titles,\n",
" (X_reduced_mds, X_reduced_isomap, X_reduced_tsne)):\n",
" plt.subplot(subplot)\n",
2021-11-19 06:03:48 +01:00
" plt.title(title)\n",
" plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap=darker_hot)\n",
" plt.xlabel(\"$z_1$\")\n",
2019-01-15 05:36:29 +01:00
" if subplot == 131:\n",
2021-11-19 06:03:48 +01:00
" plt.ylabel(\"$z_2$\", rotation=0)\n",
2019-01-15 05:36:29 +01:00
" plt.grid(True)\n",
"\n",
"save_fig(\"other_dim_reduction_plot\")\n",
"plt.show()"
]
},
2021-11-19 06:03:48 +01:00
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extra Material – Kernel PCA"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 50,
2021-11-19 06:03:48 +01:00
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import KernelPCA\n",
"\n",
"rbf_pca = KernelPCA(n_components=2, kernel=\"rbf\", gamma=0.04, random_state=42)\n",
"X_reduced = rbf_pca.fit_transform(X_swiss)"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 51,
2021-11-19 06:03:48 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAq4AAAEDCAYAAAD0lYarAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOydd5jURBvAf7PtOkfvHQHFgopdUVRAULE37BV7R/3svWMXsSs2UBQVOwqcoBRBmvTe+8H12z7fH5PcZrPJ3t5R7oD8nifP3SaTyWQ3k3nnnbcIKSUODg4ODg4ODg4OtR1XTTfAwcHBwcHBwcHBIRUcwdXBwcHBwcHBwWG3wBFcHRwcHBwcHBwcdgscwdXBwcHBwcHBwWG3wBFcHRwcHBwcHBwcdgscwdXBwcHBwcHBwWG3wBFcHRwcHBwcHBwcdgscwdXBwcHBwcHBwWG3wBFcaxlCiBVCiIE13Q47hBAfCyF+rOl2WCGE6CGEkEKIhjXdFoeaRwiRJ4R4s6bbYUdt7ksO209t+n1TaYsQ4kchxMeVlHH6lEON4wiuu5gUOtbhwFu7qj0ODjsT7XmX2hYWQqwSQgwRQtSzKSOFEFu0QXTfJHUZt4N3+Y3tBQghbhJCLBdC+IUQ/wohuu+IcyorI4S4XwgxVQhRJITYLIT4QQhxwI68tx2BEKKREOItTdkQEEJsFEKMEUL00orcDlxak200UJvaslfi9KdEhBDHCyFGCSHWau/yK1M5zxFcaxlSys1SyrKabocQwreLruMSQrh3xbUcaow/gGZAW+BaoB+JkzO9TDOgN5ABfJukLuM2Z2c02opd1S9qGiHEhcBrwDPAIcBE4BchROvtOSfFenugno9jgJOAMPCHEKL+jrq/HcQ3wBHANUAn4HTgF6ABgJSyUEpZUGOtM1Cb2mLE6U+7Z38SQniqe66JbNT7+3agPOWzpJTOtgs34GPgxyTHVwADDZ8lMAAYAZQCy4BLTee0AIYD27TtJ6Cj4XgH4Htgg1bHdOB0i+s+BnwIFAAjUmk/0BVYDzytfc4F3gU2AcXAn8BhhvJXAiXAqdoDGwYO0K7/EPAOUASsAe4xXbuyunto31fDmv6dnc3+eQdeAvIrKXO69ltmJCtXybXzgDcNn0/Wnu3rtc8CuBdYinpp/mfRt/KAIcAgYDMw1bD/LdSAsUV7JgcBLsO5Seuv6v1o58wHpgLZpv2jgXd24O82BXjPtG8x8Oz2nFPNerOBCNCvpp9nQ5vqas9nz1SffSAL+ER7/20E7gd+BD62eN5eArZqz9ztQBowWHt+VwGXma6VBryq1esHJgPHJWlLprZPb8sD5rbsjD61M/tTdfqU059SantL7Vm/CBirPV9X74Q+VQJcmUpZR+O6e/AISvDsCnwJfCiEaAMghMgExqEephOAo1GC5B/aMVAP6i9AL62Ob4CR5qVY4C5gAXAY6kWWFG1JYhzwgpTyQSGEQAnNLVCCxyHAeGCsEKKZ4dR0lJB6PdAFWKntvxP1MjoUeB54QQhxtHatVOt2qMUIIdoDfYBQkjI5wIXAf1LK1Gfhya97LkqDO0BK+Y62+ymUtuxm1HP4LPCOEOI00+mXogbN7sDlhv2XoCZexwC3AHdo7dZJtf6qcCFqotfDcG/9UJq/h40FhRAPCCFKKtmslh59QDfU4G1ktHavCaRyTnXq1chBrQ5uS1JmV1OibWcIIdJTPOcl1Dv6bJTmqyvqmTJzCWpifiTwHEog/Q5YhHo3DwXeF0I0N5zzAurZuBr1bvwP+DXJu3EQajw4FyV8HgIcn+J9ANvVp5z+pNhd+tPB2t/7UM/N/ih5xNjean031WZHS83OVums4mOqrnE1zrA8QBnaTBP1oloMCEMZN5APXJDkOpOBh0zX/SHV9qOExyLgcsOxk1Av8wzTOTOBe7X/r9TuqZvFfQ8z7VustzHFunvgaFxr1aY9L2HttyvXfh8J3GlTpkQ7vgo4IE
ld+vZLkmvnAW+iViwKgd6GY1lae7qbznkV+NlUx2ybuieZ9v0OvJ9q/VRD46qd9xfaOwLwoQSauy3K1Qf2qWTLsDivufYbHG/a/wiw0KZNlZ5TnXq1418BMwB3TT/Ppnadi9KK+oFJqEH9SNPz+qP2fzYQBC4yPYPbSNS4TjJ8FijN5CjDPq9W13mGeoLEv4vdKM3kUzZtCQCXGMpnozSnH1dyz3lsR59iJ/an6vYpnP5U2ffzoPbd75OkTLW+G1MdKWtcd5SdgsPOZbb+j5QyLITYDDTWdnUD2gHFSilZQSbKRAAhRBbwKErYbIZ68aUb69WYlmJ7uqFm2hdLKUeY9mcCm01tSdfbohFGCZxmzO1ZR/x9plK3Q+1jPGqgywCuQ/1er9uUAfUSvAkYLYQ4Ukq52qYcVG4XdSZKs3+8lHKSYX8X1LPzqxBCGvZ7UZMoI//a1J3sea1K/VVlIdBZ+/827e8b5kJSyq0owaq6SNNnYbGvOuekXK8Q4mXgONSyd6SSa+9SpJTfCCF+QmkOj0atJNwthHhQSvmMqXgH1G//j+H8UiGElX228X0vhRCbUBpUfV9ICLGN2LOm1/23oUxECDEJ9Rya6YAS0CYZypcIIf6zKGvF9vYppz8pdpf+dDBqcrDEtpHb/91UCUdw3T0wL6tKYo51LpQQeJHFefqDNAj1Uh2I0mKWoWytzIbxpSm2ZznKBulqIcQoKWXA0JaNWC9/FRn+D9h0msruM5W6HWofZYaX3m1CiHGoZbjHbMoghPgXpdEZQPySXVmyF6gFs1HP0TVCiMlSm9oTe676obS7RszPoV2/qOx5TbX+qrIQOF0I0Rj13VwqpQyaCwkhHqByk5++UsoJpn1bUDZwTU37G6P6oBWpnFOleoUQr6DeaydKKZcluYcaQ0rpR2kGfweeEEK8DzwmhBhkKqrPtisTVMD6uUr2rCWr22qfsNhXFba3Tzn9SbG79KeuKC27Ldvx3VQLR3Dd/ZkO9Ae2SHuv0eOAT6SU3wBoNlkdUEsi1WErcAYwBvhWCHG2JrxOB5oA0Z0w0OzMuh12LY+jPF/flVKusykjgShKy749LAduRS1FviuEGKANtPNQy6VtpJRjt/MaVuzM+hcCdwNPA/9IKX+wKfc2alkwGWvNO6SUQW3i0AvlFKrTC2Ufn0Aq51SlXiHEa6hBtoeUckEl91CbmIcaV812r0tQAtYRqGdS9084ALWkvz0sQZkKHIdy3kWL1HI08IVN+RBwlKF8VhXaUhN9yulPpnN2RX/SnosOqPE3GdX6bqqLI7jWDHVEYuzJAinlimrU9TlKk/q9EOIR1Gy0FWo5520p5WKUgHq2EOJ71AvrURJfrFVCSrlFCHEyystwpBDiHFSoor+1ttyLcvRqitL2/rGds62dWbfDLkRKmSeEmIty0LtJ250mhNA1B/VQzhnZgN0gUpXrLRNCnEj8QFusacUGaY5/47XrHYWaHL27ndfcmfUvQmlVLkc5Mtq1YXuW714GPhVC/IPqdzegbOreBhBC3ALcIqXcN9VzUi0jhBgMXAacBWwzPBclUsqSat7PDkUI0QAlLHyI0kAWoxyn7gXGSCmLjCZN2lL8h8DzQogtKAfah1CaxFS0sLZoJgdDgOe0upejHF2bYBETXGvLB1pbNqOW5B9B2cWmes1d2qf29P4Eln2qNvSng7S/M5MVqu53I4TIRtm/guoLrTXZaKuU0qxZr8ARXGuG7ijjaCPfAOdVtSIpZZkQ4niU9+kIVMiodShv/21asbuAD4AJ2r5X2U7BVbv2FiHESSjh9RuUs8KpKO/P94gtWfyNMk3YnmtJIcROqduhRngZ+EgI8bz2uSdqMAclBCwAzpdS5u2Ii0kplwoheqAG2neEENejlgU3oiZ+Q1AmJzNRHto7girVL1Tw7Y+AdpVMYpeglgjfk1LO3UFtjUNK+aUmnD1ELFbuqVJKPQJIQ2J2gamek1IZYpOZMaZmPU68eUlNUo
JycL0dNfCmoTRKX6DeUVYMRDkZjdLOfwUlXPp3QHvu0/5+hArVNQPoI6Vcb1Neb8u3KNOxN7TPKVMDfWpP7k9g6lO
"text/plain": [
"<Figure size 792x252 with 3 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2021-11-19 06:03:48 +01:00
"source": [
"lin_pca = KernelPCA(kernel=\"linear\")\n",
"rbf_pca = KernelPCA(kernel=\"rbf\", gamma=0.002)\n",
"sig_pca = KernelPCA(kernel=\"sigmoid\", gamma=0.002, coef0=1)\n",
"\n",
"kernel_pcas = ((lin_pca, \"Linear kernel\"),\n",
" (rbf_pca, rf\"RBF kernel, $\\gamma={rbf_pca.gamma}$\"),\n",
" (sig_pca, rf\"Sigmoid kernel, $\\gamma={sig_pca.gamma}, r={sig_pca.coef0}$\"))\n",
"\n",
"plt.figure(figsize=(11, 3.5))\n",
"for idx, (kpca, title) in enumerate(kernel_pcas):\n",
" kpca.n_components = 2\n",
" kpca.random_state = 42\n",
" X_reduced = kpca.fit_transform(X_swiss)\n",
"\n",
" plt.subplot(1, 3, idx + 1)\n",
" plt.title(title)\n",
" plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=t, cmap=darker_hot)\n",
" plt.xlabel(\"$z_1$\")\n",
" if idx == 0:\n",
" plt.ylabel(\"$z_2$\", rotation=0)\n",
" plt.grid()\n",
"\n",
"plt.show()"
]
},
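{
"cell_type": "markdown",
"metadata": {},
"source": [
"The plots above compare the kernels by eye. As a rough sketch (not part of the book's code), one could also compare them numerically through the pre-image reconstruction error, assuming `KernelPCA` is created with `fit_inverse_transform=True`. The names `X_demo` and `kernel_errors`, the 500-sample Swiss roll and the choice of kernels are assumptions made purely for illustration:\n",
"\n",
"```python\n",
"from sklearn.datasets import make_swiss_roll\n",
"from sklearn.decomposition import KernelPCA\n",
"from sklearn.metrics import mean_squared_error\n",
"\n",
"# A small Swiss roll keeps the kernel matrices cheap to compute\n",
"X_demo, _ = make_swiss_roll(n_samples=500, noise=0.2, random_state=42)\n",
"# Center the data so that no kernel is penalized for failing to restore the mean\n",
"X_demo = X_demo - X_demo.mean(axis=0)\n",
"\n",
"kernel_errors = {}\n",
"for name, params in ((\"linear\", dict()), (\"rbf\", dict(gamma=0.04))):\n",
"    kpca = KernelPCA(n_components=2, kernel=name, random_state=42,\n",
"                     fit_inverse_transform=True, **params)\n",
"    X_2d = kpca.fit_transform(X_demo)\n",
"    X_preimage = kpca.inverse_transform(X_2d)  # approximate pre-image\n",
"    kernel_errors[name] = mean_squared_error(X_demo, X_preimage)\n",
"\n",
"print(kernel_errors)\n",
"```\n",
"\n",
"A lower error only suggests that a kernel preserved more information on this particular sample; the pre-image is itself an approximation, so treat the numbers as a rough guide."
]
},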
2019-01-15 05:36:29 +01:00
{
"cell_type": "markdown",
2020-04-06 09:13:12 +02:00
"metadata": {},
2019-01-15 05:36:29 +01:00
"source": [
"# Exercise solutions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 8."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-25 09:45:32 +01:00
"1. The main motivations for dimensionality reduction are:\n",
" * To speed up a subsequent training algorithm (in some cases it may even remove noise and redundant features, making the training algorithm perform better)\n",
" * To visualize the data and gain insights on the most important features\n",
" * To save space (compression)\n",
" \n",
" The main drawbacks are:\n",
" * Some information is lost, possibly degrading the performance of subsequent training algorithms.\n",
" * It can be computationally intensive.\n",
" * It adds some complexity to your Machine Learning pipelines.\n",
" * Transformed features are often hard to interpret.\n",
"2. The curse of dimensionality refers to the fact that many problems that do not exist in low-dimensional space arise in high-dimensional space. In Machine Learning, one common manifestation is the fact that randomly sampled high-dimensional vectors are generally far from one another, increasing the risk of overfitting and making it very difficult to identify patterns without having plenty of training data.\n",
2022-06-14 07:47:11 +02:00
"3. Once a dataset's dimensionality has been reduced using one of the algorithms we discussed, it is almost always impossible to perfectly reverse the operation, because some information gets lost during dimensionality reduction. Moreover, while some algorithms (such as PCA) have a simple reverse transformation procedure that can reconstruct a dataset relatively similar to the original, other algorithms (such as t-SNE) do not.\n",
2021-11-25 09:45:32 +01:00
"4. PCA can be used to significantly reduce the dimensionality of most datasets, even if they are highly nonlinear, because it can at least get rid of useless dimensions. However, if there are no useless dimensions—as in the Swiss roll dataset—then reducing dimensionality with PCA will lose too much information. You want to unroll the Swiss roll, not squash it.\n",
"5. That's a trick question: it depends on the dataset. Let's look at two extreme examples. First, suppose the dataset is composed of points that are almost perfectly aligned. In this case, PCA can reduce the dataset down to just one dimension while still preserving 95% of the variance. Now imagine that the dataset is composed of perfectly random points, scattered all around the 1,000 dimensions. In this case roughly 950 dimensions are required to preserve 95% of the variance. So the answer is, it depends on the dataset, and it could be any number between 1 and 950. Plotting the explained variance as a function of the number of dimensions is one way to get a rough idea of the dataset's intrinsic dimensionality.\n",
"6. Regular PCA is the default, but it works only if the dataset fits in memory. Incremental PCA is useful for large datasets that don't fit in memory, but it is slower than regular PCA, so if the dataset fits in memory you should prefer regular PCA. Incremental PCA is also useful for online tasks, when you need to apply PCA on the fly, every time a new instance arrives. Randomized PCA is useful when you want to considerably reduce dimensionality and the dataset fits in memory; in this case, it is much faster than regular PCA. Finally, Random Projection is great for very high-dimensional datasets.\n",
"7. Intuitively, a dimensionality reduction algorithm performs well if it eliminates a lot of dimensions from the dataset without losing too much information. One way to measure this is to apply the reverse transformation and measure the reconstruction error. However, not all dimensionality reduction algorithms provide a reverse transformation. Alternatively, if you are using dimensionality reduction as a preprocessing step before another Machine Learning algorithm (e.g., a Random Forest classifier), then you can simply measure the performance of that second algorithm; if dimensionality reduction did not lose too much information, then the algorithm should perform just as well as when using the original dataset.\n",
"8. It can absolutely make sense to chain two different dimensionality reduction algorithms. A common example is using PCA or Random Projection to quickly get rid of a large number of useless dimensions, then applying another much slower dimensionality reduction algorithm, such as LLE. This two-step approach will likely yield roughly the same performance as using LLE only, but in a fraction of the time."
2019-01-15 05:36:29 +01:00
]
},
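{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two extreme cases from answer 5 can be sketched on synthetic data. This is only an illustration under stated assumptions: it uses 100 dimensions instead of 1,000 to stay fast, and the names `X_aligned`, `X_random`, `n_aligned` and `n_random` are made up for this example:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"rng = np.random.default_rng(42)\n",
"n_samples, n_dims = 5000, 100  # fewer than 1,000 dimensions, to keep this fast\n",
"\n",
"# Extreme 1: points almost perfectly aligned along one random direction\n",
"direction = rng.normal(size=n_dims)\n",
"direction /= np.linalg.norm(direction)\n",
"t_line = rng.normal(scale=10, size=(n_samples, 1))\n",
"X_aligned = t_line * direction + rng.normal(scale=0.01, size=(n_samples, n_dims))\n",
"\n",
"# Extreme 2: points scattered uniformly at random across all dimensions\n",
"X_random = rng.uniform(size=(n_samples, n_dims))\n",
"\n",
"# A float n_components asks PCA to keep enough components for 95% of the variance\n",
"n_aligned = PCA(n_components=0.95).fit(X_aligned).n_components_\n",
"n_random = PCA(n_components=0.95).fit(X_random).n_components_\n",
"print(n_aligned, n_random)\n",
"```\n",
"\n",
"The aligned dataset should need a single component, while the random one should need most of its 100 dimensions, which mirrors the 1-to-950 range discussed above."
]
},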
{
"cell_type": "markdown",
2020-04-06 09:13:12 +02:00
"metadata": {},
2019-01-15 05:36:29 +01:00
"source": [
"## 9."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Load the MNIST dataset (introduced in chapter 3) and split it into a training set and a test set (take the first 60,000 instances for training, and the remaining 10,000 for testing)._"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MNIST dataset was loaded earlier."
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 52,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"X_train = mnist.data[:60000]\n",
"y_train = mnist.target[:60000]\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"X_test = mnist.data[60000:]\n",
"y_test = mnist.target[60000:]"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Train a Random Forest classifier on the dataset and time how long it takes, then evaluate the resulting model on the test set._"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 53,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2019-01-18 16:08:37 +01:00
"rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 54,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 37.7 s, sys: 588 ms, total: 38.3 s\n",
"Wall time: 38.4 s\n"
]
},
{
"data": {
"text/plain": [
"RandomForestClassifier(random_state=42)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"%time rnd_clf.fit(X_train, y_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 55,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"0.9705"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
"from sklearn.metrics import accuracy_score\n",
"\n",
"y_pred = rnd_clf.predict(X_test)\n",
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Next, use PCA to reduce the dataset's dimensionality, with an explained variance ratio of 95%._"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 56,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA\n",
"\n",
"pca = PCA(n_components=0.95)\n",
"X_train_reduced = pca.fit_transform(X_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Train a new Random Forest classifier on the reduced dataset and see how long it takes. Was training much faster?_"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 57,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1min 28s, sys: 590 ms, total: 1min 28s\n",
"Wall time: 1min 28s\n"
]
},
{
"data": {
"text/plain": [
"RandomForestClassifier(random_state=42)"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"rnd_clf_with_pca = RandomForestClassifier(n_estimators=100, random_state=42)\n",
"%time rnd_clf_with_pca.fit(X_train_reduced, y_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Oh no! Training is actually about twice slower now! How can that be? Well, as we saw in this chapter, dimensionality reduction does not always lead to faster training time: it depends on the dataset, the model and the training algorithm. See figure 8-6 (the `manifold_decision_boundary_plot*` plots above). If you try `SGDClassifier` instead of `RandomForestClassifier`, you will find that training time is reduced by a factor of 3 when using PCA. Actually, we will do this in a second, but first let's check the precision of the new random forest classifier."
2019-01-15 05:36:29 +01:00
]
},
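{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `%time` magic only works in IPython and Jupyter. Outside a notebook, the same comparison can be sketched with `time.perf_counter()`. The example below is a minimal sketch that assumes Scikit-Learn's small digits dataset as a stand-in for MNIST; the helper `timed_fit` and the variables `X_digits`, `t_full` and `t_reduced` are hypothetical names, and the timings (and even which run is faster) will differ from the MNIST results above:\n",
"\n",
"```python\n",
"import time\n",
"\n",
"from sklearn.datasets import load_digits\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"def timed_fit(model, X, y):\n",
"    # Fit the model and return the elapsed wall-clock time in seconds\n",
"    start = time.perf_counter()\n",
"    model.fit(X, y)\n",
"    return time.perf_counter() - start\n",
"\n",
"X_digits, y_digits = load_digits(return_X_y=True)\n",
"X_digits_reduced = PCA(n_components=0.95).fit_transform(X_digits)\n",
"\n",
"t_full = timed_fit(RandomForestClassifier(random_state=42), X_digits, y_digits)\n",
"t_reduced = timed_fit(RandomForestClassifier(random_state=42),\n",
"                      X_digits_reduced, y_digits)\n",
"print(f'full: {t_full:.2f}s, reduced: {t_reduced:.2f}s')\n",
"```\n",
"\n",
"Single measurements like these are noisy, so repeating each fit a few times gives a more trustworthy comparison."
]
},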
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Next evaluate the classifier on the test set: how does it compare to the previous classifier?_"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 58,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"0.9481"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
"X_test_reduced = pca.transform(X_test)\n",
"\n",
2021-11-19 06:03:48 +01:00
"y_pred = rnd_clf_with_pca.predict(X_test_reduced)\n",
2019-01-15 05:36:29 +01:00
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"It is common for performance to drop slightly when reducing dimensionality, because we do lose some potentially useful signal in the process. However, the performance drop is rather severe in this case. So PCA really did not help: it slowed down training *and* reduced performance. 😭"
2019-01-15 05:36:29 +01:00
]
},
{
2021-11-19 06:03:48 +01:00
"cell_type": "markdown",
2019-01-15 05:36:29 +01:00
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Try again with an `SGDClassifier`. How much does PCA help now?_"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 59,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 2min 34s, sys: 906 ms, total: 2min 35s\n",
"Wall time: 2min 35s\n"
]
},
{
"data": {
"text/plain": [
"SGDClassifier(random_state=42)"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"from sklearn.linear_model import SGDClassifier\n",
"\n",
"sgd_clf = SGDClassifier(random_state=42)\n",
"%time sgd_clf.fit(X_train, y_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 60,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"0.874"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"y_pred = sgd_clf.predict(X_test)\n",
2019-01-15 05:36:29 +01:00
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Okay, so the `SGDClassifier` takes much longer to train on this dataset than the `RandomForestClassifier`, plus it performs worse on the test set. But that's not what we are interested in right now, we want to see how much PCA can help `SGDClassifier`. Let's train it using the reduced dataset:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 61,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 29.7 s, sys: 418 ms, total: 30.2 s\n",
"Wall time: 27.9 s\n"
]
},
{
"data": {
"text/plain": [
"SGDClassifier(random_state=42)"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"sgd_clf_with_pca = SGDClassifier(random_state=42)\n",
"%time sgd_clf_with_pca.fit(X_train_reduced, y_train)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Nice! Reducing dimensionality led to roughly 5× speedup. :) Let's check the model's accuracy:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 62,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"0.8959"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"y_pred = sgd_clf_with_pca.predict(X_test_reduced)\n",
2019-01-15 05:36:29 +01:00
"accuracy_score(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Great! PCA not only gave us a 5× speed boost, it also improved performance slightly."
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"So there you have it: PCA can give you a formidable speedup, and if you're lucky a performance boost... but it's really not guaranteed: it depends on the model and the dataset!"
2019-01-15 05:36:29 +01:00
]
},
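{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice, the PCA step and the classifier are often chained in a single pipeline, so that PCA is fitted on the training set only and applied to the test set automatically. Here is a minimal sketch, assuming the small digits dataset as a stand-in for MNIST; the names `pca_sgd`, `pipeline_accuracy` and `n_kept` are made up for this example:\n",
"\n",
"```python\n",
"from sklearn.datasets import load_digits\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.linear_model import SGDClassifier\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.pipeline import make_pipeline\n",
"\n",
"X_digits, y_digits = load_digits(return_X_y=True)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X_digits, y_digits, random_state=42)\n",
"\n",
"# PCA keeps 95% of the variance, then the linear classifier trains on the result\n",
"pca_sgd = make_pipeline(PCA(n_components=0.95), SGDClassifier(random_state=42))\n",
"pca_sgd.fit(X_tr, y_tr)\n",
"\n",
"pipeline_accuracy = pca_sgd.score(X_te, y_te)    # PCA is applied to X_te internally\n",
"n_kept = pca_sgd.named_steps[\"pca\"].n_components_  # number of dimensions kept\n",
"print(n_kept, pipeline_accuracy)\n",
"```\n",
"\n",
"Wrapping both steps this way also makes it easy to tune the explained variance ratio with a grid search, if that turns out to matter for your dataset."
]
},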
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Exercise: _Use t-SNE to reduce the first 5,000 images of the MNIST dataset down to two dimensions and plot the result using Matplotlib. You can use a scatterplot using 10 different colors to represent each image's target class._"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Let's limit ourselves to the first 5,000 images of the MNIST training set, to speed things up a lot."
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 63,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"X_sample, y_sample = X_train[:5000], y_train[:5000]"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Let's use t-SNE to reduce dimensionality down to 2D so we can plot the dataset:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 64,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5min 49s, sys: 31 s, total: 6min 20s\n",
"Wall time: 29.4 s\n"
]
}
],
2019-01-15 05:36:29 +01:00
"source": [
"from sklearn.manifold import TSNE\n",
"\n",
2021-11-19 06:03:48 +01:00
"tsne = TSNE(n_components=2, init=\"random\", learning_rate=\"auto\",\n",
" random_state=42)\n",
"%time X_reduced = tsne.fit_transform(X_sample)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's use Matplotlib's `scatter()` function to plot a scatterplot, using a different color for each digit:"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 65,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAp8AAAI3CAYAAADdpzCLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/MnkTPAAAACXBIWXMAAAsTAAALEwEAmpwYAAEAAElEQVR4nOydd5hcR5n1f9XTk3PQKGfJQQ5yNs42BhNsmjhgwgeLWHLOu25ADNBL2iUs2SxmyZjGhh2TDBhsg7MtJznJVs4ahcmxu+v741SpemRJlmxL1sh1nqefnum+t27dutW3zj1vMtZaIiIiIiIiIiIiIg4GEs90ByIiIiIiIiIiIp49iOQzIiIiIiIiIiLioCGSz4iIiIiIiIiIiIOGSD4jIiIiIiIiIiIOGiL5jIiIiIiIiIiIOGiI5DMiIiIiIiIiIuKgIZLPiIiIiIiIiIiIpwRjzPuNMUuNMQ8YYz6wt20j+YyIiIiIiIiIiHjSMMYcC7wVOA1YCFxijJm/p+0j+YyIiIiIiIiIiHgqOBq41Vo7YK3NATcAL9/TxsmD1q2IiIiIiIiIiIinBafNm2e7BwYOyrGWbdz4ADBU9NHl1trLi/5fCmSMMc3AIPBi4M49tRfJZ0RERERERETEOEP3wADfe9vbDsqxLmhvH7LWnrKn7621Dxljvgj8BegD7gVye9o+mt0jIiIiIiIiIsYZDCJxB+O1L7DW/sBae5K19lxgO/DonraNymdERERERERERMRTgjGm1Vq7xRgzA3gFcMaeto3kMyIiIiIiIiJiHOIQM19f5Xw+R4F3W2t37GnDSD4jIiIiIiIiIiKeEqy15+zrtocYaY6IiIiIiIiIiDicEZXPiIiIiIiIiIhxiPGqII7XfkdERERERERERIxDROUzIiIiIiIiImKcwQAlz3QnniSi8hkREREREREREXHQEJXPiIiIiIiIiIhxiPGqII7XfkdERERERERERIxDROUzIiIiIiIiImKcwZfXHI8Yr/2OiIiIiIiIiIgYh4jKZ0RERERERETEOMR4VRDHa78jIiIiIiIiIiLGIaLyGRERERERERExzhDzfEZERERERERERETsA6LyGRERERERERExDjFeFcTx2u+IiIiIiIiIiIhxiKh8RkREPHuQNiXAAmAusAO4m4zteWY7FREREbH/GM95PiP5jIiIeHYgbcqA9wDHAznkq99G2vwnGbvqmexaRERExLMJ45U0R0REROwvXgK8EpgFNAPbAAssIm3MM9iviIiIiGcVovIZERExPiCCOAdoArYAa8hYu4/7zgU+BdQDQ0AjMAO4DZgOtACdT3+nIyIiIg4cxmuqpUg+IyIiDn2kTTXwbmQyb0XK5aOkzSfJ2EefYN9S4L+AmUCF+9QAw8BEoBs4nbT5/ePIbNokALvPJDciIiIi4gkRyWdERMR4wMuAhcBsRD5rUNDQc0iby8jYn+9l39OAU5FIUIJM7SWE+98jwKuRmno7AGkzD2gDjgB2kDbXADeQsYWnfCZpMxN4OTAZuAX4Ixk7/JTbjYiIeFYhBhxFREREHChIfTwPKCMQz2FgBCgHLiNt7gMeACYB1cAGMnbAtXAmUjwTRftY9385cB8ywbeTNt8H1gAZ11YOmeOnAhWkzVbgIteHO4G/7Fe0fNqcA3zN7Z9HPqj/Qtq8nozt38+RiYiIiBiXiOQzIiLiUIevItcKVCEC6c3geff961AA0bFAAciTNlcCf3P/W7fdCGPve3lEJn30+4uBc4BSYIP7fgLyFf0AIqJbXTuXAKeQNp8tIrp7hsz/n3Ztbyz65ljgrYiURkREROwzxqvyOV77HRER8WxBxuZRYFA5Im4GkdAGYBQRx7OAs5FZ/kTgFETmPoNIZD9SS+XDKQLq1c/JiGBuIviB1rnjAfS64x0NbAa6gAFgNVJHT9vHM5kJTAN6XN/9cXqBF+1jGxERERHjHlH5jIiIGA+4CpG8o5HJ2h
JUySZgEOhDqmQdUioTSNWcCdwDXIDuef6hO4/I6zAitZXA+UgBLUem+F5gfdF+xabxSUi1/E/SZhkio2uBZcBtZGzvLucw6I4za5d+WKCLtCklY0efzOBEREQ8OzFeFcRIPiMiIg49pE0d8vNciAjl34D3I7K3ABHHUgIJ9T6dZYh4GveaCPzZ7bcKme5rUbolgxTMAWROn4f8PQcRwcW9NyLlc507XhVwAoq8L3f7z0b307XAvcALSJvPu34dgUjuUtd2izs27hySSBHtIG06EXn9K3D70xLgFBEREXGIIZLPiIiIQwsinmmkanahYJ/nIPVzKQr0OQPl59yOCN08ZH6fiMjoCCJ2DSgq/ljkZ3k3UlCHCYqnj4L3IkIl8hOtdJ8XEPm90+07D5HFMkQuk64PA0iFbULk9jKUEsqjgPxSZxQds8T1o9W1Mc1tdwQwBbj6yQxhRETE4Q/vDD8eMV4V24iIiMMX5yIytgb5R252rxQietuQiXs18tPMu8+q3f7+npxDhO55iMg2IyW122036LZ9AKmbeaRgbkIqaQ8ihl3AR4HvoXrw3h/UuO8MIqqVSCmd747/IqTa5pDy2Y+I8Ar3GkTpnfoQCe5HpLXZndvFpE39kxzDiIiIiEMWUfmMiIg4uFCN9aORwvcYGTu4yxbHIFJXDE8UVyLFsxfdv3y+zruQT6cPKkoQSGHetdfrvm9AxHICUkiHgZ8gdXQhClACqap1wHKgmYzdSNrsAH6PAprmEqLoy9x7HhHZs93fFxJM7HlERGuRCtvsjt/o+mDc91VuW4v8Sj1ZjoiIiBiD8aogRvIZERFx4KEKRZcArwVOd592AytIm0+QsUuKtt6KymgWk64EUkPzyOTuzeVeGTwPqZHeTD7q/q5x/68lRJbnXdtbkFn7dkQwX4NM6vWEVE5LCblBcfsc6T6fylj/UhCJ7HPHTbjt8+67Uvc+iFTaEkSOe1z7w66PKwjqbdfuhjMiIiJiPCOSz4iIiAMLJYl/H3ASSvju825WoEj0L5M2ryJjd7g9/o6Uw2pkii5x+05x2/tcnTWIZFYjQjeI1MwBZL6ucvsOImXSun3rkar5JTJ2aVE/70AJ57td+92ur5MQOQW41vWlE/gL8ELX3jBSLS0iuNtdX4vLcvoAp78g4rqFEKhUjgKYet0+M4E7ydjN+zjKERERzzKM5wpH47XfERER4wfzUQBNA7rnDCGyVo6I4XRU/lLI2BXANxApm47ycCYJD8sViOglkFL4GPKR7EXkMoFM2oPuM6+CDiFV9Sbg42OIp7AcRZnXu2O1IJL4CzK2y/XtEeDb7vsK4H53/OuBDiAL/AGZ7jcjE3u9e5UCDwJXogpKPwX+iNYQf24+yj8H/OAJxjUiIiJiXCIqnxEREQcaE9x7LVISPYpJVzNjcR9SGV+CVMvJSIEsd/v4fS0yx5e7l/fxHCKQvxb3Xd5t+2VgCmlzLiKjd5GxXWSsJW1+hvxHT0Sk9Q4yduWYnmXs7aTNcuATiNRuR6b4tahWexkimsPumMNI5axEfqLnA79BCmgKKayrXOulSMmtdOO2Zk+DGhERETFeFcRIPiMiIg40trr3dcin0iDSaBF5qwXmkjYnA/eSsTngjYikbULkzFcDGkXqJgQS2uLaKicEG5WhfKDe7N3t+vF11+6RiKCWAq8gbb5Mxq5yeTUfdK+94ZWu3ysQQTzFnVsNIo9NSBktQab2hUiFvQv5s16GFNha1+8md179rs91KO1SJJ8RERGHHcYraY6IiBg/WAY8ihTArYic1SGfzJko8nsK8gt9J2kzCdVXX4mIWxIRtxJCYE8CmaZ9pHklIeG8J5yliKx2ExTQSkQ8V7rjrkFq7L+QNr7tvSNtSlDQ1AZkTj8HkeMC8uE07ti+3nszIqWtwAsQcT0WaHPbT3CvaW48fAnQnn3qT0RERMQ4Q1Q+IyIiDiwytkDafB2Z0CtQSqQqRLJ6UeT4Frf1KaicZR75iJ
6AiGoDwc+z1H2/axS5QeqhT7+UI0S3DyIV9XyUE7QYPvF7A8rjuS8oIMXyTPe/J7gGlc/sRCS7zPXH15H3yeunutc
"text/plain": [
"<Figure size 936x720 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"plt.figure(figsize=(13, 10))\n",
"plt.scatter(X_reduced[:, 0], X_reduced[:, 1],\n",
" c=y_sample.astype(np.int8), cmap=\"jet\", alpha=0.5)\n",
2019-01-15 05:36:29 +01:00
"plt.axis('off')\n",
"plt.colorbar()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Isn't this just beautiful? :) Most digits are nicely separated from the others, even though t-SNE wasn't given the targets: it just identified clusters of similar images. But there is still a bit of overlap. For example, the 3s and the 5s overlap a lot (on the right side of the plot), and so do the 4s and the 9s (in the top-right corner)."
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Let's focus on just the digits 4 and 9:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 66,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 648x648 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"plt.figure(figsize=(9, 9))\n",
"cmap = plt.cm.jet\n",
"for digit in ('4', '9'):\n",
" plt.scatter(X_reduced[y_sample == digit, 0], X_reduced[y_sample == digit, 1],\n",
" c=[cmap(float(digit) / 9)], alpha=0.5)\n",
2019-01-15 05:36:29 +01:00
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Let's see if we can produce a nicer image by running t-SNE on just these 2 digits:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 67,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
2021-11-19 06:03:48 +01:00
"idx = (y_sample == '4') | (y_sample == '9')\n",
"X_subset = X_sample[idx]\n",
"y_subset = y_sample[idx]\n",
2019-01-15 05:36:29 +01:00
"\n",
2021-11-19 06:03:48 +01:00
"tsne_subset = TSNE(n_components=2, init=\"random\", learning_rate=\"auto\",\n",
" random_state=42)\n",
2019-01-15 05:36:29 +01:00
"X_subset_reduced = tsne_subset.fit_transform(X_subset)"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 68,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 648x648 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"plt.figure(figsize=(9, 9))\n",
"for digit in ('4', '9'):\n",
" plt.scatter(X_subset_reduced[y_subset == digit, 0],\n",
" X_subset_reduced[y_subset == digit, 1],\n",
" c=[cmap(float(digit) / 9)], alpha=0.5)\n",
2019-01-15 05:36:29 +01:00
"plt.axis('off')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"That's much better, although there's still a bit of overlap. Perhaps some 4s really do look like 9s, and vice versa. It would be nice if we could visualize a few digits from each region of this plot, to understand what's going on. In fact, let's do that now."
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
2020-04-06 09:13:12 +02:00
"metadata": {},
2019-01-15 05:36:29 +01:00
"source": [
2022-06-14 07:47:11 +02:00
"Exercise: _Alternatively, you can replace each dot in the scatterplot with the corresponding instance’s class (a digit from 0 to 9), or even plot scaled-down versions of the digit images themselves (if you plot all digits, the visualization will be too cluttered, so you should either draw a random sample or plot an instance only if no other instance has already been plotted at a close distance). You should get a nice visualization with well-separated clusters of digits._"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2022-06-14 07:47:11 +02:00
"Let's create a `plot_digits()` function that will draw a scatterplot (similar to the above scatterplots) and write colored digits on top of it, with a minimum distance guaranteed between these digits. If the digit images are provided, they are plotted instead. This implementation was inspired by one of Scikit-Learn's excellent examples ([plot_lle_digits](https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html), based on a different digit dataset)."
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 69,
2019-01-15 05:36:29 +01:00
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import MinMaxScaler\n",
"from matplotlib.offsetbox import AnnotationBbox, OffsetImage\n",
"\n",
2021-11-19 06:03:48 +01:00
"def plot_digits(X, y, min_distance=0.04, images=None, figsize=(13, 10)):\n",
2019-01-15 05:36:29 +01:00
" # Let's scale the input features so that they range from 0 to 1\n",
" X_normalized = MinMaxScaler().fit_transform(X)\n",
" # Now we create the list of coordinates of the digits plotted so far.\n",
" # We pretend that one is already plotted far away at the start, to\n",
" # avoid `if` statements in the loop below\n",
" neighbors = np.array([[10., 10.]])\n",
" # The rest should be self-explanatory\n",
" plt.figure(figsize=figsize)\n",
2021-11-19 06:03:48 +01:00
" cmap = plt.cm.jet\n",
2019-01-15 05:36:29 +01:00
" digits = np.unique(y)\n",
" for digit in digits:\n",
2021-11-19 06:03:48 +01:00
" plt.scatter(X_normalized[y == digit, 0], X_normalized[y == digit, 1],\n",
" c=[cmap(float(digit) / 9)], alpha=0.5)\n",
2019-01-15 05:36:29 +01:00
" plt.axis(\"off\")\n",
2021-11-19 06:03:48 +01:00
" ax = plt.gca() # get current axes\n",
2019-01-15 05:36:29 +01:00
" for index, image_coord in enumerate(X_normalized):\n",
2021-03-02 06:14:12 +01:00
" closest_distance = np.linalg.norm(neighbors - image_coord, axis=1).min()\n",
2019-01-15 05:36:29 +01:00
" if closest_distance > min_distance:\n",
" neighbors = np.r_[neighbors, [image_coord]]\n",
" if images is None:\n",
" plt.text(image_coord[0], image_coord[1], str(int(y[index])),\n",
2021-11-19 06:03:48 +01:00
" color=cmap(float(y[index]) / 9),\n",
" fontdict={\"weight\": \"bold\", \"size\": 16})\n",
2019-01-15 05:36:29 +01:00
" else:\n",
" image = images[index].reshape(28, 28)\n",
2021-11-19 06:03:48 +01:00
" imagebox = AnnotationBbox(OffsetImage(image, cmap=\"binary\"),\n",
" image_coord)\n",
2019-01-15 05:36:29 +01:00
" ax.add_artist(imagebox)"
]
},
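{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the minimum-distance rule on its own, here is a minimal sketch on a handful of made-up 2D points. The `select_spaced()` helper and the sample coordinates are hypothetical (they are not part of this notebook's code); the helper only mirrors the greedy filtering step inside `plot_digits()`: a point is kept if it lies farther than `min_distance` from every point kept so far, so the result depends on the order in which the points are visited."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def select_spaced(points, min_distance=0.04):\n",
"    # Hypothetical helper: returns the indices of the points that would be drawn.\n",
"    # Start with a far-away sentinel so the first distance check needs no special case\n",
"    kept = np.array([[10., 10.]])\n",
"    selected = []\n",
"    for index, point in enumerate(points):\n",
"        # Distance from this point to the nearest point kept so far\n",
"        closest_distance = np.linalg.norm(kept - point, axis=1).min()\n",
"        if closest_distance > min_distance:\n",
"            kept = np.r_[kept, [point]]\n",
"            selected.append(index)\n",
"    return selected\n",
"\n",
"# Made-up coordinates in [0, 1]: two tight pairs plus one isolated point\n",
"points = np.array([[0.10, 0.10], [0.11, 0.10], [0.50, 0.50],\n",
"                   [0.52, 0.51], [0.90, 0.20]])\n",
"print(select_spaced(points))"
]
},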
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Let's try it! First let's show colored digits (not images), for all 5,000 images:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 70,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 936x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"plot_digits(X_reduced, y_sample)"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, that's okay, but not that beautiful. Let's try with the digit images:"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 71,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 2520x1800 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
2021-11-19 06:03:48 +01:00
"plot_digits(X_reduced, y_sample, images=X_sample, figsize=(35, 25))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's nicer! Now let's focus on just the 4s and the 9s:"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 72,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"data": {
"text/plain": [
"<Figure size 1584x1584 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
2019-01-15 05:36:29 +01:00
"source": [
"plot_digits(X_subset_reduced, y_subset, images=X_subset, figsize=(22, 22))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
2021-11-19 06:03:48 +01:00
"Notice how similar-looking 4s are grouped together. For example, the 4s get more and more inclined as they approach the top of the figure. The inclined 9s are also closer to the top. Some 4s really do look like 9s, and vice versa."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: _Try using other dimensionality reduction algorithms such as PCA, LLE, or MDS and compare the resulting visualizations._"
2019-01-15 05:36:29 +01:00
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's start with PCA. We will also time how long it takes:"
]
},
{
"cell_type": "code",
2021-11-19 11:36:04 +01:00
"execution_count": 73,
2019-01-15 05:36:29 +01:00
"metadata": {},
2022-02-19 10:24:54 +01:00
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.49 s, sys: 219 ms, total: 1.71 s\n",
"Wall time: 157 ms\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 936x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pca = PCA(n_components=2, random_state=42)\n",
"%time X_pca_reduced = pca.fit_transform(X_sample)\n",
"plot_digits(X_pca_reduced, y_sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow, PCA is blazingly fast! But although we do see a few clusters, there's way too much overlap. Let's try LLE:"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1min 16s, sys: 7.77 s, total: 1min 24s\n",
"Wall time: 8.86 s\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 936x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"lle = LocallyLinearEmbedding(n_components=2, random_state=42)\n",
"%time X_lle_reduced = lle.fit_transform(X_sample)\n",
"plot_digits(X_lle_reduced, y_sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That took more time, and yet the result does not look good at all. Let's see what happens if we apply PCA first, preserving 95% of the variance:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1min 18s, sys: 6.54 s, total: 1min 24s\n",
"Wall time: 7.57 s\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 936x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pca_lle = make_pipeline(PCA(n_components=0.95),\n",
" LocallyLinearEmbedding(n_components=2, random_state=42))\n",
"\n",
"%time X_pca_lle_reduced = pca_lle.fit_transform(X_sample)\n",
"plot_digits(X_pca_lle_reduced, y_sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The result is more or less as bad, but this time training was a bit faster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try MDS:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: the following cell will take about 10 to 30 minutes to run, depending on your hardware:"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1h 5min 8s, sys: 7min 56s, total: 1h 13min 4s\n",
"Wall time: 12min 12s\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 936x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%time X_mds_reduced = MDS(n_components=2, random_state=42).fit_transform(X_sample)\n",
"plot_digits(X_mds_reduced, y_sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Meh. This does not look great: all the clusters overlap too much. Let's try applying PCA first; perhaps it will be faster?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: the following cell will take about 10 to 30 minutes to run, depending on your hardware:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1h 11min 58s, sys: 5min 42s, total: 1h 17min 41s\n",
"Wall time: 11min 5s\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 936x720 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"pca_mds = make_pipeline(PCA(n_components=0.95, random_state=42),\n",
" MDS(n_components=2, random_state=42))\n",
"\n",
"%time X_pca_mds_reduced = pca_mds.fit_transform(X_sample)\n",
"plot_digits(X_pca_mds_reduced, y_sample)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Same result, and barely any faster (about 11 minutes of wall time instead of 12): PCA did not really help in this case."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try LDA now:"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 5.04 s, sys: 74.5 ms, total: 5.11 s\n",
"Wall time: 685 ms\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 864x864 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n",
"\n",
"lda = LinearDiscriminantAnalysis(n_components=2)\n",
"%time X_lda_reduced = lda.fit_transform(X_sample, y_sample)\n",
"plot_digits(X_lda_reduced, y_sample, figsize=(12, 12))\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This one is very fast, and it looks nice at first, until you realize that several clusters overlap severely."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Well, it's pretty clear that t-SNE won this little competition, wouldn't you agree?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And that's all for today. I hope you enjoyed this chapter!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}