{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"**Chapter 4 – Training Linear Models**"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"_This notebook contains all the sample code and solutions to the exercises in chapter 4._"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Setup"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"First, let's make sure this notebook works well in both Python 2 and 3, import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"# To support both Python 2 and Python 3\n",
"from __future__ import division, print_function, unicode_literals\n",
"\n",
"# Common imports\n",
"import numpy as np\n",
"import os\n",
"\n",
"# To make this notebook's output stable across runs\n",
"np.random.seed(42)\n",
"\n",
"# To plot pretty figures\n",
"%matplotlib inline\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"plt.rcParams['axes.labelsize'] = 14\n",
"plt.rcParams['xtick.labelsize'] = 12\n",
"plt.rcParams['ytick.labelsize'] = 12\n",
"\n",
"# Where to save the figures\n",
"PROJECT_ROOT_DIR = \".\"\n",
"CHAPTER_ID = \"training_linear_models\"\n",
"\n",
"def save_fig(fig_id, tight_layout=True):\n",
"    path = os.path.join(PROJECT_ROOT_DIR, \"images\", CHAPTER_ID, fig_id + \".png\")\n",
"    image_dir = os.path.dirname(path)\n",
"    if not os.path.isdir(image_dir):  # create the target directory on first use\n",
"        os.makedirs(image_dir)\n",
"    print(\"Saving figure\", fig_id)\n",
"    if tight_layout:\n",
"        plt.tight_layout()\n",
"    plt.savefig(path, format='png', dpi=300)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Linear regression using the Normal Equation"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"X = 2 * np.random.rand(100, 1)\n",
"y = 4 + 3 * X + np.random.randn(100, 1)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"plt.plot(X, y, \"b.\")\n",
"plt.xlabel(\"$x_1$\", fontsize=18)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.axis([0, 2, 0, 15])\n",
"save_fig(\"generated_data_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"X_b = np.c_[np.ones((100, 1)), X] # add x0 = 1 to each instance\n",
"theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta_best"
]
},
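{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"As a cross-check on the Normal Equation, the same least-squares solution can be obtained without explicitly inverting $\\mathbf{X}^T \\mathbf{X}$. The cell below is a minimal sketch that is not part of the book's code: it assumes NumPy's `np.linalg.lstsq()` and `np.linalg.pinv()` functions, and it builds its own toy dataset (the `*_demo` names are illustrative) with a local random generator so that the notebook's global seed is left untouched:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(0)  # local generator; the seed value is an arbitrary assumption\n",
"X_demo = 2 * rng.rand(100, 1)\n",
"y_demo = 4 + 3 * X_demo + rng.randn(100, 1)\n",
"X_demo_b = np.c_[np.ones((100, 1)), X_demo]  # add x0 = 1 to each instance\n",
"\n",
"theta_normal = np.linalg.inv(X_demo_b.T.dot(X_demo_b)).dot(X_demo_b.T).dot(y_demo)\n",
"theta_lstsq = np.linalg.lstsq(X_demo_b, y_demo, rcond=1e-6)[0]\n",
"theta_pinv = np.linalg.pinv(X_demo_b).dot(y_demo)\n",
"\n",
"# the three estimates should agree up to floating-point error\n",
"np.allclose(theta_normal, theta_lstsq), np.allclose(theta_normal, theta_pinv)"
]
},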
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"X_new = np.array([[0], [2]])\n",
"X_new_b = np.c_[np.ones((2, 1)), X_new] # add x0 = 1 to each instance\n",
"y_predict = X_new_b.dot(theta_best)\n",
"y_predict"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"plt.plot(X_new, y_predict, \"r-\")\n",
"plt.plot(X, y, \"b.\")\n",
"plt.axis([0, 2, 0, 15])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The figure in the book actually corresponds to the following code, with a legend and axis labels:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"plt.plot(X_new, y_predict, \"r-\", linewidth=2, label=\"Predictions\")\n",
"plt.plot(X, y, \"b.\")\n",
"plt.xlabel(\"$x_1$\", fontsize=18)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.legend(loc=\"upper left\", fontsize=14)\n",
"plt.axis([0, 2, 0, 15])\n",
"save_fig(\"linear_model_predictions\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"lin_reg = LinearRegression()\n",
"lin_reg.fit(X, y)\n",
"lin_reg.intercept_, lin_reg.coef_"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"lin_reg.predict(X_new)"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Linear regression using batch gradient descent"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"eta = 0.1\n",
"n_iterations = 1000\n",
"m = 100\n",
"theta = np.random.randn(2,1)\n",
"\n",
"for iteration in range(n_iterations):\n",
"    gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)\n",
"    theta = theta - eta * gradients"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"X_new_b.dot(theta)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta_path_bgd = []\n",
"\n",
"def plot_gradient_descent(theta, eta, theta_path=None):\n",
"    m = len(X_b)\n",
"    plt.plot(X, y, \"b.\")\n",
"    n_iterations = 1000\n",
"    for iteration in range(n_iterations):\n",
"        if iteration < 10:\n",
"            y_predict = X_new_b.dot(theta)\n",
"            style = \"b-\" if iteration > 0 else \"r--\"\n",
"            plt.plot(X_new, y_predict, style)\n",
"        gradients = 2/m * X_b.T.dot(X_b.dot(theta) - y)\n",
"        theta = theta - eta * gradients\n",
"        if theta_path is not None:\n",
"            theta_path.append(theta)\n",
"    plt.xlabel(\"$x_1$\", fontsize=18)\n",
"    plt.axis([0, 2, 0, 15])\n",
"    plt.title(r\"$\\eta = {}$\".format(eta), fontsize=16)\n",
"\n",
"np.random.seed(42)\n",
"theta = np.random.randn(2,1) # random initialization\n",
"\n",
"plt.figure(figsize=(10,4))\n",
"plt.subplot(131); plot_gradient_descent(theta, eta=0.02)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.subplot(132); plot_gradient_descent(theta, eta=0.1, theta_path=theta_path_bgd)\n",
"plt.subplot(133); plot_gradient_descent(theta, eta=0.5)\n",
"\n",
"save_fig(\"gradient_descent_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Stochastic Gradient Descent"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta_path_sgd = []\n",
"m = len(X_b)\n",
"np.random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"n_epochs = 50\n",
"t0, t1 = 5, 50 # learning schedule hyperparameters\n",
"\n",
"def learning_schedule(t):\n",
"    return t0 / (t + t1)\n",
"\n",
"theta = np.random.randn(2,1) # random initialization\n",
"\n",
"for epoch in range(n_epochs):\n",
"    for i in range(m):\n",
"        if epoch == 0 and i < 20: # not shown in the book\n",
"            y_predict = X_new_b.dot(theta) # not shown\n",
"            style = \"b-\" if i > 0 else \"r--\" # not shown\n",
"            plt.plot(X_new, y_predict, style) # not shown\n",
"        random_index = np.random.randint(m)\n",
"        xi = X_b[random_index:random_index+1]\n",
"        yi = y[random_index:random_index+1]\n",
"        gradients = 2 * xi.T.dot(xi.dot(theta) - yi)\n",
"        eta = learning_schedule(epoch * m + i)\n",
"        theta = theta - eta * gradients\n",
"        theta_path_sgd.append(theta) # not shown\n",
"\n",
"plt.plot(X, y, \"b.\") # not shown\n",
"plt.xlabel(\"$x_1$\", fontsize=18) # not shown\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18) # not shown\n",
"plt.axis([0, 2, 0, 15]) # not shown\n",
"save_fig(\"sgd_plot\") # not shown\n",
"plt.show() # not shown"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta"
]
},
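{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"To get a feel for the learning schedule used above, the cell below evaluates the same formula $\\eta(t) = t_0 / (t + t_1)$ at the first step, after one full epoch, and at the very last step of the 50 epochs. This is only an illustrative sketch, not from the book: the `*_demo` names are made up, and the hyperparameter values and $m = 100$ are copied from the cells above as assumptions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"t0_demo, t1_demo = 5, 50  # same values as the schedule above (assumption)\n",
"m_demo, n_epochs_demo = 100, 50\n",
"\n",
"def learning_schedule_demo(t):\n",
"    return t0_demo / float(t + t1_demo)\n",
"\n",
"# step index t = epoch * m + i, as in the training loop\n",
"steps_demo = (0, m_demo, (n_epochs_demo - 1) * m_demo + (m_demo - 1))\n",
"[(t, learning_schedule_demo(t)) for t in steps_demo]"
]
},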
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import SGDRegressor\n",
"# max_iter and tol replace the removed n_iter argument (scikit-learn >= 0.19)\n",
"sgd_reg = SGDRegressor(max_iter=50, tol=None, penalty=None, eta0=0.1, random_state=42)\n",
"sgd_reg.fit(X, y.ravel())"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"sgd_reg.intercept_, sgd_reg.coef_"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Mini-batch gradient descent"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta_path_mgd = []\n",
"\n",
"n_iterations = 50\n",
"minibatch_size = 20\n",
"\n",
"np.random.seed(42)\n",
"theta = np.random.randn(2,1) # random initialization\n",
"\n",
"t0, t1 = 200, 1000\n",
"def learning_schedule(t):\n",
"    return t0 / (t + t1)\n",
"\n",
"t = 0\n",
"for epoch in range(n_iterations):\n",
"    shuffled_indices = np.random.permutation(m)\n",
"    X_b_shuffled = X_b[shuffled_indices]\n",
"    y_shuffled = y[shuffled_indices]\n",
"    for i in range(0, m, minibatch_size):\n",
"        t += 1\n",
"        xi = X_b_shuffled[i:i+minibatch_size]\n",
"        yi = y_shuffled[i:i+minibatch_size]\n",
"        # average the gradient over the mini-batch (t0 is scaled up to keep the same step size)\n",
"        gradients = 2/minibatch_size * xi.T.dot(xi.dot(theta) - yi)\n",
"        eta = learning_schedule(t)\n",
"        theta = theta - eta * gradients\n",
"        theta_path_mgd.append(theta)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"theta_path_bgd = np.array(theta_path_bgd)\n",
"theta_path_sgd = np.array(theta_path_sgd)\n",
"theta_path_mgd = np.array(theta_path_mgd)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"plt.figure(figsize=(7,4))\n",
"plt.plot(theta_path_sgd[:, 0], theta_path_sgd[:, 1], \"r-s\", linewidth=1, label=\"Stochastic\")\n",
"plt.plot(theta_path_mgd[:, 0], theta_path_mgd[:, 1], \"g-+\", linewidth=2, label=\"Mini-batch\")\n",
"plt.plot(theta_path_bgd[:, 0], theta_path_bgd[:, 1], \"b-o\", linewidth=3, label=\"Batch\")\n",
"plt.legend(loc=\"upper left\", fontsize=16)\n",
"plt.xlabel(r\"$\\theta_0$\", fontsize=20)\n",
"plt.ylabel(r\"$\\theta_1$ \", fontsize=20, rotation=0)\n",
"plt.axis([2.5, 4.5, 2.3, 3.9])\n",
"save_fig(\"gradient_descent_paths_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Polynomial regression"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import numpy.random as rnd\n",
"\n",
"np.random.seed(42)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"m = 100\n",
"X = 6 * np.random.rand(m, 1) - 3\n",
"y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"plt.plot(X, y, \"b.\")\n",
"plt.xlabel(\"$x_1$\", fontsize=18)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.axis([-3, 3, 0, 10])\n",
"save_fig(\"quadratic_data_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.preprocessing import PolynomialFeatures\n",
"poly_features = PolynomialFeatures(degree=2, include_bias=False)\n",
"X_poly = poly_features.fit_transform(X)\n",
"X[0]"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"X_poly[0]"
]
},
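{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"With more than one input feature, `PolynomialFeatures` also adds the cross terms between features. The cell below is a small illustration that is not from the book: it uses a made-up instance with two features $a = 2$ and $b = 3$ (the `poly_demo` name is illustrative), for which the degree-2 expansion without the bias column is $a$, $b$, $a^2$, $ab$, $b^2$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.preprocessing import PolynomialFeatures\n",
"\n",
"poly_demo = PolynomialFeatures(degree=2, include_bias=False)\n",
"poly_demo.fit_transform([[2, 3]])  # columns: a, b, a^2, a*b, b^2"
]
},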
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"lin_reg = LinearRegression()\n",
"lin_reg.fit(X_poly, y)\n",
"lin_reg.intercept_, lin_reg.coef_"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"X_new=np.linspace(-3, 3, 100).reshape(100, 1)\n",
"X_new_poly = poly_features.transform(X_new)\n",
"y_new = lin_reg.predict(X_new_poly)\n",
"plt.plot(X, y, \"b.\")\n",
"plt.plot(X_new, y_new, \"r-\", linewidth=2, label=\"Predictions\")\n",
"plt.xlabel(\"$x_1$\", fontsize=18)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.legend(loc=\"upper left\", fontsize=14)\n",
"plt.axis([-3, 3, 0, 10])\n",
"save_fig(\"quadratic_predictions_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.pipeline import Pipeline\n",
"\n",
"for style, width, degree in ((\"g-\", 1, 300), (\"b--\", 2, 2), (\"r-+\", 2, 1)):\n",
"    polybig_features = PolynomialFeatures(degree=degree, include_bias=False)\n",
"    std_scaler = StandardScaler()\n",
"    lin_reg = LinearRegression()\n",
"    polynomial_regression = Pipeline([\n",
"            (\"poly_features\", polybig_features),\n",
"            (\"std_scaler\", std_scaler),\n",
"            (\"lin_reg\", lin_reg),\n",
"        ])\n",
"    polynomial_regression.fit(X, y)\n",
"    y_newbig = polynomial_regression.predict(X_new)\n",
"    plt.plot(X_new, y_newbig, style, label=str(degree), linewidth=width)\n",
"\n",
"plt.plot(X, y, \"b.\", linewidth=3)\n",
"plt.legend(loc=\"upper left\")\n",
"plt.xlabel(\"$x_1$\", fontsize=18)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.axis([-3, 3, 0, 10])\n",
"save_fig(\"high_degree_polynomials_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"def plot_learning_curves(model, X, y):\n",
"    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=10)\n",
"    train_errors, val_errors = [], []\n",
"    for m in range(1, len(X_train)):\n",
"        model.fit(X_train[:m], y_train[:m])\n",
"        y_train_predict = model.predict(X_train[:m])\n",
"        y_val_predict = model.predict(X_val)\n",
"        train_errors.append(mean_squared_error(y_train[:m], y_train_predict))\n",
"        val_errors.append(mean_squared_error(y_val, y_val_predict))\n",
"\n",
"    plt.plot(np.sqrt(train_errors), \"r-+\", linewidth=2, label=\"train\")\n",
"    plt.plot(np.sqrt(val_errors), \"b-\", linewidth=3, label=\"val\")\n",
"    plt.legend(loc=\"upper right\", fontsize=14) # not shown in the book\n",
"    plt.xlabel(\"Training set size\", fontsize=14) # not shown\n",
"    plt.ylabel(\"RMSE\", fontsize=14) # not shown"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"lin_reg = LinearRegression()\n",
"plot_learning_curves(lin_reg, X, y)\n",
"plt.axis([0, 80, 0, 3]) # not shown in the book\n",
"save_fig(\"underfitting_learning_curves_plot\") # not shown\n",
"plt.show() # not shown"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.pipeline import Pipeline\n",
"\n",
"polynomial_regression = Pipeline([\n",
"        (\"poly_features\", PolynomialFeatures(degree=10, include_bias=False)),\n",
"        (\"lin_reg\", LinearRegression()),\n",
"    ])\n",
"\n",
"plot_learning_curves(polynomial_regression, X, y)\n",
"plt.axis([0, 80, 0, 3]) # not shown\n",
"save_fig(\"learning_curves_plot\") # not shown\n",
"plt.show() # not shown"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Regularized models"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import Ridge\n",
"\n",
"np.random.seed(42)\n",
"m = 20\n",
"X = 3 * np.random.rand(m, 1)\n",
"y = 1 + 0.5 * X + np.random.randn(m, 1) / 1.5\n",
"X_new = np.linspace(0, 3, 100).reshape(100, 1)\n",
"\n",
"def plot_model(model_class, polynomial, alphas, **model_kargs):\n",
"    for alpha, style in zip(alphas, (\"b-\", \"g--\", \"r:\")):\n",
"        model = model_class(alpha, **model_kargs) if alpha > 0 else LinearRegression()\n",
"        if polynomial:\n",
"            model = Pipeline([\n",
"                    (\"poly_features\", PolynomialFeatures(degree=10, include_bias=False)),\n",
"                    (\"std_scaler\", StandardScaler()),\n",
"                    (\"regul_reg\", model),\n",
"                ])\n",
"        model.fit(X, y)\n",
"        y_new_regul = model.predict(X_new)\n",
"        lw = 2 if alpha > 0 else 1\n",
"        plt.plot(X_new, y_new_regul, style, linewidth=lw, label=r\"$\\alpha = {}$\".format(alpha))\n",
"    plt.plot(X, y, \"b.\", linewidth=3)\n",
"    plt.legend(loc=\"upper left\", fontsize=15)\n",
"    plt.xlabel(\"$x_1$\", fontsize=18)\n",
"    plt.axis([0, 3, 0, 4])\n",
"\n",
"plt.figure(figsize=(8,4))\n",
"plt.subplot(121)\n",
"plot_model(Ridge, polynomial=False, alphas=(0, 10, 100), random_state=42)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.subplot(122)\n",
"plot_model(Ridge, polynomial=True, alphas=(0, 10**-5, 1), random_state=42)\n",
"\n",
"save_fig(\"ridge_regression_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import Ridge\n",
"ridge_reg = Ridge(alpha=1, solver=\"cholesky\", random_state=42)\n",
"ridge_reg.fit(X, y)\n",
"ridge_reg.predict([[1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"sgd_reg = SGDRegressor(penalty=\"l2\", random_state=42)\n",
"sgd_reg.fit(X, y.ravel())\n",
"sgd_reg.predict([[1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"ridge_reg = Ridge(alpha=1, solver=\"sag\", random_state=42)\n",
"ridge_reg.fit(X, y)\n",
"ridge_reg.predict([[1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import Lasso\n",
"\n",
"plt.figure(figsize=(8,4))\n",
"plt.subplot(121)\n",
"plot_model(Lasso, polynomial=False, alphas=(0, 0.1, 1), random_state=42)\n",
"plt.ylabel(\"$y$\", rotation=0, fontsize=18)\n",
"plt.subplot(122)\n",
"plot_model(Lasso, polynomial=True, alphas=(0, 10**-7, 1), tol=1, random_state=42)\n",
"\n",
"save_fig(\"lasso_regression_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import Lasso\n",
"lasso_reg = Lasso(alpha=0.1)\n",
"lasso_reg.fit(X, y)\n",
"lasso_reg.predict([[1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import ElasticNet\n",
"elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=42)\n",
"elastic_net.fit(X, y)\n",
"elastic_net.predict([[1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true,
"scrolled": true
},
"outputs": [],
"source": [
"np.random.seed(42)\n",
"m = 100\n",
"X = 6 * np.random.rand(m, 1) - 3\n",
"y = 2 + X + 0.5 * X**2 + np.random.randn(m, 1)\n",
"\n",
"X_train, X_val, y_train, y_val = train_test_split(X[:50], y[:50].ravel(), test_size=0.5, random_state=10)\n",
"\n",
"poly_scaler = Pipeline([\n",
"        (\"poly_features\", PolynomialFeatures(degree=90, include_bias=False)),\n",
"        (\"std_scaler\", StandardScaler()),\n",
"    ])\n",
"\n",
"X_train_poly_scaled = poly_scaler.fit_transform(X_train)\n",
"X_val_poly_scaled = poly_scaler.transform(X_val)\n",
"\n",
"# max_iter and tol replace the removed n_iter argument (scikit-learn >= 0.19)\n",
"sgd_reg = SGDRegressor(max_iter=1,\n",
"                       tol=None,\n",
"                       penalty=None,\n",
"                       eta0=0.0005,\n",
"                       warm_start=True,\n",
"                       learning_rate=\"constant\",\n",
"                       random_state=42)\n",
"\n",
"n_epochs = 500\n",
"train_errors, val_errors = [], []\n",
"for epoch in range(n_epochs):\n",
"    sgd_reg.fit(X_train_poly_scaled, y_train)\n",
"    y_train_predict = sgd_reg.predict(X_train_poly_scaled)\n",
"    y_val_predict = sgd_reg.predict(X_val_poly_scaled)\n",
"    train_errors.append(mean_squared_error(y_train, y_train_predict))\n",
"    val_errors.append(mean_squared_error(y_val, y_val_predict))\n",
"\n",
"best_epoch = np.argmin(val_errors)\n",
"best_val_rmse = np.sqrt(val_errors[best_epoch])\n",
"\n",
"plt.annotate('Best model',\n",
" xy=(best_epoch, best_val_rmse),\n",
" xytext=(best_epoch, best_val_rmse + 1),\n",
" ha=\"center\",\n",
" arrowprops=dict(facecolor='black', shrink=0.05),\n",
" fontsize=16,\n",
" )\n",
"\n",
"best_val_rmse -= 0.03 # just to make the graph look better\n",
"plt.plot([0, n_epochs], [best_val_rmse, best_val_rmse], \"k:\", linewidth=2)\n",
"plt.plot(np.sqrt(val_errors), \"b-\", linewidth=3, label=\"Validation set\")\n",
"plt.plot(np.sqrt(train_errors), \"r--\", linewidth=2, label=\"Training set\")\n",
"plt.legend(loc=\"upper right\", fontsize=14)\n",
"plt.xlabel(\"Epoch\", fontsize=14)\n",
"plt.ylabel(\"RMSE\", fontsize=14)\n",
"save_fig(\"early_stopping_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from copy import deepcopy\n",
"\n",
"# max_iter and tol replace the removed n_iter argument (scikit-learn >= 0.19)\n",
"sgd_reg = SGDRegressor(max_iter=1, tol=None, warm_start=True, penalty=None,\n",
"                       learning_rate=\"constant\", eta0=0.0005, random_state=42)\n",
"\n",
"minimum_val_error = float(\"inf\")\n",
"best_epoch = None\n",
"best_model = None\n",
"for epoch in range(1000):\n",
"    sgd_reg.fit(X_train_poly_scaled, y_train) # continues where it left off\n",
"    y_val_predict = sgd_reg.predict(X_val_poly_scaled)\n",
"    val_error = mean_squared_error(y_val, y_val_predict)\n",
"    if val_error < minimum_val_error:\n",
"        minimum_val_error = val_error\n",
"        best_epoch = epoch\n",
"        best_model = deepcopy(sgd_reg) # clone() would copy only the hyperparameters, not the fitted weights"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"best_epoch, best_model"
]
},
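{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"One caveat when keeping a snapshot of the best model: `sklearn.base.clone()` builds a new, unfitted estimator with the same hyperparameters, whereas `copy.deepcopy()` also keeps the learned weights. The cell below is a minimal sketch that is not from the book; it uses a tiny made-up dataset and illustrative `*_toy` names to show the difference:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"from copy import deepcopy\n",
"\n",
"import numpy as np\n",
"from sklearn.base import clone\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"X_toy = np.array([[0.], [1.], [2.]])  # hypothetical toy inputs\n",
"y_toy = np.array([1., 3., 5.])\n",
"fitted_toy = LinearRegression().fit(X_toy, y_toy)\n",
"\n",
"# clone() drops the fitted attributes, deepcopy() keeps them\n",
"hasattr(clone(fitted_toy), \"coef_\"), hasattr(deepcopy(fitted_toy), \"coef_\")"
]
},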
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"t1a, t1b, t2a, t2b = -1, 3, -1.5, 1.5\n",
"\n",
"# ignoring bias term\n",
"t1s = np.linspace(t1a, t1b, 500)\n",
"t2s = np.linspace(t2a, t2b, 500)\n",
"t1, t2 = np.meshgrid(t1s, t2s)\n",
"T = np.c_[t1.ravel(), t2.ravel()]\n",
"Xr = np.array([[-1, 1], [-0.3, -1], [1, 0.1]])\n",
"yr = 2 * Xr[:, :1] + 0.5 * Xr[:, 1:]\n",
"\n",
"J = (1/len(Xr) * np.sum((T.dot(Xr.T) - yr.T)**2, axis=1)).reshape(t1.shape)\n",
"\n",
"N1 = np.linalg.norm(T, ord=1, axis=1).reshape(t1.shape)\n",
"N2 = np.linalg.norm(T, ord=2, axis=1).reshape(t1.shape)\n",
"\n",
"t_min_idx = np.unravel_index(np.argmin(J), J.shape)\n",
"t1_min, t2_min = t1[t_min_idx], t2[t_min_idx]\n",
"\n",
"t_init = np.array([[0.25], [-1]])"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 48,
2016-09-27 16:39:16 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-09-27 16:39:16 +02:00
},
"outputs": [],
"source": [
"def bgd_path(theta, X, y, l1, l2, core = 1, eta = 0.1, n_iterations = 50):\n",
" path = [theta]\n",
" for iteration in range(n_iterations):\n",
" gradients = core * 2/len(X) * X.T.dot(X.dot(theta) - y) + l1 * np.sign(theta) + 2 * l2 * theta\n",
"\n",
" theta = theta - eta * gradients\n",
" path.append(theta)\n",
" return np.array(path)\n",
"\n",
"plt.figure(figsize=(12, 8))\n",
"for i, N, l1, l2, title in ((0, N1, 0.5, 0, \"Lasso\"), (1, N2, 0, 0.1, \"Ridge\")):\n",
" JR = J + l1 * N1 + l2 * N2**2\n",
" \n",
" tr_min_idx = np.unravel_index(np.argmin(JR), JR.shape)\n",
" t1r_min, t2r_min = t1[tr_min_idx], t2[tr_min_idx]\n",
"\n",
" levelsJ=(np.exp(np.linspace(0, 1, 20)) - 1) * (np.max(J) - np.min(J)) + np.min(J)\n",
" levelsJR=(np.exp(np.linspace(0, 1, 20)) - 1) * (np.max(JR) - np.min(JR)) + np.min(JR)\n",
" levelsN=np.linspace(0, np.max(N), 10)\n",
" \n",
" path_J = bgd_path(t_init, Xr, yr, l1=0, l2=0)\n",
" path_JR = bgd_path(t_init, Xr, yr, l1, l2)\n",
" path_N = bgd_path(t_init, Xr, yr, np.sign(l1)/3, np.sign(l2), core=0)\n",
"\n",
" plt.subplot(221 + i * 2)\n",
" plt.grid(True)\n",
" plt.axhline(y=0, color='k')\n",
" plt.axvline(x=0, color='k')\n",
" plt.contourf(t1, t2, J, levels=levelsJ, alpha=0.9)\n",
" plt.contour(t1, t2, N, levels=levelsN)\n",
" plt.plot(path_J[:, 0], path_J[:, 1], \"w-o\")\n",
" plt.plot(path_N[:, 0], path_N[:, 1], \"y-^\")\n",
" plt.plot(t1_min, t2_min, \"rs\")\n",
" plt.title(r\"$\\ell_{}$ penalty\".format(i + 1), fontsize=16)\n",
" plt.axis([t1a, t1b, t2a, t2b])\n",
"\n",
" plt.subplot(222 + i * 2)\n",
" plt.grid(True)\n",
" plt.axhline(y=0, color='k')\n",
" plt.axvline(x=0, color='k')\n",
" plt.contourf(t1, t2, JR, levels=levelsJR, alpha=0.9)\n",
" plt.plot(path_JR[:, 0], path_JR[:, 1], \"w-o\")\n",
" plt.plot(t1r_min, t2r_min, \"rs\")\n",
" plt.title(title, fontsize=16)\n",
" plt.axis([t1a, t1b, t2a, t2b])\n",
"\n",
"for subplot in (221, 223):\n",
" plt.subplot(subplot)\n",
" plt.ylabel(r\"$\\theta_2$\", fontsize=20, rotation=0)\n",
"\n",
"for subplot in (223, 224):\n",
" plt.subplot(subplot)\n",
" plt.xlabel(r\"$\\theta_1$\", fontsize=20)\n",
"\n",
"save_fig(\"lasso_vs_ridge_plot\")\n",
"plt.show()"
]
},
2016-05-22 16:01:18 +02:00
{
"cell_type": "markdown",
2017-02-17 11:51:26 +01:00
"metadata": {
"deletable": true,
"editable": true
},
2016-05-22 16:01:18 +02:00
"source": [
"# Logistic regression"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 49,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"t = np.linspace(-10, 10, 100)\n",
"sig = 1 / (1 + np.exp(-t))\n",
"plt.figure(figsize=(9, 3))\n",
"plt.plot([-10, 10], [0, 0], \"k-\")\n",
"plt.plot([-10, 10], [0.5, 0.5], \"k:\")\n",
"plt.plot([-10, 10], [1, 1], \"k:\")\n",
"plt.plot([0, 0], [-1.1, 1.1], \"k-\")\n",
"plt.plot(t, sig, \"b-\", linewidth=2, label=r\"$\\sigma(t) = \\frac{1}{1 + e^{-t}}$\")\n",
"plt.xlabel(\"t\")\n",
"plt.legend(loc=\"upper left\", fontsize=20)\n",
"plt.axis([-10, 10, -0.1, 1.1])\n",
"save_fig(\"logistic_function_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 50,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"iris = datasets.load_iris()\n",
"list(iris.keys())"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 51,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"print(iris.DESCR)"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 52,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"X = iris[\"data\"][:, 3:] # petal width\n",
2017-05-29 23:20:14 +02:00
"y = (iris[\"target\"] == 2).astype(np.int) # 1 if Iris-Virginica, else 0"
2016-05-22 16:01:18 +02:00
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 53,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
2017-05-29 23:20:14 +02:00
"from sklearn.linear_model import LogisticRegression\n",
2017-06-06 15:16:46 +02:00
"log_reg = LogisticRegression(random_state=42)\n",
2017-05-29 23:20:14 +02:00
"log_reg.fit(X, y)"
2016-05-22 16:01:18 +02:00
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 54,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
2017-05-29 23:20:14 +02:00
"X_new = np.linspace(0, 3, 1000).reshape(-1, 1)\n",
"y_proba = log_reg.predict_proba(X_new)\n",
"\n",
"plt.plot(X_new, y_proba[:, 1], \"g-\", linewidth=2, label=\"Iris-Virginica\")\n",
"plt.plot(X_new, y_proba[:, 0], \"b--\", linewidth=2, label=\"Not Iris-Virginica\")"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"The figure in the book actually is actually a bit fancier:"
]
},
2016-05-22 16:01:18 +02:00
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 55,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"X_new = np.linspace(0, 3, 1000).reshape(-1, 1)\n",
"y_proba = log_reg.predict_proba(X_new)\n",
"decision_boundary = X_new[y_proba[:, 1] >= 0.5][0]\n",
"\n",
"plt.figure(figsize=(8, 3))\n",
"plt.plot(X[y==0], y[y==0], \"bs\")\n",
"plt.plot(X[y==1], y[y==1], \"g^\")\n",
"plt.plot([decision_boundary, decision_boundary], [-1, 2], \"k:\", linewidth=2)\n",
"plt.plot(X_new, y_proba[:, 1], \"g-\", linewidth=2, label=\"Iris-Virginica\")\n",
"plt.plot(X_new, y_proba[:, 0], \"b--\", linewidth=2, label=\"Not Iris-Virginica\")\n",
"plt.text(decision_boundary+0.02, 0.15, \"Decision boundary\", fontsize=14, color=\"k\", ha=\"center\")\n",
"plt.arrow(decision_boundary, 0.08, -0.3, 0, head_width=0.05, head_length=0.1, fc='b', ec='b')\n",
"plt.arrow(decision_boundary, 0.92, 0.3, 0, head_width=0.05, head_length=0.1, fc='g', ec='g')\n",
"plt.xlabel(\"Petal width (cm)\", fontsize=14)\n",
"plt.ylabel(\"Probability\", fontsize=14)\n",
"plt.legend(loc=\"center left\", fontsize=14)\n",
"plt.axis([0, 3, -0.02, 1.02])\n",
"save_fig(\"logistic_regression_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"decision_boundary"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"log_reg.predict([[1.7], [1.5]])"
]
},
{
"cell_type": "code",
"execution_count": 58,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"X = iris[\"data\"][:, (2, 3)] # petal length, petal width\n",
"y = (iris[\"target\"] == 2).astype(np.int)\n",
"\n",
2017-06-06 15:16:46 +02:00
"log_reg = LogisticRegression(C=10**10, random_state=42)\n",
2016-05-22 16:01:18 +02:00
"log_reg.fit(X, y)\n",
"\n",
"x0, x1 = np.meshgrid(\n",
" np.linspace(2.9, 7, 500).reshape(-1, 1),\n",
" np.linspace(0.8, 2.7, 200).reshape(-1, 1),\n",
" )\n",
"X_new = np.c_[x0.ravel(), x1.ravel()]\n",
"\n",
"y_proba = log_reg.predict_proba(X_new)\n",
"\n",
"plt.figure(figsize=(10, 4))\n",
"plt.plot(X[y==0, 0], X[y==0, 1], \"bs\")\n",
"plt.plot(X[y==1, 0], X[y==1, 1], \"g^\")\n",
"\n",
"zz = y_proba[:, 1].reshape(x0.shape)\n",
"contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)\n",
"\n",
"\n",
"left_right = np.array([2.9, 7])\n",
"boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]\n",
"\n",
"plt.clabel(contour, inline=1, fontsize=12)\n",
"plt.plot(left_right, boundary, \"k--\", linewidth=3)\n",
"plt.text(3.5, 1.5, \"Not Iris-Virginica\", fontsize=14, color=\"b\", ha=\"center\")\n",
"plt.text(6.5, 2.3, \"Iris-Virginica\", fontsize=14, color=\"g\", ha=\"center\")\n",
"plt.xlabel(\"Petal length\", fontsize=14)\n",
"plt.ylabel(\"Petal width\", fontsize=14)\n",
"plt.axis([2.9, 7, 0.8, 2.7])\n",
"save_fig(\"logistic_regression_contour_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 59,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"X = iris[\"data\"][:, (2, 3)] # petal length, petal width\n",
"y = iris[\"target\"]\n",
"\n",
2017-06-06 15:16:46 +02:00
"softmax_reg = LogisticRegression(multi_class=\"multinomial\",solver=\"lbfgs\", C=10, random_state=42)\n",
2017-05-29 23:20:14 +02:00
"softmax_reg.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
2016-05-22 16:01:18 +02:00
"x0, x1 = np.meshgrid(\n",
" np.linspace(0, 8, 500).reshape(-1, 1),\n",
" np.linspace(0, 3.5, 200).reshape(-1, 1),\n",
" )\n",
"X_new = np.c_[x0.ravel(), x1.ravel()]\n",
"\n",
"\n",
"y_proba = softmax_reg.predict_proba(X_new)\n",
"y_predict = softmax_reg.predict(X_new)\n",
"\n",
"zz1 = y_proba[:, 1].reshape(x0.shape)\n",
"zz = y_predict.reshape(x0.shape)\n",
"\n",
"plt.figure(figsize=(10, 4))\n",
"plt.plot(X[y==2, 0], X[y==2, 1], \"g^\", label=\"Iris-Virginica\")\n",
2017-02-17 14:47:18 +01:00
"plt.plot(X[y==1, 0], X[y==1, 1], \"bs\", label=\"Iris-Versicolor\")\n",
2016-05-22 16:01:18 +02:00
"plt.plot(X[y==0, 0], X[y==0, 1], \"yo\", label=\"Iris-Setosa\")\n",
"\n",
"from matplotlib.colors import ListedColormap\n",
"custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])\n",
"\n",
"plt.contourf(x0, x1, zz, cmap=custom_cmap, linewidth=5)\n",
"contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)\n",
"plt.clabel(contour, inline=1, fontsize=12)\n",
"plt.xlabel(\"Petal length\", fontsize=14)\n",
"plt.ylabel(\"Petal width\", fontsize=14)\n",
"plt.legend(loc=\"center left\", fontsize=14)\n",
"plt.axis([0, 7, 0, 3.5])\n",
"save_fig(\"softmax_regression_contour_plot\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 61,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"softmax_reg.predict([[5, 2]])"
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 62,
2016-05-22 16:01:18 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": false,
"deletable": true,
"editable": true
2016-05-22 16:01:18 +02:00
},
"outputs": [],
"source": [
"softmax_reg.predict_proba([[5, 2]])"
]
2016-09-27 16:39:16 +02:00
},
{
"cell_type": "markdown",
2017-02-17 11:51:26 +01:00
"metadata": {
"deletable": true,
"editable": true
},
2016-09-27 16:39:16 +02:00
"source": [
"# Exercise solutions"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"## 1. to 11."
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"See appendix A."
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"## 12. Batch Gradient Descent with early stopping for Softmax Regression\n",
"(without using Scikit-Learn)"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Let's start by loading the data. We will just reuse the Iris dataset we loaded earlier."
]
},
{
"cell_type": "code",
"execution_count": 63,
2017-02-17 11:51:26 +01:00
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-02-17 11:51:26 +01:00
},
2017-05-29 23:20:14 +02:00
"outputs": [],
"source": [
"X = iris[\"data\"][:, (2, 3)] # petal length, petal width\n",
"y = iris[\"target\"]"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2016-09-27 16:39:16 +02:00
"source": [
2017-05-29 23:20:14 +02:00
"We need to add the bias term for every instance ($x_0 = 1$):"
2016-09-27 16:39:16 +02:00
]
},
{
"cell_type": "code",
2017-05-29 23:20:14 +02:00
"execution_count": 64,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"X_with_bias = np.c_[np.ones([len(X), 1]), X]"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"And let's set the random seed so the output of this exercise solution is reproducible:"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"np.random.seed(2042)"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"The easiest option to split the dataset into a training set, a validation set and a test set would be to use Scikit-Learn's `train_test_split()` function, but the point of this exercise is to try understand the algorithms by implementing them manually. So here is one possible implementation:"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"test_ratio = 0.2\n",
"validation_ratio = 0.2\n",
"total_size = len(X_with_bias)\n",
"\n",
"test_size = int(total_size * test_ratio)\n",
"validation_size = int(total_size * validation_ratio)\n",
"train_size = total_size - test_size - validation_size\n",
"\n",
"rnd_indices = np.random.permutation(total_size)\n",
"\n",
"X_train = X_with_bias[rnd_indices[:train_size]]\n",
"y_train = y[rnd_indices[:train_size]]\n",
"X_valid = X_with_bias[rnd_indices[train_size:-test_size]]\n",
"y_valid = y[rnd_indices[train_size:-test_size]]\n",
"X_test = X_with_bias[rnd_indices[-test_size:]]\n",
"y_test = y[rnd_indices[-test_size:]]"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"The targets are currently class indices (0, 1 or 2), but we need target class probabilities to train the Softmax Regression model. Each instance will have target class probabilities equal to 0.0 for all classes except for the target class which will have a probability of 1.0 (in other words, the vector of class probabilities for ay given instance is a one-hot vector). Let's write a small function to convert the vector of class indices into a matrix containing a one-hot vector for each instance:"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"def to_one_hot(y):\n",
" n_classes = y.max() + 1\n",
" m = len(y)\n",
" Y_one_hot = np.zeros((m, n_classes))\n",
" Y_one_hot[np.arange(m), y] = 1\n",
" return Y_one_hot"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Let's test this function on the first 10 instances:"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"y_train[:10]"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"to_one_hot(y_train[:10])"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Looks good, so let's create the target class probabilities matrix for the training set and the test set:"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"Y_train_one_hot = to_one_hot(y_train)\n",
"Y_valid_one_hot = to_one_hot(y_valid)\n",
"Y_test_one_hot = to_one_hot(y_test)"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Now let's implement the Softmax function. Recall that it is defined by the following equation:\n",
"\n",
"$\\sigma\\left(\\mathbf{s}(\\mathbf{x})\\right)_k = \\dfrac{\\exp\\left(s_k(\\mathbf{x})\\right)}{\\sum\\limits_{j=1}^{K}{\\exp\\left(s_j(\\mathbf{x})\\right)}}$"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"def softmax(logits):\n",
" exps = np.exp(logits)\n",
" exp_sums = np.sum(exps, axis=1, keepdims=True)\n",
" return exps / exp_sums"
]
},
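{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"One caveat with the implementation above: `np.exp()` overflows for large logits, which turns the probabilities into `nan`. A common remedy, sketched below under the assumption that the logits form a 2D array with one row per instance, is to subtract each row's maximum before exponentiating. The name `stable_softmax` is hypothetical and is not used elsewhere in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def stable_softmax(logits):\n",
"    # Subtracting the row maximum leaves the result mathematically unchanged,\n",
"    # but the largest exponent becomes 0, so np.exp() cannot overflow.\n",
"    shifted = logits - np.max(logits, axis=1, keepdims=True)\n",
"    exps = np.exp(shifted)\n",
"    return exps / np.sum(exps, axis=1, keepdims=True)\n",
"\n",
"# Logits this large would overflow without the shift\n",
"big_logits = np.array([[1000.0, 1001.0, 1002.0]])\n",
"probas = stable_softmax(big_logits)\n",
"probas"
]
},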
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"We are almost ready to start training. Let's define the number of inputs and outputs:"
]
},
{
"cell_type": "code",
"execution_count": 72,
2016-09-27 16:39:16 +02:00
"metadata": {
2017-02-17 11:51:26 +01:00
"collapsed": true,
"deletable": true,
"editable": true
2016-09-27 16:39:16 +02:00
},
"outputs": [],
2017-05-29 23:20:14 +02:00
"source": [
"n_inputs = X_train.shape[1] # == 3 (2 features plus the bias term)\n",
"n_outputs = len(np.unique(y_train)) # == 3 (3 iris classes)"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Now here comes the hardest part: training! Theoretically, it's simple: it's just a matter of translating the math equations into Python code. But in practice, it can be quite tricky: in particular, it's easy to mix up the order of the terms, or the indices. You can even end up with code that looks like it's working but is actually not computing exactly the right thing. When unsure, you should write down the shape of each term in the equation and make sure the corresponding terms in your code match closely. It can also help to evaluate each term independently and print them out. The good news it that you won't have to do this everyday, since all this is well implemented by Scikit-Learn, but it will help you understand what's going on under the hood.\n",
"\n",
"So the equations we will need are the cost function:\n",
"\n",
"$J(\\mathbf{\\Theta}) =\n",
"- \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)}$\n",
"\n",
"And the equation for the gradients:\n",
"\n",
"$\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}}$\n",
"\n",
"Note that $\\log\\left(\\hat{p}_k^{(i)}\\right)$ may not be computable if $\\hat{p}_k^{(i)} = 0$. So we will add a tiny value $\\epsilon$ to $\\log\\left(\\hat{p}_k^{(i)}\\right)$ to avoid getting `nan` values."
]
},
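{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"One way to catch the kind of mistake described above is a numerical gradient check: compare the gradient equation against central finite differences of the cost function. The cell below is a minimal sketch on small synthetic data, not part of the exercise solution. All names in it (`cost`, `analytic_gradients`, the `*_demo` arrays) are hypothetical, and the parameters are kept small so that the $\\epsilon$ term has a negligible effect on the comparison."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def softmax(logits):\n",
"    exps = np.exp(logits - logits.max(axis=1, keepdims=True))\n",
"    return exps / exps.sum(axis=1, keepdims=True)\n",
"\n",
"def cost(Theta, X, Y_one_hot, epsilon=1e-7):\n",
"    # Cross entropy cost, with epsilon added inside the log\n",
"    Y_proba = softmax(X.dot(Theta))\n",
"    return -np.mean(np.sum(Y_one_hot * np.log(Y_proba + epsilon), axis=1))\n",
"\n",
"def analytic_gradients(Theta, X, Y_one_hot):\n",
"    # Gradient equation: 1/m * X^T (P_hat - Y), shape (n_inputs, n_outputs)\n",
"    return X.T.dot(softmax(X.dot(Theta)) - Y_one_hot) / len(X)\n",
"\n",
"rng = np.random.RandomState(0)\n",
"X_demo = np.c_[np.ones((20, 1)), rng.randn(20, 2)]  # bias column plus 2 features\n",
"Y_demo = np.eye(3)[rng.randint(0, 3, size=20)]      # random one-hot targets\n",
"Theta_demo = 0.1 * rng.randn(3, 3)\n",
"\n",
"# Central finite differences, one parameter at a time\n",
"h = 1e-5\n",
"numeric = np.zeros_like(Theta_demo)\n",
"for i in range(Theta_demo.shape[0]):\n",
"    for j in range(Theta_demo.shape[1]):\n",
"        step = np.zeros_like(Theta_demo)\n",
"        step[i, j] = h\n",
"        numeric[i, j] = (cost(Theta_demo + step, X_demo, Y_demo)\n",
"                         - cost(Theta_demo - step, X_demo, Y_demo)) / (2 * h)\n",
"\n",
"max_abs_diff = np.max(np.abs(numeric - analytic_gradients(Theta_demo, X_demo, Y_demo)))\n",
"max_abs_diff  # should be tiny if the gradient code matches the cost"
]
},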
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"eta = 0.01\n",
"n_iterations = 5001\n",
"m = len(X_train)\n",
"epsilon = 1e-7\n",
"\n",
"Theta = np.random.randn(n_inputs, n_outputs)\n",
"\n",
"for iteration in range(n_iterations):\n",
" logits = X_train.dot(Theta)\n",
" Y_proba = softmax(logits)\n",
" loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))\n",
" error = Y_proba - Y_train_one_hot\n",
" if iteration % 500 == 0:\n",
" print(iteration, loss)\n",
" gradients = 1/m * X_train.T.dot(error)\n",
" Theta = Theta - eta * gradients"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"And that's it! The Softmax model is trained. Let's look at the model parameters:"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"Theta"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Let's make predictions for the validation set and check the accuracy score:"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"logits = X_valid.dot(Theta)\n",
"Y_proba = softmax(logits)\n",
"y_predict = np.argmax(Y_proba, axis=1)\n",
"\n",
"accuracy_score = np.mean(y_predict == y_valid)\n",
"accuracy_score"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Well, this model looks pretty good. For the sake of the exercise, let's add a bit of $\\ell_2$ regularization. The following training code is similar to the one above, but the loss now has an additional $\\ell_2$ penalty, and the gradients have the proper additional term (note that we don't regularize the first element of `Theta` since this corresponds to the bias term). Also, let's try increasing the learning rate `eta`."
]
},
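{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"For reference, here is a sketch of the regularized cost and gradients that the next cell computes, in the notation used above. The index conventions are assumptions made for this note: $n$ is the number of features, $\\theta_j^{(k)}$ is the weight of feature $j$ for class $k$ (with $j = 0$ for the bias term), and $\\alpha$ is the regularization hyperparameter.\n",
"\n",
"$J(\\mathbf{\\Theta}) =\n",
"- \\dfrac{1}{m}\\sum\\limits_{i=1}^{m}\\sum\\limits_{k=1}^{K}{y_k^{(i)}\\log\\left(\\hat{p}_k^{(i)}\\right)} + \\dfrac{\\alpha}{2}\\sum\\limits_{k=1}^{K}\\sum\\limits_{j=1}^{n}{\\left(\\theta_j^{(k)}\\right)^2}$\n",
"\n",
"$\\nabla_{\\mathbf{\\theta}^{(k)}} \\, J(\\mathbf{\\Theta}) = \\dfrac{1}{m} \\sum\\limits_{i=1}^{m}{ \\left ( \\hat{p}^{(i)}_k - y_k^{(i)} \\right ) \\mathbf{x}^{(i)}} + \\alpha \\, \\tilde{\\mathbf{\\theta}}^{(k)}$\n",
"\n",
"where $\\tilde{\\mathbf{\\theta}}^{(k)}$ is $\\mathbf{\\theta}^{(k)}$ with its bias component $\\theta_0^{(k)}$ replaced by 0, so the bias terms are not penalized."
]
},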
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"eta = 0.1\n",
"n_iterations = 5001\n",
"m = len(X_train)\n",
"epsilon = 1e-7\n",
"alpha = 0.1 # regularization hyperparameter\n",
"\n",
"Theta = np.random.randn(n_inputs, n_outputs)\n",
"\n",
"for iteration in range(n_iterations):\n",
" logits = X_train.dot(Theta)\n",
" Y_proba = softmax(logits)\n",
" xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))\n",
" l2_loss = 1/2 * np.sum(np.square(Theta[1:]))\n",
" loss = xentropy_loss + alpha * l2_loss\n",
" error = Y_proba - Y_train_one_hot\n",
" if iteration % 500 == 0:\n",
" print(iteration, loss)\n",
" gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_inputs]), alpha * Theta[1:]]\n",
" Theta = Theta - eta * gradients"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Because of the additional $\\ell_2$ penalty, the loss seems greater than earlier, but perhaps this model will perform better? Let's find out:"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"logits = X_valid.dot(Theta)\n",
"Y_proba = softmax(logits)\n",
"y_predict = np.argmax(Y_proba, axis=1)\n",
"\n",
"accuracy_score = np.mean(y_predict == y_valid)\n",
"accuracy_score"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Cool, perfect accuracy! We probably just got lucky with this validation set, but still, it's pleasant."
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Now let's add early stopping. For this we just need to measure the loss on the validation set at every iteration and stop when the error starts growing."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"eta = 0.1 \n",
"n_iterations = 5001\n",
"m = len(X_train)\n",
"epsilon = 1e-7\n",
"alpha = 0.1 # regularization hyperparameter\n",
"best_loss = np.infty\n",
"\n",
"Theta = np.random.randn(n_inputs, n_outputs)\n",
"\n",
"for iteration in range(n_iterations):\n",
" logits = X_train.dot(Theta)\n",
" Y_proba = softmax(logits)\n",
" xentropy_loss = -np.mean(np.sum(Y_train_one_hot * np.log(Y_proba + epsilon), axis=1))\n",
" l2_loss = 1/2 * np.sum(np.square(Theta[1:]))\n",
" loss = xentropy_loss + alpha * l2_loss\n",
" error = Y_proba - Y_train_one_hot\n",
" gradients = 1/m * X_train.T.dot(error) + np.r_[np.zeros([1, n_inputs]), alpha * Theta[1:]]\n",
" Theta = Theta - eta * gradients\n",
"\n",
" logits = X_valid.dot(Theta)\n",
" Y_proba = softmax(logits)\n",
" xentropy_loss = -np.mean(np.sum(Y_valid_one_hot * np.log(Y_proba + epsilon), axis=1))\n",
" l2_loss = 1/2 * np.sum(np.square(Theta[1:]))\n",
" loss = xentropy_loss + alpha * l2_loss\n",
" if iteration % 500 == 0:\n",
" print(iteration, loss)\n",
" if loss < best_loss:\n",
" best_loss = loss\n",
" else:\n",
" print(iteration - 1, best_loss)\n",
" print(iteration, loss, \"early stopping!\")\n",
" break"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"logits = X_valid.dot(Theta)\n",
"Y_proba = softmax(logits)\n",
"y_predict = np.argmax(Y_proba, axis=1)\n",
"\n",
"accuracy_score = np.mean(y_predict == y_valid)\n",
"accuracy_score"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Still perfect, but faster."
]
},
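{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"The early stopping loop above halts at the first iteration where the validation loss fails to improve, and `Theta` then holds the parameters from one step past the best iteration. A possible variant, sketched below on synthetic data, keeps a copy of the best parameters and tolerates a few non-improving iterations before stopping. The function `fit_with_patience`, its `patience` parameter and the synthetic blobs are all assumptions made for illustration, and the $\\ell_2$ penalty is left out to keep the sketch short."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def softmax(logits):\n",
"    exps = np.exp(logits - logits.max(axis=1, keepdims=True))\n",
"    return exps / exps.sum(axis=1, keepdims=True)\n",
"\n",
"def xentropy(Theta, X, Y_one_hot, epsilon=1e-7):\n",
"    return -np.mean(np.sum(Y_one_hot * np.log(softmax(X.dot(Theta)) + epsilon), axis=1))\n",
"\n",
"def fit_with_patience(X_tr, Y_tr, X_va, Y_va, eta=0.1, n_iterations=5001, patience=10, seed=0):\n",
"    rng = np.random.RandomState(seed)\n",
"    Theta = rng.randn(X_tr.shape[1], Y_tr.shape[1])\n",
"    best_loss, best_Theta, best_iteration = np.inf, Theta.copy(), -1\n",
"    for iteration in range(n_iterations):\n",
"        gradients = X_tr.T.dot(softmax(X_tr.dot(Theta)) - Y_tr) / len(X_tr)\n",
"        Theta = Theta - eta * gradients\n",
"        val_loss = xentropy(Theta, X_va, Y_va)\n",
"        if val_loss < best_loss:\n",
"            # snapshot the parameters that gave the lowest validation loss so far\n",
"            best_loss, best_Theta, best_iteration = val_loss, Theta.copy(), iteration\n",
"        elif iteration - best_iteration >= patience:\n",
"            break  # no improvement for `patience` iterations in a row\n",
"    return best_Theta, best_loss, best_iteration\n",
"\n",
"# Synthetic 3-class data: Gaussian blobs around three centers, plus a bias column\n",
"rng = np.random.RandomState(42)\n",
"centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])\n",
"labels = rng.randint(0, 3, size=150)\n",
"features = centers[labels] + rng.randn(150, 2)\n",
"X_all = np.c_[np.ones((150, 1)), features]\n",
"Y_all = np.eye(3)[labels]\n",
"\n",
"best_Theta, best_loss, best_iteration = fit_with_patience(\n",
"    X_all[:100], Y_all[:100], X_all[100:], Y_all[100:])\n",
"best_iteration, best_loss"
]
},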
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Now let's plot the model's predictions on the whole dataset:"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"x0, x1 = np.meshgrid(\n",
" np.linspace(0, 8, 500).reshape(-1, 1),\n",
" np.linspace(0, 3.5, 200).reshape(-1, 1),\n",
" )\n",
"X_new = np.c_[x0.ravel(), x1.ravel()]\n",
"X_new_with_bias = np.c_[np.ones([len(X_new), 1]), X_new]\n",
"\n",
"logits = X_new_with_bias.dot(Theta)\n",
"Y_proba = softmax(logits)\n",
"y_predict = np.argmax(Y_proba, axis=1)\n",
"\n",
"zz1 = Y_proba[:, 1].reshape(x0.shape)\n",
"zz = y_predict.reshape(x0.shape)\n",
"\n",
"plt.figure(figsize=(10, 4))\n",
"plt.plot(X[y==2, 0], X[y==2, 1], \"g^\", label=\"Iris-Virginica\")\n",
"plt.plot(X[y==1, 0], X[y==1, 1], \"bs\", label=\"Iris-Versicolor\")\n",
"plt.plot(X[y==0, 0], X[y==0, 1], \"yo\", label=\"Iris-Setosa\")\n",
"\n",
"from matplotlib.colors import ListedColormap\n",
"custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])\n",
"\n",
"plt.contourf(x0, x1, zz, cmap=custom_cmap, linewidth=5)\n",
"contour = plt.contour(x0, x1, zz1, cmap=plt.cm.brg)\n",
"plt.clabel(contour, inline=1, fontsize=12)\n",
"plt.xlabel(\"Petal length\", fontsize=14)\n",
"plt.ylabel(\"Petal width\", fontsize=14)\n",
"plt.legend(loc=\"upper left\", fontsize=14)\n",
"plt.axis([0, 7, 0, 3.5])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"And now let's measure the final model's accuracy on the test set:"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": false,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
"source": [
"logits = X_test.dot(Theta)\n",
"Y_proba = softmax(logits)\n",
"y_predict = np.argmax(Y_proba, axis=1)\n",
"\n",
"accuracy_score = np.mean(y_predict == y_test)\n",
"accuracy_score"
]
},
{
"cell_type": "markdown",
2017-06-06 15:16:46 +02:00
"metadata": {
"deletable": true,
"editable": true
},
2017-05-29 23:20:14 +02:00
"source": [
"Our perfect model turns out to have slight imperfections. This variability is likely due to the very small size of the dataset: depending on how you sample the training set, validation set and the test set, you can get quite different results. Try changing the random seed and running the code again a few times, you will see that the results will vary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
2017-06-06 15:16:46 +02:00
"collapsed": true,
"deletable": true,
"editable": true
2017-05-29 23:20:14 +02:00
},
"outputs": [],
2016-09-27 16:39:16 +02:00
"source": []
2016-05-22 16:01:18 +02:00
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
2017-05-29 23:20:14 +02:00
"version": "3.5.3"
2016-05-22 16:01:18 +02:00
},
2016-09-27 16:39:16 +02:00
"nav_menu": {},
2016-05-22 16:01:18 +02:00
"toc": {
2016-09-27 16:39:16 +02:00
"navigate_menu": true,
"number_sections": true,
"sideBar": true,
"threshold": 6,
2016-05-22 16:01:18 +02:00
"toc_cell": false,
2016-09-27 16:39:16 +02:00
"toc_section_display": "block",
2016-05-22 16:01:18 +02:00
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 0
}