Adding missing figure in chapter 02

main
Aurélien Geron 2017-06-08 14:23:33 +02:00
parent 74794da1de
commit 8935c61570
1 changed file with 148 additions and 118 deletions

@ -14,6 +14,13 @@
"*This notebook contains all the sample code and solutions to the exercises in chapter 2.*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: You may find small differences between the code outputs in the book and in these Jupyter notebooks. These differences are mostly due to the random nature of many training algorithms: although I have tried to make these notebooks' outputs as constant as possible, it is impossible to guarantee that they will produce exactly the same output on every platform. Also, some data structures (such as dictionaries) do not preserve item order. Finally, I fixed a few minor bugs (I added notes next to the affected cells), which led to slightly different results without changing the ideas presented in the book."
]
},
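The note above attributes most output differences to randomness. A minimal sketch of the usual mitigation, assuming the notebook draws its randomness from NumPy's global generator (the seed value 42 is an arbitrary choice here): re-seeding before a random call reproduces the same sequence on a given setup, although it cannot guarantee identical results across NumPy versions or platforms.

```python
import numpy as np

# Seeding NumPy's global generator makes it replay the same pseudo-random sequence,
# so shuffles and splits built on it repeat from run to run on the same setup.
np.random.seed(42)  # arbitrary seed value (assumption)
first = np.random.permutation(10)

np.random.seed(42)
second = np.random.permutation(10)

print((first == second).all())  # True: same seed, same permutation

# Since Python 3.7, dicts preserve insertion order; older versions did not
# guarantee it, which is one source of the ordering differences mentioned above.
d = {"b": 1, "a": 2}
print(list(d))
```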
{
"cell_type": "markdown",
"metadata": {
@ -408,6 +415,17 @@
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"housing[\"income_cat\"].hist()"
]
},
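The cell added by this commit plots a histogram of `income_cat`, but the diff does not show how that column is built. The sketch below assumes a construction (median income divided by 1.5, rounded up, and capped at category 5) and uses randomly generated incomes in a hypothetical stand-in DataFrame instead of the real dataset, so it only illustrates the shape of the code, not the book's figure.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data: only median_income matters here.
rng = np.random.RandomState(42)
housing = pd.DataFrame({"median_income": rng.uniform(0.5, 15.0, size=1000)})

# Assumed construction: divide by 1.5 to limit the number of categories,
# round up, then merge every category above 5 into category 5.
income_cat = np.ceil(housing["median_income"] / 1.5)
housing["income_cat"] = income_cat.where(income_cat < 5, 5.0)

# Same call as the added cell: a histogram of the income categories.
ax = housing["income_cat"].hist()
```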
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"deletable": true,
@ -425,7 +443,7 @@
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 25,
"metadata": {
"collapsed": false,
"deletable": true,
@ -438,7 +456,7 @@
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 26,
"metadata": {
"collapsed": false,
"deletable": true,
@ -462,7 +480,7 @@
},
{
"cell_type": "code",
"execution_count": 26,
"execution_count": 27,
"metadata": {
"collapsed": false,
"deletable": true,
@ -475,7 +493,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 28,
"metadata": {
"collapsed": false,
"deletable": true,
@ -499,7 +517,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 29,
"metadata": {
"collapsed": true,
"deletable": true,
@ -512,7 +530,7 @@
},
{
"cell_type": "code",
"execution_count": 29,
"execution_count": 30,
"metadata": {
"collapsed": false,
"deletable": true,
@ -526,7 +544,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 31,
"metadata": {
"collapsed": false,
"deletable": true,
@ -538,9 +556,16 @@
"save_fig(\"better_visualization_plot\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The argument `sharex=False` fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611). Thanks to Wilmer Arellano for pointing it out."
]
},
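To make the `sharex=False` workaround concrete, here is a sketch of the same scatter call on a hypothetical toy DataFrame that reuses the housing column names (random values, not the real data), rendered with the non-interactive Agg backend. Whether the x-axis values disappear without `sharex=False` depends on the pandas version, so treat this as an illustration of the call rather than a reproduction of the bug.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data with the same column names as the housing dataset.
rng = np.random.RandomState(42)
n = 200
housing = pd.DataFrame({
    "longitude": rng.uniform(-124.3, -114.3, n),
    "latitude": rng.uniform(32.5, 42.0, n),
    "population": rng.randint(100, 5000, n),
    "median_house_value": rng.uniform(15000, 500000, n),
})

# Marker size follows population, color follows median_house_value.
# sharex=False is the workaround for pandas issue #10611 described above.
ax = housing.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
                  s=housing["population"] / 100, label="population",
                  figsize=(10, 7), c="median_house_value", cmap="jet",
                  colorbar=True, sharex=False)
plt.legend()
```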
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 32,
"metadata": {
"collapsed": false,
"deletable": true,
@ -551,16 +576,14 @@
"housing.plot(kind=\"scatter\", x=\"longitude\", y=\"latitude\", alpha=0.4,\n",
" s=housing[\"population\"]/100, label=\"population\", figsize=(10,7),\n",
" c=\"median_house_value\", cmap=plt.get_cmap(\"jet\"), colorbar=True,\n",
" sharex=False) # sharex=False fixes a bug (temporary solution)\n",
" # See: https://github.com/pandas-dev/pandas/issues/10611\n",
" # Thanks to Wilmer Arellano for pointing it out.\n",
" sharex=False)\n",
"plt.legend()\n",
"save_fig(\"housing_prices_scatterplot\")"
]
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 33,
"metadata": {
"collapsed": false,
"deletable": true,
@ -592,7 +615,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 34,
"metadata": {
"collapsed": true,
"deletable": true,
@ -605,7 +628,7 @@
},
{
"cell_type": "code",
"execution_count": 34,
"execution_count": 35,
"metadata": {
"collapsed": false,
"deletable": true,
@ -618,7 +641,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 36,
"metadata": {
"collapsed": false,
"deletable": true,
@ -634,7 +657,7 @@
},
{
"cell_type": "code",
"execution_count": 36,
"execution_count": 37,
"metadata": {
"collapsed": false,
"deletable": true,
@ -652,7 +675,7 @@
},
{
"cell_type": "code",
"execution_count": 37,
"execution_count": 38,
"metadata": {
"collapsed": true,
"deletable": true,
@ -665,9 +688,16 @@
"housing[\"population_per_household\"]=housing[\"population\"]/housing[\"households\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: there was a bug in the previous cell, in the definition of the `rooms_per_household` attribute. This explains why the correlation value below differs slightly from the value in the book (unless you are reading the latest version)."
]
},
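The note does not say what the bug in `rooms_per_household` was. The sketch below assumes the intended definition is total rooms divided by households, mirroring the `population_per_household` line in the previous cell, and runs on a few hypothetical rows rather than the real data. Recomputing the correlation with `median_house_value` after adding the ratio features is how the effect of such a fix would show up.

```python
import pandas as pd

# Hypothetical toy rows using the dataset's column names (not real values).
housing = pd.DataFrame({
    "total_rooms": [1000.0, 2400.0, 1500.0, 3000.0],
    "population": [600.0, 1000.0, 900.0, 2000.0],
    "households": [200.0, 400.0, 300.0, 500.0],
    "median_house_value": [200000.0, 300000.0, 250000.0, 350000.0],
})

# Assumed definition: rooms per household = total rooms / number of households.
housing["rooms_per_household"] = housing["total_rooms"] / housing["households"]
housing["population_per_household"] = housing["population"] / housing["households"]

# Correlation of every attribute with the target, strongest first.
corr_matrix = housing.corr()
print(corr_matrix["median_house_value"].sort_values(ascending=False))
```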
{
"cell_type": "code",
"execution_count": 38,
"execution_count": 39,
"metadata": {
"collapsed": false,
"deletable": true,
@ -681,7 +711,7 @@
},
{
"cell_type": "code",
"execution_count": 39,
"execution_count": 40,
"metadata": {
"collapsed": false,
"deletable": true,
@ -697,7 +727,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": 41,
"metadata": {
"collapsed": false,
"deletable": true,
@ -720,7 +750,7 @@
},
{
"cell_type": "code",
"execution_count": 41,
"execution_count": 42,
"metadata": {
"collapsed": true,
"deletable": true,
@ -732,19 +762,6 @@
"housing_labels = strat_train_set[\"median_house_value\"].copy()"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"housing.iloc[21:24]"
]
},
{
"cell_type": "code",
"execution_count": 43,
@ -755,8 +772,7 @@
},
"outputs": [],
"source": [
"housing_copy = housing.copy().iloc[21:24]\n",
"housing_copy.dropna(subset=[\"total_bedrooms\"]) # option 1"
"housing.iloc[21:24]"
]
},
{
@ -770,7 +786,7 @@
"outputs": [],
"source": [
"housing_copy = housing.copy().iloc[21:24]\n",
"housing_copy.drop(\"total_bedrooms\", axis=1) # option 2"
"housing_copy.dropna(subset=[\"total_bedrooms\"]) # option 1"
]
},
{
@ -782,6 +798,20 @@
"editable": true
},
"outputs": [],
"source": [
"housing_copy = housing.copy().iloc[21:24]\n",
"housing_copy.drop(\"total_bedrooms\", axis=1) # option 2"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [],
"source": [
"housing_copy = housing.copy().iloc[21:24]\n",
"median = housing_copy[\"total_bedrooms\"].median()\n",
@ -791,7 +821,7 @@
},
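The cells above compare three ways of handling the missing `total_bedrooms` values. Here they are side by side on a hypothetical three-row frame. The third option's cell is cut off in this diff right after the median is computed; the sketch assumes it continues by filling the gaps with `fillna(median)`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; the middle row lacks total_bedrooms, like the
# incomplete rows the notebook inspects with housing.iloc[21:24].
housing = pd.DataFrame({
    "total_rooms": [1000.0, 2400.0, 1500.0],
    "total_bedrooms": [200.0, np.nan, 400.0],
})

# Option 1: drop the rows (districts) where total_bedrooms is missing.
option1 = housing.dropna(subset=["total_bedrooms"])

# Option 2: drop the whole total_bedrooms column.
option2 = housing.drop("total_bedrooms", axis=1)

# Option 3 (assumed continuation): fill the gaps with the column median,
# which pandas computes while skipping NaN values.
median = housing["total_bedrooms"].median()
option3 = housing.copy()
option3["total_bedrooms"] = option3["total_bedrooms"].fillna(median)
```

All three return new objects and leave `housing` unchanged. If you fill with the median, keep the value computed on the training set so the same one can be applied to the test set and to new data later.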
{
"cell_type": "code",
"execution_count": 46,
"execution_count": 47,
"metadata": {
"collapsed": false,
"deletable": true,
@ -804,7 +834,7 @@
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": 48,
"metadata": {
"collapsed": false,
"deletable": true,
@ -819,7 +849,7 @@
},
{
"cell_type": "code",
"execution_count": 48,
"execution_count": 49,
"metadata": {
"collapsed": false,
"deletable": true,
@ -841,7 +871,7 @@
},
{
"cell_type": "code",
"execution_count": 49,
"execution_count": 50,
"metadata": {
"collapsed": false,
"deletable": true,
@ -856,7 +886,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": 51,
"metadata": {
"collapsed": false,
"deletable": true,
@ -869,7 +899,7 @@
},
{
"cell_type": "code",
"execution_count": 51,
"execution_count": 52,
"metadata": {
"collapsed": false,
"deletable": true,
@ -882,7 +912,7 @@
},
{
"cell_type": "code",
"execution_count": 52,
"execution_count": 53,
"metadata": {
"collapsed": false,
"deletable": true,
@ -895,7 +925,7 @@
},
{
"cell_type": "code",
"execution_count": 53,
"execution_count": 54,
"metadata": {
"collapsed": false,
"deletable": true,
@ -908,7 +938,7 @@
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": 55,
"metadata": {
"collapsed": true,
"deletable": true,
@ -921,7 +951,7 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": 56,
"metadata": {
"collapsed": true,
"deletable": true,
@ -934,7 +964,7 @@
},
{
"cell_type": "code",
"execution_count": 56,
"execution_count": 57,
"metadata": {
"collapsed": false,
"deletable": true,
@ -947,7 +977,7 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": 58,
"metadata": {
"collapsed": false,
"deletable": true,
@ -960,7 +990,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": 59,
"metadata": {
"collapsed": false,
"deletable": true,
@ -974,7 +1004,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": 60,
"metadata": {
"collapsed": false,
"deletable": true,
@ -992,7 +1022,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": 61,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1005,7 +1035,7 @@
},
{
"cell_type": "code",
"execution_count": 61,
"execution_count": 62,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1022,7 +1052,7 @@
},
{
"cell_type": "code",
"execution_count": 62,
"execution_count": 63,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1035,7 +1065,7 @@
},
{
"cell_type": "code",
"execution_count": 63,
"execution_count": 64,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1052,7 +1082,7 @@
},
{
"cell_type": "code",
"execution_count": 64,
"execution_count": 65,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1085,7 +1115,7 @@
},
{
"cell_type": "code",
"execution_count": 65,
"execution_count": 66,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1099,7 +1129,7 @@
},
{
"cell_type": "code",
"execution_count": 66,
"execution_count": 67,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1121,7 +1151,7 @@
},
{
"cell_type": "code",
"execution_count": 67,
"execution_count": 68,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1134,7 +1164,7 @@
},
{
"cell_type": "code",
"execution_count": 68,
"execution_count": 69,
"metadata": {
"collapsed": true,
"deletable": true,
@ -1155,7 +1185,7 @@
},
{
"cell_type": "code",
"execution_count": 69,
"execution_count": 70,
"metadata": {
"collapsed": true,
"deletable": true,
@ -1181,7 +1211,7 @@
},
{
"cell_type": "code",
"execution_count": 70,
"execution_count": 71,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1199,7 +1229,7 @@
},
{
"cell_type": "code",
"execution_count": 71,
"execution_count": 72,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1213,7 +1243,7 @@
},
{
"cell_type": "code",
"execution_count": 72,
"execution_count": 73,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1236,7 +1266,7 @@
},
{
"cell_type": "code",
"execution_count": 73,
"execution_count": 74,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1252,7 +1282,7 @@
},
{
"cell_type": "code",
"execution_count": 74,
"execution_count": 75,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1270,7 +1300,7 @@
},
{
"cell_type": "code",
"execution_count": 75,
"execution_count": 76,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1283,7 +1313,7 @@
},
{
"cell_type": "code",
"execution_count": 76,
"execution_count": 77,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1296,7 +1326,7 @@
},
{
"cell_type": "code",
"execution_count": 77,
"execution_count": 78,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1314,7 +1344,7 @@
},
{
"cell_type": "code",
"execution_count": 78,
"execution_count": 79,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1330,7 +1360,7 @@
},
{
"cell_type": "code",
"execution_count": 79,
"execution_count": 80,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1346,7 +1376,7 @@
},
{
"cell_type": "code",
"execution_count": 80,
"execution_count": 81,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1372,7 +1402,7 @@
},
{
"cell_type": "code",
"execution_count": 81,
"execution_count": 82,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1389,7 +1419,7 @@
},
{
"cell_type": "code",
"execution_count": 82,
"execution_count": 83,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1407,7 +1437,7 @@
},
{
"cell_type": "code",
"execution_count": 83,
"execution_count": 84,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1423,7 +1453,7 @@
},
{
"cell_type": "code",
"execution_count": 84,
"execution_count": 85,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1439,7 +1469,7 @@
},
{
"cell_type": "code",
"execution_count": 85,
"execution_count": 86,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1455,7 +1485,7 @@
},
{
"cell_type": "code",
"execution_count": 86,
"execution_count": 87,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1473,7 +1503,7 @@
},
{
"cell_type": "code",
"execution_count": 87,
"execution_count": 88,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1487,7 +1517,7 @@
},
{
"cell_type": "code",
"execution_count": 88,
"execution_count": 89,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1507,7 +1537,7 @@
},
{
"cell_type": "code",
"execution_count": 89,
"execution_count": 90,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1530,7 +1560,7 @@
},
{
"cell_type": "code",
"execution_count": 90,
"execution_count": 91,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1543,7 +1573,7 @@
},
{
"cell_type": "code",
"execution_count": 91,
"execution_count": 92,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1556,7 +1586,7 @@
},
{
"cell_type": "code",
"execution_count": 92,
"execution_count": 93,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1571,7 +1601,7 @@
},
{
"cell_type": "code",
"execution_count": 93,
"execution_count": 94,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1584,7 +1614,7 @@
},
{
"cell_type": "code",
"execution_count": 94,
"execution_count": 95,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1608,7 +1638,7 @@
},
{
"cell_type": "code",
"execution_count": 95,
"execution_count": 96,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1623,7 +1653,7 @@
},
{
"cell_type": "code",
"execution_count": 96,
"execution_count": 97,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1637,7 +1667,7 @@
},
{
"cell_type": "code",
"execution_count": 97,
"execution_count": 98,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1653,7 +1683,7 @@
},
{
"cell_type": "code",
"execution_count": 98,
"execution_count": 99,
"metadata": {
"collapsed": true,
"deletable": true,
@ -1675,7 +1705,7 @@
},
{
"cell_type": "code",
"execution_count": 99,
"execution_count": 100,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1711,7 +1741,7 @@
},
{
"cell_type": "code",
"execution_count": 100,
"execution_count": 101,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1748,7 +1778,7 @@
},
{
"cell_type": "code",
"execution_count": 101,
"execution_count": 102,
"metadata": {
"collapsed": true,
"deletable": true,
@ -1761,7 +1791,7 @@
},
{
"cell_type": "code",
"execution_count": 102,
"execution_count": 103,
"metadata": {
"collapsed": true,
"deletable": true,
@ -1787,7 +1817,7 @@
},
{
"cell_type": "code",
"execution_count": 103,
"execution_count": 104,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1837,7 +1867,7 @@
},
{
"cell_type": "code",
"execution_count": 104,
"execution_count": 105,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1870,7 +1900,7 @@
},
{
"cell_type": "code",
"execution_count": 105,
"execution_count": 106,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1895,7 +1925,7 @@
},
{
"cell_type": "code",
"execution_count": 106,
"execution_count": 107,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1938,7 +1968,7 @@
},
{
"cell_type": "code",
"execution_count": 107,
"execution_count": 108,
"metadata": {
"collapsed": false,
"deletable": true,
@ -1978,7 +2008,7 @@
},
{
"cell_type": "code",
"execution_count": 108,
"execution_count": 109,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2003,7 +2033,7 @@
},
{
"cell_type": "code",
"execution_count": 109,
"execution_count": 110,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2036,7 +2066,7 @@
},
{
"cell_type": "code",
"execution_count": 110,
"execution_count": 111,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2068,7 +2098,7 @@
},
{
"cell_type": "code",
"execution_count": 111,
"execution_count": 112,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2120,7 +2150,7 @@
},
{
"cell_type": "code",
"execution_count": 112,
"execution_count": 113,
"metadata": {
"collapsed": true,
"deletable": true,
@ -2166,7 +2196,7 @@
},
{
"cell_type": "code",
"execution_count": 113,
"execution_count": 114,
"metadata": {
"collapsed": true,
"deletable": true,
@ -2189,7 +2219,7 @@
},
{
"cell_type": "code",
"execution_count": 114,
"execution_count": 115,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2203,7 +2233,7 @@
},
{
"cell_type": "code",
"execution_count": 115,
"execution_count": 116,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2226,7 +2256,7 @@
},
{
"cell_type": "code",
"execution_count": 116,
"execution_count": 117,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2249,7 +2279,7 @@
},
{
"cell_type": "code",
"execution_count": 117,
"execution_count": 118,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2265,7 +2295,7 @@
},
{
"cell_type": "code",
"execution_count": 118,
"execution_count": 119,
"metadata": {
"collapsed": true,
"deletable": true,
@ -2288,7 +2318,7 @@
},
{
"cell_type": "code",
"execution_count": 119,
"execution_count": 120,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2311,7 +2341,7 @@
},
{
"cell_type": "code",
"execution_count": 120,
"execution_count": 121,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2354,7 +2384,7 @@
},
{
"cell_type": "code",
"execution_count": 121,
"execution_count": 122,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2371,7 +2401,7 @@
},
{
"cell_type": "code",
"execution_count": 122,
"execution_count": 123,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2394,7 +2424,7 @@
},
{
"cell_type": "code",
"execution_count": 123,
"execution_count": 124,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2441,7 +2471,7 @@
},
{
"cell_type": "code",
"execution_count": 124,
"execution_count": 125,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2461,7 +2491,7 @@
},
{
"cell_type": "code",
"execution_count": 125,
"execution_count": 126,
"metadata": {
"collapsed": false,
"deletable": true,
@ -2484,7 +2514,7 @@
},
{
"cell_type": "code",
"execution_count": 126,
"execution_count": 127,
"metadata": {
"collapsed": false,
"deletable": true,