diff --git a/02_end_to_end_machine_learning_project.ipynb b/02_end_to_end_machine_learning_project.ipynb index bb71e46..dae1e4a 100644 --- a/02_end_to_end_machine_learning_project.ipynb +++ b/02_end_to_end_machine_learning_project.ipynb @@ -14,6 +14,13 @@ "*This notebook contains all the sample code and solutions to the exercices in chapter 2.*" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note**: You may find little differences between the code outputs in the book and in these Jupyter notebooks: these slight differences are mostly due to the random nature of many training algorithms: although I have tried to make these notebooks' outputs as constant as possible, it is impossible to guarantee that they will produce the exact same output on every platform. Also, some data structures (such as dictionaries) do not preserve the item order. Finally, I fixed a few minor bugs (I added notes next to the concerned cells) which lead to slightly different results, without changing the ideas presented in the book." + ] + }, { "cell_type": "markdown", "metadata": { @@ -408,6 +415,17 @@ { "cell_type": "code", "execution_count": 23, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "housing[\"income_cat\"].hist()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, "metadata": { "collapsed": false, "deletable": true, @@ -425,7 +443,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 25, "metadata": { "collapsed": false, "deletable": true, @@ -438,7 +456,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 26, "metadata": { "collapsed": false, "deletable": true, @@ -462,7 +480,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 27, "metadata": { "collapsed": false, "deletable": true, @@ -475,7 +493,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 28, "metadata": { "collapsed": false, "deletable": true, @@ -499,7 +517,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 29, "metadata": { "collapsed": true, "deletable": true, @@ -512,7 +530,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 30, "metadata": { "collapsed": false, "deletable": true, @@ -526,7 +544,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 31, "metadata": { "collapsed": false, "deletable": true, @@ -538,9 +556,16 @@ "save_fig(\"better_visualization_plot\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The argument `sharex=False` fixes a display bug (the x-axis values and legend were not displayed). This is a temporary fix (see: https://github.com/pandas-dev/pandas/issues/10611). Thanks to Wilmer Arellano for pointing it out." + ] + }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 32, "metadata": { "collapsed": false, "deletable": true, @@ -551,16 +576,14 @@ "housing.plot(kind=\"scatter\", x=\"longitude\", y=\"latitude\", alpha=0.4,\n", " s=housing[\"population\"]/100, label=\"population\", figsize=(10,7),\n", " c=\"median_house_value\", cmap=plt.get_cmap(\"jet\"), colorbar=True,\n", - " sharex=False) # sharex=False fixes a bug (temporary solution)\n", - " # See: https://github.com/pandas-dev/pandas/issues/10611\n", - " # Thanks to Wilmer Arellano for pointing it out.\n", + " sharex=False)\n", "plt.legend()\n", "save_fig(\"housing_prices_scatterplot\")" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 33, "metadata": { "collapsed": false, "deletable": true, @@ -592,7 +615,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 34, "metadata": { "collapsed": true, "deletable": true, @@ -605,7 +628,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 35, "metadata": { "collapsed": false, "deletable": true, @@ -618,7 +641,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 36, "metadata": { "collapsed": false, "deletable": true, @@ -634,7 +657,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 37, "metadata": { "collapsed": false, "deletable": true, @@ -652,7 +675,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 38, "metadata": { "collapsed": true, "deletable": true, @@ -665,9 +688,16 @@ "housing[\"population_per_household\"]=housing[\"population\"]/housing[\"households\"]" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note: there was a bug in the previous cell, in the definition of the `rooms_per_household` attribute. This explains why the correlation value below differs slightly from the value in the book (unless you are reading the latest version)." + ] + }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 39, "metadata": { "collapsed": false, "deletable": true, @@ -681,7 +711,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 40, "metadata": { "collapsed": false, "deletable": true, @@ -697,7 +727,7 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 41, "metadata": { "collapsed": false, "deletable": true, @@ -720,7 +750,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 42, "metadata": { "collapsed": true, "deletable": true, @@ -732,19 +762,6 @@ "housing_labels = strat_train_set[\"median_house_value\"].copy()" ] }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": { - "collapsed": false, - "deletable": true, - "editable": true - }, - "outputs": [], - "source": [ - "housing.iloc[21:24]" - ] - }, { "cell_type": "code", "execution_count": 43, @@ -755,8 +772,7 @@ }, "outputs": [], "source": [ - "housing_copy = housing.copy().iloc[21:24]\n", - "housing_copy.dropna(subset=[\"total_bedrooms\"]) # option 1" + "housing.iloc[21:24]" ] }, { @@ -770,7 +786,7 @@ "outputs": [], "source": [ "housing_copy = housing.copy().iloc[21:24]\n", - "housing_copy.drop(\"total_bedrooms\", axis=1) # option 2" + "housing_copy.dropna(subset=[\"total_bedrooms\"]) # option 1" ] }, { @@ -782,6 +798,20 @@ "editable": true }, "outputs": [], + "source": [ + "housing_copy = housing.copy().iloc[21:24]\n", + "housing_copy.drop(\"total_bedrooms\", axis=1) # option 2" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": { + "collapsed": false, + "deletable": true, + "editable": true + }, + "outputs": [], "source": [ "housing_copy = housing.copy().iloc[21:24]\n", "median = housing_copy[\"total_bedrooms\"].median()\n", @@ -791,7 +821,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 47, "metadata": { "collapsed": false, "deletable": true, @@ -804,7 +834,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 48, "metadata": { "collapsed": false, "deletable": true, @@ -819,7 +849,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 49, "metadata": { "collapsed": false, "deletable": true, @@ -841,7 +871,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 50, "metadata": { "collapsed": false, "deletable": true, @@ -856,7 +886,7 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 51, "metadata": { "collapsed": false, "deletable": true, @@ -869,7 +899,7 @@ }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 52, "metadata": { "collapsed": false, "deletable": true, @@ -882,7 +912,7 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 53, "metadata": { "collapsed": false, "deletable": true, @@ -895,7 +925,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 54, "metadata": { "collapsed": false, "deletable": true, @@ -908,7 +938,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 55, "metadata": { "collapsed": true, "deletable": true, @@ -921,7 +951,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 56, "metadata": { "collapsed": true, "deletable": true, @@ -934,7 +964,7 @@ }, { "cell_type": "code", - "execution_count": 56, + "execution_count": 57, "metadata": { "collapsed": false, "deletable": true, @@ -947,7 +977,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 58, "metadata": { "collapsed": false, "deletable": true, @@ -960,7 +990,7 @@ }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 59, "metadata": { "collapsed": false, "deletable": true, @@ -974,7 +1004,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 60, "metadata": { "collapsed": false, "deletable": true, @@ -992,7 +1022,7 @@ }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 61, "metadata": { "collapsed": false, "deletable": true, @@ -1005,7 +1035,7 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 62, "metadata": { "collapsed": false, "deletable": true, @@ -1022,7 +1052,7 @@ }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 63, "metadata": { "collapsed": false, "deletable": true, @@ -1035,7 +1065,7 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 64, "metadata": { "collapsed": false, "deletable": true, @@ -1052,7 +1082,7 @@ }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 65, "metadata": { "collapsed": false, "deletable": true, @@ -1085,7 +1115,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 66, "metadata": { "collapsed": false, "deletable": true, @@ -1099,7 +1129,7 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 67, "metadata": { "collapsed": false, "deletable": true, @@ -1121,7 +1151,7 @@ }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 68, "metadata": { "collapsed": false, "deletable": true, @@ -1134,7 +1164,7 @@ }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 69, "metadata": { "collapsed": true, "deletable": true, @@ -1155,7 +1185,7 @@ }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 70, "metadata": { "collapsed": true, "deletable": true, @@ -1181,7 +1211,7 @@ }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 71, "metadata": { "collapsed": false, "deletable": true, @@ -1199,7 +1229,7 @@ }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 72, "metadata": { "collapsed": false, "deletable": true, @@ -1213,7 +1243,7 @@ }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 73, "metadata": { "collapsed": false, "deletable": true, @@ -1236,7 +1266,7 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 74, "metadata": { "collapsed": false, "deletable": true, @@ -1252,7 +1282,7 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 75, "metadata": { "collapsed": false, "deletable": true, @@ -1270,7 +1300,7 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 76, "metadata": { "collapsed": false, "deletable": true, @@ -1283,7 +1313,7 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 77, "metadata": { "collapsed": false, "deletable": true, @@ -1296,7 +1326,7 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 78, "metadata": { "collapsed": false, "deletable": true, @@ -1314,7 +1344,7 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 79, "metadata": { "collapsed": false, "deletable": true, @@ -1330,7 +1360,7 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 80, "metadata": { "collapsed": false, "deletable": true, @@ -1346,7 +1376,7 @@ }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 81, "metadata": { "collapsed": false, "deletable": true, @@ -1372,7 +1402,7 @@ }, { "cell_type": "code", - "execution_count": 81, + "execution_count": 82, "metadata": { "collapsed": false, "deletable": true, @@ -1389,7 +1419,7 @@ }, { "cell_type": "code", - "execution_count": 82, + "execution_count": 83, "metadata": { "collapsed": false, "deletable": true, @@ -1407,7 +1437,7 @@ }, { "cell_type": "code", - "execution_count": 83, + "execution_count": 84, "metadata": { "collapsed": false, "deletable": true, @@ -1423,7 +1453,7 @@ }, { "cell_type": "code", - "execution_count": 84, + "execution_count": 85, "metadata": { "collapsed": false, "deletable": true, @@ -1439,7 +1469,7 @@ }, { "cell_type": "code", - "execution_count": 85, + "execution_count": 86, "metadata": { "collapsed": false, "deletable": true, @@ -1455,7 +1485,7 @@ }, { "cell_type": "code", - "execution_count": 86, + "execution_count": 87, "metadata": { "collapsed": false, "deletable": true, @@ -1473,7 +1503,7 @@ }, { "cell_type": "code", - "execution_count": 87, + "execution_count": 88, "metadata": { "collapsed": false, "deletable": true, @@ -1487,7 +1517,7 @@ }, { "cell_type": "code", - "execution_count": 88, + "execution_count": 89, "metadata": { "collapsed": false, "deletable": true, @@ -1507,7 +1537,7 @@ }, { "cell_type": "code", - "execution_count": 89, + "execution_count": 90, "metadata": { "collapsed": false, "deletable": true, @@ -1530,7 +1560,7 @@ }, { "cell_type": "code", - "execution_count": 90, + "execution_count": 91, "metadata": { "collapsed": false, "deletable": true, @@ -1543,7 +1573,7 @@ }, { "cell_type": "code", - "execution_count": 91, + "execution_count": 92, "metadata": { "collapsed": false, "deletable": true, @@ -1556,7 +1586,7 @@ }, { "cell_type": "code", - "execution_count": 92, + "execution_count": 93, "metadata": { "collapsed": false, "deletable": true, @@ -1571,7 +1601,7 @@ }, { "cell_type": "code", - "execution_count": 93, + "execution_count": 94, "metadata": { "collapsed": false, "deletable": true, @@ -1584,7 +1614,7 @@ }, { "cell_type": "code", - "execution_count": 94, + "execution_count": 95, "metadata": { "collapsed": false, "deletable": true, @@ -1608,7 +1638,7 @@ }, { "cell_type": "code", - "execution_count": 95, + "execution_count": 96, "metadata": { "collapsed": false, "deletable": true, @@ -1623,7 +1653,7 @@ }, { "cell_type": "code", - "execution_count": 96, + "execution_count": 97, "metadata": { "collapsed": false, "deletable": true, @@ -1637,7 +1667,7 @@ }, { "cell_type": "code", - "execution_count": 97, + "execution_count": 98, "metadata": { "collapsed": false, "deletable": true, @@ -1653,7 +1683,7 @@ }, { "cell_type": "code", - "execution_count": 98, + "execution_count": 99, "metadata": { "collapsed": true, "deletable": true, @@ -1675,7 +1705,7 @@ }, { "cell_type": "code", - "execution_count": 99, + "execution_count": 100, "metadata": { "collapsed": false, "deletable": true, @@ -1711,7 +1741,7 @@ }, { "cell_type": "code", - "execution_count": 100, + "execution_count": 101, "metadata": { "collapsed": false, "deletable": true, @@ -1748,7 +1778,7 @@ }, { "cell_type": "code", - "execution_count": 101, + "execution_count": 102, "metadata": { "collapsed": true, "deletable": true, @@ -1761,7 +1791,7 @@ }, { "cell_type": "code", - "execution_count": 102, + "execution_count": 103, "metadata": { "collapsed": true, "deletable": true, @@ -1787,7 +1817,7 @@ }, { "cell_type": "code", - "execution_count": 103, + "execution_count": 104, "metadata": { "collapsed": false, "deletable": true, @@ -1837,7 +1867,7 @@ }, { "cell_type": "code", - "execution_count": 104, + "execution_count": 105, "metadata": { "collapsed": false, "deletable": true, @@ -1870,7 +1900,7 @@ }, { "cell_type": "code", - "execution_count": 105, + "execution_count": 106, "metadata": { "collapsed": false, "deletable": true, @@ -1895,7 +1925,7 @@ }, { "cell_type": "code", - "execution_count": 106, + "execution_count": 107, "metadata": { "collapsed": false, "deletable": true, @@ -1938,7 +1968,7 @@ }, { "cell_type": "code", - "execution_count": 107, + "execution_count": 108, "metadata": { "collapsed": false, "deletable": true, @@ -1978,7 +2008,7 @@ }, { "cell_type": "code", - "execution_count": 108, + "execution_count": 109, "metadata": { "collapsed": false, "deletable": true, @@ -2003,7 +2033,7 @@ }, { "cell_type": "code", - "execution_count": 109, + "execution_count": 110, "metadata": { "collapsed": false, "deletable": true, @@ -2036,7 +2066,7 @@ }, { "cell_type": "code", - "execution_count": 110, + "execution_count": 111, "metadata": { "collapsed": false, "deletable": true, @@ -2068,7 +2098,7 @@ }, { "cell_type": "code", - "execution_count": 111, + "execution_count": 112, "metadata": { "collapsed": false, "deletable": true, @@ -2120,7 +2150,7 @@ }, { "cell_type": "code", - "execution_count": 112, + "execution_count": 113, "metadata": { "collapsed": true, "deletable": true, @@ -2166,7 +2196,7 @@ }, { "cell_type": "code", - "execution_count": 113, + "execution_count": 114, "metadata": { "collapsed": true, "deletable": true, @@ -2189,7 +2219,7 @@ }, { "cell_type": "code", - "execution_count": 114, + "execution_count": 115, "metadata": { "collapsed": false, "deletable": true, @@ -2203,7 +2233,7 @@ }, { "cell_type": "code", - "execution_count": 115, + "execution_count": 116, "metadata": { "collapsed": false, "deletable": true, @@ -2226,7 +2256,7 @@ }, { "cell_type": "code", - "execution_count": 116, + "execution_count": 117, "metadata": { "collapsed": false, "deletable": true, @@ -2249,7 +2279,7 @@ }, { "cell_type": "code", - "execution_count": 117, + "execution_count": 118, "metadata": { "collapsed": false, "deletable": true, @@ -2265,7 +2295,7 @@ }, { "cell_type": "code", - "execution_count": 118, + "execution_count": 119, "metadata": { "collapsed": true, "deletable": true, @@ -2288,7 +2318,7 @@ }, { "cell_type": "code", - "execution_count": 119, + "execution_count": 120, "metadata": { "collapsed": false, "deletable": true, @@ -2311,7 +2341,7 @@ }, { "cell_type": "code", - "execution_count": 120, + "execution_count": 121, "metadata": { "collapsed": false, "deletable": true, @@ -2354,7 +2384,7 @@ }, { "cell_type": "code", - "execution_count": 121, + "execution_count": 122, "metadata": { "collapsed": false, "deletable": true, @@ -2371,7 +2401,7 @@ }, { "cell_type": "code", - "execution_count": 122, + "execution_count": 123, "metadata": { "collapsed": false, "deletable": true, @@ -2394,7 +2424,7 @@ }, { "cell_type": "code", - "execution_count": 123, + "execution_count": 124, "metadata": { "collapsed": false, "deletable": true, @@ -2441,7 +2471,7 @@ }, { "cell_type": "code", - "execution_count": 124, + "execution_count": 125, "metadata": { "collapsed": false, "deletable": true, @@ -2461,7 +2491,7 @@ }, { "cell_type": "code", - "execution_count": 125, + "execution_count": 126, "metadata": { "collapsed": false, "deletable": true, @@ -2484,7 +2514,7 @@ }, { "cell_type": "code", - "execution_count": 126, + "execution_count": 127, "metadata": { "collapsed": false, "deletable": true,