Add some section headers

main
Aurélien Geron 2021-10-03 00:14:44 +13:00
parent 2bd68d6348
commit 6b821335c0
3 changed files with 239 additions and 26 deletions


@ -83,7 +83,14 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Get the data" "# Get the Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download the Data"
] ]
}, },
{ {
@ -132,6 +139,13 @@
" return pd.read_csv(csv_path)" " return pd.read_csv(csv_path)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Take a Quick Look at the Data Structure"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 5, "execution_count": 5,
@ -182,6 +196,13 @@
"plt.show()" "plt.show()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Test Set"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 10, "execution_count": 10,
@ -443,7 +464,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Discover and visualize the data to gain insights" "# Discover and Visualize the Data to Gain Insights"
] ]
}, },
{ {
@ -455,6 +476,13 @@
"housing = strat_train_set.copy()" "housing = strat_train_set.copy()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualizing Geographical Data"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 33, "execution_count": 33,
@ -540,6 +568,13 @@
"plt.show()" "plt.show()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Looking for Correlations"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 38, "execution_count": 38,
@ -585,6 +620,13 @@
"save_fig(\"income_vs_house_value_scatterplot\")" "save_fig(\"income_vs_house_value_scatterplot\")"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experimenting with Attribute Combinations"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 42, "execution_count": 42,
@ -631,7 +673,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Prepare the data for Machine Learning algorithms" "# Prepare the Data for Machine Learning Algorithms"
] ]
}, },
{ {
@ -644,6 +686,29 @@
"housing_labels = strat_train_set[\"median_house_value\"].copy()" "housing_labels = strat_train_set[\"median_house_value\"].copy()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Cleaning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The book lists three options:\n",
"\n",
"```python\n",
"housing.dropna(subset=[\"total_bedrooms\"]) # option 1\n",
"housing.drop(\"total_bedrooms\", axis=1) # option 2\n",
"median = housing[\"total_bedrooms\"].median() # option 3\n",
"housing[\"total_bedrooms\"].fillna(median, inplace=True)\n",
"```\n",
"\n",
"To demonstrate each of them, let's create a copy of the housing dataset that keeps only the rows containing at least one null value. This makes it easier to see exactly what each option does:"
]
},
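Review note: the three options described in the new "Data Cleaning" cell can be tried on a tiny frame. This is a minimal sketch, not the notebook's own code: the toy values and the names `sample_incomplete_rows`, `option1`, `option2`, and `option3` are assumptions made for illustration, and option 3 uses the non-mutating `DataFrame.fillna` with a dict instead of `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the housing dataset.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan],
})

# Keep only the rows that contain at least one null value.
sample_incomplete_rows = housing[housing.isnull().any(axis=1)]

# Option 1: drop the rows where total_bedrooms is missing.
option1 = sample_incomplete_rows.dropna(subset=["total_bedrooms"])

# Option 2: drop the whole total_bedrooms column.
option2 = sample_incomplete_rows.drop("total_bedrooms", axis=1)

# Option 3: fill missing values with the median computed on the full data.
median = housing["total_bedrooms"].median()
option3 = sample_incomplete_rows.fillna({"total_bedrooms": median})

print(len(option1), list(option2.columns), option3["total_bedrooms"].tolist())
```

On this sample, option 1 leaves no rows (every kept row is missing `total_bedrooms`), option 2 leaves only `total_rooms`, and option 3 keeps both rows with the median filled in.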
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 47, "execution_count": 47,
@ -815,6 +880,13 @@
"housing_tr.head()" "housing_tr.head()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Handling Text and Categorical Attributes"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@ -910,6 +982,13 @@
"cat_encoder.categories_" "cat_encoder.categories_"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Custom Transformers"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@ -985,6 +1064,13 @@
"housing_extra_attribs.head()" "housing_extra_attribs.head()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transformation Pipelines"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@ -1154,7 +1240,14 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Select and train a model " "# Select and Train a Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training and Evaluating on the Training Set"
] ]
}, },
{ {
@ -1269,7 +1362,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Fine-tune your model" "## Better Evaluation Using Cross-Validation"
] ]
}, },
{ {
@ -1382,6 +1475,20 @@
"svm_rmse" "svm_rmse"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Fine-Tune Your Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Grid Search"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 99, "execution_count": 99,
@ -1457,6 +1564,13 @@
"pd.DataFrame(grid_search.cv_results_)" "pd.DataFrame(grid_search.cv_results_)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Randomized Search"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 104, "execution_count": 104,
@ -1488,6 +1602,13 @@
" print(np.sqrt(-mean_score), params)" " print(np.sqrt(-mean_score), params)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Analyze the Best Models and Their Errors"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 106, "execution_count": 106,
@ -1512,6 +1633,13 @@
"sorted(zip(feature_importances, attributes), reverse=True)" "sorted(zip(feature_importances, attributes), reverse=True)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluate Your System on the Test Set"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 108, "execution_count": 108,


@ -245,7 +245,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Binary classifier" "# Training a Binary Classifier"
] ]
}, },
{ {
@ -296,6 +296,20 @@
"cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring=\"accuracy\")" "cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring=\"accuracy\")"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Performance Measures"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Measuring Accuracy Using Cross-Validation"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 18, "execution_count": 18,
@ -362,6 +376,13 @@
"* lastly, other things may prevent perfect reproducibility, such as Python dicts and sets whose order is not guaranteed to be stable across sessions, or the order of files in a directory which is also not guaranteed." "* lastly, other things may prevent perfect reproducibility, such as Python dicts and sets whose order is not guaranteed to be stable across sessions, or the order of files in a directory which is also not guaranteed."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Confusion Matrix"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 21, "execution_count": 21,
@ -394,6 +415,13 @@
"confusion_matrix(y_train_5, y_train_perfect_predictions)" "confusion_matrix(y_train_5, y_train_perfect_predictions)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Precision and Recall"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 24, "execution_count": 24,
@ -453,6 +481,13 @@
"cm[1, 1] / (cm[1, 1] + (cm[1, 0] + cm[0, 1]) / 2)" "cm[1, 1] / (cm[1, 1] + (cm[1, 0] + cm[0, 1]) / 2)"
] ]
}, },
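Review note: the context cell just above computes the F1 score directly from confusion-matrix counts. The sketch below checks that this form agrees with the harmonic mean of precision and recall. The counts are hypothetical, and the layout assumed is the one scikit-learn's `confusion_matrix` returns (rows are actual classes, columns are predicted classes).

```python
import numpy as np

# Hypothetical counts: rows = actual class, columns = predicted class.
cm = np.array([[50, 10],   # true negatives, false positives
               [5, 35]])   # false negatives, true positives

tp, fp, fn = cm[1, 1], cm[0, 1], cm[1, 0]

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 as the harmonic mean of precision and recall.
f1_harmonic = 2 * precision * recall / (precision + recall)

# Same quantity written directly in terms of the counts.
f1_direct = cm[1, 1] / (cm[1, 1] + (cm[1, 0] + cm[0, 1]) / 2)

print(precision, recall, f1_harmonic, f1_direct)
```

Both expressions reduce to TP / (TP + (FN + FP) / 2), so they should match up to floating-point rounding.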
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Precision/Recall Trade-off"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 30, "execution_count": 30,
@ -625,7 +660,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# ROC curves" "## The ROC Curve"
] ]
}, },
{ {
@ -757,7 +792,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Multiclass classification" "# Multiclass Classification"
] ]
}, },
{ {
@ -882,7 +917,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Error analysis" "# Error Analysis"
] ]
}, },
{ {
@ -969,7 +1004,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Multilabel classification" "# Multilabel Classification"
] ]
}, },
{ {
@ -1018,7 +1053,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Multioutput classification" "# Multioutput Classification"
] ]
}, },
{ {


@ -4,7 +4,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"**Chapter 4 Training Linear Models**" "**Chapter 4 Training Models**"
] ]
}, },
{ {
@ -89,7 +89,14 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Linear regression using the Normal Equation" "# Linear Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Normal Equation"
] ]
}, },
{ {
@ -243,7 +250,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Linear regression using batch gradient descent" "# Gradient Descent\n",
"## Batch Gradient Descent"
] ]
}, },
{ {
@ -330,7 +338,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Stochastic Gradient Descent" "## Stochastic Gradient Descent"
] ]
}, },
{ {
@ -416,7 +424,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Mini-batch gradient descent" "## Mini-batch gradient descent"
] ]
}, },
{ {
@ -494,7 +502,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Polynomial regression" "# Polynomial Regression"
] ]
}, },
{ {
@ -616,6 +624,13 @@
"plt.show()" "plt.show()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Learning Curves"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 35, "execution_count": 35,
@ -678,7 +693,14 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Regularized models" "# Regularized Linear Models"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ridge Regression"
] ]
}, },
{ {
@ -772,6 +794,13 @@
"sgd_reg.predict([[1.5]])" "sgd_reg.predict([[1.5]])"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lasso Regression"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 43, "execution_count": 43,
@ -803,6 +832,13 @@
"lasso_reg.predict([[1.5]])" "lasso_reg.predict([[1.5]])"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Elastic Net"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 45, "execution_count": 45,
@ -815,6 +851,13 @@
"elastic_net.predict([[1.5]])" "elastic_net.predict([[1.5]])"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Early Stopping"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 46, "execution_count": 46,
@ -829,13 +872,6 @@
"X_train, X_val, y_train, y_val = train_test_split(X[:50], y[:50].ravel(), test_size=0.5, random_state=10)" "X_train, X_val, y_train, y_val = train_test_split(X[:50], y[:50].ravel(), test_size=0.5, random_state=10)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Early stopping example:"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 47, "execution_count": 47,
@ -1029,7 +1065,14 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Logistic regression" "# Logistic Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Decision Boundaries"
] ]
}, },
{ {
@ -1166,6 +1209,13 @@
"log_reg.predict([[1.7], [1.5]])" "log_reg.predict([[1.7], [1.5]])"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Softmax Regression"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 62, "execution_count": 62,