Replace V with Vt and add a note about the error in ch 8

main
Aurélien Geron 2017-09-15 17:52:20 +02:00
parent 3d3b610634
commit d016b56672
1 changed file with 38 additions and 16 deletions
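The commit renames the third return value of `np.linalg.svd()` to `Vt` because NumPy returns the right singular vectors already transposed. Below is a minimal sketch of that convention. The small random dataset is invented for illustration and is not the notebook's `X`. The sketch builds the Sigma matrix from the 1-D array `s` and checks that `U.dot(S).dot(Vt)` rebuilds the centered data when the third output is used as-is.

```python
import numpy as np

# Hypothetical toy data (illustration only): 60 samples, 3 features
rng = np.random.RandomState(42)
X = rng.randn(60, 3)
X_centered = X - X.mean(axis=0)

# The third output is V transposed, so it is named Vt rather than V
U, s, Vt = np.linalg.svd(X_centered)
print(U.shape, s.shape, Vt.shape)  # (60, 60) (3,) (3, 3)

# Sigma has the same shape as X_centered, with the singular values on its diagonal
n = X_centered.shape[1]
S = np.zeros(X_centered.shape)
S[:n, :n] = np.diag(s)

# X_centered = U . Sigma . V^T, so Vt is used directly, without transposing it again
print(np.allclose(X_centered, U.dot(S).dot(Vt)))  # True
```

The old cells already used the third output directly (`V.T[:, 0]`, `.dot(V)`). The rename therefore does not change any results. It only makes the variable name match what the variable actually holds.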


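The corrected Equation 8-1 places the principal components c1, ..., cn in the columns of V. In code, that means the columns of `Vt.T`, or equivalently the rows of `Vt`. The sketch below checks this on another invented dataset, which again is not the notebook's data. It shows three things:

* the components are orthonormal;
* projecting onto the first two components gives the 2D data (compare the `W2` cell in the diff);
* multiplying back by `Vt[:2, :]` (compare the `X3D_inv_using_svd` cell) yields the rank-2 truncated SVD of the centered data.

```python
import numpy as np

# Hypothetical toy 3D dataset, stretched so that it lies close to a plane (illustration only)
rng = np.random.RandomState(0)
X = rng.randn(100, 3).dot(np.diag([3.0, 1.0, 0.1]))
X_centered = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X_centered)
V = Vt.T  # columns of V are the principal components c1, c2, c3 (Equation 8-1)

# The principal components are orthonormal: V^T V = I
print(np.allclose(V.T.dot(V), np.eye(3)))  # True

# Project onto the first two principal components
W2 = V[:, :2]
X2D = X_centered.dot(W2)

# Map back to 3D: multiplying by Vt[:2, :] (which equals W2.T) gives the rank-2 SVD approximation
X3D_inv = X2D.dot(Vt[:2, :])
rank2 = U[:, :2].dot(np.diag(s[:2])).dot(Vt[:2, :])
print(np.allclose(X3D_inv, rank2))  # True
```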
@ -101,6 +101,22 @@
"## PCA using SVD decomposition"
]
},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Note: NumPy's `svd()` function returns `U`, `s` and `Vt`, where `Vt` is equal to $\\mathbf{V}^T$, the transpose of the matrix $\\mathbf{V}$. Earlier versions of the book mistakenly said that it returned `V` instead of `Vt`. Also, Equation 8-1 should actually contain $\\mathbf{V}$ instead of $\\mathbf{V}^T$, like this:\n",
+"\n",
+"$$\n",
+"\\mathbf{V} =\n",
+"\\begin{pmatrix}\n",
+" \\mid & \\mid & & \\mid \\\\\n",
+" \\mathbf{c_1} & \\mathbf{c_2} & \\cdots & \\mathbf{c_n} \\\\\n",
+" \\mid & \\mid & & \\mid\n",
+"\\end{pmatrix}\n",
+"$$"
+]
+},
{
"cell_type": "code",
"execution_count": 3,
@ -110,9 +126,9 @@
"outputs": [],
"source": [
"X_centered = X - X.mean(axis=0)\n",
-"U, s, V = np.linalg.svd(X_centered)\n",
-"c1 = V.T[:, 0]\n",
-"c2 = V.T[:, 1]"
+"U, s, Vt = np.linalg.svd(X_centered)\n",
+"c1 = Vt.T[:, 0]\n",
+"c2 = Vt.T[:, 1]"
]
},
{
@ -135,7 +151,7 @@
"metadata": {},
"outputs": [],
"source": [
-"np.allclose(X_centered, U.dot(S).dot(V))"
+"np.allclose(X_centered, U.dot(S).dot(Vt))"
]
},
{
@ -146,7 +162,7 @@
},
"outputs": [],
"source": [
-"W2 = V.T[:, :2]\n",
+"W2 = Vt.T[:, :2]\n",
"X2D = X_centered.dot(W2)"
]
},
@ -290,7 +306,7 @@
},
"outputs": [],
"source": [
-"X3D_inv_using_svd = X2D_using_svd.dot(V[:2, :])"
+"X3D_inv_using_svd = X2D_using_svd.dot(Vt[:2, :])"
]
},
{
@ -338,7 +354,7 @@
"metadata": {},
"outputs": [],
"source": [
-"V[:2]"
+"Vt[:2]"
]
},
{
@ -1673,7 +1689,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Oh no! Training is actually 3 times slower now! How can that be? Well, as we saw in this chapter, dimensionality reduction does not always lead to faster training time: it depends on the dataset, the model and the training algorithm. See figure 8-6 (the `manifold_decision_boundary_plot*` plots above). If you try a softmax classifier instead of a random forest classifier, you will find that training time is reduced by a factor of 3 when using PCA. Actually, we will do this in a second, but first let's check the precision of the new random forest classifier."
+"Oh no! Training is actually more than twice as slow now! How can that be? Well, as we saw in this chapter, dimensionality reduction does not always lead to faster training time: it depends on the dataset, the model and the training algorithm. See Figure 8-6 (the `manifold_decision_boundary_plot*` plots above). If you try a softmax classifier instead of a random forest classifier, you will find that training time is reduced by a factor of 3 when using PCA. Actually, we will do this in a second, but first let's check the precision of the new random forest classifier."
]
},
{
@ -1824,7 +1840,9 @@
{
"cell_type": "code",
"execution_count": 88,
-"metadata": {},
+"metadata": {
+"collapsed": true
+},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_mldata\n",
@ -1842,7 +1860,9 @@
{
"cell_type": "code",
"execution_count": 89,
-"metadata": {},
+"metadata": {
+"collapsed": true
+},
"outputs": [],
"source": [
"np.random.seed(42)\n",
@ -1864,7 +1884,9 @@
{
"cell_type": "code",
"execution_count": 90,
-"metadata": {},
+"metadata": {
+"collapsed": true
+},
"outputs": [],
"source": [
"from sklearn.manifold import TSNE\n",
@ -2303,21 +2325,21 @@
],
"metadata": {
"kernelspec": {
-"display_name": "Python 2",
+"display_name": "Python 3",
"language": "python",
-"name": "python2"
+"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
-"version": 2
+"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
-"pygments_lexer": "ipython2",
-"version": "2.7.12"
+"pygments_lexer": "ipython3",
+"version": "3.5.2"
},
"nav_menu": {
"height": "352px",