From 50aeba2e675d930769850f247aa9110426b9d834 Mon Sep 17 00:00:00 2001
From: kaksat
Date: Mon, 6 Aug 2018 20:21:35 +0200
Subject: [PATCH 1/2] Correction of a formula for silhouette coefficient

Source: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html
---
 08_dimensionality_reduction.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/08_dimensionality_reduction.ipynb b/08_dimensionality_reduction.ipynb
index fc665c9..975d83b 100644
--- a/08_dimensionality_reduction.ipynb
+++ b/08_dimensionality_reduction.ipynb
@@ -2606,7 +2606,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Another approach is to look at the _silhouette score_, which is the mean _silhouette coefficient_ over all the instances. An instance's silhouette coefficient is equal to $(a - b)/\\max(a, b)$ where $a$ is the mean distance to the other instances in the same cluster (it is the _mean intra-cluster distance_), and $b$ is the _mean nearest-cluster distance_, that is the mean distance to the instances of the next closest cluster (defined as the one that minimizes $b$, excluding the instance's own cluster). The silhouette coefficient can vary between -1 and +1: a coefficient close to +1 means that the instance is well inside its own cluster and far from other clusters, while a coefficient close to 0 means that it is close to a cluster boundary, and finally a coefficient close to -1 means that the instance may have been assigned to the wrong cluster."
+ "Another approach is to look at the _silhouette score_, which is the mean _silhouette coefficient_ over all the instances. An instance's silhouette coefficient is equal to $(b - a)/\\max(a, b)$ where $a$ is the mean distance to the other instances in the same cluster (it is the _mean intra-cluster distance_), and $b$ is the _mean nearest-cluster distance_, that is the mean distance to the instances of the next closest cluster (defined as the one that minimizes $b$, excluding the instance's own cluster). The silhouette coefficient can vary between -1 and +1: a coefficient close to +1 means that the instance is well inside its own cluster and far from other clusters, while a coefficient close to 0 means that it is close to a cluster boundary, and finally a coefficient close to -1 means that the instance may have been assigned to the wrong cluster."
 ]
 },
 {

From b9269e720757fdb71426c8596033efcd9df833c3 Mon Sep 17 00:00:00 2001
From: kaksat
Date: Mon, 6 Aug 2018 21:57:40 +0200
Subject: [PATCH 2/2] Correction of a typo

Current version produces the following error:
AttributeError: module 'matplotlib.cm' has no attribute 'spectral'
---
 08_dimensionality_reduction.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/08_dimensionality_reduction.ipynb b/08_dimensionality_reduction.ipynb
index 975d83b..e5ef84c 100644
--- a/08_dimensionality_reduction.ipynb
+++ b/08_dimensionality_reduction.ipynb
@@ -2697,7 +2697,7 @@
 " coeffs = silhouette_coefficients[y_pred == i]\n",
 " coeffs.sort()\n",
 "\n",
- " color = matplotlib.cm.spectral(i / k)\n",
+ " color = matplotlib.cm.Spectral(i / k)\n",
 " plt.fill_betweenx(np.arange(pos, pos + len(coeffs)), 0, coeffs,\n",
 " facecolor=color, edgecolor=color, alpha=0.7)\n",
 " ticks.append(pos + len(coeffs) // 2)\n",
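
Note on PATCH 1/2: the sign of the numerator can be sanity-checked with a small pure-Python sketch of the formula as the corrected text states it, $(b - a)/\max(a, b)$. This is an illustration only and is not part of the notebook. The function name `silhouette_per_instance` and the toy points are hypothetical; the sketch assumes Euclidean distance, at least two clusters, and points that are not all identical, and it assigns 0 to an instance that is alone in its cluster.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_per_instance(points, labels):
    """Return (b - a) / max(a, b) for each point (hypothetical helper)."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    coeffs = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Assumed convention: an instance alone in its cluster scores 0.
            coeffs.append(0.0)
            continue
        # a: mean distance to the other instances of the same cluster
        # (the distance from the point to itself is 0, so it adds nothing).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the instances of any other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        coeffs.append((b - a) / max(a, b))
    return coeffs


points = [(0, 0), (0, 1), (5, 0), (5, 1)]
well_clustered = silhouette_per_instance(points, [0, 0, 1, 1])
mislabeled = silhouette_per_instance(points, [0, 1, 1, 1])
```

With the two tight, well-separated pairs, every coefficient in `well_clustered` is positive. In `mislabeled`, the point `(0, 1)` is grouped with the far pair and gets a negative coefficient, which matches the interpretation given in the markdown cell. With the old numerator $(a - b)$ both signs would be flipped.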