The golden ratio is the larger root of the equation
φ² - φ - 1 = 0.
Tips, Expertise, Articles and Advice from the Pro's for Your Website or Blog to Succeed
The golden ratio is the larger root of the equation
φ² - φ - 1 = 0.
We have previously seen how to implement K-Means. However, the results of this algorithm strongly rely on the choice of the parameter K. In this post, we will see how to use Gap Statistics to pick K in an optimal way. The main idea of the methodology is to compare the clusters inertia on the data to cluster and a reference dataset. The optimal choice of K is given by k for which the gap between the two results is maximum. To illustrate this idea, let’s pick as reference dataset a uniformly distributed set of points and see the result of K-Means increasing K:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances
from sklearn.cluster import KMeans
reference = np.random.rand(100, 2)
plt.figure(figsize=(12, 3))
for k in range(1,6):
kmeans = KMeans(n_clusters=k)
a = kmeans.fit_predict(reference)
plt.subplot(1,5,k)
plt.scatter(reference[:, 0], reference[:, 1], c=a)
plt.xlabel('k='+str(k))
plt.tight_layout()
plt.show()