Python Functions Tutorial: Working With Functions in Python, Part 1

In today's fast-paced IT world, it is always an advantage to have an edge over the others in terms of in-depth knowledge of a certain technology. Python is a widely used language and provides 'n' number of opportunities to enthusiastic learners.

In this Python functions post, the goal is to get you the expertise required to get started and work with functions using Python. I will be covering the following topics in this Python functions article:

A Visual Introduction to Gap Statistics

We have previously seen how to implement K-Means. However, the results of this algorithm strongly rely on the choice of the parameter K. In this post, we will see how to use Gap Statistics to pick K in an optimal way. The main idea of the methodology is to compare the clusters inertia on the data to cluster and a reference dataset. The optimal choice of K is given by k for which the gap between the two results is maximum. To illustrate this idea, let’s pick as reference dataset a uniformly distributed set of points and see the result of K-Means increasing K: 

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances
from sklearn.cluster import KMeans


reference = np.random.rand(100, 2)
plt.figure(figsize=(12, 3))
for k in range(1,6):
    kmeans = KMeans(n_clusters=k)
    a = kmeans.fit_predict(reference)
    plt.subplot(1,5,k)
    plt.scatter(reference[:, 0], reference[:, 1], c=a)
    plt.xlabel('k='+str(k))
plt.tight_layout()
plt.show()