Mini-Lab: K-Means Clustering in 2 Dimensions

 


Introduction

The purpose of this lab is to become familiar with the algorithm for K-Means clustering with 2-dimensional data.



Exercises

  1. Download the file clusterML.py. This contains some functions that have already been started for you.

  2. Follow the comments to complete the code in the function testWhereFcn, which practices using np.where. When called with only a condition, np.where returns the indices of the array elements that satisfy that condition. For example,
        import numpy as np
        a = np.array([1, 2, 3, 4, 5, 6, 7])
        b = np.where(a < 3)
        print(b)

    prints (array([0, 1]),), which is a tuple holding one array of indices per dimension of a. Since a is one-dimensional, b[0] is the index array [0, 1] itself.

  3. Recall the Euclidean distance formula in two dimensions. Use this to complete the code in the distance function.

  4. Read through the comments in the code to understand how the initClusterCenters function works. This function chooses k random elements from the data array to become the initial centers of the clusters.

  5. The function labelForPt takes the list of cluster centers and a point, and determines which cluster center this point is closest to. Follow the comments in the code to complete this function.

  6. Follow the comments in the code to complete the determineLabels function. This function loops through all of the points in the data array and determines which cluster each point belongs to. (It calls the labelForPt function!)

  7. Read through the code in the extractCluster function and update the comments for that function to reflect what it does.

  8. In the recalculateCenters function, fill in the code that calculates the mean of the points in each cluster. These means become the new cluster centers.

  9. Run the main function with different numbers of clusters and iterations. Then add more detailed comments before the main function describing what it does.
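Steps 2 and 7 both involve selecting points by a condition. The following is a minimal sketch, assuming the data are stored as an n-by-2 NumPy array and the cluster labels as a length-n integer array. The helper points_in_cluster is hypothetical and is not necessarily how extractCluster in clusterML.py is written.

```python
import numpy as np

def points_in_cluster(data, labels, cluster_id):
    """Hypothetical helper: return the rows of `data` whose label equals cluster_id."""
    # With only a condition, np.where returns a tuple of index arrays,
    # one per dimension; labels is 1-D, so element [0] holds the indices.
    indices = np.where(labels == cluster_id)[0]
    return data[indices]

data = np.array([[1.0, 2.0], [8.0, 9.0], [1.5, 1.8], [9.0, 8.5]])
labels = np.array([0, 1, 0, 1])
print(points_in_cluster(data, labels, 0))
```

The boolean mask data[labels == cluster_id] selects the same rows without calling np.where; the np.where form is shown because it is the function step 2 practices.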
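For step 3, the Euclidean distance between the points (x1, y1) and (x2, y2) is sqrt((x1 - x2)^2 + (y1 - y2)^2). A minimal sketch, assuming each point is passed as an (x, y) pair; the actual signature of distance in clusterML.py may differ.

```python
import math

def distance(p, q):
    """Euclidean distance between two 2-D points given as (x, y) pairs."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return math.sqrt(dx * dx + dy * dy)

print(distance((0, 0), (3, 4)))  # prints 5.0
```

The standard library's math.hypot(dx, dy) computes the same value.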
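The idea in step 4, using k randomly chosen data points as the starting centers, can be sketched as follows, assuming data is an n-by-2 NumPy array. The name init_centers and the use of NumPy's Generator API (np.random.default_rng) are choices for this illustration; initClusterCenters in clusterML.py may select its points differently.

```python
import numpy as np

def init_centers(data, k, seed=None):
    """Pick k distinct rows of `data` at random to serve as the initial centers."""
    rng = np.random.default_rng(seed)
    # replace=False ensures no row index is chosen twice.
    chosen = rng.choice(len(data), size=k, replace=False)
    # Fancy indexing returns a new array, so later updates won't modify data.
    return data[chosen]

data = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0], [9.0, 0.0]])
centers = init_centers(data, 3, seed=42)
print(centers)
```

Passing a fixed seed makes runs reproducible, which can help when comparing results in step 9.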
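A sketch of the nearest-center search from step 5, assuming the centers are a list of (x, y) pairs and that the label returned is the index of the closest center. The name label_for_pt is a hypothetical stand-in for labelForPt.

```python
import math

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def label_for_pt(centers, pt):
    """Return the index of the center in `centers` that is closest to `pt`."""
    best_label = 0
    best_dist = distance(centers[0], pt)
    for i in range(1, len(centers)):
        d = distance(centers[i], pt)
        if d < best_dist:
            best_label, best_dist = i, d
    return best_label

centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(label_for_pt(centers, (1.0, 8.0)))  # prints 2
```

Because the comparison is strict, a point exactly halfway between two centers gets the lower index.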
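Step 6 applies the nearest-center rule to every point. This sketch assumes points and centers are (x, y) pairs and that labels are center indices; determine_labels and label_for_pt are hypothetical stand-ins for the lab's determineLabels and labelForPt.

```python
import math

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def label_for_pt(centers, pt):
    """Index of the nearest center (ties go to the lower index)."""
    dists = [distance(c, pt) for c in centers]
    return dists.index(min(dists))

def determine_labels(data, centers):
    """Return a list with the nearest-center label of every point in `data`."""
    return [label_for_pt(centers, pt) for pt in data]

data = [(0.0, 1.0), (9.0, 9.0), (1.0, 0.0), (10.0, 8.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
print(determine_labels(data, centers))  # prints [0, 1, 0, 1]
```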
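Step 8 replaces each center with the mean of the points assigned to it. A minimal sketch, assuming data is an n-by-2 NumPy array and labels is a length-n integer array of center indices; recalculate_centers is a hypothetical name. Keeping the old center when a cluster has no points is an assumed convention, and the lab's code may handle empty clusters differently.

```python
import numpy as np

def recalculate_centers(data, labels, old_centers):
    """Return new centers: the mean (x, y) of the points assigned to each cluster."""
    k = len(old_centers)
    # np.array makes a float copy, so old_centers itself is left unchanged.
    new_centers = np.array(old_centers, dtype=float)
    for j in range(k):
        members = data[labels == j]
        if len(members) > 0:
            # axis=0 averages the x column and the y column separately.
            new_centers[j] = members.mean(axis=0)
        # An empty cluster keeps its previous center (an assumed convention).
    return new_centers

data = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
print(recalculate_centers(data, labels, [[1.0, 1.0], [9.0, 9.0]]))
```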
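Step 9 runs the whole algorithm. As an illustration of what main presumably does (an assumption, since its code is not shown here), the following self-contained sketch alternates the assignment step and the update step for a fixed number of iterations. Unlike the lab's per-point functions, it computes all point-to-center distances at once with NumPy broadcasting; the function kmeans and the sample points are hypothetical.

```python
import numpy as np

def kmeans(data, k, iterations, seed=None):
    """Minimal 2-D K-means: returns (centers, labels) after a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    # Initialization: k distinct data points become the starting centers.
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iterations):
        # Assignment step: distance from every point to every center, shape (n, k).
        diffs = data[:, np.newaxis, :] - centers[np.newaxis, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=2))
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated groups of made-up points.
data = np.array([[0.0, 0.0], [0.5, 0.2], [0.2, 0.6],
                 [9.0, 9.0], [9.5, 8.8], [8.7, 9.4]])
centers, labels = kmeans(data, k=2, iterations=5, seed=0)
print(centers)
print(labels)
```

Trying several values of k and iterations on the same data, as step 9 asks, shows how the result depends on both. With a different seed, the clusters may also come out numbered differently.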

Submit