Programming Project #2
Using Cluster Analysis to Place Cell Phone Towers

Code and Paper due Friday of 8th Week

Introduction

A cell phone company needs to establish its network by putting its towers in a particular region it has acquired. The location of these towers can be found by clustering (where data points are residence locations) so that all users receive optimum signal strength.

In a previous mini-lab, you learned how to perform a K-Means cluster analysis on a 2-dimensional data set, and the focus was on grouping similar data points together. In this project, you will also perform a K-Means cluster analysis, but the focus will be on the centers of the clusters. (You will be exploring questions such as: How many clusters are needed to meet certain requirements? Given a certain number of clusters, what are the minimum and maximum distances between points and their cluster-centers?)

As with the previous project, there will be two parts to this project:

Functioning code that runs your experiments, and
A written report describing your results.

Unlike the first project, there is only one deadline for this project. Since much of the code for this project was written in a previous mini-lab, the paper will be weighted more in final grade for this project.

The Data

For this project, you will perform a K-Means clustering on two different data sets. Instead of reading data from a file, you will generate two random sets of points to be used in your K-Means clustering program.

We will create two data sets to represent different size communities. The first data set will represent a small city with 100 homes uniformly distributed within a 10x10 square mile region. The second data set will represent a slightly larger city, with 1000 homes within a 20x20 square mile region, with more homes distributed inside the 10x10 square mile region around the city center.

To create a random set of points in Python, we can use the random.uniform function from numpy. If we assume that the city center is at (0,0), a city that is enclosed within a 10x10 square mile region would extend 5 miles north, south, east and west of the city center. To get a random set of 50 real numbers within these boundaries, we could use a statement such as:

city = np.random.uniform(low=-5.0, high=5.0, size=(50,2))

The result of this statement would be a set of 50 real-numbered points, such that the x and y coordinates are real number between -5 and 5.

To obtain a result that represents a city with a more concentrated population in the city center, we may create two uniform sets of points (where the high/low range of the inner city data set should be within the high/low range of the overall city), and then use the concatentate function from numpy. For example, if a and b are arrays as shown below, we can create a new array c which combines the two arrays a and b:

a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[7,8],[9,10]])
c = np.concatenate((a,b), axis=0)

The result of this code creates the array c that looks like:

[[1,2],[3,4],[5,6],[7,8],[9,10]]

Exercises and Experiments

Create the two data sets as specified above.

Run a K-Means analysis on each of the data sets with 5 clusters and 10 iterations. Note: you will need to update the range for the axis in the scatterplot to better represent your new data sets. Save the final scatterplots to be included in your report.
Use the larger data set to perform the following experiments:
If the cell phone company decides it can build 5 new towers, determine the maximum and minimum distance from any house to its closest tower.
Experiment with how the maximum and minimum distances vary depending on how many new towers can be built.
Determine the number of towers needed so that the distance from any house to a tower is no more than m miles. (What is a reasonable value for m? How does the number of towers change depending on the value of m?)
Determine how many towers are needed if each tower should be connected to at least x houses. (What is a reasonable value for x? How does the number of towers change depending on the value of x?)
(Optional challenge): Determine how many towers are necessary if no more than 10%, or 15%, or 20% (or whatever percent you think is appropriate) of the homes are more than 2 miles (or 3 mile or 4 miles, or whatever distance you think is appropriate) from a tower.
(Optional challenge) Come up with your own question to explore related to these data sets.

You may need to write additional functions to help answer the experiemtnal questions. In particular, it may be useful to write a function that takes a list of points in a cluster, along with the center point of that cluster as parameters, and returns the maximum and minimum distances between any point in that list and its center.

Reminder: You should make sure to include good comments with any new functions and code you write.

Submitting Your Code

When you are satisfied your code is working properly, submit your file via Kit. Your Python code will be graded on the basis of style as well as correctness. Your code should include appropriate comments. (Comments before functions, inside of functions, program description, etc.) Variable names should be descriptive.

The following scale will be used to grade the program:

Grading Criteria Points

Functionality 20

Style/Organization 5

Appropriate Comments 5

TOTAL 30

Grading Criteria	Points
Functionality	20
Style/Organization	5
Appropriate Comments	5
TOTAL	30

The Report

The final report should be a 3-5 page paper detailing your experiment(s) and results. It should include appropriate figures to illustrate your results. Grammar and spelling will count, as will the overall structure and flow of the paper. Your paper should be prepared in LaTeX. Minimally, it should include:

An introduction that describes the question(s) you are addressing, and why they are interesting.
A methods section that explains the experiments you conducted.
A results section that describes the results of your experiments. You should include figures with appropriate captions and axis labels.
A discussion section that describes the significance of your results. For this paper, you might want to address the implications of your results in helping the cell phone company to make a decision on how many towers to build, and where to place them.

In writing your paper, you can consider the rest of the class to be the target audience. This means that you do not need an extensive explanation of the basic premise of the paper, but you do need to clearly describe the specific question that you are addressing. Reports will be graded according to the following rubric:

Grading Criteria Points

Content
Do you have clearly articulated experimental questions?
Does your paper effectively address those questions? 10

Spelling, grammar, sentence structure 5

Word selection and tone
Your paper should use precise and unambiguous language. It should not have an informal tone. 5

Organization
Appropriately organized sections. Well structured paragraphs. 5

Figures
Figures should support the conclusions of your paper. They should include captions and axis labels. 3

BiblographyAny outside information should be cited. 2

TOTAL: 30

Grading Criteria	Points
Content Do you have clearly articulated experimental questions? Does your paper effectively address those questions?	10
Spelling, grammar, sentence structure	5
Word selection and tone Your paper should use precise and unambiguous language. It should not have an informal tone.	5
Organization Appropriately organized sections. Well structured paragraphs.	5
Figures Figures should support the conclusions of your paper. They should include captions and axis labels.	3
BiblographyAny outside information should be cited.	2
TOTAL:	30

Submitting Your Report

Your completed report should be printed and submitted in class, and should also be submitted on Kit.

Programming Project #2Using Cluster Analysis to Place Cell Phone Towers