A cell phone company needs to establish its network by putting its towers in a particular region it has acquired. The location of these towers can be found by clustering (where data points are residence locations) so that all users receive optimum signal strength.
In a previous mini-lab, you learned how to perform a K-Means cluster analysis on a 2-dimensional data set, and the focus was on grouping similar data points together. In this project, you will also perform a K-Means cluster analysis, but the focus will be on the centers of the clusters. (You will be exploring questions such as: How many clusters are needed to meet certain requirements? Given a certain number of clusters, what are the minimum and maximum distances between points and their cluster-centers?)
As with the previous project, there will be two parts to this project:
For this project, you will perform a K-Means clustering on two different data sets. Instead of reading data from a file, you will generate two random sets of points to be used in your K-Means clustering program.
We will create two data sets to represent different size communities. The first data set will represent a small city with 100 homes uniformly distributed within a 10x10 square mile region. The second data set will represent a slightly larger city, with 1000 homes within a 20x20 square mile region, with more homes distributed inside the 10x10 square mile region around the city center.
To create a random set of points in Python, we can use the
random.uniform
function from numpy
. If we
assume that the city center is at (0,0), a city that is enclosed within
a 10x10 square mile region would extend 5 miles north, south, east and
west of the city center. To get a random set of 50 real numbers within
these boundaries, we could use a statement such as:
city = np.random.uniform(low=-5.0, high=5.0, size=(50,2))The result of this statement would be a set of 50 real-numbered points, such that the
x
and y
coordinates are real
number between -5 and 5.
To obtain a result that represents a city with a more concentrated
population in the city center, we may create two uniform sets of points
(where the high/low range of the inner city data set should be within
the high/low range of the overall city), and then use the
concatentate
function from numpy
. For example,
if a
and b
are arrays as shown below, we can
create a new array c
which combines the two arrays
a
and b
:
a = np.array([[1,2],[3,4],[5,6]]) b = np.array([[7,8],[9,10]]) c = np.concatenate((a,b), axis=0)The result of this code creates the array
c
that looks
like:
[[1,2],[3,4],[5,6],[7,8],[9,10]]
Use the larger data set to perform the following experiments:
5
new towers, determine the maximum and minimum distance from any
house to its closest tower.
m
miles.
(What is a reasonable value for m
? How does the number of towers
change depending on the value of m
?)
x
houses. (What is a reasonable
value for x
? How does the number of towers change
depending on the value of x
?)
You may need to write additional functions to help answer the experiemtnal questions. In particular, it may be useful to write a function that takes a list of points in a cluster, along with the center point of that cluster as parameters, and returns the maximum and minimum distances between any point in that list and its center.
Reminder: You should make sure to include good comments with any new functions and code you write.
When you are satisfied your code is working properly, submit your file via Kit. Your Python code will be graded on the basis of style as well as correctness. Your code should include appropriate comments. (Comments before functions, inside of functions, program description, etc.) Variable names should be descriptive.
The following scale will be used to grade the program:
Grading Criteria | Points |
---|---|
Functionality | 20 |
Style/Organization | 5 |
Appropriate Comments | 5 |
TOTAL | 30 |
The final report should be a 3-5 page paper detailing your experiment(s) and results. It should include appropriate figures to illustrate your results. Grammar and spelling will count, as will the overall structure and flow of the paper. Your paper should be prepared in LaTeX. Minimally, it should include:
In writing your paper, you can consider the rest of the class to be the target audience. This means that you do not need an extensive explanation of the basic premise of the paper, but you do need to clearly describe the specific question that you are addressing. Reports will be graded according to the following rubric:
Grading Criteria | Points |
---|---|
Content Do you have clearly articulated experimental questions? Does your paper effectively address those questions? | 10 |
Spelling, grammar, sentence structure | 5 |
Word selection and tone Your paper should use precise and unambiguous language. It should not have an informal tone. | 5 |
Organization Appropriately organized sections. Well structured paragraphs. | 5 |
Figures Figures should support the conclusions of your paper. They should include captions and axis labels. | 3 |
BiblographyAny outside information should be cited. | 2 |
TOTAL: | 30 |
Your completed report should be printed and submitted in class, and should also be submitted on Kit.