Use the two functions you implemented to calculate the accuracy for every cluster and the whole algorithm, defined as above
Use the two functions you implemented to calculate the accuracy for every cluster and the whole algorithm, defined as above. Implement the following function in analysis.py:
def accuracy(data, labels, centroids):
    """
    Calculate the accuracy of the algorithm. You
    should use update_assignment and majority_count
    (that you previously implemented)
    Arguments:
        data: a list of lists representing all data points
        labels: a list of ints representing all data labels
        centroids: the centroid dictionary
    Returns: a float representing the accuracy of the algorithm
    """
I have already implemented:
def update_assignment(data, labels, centroids):
    closest = {}
    for centroid in centroids:
        closest[centroid] = []
    for label, point in zip(labels, data):
        centroid = assign_data(point, centroids)
        closest[centroid].append(label)
    return {centroid: points for centroid, points in closest.items()
            if len(points) > 0}
def majority_count(labels):
    maj = {}
    for label in labels:
        if label in maj.keys():
            maj[label] += 1
        else:
            maj[label] = 1
    v = list(maj.values())
    return max(v)
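As a quick check of what majority_count returns, here is a minimal sketch on a made-up list of labels (the values in cluster_labels are an assumption chosen for illustration, not data from the assignment). It suggests that the return value is the number of occurrences of the most common label, not the label itself:

```python
def majority_count(labels):
    # Count how often each label occurs
    maj = {}
    for label in labels:
        if label in maj:
            maj[label] += 1
        else:
            maj[label] = 1
    # Return the highest count, not the label that achieved it
    return max(maj.values())


# Hypothetical labels of the points assigned to one cluster
cluster_labels = [4, 4, 7, 4, 7]

count = majority_count(cluster_labels)   # label 4 occurs three times
share = count / len(cluster_labels)      # fraction of the cluster with the majority label
print(count, share)
```

Dividing the returned count by the size of the cluster gives that cluster's share of majority labels.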

The script below uses the assign_data and update_assignment functions specified in the question.
If the function fails one of your tests, please upload the test file analysis_tests.py and share a publicly accessible link,
so that the failing test case can be identified and the code corrected.
The comments in the code explain each step.
Please comment if you have any issues with the code or its execution.
Step-by-step explanation
import math
def euclidean_distance(dp1, dp2):
    """
    Calculate the Euclidean distance between two data points.
    Arguments:
        dp1: a list of floats representing a data point
        dp2: a list of floats representing a data point
    Returns: the Euclidean distance between two data points
    """
    total = 0
    for i in range(len(dp1)):
        value = dp1[i] - dp2[i]
        value = value * value
        total = total + value
    return math.sqrt(total)
def assign_data(data_point, centroids):
    # Smallest distance seen so far, starting at infinity
    # (named min_dist so the built-in min is not shadowed)
    min_dist = float("inf")
    # Name of the closest centroid seen so far
    closest = ""
    # Loop through each name/location pair in centroids
    for name, location in centroids.items():
        # Find the Euclidean distance between the centroid and data_point
        dist = euclidean_distance(data_point, location)
        # Update min_dist and closest if this centroid is nearer than any seen so far
        if dist < min_dist:
            min_dist = dist
            closest = name
    # Return the name of the closest centroid
    return closest
def update_assignment(data, labels, centroids):
    closest = {}
    for centroid in centroids:
        closest[centroid] = []
    for label, point in zip(labels, data):
        centroid = assign_data(point, centroids)
        closest[centroid].append(label)
    return {centroid: points for centroid, points in closest.items() if len(points) > 0}
def majority_count(labels):
    maj = {}
    for label in labels:
        if label in maj.keys():
            maj[label] += 1
        else:
            maj[label] = 1
    v = list(maj.values())
    return max(v)
def accuracy(data, labels, centroids):
    """
    Calculate the accuracy of the algorithm. You
    should use update_assignment and majority_count
    (that you previously implemented)
    Arguments:
        data: a list of lists representing all data points
        labels: a list of ints representing all data labels
        centroids: the centroid dictionary
    Returns: a float representing the accuracy of the algorithm
    """
    # Group the labels of the data points by their closest centroid
    centroid_label_dict = update_assignment(data, labels, centroids)
    # Total number of labels across all clusters
    total_labels = sum(len(cluster_labels) for cluster_labels in centroid_label_dict.values())
    # Sum of the majority counts over all clusters. majority_count already
    # returns how often the most common label occurs, so it is used directly.
    majority_sum = sum(majority_count(cluster_labels) for cluster_labels in centroid_label_dict.values())
    # Accuracy is the sum of majority counts divided by the total number of labels
    return majority_sum / total_labels
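The question also asks for the accuracy of every cluster, which the function above does not return on its own. The following is a minimal sketch of one way to get it, assuming the per-cluster accuracy is the majority count divided by the cluster size. The helper name cluster_accuracies, the centroid names, and the toy data are all assumptions made for illustration and are not part of the assignment. The helper functions are repeated in compact form so the block runs on its own:

```python
import math


def euclidean_distance(dp1, dp2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dp1, dp2)))


def assign_data(data_point, centroids):
    # Name of the centroid closest to data_point
    return min(centroids, key=lambda name: euclidean_distance(data_point, centroids[name]))


def update_assignment(data, labels, centroids):
    closest = {name: [] for name in centroids}
    for label, point in zip(labels, data):
        closest[assign_data(point, centroids)].append(label)
    return {name: assigned for name, assigned in closest.items() if assigned}


def majority_count(labels):
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values())


def cluster_accuracies(data, labels, centroids):
    """Hypothetical helper: majority count / cluster size for each non-empty cluster."""
    assignment = update_assignment(data, labels, centroids)
    return {name: majority_count(assigned) / len(assigned)
            for name, assigned in assignment.items()}


# Toy data (assumed for illustration): three points near the origin, two near (10, 10)
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]]
labels = [0, 0, 1, 1, 1]
centroids = {"centroid1": [0.0, 0.0], "centroid2": [10.0, 10.0]}

per_cluster = cluster_accuracies(data, labels, centroids)

# Overall accuracy: sum of majority counts divided by the number of labels
assignment = update_assignment(data, labels, centroids)
overall = sum(majority_count(a) for a in assignment.values()) / len(labels)
print(per_cluster, overall)
```

In this toy case centroid1 receives the labels [0, 0, 1] and centroid2 receives [1, 1], so the per-cluster accuracies are 2/3 and 1.0, and the overall accuracy is (2 + 2) / 5 = 0.8.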

