Use the two functions you implemented to calculate the accuracy for every cluster and the whole algorithm, defined as above
Subject: Computer Science
Use the two functions you implemented to calculate the accuracy for every cluster and the whole algorithm, defined as above. Implement the following function in analysis.py:
def accuracy(data, labels, centroids):
    """
    Calculate the accuracy of the algorithm.
    You should use update_assignment and majority_count
    (that you previously implemented)

    Arguments:
        data: a list of lists representing all data points
        labels: a list of ints representing all data labels
        centroids: the centroid dictionary

    Returns: a float representing the accuracy of the algorithm
    """
I have already implemented:
def update_assignment(data, labels, centroids):
    closest = {}
    for centroid in centroids:
        closest[centroid] = []
    for label, point in zip(labels, data):
        centroid = assign_data(point, centroids)
        closest[centroid].append(label)
    return {centroid: points for centroid, points in closest.items()
            if len(points) > 0}

def majority_count(labels):
    maj = {}
    for label in labels:
        if label in maj.keys():
            maj[label] += 1
        else:
            maj[label] = 1
    v = list(maj.values())
    return max(v)
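Note that majority_count returns how many times the most common label occurs, not the label itself. A minimal sketch of the same behavior using collections.Counter is shown below; the name majority_count_counter and the choice to return 0 for an empty list are assumptions made for illustration, not part of the assignment.

```python
from collections import Counter


def majority_count_counter(labels):
    """Return the number of occurrences of the most common label.

    Hypothetical variant of majority_count; returning 0 for an empty
    list is an assumption (the original raises an error in that case).
    """
    if not labels:
        return 0
    # most_common(1) gives [(label, count)]; keep only the count
    return Counter(labels).most_common(1)[0][1]


cluster_labels = [3, 3, 7, 3, 7]
print(majority_count_counter(cluster_labels))  # 3 (label 3 occurs three times)
```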
The following script uses the assign_data and update_assignment functions specified in the question.
If the function fails one of your tests, please upload the test code analysis_tests.py and share a link with public access, so that the failing test case can be found and the code corrected.
The comments in the code explain each step.
Please comment if you have any issues with the code or its execution.
Step-by-step explanation
import math

def euclidean_distance(dp1, dp2):
    """Calculate the Euclidean distance between two data points.

    Arguments:
        dp1: a list of floats representing a data point
        dp2: a list of floats representing a data point

    Returns: the Euclidean distance between two data points
    """
    total = 0
    for i in range(len(dp1)):
        value = dp1[i] - dp2[i]
        total = total + value * value
    return math.sqrt(total)

def assign_data(data_point, centroids):
    # Smallest distance seen so far (named min_dist to avoid shadowing the built-in min)
    min_dist = float("inf")
    # Name of the closest centroid seen so far
    centroid = ""
    # Loop through each key-value pair in centroids
    for k, v in centroids.items():
        # Euclidean distance between this centroid and data_point
        dist = euclidean_distance(data_point, v)
        # Update min_dist and centroid if this centroid is closer
        if min_dist > dist:
            min_dist = dist
            centroid = k
    # Return the name of the closest centroid
    return centroid

def update_assignment(data, labels, centroids):
    closest = {}
    for centroid in centroids:
        closest[centroid] = []
    for label, point in zip(labels, data):
        centroid = assign_data(point, centroids)
        closest[centroid].append(label)
    return {centroid: points for centroid, points in closest.items()
            if len(points) > 0}

def majority_count(labels):
    maj = {}
    for label in labels:
        if label in maj.keys():
            maj[label] += 1
        else:
            maj[label] = 1
    v = list(maj.values())
    return max(v)

def accuracy(data, labels, centroids):
    """
    Calculate the accuracy of the algorithm.
    You should use update_assignment and majority_count
    (that you previously implemented)

    Arguments:
        data: a list of lists representing all data points
        labels: a list of ints representing all data labels
        centroids: the centroid dictionary

    Returns: a float representing the accuracy of the algorithm
    """
    centroid_label_dict = update_assignment(data, labels, centroids)
    # Total number of labels across all non-empty clusters
    total_labels = sum(len(cluster_labels)
                       for cluster_labels in centroid_label_dict.values())
    # Guard against division by zero when there is no data
    if total_labels == 0:
        return 0.0
    # Sum of the majority counts of every cluster. majority_count already
    # returns a count, so it must not be passed to list.count().
    majority_sum = sum(majority_count(cluster_labels)
                       for cluster_labels in centroid_label_dict.values())
    # Accuracy is the sum of majority counts divided by the total number of labels
    return majority_sum / total_labels
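To illustrate the calculation, here is a small self-contained sketch on toy data. It assumes that the accuracy of one cluster is its majority count divided by its size, and that the overall accuracy is the sum of majority counts divided by the total number of labels, as in the comments above. The helper cluster_accuracies, the centroid names, and the data values are hypothetical, and the supporting functions are compact stand-ins for the ones in the question.

```python
import math


def euclidean_distance(dp1, dp2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(dp1, dp2)))


def assign_data(data_point, centroids):
    # Name of the centroid closest to data_point
    return min(centroids,
               key=lambda name: euclidean_distance(data_point, centroids[name]))


def update_assignment(data, labels, centroids):
    closest = {name: [] for name in centroids}
    for label, point in zip(labels, data):
        closest[assign_data(point, centroids)].append(label)
    # Drop clusters that received no points
    return {name: assigned for name, assigned in closest.items() if assigned}


def majority_count(labels):
    # Size of the largest group of identical labels
    return max(labels.count(label) for label in set(labels))


def cluster_accuracies(data, labels, centroids):
    """Hypothetical helper: accuracy of each non-empty cluster."""
    assignment = update_assignment(data, labels, centroids)
    return {name: majority_count(assigned) / len(assigned)
            for name, assigned in assignment.items()}


def accuracy(data, labels, centroids):
    assignment = update_assignment(data, labels, centroids)
    total = sum(len(assigned) for assigned in assignment.values())
    return sum(majority_count(assigned) for assigned in assignment.values()) / total


# Toy data (assumed for illustration): two centroids, seven labelled points
centroids = {"centroid1": [0.0, 0.0], "centroid2": [10.0, 10.0]}
data = [[0, 1], [1, 0], [1, 1], [9, 9], [10, 9], [9, 10], [10, 10]]
labels = [0, 0, 1, 1, 1, 1, 0]

per_cluster = cluster_accuracies(data, labels, centroids)
overall = accuracy(data, labels, centroids)
print(per_cluster)  # centroid1: 2/3, centroid2: 3/4
print(overall)      # 5/7
```

Here centroid1 receives labels [0, 0, 1] (majority count 2) and centroid2 receives [1, 1, 1, 0] (majority count 3), so the overall accuracy is (2 + 3) / 7.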