Use the two functions you implemented to calculate the accuracy for every cluster and the whole algorithm, defined as above

Use the two functions you implemented to calculate the accuracy for every cluster and the whole algorithm, defined as above. Implement the following function in analysis.py:

def accuracy(data, labels, centroids):
    """
    Calculate the accuracy of the algorithm. You 
    should use update_assignment and majority_count 
    (that you previously implemented)

    Arguments:
        data: a list of lists representing all data points
        labels: a list of ints representing all data labels
        centroids: the centroid dictionary

    Returns: a float representing the accuracy of the algorithm
    """

I have already implemented:

def update_assignment(data, labels, centroids):
    closest = {}
    for centroid in centroids:
        closest[centroid] = []
    for label, point in zip(labels, data):
        centroid = assign_data(point, centroids)
        closest[centroid].append(label)
    return {centroid: points for centroid, points in closest.items()
            if len(points) > 0}


def majority_count(labels):
    maj = {}
    for label in labels:
        if label in maj.keys():
            maj[label] += 1
        else:
            maj[label] = 1
    v = list(maj.values())
    return max(v)
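As written, majority_count returns how many times the most common label occurs, not which label that is, so its result should be used as a count. The sketch below restates it (using dict.get in place of the keys() check) and compares it with a collections.Counter equivalent; majority_count_counter is a hypothetical name, not part of the assignment.

```python
from collections import Counter


def majority_count(labels):
    """Return the number of occurrences of the most common label."""
    maj = {}
    for label in labels:
        # dict.get supplies 0 the first time a label is seen
        maj[label] = maj.get(label, 0) + 1
    return max(maj.values())


def majority_count_counter(labels):
    """Hypothetical Counter-based equivalent: most_common(1) yields [(label, count)]."""
    return Counter(labels).most_common(1)[0][1]


cluster = [2, 2, 7, 2, 7]
print(majority_count(cluster))          # 3 (label 2 appears three times)
print(majority_count_counter(cluster))  # 3
print(Counter(cluster).most_common(1))  # [(2, 3)]
```

If the label itself is ever needed, Counter(labels).most_common(1)[0][0] returns it.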

Answer Preview

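The following script builds on the functions from the question: it defines euclidean_distance and assign_data (update_assignment calls assign_data, but the question does not show it), repeats update_assignment and majority_count, and then implements accuracy.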

If the function fails in your tests, please upload the test file analysis_tests.py and share a publicly accessible link, so that the failing test case can be found and the code corrected.

Follow the comments in the code for an explanation.

Please comment if you have any issues with the code or its execution.

Step-by-step explanation

import math


def euclidean_distance(dp1, dp2):
  """Calculate the Euclidean distance between two data points.

  Arguments:
    dp1: a list of floats representing a data point
    dp2: a list of floats representing a data point

  Returns: the Euclidean distance between two data points
  """
  total = 0
  for i in range(len(dp1)):
    value = dp1[i] - dp2[i]
    value = value * value
    total = total + value
  return math.sqrt(total)

def assign_data(data_point, centroids):
  # Smallest distance seen so far; start at infinity
  # (named min_dist so the built-in min is not shadowed)
  min_dist = float("inf")
  # Key of the closest centroid found so far
  centroid = ""
  # Loop through each key-value pair in centroids
  for k, v in centroids.items():
    # Euclidean distance between this centroid and data_point
    dist = euclidean_distance(data_point, v)
    # Update min_dist and centroid if this distance is smaller than any seen so far
    if dist < min_dist:
      min_dist = dist
      centroid = k
  # Return the key of the closest centroid
  return centroid

def update_assignment(data, labels, centroids):
  closest = {}
  for centroid in centroids:
    closest[centroid] = []
  for label, point in zip(labels, data):
    centroid = assign_data(point, centroids)
    closest[centroid].append(label)
  return {centroid: points for centroid, points in closest.items() if len(points) > 0}

def majority_count(labels):
  maj = {}
  for label in labels:
    if label in maj.keys():
      maj[label] += 1
    else:
      maj[label] = 1
  v = list(maj.values())
  return max(v)

def accuracy(data, labels, centroids):
    """
    Calculate the accuracy of the algorithm. You
    should use update_assignment and majority_count
    (that you previously implemented)

    Arguments:
        data: a list of lists representing all data points
        labels: a list of ints representing all data labels
        centroids: the centroid dictionary

    Returns: a float representing the accuracy of the algorithm
    """
    # Map each non-empty centroid to the list of labels assigned to it
    centroid_label_dict = update_assignment(data, labels, centroids)
    # Total number of assigned labels
    total_labels = sum(len(cluster_labels) for cluster_labels in centroid_label_dict.values())
    # Sum of the majority counts of all clusters; majority_count already
    # returns a count (not a label), so its result is summed directly
    majority_sum = sum(majority_count(cluster_labels) for cluster_labels in centroid_label_dict.values())
    # Accuracy = sum of majority counts / total number of labels
    return majority_sum / total_labels
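The question also asks for the accuracy of every cluster, which the accuracy function above does not return. Assuming per-cluster accuracy means a cluster's majority count divided by its size, a sketch could look like the following. cluster_accuracy is a hypothetical name, and the helpers are compact restatements of the ones above (Python 3.8+ for math.dist) so that the block runs on its own.

```python
import math
from collections import Counter


def assign_data(data_point, centroids):
    """Key of the closest centroid (compact restatement)."""
    return min(centroids, key=lambda k: math.dist(data_point, centroids[k]))


def update_assignment(data, labels, centroids):
    """Map each non-empty centroid to the labels of its points."""
    closest = {centroid: [] for centroid in centroids}
    for label, point in zip(labels, data):
        closest[assign_data(point, centroids)].append(label)
    return {c: pts for c, pts in closest.items() if pts}


def majority_count(labels):
    """Occurrences of the most common label."""
    return Counter(labels).most_common(1)[0][1]


def cluster_accuracy(data, labels, centroids):
    """Hypothetical helper: majority_count / cluster size for each non-empty cluster."""
    clusters = update_assignment(data, labels, centroids)
    return {c: majority_count(pts) / len(pts) for c, pts in clusters.items()}


def accuracy(data, labels, centroids):
    """Overall accuracy: sum of majority counts over all assigned points."""
    clusters = update_assignment(data, labels, centroids)
    total = sum(len(pts) for pts in clusters.values())
    return sum(majority_count(pts) for pts in clusters.values()) / total


data = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [5.0, 5.0], [5.5, 5.0]]
labels = [0, 0, 1, 1, 1]
centroids = {"centroid1": [0.5, 0.0], "centroid2": [5.0, 5.0]}
print(cluster_accuracy(data, labels, centroids))
# {'centroid1': 0.6666666666666666, 'centroid2': 1.0}
print(accuracy(data, labels, centroids))  # 0.8
```

With these made-up points, the first cluster holds labels [0, 0, 1] and the second [1, 1], so the overall accuracy is (2 + 2) / 5.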

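The helper functions in the script can also be written more compactly with the standard library. The sketch below is an optional alternative, assuming Python 3.8+ for math.dist: assign_data uses min with a key function, and update_assignment_grouped (a hypothetical name) groups labels with collections.defaultdict, so a cluster only appears once a label is appended to it.

```python
import math
from collections import defaultdict


def euclidean_distance(dp1, dp2):
    """Euclidean distance between two equal-length points (Python 3.8+)."""
    return math.dist(dp1, dp2)


def assign_data(data_point, centroids):
    """Key of the closest centroid; ties go to the first key in dictionary order."""
    return min(centroids, key=lambda k: euclidean_distance(data_point, centroids[k]))


def update_assignment_grouped(data, labels, centroids):
    """Hypothetical defaultdict variant of update_assignment."""
    closest = defaultdict(list)
    for label, point in zip(labels, data):
        # A key is created only on its first append, so empty clusters never appear
        closest[assign_data(point, centroids)].append(label)
    return dict(closest)


data = [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0], [5.5, 5.0]]
labels = [0, 0, 1, 0]
centroids = {"centroid1": [0.0, 0.0], "centroid2": [5.0, 5.0], "centroid3": [20.0, 20.0]}
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
print(update_assignment_grouped(data, labels, centroids))
# {'centroid1': [0, 0], 'centroid2': [1, 0]}
```

Two behavioral differences from the script: min raises ValueError for an empty centroids dictionary where the original loop returns an empty string, and the grouped dictionary's keys follow the order in which clusters first receive a point rather than the order of centroids.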