Can you make a Jupyter notebook for the problems below, please? I need it ASAP.

Problem 2:

Load the discrete version of the Breast Cancer Wisconsin sample dataset from the UCI Machine Learning Repository (file breast-cancer-wisconsin.data, which UCI lists as the "Original" dataset rather than the Diagnostic one) into Python using a pandas DataFrame. Induce a binary Decision Tree with a minimum of 2 instances in each leaf, no splitting of subsets with fewer than 5 instances, and a maximum tree depth of 2 (use the default Gini criterion). Calculate the Entropy, Gini, and Misclassification Error of the first split. What is the Information Gain? Which feature is selected for the first split, and what value determines the decision boundary?
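
A minimal notebook-cell sketch for Problem 2, assuming scikit-learn's DecisionTreeClassifier (its default criterion is Gini) and assuming the file still resolves at the classic UCI path in URL. The short column names are hypothetical labels for the ID, the nine attributes, and the class column described on the UCI page; rows with "?" in Bare Nuclei are dropped, and class 4 (malignant) is encoded as 1. The impurity functions follow the textbook definitions, with entropy in bits.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumed location of the discrete file on the classic UCI mirror; adjust if it has moved.
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/breast-cancer-wisconsin.data")
# Hypothetical short names: ID, nine cytology attributes, class (2 = benign, 4 = malignant).
COLUMNS = ["id", "clump_thickness", "cell_size", "cell_shape", "marginal_adhesion",
           "epithelial_size", "bare_nuclei", "bland_chromatin", "normal_nucleoli",
           "mitoses", "class"]


def load_discrete(source=URL):
    """Read the headerless CSV; '?' marks missing values, which are dropped here."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?").dropna()
    X = df.drop(columns=["id", "class"]).astype(int)
    y = (df["class"] == 4).astype(int)  # malignant -> 1, benign -> 0
    return X, y


def class_probs(y):
    """Class proportions for a 0/1 label array."""
    return np.bincount(y, minlength=2) / len(y)


def entropy(p):
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())


def gini(p):
    return float(1.0 - (p ** 2).sum())


def misclassification(p):
    return float(1.0 - p.max())


def split_report(X, y):
    """Fit the depth-2 tree and score its root split with all three impurity measures."""
    tree = DecisionTreeClassifier(criterion="gini", max_depth=2, min_samples_leaf=2,
                                  min_samples_split=5, random_state=0).fit(X, y)
    feat_idx = tree.tree_.feature[0]      # node 0 is the root
    threshold = tree.tree_.threshold[0]   # samples with value <= threshold go left
    y = np.asarray(y)
    left = X.iloc[:, feat_idx].to_numpy() <= threshold
    report = {"feature": X.columns[feat_idx], "threshold": float(threshold)}
    for name, f in [("entropy", entropy), ("gini", gini),
                    ("misclassification", misclassification)]:
        parent = f(class_probs(y))
        children = [f(class_probs(y[m])) for m in (left, ~left)]
        weights = [left.mean(), (~left).mean()]
        weighted = sum(w * c for w, c in zip(weights, children))
        # The entropy row's "gain" is the usual information gain; the other rows
        # report the analogous impurity decrease for that measure.
        report[name] = {"parent": parent, "children": children, "gain": parent - weighted}
    return tree, report
```

In the notebook, `tree, report = split_report(*load_discrete())` followed by `print(report)` gives the root feature, its threshold, and the parent and child values for each impurity measure; `report["entropy"]["gain"]` is the information gain.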

Problem 3:

Load the Breast Cancer Wisconsin (Diagnostic) sample dataset from the UCI Machine Learning Repository (the continuous version, wdbc.data) into Python using a pandas DataFrame. Induce the same binary Decision Tree as above (now on the continuous data), but perform a PCA dimensionality reduction beforehand. Using only the first principal component for the model fit, what are the F1 score, precision, and recall of the PCA-based single-factor model compared with the model fit on the original (continuous) data? Repeat using the first and second principal components. From the confusion matrix, what are the false positive (FP) and true positive (TP) counts, and the false positive rate (FPR) and true positive rate (TPR)? Is using the continuous data beneficial for the model in this case? If so, how?
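
A sketch for Problem 3, again assuming scikit-learn. Several choices here are assumptions the problem does not fix: a stratified 70/30 train/test split, standardizing the 30 features before PCA (without scaling, the large-valued area features tend to dominate the first component), and treating malignant (M) as the positive class. The assumed wdbc.data layout is ID, diagnosis, then 30 real-valued features with no header row, and the URL is assumed to still resolve.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumed location of the continuous file on the classic UCI mirror.
URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "breast-cancer-wisconsin/wdbc.data")


def load_wdbc(source=URL):
    """Column 0 = ID, column 1 = diagnosis (M/B), columns 2-31 = features."""
    df = pd.read_csv(source, header=None)
    X = df.iloc[:, 2:].to_numpy(dtype=float)
    y = (df.iloc[:, 1] == "M").astype(int).to_numpy()  # malignant = positive class
    return X, y


def make_tree():
    """Same settings as Problem 2: Gini, depth 2, >= 2 per leaf, no split below 5."""
    return DecisionTreeClassifier(max_depth=2, min_samples_leaf=2,
                                  min_samples_split=5, random_state=0)


def evaluate(model, X_train, X_test, y_train, y_test):
    pred = model.fit(X_train, y_train).predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
    return {"precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
            "TP": int(tp), "FP": int(fp),
            "TPR": float(tp / (tp + fn)), "FPR": float(fp / (fp + tn))}


def compare(X, y, test_size=0.3, seed=0):
    """Tree on the raw features vs. trees on the first 1 and first 2 principal components."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    results = {"original": evaluate(make_tree(), X_tr, X_te, y_tr, y_te)}
    for k in (1, 2):
        # Inside the pipeline, scaling and PCA are fit on the training split only.
        # Assumption: standardize first; drop StandardScaler() for PCA on raw units.
        model = make_pipeline(StandardScaler(), PCA(n_components=k), make_tree())
        results[f"pca_{k}"] = evaluate(model, X_tr, X_te, y_tr, y_te)
    return results
```

`compare(*load_wdbc())` returns one dictionary per model ("original", "pca_1", "pca_2") with precision, recall, F1, TP, FP, TPR, and FPR on the held-out split. Recall and TPR are the same quantity, so those two entries should match.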

Problem 4:

Simulate a binary classification dataset with a single feature using a mixture of normal distributions with NumPy. (Hint: generate two data frames, each holding the random values and a class label, and concatenate them.) The normal distribution parameters for np.random.normal should be (5, 2) and (-5, 2), i.e. means of 5 and -5 with a standard deviation of 2, for the two samples. Induce a binary Decision Tree with a maximum depth of 2, and obtain the threshold value of the feature in the first split. How does this value compare with the empirical distribution of the feature?
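
A sketch for Problem 4, assuming scikit-learn for the tree. It uses NumPy's Generator.normal, which draws from the same distribution as np.random.normal(loc, scale, size); the 500 samples per class and the seed are arbitrary choices. The helper compare_to_distribution (a hypothetical name) reports where the root threshold falls in the empirical CDF of the pooled feature, alongside the pooled median and the class means.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def simulate(n_per_class=500, seed=0):
    """Two labelled frames, N(5, 2) -> class 1 and N(-5, 2) -> class 0, concatenated."""
    rng = np.random.default_rng(seed)
    pos = pd.DataFrame({"x": rng.normal(5, 2, n_per_class), "label": 1})
    neg = pd.DataFrame({"x": rng.normal(-5, 2, n_per_class), "label": 0})
    return pd.concat([pos, neg], ignore_index=True)


def first_split_threshold(df):
    """Fit a depth-2 tree on the single feature and return the root threshold."""
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(df[["x"]], df["label"])
    return float(tree.tree_.threshold[0])  # node 0 is the root


def compare_to_distribution(df, t):
    x = df["x"]
    return {"threshold": t,
            "ecdf_at_threshold": float((x <= t).mean()),  # share of points left of the cut
            "pooled_median": float(x.median()),
            "class_means": df.groupby("label")["x"].mean().to_dict()}
```

Because the two components are symmetric around 0 and their means are five standard deviations apart, one would expect the root threshold to land near 0, close to the pooled median and the midpoint of the class means, with the empirical CDF at the threshold near 0.5.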
