Elements of Machine Learning, WS 2022/2023
Jilles Vreeken and Aleksandar Bojchevski
Exercise Sheet #6: Trees and Forests and Supports etc.

Before solving the exercises, read the instructions on the course website.

• For each theoretical problem, submit a single pdf file that contains your answer to the respective problem. This file may be a scan of your (legible) handwriting.
• For each practical problem, submit a single zip file that contains
  – the completed jupyter notebook (.ipynb) file,
  – any necessary files required to reproduce your results, and
  – a pdf report generated from the jupyter notebook that shows all your results.
• For the bonus question, submit a single zip file that contains
  – a pdf file that includes your answers to the theoretical part,
  – the completed jupyter notebook (.ipynb) file for the practical component,
  – any necessary files required to reproduce your results, and
  – a pdf report generated from the jupyter notebook that shows your results.
• Every team member has to submit a signed Code of Conduct.

Problem 1 (T, 5 Points). Interpretability

In the lecture, you have seen the term "interpretability" come up to describe certain models.

(a) (1 Point) Describe what we mean by interpretability.

(b) (3 Points) Rank the following methods by their interpretability and explain your reasoning.
• Ridge Regression
• LASSO
• Generalized Linear Models
• Neural Networks
• Decision Trees
• Random Forests

(c) (1 Point) Let M be some model which you would have rated as highly non-interpretable in the previous exercise. You learn that every such model M can be equivalently represented by a decision tree. Does this change your opinion of the interpretability of M? If so, why? If not, why not?

Problem 2 (T, 10 Points). Trees and Splits

(a) (4 Points) Sketch a tree corresponding to the partition of the predictor space indicated in the figure below. The numbers inside the boxes indicate the mean of Y within each region.

[Figure: a partition of the (X1, X2) predictor space into rectangular regions, each annotated with the mean of Y in that region; the exact layout is not recoverable from the text extraction.]

(b) (3 Points) Create a diagram similar to the one provided in (a), using the tree illustrated below. You should divide the predictor space into the correct regions and indicate the mean for each region.

[Figure: a regression tree with internal splits X1 < 1, X1 < 0, X2 < 1, X2 < 2, X2 < 0, X1 < 2, X2 < 2 and leaf means -3.2, 9, 4, 42, -7.3, -9.1, 1, -0.5; the tree structure is not recoverable from the text extraction.]

(c) (3 Points) Create another equivalent tree representing exactly the same partition of the predictor space as the one discussed in (b), but with a different split at the root node.

Problem 3 (T, 10 Points). Linear and Support Vector Regression

(a) (4 Points) Show that the Ridge Regression problem (restated for reference after part (c)) is equivalent to

$$\min_{\beta_0,\,\beta,\,\xi_1,\dots,\xi_N,\,\tilde{\xi}_1,\dots,\tilde{\xi}_N} \;\; \frac{1}{2}\lVert\beta\rVert_2^2 + C \sum_{i=1}^{N} \left(\xi_i^2 + \tilde{\xi}_i^2\right)$$

$$\text{subject to} \quad \xi_i, \tilde{\xi}_i \ge 0, \qquad -\tilde{\xi}_i \le y_i - \beta_0 - \beta^\top x_i \le \xi_i \quad \text{for } i = 1, \dots, N.$$

(b) (4 Points) Based on the above, derive the explicit loss function $L$ minimized in Support Vector Regression for which the equivalent optimization objective is given by

$$\min_{\beta_0,\,\beta,\,\xi_1,\dots,\xi_N,\,\tilde{\xi}_1,\dots,\tilde{\xi}_N} \;\; \frac{1}{2}\lVert\beta\rVert_2^2 + C \sum_{i=1}^{N} \left(\xi_i + \tilde{\xi}_i\right)$$

$$\text{subject to} \quad \xi_i, \tilde{\xi}_i \ge 0, \qquad -\tilde{\xi}_i - \varepsilon \le y_i - \beta_0 - \beta^\top x_i \le \xi_i + \varepsilon \quad \text{for } i = 1, \dots, N.$$

(c) (2 Points) Explain why the optimization objective involving the 2N additional slack variables $\xi_i, \tilde{\xi}_i$ may be preferred for SVRs.
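For reference in part (a), the unconstrained Ridge Regression problem is taken here in its standard form; the exact scaling of the penalty term is an assumption and may differ from the lecture slides:

$$\min_{\beta_0,\,\beta} \;\; \sum_{i=1}^{N} \left(y_i - \beta_0 - \beta^\top x_i\right)^2 + \lambda \lVert\beta\rVert_2^2, \qquad \lambda > 0.$$

Relating $\lambda$ to the constant $C$ of the constrained formulation is part of the exercise.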
Problem 4 (P, 15 Points). Trees and Forests and Bags and Correlations

In this exercise, we will study how correlated the individual trees in Bagging and in Random Forests are.

(a) (1 Point) Load the data in train.csv and compute the correlation of each predictor variable $X_i$ with the target variable $y$.

(b) (2 Points) Use bagging to train $B = 100$ regression trees $T^b$ on the data $X$.

(c) (3 Points) Load the data in test.csv and compute the average correlation between the predictions $y^b_{\mathrm{pred}} = T^b(X_{\mathrm{test}})$ for the different trees $T^b$, $b = 1, \dots, 100$.

(d) (2 Points) Similarly, compute the average correlation between the residuals $y_{\mathrm{test}} - y^b_{\mathrm{pred}}$. Contrast the result with that derived in part (c) and explain which measure of correlation is more useful.

(e) (2 Points) For each $q \in \{0.2, 0.4, \dots, 1\}$, train a Random Forest regression with 100 trees on the training data, in which each tree uses only a fraction $q$ of all available predictors.

(f) (2 Points) Recompute the correlations from (d) for each of the random forests, and plot the results in a suitable manner. Explain what you see.

(g) (3 Points) Compute the variable importances for each variable in each forest trained in (e). Plot them against the correlations computed in (a). What do you see? Why?

Note: All relevant models can be fit using sklearn.ensemble.RandomForestRegressor; a minimal code sketch is given after Problem 5 below.

Problem 5 (Bonus). ¬ Heaping

In the lectures, you have seen model aggregation in terms of Bagging, Boosting and Random Forests. In this exercise, we try something a little different. Let $X, y$ be our available data, and let $f_1, \dots, f_K$ be functions (not necessarily trees) trained to predict $y$ from $X$.

(a) We want to use all models $f_j$ by predicting $y$ as $\hat{y} = \sum_{j=1}^{K} w_j f_j(x)$, where $\sum_{j=1}^{K} w_j = 1$ and all $w_j \ge 0$. Write down the optimization problem for learning the weights $w_j$. (A numerical sketch is given at the end of this sheet.)

(b) Explain what issues might arise in optimizing this objective.

(c) Unlike our above approach, a much simpler aggregation mechanism is often used: simply setting $\hat{y} = \frac{1}{K} \sum_{j=1}^{K} f_j(x)$, i.e. all $w_j = 1/K$. Explain under which conditions this would be a good choice, and under which conditions it would be a bad choice.

(d) What kind of regularizer could we add to the optimization problem from (a) to make all $w_j$ more similar to each other?

(e) In the above, we implicitly assumed that the same weights $w_j$ are suitable for all $x$. Explain under which conditions this would not be suitable.

(f) How would you learn appropriate weights $w_j(x)$ depending on $x$? What would you need to know about the functions $f_j$ to be able to do this?

(g) Explain how this approach differs from Boosting.
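The following is a minimal sketch of the workflow for Problem 4, referenced from the Note above. It is not a reference solution: the target column name y and the use of Pearson correlation are assumptions that may need adapting to the actual data files.

```python
# Minimal sketch for Problem 4 -- the column name "y" and the use of
# Pearson correlation are assumptions, not given in the problem.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def avg_pairwise_corr(rows):
    """Mean off-diagonal entry of the row-wise correlation matrix."""
    corr = np.corrcoef(rows)
    return corr[~np.eye(corr.shape[0], dtype=bool)].mean()

train, test = pd.read_csv("train.csv"), pd.read_csv("test.csv")
X_train, y_train = train.drop(columns="y"), train["y"]
X_test, y_test = test.drop(columns="y"), test["y"]

# (a) Correlation of each predictor with the target.
predictor_corr = X_train.corrwith(y_train)

results = {}
for q in [0.2, 0.4, 0.6, 0.8, 1.0]:
    # max_features=1.0 considers all predictors at every split, i.e.
    # plain bagging as in (b); q < 1 gives the random forests of (e).
    forest = RandomForestRegressor(
        n_estimators=100, max_features=q, bootstrap=True, random_state=0
    ).fit(X_train, y_train)

    # (c)/(d): per-tree test predictions and residuals.
    preds = np.stack([t.predict(X_test.values) for t in forest.estimators_])
    residuals = y_test.values - preds

    results[q] = {
        "pred_corr": avg_pairwise_corr(preds),       # (c)
        "resid_corr": avg_pairwise_corr(residuals),  # (d)/(f)
        "importances": forest.feature_importances_,  # (g)
    }
```

Plotting results[q]["resid_corr"] against q covers part (f), and plotting each results[q]["importances"] against predictor_corr covers part (g).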
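For the bonus Problem 5(a), the sketch below (referenced from part (a)) shows one way the weight-learning problem could be solved numerically. It assumes a squared-error objective on held-out predictions; writing down and justifying the actual objective is part of the exercise, and learn_weights is an illustrative name, not course-provided code.

```python
# Sketch for Problem 5(a): simplex-constrained aggregation weights.
# The squared-error objective is an assumption; the exercise asks you
# to write down (and justify) the optimization problem yourself.
import numpy as np
from scipy.optimize import minimize

def learn_weights(F, y):
    """F: (N, K) matrix with F[i, j] = f_j(x_i); y: (N,) targets."""
    K = F.shape[1]

    def objective(w):
        return np.mean((y - F @ w) ** 2)

    res = minimize(
        objective,
        x0=np.full(K, 1.0 / K),  # start from the uniform weights of (c)
        bounds=[(0.0, 1.0)] * K,                                   # w_j >= 0
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1},  # sum to 1
    )
    return res.x
```

The bounds encode $w_j \ge 0$ and the equality constraint the sum-to-one condition; with both present, SciPy selects the SLSQP solver by default.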