Subject: Computer Science
2) House price dataset. (30 points)

The HOUSES dataset contains a collection of recent real estate listings in and around San Luis Obispo County. The dataset is provided in RealEstate.csv. You may use "one-hot keying" to expand the categorical variables. The dataset contains the following useful fields (you may exclude Location and MLS from your linear regression model):

Price: the most recent listing price of the house (in dollars).
Bedrooms: number of bedrooms.
Bathrooms: number of bathrooms.
Size: size of the house in square feet.
Price/SQ.Ft: price of the house per square foot.
Status: Short Sale, Foreclosure, or Regular.

You can use any package for this question. Note: We suggest you scale the independent variables (but not the dependent variable). We also suggest you use our suggested seeds, as this dataset is particularly seed dependent.

(a) (15 points) Fit a Ridge regression model to predict Price from all variables. You can use one-hot keying to expand the categorical variable Status. Use 5-fold cross-validation to select the optimal regularization parameter, and show the CV curve. Report the fitted model (i.e., the parameters) and the sum of squared residuals. The suggested search range for the regularization parameter is from 1 to 80, and the suggested seed is 2.

(b) (15 points) Use lasso to select variables. Use 5-fold cross-validation to select the optimal regularization parameter, and show the CV curve. Report the fitted model (i.e., the parameters selected and their coefficients). Show the lasso solution path. The suggested search range for the regularization parameter is from 1 to 3000, and the suggested seed is 3.
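One possible way to set up both parts is sketched below with pandas and scikit-learn. This is only an illustration under several assumptions, not a reference solution: RealEstate.csv is not reproduced here, so the code builds a small synthetic stand-in table; the column names (in particular `Price/SQ.Ft`) are assumed; the helpers `prepare` and `cv_curve` are hypothetical names; and the suggested seeds are assumed to control how the 5 folds are shuffled. Note also that scikit-learn's `alpha` is scaled differently from the regularization parameter in some other packages, so the suggested search ranges may not carry over exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge, lasso_path
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler


def prepare(df):
    """Drop MLS/Location if present, one-hot encode Status, scale the predictors."""
    X = df.drop(columns=["Price", "MLS", "Location"], errors="ignore")
    X = pd.get_dummies(X, columns=["Status"], dtype=float)
    X_scaled = StandardScaler().fit_transform(X)  # Price itself is left unscaled
    return X_scaled, df["Price"].to_numpy(dtype=float), list(X.columns)


def cv_curve(model_cls, alphas, X, y, seed, **kwargs):
    """Mean 5-fold cross-validated MSE for each candidate alpha."""
    folds = KFold(n_splits=5, shuffle=True, random_state=seed)
    return np.array([
        -cross_val_score(model_cls(alpha=a, **kwargs), X, y, cv=folds,
                         scoring="neg_mean_squared_error").mean()
        for a in alphas
    ])


# Synthetic stand-in data (assumption). With the real file one would
# instead start from: df = pd.read_csv("RealEstate.csv")
rng = np.random.default_rng(0)
n = 200
size = rng.uniform(800, 4000, n)
df = pd.DataFrame({
    "Bedrooms": rng.integers(1, 6, n),
    "Bathrooms": rng.integers(1, 4, n),
    "Size": size,
    "Status": rng.choice(["Short Sale", "Foreclosure", "Regular"], n),
})
df["Price"] = 150.0 * size + 20000.0 * df["Bedrooms"] + rng.normal(0, 30000, n)
df["Price/SQ.Ft"] = df["Price"] / df["Size"]

X, y, names = prepare(df)

# (a) Ridge: search alpha in [1, 80], folds shuffled with seed 2.
ridge_alphas = np.linspace(1, 80, 80)
ridge_cv = cv_curve(Ridge, ridge_alphas, X, y, seed=2)
best_ridge_alpha = ridge_alphas[ridge_cv.argmin()]
ridge = Ridge(alpha=best_ridge_alpha).fit(X, y)
ridge_ssr = float(np.sum((y - ridge.predict(X)) ** 2))  # sum of squared residuals

# (b) Lasso: search alpha in [1, 3000], folds shuffled with seed 3.
lasso_alphas = np.linspace(1, 3000, 120)
lasso_cv = cv_curve(Lasso, lasso_alphas, X, y, seed=3, max_iter=10000)
best_lasso_alpha = lasso_alphas[lasso_cv.argmin()]
lasso = Lasso(alpha=best_lasso_alpha, max_iter=10000).fit(X, y)
selected = {name: c for name, c in zip(names, lasso.coef_) if c != 0}

# Solution path: one row of coefficients per feature, one column per alpha.
# lasso_path fits no intercept, so the target is centered first.
path_alphas, path_coefs, _ = lasso_path(X, y - y.mean(), alphas=lasso_alphas,
                                        max_iter=10000)

print("ridge alpha:", best_ridge_alpha, "SSR:", ridge_ssr)
print("ridge coefficients:", dict(zip(names, ridge.coef_)), ridge.intercept_)
print("lasso alpha:", best_lasso_alpha, "selected:", selected)
```

The CV curves can then be drawn by plotting `ridge_cv` against `ridge_alphas` and `lasso_cv` against `lasso_alphas`, and the solution path by plotting each row of `path_coefs` against `path_alphas`. For simplicity the predictors are scaled once before cross-validation; wrapping the scaler and the model in a pipeline so that scaling is refit within each fold would be a stricter alternative.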