You should be able to narrow down your list to several possible variable groupings (minimum two
sets, maximum four). You should examine these groups with the ENTER method – look for the combination
with the greatest adjusted R² and the least SEE. (Hint: you may find the basement variables are quite
correlated to the finished area variables; if so, binary variables for the slab, partial basement, and crawl
space foundation types may be better choices than the total basement area variable.)
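As a rough illustration of comparing groupings, the Python sketch below computes adjusted R² and SEE from the sum of squared errors (SSE) of each candidate ENTER run. The grouping names, SSE values, total sum of squares, and sample size are hypothetical placeholders and are not taken from the project database; in practice you would read these figures from the SPSS Model Summary and ANOVA tables.

```python
import math

def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n cases and k predictors (constant not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def see(sse, n, k):
    """Standard error of the estimate: sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / (n - k - 1))

# Hypothetical figures: same dependent variable and same 500 sales for every run.
N_SALES = 500
SST = 4.0e12  # total sum of squares of the time adjusted sale price (assumed)

# Each candidate grouping: number of predictors (k) and SSE from its ENTER run (assumed).
candidates = {
    "A: total basement area": {"k": 9, "sse": 6.40e11},
    "B: foundation type binaries": {"k": 11, "sse": 6.20e11},
    "C: B plus one extra variable": {"k": 12, "sse": 6.19e11},
}

results = {}
for name, run in candidates.items():
    r2 = 1.0 - run["sse"] / SST
    results[name] = {
        "r2": r2,
        "adj_r2": adjusted_r2(r2, N_SALES, run["k"]),
        "see": see(run["sse"], N_SALES, run["k"]),
    }

# Rank by greatest adjusted R-squared, then by least SEE.
ranking = sorted(results, key=lambda g: (-results[g]["adj_r2"], results[g]["see"]))
best = ranking[0]

for name in ranking:
    r = results[name]
    print(f"{name}: R2={r['r2']:.4f} adjR2={r['adj_r2']:.4f} SEE={r['see']:,.0f}")
```

In this made-up example grouping C has the highest plain R² but not the highest adjusted R², because the extra variable adds almost nothing. Note also that when every run uses the same dependent variable and the same sales, the grouping with the greatest adjusted R² is also the one with the least SEE, so the two criteria will agree.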
VERY IMPORTANT: Look carefully at the VIF (Variance Inflation Factor) and/or Tolerance statistics.
A VIF greater than 3.333 (or a Tolerance less than 0.300) indicates that there is multicollinearity in the
model. If so, you may have to manually remove one or more variables from the model. However, you must
exercise judgement when doing this. For example, if total living area is showing a VIF greater than 3.333,
you probably should not remove this key variable; a basement variable or an age variable may be the real
source of the problem. Remove variables one at a time and review changes in model results. Ensure you
document these “removals”.
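SPSS reports VIF and Tolerance directly, but it can help to see where the numbers come from. The sketch below is a minimal pure-Python illustration, assuming the standard result that each predictor's VIF is the matching diagonal element of the inverse of the predictors' correlation matrix, with Tolerance = 1 / VIF. The variable names and data values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_table(predictors):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

# Hypothetical predictors: basement area is roughly half of living area,
# so the two are almost collinear.
predictors = {
    "living_area": [1200, 1500, 1800, 2100, 2400, 2700],
    "basement_area": [600, 760, 890, 1060, 1190, 1360],
    "age": [30, 5, 22, 12, 40, 8],
}

VIF_LIMIT = 3.333  # threshold stated in the text (Tolerance below 0.300)
vifs = vif_table(predictors)
flagged = [name for name, v in vifs.items() if v > VIF_LIMIT]

for name, v in vifs.items():
    print(f"{name}: VIF={v:.2f} Tolerance={1.0 / v:.3f}")
print("Exceeds limit:", flagged)
```

The code only flags variables; it does not decide which one to drop. As the text says, that is a judgement call, and a key variable such as total living area should normally stay even when its own VIF is high.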
When reporting on variable selection, you should discuss what you did and your conclusions – i.e.,
explain your approach and the decisions you made along the way in determining your final model. The
marker will want to see the logic of your analysis – for this exercise, the soundness of your approach is
just as important as (or more important than) the final “answer” you provide.
In explaining this step, it is NOT necessary to include detailed variable selection reports in the main
body of your report. Your report will probably be better quality if you describe your conclusions in the
report and leave most or all statistical output in the appendices.
Separating MODEL and TEST Databases
Lesson 8 illustrated the process of splitting the overall database into a MODEL database and TEST
database. You are NOT required to do so here – you may instead carry out model calibration and testing
in the overall database. You ARE required to briefly discuss how this MODEL/TEST separation would be
done if it were required. Also, you must comment on why this separation is necessary in modeling and
what potential consequences may result from not carrying this out.
Students who are exceptionally keen are welcome to separate the MODEL and TEST databases,
going beyond the minimum requirements for this project. If you wish to do so, you will note that each
sale in the database has been assigned a random number between 1 and 1,000, to enable selection of a
random sample of sales for the MODEL and TEST files. For example, if you wanted a 550 case MODEL
database, you might have a random number of 686 at record 550 (the exact number depends on your data
screening). The remainder would go into the TEST database. Before proceeding, ensure you investigate
the binary variables in the model database to eliminate from consideration any that do not have enough
occurrences (a minimum of 5% of sales, i.e., 25 occurrences in 500 sales, or five occurrences if the
variable has a significant impact on the time adjusted sale price).
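The separation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (random_no, GARAGE, POOL), the 800-sale database, and the generated random numbers are hypothetical stand-ins for the project data, which you would normally handle in SPSS.

```python
import random

def split_model_test(sales, model_size, key="random_no"):
    """Sort sales by their assigned random number; the first model_size go to MODEL."""
    ordered = sorted(sales, key=lambda sale: sale[key])
    model, test = ordered[:model_size], ordered[model_size:]
    cutoff = model[-1][key]  # random number found at the last MODEL record
    return model, test, cutoff

def sparse_binaries(model_sales, binary_vars, min_share=0.05):
    """Binary variables with fewer occurrences than min_share of the MODEL sales."""
    needed = min_share * len(model_sales)
    counts = {v: sum(sale[v] for sale in model_sales) for v in binary_vars}
    return {v: c for v, c in counts.items() if c < needed}

# Hypothetical screened database of 800 sales, each with a random number 1-1,000.
rng = random.Random(42)
sales = [
    {
        "sale_id": i,
        "random_no": rng.randint(1, 1000),
        "GARAGE": int(i % 2 == 0),   # common feature
        "POOL": int(i % 50 == 0),    # rare feature
    }
    for i in range(800)
]

model_db, test_db, cutoff = split_model_test(sales, 550)
too_sparse = sparse_binaries(model_db, ["GARAGE", "POOL"])

print("MODEL cases:", len(model_db), "TEST cases:", len(test_db))
print("Random number at last MODEL record:", cutoff)
print("Binary variables below the 5% guideline:", too_sparse)
```

The five-occurrence exception for variables with a significant price impact is deliberately not coded here, since it depends on your judgement about each variable.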
Stepwise Regression
Using your “best” combination of variables, now run a STEPWISE regression to see which variables are
included/excluded from the model (recall the “best” combination is the one with the greatest adjusted R²
and the least SEE – without any multicollinearity in evidence – that is, no VIF values greater than 3.333).
Make sure to examine the t-statistic values for any variables excluded from the model. If the t-statistics
are outside of ±1.6 you should increase the PIN and POUT settings in your STEPWISE regression to
capture those variables (Hint: .10 and .15 are probably reasonable). Your report should describe the steps
you took and why.
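To see how the ±1.6 guideline relates to the PIN and POUT settings, the sketch below converts a t-statistic to an approximate two-sided significance level. It assumes PIN and POUT are the entry and removal significance levels used by SPSS, and it uses a normal approximation to the t distribution, which is reasonable with several hundred sales. The excluded variable names and t-values are hypothetical.

```python
from statistics import NormalDist

def approx_two_sided_p(t_stat):
    """Two-sided significance of a t-statistic, using a normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))

def worth_capturing(excluded, t_limit=1.6):
    """Excluded variables whose t-statistic lies outside +/- t_limit."""
    return {name: t for name, t in excluded.items() if abs(t) > t_limit}

# Hypothetical final step of an Excluded Variables table: name -> t-statistic.
excluded = {"GARAGE": 1.72, "CORNER_LOT": -0.45, "FIREPLACE": -1.85}

candidates = worth_capturing(excluded)
for name, t in candidates.items():
    print(f"{name}: t={t:+.2f} approx. significance={approx_two_sided_p(t):.3f}")

print(f"|t| = 1.6 corresponds to roughly p = {approx_two_sided_p(1.6):.2f}")
```

Under this approximation a t-statistic of ±1.6 corresponds to a significance of roughly 0.11, which is why entry and removal settings in the neighbourhood of .10 and .15 are in the right range to capture such variables.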
The final step of the Coefficients Table from the STEPWISE regression lists the variables that will go
into the Model Calibration step – the variables listed in the final step of the Excluded Variable table are
not carried forward. You are now close to having your final model.
©Copyright 2020 by the UBC Real Estate Division
Step 7: Model Calibration
Once you have chosen the variables to include in your model, you will need to calibrate the model – that
is, use additive multiple regression to determine the final model coefficients.
Using the variables listed in the final step of the Coefficients Table from the STEPWISE regression, run
this “final” model again as an ENTER regression, making sure the R², SEE, constant, and all the coefficients
match between the two regressions (the final step of the STEPWISE regression and this new ENTER
regression). Now check the residuals against the predicted values. Look for records with outlier residuals
using the Casewise Diagnostics report in SPSS. Filter out the outliers and re-run the model. Your adjusted
R² and SEE should improve from the original run. Remember that outliers are those records with residual
errors beyond three standard errors from zero – see the “STEP 7: Model Calibration” section in Lesson 8.
You may have to carry out this outlier removal and the re-run of the model step more than once.
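The outlier rule can be sketched as a simple screen on the residuals. The Python example below is only an illustration of the three-standard-error rule; the residuals and the SEE value are hypothetical, and in the project the equivalent check comes from the Casewise Diagnostics report.

```python
def flag_outliers(residuals, see, limit=3.0):
    """Indexes of records whose residual lies more than limit * SEE from zero."""
    return [i for i, r in enumerate(residuals) if abs(r) > limit * see]

# Hypothetical residuals (actual minus predicted price) and SEE from the model summary.
residuals = [12000, -8000, 5000, -15000, 98000, 3000,
             -6000, 9000, -11000, 4000, -2000, 7000]
SEE = 25000

outlier_rows = flag_outliers(residuals, SEE)
kept_rows = [i for i in range(len(residuals)) if i not in outlier_rows]

print("Outlier records:", outlier_rows)
print("Records kept for the re-run:", len(kept_rows))
```

Filtering changes the coefficients and the SEE, so the regression has to be re-run on the kept records and the screen repeated against the new SEE; this is why the removal step may need more than one pass.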
Whenever you re-run your model at this stage, examine the VIF/Tolerance values and
the t-statistics to ensure that nothing significant has changed. You may need to eliminate a variable if its
t-statistic falls inside the range of –1.6 to +1.6. A histogram of residuals should appear fairly normally distributed.
Again, for this step, describe what you are doing and your observations. Comment on the size and sign
of your coefficients – is there anything that is difficult to explain? Can you explain it?
The main body of your report should only include the final regression results. Other detailed reports,
such as correlation reports and model summaries should be in the appendices. Your report should refer
to these appendices and explain the conclusions you have drawn from them.
Step 8: Model Testing
You must completely test and evaluate the model. As stated earlier, you are NOT required to
separate the database into separate MODEL and TEST components. However, as noted in Step 6, you ARE
required to briefly discuss how this separation would be done if it were required. Also, you must comment
on why this separation is necessary in realistic modeling and what potential consequences may result
from not carrying this out.
Helpful Hints
• In evaluating variables for inclusion, keep an appraisal perspective in mind – do they make sense? Are
you addressing what a purchaser looks at in determining value? However, be cautious about forcing a
variable into your model because your appraisal judgment is saying it must be included – there may be
another variable that is acting as a proxy for the variable you are trying to force in (if you include both
you may bring multicollinearity into the model – experiment with including and excluding variables and
seeing the effect on the model)
• Have you considered the correlation between variables and the time adjusted sale price? You want to
explain the most variation possible
• Have you considered the relationships between variables? Be careful of multicollinearity as this is a
common problem in regression models. However, you may find you need to accept some moderate
correlation in order to include the variables that explain price. But watch those VIF/Tolerance statistics
• Try experimenting with questionable variables to see what happens if they are included or excluded
from the model – do results improve or get worse? A quick re-run of a STEPWISE regression with an
additional variable or two will tell you a great deal
• You may find some variable coefficients do not have the expected sign. Is it one or two variables and
maybe an acceptable sacrifice for an overall good model? Or is it a lot of variables and a model that
makes no sense?
• Document your thought process: explain what you did and why. We do not expect a “perfect” model,
but we do want to see a logical analysis and well-reasoned decisions
A few steps in model testing (see the “STEP 8: Test and Evaluate the Model” section in Lesson 8):
• Create a predicted value (i.e., Pred_Val) transformation based on the coefficients in your final model
and calculate the predicted values
• Calculate the ratio of predicted values to adjusted sale prices (PAR). Analyze the statistics for the
prediction to adjusted sale price ratio, such as the mean/median and dispersion (e.g., COD)
• Test the ratio statistics by neighbourhood (you may also wish to use a Kruskal-Wallis test, using
neighbourhood numbers you create). Discuss your findings. Does your model need a neighbourhood
adjustment?
• Use scatterplots, boxplots, as well as Kruskal-Wallis and Mann-Whitney tests to fully examine your
model. Confirm the model is consistent for all types of property characteristics, age groups, price
groups, and neighbourhoods. You should identify where adjustments to the model are needed to
correct systematic over- or under-valuations. If you wish, you can apply any adjustments you deem
necessary; however, this is not required.
The details of the statistical tests should be included in the appendix to the report, with the results
summarized in the report body (e.g., in table form).
Remember that these tests should be conducted on variables both in your model and NOT in your
model. You want to ensure the model is equally valuing all groups.
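The predicted value, ratio, and dispersion steps in Step 8 can be sketched in Python as follows. This is a minimal illustration, assuming the usual definition of the COD as the average absolute deviation of the ratios from their median, expressed as a percentage of the median. The coefficients, field names, neighbourhood numbers, and sales are hypothetical and would be replaced by your final model and database.

```python
from statistics import mean, median

def predicted_value(record, constant, coefficients):
    """Additive model: constant plus the sum of coefficient * variable value."""
    return constant + sum(coef * record[name] for name, coef in coefficients.items())

def cod(ratios):
    """Coefficient of dispersion: mean absolute deviation from the median, as % of median."""
    med = median(ratios)
    return 100.0 * mean(abs(r - med) for r in ratios) / med

def ratio_stats(ratios):
    """Summary statistics for a list of predicted-to-adjusted-sale-price ratios."""
    return {"n": len(ratios), "mean": mean(ratios),
            "median": median(ratios), "cod": cod(ratios)}

# Hypothetical final model coefficients.
CONSTANT = 50000.0
COEFFICIENTS = {"living_area": 100.0, "age": -500.0}

# Hypothetical sales: neighbourhood number, characteristics, time adjusted sale price.
sales = [
    {"nbhd": 1, "living_area": 1500, "age": 10, "adj_price": 200000},
    {"nbhd": 1, "living_area": 1800, "age": 25, "adj_price": 210000},
    {"nbhd": 1, "living_area": 2200, "age": 5, "adj_price": 275000},
    {"nbhd": 2, "living_area": 1400, "age": 40, "adj_price": 195000},
    {"nbhd": 2, "living_area": 2000, "age": 15, "adj_price": 270000},
    {"nbhd": 2, "living_area": 2600, "age": 8, "adj_price": 340000},
]

# Pred_Val and PAR for every sale, then ratio statistics overall and by neighbourhood.
by_nbhd = {}
for sale in sales:
    sale["pred_val"] = predicted_value(sale, CONSTANT, COEFFICIENTS)
    sale["par"] = sale["pred_val"] / sale["adj_price"]
    by_nbhd.setdefault(sale["nbhd"], []).append(sale["par"])

overall = ratio_stats([s["par"] for s in sales])
print("Overall:", {k: round(v, 3) for k, v in overall.items()})
for nbhd, ratios in sorted(by_nbhd.items()):
    print("Neighbourhood", nbhd, {k: round(v, 3) for k, v in ratio_stats(ratios).items()})
```

A median ratio that sits clearly above or below 1.00 in one neighbourhood but not in others is the kind of pattern that would suggest a neighbourhood adjustment; a formal test such as Kruskal-Wallis would still be needed to confirm it.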
Step 9: Conclusions
The final section of the report should include:
• Conclusions concerning the quality of the model developed. The quality should be discussed from
both an appraisal perspective and in terms of statistical theory
• Recommendations for model use, along with a discussion of any potential problems that may arise
in the use of the model
• Suggestions for improvements in future model building exercises for Midsize City
Length of Report
Your report must describe the processes you used, demonstrating your understanding of mass appraisal
theory, and knowledge of how to develop a sound, practical valuation model. However, your report
should be presented in the format of an actual report as much as possible. The only statistics that should
be included in the body of the report are those necessary to prove the validity of the opinion being
expressed. The full detailed reports should only be included as appendices to the main report. For
example, if the mean and standard deviation obtained from a descriptive statistics report are included in
your report, these should be followed by a reference to the full report (i.e., “Full details of this report may
be found in the descriptive statistics report contained on page 3 of Appendix A”).
Your report should contain no more than 30 pages of report detail and will be supported by an
appendix which contains all statistical reports that were relied on to justify any decisions or reach any
conclusions, including the syntax files showing transformations. The appendix should be organized
and indexed and may contain, on average, approximately 20-40 pages. Note that these are only rough
limits and that you should provide all relevant information, i.e., anything which you used to substantiate
decisions or adjustments. Note also that a very well-written and organized report could require
significantly fewer pages than these limits.
The table below summarizes the suggested size of submissions and the maximum allowable pages.
Under no circumstances should you need more than the stated maximum number of pages for your
report. If your submission exceeds these limits, EITHER: 1) you will get a request from the marker to
resubmit your assignment with a reduced page count; OR 2) the additional pages will not be read by the
marker and marks will be deducted for their inclusion.
             Report         Appendices
Suggested    15-20 pages    20 pages
Maximum      30 pages       40 pages