You have to submit 2 files: Answer Report: In this, you need to submit all the answers to all the questions in a sequential manner

Subject: Statistics

You have to submit 2 files:

  1. Answer Report: Submit the answers to all the questions in sequence. The report should include a detailed explanation of the approach used, insights, inferences, and all code outputs such as graphs and tables. It should not be filled with code. You will be evaluated based on the business report.

    Note: The business report should properly interpret every task performed and provide actionable insights. Interpreting the models alone is not sufficient for full marks on any criterion in the rubric. Marks will be deducted wherever inferences are not clearly stated.
    THE REPORT HAS TO BE STRICTLY SUBMITTED IN A PDF/DOC FORMAT. ANY OTHER FORMAT WILL NOT BE CONSIDERED FOR GRADING. 6 Marks are allotted for the "Quality of Business Report".

     
  2. Jupyter Notebook file: This is mandatory and will be used for reference during evaluation.
  • Any assignment found to be copied or plagiarized from another person will not be graded and will be marked as zero.
  • Please ensure timely submission, as post-deadline assignments will not be accepted.

Problem:

For this assignment, sales data for different types of wine in the 20th century is to be analysed. Both data sets come from the same company but cover different wines. As an analyst at ABC Estate Wines, you are tasked with analysing and forecasting wine sales in the 20th century.

Data set for the Problem: Sparkling.csv and Rose.csv

Please answer the following questions for each of these two data sets separately.

  1. Read the data as an appropriate Time Series data and plot the data.
  2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
  3. Split the data into training and test. The test data should start in 1991.
  4. Build all the exponential smoothing models on the training data and evaluate them using RMSE on the test data. Additional models, such as regression, naïve forecast, simple average, and moving average models, should also be built on the training data and their performance checked on the test data using RMSE.
  5. Check the stationarity of the data on which the model is being built using appropriate statistical tests, and state the hypotheses of each test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment.
    Note: Stationarity should be checked at alpha = 0.05.
  6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
  7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE.
  8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data.
  9. Based on the model-building exercise, build the best-performing model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.
  10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales.
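
Steps 3, 4, and 8 can be sketched without any third-party library. The block below is a minimal, dependency-free illustration, not the expected notebook solution. The monthly series is synthetic (the real values are in Sparkling.csv and Rose.csv, which are not reproduced here), and the helper names `rmse`, `split_by_year`, `naive_forecast`, `simple_average_forecast`, and `moving_average_forecast` are hypothetical. The moving-average forecast is taken here as the mean of the last `window` training observations held flat over the test period, which is only one possible reading of the task.

```python
import math
from datetime import date


def rmse(actual, predicted):
    """Root mean squared error of two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def split_by_year(series, test_start_year=1991):
    """Split (date, value) pairs so that the test set starts in test_start_year."""
    train = [(d, v) for d, v in series if d.year < test_start_year]
    test = [(d, v) for d, v in series if d.year >= test_start_year]
    return train, test


def naive_forecast(train_values, horizon):
    """Repeat the last training observation over the forecast horizon."""
    return [train_values[-1]] * horizon


def simple_average_forecast(train_values, horizon):
    """Repeat the mean of all training observations over the horizon."""
    mean = sum(train_values) / len(train_values)
    return [mean] * horizon


def moving_average_forecast(train_values, horizon, window):
    """Repeat the mean of the last `window` training observations."""
    if not 1 <= window <= len(train_values):
        raise ValueError("window must be between 1 and len(train_values)")
    tail = train_values[-window:]
    return [sum(tail) / window] * horizon


# Synthetic monthly series (assumption: illustrative values, not the real data).
series = [
    (date(year, month, 1),
     100.0 + 2.0 * (year - 1980) + (10.0 if month == 12 else 0.0))
    for year in range(1980, 1994)
    for month in range(1, 13)
]

train, test = split_by_year(series)
train_values = [v for _, v in train]
test_values = [v for _, v in test]
horizon = len(test_values)

results = {
    "Naive": rmse(test_values, naive_forecast(train_values, horizon)),
    "Simple average": rmse(
        test_values, simple_average_forecast(train_values, horizon)),
    "Moving average (window=12)": rmse(
        test_values, moving_average_forecast(train_values, horizon, 12)),
}

# Comparison table sorted by test RMSE, lowest first.
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name:<28}{score:>10.3f}")
```

Every model is fitted on the training values only and scored on the same held-out test values, so the RMSE figures are directly comparable across rows of the table.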

Important Note: Please reflect on all that you have learned while working on this project. This step is critical in cementing all your concepts and closing the loop. Please write down your thoughts here.

All the very best!

Regards,

Program Office

Scoring guide (Rubric) - Time Series Forecasting Project (1)

Criteria (points)

  1. Read the data as an appropriate Time Series data and plot the data. (2 points)
  2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition. (5 points)
  3. Split the data into training and test. The test data should start in 1991. (2 points)
  4. Build all the exponential smoothing models on the training data and evaluate them using RMSE on the test data. Other models, such as regression, naïve forecast, and simple average models, should also be built on the training data and their performance checked on the test data using RMSE. (16 points)
  5. Check the stationarity of the data on which the model is being built using appropriate statistical tests, and state the hypotheses of each test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05. (3 points)
  6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE. (8 points)
  7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE. (8 points)
  8. Build a table (create a data frame) with all the models built along with their corresponding parameters and the respective RMSE values on the test data. (2 points)
  9. Based on the model-building exercise, build the best-performing model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands. (3 points)
  10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales. Please explain and summarise the various steps performed in this project. There should be proper business interpretation and actionable insights present. (5 points)

  Quality of Business Report (Please refer to the Evaluation Guidelines for the Business Report checklist. Marks in this criterion are at the moderator's discretion.) (6 points)

Total: 60 points
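
Criterion 4 carries the largest share of the marks, so it helps to see what the exponential smoothing recursions actually compute. The sketch below implements simple exponential smoothing and Holt's linear-trend (double) smoothing in plain Python. It is an illustration under stated assumptions, not a substitute for a library implementation: the function names, the initialisation (level from the first observation, trend from the first difference), the fixed `alpha` and `beta` values, and the sample numbers are all assumptions. A full solution would normally estimate the smoothing parameters from the training data and would also cover the seasonal (Holt-Winters) variant, which is omitted here.

```python
def ses_forecast(values, horizon, alpha):
    """Simple exponential smoothing: a flat forecast at the final level."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # assumption: start the level at the first observation
    for y in values[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return [level] * horizon


def holt_forecast(values, horizon, alpha, beta):
    """Holt's linear-trend smoothing: final level plus h times final trend."""
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    if not 0.0 < alpha <= 1.0 or not 0.0 <= beta <= 1.0:
        raise ValueError("alpha must be in (0, 1] and beta in [0, 1]")
    level = values[0]              # assumption: level starts at the first value
    trend = values[1] - values[0]  # assumption: trend starts at the first difference
    for y in values[1:]:
        previous_level = level
        level = alpha * y + (1.0 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical training values, used only to show the calls.
train_values = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]
print(ses_forecast(train_values, horizon=3, alpha=0.3))
print(holt_forecast(train_values, horizon=3, alpha=0.8, beta=0.2))
```

Simple exponential smoothing always produces a flat forecast, so it cannot follow a trend in the test period, whereas the Holt forecast extends the last estimated slope.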

 

Time Series Forecasting Project Problem - FAQs

  1. How to treat the null values present in the data? Is it required to use multiple methods to treat the null values for the model building procedures?
    Ans: 
    Any method can be used to treat the missing values or impute the null values. Please refer to the materials of Week 1 of the mentored learning session. It is not imperative to use multiple imputation techniques to impute the null values.
  2. Models like SARIMA and ARIMA are taking very long to execute. Is there any particular way to make sure that the models run faster?
    Ans: 
    There is no general method that guarantees these algorithms will execute faster on a given computer system.
  3. Should the differenced data or the original data be used for the ACF and the PACF plots and building ARIMA/SARIMA models?
    Ans: 
    The differenced data (if differencing is needed to make the series stationary) should be used for plotting the ACF and the PACF plots to determine the appropriate parameters. The stationary training data should be used to build the ARIMA/SARIMA models.
  4. Is it absolutely necessary to build both ARIMA and SARIMA models for this particular problem?
    Ans: 
    It is necessary to build both ARIMA and SARIMA models. The report should explain whether one performs better than the other and whether each of them makes sense for this analysis.
  5. Should two different business reports be created for the project along with two separate Python files?
    Ans: It is entirely up to the student. Two different business reports accompanied by two different Python Notebooks can be submitted.
  6. What are the expectations for the question which asks for a comment on the final model?
    Ans: 
    For this question, the model is expected to be explained in business terms. The report should clearly state the reason for choosing it as the final model and how the company will benefit from adopting it for future sales.

     
  7. Is it necessary that all three kinds of Exponential Smoothing models should be built for this assignment?
    Ans: 
    It is mandatory to build all three exponential smoothing models. The report should also state which of them works best in this situation.
  8. For forecasting into the future, should only ARIMA/SARIMA models be considered or should all the models be considered?
    Ans: 
    All the models built till the end of the assignment should be considered.
  9. Can we merge these two datasets in a common data frame and perform the project?
    Ans:
     No, the assignment needs to be solved separately for these two data sets. Do not merge them into one common data frame.
  10. Should the AIC values for the Automated ARIMA/SARIMA (in which the model parameters are selected by looking at the lowest AIC) and the Manual ARIMA/SARIMA (in which the model parameters are selected by looking at the ACF and the PACF plots) be close?
    Ans: 
    There is no such rule that the AIC values for these two models should be close.
  11. Is there a need to compare and contrast the results of the two datasets (Rose and Sparkling)?
    Ans: 
    There is no need to compare the results of the models built on two different data sets.
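
FAQ 3 says the ACF and PACF should be read from the differenced series when differencing is needed. A minimal sketch of that step follows, in plain Python with synthetic data. The helpers `difference` and `sample_acf` are hypothetical names, and `sample_acf` uses the usual biased estimator (each lag's sum of cross-products divided by the lag-0 sum of squares). A formal stationarity test, such as the augmented Dickey-Fuller test, is not implemented here and would come from a statistics library.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag]; lag=12 gives a seasonal difference."""
    if not 1 <= lag < len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def sample_acf(values, nlags):
    """Sample autocorrelations for lags 0..nlags (biased estimator)."""
    n = len(values)
    if not 0 <= nlags < n:
        raise ValueError("nlags must be between 0 and len(values) - 1")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    denominator = sum(c * c for c in centered)
    if denominator == 0.0:
        raise ValueError("the ACF is undefined for a constant series")
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denominator
        for k in range(nlags + 1)
    ]


# Synthetic series (assumption): linear trend plus a fixed 12-month pattern.
seasonal_pattern = [0, 2, 5, 3, 1, 0, -1, -3, -2, 0, 4, 9]
series = [0.5 * t + seasonal_pattern[t % 12] for t in range(72)]

first_diff = difference(series, lag=1)           # removes the linear trend
seasonal_diff = difference(first_diff, lag=12)   # removes the 12-month pattern

# A large autocorrelation at lag 12 of the first difference signals seasonality.
acf_values = sample_acf(first_diff, nlags=12)
print([round(r, 2) for r in acf_values])
```

In this synthetic case the first difference still shows a strong lag-12 autocorrelation, which is the kind of pattern that would point toward a seasonal model rather than a plain ARIMA.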
