Subject: Statistics
Business Problem:
We all know that health care is a very important domain in the market. It is directly linked to the life of the individual; hence we always have to be proactive in this domain. Money plays a major role, because treatment can sometimes be very costly, and an individual who is not covered by insurance can end up in a tough financial situation. Medical insurance companies also want to reduce their risk by optimizing the insurance cost, since a healthy body is largely in the hands of the individual: if an individual eats healthily and exercises properly, the chance of getting ill is drastically reduced.
Goal & Objective: The objective of this exercise is to build a model, using data, that provides the optimum insurance cost for an individual. You have to use the health- and habit-related parameters to estimate the cost of insurance.
File: Data.csv
Target variable: insurance_cost
Data dictionary:
Variable | Business Definition
applicant_id | Applicant unique ID
years_of_insurance_with_us | Number of years the customer has held a policy with the same company
regular_checkup_lasy_year | Number of regular health check-ups the customer had in the last year
adventure_sports | Customer is involved in adventure sports such as climbing, diving, etc.
Occupation | Occupation of the customer
visited_doctor_last_1_year | Number of times the customer visited a doctor in the last year
cholesterol_level | Cholesterol level of the customer when applying for insurance
daily_avg_steps | Average daily steps walked by the customer
age | Age of the customer
heart_decs_history | Any past heart disease
other_major_decs_history | Any past major disease other than heart disease, such as an operation
Gender | Gender of the customer
avg_glucose_level | Average glucose level of the customer when applying for insurance
bmi | BMI of the customer when applying for insurance
smoking_status | Smoking status of the customer
Year_last_admitted | Year the customer was last admitted to hospital
Location | Location of the hospital
weight | Weight of the customer
covered_by_any_other_company | Customer is covered by another insurance company
Alcohol | Alcohol consumption status of the customer
exercise | Regular exercise status of the customer
weight_change_in_last_one_year | Variation in the customer's weight over the last year
fat_percentage | Fat percentage of the customer when applying for insurance
insurance_cost | Total insurance cost
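As a first sanity check, the loaded file can be compared against the data dictionary. The sketch below is only illustrative: it assumes the stray spaces in three variable names are extraction artifacts, the helper `missing_columns` is a hypothetical name, and the tiny CSV is made-up stand-in data (in the notebook it would be `pd.read_csv("Data.csv")`).

```python
import io

import pandas as pd

# Column names as listed in the data dictionary (stray spaces assumed to be artifacts).
EXPECTED_COLUMNS = [
    "applicant_id", "years_of_insurance_with_us", "regular_checkup_lasy_year",
    "adventure_sports", "Occupation", "visited_doctor_last_1_year",
    "cholesterol_level", "daily_avg_steps", "age", "heart_decs_history",
    "other_major_decs_history", "Gender", "avg_glucose_level", "bmi",
    "smoking_status", "Year_last_admitted", "Location", "weight",
    "covered_by_any_other_company", "Alcohol", "exercise",
    "weight_change_in_last_one_year", "fat_percentage", "insurance_cost",
]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the dictionary columns that are absent from the DataFrame."""
    return [col for col in expected if col not in df.columns]


# Made-up stand-in for Data.csv so the sketch runs on its own.
toy_csv = io.StringIO(
    "applicant_id,age,bmi,insurance_cost\n"
    "1,34,27.5,20000\n"
    "2,51,31.2,35000\n"
)
df = pd.read_csv(toy_csv)

missing = missing_columns(df)
print(f"{len(missing)} dictionary columns are missing from the toy frame")
```

An empty result on the real file would confirm that every variable in the dictionary is available before any analysis starts.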
You have to submit 2 files:
Business Report: In this, you should cover all the topics given in the rubric in a sequential manner. It should include a detailed explanation of the approach used, insights, inferences, all code outputs such as graphs and tables, and their business implications. Your report should not be filled with code. You will be evaluated based on the business report.
Python Notebook file: This is a must and will be used for reference while evaluating. Failing to submit it shall lead to ZERO marks in all the sections where a code file is necessary.
1. Problem Understanding
a) Defining problem statement b) Need of the study/project c) Understanding business/social opportunity
2. Data Report
a) Understanding how data was collected in terms of time, frequency and methodology b) Visual inspection of data (rows, columns, descriptive details) c) Understanding of attributes (variable info, renaming if required)
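The visual inspection and renaming steps in the Data Report could be sketched roughly as follows. The helpers `inspect` and `tidy_names` are hypothetical names, lower-casing is just one possible renaming convention, and the small frame holds made-up values in place of Data.csv.

```python
import pandas as pd


def inspect(df: pd.DataFrame) -> dict:
    """Collect the basic facts a data report usually starts with."""
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        # describe() summarises numeric columns only by default
        "numeric_summary": df.describe(),
    }


def tidy_names(df: pd.DataFrame) -> pd.DataFrame:
    """Strip whitespace and lower-case column names (one possible convention)."""
    return df.rename(columns=lambda c: c.strip().lower())


# Made-up rows standing in for the real data.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "age": [34, 51, 42],
    "bmi": [27.5, 31.2, 24.8],
    "insurance_cost": [20000, 35000, 18000],
})

report = inspect(tidy_names(toy))
print(report["n_rows"], "rows,", report["n_cols"], "columns")
print(report["numeric_summary"])
```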
3. Exploratory Data Analysis
a) Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones) b) Bivariate analysis (relationship between different variables, correlations) c) Removal of unwanted variables (if applicable) d) Missing value treatment (if applicable) e) Outlier treatment (if required) f) Variable transformation (if applicable) g) Addition of new variables (if required)
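For the missing value and outlier treatment steps, one common approach (not the only one, and not prescribed by the rubric) is median or mode imputation followed by capping at the 1.5 × IQR fences. A minimal sketch under those assumptions, with hypothetical helper names and made-up values, including the category labels:

```python
import numpy as np
import pandas as pd


def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill numeric gaps with the median and categorical gaps with the mode."""
    out = df.copy()
    for col in out.columns:
        if out[col].isna().any():
            if pd.api.types.is_numeric_dtype(out[col]):
                out[col] = out[col].fillna(out[col].median())
            else:
                out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values lying beyond k * IQR from the first and third quartiles."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Made-up values: one missing entry per column and one extreme bmi.
toy = pd.DataFrame({
    "bmi": [22.0, 24.0, np.nan, 26.0, 28.0, 95.0],
    "smoking_status": ["never smoked", "smokes", None,
                       "never smoked", "formerly smoked", "never smoked"],
})

clean = impute(toy)
clean["bmi"] = cap_outliers_iqr(clean["bmi"])
print(clean)
```

Capping keeps every row, which matters when extreme values may be genuine high-risk customers rather than data errors; whether to cap, drop, or leave them is a judgment to justify in the report.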
4. Business insights from EDA
a) Is the data unbalanced? If so, what can be done? Please explain in the context of the business b) Any business insights using clustering (if applicable) c) Any other business insights
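Because the target is continuous, "unbalanced" can be read here as dominated categories in the predictors and skew in `insurance_cost`. The sketch below illustrates that reading only; the 0.8 threshold, the 0/1 encoding of `adventure_sports`, the helper names, and all values are assumptions.

```python
import pandas as pd


def category_shares(s: pd.Series) -> pd.Series:
    """Share of rows per category, largest first."""
    return s.value_counts(normalize=True)


def flag_imbalanced(df: pd.DataFrame, columns, threshold: float = 0.8) -> dict:
    """Map each column whose top category exceeds the threshold to (category, share)."""
    flagged = {}
    for col in columns:
        shares = category_shares(df[col])
        if shares.iloc[0] > threshold:
            flagged[col] = (shares.index[0], float(shares.iloc[0]))
    return flagged


# Made-up data: adventure_sports is dominated by 0, Gender is evenly split.
toy = pd.DataFrame({
    "adventure_sports": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "Gender": ["Male", "Female"] * 5,
    "insurance_cost": [15000, 16000, 17000, 18000, 19000,
                       20000, 21000, 22000, 23000, 90000],
})

flagged = flag_imbalanced(toy, ["adventure_sports", "Gender"])
skew = toy["insurance_cost"].skew()  # positive means a long right tail
print(flagged)
print("target skewness:", round(skew, 2))
```

A dominated category or a heavily skewed target would then be discussed in business terms, for example whether rare high-cost customers need separate handling.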