Statistics & Data Science: Big Data Analytics

Perform data analysis on the dataset associated with the given problem. Write up this analysis (electronically) as a report in the required format. Below are some points to consider when analyzing the problem and presenting results.

1. A brief summary and a detailed description of the problem are given below.
2. The problem has a dataset associated with it.
3. Reports are to be submitted in PDF format.
4. Reports must use the IEEE Manuscript Templates for Conference Proceedings (https://www.ieee.org/conferences/publishing/templates.html). The maximum file size is 50MB.
5. Submit results as a report of four (4) to eight (8) content pages, including all figures and tables. Include, as part of the same PDF file, an appendix containing any computer code used to obtain results. The appendix may also contain any other relevant material that you chose not to include in the main report. The quality of writing is a crucial component.
6. The goal is to demonstrate the capability of taking a problem description and its associated dataset and producing a report explaining the data analyses that best solve the stated problem. There is not necessarily a correct answer, but there are certainly many incorrect approaches (or correct approaches that are not clearly explained). A successful report should present not only a correct analysis method but also an explanation that would be comprehensible to someone with only basic knowledge of statistics. Innovative (correct) analyses are expected. Standard pedestrian analyses will be marginally acceptable at best and, if too simple to answer the posed question adequately, might result in failure. It is possible that the 'most correct' statistical analyses would involve techniques taught in courses more advanced than those a PhD student has formally studied.
8. Use any text, software, or internet resources desired.
Problem Summary: Traffic Pollution Study

The complete problem description is given below; the data are contained in the file TrafficPollution.xlsx. The main dataset (first sheet of the file) has, in addition to the header row, 90 rows and 29 columns. The 90 rows represent measurements at 3 times (early morning, morning rush hour, and evening rush hour) at each of 30 different intersections in a large city in a developing nation. The primary response variables are three airborne pollutants: carbon monoxide (CO), particulate matter with diameter less than 2.5 microns (PM2.5), and particulate matter with diameter less than 10 microns (PM10). The minimum, maximum, and average of each of these three pollutants over a 15-minute period in December 2004 are reported; the three average values are the primary response variables of interest. The last 16 columns (N-AC) of the spreadsheet contain covariates that might affect the pollution variables. The first six of these (columns N-S) are dynamic: they depend on what was happening at the intersection while the measurements were being taken, such as the number of vehicles passing. The remaining ten (columns T-AC) are fixed characteristics of the intersection under study, such as whether or not a stoplight is present. The primary scientific interest in this study concerns the nature of the association between the physical characteristics of the sites and the pollutant concentrations.

Traffic Pollution Problem

A study of traffic-related air pollution levels was conducted in a large city in a developing nation. Measurements of three airborne pollutants, carbon monoxide (CO), particulate matter with diameter less than 2.5 microns (PM2.5), and particulate matter with diameter less than 10 microns (PM10), were taken at three times of day (early morning, morning rush, and afternoon rush) at each of 30 sites.
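Because the problem summary identifies covariate groups by spreadsheet column letters, it can help to translate those letters into positional indices before any analysis. The following is a minimal Python sketch that assumes only the column ranges stated in the summary (N-AC for all covariates, N-S for the dynamic ones, T-AC for the fixed ones); the helper names `excel_col_to_index` and `column_span` are hypothetical. It also serves as an arithmetic check on the size of each group.

```python
def excel_col_to_index(letters: str) -> int:
    """Convert an Excel column label such as 'A', 'N' or 'AC' to a 0-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def column_span(first: str, last: str) -> range:
    """0-based positional indices for an inclusive Excel column range such as N-AC."""
    return range(excel_col_to_index(first), excel_col_to_index(last) + 1)


# Column ranges as stated in the problem summary.
ALL_COVARIATES = column_span("N", "AC")  # all covariates
DYNAMIC = column_span("N", "S")          # conditions during the 15-minute window
FIXED = column_span("T", "AC")           # physical characteristics of the site

N_SITES, N_TIMES = 30, 3                 # 30 intersections x 3 times of day

if __name__ == "__main__":
    print(len(ALL_COVARIATES), len(DYNAMIC), len(FIXED))  # 16 6 10
    print(N_SITES * N_TIMES)                              # 90 data rows
```

With pandas, for example, `df = pd.read_excel("TrafficPollution.xlsx", sheet_name=0)` followed by `df.iloc[:, list(FIXED)]` would select the fixed site characteristics, assuming the sheet's column order matches the letters quoted in the summary.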
Each site was at the edge of a major urban street or avenue. At each time of day at each site, each pollutant was measured for fifteen minutes, and the average pollutant concentration during that 15-minute measurement period was taken as the response variable to be analyzed. All measurements were taken on weekdays during a 9-day period in December. In addition to the pollutant measures, several site characteristics were quantified. These included characteristics of the traffic (gasoline vehicles passing per minute, diesel vehicles per minute, total vehicles per minute, speed of traffic flow, pedestrians per minute, etc.); characteristics of the road/site (whether a median was present; whether a stoplight, stop sign, or other traffic-halting device was present; the average height of surrounding buildings; whether the site was on a hill; the road width in meters and number of lanes; etc.); information about other pollution sources (how many upwind non-vehicle pollution sources were present); and wind strength and direction information. The Excel spreadsheet TrafficPollution.xlsx contains the data (sheet 1) and variable descriptions (sheet 2). There are some missing values in the dataset; they are indicated by periods (‘.’) in the spreadsheet and, in the case of the variable people min, by the value -9. The scientific interest in this study is in the nature of the association between the physical characteristics of the sites and the pollutant concentrations. It is of interest to examine these relationships for each measurement period (time of day) and to determine how these associations differ across periods. Associations among pollutants, and across measurement times within a pollutant, were also of interest. It is also of interest to quantify the ambient level of each pollutant at urban sites within this city in an appropriate manner.
That is, it is desirable to try to answer the perhaps simplistic question, “How much air pollution is there on the busy streets of this city?”
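One simple way to answer this question, while respecting the fact that the three readings at a site are not independent, is to summarize each site first and then treat the 30 sites as the sampling units. The sketch below is one possible choice, not a method prescribed by the problem: it averages a pollutant over the available times of day at each site, takes the geometric mean across sites (pollutant concentrations are often right-skewed), and attaches a percentile bootstrap interval obtained by resampling sites. It assumes rows are dicts with a hypothetical `site` key and that missing values have already been converted to `None`.

```python
import random
from statistics import geometric_mean, mean  # geometric_mean requires Python 3.8+


def site_means(rows, pollutant):
    """Average one pollutant over the non-missing times of day at each site."""
    values = {}
    for row in rows:
        if row[pollutant] is not None:
            values.setdefault(row["site"], []).append(row[pollutant])
    return {site: mean(v) for site, v in values.items()}


def ambient_level(rows, pollutant, n_boot=2000, level=0.95, seed=1):
    """Geometric mean of site averages plus a percentile bootstrap interval.

    Sites (not individual readings) are resampled with replacement.
    All site averages must be positive for the geometric mean to exist.
    """
    per_site = list(site_means(rows, pollutant).values())
    estimate = geometric_mean(per_site)
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    boots = sorted(
        geometric_mean(rng.choices(per_site, k=len(per_site)))
        for _ in range(n_boot)
    )
    alpha = (1 - level) / 2
    lower = boots[round(alpha * (n_boot - 1))]
    upper = boots[round((1 - alpha) * (n_boot - 1))]
    return estimate, (lower, upper)
```

A per-period version follows by filtering the rows to a single time of day before calling `ambient_level`, which also gives a first look at how ambient levels differ between early morning and the rush hours.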