Why are the original/raw data not readily usable by analytics tasks? What are the main data preprocessing steps? List and explain their importance in analytics.


Answer:

1.

Why are the original/raw data not readily usable by analytics tasks?

The main reason original/raw data is not readily usable by analytics is that raw data is usually dirty: misaligned, inaccurate, and overly complex. Data preprocessing and cleansing are therefore necessary in order to feed data mining models with clean data.

The main challenges of using raw data in analytics tasks are:

·       Data is never static - as mentioned above, data should undergo cleansing to remove duplicates and to structure it properly before it is used in data mining processes (a minimal cleansing sketch follows this list).

·       Analyzing incorrect data can result in bad strategic decisions.

·       A data cleansing framework should be developed and used - to ensure that the right data is used at the right time and to maximize the value of the data being analyzed, it is recommended to adopt a data cleansing framework.

·       Big data can bring big problems - unstructured and uncleaned data can have a negative impact on the organization instead of delivering benefits.
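
To make the cleansing point concrete, here is a minimal sketch using pandas (an assumed library choice); the column names and values are purely hypothetical:

```python
import pandas as pd

# Hypothetical raw extract; column names and values are for illustration only.
raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103],
    "region": ["north", "north", "SOUTH", "south "],
    "sales": ["250.0", "250.0", "310.0", "125.0"],
})

# Remove exact duplicate records (customer 101 appears twice).
clean = raw.drop_duplicates().copy()

# Properly structure text values so they can be grouped reliably.
clean["region"] = clean["region"].str.strip().str.lower()

# Enforce expected numeric types before feeding the data to a mining model.
clean["sales"] = pd.to_numeric(clean["sales"], errors="coerce")

print(clean)
```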

2.

What are the main data preprocessing steps? List and explain their importance in analytics.

The main data preprocessing steps are outlined below (a minimal end-to-end sketch in Python follows the list):

  • Dataset acquisition - This is one of the most important steps in data preprocessing; the very basic requirement for preparing data for analytics is to acquire the datasets. These datasets can come from multiple sources and are combined into a single, formal dataset.
  • Importing the libraries - Data preprocessing is often done in the Python programming language. Before initiating a preprocessing procedure, it is imperative to import all of the libraries your application will use during that procedure.
  • Importing the dataset - Once the necessary libraries are in place, it is now time to import the acquired dataset.
  • Identification and handling of missing values - As preprocessing takes place, missing values should be properly identified and handled, since they can distort the results obtained from the dataset.
  • Categorizing (encoding) the data - Categorical values in the data should be encoded so they can be used easily in later modeling.
  • Splitting the dataset - It is recommended to split the dataset into a training set and a test set so that possible problems in the data can be isolated and the model can be evaluated on data it has not seen.
  • Standardization (feature scaling) - Standardize the independent variables within the processed dataset so that features measured on different scales contribute comparably.
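
The sketch below shows one way these steps might be wired together, using pandas and scikit-learn (both assumed here rather than specified by the answer); the file name, column names, and target variable are hypothetical:

```python
# Minimal end-to-end preprocessing sketch; pandas and scikit-learn are assumed,
# and the file name, column names, and target are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Import the acquired dataset.
data = pd.read_csv("customers.csv")            # hypothetical combined dataset
X = data.drop(columns=["churned"])             # independent variables
y = data["churned"]                            # target variable

numeric_cols = ["age", "income"]               # hypothetical numeric features
categorical_cols = ["country"]                 # hypothetical categorical feature

# Handle missing values, encode categorical data, and standardize numeric features.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),         # fill missing numbers
        ("scale", StandardScaler()),                         # standardization
    ]), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # categorize/encode
    ]), categorical_cols),
])

# Split into training and test sets before fitting the transformers,
# so the test set remains unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)
```

Note that the imputer, encoder, and scaler are fitted on the training set only and merely applied to the test set, which keeps the evaluation honest.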
