You have received a new data source of millions of customer records, and you've loaded this data into HDFS

Subject: Computer Science

You have received a new data source of millions of customer records, and you've loaded this data into HDFS. Prior to analysis, you will convert all customer registration dates to the same date format, make all addresses uppercase, and remove all customer names (for anonymization). Which process will accomplish all three objectives?
a) Adapt the data cleansing module in Mahout to your data, and invoke the Mahout library when you run your analysis
b) Pull this data into an RDBMS using Sqoop and scrub records using stored procedures
c) Write a script that receives records on stdin, corrects them, and then writes them to stdout. Then, invoke this script in a map-only Hadoop Streaming job
d) Write a MapReduce job with a mapper to change words to uppercase and to reduce different forms of dates to a single form
e) None of the above
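
To make option c concrete, here is a minimal sketch of the kind of script it describes: it reads records on stdin, applies all three corrections, and writes the result to stdout. The question does not specify the record layout, so the tab-separated field order (name, registration date, address), the list of input date formats, and the function names are all assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical cleansing script for a map-only Hadoop Streaming job."""
import sys
from datetime import datetime

# Assumed input layout (not given in the question):
#   name <TAB> registration_date <TAB> address
# Assumed date formats that may appear in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or the text unchanged if no format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return text


def clean_record(line):
    """Drop the name, normalize the date, and uppercase the address."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # skip records that do not match the assumed layout
    _name, reg_date, address = fields
    return "\t".join((normalize_date(reg_date), address.upper()))


def main():
    # Hadoop Streaming feeds one record per line on stdin and
    # collects whatever the script writes to stdout.
    for line in sys.stdin:
        cleaned = clean_record(line)
        if cleaned is not None:
            sys.stdout.write(cleaned + "\n")


if __name__ == "__main__":
    main()
```

A script like this would be passed to Hadoop Streaming as the mapper, with the number of reduce tasks set to zero (for example via the `mapreduce.job.reduces` property) so that the job is map-only and each mapper's output is written straight back to HDFS.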