
Project Goal

The goal of this project is to design an automated data pipeline using HDFS commands and MapReduce.

 

Overview

Your data pipeline will consist of a Unix script that completes the following steps:

  1. Takes some file(s) from the local file system and copies them to a staging folder in HDFS.
  2. Runs a MapReduce job on the files contained in the staging folder.
  3. Takes the output files, renames them, and moves them to a target folder.
  4. Deletes the temporary (MR-generated) output folder.
  5. Moves the content of the staging folder to an archive folder in HDFS.

By local file system, we mean the file system on the client machine. For example, if you are accessing Hadoop from a virtual machine, the virtual machine's file system is the local file system.
