Project Task: I need a script that will transfer data files into HDFS

  1. I need a script that will transfer data files into HDFS.
  2. The script should place each existing, nonzero-size file into an appropriate HDFS directory based on the file name.
  3. Status messages and flow control should also be included.
  4. An archive of the files should also be created.


Answer

The script needs a directory name and a file name. Suppose a file called input.txt is located in the HDFS directory /user/target. We can pass the full path as a single argument, /user/target/input.txt, or pass <input_directory> and <sourcefile> as two separate parameters.


Currently we use the following steps:
1. Download the HDFS file to the local Unix system:
hdfs dfs -copyToLocal <HDFS file path> <local directory path>

2. Add a generated sequence number to each line:
Code:
awk '{printf "%06d,",NR} 1' File.txt > File_Output.txt

Code 1: 

#!/bin/bash
# Download an HDFS file to the local filesystem and reformat it

hdfs dfs -copyToLocal /user/target/file.txt .

# Prefix each line with a zero-padded six-digit line number
awk '{printf "%06d|",NR} 1' file.txt > output.txt
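
For illustration, if file.txt contains the two lines "alpha" and "beta", output.txt will contain:

000001|alpha
000002|beta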


Code 2:
case "$#" in
2)
       hdfs dfs -copyToLocal "$1"/"$2" .
       FILE="$2"
       ;;
1)
       hdfs dfs -copyToLocal "$1" .
       OLDIFS="$IFS"
       # Split $1="a/b/filename" into $1="a", $2="b", $3="filename"
       IFS="/"
       set -- $1
       IFS="$OLDIFS"

       # Get rid of "a", "b"
       shift "$(( $# - 1 ))

       FILE="$1"
       ;;
*)
       echo "Usage:  $0 path file"
       echo "Alternate usage:  $0 path/file"
       exit 1
       ;;
esac

awk '{printf "%06d|",NR} 1' "$FILE" > output.txt
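
Either calling convention from the case statement works; for example (fetch.sh is a hypothetical name for the script):

./fetch.sh /user/target file.txt        # path and file as two arguments
./fetch.sh /user/target/file.txt        # combined path/file as one argument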


Short Script

The moveFromLocal command
Usage: hadoop fs -moveFromLocal <localsrc> <dst>
Similar to the put command, except that the source localsrc is deleted after it is copied.

The moveToLocal command
Usage: hadoop fs -moveToLocal [-crc] <src> <dst>
Displays a "Not implemented yet" message.
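
For example, to ingest a local file and delete the local copy in one step (the paths here are illustrative):

hadoop fs -moveFromLocal /tmp/staging/events.log /user/data/events/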


Step-by-step explanation

1. Follow the instructions in the README to build and deploy the HDFS Slurper to a node that has access to HDFS.
2. On that node, create a local source directory, generate a test file, and compute its MD5 hash (your hash will differ from the output below).
$ mkdir -p /tmp/slurper-test/in
$ sudo dd bs=1048576 count=1 skip=0 if=/dev/sda of=/tmp/slurper-test/in/random-file
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.071969 seconds, 14.6 MB/s
$ md5sum /tmp/slurper-test/in/random-file
969249981fa294b1273b91ec4dc3d34b  /tmp/slurper-test/in/random-file
3. Edit conf/slurper-env.sh and set your JAVA_HOME and HADOOP_HOME settings.
4. Run the HDFS Slurper in standalone mode.

bin/slurper.sh \
 --config-file /path/to/slurper/conf/examples/test.conf
5. Verify that the file was copied into HDFS:
$ hadoop fs -ls /tmp/slurper-test/dest/random-file
Found 1 items
-rw-r--r--   1 user group    1048576 2012-01-17 21:09 /tmp/slurper-test/dest/random-file
6. Get the MD5 hash of the file in HDFS and verify that it matches the original hash from step 2:
$ hadoop fs -cat /tmp/slurper-test/dest/random-file | md5sum
969249981fa294b1273b91ec4dc3d34b  -


Example
#!/bin/bash

# Use today's date in the file name
day=$(date +%Y-%m-%d)

# Change to the log directory
cd /var/log/qradar || exit 1

# Rename the log, adding the date to the file name
mv qradar.log "qradar$day.log"

# Copy the renamed file from the local filesystem into the HDFS cluster
if [ -f "qradar$day.log" ]
then
   file="qradar$day.log"
   hdfs dfs -put "/var/log/qradar/$file" /user/qradar
else
   echo "failed to rename and move the file into the cluster" >> /var/log/messages
fi
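
Putting it together

The pieces above can be combined into one script that covers all four task requirements: it only transfers files that exist and have a nonzero size, derives a per-file HDFS directory from the file name, prints status messages with basic flow control, and archives the transferred files. This is a minimal sketch; the staging directory /data/incoming, the HDFS root /user/data, the archive path, and the underscore-prefix naming rule are all assumptions to adapt.

#!/bin/bash
# Transfer local data files into HDFS, one target directory per
# file-name prefix, then archive the originals.

SRC_DIR=/data/incoming           # local staging directory (assumed)
HDFS_ROOT=/user/data             # HDFS destination root (assumed)
ARCHIVE=/data/archive/files-$(date +%Y-%m-%d).tar.gz

for f in "$SRC_DIR"/*; do
    # Requirement 2: skip files that are missing or empty
    if [ ! -s "$f" ]; then
        echo "skipping $f: missing or empty"
        continue
    fi

    # Derive the HDFS directory from the file name,
    # e.g. sales_2024.csv goes to /user/data/sales (assumed naming rule)
    name=$(basename "$f")
    dest="$HDFS_ROOT/${name%%_*}"

    hdfs dfs -mkdir -p "$dest"
    if hdfs dfs -put "$f" "$dest/"; then
        echo "copied $name to $dest"        # requirement 3: status message
    else
        echo "failed to copy $name" >&2     # requirement 3: flow control
        exit 1
    fi
done

# Requirement 4: create a local archive of the transferred files
tar czf "$ARCHIVE" -C "$SRC_DIR" . && echo "archived files to $ARCHIVE"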


References
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html
https://www.alluxio.io/learn/hdfs/basic-file-operations-commands/