Project Task: I need a script that will transfer data files into HDFS
Project Task:
We need to provide a directory name and a file name. Suppose a file called input.txt is located in the HDFS directory /user/target. We can either pass the full file path, e.g. /user/target/input.txt, or pass
<input_directory> <sourcefile> as two separate parameters.
Currently we are using the steps below:
1. Download the HDFS file to the local Unix system:
hdfs dfs -copyToLocal (HDFS file path) (Local directory path)
2. Add a sequence-generated number to each line:
Code:
awk '{printf "%06d,",NR} 1' File.txt >File_Output.txt
Code 1:
#! /bin/bash
#Downloading HDFS file to Local Unix & Reformatting
hdfs dfs -copyToLocal /user/target/file.txt .
awk '{printf "%06d|",NR} 1' file.txt >output.txt
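To see exactly what the awk reformatting produces, here is a self-contained check on a throwaway two-line file (the file names are illustrative):

```shell
# Create a small sample file (illustrative name)
printf 'alpha\nbeta\n' > sample.txt

# Prefix each line with a zero-padded, pipe-separated line number:
# the printf emits "000001|" with no newline, then the bare "1" prints the line
awk '{printf "%06d|",NR} 1' sample.txt > sample_out.txt

cat sample_out.txt
# 000001|alpha
# 000002|beta
```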
Code 2:
#!/bin/bash
case "$#" in
2)
hdfs dfs -copyToLocal "$1"/"$2" .
FILE="$2"
;;
1)
hdfs dfs -copyToLocal "$1" .
OLDIFS="$IFS"
# Split $1="a/b/filename" into $1="a", $2="b", $3="filename"
IFS="/"
set -- $1
IFS="$OLDIFS"
# Get rid of "a", "b"
shift "$(( $# - 1 ))"
FILE="$1"
;;
*)
echo "Usage: $0 path file"
echo "Alternate usage: $0 path/file"
exit 1
;;
esac
awk '{printf "%06d|",NR} 1' "$FILE" >output.txt
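The IFS/`set --` dance in Code 2 can be replaced by `basename`, which extracts the last path component directly. A minimal sketch (the path is the example one from the task):

```shell
#!/bin/bash
# basename strips everything up to and including the last "/"
FILE=$(basename "/user/target/input.txt")
echo "$FILE"
# input.txt
```

This also handles paths with any number of directory components, so the shift arithmetic is no longer needed.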
Short Script
Command to moveFromLocal
Usage: hadoop fs -moveFromLocal <localsrc> <dst>
Similar to the put command, except that the source <localsrc> is deleted after it is copied.
Command to moveToLocal
Usage: hadoop fs -moveToLocal [-crc] <src> <dst>
Displays a "Not implemented yet" message.
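The net effect of moveFromLocal — copy, then delete the source — can be illustrated on the local filesystem without a cluster (the paths below are throwaway demo locations, not real HDFS paths):

```shell
# Set up a throwaway source file and destination directory
mkdir -p /tmp/mfl-demo/dest
echo "payload" > /tmp/mfl-demo/src.txt

# moveFromLocal behaves like put (a copy) followed by removing the local source
cp /tmp/mfl-demo/src.txt /tmp/mfl-demo/dest/ && rm /tmp/mfl-demo/src.txt

ls /tmp/mfl-demo/dest                      # src.txt
[ ! -f /tmp/mfl-demo/src.txt ] && echo "source removed"
```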
Step-by-step explanation
1. Execute the instructions in the README to build and deploy the HDFS Slurper to a node that has access to HDFS.
2. On that node, create a local source directory, then generate a random file and its MD5 hash (your hash will differ from the output below).
$ mkdir -p /tmp/slurper-test/in
$ dd if=/dev/urandom of=/tmp/slurper-test/in/random-file bs=1048576 count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.071969 seconds, 14.6 MB/s
$ md5sum /tmp/slurper-test/in/random-file
969249981fa294b1273b91ec4dc3d34b /tmp/slurper-test/in/random-file
3. Edit conf/slurper-env.sh and set your JAVA_HOME and HADOOP_HOME settings.
4. Run the HDFS Slurper in standalone mode.
bin/slurper.sh \
--config-file /path/to/slurper/conf/examples/test.conf
5. Verify that the file was copied into HDFS:
$ hadoop fs -ls /tmp/slurper-test/dest/random-file
Found 1 items
-rw-r--r-- 1 user group 1048576 2012-01-17 21:09 /tmp/slurper-test/dest/random-file
6. Get the MD5 hash of the file in HDFS and verify it is the same as the original MD5 from step 2:
$ hadoop fs -cat /tmp/slurper-test/dest/random-file | md5sum
969249981fa294b1273b91ec4dc3d34b -
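Step 6's round-trip check works because md5sum is deterministic over a byte stream: hashing the bytes read back from HDFS must match hashing the original file if the copy was faithful. A local demonstration of that property (the literal string is arbitrary):

```shell
# Hash the same bytes twice: once from a file, once from a pipe
printf 'hello' > /tmp/hash-demo.txt
md5sum /tmp/hash-demo.txt          # 5d41402abc4b2a76b9719d911017c592  /tmp/hash-demo.txt
cat /tmp/hash-demo.txt | md5sum    # 5d41402abc4b2a76b9719d911017c592  -
```

Any difference between the two hashes would indicate the bytes were altered in transit.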
Example
#!/bin/bash
# Use today's date in the file name
day=$(date +%Y-%m-%d)
# Change to the log directory
cd /var/log/qradar || exit 1
# Rename the log, adding the date to the file name
mv qradar.log "qradar$day.log"
# Copy the file from the local filesystem into the HDFS cluster
if [ -f "qradar$day.log" ]
then
    file="qradar$day.log"
    hdfs dfs -put "/var/log/qradar/$file" /user/qradar
else
    echo "failed to rename and move the file into the cluster" >> /var/log/messages
fi
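To run the rotation-and-upload script automatically, it can be scheduled with cron. A sketch assuming the script is saved at /usr/local/bin/qradar-to-hdfs.sh (a hypothetical path — adjust to wherever you install it):

```shell
# Crontab entry (add with `crontab -e`): run the upload every night at 00:05,
# shortly after midnight so $day reflects the new date
5 0 * * * /usr/local/bin/qradar-to-hdfs.sh
```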
References
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html
https://www.alluxio.io/learn/hdfs/basic-file-operations-commands/