python - Piping command-line Hadoop streaming jobs
I want to pipe Hadoop streaming jobs together. For example, I ran this command:

hadoop jar hadoop-streaming.jar -mapper map1.py -reducer reducer.py -input xx -output /output1

But I want to use the output of step 1 as the input of step 2 of the MapReduce job without storing it in HDFS, maybe by sending it to stdout. Is there something like a Linux pipe? For example:

hadoop jar hadoop-streaming.jar -mapper map1.py -reducer reducer.py -input xx | hadoop jar hadoop-streaming.jar -mapper map2.py -reducer reducer2.py -output /output
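Hadoop streaming writes each job's output to HDFS, so two jobs cannot be joined with a shell pipe as written above. The mapper and reducer scripts themselves, however, can be chained locally with Unix pipes to prototype the two-step pipeline before submitting it. A minimal sketch, using awk/tr stand-ins in place of map1.py, reducer.py, and map2.py (all hypothetical here):

```shell
# Local simulation of two chained streaming steps with Unix pipes.
# Stand-in step 1: word count (mapper emits "word<TAB>1", reducer sums).
# Stand-in step 2: mapper swaps key and value.
printf 'a b a\nb a\n' |
  tr ' ' '\n' |                                  # step-1 mapper: one word per line
  awk '{print $0 "\t1"}' |                       # ...emit word<TAB>1
  sort |                                         # simulate the shuffle/sort phase
  awk -F'\t' '{c[$1] += $2}
       END {for (w in c) print w "\t" c[w]}' |   # step-1 reducer: sum per word
  sort |                                         # shuffle/sort feeding step 2
  awk -F'\t' '{print $2 "\t" $1}'                # step-2 mapper: swap key/value
```

This only mimics the data flow on a single machine; an actual second streaming job still needs its input to live in HDFS.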
I had the same problem and ended up using a bash/shell script to run the hadoop streaming command. I created a file called hadoop.sh that contained the following:

rm -r output
bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar -files /hadoop-2.7.3/script/mapper.php -input /data/* -output output -mapper "php mapper.php" -jobconf mapred.reduce.tasks=1

# add the beginning/ending php tags to the output file
ex -sc '1i|<?php' -c '$a|?>' -cx output/part-00000

# move the file from /output to /script
mv /hadoop-2.7.3/output/part-00000 /hadoop-2.7.3/script/part-00000.php

The part-00000 file becomes the part-00000.php file for the next hadoop command.
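The same shell-script approach can chain the two streaming jobs from the question: run them sequentially and point the second job's -input at the first job's -output directory. The intermediate data does pass through HDFS, but the script can delete it afterwards. A sketch, assuming the jar path from the answer above; map1.py, reducer.py, map2.py, and reducer2.py are hypothetical stand-ins for your own scripts:

```shell
#!/bin/sh
set -e  # stop if either job fails

# Adjust to your installation.
STREAMING_JAR=share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar

# Step 1: first MapReduce job writes to an intermediate HDFS directory.
bin/hadoop jar "$STREAMING_JAR" \
    -files map1.py,reducer.py \
    -mapper "python map1.py" -reducer "python reducer.py" \
    -input /data/xx -output /tmp/step1

# Step 2: second job reads step 1's output directly from HDFS.
bin/hadoop jar "$STREAMING_JAR" \
    -files map2.py,reducer2.py \
    -mapper "python map2.py" -reducer "python reducer2.py" \
    -input /tmp/step1 -output /output

# Clean up the intermediate directory.
bin/hadoop fs -rm -r /tmp/step1
```

Tools such as Oozie or a simple Makefile can manage longer chains, but for two steps a sequential script like this is usually enough.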