Hadoop streaming error, MapReduce with Python

Hi, I'm new to this environment. Does anyone have an idea how to fix this error, or what might be causing it?

~/hduser/hadoop$ sudo ./bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -file /home/hduser/map.py -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py -input /home/hduser/tmp/cddb.txt -output /home/hduser/op1
packageJobJar: [/home/hduser/map.py, /home/hduser/red.py] [] /tmp/streamjob7455767556382290755.jar tmpDir=null 
13/06/20 12:43:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
13/06/20 12:43:55 WARN snappy.LoadSnappy: Snappy native library not loaded 
13/06/20 12:43:55 INFO mapred.FileInputFormat: Total input paths to process : 1 
13/06/20 12:43:55 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir. 
13/06/20 12:43:56 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local] 
13/06/20 12:43:56 INFO streaming.StreamJob: Running job: job_local_0001 
13/06/20 12:43:56 INFO streaming.StreamJob: Job running in-process (local Hadoop) 
13/06/20 12:43:56 INFO util.ProcessTree: setsid exited with exit code 0 
13/06/20 12:43:56 INFO mapred.Task: Using ResourceCalculatorPlugin : …
13/06/20 12:43:56 INFO mapred.MapTask: numReduceTasks: 1 
13/06/20 12:43:56 INFO mapred.MapTask: io.sort.mb = 100 
13/06/20 12:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720 
13/06/20 12:43:56 INFO mapred.MapTask: record buffer = 262144/327680 
13/06/20 12:43:56 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./map.py] 
13/06/20 12:43:56 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s] 
13/06/20 12:43:57 INFO streaming.StreamJob: map 0% reduce 0% 
13/06/20 12:44:02 INFO mapred.LocalJobRunner: file:/home/hduser/tmp/cddb.txt:0+1205 
13/06/20 12:44:03 INFO streaming.StreamJob: map 100% reduce 0% 
13/06/20 12:48:11 INFO streaming.PipeMapRed: Records R/W=9/1 
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done 
13/06/20 12:48:11 INFO streaming.PipeMapRed: mapRedFinished 
13/06/20 12:48:11 INFO mapred.MapTask: Starting flush of map output 
13/06/20 12:48:11 INFO mapred.MapTask: Finished spill 0 
13/06/20 12:48:11 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting 
13/06/20 12:48:11 INFO mapred.LocalJobRunner: Records R/W=9/1 
13/06/20 12:48:11 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done. 
13/06/20 12:48:11 INFO mapred.Task: Using ResourceCalculatorPlugin : …
13/06/20 12:48:11 INFO mapred.LocalJobRunner: 
13/06/20 12:48:11 INFO mapred.Merger: Merging 1 sorted segments 
13/06/20 12:48:11 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1356 bytes 
13/06/20 12:48:11 INFO mapred.LocalJobRunner: 
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./red.py] 
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s] 
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s] 
Traceback (most recent call last): 
    File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module> 
    main() 
    File "/home/hduser/hduser/hadoop/./red.py", line 19, in main 
    for similarity, group in groupby(data, itemgetter(0), reverse=True): 
TypeError: groupby() takes at most 2 arguments (3 given) 
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done 
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed failed! 
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576) 
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) 
13/06/20 12:48:11 WARN mapred.LocalJobRunner: job_local_0001 
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1 
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362) 
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576) 
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) 
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529) 
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) 
13/06/20 12:48:12 INFO streaming.StreamJob: Job running in-process (local Hadoop) 
13/06/20 12:48:12 ERROR streaming.StreamJob: Job not successful. Error: NA 
13/06/20 12:48:12 INFO streaming.StreamJob: killJob... 
Streaming Command Failed!

I am using Hadoop 1.0.4, and I wrote the map and reduce steps in Python (using Hadoop streaming).
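For context on how a streaming reducer such as red.py receives its data: Hadoop streaming feeds the reducer plain text lines on stdin, sorted by key, with the key and value separated by the first tab by default. The sketch below is a minimal reducer under that assumption; the function names and the output format (key plus a count of values) are hypothetical, not the asker's code. Note that itertools.groupby is called with only its two allowed arguments.

```python
import sys
from itertools import groupby
from operator import itemgetter


def parse_line(line):
    """Split one streaming record into (key, value) at the first tab.

    A line without a tab becomes (line, ""), i.e. the whole line is the key.
    """
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def group_records(lines):
    """Yield (key, [values]) for each run of equal keys.

    Hadoop streaming sorts reducer input by key, so equal keys arrive
    next to each other and groupby's consecutive grouping is enough.
    """
    records = (parse_line(line) for line in lines)
    # groupby takes only (iterable, key); it has no reverse argument.
    for key, group in groupby(records, itemgetter(0)):
        yield key, [value for _, value in group]


def main(stream=sys.stdin):
    for key, values in group_records(stream):
        # Hypothetical output: the key and how many values it had.
        print("%s\t%d" % (key, len(values)))
```

In an actual red.py you would finish the file with `if __name__ == "__main__": main()` so the script reads stdin when Hadoop launches it.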


2013-06-20 rosnikv

Please post your code in the body of your question, in a code block (not on pastebin).

The error is clear:

Traceback (most recent call last): 
    File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module> 
    main() 
    File "/home/hduser/hduser/hadoop/./red.py", line 19, in main 
    for similarity, group in groupby(data, itemgetter(0), reverse=True): 
TypeError: groupby() takes at most 2 arguments (3 given)

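groupby accepts at most 2 arguments (the iterable and an optional key function); it has no reverse parameter. See the documentation for itertools.groupby.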
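If the intent of reverse=True was to process the groups from highest to lowest similarity, one way is to do the ordering with sorted(), which does accept reverse=True, and then call groupby with its two arguments. This is a sketch assuming the reducer has already turned its input into (similarity, item) pairs; the function name groups_by_similarity_desc is hypothetical. Because groupby only merges consecutive equal keys, the data must be sorted by the same key before grouping.

```python
from itertools import groupby
from operator import itemgetter


def groups_by_similarity_desc(records):
    """Group (similarity, item) pairs, highest similarity first.

    groupby() only accepts (iterable, key), so the descending order has
    to come from sorted(), which accepts reverse=True.
    """
    # Assumption: similarity is numeric. Values read from stdin are strings,
    # so convert them with float() first; strings compare character by
    # character, e.g. "10" < "9".
    ordered = sorted(records, key=itemgetter(0), reverse=True)
    return [(similarity, [item for _, item in group])
            for similarity, group in groupby(ordered, itemgetter(0))]
```

sorted() is stable even with reverse=True, so items sharing a similarity keep their original relative order within each group.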


2013-06-20 07:51:47 zsxwing
