Separate output files in Hadoop MapReduce

My question has probably been asked already, but I could not find a clear answer to it.

My MapReduce job is a basic WordCount. The current output file looks like this:

// filename : 'part-r-00000' 
789 a 
755 #c 
456 d 
123 #b

How can I change the name of the output file?

Then, is it possible to get two output files instead of one? Here is my Partitioner class:

public class TweetPartitionner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        // Keys starting with '#' go to partition 1, everything else to partition 0
        if (a_key.toString().startsWith("#")) {
            return 1;
        }
        return 0;
    }
}

Here is my Reducer class:

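public static class SortReducer extends Reducer<IntWritable, Text, IntWritable, Text> {

    // reduce() has to take an Iterable<Text> to actually override Reducer.reduce
    @Override
    public void reduce(IntWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        for (Text value : values) {
            context.write(key, value);
        }
    }
}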

And these are the two output files I would like to get:

// First output file 
789 a 
456 d 

// Second output file 
123 #b 
755 #c

Thanks a lot!

Source

2013-06-25 Apaachee

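In the job file, set:

job.setNumReduceTasks(2);

From the mapper, emit:

a 789
#c 755
d 456
#b 123

Write a partitioner and add it to the job configuration. In the partitioner, check whether the key starts with #: if it does, return 0, otherwise return 1.

(In the reducer, swap key and value.)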
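
The routing this answer describes can be sketched without Hadoop at all. The snippet below is only a plain-Java simulation, not real job code: the class name TwoOutputsSketch and its methods are made up for illustration, and it ignores the sorting that Hadoop performs during the shuffle. It applies the answer's rule (keys starting with # go to partition 0, the rest to partition 1) and swaps key and value the way the reducer would.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (no Hadoop dependency) of how records would be routed.
// All names here are hypothetical and exist only for this illustration.
class TwoOutputsSketch {

    // Rule from the answer: keys starting with '#' go to partition 0, the rest to 1.
    static int partitionFor(String key) {
        return key.startsWith("#") ? 0 : 1;
    }

    // Groups the mapper output by partition and swaps key and value,
    // mimicking what each of the two reducers would write to its own file.
    static List<List<String>> route(Map<String, Integer> mapperOutput) {
        List<List<String>> files = new ArrayList<>();
        files.add(new ArrayList<>()); // partition 0: hashtags
        files.add(new ArrayList<>()); // partition 1: plain words
        for (Map.Entry<String, Integer> e : mapperOutput.entrySet()) {
            files.get(partitionFor(e.getKey())).add(e.getValue() + " " + e.getKey());
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapperOutput = new LinkedHashMap<>();
        mapperOutput.put("a", 789);
        mapperOutput.put("#c", 755);
        mapperOutput.put("d", 456);
        mapperOutput.put("#b", 123);

        List<List<String>> files = route(mapperOutput);
        System.out.println("partition 0: " + files.get(0));
        System.out.println("partition 1: " + files.get(1));
    }
}
```

In a real job each partition is handled by its own reduce task, which is why two reduce tasks end up producing two part-r-* files.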

Source

2013-06-25 10:55:34 banjara

Thank you zuxqoj, that looks like a good solution. I have updated my post with my Partitionner. But when I run the program I get an error: java.io.IOException: Illegal partition for #rescinfo (1). Why is that? – Apaachee

I found the beginning of a solution: http://stackoverflow.com/questions/12928101/hadoop-number-of-reducer-is-not-equal-to-what-i-have-set-in-program – Apaachee

Eclipse can launch only ONE reducer. My Hadoop installation runs under cygwin on my computer. How can I get more than one reducer with my installation? – Apaachee
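
A note on the "Illegal partition" error from the comments above: a partition index is only accepted if it lies between 0 and the number of reduce tasks minus one. If the local run really starts a single reducer, as the last comment suggests, then returning 1 for #rescinfo is out of range. The sketch below is plain Java with no Hadoop dependency; the class IllegalPartitionSketch and its methods are hypothetical names used only to illustrate the range check and one defensive workaround (folding the index with a modulo).

```java
// Plain-Java sketch (no Hadoop dependency) of the partition range check.
// All names here are hypothetical and exist only for this illustration.
class IllegalPartitionSketch {

    // Same rule as TweetPartitionner in the question: hashtags -> 1, others -> 0.
    static int rawPartition(String key) {
        return key.startsWith("#") ? 1 : 0;
    }

    // A partition index is only valid if it lies in [0, numPartitions).
    static boolean isLegal(int partition, int numPartitions) {
        return partition >= 0 && partition < numPartitions;
    }

    // Defensive variant: fold the index into the valid range.
    static int safePartition(String key, int numPartitions) {
        return rawPartition(key) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 1; // assumption: the local run only starts one reducer
        int raw = rawPartition("#rescinfo");
        System.out.println("raw=" + raw + " legal=" + isLegal(raw, reducers));
        System.out.println("safe=" + safePartition("#rescinfo", reducers));
    }
}
```

With the modulo variant the job no longer fails when only one reducer runs, but all records then land in the same output file; two separate files still require two reduce tasks.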

To change the output file name, you can look at http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html#write(java.lang.String, K, V)

Source

2013-06-25 11:13:24
