How to get the current file name in a Hadoop Reduce function

For example, I am using WordCount and need to get the file name inside the reduce function.

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        String filename = ((FileSplit)(.getContext()).getInputSplit()).getPath().getName();
        // ----------------------------^ I need to get the context and filename!
        key.set(key.toString() + " (" + filename + ")");
        output.collect(key, new IntWritable(sum));
    }
}

With the modified code above, I want to print the file name next to the current word. I tried the approach from "Java Hadoop: How can I create mappers that take as input files and give an output which is the number of lines in each file?", but I could not get hold of the context object.

I am new to Hadoop and need help with this. Can anyone help?

Source

2013-12-17 Praveen Kumar

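You can't get the context because context is a construct of the "new API" and you are using the "old API".

Check out this word count example instead: http://wiki.apache.org/hadoop/WordCount

Look at the signature of the reduce function in that case: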

public void reduce(Text key, Iterable<IntWritable> values, Context context)

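See? The context! Note also that this example imports from .mapreduce. instead of .mapred.

This is a common problem for new Hadoop users, so don't feel bad. In general you want to stick with the new API, for a number of reasons. Be very careful with the examples you find, though, because many still use the old API. Also, the new API and the old API cannot be mixed (for example, you cannot combine a new-API mapper with an old-API reducer).

Source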

2013-12-17 18:09:58

Why should the new API be preferred over the old one? I thought both would keep being supported. Maybe I am not up to date.

How can I get the file name in the reduce function with the old API?

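If you are using the old MR API (the org.apache.hadoop.mapred package), add the following to your mapper class:

String fileName = "";
public void configure(JobConf job)
{
    // "map.input.file" holds the path of the file the current map task is reading
    fileName = job.get("map.input.file");
}

If you are using the new MR API (the org.apache.hadoop.mapreduce package), add the following to your mapper class:

String fileName = "";
protected void setup(Context context) throws java.io.IOException, java.lang.InterruptedException
{
    // The cast assumes a file-based input format, whose splits are FileSplit instances
    fileName = ((FileSplit) context.getInputSplit()).getPath().toString();
}

Note that the input split, and with it the file name, is defined only for map tasks. A reducer's Context has no getInputSplit(), and map.input.file is not set in a reduce task, so to use the name in reduce the mapper has to pass it along in the key or the value.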
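
A reducer has no single "current file", since the values for one key can come from several input files. One common workaround is to have the mapper put the file name into the key it emits, so that the reducer sums counts per word and file. The following is a minimal plain-Java sketch of that data flow, with no Hadoop dependencies; the class and method names (FileTaggedWordCount, map, reduce, count) are hypothetical and only simulate the map and reduce steps. In a real job the file name would come from the configure/setup code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the idea; every name here is hypothetical, not Hadoop API.
final class FileTaggedWordCount {

    // "Map" step: emit one composite key "word (fileName)" per word in the line.
    static List<String> map(String fileName, String line) {
        List<String> keys = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                keys.add(word + " (" + fileName + ")");
            }
        }
        return keys;
    }

    // "Shuffle + reduce" step: group identical keys and add 1 per occurrence.
    static Map<String, Integer> reduce(List<String> keys) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String key : keys) {
            sums.merge(key, 1, Integer::sum);
        }
        return sums;
    }

    // Runs both steps over a map of fileName -> lines of that file.
    static Map<String, Integer> count(Map<String, List<String>> files) {
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (String line : file.getValue()) {
                emitted.addAll(map(file.getKey(), line));
            }
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("a.txt", Arrays.asList("hello world", "hello hadoop"));
        files.put("b.txt", Arrays.asList("hello again"));
        // Print each "word (file)" key with its count, tab-separated.
        count(files).forEach((key, sum) -> System.out.println(key + "\t" + sum));
    }
}
```

Because the file name is part of the key, the same word in two files yields two separate output records, which matches the "word (filename)" output the question is after.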

Source

2013-12-17 18:26:58

I used it this way and it works!

Let me know if it can be improved.

Source

2013-12-17 18:40:28
