Using S3 as input to a MapReduce job

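I have an MR job that reads files from Amazon S3 and processes the data in local HDFS. The files are gzip-compressed text files (.gz). I tried to set up the job as shown below, but it does not work. Do I need to add an extra step to decompress the files first?

Thanks.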

String S3_LOCATION = "s3n://access_key:secret_key@bucket_name";

protected void prepareHadoopJob() throws Exception {

    // Map-only job that reads plain text lines from the S3 location.
    this.getHadoopJob().setMapperClass(Mapper1.class);
    this.getHadoopJob().setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(this.getHadoopJob(), new Path(S3_LOCATION));

    // No reducers: each mapper writes its Put objects straight to the HBase table.
    this.getHadoopJob().setNumReduceTasks(0);
    this.getHadoopJob().setOutputFormatClass(TableOutputFormat.class);
    this.getHadoopJob().getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, myTable.getTableName());
    this.getHadoopJob().setOutputKeyClass(ImmutableBytesWritable.class);
    this.getHadoopJob().setOutputValueClass(Put.class);
}
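Since the job setup hinges on the S3_LOCATION string, one cheap thing to rule out is how that string parses. Hadoop's Path is backed by java.net.URI, so the JDK-only sketch below prints which parts the URI class extracts from an s3n-style location. The class S3UriCheck, its methods, and every key and bucket name in it are hypothetical placeholders, and this is only a diagnostic aid under the assumption that the location string is the problem, not a confirmed cause of the failure.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop: shows how java.net.URI
// splits an s3n-style location into host, user info, and path.
final class S3UriCheck {

    /** Returns a one-line summary of the parts java.net.URI extracts. */
    static String describe(String location) {
        URI uri = URI.create(location);
        return "host=" + uri.getHost()
                + " userInfo=" + uri.getUserInfo()
                + " path=" + uri.getPath();
    }

    /** True when the URI yields both a bucket (host) and credentials (user info). */
    static boolean looksUsable(String location) {
        URI uri = URI.create(location);
        return uri.getHost() != null && uri.getUserInfo() != null;
    }

    public static void main(String[] args) {
        // All keys and bucket names below are made-up placeholders.
        String[] samples = {
            "s3n://ACCESSKEY:SECRETKEY@mybucket/logs/",   // plain case
            "s3n://ACCESSKEY:SECRET/KEY@mybucket/logs/",  // '/' inside the secret
            "s3n://ACCESSKEY:SECRETKEY@my_bucket/logs/"   // '_' inside the bucket name
        };
        for (String sample : samples) {
            System.out.println(looksUsable(sample) + "  " + describe(sample));
        }
    }
}
```

Under the JDK's parsing rules, a "/" in the secret key or an underscore in the bucket name leaves the host unset. If the real location falls into one of those cases, an alternative worth trying is to keep the credentials out of the URI and set the fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey properties on the job configuration instead.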

Source

2012-07-20 user468587

You generally don't need to decompress the files first, but without the details of the error message it's hard to tell what went wrong.
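The reason no extra step is usually needed is that Hadoop's text input path picks a compression codec from the file extension (through CompressionCodecFactory), so a .gz file is decompressed as it is read. Gzip is not splittable, so each .gz file ends up in a single mapper. The sketch below imitates that extension-based choice with the JDK alone. GzipAwareLineReader and its methods are hypothetical names for illustration and are not Hadoop classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical JDK-only illustration; Hadoop's own classes are not used here.
final class GzipAwareLineReader {

    /** Opens the file, adding a gzip decoder only when the name ends in ".gz". */
    static InputStream open(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        if (!file.getFileName().toString().endsWith(".gz")) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the handle if the gzip header is invalid
            throw e;
        }
    }

    /** Reads every line of the file as text, decompressing on the fly if needed. */
    static List<String> readLines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Writes a small gzip-compressed sample file and returns its path. */
    static Path writeSample(String text) {
        try {
            Path file = Files.createTempFile("sample", ".txt.gz");
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeSample("first record\nsecond record\n");
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

The caller never sees compressed bytes, which mirrors what a mapper behind TextInputFormat receives: one decoded line per record.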

Source

2012-07-20 11:27:12
