Reading a file in Hadoop

This is a Hadoop distributed cache word count example on Hadoop 2.2.0. I copied the file into the HDFS file system so that it can be used in the setup of the mapper class. Driver main class:

    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    // Three arguments are required: input path, output path, and cache file URI.
    if (otherArgs.length != 3)
    {
        System.err.println("Usage: wordcount <in> <out> <cache file>");
        System.exit(2);
    }
    Job job = new Job(conf, "word_count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    // Delete any previous output directory so the job can be rerun.
    Path outputpath = new Path(otherArgs[1]);
    outputpath.getFileSystem(conf).delete(outputpath, true);
    FileOutputFormat.setOutputPath(job, outputpath);
    System.out.println("CachePath****************" + otherArgs[2]);
    // Register the third argument as a distributed cache file.
    DistributedCache.addCacheFile(new URI(otherArgs[2]), job.getConfiguration());
    System.exit(job.waitForCompletion(true) ? 0 : 1);

But I am getting a java.io.FileNotFoundException inside setup():

    protected void setup(Context context) throws IOException, InterruptedException
    {
        // Local paths of the cached files; may be null if none were registered.
        Path[] uris = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        cacheData = new HashMap<String, String>();
        if (uris == null)
        {
            return;
        }
        for (Path urifile : uris)
        {
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader readBuffer1 = new BufferedReader(new FileReader(urifile.toString())))
            {
                String line;
                while ((line = readBuffer1.readLine()) != null)
                {
                    System.out.println("**************" + line);
                    cacheData.put(line, line);
                }
            }
            catch (Exception e)
            {
                System.out.println(e.toString());
            }
        }
    }

/home/user12/tmp/mapred/local/1408960542382/cache (No such file or directory)

So the cache feature is not working properly. Any ideas?

Source

2014-08-25 user3684584

This is solved. I had given the wrong file location. It works fine now.

Source

2014-08-26 07:52:04 user3684584
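
Since the cause turned out to be a wrong file location, it may help to make the loader fail loudly instead of catching the exception and printing it. The following is only a minimal sketch using the plain JDK, not the poster's actual fix: the class name CacheFileLoader and the method load are hypothetical, and it assumes the cached file is a UTF-8 text file on the local disk of the task node.

    ```java
    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical helper, not part of the original post.
    class CacheFileLoader {

        // Reads every line of a local file into a map (line -> line), like the
        // loop in setup(), but throws when the path is missing or is not a
        // regular file instead of printing the exception and continuing.
        static Map<String, String> load(String localPath) throws IOException {
            Path path = Paths.get(localPath);
            if (!Files.isRegularFile(path)) {
                throw new FileNotFoundException(
                    "Cache file not found or not a regular file: " + localPath);
            }
            Map<String, String> cacheData = new HashMap<>();
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader reader =
                     Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    cacheData.put(line, line);
                }
            }
            return cacheData;
        }
    }
    ```

Under these assumptions, setup() could call CacheFileLoader.load(urifile.toString()) for each cached path and let the IOException propagate, so a wrong location fails the task immediately with the offending path in the message.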
