Java: reading the output files of a Hadoop reducer

I am trying to read and parse the final output of a Hadoop MapReduce job. Below is part of the code in my "job" file. I want to use FileSystem (the Hadoop API) to read the output files, but I am not sure where to put the code marked between the double asterisks. If I put it below System.exit, it is skipped.

public static void main(String[] args) throws Exception { 
     Configuration conf = new Configuration(); 
     String[] otherArgs = new GenericOptionsParser(conf, args) 
       .getRemainingArgs(); 
     if (otherArgs.length != 3) { 
      System.err.println("Usage: format is <in> <out> <keyword>"); 
      System.exit(2); 
     } 

     **Path distCache = new Path("/"); 
     String fileSys = conf.get("fs.default.name"); 
     HashMap<String, Integer> jobCountMap = new HashMap<String, Integer>();** 

     conf.set("jobTest", otherArgs[2]); 
     Job job = new Job(conf, "job count"); 
     job.setJarByClass(JobResults.class); 
     job.setMapperClass(JobMapper.class); 
     job.setCombinerClass(JobReducer.class); 
     job.setReducerClass(JobReducer.class); 

     job.setOutputKeyClass(Text.class); 
     job.setOutputValueClass(IntWritable.class); 
     FileInputFormat.addInputPath(job, new Path(otherArgs[0])); 
     FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); 

     distCache = new Path(args[2]); 
    //  FileSystem fs = distCache.getFileSystem(conf); // for Amazon AWS 
     if (fileSys.split(":")[0].trim().equalsIgnoreCase("s3n")) distCache = new Path("s3n:/" + distCache); 

     FileSystem fs = FileSystem.get(conf);   // for local cluster 

     Path pathPattern = new Path(distCache, "part-r-[0-9]*"); 
     FileStatus[] list = fs.globStatus(pathPattern); 

     for (FileStatus status : list) 
     { 
//   DistributedCache.addCacheFile(status.getPath().toUri(), conf); 
      try { 
      BufferedReader brr = new BufferedReader(new FileReader(status.getPath().toString())); 
          String line; 
       while ((line = brr.readLine()) != null) 
       { 
        String[] resultsCount = line.split("\\|"); 
        jobCountMap.put(resultsCount[0], Integer.parseInt(resultsCount[1].trim())); 
       } 
      } catch (FileNotFoundException e) 
      { 
       e.printStackTrace(); 
      } catch (IOException e) 
      { 
       e.printStackTrace(); 
      } 
     } 

     System.out.println("the size of Hashmap is: " + jobCountMap.size()); 
     System.exit(job.waitForCompletion(true) ? 0 : 1); 
    }

Source

2013-10-26 TonyGW

The System.exit problem is easy to solve. Where you have:

System.out.println("the size of Hashmap is: " + jobCountMap.size()); 
    System.exit(job.waitForCompletion(true) ? 0 : 1);

put the following instead:

System.out.println("the size of Hashmap is: " + jobCountMap.size()); 
boolean completionStatus = job.waitForCompletion(true); 

//your code here 

if (completionStatus) { 
    System.exit(0); 
} else { 
    System.exit(1); 
}

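Because waitForCompletion blocks until the job has finished, this lets you run whatever processing you want inside your main function before exiting, including starting a second job if you wish.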
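As an illustration of what could go in the "your code here" slot, here is a minimal sketch of the parsing step. It is not part of the original answer. It assumes, as the question's code does, that each reducer output line has the form key|count. The class name ReducerOutputParser and the sample data are hypothetical. In the real driver, the reader would probably be built from fs.open(status.getPath()) wrapped in an InputStreamReader, since java.io.FileReader only reads the local file system; the sketch uses a StringReader so that it runs without a cluster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not from the original post.
class ReducerOutputParser {

    // Parses lines of the assumed form "key|count" into a map.
    // Lines without a '|' or with a non-numeric count are skipped.
    static Map<String, Integer> parse(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split("\\|");
            if (fields.length < 2) {
                continue; // no separator on this line
            }
            try {
                counts.put(fields[0], Integer.parseInt(fields[1].trim()));
            } catch (NumberFormatException e) {
                // count field is not an integer; ignore this line
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the contents of one part-r-* file (made-up data).
        String sample = "engineer|12\nteacher| 7\nmalformed line\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            Map<String, Integer> counts = parse(reader);
            System.out.println("the size of Hashmap is: " + counts.size());
        }
    }
}
```

In the driver, a call like this would sit after job.waitForCompletion(true) has returned and before System.exit, once per FileStatus returned by fs.globStatus, so that the part-r-* files already exist when they are read.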

Source

2013-10-28 13:34:17
