Why can't my Hadoop program proceed to the next step?

Hi everyone! I have a Hadoop program in Eclipse, and the source code is as follows:

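import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (token, 1) for every whitespace-separated token in the input line.
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}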

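import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    // Sums all counts received for a key and emits (key, total).
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}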

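import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] oargs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (oargs.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            System.exit(2); // stop here instead of indexing into missing arguments
        }
        System.out.println("input: " + oargs[0]);
        System.out.println("output: " + oargs[1]);
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(oargs[0]));
        FileOutputFormat.setOutputPath(job, new Path(oargs[1]));
        System.out.println("==============================");
        System.out.println("start ...");
        // Blocks until the job finishes; true means the job succeeded.
        boolean flag = job.waitForCompletion(true);
        System.out.println(flag);
        System.out.println("end ...");
        System.out.println("==============================");
    }
}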

The resulting log is:

[email protected] /cygdrive/f/develop/hadoop/hadoop-1.0.3 
$ ./bin/hadoop jar ./jar/wordcount.jar /tmp/input /tmp/output 
input: /tmp/input 
output: /tmp/output 
============================== 
start ... 
12/07/25 14:59:17 INFO input.FileInputFormat: Total input paths to process : 2 
12/07/25 14:59:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
12/07/25 14:59:17 WARN snappy.LoadSnappy: Snappy native library not loaded 
12/07/25 14:59:17 INFO mapred.JobClient: Running job: job_201207251447_0001 
12/07/25 14:59:18 INFO mapred.JobClient: map 0% reduce 0%

The log stops there and never moves on. Why?

I am running the code in local mode on a Windows XP system using Cygwin.


2012-07-23 rory

What do you mean by "do next"? What are you trying to do? Normally you have to wait until the cluster has processed the job and returned; that is what waitForCompletion does. If the job is not successful, the JVM exits.

Can you post the task logs for one of the 2 map tasks that should be running? You can reach them through the job tracker web UI at http://localhost:50030

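@Rory As Thomas asks, what do you mean by "do it next"? Is this the whole stack trace shown on the screen? Do you mean that you compiled once and got a result, but cannot run it again? Did you specify the correct input arguments in the Eclipse IDE?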

If you cannot run the program a second time, you may not have specified a different output directory. After seeing the stack trace, though, that does not appear to be the problem.


2012-07-23 13:37:13

Arun, when I debug my code up to `job.waitForCompletion(true)`, it does not continue and hangs there forever. – rory
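One way to tell a job that is merely slow from one that never progresses is to poll for completion with a deadline instead of blocking indefinitely. The following is a minimal plain-JDK sketch of that idea (Java 8 or later), not code from the thread: `PollWithTimeout`, `awaitCompletion`, and the `BooleanSupplier` stand-in are hypothetical names. In a real Hadoop driver the supplier would presumably wrap a completion check such as `job.isComplete()` after `job.submit()`; that wiring is an assumption and is not shown in the original post.

```java
import java.util.function.BooleanSupplier;

public class PollWithTimeout {
    /**
     * Polls isDone until it returns true or timeoutMillis elapses.
     * Returns true if completion was observed, false on timeout or interruption.
     */
    public static boolean awaitCompletion(BooleanSupplier isDone, long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!isDone.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // gave up: no completion within the timeout
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Hypothetical stand-in for a job that completes after about 200 ms.
        boolean finished = awaitCompletion(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("finished: " + finished);
        // Hypothetical stand-in for a job that stays at "map 0% reduce 0%".
        boolean stuck = awaitCompletion(() -> false, 300, 50);
        System.out.println("stuck job finished: " + stuck);
    }
}
```

With a deadline in place, the driver can report that nothing completed in time and point you to the task logs, rather than sitting silently at 0%.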

I guess you are asking why you never see the "end ..." and "==============================" println output. Check the following code:

System.exit(job.waitForCompletion(true)?0:1); 
System.out.println("end ..."); 
System.out.println("==============================");

You are wrapping the job.waitForCompletion(true) call in System.exit, so the JVM terminates before the last two System.out calls can run.
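The ordering fix can be sketched with the JDK alone: compute the exit code first, print the closing messages, and call System.exit last. This is an illustrative sketch (Java 8 or later), not the asker's code; `ExitOrderSketch`, `runAndReport`, and the `BooleanSupplier` standing in for `job.waitForCompletion(true)` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

public class ExitOrderSketch {
    /** Runs the stand-in job, prints the closing banner, and returns the exit code (0 = success). */
    public static int runAndReport(BooleanSupplier job) {
        System.out.println("start ...");
        // Evaluate the job result first, without exiting yet.
        int exitCode = job.getAsBoolean() ? 0 : 1;
        // These lines now run because the JVM is still alive.
        System.out.println("end ...");
        System.out.println("==============================");
        return exitCode;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a job that succeeds.
        int exitCode = runAndReport(() -> true);
        // Exit only after everything has been printed.
        System.exit(exitCode);
    }
}
```

Returning the code from a method and exiting in one place also keeps the logic easy to test, since nothing below the job call is skipped.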

Edit

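The log appender/logger messages are a clue that another exception is probably being swallowed. You should revise the signature of your code to use the ToolRunner utility: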

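import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner parses the generic Hadoop options, then calls run().
        System.exit(ToolRunner.run(new WordCount(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: word count <in> <out>");
            return 2;
        }
        System.out.println("input: " + args[0]);
        System.out.println("output: " + args[1]);
        Job job = new Job(getConf(), "word count");
        // The job's own configuration; set any extra properties here.
        Configuration conf = job.getConfiguration();

        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.out.println("==============================");
        System.out.println("start ...");
        int result = job.waitForCompletion(true) ? 0 : 1;
        System.out.println("end ...");
        System.out.println("==============================");

        return result;
    }
}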

Then use the $HADOOP_HOME/bin/hadoop script to submit the job to the cluster as follows (you will need to replace the jar name and the fully qualified name of your WordCount class):

#> hadoop jar wordcount.jar WordCount input output


2012-07-24 00:48:30

Chris! I do want to see the "end ..." println, but my problem is not caused by `System.exit`. When I debug my code up to `job.waitForCompletion(true)`, it does not continue and hangs there forever. – rory

Is the job actually being submitted? The warning messages look suspicious. No warning messages about appenders/loggers are shown.
