(Solved) HADOOP - MapReduce - getting the same value for all keys

I have a problem with MapReduce. As input I have to provide a list of songs ("Songname"#"UserID"#"boolean"), and as a result I need a list of songs giving how many different users listened to each one, i.e. output ("Songname", "timelistening"). I used a hash table so that each (song, user) pair is counted only once. It works fine on short files, but when I feed it a list of 1,000,000 records it returns the same value (20) for every record.
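To check what the job should produce on a short file, it can help to compute the same result in memory. The sketch below is plain Java with no Hadoop dependency; the class and method names (`LocalCount`, `distinctListeners`) and the sample lines are hypothetical. It assumes a record counts as a listen when the third field is "1", matching the mapper's filter, and it skips malformed lines.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class LocalCount {
    // Counts distinct listeners per song for lines of the form "Songname#UserID#flag",
    // keeping only records whose flag is "1" (assumed to mean "listened")
    static Map<String, Integer> distinctListeners(List<String> lines) {
        Map<String, Set<Integer>> users = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split("#");
            if (f.length < 3 || !f[2].trim().equals("1")) {
                continue; // malformed line or flag not set
            }
            users.computeIfAbsent(f[0], k -> new HashSet<>()).add(Integer.parseInt(f[1].trim()));
        }
        // TreeMap only to get a stable, sorted print order
        Map<String, Integer> counts = new TreeMap<>();
        users.forEach((song, ids) -> counts.put(song, ids.size()));
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("SongA#1#1", "SongA#1#1", "SongA#2#1", "SongB#3#1", "SongB#4#0");
        System.out.println(distinctListeners(input)); // {SongA=2, SongB=1}
    }
}
```

Repeated listens by the same user (`SongA#1#1` twice) count once, which is the behavior the hash table in the reducer is meant to provide.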

This is my mapper:

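// Input line format: "Songname#UserID#boolean"
public static class CanzoniMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final IntWritable userID = new IntWritable(0);
    private final Text song = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] caratteri = value.toString().split("#");
        // Skip malformed lines and records whose flag is not "1"
        if (caratteri.length >= 3 && caratteri[2].trim().equals("1")) {
            song.set(caratteri[0]);
            userID.set(Integer.parseInt(caratteri[1].trim()));
            // Emit (song, userID); the reducer counts distinct users per song
            context.write(song, userID);
        }
    }
}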

This is my reducer:

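// Counts the distinct users who listened to each song.
// Needs java.util.HashSet and java.util.Set imported in the enclosing file.
public static class CanzoniReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Hadoop reuses one IntWritable instance while iterating over values,
        // so store the primitive int rather than the Writable itself
        Set<Integer> doppioni = new HashSet<>();
        for (IntWritable val : values) {
            doppioni.add(val.get());
        }
        result.set(doppioni.size());
        context.write(key, result);
    }
}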
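Storing `val` directly in a hash table is fragile in a Hadoop reducer, because the values iterator refills the same `IntWritable` object for every element. The plain-Java simulation below illustrates the effect without Hadoop; `MutableInt` is a hypothetical stand-in for `IntWritable` (its `equals`/`hashCode` depend on a mutable field). The set's size may still look right, but every stored entry points to one object holding the last value, so lookups by user ID fail, while a set of copied `int` values behaves as expected.

```java
import java.util.HashSet;
import java.util.Set;

public class ReuseDemo {
    // Hypothetical stand-in for a mutable Writable such as IntWritable
    static final class MutableInt {
        int value;
        MutableInt(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableInt && ((MutableInt) o).value == value;
        }
        @Override public int hashCode() { return value; }
    }

    // Simulates an iterator that overwrites ONE object for every element,
    // storing the object itself in one set and a copied int in the other
    static boolean[] storeValues(int[] userIds) {
        MutableInt reused = new MutableInt(0);
        Set<MutableInt> stored = new HashSet<>();
        Set<Integer> copied = new HashSet<>();
        for (int id : userIds) {
            reused.value = id;        // the same instance is overwritten each time
            stored.add(reused);       // keeps a reference to the shared object
            copied.add(reused.value); // keeps an independent copy of the value
        }
        // Look up the first user ID in both sets
        return new boolean[] {
            stored.contains(new MutableInt(userIds[0])),
            copied.contains(userIds[0])
        };
    }

    public static void main(String[] args) {
        boolean[] found = storeValues(new int[] {1, 2, 3});
        System.out.println("stored Writable finds user 1: " + found[0]); // false
        System.out.println("copied int finds user 1: " + found[1]);      // true
    }
}
```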

And the main:

    Configuration conf = new Configuration();

    // Job.getInstance replaces the deprecated new Job(conf, name) constructor in Hadoop 2.x
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(Canzoni.class);
    job.setMapperClass(CanzoniMapper.class);
    // Do not reuse CanzoniReducer as a combiner: it emits counts, not user IDs,
    // so the reducer would then count distinct counts instead of distinct users
    //job.setCombinerClass(CanzoniReducer.class);
    //job.setNumReduceTasks(2);
    job.setReducerClass(CanzoniReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
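The combiner line is commented out in the driver, and for this reducer that matters: a combiner's output is fed back into the reducer, so it must have the same meaning as the mapper's output. The plain-Java sketch below simulates one song whose user IDs are spread over two hypothetical map tasks, assuming the combiner runs exactly once per task (Hadoop may run it zero or more times). The method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class CombinerPitfall {
    // Same logic as CanzoniReducer: count the distinct values in one group
    static int distinctCount(List<Integer> values) {
        return new HashSet<>(values).size();
    }

    // No combiner: the reducer receives every user ID from every map task
    static int withoutCombiner(List<List<Integer>> splits) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> split : splits) {
            all.addAll(split);
        }
        return distinctCount(all);
    }

    // Reducer used as combiner: each map task emits a local count,
    // so the reducer ends up counting distinct COUNTS, not users
    static int withCombiner(List<List<Integer>> splits) {
        List<Integer> partialCounts = new ArrayList<>();
        for (List<Integer> split : splits) {
            partialCounts.add(distinctCount(split));
        }
        return distinctCount(partialCounts);
    }

    public static void main(String[] args) {
        // User IDs for one song, spread over two hypothetical map tasks
        List<List<Integer>> splits = List.of(List.of(1, 2), List.of(3, 4));
        System.out.println("without combiner: " + withoutCombiner(splits)); // 4
        System.out.println("with combiner:    " + withCombiner(splits));    // 1
    }
}
```

A combiner that only de-duplicates user IDs per song (still emitting (song, userID) pairs) would be safe, because its output keeps the mapper's meaning.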

Any ideas?

Source: 2012-09-06, Pietro Luciani

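I think I solved it: it was an input problem. There were too many records compared to the number of songs, so in this record list every song was listed at least once for every user. Since there were 20 users in the test, every song naturally came out with 20. I need to increase the number of different songs.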

Source: 2012-09-07 10:41:24
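The explanation in the answer can be checked with a quick simulation: when there are only 20 distinct user IDs and many records per song, every song's set of listeners fills up to all 20 users. The generator below is hypothetical (uniformly random songs and users, all flags "1"); it is plain Java and only mirrors the mapper's parsing and the reducer's distinct count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class SaturationDemo {
    // Generates random "SongN#UserID#1" records and returns distinct listeners per song
    static Map<String, Integer> simulate(int songs, int users, int records, long seed) {
        Random rnd = new Random(seed);
        Map<String, Set<Integer>> listeners = new HashMap<>();
        for (int i = 0; i < records; i++) {
            String line = "Song" + rnd.nextInt(songs) + "#" + rnd.nextInt(users) + "#1";
            String[] f = line.split("#"); // same parsing as the mapper
            listeners.computeIfAbsent(f[0], k -> new HashSet<>()).add(Integer.parseInt(f[1]));
        }
        Map<String, Integer> counts = new HashMap<>();
        listeners.forEach((song, ids) -> counts.put(song, ids.size()));
        return counts;
    }

    public static void main(String[] args) {
        // 1,000,000 records over 50 songs x 20 users: every (song, user) pair shows up
        Map<String, Integer> counts = simulate(50, 20, 1_000_000, 42L);
        System.out.println(new HashSet<>(counts.values())); // [20]
    }
}
```

With far fewer records (or far more songs) the per-song counts spread out below 20, which is what a more varied test input should show.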
