I am running Apache Hadoop and using the grep example that ships with the installation. I am wondering why the map percentage shows up twice; I thought map should only run once, so this makes me question my understanding of the map phase. I found this thread (http://grokbase.com/t/gg/mongodb-user/125ay1eazq/map-reduce-percentage-seems-running-twice), but it didn't really explain anything and it was about MongoDB, not Hadoop.

[email protected]:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar grep /user/hduser/grep /user/hduser/grep-output4 ".*woe is me.*" 

I am running this against a Project Gutenberg .txt file, and the output file is correct.
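My understanding of the map step in this example is roughly the following simplified sketch (my own illustration, not the actual Hadoop RegexMapper source; the class name SimpleRegexMapper is made up): every regex match in a line is emitted as (match, 1), and a LongSumReducer then adds the counts up.

// Simplified sketch of a RegexMapper-style map step (illustration only, not the
// real org.apache.hadoop.mapred.lib.RegexMapper source).
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SimpleRegexMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private Pattern pattern;

  public void configure(JobConf job) {
    // The driver passes the pattern via job.set("mapred.mapper.regex", ...).
    pattern = Pattern.compile(job.get("mapred.mapper.regex"));
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    Matcher matcher = pattern.matcher(value.toString());
    while (matcher.find()) {
      // Emit each match with a count of 1; the reducer sums the counts.
      output.collect(new Text(matcher.group()), new LongWritable(1));
    }
  }
}

Here is the full console output: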

12/08/06 06:56:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library 
12/08/06 06:56:57 WARN snappy.LoadSnappy: Snappy native library not loaded 
12/08/06 06:56:57 INFO mapred.FileInputFormat: Total input paths to process : 1 
12/08/06 06:56:58 INFO mapred.JobClient: Running job: job_201208030925_0011 
12/08/06 06:56:59 INFO mapred.JobClient: map 0% reduce 0% 
12/08/06 06:57:18 INFO mapred.JobClient: map 100% reduce 0% 
12/08/06 06:57:30 INFO mapred.JobClient: map 100% reduce 100% 
12/08/06 06:57:35 INFO mapred.JobClient: Job complete: job_201208030925_0011 
12/08/06 06:57:35 INFO mapred.JobClient: Counters: 30 
12/08/06 06:57:35 INFO mapred.JobClient: Job Counters 
12/08/06 06:57:35 INFO mapred.JobClient:  Launched reduce tasks=1 
12/08/06 06:57:35 INFO mapred.JobClient:  SLOTS_MILLIS_MAPS=31034 
12/08/06 06:57:35 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
12/08/06 06:57:35 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
12/08/06 06:57:35 INFO mapred.JobClient:  Rack-local map tasks=2 
12/08/06 06:57:35 INFO mapred.JobClient:  Launched map tasks=2 
12/08/06 06:57:35 INFO mapred.JobClient:  SLOTS_MILLIS_REDUCES=11233 
12/08/06 06:57:35 INFO mapred.JobClient: File Input Format Counters 
12/08/06 06:57:35 INFO mapred.JobClient:  Bytes Read=5592666 
12/08/06 06:57:35 INFO mapred.JobClient: File Output Format Counters 
12/08/06 06:57:35 INFO mapred.JobClient:  Bytes Written=391 
12/08/06 06:57:35 INFO mapred.JobClient: FileSystemCounters 
12/08/06 06:57:35 INFO mapred.JobClient:  FILE_BYTES_READ=281 
12/08/06 06:57:35 INFO mapred.JobClient:  HDFS_BYTES_READ=5592862 
12/08/06 06:57:35 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=65331 
12/08/06 06:57:35 INFO mapred.JobClient:  HDFS_BYTES_WRITTEN=391 
12/08/06 06:57:35 INFO mapred.JobClient: Map-Reduce Framework 
12/08/06 06:57:35 INFO mapred.JobClient:  Map output materialized bytes=287 
12/08/06 06:57:35 INFO mapred.JobClient:  Map input records=124796 
12/08/06 06:57:35 INFO mapred.JobClient:  Reduce shuffle bytes=287 
12/08/06 06:57:35 INFO mapred.JobClient:  Spilled Records=10 
12/08/06 06:57:35 INFO mapred.JobClient:  Map output bytes=265 
12/08/06 06:57:35 INFO mapred.JobClient:  Total committed heap usage (bytes)=336404480 
12/08/06 06:57:35 INFO mapred.JobClient:  CPU time spent (ms)=7040 
12/08/06 06:57:35 INFO mapred.JobClient:  Map input bytes=5590193 
12/08/06 06:57:35 INFO mapred.JobClient:  SPLIT_RAW_BYTES=196 
12/08/06 06:57:35 INFO mapred.JobClient:  Combine input records=5 
12/08/06 06:57:35 INFO mapred.JobClient:  Reduce input records=5 
12/08/06 06:57:35 INFO mapred.JobClient:  Reduce input groups=5 
12/08/06 06:57:35 INFO mapred.JobClient:  Combine output records=5 
12/08/06 06:57:35 INFO mapred.JobClient:  Physical memory (bytes) snapshot=464568320 
12/08/06 06:57:35 INFO mapred.JobClient:  Reduce output records=5 
12/08/06 06:57:35 INFO mapred.JobClient:  Virtual memory (bytes) snapshot=1539559424 
12/08/06 06:57:35 INFO mapred.JobClient:  Map output records=5 
12/08/06 06:57:35 INFO mapred.FileInputFormat: Total input paths to process : 1 
12/08/06 06:57:35 INFO mapred.JobClient: Running job: job_201208030925_0012 
12/08/06 06:57:36 INFO mapred.JobClient: map 0% reduce 0% 
12/08/06 06:57:50 INFO mapred.JobClient: map 100% reduce 0% 
12/08/06 06:58:05 INFO mapred.JobClient: map 100% reduce 100% 
12/08/06 06:58:10 INFO mapred.JobClient: Job complete: job_201208030925_0012 
12/08/06 06:58:10 INFO mapred.JobClient: Counters: 30 
12/08/06 06:58:10 INFO mapred.JobClient: Job Counters 
12/08/06 06:58:10 INFO mapred.JobClient:  Launched reduce tasks=1 
12/08/06 06:58:10 INFO mapred.JobClient:  SLOTS_MILLIS_MAPS=15432 
12/08/06 06:58:10 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
12/08/06 06:58:10 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
12/08/06 06:58:10 INFO mapred.JobClient:  Rack-local map tasks=1 
12/08/06 06:58:10 INFO mapred.JobClient:  Launched map tasks=1 
12/08/06 06:58:10 INFO mapred.JobClient:  SLOTS_MILLIS_REDUCES=14264 
12/08/06 06:58:10 INFO mapred.JobClient: File Input Format Counters 
12/08/06 06:58:10 INFO mapred.JobClient:  Bytes Read=391 
12/08/06 06:58:10 INFO mapred.JobClient: File Output Format Counters 
12/08/06 06:58:10 INFO mapred.JobClient:  Bytes Written=235 
12/08/06 06:58:10 INFO mapred.JobClient: FileSystemCounters 
12/08/06 06:58:10 INFO mapred.JobClient:  FILE_BYTES_READ=281 
12/08/06 06:58:10 INFO mapred.JobClient:  HDFS_BYTES_READ=505 
12/08/06 06:58:10 INFO mapred.JobClient:  FILE_BYTES_WRITTEN=42985 
12/08/06 06:58:10 INFO mapred.JobClient:  HDFS_BYTES_WRITTEN=235 
12/08/06 06:58:10 INFO mapred.JobClient: Map-Reduce Framework 
12/08/06 06:58:10 INFO mapred.JobClient:  Map output materialized bytes=281 
12/08/06 06:58:10 INFO mapred.JobClient:  Map input records=5 
12/08/06 06:58:10 INFO mapred.JobClient:  Reduce shuffle bytes=0 
12/08/06 06:58:10 INFO mapred.JobClient:  Spilled Records=10 

EDIT: here is the driver class for the grep example, from the file Grep.java:

/** 
* Licensed to the Apache Software Foundation (ASF) under one 
* or more contributor license agreements. See the NOTICE file 
* distributed with this work for additional information 
* regarding copyright ownership. The ASF licenses this file 
* to you under the Apache License, Version 2.0 (the 
* "License"); you may not use this file except in compliance 
* with the License. You may obtain a copy of the License at 
* 
*  http://www.apache.org/licenses/LICENSE-2.0 
* 
* Unless required by applicable law or agreed to in writing, software 
* distributed under the License is distributed on an "AS IS" BASIS, 
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
* See the License for the specific language governing permissions and 
* limitations under the License. 
*/ 
package org.apache.hadoop.examples; 

import java.util.Random; 

import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.conf.Configured; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapred.*; 
import org.apache.hadoop.mapred.lib.*; 
import org.apache.hadoop.util.Tool; 
import org.apache.hadoop.util.ToolRunner; 

/* Extracts matching regexs from input files and counts them. */ 
public class Grep extends Configured implements Tool {
  private Grep() {}                               // singleton

  public int run(String[] args) throws Exception {
    if (args.length < 3) {
      System.out.println("Grep <inDir> <outDir> <regex> [<group>]");
      ToolRunner.printGenericCommandUsage(System.out);
      return -1;
    }

    Path tempDir =
      new Path("grep-temp-"+
          Integer.toString(new Random().nextInt(Integer.MAX_VALUE)));

    JobConf grepJob = new JobConf(getConf(), Grep.class);

    try {

      grepJob.setJobName("grep-search");
      FileInputFormat.setInputPaths(grepJob, args[0]);

      grepJob.setMapperClass(RegexMapper.class);
      grepJob.set("mapred.mapper.regex", args[2]);
      if (args.length == 4)
        grepJob.set("mapred.mapper.regex.group", args[3]);

      grepJob.setCombinerClass(LongSumReducer.class);
      grepJob.setReducerClass(LongSumReducer.class);

      FileOutputFormat.setOutputPath(grepJob, tempDir);
      grepJob.setOutputFormat(SequenceFileOutputFormat.class);
      grepJob.setOutputKeyClass(Text.class);
      grepJob.setOutputValueClass(LongWritable.class);

      JobClient.runJob(grepJob);

      JobConf sortJob = new JobConf(getConf(), Grep.class);
      sortJob.setJobName("grep-sort");

      FileInputFormat.setInputPaths(sortJob, tempDir);
      sortJob.setInputFormat(SequenceFileInputFormat.class);

      sortJob.setMapperClass(InverseMapper.class);

      sortJob.setNumReduceTasks(1);               // write a single file
      FileOutputFormat.setOutputPath(sortJob, new Path(args[1]));
      sortJob.setOutputKeyComparatorClass         // sort by decreasing freq
        (LongWritable.DecreasingComparator.class);

      JobClient.runJob(sortJob);
    }
    finally {
      FileSystem.get(grepJob).delete(tempDir, true);
    }
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new Grep(), args);
    System.exit(res);
  }

}

Answer


The stats you posted are the output of two jobs: job_201208030925_0011 and job_201208030925_0012. The percentages belong to these two jobs, which is why you see the map progress percentage twice.
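Each JobClient.runJob() call reports its own progress, so a driver that chains two jobs prints two separate "map X% reduce Y%" sequences. Here is a minimal, self-contained sketch of that pattern (a hypothetical TwoJobDemo class using identity map/reduce steps, not the actual grep example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class TwoJobDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path temp = new Path(args[1] + "-tmp");
    Path output = new Path(args[1]);

    // Job 1: read the raw input and write to an intermediate directory.
    JobConf first = new JobConf(conf, TwoJobDemo.class);
    first.setJobName("first-pass");
    first.setMapperClass(IdentityMapper.class);
    first.setReducerClass(IdentityReducer.class);
    first.setOutputKeyClass(LongWritable.class);
    first.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(first, input);
    FileOutputFormat.setOutputPath(first, temp);
    JobClient.runJob(first);    // prints its own "map X% reduce Y%" sequence

    // Job 2: read the intermediate directory and write the final output.
    JobConf second = new JobConf(conf, TwoJobDemo.class);
    second.setJobName("second-pass");
    second.setMapperClass(IdentityMapper.class);
    second.setReducerClass(IdentityReducer.class);
    second.setOutputKeyClass(LongWritable.class);
    second.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(second, temp);
    FileOutputFormat.setOutputPath(second, output);
    JobClient.runJob(second);   // prints a second, independent progress sequence
  }
}

In the grep example those two jobs are "grep-search" and "grep-sort", and each one produces its own progress readout.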


Why are there two jobs? What does the first job do and what does the second job do? How does it split the work, and why does it split a simple grep over one file into two jobs? –


Please add the driver class implementation that sets up and starts the jobs. – Razvan


Is the edit above what you mean by the driver class? –
