
Mahout - Exception: Java heap space

I'm trying to convert some text to sequence files using Mahout:

mahout seqdirectory -i Lastfm-ArtistTags2007 -o seqdirectory 

but all I get is the following OutOfMemoryError:

Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR= 
MAHOUT-JOB: /opt/mahout/mahout-examples-0.9-job.jar 
14/04/07 16:44:34 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[Lastfm-ArtistTags2007], --keyPrefix=[], --method=[mapreduce], --output=[seqdirectoryjps], --startPhase=[0], --tempDir=[temp]} 
14/04/07 16:44:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
14/04/07 16:44:35 INFO input.FileInputFormat: Total input paths to process : 4 
14/04/07 16:44:35 WARN snappy.LoadSnappy: Snappy native library not loaded 
14/04/07 16:44:35 INFO mapred.JobClient: Running job: job_local407267609_0001 
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Waiting for map tasks 
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Starting task: attempt_local407267609_0001_m_000000_0 
14/04/07 16:44:35 INFO util.ProcessTree: setsid exited with exit code 0 
14/04/07 16:44:35 INFO mapred.Task: Using ResourceCalculatorPlugin : [email protected] 
14/04/07 16:44:35 INFO mapred.MapTask: Processing split: Paths:/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/README.txt:0+2472,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/ArtistTags.dat:0+71652722,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/tags.txt:0+1739746,/home/giuliano/cook/lastfm/Lastfm-ArtistTags2007/artists.txt:0+327051 
14/04/07 16:44:35 INFO compress.CodecPool: Got brand-new compressor 
14/04/07 16:44:35 INFO mapred.LocalJobRunner: Map task executor complete. 
14/04/07 16:44:35 WARN mapred.LocalJobRunner: job_local407267609_0001 
java.lang.Exception: java.lang.OutOfMemoryError: Java heap space 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) 
Caused by: java.lang.OutOfMemoryError: Java heap space 
    at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:119) 
    at org.apache.mahout.text.WholeFileRecordReader.nextKeyValue(WholeFileRecordReader.java:118) 
    at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69) 
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531) 
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:724) 
14/04/07 16:44:36 INFO mapred.JobClient: map 0% reduce 0% 
14/04/07 16:44:36 INFO mapred.JobClient: Job complete: job_local407267609_0001 
14/04/07 16:44:36 INFO mapred.JobClient: Counters: 0 
14/04/07 16:44:36 INFO driver.MahoutDriver: Program took 1749 ms (Minutes: 0.02915) 

I'm using Mahout 0.9, Hadoop 1.2.1 and OpenJDK Java 7u25. Setting MAHOUT_HEAPSIZE to 4096 did not help.
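For reference, a minimal sketch of how that heap size would typically be set before rerunning the job (assuming a bash shell; the 4096 MB value is the one mentioned above):

export MAHOUT_HEAPSIZE=4096
mahout seqdirectory -i Lastfm-ArtistTags2007 -o seqdirectory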

The text files can be found here: http://static.echonest.com/Lastfm-ArtistTags2007.tar.gz

Answer


The job being spawned currently runs with the local job runner, so execution happens only on the node from which you fired the job. To make the execution distributed, specify the job tracker address in mapred-site.xml. Running in distributed mode may resolve this OutOfMemory problem.

If you look at the environment variable HADOOP_CONF_DIR, its value is empty; set it with the command export HADOOP_CONF_DIR=/etc/hadoop/conf. Make sure the mapred.job.tracker value in /etc/hadoop/conf/mapred-site.xml points to your JobTracker.
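A minimal sketch of the changes described above; the JobTracker host and port below are placeholders and must be replaced with your own cluster's values:

export HADOOP_CONF_DIR=/etc/hadoop/conf

<!-- /etc/hadoop/conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- placeholder: point this at your JobTracker host:port -->
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>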


What exactly do I need to change? I'm not a Hadoop expert – MrParrot


I've edited my answer – sachin