Mahout: KMeans clustering

I am new to Mahout and I have this code:

public class mahout {

    public static final double[][] points = { {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}, {8, 8}, {9, 8}, {8, 9}, {9, 9} };

    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }

        return points;
    }

    public static void main(String args[]) throws Exception {

        int k = 2;

        List<Vector> vectors = getPoints(points);

        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        ClusterHelper.writePointsToFile(vectors, conf, new Path("testdata/points/file1"));

        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
            path, Text.class, Kluster.class);

        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        Path output = new Path("output");
        HadoopUtil.delete(conf, output);

        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
            output, new EuclideanDistanceMeasure(), 0.001, 10,
            true, 0.0, false);

        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
            new Path("output/" + Kluster.CLUSTERED_POINTS_DIR
                + "/part-m-00000"), conf);

        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                + key.toString());
        }
        reader.close();
    }
}

But when I run the code I get this error:

24-ott-2013 9.50.25 org.apache.hadoop.util.NativeCodeLoader <clinit> 
AVVERTENZA: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info 
INFO: Deleting output 
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info 
INFO: Input: testdata/points Clusters In: testdata/clusters Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure 
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info 
INFO: convergence: 0.0010 max Iterations: 10 
24-ott-2013 9.50.25 org.apache.hadoop.security.UserGroupInformation doAs 
GRAVE: PriviledgedActionException as:hp cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700 
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700 
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689) 
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662) 
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) 
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344) 
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) 
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918) 
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Unknown Source) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149) 
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912) 
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500) 
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530) 
    at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:182) 
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:223) 
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143) 
    at mahout.main(mahout.java:69)

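Where is the problem, and how can I solve it? Is it a matter of whether the user running the code has sufficient permissions on the directory mentioned in the stack trace?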


2013-10-24 user2837896

-1

The problem being reported is

Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700

Also,

Unable to load native-hadoop library for your platform...

is really nothing; it is just an issue that shows up when running Hadoop on Windows ^^


2013-10-24 08:04:28 Julien

That depends on the Hadoop version. Some versions almost certainly do not run on Windows (1.1, if I remember correctly). This does not depend on the native libraries. –

And how can I solve this problem? – user2837896

@user2837896 Make sure the user running the code has sufficient permissions on the directory mentioned in the stack trace. – Julien
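A rough way to test this outside Hadoop: the stack trace goes through FileUtil.setPermission and FileUtil.checkReturnValue, so the sketch below tries to restrict a directory to owner-only access (0700) with plain java.io.File setters and reports whether every call succeeded. The class name PermissionProbe and the method canSetOwnerOnly are made up for illustration, and this is not Hadoop's own code; it only assumes that a setter returning false is the kind of failure the exception describes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class PermissionProbe {

    /**
     * Tries to give the directory owner-only rwx permissions (0700).
     * Returns true only if every java.io.File setter reported success.
     */
    public static boolean canSetOwnerOnly(File dir) {
        // Clear each permission for everybody, then grant it back to the owner.
        // '&=' is used so that every call runs even after an earlier failure.
        boolean ok = dir.setReadable(false, false);
        ok &= dir.setReadable(true, true);
        ok &= dir.setWritable(false, false);
        ok &= dir.setWritable(true, true);
        ok &= dir.setExecutable(false, false);
        ok &= dir.setExecutable(true, true);
        return ok;
    }

    public static void main(String[] args) throws IOException {
        // Uses a fresh temp directory by default; pass a path to probe another one.
        File dir = args.length > 0
            ? new File(args[0])
            : Files.createTempDirectory("perm-probe").toFile();
        System.out.println(dir + " -> owner-only permissions "
            + (canSetOwnerOnly(dir) ? "could be set" : "could NOT be set"));
    }
}
```

If this reports a failure for a directory the user owns, the cause is more likely the platform than missing rights, which would point toward the Hadoop-on-Windows issues rather than an account problem.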

The trace you provided relates to the fact that Hadoop may not run well on Windows.

You can find several JIRA issues and a patch for this particular problem:

https://github.com/congainc/patch-hadoop_7682-1.0.x-win

https://issues.apache.org/jira/browse/HADOOP-7682

https://issues.apache.org/jira/browse/HADOOP-8089

The only fix is to patch Hadoop using one of these patches, or you can upgrade to Hadoop 2.2, which runs natively on Windows.


2013-10-24 08:22:05

I downloaded Hadoop 2.2 from here: http://mirror.nohup.it/apache/hadoop/common/hadoop-2.2.0/ but there are no jars that I can import into my Eclipse project. – user2837896

You need to download the binary, not the src. –

I downloaded "hadoop-2.2.0.tar.gz" and then imported the jar named hadoop-common-2.2.0.jar, but now I get a java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName error. – user2837896
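A NoClassDefFoundError like this generally means the named class is not on the runtime classpath, so hadoop-common-2.2.0.jar alone is probably not enough. The small probe below is a sketch for checking which classes the Eclipse project can actually load; ClasspathProbe and isOnClasspath are hypothetical names, and the second class name in main is only an example. As an assumption to verify rather than a confirmed fact, PlatformName may live in a separate jar of the binary distribution (possibly hadoop-auth), so adding the other jars from the distribution and re-running the probe is one way to find out.

```java
public class ClasspathProbe {

    /** Returns true if the class can be located by this class's loader, without initializing it. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.util.PlatformName",   // the class from the error message
            "org.apache.hadoop.conf.Configuration"   // used by the code in the question
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running it from the same Eclipse project (so that it sees the same build path) shows whether each class is found or missing before the Mahout job is launched.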
