
In Spark, I get a java.lang.ClassNotFoundException when I use the map function of a DataFrame. The DataFrame "orderedDf" has the following schema:

+--------+----------------+----------------+---------+------------------+--------------+ 
|schoolID|count(studentID)|count(teacherID)|sum(size)|sum(documentCount)|avg_totalScore| 
+--------+----------------+----------------+---------+------------------+--------------+ 
|school03|               2|               2|      195|               314|         100.0| 
|school02|               2|               2|      193|               330|          94.5| 
|school01|               2|               2|      294|               285|          83.4| 
|school04|               2|               2|      263|               415|          72.5| 
|school05|               2|               2|      263|               415|          62.5| 
|school07|               2|               2|      263|               415|          52.5| 
|school09|               2|               2|      263|               415|          49.8| 
|school08|               2|               2|      263|               415|          42.3| 
|school06|               2|               2|      263|               415|          32.5| 
+--------+----------------+----------------+---------+------------------+--------------+ 

"avg_totalScore"열은 desc로 정렬되어 있습니다. 지금, 난 그냥 아래처럼, 나는 세 그룹에 모든 행을 분할하려면, 문제가있다 : 즉

+--------+----------------+----------------+---------+------------------+--------------+ 
|schoolID|count(studentID)|count(teacherID)|sum(size)|sum(documentCount)|avg_totalScore| 
+--------+----------------+----------------+---------+------------------+--------------+ 
|great   |               2|               2|      195|               314|         100.0| 
|great   |               2|               2|      193|               330|          94.5| 
|great   |               2|               2|      294|               285|          83.4| 
|good    |               2|               2|      263|               415|          72.5| 
|good    |               2|               2|      263|               415|          62.5| 
|good    |               2|               2|      263|               415|          52.5| 
|bad     |               2|               2|      263|               415|          49.8| 
|bad     |               2|               2|      263|               415|          42.3| 
|bad     |               2|               2|      263|               415|          32.5| 
+--------+----------------+----------------+---------+------------------+--------------+ 

That is, I want to divide the schools into three groups based on their "avg_totalScore": great schools, good schools, and bad schools, at a ratio of 3:3:3.
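For reference, a fixed three-way split like this can also be expressed without a manual counter, using ntile(3) over a window ordered by "avg_totalScore". The following is only a sketch, assuming the window functions available from Spark 1.4 onward (which, before Spark 2.0, need a HiveContext) and the column names shown in the schema above; it adds a separate "group" column instead of overwriting "schoolID":

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, ntile, when}

// Sketch only: bucket the rows into three equally sized groups by avg_totalScore
// (descending), then map bucket numbers 1/2/3 to the labels "great"/"good"/"bad".
val w = Window.orderBy(col("avg_totalScore").desc)

val labelled = orderedDf
  .withColumn("bucket", ntile(3).over(w))
  .withColumn("group",
    when(col("bucket") === 1, "great")
      .when(col("bucket") === 2, "good")
      .otherwise("bad"))
  .drop("bucket")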

My solution is as follows:

val num = orderedDf.count() 
val first_split_num = math.floor(num * (1.0/3)) 
val second_split_num = math.ceil(num * (2.0/3)) 
val accumu = SparkContext.getOrCreate(Configuration.getSparkConf).accumulator(0, "Group Num") 

val rdd = orderedDf.map(row => { 
  val group = accumu match { 
    case a: Accumulator[Int] if a.value <= first_split_num  => "great" 
    case b: Accumulator[Int] if b.value <= second_split_num => "good" 
    case _ => "bad" 
  } 
  accumu += 1 
  Row(group, row(1), row(2), row(3), row(4), row(5), row(6)) 
}) 

val result = sqlContext.createDataFrame(rdd, orderedDf.schema) 

The code above throws no exception, but when I call:

result.collect().foreach(println) 

or

result.show() 

I get a ClassNotFoundException, and I don't know why. Can anyone help me? Thanks a lot!

Here are the details of the exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 44.0 failed 4 times, most recent failure: Lost task 0.3 in stage 44.0 (TID 3644, node1): java.lang.ClassNotFoundException: com.lancoo.ecbdc.business.ComparativeAnalysisBusiness$$anonfun$1 
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) 
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
at java.lang.Class.forName0(Native Method) 
at java.lang.Class.forName(Class.java:348) 
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68) 
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620) 
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
at scala.collection.immutable.$colon$colon.readObject(List.scala:362) 
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
at scala.collection.immutable.$colon$colon.readObject(List.scala:362) 
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:498) 
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018) 
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942) 
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808) 
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353) 
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373) 
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76) 
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115) 
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) 
at org.apache.spark.scheduler.Task.run(Task.scala:89) 
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745) 

The specific class that cannot be found is 'com.lancoo.ecbdc.business.ComparativeAnalysisBusiness$$anonfun$1'; unless you show how 'orderedDf' is built, it is very likely that the problem lies there. –


The map function is a transformation operator and therefore lazy, so even though the failure only appears once an action operator runs, I think the problem is inside the map function. But I can't see where the error in the map function is. – StrongYoung

Answer

java.lang.ClassNotFoundException: com.lancoo.ecbdc.business.ComparativeAnalysisBusiness$$anonfun$1 

According to the class-loader exception, the class mentioned above cannot be loaded. Could you give more details about how this class is used in your code?
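For what it's worth, a ClassNotFoundException for a $$anonfun class like this is often a deployment issue: the jar containing com.lancoo.ecbdc.business.ComparativeAnalysisBusiness (and the anonymous-function classes the Scala compiler generates for the map closure) is not on the executors' classpath. One common remedy is to ship the application jar explicitly when building the SparkConf; in the sketch below the app name and jar path are placeholders, not taken from the question:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: distribute the jar built from your project (the one that contains
// ComparativeAnalysisBusiness) to the executors. The name and path are placeholders.
val conf = new SparkConf()
  .setAppName("ComparativeAnalysis")
  .setJars(Seq("/path/to/your-application.jar"))

val sc = SparkContext.getOrCreate(conf)

Passing the jar to spark-submit as the main application jar usually has the same effect.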


Thank you. The code above lives inside this class. The thing that cannot be found is not the class itself but an anonymous function generated from it. – StrongYoung
