I get a java.lang.ClassNotFoundException when I use the DataFrame map function in Spark. The DataFrame "orderedDf" has the following schema:
+--------+----------------+----------------+---------+------------------+--------------+
|schoolID|count(studentID)|count(teacherID)|sum(size)|sum(documentCount)|avg_totalScore|
+--------+----------------+----------------+---------+------------------+--------------+
|school03|               2|               2|      195|               314|         100.0|
|school02|               2|               2|      193|               330|          94.5|
|school01|               2|               2|      294|               285|          83.4|
|school04|               2|               2|      263|               415|          72.5|
|school05|               2|               2|      263|               415|          62.5|
|school07|               2|               2|      263|               415|          52.5|
|school09|               2|               2|      263|               415|          49.8|
|school08|               2|               2|      263|               415|          42.3|
|school06|               2|               2|      263|               415|          32.5|
+--------+----------------+----------------+---------+------------------+--------------+
The "avg_totalScore" column is sorted in descending order. Now I have a problem: I want to split all the rows into three groups, like this:
+--------+----------------+----------------+---------+------------------+--------------+
|schoolID|count(studentID)|count(teacherID)|sum(size)|sum(documentCount)|avg_totalScore|
+--------+----------------+----------------+---------+------------------+--------------+
|   great|               2|               2|      195|               314|         100.0|
|   great|               2|               2|      193|               330|          94.5|
|   great|               2|               2|      294|               285|          83.4|
|    good|               2|               2|      263|               415|          72.5|
|    good|               2|               2|      263|               415|          62.5|
|    good|               2|               2|      263|               415|          52.5|
|     bad|               2|               2|      263|               415|          49.8|
|     bad|               2|               2|      263|               415|          42.3|
|     bad|               2|               2|      263|               415|          32.5|
+--------+----------------+----------------+---------+------------------+--------------+
That is, I want to divide the schools into three groups (great schools, good schools, and bad schools) according to their "avg_totalScore", at a ratio of 3:3:3.
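To make the intended split concrete, here is a small sketch on a plain Scala list (the scores are made up to mirror the table above; only the 3:3:3 labeling logic matters, not Spark itself):

```scala
// Hypothetical scores, already sorted descending like avg_totalScore.
val scores = List(100.0, 94.5, 83.4, 72.5, 62.5, 52.5, 49.8, 42.3, 32.5)

val n = scores.length
val firstSplit  = math.floor(n / 3.0).toInt     // rows before this index are "great"
val secondSplit = math.ceil(n * 2.0 / 3).toInt  // rows before this index are "good"

// Label each row by its position in the sorted order.
val labeled = scores.zipWithIndex.map { case (score, i) =>
  val group =
    if (i < firstSplit) "great"
    else if (i < secondSplit) "good"
    else "bad"
  (group, score)
}

labeled.foreach(println)
```

Note that the label here depends only on each row's position in the sorted order, not on any shared mutable counter.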
My solution is as follows:
val num = orderedDf.count()
val first_split_num = math.floor(num * (1.0/3))
val second_split_num = math.ceil(num * (2.0/3))
val accumu = SparkContext.getOrCreate(Configuration.getSparkConf).accumulator(0, "Group Num")
val rdd = orderedDf.map(row => {
  val group = {
    accumu match {
      case a: Accumulator[Int] if a.value <= first_split_num => "great"
      case b: Accumulator[Int] if b.value <= second_split_num => "good"
      case _ => "bad"
    }
  }
  accumu += 1
  Row(group, row(1), row(2), row(3), row(4), row(5), row(6))
})
val result = sqlContext.createDataFrame(rdd, orderedDf.schema)
The code above throws no exception and seems fine, but when I use:
result.collect().foreach(println)
or
result.show()
I get a ClassNotFoundException, and I don't know why. Can anyone help me? Thanks very much!
Here are the details of the exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 44.0 failed 4 times, most recent failure: Lost task 0.3 in stage 44.0 (TID 3644, node1): java.lang.ClassNotFoundException: com.lancoo.ecbdc.business.ComparativeAnalysisBusiness$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The specific class that cannot be found is `com.lancoo.ecbdc.business.ComparativeAnalysisBusiness$$anonfun$1`; without seeing how `orderedDf` is implemented, it is very likely that this is where the problem lies. –
The map function is a transformation, i.e. a lazy operator, so a problem inside it only surfaces once some action operator is executed. I think the problem is in the map function, but I don't know where the error in it is. – StrongYoung
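Update (an untested guess, not a confirmed fix): a `ClassNotFoundException` for a `$$anonfun` class during task deserialization usually means the jar containing the application's compiled closures is not on the executors' classpath. One thing to try is making sure the application jar is shipped when submitting the job, for example:

```shell
# All paths and the master URL below are hypothetical placeholders.
# The application jar must contain the compiled
# com.lancoo.ecbdc.business.ComparativeAnalysisBusiness class together
# with its anonymous function classes, and must reach the executors.
spark-submit \
  --class com.lancoo.ecbdc.business.ComparativeAnalysisBusiness \
  --master spark://node1:7077 \
  --jars /path/to/extra-dependency.jar \
  /path/to/application.jar
```

Creating the `SparkContext` or `SQLContext` programmatically without registering the application jar (e.g. via `SparkConf.setJars`) can produce the same symptom, since the driver runs fine locally but the executors cannot resolve the anonymous function classes.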