2016-11-02

We recently upgraded Spark from version 1.3 to 1.6, and since the upgrade, queries with a count(distinct) condition no longer work: count(distinct) fails in a HiveContext query on Spark 1.6.

The failing query:

hiveContext.sql( "select A1.x, A1.y, A1.z from (select concat(g,h) as x, y, z from raw_parquet where f = '') A1 group by A1.x, A1.y,A1.z having count(distinct(A1.z)) > 1").show() 

The same query works fine with count(*), for example:

hiveContext.sql( "select A1.x, A1.y, A1.z from (select concat(g,h) as x, y, z from raw_parquet where f = '') A1 group by A1.x, A1.y,A1.z having count(*) > 1").show() 

Please let me know if there is a workaround. Thanks a lot.

The error:

org.apache.spark.sql.AnalysisException: resolved attribute(s) gid#687,z#688 missing from x#685,y#252,z#255 in operator !Aggregate [x#685,y#252], [cast(((count(if ((gid#687 = 1)) z#688 else null),mode=Complete,isDistinct=false) > cast(1 as bigint)) as boolean) AS havingCondition#686,x#685,y#252]; 
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38) 
     at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) 
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:183) 
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50) 
     at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121) 
     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) 
     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) 
     at scala.collection.immutable.List.foreach(List.scala:318) 
     at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120) 
     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) 
     at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120) 
     at scala.collection.immutable.List.foreach(List.scala:318) 
     at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120) 
     at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50) 
     at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) 
     at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34) 
     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133) 
     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52) 
     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) 
     at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38) 
     at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40) 
     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42) 
     at $iwC$$iwC$$iwC.<init>(<console>:44) 
     at $iwC$$iwC.<init>(<console>:46) 
     at $iwC.<init>(<console>:48) 
     at <init>(<console>:50) 
     at .<init>(<console>:54) 
     at .<clinit>(<console>) 
     at .<init>(<console>:7) 
     at .<clinit>(<console>) 
     at $print(<console>) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045) 
     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326) 
     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821) 
     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852) 
     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800) 
     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) 
     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) 
     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) 
     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) 
     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) 
     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) 
     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) 
     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) 
     at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) 
     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) 
     at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) 
     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064) 
     at org.apache.spark.repl.Main$.main(Main.scala:31) 
     at org.apache.spark.repl.Main.main(Main.scala) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

Answer


Try something like this:

df.groupBy("x").count().filter($"count" > 1).show() 

or

import org.apache.spark.sql.functions.count 
df.groupBy("x").agg(count("*").alias("cnt")).where($"cnt" > 1) 
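
The snippets above replace the distinct count with a plain row count. If the distinct semantics matter, two other workarounds are sometimes suggested for this Spark 1.6 analyzer bug: use `countDistinct` in the DataFrame API, or move the distinct aggregate out of the HAVING clause into the SELECT list of a subquery and filter on the alias. Both are sketches only, assuming the same `raw_parquet` table and an existing `df`/`hiveContext`, and are untested against this schema:

```scala
import org.apache.spark.sql.functions.countDistinct

// DataFrame API: name the distinct count, then filter on the alias
df.groupBy("x", "y", "z")
  .agg(countDistinct("z").alias("dz"))
  .filter($"dz" > 1)
  .show()

// SQL rewrite: compute count(distinct ...) in a subquery's SELECT list
// instead of in HAVING, then filter in the outer query
hiveContext.sql("""
  select x, y, z from (
    select A1.x, A1.y, A1.z, count(distinct A1.z) as dz
    from (select concat(g, h) as x, y, z from raw_parquet where f = '') A1
    group by A1.x, A1.y, A1.z
  ) A2
  where A2.dz > 1
""").show()
```

Note that since `z` is also a grouping key here, `count(distinct z)` is 1 within every group, so you may want to aggregate over a different column than the one you group by.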