Unable to increase Hive dynamic partitions in Spark using spark-sql

I am running a Hive query with spark-sql that selects data from one table and inserts the result into another partitioned Hive table. The insert needs to create 1,536 dynamic partitions. I raised the maximum number of dynamic partitions to 2000 with session-level SET statements (see the spark-sql invocation in the error stack below), but Spark still fails to insert the data into the 1,536 partitions.

Error stack:

spark-sql --master yarn --num-executors 14 --executor-memory 45G --executor-cores 30 --driver-memory 10G --conf spark.dynamicAllocation.enabled=false -e "SET hive.exec.dynamic.partition = true;SET hive.exec.dynamic.partition.mode = nonstrict;SET hive.exec.max.dynamic.partitions = 2000; 
> insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,y.temprature_min_in_celcius,y.temprature_max_in_celcius,y.rainfall_in_mm,y.relative_humidity_min,y.relative_humidity_max,y.radiation_max,y.wind_intensity,y.wind_direction,y.cloud_coverage,y.soil_temprature_in_celcius,y.water_quantity_in_soil,y.lmdt,y.icon,y.probablity_of_rainfall,y.rain_acc_20feb_onwards,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id;" 
17/05/12 09:44:05 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 
17/05/12 09:44:05 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 
17/05/12 09:44:08 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 
hive.exec.dynamic.partition  true 
Time taken: 1.874 seconds, Fetched 1 row(s) 
hive.exec.dynamic.partition.mode  nonstrict 
Time taken: 0.67 seconds, Fetched 1 row(s) 
hive.exec.max.dynamic.partitions  2000 
Time taken: 0.047 seconds, Fetched 1 row(s) 
17/05/12 09:58:30 ERROR SparkSQLDriver: Failed in [ 
insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select y.rec_id,x.municipal_id,y.temprature_min_in_celcius,y.temprature_max_in_celcius,y.rainfall_in_mm,y.relative_humidity_min,y.relative_humidity_max,y.radiation_max,y.wind_intensity,y.wind_direction,y.cloud_coverage,y.soil_temprature_in_celcius,y.water_quantity_in_soil,y.lmdt,y.icon,y.probablity_of_rainfall,y.rain_acc_20feb_onwards,x.model_id,y.record_date from (select * from weatherdata_part_rv.model_location_xref) x left outer join weatherdata_part_rv.weather_data_daily y on x.municipal_id=y.weather_station_id] 
java.lang.reflect.InvocationTargetException 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at org.apache.spark.sql.hive.client.Shim_v1_2.loadDynamicPartitions(HiveShim.scala:823) 
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:689) 
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687) 
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:687) 
     at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272) 
     at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:687) 
     at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:796) 
     at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784) 
     at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:784) 
     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95) 
     at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:784) 
     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:268) 
     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170) 
     at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) 
     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) 
     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) 
     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) 
     at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87) 
     at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87) 
     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) 
     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) 
     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) 
     at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699) 
     at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62) 
     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335) 
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) 
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311) 
     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:168) 
     at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:498) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1536, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1536. 
     at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1578) 
     ... 48 more 

Is there a limit on the maximum number of Hive partitions in Spark?

If so, is there a way to increase that maximum?


Since Spark 2.0.0, the SET operator cannot be used to configure the Hive client dynamically: https://issues.apache.org/jira/browse/SPARK-19881
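In other words, the limit has to reach the embedded Hive client at launch time rather than through a session-level SET. Two launch-time routes are sketched below; neither is verified here, and their effectiveness varies across Spark versions:

    # Route 1: pass the property directly to the Hive client at startup
    spark-sql --master yarn \
      --hiveconf hive.exec.max.dynamic.partitions=2000 \
      -e "insert into table ... ;"

    # Route 2: inject the property through Spark's Hadoop configuration
    spark-sql --master yarn \
      --conf spark.hadoop.hive.exec.max.dynamic.partitions=2000 \
      -e "insert into table ... ;"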


Yes, it is because of the newer Spark version. It worked once I added the property to the hive-site.xml file. Thank you.

Answer


You can add the property below to hive-site.xml, in both $SPARK_HOME/conf/hive-site.xml and $HIVE_HOME/conf/hive-site.xml. Setting hive.exec.max.dynamic.partitions=2000 in hive-site.xml should fix the issue:

    <property>
        <name>hive.exec.max.dynamic.partitions</name>
        <value>2000</value>
        <description>Maximum number of dynamic partitions allowed to be created in total</description>
    </property>
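One way to keep the two copies consistent is to copy the cluster's Hive config into Spark's conf directory; the paths below are illustrative and vary by distribution:

    # make spark-sql read the same hive-site.xml as Hive itself
    cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml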

Hope this helps.

If the value is not picked up, restart the HiveServer2 (HS2) process.
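Once the limit is in hive-site.xml, the insert can be re-run without the session-level SET statements, along these lines (a sketch reusing the asker's launch options):

    spark-sql --master yarn --num-executors 14 --executor-memory 45G \
      --executor-cores 30 --driver-memory 10G \
      --conf spark.dynamicAllocation.enabled=false \
      -e "insert into table weatherdata_part_rv.weather_data_daily_model_location_mapping_rv partition (model_id,record_date) select ... ;"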


It worked. Thank you.


Glad to hear that it worked.
