
I got this error when reading data from Redshift using spark-redshift. I had created the bucket in S3 and was able to access it with sufficient credentials, yet Redshift reports that the S3 bucket does not exist:

java.sql.SQLException: Amazon Invalid operation: S3ServiceException:The specified bucket does not exist,Status 404,Error NoSuchBucket,Rid AA6E01BF9BCED7ED,ExtRid 7TQKPoWU5lMdJ9av3E0Ehzdgg+e0yRrNYaB5Q+WCef0JPm134XHeiSNk1mx4cdzp,CanRetry 1 
Details: 

error: S3ServiceException:The specified bucket does not exist,Status 404,Error NoSuchBucket,Rid AA6E01BF9BCED7ED,ExtRid 7TQKPoWU5lMdJ9av3E0Ehzdgg+e0yRrNYaB5Q+WCef0JPm134XHeiSNk1mx4cdzp,CanRetry 1 
code: 8001 
context: Listing bucket=redshift-spark.s3.amazonaws.com prefix=s3Redshift/3a312209-7d6d-4d6b-bbd4-c1a70b2e136b/ 
query: 0 
location: s3_unloader.cpp:200 
process: padbmaster [pid=4952] 
-----------------------------------------------; 
at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(ErrorResponse.java:1830) 
at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(PGMessagingContext.java:804) 
at com.amazon.redshift.client.PGMessagingContext.handleMessage(PGMessagingContext.java:642) 
at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(InboundMessagesPipeline.java:312) 
at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(PGMessagingContext.java:1062) 
at com.amazon.redshift.client.PGMessagingContext.getErrorResponse(PGMessagingContext.java:1030) 
at com.amazon.redshift.client.PGClient.handleErrorsScenario2ForPrepareExecution(PGClient.java:2417) 
at com.amazon.redshift.client.PGClient.handleErrorsPrepareExecute(PGClient.java:2358) 
at com.amazon.redshift.client.PGClient.executePreparedStatement(PGClient.java:1358) 
at com.amazon.redshift.dataengine.PGQueryExecutor.executePreparedStatement(PGQueryExecutor.java:370) 
at com.amazon.redshift.dataengine.PGQueryExecutor.execute(PGQueryExecutor.java:245) 
at com.amazon.jdbc.common.SPreparedStatement.executeWithParams(Unknown Source) 
at com.amazon.jdbc.common.SPreparedStatement.execute(Unknown Source) 
at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:101) 
at com.databricks.spark.redshift.JDBCWrapper$$anonfun$executeInterruptibly$1.apply(RedshiftJDBCWrapper.scala:101) 
at com.databricks.spark.redshift.JDBCWrapper$$anonfun$2.apply(RedshiftJDBCWrapper.scala:119) 
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) 
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 

It turned out the problem was a version conflict between spark-redshift and the aws-java-sdk used for S3.


Look at what the 'context:' line says. It seems you did not set up your bucket correctly: it is currently using the bucket 'redshift-spark.s3.amazonaws.com'. – moertel

Answers


The bucket created in S3 did exist; updating the POM resolved the issue.

Updated pom.xml:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.10.22</version>
    <!--<version>1.7.4</version>-->
</dependency>
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-redshift_2.10</artifactId>
    <version>0.6.0</version>
</dependency>
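
For reference, a minimal sketch of how the connector is typically invoked once the dependency versions align (spark-redshift 0.6.0 / Spark 1.x API); the JDBC URL, credentials, table name, and temp prefix below are placeholders, not values from the original post:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Minimal spark-redshift read; all connection values are placeholders.
val sc = new SparkContext(new SparkConf().setAppName("redshift-read"))
val sqlContext = new SQLContext(sc)

val df = sqlContext.read
  .format("com.databricks.spark.redshift")
  // Placeholder JDBC URL: cluster host, database, and credentials are assumptions.
  .option("url", "jdbc:redshift://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev?user=USER&password=PASS")
  .option("dbtable", "my_table")   // placeholder table name
  // tempdir is a bucket/prefix URI; the scheme (s3, s3n, or s3a) depends on
  // which Hadoop S3 filesystem your cluster uses.
  .option("tempdir", "s3n://redshift-spark/temp/")
  .load()

df.show()

With matching versions, the connector UNLOADs the table to tempdir and reads the resulting files back into a DataFrame.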

Provide the bucket name in the form 's3://<bucket>/<folder-name>'. For example, if your bucket has a directory structure like s3redshift/myfile, where s3redshift is the bucket name, then the address must be of the form 's3://s3redshift/myfile', and no additional parameters should be passed.
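
To make the difference concrete, a small illustration reusing the bucket name and prefix from the error message above:

// Incorrect: an endpoint hostname, which Redshift treats as a bucket
// literally named "redshift-spark.s3.amazonaws.com" (hence NoSuchBucket).
val badTempDir = "redshift-spark.s3.amazonaws.com"

// Correct: URI scheme + bucket name + optional key prefix.
// (Use s3, s3n, or s3a to match your Hadoop S3 filesystem.)
val goodTempDir = "s3://redshift-spark/s3Redshift/"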
