Hive - external (dynamically) partitioned table

I have a table in MySQL named nas_comps.

select comp_code, count(leg_id) from nas_comps_01012011_31012011 n group by comp_code; 
comp_code  count(leg_id) 
'J'   20640 
'Y'   39680

First, I imported the data into HDFS (Hadoop version 1.0.2) using Sqoop:

sqoop import --connect jdbc:mysql://172.25.37.135/pros_olap2 \
--username hadoopranch \
--password hadoopranch \
--query "select * from nas_comps where dep_date between '2011-01-01' and '2011-01-10' AND \$CONDITIONS" \
-m 1 \
--target-dir /pros/olap2/dataimports/nas_comps

Then, I created an external, partitioned Hive table:

-- shows the partitions on 'describe' but not 'show partitions'
create external table nas_comps(DS_NAME string,DEP_DATE string,
           CRR_CODE string,FLIGHT_NO string,ORGN string,
           DSTN string,PHYSICAL_CAP int,ADJUSTED_CAP int,
           CLOSED_CAP int)
PARTITIONED BY (LEG_ID int, month INT, COMP_CODE string)
location '/pros/olap2/dataimports/nas_comps';

The partition columns are shown when the table is described:

hive> describe extended nas_comps; 
OK 
ds_name string 
dep_date  string 
crr_code  string 
flight_no  string 
orgn string 
dstn string 
physical_cap int 
adjusted_cap int 
closed_cap  int 
leg_id int 
month int 
comp_code  string 

Detailed Table Information  Table(tableName:nas_comps, dbName:pros_olap2_optim, 
owner:hadoopranch, createTime:1374849456, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:ds_name, type:string, comment:null), 
FieldSchema(name:dep_date, type:string, comment:null), FieldSchema(name:crr_code, 
type:string, comment:null), FieldSchema(name:flight_no, type:string, comment:null), 
FieldSchema(name:orgn, type:string, comment:null), FieldSchema(name:dstn, type:string, 
comment:null), FieldSchema(name:physical_cap, type:int, comment:null), 
FieldSchema(name:adjusted_cap, type:int, comment:null), FieldSchema(name:closed_cap, 
type:int, comment:null), FieldSchema(name:leg_id, type:int, comment:null), 
FieldSchema(name:month, type:int, comment:null), FieldSchema(name:comp_code, type:string, 
comment:null)], location:hdfs://172.25.37.21:54300/pros/olap2/dataimports/nas_comps, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, 
numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters: 
{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys: 
[FieldSchema(name:leg_id, type:int, comment:null), FieldSchema(name:month, type:int, 
comment:null), FieldSchema(name:comp_code, type:string, comment:null)], 
parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1374849456}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE)

However, I am not sure whether the partitions were created, because:

hive> show partitions nas_comps; 
OK 
Time taken: 0.599 seconds 


select count(1) from nas_comps;

This returns 0 records.

How do I create an external Hive table with dynamic partitions?

Source

2013-07-26 Kaliyug Antagonist

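Hive does not create the partitions for you this way.
Create a table partitioned by the desired partition keys, then run insert overwrite table from the external table into the new partitioned table (with hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict set).

If you must keep the partitioned table external, you have to create the directories manually (one directory per partition, named PARTITION_KEY=VALUE) and then run the MSCK REPAIR TABLE table_name; command.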
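The directory naming rule can be illustrated with a small shell sketch. It assumes the three partition columns from the question (leg_id, month, comp_code); the value 1001 for leg_id is made up, and a local directory stands in for HDFS so the layout can be inspected without a cluster.

```shell
#!/bin/sh
# Sketch only. TABLE_ROOT and the sample partition values are illustrative;
# a local directory stands in for the table's HDFS location.
TABLE_ROOT="${TABLE_ROOT:-./nas_comps}"

# Build the PARTITION_KEY=VALUE path for one partition of the question's table.
partition_dir() {
    # $1 = leg_id, $2 = month, $3 = comp_code
    printf '%s/leg_id=%s/month=%s/comp_code=%s\n' "$TABLE_ROOT" "$1" "$2" "$3"
}

# One directory per partition, nested in the order of the PARTITIONED BY clause.
# On a cluster the equivalent would be "hadoop fs -mkdir <dir>", followed by
# moving the matching data files into that directory.
for comp in J Y; do
    mkdir -p "$(partition_dir 1001 1 "$comp")"
done

# Once the directories and their data files exist, register them in the metastore:
#   hive -e 'MSCK REPAIR TABLE nas_comps;'
```

Partition column values are read from the directory names, so the data files placed in each directory should contain only the non-partition columns.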

Source

2013-07-26 17:02:17 dimamah

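Dynamic partitioning

Partitions are added dynamically while records are inserted into a partitioned Hive table.

It is supported only with the insert statement.
It is not supported with the load data statement.
Dynamic partitioning must be enabled before inserting data into the Hive table: hive.exec.dynamic.partition.mode=nonstrict (the default is strict) and hive.exec.dynamic.partition=true (the default is false in Hive releases before 0.9.0).

Dynamic partition query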

SET hive.exec.dynamic.partition.mode=nonstrict; 
SET hive.exec.dynamic.partition=true; 
INSERT INTO table_name PARTITION (loaded_date) 
select * from table_name1 where loaded_date = 20151217;

Here, loaded_date is the partition column and 20151217 is its value.
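For a table with several partition columns, such as the one in the question, the same pattern might look like the sketch below. The table names nas_comps_staging (an unpartitioned table over the raw import) and nas_comps_part (the partitioned target) are hypothetical, and the month is assumed to be derivable from dep_date in yyyy-MM-dd form.

```shell
#!/bin/sh
# Sketch only: nas_comps_staging and nas_comps_part are hypothetical table names.
cat > load_nas_comps.hql <<'EOF'
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition columns come last in the SELECT, in the same order
-- as in the PARTITION clause; Hive matches them by position, not by name.
INSERT OVERWRITE TABLE nas_comps_part PARTITION (leg_id, month, comp_code)
SELECT ds_name, dep_date, crr_code, flight_no, orgn, dstn,
       physical_cap, adjusted_cap, closed_cap,
       leg_id, month(dep_date), comp_code
FROM nas_comps_staging;
EOF

# Run the file only where the Hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
    hive -f load_nas_comps.hql
else
    echo "hive not found; wrote load_nas_comps.hql" >&2
fi
```

Partitioning by a high-cardinality column such as leg_id can create many partitions, so the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised.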

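Limitations:

Dynamic partitioning works only with the statement above.
It creates partitions dynamically based on the data selected from the loaded_date column of table_name1.

If your case does not match the criteria above, first create the partitioned table and then add the partitions explicitly:

ALTER TABLE table_name ADD PARTITION (DS_NAME='partname1',DATE='partname2');

Or use this link for dynamic partition creation.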

Source

2013-07-26 11:03:55

Yes, I checked this, but it is not dynamic partitioning: the partition values still have to be provided.

Right, run it through a shell script. You can create variables for the partitions in the shell script and pass them to the alter table command. Otherwise there is currently no other option available :(
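A minimal sketch of that shell-script approach follows. The partition values and the LOCATION layout are assumptions chosen for illustration; only the table name and base path come from the question.

```shell
#!/bin/sh
# Sketch only: the partition values and the LOCATION layout are assumptions.
TABLE="nas_comps"
TABLE_ROOT="/pros/olap2/dataimports/nas_comps"

# Print one ALTER TABLE statement for the given partition values.
add_partition_stmt() {
    # $1 = leg_id, $2 = month, $3 = comp_code
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (leg_id=%s, month=%s, comp_code='%s') LOCATION '%s/leg_id=%s/month=%s/comp_code=%s';\n" \
        "$TABLE" "$1" "$2" "$3" "$TABLE_ROOT" "$1" "$2" "$3"
}

# Collect the statements in a file, one per partition.
: > add_partitions.hql
for comp in J Y; do
    add_partition_stmt 1001 1 "$comp" >> add_partitions.hql
done

# Apply them on a machine with the Hive CLI:
#   hive -f add_partitions.hql
```

Writing the statements to a file first makes it possible to review them before they touch the metastore.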
