2017-11-21

Hadoop map-side join: conditional task

I was learning about Hive map joins and came across conditional tasks. I have the following questions about them.

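  1. What is a conditional task in a Hadoop map join?
  2. How does the conditional task help identify the small table in a map join?
  3. What is the difference between the following Hive properties, and what is their significance?

    set hive.auto.convert.join.noconditionaltask.size;

    set hive.mapjoin.smalltable.filesize;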

Could someone answer the questions above and help me understand conditional tasks in map-side joins in detail?

Answer

1. What is a conditional task in a Hadoop map join?

    At compile time, the query processor generates a conditional task that contains a list of candidate tasks. At execution time, one of those tasks is resolved to run.

2. How does the conditional task help identify the small table in a map join?

At execution time, the conditional task knows the exact file size of each input table, even when a table is an intermediate result. If every table is too large for a map join, it simply runs the common join task as before. If one table is large and the others are small enough for a map join, it picks the corresponding map join local task to run. This mechanism converts a common join into a map join automatically and dynamically.
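The runtime choice described above can be modelled in a few lines of Python. This is only a simplified sketch, not Hive's actual resolver code: the function name `resolve_join_task`, the table names, and the inclusive threshold comparison are assumptions made for illustration, and the 25,000,000-byte default is the documented default of hive.mapjoin.smalltable.filesize.

```python
from typing import Dict, Optional, Tuple

# Assumed default of hive.mapjoin.smalltable.filesize, in bytes.
SMALL_TABLE_FILESIZE = 25_000_000


def resolve_join_task(
    table_sizes: Dict[str, int],
    threshold: int = SMALL_TABLE_FILESIZE,
) -> Tuple[str, Optional[str]]:
    """Pick a task from actual input sizes (hypothetical model of the resolver).

    Returns ("map_join", big_table) when all tables except one fit under the
    threshold together, otherwise ("common_join", None).
    """
    best: Optional[Tuple[str, int]] = None
    for big_table in table_sizes:
        # Total size of every table that would have to be loaded into memory
        # if `big_table` were the streamed (large) side.
        others = sum(size for name, size in table_sizes.items() if name != big_table)
        if others <= threshold and (best is None or others < best[1]):
            best = (big_table, others)
    if best is None:
        return ("common_join", None)
    return ("map_join", best[0])


if __name__ == "__main__":
    print(resolve_join_task({"orders": 5_000_000_000, "customers": 8_000_000}))
    print(resolve_join_task({"orders": 5_000_000_000, "events": 2_000_000_000}))
```

In this model the first call selects a map join that streams `orders`, while the second falls back to the common join because neither table is small enough to hold in memory.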

3. What is the difference between the following Hive properties, and what is their significance?


    hive.auto.convert.join

    default value: false (in Hive 0.7.0 to 0.10.0; true from 0.11.0 onward)

    This enables automatic join conversion. Once it is enabled, you do not need to specify a map join hint in the query.

hive.auto.convert.join.noconditionaltask

default value: true

    This controls whether Hive enables the optimization that converts a common join into a map join based on input file size. When a conditional task is used instead, hive.mapjoin.smalltable.filesize (default 25000000, about 25 MB) is the threshold: if the total size of the small tables is larger than that, the conditional task chooses the original common join.

hive.auto.convert.join.noconditionaltask.size

default value: 10000000

If the sum of the sizes of n-1 of the tables/partitions in an n-way join is smaller than this value, the join is converted directly to a map join (there is no conditional task). The default is 10 MB. This setting depends on hive.auto.convert.join.noconditionaltask and takes effect only when that property is true.
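How the three settings interact at compile time can be sketched as a small decision function. This is a hedged illustration under stated assumptions rather than Hive's planner: `plan_join` and its return labels are hypothetical names, and the sizes stand for the optimizer's compile-time estimates in bytes.

```python
from typing import List


def plan_join(
    estimated_sizes: List[int],
    auto_convert: bool,
    noconditionaltask: bool = True,
    noconditionaltask_size: int = 10_000_000,
) -> str:
    """Return the kind of plan produced at compile time (hypothetical model).

    "common_join"      : auto conversion is off, nothing is converted.
    "map_join"         : converted directly, no conditional task is generated.
    "conditional_task" : the decision is deferred to execution time.
    """
    if not auto_convert:  # models hive.auto.convert.join=false
        return "common_join"
    if noconditionaltask and len(estimated_sizes) >= 2:
        # Sum of the n-1 smallest inputs of the n-way join.
        small_side_total = sum(estimated_sizes) - max(estimated_sizes)
        if small_side_total < noconditionaltask_size:
            return "map_join"
    return "conditional_task"


if __name__ == "__main__":
    print(plan_join([5_000_000_000, 8_000_000], auto_convert=True))
    print(plan_join([5_000_000_000, 20_000_000], auto_convert=True))
```

With the default 10 MB limit, an 8 MB small side is converted directly, while a 20 MB small side is left to a conditional task, where the separate hive.mapjoin.smalltable.filesize threshold applies at run time.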

Reference: https://cwiki.apache.org/confluence/download/attachments/27362054/Hive+Summit+2011-join.pdf