Pig integrated with Cassandra: a simple distributed query takes a few minutes to complete. Is this normal?

I have set up a test integration of Cassandra + Pig/Hadoop. 8 nodes are Cassandra + TaskTracker nodes, and 1 node is the JobTracker/NameNode.

I fired up the Cassandra client and created a simple bit of data, as listed in the README.txt in the Cassandra distribution, and then ran the sample Pig query listed in CASSANDRA_HOME. The data:

[default@unknown] create keyspace Keyspace1;
[default@unknown] use Keyspace1;
[default@Keyspace1] create column family Users with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type;
[default@Keyspace1] set Users[jsmith][first] = 'John';
[default@Keyspace1] set Users[jsmith][last] = 'Smith';
[default@Keyspace1] set Users[jsmith][age] = long(42);

The Pig query (run using pig_cassandra):

grunt> rows = LOAD 'cassandra://Keyspace1/Users' USING CassandraStorage() AS (key, columns: bag {T: tuple(name, value)}); 
grunt> cols = FOREACH rows GENERATE flatten(columns); 
grunt> colnames = FOREACH cols GENERATE $0; 
grunt> namegroups = GROUP colnames BY (chararray) $0; 
grunt> namecounts = FOREACH namegroups GENERATE COUNT($1), group; 
grunt> orderednames = ORDER namecounts BY $0; 
grunt> topnames = LIMIT orderednames 50; 
grunt> dump topnames;

It took about 3 minutes to complete.

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
1.0.0          0.9.1       root    2012-01-12 22:16:53  2012-01-12 22:20:22  GROUP_BY,ORDER_BY,LIMIT

Success!

Job Stats (time in seconds):
JobId                  Maps  Reduces  MaxMapTime  MinMapTIme  AvgMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  Alias  Feature  Outputs
job_201201121817_0010  8     1        12          6           9           21             21             21             colnames,cols,namecounts,namegroups,rows  GROUP_BY,COMBINER
job_201201121817_0011  1     1        6           6           6           15             15             15             orderednames  SAMPLER
job_201201121817_0012  1     1        9           9           9           15             15             15             orderednames  ORDER_BY,COMBINER  hdfs://xxxx/tmp/temp-744158198/tmp-1598279340,

Input(s): 
Successfully read 1 records (3232 bytes) from: "cassandra://Keyspace1/Users" 

Output(s): 
Successfully stored 3 records (63 bytes) in: "hdfs://xxxx/tmp/temp-744158198/tmp-1598279340" 

Counters: 
Total records written : 3 
Total bytes written : 63 
Spillable Memory Manager spill count : 0 
Total bags proactively spilled: 0 
Total records proactively spilled: 0

There are no errors or warnings in the logs.

Is this normal, or is something wrong?

2012-01-13 marathon

Yes, this is normal. Running a Map/Reduce job on Hadoop typically takes about a minute just to start up, and Pig generates multiple Map/Reduce jobs depending on the complexity of the script.
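As a rough check of that claim against the numbers in the question, the sketch below compares the wall-clock time of the run with the time actually spent in map and reduce tasks. The timestamps and per-job times are copied from the Pig job statistics above. Two things here are assumptions and not anything Pig reports: that the three jobs ran one after another, and that `MaxMapTime + MaxReduceTime` is a fair estimate of the task time of a single job.

```python
from datetime import datetime

# Start and finish times copied from the Pig statistics in the question.
started = datetime(2012, 1, 12, 22, 16, 53)
finished = datetime(2012, 1, 12, 22, 20, 22)

# (job id suffix, MaxMapTime, MaxReduceTime) in seconds, one entry per
# MapReduce job that Pig generated for the script.
jobs = [
    ("0010", 12, 21),
    ("0011", 6, 15),
    ("0012", 9, 15),
]

wall_clock = int((finished - started).total_seconds())

# Assumption: the jobs run sequentially, and within a job the slowest map
# plus the slowest reduce approximates the time spent doing real work.
task_time = sum(max_map + max_reduce for _, max_map, max_reduce in jobs)

# Whatever is left is scheduling, startup, and other fixed overhead.
overhead = wall_clock - task_time
overhead_per_job = overhead / len(jobs)

print(f"wall clock:       {wall_clock} s")
print(f"time in tasks:    {task_time} s")
print(f"overhead:         {overhead} s")
print(f"overhead per job: {overhead_per_job:.0f} s")
```

Under those assumptions, roughly 131 of the 209 seconds are not spent in map or reduce tasks at all, which is about 44 seconds of fixed cost for each of the three jobs. That is in line with a per-job startup cost on the order of a minute, and it does not depend on how little data is read.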

2012-01-13 09:35:22 Brainlag
