2014-11-04

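Loading a CSV with a custom line break

I have a CSV like the one below, in which some records are terminated by "+++" instead of a newline. How can I load the CSV so that "+++" is also treated as a line break?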

VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE 

VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205+++VTS,51,0071,9739965515,NM,GP,INF01,V,18,072311,291014,0000.0000,N,00000.0000,E,C24E+++VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358 

VTS,51,0071,9739965515,NM,GP,INF01,V,18,072319,291014,0000.0000,N,00000.0000,E,012F 
VTS,51,0071,9739965515,NM,GP,INF01,V,19,072326,291014,0000.0000,N,00000.0000,E,B2E6+++VTS,01,0097,9739965515,NM,GP,18,072326,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,173,EAA0 
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072333,291014,0000.0000,N,00000.0000,E,9896 
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072340,291014,0000.0000,N,00000.0000,E,9B23 

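First I need to break the data into records on either a newline or the "+++" marker and load it. Then I want to filter the records, keeping only those whose second column is 01.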

Expected output:

VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205 
VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358 
VTS,01,0097,9739965515,NM,GP,18,072326,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,173,EAA0 
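As an illustration outside Pig, the two steps asked for (break on either a newline or the literal "+++", then keep rows whose second column is 01) can be sketched in plain Python. `split_records`, `second_column_is` and the shortened sample records are hypothetical; the sketch assumes the whole file fits in memory and that blank records should be dropped.

```python
import re

# A record ends at a newline (optionally preceded by \r) or at the literal "+++".
# re.escape is needed because "+" is a regex metacharacter.
RECORD_SEP = re.compile(r"\r?\n|" + re.escape("+++"))


def split_records(text: str) -> list[str]:
    """Split raw file contents into records, dropping blank ones."""
    return [r.strip() for r in RECORD_SEP.split(text) if r.strip()]


def second_column_is(record: str, value: str) -> bool:
    """True if the record's second comma-separated field equals `value`."""
    fields = record.split(",")
    return len(fields) > 1 and fields[1] == value


# Shortened, made-up records that only keep the layout of the real data.
sample = "VTS,51,A+++VTS,01,B\n\nVTS,01,C\n"
records = split_records(sample)
print(records)                                            # ['VTS,51,A', 'VTS,01,B', 'VTS,01,C']
print([r for r in records if second_column_is(r, "01")])  # ['VTS,01,B', 'VTS,01,C']
```

Because the regex matches the literal three-character marker, a single "+" inside a field would not start a new record.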

What is the expected output? –

@SivasakthiJayaraman The expected output is –

I have updated the solution. Please check whether it works and let me know. –

Answer

Pig script:

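-- A: read each physical line of the file as one chararray field
A = LOAD 'input.csv' AS (line:chararray);
-- B: split every line on '+' and flatten, giving one row per record.
-- TOKENIZE treats '+++' as a set of delimiter characters, so this also
-- splits on a single '+'; it assumes '+' never occurs inside a field.
B = FOREACH A {
       splitRow = TOKENIZE(line,'+++');
       GENERATE FLATTEN(splitRow) AS newList;
       }
-- C: split each record into fields; the limit must be at least the total
-- number of columns (23), otherwise the last field holds the unsplit rest
C = FOREACH B GENERATE FLATTEN(STRSPLIT(newList,',',23));
-- D: keep only records whose second column is the string '01'
D = FILTER C BY (chararray)$1 == '01';
DUMP D;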

Output:

(VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205) 
(VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358) 
(VTS,01,0097,9739965515,NM,GP,18,072326,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,173,EAA0) 
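For readers without a Pig installation, steps A to D of the script can be mirrored in plain Python. This is a sketch of the same logic, not what Pig does internally; `load_and_filter` and the hard-coded records (taken from the question) are illustrative. Unlike the Pig script, it also strips whitespace around each record, which removes the trailing spaces seen in the sample lines.

```python
import re

# Four of the sample records from the question, hard-coded for illustration.
R1 = "VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE"
R2 = ("VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,"
      "0.0,0.0,291014,0000,00,4000,11,999,169,B205")
R3 = "VTS,51,0071,9739965515,NM,GP,INF01,V,18,072311,291014,0000.0000,N,00000.0000,E,C24E"
R4 = ("VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,"
      "0.0,0.0,291014,0000,00,4000,11,999,171,B358")
SAMPLE = R1 + " \n\n" + R2 + "+++" + R3 + "+++" + R4 + " \n"


def load_and_filter(text: str, wanted: str = "01") -> list[list[str]]:
    rows = []
    for line in text.splitlines():                   # A: one entry per physical line
        for record in re.split(r"\++", line):        # B: split on runs of '+', like TOKENIZE
            record = record.strip()
            if not record:                            # empty tokens are skipped
                continue
            fields = record.split(",")               # C: like STRSPLIT(newList, ',')
            if len(fields) > 1 and fields[1] == wanted:  # D: FILTER ... == '01'
                rows.append(fields)
    return rows


for fields in load_and_filter(SAMPLE):
    print(",".join(fields))  # prints only the two "01" records, R2 and R4
```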

Could you explain the steps above? I would like to understand how it works internally. –
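On the internals of step B: as far as I know, Pig's TOKENIZE passes its delimiter argument to java.util.StringTokenizer, which treats the string as a set of single delimiter characters rather than as one literal marker. The hypothetical Python helpers below mimic both behaviors to show where they diverge, namely a lone "+" inside a field (the record is made up).

```python
import re


def tokenize_like_stringtokenizer(s: str, delims: str) -> list[str]:
    """Mimic java.util.StringTokenizer: every character in `delims` is a
    separator, and empty tokens are skipped."""
    pattern = "[" + re.escape(delims) + "]+"
    return [t for t in re.split(pattern, s) if t]


def split_on_literal(s: str, marker: str) -> list[str]:
    """Split only on the exact marker string, skipping empty pieces."""
    return [t for t in s.split(marker) if t]


rec = "VTS,01,+5.0+++VTS,51,X"  # made-up record with a '+' inside a field
print(tokenize_like_stringtokenizer(rec, "+++"))  # ['VTS,01,', '5.0', 'VTS,51,X']
print(split_on_literal(rec, "+++"))               # ['VTS,01,+5.0', 'VTS,51,X']
```

With the sample data in the question no field contains "+", so TOKENIZE(line,'+++') yields the correct records; if that could change, a split on the literal marker is the safer choice.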

When I add the extra lines E = FOREACH D GENERATE $7, $15; DUMP E; the result I get is (060037061114006800400000,999,149,9594) (060113061114006800400000999152, B927), i.e. unwanted fields, instead of just $7 and $15. –

Change the line to C = FOREACH B GENERATE FLATTEN(STRSPLIT(newList, ',', 23)); i.e. use 23 instead of 16. Basically, pass the total number of columns; I gave 16 by mistake. –
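The effect of the STRSPLIT limit can be reproduced with Python's str.split, whose maxsplit corresponds to the Pig/Java limit minus one. This assumes a positive limit and a literal separator (STRSPLIT itself takes a regex); `strsplit` is a hypothetical helper, and the record is the first "01" row from the question.

```python
# First "01" record from the question.
REC = ("VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,"
       "0.0,0.0,291014,0000,00,4000,11,999,169,B205")


def strsplit(s: str, sep: str, limit: int) -> list[str]:
    """Hypothetical analogue of STRSPLIT(s, sep, limit) for a literal separator
    and limit >= 1: at most limit - 1 splits, so the last field keeps the rest."""
    return s.split(sep, limit - 1)


f16 = strsplit(REC, ",", 16)
f23 = strsplit(REC, ",", 23)
print(len(f16), f16[15])  # 16 291014,0000,00,4000,11,999,169,B205
print(len(f23), f23[15])  # 23 291014
```

With a limit of 16, field $15 swallows every remaining column, which matches the glued values reported for $15 in the comment above; a limit equal to the column count gives each column its own field.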
