How do I read a Hadoop sequence file?

I have a sequence file that is the output of a Hadoop map-reduce job. The data in this file is written as key-value pairs, and each value is itself a map. I want to read the value as a Map object so that I can process it further.

Configuration config = new Configuration();
Path path = new Path("D:\\OSP\\sample_data\\data\\part-00000");
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
long position = reader.getPosition();

while (reader.next(key, value))
{
    System.out.println("Key is: " + textKey + " value is: " + val + "\n");
}

Output: Key is: [this key] value is: {ABC = 839,177, XYZ = 548,498, LMN = 2, PQR = 1}

Here I am getting the value as a string, but I want it as a Map object.

2011-11-25 samarth

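Where does 'val' come from? Also, a Map is not a 'Writable'. What are you using as the key and value classes in your m/r job?

I only have the sequence file and I don't know what the map-reduce job does. I was given the following information: "You will need to open each of those files as a sequence file. It should use a compression codec. The sequence file class should be able to tell you which compression codec is used, and each key and value is, I think, encoded using TypedBytes." – samarth

Then you need to get the key and value classes; otherwise the records will not be deserialized correctly.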

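Check the API documentation for SequenceFile.Reader#next(Writable, Writable). Each call to next() deserializes one record into the key and value objects passed to it, so those are the variables to print. The loop

while (reader.next(key, value))
{
    System.out.println("Key is: " + textKey + " value is: " + val + "\n");
}

should be replaced with

while (reader.next(key, value))
{
    System.out.println("Key is: " + key + " value is: " + value + "\n");
}

Use SequenceFile.Reader#getValueClassName to get the value type stored in the SequenceFile. A SequenceFile records the key and value types in its file header.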
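
To illustrate the point that the key and value types live in the file header, here is a rough sketch that parses the first header fields using only the JDK. It assumes the header layout documented for recent SequenceFile versions (the bytes SEQ, one version byte, then the key and value class names as length-prefixed UTF-8 strings) and only handles class names shorter than 128 bytes, where the length prefix is a single byte. The names SequenceFileHeaderSketch, readHeader and sampleHeader are made up for this example, and the sample key class (org.apache.hadoop.io.Text) is a guess, not something stated in the question. In real code SequenceFile.Reader#getKeyClassName and #getValueClassName already do this for you.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class SequenceFileHeaderSketch {

    /** The header fields this sketch reads; later fields (compression, metadata, sync) are ignored. */
    static final class Header {
        final int version;
        final String keyClassName;
        final String valueClassName;

        Header(int version, String keyClassName, String valueClassName) {
            this.version = version;
            this.keyClassName = keyClassName;
            this.valueClassName = valueClassName;
        }
    }

    /** Reads a UTF-8 string with a one-byte length prefix (assumption: length is 0..127). */
    static String readShortString(DataInputStream in) throws IOException {
        int length = in.readByte();
        if (length < 0) {
            throw new IOException("length prefix longer than one byte is not handled in this sketch");
        }
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Reads the magic bytes, the version byte and the two class names from the start of a stream. */
    static Header readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[3];
        in.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("missing SEQ magic bytes; not a SequenceFile");
        }
        int version = in.readByte();
        String keyClassName = readShortString(in);
        String valueClassName = readShortString(in);
        return new Header(version, keyClassName, valueClassName);
    }

    /** Builds a hypothetical in-memory header so the demo needs no Hadoop install or real file. */
    static byte[] sampleHeader() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6); // version byte (assumed value for this demo)
        String[] classNames = {
            "org.apache.hadoop.io.Text",                       // assumed key class
            "org.apache.hadoop.typedbytes.TypedBytesWritable"  // value class named in the thread
        };
        for (String name : classNames) {
            byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length); // single-byte length prefix
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Header header = readHeader(new ByteArrayInputStream(sampleHeader()));
        System.out.println("version: " + header.version);
        System.out.println("key class: " + header.keyClassName);
        System.out.println("value class: " + header.valueClassName);
    }
}
```

To try it on a local part file, pass a java.io.FileInputStream for that file to readHeader instead of the in-memory sample.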

2011-11-25 09:30:26

Thank you. The value class is "TypedBytesWritable". Can I get a Map object from this class? – samarth

[TypedBytesWritable#getValue](http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/typedbytes/TypedBytesWritable.html#getValue%28%29) should give you the object.

Hi, that worked for me. Thank you very much. – samarth
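
Since getValue() hands back a plain Object, the remaining step is to narrow it to a typed Map before processing it. The sketch below shows one cautious way to do that using only the JDK. It assumes the decoded object is a java.util.Map with string-like keys and numeric values, which matches the printed output in the question but is not confirmed anywhere in the thread. The class and method names (TypedValueToMapSketch, asCountMap) are hypothetical, and the TreeMap in main is only a stand-in for what getValue() would return.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class TypedValueToMapSketch {

    /**
     * Narrows an untyped value to Map<String, Long>.
     * Throws IllegalArgumentException if the value is not a Map or an entry value is not a Number.
     */
    static Map<String, Long> asCountMap(Object value) {
        if (!(value instanceof Map)) {
            String actual = (value == null) ? "null" : value.getClass().getName();
            throw new IllegalArgumentException("expected a Map but got " + actual);
        }
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
            Object entryValue = entry.getValue();
            if (!(entryValue instanceof Number)) {
                throw new IllegalArgumentException("non-numeric value for key " + entry.getKey());
            }
            // Keys are converted with String.valueOf (assumption: they are string-like).
            result.put(String.valueOf(entry.getKey()), ((Number) entryValue).longValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the Object returned by getValue(); the real call needs Hadoop on the classpath.
        Map<Object, Object> decoded = new TreeMap<>();
        decoded.put("LMN", 2);
        decoded.put("PQR", 1);
        Object untyped = decoded;

        Map<String, Long> counts = asCountMap(untyped);
        System.out.println(counts);                                // prints {LMN=2, PQR=1}
        System.out.println(counts.get("LMN") + counts.get("PQR")); // prints 3
    }
}
```

Checking with instanceof before casting means an unexpected value type fails with a clear message rather than a ClassCastException deep inside later processing.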
