Python script writes null bytes to a file instead of the data it processes

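I have a Python script that processes a data file. Instead of the correct strings, it writes null bytes to the output file, and the behavior is not the same on every machine. On the first computer, with Python 2.6.6, I get the expected result. On the others (Python 2.6.6, 3.3.2, 2.7.5), however, the file object's write method puts null bytes in place of the expected values for most of the processing: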

$ hexdump -C result/process/1.res 
00000000 73 6f 75 72 63 65 2c 72 73 73 69 2c 6c 71 69 2c |source,rssi,lqi,| 
00000010 70 61 63 6b 65 74 49 64 2c 72 75 6e 2c 63 6f 75 |packetId,run,cou| 
00000020 6e 74 65 72 0a 00 00 00 00 00 00 00 00 00 00 00 |nter............| 
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 
* 
0003a130 00 00 00 00 00 00 00 00 00 00 31 33 2c 36 35 2c |..........13,65,| 
0003a140 31 34 2c 38 2c 39 38 2c 31 33 31 34 32 0a 31 32 |14,8,98,13142.12| 
0003a150 2c 34 37 2c 31 37 2c 38 2c 39 38 2c 31 33 31 34 |,47,17,8,98,1314| 
0003a160 33 0a 33 2c 34 35 2c 31 38 2c 38 2c 39 38 2c 31 |3.3,45,18,8,98,1| 
0003a170 33 31 34 34 0a 31 31 2c 38 2c 32 33 2c 38 2c 39 |3144.11,8,23,8,9| 
0003a180 38 2c 31 33 31 34 35 0a 39 2c 32 30 2c 32 32 2c |8,13145.9,20,22,|

How can I end up with this result? Does anyone have an idea how to solve the problem?


2013-06-26 user2523255

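I've looked at the full code at the link. Are you new to Python and object-oriented programming? The problem is that you use globals and open files in several places, storing the file handles in a dictionary. It is very hard to follow. The code desperately needs refactoring. – MattH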
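One possible mechanism behind the dump in the question, consistent with handles being opened in several places but not confirmed anywhere in this thread, is two independent write handles on the same path. Opening a file with mode 'w' truncates it, but it does not move the offset of another handle that is already open; when that handle writes again, the file is extended and the gap reads back as zero bytes. The sketch below reproduces a "header, run of NULs, data" layout similar to the hexdump. The function name `write_with_two_handles` and the sizes are made up for illustration.

```python
import os
import tempfile


def write_with_two_handles(path):
    """Hypothetical scenario: two independent write handles on one file.

    Returns the raw bytes that end up on disk.
    """
    first = open(path, 'w')
    first.write('x' * 100)    # earlier output: 100 bytes
    first.flush()             # now on disk; this handle's offset is 100

    second = open(path, 'w')  # mode 'w' truncates the file to 0 bytes
    second.write('header\n')  # this handle writes from offset 0
    second.close()

    first.write('data\n')     # the first handle still writes at offset 100
    first.close()             # the gap (offsets 7 to 99) reads back as zeros

    with open(path, 'rb') as f:
        return f.read()


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    content = write_with_two_handles(path)
finally:
    os.remove(path)
print(content.count(b'\x00'))  # 93
```

If this is the cause, opening each output file in exactly one place (or in append mode, where every write goes to the current end of the file) avoids the gap.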

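A few general remarks. Have you tried using a debugger? Add a print statement to the list comprehension to check that the output is what you expect. Use `strip()` instead of `rstrip` to remove all line-ending characters, including trailing whitespace.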
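A small illustration of the strip() advice, assuming lines in the "L...E" format that the refactored code in the answer checks for and, as an extra assumption, a Windows-style "\r\n" line ending. The sample values are taken from the hexdump; the line itself is invented.

```python
# Illustrative input line; the "\r\n" ending is an assumption about the data.
line = 'L13,65,14,8,98,13142E\r\n'

print(line.endswith('E'))                    # False: the line ending is attached
print(line.rstrip('\n').endswith('E'))       # False: '\r' survives rstrip('\n')
print(line.strip().endswith('E'))            # True: all surrounding whitespace is gone
print(line.strip().lstrip('L').rstrip('E'))  # 13,65,14,8,98,13142
```

Note that `strip()` also removes leading whitespace, which is harmless for this format but worth knowing if leading spaces were ever significant.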

I tried the print statement, and the output was the correct lines, not null bytes. – user2523255
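Since the printed lines look correct while the file does not, it can help to inspect the bytes that actually reach the disk. A minimal sketch, with the hypothetical helpers `find_null_bytes` and `report_null_bytes`, that reports how many NUL bytes each '.res' file contains and where the first one is:

```python
import os
import tempfile


def find_null_bytes(path):
    """Return (count, first_offset) of NUL bytes in a file, or (0, None)."""
    with open(path, 'rb') as f:  # binary mode: inspect exactly what is on disk
        data = f.read()
    count = data.count(b'\x00')
    return count, (data.index(b'\x00') if count else None)


def report_null_bytes(directory):
    """Print a line for every '.res' file in directory that contains NUL bytes."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith('.res'):
            count, first = find_null_bytes(path)
            if count:
                print('%s: %d NUL bytes, first at offset 0x%x' % (path, count, first))


# Self-check on a throwaway file shaped like the dump in the question.
fd, sample = tempfile.mkstemp(suffix='.res')
os.write(fd, b'header\n' + b'\x00' * 5 + b'13,65\n')
os.close(fd)
sample_result = find_null_bytes(sample)
os.remove(sample)
print(sample_result)  # (5, 7)
```

For the output in the question, `report_null_bytes('result/process')` would list the affected files; checking after each write step narrows down which part of the script produces the gap.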

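A few considerations:

In over 10 years of programming Python, I've never come across a compelling reason to use global. Pass arguments to functions instead.
Use the with statement to close files when you're done with them.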
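Both points in a minimal sketch: the file object is passed to the function that writes to it rather than looked up in a global dictionary, and the with statement guarantees it is flushed and closed. The names `write_rows` and `save_rows` are hypothetical.

```python
import os
import tempfile


def write_rows(out_file, rows):
    """Write each row on its own line to a file object passed in by the caller."""
    for row in rows:
        out_file.write(row + '\n')


def save_rows(path, rows):
    """Open path, write rows, and report whether the file was closed afterwards."""
    with open(path, 'w') as f:  # closed (and flushed) when the block exits
        write_rows(f, rows)     # the handle is an argument, not a global
    return f.closed


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    closed = save_rows(path, ['13,65,14,8,98,13142', '12,47,17,8,98,13143'])
    with open(path) as f:
        text = f.read()
finally:
    os.remove(path)
print(closed)  # True
```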

Here is an attempt at refactoring your code. It assumes there is enough memory to hold all the lines for a given identifier.

If the result files still contain null bytes after this refactoring, you at least have a sound basis for further debugging.

import os
import re


def list_files_to_process(directory='results'):
    """
    Return a list of files from directory where the file extension is '.res',
    case insensitive.
    """
    results = []
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        if os.path.isfile(filepath) and filename.lower().endswith('.res'):
            results.append(filepath)
    return results


def group_lines(sequence):
    """
    Generator: process a sequence of lines, separated by a particular line.
    Yields batches of lines along with the id from the separator.
    """
    separator = re.compile(r'^A:(?P<id>\d+):$')
    batch = []
    batch_id = None
    for line in sequence:
        # Iterating over a file keeps the trailing newline; strip it so blank
        # lines are skipped and the endswith('E') check below can succeed.
        line = line.strip()
        if not line:  # Ignore blanks
            continue
        m = separator.match(line)
        if m is not None:
            # A separator starts a new batch: emit the previous one first.
            if batch_id is not None or len(batch) > 0:
                yield (batch_id, batch)
            batch_id = m.group('id')
            batch = []
        else:
            batch.append(line)
    if batch_id is not None or len(batch) > 0:
        yield (batch_id, batch)


def filename_for_results(batch_id, result_directory):
    """
    Return an appropriate filename for a batch_id under the result directory.
    """
    return os.path.join(result_directory, "results-%s.res" % (batch_id,))


def open_result_file(filename, header="source,rssi,lqi,packetId,run,counter"):
    """
    Return an open file object in append mode, having appended a header if
    filename doesn't exist or is empty.
    """
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        # No need to write header
        return open(filename, 'a')
    else:
        f = open(filename, 'a')
        f.write(header + '\n')
        return f


def process_file(filename, result_directory='results/processed'):
    """
    Open filename and process its contents. Uses group_lines() to group
    lines into different files based upon a specific line acting as a
    content separator.
    """
    error_filename = filename_for_results('error', result_directory)
    # Nested with statements also work on Python 2.6, which does not accept
    # several context managers in a single with statement.
    with open(filename, 'r') as in_file:
        # Append, so errors from earlier input files are not overwritten.
        with open(error_filename, 'a') as error_out:
            for batch_id, lines in group_lines(in_file):
                if len(lines) == 0:
                    error_out.write("Received batch %r with 0 lines\n" % (batch_id,))
                    continue
                out_filename = filename_for_results(batch_id, result_directory)
                # File objects are context managers: each result file is
                # closed after its batch, so no handles stay open elsewhere.
                with open_result_file(out_filename) as out_file:
                    for line in lines:
                        if line.startswith('L') and line.endswith('E') and line.count(',') == 5:
                            line = line.lstrip('L').rstrip('E')
                            out_file.write(line + '\n')
                        else:
                            error_out.write("Unknown line, batch=%r: %r\n" % (batch_id, line))


if __name__ == '__main__':
    files = list_files_to_process()
    for filename in files:
        print("Processing %s" % (filename,))
        process_file(filename)
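The memory assumption stated before the code comes from group_lines() collecting each batch in a list. If batches could be too large for that, one alternative (a swapped-in technique, not part of this answer) is to yield one (batch_id, line) pair at a time and let `itertools.groupby` split the stream into per-batch iterators. `stream_lines` and `SEPARATOR` are hypothetical names; the separator format is the one the answer's regex expects.

```python
import re
from itertools import groupby
from operator import itemgetter

SEPARATOR = re.compile(r'^A:(?P<id>\d+):$')


def stream_lines(sequence):
    """Yield (batch_id, line) pairs one at a time instead of whole batches."""
    batch_id = None
    for raw in sequence:
        line = raw.strip()
        if not line:          # skip blank lines
            continue
        m = SEPARATOR.match(line)
        if m is not None:     # a separator line only switches the current id
            batch_id = m.group('id')
        else:
            yield batch_id, line


sample = ['A:1:\n', 'L1,2,3,4,5,6E\n', '\n',
          'A:2:\n', 'L7,8,9,10,11,12E\n', 'L13,14,15,16,17,18E\n']

grouped = []
# groupby() merges consecutive pairs with the same id. Each group must be
# consumed before advancing; real code would write the lines to the batch's
# file here instead of building a list (the list is only for this demo).
for batch_id, pairs in groupby(stream_lines(sample), key=itemgetter(0)):
    grouped.append((batch_id, [line for _, line in pairs]))
print(grouped)
# [('1', ['L1,2,3,4,5,6E']), ('2', ['L7,8,9,10,11,12E', 'L13,14,15,16,17,18E'])]
```

One behavioral difference: a separator followed by no data lines produces no group at all, so the "Received batch ... with 0 lines" error from process_file() would not be reported.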


2013-06-26 11:44:47 MattH

+1 for the effort!

It works with a small modification (each line ends with a newline). Thanks!!! – user2523255

@user2523255: You're welcome. For some reason I mistakenly thought that iterating over a file object stripped the newlines. Does this also fix the null byte problem? – MattH
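A quick way to see the behavior MattH describes: iterating over a file object yields each line with its trailing newline. This also hints at why the separator lines were still recognized while the data lines were not: without MULTILINE, `$` in a regular expression matches both at the end of the string and just before a final newline, whereas `endswith('E')` sees the '\n'. The two-line input is hypothetical; `io.StringIO` stands in for a real file.

```python
import io
import re

separator = re.compile(r'^A:(?P<id>\d+):$')
f = io.StringIO(u'A:1:\nL13,65,14,8,98,13142E\n')  # hypothetical two-line input

lines = list(f)
print(lines)                                  # ['A:1:\n', 'L13,65,14,8,98,13142E\n']
print(separator.match(lines[0]) is not None)  # True: '$' matches before the final '\n'
print(lines[1].endswith('E'))                 # False: the newline is still attached
print(lines[1].rstrip('\n').endswith('E'))    # True
```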
