2016-12-02 4 views
0

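Compiling a dictionary and iterating over it

I am new to Python and am trying to build a dictionary from a file and then iterate over that dictionary. I have been working in Eclipse and get no output and no warnings. The code I have is: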

import re 

id_to_info = {} #declare dictionary 

def parse_record(term): 
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL) 
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL) 
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL) 
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL) 
    info = namespace + "\n" + name + "\n" + is_a 
    id_to_info[go_id] = info 
    for go_id, info in id_to_info.interitems(): 
     print(go_id + "\t" + info) 

def split_record(record): 
    sp_file = open(record) 
    sp_records = sp_file.read() 
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL) 
    for sp_record in sp_split_records: 
     parse_record(term=sp_record) 
    sp_file.close() 

split_record(record="go.rtf") 

The input looks like this (the actual input is considerably larger):

[Term] 
id: GO:0000010 
name: trans-hexaprenyltranstransferase activity 
namespace: molecular_function 
def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + diphosphate." [KEGG:R05612, RHEA:20839] 
subset: gosubset_prok 
xref: KEGG:R05612 
xref: RHEA:20839 
is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups 

[Term] 
id: GO:0000011 
name: vacuole inheritance 
namespace: biological_process 
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069] 
is_a: GO:0007033 ! vacuole organization 
is_a: GO:0048308 ! organelle inheritance 

[Term] 
id: GO:0000012 
name: single strand break repair 
namespace: biological_process 
def: "The repair of single strand breaks in DNA. Repair of such breaks is mediated by the same enzyme systems as are used in base excision repair." [http://www.ultranet.com/~jkimball/BiologyPages/D/DNArepair.html] 
subset: gosubset_prok 
is_a: GO:0006281 ! DNA repair 

[Term] 
id: GO:0000014 
name: single-stranded DNA endodeoxyribonuclease activity 
namespace: molecular_function 
def: "Catalysis of the hydrolysis of ester linkages within a single-stranded deoxyribonucleic acid molecule by creating internal breaks." [GOC:mah] 
synonym: "single-stranded DNA specific endodeoxyribonuclease activity" RELATED [] 
synonym: "ssDNA-specific endodeoxyribonuclease activity" RELATED [GOC:mah] 
is_a: GO:0004520 ! endodeoxyribonuclease activity 

The output I am trying to produce is:

GO:0000010  molecular_function 
trans-hexaprenyltranstransferase activity 
GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups 

GO:0000011 biological_process 
vacuole inheritance 
is_a: GO:0007033 ! vacuole organization 
is_a: GO:0048308 ! organelle inheritance 

GO:0000012 biological_process 
single strand break repair 
is_a: GO:0006281 ! DNA repair 

GO:0000014 molecular_function 
single-stranded DNA endodeoxyribonuclease activity 
is_a: GO:0004520 ! endodeoxyribonuclease activity 

I am not sure where I am going wrong, but I think the main problem is the dictionary call?

+0

'id_to_info.interitems():' won't cut it; it should be 'id_to_info.items():'. –

+2

Your code crashes here: 'info = namespace + "\n" + name + "\n" + is_a', because 'is_a' (like the other 're.findall' results) is a list. –

+0

... What output do you actually get? – Prune

Answers

1

re.findall returns a list of the items it found. Your code assumes a string. Since there is one hit per line for id, name, and namespace, simply add [0] where you can. is_a can be empty, so it needs slightly gentler handling.
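To make the point about lists concrete, here is a small sketch. The shortened record and the first_match helper are made up for illustration and are not part of the original code:

```python
import re

# A shortened record, invented for this illustration (it has no is_a line).
term = "[Term]\nid: GO:0000012\nname: single strand break repair\n"

def first_match(pattern, text, default=""):
    """Return the first captured group, or `default` when nothing matches."""
    hits = re.findall(pattern, text)
    return hits[0] if hits else default

ids = re.findall(r"id:\s(.*?)\n", term)        # a list, even with a single hit
go_id = first_match(r"id:\s(.*?)\n", term)     # a plain string
is_a = first_match(r"is_a:\s(.*?)\n", term)    # falls back to "" instead of raising

print(ids, repr(go_id), repr(is_a))
```

Indexing with [0] directly would raise an IndexError for the missing is_a line, which is why the fallback is useful.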

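Also, the method that iterates over (key, value) pairs is iteritems (iterate items), not iNteritems. Note that iteritems exists only in Python 2; in Python 3 use items() instead.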
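As a quick illustration of the (key, value) loop, here is a sketch that assumes Python 3, where the method is items(). The one-entry dictionary is invented for the example:

```python
# Example dictionary, invented for this illustration.
id_to_info = {
    "GO:0000012": "biological_process\nsingle strand break repair",
}

# items() yields (key, value) pairs; Python 2's iteritems() yields the same pairs lazily.
rows = [go_id + "\t" + info for go_id, info in id_to_info.items()]
print("\n".join(rows))
```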

Here is the update:

import re 

id_to_info = {} #declare dictionary 

def parse_record(term): 
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)[0] 
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)[0] 
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)[0] 
    is_a = re.findall(r"is_a:\s(.*?)\n", term, re.DOTALL) 
    is_a = is_a[0] if is_a else "" 
    # print namespace, name, is_a 
    info = namespace + "\n" + name + "\n" + is_a 
    id_to_info[go_id] = info 
    for go_id, info in id_to_info.iteritems(): 
     print(go_id + "\t" + info) 

def split_record(record): 
    sp_file = open(record) 
    sp_records = sp_file.read() 
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL) 
    for sp_record in sp_split_records: 
     parse_record(term=sp_record) 
    sp_file.close() 

split_record(record="go.rtf") 

Output:

GO:0000010 molecular_function 
trans-hexaprenyltranstransferase activity 
GO:0016765 ! transferase activity, transferring alkyl or aryl (other 
GO:0000011 biological_process 
vacuole inheritance 
GO:0007033 ! vacuole organization 
GO:0000010 molecular_function 
trans-hexaprenyltranstransferase activity 
GO:0016765 ! transferase activity, transferring alkyl or aryl (other 
GO:0000011 biological_process 
vacuole inheritance 
GO:0007033 ! vacuole organization 
GO:0000010 molecular_function 
trans-hexaprenyltranstransferase activity 
GO:0016765 ! transferase activity, transferring alkyl or aryl (other 
GO:0000012 biological_process 
single strand break repair 

I'll leave the rest of the formatting to you. :-)
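For reference, here is a sketch of one way to finish the formatting. It is not the answerer's code. It assumes Python 3 and plain-text input (a genuine .rtf file would also contain RTF control codes that these patterns do not expect). It keeps every is_a line and prints only after all records are parsed; the repeated blocks in the output above come from printing the whole dictionary inside parse_record on every call. The names field_values, parse_records, and SAMPLE are made up for this illustration.

```python
import re

# Two records copied from the question's sample input (def lines omitted).
SAMPLE = """[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000012
name: single strand break repair
namespace: biological_process
is_a: GO:0006281 ! DNA repair
"""

def field_values(tag, stanza):
    """Return the value of every `tag:` line in one stanza, whitespace-stripped."""
    pattern = r"^" + re.escape(tag) + r":(.*)$"
    return [value.strip() for value in re.findall(pattern, stanza, re.MULTILINE)]

def parse_records(text):
    """Map each GO id to 'namespace\\nname\\nis_a: ...' built from its stanza."""
    id_to_info = {}
    # Stanzas are separated by blank lines; keep only the [Term] ones.
    for stanza in re.split(r"\n\s*\n", text):
        stanza = stanza.strip()
        if not stanza.startswith("[Term]"):
            continue
        ids = field_values("id", stanza)
        if not ids:
            continue
        names = field_values("name", stanza)
        namespaces = field_values("namespace", stanza)
        lines = [namespaces[0] if namespaces else "", names[0] if names else ""]
        # A term may have zero, one, or several parents; keep them all.
        lines.extend("is_a: " + parent for parent in field_values("is_a", stanza))
        id_to_info[ids[0]] = "\n".join(lines)
    return id_to_info

id_to_info = parse_records(SAMPLE)

# Print once, after parsing, with a blank line between records.
for go_id, info in id_to_info.items():
    print(go_id + "\t" + info + "\n")
```

To run it on a file, read the text first, for example with `with open(path) as handle: text = handle.read()`, and pass `text` to parse_records.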

2
import re 

id_to_info = {} #declare dictionary 

def parse_record(term): 
    go_id = re.findall(r"id:\s(.*?)\n", term, re.DOTALL)[0] 
    name = re.findall(r"name:\s(.*?)\n", term, re.DOTALL)[0] 
    namespace = re.findall(r"namespace:\s(.*?)\n", term, re.DOTALL)[0] 
    is_a = re.findall(r'is_a:(.*)', term, re.DOTALL)[0] 
    info = namespace + "\n" + name + "\n" + is_a 
    id_to_info[go_id] = info 
    for go_id, info in id_to_info.iteritems(): 
     print(go_id + "\t" + info) 

def split_record(record): 
    sp_file = open(record) 
    sp_records = sp_file.read() 
    sp_split_records = re.findall(r"(\[.*?)\n\n", sp_records, re.DOTALL) 
    for sp_record in sp_split_records: 
     parse_record(term=sp_record) 
    sp_file.close() 

split_record(record="go.rtf") 

I suggest not relying on an IDE; use the terminal instead, or at least use the interpreter to debug:

Python 2.7.10 (default, Jul 30 2016, 18:31:42) 
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin 
Type "help", "copyright", "credits" or "license" for more information. 
>>> s = """[Term] 
... id: GO:0000010 
... name: trans-hexaprenyltranstransferase activity 
... namespace: molecular_function 
... def: "Catalysis of the reaction: all-trans-hexaprenyl diphosphate + isopentenyl diphosphate = all-trans-heptaprenyl diphosphate + diphosphate." [KEGG:R05612, RHEA:20839] 
... subset: gosubset_prok 
... xref: KEGG:R05612 
... xref: RHEA:20839 
... is_a: GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups""" 
>>> import re 
>>> re.findall(r'is_a:(.*)', s) 
[' GO:0016765 ! transferase activity, transferring alkyl or aryl (other than methyl) groups'] 
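One caveat worth checking in the interpreter the same way: the session above calls re.findall without re.DOTALL, while the function above passes re.DOTALL. The sketch below uses a string invented for this illustration, with two is_a lines, to show how the two calls differ:

```python
import re

# Invented sample with two is_a lines.
s = (
    "id: GO:0000011\n"
    "is_a: GO:0007033 ! vacuole organization\n"
    "is_a: GO:0048308 ! organelle inheritance"
)

# Without DOTALL, "." stops at each newline: one hit per is_a line.
per_line = re.findall(r"is_a:(.*)", s)

# With DOTALL, the greedy ".*" runs to the end of the string: a single hit.
with_dotall = re.findall(r"is_a:(.*)", s, re.DOTALL)

print(per_line)
print(with_dotall)
```

With re.DOTALL, everything after the first is_a: ends up in one string, and a record without any is_a line would make the [0] index raise an IndexError.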

Also, put in plenty of print statements. Python is dynamic, meaning it is not compiled first and then run; it simply runs until it hits an error.

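Your problems:

1) Regular expressions: google around.
2) A typo: iteritems!

You can read about both in the Python docs, which are really good, or pick up any book. You will learn a lot by writing code and experimenting in the interpreter.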

--- A Python enthusiast!