2011-07-29 4 views
2

Case-insensitive string matching and replacement in Python

I am new to Python and trying out a few things. I have two lists from a dictionary.

List1:                       List2:
Anterior                     cord
cuneate nucleus              Medulla oblongata
nucleus                      Spinal cord
Intermediolateral nucleus    Spinal
                             sksdsj
british                      7

And I have lines of text like the following:

<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s> 
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s> 
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s> 
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s> 
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s> 

The code should return only the lines that contain a string from both list1 and list2. So I tried the following code.

However, I get the following result:

<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s> 
<s id="3691284-1">In the medulla oblongata, the arcuate <e1>nucleus</e1> is a group of neurons located on the anterior surface of the medullary pyramids.</s> 
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal.</s> 
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s> 
<s id="69-7">...Meanwhile is the studio 7 album by British pop band 10cc.</s> 

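In result line 4, I got the matching strings from both lists, which is what I want. However, I do not want to get lines that match a string from only one list (e.g. result lines 1 and 3). Also, a line should be tagged when it matches strings from both lists (e.g. result line 2).

Any kind of help would be greatly appreciated.

Answers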

5

Basically, you are trying to put the words from one list in <e1> tags and the words from the other list in <e2> tags. Is that right?

If so, something like this would do it:

#!/usr/bin/python 

from __future__ import print_function 
import re 

text = '''\
<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s> 
<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s> 
<s id="21120-99">Anterior horn cells, motoneurons located in the spinal cord.</s> 
<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>''' 

list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus') 
list2 = ('cord', 'Medulla oblongata', 'Spinal cord') 

# escape each phrase and wrap it in \b so that it matches whole words only
re1 = re.compile("(%s)" % "|".join(r"\b%s\b" % re.escape(i) for i in list1), re.IGNORECASE)
re2 = re.compile("(%s)" % "|".join(r"\b%s\b" % re.escape(i) for i in list2), re.IGNORECASE)

for line in text.split("\n"): 
    line = re1.sub(r"<e1>\1</e1>", line) 
    line = re2.sub(r"<e2>\1</e2>", line) 
    print(line) 

Output:

<s id="5239778-2">The name refers collectively to the <e1>cuneate nucleus</e1> and gracile nucleus, which are present at the junction between the <e2>spinal cord</e2> and the <e2>medulla oblongata</e2>.</s> 
<s id="3691284-1">In the <e2>medulla oblongata</e2>, the arcuate nucleus is a group of neurons located on the <e1>anterior</e1> surface of the medullary pyramids.</s> 
<s id="21120-99"><e1>Anterior</e1> horn cells, motoneurons located in the <e2>spinal cord</e2>.</s> 
<s id="1053949-16">The <e1>Anterior</e1> <e2>cord</e2> syndrome results from injury to the <e1>anterior</e1> part of the <e2>spinal cord</e2>, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the <e2>spinal cord</e2>.</s> 
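
The question also asks for only those lines that contain a term from both lists. A minimal sketch of that filter, built on the same regex idea, might look like the following. The helper names `build_pattern` and `tag_if_both` are hypothetical, and `re.escape` is added on the assumption that list entries may contain regex metacharacters.

```python
import re

list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')


def build_pattern(terms):
    """Compile one case-insensitive, whole-word pattern for all terms (hypothetical helper)."""
    # Try longer terms first so 'Spinal cord' is preferred over 'cord'
    # if both could start at the same position.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(%s)\b" % alternation, re.IGNORECASE)


def tag_if_both(line, re1, re2):
    """Return the tagged line, or None unless both patterns match (hypothetical helper)."""
    if not (re1.search(line) and re2.search(line)):
        return None
    line = re1.sub(r"<e1>\1</e1>", line)
    return re2.sub(r"<e2>\1</e2>", line)


if __name__ == "__main__":
    re1, re2 = build_pattern(list1), build_pattern(list2)
    for sample in ("Anterior horn cells in the spinal cord.",
                   "Anterior horn cells."):
        # The second sample has no list2 term, so None is printed for it.
        print(tag_if_both(sample, re1, re2))
```

Note that a purely numeric term such as 7 would also match inside the id attribute of each <s> tag, so real input may need the pattern applied to the sentence text only.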
+0

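I want to tag the strings from list1 and list2 wherever they match strings in the line – Liza

+0

OK, the code I posted does exactly that. –

+0

Note that my lists also contain numeric strings, which get matched as well. So do I always have to escape this '' part? – Liza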

1

How about this approach:

list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')

result = ""
lines = ['<s id="5239778-2">The name refers collectively to the cuneate nucleus and gracile nucleus, which are present at the junction between the spinal cord and the medulla oblongata.</s>',
'<s id="3691284-1">In the medulla oblongata, the arcuate nucleus is a group of neurons located on the anterior surface of the medullary pyramids.</s>',
'<s id="21120-99">Anterior horn cells, motoneurons located in the spinal cord.</s>',
'<s id="1053949-16">The Anterior cord syndrome results from injury to the anterior part of the spinal cord, causing weakness and loss of pain and thermal sensations below the injury site but preservation of proprioception that is usually carried in the posterior part of the spinal cord.</s>']

for line in lines:
    # Keep the line only if it contains an item from each list.
    # Note: this substring test is case-sensitive.
    if any(item1 in line for item1 in list1) and any(item2 in line for item2 in list2):
        result = result + line + '\n'
print(result)
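
The substring check in this answer is case-sensitive, whereas the question wants 'Anterior' and 'anterior' treated alike. One possible variant, sketched here under the assumption that lowercasing both sides is acceptable for this data, is shown below; `matches_both` and `keep_matching` are hypothetical names.

```python
list1 = ('Anterior', 'cuneate nucleus', 'Intermediolateral nucleus')
list2 = ('cord', 'Medulla oblongata', 'Spinal cord')


def matches_both(line, terms1, terms2):
    """True if the line contains a term from each list, ignoring case."""
    lowered = line.lower()
    has1 = any(t.lower() in lowered for t in terms1)
    has2 = any(t.lower() in lowered for t in terms2)
    return has1 and has2


def keep_matching(lines, terms1, terms2):
    """Return only the lines that contain a term from both lists."""
    return [line for line in lines if matches_both(line, terms1, terms2)]


if __name__ == "__main__":
    samples = ["In the medulla oblongata, the anterior surface.",
               "Anterior horn cells."]
    # Only the first sample contains a term from each list.
    print(keep_matching(samples, list1, list2))
```

Because this is a plain substring test, 'cord' would also match inside a word such as 'record'; a regex with \b word boundaries avoids that.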