2011-05-04 2 views

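I want to search a text file and, if a line contains a keyword and a different keyword is found in the following 3 lines, print that line and the following 3 lines. In Python, how can I move on to the subsequent part of the text after I have already printed a matched part?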

Right now my code prints too much information. Is there a way to skip ahead to the next section of text once a part has already been printed?

text = """ 

here is some text 1 
I want to print out this line and the following 3 lines only once keyword 2 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
I don't want to print this line but I want to start looking for more text starting at this line 6 
Don't print this line 7 
Not this line either 8 
I want to print out this line again and the following 3 lines only once keyword 9 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 
Don't print this line 13 
Start again searching here 14 
etc. 
""" 

text2 = open("tmp.txt","w") 
text2.write(text) 
text2.close() 

searchlines = open("tmp.txt").readlines() 

data = [] 

for m, line in enumerate(searchlines): 
    line = line.lower() 
    if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]): 
     for line2 in searchlines[m:m+4]: 
      data.append(line2) 
print ''.join(data) 

The output right now is:

I want to print out this line and the following 3 lines only once keyword 2 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
I don't want to print this line but I want to start looking for more text starting at this line 6 
I want to print out this line again and the following 3 lines only once keyword 9 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 
Don't print this line 13 
please print this line it has the keyword2 11 
please print this line 12 
Don't print this line 13 
Start again searching here 14 

What I want it to print is:

I want to print out this line and the following 3 lines only once keyword 2 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
I want to print out this line again and the following 3 lines only once keyword 9 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 
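A minimal sketch of the "skip ahead" idea, written for Python 3 (the question's code is Python 2): walk the lines with an explicit index and, when a block qualifies, advance the index past the whole block instead of by one. The name find_blocks and its parameters are hypothetical. It keeps the question's plain substring test and, following the problem statement, only looks for the second keyword in the 3 lines after the first one.

```python
def find_blocks(lines, first="keyword", second="keyword2", size=4):
    """Yield non-overlapping blocks of up to `size` lines (hypothetical helper).

    A block starts at a line containing `first` and is kept only if one of
    the following `size - 1` lines contains `second`.
    """
    i = 0
    while i < len(lines):
        window = lines[i:i + size]
        if first in window[0] and any(second in nxt for nxt in window[1:]):
            yield window
            i += size  # jump past the printed block
        else:
            i += 1     # no block here: try the next line


text = """here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc."""

for block in find_blocks(text.splitlines()):
    print("\n".join(block))
```

Because the index jumps by the block size after a match, lines 3 and 4 (which also contain "keyword") are never treated as new starting points.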

Answers


So, the first keyword "keyword" is a substring of the second keyword "keyword2". That is why I implemented this with regexp objects: that way I can use the word boundary anchor \b.

import re 
from StringIO import StringIO 

text = """ 

here is some text 1 
I want to print out this line and the following 3 lines only once keyword 2 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
I don't want to print this line but I want to start looking for more text starting at this line 6 
Don't print this line 7 
Not this line either 8 
I want to print out this line again and the following 3 lines only once keyword 9 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 
Don't print this line 13 
Start again searching here 14 
etc. 
""" 


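def my_scan(data, search1, search2):
    buffer = []
    for line in data:
        buffer.append(line)
        if len(buffer) > 4:
            buffer.pop(0)  # keep a sliding window of the last 4 lines
        if len(buffer) == 4:  # Valid search block
            # first line must match search1, one of the next 3 lines search2
            if search1.search(buffer[0]) and search2.search("\n".join(buffer[1:])):
                for item in buffer:
                    yield item
                buffer = []  # start a fresh window after the printed block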

# First search term 
s1 = re.compile(r'\bkeyword\b') 
s2 = re.compile(r'\bkeyword2\b') 

for row in my_scan(StringIO(text),s1,s2): 
    print row.rstrip() 

This produces:

I want to print out this line and the following 3 lines only once keyword 2 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
I want to print out this line again and the following 3 lines only once keyword 9 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 

...matching the second search term – MattH
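A small Python 3 check of the word-boundary point in the answer above (the answer's own code is Python 2); the sample strings are taken from the question's data. \b matches between a word character and a non-word character, and since "2" is a word character, r"\bkeyword\b" cannot match inside "keyword2".

```python
import re

first = re.compile(r"\bkeyword\b")    # the whole word "keyword" only
second = re.compile(r"\bkeyword2\b")  # the whole word "keyword2" only

for s in ["only once keyword 2", "since it has a keyword2 3"]:
    print(repr(s), bool(first.search(s)), bool(second.search(s)))
# 'only once keyword 2' True False
# 'since it has a keyword2 3' False True
```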


So you want to print every block of 4 lines that contains 2 or more keywords?

Anyway, here's something I just put together. Maybe you can use it:

text = """ 

here is some text 1 
I want to print out this line and the following 3 lines only once keyword 2 
print this line since it has a keyword2 3 
print this line keyword 4 
print this line 5 
I don't want to print this line but I want to start looking for more text starting at this line 6 
Don't print this line 7 
Not this line either 8 
I want to print out this line again and the following 3 lines only once keyword 9 
please print this line keyword 10 
please print this line it has the keyword2 11 
please print this line 12 
Don't print this line 13 
Start again searching here 14 
etc. 
""".splitlines() 

keywords = ['keyword', 'keyword2']

buffer, kw = [], set()
for line in text:
    if len(buffer) == 0:  # first line of a block
        found = [k for k in keywords if k in line]
        if found:  # start the block once, even if several keywords match
            kw.update(found)
            buffer.append(line)
    else:  # continuation lines
        buffer.append(line)
        for k in keywords:
            if k in line:
                kw.add(k)
        if len(buffer) > 3:
            if len(kw) >= 2:  # just print blocks with enough keywords
                print '\n'.join(buffer)
            buffer, kw = [], set()
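
"keyword" is a substring of "keyword2".

Also, your data implies that you don't want to see line 13, but according to your problem statement it should be printed.

As others have pointed out, if the first keyword is changed from "keyword" to "firstkey", your code works (except for line 13):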

$ diff /tmp/q /tmp/q2 
4c4 
< I want to print out this line and the following 3 lines only once keyword 2 
--- 
> I want to print out this line and the following 3 lines only once firstkey 2 
6c6 
< print this line keyword 4 
--- 
> print this line firstkey 4 
11,12c11,12 
< I want to print out this line again and the following 3 lines only once keyword 9 
< please print this line keyword 10 
--- 
> I want to print out this line again and the following 3 lines only once firstkey 9 
> please print this line firstkey 10 
30c30 
<  if "keyword" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]): 
--- 
>  if "firstkey" in line and any("keyword2" in l.lower() for l in searchlines[m:m+4]): 
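To illustrate the substring point made at the top of this answer, here is a short Python 3 check, plus one alternative to renaming the keyword: compare whitespace-separated tokens instead of substrings. The helper has_word is hypothetical, and it assumes keywords are always surrounded by whitespace (a keyword followed by punctuation, such as "keyword,", would not be found).

```python
def has_word(line, word):
    """Hypothetical helper: True if `word` is a whole whitespace-separated token."""
    return word in line.split()


line = "print this line since it has a keyword2 3"
print("keyword" in line)            # True: plain substring test
print(has_word(line, "keyword"))    # False: "keyword2" is a different token
print(has_word(line, "keyword2"))   # True
```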

First, you could modify your code like this:

text = """ 
0// 
1// here is some text 1 
A2// I want to print out this line and the following 3 lines only once keyword 2 
b3// print this line since it has a keyword2 3 
b4// print this line keyword 4 
b5// print this line 5 
6// I don't want to print this line but I want to start looking for more text starting at this line 6 
7// Don't print this line 7 
8// Not this line either 8 
A9// I want to print out this line again and the following 3 lines only once keyword 9 
b10// please print this line keyword 10 
b11// please print this line it has the keyword2 11 
b12// please print this line 12 
13// Don't print this line 13 
14// Start again searching here 14 
15// etc. 
""" 
searchlines = map(str.lower,text.splitlines(1)) 
# splitlines(1) with argument 1 keeps the newlines 

data,again = [],-1 

for m, line in enumerate(searchlines): 
    if "keyword" in line and m>again and "keyword2" in ''.join(searchlines[m:m+4]): 
     data.extend(searchlines[m:m+4]) 
     again = m+4 

print ''.join(data) 
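Another way to get the same "skip ahead" behaviour, sketched in Python 3 as an alternative to the again counter: pull lines from a single iterator and, when a candidate line is found, take the next three from that same iterator with itertools.islice. Those lines are consumed, so scanning resumes after the block automatically. If the window is rejected, its lines are pushed back with itertools.chain so a keyword line inside it can still start a block. The name scan and its parameters are hypothetical; like the code above, it uses plain substring tests, so a "keyword2" line also counts as containing "keyword".

```python
from itertools import chain, islice


def scan(lines, first="keyword", second="keyword2", after=3):
    """Yield non-overlapping blocks: a line containing `first` plus the
    `after` lines that follow it, kept only if one of them contains `second`.
    Works with any iterable of strings, including an open file."""
    it = iter(lines)
    while True:
        line = next(it, None)
        if line is None:  # input exhausted
            return
        if first not in line:
            continue
        following = list(islice(it, after))  # consumes up to `after` lines
        if any(second in nxt for nxt in following):
            yield [line] + following  # scanning resumes after this block
        else:
            it = chain(following, it)  # rejected: put those lines back


sample = ["a keyword", "b keyword", "c", "d", "e keyword2"]
for block in scan(sample):
    print(block)
```

Here the window starting at "a keyword" is rejected, its lines are put back, and the block starting at "b keyword" is printed.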


Second, a short regex solution:

text = """ 
0// 
1// here is some text 1 
A2// I want to print out this line and the following 3 lines only once keyword 2 
b3// print this line since it has a keyword2 3 
b4// print this line keyword 4 
b5// print this line 5 
6// I don't want to print this line but I want to start looking for more text starting at this line 6 
7// Don't print this line 7 
8// Not this line either 8 
A9// I want to print out this line again and the following 3 lines only once keyword 9 
b10// please print this line keyword 10 
b11// please print this line it has the keyword2 11 
b12// please print this line 12 
13// Don't print this line 13 
14// Start again searching here 14 
15// etc. 
""" 

import re 

regx = re.compile('(^.*?(?<=[ \t]){0}(?=[ \t]).*\r?\n' 
        '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n' 
        '.*?((?<=[ \t]){1}(?=[ \t]))?.*\r?\n' 
        '.*?(?(1)|(?(2)|{1})).*)'.\ 
        format('keyword','keyword2'),re.MULTILINE|re.IGNORECASE) 

print '\n'.join(m.group(1) for m in regx.finditer(text)) 

I'm also assuming that you're not looking for a potential "first-line match" inside a block of three lines. The result:

A2// I want to print out this line and the following 3 lines only once keyword 2 
b3// print this line since it has a keyword2 3 
b4// print this line keyword 4 
b5// print this line 5 
b10// please print this line keyword 10 
b11// please print this line it has the keyword2 11 
b12// please print this line 12 
13// Don't print this line 13 
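For comparison, here is a simpler regex sketch in Python 3. It is not the pattern from the answer above. re.finditer never returns overlapping matches, so once a four-line block has been matched, the search resumes after it, which is the "move on" behaviour asked about. Assumptions: a block is exactly four lines, "keyword2" must appear on one of the three lines after the "keyword" line, and \b keeps "keyword" from matching inside "keyword2".

```python
import re

text = """here is some text 1
I want to print out this line and the following 3 lines only once keyword 2
print this line since it has a keyword2 3
print this line keyword 4
print this line 5
I don't want to print this line but I want to start looking for more text starting at this line 6
Don't print this line 7
Not this line either 8
I want to print out this line again and the following 3 lines only once keyword 9
please print this line keyword 10
please print this line it has the keyword2 11
please print this line 12
Don't print this line 13
Start again searching here 14
etc."""

pattern = re.compile(
    r"^.*\bkeyword\b.*\n"               # first line: the whole word "keyword"
    r"(?=(?:.*\n){0,2}.*\bkeyword2\b)"  # lookahead: keyword2 within the next 3 lines
    r"(?:.*\n){2}.*$",                  # consume those 3 following lines
    re.MULTILINE,
)

# finditer yields non-overlapping matches, so scanning continues after a block.
for m in pattern.finditer(text):
    print(m.group(0))
    print("---")
```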