Python: keep only the letters and put each word on its own line

I have a large text file containing words, numbers, and two kinds of other characters, '|' and '.'. By searching StackOverflow I found out how to take such a string and keep only the letters. For example, given

old_fruits='apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'

the call

re.sub("[^A-Za-z]","",old_fruits)

returns

'applekiwibananapear'

I am trying to write these words to a file with one word per line, each word followed by a newline character, so that it looks like this:

apple 
kiwi 
banana 
pear

Any ideas or pointers in the right direction would be appreciated.


2012-08-13 Levar

Try this:

import re 

old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear' 

with open('fruits.out', 'w') as f: 
    fruits = re.findall(r'[^\W\d]+', old_fruits) 
    f.write('\n'.join(fruits))
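
A note on the pattern, as a minimal sketch rather than part of the answer: `[^\W\d]+` matches runs of word characters that are not digits, which in Python 3 also covers the underscore and non-ASCII letters, while `[A-Za-z]+` is limited to ASCII letters. The sample string below, with an accented word and an underscore, is made up for illustration and does not come from the question.

```python
import re

# Hypothetical sample, not from the question: adds an accented word and an underscore.
sample = 'apple|0.00|na\u00efve|0.5369|snake_case|-0.2437|pear'

word_chars = re.findall(r'[^\W\d]+', sample)   # word characters minus digits
ascii_only = re.findall(r'[A-Za-z]+', sample)  # ASCII letters only

print(word_chars)
print(ascii_only)
```

For data like the question's, which contains only ASCII letters, digits, '|', '.' and '-', both patterns should pick out the same words.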


2012-08-13 03:41:38 fabiocerqueira

Thanks! I tried it and it worked just the way I wanted. – Levar

Using the OP's code as a basis:

import re 
old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear' 

with open('outdata.txt', 'w') as f: 
    f.write('\n'.join(re.sub("[^A-Za-z]"," ",old_fruits).split()))

This gives, in the file 'outdata.txt':

apple
kiwi
banana
pear
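
Note that '\n'.join(...) places newlines between the words only, so the last line of the file has no terminating newline. If every line should end with one, a sketch along these lines may help. The temporary directory is an assumption made so that the example does not overwrite anything; any writable path would do.

```python
import os
import re
import tempfile

old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'
words = re.sub("[^A-Za-z]", " ", old_fruits).split()

# Illustrative location (an assumption); any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'outdata.txt')

with open(path, 'w') as f:
    # Append '\n' to every word, including the last one.
    f.writelines(word + '\n' for word in words)

with open(path) as f:
    content = f.read()

print(repr(content))
```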


2012-08-13 03:38:07 Levon

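You are just putting \n in, splitting (which takes the \n out again), and then putting it back in. –

Doesn't my answer work? Using + matches 1 or more instances and replaces them with \n. I get the feeling that is what you were trying to do –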

This is not a hard one:

print(re.sub("[^A-Za-z]+", "\n", old_fruits))  # the re.sub(...) result is the string you want

The "+" means that each run of 1 or more non-alpha characters is replaced with a single \n. I don't know if this is the best way, though.
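
One edge case worth checking, shown as a small sketch: if the string starts or ends with non-letter characters, this substitution leaves a newline at that end. The `messy` string is a made-up variant of the question's data, and `strip` is only one possible way to deal with it.

```python
import re

# Hypothetical input that begins and ends with numeric fields.
messy = '0.00|apple|0.00|kiwi|0.5369'

raw = re.sub("[^A-Za-z]+", "\n", messy)
cleaned = raw.strip("\n")  # drop the leading/trailing newline

print(repr(raw))
print(repr(cleaned))
```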


2012-08-13 03:39:18

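of = old_fruits.split("|")
with open("fruits.txt", "w") as f:  # file name is illustrative
    for field in of:
        # A fixed step of 2 would also pick up 0.5369, so filter instead
        if field.isalpha():
            f.write(field + "\n")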


2012-08-13 03:40:58

You can do this without using regular expressions. Split the string on the pipe character, use a generator expression and the built-in str.isalpha() method to keep only the words that consist entirely of alphabetic characters, and join them to form the final output:

old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'
words = (word for word in old_fruits.split('|') if word.isalpha())
new_fruits = '\n'.join(words)

print(new_fruits)

The output is

apple 
kiwi 
banana 
pear

as desired (it is not written to a file, but I assume you can cope with that).
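
Since the question mentions a large file, the same filter can also be applied one line at a time instead of to a single string. The sketch below assumes each input line has the pipe-separated layout shown in the question. `io.StringIO` stands in for the real input and output files, and `extract_words` is a hypothetical helper name.

```python
import io

def extract_words(lines):
    """Yield the purely alphabetic fields from pipe-separated lines."""
    for line in lines:
        for field in line.rstrip('\n').split('|'):
            if field.isalpha():
                yield field

# In-memory stand-ins for a real input file and a real output file.
source = io.StringIO('apple|0.00|kiwi|0.00\n0.5369|-0.2437|banana|0.00|pear\n')
target = io.StringIO()

for word in extract_words(source):
    target.write(word + '\n')

print(target.getvalue())
```

Because the helper is a generator, only one line of input needs to be held in memory at a time.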

Edit: I knocked up a quick script to give some timing comparisons of the non-regex version versus a pre-compiled regex:

import timeit

# Setup - not counted in the timing so it doesn't matter we include regex for both tests
setup = r"""old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'
import re
fruit_re=re.compile(r'[^\W\d]+')
"""

no_re = r"""words = (word for word in old_fruits.split('|') if word.isalpha())
new_fruits = '\n'.join(words)"""

with_re = r"""new_fruits = '\n'.join(fruit_re.findall(old_fruits))"""

num = 10000

print("Short input")
t = timeit.timeit(no_re, setup, number=num)
print("No regex: {0:.2f} microseconds to run".format((t*1e6)/num))
t = timeit.timeit(with_re, setup, number=num)
print("With regex: {0:.2f} microseconds to run".format((t*1e6)/num))

print("")
print("100 times longer input")

setup = r"""old_fruits = 'apple|0.00|kiwi|0.00|0.5369|-0.2437|banana|0.00|pear'*100
import re
fruit_re=re.compile(r'[^\W\d]+')"""

t = timeit.timeit(no_re, setup, number=num)
print("No regex: {0:.2f} microseconds to run".format((t*1e6)/num))
t = timeit.timeit(with_re, setup, number=num)
print("With regex: {0:.2f} microseconds to run".format((t*1e6)/num))

Results on my computer:

Short input
No regex: 18.31 microseconds to run
With regex: 15.37 microseconds to run

100 times longer input
No regex: 793.79 microseconds to run
With regex: 999.08 microseconds to run

So the regex is faster on the short input string, while the generator expression is faster on the longer one (at least on my computer - Ubuntu Linux, Python 2.7 - your results may vary).


2012-08-13 03:52:58 Blair

Thanks. That works well too. – Levar

@Levar - I updated the answer with a quick test of the speed of the regex versus the generator. – Blair
