2017-12-18 4 views
-1

유전자 서열 파일이 있는데 각 유전자의 헤더를 바꾸고 싶습니다. 여기에 입력은 다음과 같습니다특정 문자열로 시작하지만 해당 줄에 특정 단어를 유지하는 줄을 삭제하는 방법은 무엇입니까?

>lcl|CP000046.1_cds_AAW37389.1_1 [gene=dnaA] [locus_tag=SACOL0001] [protein=chromosomal replication initiator protein DnaA] [protein_id=AAW37389.1] [location=544..1905] [gbkey=CDS] 
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACTCAACTTTCCTAAAAGATACTGAGCTTTACACGATTAAAGATGGTGAAGCTATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGCTATGAAGTTAAACCTCACTTTATTACTCTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTCCAAAAGAAACAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAATGCCCATAACACATTTGACACTTTTGTAATCGGACCCGGTAACCGCTTTCCACATGCAGCGAGTTTAGCTGTGGCCGAAGCACCAGCCAAAGCGTACAATCCATTATTTATCTATGGAGGTGTTGGTTTA 

>lcl|CP000046.1_cds_AAW37390.1_2 [gene=dnaN] [locus_tag=SACOL0002] [protein=DNA polymerase III, beta subunit] [protein_id=AAW37390.1] [location=2183..3316] [gbkey=CDS] 
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCTATATTAACTGGTATCAAAATCGATGCGAAAGAACATGAAGTTATATTAACTGGTTCAGACTCTGAAATTTCAATAGAAATCACTATTCCTAAAACTGTAGATGGCGAAGATATTGTCAATATTTCAGAAACAGGCTCAGTAGTACTTCCTGGACGATTCTTTGTTGATATTATAAAAAAATTACCTGGTAAAGATGTTAAATTATCTACAAATGAACAATTCCAGACATTAATTACATCAGGTCATTCTGAATTTAATTTAAGTGGCTTAGATCCAGATCAATATCCTTTATTACCTCAAGTTTCTAGAGATG 

예상 출력 : 나는

sed '/^>/ d' inputfile > outputfile 

나오지하지만 얻을 수있는 아이디어를 얻고 있지 않다으로 특정 단어를 congaing 라인을 삭제하는 방법을 알고

>Saureus1|SACOL0001 
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACTCAACTTTCCTAAAAGATACTGAGCTTTACACGATTAAAGATGGTGAAGCTATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGCTATGAAGTTAAACCTCACTTTATTACTCTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTCCAAAAGAAACAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAATGCCCATAACACATTTGACACTTTTGTAATCGGACCCGGTAACCGCTTTCCACATGCAGCGAGTTTAGCTGTGGCCGAAGCACCAGCCAAAGCGTACAATCCATTATTTATCTATGGAGGTGTTGGTTTA 

>Saureus1|SACOL0002 
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCTATATTAACTGGTATCAAAATCGATGCGAAAGAACATGAAGTTATATTAACTGGTTCAGACTCTGAAATTTCAATAGAAATCACTATTCCTAAAACTGTAGATGGCGAAGATATTGTCAATATTTCAGAAACAGGCTCAGTAGTACTTCCTGGACGATTCTTTGTTGATATTATAAAAAAATTACCTGGTAAAGATGTTAAATTATCTACAAATGAACAATTCCAGACATTAATTACATCAGGTCATTCTGAATTTAATTTAAGTGGCTTAGATCCAGATCAATATCCTTTATTACCTCAAGTTTCTAGAGATG 

예상되는 출력. 여기서 첫 번째 부분에서는 SACOL00을 제외하고 유전자 머리글의 모든 텍스트를 삭제해야하며 그 전에는 fasta sysmbol ">"을 Strain name으로 유지해야합니다.

이런 종류의 질문이 반복되면 용서해주십시오. GNU와

+0

인용 태그보다는 같이 귀하의 샘플에 대한 코드 태그를 사용하십시오. – RavinderSingh13

+0

[editing-help] (http://stackoverflow.com/editing-help)를보십시오. – Cyrus

답변

1

Awk 솔루션 :

awk '/^>lcl/{ gsub(/^\[[^=]+=|\]$/,"",$3); printf ">Saureus1|%s\n",$3; next }1' file 

출력 :

>Saureus1|SACOL0001 
ATGTCGGAAAAAGAAATTTGGGAAAAAGTGCTTGAAATTGCTCAAGAAAAATTATCAGCTGTAAGTTACTCAACTTTCCTAAAAGATACTGAGCTTTACACGATTAAAGATGGTGAAGCTATCGTATTATCGAGTATTCCTTTTAATGCAAATTGGTTAAATCAACAATATGCTGAAATTATCCAAGCAATCTTATTTGATGTTGTAGGCTATGAAGTTAAACCTCACTTTATTACTCTGAAGAATTAGCAAATTATAGTAATAATGAAACTGCTACTCCAAAAGAAACAACAAAACCTTCTACTGAAACAACTGAGGATAATCATGTGCTTGGTAGAGAGCAATTCAATGCCCATAACACATTTGACACTTTTGTAATCGGACCCGGTAACCGCTTTCCACATGCAGCGAGTTTAGCTGTGGCCGAAGCACCAGCCAAAGCGTACAATCCATTATTTATCTATGGAGGTGTTGGTTTA 

>Saureus1|SACOL0002 
ATGATGGAATTCACTATTAAAAGAGATTATTTTATTACACAATTAAATGACACATTAAAAGCTATTTCACCAAGAACAACATTACCTATATTAACTGGTATCAAAATCGATGCGAAAGAACATGAAGTTATATTAACTGGTTCAGACTCTGAAATTTCAATAGAAATCACTATTCCTAAAACTGTAGATGGCGAAGATATTGTCAATATTTCAGAAACAGGCTCAGTAGTACTTCCTGGACGATTCTTTGTTGATATTATAAAAAAATTACCTGGTAAAGATGTTAAATTATCTACAAATGAACAATTCCAGACATTAATTACATCAGGTCATTCTGAATTTAATTTAAGTGGCTTAGATCCAGATCAATATCCTTTATTACCTCAAGTTTCTAGAGATG 
+0

두 솔루션 모두 작동 중입니다. –

관련 문제