SnakeMake 파이프 라인을 시도하고 있는데 실제로 이해가 안되는 오류에 집착했습니다.Snakemake : Snakemake 파이프 라인의 MissingInputException
내가 입력 파일이있는 디렉토리 (raw_data
)을 가지고 :
ll /home/nico/labo/etudes/Optimal/data/raw_data
total 41M
drwxrwxr-x 2 nico nico 4,0K mars 6 16:09 ./
drwxrwxr-x 11 nico nico 4,0K mars 6 16:14 ../
-rw-rw-r-- 1 nico nico 15M févr. 27 12:21 sampleA_R1.fastq.gz
-rw-rw-r-- 1 nico nico 19M févr. 27 12:22 sampleA_R2.fastq.gz
-rw-rw-r-- 1 nico nico 3,4M févr. 27 12:21 sampleB_R1.fastq.gz
-rw-rw-r-- 1 nico nico 4,3M févr. 27 12:22 sampleB_R2.fastq.gz
이 디렉토리는이 개 샘플 4 개 파일이 포함되어 있습니다. 마지막으로 SnakeMake이 snakefile_bwa_samtools.py
파일
{
"fastqExtension": "fastq.gz",
"fastqDir": "/home/nico/labo/etudes/Optimal/data/raw_data",
"outputDir": "/home/nico/labo/etudes/Optimal/data/mapping_BaL",
"logDir": "logs",
"reference": {
"fasta": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta",
"index": "/home/nico/labo/references/genomes/HIV1/BaL_AY713409/BaL_AY713409.fasta.bwt"
}
}
그리고 :
import subprocess
from os.path import join
### Globals ---------------------------------------------------------------------
# A Snakemake regular expression matching fastq files.
SAMPLES, = glob_wildcards(join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]))
print(SAMPLES)
### Rules -----------------------------------------------------------------------
# Pipeline output files
rule all:
input: expand(join(config["outputDir"], "{sample}.bam.bai"), sample=SAMPLES)
# Reads alignment on reference genome and BAM file creation
rule bwa_mem_to_bam:
input:
index = config["reference"]["index"],
fasta = config["reference"]["fasta"],
fq1_ID = "{sample}_R1."+config["fastqExtension"],
fq2_ID = "{sample}_R2."+config["fastqExtension"],
fq1 = join(config["fastqDir"], "{sample}_R1."+config["fastqExtension"]),
fq2 = join(config["fastqDir"], "{sample}_R2."+config["fastqExtension"])
output:
temp(join(config["outputDir"], "{sample}.bamUnsorted"))
version:
subprocess.getoutput(
"man bwa | tail -n 1 | cut -d ' ' -f 1 | cut -d '-' -f 2"
)
log:
join(config["outputDir"], config["logDir"], "{sample}.bwa_mem.log")
message:
"Alignment of {input.fq1_ID} and {input.fq2_ID} on {input.fasta} with BWA version {version}."
shell:
"bwa mem {input.fasta} {input.fq1} {input.fq2} 2> {log} | samtools view -Sbh - > {output}"
# Sorting the BAM files on genomic positions
rule bam_sort:
input:
join(config["outputDir"], "{sample}.bamUnsorted")
output:
join(config["outputDir"], "{sample}.bam")
log:
join(config["outputDir"], config["logDir"], "{sample}.samtools_sort.log")
version:
subprocess.getoutput(
"samtools --version | "
"head -1 | "
"cut -d' ' -f2"
)
message:
"Genomic sorting of {input} with samtools version {version}."
shell:
"samtools sort -f {input} {output} 2> {log}"
# Indexing the BAM files
rule bam_index:
input:
join(config["outputDir"], "{sample}.bam")
output:
join(config["outputDir"], "{sample}.bam.bai")
message:
"Indexing {input}."
shell:
"samtools index {input}"
내가이 파이프 라인 실행
snakemake --cores 3 --snakefile /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py --configfile /home/nico/labo/etudes/Optimal/data/snakemake_config_files/config_snakemake_Optimal_mapping_BaL.json
을 나는 ' 나는 config_snakemake_Optimal_mapping_BaL.json
를라는 SnakeMake 파이프 라인의 구성 JSON 파일을 생성 다음 오류 출력이 표시됩니다.
['sampleB', 'sampleA']
MissingInputException in line 18 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
Missing input files for rule bwa_mem_to_bam:
sampleB_R1.fastq.gz
sampleB_R2.fastq.gz
또는 순간을 따라 :
['sampleB', 'sampleA']
PeriodicWildcardError in line 40 of /home/nico/labo/scripts/pipeline_illumina/snakefile_bwa_samtools.py:
The value _unsorted in wildcard sample is periodically repeated (sampleB_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted_unsorted). This would lead to an infinite recursion. To avoid this, e.g. restrict the wildcards in this rule to certain values.
가 목록에 표시 (출력의 종류의 첫 번째 줄) 내가 반드시 규칙 bwa_mem_to_bam
에 와일드 카드로 장난 해요로 샘플이 제대로 감지, 그러나 나는 정말로 이유를 알지 못한다. .. 어떤 단서?