Building consensus genome from raw fastq reads¶
- Notebook version: v0.0.1
- Created by: Dr. Hiren Ghosh, Imperial BRC Genomics Facility
- Maintained by: Imperial BRC Genomics Facility
- Docker image: imperialgenomicsfacility/viral-genome-analysis-notebooks
- Github repository: imperial-genomics-facility/viral-genome-notebook-image
- Created on: 2020-April-21 14:42
- Contact us: Imperial BRC Genomics Facility
- License: Apache License 2.0
Configure notebook for run¶
[1]:
## Number of CPU to use
CPU_THREADS = 1
## Default value for running the notebook in binder
MEM_LIMIT_GB = 2
MEM_LIMIT_BYTES = MEM_LIMIT_GB * 1000000000
## Download raw fastq instead of SRA format, faster on binder
FETCH_RAW_FASTQ = True
## subsample reads to 1M to run in binder, set it to zero to disable
SUBSAMPLE_READ = 1000000
## A toggle for running assembly on binder, set it to 0 to disable
RUN_ASSEMBLY = 1
## List of k-mers to use for de-novo assembly
ASSEMBLY_KMERS = '27,31'
## Accession id of the reference genome
REFERENCE_fasta = 'NC_045512.2'
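Before starting a long run it can help to sanity-check the k-mer setting. The sketch below parses an ASSEMBLY_KMERS-style string and checks it against the constraints that MEGAHIT's help text gives for --k-list (odd values, in the range 15-255, increments of at most 28). The function name parse_kmer_list is hypothetical and not part of this notebook, and the requirement that the list is ascending is an assumption.

```python
def parse_kmer_list(kmers):
    """Parse a comma-separated k-mer string and check the assumed --k-list rules."""
    values = [int(k) for k in kmers.split(',')]
    for k in values:
        # every k must be odd and within 15-255
        if k % 2 == 0 or not 15 <= k <= 255:
            raise ValueError('k-mer size {0} must be odd and within 15-255'.format(k))
    for prev, cur in zip(values, values[1:]):
        # assumed: ascending order, each step no larger than 28
        if cur <= prev or cur - prev > 28:
            raise ValueError('k-mer sizes must increase in steps of at most 28')
    return values

print(parse_kmer_list('27,31'))
```

Calling it on the value used above returns the list of integers; an even value such as '26,31' raises a ValueError before any assembly time is spent.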
Prepare sample list¶
[2]:
## we only have one sample in the list along with the fastq files for the sample
list_of_samples_data = [
    {'sample_name': 'SRR10971381',
     'fastq_files': [
         '/tmp/SRR10971381_1.fastq.gz',
         '/tmp/SRR10971381_2.fastq.gz']}
]
Load required python libraries¶
[3]:
import os, requests
Fetch fastq files from SRA¶
[4]:
%%time
if FETCH_RAW_FASTQ:
    ## Download raw fastq from SRA (fast)
    !wget -O /tmp/SRR10971381_1.fastq.gz https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R1.fastq.gz.1
    !wget -O /tmp/SRR10971381_2.fastq.gz https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R2.fastq.gz.1
else:
    ## Download reads in SRA format and then convert them to fastq (slow)
    !wget -O /tmp/SRR10971381 https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/010714/SRR10971381
    ## Convert SRA format data to fastq format
    !fastq-dump --split-files --gzip --outdir /tmp /tmp/SRR10971381
--2020-04-21 12:12:18-- https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R1.fastq.gz.1
Resolving sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)... 52.216.78.124
Connecting to sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)|52.216.78.124|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2739477612 (2.6G) [application/x-troff-man]
Saving to: '/tmp/SRR10971381_1.fastq.gz'
/tmp/SRR10971381_1. 100%[===================>] 2.55G 29.4MB/s in 80s
2020-04-21 12:13:38 (32.5 MB/s) - '/tmp/SRR10971381_1.fastq.gz' saved [2739477612/2739477612]
--2020-04-21 12:13:41-- https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R2.fastq.gz.1
Resolving sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)... 52.216.147.140
Connecting to sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)|52.216.147.140|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2838458153 (2.6G) [application/x-troff-man]
Saving to: '/tmp/SRR10971381_2.fastq.gz'
/tmp/SRR10971381_2. 100%[===================>] 2.64G 30.1MB/s in 82s
2020-04-21 12:15:03 (32.8 MB/s) - '/tmp/SRR10971381_2.fastq.gz' saved [2838458153/2838458153]
CPU times: user 3.05 s, sys: 721 ms, total: 3.77 s
Wall time: 2min 46s
Sub-sample reads for binder run¶
This step is optional. It subsamples the raw fastq files down to the number of reads set in SUBSAMPLE_READ so that the notebook can run within Binder's resource limits. Both files are sampled with the same seed (-s100), which keeps the read pairs in sync.
[5]:
%%time
## following step may take some time to run
for entry in list_of_samples_data:
    if SUBSAMPLE_READ > 0:
        sample_name = entry.get('sample_name')
        fastq_files = entry.get('fastq_files')
        R1_fastq = fastq_files[0]
        R2_fastq = fastq_files[1]
        R1_sub_fastq = '/tmp/{0}_sub_1.fastq'.format(sample_name)
        R2_sub_fastq = '/tmp/{0}_sub_2.fastq'.format(sample_name)
        ## running seqtk to subsample files
        print('subsampling reads for sample {0} R1'.format(sample_name))
        !seqtk sample -2 -s100 $R1_fastq $SUBSAMPLE_READ > $R1_sub_fastq
        print('subsampling reads for sample {0} R2'.format(sample_name))
        !seqtk sample -2 -s100 $R2_fastq $SUBSAMPLE_READ > $R2_sub_fastq
        entry.update({'subsample_fastq_files':[R1_sub_fastq,R2_sub_fastq]})
subsampling reads for sample SRR10971381 R1
subsampling reads for sample SRR10971381 R2
CPU times: user 9.85 s, sys: 1.7 s, total: 11.5 s
Wall time: 10min 29s
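Using one seed for both files matters because the R1 and R2 files list the mates in the same order. The toy sketch below is not seqtk's implementation; it only illustrates that a seeded random choice of read positions is reproducible, so the same positions are kept in both files. The helper name sample_indices and the toy read names are made up for illustration.

```python
import random

def sample_indices(n_reads, n_sample, seed):
    """Pick which read positions to keep; the same seed gives the same positions."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_reads), min(n_sample, n_reads)))

# toy paired files: mate i of R1 belongs to mate i of R2
r1 = ['read{0}/1'.format(i) for i in range(10)]
r2 = ['read{0}/2'.format(i) for i in range(10)]

keep_r1 = sample_indices(len(r1), 4, seed=100)
keep_r2 = sample_indices(len(r2), 4, seed=100)
sub_r1 = [r1[i] for i in keep_r1]
sub_r2 = [r2[i] for i in keep_r2]
print(list(zip(sub_r1, sub_r2)))
```

With different seeds the two selections would generally differ and the mates would no longer line up.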
Check for viral DNA contamination using Fastv¶
Fetch required reference genomes and list of unique k-mers¶
[6]:
## fetch coronavirus genomes
!wget -O /tmp/SARS2_153_complete_genomes_20200329.fasta \
https://storage.googleapis.com/sars-cov-2/SARS2_153_complete_genomes_20200329.fasta
## fetch unique kmers for coronavirus
!wget -O /tmp/SARS-CoV-2.kmer.fa \
http://opengene.org/fastv/data/SARS-CoV-2.kmer.fa
--2020-04-21 12:25:34-- https://storage.googleapis.com/sars-cov-2/SARS2_153_complete_genomes_20200329.fasta
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.112.128, 2607:f8b0:4001:c07::80
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.112.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4662317 (4.4M) [application/octet-stream]
Saving to: '/tmp/SARS2_153_complete_genomes_20200329.fasta'
/tmp/SARS2_153_comp 100%[===================>] 4.45M --.-KB/s in 0.07s
2020-04-21 12:25:34 (62.2 MB/s) - '/tmp/SARS2_153_complete_genomes_20200329.fasta' saved [4662317/4662317]
--2020-04-21 12:25:35-- http://opengene.org/fastv/data/SARS-CoV-2.kmer.fa
Resolving opengene.org (opengene.org)... 47.90.42.109
Connecting to opengene.org (opengene.org)|47.90.42.109|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7632 (7.5K) [application/octet-stream]
Saving to: '/tmp/SARS-CoV-2.kmer.fa'
/tmp/SARS-CoV-2.kme 100%[===================>] 7.45K --.-KB/s in 0s
2020-04-21 12:25:35 (886 MB/s) - '/tmp/SARS-CoV-2.kmer.fa' saved [7632/7632]
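The k-mer file downloaded above lists sequences that are treated as specific to SARS-CoV-2. As a rough illustration of how such a list can be used for screening, the toy sketch below counts reads that contain at least one marker k-mer. It is not fastv's algorithm (for instance, it ignores reverse complements and sequencing errors), and the function names and toy sequences are invented for this example.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def count_reads_with_marker(reads, marker_kmers, k):
    """Count reads sharing at least one k-mer with the marker set."""
    hits = 0
    for read in reads:
        if kmers(read, k) & marker_kmers:
            hits += 1
    return hits

# toy marker set built from a short made-up sequence
markers = kmers('ACGTTGCA', 5)
reads = ['TTACGTTGG', 'GGGGGGGGG', 'CGTTGCAAA']
print(count_reads_with_marker(reads, markers, 5))
```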
Prepare function for fastv run¶
[7]:
## run fastv function
def run_fastv(sample_name,ref_genome,ref_kmers,R1_fastq,R2_fastq,
              output_path='fastv_output'):
    '''
    A function for running the fastv tool on paired-end fastq data
    :param sample_name: Sample name
    :param ref_genome: Reference genome fasta file
    :param ref_kmers: Reference k-mers fasta file
    :param R1_fastq: Path for R1 fastq file
    :param R2_fastq: Path for R2 fastq file
    :param output_path: Output dir path, default fastv_output in current dir
    :returns: fastv_html_output,fastv_json_output,fastv_log_output
    '''
    try:
        output_path = os.path.abspath(output_path)
        !mkdir -p $output_path
        print('running fastv for sample {0}'.format(sample_name))
        fastv_html_output = os.path.join(output_path,'{0}.fastv.html'.format(sample_name))
        fastv_json_output = os.path.join(output_path,'{0}.fastv.json'.format(sample_name))
        fastv_log_output = os.path.join(output_path,'{0}.fastv.log'.format(sample_name))
        !~/bin/fastv \
          -i $R1_fastq \
          -I $R2_fastq \
          -k $ref_kmers \
          -g $ref_genome \
          -h $fastv_html_output \
          -j $fastv_json_output \
          --thread $CPU_THREADS 2> $fastv_log_output
        return fastv_html_output,fastv_json_output,fastv_log_output
    except Exception as e:
        raise ValueError(
                'Failed to run fastv for sample {0}, error: {1}'.format(sample_name,e))
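Commands launched with IPython's ! syntax do not raise a Python exception when the command exits with an error, so the try/except in the function above may not notice a failed tool run. One possible alternative, sketched here under that assumption, is a small wrapper around subprocess.run with check=True. The name run_command is hypothetical, and the Python interpreter stands in for a real tool so that the example runs anywhere.

```python
import subprocess
import sys

def run_command(args, log_file=None):
    """Run a command given as a list of arguments; raise CalledProcessError on a non-zero exit."""
    if log_file is None:
        return subprocess.run(args, check=True)
    # optionally redirect stderr to a log file, like '2> file' in the shell
    with open(log_file, 'w') as log:
        return subprocess.run(args, check=True, stderr=log)

# a command that succeeds
run_command([sys.executable, '-c', 'print("ok")'])

# a command that fails now surfaces as an exception
try:
    run_command([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as e:
    print('command failed with exit code {0}'.format(e.returncode))
```

Passing the arguments as a list also avoids shell quoting problems with unusual file names.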
Run fastv for all samples¶
[8]:
%%time
for entry in list_of_samples_data:
    sample_name = entry.get('sample_name')
    if SUBSAMPLE_READ > 0:
        fastq_files = entry.get('subsample_fastq_files')
    else:
        fastq_files = entry.get('fastq_files')
    R1_fastq = fastq_files[0]
    R2_fastq = fastq_files[1]
    fastv_html_output,fastv_json_output,fastv_log_output =\
      run_fastv(
        sample_name=sample_name,
        ref_genome='/tmp/SARS2_153_complete_genomes_20200329.fasta',
        ref_kmers='/tmp/SARS-CoV-2.kmer.fa',
        R1_fastq=R1_fastq,
        R2_fastq=R2_fastq)
    entry.update(
      {'fastv_files':[fastv_html_output,fastv_json_output,fastv_log_output]})
running fastv for sample SRR10971381
CPU times: user 2 s, sys: 334 ms, total: 2.33 s
Wall time: 2min 19s
[9]:
## now we have the list of output files appended in the sample list
list_of_samples_data
[9]:
[{'sample_name': 'SRR10971381',
'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
'/tmp/SRR10971381_2.fastq.gz'],
'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
'/tmp/SRR10971381_sub_2.fastq'],
'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.log']}]
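The choice between the subsampled and the full fastq files is made with the same if/else in more than one cell. A small helper could hold that logic in one place; the sketch below is one way to write it. The function name select_fastq_files is hypothetical and not part of this notebook.

```python
def select_fastq_files(entry, subsample_read):
    """Return (R1, R2) paths, using the subsampled files when subsampling is enabled."""
    key = 'subsample_fastq_files' if subsample_read > 0 else 'fastq_files'
    fastq_files = entry.get(key)
    if not fastq_files or len(fastq_files) != 2:
        raise ValueError(
            'Expected two fastq files under {0} for sample {1}'.format(
                key, entry.get('sample_name')))
    return fastq_files[0], fastq_files[1]

example_entry = {
    'sample_name': 'SRR10971381',
    'fastq_files': ['/tmp/SRR10971381_1.fastq.gz', '/tmp/SRR10971381_2.fastq.gz'],
    'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq', '/tmp/SRR10971381_sub_2.fastq']}
print(select_fastq_files(example_entry, 1000000))
```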
De-novo assembly of viral genome using Megahit¶
Prepare function for Megahit assembly run¶
[10]:
def run_megahit_assembly(sample_name,R1_fastq,R2_fastq,output_path='megahit_output'):
    '''
    A function for running megahit de-novo assembly on paired-end fastq data
    :param sample_name: Sample name
    :param R1_fastq: Path for R1 fastq file
    :param R2_fastq: Path for R2 fastq file
    :param output_path: Output dir path, default megahit_output in current dir
    :returns: output_dir,fastg_output
    '''
    try:
        output_path = os.path.abspath(output_path)
        !mkdir -p $output_path
        output_dir = os.path.join(output_path,'megahit_assembly_{0}'.format(sample_name))
        print('running de-novo assembly using megahit for sample {0}'.format(sample_name))
        !megahit \
          -1 $R1_fastq \
          -2 $R2_fastq \
          -o $output_dir \
          --k-list $ASSEMBLY_KMERS \
          --num-cpu-threads $CPU_THREADS \
          --memory $MEM_LIMIT_BYTES \
          --tmp-dir /tmp
        ## the last entry of the k-mer list is taken as the largest k
        max_kmer = ASSEMBLY_KMERS.split(',')[-1]
        fastg_input = \
          os.path.join(
            output_dir,
            'intermediate_contigs',
            'k{0}.contigs.fa'.format(max_kmer))
        fastg_output = os.path.join(output_dir,'{0}_k{1}.fastg'.format(sample_name,max_kmer))
        print('converting de-novo assembly to fastg for sample {0}'.format(sample_name))
        !megahit_toolkit contig2fastg $max_kmer $fastg_input > $fastg_output
        return output_dir,fastg_output
    except Exception as e:
        raise ValueError(
                'Failed to run megahit for sample {0}, error: {1}'.format(sample_name,e))
Run de-novo assembly for all the samples¶
[11]:
%%time
## following step may take up to 20 min
if RUN_ASSEMBLY > 0:
    for entry in list_of_samples_data:
        sample_name = entry.get('sample_name')
        if SUBSAMPLE_READ > 0:
            fastq_files = entry.get('subsample_fastq_files')
        else:
            fastq_files = entry.get('fastq_files')
        R1_fastq = fastq_files[0]
        R2_fastq = fastq_files[1]
        megahit_output_dir,fastg_output = \
          run_megahit_assembly(
            sample_name=sample_name,
            R1_fastq=R1_fastq,
            R2_fastq=R2_fastq)
        entry.update({'megahit_output_dir':megahit_output_dir,'megahit_fastg':fastg_output})
running de-novo assembly using megahit for sample SRR10971381
2020-04-21 12:27:56 - MEGAHIT v1.2.9
2020-04-21 12:27:56 - Using megahit_core with POPCNT and BMI2 support
2020-04-21 12:27:56 - Convert reads to binary library
2020-04-21 12:28:00 - b'INFO sequence/io/sequence_lib.cpp : 77 - Lib 0 (/tmp/SRR10971381_sub_1.fastq,/tmp/SRR10971381_sub_2.fastq): pe, 2000000 reads, 151 max length'
2020-04-21 12:28:00 - b'INFO utils/utils.h : 152 - Real: 3.4744\tuser: 2.2256\tsys: 1.1238\tmaxrss: 164380'
2020-04-21 12:28:00 - Start assembly. Number of CPU threads 1
2020-04-21 12:28:00 - k list: 27,31
2020-04-21 12:28:00 - Memory used: 2000000000
2020-04-21 12:28:00 - Extract solid (k+1)-mers for k = 27
2020-04-21 12:30:02 - Build graph for k = 27
2020-04-21 12:31:32 - Assemble contigs from SdBG for k = 27
2020-04-21 12:37:34 - Local assembly for k = 27
2020-04-21 12:37:55 - Extract iterative edges from k = 27 to 31
2020-04-21 12:38:22 - Build graph for k = 31
2020-04-21 12:38:49 - Assemble contigs from SdBG for k = 31
2020-04-21 12:42:07 - Merging to output final contigs
2020-04-21 12:42:07 - 9056 contigs, total 2427140 bp, min 200 bp, max 8614 bp, avg 268 bp, N50 253 bp
2020-04-21 12:42:08 - ALL DONE. Time elapsed: 851.511862 seconds
converting de-novo assembly to fastg for sample SRR10971381
CPU times: user 13.2 s, sys: 2 s, total: 15.3 s
Wall time: 14min 17s
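The MEGAHIT summary line above reports an N50 value. As a reminder of what that statistic means, the sketch below computes it from a list of contig lengths: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly size. The function name n50 and the toy lengths are illustrative only; this is not how MEGAHIT computes its report.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError('no contig lengths given')

# toy example: total 20 bp, half is 10 bp, reached at the 5 bp contig (6 + 5 = 11)
print(n50([2, 3, 4, 5, 6]))
```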
[12]:
## now we have the list of assembly output files present in the sample list
list_of_samples_data
[12]:
[{'sample_name': 'SRR10971381',
'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
'/tmp/SRR10971381_2.fastq.gz'],
'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
'/tmp/SRR10971381_sub_2.fastq'],
'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.log'],
'megahit_output_dir': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381',
'megahit_fastg': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381/SRR10971381_k31.fastg'}]
Map raw reads on reference genome¶
Prepare function for reference genome fetching¶
[13]:
def fetch_genome_fasta_from_ncbi(refseq_id,output_path='.',file_format='fasta'):
    '''
    A function for fetching the genome fasta sequences from NCBI
    :param refseq_id: NCBI genome id
    :param output_path: Path to dump genome files, default '.'
    :param file_format: Output file format, default fasta, supported formats are 'fasta' and 'gb'
    :returns: output_file
    '''
    try:
        output_path = os.path.abspath(output_path)
        !mkdir -p $output_path
        url = \
          'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id={0}&rettype={1}'.\
            format(refseq_id,file_format)
        r = requests.get(url)
        if r.status_code != 200:
            raise ValueError('Failed to download file for {0}, http status code {1}'.format(refseq_id,r.status_code))
        data = r.content.decode('utf-8')
        output_file = \
          os.path.join(
            os.path.abspath(output_path),
            '{0}.{1}'.format(refseq_id,file_format))
        with open(output_file,'w') as fp:
            fp.write(data)
        print('Downloaded genome seq for {0}'.format(refseq_id))
        return output_file
    except Exception as e:
        raise ValueError('Failed to download data for {0} from NCBI, error: {1}'.format(refseq_id,e))
[14]:
reference_fastq = fetch_genome_fasta_from_ncbi(REFERENCE_fasta,output_path='ref_genome')
reference_fastq
Downloaded genome seq for NC_045512.2
[14]:
'/home/vmuser/examples/ref_genome/NC_045512.2.fasta'
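The efetch URL in the function above is assembled by string formatting. A possible variation, sketched below, builds the query string with urllib.parse.urlencode and rejects formats other than the two named in the docstring. The helper name build_efetch_url is hypothetical, and the extra retmode=text parameter is an addition not present in the notebook. No network request is made here.

```python
from urllib.parse import urlencode

EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

def build_efetch_url(refseq_id, file_format='fasta'):
    """Build an efetch URL for a nucleotide record in 'fasta' or 'gb' format."""
    if file_format not in ('fasta', 'gb'):
        raise ValueError('Unsupported format: {0}'.format(file_format))
    params = {
        'db': 'nucleotide',
        'id': refseq_id,
        'rettype': file_format,
        'retmode': 'text'}  # addition: ask for plain text output
    return '{0}?{1}'.format(EFETCH_URL, urlencode(params))

print(build_efetch_url('NC_045512.2'))
```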
Build Bowtie2 index for reference genome¶
[15]:
## create reference index dir
!mkdir -p bowtie2_ref
bowtie2_ref = os.path.abspath('bowtie2_ref/NC_045512.2')
## Build Bowtie2 index for reference genome
!bowtie2-build \
$reference_fastq \
$bowtie2_ref > bowtie2_build.log
Building a SMALL index
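Since the build log is redirected to a file, a quick check that the index files exist can catch a failed build early. The sketch below assumes the six file suffixes that bowtie2-build writes for a small index; the helper name missing_index_files is hypothetical.

```python
import os

# file suffixes written by bowtie2-build for a small index
SMALL_INDEX_SUFFIXES = [
    '.1.bt2', '.2.bt2', '.3.bt2', '.4.bt2', '.rev.1.bt2', '.rev.2.bt2']

def missing_index_files(index_prefix):
    """Return the expected index files that are not present on disk."""
    return [
        index_prefix + suffix
        for suffix in SMALL_INDEX_SUFFIXES
        if not os.path.exists(index_prefix + suffix)]

# an empty list means all six files were found
print(missing_index_files(os.path.abspath('bowtie2_ref/NC_045512.2')))
```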
Prepare function for bowtie2 mapping¶
[16]:
def run_bowtie2_mapping(sample_name,bowtie2_index,R1_fastq,R2_fastq,output_path='bowtie2_output'):
    '''
    A function for running bowtie2 mapping on paired-end fastq data
    :param sample_name: Sample name
    :param bowtie2_index: Bowtie2 index path
    :param R1_fastq: Path for R1 fastq file
    :param R2_fastq: Path for R2 fastq file
    :param output_path: Output dir path, default bowtie2_output in current dir
    :returns: bowtie2 alignment in sam
    '''
    try:
        output_path = os.path.abspath(output_path)
        !mkdir -p $output_path
        output_sam = os.path.join(output_path,'alignment_{0}.sam'.format(sample_name))
        !bowtie2 \
          -x $bowtie2_index \
          --very-fast \
          -1 $R1_fastq \
          -2 $R2_fastq \
          --threads $CPU_THREADS \
          -S $output_sam
        return output_sam
    except Exception as e:
        raise ValueError(
                'Failed to run bowtie2 for sample {0}, error: {1}'.format(sample_name,e))
Run bowtie2 mapping for all the samples¶
[17]:
%%time
## following step may take up to 30 min (per sample)
for entry in list_of_samples_data:
    sample_name = entry.get('sample_name')
    fastq_files = entry.get('fastq_files')
    R1_fastq = fastq_files[0]
    R2_fastq = fastq_files[1]
    output_sam = \
      run_bowtie2_mapping(
        sample_name=sample_name,
        bowtie2_index=bowtie2_ref,
        R1_fastq=R1_fastq,
        R2_fastq=R2_fastq)
    entry.update({'bowtie2_sam':output_sam})
28282964 reads; of these:
28282964 (100.00%) were paired; of these:
28224324 (99.79%) aligned concordantly 0 times
58640 (0.21%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
28224324 pairs aligned concordantly 0 times; of these:
1870 (0.01%) aligned discordantly 1 time
----
28222454 pairs aligned 0 times concordantly or discordantly; of these:
56444908 mates make up the pairs; of these:
56443948 (100.00%) aligned 0 times
960 (0.00%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.22% overall alignment rate
CPU times: user 18 s, sys: 2.95 s, total: 21 s
Wall time: 21min 33s
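The overall alignment rate in the bowtie2 summary above can be reproduced from the other numbers in the report: every concordant or discordant pair contributes two aligned mates, and mates that aligned on their own are added on top, all divided by the total number of mates. The function below is a small sketch of that arithmetic; its name and argument names are made up for this example.

```python
def overall_alignment_rate(total_pairs, concordant_pairs, discordant_pairs, single_mates):
    """Percentage of mates that aligned, computed from a bowtie2 paired-end summary."""
    aligned_mates = 2 * (concordant_pairs + discordant_pairs) + single_mates
    return 100.0 * aligned_mates / (2 * total_pairs)

# numbers taken from the summary above
rate = overall_alignment_rate(
    total_pairs=28282964,
    concordant_pairs=58640 + 0,   # exactly 1 time + >1 times
    discordant_pairs=1870,
    single_mates=960 + 0)         # exactly 1 time + >1 times
print('{0:.2f}%'.format(rate))
```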
[18]:
## now we have the list of bowtie2 output files present in the sample list
list_of_samples_data
[18]:
[{'sample_name': 'SRR10971381',
'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
'/tmp/SRR10971381_2.fastq.gz'],
'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
'/tmp/SRR10971381_sub_2.fastq'],
'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.log'],
'megahit_output_dir': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381',
'megahit_fastg': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381/SRR10971381_k31.fastg',
'bowtie2_sam': '/home/vmuser/examples/bowtie2_output/alignment_SRR10971381.sam'}]
Consensus genome building from mapped reads¶
Prepare function for consensus fasta building¶
[19]:
def aln_to_consensus_fasta(sample_name,sam_file,reference_fasta,output_path='samtools_dir'):
    '''
    A function for generating a consensus fasta from an aligned sam file
    :param sample_name: Sample name
    :param sam_file: sam format alignment file
    :param reference_fasta: reference fasta file
    :param output_path: Output dir path, default samtools_dir
    :returns: sorted_bam_file,flagstat_file,bcftools_call,consensus_fasta
    '''
    try:
        output_path = os.path.abspath(output_path)
        !mkdir -p $output_path
        bam_file = os.path.join('/tmp','{0}_raw.bam'.format(sample_name))
        sorted_bam_file = os.path.join(output_path,'{0}_sorted.bam'.format(sample_name))
        flagstat_file = os.path.join(output_path,'{0}_sorted.flagstat'.format(sample_name))
        bcftools_call = os.path.join(output_path,'{0}_calls.vcf.gz'.format(sample_name))
        consensus_fasta = os.path.join(output_path,'{0}_consensus.fasta'.format(sample_name))
        ## keep alignments with mapping quality >= 5 and convert sam to bam
        !samtools view -q 5 -bo $bam_file $sam_file
        !samtools sort $bam_file > $sorted_bam_file
        !samtools index $sorted_bam_file
        !samtools flagstat $sorted_bam_file > $flagstat_file
        ## call variants (haploid) and apply them to the reference
        !bcftools mpileup -f $reference_fasta $sorted_bam_file | bcftools call --ploidy 1 -mv -Oz -o $bcftools_call
        !bcftools index $bcftools_call
        !cat $reference_fasta | bcftools consensus $bcftools_call > $consensus_fasta
        return sorted_bam_file,flagstat_file,bcftools_call,consensus_fasta
    except Exception as e:
        raise ValueError(
                'Failed to run consensus fasta generation for sample {0}, error: {1}'.format(sample_name,e))
Run consensus fasta building for all samples¶
[20]:
%%time
## following step may take up to 30 min (per sample)
for entry in list_of_samples_data:
    sample_name = entry.get('sample_name')
    bowtie2_sam = entry.get('bowtie2_sam')
    sorted_bam_file,flagstat_file,bcftools_call,consensus_fasta = \
      aln_to_consensus_fasta(
        sample_name=sample_name,
        sam_file=bowtie2_sam,
        reference_fasta='/home/vmuser/examples/ref_genome/NC_045512.2.fasta')
    entry.update({
      'bowtie2_sorted_aln':sorted_bam_file,
      'flagstat_file':flagstat_file,
      'bcftools_call':bcftools_call,
      'consensus_fasta':consensus_fasta})
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250
Note: the --sample option not given, applying all records regardless of the genotype
The site NC_045512.2:3 overlaps with another variant, skipping...
The site NC_045512.2:4 overlaps with another variant, skipping...
Applied 1 variants
CPU times: user 6.94 s, sys: 1.19 s, total: 8.13 s
Wall time: 8min 14s
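Conceptually, the consensus step takes the reference sequence and substitutes the called variants into it. The toy sketch below shows that idea for single-base substitutions only. It is not how bcftools consensus is implemented (which also handles indels and overlapping records), and the function name apply_snps and the toy sequence are invented for illustration.

```python
def apply_snps(reference, snps):
    """Apply single-base substitutions to a reference sequence.

    snps maps a 1-based position to a (ref_base, alt_base) tuple.
    """
    consensus = list(reference)
    for pos, (ref_base, alt_base) in snps.items():
        # guard against variants that do not match the reference
        if consensus[pos - 1] != ref_base:
            raise ValueError(
                'Reference base mismatch at position {0}'.format(pos))
        consensus[pos - 1] = alt_base
    return ''.join(consensus)

# toy example: G>T at position 3
print(apply_snps('ACGTACGT', {3: ('G', 'T')}))
```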
[21]:
## now we have the list of consensus fasta output files present in the sample list
list_of_samples_data
[21]:
[{'sample_name': 'SRR10971381',
'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
'/tmp/SRR10971381_2.fastq.gz'],
'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
'/tmp/SRR10971381_sub_2.fastq'],
'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
'/home/vmuser/examples/fastv_output/SRR10971381.fastv.log'],
'megahit_output_dir': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381',
'megahit_fastg': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381/SRR10971381_k31.fastg',
'bowtie2_sam': '/home/vmuser/examples/bowtie2_output/alignment_SRR10971381.sam',
'bowtie2_sorted_aln': '/home/vmuser/examples/samtools_dir/SRR10971381_sorted.bam',
'flagstat_file': '/home/vmuser/examples/samtools_dir/SRR10971381_sorted.flagstat',
'bcftools_call': '/home/vmuser/examples/samtools_dir/SRR10971381_calls.vcf.gz',
'consensus_fasta': '/home/vmuser/examples/samtools_dir/SRR10971381_consensus.fasta'}]