Building a consensus genome from raw fastq reads

Configure the notebook for this run

[1]:
## Number of CPU threads to use
CPU_THREADS = 1

## Memory limit in GB, default value for running the notebook in Binder
MEM_LIMIT_GB = 2
MEM_LIMIT_BYTES = MEM_LIMIT_GB * 1000000000

## Download raw fastq instead of SRA format, faster on Binder
FETCH_RAW_FASTQ = True

## Subsample reads to 1M to run in Binder, set it to zero to disable
SUBSAMPLE_READ = 1000000

## A toggle for running assembly on Binder, set it to 0 to disable
RUN_ASSEMBLY = 1

## List of k-mers to use for de-novo assembly
ASSEMBLY_KMERS = '27,31'

## Accession ID of the reference genome
REFERENCE_FASTA = 'NC_045512.2'

Prepare sample list

[2]:
## We have only one sample in the list, along with the fastq files for that sample

list_of_samples_data = \
  [{'sample_name':'SRR10971381',
    'fastq_files' : [
      '/tmp/SRR10971381_1.fastq.gz',
      '/tmp/SRR10971381_2.fastq.gz']}
  ]

Load required Python libraries

[3]:
import os, requests

Fetch fastq files from SRA

[4]:
%%time
if FETCH_RAW_FASTQ:
  ## Download raw fastq from SRA (fast)
  !wget -O /tmp/SRR10971381_1.fastq.gz https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R1.fastq.gz.1
  !wget -O /tmp/SRR10971381_2.fastq.gz https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R2.fastq.gz.1
else:
  ## Download reads in SRA format and then convert them to fastq (slow)
  !wget -O /tmp/SRR10971381 https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/010714/SRR10971381
  ## Convert SRA format data to fastq format
  !fastq-dump --split-files --gzip --outdir /tmp /tmp/SRR10971381
--2020-04-21 12:12:18--  https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R1.fastq.gz.1
Resolving sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)... 52.216.78.124
Connecting to sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)|52.216.78.124|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2739477612 (2.6G) [application/x-troff-man]
Saving to: '/tmp/SRR10971381_1.fastq.gz'

/tmp/SRR10971381_1. 100%[===================>]   2.55G  29.4MB/s    in 80s

2020-04-21 12:13:38 (32.5 MB/s) - '/tmp/SRR10971381_1.fastq.gz' saved [2739477612/2739477612]

--2020-04-21 12:13:41--  https://sra-pub-src-1.s3.amazonaws.com/SRR10971381/WH_R2.fastq.gz.1
Resolving sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)... 52.216.147.140
Connecting to sra-pub-src-1.s3.amazonaws.com (sra-pub-src-1.s3.amazonaws.com)|52.216.147.140|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2838458153 (2.6G) [application/x-troff-man]
Saving to: '/tmp/SRR10971381_2.fastq.gz'

/tmp/SRR10971381_2. 100%[===================>]   2.64G  30.1MB/s    in 82s

2020-04-21 12:15:03 (32.8 MB/s) - '/tmp/SRR10971381_2.fastq.gz' saved [2838458153/2838458153]

CPU times: user 3.05 s, sys: 721 ms, total: 3.77 s
Wall time: 2min 46s
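
As an optional sanity check (not part of the original run), we can list the downloaded files and confirm their sizes roughly match what wget reported above.

[ ]:
## Optional sanity check: confirm both fastq files were downloaded
!ls -lh /tmp/SRR10971381_*.fastq.gz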

Sub-sample reads for the Binder run

This is an optional step. We sub-sample reads from the raw fastq files down to the number specified in the SUBSAMPLE_READ variable so that the downstream steps can run within Binder's resource limits.

[5]:
%%time
## The following step may take some time to run

for entry in list_of_samples_data:
  if SUBSAMPLE_READ > 0:
    sample_name = entry.get('sample_name')
    fastq_files = entry.get('fastq_files')
    R1_fastq = fastq_files[0]
    R2_fastq = fastq_files[1]
    R1_sub_fastq = '/tmp/{0}_sub_1.fastq'.format(sample_name)
    R2_sub_fastq = '/tmp/{0}_sub_2.fastq'.format(sample_name)
    ## running seqtk to subsample files
    print('subsampling reads for sample {0} R1'.format(sample_name))
    !seqtk sample -2 -s100 $R1_fastq $SUBSAMPLE_READ  > $R1_sub_fastq
    print('subsampling reads for sample {0} R2'.format(sample_name))
    !seqtk sample -2 -s100 $R2_fastq $SUBSAMPLE_READ > $R2_sub_fastq
    entry.update({'subsample_fastq_files':[R1_sub_fastq,R2_sub_fastq]})
subsampling reads for sample SRR10971381 R1
subsampling reads for sample SRR10971381 R2
CPU times: user 9.85 s, sys: 1.7 s, total: 11.5 s
Wall time: 10min 29s
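
Optionally, we can confirm the subsampled read counts. A fastq record spans four lines, so a simple line count divided by four gives the number of reads; this small check is a sketch and was not part of the original run.

[ ]:
## Optional check: count reads in the subsampled fastq files (4 lines per read)
for entry in list_of_samples_data:
  for fastq in entry.get('subsample_fastq_files',[]):
    line_count = sum(1 for _ in open(fastq))
    print('{0}: {1} reads'.format(fastq,int(line_count/4)))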

Check for viral sequences using fastv

Fetch required reference genomes and list of unique k-mers

[6]:
## fetch coronavirus genomes

!wget -O /tmp/SARS2_153_complete_genomes_20200329.fasta \
  https://storage.googleapis.com/sars-cov-2/SARS2_153_complete_genomes_20200329.fasta

## fetch unique kmers for coronavirus

!wget -O /tmp/SARS-CoV-2.kmer.fa \
  http://opengene.org/fastv/data/SARS-CoV-2.kmer.fa
--2020-04-21 12:25:34--  https://storage.googleapis.com/sars-cov-2/SARS2_153_complete_genomes_20200329.fasta
Resolving storage.googleapis.com (storage.googleapis.com)... 108.177.112.128, 2607:f8b0:4001:c07::80
Connecting to storage.googleapis.com (storage.googleapis.com)|108.177.112.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4662317 (4.4M) [application/octet-stream]
Saving to: '/tmp/SARS2_153_complete_genomes_20200329.fasta'

/tmp/SARS2_153_comp 100%[===================>]   4.45M  --.-KB/s    in 0.07s

2020-04-21 12:25:34 (62.2 MB/s) - '/tmp/SARS2_153_complete_genomes_20200329.fasta' saved [4662317/4662317]

--2020-04-21 12:25:35--  http://opengene.org/fastv/data/SARS-CoV-2.kmer.fa
Resolving opengene.org (opengene.org)... 47.90.42.109
Connecting to opengene.org (opengene.org)|47.90.42.109|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7632 (7.5K) [application/octet-stream]
Saving to: '/tmp/SARS-CoV-2.kmer.fa'

/tmp/SARS-CoV-2.kme 100%[===================>]   7.45K  --.-KB/s    in 0s

2020-04-21 12:25:35 (886 MB/s) - '/tmp/SARS-CoV-2.kmer.fa' saved [7632/7632]

Prepare function for fastv run

[7]:
## run fastv function

def run_fastv(sample_name,ref_genome,ref_kmers,R1_fastq,R2_fastq,
              output_path='fastv_output'):
  '''
  A function for running the fastv tool on paired-end fastq data

  :param sample_name: Sample name
  :param ref_genome: Reference genome fasta file
  :param ref_kmers: Reference k-mers fasta file
  :param R1_fastq: Path for R1 fastq file
  :param R2_fastq: Path for R2 fastq file
  :param output_path: Output dir path, default fastv_output in current dir
  :returns: fastv_html_output,fastv_json_output,fastv_log_output
  '''
  try:
    output_path = os.path.abspath(output_path)
    !mkdir -p $output_path
    print('running fastv for sample {0}'.format(sample_name))
    fastv_html_output = os.path.join(output_path,'{0}.fastv.html'.format(sample_name))
    fastv_json_output = os.path.join(output_path,'{0}.fastv.json'.format(sample_name))
    fastv_log_output = os.path.join(output_path,'{0}.fastv.log'.format(sample_name))
    !~/bin/fastv \
    -i $R1_fastq \
    -I $R2_fastq \
    -k $ref_kmers \
    -g $ref_genome \
    -h $fastv_html_output \
    -j $fastv_json_output \
    --thread $CPU_THREADS 2> $fastv_log_output
    return fastv_html_output,fastv_json_output,fastv_log_output
  except Exception as e:
    raise ValueError(
            'Failed to run fastv for sample {0}, error: {1}'.format(sample_name,e))


Run fastv for all samples

[8]:
%%time

for entry in list_of_samples_data:
  sample_name = entry.get('sample_name')
  if SUBSAMPLE_READ > 0:
    fastq_files = entry.get('subsample_fastq_files')
  else:
    fastq_files = entry.get('fastq_files')

  R1_fastq = fastq_files[0]
  R2_fastq = fastq_files[1]
  fastv_html_output,fastv_json_output,fastv_log_output =\
    run_fastv(
      sample_name=sample_name,
      ref_genome='/tmp/SARS2_153_complete_genomes_20200329.fasta',
      ref_kmers='/tmp/SARS-CoV-2.kmer.fa',
      R1_fastq=R1_fastq,
      R2_fastq=R2_fastq)
  entry.update(
    {'fastv_files':[fastv_html_output,fastv_json_output,fastv_log_output]})
running fastv for sample SRR10971381
CPU times: user 2 s, sys: 334 ms, total: 2.33 s
Wall time: 2min 19s
[9]:
## now we have the list of output files appended in the sample list

list_of_samples_data
[9]:
[{'sample_name': 'SRR10971381',
  'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
   '/tmp/SRR10971381_2.fastq.gz'],
  'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
   '/tmp/SRR10971381_sub_2.fastq'],
  'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.log']}]
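
The fastv JSON report can also be inspected from Python. The exact schema may differ between fastv versions, so the sketch below only loads the report and prints its top-level sections instead of assuming specific keys.

[ ]:
## Optional: peek at the top-level sections of the fastv JSON report
import json

with open(list_of_samples_data[0]['fastv_files'][1]) as fp:
  fastv_report = json.load(fp)
print(list(fastv_report.keys()))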

De-novo assembly of viral genome using Megahit

Prepare function for Megahit assembly run

[10]:
def run_megahit_assembly(sample_name,R1_fastq,R2_fastq,output_path='megahit_output'):
  '''
  A function for running megahit de-novo assembly on paired-end fastq data

  :param sample_name: Sample name
  :param R1_fastq: Path for R1 fastq file
  :param R2_fastq: Path for R2 fastq file
  :param output_path: Output dir path, default megahit_output in current dir
  :returns: output_dir,fastg_output
  '''
  try:
    output_path = os.path.abspath(output_path)
    !mkdir -p $output_path
    output_dir = os.path.join(output_path,'megahit_assembly_{0}'.format(sample_name))
    print('running de-novo assembly using megahit for sample {0}'.format(sample_name))
    !megahit \
      -1 $R1_fastq \
      -2 $R2_fastq \
      -o $output_dir \
      --k-list $ASSEMBLY_KMERS \
      --num-cpu-threads $CPU_THREADS \
      --memory $MEM_LIMIT_BYTES \
      --tmp-dir /tmp
    max_kmer = ASSEMBLY_KMERS.split(',')[-1]
    fastg_input = \
      os.path.join(
        output_dir,
        'intermediate_contigs',
        'k{0}.contigs.fa'.format(max_kmer))
    fastg_output = os.path.join(output_dir,'{0}_k{1}.fastg'.format(sample_name,max_kmer))
    print('converting de-novo assembly to fastg for sample {0}'.format(sample_name))
    !megahit_toolkit contig2fastg $max_kmer $fastg_input > $fastg_output
    return output_dir,fastg_output
  except Exception as e:
    raise ValueError(
            'Failed to run megahit for sample {0}, error: {1}'.format(sample_name,e))

Run de-novo assembly for all the samples

[11]:
%%time

## The following step may take up to 20 min

if RUN_ASSEMBLY > 0:
  for entry in list_of_samples_data:
    sample_name = entry.get('sample_name')
    if SUBSAMPLE_READ > 0:
      fastq_files = entry.get('subsample_fastq_files')
    else:
      fastq_files = entry.get('fastq_files')

    R1_fastq = fastq_files[0]
    R2_fastq = fastq_files[1]
    megahit_output_dir,fastg_output = \
      run_megahit_assembly(
        sample_name=sample_name,
        R1_fastq=R1_fastq,
        R2_fastq=R2_fastq)
    entry.update({'megahit_output_dir':megahit_output_dir,'megahit_fastg':fastg_output})
running de-novo assembly using megahit for sample SRR10971381
2020-04-21 12:27:56 - MEGAHIT v1.2.9
2020-04-21 12:27:56 - Using megahit_core with POPCNT and BMI2 support
2020-04-21 12:27:56 - Convert reads to binary library
2020-04-21 12:28:00 - b'INFO  sequence/io/sequence_lib.cpp  :   77 - Lib 0 (/tmp/SRR10971381_sub_1.fastq,/tmp/SRR10971381_sub_2.fastq): pe, 2000000 reads, 151 max length'
2020-04-21 12:28:00 - b'INFO  utils/utils.h                 :  152 - Real: 3.4744\tuser: 2.2256\tsys: 1.1238\tmaxrss: 164380'
2020-04-21 12:28:00 - Start assembly. Number of CPU threads 1
2020-04-21 12:28:00 - k list: 27,31
2020-04-21 12:28:00 - Memory used: 2000000000
2020-04-21 12:28:00 - Extract solid (k+1)-mers for k = 27
2020-04-21 12:30:02 - Build graph for k = 27
2020-04-21 12:31:32 - Assemble contigs from SdBG for k = 27
2020-04-21 12:37:34 - Local assembly for k = 27
2020-04-21 12:37:55 - Extract iterative edges from k = 27 to 31
2020-04-21 12:38:22 - Build graph for k = 31
2020-04-21 12:38:49 - Assemble contigs from SdBG for k = 31
2020-04-21 12:42:07 - Merging to output final contigs
2020-04-21 12:42:07 - 9056 contigs, total 2427140 bp, min 200 bp, max 8614 bp, avg 268 bp, N50 253 bp
2020-04-21 12:42:08 - ALL DONE. Time elapsed: 851.511862 seconds
converting de-novo assembly to fastg for sample SRR10971381
CPU times: user 13.2 s, sys: 2 s, total: 15.3 s
Wall time: 14min 17s
[12]:
## now we have the list of assembly output files present in the sample list

list_of_samples_data
[12]:
[{'sample_name': 'SRR10971381',
  'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
   '/tmp/SRR10971381_2.fastq.gz'],
  'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
   '/tmp/SRR10971381_sub_2.fastq'],
  'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.log'],
  'megahit_output_dir': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381',
  'megahit_fastg': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381/SRR10971381_k31.fastg'}]
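
Before moving on, it is worth summarising the assembly, for example by finding the longest contig (the megahit log above reports a maximum of 8614 bp for this subsampled Binder run). Megahit normally writes its final contigs to final.contigs.fa inside the output directory; the sketch below (not part of the original run) parses that file with plain Python.

[ ]:
## Optional: report the longest contig from each megahit assembly
def longest_contig(fasta_path):
  lengths = {}
  name = None
  for line in open(fasta_path):
    line = line.strip()
    if line.startswith('>'):
      name = line[1:].split()[0]
      lengths[name] = 0
    elif name is not None:
      lengths[name] += len(line)
  return max(lengths.items(),key=lambda x: x[1]) if lengths else (None,0)

for entry in list_of_samples_data:
  contigs_fa = os.path.join(entry['megahit_output_dir'],'final.contigs.fa')
  print(entry['sample_name'],longest_contig(contigs_fa))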

Map raw reads to the reference genome

Prepare function for reference genome fetching

[13]:
def fetch_genome_fasta_from_ncbi(refseq_id,output_path='.',file_format='fasta'):
  '''
  A function for fetching the genome fasta sequences from NCBI

  :param refseq_id: NCBI genome id
  :param output_path: Path to dump genome files, default '.'
  :param file_format: Output file format, default fasta, supported formats are 'fasta' and 'gb'
  :returns: output_file
  '''
  try:
    output_path = os.path.abspath(output_path)
    !mkdir -p $output_path
    url = \
      'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id={0}&rettype={1}'.\
        format(refseq_id,file_format)
    r = requests.get(url)
    if r.status_code != 200:
      raise ValueError('Failed to download file for {0}, http status code {1}'.format(refseq_id,r.status_code))
    data = r.content.decode('utf-8')
    output_file = \
      os.path.join(
        os.path.abspath(output_path),
        '{0}.{1}'.format(refseq_id,file_format))
    with open(output_file,'w') as fp:
      fp.write(data)
    print('Downloaded genome seq for {0}'.format(refseq_id))
    return output_file
  except Exception as e:
    raise ValueError('Failed to download data for {0} from NCBI, error: {1}'.format(refseq_id,e))
[14]:
reference_fasta = fetch_genome_fasta_from_ncbi(REFERENCE_FASTA,output_path='ref_genome')
reference_fasta
Downloaded genome seq for NC_045512.2
[14]:
'/home/vmuser/examples/ref_genome/NC_045512.2.fasta'
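
As a quick optional check, we can print the fasta header of the downloaded reference and its sequence length before building the index.

[ ]:
## Optional: check the reference fasta header and sequence length
ref_lines = open(reference_fasta).read().splitlines()
ref_length = sum(len(l) for l in ref_lines if not l.startswith('>'))
print(ref_lines[0])
print('reference length: {0} bp'.format(ref_length))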

Build Bowtie2 index for reference genome

[15]:
## create reference index dir

!mkdir -p bowtie2_ref

bowtie2_ref = os.path.abspath('bowtie2_ref/NC_045512.2')
## Build Bowtie2 index for reference genome

!bowtie2-build \
  $reference_fasta \
  $bowtie2_ref > bowtie2_build.log
Building a SMALL index
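
bowtie2-build writes its index as a set of *.bt2 files next to the chosen prefix; listing the directory is an easy optional way to confirm the index was created.

[ ]:
## Optional: confirm the *.bt2 index files exist
!ls -lh bowtie2_ref/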

Prepare function for bowtie2 mapping

[16]:
def run_bowtie2_mapping(sample_name,bowtie2_index,R1_fastq,R2_fastq,output_path='bowtie2_output'):
  '''
  A function for running bowtie2 mapping on paired-end fastq data

  :param sample_name: Sample name
  :param bowtie2_index: Bowtie2 index path
  :param R1_fastq: Path for R1 fastq file
  :param R2_fastq: Path for R2 fastq file
  :param output_path: Output dir path, default bowtie2_output in current dir
  :returns: output_sam, bowtie2 alignment in SAM format
  '''
  try:
    output_path = os.path.abspath(output_path)
    !mkdir -p $output_path
    output_sam = os.path.join(output_path,'alignment_{0}.sam'.format(sample_name))
    !bowtie2 \
      -x $bowtie2_index \
      --very-fast  \
      -1 $R1_fastq \
      -2 $R2_fastq \
      --threads $CPU_THREADS \
      -S  $output_sam
    return output_sam
  except Exception as e:
    raise ValueError(
            'Failed to run bowtie2 for sample {0}, error: {1}'.format(sample_name,e))

Run bowtie2 mapping for all the samples

[17]:
%%time

## The following step may take up to 30 min (per sample)
for entry in list_of_samples_data:
  sample_name = entry.get('sample_name')
  fastq_files = entry.get('fastq_files')
  R1_fastq = fastq_files[0]
  R2_fastq = fastq_files[1]
  output_sam = \
    run_bowtie2_mapping(
      sample_name=sample_name,
      bowtie2_index=bowtie2_ref,
      R1_fastq=R1_fastq,
      R2_fastq=R2_fastq)
  entry.update({'bowtie2_sam':output_sam})
28282964 reads; of these:
  28282964 (100.00%) were paired; of these:
    28224324 (99.79%) aligned concordantly 0 times
    58640 (0.21%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    28224324 pairs aligned concordantly 0 times; of these:
      1870 (0.01%) aligned discordantly 1 time
    ----
    28222454 pairs aligned 0 times concordantly or discordantly; of these:
      56444908 mates make up the pairs; of these:
        56443948 (100.00%) aligned 0 times
        960 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
0.22% overall alignment rate
CPU times: user 18 s, sys: 2.95 s, total: 21 s
Wall time: 21min 33s
[18]:
## now we have the list of bowtie2 output files present in the sample list

list_of_samples_data
[18]:
[{'sample_name': 'SRR10971381',
  'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
   '/tmp/SRR10971381_2.fastq.gz'],
  'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
   '/tmp/SRR10971381_sub_2.fastq'],
  'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.log'],
  'megahit_output_dir': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381',
  'megahit_fastg': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381/SRR10971381_k31.fastg',
  'bowtie2_sam': '/home/vmuser/examples/bowtie2_output/alignment_SRR10971381.sam'}]

Consensus genome building from mapped reads

Prepare function for consensus fasta building

[19]:
def aln_to_consensus_fasta(sample_name,sam_file,reference_fasta,output_path='samtools_dir'):
  '''
  A function for generating a consensus fasta from an aligned SAM file

  :param sample_name: Sample name
  :param sam_file: SAM format alignment file
  :param reference_fasta: Reference fasta file
  :param output_path: Output dir path, default samtools_dir
  :returns: sorted_bam_file,flagstat_file,bcftools_call,consensus_fasta
  '''
  try:
    output_path = os.path.abspath(output_path)
    !mkdir -p $output_path
    bam_file = os.path.join('/tmp','{0}_raw.bam'.format(sample_name))
    sorted_bam_file = os.path.join(output_path,'{0}_sorted.bam'.format(sample_name))
    flagstat_file = os.path.join(output_path,'{0}_sorted.flagstat'.format(sample_name))
    bcftools_call = os.path.join(output_path,'{0}_calls.vcf.gz'.format(sample_name))
    consensus_fasta = os.path.join(output_path,'{0}_consensus.fasta'.format(sample_name))
    !samtools view -q 5 -bo $bam_file $sam_file
    !samtools sort $bam_file > $sorted_bam_file
    !samtools index $sorted_bam_file
    !samtools flagstat $sorted_bam_file > $flagstat_file
    !bcftools mpileup -f $reference_fasta $sorted_bam_file | bcftools call --ploidy 1  -mv -Oz -o $bcftools_call
    !bcftools index $bcftools_call
    !cat $reference_fasta | bcftools consensus $bcftools_call > $consensus_fasta
    return sorted_bam_file,flagstat_file,bcftools_call,consensus_fasta
  except Exception as e:
    raise ValueError(
            'Failed to run consensus fasta generation for sample {0}, error: {1}'.format(sample_name,e))

Run consensus fasta building for all samples

[20]:
%%time

## The following step may take up to 30 min (per sample)
for entry in list_of_samples_data:
  sample_name = entry.get('sample_name')
  bowtie2_sam = entry.get('bowtie2_sam')
  sorted_bam_file,flagstat_file,bcftools_call,consensus_fasta = \
    aln_to_consensus_fasta(
      sample_name=sample_name,
      sam_file=bowtie2_sam,
      reference_fasta=reference_fasta)
  entry.update({
    'bowtie2_sorted_aln':sorted_bam_file,
    'flagstat_file':flagstat_file,
    'bcftools_call':bcftools_call,
    'consensus_fasta':consensus_fasta})
[mpileup] 1 samples in 1 input files
[mpileup] maximum number of reads per input file set to -d 250
Note: the --sample option not given, applying all records regardless of the genotype
The site NC_045512.2:3 overlaps with another variant, skipping...
The site NC_045512.2:4 overlaps with another variant, skipping...
Applied 1 variants
CPU times: user 6.94 s, sys: 1.19 s, total: 8.13 s
Wall time: 8min 14s
[21]:
## now we have the list of consensus fasta output files present in the sample list

list_of_samples_data
[21]:
[{'sample_name': 'SRR10971381',
  'fastq_files': ['/tmp/SRR10971381_1.fastq.gz',
   '/tmp/SRR10971381_2.fastq.gz'],
  'subsample_fastq_files': ['/tmp/SRR10971381_sub_1.fastq',
   '/tmp/SRR10971381_sub_2.fastq'],
  'fastv_files': ['/home/vmuser/examples/fastv_output/SRR10971381.fastv.html',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.json',
   '/home/vmuser/examples/fastv_output/SRR10971381.fastv.log'],
  'megahit_output_dir': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381',
  'megahit_fastg': '/home/vmuser/examples/megahit_output/megahit_assembly_SRR10971381/SRR10971381_k31.fastg',
  'bowtie2_sam': '/home/vmuser/examples/bowtie2_output/alignment_SRR10971381.sam',
  'bowtie2_sorted_aln': '/home/vmuser/examples/samtools_dir/SRR10971381_sorted.bam',
  'flagstat_file': '/home/vmuser/examples/samtools_dir/SRR10971381_sorted.flagstat',
  'bcftools_call': '/home/vmuser/examples/samtools_dir/SRR10971381_calls.vcf.gz',
  'consensus_fasta': '/home/vmuser/examples/samtools_dir/SRR10971381_consensus.fasta'}]
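
As a final optional check (not part of the original run), the consensus fasta can be loaded to confirm its length, which should stay close to the reference length since only one variant was applied in this subsampled Binder run.

[ ]:
## Optional: report the length of each consensus fasta
for entry in list_of_samples_data:
  consensus_lines = open(entry['consensus_fasta']).read().splitlines()
  consensus_length = sum(len(l) for l in consensus_lines if not l.startswith('>'))
  print('{0} consensus length: {1} bp'.format(entry['sample_name'],consensus_length))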