Workflow: Nanopore assembly workflow

Fetched 2023-09-24 18:06:59 GMT

**Workflow for sequencing with ONT Nanopore data, from basecalled reads to (meta)assembly and binning**

- Workflow Nanopore Quality
- Kraken2 taxonomic classification of FASTQ reads
- Flye (de novo assembly)
- Medaka (assembly polishing)
- metaQUAST (assembly quality reports)

**When Illumina reads are provided:**

- Workflow Illumina Quality: https://workflowhub.eu/workflows/336?version=1
- Assembly polishing with Pilon
- Workflow binning: https://workflowhub.eu/workflows/64?version=11
  - MetaBAT2
  - CheckM
  - BUSCO
  - GTDB-Tk

**All tool CWL files and other workflows can be found here:**
Tools: https://git.wur.nl/unlock/cwl/-/tree/master/cwl
Workflows: https://git.wur.nl/unlock/cwl/-/tree/master/cwl/workflows

The dependencies are accessible from https://unlock-icat.irods.surfsara.nl (anonymous,anonymous) and/or through the conda / pip environments shown in https://git.wur.nl/unlock/docker/-/blob/master/kubernetes/scripts/setup.sh
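The branching described above (Illumina-dependent polishing and binning) can be sketched in Python. This is only an illustration of the description, not the workflow's actual logic: the real conditions live in the CWL file, and the `RunConfig` class, the `planned_stages` function, and the assumption that binning requires both Illumina reads and the `binning` flag are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunConfig:
    """Hypothetical subset of the workflow inputs that decide which stages run."""
    nanopore_reads: List[str]
    illumina_forward_reads: List[str] = field(default_factory=list)
    illumina_reverse_reads: List[str] = field(default_factory=list)
    binning: bool = False


def planned_stages(cfg: RunConfig) -> List[str]:
    """Return stage labels in the order the description lists them."""
    stages = [
        "nanopore quality control",
        "Kraken2 classification",
        "Flye assembly",
        "Medaka polishing",
        "metaQUAST report",
    ]
    has_illumina = bool(cfg.illumina_forward_reads and cfg.illumina_reverse_reads)
    if has_illumina:
        stages += ["Illumina quality control", "Pilon polishing"]
        # Assumption: binning needs Illumina reads AND the binning flag.
        if cfg.binning:
            stages.append("binning (MetaBAT2, CheckM, BUSCO, GTDB-Tk)")
    return stages
```

Calling `planned_stages(RunConfig(nanopore_reads=["reads.fastq.gz"]))` would list only the five Nanopore stages; supplying both Illumina read lists adds the Illumina-dependent ones.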

Inputs

| ID | Type | Title | Doc |
|---|---|---|---|
| memory | Integer (Optional) | Maximum memory in MB | Maximum memory usage in megabytes |
| binning | Boolean (Optional) | Run binning workflow | Run with contig binning workflow |
| threads | Integer (Optional) | Number of threads | Number of threads to use for computational processes |
| identifier | String | Identifier used | Identifier for this dataset used in this workflow |
| metagenome | Boolean (Optional) | When working with metagenomes | Metagenome option for the Flye assembly |
| deduplicate | Boolean (Optional) | Deduplicate reads | Remove exact duplicate reads (Illumina) with fastp |
| destination | String (Optional) | Output Destination | Optional output destination used for cwl-prov reporting |
| pilon_fixlist | String | Pilon fix list | A comma-separated list of categories of issues to try to fix |
| basecall_model | String | Basecalling model | Basecalling model used with Guppy |
| kraken_database | String | Kraken2 database | Absolute path to the Kraken2 database location |
| filter_references | String[] | Contamination reference file(s) | Reference FASTA file(s) for contamination filtering |
| nanopore_fastq_files | String[] (Optional) | Nanopore reads | List of file paths with Nanopore raw reads in FASTQ format |
| nanopore_fastq_reads | File[] (Optional) | Nanopore FASTQ reads | File(s) of FASTQ reads in gzip format |
| illumina_forward_reads | String[] (Optional) | Illumina forward reads | Illumina sequenced forward read file |
| illumina_reverse_reads | String[] (Optional) | Illumina reverse reads | Illumina sequenced reverse read file |
| use_reference_mapped_reads | Boolean | Use mapped reads | Continue with reads mapped to the given reference |
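To show how these inputs fit together, here is a minimal Python sketch that assembles a CWL job document using the input IDs listed above. Every value (paths, the model placeholder, the `"all"` fix list) is a placeholder chosen for illustration, and `build_job` is a hypothetical helper, not part of the workflow repository. `File[]` inputs are written as CWL File objects, while `String[]` inputs stay plain strings.

```python
import json


def build_job(identifier, nanopore_paths, kraken_db, reference_paths,
              threads=4, memory_mb=8000):
    """Assemble a CWL job document as a plain dict (all values are placeholders)."""
    return {
        "identifier": identifier,
        "threads": threads,
        "memory": memory_mb,                 # megabytes, per the input description
        "metagenome": True,
        "binning": False,
        "use_reference_mapped_reads": False,
        "kraken_database": kraken_db,        # absolute path expected
        "basecall_model": "PLACEHOLDER_MODEL",  # replace with the model actually used
        "pilon_fixlist": "all",              # placeholder; comma-separated categories
        "filter_references": list(reference_paths),  # String[]: plain path strings
        # File[] inputs are written as CWL File objects.
        "nanopore_fastq_reads": [
            {"class": "File", "path": str(p)} for p in nanopore_paths
        ],
    }


def job_to_json(job):
    """Serialize the job dict; CWL runners generally accept JSON job files."""
    return json.dumps(job, indent=2)
```

The resulting text could be saved as, say, `job.json` and passed to a CWL runner together with the workflow file, for example `cwltool workflow_nanopore_assembly.cwl job.json`.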

Steps

ID Runs Label Doc
flye
../flye/flye.cwl (CommandLineTool)
De novo assembler for single molecule sequencing reads, with a focus on Oxford Nanopore Technologies reads

Flye v2.9 assembler with a focus on reads from Oxford Nanopore Technologies.

medaka
../medaka/medaka_py.cwl (CommandLineTool)
Polishing of assembly created from ONT nanopore long reads

Uses Medaka to polish an assembly constructed from ONT nanopore reads that have been basecalled by Guppy.

Direct install (https://github.com/nanoporetech/medaka): Use the Conda environment to prevent dependency issues (bcftools, bgzip, minimap2, samtools, tabix). Failing that, install Anaconda.

Environment: conda create -n medaka -c conda-forge -c bioconda medaka

kraken2_krona
../krona/krona.cwl (CommandLineTool)
Krona

Visualization of Kraken2 report results.

ktImportText -o $1 $2

workflow_pilon Metagenomics workflow

Workflow for Pilon assembly polishing.

Steps:
- BBmap (read mapping to assembly)
- Pilon

illumina_kraken2
../kraken2/kraken2.cwl (CommandLineTool)
Kraken2 metagenomics read classification

Kraken2 metagenomics read classification.

Updated databases available at: https://benlangmead.github.io/aws-indexes/k2 (e.g. PlusPF-8)
Original db: https://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads
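The classification reports produced by this step can be inspected with a few lines of Python. The sketch below assumes the standard six-column, tab-separated Kraken2 report layout (percentage, clade reads, direct reads, rank code, NCBI taxid, indented name) without the extra minimizer columns. `parse_report` and `top_taxa` are hypothetical helpers, not part of the workflow.

```python
from typing import Iterable, Iterator, List, NamedTuple


class ReportRow(NamedTuple):
    percent: float      # percentage of reads in this clade
    clade_reads: int    # reads assigned to the clade (taxon plus descendants)
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # rank code, e.g. U, R, D, P, C, O, F, G, S
    taxid: int          # NCBI taxonomy ID
    name: str           # scientific name with indentation removed
    depth: int          # tree depth inferred from two-space indentation


def parse_report(lines: Iterable[str]) -> Iterator[ReportRow]:
    """Parse a standard six-column Kraken2 report, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        pct, clade, direct, rank, taxid, name = line.split("\t")
        stripped = name.lstrip(" ")
        depth = (len(name) - len(stripped)) // 2
        yield ReportRow(float(pct), int(clade), int(direct), rank,
                        int(taxid), stripped, depth)


def top_taxa(rows: Iterable[ReportRow], rank: str = "S", n: int = 5) -> List[ReportRow]:
    """Return the n taxa with the most clade reads at the given rank code."""
    hits = [r for r in rows if r.rank == rank]
    return sorted(hits, key=lambda r: r.clade_reads, reverse=True)[:n]
```

For example, `top_taxa(parse_report(open("sample.report")), rank="G")` would list the most abundant genera, where `sample.report` stands in for an actual report file.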

kraken2_compress
../bash/pigz.cwl (CommandLineTool)
Compress a file multithreaded with pigz

metaquast_medaka
../metaquast/metaquast.cwl (CommandLineTool)
metaQUAST: Quality Assessment Tool for Metagenome Assemblies

Runs the Quality Assessment Tool for Metagenome Assemblies application

Installing the pre-release is necessary to prevent issues: https://github.com/ablab/quast/releases/tag/quast_5.1.0rc1

The working installation followed the method in http://quast.sourceforge.net/docs/manual.html:

$ wget https://github.com/ablab/quast/releases/download/quast_5.1.0rc1/quast-5.1.0rc1.tar.gz
$ tar -xzf quast-5.1.0rc1.tar.gz
$ cd quast-5.1.0rc1/
$ ./setup.py install_full

nanopore_kraken2
../kraken2/kraken2.cwl (CommandLineTool)
Kraken2 metagenomics read classification

Kraken2 metagenomics read classification.

Updated databases available at: https://benlangmead.github.io/aws-indexes/k2 (e.g. PlusPF-8)
Original db: https://ccb.jhu.edu/software/kraken2/index.shtml?t=downloads

workflow_binning Metagenomic Binning from Assembly

Workflow for Metagenomics from raw reads to annotated bins.

Summary:
- MetaBAT2 (binning)
- CheckM (bin completeness and contamination)
- GTDB-Tk (bin taxonomic classification)
- BUSCO (bin completeness)

flye_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

pilon_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

medaka_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

binning_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

kraken2_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

assembly_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

metaquast_nanopore_pilon
../metaquast/metaquast.cwl (CommandLineTool)
metaQUAST: Quality Assessment Tool for Metagenome Assemblies

Runs the Quality Assessment Tool for Metagenome Assemblies application

Installing the pre-release is necessary to prevent issues: https://github.com/ablab/quast/releases/tag/quast_5.1.0rc1

The working installation followed the method in http://quast.sourceforge.net/docs/manual.html:

$ wget https://github.com/ablab/quast/releases/download/quast_5.1.0rc1/quast-5.1.0rc1.tar.gz
$ tar -xzf quast-5.1.0rc1.tar.gz
$ cd quast-5.1.0rc1/
$ ./setup.py install_full

workflow_quality_illumina Illumina read quality control, trimming and contamination filter.

**Workflow for Illumina paired read quality control, trimming and filtering.**
Multiple paired datasets will be merged into a single paired dataset.

Summary:
- FastQC on raw data files
- fastp for read quality trimming
- BBduk for phiX and (optional) rRNA filtering
- Kraken2 for taxonomic classification of reads (optional)
- BBmap for (contamination) filtering using given references (optional)
- FastQC on filtered (merged) data

WorkflowHub: https://workflowhub.eu/projects/16/workflows?view=default

workflow_quality_nanopore Nanopore Quality Control and Filtering

**Workflow for nanopore read quality control and contamination filtering.**

- FastQC before filtering (read quality control)
- Kraken2 taxonomic read classification
- Minimap2 read filtering based on given references
- FastQC after filtering (read quality control)

illumina_pilon_readmapping
../bbmap/bbmap.cwl (CommandLineTool)
BBMap

Read filtering using BBMap against a (contamination) reference genome

metaquast_pilon_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

illumina_pilon_sam_to_sorted_bam
../samtools/sam_to_sorted-bam.cwl (CommandLineTool)
SAM to sorted BAM

samtools view -@ $2 -hu $1 | samtools sort -@ $2 -o $3.bam
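The shell pipeline above can be reproduced from Python with `subprocess`, which makes the positional arguments explicit: `$1` is the input SAM file, `$2` the thread count, and `$3` the output prefix. This is a sketch of an equivalent invocation, not the code the CWL tool runs. The function names are hypothetical, and it assumes `samtools` is on the `PATH`.

```python
import subprocess


def build_commands(sam_path, threads, out_prefix):
    """Return the two argument lists that make up the view | sort pipeline."""
    view = ["samtools", "view", "-@", str(threads), "-hu", sam_path]
    sort = ["samtools", "sort", "-@", str(threads), "-o", f"{out_prefix}.bam"]
    return view, sort


def sam_to_sorted_bam(sam_path, threads, out_prefix):
    """Stream uncompressed BAM from 'samtools view' into 'samtools sort'."""
    view_cmd, sort_cmd = build_commands(sam_path, threads, out_prefix)
    view = subprocess.Popen(view_cmd, stdout=subprocess.PIPE)
    sort = subprocess.Popen(sort_cmd, stdin=view.stdout)
    # Close our copy of the pipe so 'view' gets SIGPIPE if 'sort' exits early.
    view.stdout.close()
    sort_rc = sort.wait()
    view_rc = view.wait()
    if view_rc != 0 or sort_rc != 0:
        raise RuntimeError(
            f"samtools pipeline failed (view={view_rc}, sort={sort_rc})"
        )
    return f"{out_prefix}.bam"
```

Checking both exit codes matters here because a shell pipeline reports only the last command's status by default, so a failing `samtools view` could otherwise go unnoticed.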

metaquast_medaka_files_to_folder
../expressions/files_to_folder.cwl (ExpressionTool)

Places the input files in the named directory

Outputs

| ID | Type | Label | Doc |
|---|---|---|---|
| binning_output | Directory | Binning output | Binning output folders |
| kraken2_output | Directory | Kraken2 reports | Kraken2 taxonomic classification reports |
| assembly_output | Directory | Assembly output | Output from different assembly steps |
| illumina_quality_stats | Directory | Filtered statistics | Statistics on quality and preprocessing of the reads |
| nanopore_quality_output | Directory | Read quality and filtering reports | Quality reports |
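CWL runners typically report workflow outputs as a JSON object keyed by output ID. The sketch below collects the directory locations for the five outputs listed above. It assumes each present entry is a CWL Directory object carrying a `path` or `location` field, and that outputs from branches that did not run (for example binning) may be null. `output_directories` is a hypothetical helper.

```python
import json

# Output IDs taken from the Outputs listing above.
EXPECTED_OUTPUTS = (
    "binning_output",
    "kraken2_output",
    "assembly_output",
    "illumina_quality_stats",
    "nanopore_quality_output",
)


def output_directories(result_json: str) -> dict:
    """Map each produced output ID to its directory path or location."""
    result = json.loads(result_json)
    dirs = {}
    for key in EXPECTED_OUTPUTS:
        entry = result.get(key)
        if not entry:
            # Assumption: skipped branches yield null or are absent.
            continue
        dirs[key] = entry.get("path") or entry.get("location")
    return dirs
```

Feeding it the JSON a runner prints on completion would give a small dictionary that downstream scripts can use to locate the assembly and report folders.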

Permalink: https://w3id.org/cwl/view/git/b9097b82e6ab6f2c9496013ce4dd6877092956a0/cwl/workflows/workflow_nanopore_assembly.cwl