Workflow: EMG assembly for paired end Illumina

Fetched 2023-01-10 12:13:15 GMT

Inputs

ID Type Title Doc
mapseq_ref File [FASTA]
forward_reads File [FASTQ]
reverse_reads File [FASTQ]
mapseq_taxonomies File[]
assembly_mem_limit Integer in Gb
fraggenescan_model https://w3id.org/cwl/view/git/30397448563d06c342b25a3603c97b6fff7ba7d3/tools/FragGeneScan-model.yaml#model
covariance_model_database File
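
The inputs above map onto a CWL job (input object) file. The following is a minimal sketch, assuming local file paths and a JSON job file; the helper names `file_obj` and `build_job` and all example paths are hypothetical. The valid symbols for `fraggenescan_model` are defined in the linked FragGeneScan-model.yaml and are not listed on this page, so the value is left as a placeholder for the caller to fill in.

```python
import json


def file_obj(path):
    """Return a CWL File object for a path (hypothetical helper)."""
    return {"class": "File", "path": path}


def build_job(forward, reverse, mapseq_ref, taxonomies, cm_db, model, mem_gb):
    """Assemble an input object keyed by the input IDs listed in the table."""
    return {
        "forward_reads": file_obj(forward),          # File [FASTQ]
        "reverse_reads": file_obj(reverse),          # File [FASTQ]
        "mapseq_ref": file_obj(mapseq_ref),          # File [FASTA]
        "mapseq_taxonomies": [file_obj(p) for p in taxonomies],  # File[]
        "covariance_model_database": file_obj(cm_db),
        "fraggenescan_model": model,                 # symbol from FragGeneScan-model.yaml
        "assembly_mem_limit": mem_gb,                # Integer, in Gb
    }


# Example with placeholder paths (assumptions, not taken from the workflow).
job = build_job(
    forward="reads_1.fastq",
    reverse="reads_2.fastq",
    mapseq_ref="ref.fasta",
    taxonomies=["ref.tax"],
    cm_db="models.cm",
    model="<model symbol>",  # placeholder: choose a symbol defined in the YAML
    mem_gb=16,
)
print(json.dumps(job, indent=2))
```

The printed JSON could be saved to a file and passed to a CWL runner together with the workflow document.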

Steps

ID Runs Label Doc
cmscan
../tools/infernal-cmscan.cwl (CommandLineTool)
search sequence(s) against a covariance model database

http://eddylab.org/infernal/Userguide.pdf

assembly
../tools/metaspades.cwl (CommandLineTool)
metaSPAdes: de novo metagenomics assembler

https://arxiv.org/abs/1604.03071 http://cab.spbu.ru/files/release3.10.1/manual.html#meta

extract_SSUs
../tools/esl-sfetch-manyseqs.cwl (CommandLineTool)
extract by names from an indexed sequence file

https://github.com/EddyRivasLab/easel

fraggenescan
../tools/FragGeneScan1_20.cwl (CommandLineTool)
FragGeneScan: find (fragmented) genes in short reads

FragGeneScan is an application for finding (fragmented) genes in short reads. It can also be applied to predict prokaryotic genes in incomplete assemblies or complete genomes.

FragGeneScan was first released through the omics website (http://omics.informatics.indiana.edu/FragGeneScan/) in March 2010, where you can find its old releases. FragGeneScan migrated to SourceForge in October 2013 (https://sourceforge.net/projects/fraggenescan/).

Version 1.20 can be downloaded here: https://sourceforge.net/projects/fraggenescan/files/

interproscan
../tools/InterProScan5.21-60.cwl (CommandLineTool)
InterProScan: protein sequence classifier

Version 5.21-60 can be downloaded here: https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload

Documentation on how to run InterProScan 5 can be found here: https://github.com/ebi-pf-team/interproscan/wiki/HowToRun

classify_SSUs
../tools/mapseq.cwl (CommandLineTool)
MAPseq

A sequence read classification tool designed to assign taxonomy and OTU classifications to ribosomal RNA sequences. http://meringlab.org/software/mapseq/

get_SSU_coords

index_scaffolds
../tools/esl-sfetch-index.cwl (CommandLineTool)
index a sequence file for use by esl-sfetch

https://github.com/EddyRivasLab/easel

remove_asterisks_and_reformat
../tools/esl-reformat.cwl (CommandLineTool)
normalize to fasta

Normalizes input sequences to FASTA with a fixed number of sequence characters per line using esl-reformat from https://github.com/EddyRivasLab/easel
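
To illustrate what this normalization amounts to, here is a rough pure-Python sketch. It is not esl-reformat and does not reproduce its options: the function name `normalize_fasta` is hypothetical, the default line width of 60 is an assumption, and the removal of `*` characters is inferred only from the step ID `remove_asterisks_and_reformat`.

```python
def normalize_fasta(text, width=60):
    """Rewrap FASTA records to a fixed line width, dropping '*' characters.

    Illustrative only; the real step runs esl-reformat from Easel.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.replace("*", ""))
    if header is not None:
        records.append((header, "".join(chunks)))

    out = []
    for header, seq in records:
        out.append(header)
        # Emit the sequence in slices of at most `width` characters.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


example = ">orf1\nMKVLAA*\nGG\n>orf2\nMT*\n"
print(normalize_fasta(example, width=4), end="")
```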

Outputs

ID Type Label Doc
SSUs File
pCDS File
scaffolds File
annotations File
classifications File
Permalink: https://w3id.org/cwl/view/git/30397448563d06c342b25a3603c97b6fff7ba7d3/workflows/emg-assembly.cwl
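
The relative tool paths in the Steps table can be resolved against the permalink with standard URL joining. The sketch below assumes the viewer exposes the tools at the resulting addresses, which matches the pattern of the `fraggenescan_model` type URL in the Inputs table but is otherwise not verified here. The names `STEP_TOOLS` and `resolve_tools` are hypothetical. `get_SSU_coords` is omitted because the page lists no tool for it.

```python
from urllib.parse import urljoin

WORKFLOW_URL = (
    "https://w3id.org/cwl/view/git/"
    "30397448563d06c342b25a3603c97b6fff7ba7d3/workflows/emg-assembly.cwl"
)

# Step ID -> relative tool path, copied from the Steps table.
STEP_TOOLS = {
    "cmscan": "../tools/infernal-cmscan.cwl",
    "assembly": "../tools/metaspades.cwl",
    "extract_SSUs": "../tools/esl-sfetch-manyseqs.cwl",
    "fraggenescan": "../tools/FragGeneScan1_20.cwl",
    "interproscan": "../tools/InterProScan5.21-60.cwl",
    "classify_SSUs": "../tools/mapseq.cwl",
    "index_scaffolds": "../tools/esl-sfetch-index.cwl",
    "remove_asterisks_and_reformat": "../tools/esl-reformat.cwl",
}


def resolve_tools(base, steps):
    """Resolve each relative tool path against the workflow URL."""
    return {step: urljoin(base, rel) for step, rel in steps.items()}


for step, url in resolve_tools(WORKFLOW_URL, STEP_TOOLS).items():
    print(f"{step}\t{url}")
```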