Workflow: WGS and MT analysis for fastq files

Fetched 2023-01-08 18:53:40 GMT

RNA / protein: QC, preprocess, filter, annotation, index, abundance

[Figure: workflow graph. sequences → qcBasic → preProcess → dereplication → protAnnotate; preProcess → rnaAnnotate → protAnnotate; rnaAnnotate and protAnnotate both feed indexSimSeq, abundance and darkmatter; indexSimSeq feeds abundance.]
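The step-to-step edges in the workflow graph form a directed acyclic graph. The sketch below transcribes those edges into a Python dependency map and uses the standard-library `graphlib` to derive a valid run order and groups of steps that could run concurrently. It is only an illustration of the dependency structure. A CWL runner performs this scheduling itself from the workflow file, and `STEP_DEPENDENCIES` is a hand-made transcription rather than anything the workflow defines.

```python
from graphlib import TopologicalSorter

# Step -> upstream steps whose outputs it consumes, transcribed by hand
# from the edges in the workflow graph (workflow-level inputs omitted).
STEP_DEPENDENCIES = {
    "qcBasic": set(),
    "preProcess": {"qcBasic"},
    "dereplication": {"preProcess"},
    "rnaAnnotate": {"preProcess"},
    "protAnnotate": {"dereplication", "rnaAnnotate"},
    "indexSimSeq": {"rnaAnnotate", "protAnnotate"},
    "abundance": {"rnaAnnotate", "protAnnotate", "indexSimSeq"},
    "darkmatter": {"rnaAnnotate", "protAnnotate"},
}


def execution_order(deps):
    """Return one valid serial order in which every step follows its inputs."""
    return list(TopologicalSorter(deps).static_order())


def parallel_batches(deps):
    """Group steps into batches; steps within a batch have no mutual dependency."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(parallel_batches(STEP_DEPENDENCIES), 1):
        print(number, batch)
```

With these edges, `dereplication` and `rnaAnnotate` can run side by side once `preProcess` finishes. The same holds for `indexSimSeq` and `darkmatter` once `protAnnotate` finishes.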

Inputs

ID Type Title Doc
jobid String
m5nrBDB File
m5nrSCG File
filterLn Boolean
m5nrFull File[]
maxAmbig Integer
deviation Float
m5rnaFull File
sequences File
m5rnaClust File
m5rnaIndex Directory
derepPrefix Integer
filterAmbig Boolean
m5rnaPrefix String
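A CWL job file supplies one value per input above. `File` and `Directory` values are written as objects with a `class` and a `path`. The sketch below builds such a job as a Python dict and checks each value against the type shown in the table. It assumes every listed input is required, because the table does not show defaults or optional markers. All paths and numbers in `example_job` are hypothetical placeholders.

```python
import json

# Declared input types, transcribed from the Inputs table.
INPUT_TYPES = {
    "jobid": "String", "m5nrBDB": "File", "m5nrSCG": "File",
    "filterLn": "Boolean", "m5nrFull": "File[]", "maxAmbig": "Integer",
    "deviation": "Float", "m5rnaFull": "File", "sequences": "File",
    "m5rnaClust": "File", "m5rnaIndex": "Directory", "derepPrefix": "Integer",
    "filterAmbig": "Boolean", "m5rnaPrefix": "String",
}


def matches(cwl_type, value):
    """Check one job value against a type name as displayed in the table."""
    if cwl_type.endswith("[]"):
        return isinstance(value, list) and all(matches(cwl_type[:-2], v) for v in value)
    if cwl_type in ("File", "Directory"):
        # CWL job files describe files and directories as {"class": ..., "path": ...}.
        return isinstance(value, dict) and value.get("class") == cwl_type and "path" in value
    if cwl_type == "Boolean":
        return isinstance(value, bool)
    if cwl_type == "Integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if cwl_type == "Float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if cwl_type == "String":
        return isinstance(value, str)
    raise ValueError(f"unknown type: {cwl_type}")


def validate_job(job):
    """Return a list of problems; an empty list means the job looks well-typed."""
    problems = [f"missing input: {k}" for k in INPUT_TYPES if k not in job]
    problems += [f"unknown input: {k}" for k in job if k not in INPUT_TYPES]
    problems += [
        f"{k}: expected {INPUT_TYPES[k]}, got {job[k]!r}"
        for k in job
        if k in INPUT_TYPES and not matches(INPUT_TYPES[k], job[k])
    ]
    return problems


def file_ref(path, cls="File"):
    return {"class": cls, "path": path}


# Hypothetical values: every path and number here is a placeholder.
example_job = {
    "jobid": "job-0001",
    "sequences": file_ref("sample.fasta"),
    "m5nrBDB": file_ref("m5nr.bdb"),
    "m5nrSCG": file_ref("m5nr.scg.json"),
    "m5nrFull": [file_ref("m5nr.part1.fasta"), file_ref("m5nr.part2.fasta")],
    "m5rnaFull": file_ref("m5rna.fasta"),
    "m5rnaClust": file_ref("m5rna.clust.fasta"),
    "m5rnaIndex": file_ref("m5rna_index", "Directory"),
    "m5rnaPrefix": "m5rna",
    "filterLn": True,
    "deviation": 2.0,
    "filterAmbig": True,
    "maxAmbig": 5,
    "derepPrefix": 50,
}

if __name__ == "__main__":
    print(json.dumps(example_job, indent=2))
```

The printed JSON could be saved as a job file and passed to a runner, for example `cwltool wgs-noscreen-fasta.workflow.cwl job.json`. The runner performs its own, authoritative type checking against the workflow.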

Steps

ID Runs Label Doc
qcBasic
abundance abundance

Abundance profiles from annotated files, for protein and/or RNA
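According to the graph, the abundance step receives similarity hits and cluster maps from both annotation steps. A core idea in such a profile is that a hit from a cluster representative should count for every sequence the cluster stands for. The sketch below shows only that idea for an md5 profile. The real tool's counting rules are not described here, including its handling of expanded sims, LCA assignments and `m5nrSCG`. The input shapes, and the `cluster_members` map in particular, are assumptions.

```python
from collections import Counter


def md5_abundance(hits, cluster_members):
    """Count how many input sequences each md5 (reference hit) represents.

    hits: iterable of (query_id, md5) pairs from a similarity file.
    cluster_members: hypothetical map from a cluster ID to the IDs of all
        sequences in that cluster; a query not in the map counts as one.
    """
    counts = Counter()
    seen = set()
    for query_id, md5 in hits:
        if (query_id, md5) in seen:
            continue  # count each query at most once per md5
        seen.add((query_id, md5))
        counts[md5] += len(cluster_members.get(query_id, (query_id,)))
    return counts
```

For example, two unclustered reads hitting `aaa` plus a three-member cluster hitting both `aaa` and `bbb` give `aaa` a count of 5 and `bbb` a count of 3.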

darkmatter
../Tools/extract_darkmatter.tool.cwl (CommandLineTool)
extract darkmatter

Retrieve predicted proteins that have no similarity hits.
Usage: extract_darkmatter.py -i <input> -s <sim 1> -s <sim 2> -m <clust map 1> -m <clust map 2> -o <outName>
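The darkmatter step is essentially a set difference. It collects every sequence ID that appears as a query in any similarity file, expands cluster IDs into their members, and keeps the predicted proteins that are left. The sketch below follows that logic. It assumes tab-separated sims with the query ID in the first column, and a cluster map laid out as `cluster_id<TAB>seed_id<TAB>comma-separated member IDs`. The map layout is an assumption, not a documented format of `extract_darkmatter.py`.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs; the id is the header up to the first whitespace."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def load_cluster_map(lines):
    """Map cluster ID -> set of sequence IDs (seed plus members); assumed layout."""
    clusters = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        cluster_id, seed, members = fields[0], fields[1], fields[2]
        clusters[cluster_id] = {seed, *filter(None, members.split(","))}
    return clusters


def extract_darkmatter(fasta_lines, sim_files, map_files):
    """Return (id, sequence) pairs for proteins with no similarity hit."""
    clusters = {}
    for map_lines in map_files:  # one per -m option
        clusters.update(load_cluster_map(map_lines))
    hit_ids = set()
    for sim_lines in sim_files:  # one per -s option
        hit_ids |= {l.split("\t", 1)[0] for l in sim_lines if l.strip()}
    covered = set(hit_ids)
    for hit in hit_ids:  # a hit on a cluster covers all of its members
        covered |= clusters.get(hit, set())
    return [(sid, seq) for sid, seq in read_fasta(fasta_lines) if sid not in covered]
```

Each argument is a list of lines, or a list of line lists for the repeatable `-s` and `-m` options, so open file handles can be passed in directly.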

preProcess preprocess fasta

Remove reads from FASTA files based on sequence statistics. Return one FASTA file of reads that passed and one of reads that were removed.
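The inputs wired into preProcess (`filterLn`, `deviation`, `filterAmbig`, `maxAmbig`, plus stats from qcBasic) suggest two filters. One drops reads whose length lies too many standard deviations from the mean. The other drops reads with too many ambiguous bases. The sketch below computes the length statistics directly instead of reading them from qcBasic. It treats any character other than A, C, G or T as ambiguous and uses hypothetical default values. It is an assumed reading of those inputs, not the tool's actual code.

```python
import statistics


def preprocess(records, filter_ln=True, deviation=2.0, filter_ambig=True, max_ambig=5):
    """Split (id, sequence) records into (passed, removed) lists.

    Defaults are hypothetical; "ambiguous" is assumed to mean any
    character other than A, C, G, T (case-insensitive).
    """
    records = list(records)
    lengths = [len(seq) for _, seq in records]
    mean = statistics.mean(lengths) if lengths else 0.0
    sd = statistics.pstdev(lengths) if lengths else 0.0
    low, high = mean - deviation * sd, mean + deviation * sd

    passed, removed = [], []
    for record in records:
        seq = record[1]
        bad_length = filter_ln and not (low <= len(seq) <= high)
        n_ambig = sum(1 for base in seq.upper() if base not in "ACGT")
        bad_ambig = filter_ambig and n_ambig > max_ambig
        (removed if bad_length or bad_ambig else passed).append(record)
    return passed, removed
```

With four 100-base reads and one 10-base read, the mean is 82 and the population standard deviation is 36. At `deviation=1.5` the accepted length range is therefore 28 to 136, and the short read is removed.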

indexSimSeq index sim seq

Create a sorted, filtered similarity file with feature sequences, indexed by md5
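Indexing a similarity file by md5 usually means sorting the rows so that all hits for one md5 are contiguous. An index can then record each md5's byte offset and byte length, so a single seek retrieves all of its hits. The sketch below does only the sort-and-index part. It assumes tab-separated rows with the md5 in the second column, and it does not attach feature sequences as the real step does.

```python
def sort_and_index(sim_lines, md5_column=1):
    """Sort similarity rows by md5 and index md5 -> (byte offset, byte length).

    md5_column=1 (second column) is an assumption about the sims layout.
    """
    rows = [l if l.endswith("\n") else l + "\n" for l in sim_lines if l.strip()]
    rows.sort(key=lambda row: row.split("\t")[md5_column])  # stable sort
    index, offset = {}, 0
    for row in rows:
        md5 = row.split("\t")[md5_column]
        size = len(row.encode("utf-8"))
        start, length = index.get(md5, (offset, 0))
        index[md5] = (start, length + size)  # rows per md5 are contiguous
        offset += size
    return "".join(rows).encode("utf-8"), index


def lookup(data, index, md5):
    """Return the similarity rows for one md5 using a single slice."""
    start, length = index[md5]
    return data[start:start + length].decode("utf-8")
```

On disk, the same offsets work with `file.seek(start)` followed by `file.read(length)` on a file opened in binary mode.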

rnaAnnotate rna annotation

RNAs - predict, cluster, identify, annotate

protAnnotate protein annotation

Proteins - predict, filter, cluster, identify, annotate

dereplication
../Tools/dereplication.tool.cwl (CommandLineTool)
dereplication

Keep only one sequence from each set of sequences that share an identical prefix
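Prefix dereplication keys each sequence on its first `prefixLength` characters. The workflow input `derepPrefix` is wired to that parameter. One sequence is kept per key, and every other sequence with the same key goes to the removed set. The sketch below keeps the first occurrence and compares prefixes exactly, with no case folding. Which duplicate the real tool keeps is not stated here, so that choice is an assumption. A sequence shorter than the prefix length is keyed on the whole sequence.

```python
def dereplicate(records, prefix_length):
    """Split (id, sequence) records into (passed, removed) by identical prefix.

    Keeps the first record seen for each prefix (an assumed tie-break).
    """
    if prefix_length < 1:
        raise ValueError("prefix_length must be at least 1")
    seen = set()
    passed, removed = [], []
    for seq_id, seq in records:
        key = seq[:prefix_length]
        if key in seen:
            removed.append((seq_id, seq))
        else:
            seen.add(key)
            passed.append((seq_id, seq))
    return passed, removed
```

Using a set of prefixes keeps memory proportional to the number of distinct prefixes rather than to total sequence length. This is one reason a prefix-based check is cheaper than comparing whole sequences.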

Outputs

ID Type Label Doc
qcStatOut File
seqBinOut File
simSeqOut File
rnaSimsOut File
seqStatOut File
protSimsOut File
qcSummaryOut File
adapterPassed File
darkmatterOut File
lcaProfileOut File
md5ProfileOut File
rnaFeatureOut File
protFeatureOut File
rnaClustMapOut File
rnaClustSeqOut File
sourceStatsOut File
protClustMapOut File
protClustSeqOut File
preProcessPassed File
preProcessRemoved File
dereplicationPassed File
dereplicationRemoved File
protFilterFeatureOut File
Permalink: https://w3id.org/cwl/view/git/4e4d2e674bde612f98f2b0370445f8b2a47587df/CWL/Workflows/wgs-noscreen-fasta.workflow.cwl