Workflow: WGS and MT analysis for fastq files

Fetched 2023-01-09 01:14:39 GMT

RNA / protein: QC, preprocess, filter, annotation, index, abundance

[Workflow graph (SVG): workflow inputs -> steps (qcBasic, preProcess, dereplication, orgScreen, rnaAnnotate, protAnnotate, indexSimSeq, darkmatter, abundance) -> workflow outputs]
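
The step-to-step edges in the workflow graph (for example qcBasic -> preProcess carrying stats, and orgScreen -> protAnnotate carrying sequences) can be written out as a dependency map. The sketch below transcribes those edges and derives one valid execution order with the standard-library graphlib module (Python 3.9+). It only illustrates the data flow; a CWL runner works out scheduling from the step inputs on its own.

```python
from graphlib import TopologicalSorter

# Step -> steps whose outputs it consumes, transcribed from the edges
# drawn between step nodes in the workflow graph.
STEP_DEPS = {
    "qcBasic": set(),
    "preProcess": {"qcBasic"},                      # stats
    "dereplication": {"preProcess"},                # sequences
    "rnaAnnotate": {"preProcess"},                  # sequences
    "orgScreen": {"dereplication"},                 # sequences
    "protAnnotate": {"orgScreen", "rnaAnnotate"},   # sequences, rnaClustMap, rnaSims
    "indexSimSeq": {"rnaAnnotate", "protAnnotate"},
    "darkmatter": {"rnaAnnotate", "protAnnotate"},
    "abundance": {"rnaAnnotate", "protAnnotate", "indexSimSeq"},
}


def execution_order(deps):
    """Return one order in which every step runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(execution_order(STEP_DEPS))
```

Steps with no path between them, such as dereplication and rnaAnnotate, can run in parallel once preProcess has finished.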

Inputs

ID            Type
jobid         String
m5nrBDB       File
m5nrSCG       File
filterLn      Boolean
indexDir      Directory
m5nrFull      File[]
maxAmbig      Integer
deviation     Float
indexName     String (optional)
m5rnaFull     File
sequences     File
m5rnaClust    File
m5rnaIndex    Directory
derepPrefix   Integer
filterAmbig   Boolean
m5rnaPrefix   String

Steps

ID Runs Label Doc
qcBasic
abundance (label: abundance)

Abundance profiles from annotated files, for protein and/or RNA.
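
As a rough illustration of an md5 abundance profile, the sketch below counts how many reads support each md5 hit, expanding each cluster representative by its cluster members. The row layout (query ID, then hit md5), the cluster-map shape, and the rule that a query with several hits counts toward each md5 are all assumptions; the real step may weight or select hits differently, and the LCA profile is not covered.

```python
from collections import Counter


def md5_abundance(sim_lines, clust_map):
    """Count reads supporting each md5 hit (hypothetical sketch).

    sim_lines: tab-separated similarity rows; columns 1 and 2 are assumed
               to be the query ID and the hit md5.
    clust_map: cluster representative ID -> list of member IDs (assumed).
    """
    counts = Counter()
    for line in sim_lines:
        if not line.strip():
            continue
        query, md5 = line.rstrip("\n").split("\t")[:2]
        # A representative stands for itself plus every member of its cluster.
        counts[md5] += 1 + len(clust_map.get(query, []))
    return counts
```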

orgScreen (label: screen out taxa)

Remove sequences that align against a reference set using bowtie2. The references are preformatted bowtie2 index files.
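
One way to keep only the reads that do not align is bowtie2's `--un` option, which writes unaligned unpaired reads to a file in the same format as the input. The sketch builds such a command from the indexDir and indexName inputs; the exact flags the orgScreen tool passes are not shown on this page, so this flag set and the index name in the test (`h_sapiens`) are assumptions.

```python
import os


def bowtie2_screen_cmd(index_dir, index_name, reads_fasta, passed_fasta, threads=1):
    """Build a bowtie2 command whose unaligned reads are the screen output (sketch)."""
    return [
        "bowtie2",
        "-f",                                       # input reads are FASTA
        "-x", os.path.join(index_dir, index_name),  # prebuilt index prefix
        "-U", reads_fasta,                          # unpaired reads
        "--un", passed_fasta,                       # reads that fail to align = passed
        "-p", str(threads),
        "-S", os.devnull,                           # discard the alignments themselves
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting issues with paths.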

darkmatter (label: extract darkmatter)
Runs: ../Tools/extract_darkmatter.tool.cwl (CommandLineTool)

Retrieve predicted proteins that have no similarity hits:

extract_darkmatter.py -i <input> -s <sim 1> -s <sim 2> -m <clust map 1> -m <clust map 2> -o <outName>
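
The selection logic behind the `-s` (similarity file) and `-m` (cluster map) arguments can be sketched as follows. This is not extract_darkmatter.py itself: it assumes the query ID is the first column of each similarity row, that similarity search ran on cluster representatives, and that a cluster map is a representative -> members mapping.

```python
def extract_darkmatter(proteins, sim_files, clust_maps):
    """Return predicted proteins with no similarity hit (hypothetical sketch).

    proteins:   protein ID -> sequence (the -i input)
    sim_files:  one iterable of tab-separated rows per -s input;
                column 1 is assumed to be the query ID
    clust_maps: one dict per -m input, representative ID -> member IDs
    """
    hit = set()
    for rows in sim_files:
        for row in rows:
            if row.strip():
                hit.add(row.split("\t", 1)[0])
    # Assuming sims were computed on cluster representatives, members of a
    # cluster whose representative has a hit are not darkmatter either.
    for cmap in clust_maps:
        for rep, members in cmap.items():
            if rep in hit:
                hit.update(members)
    return {pid: seq for pid, seq in proteins.items() if pid not in hit}
```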

preProcess (label: preprocess fasta)

Remove reads from FASTA files based on sequence statistics. Returns FASTA files of the reads that passed and the reads that were removed.
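
The inputs wired into this step (deviation, maxAmbig, filterLn, filterAmbig) suggest a length filter and an ambiguous-base filter. The sketch below is one plausible reading under stated assumptions: reads are kept if their length lies within mean ± deviation × standard deviation, and if they contain at most max_ambig non-ACGT bases. The real step receives its statistics from qcBasic instead of recomputing them, and its exact rules may differ.

```python
import statistics


def preprocess(reads, deviation, max_ambig, filter_ln=True, filter_ambig=True):
    """Split reads into (passed, removed) dicts (hypothetical sketch).

    reads: read ID -> sequence. Parameter names mirror the workflow inputs
    deviation, maxAmbig, filterLn and filterAmbig (semantics assumed).
    """
    if not reads:
        return {}, {}
    lengths = [len(seq) for seq in reads.values()]
    mean = statistics.mean(lengths)
    sd = statistics.pstdev(lengths)
    # Assumed length rule: keep lengths within mean +/- deviation * sd.
    low, high = mean - deviation * sd, mean + deviation * sd
    passed, removed = {}, {}
    for rid, seq in reads.items():
        ok = True
        if filter_ln and not (low <= len(seq) <= high):
            ok = False
        # Ambiguous bases: anything other than A, C, G or T.
        if filter_ambig and sum(base not in "ACGT" for base in seq.upper()) > max_ambig:
            ok = False
        (passed if ok else removed)[rid] = seq
    return passed, removed
```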

indexSimSeq (label: index sim seq)

Create a sorted, filtered similarity file that includes feature sequences, and index it by MD5.
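
Sorting by md5 makes all rows for one md5 contiguous, so an index only needs a byte offset and length per md5. The sketch assumes the hit md5 is the second tab-separated column and appends the query's feature sequence as an extra column; the filtering part of the real step is omitted.

```python
def index_sims_by_md5(sim_lines, feature_seqs):
    """Sort sim rows by md5, append feature sequences, index by md5 (sketch).

    Assumed row layout: query ID <TAB> hit md5 <TAB> ...
    Returns (data, index) where index[md5] = (byte offset, byte length)
    of the contiguous block of rows for that md5 in data.
    """
    rows = [line.rstrip("\n") for line in sim_lines if line.strip()]
    rows.sort(key=lambda row: row.split("\t")[1])  # stable sort by md5
    chunks, index, offset = [], {}, 0
    for row in rows:
        query, md5 = row.split("\t")[:2]
        raw = (row + "\t" + feature_seqs.get(query, "") + "\n").encode("utf-8")
        start, length = index.get(md5, (offset, 0))
        index[md5] = (start, length + len(raw))
        chunks.append(raw)
        offset += len(raw)
    return b"".join(chunks), index
```

A lookup is then a single slice, `data[start:start + length]`, which is what makes the md5 index useful to the downstream abundance step.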

rnaAnnotate (label: rna annotation)

RNAs: predict, cluster, identify, annotate.

protAnnotate (label: protein annotation)

Proteins: predict, filter, cluster, identify, annotate.

dereplication (label: dereplication)
Runs: ../Tools/dereplication.tool.cwl (CommandLineTool)

Keep only one sequence from each set of sequences that share an identical prefix. The prefix length is set by the derepPrefix input.
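
Prefix-based dereplication can be sketched with a set of prefixes already seen: the first read with a given prefix passes and later ones are removed. Which duplicate the real tool keeps, and whether it compares case-insensitively as this sketch does, are assumptions.

```python
def dereplicate(reads, prefix_len):
    """Keep the first read seen for each distinct prefix (hypothetical sketch).

    reads: read ID -> sequence, in input order
    prefix_len: number of leading bases compared (the derepPrefix input)
    Returns (passed, removed) dicts.
    """
    seen = set()
    passed, removed = {}, {}
    for rid, seq in reads.items():
        key = seq[:prefix_len].upper()  # case-insensitive comparison (assumed)
        if key in seen:
            removed[rid] = seq
        else:
            seen.add(key)
            passed[rid] = seq
    return passed, removed
```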

Outputs

ID                    Type
qcStatOut             File
seqBinOut             File
simSeqOut             File
rnaSimsOut            File
seqStatOut            File
protSimsOut           File
qcSummaryOut          File
adapterPassed         File
darkmatterOut         File
lcaProfileOut         File
md5ProfileOut         File
rnaFeatureOut         File
protFeatureOut        File
rnaClustMapOut        File
rnaClustSeqOut        File
sourceStatsOut        File
orgScreenPassed       File
protClustMapOut       File
protClustSeqOut       File
preProcessPassed      File
preProcessRemoved     File
dereplicationPassed   File
dereplicationRemoved  File
protFilterFeatureOut  File
Permalink: https://w3id.org/cwl/view/git/091374dc59a23966338638a668ae397d4ee20b2f/CWL/Workflows/wgs-fasta.workflow.cwl