Workflow: stats.cwl

Fetched 2023-01-14 09:37:53 GMT


ID                 Type     Title  Doc
reads              File[]
assembler          String
sequences          File
output_dest        String
min_contig_length  Integer
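
The five inputs above map onto a CWL job object, where each File input is an object with class "File". A minimal sketch of building such a job in Python follows; the file paths and the assembler, output_dest and min_contig_length values are hypothetical placeholders, not values taken from this workflow.

```python
import json


def make_job(reads, sequences, assembler, output_dest, min_contig_length):
    """Build a CWL job object matching the five workflow inputs."""

    def as_file(path):
        # CWL represents a file input as an object with class "File".
        return {"class": "File", "path": path}

    return {
        "reads": [as_file(p) for p in reads],  # File[]
        "sequences": as_file(sequences),       # File
        "assembler": assembler,                # String
        "output_dest": output_dest,            # String
        "min_contig_length": int(min_contig_length),  # Integer
    }


# All values below are hypothetical placeholders.
job = make_job(
    reads=["sample_1.fastq.gz", "sample_2.fastq.gz"],
    sequences="contigs.fasta",
    assembler="metaspades",
    output_dest="stats_out",
    min_contig_length=500,
)
job_json = json.dumps(job, indent=2)
```

Written to a file, job_json could be passed to a CWL runner alongside stats.cwl, assuming the runner accepts JSON job files (cwltool does).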


ID Runs Label Doc
readfq.cwl (CommandLineTool)

usage: kseq_fastq_base input.fastq.gz [input2.fastq.gz input3.fastq.gz ...]

Script to calculate base count of fastq files.

positional arguments:
  input.fastq.gz  Raw read files
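
To illustrate what a base count over FASTQ files involves, here is a rough Python sketch. It is not the kseq_fastq_base implementation; it assumes gzipped input with strict four-line records (no wrapped sequence lines), and count_bases is a hypothetical name.

```python
import gzip


def count_bases(paths):
    """Sum the sequence lengths over gzipped, four-line FASTQ records."""
    total = 0
    for path in paths:
        with gzip.open(path, "rt") as handle:
            for line_number, line in enumerate(handle):
                # In a four-line record the second line holds the sequence.
                if line_number % 4 == 1:
                    total += len(line.strip())
    return total
```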

bwa-mem.cwl (CommandLineTool)

Usage: bwa mem [options] <idxbase> <in1.fq> [in2.fq]

Algorithm options:
  -w INT     band width for banded alignment [100]
  -d INT     off-diagonal X-dropoff [100]
  -r FLOAT   look for internal seeds inside a seed longer than {-k} * FLOAT [1.5]
  -y INT     seed occurrence for the 3rd round seeding [20]
  -c INT     skip seeds with more than INT occurrences [500]
  -D FLOAT   drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
  -W INT     discard a chain if seeded bases shorter than INT [0]
  -m INT     perform at most INT rounds of mate rescues for each read [50]
  -S         skip mate rescue
  -P         skip pairing; mate rescue performed unless -S also in use
  -e         discard full-length exact matches

Scoring options:

  -A INT        score for a sequence match, which scales options -TdBOELU unless overridden [1]
  -B INT        penalty for a mismatch [4]
  -O INT[,INT]  gap open penalties for deletions and insertions [6,6]
  -E INT[,INT]  gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
  -L INT[,INT]  penalty for 5'- and 3'-end clipping [5,5]
  -U INT        penalty for an unpaired read pair [17]

  -x STR        read type. Setting -x changes multiple parameters unless overridden [null]
                pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0 (PacBio reads to ref)
                ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0 (Oxford Nanopore 2D-reads to ref)
                intractg: -B9 -O16 -L5 (intra-species contigs to ref)

Input/output options:

  -p            smart pairing (ignoring in2.fq)
  -R STR        read group header line such as '@RG\tID:foo\tSM:bar' [null]
  -H STR/FILE   insert STR to header if it starts with @; or insert lines in FILE [null]
  -j            treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)

  -v INT        verbose level: 1=error, 2=warning, 3=message, 4+=debugging [3]
  -T INT        minimum score to output [30]
  -h INT[,INT]  if there are <INT hits with score >80% of the max score, output all in XA [5,200]
  -a            output all alignments for SE or unpaired PE
  -C            append FASTA/FASTQ comment to SAM output
  -V            output the reference FASTA header in the XR tag
  -Y            use soft clipping for supplementary alignments
  -M            mark shorter split hits as secondary

  -I FLOAT[,FLOAT[,INT[,INT]]]
                specify the mean, standard deviation (10% of the mean if absent), max (4 sigma from the mean if absent) and min of the insert size distribution. FR orientation only. [inferred]

Note: Please read the man page for detailed description of the command line and options.
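
The usage line and options above can be assembled into an argument list programmatically. The sketch below is only an illustration: bwa_mem_command and its parameter names are hypothetical, and it covers just three of the documented options (-T, -R, -M).

```python
def bwa_mem_command(idxbase, in1, in2=None, min_score=None,
                    read_group=None, mark_secondary=False):
    """Return an argv list for: bwa mem [options] <idxbase> <in1.fq> [in2.fq]."""
    cmd = ["bwa", "mem"]
    if min_score is not None:
        cmd += ["-T", str(min_score)]  # minimum score to output
    if read_group is not None:
        cmd += ["-R", read_group]      # read group header line
    if mark_secondary:
        cmd.append("-M")               # mark shorter split hits as secondary
    cmd += [idxbase, in1]
    if in2 is not None:
        cmd.append(in2)                # optional second read file
    return cmd
```

The returned list can be handed to subprocess.run with stdout redirected to a file, since bwa mem writes its SAM output to standard output.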

bwa-index.cwl (CommandLineTool)

Usage: bwa index [options] <in.fasta>

Options:
  -a STR    BWT construction algorithm: bwtsw or is [auto]
  -p STR    prefix of the index [same as fasta name]
  -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
  -6        index files named as <in.fasta>.64.* instead of <in.fasta>.*

Warning: `-a bwtsw' does not work for short genomes, while `-a is' and `-a div' do not work for long genomes.
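
The warning above suggests choosing the construction algorithm by genome size. A small sketch of that choice follows; bwa_index_command is a hypothetical helper and the 50,000,000-base cutoff is an assumed illustrative value, not one stated in this document. Omitting -a altogether leaves the choice to bwa ([auto]).

```python
def bwa_index_command(fasta, genome_size, prefix=None,
                      bwtsw_threshold=50_000_000):
    """Return an argv list for: bwa index [options] <in.fasta>.

    bwtsw_threshold is an assumed cutoff used only for illustration.
    """
    # bwtsw for long genomes, is for short ones, per the warning above.
    algorithm = "bwtsw" if genome_size >= bwtsw_threshold else "is"
    cmd = ["bwa", "index", "-a", algorithm]
    if prefix is not None:
        cmd += ["-p", prefix]  # prefix of the index
    cmd.append(fasta)
    return cmd
```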

metabat-jgi-summarise.cwl (CommandLineTool)

Usage: jgi_summarize_bam_contig_depths <options> sortedBam1 [ sortedBam2 ...]

where options include:
  --outputDepth arg        The file to put the contig by bam depth matrix (default: STDOUT)
  --percentIdentity arg    The minimum end-to-end % identity of qualifying reads (default: 97)
  --pairedContigs arg      The file to output the sparse matrix of contigs which paired reads span (default: none)
  --unmappedFastq arg      The prefix to output unmapped reads from each bam file suffixed by 'bamfile.bam.fastq.gz'
  --noIntraDepthVariance   Do not include variance from mean depth along the contig
  --showDepth              Output a .depth file per bam for each contig base
  --minMapQual arg         The minimum mapping quality necessary to count the read as mapped (default: 0)
  --weightMapQual arg      Weight per-base depth based on the MQ of the read (i.e uniqueness) (default: 0.0 (disabled))
  --includeEdgeBases       When calculating depth & variance, include the 1-readlength edges (off by default)
  --maxEdgeBases           When calculating depth & variance, and not --includeEdgeBases, the maximum length (default:75)
  --referenceFasta arg     The reference file. (It must be the same fasta that bams used)

Options that require a --referenceFasta
  --outputGC arg           The file to print the gc coverage histogram
  --gcWindow arg           The sliding window size for GC calculations
  --outputReadStats arg    The file to print the per read statistics
  --outputKmers arg        The file to print the perfect kmer counts

Options to control shredding contigs that are under represented by the reads
  --shredLength arg        The maximum length of the shreds
  --shredDepth arg         The depth to generate overlapping shreds
  --minContigLength arg    The minimum length of contig to include for mapping and shredding
  --minContigDepth arg     The minimum depth along contig at which to break the contig
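
The depth matrix written by --outputDepth can be read back for downstream use. The sketch below assumes a tab-separated file whose header includes the columns contigName, contigLen and totalAvgDepth; that layout is an assumption about the tool's output, not something stated on this page, and read_depth_table is a hypothetical name.

```python
import csv
import io


def read_depth_table(text):
    """Parse a depth matrix into (name, length, mean_depth) tuples.

    Assumes tab-separated text with contigName, contigLen and
    totalAvgDepth columns; any further per-BAM columns are ignored.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append((
            record["contigName"],
            int(float(record["contigLen"])),  # tolerate "1500" or "1500.0"
            float(record["totalAvgDepth"]),
        ))
    return rows
```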

stats-report.cwl (CommandLineTool)

usage: [-h] output coverage_file base_count

Script to calculate coverage from file and output report

positional arguments:
  output         Output file
  coverage_file  file
  base_count     Sum of base count for all input files

optional arguments:
  -h, --help  show this help message and exit
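
The page does not show how the report derives coverage from coverage_file and base_count. One plausible calculation, given purely as a sketch under that caveat, weights each contig's mean depth by its length and compares the result with the total read bases; coverage_summary and its output keys are hypothetical.

```python
def coverage_summary(contigs, base_count):
    """Summarise coverage from (length, mean_depth) pairs.

    This is an assumed formula, not the stats-report script's own logic.
    """
    contigs = list(contigs)
    assembly_length = sum(length for length, _ in contigs)
    # Approximate number of read bases mapped onto the assembly.
    mapped_bases = sum(length * depth for length, depth in contigs)
    return {
        "assembly_length": assembly_length,
        "mean_depth": mapped_bases / assembly_length if assembly_length else 0.0,
        "fraction_of_read_bases": mapped_bases / base_count if base_count else 0.0,
    }
```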

samtools-sort.cwl (CommandLineTool)

samtools-sort.cwl is developed for CWL consortium

Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -o FILE    Write final output to FILE rather than standard output
  -O FORMAT  Write output as FORMAT ('sam'/'bam'/'cram')
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
             (either -O or -T is required)
  -@ INT     Set number of sorting and compression threads [1]

Legacy usage: samtools sort [options...] <in.bam> <out.prefix>
Options:
  -f         Use <out.prefix> as full final filename rather than prefix
  -o         Write final output to stdout rather than <out.prefix>.bam
  -l,m,n,@   Similar to corresponding options above

samtools-view.cwl (CommandLineTool)

samtools-view.cwl is developed for CWL consortium

Usage: samtools view [options] <in.bam>|<in.sam>|<in.cram> [region ...]

Options:
  -b         output BAM
  -C         output CRAM (requires -T)
  -1         use fast BAM compression (implies -b)
  -u         uncompressed BAM output (implies -b)
  -h         include header in SAM output
  -H         print SAM header only (no alignments)
  -c         print only the count of matching records
  -o FILE    output file name [stdout]
  -U FILE    output reads not selected by filters to FILE [null]
  -t FILE    FILE listing reference names and lengths (see long help) [null]
  -T FILE    reference sequence FASTA FILE [null]
  -L FILE    only include reads overlapping this BED FILE [null]
  -r STR     only include reads in read group STR [null]
  -R FILE    only include reads with read group listed in FILE [null]
  -q INT     only include reads with mapping quality >= INT [0]
  -l STR     only include reads in library STR [null]
  -m INT     only include reads with number of CIGAR operations consuming query sequence >= INT [0]
  -f INT     only include reads with all bits set in INT set in FLAG [0]
  -F INT     only include reads with none of the bits set in INT set in FLAG [0]
  -x STR     read tag to strip (repeatable) [null]
  -B         collapse the backward CIGAR operation
  -s FLOAT   integer part sets seed of random number generator [0]; rest sets fraction of templates to subsample [no subsampling]
  -@ INT     number of BAM compression threads [0]
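
The -f and -F descriptions above amount to two bitmask tests on the FLAG field. A minimal sketch of that logic follows; passes_flag_filter is a hypothetical name, and the example values use the SAM flag bits 0x2 (properly paired) and 0x4 (unmapped).

```python
def passes_flag_filter(flag, require=0, exclude=0):
    """Mimic samtools view -f/-F: keep a read only if every bit in
    `require` is set in FLAG and no bit in `exclude` is set."""
    return (flag & require) == require and (flag & exclude) == 0


# FLAG 99 = paired, properly paired, mate reverse, first in pair.
# FLAG 77 = paired, unmapped, mate unmapped, first in pair.
keep_mapped_pair = passes_flag_filter(99, require=0x2, exclude=0x4)
drop_unmapped = passes_flag_filter(77, require=0x2, exclude=0x4)
```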

samtools-index.cwl (CommandLineTool)

samtools-index.cwl is developed for CWL consortium
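
The three samtools steps are commonly chained as view, then sort, then index. The page does not show how the steps are wired together, so the order and the intermediate file names in this sketch are assumptions; bam_commands is a hypothetical helper.

```python
def bam_commands(sam_path, prefix):
    """Return argv lists converting SAM to a sorted, indexed BAM.

    The step order and the derived file names are assumptions.
    """
    unsorted_bam = f"{prefix}.unsorted.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Convert SAM to BAM (-b) and write to a named file (-o).
        ["samtools", "view", "-b", "-o", unsorted_bam, sam_path],
        # Sort, writing output to -o with temporary files under -T.
        ["samtools", "sort", "-o", sorted_bam, "-T", f"{prefix}.tmp", unsorted_bam],
        # Index the sorted BAM.
        ["samtools", "index", sorted_bam],
    ]
```

Each list could be run in turn with subprocess.run(cmd, check=True), which stops the chain at the first failing command.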


ID                       Type  Label  Doc
logfile                  File
bwa_mem_output           File
bwa_index_output         File
samtools_sort_output     File
samtools_view_output     File
samtools_index_output    File
metabat_coverage_output  File