Workflow: 03-map-pe-blacklist-removal.cwl

Fetched 2023-01-10 03:20:04 GMT

ATAC-seq 03 mapping - reads: PE - blacklist removal

[Figure: workflow graph connecting the workflow inputs, the steps, and the workflow outputs.]

Inputs

ID Type Title Doc
nthreads Integer
picard_jar_path String

Picard Java jar file

picard_java_opts String (Optional)

JVM arguments should be a quoted, space-separated list (e.g. "-Xms128m -Xmx512m")

genome_sizes_file File

Genome sizes tab-delimited file (used in samtools)

input_fastq_read1_files File[]

Input fastq files for paired_read1

input_fastq_read2_files File[]

Input fastq files for paired_read2

ENCODE_blacklist_bedfile File

Bedfile containing ENCODE consensus blacklist regions to be excluded.

genome_ref_first_index_file File

First Bowtie index file for the reference genome (e.g. *1.ebwt). The remaining index files must be in the same folder.
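
The inputs above can be supplied to a CWL runner as a job file. The following is a minimal sketch in Python that assembles such a job object and serializes it as JSON. The input IDs come from the table above; every path, the default thread count, and the helper names `cwl_file` and `build_job` are hypothetical placeholders. The check that the two FASTQ arrays have equal length is an assumption about how the reads are paired, not something this page states.

```python
import json


def cwl_file(path):
    """Return a CWL File object for a path."""
    return {"class": "File", "path": path}


def build_job(read1_paths, read2_paths, nthreads=4):
    """Assemble a job object keyed by the workflow's input IDs.

    All paths below are hypothetical placeholders.
    """
    # Assumption: read1 and read2 files are paired by position.
    if len(read1_paths) != len(read2_paths):
        raise ValueError("read1 and read2 lists must have the same length")
    return {
        "nthreads": nthreads,
        "picard_jar_path": "/opt/picard/picard.jar",
        "picard_java_opts": "-Xms128m -Xmx512m",  # optional input
        "genome_sizes_file": cwl_file("genome/genome.chrom.sizes"),
        "input_fastq_read1_files": [cwl_file(p) for p in read1_paths],
        "input_fastq_read2_files": [cwl_file(p) for p in read2_paths],
        "ENCODE_blacklist_bedfile": cwl_file("genome/blacklist.bed"),
        "genome_ref_first_index_file": cwl_file("genome/index/genome.1.ebwt"),
    }


if __name__ == "__main__":
    job = build_job(["sample_R1.fastq"], ["sample_R2.fastq"])
    print(json.dumps(job, indent=2))
```

Saved as, say, job.json, the result could be passed to a runner such as cwltool together with the workflow file, assuming the referenced files exist at those paths.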

Steps

ID Runs Label Doc
sam2bam
../map/samtools2bam.cwl (CommandLineTool)
bowtie-pe
../map/bowtie-pe.cwl (CommandLineTool)
sort_bams
../map/samtools-sort.cwl (CommandLineTool)
index_bams
../map/samtools-index.cwl (CommandLineTool)
bam_idxstats
../map/samtools-idxstats.cwl (CommandLineTool)
preseq-c-curve
../map/preseq-c_curve.cwl (CommandLineTool)

Usage: c_curve [OPTIONS] <sorted-bed-file>

Options:
  -o, -output    yield output file (default: stdout)
  -s, -step      step size in extrapolations (default: 1e+06)
  -v, -verbose   print more information
  -P, -pe        input is paired end read file
  -H, -hist      input is a text file containing the observed histogram
  -V, -vals      input is a text file containing only the observed counts
  -B, -bam       input is in BAM format
  -l, -seg_len   maximum segment length when merging paired end bam reads (default: 5000)

Help options:
  -?, -help      print this help message
  -about         print about message

filter-unmapped
../map/samtools-filter-unmapped.cwl (CommandLineTool)
filtered2sorted
../map/samtools-sort.cwl (CommandLineTool)
mark_duplicates
../map/picard-MarkDuplicates.cwl (CommandLineTool)
sort_dedup_bams
../map/samtools-sort.cwl (CommandLineTool)
index_dedup_bams
../map/samtools-index.cwl (CommandLineTool)
sort_masked_bams
../map/samtools-sort.cwl (CommandLineTool)
index_masked_bams
../map/samtools-index.cwl (CommandLineTool)
remove_duplicates
../map/samtools-view.cwl (CommandLineTool)
extract_basename_1
../utils/extract-basename.cwl (CommandLineTool)

Extracts the base name of a file

extract_basename_2
../utils/remove-extension.cwl (CommandLineTool)

Removes the extension from a file name

index_filtered_bam
../map/samtools-index.cwl (CommandLineTool)
mapped_reads_count
../map/bowtie-log-read-count.cwl (CommandLineTool)

Get number of processed reads from Bowtie log.

percent_uniq_reads
../map/preseq-percent-uniq-reads.cwl (CommandLineTool)

Get the percentage of unique reads from the preseq c_curve output.

masked_file_basename
../utils/extract-basename.cwl (CommandLineTool)

Extracts the base name of a file

sort_dups_marked_bams
../map/samtools-sort.cwl (CommandLineTool)
index_dups_marked_bams
../map/samtools-index.cwl (CommandLineTool)
remove_encode_blacklist
../map/bedtools-intersect.cwl (CommandLineTool)

Tool: bedtools intersect (aka intersectBed)
Version: v2.25.0
Summary: Report overlaps between two feature files.

Usage: bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>

Note: -b may be followed with multiple databases and/or wildcard (*) character(s).

Options:

-wa Write the original entry in A for each overlap.

-wb Write the original entry in B for each overlap. - Useful for knowing _what_ A overlaps. Restricted by -f and -r.

-loj Perform a "left outer join". That is, for each feature in A report each overlap with B. If no overlaps are found, report a NULL feature for B.

-wo Write the original A and B entries plus the number of base pairs of overlap between the two features. - Overlaps restricted by -f and -r. Only A features with overlap are reported.

-wao Write the original A and B entries plus the number of base pairs of overlap between the two features. - Overlapping features restricted by -f and -r. However, A features w/o overlap are also reported with a NULL B feature and overlap = 0.

-u Write the original A entry _once_ if _any_ overlaps found in B. - In other words, just report the fact >=1 hit was found. - Overlaps restricted by -f and -r.

-c For each entry in A, report the number of overlaps with B. - Reports 0 for A entries that have no overlap with B. - Overlaps restricted by -f and -r.

-v Only report those entries in A that have _no overlaps_ with B. - Similar to "grep -v" (an homage).

-ubam Write uncompressed BAM output. Default writes compressed BAM.

-s Require same strandedness. That is, only report hits in B that overlap A on the _same_ strand. - By default, overlaps are reported without respect to strand.

-S Require different strandedness. That is, only report hits in B that overlap A on the _opposite_ strand. - By default, overlaps are reported without respect to strand.

-f Minimum overlap required as a fraction of A. - Default is 1E-9 (i.e., 1bp). - FLOAT (e.g. 0.50)

-F Minimum overlap required as a fraction of B. - Default is 1E-9 (i.e., 1bp). - FLOAT (e.g. 0.50)

-r Require that the fraction overlap be reciprocal for A AND B. - In other words, if -f is 0.90 and -r is used, this requires that B overlap 90 percent of A and A _also_ overlaps 90 percent of B.

-e Require that the minimum fraction be satisfied for A OR B. - In other words, if -e is used with -f 0.90 and -F 0.10 this requires that either 90 percent of A is covered OR 10 percent of B is covered. Without -e, both fractions would have to be satisfied.

-split Treat "split" BAM or BED12 entries as distinct BED intervals.

-g Provide a genome file to enforce consistent chromosome sort order across input files. Only applies when used with -sorted option.

-nonamecheck For sorted data, don't throw an error if the file has different naming conventions for the same chromosome. ex. "chr1" vs "chr01".

-sorted Use the "chromsweep" algorithm for sorted (-k1,1 -k2,2n) input.

-names When using multiple databases, provide an alias for each that will appear instead of a fileId when also printing the DB record.

-filenames When using multiple databases, show each complete filename instead of a fileId when also printing the DB record.

-sortout When using multiple databases, sort the output DB hits for each record.

-bed If using BAM input, write output as BED.

-header Print the header from the A file prior to results.

-nobuf Disable buffered output. Using this option will cause each line of output to be printed as it is generated, rather than saved in a buffer. This will make printing large output files noticeably slower, but can be useful in conjunction with other software tools and scripts that need to process one line of bedtools output at a time.

-iobuf Specify amount of memory to use for input buffer. Takes an integer argument. Optional suffixes K/M/G supported. Note: currently has no effect with compressed files.

Notes: (1) When a BAM file is used for the A file, the alignment is retained if overlaps exist, and excluded if an overlap cannot be found. If multiple overlaps exist, they are not reported, as we are only testing for one or more overlaps.

execute_pcr_bottleneck_coef

ChIP-seq - map - PCR Bottleneck Coefficients

mapped_filtered_reads_count
../peak_calling/samtools-extract-number-mapped-reads.cwl (CommandLineTool)

Extract the number of mapped reads from a BAM file using the samtools flagstat command

percent_mitochondrial_reads
../utils/idxstats-percentage-of-reads-in-chrom.cwl (ExpressionTool)
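
The table above lists the steps by ID rather than in execution order. As a rough sketch, the step-to-step connections drawn in the workflow graph can be written down as a dependency map and sorted topologically to obtain one valid run order. The edges below were transcribed from that graph, with connections to workflow inputs and outputs omitted. The names `DEPENDS_ON` and `execution_order` are illustrative only, and a CWL runner is free to schedule independent steps differently or in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps whose outputs it consumes.
DEPENDS_ON = {
    "extract_basename_2": {"extract_basename_1"},
    "bowtie-pe": {"extract_basename_2"},
    "mapped_reads_count": {"bowtie-pe"},
    "sam2bam": {"bowtie-pe"},
    "sort_bams": {"sam2bam"},
    "index_bams": {"sort_bams"},
    "bam_idxstats": {"index_bams"},
    "percent_mitochondrial_reads": {"bam_idxstats"},
    "filter-unmapped": {"sort_bams", "extract_basename_2"},
    "filtered2sorted": {"filter-unmapped"},
    "index_filtered_bam": {"filtered2sorted"},
    "preseq-c-curve": {"filtered2sorted", "extract_basename_2"},
    "percent_uniq_reads": {"preseq-c-curve"},
    "execute_pcr_bottleneck_coef": {"filtered2sorted", "extract_basename_2"},
    "remove_encode_blacklist": {"filtered2sorted", "extract_basename_2"},
    "masked_file_basename": {"remove_encode_blacklist"},
    "sort_masked_bams": {"remove_encode_blacklist"},
    "index_masked_bams": {"sort_masked_bams"},
    "mark_duplicates": {"index_masked_bams", "masked_file_basename"},
    "sort_dups_marked_bams": {"mark_duplicates"},
    "index_dups_marked_bams": {"sort_dups_marked_bams"},
    "remove_duplicates": {"index_dups_marked_bams"},
    "sort_dedup_bams": {"remove_duplicates"},
    "index_dedup_bams": {"sort_dedup_bams"},
    "mapped_filtered_reads_count": {"sort_dedup_bams"},
}


def execution_order():
    """Return one valid linear order of the steps (not the only one)."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(), start=1):
        print(f"{position:2d}. {step}")
```

The main chain that emerges is alignment (bowtie-pe), conversion and sorting, removal of unmapped reads, blacklist removal, duplicate marking, and finally duplicate removal, with the quality-control steps branching off along the way.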

Outputs

ID Type Label Doc
output_pbc_files File[]

PCR Bottleneck Coefficient files.

output_bowtie_log File[]

Bowtie log file.

output_read_count_mapped File[]

Read counts of the mapped BAM files

output_preseq_c_curve_files File[]

Preseq c_curve output files.

output_percentage_uniq_reads File[]

Percentage of unique reads from preseq c_curve output

output_read_count_mapped_filtered File[]

Read counts of the mapped and filtered BAM files

output_data_sorted_dedup_bam_files File[]

BAM files without duplicate reads.

output_percent_mitochondrial_reads File[]

Percentage of mitochondrial reads.

output_picard_mark_duplicates_files File[]

Picard MarkDuplicates metrics files.

output_data_sorted_dups_marked_bam_files File[]

BAM files with marked duplicate reads.
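
To illustrate how a value such as output_percent_mitochondrial_reads might be derived, here is a minimal Python sketch that computes the share of mapped reads on one chromosome from samtools idxstats output (tab-delimited columns: reference name, sequence length, mapped reads, unmapped reads). This is not the code of the workflow's ExpressionTool. The function name `percent_reads_in_chrom`, the chromosome name "chrM", and the choice of total mapped reads as the denominator are all assumptions.

```python
def percent_reads_in_chrom(idxstats_text, chrom="chrM"):
    """Percentage of mapped reads that fall on `chrom`.

    `idxstats_text` is the text printed by `samtools idxstats`.
    Assumption: the denominator is the total number of mapped reads.
    """
    total_mapped = 0
    chrom_mapped = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _length, mapped, _unmapped = line.split("\t")
        mapped = int(mapped)
        total_mapped += mapped
        if name == chrom:
            chrom_mapped += mapped
    if total_mapped == 0:
        return 0.0  # avoid division by zero on empty input
    return 100.0 * chrom_mapped / total_mapped


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = "chr1\t1000\t90\t0\nchrM\t100\t10\t0\n*\t0\t0\t5\n"
    print(percent_reads_in_chrom(example))
```

The final `*` line of idxstats output holds reads with no reference; its mapped count is zero, so it does not affect the result.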

Permalink: https://w3id.org/cwl/view/git/67e8ccd5abddbd9e27f23ceeb95536fecf792d93/v1.0/ATAC-seq_pipeline/03-map-pe-blacklist-removal.cwl