Workflow: DiffBind - Differential Binding Analysis of ChIP-Seq Peak Data

Fetched 2023-01-09 04:56:33 GMT

Differential Binding Analysis of ChIP-Seq Peak Data
---------------------------------------------------

DiffBind processes ChIP-Seq data enriched for genomic loci where specific protein/DNA binding occurs, including peak sets identified by ChIP-Seq peak callers and aligned sequence read datasets. It is designed to work with multiple peak sets simultaneously, representing different ChIP experiments (antibodies, transcription factor and/or histone marks, experimental conditions, replicates) as well as managing the results of multiple peak callers.

For more information please refer to:
-------------------------------------

Ross-Innes CS, Stark R, Teschendorff AE, Holmes KA, Ali HR, Dunning MJ, Brown GD, Gojis O, Ellis IO, Green AR, Ali S, Chin S, Palmieri C, Caldas C, Carroll JS (2012). “Differential oestrogen receptor binding is associated with clinical outcome in breast cancer.” Nature, 481, -4.

Inputs

ID Type Title Doc
alias String Experiment short name/Alias
threads Integer (Optional) Number of threads

Number of threads for those steps that support multithreading

use_common Boolean (Optional) Use common peaks within each condition. Ignore Minimum peakset overlap

Derive consensus peaks only from the common peaks within each condition. Min peakset overlap is ignored. Default: false

min_overlap Integer (Optional) Minimum peakset overlap

Min peakset overlap. Only include peaks in at least this many peaksets when generating consensus peakset. Default: 2
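The `min_overlap` rule above can be illustrated with a minimal Python sketch. It is not DiffBind's actual implementation: the function name `consensus_peaks`, the single-chromosome half-open `(start, end)` tuples, and the merge-then-count strategy are all assumptions made for illustration.

```python
def consensus_peaks(peaksets, min_overlap=2):
    """Merge overlapping peaks and keep regions seen in >= min_overlap peaksets.

    peaksets: list of peaksets, each a list of half-open (start, end) tuples
    on a single chromosome (hypothetical simplified representation).
    """
    # Tag every peak with the index of the peakset it came from, sort by start.
    tagged = sorted(
        (start, end, idx)
        for idx, peaks in enumerate(peaksets)
        for start, end in peaks
    )
    merged = []  # entries are [start, end, set of contributing peakset indices]
    for start, end, idx in tagged:
        if merged and start < merged[-1][1]:
            # Overlaps the current merged region: extend it and record the source.
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(idx)
        else:
            merged.append([start, end, {idx}])
    # Keep only regions supported by enough distinct peaksets.
    return [(s, e) for s, e, sources in merged if len(sources) >= min_overlap]


peaksets = [[(100, 200), (500, 600)], [(150, 250)], [(900, 1000)]]
print(consensus_peaks(peaksets, min_overlap=2))
```

Counting distinct peakset indices (rather than peaks) means that two overlapping peaks from the same sample contribute only once to the support of a merged region.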

name_cond_1 String (Optional) Condition 1 name, single word with letters and numbers only

Condition 1 name, single word with letters and numbers only

name_cond_2 String (Optional) Condition 2 name, single word with letters and numbers only

Condition 2 name, single word with letters and numbers only

blocked_file File (Optional) [Textual format] Blocking attribute headerless TSV/CSV file for multi-factor analysis with columns to set name and group. If this input is set, the blocking attributes (blocked_attributes) are ignored

Blocking attribute metadata file for multi-factor analysis. Headerless TSV/CSV file. First column - names from --name1 and --name2, second column - group name. --block is ignored
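As a rough illustration of the file layout described above (headerless, two columns: sample name, then group name), the following Python sketch parses such content into a mapping. The helper name `parse_blocking_metadata` and the delimiter-detection rule (tab if any tab is present, otherwise comma) are assumptions, not part of the workflow.

```python
import csv
import io


def parse_blocking_metadata(text):
    """Parse headerless TSV/CSV text into {sample_name: group_name}."""
    # Assumption: a file containing any tab character is tab-separated.
    delimiter = "\t" if "\t" in text else ","
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Skip blank or incomplete rows; only the first two columns are used.
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


example = "sample_a\tbatch1\nsample_b\tbatch2\n"  # hypothetical sample names
print(parse_blocking_metadata(example))
```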

cutoff_param https://w3id.org/cwl/view/git/a409db2289b86779897ff19003bd351701a81c50/workflows/diffbind.cwl#cutoff_param/cutoff (Optional) Parameter to which cutoff should be applied

Parameter to which cutoff should be applied (fdr or pvalue). Default: fdr

cutoff_value Float (Optional) P-value or FDR cutoff for reported results

P-value or FDR cutoff for reported results
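The interaction between `cutoff_param` and `cutoff_value` can be sketched in Python as below. The row layout (dictionaries with `pvalue` and `fdr` keys), the function name `filter_sites`, and the use of `<=` for the comparison are assumptions made for this example; the underlying R script may differ.

```python
def filter_sites(rows, cutoff_value, cutoff_param="fdr"):
    """Keep rows whose chosen statistic does not exceed cutoff_value."""
    if cutoff_param not in ("fdr", "pvalue"):
        raise ValueError("cutoff_param must be 'fdr' or 'pvalue'")
    # Assumption: a site is reported when its statistic is <= the cutoff.
    return [row for row in rows if row[cutoff_param] <= cutoff_value]


rows = [
    {"site": "a", "pvalue": 0.001, "fdr": 0.02},
    {"site": "b", "pvalue": 0.04, "fdr": 0.2},
]
print([r["site"] for r in filter_sites(rows, 0.05)])            # filter on FDR
print([r["site"] for r in filter_sites(rows, 0.05, "pvalue")])  # filter on p-value
```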

fragmentsize Integer (Optional) Reads extension size, bp

Extend each read from its endpoint along the appropriate strand. Default: 125 bp
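One common reading of this kind of read extension is sketched below in Python. It assumes that "endpoint" means the read's 5' end, that coordinates are half-open, and that the extended fragment is clamped at position 0; none of these details are confirmed by the workflow description, and `extend_read` is a hypothetical helper.

```python
def extend_read(start, end, strand, fragmentsize=125):
    """Return the (start, end) of a fragment of length fragmentsize.

    Assumption: the fragment is anchored at the read's 5' end, which is
    `start` on the + strand and `end` on the - strand.
    """
    if strand == "+":
        return start, start + fragmentsize
    if strand == "-":
        return max(0, end - fragmentsize), end
    raise ValueError("strand must be '+' or '-'")


print(extend_read(1000, 1050, "+"))
print(extend_read(1000, 1050, "-"))
```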

promoter_dist Integer (Optional) Promoter distance, bp

Maximum distance from the gene TSS (in both directions). A peak that overlaps this window is assigned to the promoter region. Default: 1000 bp

upstream_dist Integer (Optional) Upstream distance, bp

Maximum distance from the promoter (in the upstream direction only). A peak that overlaps this window is assigned to the upstream region. Default: 20,000 bp
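The promoter and upstream windows defined by `promoter_dist` and `upstream_dist` can be pictured with the following Python sketch. It is a simplification under stated assumptions: closed intervals, a single gene, promoter taking priority over upstream, and every other case lumped into `"other"` (the real annotation step also distinguishes exon, intron, and intergenic regions). The function `classify_peak` is hypothetical.

```python
def classify_peak(peak_start, peak_end, tss, strand,
                  promoter_dist=1000, upstream_dist=20000):
    """Classify a peak relative to one gene's TSS as promoter/upstream/other."""

    def overlaps(win_start, win_end):
        # Closed-interval overlap test between the peak and a window.
        return peak_start <= win_end and win_start <= peak_end

    # Promoter: promoter_dist on both sides of the TSS.
    if overlaps(tss - promoter_dist, tss + promoter_dist):
        return "promoter"

    # Upstream: extends upstream_dist beyond the promoter, upstream side only.
    if strand == "+":
        up_start = tss - promoter_dist - upstream_dist
        up_end = tss - promoter_dist
    else:
        up_start = tss + promoter_dist
        up_end = tss + promoter_dist + upstream_dist
    if overlaps(up_start, up_end):
        return "upstream"
    return "other"


print(classify_peak(10500, 10600, tss=10000, strand="+"))
print(classify_peak(4000, 4200, tss=10000, strand="+"))
```

Note that "upstream" depends on strand: for a gene on the minus strand the upstream window lies at higher coordinates than the TSS.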

analysis_method https://w3id.org/cwl/view/git/a409db2289b86779897ff19003bd351701a81c50/workflows/diffbind.cwl#analysis_method/method (Optional) Analysis method

Method by which to analyze differential binding affinity. Default: deseq2

annotation_file File [TSV] Genome annotation

Genome annotation file in TSV format

min_read_counts Integer (Optional) Minimum read counts. Exclude intervals where MAX read counts for all samples < specified value

Min read counts. Exclude all merged intervals where the MAX raw read counts among all of the samples is smaller than the specified value. Default: 0
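The `min_read_counts` rule above amounts to a filter on the per-interval maximum across samples. A minimal Python sketch, assuming a hypothetical dictionary that maps each merged interval to its list of raw counts (one per sample):

```python
def filter_intervals(counts, min_read_counts=0):
    """Drop intervals whose maximum raw count across samples is too small."""
    return {
        interval: sample_counts
        for interval, sample_counts in counts.items()
        if max(sample_counts) >= min_read_counts
    }


counts = {
    ("chr1", 100, 250): [3, 12, 7, 9],
    ("chr1", 500, 600): [1, 2, 0, 4],
}
print(filter_intervals(counts, min_read_counts=5))
```

With the default of 0 nothing is removed, since raw read counts cannot be negative.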

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

peak_files_cond_1 File[] [xls] Biological condition 1 samples. Minimum 2 samples

XLS peak files for condition 1 from MACS2. Minimum 2 files. Order corresponds to read_files_cond_1

peak_files_cond_2 File[] [xls] Biological condition 2 samples. Minimum 2 samples

XLS peak files for condition 2 from MACS2. Minimum 2 files. Order corresponds to read_files_cond_2

read_files_cond_1 File[] [BAM] Biological condition 1 samples. Minimum 2 samples

Read files for condition 1. Minimum 2 files in BAM format

read_files_cond_2 File[] [BAM] Biological condition 2 samples. Minimum 2 samples

Read files for condition 2. Minimum 2 files in BAM format

remove_duplicates Boolean (Optional) Remove duplicated reads

Remove reads that map to exactly the same genomic position. Default: false

blocked_attributes String[] (Optional) Blocking attributes for multi-factor analysis. Minimum 2

Blocking attributes for multi-factor analysis. Minimum 2. Either names from --name1 and/or --name2, or an array of strings that can be parsed by R as booleans. In the latter case the order and size should correspond to [--read1]+[--read2]. Default: not applied

sample_names_cond_1 String[] (Optional) Biological condition 1 sample names

Aliases for biological condition 1 samples to make the legend for generated plots. Order corresponds to the read_files_cond_1

sample_names_cond_2 String[] (Optional) Biological condition 2 sample names

Aliases for biological condition 2 samples to make the legend for generated plots. Order corresponds to the read_files_cond_2

narrow_peaks_files_cond_1 File[] (Optional) [ENCODE narrow peak format] Called peaks for biological condition 1

Narrow peaks file(s) for biological condition 1

narrow_peaks_files_cond_2 File[] (Optional) [ENCODE narrow peak format] Called peaks for biological condition 2

Narrow peaks file(s) for biological condition 2

genome_coverage_files_cond_1 File[] [bigWig] Genome coverage(s) for biological condition 1

Genome coverage bigWig file(s) for biological condition 1

genome_coverage_files_cond_2 File[] [bigWig] Genome coverage(s) for biological condition 2

Genome coverage bigWig file(s) for biological condition 2

Steps

ID Runs Label Doc
pipe
diffbind.cwl#pipe/32343c61-3964-4d73-90be-c30295f297b3 (ExpressionTool)
diffbind
../tools/diffbind.cwl (CommandLineTool)
DiffBind - Differential Binding Analysis of ChIP-Seq Peak Data

Runs R script to compute differentially bound sites from multiple ChIP-seq experiments using affinity (quantitative) and occupancy data.

sort_bed
../tools/linux-sort.cwl (CommandLineTool)

Tool sorts data from `unsorted_file` by key

`default_output_filename` function returns file name identical to `unsorted_file`, if `output_filename` is not provided.

assign_genes
../tools/iaintersect.cwl (CommandLineTool)

Tool assigns each peak obtained from MACS2 to a gene and region (upstream, promoter, exon, intron, intergenic)

`default_output_filename` function returns output filename with suffix set as `ext` argument. Function is called when either `output_filename` or `log_filename` inputs are not provided.

select_files
diffbind.cwl#select_files/73f4dd4f-8570-441c-8293-4c303f4e5372 (ExpressionTool)
bed_to_bigbed
../tools/ucsc-bedtobigbed.cwl (CommandLineTool)

Tool converts bed file to bigBed

Before running `baseCommand` the following files are created in Docker working directory (using `InitialWorkDirRequirement`): `narrowpeak.as` - default BED file structure template for ENCODE narrowPeak format; `broadpeak.as` - default BED file structure template for ENCODE broadPeak format.

`default_output_filename` function returns default output file name based on `input_bed` basename with `*.bb` extension if `output_filename` is not provided.

`get_bed_type` function returns default BED file type if `bed_type` is not provided. Depending on `input_bed` file extension the following values are returned: `*.narrowpeak` --> bed6+4; `*.broadpeak` --> bed6+3; else --> null (`bedToBigBed` will use its own default value).

`get_bed_template` function returns default BED file template if `bed_template` is not provided. Depending on `input_bed` file extension the following values are returned: `*.narrowpeak` --> narrowpeak.as (previously staged into Docker working directory); `*.broadpeak` --> broadpeak.as (previously staged into Docker working directory); else --> null (`bedToBigBed` will use its own default value).
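The extension-based defaults described for `get_bed_type` and `get_bed_template` can be mirrored in Python as follows. The functions in the tool itself are expressions inside the CWL document; this sketch only reproduces the mapping stated above, and the case-insensitive extension match is an assumption.

```python
def _extension_default(input_bed_name, narrow_value, broad_value):
    """Pick a default from the file extension, or None if it is unrecognized."""
    name = input_bed_name.lower()  # assumption: match extensions case-insensitively
    if name.endswith(".narrowpeak"):
        return narrow_value
    if name.endswith(".broadpeak"):
        return broad_value
    return None  # bedToBigBed falls back to its own default


def get_bed_type(input_bed_name, bed_type=None):
    """Return bed_type if given, else a default derived from the extension."""
    if bed_type is not None:
        return bed_type
    return _extension_default(input_bed_name, "bed6+4", "bed6+3")


def get_bed_template(input_bed_name, bed_template=None):
    """Return bed_template if given, else the staged .as file for the extension."""
    if bed_template is not None:
        return bed_template
    return _extension_default(input_bed_name, "narrowpeak.as", "broadpeak.as")


print(get_bed_type("sample.narrowPeak"), get_bed_template("sample.narrowPeak"))
```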

convert_to_bed
../tools/custom-bash.cwl (CommandLineTool)

Tool to run custom script set as `script` input with arguments from `param`. Default script runs sed command over the input file and exports results to the file with the same name as input's basename

filter_columns
../tools/custom-bash.cwl (CommandLineTool)

Tool to run custom script set as `script` input with arguments from `param`. Default script runs sed command over the input file and exports results to the file with the same name as input's basename

restore_columns
../tools/custom-bash.cwl (CommandLineTool)

Tool to run custom script set as `script` input with arguments from `param`. Default script runs sed command over the input file and exports results to the file with the same name as input's basename

Outputs

ID Type Label Doc
diffbind_ma_plot File (Optional) [PNG] MA plot for significantly differentially bound sites

MA plot for significantly differentially bound sites

diffbind_bed_file File [bigBed] Estimated differential peaks

Estimated differential peaks, bigBed

diffbind_pca_plot File (Optional) [PNG] PCA plot for significantly differentially bound sites

PCA plot for significantly differentially bound sites

diffbind_stderr_log File [Textual format] diffbind stderr log

diffbind stderr log

diffbind_stdout_log File [Textual format] diffbind stdout log

diffbind stdout log

narrow_peaks_cond_1 File[] (Optional) [ENCODE narrow peak format] Called peaks for biological condition 1

Narrow peaks file(s) for biological condition 1

narrow_peaks_cond_2 File[] (Optional) [ENCODE narrow peak format] Called peaks for biological condition 2

Narrow peaks file(s) for biological condition 2

diffbind_ma_plot_pdf File (Optional) [PDF] MA plot for significantly differentially bound sites

MA plot for significantly differentially bound sites

diffbind_report_file File [TSV] Differential binding analysis results

Differential binding analysis results exported as TSV

diffbind_all_pca_plot File (Optional) [PNG] PCA plot for all bound sites

PCA plot for all bound sites

diffbind_boxplot_plot File (Optional) [PNG] Box plots of read distributions for significantly differentially bound sites

Box plots of read distributions for significantly differentially bound sites

diffbind_pca_plot_pdf File (Optional) [PDF] PCA plot for significantly differentially bound sites

PCA plot for significantly differentially bound sites

diffbind_volcano_plot File (Optional) [PNG] Volcano plot for significantly differentially bound sites

Volcano plot for significantly differentially bound sites

genome_coverage_cond_1 File[] [bigWig] Genome coverage(s) for biological condition 1

Genome coverage bigWig file(s) for biological condition 1

genome_coverage_cond_2 File[] [bigWig] Genome coverage(s) for biological condition 2

Genome coverage bigWig file(s) for biological condition 2

diffbind_all_pca_plot_pdf File (Optional) [PDF] PCA plot for all bound sites

PCA plot for all bound sites

diffbind_boxplot_plot_pdf File (Optional) [PDF] Box plots of read distributions for significantly differentially bound sites

Box plots of read distributions for significantly differentially bound sites

diffbind_volcano_plot_pdf File (Optional) [PDF] Volcano plot for significantly differentially bound sites

Volcano plot for significantly differentially bound sites

diffbind_db_sites_binding_heatmap File (Optional) [PNG] Binding heatmap for significantly differentially bound sites

Binding heatmap for significantly differentially bound sites

diffbind_peak_correlation_heatmap File (Optional) [PNG] Peak overlap correlation heatmap

Peak overlap correlation heatmap

diffbind_all_peak_overlap_rate_plot File (Optional) [PNG] All peak overlap rate plot

All peak overlap rate plot

diffbind_counts_correlation_heatmap File (Optional) [PNG] Raw counts correlation heatmap

Raw counts correlation heatmap

diffbind_consensus_peak_venn_diagram File (Optional) [PNG] Consensus peak Venn Diagram

Consensus peak Venn Diagram

diffbind_all_data_correlation_heatmap File (Optional) [PNG] Unfiltered normalized counts correlation heatmap

Unfiltered normalized counts correlation heatmap

diffbind_db_sites_binding_heatmap_pdf File (Optional) [PDF] Binding heatmap for significantly differentially bound sites

Binding heatmap for significantly differentially bound sites

diffbind_db_sites_correlation_heatmap File (Optional) [PNG] Normalized counts correlation heatmap for significantly differentially bound sites

Normalized counts correlation heatmap for significantly differentially bound sites

diffbind_peak_correlation_heatmap_pdf File (Optional) [PDF] Peak overlap correlation heatmap

Peak overlap correlation heatmap

diffbind_peak_overlap_rate_plot_cond_1 File (Optional) [PNG] Condition 1 peak overlap rate plot

Condition 1 peak overlap rate plot

diffbind_peak_overlap_rate_plot_cond_2 File (Optional) [PNG] Condition 2 peak overlap rate plot

Condition 2 peak overlap rate plot

diffbind_all_peak_overlap_rate_plot_pdf File (Optional) [PDF] All peak overlap rate plot

All peak overlap rate plot

diffbind_counts_correlation_heatmap_pdf File (Optional) [PDF] Raw counts correlation heatmap

Raw counts correlation heatmap

diffbind_consensus_peak_venn_diagram_pdf File (Optional) [PDF] Consensus peak Venn Diagram

Consensus peak Venn Diagram

diffbind_all_data_correlation_heatmap_pdf File (Optional) [PDF] Unfiltered normalized counts correlation heatmap

Unfiltered normalized counts correlation heatmap

diffbind_db_sites_correlation_heatmap_pdf File (Optional) [PDF] Normalized counts correlation heatmap for significantly differentially bound sites

Normalized counts correlation heatmap for significantly differentially bound sites

diffbind_peak_overlap_rate_plot_cond_1_pdf File (Optional) [PDF] Condition 1 peak overlap rate plot

Condition 1 peak overlap rate plot

diffbind_peak_overlap_rate_plot_cond_2_pdf File (Optional) [PDF] Condition 2 peak overlap rate plot

Condition 2 peak overlap rate plot

Permalink: https://w3id.org/cwl/view/git/a409db2289b86779897ff19003bd351701a81c50/workflows/diffbind.cwl