Workflow: THOR - differential peak calling of ChIP-seq signals with replicates

Fetched 2023-01-10 13:59:08 GMT

What is THOR?
--------------
THOR is an HMM-based approach to detect and analyze differential peaks in two sets of ChIP-seq data from distinct biological conditions with replicates. THOR performs genomic signal processing, peak calling and p-value calculation in an integrated framework.

For more information please refer to:
-------------------------------------
Allhoff, M., Sere, K., Freitas, J., Zenke, M., Costa, I.G. (2016), Differential Peak Calling of ChIP-seq Signals with Replicates with THOR, Nucleic Acids Research, epub gkw680.


Inputs

ID Type Title Doc
alias String Experiment short name/Alias
bin_size Integer (Optional) Size of underlying bins for creating the signal

Size of underlying bins for creating the signal

merge_peaks Boolean (Optional) Merge peaks closer than fragment size

Merge peaks separated by less than the estimated mean fragment size (recommended for histone data)

alias_cond_1 String (Optional) Name for condition 1

Name to be displayed for condition 1

alias_cond_2 String (Optional) Name for condition 2

Name to be displayed for condition 2

pvalue_cutoff Float (Optional) P-value cutoff for peak detection

P-value cutoff for peak detection. Only peaks with a p-value below the cutoff are called. [default: 0.1]

extension_size Integer[] (Optional) Comma-separated list of read extension sizes (provide value for every sample)

Read extension sizes for the BAM files (comma-separated list with one value per BAM file in the config file). If this option is not set, the extension sizes are estimated

annotation_file File [TSV] Genome annotation

Genome annotation file in TSV format

chrom_length_file File [Textual format] Chromosome length file

Chromosome length file

remove_duplicates Boolean (Optional) Remove the duplicate reads

Remove the duplicate reads

bambai_pair_cond_1 File[] [BAM] Biological condition 1

Coordinate sorted BAM alignment and index BAI files for the first biological condition

bambai_pair_cond_2 File[] [BAM] Biological condition 2

Coordinate sorted BAM alignment and index BAI files for the second biological condition

deadzones_bed_file File (Optional) [BED] Dead zones file

Blacklisted genomic regions to exclude from the analysis

housekeeping_genes_bed_file File (Optional) [BED] Housekeeping genes file

Housekeeping genes (BED format) used for normalization
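The inputs above can be collected into a job file for a CWL runner. The following is a minimal sketch, not part of the workflow itself: the helper names `cwl_file` and `build_job` and all file paths are hypothetical, and it assumes the runner locates each BAI index next to its BAM file. Only the inputs not marked optional are required.

```python
import json


def cwl_file(path):
    """Return a CWL File object for a local path."""
    return {"class": "File", "path": path}


def build_job(alias, cond_1_bams, cond_2_bams, annotation, chrom_sizes, **optional):
    """Assemble job inputs; keyword arguments carry the optional inputs."""
    job = {
        "alias": alias,
        "bambai_pair_cond_1": [cwl_file(p) for p in cond_1_bams],
        "bambai_pair_cond_2": [cwl_file(p) for p in cond_2_bams],
        "annotation_file": cwl_file(annotation),
        "chrom_length_file": cwl_file(chrom_sizes),
    }
    job.update(optional)
    return job


# All paths below are placeholders for illustration only.
job = build_job(
    alias="example_experiment",
    cond_1_bams=["cond1_rep1.bam", "cond1_rep2.bam"],
    cond_2_bams=["cond2_rep1.bam", "cond2_rep2.bam"],
    annotation="annotation.tsv",
    chrom_sizes="chrom.sizes",
    pvalue_cutoff=0.1,
    merge_peaks=True,
    extension_size=[200, 200, 200, 200],  # one value per BAM file
)

# CWL runners accept job files in JSON as well as YAML.
print(json.dumps(job, indent=2))
```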

Steps

ID Runs Label Doc
thor
../tools/rgt-thor.cwl (CommandLineTool)
THOR - differential peak calling of ChIP-seq signals with replicates

The configuration file is autogenerated from the bambai_pair_cond_1, bambai_pair_cond_2 and chrom_length_file inputs. The following configuration file parameters are skipped: genome, inputs1, inputs2. The following argument is skipped: --report (the tool fails to execute when it is set)
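How such a configuration file might be assembled can be sketched as follows. This is an illustration under stated assumptions, not the code used by the tool: the function name `thor_config` is hypothetical, and the section headers `#rep1`, `#rep2` and `#chrom_sizes` are assumed to follow the THOR configuration format. The skipped `genome`, `inputs1` and `inputs2` sections are simply not written.

```python
def thor_config(cond_1_bams, cond_2_bams, chrom_sizes):
    """Build the text of a THOR-style config file (assumed layout).

    Only the replicate BAM files and the chromosome sizes file are written;
    the genome, inputs1 and inputs2 sections are left out.
    """
    lines = [
        "#rep1", *cond_1_bams,      # BAM files of the first condition
        "#rep2", *cond_2_bams,      # BAM files of the second condition
        "#chrom_sizes", chrom_sizes,
    ]
    return "\n".join(lines) + "\n"


# Placeholder file names for illustration only.
print(thor_config(["c1_r1.bam", "c1_r2.bam"], ["c2_r1.bam"], "chrom.sizes"))
```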

sort_bed
../tools/linux-sort.cwl (CommandLineTool)

Tool sorts data from `unsorted_file` by key

`default_output_filename` function returns file name identical to `unsorted_file`, if `output_filename` is not provided.

assign_genes
../tools/iaintersect.cwl (CommandLineTool)

Tool assigns each peak obtained from MACS2 to a gene and region (upstream, promoter, exon, intron, intergenic)

`default_output_filename` function returns an output filename whose suffix is set by the `ext` argument. The function is called when either the `output_filename` or the `log_filename` input is not provided.
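The suffix-based naming can be sketched in Python as below. The real function is an expression inside the CWL tool; this version assumes the base name comes from the input file with its extension stripped, and the suffix values in the usage lines are hypothetical.

```python
import os


def default_output_filename(input_path, ext):
    """Return the input's base name (extension removed) followed by `ext`.

    Assumption: the base name is taken from the input file; the source only
    states that the suffix is set by `ext`.
    """
    root = os.path.splitext(os.path.basename(input_path))[0]
    return root + ext


# Hypothetical suffixes for the result and log files.
result_name = default_output_filename("/data/diffpeaks.bed", "_annotated.tsv")
log_name = default_output_filename("/data/diffpeaks.bed", "_annotated.log")
```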

bed_to_bigbed
../tools/ucsc-bedtobigbed.cwl (CommandLineTool)

Tool converts bed file to bigBed

Before running `baseCommand`, the following files are created in the Docker working directory (using `InitialWorkDirRequirement`):
  `narrowpeak.as` - default BED file structure template for ENCODE narrowPeak format
  `broadpeak.as` - default BED file structure template for ENCODE broadPeak format

`default_output_filename` function returns default output file name based on `input_bed` basename with `*.bb` extension if `output_filename` is not provided.

`get_bed_type` function returns default BED file type if `bed_type` is not provided. Depending on `input_bed` file extension the following values are returned:
  `*.narrowpeak` --> bed6+4
  `*.broadpeak` --> bed6+3
  else --> null (`bedToBigBed` will use its own default value)

`get_bed_template` function returns default BED file template if `bed_template` is not provided. Depending on `input_bed` file extension the following values are returned:
  `*.narrowpeak` --> narrowpeak.as (previously staged into Docker working directory)
  `*.broadpeak` --> broadpeak.as (previously staged into Docker working directory)
  else --> null (`bedToBigBed` will use its own default value)
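The extension-based defaults of `get_bed_type` and `get_bed_template` can be sketched in Python as follows. The actual functions are expressions inside the CWL tool; this version mirrors only the mapping described above, with `None` standing in for null, and it assumes the extension comparison is case-insensitive.

```python
import os

# Mappings taken from the tool description above.
BED_TYPES = {".narrowpeak": "bed6+4", ".broadpeak": "bed6+3"}
BED_TEMPLATES = {".narrowpeak": "narrowpeak.as", ".broadpeak": "broadpeak.as"}


def _extension(path):
    """Lower-cased file extension (case-insensitive match is an assumption)."""
    return os.path.splitext(path)[1].lower()


def get_bed_type(input_bed, bed_type=None):
    """Return the explicit `bed_type`, else a default chosen by extension."""
    if bed_type is not None:
        return bed_type
    return BED_TYPES.get(_extension(input_bed))  # None for other extensions


def get_bed_template(input_bed, bed_template=None):
    """Return the explicit `bed_template`, else a default chosen by extension."""
    if bed_template is not None:
        return bed_template
    return BED_TEMPLATES.get(_extension(input_bed))  # None for other extensions
```

When either function returns `None`, no value is passed on and `bedToBigBed` falls back to its own default.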

filter_columns
../tools/custom-bash.cwl (CommandLineTool)

Tool runs the custom script set as the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename

restore_columns
../tools/custom-bash.cwl (CommandLineTool)

Tool runs the custom script set as the `script` input with arguments from `param`. The default script runs a sed command over the input file and writes the result to a file named after the input's basename

Outputs

ID Type Label Doc
thor_stderr_log File [Textual format] rgt-THOR stderr log

rgt-THOR stderr log

cond_1_bigwig_file File[] [bigWig] First biological condition ChIP-seq signals

Postprocessed ChIP-seq signals from the first biological condition samples

cond_2_bigwig_file File[] [bigWig] Second biological condition ChIP-seq signals

Postprocessed ChIP-seq signals from the second biological condition samples

diffpeaks_bed_file File [bigBed] Estimated differential peaks

Estimated differential peaks, bigBed

diffpeaks_annotated_file File [TSV] Estimated differential peaks with assigned genes

File contains nearest gene information for the differential peaks BED file generated by rgt-THOR

Permalink: https://w3id.org/cwl/view/git/46a077b51619c6a14f85e0aa5260ae8a04426fab/workflows/rgt-thor.cwl