Workflow: somatic_exome: exome alignment and somatic variant detection

Fetched 2023-01-10 10:04:59 GMT

somatic_exome processes mutant/wildtype H. sapiens exome sequencing data. It features BQSR-corrected alignments, four-caller variant detection, and VEP-style annotations. Structural variants are detected via manta and cnvkit. In addition, QC metrics are run, including somalier concordance metrics. Example input file: analysis_workflows/example_data/somatic_exome.yaml

Inputs

ID Type Title Doc
mills File mills: File specifying common polymorphic indels from Mills et al.

mills provides known polymorphic indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This file is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213 Essentially, it is a list of known indels originally discovered by Mills et al. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557762/ The file should be in VCF format and tabix indexed.

docm_vcf File
omni_vcf File
vep_pick
dbsnp_vcf File dbsnp_vcf: File specifying common polymorphic indels from dbSNP

dbsnp_vcf provides known indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This file is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213 Essentially, it is a list of known indels from dbSNP. The file should be in VCF format and tabix indexed.

reference File reference: Reference fasta file for a desired assembly

reference contains the nucleotide sequence of the entire genome for a given assembly (GRCh37, GRCh38, etc.) in fasta format. This is what reads will be aligned to. Appropriate files can be found on Ensembl at https://ensembl.org/info/data/ftp/index.html When providing the reference, the secondary files corresponding to the reference indices must be located in the same directory as the reference itself. These files can be created with samtools faidx, bwa index, and picard CreateSequenceDictionary.

cosmic_vcf File (Optional)
tumor_name String (Optional) tumor_name: String specifying the name of the MT sample

tumor_name provides the name by which the MT sample will be referred to in the various outputs, for example the VCF files.

normal_name String (Optional) normal_name: String specifying the name of the WT sample

normal_name provides the name by which the WT sample will be referred to in the various outputs, for example the VCF files.

known_indels File known_indels: File specifying common polymorphic indels from 1000G

known_indels provides known indels recommended by GATK for a variety of tools, including the BaseRecalibrator. This file is part of the GATK resource bundle available at http://www.broadinstitute.org/gatk/guide/article?id=1213 Essentially, it is a list of known indels from the 1000 Genomes Phase I indel calls. The file should be in VCF format and tabix indexed.

somalier_vcf File
interval_list File
manta_non_wgs Boolean (Optional)
synonyms_file File (Optional)
vep_cache_dir String
bait_intervals File bait_intervals: interval_list file of baits used in the sequencing experiment

bait_intervals is an interval_list corresponding to the baits used in the sequencing reagent. These are essentially coordinates for regions you were able to design probes for in the reagent. Typically the reagent provider has this information available in bed format, and it can be converted to an interval_list with Picard's BedToIntervalList. AstraZeneca also maintains a repo of baits for common sequencing reagents, available at https://github.com/AstraZeneca-NGS/reference_data

bqsr_intervals String[] bqsr_intervals: Array of strings specifying regions for base quality score recalibration

bqsr_intervals provides an array of genomic intervals to which GATK base quality score recalibration is applied. Typically each interval is an entire chromosome (e.g. chr1, chr2, etc.). These names should match the format used in the reference file.
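
As an illustration of the whole-chromosome convention described above, the following minimal sketch builds such a list. The function name chromosome_intervals, the default chr prefix, and the choice of autosomes 1 to 22 are assumptions made for this example, not values taken from the workflow; check them against the sequence names in your own reference.

```python
def chromosome_intervals(prefix="chr", autosomes=22, extra=()):
    """Return whole-chromosome interval names such as chr1 ... chr22.

    prefix, autosomes and extra are illustrative parameters; they must be
    chosen so that the names match the reference fasta exactly.
    """
    names = [f"{prefix}{i}" for i in range(1, autosomes + 1)]
    names.extend(f"{prefix}{name}" for name in extra)
    return names


# Reference that uses the "chr" prefix.
bqsr_intervals = chromosome_intervals()

# Reference that uses bare names ("1", "2", ...), with X and Y added.
no_prefix = chromosome_intervals(prefix="", extra=("X", "Y"))
```

The resulting list can then be written into the input file as the value of bqsr_intervals.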

cle_vcf_filter Boolean
known_variants File (Optional)

Previously discovered variants to be flagged in this pipeline's output VCF

tumor_sequence https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/types/sequence_data.yml#sequence_data[] tumor_sequence: yml file specifying the location of MT sequencing data

tumor_sequence is a yml file that passes information about the sequencing data (e.g. fastq files) for a single sample. If more than one fastq file exists for a sample, as is the case with multiple instrument data, the sequence tag is simply repeated with the additional data (see example input file). Note that in the @RG field, ID and SM are required.

normal_sequence https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/types/sequence_data.yml#sequence_data[] normal_sequence: yml file specifying the location of WT sequencing data

normal_sequence is a yml file that passes information about the sequencing data (e.g. fastq files) for a single sample. If more than one fastq file exists for a sample, as is the case with multiple instrument data, the sequence tag is simply repeated with the additional data (see example input file). Note that in the @RG field, ID and SM are required.
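
Because ID and SM are required in the @RG field for both tumor_sequence and normal_sequence, it can help to check read group strings before launching a run. The sketch below is one possible check, assuming the read group is a tab-separated SAM-style @RG line. The function name missing_readgroup_tags and the example tag values are hypothetical.

```python
def missing_readgroup_tags(readgroup, required=("ID", "SM")):
    """Return the required tags that are absent from an @RG header line."""
    fields = readgroup.strip().split("\t")
    if fields[0] != "@RG":
        raise ValueError("read group line must start with '@RG'")
    # Each remaining field looks like TAG:value; keep only the tag names.
    present = {field.split(":", 1)[0] for field in fields[1:] if ":" in field}
    return [tag for tag in required if tag not in present]


# Hypothetical read groups, for illustration only.
complete = "@RG\tID:lane1\tSM:tumor_sample\tLB:lib1\tPL:ILLUMINA"
incomplete = "@RG\tID:lane1"

print(missing_readgroup_tags(complete))    # []
print(missing_readgroup_tags(incomplete))  # ['SM']
```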

varscan_p_value Float (Optional)
target_intervals File target_intervals: interval_list file of targets used in the sequencing experiment

target_intervals is an interval_list corresponding to the targets for the sequencing reagent. These are essentially coordinates for regions you wanted to design probes for in the reagent. Bed files with this information can be converted to interval_lists with Picard's BedToIntervalList. In general, for an exome reagent, bait_intervals and target_intervals are the same.
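
The bed to interval_list conversion mentioned for bait_intervals and target_intervals is mostly a coordinate change: bed starts are 0-based and ends are exclusive, while interval_list rows are 1-based and inclusive. The sketch below illustrates only that shift for a single record. It is not a replacement for Picard's BedToIntervalList, which also writes the sequence dictionary header that a valid interval_list needs. The function name bed_to_interval_row, the "." placeholder for a missing name, and the default "+" strand are assumptions of this example.

```python
def bed_to_interval_row(bed_line):
    """Convert one tab-separated bed record to an interval_list body row."""
    cols = bed_line.rstrip("\n").split("\t")
    chrom, start, end = cols[0], int(cols[1]), int(cols[2])
    # Optional bed columns: name (4th) and strand (6th).
    name = cols[3] if len(cols) > 3 else "."
    strand = cols[5] if len(cols) > 5 and cols[5] in ("+", "-") else "+"
    # bed start is 0-based, interval_list start is 1-based; the end is unchanged.
    return f"{chrom}\t{start + 1}\t{end}\t{strand}\t{name}"


row = bed_to_interval_row("chr1\t99\t200\texon1")
print(row.split("\t"))  # ['chr1', '100', '200', '+', 'exon1']
```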

summary_intervals https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/types/labelled_file.yml#labelled_file[]
tumor_sample_name String
manta_call_regions File (Optional)
normal_sample_name String
per_base_intervals https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/types/labelled_file.yml#labelled_file[]
pindel_insert_size Integer
vep_ensembl_species String

ensembl species - Must be present in the cache directory. Examples: homo_sapiens or mus_musculus

vep_ensembl_version String

ensembl version - Must be present in the cache directory. Example: 95

vep_to_table_fields String[]
annotate_coding_only Boolean (Optional)
filter_docm_variants Boolean (Optional)
manta_output_contigs Boolean (Optional)
mutect_scatter_count Integer
panel_of_normals_vcf File (Optional)
per_target_intervals https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/types/labelled_file.yml#labelled_file[]
strelka_cpu_reserved Integer (Optional)
varscan_min_coverage Integer (Optional)
varscan_min_var_freq Float (Optional)
vep_ensembl_assembly String

genome assembly to use in vep. Examples: GRCh38 or GRCm38
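
The species, version, and assembly inputs together select one cache inside vep_cache_dir. As a rough pre-flight check, the sketch below builds the directory that a standard VEP cache download would typically occupy, assuming the usual species/version_assembly layout. That layout, the function name expected_vep_cache_path, and the example cache location are assumptions here; merged or RefSeq caches use different directory names.

```python
from pathlib import PurePosixPath


def expected_vep_cache_path(cache_dir, species, version, assembly):
    """Build <cache_dir>/<species>/<version>_<assembly> (assumed layout)."""
    return PurePosixPath(cache_dir) / species / f"{version}_{assembly}"


# Hypothetical cache location combined with the example values from the docs.
path = expected_vep_cache_path("/data/vep_cache", "homo_sapiens", "95", "GRCh38")
print(path)  # /data/vep_cache/homo_sapiens/95_GRCh38
```

Checking that this directory exists on the execution host before starting the workflow can surface a mismatched version or assembly early.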

varscan_strand_filter Integer (Optional)
vep_custom_annotations https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/types/vep_custom_annotation.yml#vep_custom_annotation[]

custom type, check types directory for input format

qc_minimum_base_quality Integer (Optional)
varscan_max_normal_freq Float (Optional)
variants_to_table_fields String[]
qc_minimum_mapping_quality Integer (Optional)
mutect_artifact_detection_mode Boolean
picard_metric_accumulation_level String
variants_to_table_genotype_fields String[]
mutect_max_alt_alleles_in_normal_count Integer (Optional)
mutect_max_alt_allele_in_normal_fraction Float (Optional)

Steps

ID Runs Label Doc
manta ../tools/manta_somatic.cwl (CommandLineTool) Set up and execute manta
cnvkit ../tools/cnvkit_batch.cwl (CommandLineTool)
concordance ../tools/concordance.cwl (CommandLineTool) Concordance checking between Tumor and Normal BAM
detect_variants detect_variants.cwl (Workflow) Detect Variants workflow
tumor_index_cram ../tools/index_cram.cwl (CommandLineTool) samtools index cram
normal_index_cram ../tools/index_cram.cwl (CommandLineTool) samtools index cram
tumor_bam_to_cram ../tools/bam_to_cram.cwl (CommandLineTool) BAM to CRAM conversion
normal_bam_to_cram ../tools/bam_to_cram.cwl (CommandLineTool) BAM to CRAM conversion
tumor_alignment_and_qc alignment_exome.cwl (Workflow) exome alignment with qc
normal_alignment_and_qc alignment_exome.cwl (Workflow) exome alignment with qc

Outputs

ID Type Label Doc
final_tsv File
final_vcf File
cn_diagram File (Optional)
tumor_cram File
normal_cram File
vep_summary File
all_candidates File
cn_scatter_plot File (Optional)
tumor_flagstats File
diploid_variants File (Optional)
intervals_target File (Optional)
normal_flagstats File
small_candidates File
somatic_variants File (Optional)
tumor_hs_metrics File
docm_filtered_vcf File
normal_hs_metrics File
final_filtered_vcf File
reference_coverage File (Optional)
mutect_filtered_vcf File
pindel_filtered_vcf File
tumor_only_variants File (Optional)
intervals_antitarget File (Optional)
strelka_filtered_vcf File
varscan_filtered_vcf File
mutect_unfiltered_vcf File
pindel_unfiltered_vcf File
tumor_target_coverage File
normal_target_coverage File
strelka_unfiltered_vcf File
tumor_bin_level_ratios File
tumor_segmented_ratios File
varscan_unfiltered_vcf File
tumor_summary_hs_metrics File[]
normal_summary_hs_metrics File[]
tumor_antitarget_coverage File
tumor_insert_size_metrics File
tumor_per_base_hs_metrics File[]
tumor_verify_bam_id_depth File
normal_antitarget_coverage File
normal_insert_size_metrics File
normal_per_base_hs_metrics File[]
normal_verify_bam_id_depth File
tumor_per_target_hs_metrics File[]
tumor_snv_bam_readcount_tsv File
tumor_verify_bam_id_metrics File
normal_per_target_hs_metrics File[]
normal_snv_bam_readcount_tsv File
normal_verify_bam_id_metrics File
somalier_concordance_metrics File
tumor_indel_bam_readcount_tsv File
tumor_mark_duplicates_metrics File
normal_indel_bam_readcount_tsv File
normal_mark_duplicates_metrics File
somalier_concordance_statistics File
tumor_alignment_summary_metrics File
tumor_per_base_coverage_metrics File[]
normal_alignment_summary_metrics File
normal_per_base_coverage_metrics File[]
tumor_per_target_coverage_metrics File[]
normal_per_target_coverage_metrics File[]
Permalink: https://w3id.org/cwl/view/git/233f026ffce240071edda526391be0c03186fce8/definitions/pipelines/somatic_exome.cwl