class: Workflow cwlVersion: v1.2 id: >- bristol-myers-squibb/integrated-wes-wgs-production-ready-pipelines/germline-calling/6 doc: "**Germline Calling** is a workflow built around the **Sentieon Haplotyper** tool. The Haplotyper algorithm performs haplotype-based variant calling.\n\nThis pipeline is a part of the project to transfer BMS variant calling procedures from custom bash scripts to the Platform in October 2017. Output files are named with naming patterns mimicking those in the client-produced runs. This pipeline uses commercial Sentieon tools that require a client-specific licence.\n\n*A list of **all inputs and parameters** with corresponding docs can be found at the bottom of the page.*\n\n### Common Use Cases\n\nA **Normal BAM file** is used as input; it must be sorted and indexed. This can be done by using the **BAM prep** workflow. The **Input tar with reference** input should contain the reference genome with all necessary indexes. For the VCF output that this workflow produces, Variant Calling Metrics are calculated and an annotated VCF file is produced.\n\n### Changes Introduced by Seven Bridges\n\n* Outputs are named by sample ID, e.g. NormalID.raw.vcf\n\n### Common Issues and Important Notes\n\n* Since Sentieon tools are licenced, errors with the licence will cause the pipeline to fail with an informative error message.\n* More information about Sentieon tools can be found in the [Sentieon manual](https://support.sentieon.com/manual/_downloads/Sentieon.pdf).\n\n### Performance Benchmarking\n\nThe following table gives estimates of **Germline Calling** run times and costs. All samples are aligned against the **hg19 human reference**. \n\n*Cost can be significantly reduced by using **spot instances**. 
Visit the [Knowledge Center](https://docs.sevenbridges.com/docs/about-spot-instances) for more details.*\n\n| Threads and CPUs | Memory | Input size | WGS/WES | Duration | Cost | Instance (AWS) |\n| -------------:|:-------------:| -----:|:-------------:| ------------- | ------------- | ------------- |\n|16|30000| 10.5 GB | WES | 12 minutes| $0.16 | c4.4xlarge |\n|16|30000| 14.2 GB | WES | 20 minutes| $0.26 | c4.4xlarge |\n\n\n### API Python Implementation\n\nThe app's draft task can also be submitted via the **API**. To learn how to get your **Authentication token** and **API endpoint** for the corresponding platform, visit our [documentation](https://github.com/sbg/sevenbridges-python#authentication-and-configuration).\n\n```python\n# Initialize the SBG Python API\nfrom sevenbridges import Api\napi = Api(token=\"enter_your_token\", url=\"enter_api_endpoint\")\n# Get project_id/app_id from your address bar. Example: https://igor.sbgenomics.com/u/your_username/project/app\nproject_id = \"your_username/project\"\napp_id = \"your_username/project/app\"\n# Replace inputs with appropriate values\ninputs = {\n\t\"input_tar_with_reference\": api.files.query(project=project_id, names=[\"enter_filename\"])[0], \n\t\"licsrvr_host_and_port\": \"sevenbridges\", \n\t\"snpeff_database\": api.files.query(project=project_id, names=[\"enter_filename\"])[0], \n\t\"dbsnp_database\": api.files.query(project=project_id, names=[\"enter_filename\"])[0], \n\t\"target_bed\": api.files.query(project=project_id, names=[\"enter_filename\"])[0], \n\t\"input_reads\": api.files.query(project=project_id, names=[\"enter_filename\"])[0], \n\t\"assembly\": \"hg19\"}\n# Create a draft task (run=False means it is not started)\ntask = api.tasks.create(name=\"Germline Calling - API Run\", project=project_id, app=app_id, inputs=inputs, run=False)\n```\n\nInstructions for installing and configuring the API Python client are provided on [GitHub](https://github.com/sbg/sevenbridges-python#installation). 
For more information about using the API Python client, consult [the client documentation](http://sevenbridges-python.readthedocs.io/en/latest/). **More examples** are available [here](https://github.com/sbg/okAPI).\n\nAdditionally, an [API R](https://github.com/sbg/sevenbridges-r) client is available. To learn more about using this API client, please refer to the [API R client documentation](https://sbg.github.io/sevenbridges-r/)." label: Germline Calling $namespaces: sbg: 'https://sevenbridges.com' inputs: - id: licsrvr_host_and_port type: string label: License server host and port doc: >- License server host and port in the format (HOST:PORT) (parentheses omitted). 'sbg:x': -657 'sbg:y': -478 - id: input_reads 'sbg:fileTypes': BAM type: File label: Input reads doc: Input aligned reads in BAM format. secondaryFiles: - pattern: .bai required: true 'sbg:x': -645.11767578125 'sbg:y': -107.64787292480469 - id: snpeff_database 'sbg:fileTypes': ZIP type: File label: SnpEff database file doc: >- SnpEff database file is a ZIP archive that can be downloaded from the official SnpEff site or by using the SnpEff download app. 'sbg:x': -649.2353515625 'sbg:y': -303.32061767578125 - id: dbsnp_database 'sbg:fileTypes': VCF.GZ type: File label: Database dbSNP doc: dbSNP database containing known variants. secondaryFiles: - pattern: .tbi required: false - pattern: .idx required: false 'sbg:x': -646.676513671875 'sbg:y': 23.05811309814453 - id: target_bed 'sbg:fileTypes': BED type: File? label: Target BED doc: Target BED with regions of interest. 'sbg:x': -643.558837890625 'sbg:y': 491.14715576171875 - id: in_reference 'sbg:fileTypes': 'FASTA, FA' type: File label: Reference doc: Reference genome in FASTA format. secondaryFiles: - pattern: .fai required: true 'sbg:x': -637.0294799804688 'sbg:y': 175.49928283691406 - id: ancestry_resources_files 'sbg:fileTypes': TAR type: File? 
label: Ancestry Admixture resources files doc: >- TAR archive with the directory containing resources required to run the pipeline: 1. Autosomal_SNP_list_only_rs_v2.txt (2 columns (rsID\trsID) for the Ancestry Informative Markers - AIMs) 2. PAP.bed, PAP.bim, PAP.fam (genotypes of hypothetical/putative ancestral population) 3. snp151Commonhg19.bed OR snp151Commonhg38.bed (dbSNP151 table, hg19 or hg38, rsID mappings to chr, start, end, only SNPs) 4. template_merge.pop (template required for running ADMIXTURE program) 'sbg:x': -641.558837890625 'sbg:y': 336.3529968261719 outputs: - id: raw_vcf outputSource: - bms_rename_app_raw_vcf/out_file 'sbg:fileTypes': VCF type: File label: Raw VCF doc: Renamed output file. secondaryFiles: - pattern: .tbi required: false 'sbg:x': 1016.784423828125 'sbg:y': -376.4706726074219 - id: annotated_vcf outputSource: - bms_rename_app_snpeff_annotated_vcf/out_file 'sbg:fileTypes': VCF type: File label: SnpEff Annotated VCF doc: Renamed output file. secondaryFiles: - pattern: .tbi required: false 'sbg:x': 1028.490478515625 'sbg:y': -32.61084747314453 - id: raw_gvcf outputSource: - bms_rename_app/out_file 'sbg:fileTypes': 'VCF, VCF.GZ' type: File label: Raw gVCF doc: Raw compressed gVCF. secondaryFiles: - pattern: .tbi required: false 'sbg:x': 1025.2427978515625 'sbg:y': 311.11431884765625 - id: admixture_proportions_with_pedigree_info outputSource: - ancestry_admixture_pipeline/admixture_proportions_with_pedigree_info 'sbg:fileTypes': ADMIXTURE.PEDIGREE.TXT type: File label: Admixture proportions with pedigree info doc: Admixture proportions with pedigree info report file. 
'sbg:x': 1028.5299072265625 'sbg:y': 145.1833038330078 steps: - id: sample_from_file in: - id: in_file linkMerge: merge_flattened source: - input_reads out: - id: out_sample_id run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: bristol-myers-squibb/iwes-cwltool-validated-pipelines/sample-from-file/0 baseCommand: - echo - extracting - sample - from - file inputs: - id: in_file type: 'File[]' label: Input file doc: Input file with metadata. outputs: - id: out_sample_id doc: Metadata field from input file. label: Metadata field from input file type: string outputBinding: outputEval: |- ${ var file = [].concat(inputs.in_file)[0]; if (file.metadata){ if (file.metadata['sample_id']){ var file_name = file.metadata['sample_id']; } else { var file_name = file.nameroot; if (file_name.includes('-')){ var file_name = file_name.split('-').slice(0)[0]; } } } else { var file_name = file.nameroot; if (file_name.includes('-')){ var file_name = file_name.split('-').slice(0)[0]; } } return file_name; } label: Sample from file requirements: - class: ResourceRequirement ramMin: 1000 coresMin: 1 - class: DockerRequirement dockerPull: 'bms-images.sbgenomics.com/bristol-myers-squibb/ubuntu:20.04' - class: InlineJavascriptRequirement 'sbg:projectName': iWES CWLtool validated pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:modifiedOn': 1638145231 'sbg:revisionNotes': null 'sbg:image_url': null 'sbg:appVersion': - v1.2 'sbg:id': bristol-myers-squibb/iwes-cwltool-validated-pipelines/sample-from-file/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145231 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145231 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 
'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': a62a80496bb1b98d8e413ff13b1f2b5f2b999ab32c11cf21fe5e4478d6467bdeb label: Sample from file 'sbg:x': -66 'sbg:y': -359.8439636230469 - id: bms_rename_app_raw_vcf in: - id: suffix_string default: .HC.vcf.gz - id: first_part_of_string source: sample_from_file/out_sample_id - id: in_file source: haplotypecaller_genotyping/output out: - id: out_file run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/bms-rename-app-raw-vcf/0 baseCommand: [] inputs: - 'sbg:category': Name inputs id: suffix_string type: string? label: Suffix String doc: Tool string in desired format with extension. - 'sbg:category': Name inputs 'sbg:toolDefaultValue': None id: second_part_of_string type: string? label: String 2 doc: Second part of the output name. Overrides the input file. - 'sbg:category': Name inputs id: first_part_of_string type: string? label: String 1 doc: First part of the output name. Overrides the input file. - 'sbg:stageInput': link 'sbg:category': File Inputs id: in_file type: File inputBinding: shellQuote: false position: 1 'sbg:cmdInclude': true label: Input file doc: Input file. - 'sbg:category': Name inputs id: second_part_of_string_file type: File? label: File - second part of name string doc: >- File whose name shall be used for the second part of the output name. - 'sbg:category': Name inputs id: first_part_of_string_file type: File? label: File - first part of name string doc: File whose name shall be used for the first part of the output name. outputs: - id: out_file doc: Renamed output file. 
label: Output file type: File outputBinding: glob: |- ${ if (inputs.first_part_of_string) { var first = inputs.first_part_of_string; } else { if (inputs.first_part_of_string_file){ var first_input_file = [].concat(inputs.first_part_of_string_file)[0]; } else { var first_input_file = [].concat(inputs.in_file)[0]; } if (first_input_file.metadata){ if (first_input_file.metadata['sample_id']){ var first = first_input_file.metadata['sample_id']; } else { var first = first_input_file.nameroot.split('.')[0]; } } else { var first = first_input_file.nameroot.split('.')[0]; } } var junct = '-'; if (inputs.second_part_of_string) { var second = inputs.second_part_of_string; } else if (inputs.second_part_of_string_file){ var second_input_file = [].concat(inputs.second_part_of_string_file)[0]; if (second_input_file.metadata){ if (second_input_file.metadata['sample_id']){ var second = second_input_file.metadata['sample_id']; } else { var second = second_input_file.nameroot.split('.')[0]; } } else { var second = second_input_file.nameroot.split('.')[0]; } } else { junct = ''; var second = ''; } if (inputs.suffix_string) { var last = inputs.suffix_string; } else { var last = ''; } return first + junct + second + last; } outputEval: '$(inheritMetadata(self, inputs.in_file))' secondaryFiles: - pattern: .bai required: false - pattern: ^.bai required: false - pattern: .fai required: false - pattern: ^.fai required: false - pattern: .dict required: false - pattern: ^.dict required: false - pattern: .idx required: false - pattern: ^.idx required: false - pattern: .tbi required: false - pattern: ^.tbi required: false - pattern: .csi required: false - pattern: ^.csi required: false label: BMS Rename App arguments: - prefix: '' shellQuote: false position: 0 valueFrom: |- ${ if (inputs.in_file) { return 'cp'; } else { return 'echo NO input given, skipping... 
#'; } } - prefix: '' shellQuote: false position: 10 valueFrom: |- ${ if (inputs.first_part_of_string) { var first = inputs.first_part_of_string; } else { if (inputs.first_part_of_string_file){ var first_input_file = [].concat(inputs.first_part_of_string_file)[0]; } else { var first_input_file = [].concat(inputs.in_file)[0]; } if (first_input_file.metadata){ if (first_input_file.metadata['sample_id']){ var first = first_input_file.metadata['sample_id']; } else { var first = first_input_file.nameroot.split('.')[0]; } } else { var first = first_input_file.nameroot.split('.')[0]; } } var junct = '-'; if (inputs.second_part_of_string) { var second = inputs.second_part_of_string; } else if (inputs.second_part_of_string_file){ var second_input_file = [].concat(inputs.second_part_of_string_file)[0]; if (second_input_file.metadata){ if (second_input_file.metadata['sample_id']){ var second = second_input_file.metadata['sample_id']; } else { var second = second_input_file.nameroot.split('.')[0]; } } else { var second = second_input_file.nameroot.split('.')[0]; } } else { junct = ''; var second = ''; } if (inputs.suffix_string) { var last = inputs.suffix_string; } else { var last = ''; } return first + junct + second + last; } - prefix: '' shellQuote: false position: 20 valueFrom: |- ${ if (!inputs.in_file.hasOwnProperty('secondaryFiles')){ return "|| : # No secondary files found"; } if (!inputs.in_file.secondaryFiles){ return "|| : # No secondary files found"; } var secondary_files = []; for (var i = 0; i- bristol-myers-squibb/iwes-cwltool-validated-pipelines/bms-rename-app-raw-vcf/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145232 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145232 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 
'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': ae7e3182c2c69c931a451b1260d7e1e222f8fcb82bc614e798a069d2f83080155 label: BMS Rename App (Raw VCF) 'sbg:x': 652.5296630859375 'sbg:y': -379.02947998046875 - id: bms_rename_app_snpeff_annotated_vcf in: - id: suffix_string default: .snpEff.vcf - id: first_part_of_string source: sample_from_file/out_sample_id - id: in_file source: effect/AnnotatedEFFVCF out: - id: out_file run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/bms-rename-app-raw-vcf/0 baseCommand: [] inputs: - 'sbg:category': Name inputs id: suffix_string type: string? label: Suffix String doc: Tool string in desired format with extension. - 'sbg:category': Name inputs 'sbg:toolDefaultValue': None id: second_part_of_string type: string? label: String 2 doc: Second part of the output name. Overrides the input file. - 'sbg:category': Name inputs id: first_part_of_string type: string? label: String 1 doc: First part of the output name. Overrides the input file. - 'sbg:stageInput': link 'sbg:category': File Inputs id: in_file type: File inputBinding: shellQuote: false position: 1 'sbg:cmdInclude': true label: Input file doc: Input file. - 'sbg:category': Name inputs id: second_part_of_string_file type: File? label: File - second part of name string doc: >- File whose name shall be used for the second part of the output name. - 'sbg:category': Name inputs id: first_part_of_string_file type: File? label: File - first part of name string doc: File whose name shall be used for the first part of the output name. outputs: - id: out_file doc: Renamed output file. 
label: Output file type: File outputBinding: glob: |- ${ if (inputs.first_part_of_string) { var first = inputs.first_part_of_string; } else { if (inputs.first_part_of_string_file){ var first_input_file = [].concat(inputs.first_part_of_string_file)[0]; } else { var first_input_file = [].concat(inputs.in_file)[0]; } if (first_input_file.metadata){ if (first_input_file.metadata['sample_id']){ var first = first_input_file.metadata['sample_id']; } else { var first = first_input_file.nameroot.split('.')[0]; } } else { var first = first_input_file.nameroot.split('.')[0]; } } var junct = '-'; if (inputs.second_part_of_string) { var second = inputs.second_part_of_string; } else if (inputs.second_part_of_string_file){ var second_input_file = [].concat(inputs.second_part_of_string_file)[0]; if (second_input_file.metadata){ if (second_input_file.metadata['sample_id']){ var second = second_input_file.metadata['sample_id']; } else { var second = second_input_file.nameroot.split('.')[0]; } } else { var second = second_input_file.nameroot.split('.')[0]; } } else { junct = ''; var second = ''; } if (inputs.suffix_string) { var last = inputs.suffix_string; } else { var last = ''; } return first + junct + second + last; } outputEval: '$(inheritMetadata(self, inputs.in_file))' secondaryFiles: - pattern: .bai required: false - pattern: ^.bai required: false - pattern: .fai required: false - pattern: ^.fai required: false - pattern: .dict required: false - pattern: ^.dict required: false - pattern: .idx required: false - pattern: ^.idx required: false - pattern: .tbi required: false - pattern: ^.tbi required: false - pattern: .csi required: false - pattern: ^.csi required: false label: BMS Rename App arguments: - prefix: '' shellQuote: false position: 0 valueFrom: |- ${ if (inputs.in_file) { return 'cp'; } else { return 'echo NO input given, skipping... 
#'; } } - prefix: '' shellQuote: false position: 10 valueFrom: |- ${ if (inputs.first_part_of_string) { var first = inputs.first_part_of_string; } else { if (inputs.first_part_of_string_file){ var first_input_file = [].concat(inputs.first_part_of_string_file)[0]; } else { var first_input_file = [].concat(inputs.in_file)[0]; } if (first_input_file.metadata){ if (first_input_file.metadata['sample_id']){ var first = first_input_file.metadata['sample_id']; } else { var first = first_input_file.nameroot.split('.')[0]; } } else { var first = first_input_file.nameroot.split('.')[0]; } } var junct = '-'; if (inputs.second_part_of_string) { var second = inputs.second_part_of_string; } else if (inputs.second_part_of_string_file){ var second_input_file = [].concat(inputs.second_part_of_string_file)[0]; if (second_input_file.metadata){ if (second_input_file.metadata['sample_id']){ var second = second_input_file.metadata['sample_id']; } else { var second = second_input_file.nameroot.split('.')[0]; } } else { var second = second_input_file.nameroot.split('.')[0]; } } else { junct = ''; var second = ''; } if (inputs.suffix_string) { var last = inputs.suffix_string; } else { var last = ''; } return first + junct + second + last; } - prefix: '' shellQuote: false position: 20 valueFrom: |- ${ if (!inputs.in_file.hasOwnProperty('secondaryFiles')){ return "|| : # No secondary files found"; } if (!inputs.in_file.secondaryFiles){ return "|| : # No secondary files found"; } var secondary_files = []; for (var i = 0; i- bristol-myers-squibb/iwes-cwltool-validated-pipelines/bms-rename-app-raw-vcf/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145232 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145232 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 
'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': ae7e3182c2c69c931a451b1260d7e1e222f8fcb82bc614e798a069d2f83080155 label: BMS Rename App (SnpEff Annotated VCF) 'sbg:x': 656.9412841796875 'sbg:y': -28.4411678314209 - id: sbg_prepare_intervals in: - id: bed_file source: target_bed - id: split_mode default: File per chr with alt contig in a single file - id: format default: chr start end out: - id: intervals run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/sbg-prepare-intervals/0 baseCommand: - python - sbg_prepare_intervals.py inputs: - 'sbg:category': File Inputs id: bed_file type: File? inputBinding: prefix: '--bed' shellQuote: false position: 1 label: Input BED file doc: Input BED file containing intervals. Required for modes 3 and 4. 'sbg:fileTypes': BED - 'sbg:category': File Input id: fai_file type: File? inputBinding: prefix: '--fai' shellQuote: false position: 2 valueFrom: |- ${ self = [].concat(self)[0]; if (self.nameext == '.fa' || self.nameext == '.fasta'){ if (self.hasOwnProperty('secondaryFiles')){ for (var i = 0; i- Depending on selected Split Mode value, output files are generated in accordance with description below: 1. File per interval - The tool creates one interval file per line of the input BED(FAI) file. Each interval file contains a single line (one of the lines of BED(FAI) input file). 2. File per chr with alt contig in a single file - For each contig(chromosome) a single file is created containing all the intervals corresponding to it . All the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as ("others.bed"). 3. Output original BED - BED file is required for execution of this mode. If mode 3 is applied input is passed to the output. 4. File per interval with alt contig in a single file - For each chromosome a single file is created for each interval. 
All the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as ("others.bed"). NOTE: Do not use option 1 (File per interval) with exome BED or a BED with a lot of GL contigs, as it will create a large number of files. - 'sbg:category': Config inputs id: format type: - 'null' - type: enum symbols: - chr start end - 'chr:start-end' name: format label: Interval format doc: Format of the intervals in the generated files. outputs: - id: intervals doc: Array of BED files generated as per selected Split Mode. label: Intervals type: 'File[]?' outputBinding: glob: Intervals/*.bed 'sbg:fileTypes': BED doc: >- Depending on selected Split Mode value, output files are generated in accordance with description below: 1. File per interval - The tool creates one interval file per line of the input BED(FAI) file. Each interval file contains a single line (one of the lines of BED(FAI) input file). 2. File per chr with alt contig in a single file - For each contig(chromosome) a single file is created containing all the intervals corresponding to it . All the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as ("others.bed"). 3. Output original BED - BED file is required for execution of this mode. If mode 3 is applied input is passed to the output. 4. File per interval with alt contig in a single file - For each chromosome a single file is created for each interval. All the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as ("others.bed"). ##### Common issues: Do not use option 1 (File per interval) with exome BED or a BED with a lot of GL contigs, as it will create a large number of files. 
label: SBG Prepare Intervals arguments: - prefix: '' shellQuote: false position: 0 valueFrom: |- ${ if (inputs.format){ return '--format ' + '"' + inputs.format + '"'; } return ""; } requirements: - class: ShellCommandRequirement - class: ResourceRequirement ramMin: 1000 coresMin: 1 - class: DockerRequirement dockerPull: 'images.sbgenomics.com/bogdang/sbg_prepare_intervals:1.0' - class: InitialWorkDirRequirement listing: - entryname: sbg_prepare_intervals.py entry: >- """ Usage: sbg_prepare_intervals.py [options] [--fastq FILE --bed FILE --mode INT --format STR --others STR] Description: Purpose of this tool is to split BED file into files based on the selected mode. If bed file is not provided fai(fasta index) file is converted to bed. Options: -h, --help Show this message. -v, -V, --version Tool version. -b, -B, --bed FILE Path to input bed file. --fai FILE Path to input fai file. --format STR Output file format. --mode INT Select input mode. """ import os import sys import glob import shutil from docopt import docopt default_extension = '.bed' # for output files def create_file(contents, contig_name, extension=default_extension): """function for creating a file for all intervals in a contig""" new_file = open("Intervals/" + contig_name + extension, "w") new_file.write(contents) new_file.close() def add_to_file(line, name, extension=default_extension): """function for adding a line to a file""" new_file = open("Intervals/" + name + extension, "a") if lformat == formats[1]: sep = line.split("\t") line = sep[0] + ":" + sep[1] + "-" + sep[2] new_file.write(line) new_file.close() def fai2bed(fai): """function to create a bed file from fai file""" region_thr = 10000000 # threshold used to determine starting point accounting for telomeres in chromosomes basename = fai[0:fai.rfind(".")] with open(fai, "r") as ins: new_array = [] for line in ins: len_reg = int(line.split()[1]) cutoff = 0 if ( len_reg < region_thr) else 0 # sd\\telomeres or start with 1 new_line = 
line.split()[0] + '\t' + str(cutoff) + '\t' + str( len_reg + cutoff) new_array.append(new_line) new_file = open(basename + ".bed", "w") new_file.write("\n".join(new_array)) return basename + ".bed" def chr_intervals(no_of_chrms=23): """returns all possible designations for chromosome intervals""" chrms = [] for i in range(1, no_of_chrms): chrms.append("chr" + str(i)) chrms.append(str(i)) chrms.extend(["x", "y", "chrx", "chry"]) return chrms def mode_1(orig_file): """mode 1: every line is a new file""" with open(orig_file, "r") as ins: prev = "" counter = 0 names = [] for line in ins: if is_header(line): continue if line.split()[0] == prev: counter += 1 else: counter = 0 suffix = "" if (counter == 0) else "_" + str(counter) create_file(line, line.split()[0] + suffix) names.append(line.split()[0] + suffix) prev = line.split()[0] create_file(str(names), "names", extension=".txt") def mode_2(orig_file, others_name): """mode 2: separate file is created for each chromosome, and one file is created for other intervals""" chrms = chr_intervals() names = [] with open(orig_file, 'r') as ins: for line in ins: if is_header(line): continue name = line.split()[0] if name.lower() in chrms: name = name else: name = others_name try: add_to_file(line, name) if not name in names: names.append(name) except: raise Exception( "Couldn't create or write in the file in mode 2") create_file(str(names), "names", extension=".txt") def mode_3(orig_file, extension=default_extension): """mode 3: input file is staged to output""" orig_name = orig_file.split("/")[len(orig_file.split("/")) - 1] output_file = r"./Intervals/" + orig_name[ 0:orig_name.rfind('.')] + extension shutil.copyfile(orig_file, output_file) names = [orig_name[0:orig_name.rfind('.')]] create_file(str(names), "names", extension=".txt") def mode_4(orig_file, others_name): """mode 4: every interval in chromosomes is in a separate file. 
Other intervals are in a single file""" chrms = chr_intervals() names = [] with open(orig_file, "r") as ins: counter = {} for line in ins: if line.startswith('@'): continue name = line.split()[0].lower() if name in chrms: if name in counter: counter[name] += 1 else: counter[name] = 0 suffix = "" if (counter[name] == 0) else "_" + str(counter[name]) create_file(line, name + suffix) names.append(name + suffix) prev = name else: name = others_name if not name in names: names.append(name) try: add_to_file(line, name) except: raise Exception( "Couldn't create or write in the file in mode 4") create_file(str(names), "names", extension=".txt") def prepare_intervals(): # reading input files and split mode from command line args = docopt(__doc__, version='1.0') bed_file = args['--bed'] fai_file = args['--fai'] split_mode = int(args['--mode']) # define file name for non-chromosomal contigs others_name = 'others' global formats, lformat formats = ["chr start end", "chr:start-end"] lformat = args['--format'] if lformat == None: lformat = formats[0] if not lformat in formats: raise Exception('Unsuported interval format') if not os.path.exists(r"./Intervals"): os.mkdir(r"./Intervals") else: files = glob.glob(r"./Intervals/*") for f in files: os.remove(f) # create variable input_file taking bed_file as priority if bed_file: input_file = bed_file elif fai_file: input_file = fai2bed(fai_file) else: raise Exception('No input files are provided') # calling adequate split mode function if split_mode == 1: mode_1(input_file) elif split_mode == 2: mode_2(input_file, others_name) elif split_mode == 3: if bed_file: mode_3(input_file) else: raise Exception('Bed file is required for mode 3') elif split_mode == 4: mode_4(input_file, others_name) else: raise Exception('Split mode value is not set') def is_header(line): x = line.split('\t') try: int(x[1]) int(x[2]) header = False except: sys.stderr.write('Line is skipped: {}'.format(line)) header = True return header if __name__ == '__main__': 
prepare_intervals() writable: false - class: InlineJavascriptRequirement 'sbg:projectName': iWES CWLtool validated pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:modifiedOn': 1638145233 'sbg:revisionNotes': null 'sbg:image_url': null 'sbg:toolAuthor': Seven Bridges Genomics 'sbg:license': Apache License 2.0 'sbg:toolkit': SBGTools 'sbg:toolkitVersion': '1.0' 'sbg:categories': - Converters 'sbg:appVersion': - v1.2 'sbg:id': >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/sbg-prepare-intervals/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145233 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145233 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': a1168e2abbbdb701df4dca244ba689d6d58608946fae5417d3912603b41587c04 label: SBG Prepare Intervals 'sbg:x': -348.12701416015625 'sbg:y': 398.2785339355469 - id: bms_rename_app in: - id: suffix_string default: .HC.g.vcf.gz - id: first_part_of_string source: sample_from_file/out_sample_id - id: in_file source: gatk_merge_vcfs/out_variants out: - id: out_file run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/bms-rename-app-raw-vcf/0 baseCommand: [] inputs: - 'sbg:category': Name inputs id: suffix_string type: string? label: Suffix String doc: Tool string in desired format with extension. - 'sbg:category': Name inputs 'sbg:toolDefaultValue': None id: second_part_of_string type: string? label: String 2 doc: Second part of the output name. Overrides the input file. - 'sbg:category': Name inputs id: first_part_of_string type: string? 
label: String 1 doc: First part of the output name. Overrides the input file. - 'sbg:stageInput': link 'sbg:category': File Inputs id: in_file type: File inputBinding: shellQuote: false position: 1 'sbg:cmdInclude': true label: Input file doc: Input file. - 'sbg:category': Name inputs id: second_part_of_string_file type: File? label: File - second part of name string doc: >- File whose name shall be used for the second part of the output name. - 'sbg:category': Name inputs id: first_part_of_string_file type: File? label: File - first part of name string doc: File whose name shall be used for the first part of the output name. outputs: - id: out_file doc: Renamed output file. label: Output file type: File outputBinding: glob: |- ${ if (inputs.first_part_of_string) { var first = inputs.first_part_of_string; } else { if (inputs.first_part_of_string_file){ var first_input_file = [].concat(inputs.first_part_of_string_file)[0]; } else { var first_input_file = [].concat(inputs.in_file)[0]; } if (first_input_file.metadata){ if (first_input_file.metadata['sample_id']){ var first = first_input_file.metadata['sample_id']; } else { var first = first_input_file.nameroot.split('.')[0]; } } else { var first = first_input_file.nameroot.split('.')[0]; } } var junct = '-'; if (inputs.second_part_of_string) { var second = inputs.second_part_of_string; } else if (inputs.second_part_of_string_file){ var second_input_file = [].concat(inputs.second_part_of_string_file)[0]; if (second_input_file.metadata){ if (second_input_file.metadata['sample_id']){ var second = second_input_file.metadata['sample_id']; } else { var second = second_input_file.nameroot.split('.')[0]; } } else { var second = second_input_file.nameroot.split('.')[0]; } } else { junct = ''; var second = ''; } if (inputs.suffix_string) { var last = inputs.suffix_string; } else { var last = ''; } return first + junct + second + last; } outputEval: '$(inheritMetadata(self, inputs.in_file))' secondaryFiles: - pattern: .bai 
required: false - pattern: ^.bai required: false - pattern: .fai required: false - pattern: ^.fai required: false - pattern: .dict required: false - pattern: ^.dict required: false - pattern: .idx required: false - pattern: ^.idx required: false - pattern: .tbi required: false - pattern: ^.tbi required: false - pattern: .csi required: false - pattern: ^.csi required: false label: BMS Rename App arguments: - prefix: '' shellQuote: false position: 0 valueFrom: |- ${ if (inputs.in_file) { return 'cp'; } else { return 'echo NO input given, skipping... #'; } } - prefix: '' shellQuote: false position: 10 valueFrom: |- ${ if (inputs.first_part_of_string) { var first = inputs.first_part_of_string; } else { if (inputs.first_part_of_string_file){ var first_input_file = [].concat(inputs.first_part_of_string_file)[0]; } else { var first_input_file = [].concat(inputs.in_file)[0]; } if (first_input_file.metadata){ if (first_input_file.metadata['sample_id']){ var first = first_input_file.metadata['sample_id']; } else { var first = first_input_file.nameroot.split('.')[0]; } } else { var first = first_input_file.nameroot.split('.')[0]; } } var junct = '-'; if (inputs.second_part_of_string) { var second = inputs.second_part_of_string; } else if (inputs.second_part_of_string_file){ var second_input_file = [].concat(inputs.second_part_of_string_file)[0]; if (second_input_file.metadata){ if (second_input_file.metadata['sample_id']){ var second = second_input_file.metadata['sample_id']; } else { var second = second_input_file.nameroot.split('.')[0]; } } else { var second = second_input_file.nameroot.split('.')[0]; } } else { junct = ''; var second = ''; } if (inputs.suffix_string) { var last = inputs.suffix_string; } else { var last = ''; } return first + junct + second + last; } - prefix: '' shellQuote: false position: 20 valueFrom: |- ${ if (!inputs.in_file.hasOwnProperty('secondaryFiles')){ return "|| : # No secondary files found"; } if (!inputs.in_file.secondaryFiles){ return "|| : 
# No secondary files found"; } var secondary_files = []; for (var i = 0; i- bristol-myers-squibb/iwes-cwltool-validated-pipelines/bms-rename-app-raw-vcf/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145232 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145232 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': ae7e3182c2c69c931a451b1260d7e1e222f8fcb82bc614e798a069d2f83080155 label: BMS Rename App 'sbg:x': 658.0885009765625 'sbg:y': 312.7941589355469 - id: gatk_merge_vcfs in: - id: in_variants source: - haplotypecaller_genotyping_gvcf/output - id: output_file_format default: vcf.gz out: - id: out_variants run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: bristol-myers-squibb/iwes-cwltool-validated-pipelines/gatk-merge-vcfs/0 baseCommand: [] inputs: - 'sbg:altPrefix': '-I' 'sbg:category': Required Arguments id: in_variants type: 'File[]' inputBinding: shellQuote: false position: 4 valueFrom: |- ${ if (self) { var cmd = []; for (var i = 0; i < self.length; i++) { cmd.push('--INPUT', self[i].path); } return cmd.join(' '); } } label: Input variants file doc: >- VCF or BCF input files (file format is determined by file extension). 'sbg:fileTypes': 'VCF, VCF.GZ, BCF' secondaryFiles: - pattern: |- ${ if (self.nameext == ".vcf") { return self.basename + ".idx"; } else { return self.basename + ".tbi"; } } required: true - 'sbg:category': Optional Arguments 'sbg:toolDefaultValue': '2' id: compression_level type: int? inputBinding: prefix: '--COMPRESSION_LEVEL' shellQuote: false position: 4 label: Compression level doc: >- Compression level for all compressed files created (e.g. BAM and VCF). 
- 'sbg:category': Optional Arguments 'sbg:toolDefaultValue': '500000' id: max_records_in_ram type: int? inputBinding: prefix: '--MAX_RECORDS_IN_RAM' shellQuote: false position: 4 label: Max records in RAM doc: >- When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed. - 'sbg:category': Platform Options id: memory_overhead_per_job type: int? label: Memory overhead per job doc: >- This input allows a user to set the desired overhead memory when running a tool or adding it to a workflow. This amount will be added to the Memory per job in the Memory requirements section but it will not be added to the -Xmx parameter leaving some memory not occupied which can be used as stack memory (-Xmx parameter defines heap memory). This input should be defined in MB (for both the platform part and the -Xmx part if Java tool is wrapped). - 'sbg:category': Platform Options 'sbg:toolDefaultValue': 2048 MB id: memory_per_job type: int? label: Memory per job doc: >- This input allows a user to set the desired memory requirement when running a tool or adding it to a workflow. This value should be propagated to the -Xmx parameter too. This input should be defined in MB (for both the platform part and the -Xmx part if Java tool is wrapped). - 'sbg:altPrefix': '-D' 'sbg:category': Optional Arguments 'sbg:toolDefaultValue': 'null' id: sequence_dictionary type: File? inputBinding: prefix: '--SEQUENCE_DICTIONARY' shellQuote: false position: 4 label: Sequence dictionary doc: >- The index sequence dictionary to use instead of the sequence dictionary in the input files. 'sbg:fileTypes': DICT - 'sbg:category': Platform options 'sbg:toolDefaultValue': '1' id: cpu_per_job type: int? label: CPU per job doc: >- This input allows a user to set the desired CPU requirement when running a tool or adding it to a workflow.
- 'sbg:category': Optional Arguments id: output_file_format type: - 'null' - type: enum symbols: - vcf - bcf - vcf.gz name: output_file_format label: Output file format doc: Output file format. - 'sbg:category': Optional Arguments id: output_prefix type: string? label: Output prefix doc: Output file name prefix. outputs: - id: out_variants doc: >- The merged VCF or BCF file. File format is determined by file extension. label: Output merged VCF or BCF file type: File? outputBinding: glob: |- ${ var in_variants = [].concat(inputs.in_variants); var vcf_count = 0; var vcf_gz_count = 0; var bcf_count = 0; var gvcf_count = 0; var gvcf_gz_count = 0; for (var i = 0; i < in_variants.length; i++) { if (in_variants[i].path.endsWith('vcf') && !(in_variants[i].path.endsWith('g.vcf')) ) vcf_count += 1 else if (in_variants[i].path.endsWith('vcf.gz') && !(in_variants[i].path.endsWith('g.vcf.gz'))) vcf_gz_count += 1 else if (in_variants[i].path.endsWith('bcf')) bcf_count += 1 else if (in_variants[i].path.endsWith('g.vcf')) gvcf_count += 1 else if (in_variants[i].path.endsWith('g.vcf.gz')) gvcf_gz_count += 1 } var max_ext = Math.max(vcf_count, vcf_gz_count, bcf_count, gvcf_count, gvcf_gz_count) var most_frequent_ext = (max_ext == vcf_count) ? "vcf" : (max_ext == vcf_gz_count) ? "vcf.gz" : (max_ext == bcf_count) ? "bcf" : (max_ext == gvcf_count) ? "g.vcf" : "g.vcf.gz"; var out_format = inputs.output_file_format; var out_ext = ""; if (out_format) { out_ext = ((most_frequent_ext == "g.vcf" || most_frequent_ext == "g.vcf.gz") && (out_format == "vcf" || out_format == "vcf.gz")) ? "g." + out_format : ((most_frequent_ext == "g.vcf" || most_frequent_ext == "g.vcf.gz") && (out_format == "bcf" )) ? 
most_frequent_ext : out_format; } else { out_ext = most_frequent_ext; } return "*" + out_ext; } outputEval: '$(inheritMetadata(self, inputs.in_variants))' secondaryFiles: - pattern: .tbi required: false 'sbg:fileTypes': 'VCF, VCF.GZ, BCF' doc: >- The **GATK MergeVcfs** tool combines multiple variant files into a single variant file. *A list of **all inputs and parameters** with corresponding descriptions can be found at the bottom of the page.* ###Common Use Cases * The **MergeVcfs** tool requires one or more input files in VCF format on its **Input variant files** (`--INPUT`) input. The input files can be in VCF format (can be gzipped, i.e. ending in ".vcf.gz", or binary compressed, i.e. ending in ".bcf"). The tool generates a VCF file on its **Output merged VCF or BCF file** output. * The **MergeVcfs** tool supports a sequence dictionary file (typically name ending in .dict) on its **Sequence dictionary** (`--SEQUENCE_DICTIONARY`) input if the input VCF does not contain a complete contig list and if the output index is to be created (true by default). * The output file is sorted (i) according to the dictionary and (ii) by coordinate. * Usage example: ``` gatk MergeVcfs \ --INPUT input_variants.01.vcf \ --INPUT input_variants.02.vcf.gz \ --OUTPUT output_variants.vcf.gz ``` ###Changes Introduced by Seven Bridges * The output file will be prefixed using the **Output prefix** parameter. In case **Output prefix** is not provided, the input files provided on the **Input variant files** input will be alphabetically sorted by name and output prefix will be equal to the Sample ID metadata from the first element from that list, if the Sample ID metadata exists. Otherwise, output prefix will be inferred from the filename of the first element from this list. Moreover, the number of input files will be added after the output prefix as well as the tool specific extension which is **merged**. This way, having identical names of the output files between runs is avoided. 
* The user has a possibility to specify the output file format using the **Output file format** argument. The default output format is "vcf.gz". ###Common Issues and Important Notes * Note 1: If running this tool on multi-sample input files (originating from e.g. some scatter-gather runs), the input files must contain the same sample names in the same column order. * Note 2: Input file headers must contain compatible declarations for common annotations (INFO, FORMAT fields) and filters. * Note 3: Input files variant records must be sorted by their contig and position following the sequence dictionary provided or the header contig list. ###Performance Benchmarking This tool is ultra fast, with a running time less than a minute on the default AWS c4.2xlarge instance. ###References [1] [GATK MergeVcfs](https://software.broadinstitute.org/gatk/documentation/tooldocs/4.1.0.0/picard_vcf_MergeVcfs.php) label: GATK Merge VCFs arguments: - prefix: '' shellQuote: false position: 0 valueFrom: /opt/gatk - shellQuote: false position: 1 valueFrom: '--java-options' - prefix: '' shellQuote: false position: 2 valueFrom: |- ${ if (inputs.memory_per_job) { return '\"-Xmx'.concat(inputs.memory_per_job, 'M') + '\"'; } return '\"-Xms2000m\"'; } - shellQuote: false position: 3 valueFrom: MergeVcfs - prefix: '' shellQuote: false position: 4 valueFrom: |- ${ var in_variants = [].concat(inputs.in_variants); var output_prefix = ""; var vcf_count = 0; var vcf_gz_count = 0; var bcf_count = 0; var gvcf_count = 0; var gvcf_gz_count = 0; for (var i = 0; i < in_variants.length; i++) { if (in_variants[i].path.endsWith('vcf') && !(in_variants[i].path.endsWith('g.vcf')) ) vcf_count += 1 else if (in_variants[i].path.endsWith('vcf.gz') && !(in_variants[i].path.endsWith('g.vcf.gz'))) vcf_gz_count += 1 else if (in_variants[i].path.endsWith('bcf')) bcf_count += 1 else if (in_variants[i].path.endsWith('g.vcf')) gvcf_count += 1 else if (in_variants[i].path.endsWith('g.vcf.gz')) gvcf_gz_count += 1 } var 
max_ext = Math.max(vcf_count, vcf_gz_count, bcf_count, gvcf_count, gvcf_gz_count) var most_frequent_ext = (max_ext == vcf_count) ? "vcf" : (max_ext == vcf_gz_count) ? "vcf.gz" : (max_ext == bcf_count) ? "bcf" : (max_ext == gvcf_count) ? "g.vcf" : "g.vcf.gz"; var out_format = inputs.output_file_format; var out_ext = ""; if (out_format) { out_ext = ((most_frequent_ext == "g.vcf" || most_frequent_ext == "g.vcf.gz") && (out_format == "vcf" || out_format == "vcf.gz")) ? "g." + out_format : ((most_frequent_ext == "g.vcf" || most_frequent_ext == "g.vcf.gz") && (out_format == "bcf" )) ? most_frequent_ext : out_format; } else { out_ext = most_frequent_ext; } if (inputs.output_prefix) { output_prefix = inputs.output_prefix; } else { if (in_variants.length > 1) { in_variants.sort(function(file1, file2) { var file1_name = file1.basename.toUpperCase(); var file2_name = file2.basename.toUpperCase(); if (file1_name < file2_name) { return -1; } if (file1_name > file2_name) { return 1; } // names must be equal return 0; }); } var in_variants_first = in_variants[0]; if (in_variants_first.metadata && in_variants_first.metadata.sample_id) { output_prefix = in_variants_first.metadata.sample_id; } else { output_prefix = in_variants_first.basename.split('.')[0]; } if (in_variants.length > 1) { output_prefix = output_prefix + "." + in_variants.length; } } return "--OUTPUT " + output_prefix + ".merged." + out_ext; } requirements: - class: ShellCommandRequirement - class: ResourceRequirement ramMin: |- ${ var memory = 3500; if (inputs.memory_per_job) { memory = inputs.memory_per_job; } if (inputs.memory_overhead_per_job) { memory += inputs.memory_overhead_per_job; } return memory; } coresMin: |- ${ return inputs.cpu_per_job ? 
inputs.cpu_per_job : 1 } - class: DockerRequirement dockerPull: 'bms-images.sbgenomics.com/bristol-myers-squibb/gatk-4-1-7-0:0' - class: InitialWorkDirRequirement listing: [] - class: InlineJavascriptRequirement expressionLib: - |- var updateMetadata = function(file, key, value) { file['metadata'][key] = value; return file; }; var setMetadata = function(file, metadata) { if (!('metadata' in file)) file['metadata'] = metadata; else { for (var key in metadata) { file['metadata'][key] = metadata[key]; } } return file }; var inheritMetadata = function(o1, o2) { var commonMetadata = {}; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) } } return o1; }; var toArray = function(file) { return [].concat(file); }; var groupBy = function(files, key) { var groupedFiles = []; var tempDict = {}; for (var i = 0; i < files.length; i++) { var value = files[i]['metadata'][key]; if (value in tempDict) tempDict[value].push(files[i]); else tempDict[value] = [files[i]]; } for (var key in tempDict) { groupedFiles.push(tempDict[key]); } return groupedFiles; }; var orderBy = function(files, key, order) { var compareFunction = function(a, b) { if (a['metadata'][key].constructor === Number) { return a['metadata'][key] - b['metadata'][key]; } else { var nameA = a['metadata'][key].toUpperCase(); var nameB = b['metadata'][key].toUpperCase(); if (nameA < nameB) { return -1; } if (nameA > nameB) { return 1; } return 0; } }; files = files.sort(compareFunction); if (order == undefined || order == "asc") return files; else return files.reverse(); }; - |- var setMetadata = function(file, metadata) { if (!('metadata' in file)) 
file['metadata'] = metadata; else { for (var key in metadata) { file['metadata'][key] = metadata[key]; } } return file }; var inheritMetadata = function(o1, o2) { var commonMetadata = {}; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) } } return o1; }; 'sbg:categories': - Utilities - VCF Processing 'sbg:license': Open source BSD (3-clause) license 'sbg:toolAuthor': Broad Institute 'sbg:toolkit': GATK 'sbg:toolkitVersion': 4.1.7.0 'sbg:projectName': iWES CWLtool validated pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:modifiedOn': 1638145234 'sbg:revisionNotes': null 'sbg:image_url': null 'sbg:links': - id: 'https://software.broadinstitute.org/gatk/' label: Homepage - id: 'https://github.com/broadinstitute/gatk/' label: Source Code - id: >- https://github.com/broadinstitute/gatk/releases/download/4.1.0.0/gatk-4.1.0.0.zip label: Download - id: 'https://www.ncbi.nlm.nih.gov/pubmed?term=20644199' label: Publications - id: >- https://software.broadinstitute.org/gatk/documentation/tooldocs/4.1.0.0/picard_vcf_MergeVcfs.php label: Documentation 'sbg:appVersion': - v1.2 'sbg:id': bristol-myers-squibb/iwes-cwltool-validated-pipelines/gatk-merge-vcfs/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145234 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145234 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 
'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': adbda2c521f8c6f2854731e5f2b8ddb3ea2c647d0c894e3e341e944f993cd0372 label: GATK Merge VCFs 'sbg:x': 194.5807342529297 'sbg:y': 307.2890625 - id: haplotypecaller_genotyping in: - id: GenomeReference source: in_reference - id: inputBAM source: input_reads - id: ReferenceSNP source: dbsnp_database - id: interval source: target_bed - id: threads default: 16 - id: licsrvr_host_and_port source: licsrvr_host_and_port - id: cpu_per_job default: 16 out: - id: output run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/haplotypecaller-genotyping/0 baseCommand: [] inputs: - id: GenomeReference type: File inputBinding: prefix: '-r' shellQuote: false position: 0 label: Genome reference (fasta) secondaryFiles: - pattern: .fai required: true - id: inputBAM type: File inputBinding: prefix: '-i' shellQuote: false position: 0 label: input BAM file 'sbg:fileTypes': BAM secondaryFiles: - pattern: ^.bai required: true - id: ReferenceSNP type: File? inputBinding: prefix: '-d' shellQuote: false position: 6 label: reference SNP (dbSNP) 'sbg:fileTypes': 'VCF.GZ, VCF' secondaryFiles: - pattern: .tbi required: true - id: minBaseQual type: int? inputBinding: prefix: '--min_base_qual' shellQuote: false position: 6 - id: pruneFactor type: int? inputBinding: prefix: '--prune_factor' shellQuote: false position: 6 - id: emitConfidence type: int? inputBinding: prefix: '--emit_conf' shellQuote: false position: 6 - id: callConfidence type: int? 
inputBinding: prefix: '--call_conf' shellQuote: false position: 6 - 'sbg:category': Algo Options 'sbg:toolDefaultValue': '1 in gvcf mode, 0 otherwise' id: phasing type: - 'null' - type: enum symbols: - Enable - Disable name: phasing inputBinding: shellQuote: false position: 6 valueFrom: |- ${ var expr = ''; if (self == '') { self = null; inputs.phasing = null }; if (inputs.phasing) { if (inputs.phasing == 'Enable') { expr = '--phasing 1'; } else { expr = '--phasing 0'; } } return expr; } label: Phasing doc: Disable/enable phasing (diploid only). - id: emitMode type: - 'null' - type: enum symbols: - VARIANT - CONFIDENT - ALL - GVCF name: emitMode inputBinding: prefix: '--emit_mode' shellQuote: false position: 6 - id: PCRIndelModel type: - 'null' - type: enum symbols: - HOSTILE - AGGRESIVE - CONSERVATIVE - NONE name: PCRIndelModel inputBinding: prefix: '--pcr_indel_model' shellQuote: false position: 6 - id: interval type: File? inputBinding: prefix: '--interval' shellQuote: false position: 0 - id: minMapQuality type: int? inputBinding: prefix: '--min_map_qual' shellQuote: false position: 6 - id: trimSoftClipped type: boolean? inputBinding: prefix: '--trim_soft_clip' shellQuote: false position: 6 - id: ploidy type: int? inputBinding: prefix: '--ploidy' shellQuote: false position: 6 - id: threads type: int? inputBinding: prefix: '-t' shellQuote: false position: 0 label: SentieonHaplotyper threads - 'sbg:category': Execution id: licsrvr_host_and_port type: string label: License server host and port doc: >- License server host and port in the format (HOST:PORT) (parentheses omitted). - 'sbg:category': Execution 'sbg:toolDefaultValue': '1' id: cpu_per_job type: int? label: CPU per job doc: >- Number of CPUs per job. Appropriate instance will be chosen based on this parameter. - 'sbg:category': Execution id: mem_per_job type: int? label: Memory per job doc: >- Memory per job in MB. Appropriate instance will be chosen based on this parameter. 
outputs: - id: output doc: HaplotypeCaller variants. label: HaplotypeCaller variants type: File outputBinding: glob: '*.vcf.gz' outputEval: '$(inheritMetadata(self, inputs.inputBAM))' secondaryFiles: - pattern: .tbi required: false 'sbg:fileTypes': 'VCF.GZ, VCF' doc: >- **Sentieon Haplotyper** is an algorithm designed to detect germline variants with Haplotype variant calling. It is capable of calling SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. A list of **all inputs and parameters** with corresponding descriptions can be found at the bottom of the page. ### Common Use Cases * The inputs to the Haplotyper algorithm are a BAM file and a FASTA reference; its output is a VCF file. _Database dbSNP_ can be added to label known variants. _BQSR Table_ can be given if one wants to perform recalibration on the fly. * Using the _BAM Output_ option, one can output a BAM file containing the modified reads after the local reassembly done by the variant caller. This option should only be used in conjunction with a small BED file for troubleshooting purposes. ### Changes Introduced by Seven Bridges * No modifications to the original tool representation have been made. ### Common Issues and Important Notes * No common issues specific to the tool's execution on the Seven Bridges Platform have been detected. ### Performance Benchmarking In the following table you can find estimates of running time and cost. All samples are aligned against the **GRCh37 human reference index**. *Cost can be significantly reduced by using **spot instances**.
Visit the [Knowledge Center](https://docs.sevenbridges.com/docs/about-spot-instances) for more details.* | BAM File Size [GB] | Mean coverage |Duration (min) | Cores | Cost ($) | Instance (AWS) | |--------------------------|----------------|----------------|-------|----------|------------| | 1.29 | 6X | 1 | 16 | 0.013 | c4.4xlarge | | 7.45 | 40X | 3 | 16 | 0.039 | c4.4xlarge | | 8.36 | 46X | 3 | 16 | 0.039 | c4.4xlarge | | 181.87 | 42X | 47 | 58 | 1.80 | m5.12xlarge | | 220.17 | 50X | 53 | 53 | 2.03 | m5.12xlarge | | 252.65 | 52X | 32 | 47 | 1.76 | m5.12xlarge | ### References [1 - Sentieon manual](https://support.sentieon.com/manual/_downloads/Sentieon.pdf) label: Sentieon HaplotypeCaller arguments: - prefix: '' shellQuote: false position: 0 valueFrom: >- ${ var command = 'export SENTIEON_LICENSE='; var command = command + inputs.licsrvr_host_and_port; return command; } - prefix: '' shellQuote: false position: 0 valueFrom: '&&' - prefix: '' shellQuote: false position: 0 valueFrom: ' $SENTIEON_PATH/bin/sentieon' - prefix: '' shellQuote: false position: 0 valueFrom: driver - prefix: '--algo' shellQuote: false position: 5 valueFrom: Haplotyper - prefix: '' shellQuote: false position: 35 valueFrom: |- ${ var ext=".vcf.gz"; if(inputs.emitMode === 'GVCF'){ ext=".g.vcf.gz" } else ext=".vcf.gz"; return inputs.inputBAM.basename.replace(/.coord|.name|.mdup|.bam$/gi, '') + ext; } requirements: - class: ShellCommandRequirement - class: NetworkAccess networkAccess: true - class: ResourceRequirement ramMin: |- ${ if (inputs.mem_per_job) { return inputs.mem_per_job; } else { return 1000; } } coresMin: |- ${ if (inputs.cpu_per_job) { return inputs.cpu_per_job;} else { return 1;} } - class: DockerRequirement dockerPull: 'images.sbgenomics.com/luka.topalovic/sentieon20201001:0' - class: InlineJavascriptRequirement expressionLib: - |- var setMetadata = function(file, metadata) { if (!('metadata' in file)) file['metadata'] = metadata; else { for (var key in metadata) { 
file['metadata'][key] = metadata[key]; } } return file }; var inheritMetadata = function(o1, o2) { var commonMetadata = {}; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) } } return o1; }; 'sbg:projectName': iWES CWLtool validated pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:modifiedOn': 1638145235 'sbg:revisionNotes': null 'sbg:image_url': null 'sbg:toolkit': Sentieon 'sbg:toolkitVersion': '20201001' 'sbg:categories': - Variant Calling 'sbg:license': Client license 'sbg:appVersion': - v1.2 'sbg:id': >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/haplotypecaller-genotyping/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145235 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145235 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': a7ff3d6ee625f54324f0eebfb2032b95da416f5ea304243d35113eebf856fd7c7 label: Sentieon HaplotypeCaller 'sbg:x': -130.3828125 'sbg:y': -8.734375 - id: haplotypecaller_genotyping_gvcf in: - id: GenomeReference source: in_reference - id: inputBAM source: input_reads - id: ReferenceSNP source: dbsnp_database - id: phasing default: Enable - id: emitMode default: GVCF - id: interval source: sbg_prepare_intervals/intervals - id: threads default: 16 - id: licsrvr_host_and_port source: licsrvr_host_and_port - id: cpu_per_job 
default: 6 - id: mem_per_job default: 5000 out: - id: output run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/haplotypecaller-genotyping/0 baseCommand: [] inputs: - id: GenomeReference type: File inputBinding: prefix: '-r' shellQuote: false position: 0 label: Genome reference (fasta) secondaryFiles: - pattern: .fai required: true - id: inputBAM type: File inputBinding: prefix: '-i' shellQuote: false position: 0 label: input BAM file 'sbg:fileTypes': BAM secondaryFiles: - pattern: ^.bai required: true - id: ReferenceSNP type: File? inputBinding: prefix: '-d' shellQuote: false position: 6 label: reference SNP (dbSNP) 'sbg:fileTypes': 'VCF.GZ, VCF' secondaryFiles: - pattern: .tbi required: true - id: minBaseQual type: int? inputBinding: prefix: '--min_base_qual' shellQuote: false position: 6 - id: pruneFactor type: int? inputBinding: prefix: '--prune_factor' shellQuote: false position: 6 - id: emitConfidence type: int? inputBinding: prefix: '--emit_conf' shellQuote: false position: 6 - id: callConfidence type: int? inputBinding: prefix: '--call_conf' shellQuote: false position: 6 - 'sbg:category': Algo Options 'sbg:toolDefaultValue': '1 in gvcf mode, 0 otherwise' id: phasing type: - 'null' - type: enum symbols: - Enable - Disable name: phasing inputBinding: shellQuote: false position: 6 valueFrom: |- ${ var expr = ''; if (self == '') { self = null; inputs.phasing = null }; if (inputs.phasing) { if (inputs.phasing == 'Enable') { expr = '--phasing 1'; } else { expr = '--phasing 0'; } } return expr; } label: Phasing doc: Disable/enable phasing (diploid only). 
- id: emitMode type: - 'null' - type: enum symbols: - VARIANT - CONFIDENT - ALL - GVCF name: emitMode inputBinding: prefix: '--emit_mode' shellQuote: false position: 6 - id: PCRIndelModel type: - 'null' - type: enum symbols: - HOSTILE - AGGRESIVE - CONSERVATIVE - NONE name: PCRIndelModel inputBinding: prefix: '--pcr_indel_model' shellQuote: false position: 6 - id: interval type: File? inputBinding: prefix: '--interval' shellQuote: false position: 0 - id: minMapQuality type: int? inputBinding: prefix: '--min_map_qual' shellQuote: false position: 6 - id: trimSoftClipped type: boolean? inputBinding: prefix: '--trim_soft_clip' shellQuote: false position: 6 - id: ploidy type: int? inputBinding: prefix: '--ploidy' shellQuote: false position: 6 - id: threads type: int? inputBinding: prefix: '-t' shellQuote: false position: 0 label: SentieonHaplotyper threads - 'sbg:category': Execution id: licsrvr_host_and_port type: string label: License server host and port doc: >- License server host and port in the format (HOST:PORT) (parentheses omitted). - 'sbg:category': Execution 'sbg:toolDefaultValue': '1' id: cpu_per_job type: int? label: CPU per job doc: >- Number of CPUs per job. Appropriate instance will be chosen based on this parameter. - 'sbg:category': Execution id: mem_per_job type: int? label: Memory per job doc: >- Memory per job in MB. Appropriate instance will be chosen based on this parameter. outputs: - id: output doc: HaplotypeCaller variants. label: HaplotypeCaller variants type: File outputBinding: glob: '*.vcf.gz' outputEval: '$(inheritMetadata(self, inputs.inputBAM))' secondaryFiles: - pattern: .tbi required: false 'sbg:fileTypes': 'VCF.GZ, VCF' doc: >- **Sentieon Haplotyper** is an algorithm designed to detect germline variants with Haplotype variant calling. It is capable of calling SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. 
In other words, whenever the program encounters a region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region. A list of **all inputs and parameters** with corresponding descriptions can be found at the bottom of the page. ### Common Use Cases * The inputs to the Haplotyper algorithm are a BAM file and a FASTA reference; its output is a VCF file. _Database dbSNP_ can be added to label known variants. _BQSR Table_ can be given if one wants to perform recalibration on the fly. * Using the _BAM Output_ option, one can output a BAM file containing the modified reads after the local reassembly done by the variant caller. This option should only be used in conjunction with a small BED file for troubleshooting purposes. ### Changes Introduced by Seven Bridges * No modifications to the original tool representation have been made. ### Common Issues and Important Notes * No common issues specific to the tool's execution on the Seven Bridges Platform have been detected. ### Performance Benchmarking In the following table you can find estimates of running time and cost. All samples are aligned against the **GRCh37 human reference index**. *Cost can be significantly reduced by using **spot instances**.
Visit the [Knowledge Center](https://docs.sevenbridges.com/docs/about-spot-instances) for more details.* | BAM File Size [GB] | Mean coverage |Duration (min) | Cores | Cost ($) | Instance (AWS) | |--------------------------|----------------|----------------|-------|----------|------------| | 1.29 | 6X | 1 | 16 | 0.013 | c4.4xlarge | | 7.45 | 40X | 3 | 16 | 0.039 | c4.4xlarge | | 8.36 | 46X | 3 | 16 | 0.039 | c4.4xlarge | | 181.87 | 42X | 47 | 58 | 1.80 | m5.12xlarge | | 220.17 | 50X | 53 | 53 | 2.03 | m5.12xlarge | | 252.65 | 52X | 32 | 47 | 1.76 | m5.12xlarge | ### References [1 - Sentieon manual](https://support.sentieon.com/manual/_downloads/Sentieon.pdf) label: Sentieon HaplotypeCaller arguments: - prefix: '' shellQuote: false position: 0 valueFrom: >- ${ var command = 'export SENTIEON_LICENSE='; var command = command + inputs.licsrvr_host_and_port; return command; } - prefix: '' shellQuote: false position: 0 valueFrom: '&&' - prefix: '' shellQuote: false position: 0 valueFrom: ' $SENTIEON_PATH/bin/sentieon' - prefix: '' shellQuote: false position: 0 valueFrom: driver - prefix: '--algo' shellQuote: false position: 5 valueFrom: Haplotyper - prefix: '' shellQuote: false position: 35 valueFrom: |- ${ var ext=".vcf.gz"; if(inputs.emitMode === 'GVCF'){ ext=".g.vcf.gz" } else ext=".vcf.gz"; return inputs.inputBAM.basename.replace(/.coord|.name|.mdup|.bam$/gi, '') + ext; } requirements: - class: ShellCommandRequirement - class: NetworkAccess networkAccess: true - class: ResourceRequirement ramMin: |- ${ if (inputs.mem_per_job) { return inputs.mem_per_job; } else { return 1000; } } coresMin: |- ${ if (inputs.cpu_per_job) { return inputs.cpu_per_job;} else { return 1;} } - class: DockerRequirement dockerPull: 'images.sbgenomics.com/luka.topalovic/sentieon20201001:0' - class: InlineJavascriptRequirement expressionLib: - |- var setMetadata = function(file, metadata) { if (!('metadata' in file)) file['metadata'] = metadata; else { for (var key in metadata) { 
file['metadata'][key] = metadata[key]; } } return file }; var inheritMetadata = function(o1, o2) { var commonMetadata = {}; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) } } return o1; }; 'sbg:projectName': iWES CWLtool validated pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:modifiedOn': 1638145235 'sbg:revisionNotes': null 'sbg:image_url': null 'sbg:toolkit': Sentieon 'sbg:toolkitVersion': '20201001' 'sbg:categories': - Variant Calling 'sbg:license': Client license 'sbg:appVersion': - v1.2 'sbg:id': >- bristol-myers-squibb/iwes-cwltool-validated-pipelines/haplotypecaller-genotyping/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145235 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145235 'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': a7ff3d6ee625f54324f0eebfb2032b95da416f5ea304243d35113eebf856fd7c7 label: Sentieon HaplotypeCaller GVCF scatter: - interval 'sbg:x': -133.79428100585938 'sbg:y': 305.75262451171875 - id: effect in: - id: inputVCF source: haplotypecaller_genotyping/output - id: SNPEff_data_directory source: snpeff_database out: - id: AnnotatedEFFVCF - id: statEFF - id: statGeneEFF run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: 
bristol-myers-squibb/iwes-cwltool-validated-pipelines/effect/0 baseCommand: [] inputs: - id: nextProt type: boolean? inputBinding: prefix: '-nextProt' shellQuote: false position: 0 - id: intervals type: 'File[]?' inputBinding: prefix: '-interval' itemSeparator: ' -interval ' shellQuote: false position: 0 label: Intervals doc: >- Use custom intervals from a TXT/BED/BigBed/VCF/GFF file (you may use this option many times). This file contains the annotation information. - id: motif type: boolean? inputBinding: prefix: '-motif' shellQuote: false position: 0 doc: Annotate using motif database - id: only_transcripts type: File? inputBinding: prefix: '-onlyTr' shellQuote: false position: 0 doc: Only use the transcripts - id: only_protein type: boolean? inputBinding: prefix: '-onlyProtein' shellQuote: false position: 0 doc: Annotate only protein coding transcripts - id: inputVCF type: File inputBinding: shellQuote: false position: 10 doc: VCF file to annotate secondaryFiles: - pattern: .tbi required: true - id: prefix_chr type: boolean? inputBinding: prefix: '-chr' shellQuote: false position: 0 doc: Add prefix 'chr' in front of chromosomes - id: filter_intervals type: 'File[]?' inputBinding: prefix: '-fi' itemSeparator: ' -fi ' shellQuote: false position: 0 doc: Restrict analysis to these regions 'sbg:fileTypes': bed - id: use_gene_id type: boolean? inputBinding: prefix: '-geneId' shellQuote: false position: 0 doc: Use gene id instead of gene name - id: LOF_NMD type: boolean? inputBinding: prefix: '-lof' shellQuote: false position: 0 doc: Add Loss of Function and Nonsense Mediated Decay annotations - id: canonical type: boolean? inputBinding: prefix: '-canon' shellQuote: false position: 0 doc: Annotate canonical transcripts only - id: padding type: int? inputBinding: prefix: '-ud' shellQuote: false position: 0 doc: Upstream and Downstream interval size padding - id: cancer type: boolean?
inputBinding: prefix: '-cancer' shellQuote: false position: 0 doc: >- Using the -cancer command line option, you can compare somatic vs germline samples. - id: Cancer_samples type: File? inputBinding: prefix: '-cancerSamples' shellQuote: false position: 0 doc: File with germline and cancer samples - id: SNPEff_data_directory type: File inputBinding: shellQuote: false position: -9 label: Tarball (tar.gz) with snpeff directory doc: |- tarball with snpeff directory. When extracted it will create the directory `snpEff` - 'sbg:toolDefaultValue': '1' id: cpu_per_job type: int? label: CPUs per job doc: Number of CPUs per job - 'sbg:toolDefaultValue': '8000' id: mem_per_job type: int? label: Memory per job (MB) doc: Memory per job - 'sbg:toolDefaultValue': '0' id: mem_overhead_per_job type: int? label: Memory overhead per job (MB) doc: Memory overhead per job (MB) - 'sbg:toolDefaultValue': 'False' id: compress_output type: boolean? label: Compress Output doc: Performs bgzip and tabix on the output vcf outputs: - id: AnnotatedEFFVCF label: VCF file annotated with the Effect type: File outputBinding: glob: |- ${ if (inputs.compress_output){ return inputs.inputVCF.basename; } return inputs.inputVCF.nameroot; } outputEval: '$(inheritMetadata(self, inputs.inputVCF))' secondaryFiles: - pattern: .tbi required: false - id: statEFF doc: html file with annotation statistics from SNPEFF type: File? outputBinding: glob: '*.html' outputEval: '$(inheritMetadata(self, inputs.inputVCF))' - id: statGeneEFF doc: txt file with annotation statistics from SNPEFF type: File?
outputBinding: glob: '*genes.txt' outputEval: '$(inheritMetadata(self, inputs.inputVCF))' label: SnpEff Annotation arguments: - prefix: '' shellQuote: false position: -1 valueFrom: |- ${ var mem = 8000; var overhead = 0; if (inputs.mem_per_job){ mem = inputs.mem_per_job; } if (inputs.mem_overhead_per_job){ overhead = inputs.mem_overhead_per_job; } return 'java -Xmx'.concat(mem-overhead, 'M -jar /opt/snpEff/snpEff.jar'); } - prefix: '' shellQuote: false position: 0 valueFrom: eff - prefix: '-v' shellQuote: false position: 0 valueFrom: ' ' - prefix: '-stats' shellQuote: false position: 0 valueFrom: '$(inputs.inputVCF.basename.replace(".vcf.gz", ".stats.html"))' - prefix: '-config' shellQuote: false position: 0 valueFrom: /opt/snpEff/snpEff.config - prefix: '' shellQuote: false position: 9 valueFrom: GRCh38.86 - prefix: '' shellQuote: false position: 20 valueFrom: |- ${ if (inputs.compress_output){ var name = inputs.inputVCF.basename; return "| bgzip > " + name + " && tabix -p vcf " + name; } return " > " + inputs.inputVCF.nameroot; } - prefix: '-dataDir' shellQuote: false position: 5 valueFrom: >- ${ return runtime.outdir + "/" + inputs.SNPEff_data_directory.basename.replace(".tar.gz","") } - prefix: '' shellQuote: false position: -10 valueFrom: 'tar -xvzf ' - prefix: '' shellQuote: false position: -8 valueFrom: '&&' requirements: - class: ShellCommandRequirement - class: ResourceRequirement ramMin: |- ${ var mem=8000; if (inputs.mem_per_job){ mem = inputs.mem_per_job; } return mem; } coresMin: |- ${ var cpu = 1; if (inputs.cpu_per_job){ cpu = inputs.cpu_per_job; } return cpu; } - class: DockerRequirement dockerPull: 'images.sbgenomics.com/bristol-myers-squibb/celgene/snpeff4.3t:0' - class: InlineJavascriptRequirement expressionLib: - |- var setMetadata = function(file, metadata) { if (!('metadata' in file)) file['metadata'] = metadata; else { for (var key in metadata) { file['metadata'][key] = metadata[key]; } } return file }; var inheritMetadata = function(o1, o2) 
{ var commonMetadata = {}; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) } } return o1; }; - |- var setMetadata = function(file, metadata) { if (!('metadata' in file)) { file['metadata'] = {} } for (var key in metadata) { file['metadata'][key] = metadata[key]; } return file }; var inheritMetadata = function(o1, o2) { var commonMetadata = {}; if (!o2) { return o1; }; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } for (var key in commonMetadata) { if (!(key in example)) { delete commonMetadata[key] } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) if (o1.secondaryFiles) { o1.secondaryFiles = inheritMetadata(o1.secondaryFiles, o2) } } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) if (o1[i].secondaryFiles) { o1[i].secondaryFiles = inheritMetadata(o1[i].secondaryFiles, o2) } } } return o1; }; 'sbg:projectName': iWES CWLtool validated pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:modifiedOn': 1638145237 'sbg:revisionNotes': null 'sbg:image_url': null 'sbg:toolkit': SnpSift 'sbg:toolkitVersion': 4.3t 'sbg:appVersion': - v1.2 'sbg:id': bristol-myers-squibb/iwes-cwltool-validated-pipelines/effect/0 'sbg:revision': 0 'sbg:revisionNotes': null 'sbg:modifiedOn': 1638145237 'sbg:modifiedBy': bristol-myers-squibb/jovana_babic 'sbg:createdOn': 1638145237 
'sbg:createdBy': bristol-myers-squibb/jovana_babic 'sbg:project': bristol-myers-squibb/iwes-cwltool-validated-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/jovana_babic 'sbg:latestRevision': 0 'sbg:publisher': sbg 'sbg:content_hash': ac134528fefee107d3ea2aca818d3d7f2037b40f349fd96143e5882ef5848832a label: SnpEff Annotation 'sbg:x': 201.558837890625 'sbg:y': -33.35270309448242 - id: ancestry_admixture_pipeline in: - id: vcf_file source: haplotypecaller_genotyping/output - id: resources_files_tar_archive source: ancestry_resources_files out: - id: admixture_proportions_with_pedigree_info run: class: CommandLineTool cwlVersion: v1.2 $namespaces: sbg: 'https://sevenbridges.com' id: >- bristol-myers-squibb/ancestry-iwes-integration-dev/ancestry-admixture-pipeline/2 baseCommand: [] inputs: - id: vcf_file type: File inputBinding: shellQuote: false position: 2 label: Input VCF doc: Input VCF to calculate samples admixture/ancestry estimates. 'sbg:fileTypes': VCF - id: resources_files_tar_archive type: File label: Resources files doc: >- TAR archive with the directory containing resources required to run the pipeline: 1. Autosomal_SNP_list_only_rs_v2.txt (2 columns (rsID\trsID) for the Ancestry Informative Markers - AIMs) 2. PAP.bed, PAP.bim, PAP.fam (genotypes of hypothetical/putative ancestral population) 3. snp151Commonhg19.bed OR snp151Commonhg19.bed (dbSNP151 table, hg19 or hg38, rsID mappings to chr, start, end, only SNPs) 4. template_merge.pop (template required for running ADMIXTURE program) 'sbg:fileTypes': TAR outputs: - id: admixture_proportions_with_pedigree_info doc: Admixture proportions with pedigree info report file. 
label: Admixture proportions with pedigree info type: File outputBinding: glob: '*.admixture.pedigree.txt' outputEval: '$(inheritMetadata(self, inputs.vcf_file))' 'sbg:fileTypes': ADMIXTURE.PEDIGREE.TXT label: Ancestry Admixture Pipeline arguments: - prefix: '' shellQuote: false position: 0 valueFrom: |- ${ var archive = [].concat(inputs.resources_files_tar_archive)[0].path return "tar -xf " + archive } - prefix: '' shellQuote: false position: 1 valueFrom: |- ${ return "&& admixture_analysis.sh" } - prefix: '' shellQuote: false position: 3 valueFrom: |- ${ var reference_dir = inputs.resources_files_tar_archive.metadata.reference_genome if (reference_dir == "GRCh37") { var reference_name = "GRCh37" } else if (reference_dir == "GRCh38") { var reference_name = "GRCh38" } return reference_name + " " + reference_dir } - prefix: '' shellQuote: false position: 4 valueFrom: |- ${ var sample_name = inputs.vcf_file.metadata.sample_id var reference = inputs.resources_files_tar_archive.metadata.reference_genome var rename = sample_name + "." 
+ reference + ".admixture.pedigree.txt" return "&& mv *.pedigree.info.txt " + rename } requirements: - class: ShellCommandRequirement - class: LoadListingRequirement - class: InlineJavascriptRequirement expressionLib: - |- var setMetadata = function(file, metadata) { if (!('metadata' in file)) { file['metadata'] = {} } for (var key in metadata) { file['metadata'][key] = metadata[key]; } return file }; var inheritMetadata = function(o1, o2) { var commonMetadata = {}; if (!o2) { return o1; }; if (!Array.isArray(o2)) { o2 = [o2] } for (var i = 0; i < o2.length; i++) { var example = o2[i]['metadata']; for (var key in example) { if (i == 0) commonMetadata[key] = example[key]; else { if (!(commonMetadata[key] == example[key])) { delete commonMetadata[key] } } } for (var key in commonMetadata) { if (!(key in example)) { delete commonMetadata[key] } } } if (!Array.isArray(o1)) { o1 = setMetadata(o1, commonMetadata) if (o1.secondaryFiles) { o1.secondaryFiles = inheritMetadata(o1.secondaryFiles, o2) } } else { for (var i = 0; i < o1.length; i++) { o1[i] = setMetadata(o1[i], commonMetadata) if (o1[i].secondaryFiles) { o1[i].secondaryFiles = inheritMetadata(o1[i].secondaryFiles, o2) } } } return o1; }; hints: - class: DockerRequirement dockerPull: >- bms-images.sbgenomics.com/bristol-myers-squibb/ancestry_admixture_analysis:master successCodes: - 0 - 7 'sbg:projectName': Ancestry iWES integration - Dev 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/ana_stankovic 'sbg:modifiedOn': 1641985283 'sbg:revisionNotes': >- Copy of bristol-myers-squibb/bms-rna-tools-cwl1-2-per-sample-workflow/ancestry-admixture-pipeline/12 - 'sbg:revision': 1 'sbg:modifiedBy': bristol-myers-squibb/ana_stankovic 'sbg:modifiedOn': 1642078829 'sbg:revisionNotes': 'Added labels, file types and descriptions' - 'sbg:revision': 2 'sbg:modifiedBy': bristol-myers-squibb/ana_stankovic 'sbg:modifiedOn': 1642079285 'sbg:revisionNotes': App rename 'sbg:image_url': null 
'sbg:appVersion': - v1.2 'sbg:id': >- bristol-myers-squibb/ancestry-iwes-integration-dev/ancestry-admixture-pipeline/2 'sbg:revision': 2 'sbg:revisionNotes': App rename 'sbg:modifiedOn': 1642079285 'sbg:modifiedBy': bristol-myers-squibb/ana_stankovic 'sbg:createdOn': 1641985283 'sbg:createdBy': bristol-myers-squibb/ana_stankovic 'sbg:project': bristol-myers-squibb/ancestry-iwes-integration-dev 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/ana_stankovic 'sbg:latestRevision': 2 'sbg:publisher': sbg 'sbg:content_hash': af79fed6510469f1632bac893303c6fe37084c34333ebe4c5f5dcc52d6a2ec3c7 'sbg:workflowLanguage': CWL label: Ancestry Admixture Pipeline 'sbg:x': 202.61781311035156 'sbg:y': 134.41197204589844 when: '$(inputs.resources_files_tar_archive? true : false)' hints: - class: 'sbg:AWSInstanceType' value: c4.8xlarge;ebs-gp2;1200 requirements: - class: ScatterFeatureRequirement - class: InlineJavascriptRequirement - class: StepInputExpressionRequirement 'sbg:projectName': Integrated WES-WGS Production Ready Pipelines 'sbg:revisionsInfo': - 'sbg:revision': 0 'sbg:modifiedBy': bristol-myers-squibb/luka.topalovic 'sbg:modifiedOn': 1622761292 'sbg:revisionNotes': null - 'sbg:revision': 1 'sbg:modifiedBy': bristol-myers-squibb/luka.topalovic 'sbg:modifiedOn': 1622761292 'sbg:revisionNotes': Initial - 'sbg:revision': 2 'sbg:modifiedBy': bristol-myers-squibb/luka.topalovic 'sbg:modifiedOn': 1622761375 'sbg:revisionNotes': Initial - 'sbg:revision': 3 'sbg:modifiedBy': bristol-myers-squibb/pavle.marinkovic 'sbg:modifiedOn': 1625754356 'sbg:revisionNotes': removed VC Metrics - 'sbg:revision': 4 'sbg:modifiedBy': bristol-myers-squibb/luka_test_account 'sbg:modifiedOn': 1640359453 'sbg:revisionNotes': cwltool validated - 'sbg:revision': 5 'sbg:modifiedBy': bristol-myers-squibb/pavle.marinkovic 'sbg:modifiedOn': 1640706338 'sbg:revisionNotes': updated - 'sbg:revision': 6 'sbg:modifiedBy': bristol-myers-squibb/luka_test_account 
'sbg:modifiedOn': 1643896038 'sbg:revisionNotes': Added Ancestry pipeline tool 'sbg:image_url': >- https://bms.sbgenomics.com/ns/brood/images/bristol-myers-squibb/integrated-wes-wgs-production-ready-pipelines/germline-calling/6.png 'sbg:toolAuthor': Luka Topalovic 'sbg:wrapperAuthor': Luka Topalovic 'sbg:categories': - Variant Calling - DNA - WES (WXS) 'sbg:appVersion': - v1.2 'sbg:id': >- bristol-myers-squibb/integrated-wes-wgs-production-ready-pipelines/germline-calling/6 'sbg:revision': 6 'sbg:revisionNotes': Added Ancestry pipeline tool 'sbg:modifiedOn': 1643896038 'sbg:modifiedBy': bristol-myers-squibb/luka_test_account 'sbg:createdOn': 1622761292 'sbg:createdBy': bristol-myers-squibb/luka.topalovic 'sbg:project': bristol-myers-squibb/integrated-wes-wgs-production-ready-pipelines 'sbg:sbgMaintained': false 'sbg:validationErrors': [] 'sbg:contributors': - bristol-myers-squibb/luka.topalovic - bristol-myers-squibb/luka_test_account - bristol-myers-squibb/pavle.marinkovic 'sbg:latestRevision': 6 'sbg:publisher': sbg 'sbg:content_hash': a321bc37915c5041deb439acc318bc922c15d7e0a411faa0f075bb02de835f8bc 'sbg:workflowLanguage': CWL