Workflow: Bacterial Annotation, pass 4, blastp-based functional annotation (second pass)

Fetched 2023-01-14 18:17:01 GMT

Workflow graph

Data flow between steps (source -> target, with the target parameter in parentheses):
  • Find_Naming_Protein_Hits -> Find_best_protein_hits (input)
  • Find_best_protein_hits -> Assign_Clusters_to_Proteins_sort (input)
  • Assign_Clusters_to_Proteins_sort -> Assign_Clusters_to_Proteins (hits)
  • Assign_Clusters_to_Proteins -> Prepare_SPARCLBL_input (other_assignments)
  • Assign_Clusters_to_Proteins -> Add_Names_to_Proteins (proteins)
  • Prepare_SPARCLBL_input -> Assign_SPARCL_Architecture_Names_to_Proteins_gp_fetch_sequences (input)
  • Prepare_SPARCLBL_input -> Assign_SPARCL_Architecture_Names_to_Proteins_sparclbl (p)
  • Assign_SPARCL_Architecture_Names_to_Proteins_gp_fetch_sequences -> Assign_SPARCL_Architecture_Names_to_Proteins_asn2fasta (i)
  • Assign_SPARCL_Architecture_Names_to_Proteins_asn2fasta -> Assign_SPARCL_Architecture_Names_to_Proteins_sparclbl (s)
  • Assign_SPARCL_Architecture_Names_to_Proteins_sparclbl -> Add_Names_to_Proteins (proteins)
  • Add_Names_to_Proteins -> Bacterial_Annot_Filter (input)
  • Bacterial_Annot_Filter -> out_annotation

Workflow inputs consumed by each step (input -> step parameter):
  • Find_Naming_Protein_Hits: proteins -> proteins, genus_list -> genus_list, sequence_cache -> asn_cache, uniColl_cache -> asn_cache, lds2 -> lds2, Good_AntiFam_filtered_proteins_gilist -> ids, scatter_gather_nchunks -> scatter_gather_nchunks, blast_rules_db_dir -> blastdb_dir, identification_db_dir -> blastdb_dir, blast_hits_cache -> blast_hits_cache, taxid -> taxid, taxon_db -> taxon_db
  • Find_best_protein_hits: sequence_cache -> asn_cache, uniColl_cache -> asn_cache
  • Assign_Clusters_to_Proteins: proteins -> proteins, sequence_cache -> asn_cache, uniColl_cache -> asn_cache, lds2 -> lds2, naming_blast_db -> namedb_dir, naming_sqlite -> unicoll_sqlite
  • Prepare_SPARCLBL_input: wp_assignments -> other_assignments, hmm_assignments -> other_assignments, Extract_Model_Proteins_prot_ids -> input, naming_sqlite -> unicoll_sqlite
  • Assign_SPARCL_Architecture_Names_to_Proteins_gp_fetch_sequences: proteins -> proteins, lds2 -> lds2
  • Assign_SPARCL_Architecture_Names_to_Proteins_sparclbl: CDDdata -> b, CDDdata2 -> d
  • Add_Names_to_Proteins: annotation -> input, wp_assignments -> proteins, hmm_assignments -> proteins, sequence_cache -> asn_cache, naming_sqlite -> unicoll_sqlite, defline_cleanup_rules -> defline_cleanup_rules
  • Bacterial_Annot_Filter: thresholds -> thr, sequence_cache -> asn_cache
  • blast_rules_db: no connection shown in the graph

Default values by step:
  • Find_Naming_Protein_Hits: max_target_seqs = 50, evalue = 1e-01, word_size = 6, threshold = 21, align_filter = default5 (value not shown in the graph), affinity = "subject", soft_masking = "yes", compart = true, no_merge = false, comp_based_stats = "F", ofmt = "asn-binary", dbsize = "6000000000", max_batch_length = 10000, seg = "30 2.2 2.5", top_by_score = 10, delay = 0, max_jobs = 1, nogenbank = true, blastdb = "blastdb", blastdb = "blast_rules_db", blast_type = "predicted-protein", allow_intersection = true
  • Find_best_protein_hits: nogenbank = true, filter = default54 (value not shown in the graph), ifmt = "seq-align-set"
  • Assign_Clusters_to_Proteins_sort: limit_mem = "13G", nogenbank = true, k = "query,subject,-score,-num_ident,query_align_len,subject_align_len,query_start,subject_start", ifmt = "seq-align-set"
  • Assign_Clusters_to_Proteins: nogenbank = true, task = "blastp", threshold = 21, margin = 5e-02, seg = "no", comp_based_stats = "F", cutoff = 5e-01, namedb = "blastdb", word_size = 6, sure_cutoff = 1.5e-01, hfmt = "seq-align"
  • Assign_SPARCL_Architecture_Names_to_Proteins_asn2fasta: fasta_name = "proteins.fa", prots_only = true, type = "seq-entry", serial = "binary"
  • Assign_SPARCL_Architecture_Names_to_Proteins_sparclbl: m = 20, n = 500, x = 1
  • Add_Names_to_Proteins: submission_mode_genbank = true, nogenbank = true
  • Bacterial_Annot_Filter: long_model_limit = 1000000, max_overlap = 120, abs_short_model_limit = 60, max_unannotated_region = 5000, nogebank = true, short_model_limit = 180

Inputs

ID                                     Type             Title  Doc
lds2                                   File
taxid                                  Integer
CDDdata                                Directory
CDDdata2                               Directory
proteins                               File
taxon_db                               File
annotation                             File
genus_list                             Integer[]
thresholds                             File
naming_sqlite                          File
uniColl_cache                          Directory
blast_rules_db                         String
sequence_cache                         Directory
wp_assignments                         File
hmm_assignments                        File
naming_blast_db                        Directory
blast_hits_cache                       File (Optional)
blast_rules_db_dir                     Directory
defline_cleanup_rules                  File
identification_db_dir                  Directory
scatter_gather_nchunks                 String
Extract_Model_Proteins_prot_ids        File
Good_AntiFam_filtered_proteins_gilist  File
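
A job object for this workflow needs one entry per input listed above. The Python sketch below checks a job object against that list before submission. It assumes the standard CWL job-object representation, in which File and Directory values are objects carrying a `class` field. The names `WORKFLOW_INPUTS`, `check_job`, and `example_job`, and all placeholder paths and values, are illustrative and not part of the workflow itself. The `File?` spelling is CWL shorthand for the optional `blast_hits_cache` input.

```python
# Input IDs and types copied from the Inputs table; "File?" marks the optional input.
WORKFLOW_INPUTS = {
    "lds2": "File",
    "taxid": "Integer",
    "CDDdata": "Directory",
    "CDDdata2": "Directory",
    "proteins": "File",
    "taxon_db": "File",
    "annotation": "File",
    "genus_list": "Integer[]",
    "thresholds": "File",
    "naming_sqlite": "File",
    "uniColl_cache": "Directory",
    "blast_rules_db": "String",
    "sequence_cache": "Directory",
    "wp_assignments": "File",
    "hmm_assignments": "File",
    "naming_blast_db": "Directory",
    "blast_hits_cache": "File?",
    "blast_rules_db_dir": "Directory",
    "defline_cleanup_rules": "File",
    "identification_db_dir": "Directory",
    "scatter_gather_nchunks": "String",
    "Extract_Model_Proteins_prot_ids": "File",
    "Good_AntiFam_filtered_proteins_gilist": "File",
}


def _is_int(value):
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _matches(value, cwl_type):
    """Return True if value has the shape expected for cwl_type."""
    if cwl_type in ("File", "Directory"):
        return isinstance(value, dict) and value.get("class") == cwl_type
    if cwl_type == "Integer":
        return _is_int(value)
    if cwl_type == "Integer[]":
        return isinstance(value, list) and all(_is_int(v) for v in value)
    if cwl_type == "String":
        return isinstance(value, str)
    return False


def check_job(job):
    """Return a list of problems found in a job object (empty if none)."""
    problems = []
    for name, cwl_type in WORKFLOW_INPUTS.items():
        optional = cwl_type.endswith("?")
        base = cwl_type.rstrip("?")
        value = job.get(name)
        if value is None:
            if not optional:
                problems.append(f"missing required input: {name}")
        elif not _matches(value, base):
            problems.append(f"{name}: expected {base}")
    for name in job:
        if name not in WORKFLOW_INPUTS:
            problems.append(f"unknown input: {name}")
    return problems


def example_job():
    """Build a job object filled with placeholder (hypothetical) values."""
    job = {}
    for name, cwl_type in WORKFLOW_INPUTS.items():
        base = cwl_type.rstrip("?")
        if base in ("File", "Directory"):
            job[name] = {"class": base, "path": f"/data/{name}"}
        elif base == "Integer":
            job[name] = 1
        elif base == "Integer[]":
            job[name] = [1]
        else:
            job[name] = "1"
    return job
```

Calling `check_job(example_job())` returns an empty list. Removing a required key such as `taxid` yields a "missing required input" entry, while omitting `blast_hits_cache` is accepted because that input is optional.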

Steps

ID                                                               Runs                                                    Label                    Doc
Add_Names_to_Proteins                                            ../progs/add_prot_names_to_annot.cwl (CommandLineTool)  add_prot_names_to_annot
Bacterial_Annot_Filter                                           ../progs/bact_annot_filter.cwl (CommandLineTool)        bact_annot_filter
Find_best_protein_hits                                           ../progs/align_filter.cwl (CommandLineTool)             align_filter
Prepare_SPARCLBL_input                                           ../progs/prepare_sparclbl_input.cwl (CommandLineTool)   prepare_sparclbl_input
Find_Naming_Protein_Hits                                                                                                 blastp_wnode_naming
Assign_Clusters_to_Proteins                                      ../progs/assign_cluster.cwl (CommandLineTool)           assign_cluster
Assign_Clusters_to_Proteins_sort                                 ../progs/align_sort.cwl (CommandLineTool)               align_sort
Assign_SPARCL_Architecture_Names_to_Proteins_sparclbl            ../progs/sparclbl.cwl (CommandLineTool)                 sparclbl
Assign_SPARCL_Architecture_Names_to_Proteins_asn2fasta           ../progs/asn2fasta.cwl (CommandLineTool)                asn2fasta
Assign_SPARCL_Architecture_Names_to_Proteins_gp_fetch_sequences  ../progs/gp_fetch_sequences.cwl (CommandLineTool)       gp_fetch_sequences
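
The step-to-step connections in the workflow graph determine the order in which a CWL runner can execute these steps. As a rough illustration, the Python sketch below encodes those connections (each step mapped to the steps whose output it consumes) and derives an execution order with the standard-library `graphlib` module (Python 3.9 or later). The names `STEP_DEPENDENCIES` and `execution_order` are illustrative; a real CWL runner performs its own scheduling.

```python
from graphlib import TopologicalSorter

SPARCL = "Assign_SPARCL_Architecture_Names_to_Proteins"

# Each step maps to the set of steps it depends on, as read from the graph.
STEP_DEPENDENCIES = {
    "Find_Naming_Protein_Hits": set(),
    "Find_best_protein_hits": {"Find_Naming_Protein_Hits"},
    "Assign_Clusters_to_Proteins_sort": {"Find_best_protein_hits"},
    "Assign_Clusters_to_Proteins": {"Assign_Clusters_to_Proteins_sort"},
    "Prepare_SPARCLBL_input": {"Assign_Clusters_to_Proteins"},
    f"{SPARCL}_gp_fetch_sequences": {"Prepare_SPARCLBL_input"},
    f"{SPARCL}_asn2fasta": {f"{SPARCL}_gp_fetch_sequences"},
    f"{SPARCL}_sparclbl": {"Prepare_SPARCLBL_input", f"{SPARCL}_asn2fasta"},
    "Add_Names_to_Proteins": {"Assign_Clusters_to_Proteins", f"{SPARCL}_sparclbl"},
    "Bacterial_Annot_Filter": {"Add_Names_to_Proteins"},
}


def execution_order(dependencies):
    """Return the steps in an order where every step follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(STEP_DEPENDENCIES), start=1):
        print(position, step)
```

Because every step here has a direct predecessor in the chain, only one valid order exists: it starts at Find_Naming_Protein_Hits and ends at Bacterial_Annot_Filter, which produces out_annotation.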

Outputs

ID Type Label Doc
out_annotation File
Permalink: https://w3id.org/cwl/view/git/5ec226c941562124032ca6861bc8d1aeabf9d91a/bacterial_annot/wf_bacterial_annot_pass4.cwl