Help
Some features of the Viral Minion DB
The Viral Minion DB is easy to use and lets you search for models that meet a set of criteria. Before we start, let's get familiar with some aspects of the database and the models:
- All models come with a recommended cutoff score, inserted as a tag (CUTOFF SCORE) in the header section. We strongly recommend running your similarity search with the suggested value. In the hmmsearch program, you can set it with the parameter -T <cutoff score> (threshold).
- For automated searches with multiple models, we offer HMM-Prospector (link: https://github.com/gruberlab/hmmprospector), a companion tool to the database. HMM-Prospector is a publicly available program that uses a file containing one or more profile HMMs as a query in similarity searches against a FASTQ/FASTA dataset, using the hmmsearch program. It processes the results and generates tabular files with qualitative and quantitative results. For each search, the score cutoff is taken automatically from the value assigned in the corresponding profile HMM. The program is fully documented.
- There are two types of profile HMMs, distinguished by the length of the alignment region used to build the model. Models built from multiple sequence alignments (MSAs) covering entire protein sequences are called "full-length" models. Models built from short alignment regions, typically 20 to 60 bp long, are called "short" models. Initially, all models were short, which is why they were named "Minions". The repository now also includes full-length models.
- The entire dataset is composed of profile HMMs constructed from prokaryotic and eukaryotic viruses. You can restrict a search to either one of these groups or use both.
- Unlike other repositories of profile HMMs, we did not use orthology methods to select the sequences to be used in model construction. Here, all models were constructed from sequences selected from the NCBI's Identical Protein Groups (IPG) (link: https://www.ncbi.nlm.nih.gov/ipg) database using queries based on taxonomic names or IDs.
- Annotation terms that designate the proteins have been added to the database and can be used in queries.
- Models are designed for the detection of "wide" and "narrow" taxonomic groups. The term wide refers to a viral family, while the term narrow refers to a sublevel of this taxon, i.e., a subfamily or genus. For example, a model able to detect any member of the Peribunyaviridae family, including the genera Herbevirus and Orthobunyavirus, is considered a "wide" detection model. Models that specifically detect one or the other of these genera are considered "narrow" detection models.
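If you run hmmsearch yourself rather than through HMM-Prospector, reading the recommended cutoff from the model header and passing it to -T can be scripted. The following is a minimal Python sketch, not part of the database or of HMM-Prospector. It assumes the tag appears on a header line such as "CUTOFF SCORE  25.0" or "CUTOFF SCORE: 25.0" (the exact layout is an assumption, so adjust the pattern to the files you download). The file names and the helper names read_cutoff and hmmsearch_command are hypothetical.

```python
import re
import shlex

# Assumption: the cutoff sits on a header line like "CUTOFF SCORE  25.0"
# or "CUTOFF SCORE: 25.0". Adjust the pattern if the real files differ.
CUTOFF_RE = re.compile(
    r"^CUTOFF SCORE\s*[:=]?\s*([-+]?\d+(?:\.\d+)?)", re.IGNORECASE
)


def read_cutoff(hmm_text):
    """Return the cutoff score found in the header of one profile HMM, or None."""
    for line in hmm_text.splitlines():
        # In HMMER3 files the header ends at the line that starts with "HMM ".
        if line.startswith("HMM "):
            break
        match = CUTOFF_RE.match(line.strip())
        if match:
            return float(match.group(1))
    return None


def hmmsearch_command(hmm_path, fasta_path, cutoff, table_path="hits.tbl"):
    """Build an hmmsearch command line that applies the score threshold with -T."""
    args = ["hmmsearch", "-T", str(cutoff), "--tblout", table_path,
            hmm_path, fasta_path]
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical header used only to illustrate the parsing step.
example_header = """HMMER3/f [3.1b2 | February 2015]
NAME  example_model
LENG  42
CUTOFF SCORE  25.0
HMM          A        C
"""

cutoff = read_cutoff(example_header)
if cutoff is not None:
    print(hmmsearch_command("example_model.hmm", "proteins.fasta", cutoff))
```

The sketch only prints the command, so you can inspect it before running it. For a file containing several models with different cutoffs, each model needs its own search, which is the case HMM-Prospector automates.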
Making your searches
To select the profile HMMs, go to the "Search HMMs" tab and select your preferred choices.
- Insert a taxonomic name or taxon ID. This is MANDATORY: a query that leaves this field empty will return no results. If you want, for instance, all models in the database, you can use the term "Viruses". All other parameters are optional.
- You can define a query term using a protein name.
- You can also define an exclusion term to be used as a negative term in the query.
- The number of sequences used to build each profile HMM depends on how many are available in the IPG database. You can restrict the search to models constructed from at least a given number of sequences.
- Model type: full-length or short - see item 3 of the previous section.
- Minimum and maximum length - see item 3 of the previous section. This option only applies to short models. If you choose model type "full-length" or "all", these options will not be available.
- Taxonomic range of models: wide or narrow - see item 7 of the previous section.
- Host: eukaryotic or prokaryotic - see item 4 of the previous section.
Example: let's find short profile HMMs constructed from at least 5 sequences of the RNA-dependent RNA polymerase, with a narrow range to detect viral sequences of the genus Orthobunyavirus, and with a total length between 25 and 50 bp:
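Use the following options, in the same order as the fields listed above:
- Taxonomic name or taxon ID: Orthobunyavirus
- Protein name: RNA polymerase
- Exclusion term: <blank>
- Minimum number of sequences: 5
- Model type: short
- Minimum length: 25
- Maximum length: 50
- Taxonomic range: narrow
- Host: eukaryotic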
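To make explicit how these criteria combine, the example query can be written as a filter over model records. This is only an illustrative Python sketch: the ModelRecord fields, the matches function, and the sample records are hypothetical and do not reflect how the database is implemented. It assumes that text terms are matched case-insensitively as substrings and that all criteria must hold at once.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    # Hypothetical description of one profile HMM in the database.
    name: str
    lineage: str      # e.g. "Viruses; Peribunyaviridae; Orthobunyavirus"
    protein: str      # annotation term
    n_sequences: int  # sequences used to build the model
    model_type: str   # "short" or "full-length"
    length: int
    tax_range: str    # "wide" or "narrow"
    host: str         # "eukaryotic" or "prokaryotic"


def matches(rec, taxon, protein=None, exclude=None, min_sequences=0,
            model_type=None, min_length=None, max_length=None,
            tax_range=None, host=None):
    """Return True if a record satisfies every criterion that was filled in."""
    # The taxon is mandatory: an empty value matches nothing.
    if not taxon or taxon.lower() not in rec.lineage.lower():
        return False
    if protein and protein.lower() not in rec.protein.lower():
        return False
    if exclude and exclude.lower() in rec.protein.lower():
        return False
    if rec.n_sequences < min_sequences:
        return False
    if model_type and rec.model_type != model_type:
        return False
    # Length limits only apply when searching for short models.
    if model_type == "short":
        if min_length is not None and rec.length < min_length:
            return False
        if max_length is not None and rec.length > max_length:
            return False
    if tax_range and rec.tax_range != tax_range:
        return False
    if host and rec.host != host:
        return False
    return True


# Made-up records, for illustration only.
records = [
    ModelRecord("model_A", "Viruses; Peribunyaviridae; Orthobunyavirus",
                "RNA-dependent RNA polymerase", 12, "short", 40, "narrow", "eukaryotic"),
    ModelRecord("model_B", "Viruses; Peribunyaviridae; Orthobunyavirus",
                "RNA-dependent RNA polymerase", 12, "short", 58, "narrow", "eukaryotic"),
    ModelRecord("model_C", "Viruses; Peribunyaviridae",
                "RNA-dependent RNA polymerase", 30, "short", 30, "wide", "eukaryotic"),
    ModelRecord("model_D", "Viruses; Peribunyaviridae; Orthobunyavirus",
                "nucleoprotein", 8, "short", 35, "narrow", "eukaryotic"),
]

selected = [
    rec.name for rec in records
    if matches(rec, "Orthobunyavirus", protein="RNA polymerase", min_sequences=5,
               model_type="short", min_length=25, max_length=50,
               tax_range="narrow", host="eukaryotic")
]
print(selected)
```

Only model_A passes: model_B is longer than 50, model_C is a family-level (wide) model whose lineage does not include the genus, and model_D is annotated with a different protein.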
Download the models
After you run a search, if any models meet your criteria, a download link is shown at the bottom of the search page. Additional links allow you to download all models from eukaryotic viruses, all models from prokaryotic viruses, or the models for both types of hosts.