What information does Tool 2 need?
You need to provide Tool 2 with two pieces of information.
First, the custom sequences, which Tool 2 will convert to a database. These must be saved in a multi-fasta file; the format of this file is important. Tool 2 looks for this file in the input_custom_seqs folder.
Secondly, the genomes you wish to query, which must be saved in the input_genomes folder. Tool 2 screens all genomes in this folder. The LIBRARY folder contains the genome assemblies for all the sequenced strains.
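Because the format of the multi-fasta file matters, it can help to check the file before a run. The following is a minimal sketch, not part of Tool 2: the function name `check_multifasta` is hypothetical, and the rules it applies (every record starts with a `>` header, identifiers are unique, sequences use IUPAC DNA letters) are general FASTA conventions assumed here rather than Tool 2's documented requirements.

```python
# Hypothetical pre-flight check for a multi-FASTA file of custom sequences.
# The rules below are generic FASTA conventions, assumed for illustration.

IUPAC_DNA = set("ACGTNRYKMSWBDHV")


def check_multifasta(text):
    """Return a list of problems found in multi-FASTA text (empty list = OK)."""
    problems = []
    seen_ids = set()
    current_id = None   # identifier of the record currently being read
    current_len = 0     # sequence characters seen so far for that record

    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            # A new header closes the previous record; it must have had sequence.
            if current_id is not None and current_len == 0:
                problems.append(f"record '{current_id}' has no sequence")
            header = line[1:].split()
            current_id = header[0] if header else ""
            current_len = 0
            if not current_id:
                problems.append(f"line {number}: header has no identifier")
            elif current_id in seen_ids:
                problems.append(f"line {number}: duplicate identifier '{current_id}'")
            seen_ids.add(current_id)
        elif current_id is None:
            problems.append(f"line {number}: sequence data before the first '>' header")
        else:
            bad = set(line.upper()) - IUPAC_DNA
            if bad:
                problems.append(f"line {number}: unexpected characters {sorted(bad)}")
            current_len += len(line)

    if current_id is None:
        problems.append("no '>' header lines found")
    elif current_len == 0:
        problems.append(f"record '{current_id}' has no sequence")
    return problems


if __name__ == "__main__":
    example = ">geneA example description\nACGTACGT\n>geneB\nNNNNACGT\n"
    print(check_multifasta(example) or "no problems found")
```

To check a real file, read it first, e.g. `check_multifasta(open("input_custom_seqs/my_seqs.fa").read())`, where `my_seqs.fa` is a placeholder file name.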
How do you run Tool 2?
Once you have accessed Tool 2 and provided the two required inputs, you can set up a stable connection (optional, but recommended), enter a single command and the results will be deposited in the output folder.
What are the different folders?
The Tool 2 file system contains the following folders:
Directory name | Description of contents |
---|---|
input_custom_seqs/ | The multi-fasta files containing custom sequences (*.fa). |
input_genomes/ | Tool 2 will screen all genomes in this folder (*.fa). |
LIBRARY/ | The genome assembly files (*.fa) for all sequenced isolates. |
output_tool_2/ | Your output data will be deposited here in subdirectories named xxx_i(yy)_c(zz)_YYYY-MM-DD_HH:MM, where xxx is the database, yy is the identity and zz is the coverage. This is followed by the date and time the run was started. |
work/ | The Nextflow working directory. |
What documents are made by Tool 2?
Each time Tool 2 runs it makes an output folder named according to the run parameters, date and time. In the output folder Tool 2 deposits the documents in two sub-folders: isolate and summary.
In the isolate folder there will be a results file for each genome assembly.
In the summary folder there will be a single document summarising all ‘hits’.
Tool 2 also reports information about the run. In the run output folder you will find:
run_parameters.txt
This text document lists:
- the name of the database used (xxx),
- the minimum DNA %identity (yy) and
- the minimum DNA %coverage (zz).
Ignore the ‘help=false’ entry in this document; it is an expected byproduct of the help/usage message.
abricate_version.txt
- a text document which reports the version of abricate used for the run
<xxx>_sequences.fa
- the custom sequences file used for the run.
How do you view the output files?
You can open the output files in a spreadsheet, e.g. MS Excel. The output files are in .tab format.
The run_parameters.txt document can be opened in any text editor.
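If you would rather inspect the results with a script than a spreadsheet, the sketch below reads a table into a list of rows. It assumes the .tab files are tab-separated text with a single header row; the function name `read_tab` and the column names in the sample (`GENE`, `%IDENTITY`, `%COVERAGE`) are illustrative assumptions, not a description of Tool 2's actual columns.

```python
import csv
import io


def read_tab(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [dict(row) for row in reader]


if __name__ == "__main__":
    # Made-up sample standing in for one results file.
    sample = io.StringIO(
        "GENE\t%IDENTITY\t%COVERAGE\n"
        "geneA\t99.50\t100.00\n"
        "geneB\t81.20\t64.30\n"
    )
    for row in read_tab(sample):
        print(row["GENE"], row["%IDENTITY"], row["%COVERAGE"])
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `read_tab`. All values are returned as strings, so convert numeric columns with `float()` before comparing them.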
Can I change the settings of Tool 2?
Yes, you can modify two settings: identity and coverage. With the default settings, a match is recorded only if it has a DNA identity of at least 75% and a DNA coverage of at least 0%. These can be altered when you set Tool 2 to run.
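The effect of the two thresholds can be sketched as a simple filter. This is an illustration of the rule described above, not Tool 2's own code: the function name `passes_thresholds` is hypothetical, the defaults are the 75% and 0% values quoted in the text, and "minimum" is assumed to be inclusive.

```python
def passes_thresholds(identity, coverage, min_identity=75.0, min_coverage=0.0):
    """Return True if a hit meets both minimum thresholds (percentages)."""
    return identity >= min_identity and coverage >= min_coverage


if __name__ == "__main__":
    # With the defaults, only identity can exclude a hit.
    print(passes_thresholds(identity=82.0, coverage=40.0))
    # Stricter settings drop the same hit because coverage is below 80%.
    print(passes_thresholds(82.0, 40.0, min_identity=80.0, min_coverage=80.0))
```

A coverage threshold of 0% accepts partial matches of any length, so raising it is the usual way to exclude short fragments.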
For all runs, the settings are recorded in the name of the newly created output folder, e.g. the results of a run started at 1.15 pm on May 21st 2021 using a database named 12P, with identity and coverage both set to 80%, will be stored in this folder: 12P-i80-c80__2021-05-21_13:15/.
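Since the run settings are encoded in the folder name, they can be recovered from it with a script. The sketch below is an assumption-laden illustration: the function name `parse_run_dir` is hypothetical, and because the naming pattern and the example use different separators, the regular expression accepts hyphens or underscores between fields and optional parentheses around the numbers.

```python
import re
from datetime import datetime

# Tolerant pattern: <database> i<identity> c<coverage> <date>_<time>,
# with '-' or '_' separators (assumed; check against your own folder names).
RUN_DIR = re.compile(
    r"^(?P<database>.+?)[-_]+i\(?(?P<identity>\d+(?:\.\d+)?)\)?"
    r"[-_]+c\(?(?P<coverage>\d+(?:\.\d+)?)\)?"
    r"[-_]+(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}:\d{2})/?$"
)


def parse_run_dir(name):
    """Split a run folder name into its settings, or return None if it does not match."""
    match = RUN_DIR.match(name)
    if match is None:
        return None
    started = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H:%M"
    )
    return {
        "database": match["database"],
        "identity": float(match["identity"]),
        "coverage": float(match["coverage"]),
        "started": started,
    }


if __name__ == "__main__":
    print(parse_run_dir("12P-i80-c80__2021-05-21_13:15/"))
```

Returning `None` for names that do not match makes it easy to skip unrelated folders when looping over the output directory.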
Is there a limit to how many genome assemblies Tool 2 can screen?
The smallest number of genome assemblies Tool 2 can screen is one. There is no upper limit, although practically speaking the upper limit will be defined by the size of the server.
Tool 2 will screen 100 bacterial genome assemblies in ~10 minutes.
Where can I find further information about Tool 2?
Tool 2 uses a software tool named ABRicate. It was developed by Torsten Seemann and Björn Grüning in Melbourne, Australia. The ABRicate information page can be found here on GitHub.
You can find further information about the BLAST tool makeblastdb here. makeblastdb produces BLAST databases from FASTA files.
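To give a feel for what makeblastdb does with a FASTA file, the sketch below builds a typical command for a nucleotide database. How Tool 2 itself calls makeblastdb is not described here, so this is only an illustration: the function name `makeblastdb_command` and the file names are placeholders, while `-in`, `-dbtype` and `-out` are standard makeblastdb options.

```python
def makeblastdb_command(fasta_path, db_name):
    """Build the argument list for creating a nucleotide BLAST database."""
    return [
        "makeblastdb",
        "-in", str(fasta_path),   # input FASTA file
        "-dbtype", "nucl",        # DNA sequences (assumed, as identity and coverage are DNA-based)
        "-out", str(db_name),     # base name for the database files
    ]


if __name__ == "__main__":
    command = makeblastdb_command("input_custom_seqs/my_seqs.fa", "my_seqs")
    print(" ".join(command))
    # To actually run it (requires BLAST+ to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids quoting problems if a path contains spaces.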