Updating the reference libraries from RefSeq
Core and Tool 4 require reference libraries downloaded from RefSeq. The Python scripts automate the download and save each new library in the `reference_libraries` folder. After building a library, you need to update the Tools' config files so that they use it.
Making a new library
Reference libraries can be made as follows:
- Navigate to the `_maintenance_` folder by entering `cd ~/genome_tools/_maintenance_`.
- Run the appropriate script using the command in the table below.
- Check that the message `Reference library complete` appears.
The commands are:
Reference Library | Command |
---|---|
maxOne | python3 maxOne_phylogeny.py |
maxThree | python3 maxThree_taxonomy.py |
maxThree (for Tool 4) | python3 maxThree_tool4.py |
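If you rebuild libraries often, the steps above can be scripted. The sketch below is a hypothetical helper, not part of genome_tools: it runs one of the build scripts from the table inside the `_maintenance_` folder and checks for the completion message. It assumes the scripts print `Reference library complete` to standard output or standard error and exit with status 0 on success; `SCRIPTS`, `build_succeeded` and `build_library` are illustrative names.

```python
import subprocess
from pathlib import Path

# Folder and message taken from the steps above; adjust if your install differs.
MAINTENANCE_DIR = Path.home() / "genome_tools" / "_maintenance_"
SUCCESS_MESSAGE = "Reference library complete"

# Library name -> build script, as listed in the table.
SCRIPTS = {
    "maxOne": "maxOne_phylogeny.py",
    "maxThree": "maxThree_taxonomy.py",
    "maxThree (for Tool 4)": "maxThree_tool4.py",
}


def build_succeeded(output: str) -> bool:
    """Return True if the script's output contains the completion message."""
    return SUCCESS_MESSAGE in output


def build_library(name: str) -> bool:
    """Run the build script for one library from the maintenance folder."""
    result = subprocess.run(
        ["python3", SCRIPTS[name]],
        cwd=MAINTENANCE_DIR,
        capture_output=True,
        text=True,
    )
    # Assumption: success means exit status 0 plus the message on stdout or stderr.
    return result.returncode == 0 and build_succeeded(result.stdout + result.stderr)
```

For example, `build_library("maxOne")` returns `True` only if `maxOne_phylogeny.py` exits cleanly and prints the message.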
Reference libraries are automatically saved here:
~/genome_tools/reference_libraries
Each new library is saved in a folder whose name ends with the date it was generated, in YYYY-MM-DD format (for example, `fasta_2022-04-29`).
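Because each library folder ends with its build date, the newest one can be found by parsing that date. A minimal sketch, assuming folder names follow the `<prefix>YYYY-MM-DD` pattern used in this page's config examples (`fasta_`, `fna_`, `fna_ref_lists_`, `tool4_fasta_`); `newest_dated_name` and `latest_library` are hypothetical helpers.

```python
from datetime import date
from pathlib import Path
from typing import Iterable, Optional

# Default location from this page; change it if your install differs.
LIBRARY_ROOT = Path.home() / "genome_tools" / "reference_libraries"


def newest_dated_name(prefix: str, names: Iterable[str]) -> Optional[str]:
    """Return the name of the form <prefix>YYYY-MM-DD with the latest date, or None."""
    best = None
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            built = date.fromisoformat(name[len(prefix):])
        except ValueError:
            continue  # e.g. 'fna_ref_lists_...' when the prefix is 'fna_'
        if best is None or built > best[0]:
            best = (built, name)
    return best[1] if best else None


def latest_library(prefix: str, root: Path = LIBRARY_ROOT) -> Optional[Path]:
    """Return the newest library folder under `root` for the given prefix."""
    names = [p.name for p in root.iterdir() if p.is_dir()]
    newest = newest_dated_name(prefix, names)
    return root / newest if newest else None
```

For example, `latest_library("tool4_fasta_")` returns the most recent Tool 4 folder, or `None` if there is none. Parsing the date, rather than matching on the prefix alone, keeps `fna_ref_lists_...` folders from being mistaken for `fna_...` libraries.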
Linking the Tools to the new library
Update the relevant config files to link the Tools to the newly generated reference libraries by following these three steps:
- Locate the Tool's config file in the File Browser.
- Double-click on `nextflow.config` to edit it.
- Update the relevant lines (see below), then save and close the file.
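Editing in the File Browser works well for one-off updates. If you prefer to script the change, the following sketch rewrites the date on a single parameter's line of `nextflow.config`. It assumes each parameter is assigned on one line and that its path contains either a date in YYYY-MM-DD format or the literal placeholder `YYYY-MM-DD`; `set_library_date` and `update_config_file` are hypothetical helpers. Keep a backup of the config file before using it.

```python
import re
from pathlib import Path

# Matches a concrete date (e.g. 2022-04-29) or the literal placeholder YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}|YYYY-MM-DD")


def set_library_date(config_text: str, param: str, new_date: str) -> str:
    """Replace the date on the line assigning `param`; other lines are left unchanged."""
    # One assignment per line, e.g.  params.fastanidbpath = "/file_path/fna_.../*"
    line_pattern = re.compile(rf"^([ \t]*{re.escape(param)}[ \t]*=.*)$", re.MULTILINE)
    return line_pattern.sub(
        lambda m: DATE_PATTERN.sub(lambda _: new_date, m.group(1)), config_text
    )


def update_config_file(config_path: Path, param: str, new_date: str) -> None:
    """Apply set_library_date to a config file in place."""
    text = config_path.read_text()
    config_path.write_text(set_library_date(text, param, new_date))
```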
For the Core taxonomy module
The taxonomy module uses the maxThree database, which has two parts:
- `params.fastanidbpath`, the reference genome sequences (ending `.fasta`), and
- `params.fastanireflistpath`, the names of the reference genomes, grouped in files by genus.
Be sure to update both, replacing YYYY-MM-DD with the date of the new library:
params.fastanidbpath = "/file_path/fna_YYYY-MM-DD/*"
params.fastanireflistpath = "/file_path/fna_ref_lists_YYYY-MM-DD/*"
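Since both maxThree paths should be updated together, a quick consistency check can catch a half-finished edit. A minimal sketch, assuming each parameter is assigned on its own line as shown above and that its path contains a YYYY-MM-DD date; `taxonomy_dates` and `taxonomy_dates_match` are hypothetical helpers.

```python
import re

# Captures the parameter name and the first YYYY-MM-DD date inside its quoted path.
PARAM_DATE = re.compile(
    r'^[ \t]*(params\.fastanidbpath|params\.fastanireflistpath)[ \t]*=[ \t]*"[^"]*?(\d{4}-\d{2}-\d{2})',
    re.MULTILINE,
)


def taxonomy_dates(config_text: str) -> dict:
    """Map each maxThree parameter found in the config to the date in its path."""
    return dict(PARAM_DATE.findall(config_text))


def taxonomy_dates_match(config_text: str) -> bool:
    """True only if both maxThree parameters are set and use the same date."""
    dates = taxonomy_dates(config_text)
    return len(dates) == 2 and len(set(dates.values())) == 1
```

A parameter still holding the `YYYY-MM-DD` placeholder has no date to capture, so the check also fails until both lines are filled in.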
For the Core phylogeny module
The phylogeny module uses the maxOne database.
The location of the database is specified here:
params.lsbsrrefgenomefasta = "/file_path/fasta_YYYY-MM-DD/"
Genome files in this folder must end in `.fasta`.
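To confirm a folder meets the `.fasta` requirement before pointing the config at it, a sketch like the following lists any files with a different extension. The helper names are hypothetical; the check is case-sensitive, so `.FASTA` files and compressed files such as `.fasta.gz` are reported.

```python
from pathlib import Path
from typing import List


def non_fasta_names(names: List[str]) -> List[str]:
    """Return, sorted, the file names that do not end in '.fasta' (case-sensitive)."""
    return sorted(name for name in names if not name.endswith(".fasta"))


def check_library_folder(folder: Path) -> List[str]:
    """List files directly inside `folder` that break the .fasta naming rule."""
    return non_fasta_names([p.name for p in folder.iterdir() if p.is_file()])
```

An empty list from `check_library_folder(...)` means every file in the folder ends in `.fasta`.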
Example
For example, the line for a FASTA database built on 29 April 2022 is:
params.lsbsrrefgenomefasta = "../reference_libraries/fasta_2022-04-29/"
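The date in the folder name is the build date in ISO format (YYYY-MM-DD), which Python's `date.isoformat()` produces directly. A small sketch that builds the config line from the example above; `phylogeny_db_line` is a hypothetical helper and the relative path is copied from the example.

```python
from datetime import date
from typing import Optional


def phylogeny_db_line(built: Optional[date] = None) -> str:
    """Build the lsbsrrefgenomefasta line for a library built on `built` (default: today)."""
    built = built or date.today()
    return f'params.lsbsrrefgenomefasta = "../reference_libraries/fasta_{built.isoformat()}/"'
```

Calling `phylogeny_db_line(date(2022, 4, 29))` reproduces the example line exactly.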
For Tool 4
Tool 4 uses the maxThree (for Tool 4) database.
The location of the database is specified here:
backgroundGenomesDir = "../reference_libraries/tool4_fasta_YYYY-MM-DD"
Genome files in this folder must end in `.fasta`.
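Because `backgroundGenomesDir` is a relative path, it is worth checking where it actually points before running Tool 4. The sketch below extracts the value from the config text and joins it to a base directory. It assumes the path is resolved relative to the directory Tool 4 is launched from, which you should confirm for your setup; `background_dir` and the example directory in the usage note are hypothetical.

```python
import os
import re
from pathlib import Path

# Captures the quoted value assigned to backgroundGenomesDir.
BACKGROUND_DIR = re.compile(r'^[ \t]*backgroundGenomesDir[ \t]*=[ \t]*"([^"]+)"', re.MULTILINE)


def background_dir(config_text: str, launch_dir: Path) -> Path:
    """Return the folder backgroundGenomesDir points to, relative to launch_dir."""
    match = BACKGROUND_DIR.search(config_text)
    if match is None:
        raise ValueError("backgroundGenomesDir not found in config")
    # Assumption: relative paths are resolved against the launch directory.
    # normpath collapses '..' without touching the filesystem.
    return Path(os.path.normpath(launch_dir / match.group(1)))
```

For example, with a hypothetical launch directory `/home/user/genome_tools/tool4`, the value `../reference_libraries/tool4_fasta_2022-04-29` resolves to `/home/user/genome_tools/reference_libraries/tool4_fasta_2022-04-29`; calling `.is_dir()` on the result then tells you whether the folder exists.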