Updating the reference libraries from RefSeq


Core and Tool 4 require reference libraries downloaded from RefSeq. Python scripts automate the download and save each new library in the reference_libraries folder. After building a new library, you must update the Tools' config files so that they use it.


Making a new library

To make a new reference library:

  1. Navigate to the _maintenance_ folder: cd ~/genome_tools/_maintenance_
  2. Run the appropriate script using the command in the table below.
  3. Check that the following message appears: Reference library complete.

The commands are:

Reference library        Command
maxOne                   python3 maxOne_phylogeny.py
maxThree                 python3 maxThree_taxonomy.py
maxThree (for Tool 4)    python3 maxThree_tool4.py
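If you prefer to run the steps above from a script, the following is a minimal Python sketch. It assumes the folder layout and success message described above, and that the message is printed to the terminal (stdout or stderr). The names SCRIPTS, build_command, finished_ok and build_library are hypothetical and are not part of the Tools.

```python
import subprocess
from pathlib import Path

# Maintenance folder, as given in step 1 above.
MAINTENANCE_DIR = Path.home() / "genome_tools" / "_maintenance_"

# Hypothetical mapping from reference library to the script that builds it (from the table above).
SCRIPTS = {
    "maxOne": "maxOne_phylogeny.py",
    "maxThree": "maxThree_taxonomy.py",
    "maxThree (for Tool 4)": "maxThree_tool4.py",
}

# Message the scripts print on success (step 3 above).
SUCCESS_MESSAGE = "Reference library complete."


def build_command(library: str) -> list[str]:
    """Return the command for a library, e.g. ['python3', 'maxOne_phylogeny.py']."""
    return ["python3", SCRIPTS[library]]


def finished_ok(output: str) -> bool:
    """Check whether the success message appears in the script's output."""
    return SUCCESS_MESSAGE in output


def build_library(library: str) -> bool:
    """Run the library script from the maintenance folder and report whether it completed."""
    result = subprocess.run(
        build_command(library),
        cwd=MAINTENANCE_DIR,
        capture_output=True,
        text=True,
    )
    # Assumption: the message may go to either stream, so check both.
    return finished_ok(result.stdout + result.stderr)
```

For example, build_library("maxOne") runs python3 maxOne_phylogeny.py inside the maintenance folder and returns True only if the success message appeared.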

Reference libraries are automatically saved here:

~/genome_tools/reference_libraries

Each new library is saved in a folder whose name ends with the date it was generated (YYYY-MM-DD), for example fasta_2022-04-29.
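Because the date is part of the folder name, the newest library of each kind can be found by comparing those dates. A minimal Python sketch, assuming folder names of the form <prefix>YYYY-MM-DD (the prefixes fasta_, fna_, fna_ref_lists_ and tool4_fasta_ are taken from the config examples below); latest_library is a hypothetical helper, not part of the Tools.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

LIBRARY_ROOT = Path.home() / "genome_tools" / "reference_libraries"


def latest_library(prefix: str, root: Path = LIBRARY_ROOT) -> Optional[Path]:
    """Return the most recent folder named <prefix>YYYY-MM-DD under root, or None."""
    # fullmatch ensures "fna_" does not also pick up "fna_ref_lists_..." folders.
    pattern = re.compile(re.escape(prefix) + r"(\d{4}-\d{2}-\d{2})")
    dated = []
    for entry in root.iterdir():
        match = pattern.fullmatch(entry.name)
        if not (entry.is_dir() and match):
            continue
        try:
            dated.append((date.fromisoformat(match.group(1)), entry))
        except ValueError:
            continue  # skip folders whose suffix is not a real date
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```

For example, latest_library("fasta_") returns the newest maxOne fasta folder, whose date can then be copied into the config files.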


Linking the Tools to the new library

Update the relevant config files to link the Tools to the newly generated reference libraries by following these three steps:

  1. Locate the config file in the File Browser.
  2. Double-click nextflow.config to open it for editing.
  3. Update the relevant lines (see below), then save and close the file.
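As an alternative to editing by hand, the edit in steps 2 and 3 can be scripted. A minimal Python sketch, assuming each parameter sits on its own line in the form name = "value", as in the examples in this guide; set_param and update_config_file are hypothetical helpers, not part of Nextflow or the Tools.

```python
import re
from pathlib import Path


def set_param(config_text: str, name: str, value: str) -> str:
    """Replace the quoted value of a `name = "..."` line in nextflow.config text."""
    pattern = re.compile(
        r"^(\s*" + re.escape(name) + r'\s*=\s*)"[^"]*"',
        flags=re.MULTILINE,
    )
    # A function replacement avoids backslashes in paths being read as escapes.
    new_text, count = pattern.subn(lambda m: m.group(1) + '"' + value + '"', config_text)
    if count == 0:
        raise KeyError(f"{name} not found in config")
    return new_text


def update_config_file(path: Path, name: str, value: str) -> None:
    """Read nextflow.config, update one parameter, and save it back."""
    path.write_text(set_param(path.read_text(), name, value))
```

Raising KeyError when the parameter is missing means a typo in the name is reported rather than silently ignored.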

For the Core taxonomy module

The taxonomy module uses the maxThree database.

maxThree has two parts:

  • params.fastanidbpath: the reference genome sequences (ending .fasta), and
  • params.fastanireflistpath: the names of the reference genomes, grouped into files by genus.

Be sure to update both, replacing YYYY-MM-DD with the date of the new library:

params.fastanidbpath = "/file_path/fna_YYYY-MM-DD/*"

params.fastanireflistpath = "/file_path/fna_ref_lists_YYYY-MM-DD/*"
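Since both parameters must point at libraries with the same date, it can help to change them together. A minimal Python sketch, assuming the folder names follow the fna_YYYY-MM-DD and fna_ref_lists_YYYY-MM-DD patterns shown above; update_taxonomy_dates is a hypothetical helper, not part of the Tools.

```python
import re

# Date suffix inside fna_YYYY-MM-DD or fna_ref_lists_YYYY-MM-DD folder names.
FOLDER_DATE = re.compile(r"(fna_(?:ref_lists_)?)\d{4}-\d{2}-\d{2}")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

TAXONOMY_PARAMS = ("params.fastanidbpath", "params.fastanireflistpath")


def update_taxonomy_dates(config_text: str, new_date: str) -> str:
    """Set the same YYYY-MM-DD date on both maxThree parameters."""
    if not ISO_DATE.fullmatch(new_date):
        raise ValueError(f"expected YYYY-MM-DD, got {new_date!r}")
    lines = config_text.splitlines(keepends=True)
    updated = set()
    for i, line in enumerate(lines):
        name = line.split("=", 1)[0].strip()
        if name in TAXONOMY_PARAMS:
            lines[i] = FOLDER_DATE.sub(lambda m: m.group(1) + new_date, line)
            updated.add(name)
    # Fail loudly if either parameter is missing, so the two cannot drift apart.
    missing = set(TAXONOMY_PARAMS) - updated
    if missing:
        raise KeyError(f"not found in config: {sorted(missing)}")
    return "".join(lines)
```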


For the Core phylogeny module

The phylogeny module uses the maxOne database.

The location of the database is specified here:

params.lsbsrrefgenomefasta = "/file_path/fasta_YYYY-MM-DD/"

Genome files must end in .fasta.

Example

For example, the path to a fasta database built on 29 April 2022 is:

params.lsbsrrefgenomefasta = "../reference_libraries/fasta_2022-04-29/"
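The date in the folder name is in ISO format (YYYY-MM-DD), so the example line can be generated directly from the build date. A small Python sketch; phylogeny_param_line is a hypothetical helper, and the default ../reference_libraries simply reuses the relative path from the example above.

```python
from datetime import date


def phylogeny_param_line(built_on: date, library_dir: str = "../reference_libraries") -> str:
    """Return the nextflow.config line pointing at the maxOne fasta library built on a given date."""
    # date.isoformat() gives YYYY-MM-DD, matching the folder naming.
    return f'params.lsbsrrefgenomefasta = "{library_dir}/fasta_{built_on.isoformat()}/"'


print(phylogeny_param_line(date(2022, 4, 29)))
```

Run as written, this prints the example line above.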


For Tool 4

Tool 4 uses the maxOne database.

The location of the database is specified here:

backgroundGenomesDir = "../reference_libraries/tool4_fasta_YYYY-MM-DD"

Genome files must end in .fasta.
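Before pointing a Tool at a new folder, you can check that every genome file has the required .fasta ending. A minimal Python sketch; non_fasta_files is a hypothetical helper, and it treats any other file in the folder (including compressed files such as .fasta.gz) as a problem to review.

```python
from pathlib import Path


def non_fasta_files(genome_dir: Path) -> list[str]:
    """List files in genome_dir whose names do not end in .fasta."""
    return sorted(
        entry.name
        for entry in genome_dir.iterdir()
        if entry.is_file() and not entry.name.endswith(".fasta")
    )
```

An empty list means all files in the folder meet the naming requirement; for example, call it on the folder set in backgroundGenomesDir with the real date in place of YYYY-MM-DD.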