New features

Major

NRPS/PKS Modules

For NRPS/PKS analysis in versions prior to 5.1, the loading domains (PKS_AT/AMP-binding/A-OX) were used as the basis of all NRPS/PKS results, from loaded substrate to SMILES generation. A modification system existed for PKS_AT domains, but not for NRPS domains, and methyltransferases were not taken into account. For trans-AT PKSs in particular, this meant that modifications were not applied, since the modification domains weren't in the same gene as the PKS_AT domain.

With antiSMASH 5.1 this has changed to instead use NRPS/PKS modules, each with:

  • an optional starter domain (a ketosynthase, condensation, or a more specific starter domain)
  • a loader domain (optional in the case of a PKS module, if the ketosynthase is a trans-AT variety)
  • zero or more modification domains (methyltransferases, ketoreductases, etc.)
  • a carrier protein
  • and, optionally, an epimerase, thioesterase or terminal reductase

A module is considered complete when it has all required parts; otherwise it is considered partial. Not all domains detected by the NRPS/PKS domain detection are considered part of a module (e.g. the docking terminal domains).
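The completeness rule can be sketched in a few lines of Python. This is a minimal illustration, not antiSMASH's implementation: the `Module` class, its field names and the `is_complete` method are hypothetical. It encodes only the rules stated above: a carrier protein is always required, and a loader is required unless the module is a PKS module whose ketosynthase is a trans-AT variety.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Module:
    """Hypothetical model of an NRPS/PKS module; all names are illustrative."""
    kind: str                         # "PKS" or "NRPS"
    starter: Optional[str] = None     # e.g. a ketosynthase or condensation domain
    loader: Optional[str] = None      # e.g. PKS_AT or AMP-binding
    modifications: List[str] = field(default_factory=list)
    carrier: Optional[str] = None     # carrier protein
    finaliser: Optional[str] = None   # epimerase, thioesterase or terminal reductase
    trans_at_ks: bool = False         # assumed flag: a trans-AT ketosynthase is present

    def is_complete(self) -> bool:
        # A carrier protein is always required.
        if self.carrier is None:
            return False
        # A loader is required, except in PKS modules with a trans-AT ketosynthase.
        if self.loader is None:
            return self.kind == "PKS" and self.trans_at_ks
        return True


nrps = Module(kind="NRPS", starter="Condensation", loader="AMP-binding", carrier="PCP")
trans_at = Module(kind="PKS", starter="PKS_KS", carrier="ACP", trans_at_ks=True)
partial = Module(kind="NRPS", loader="AMP-binding")  # no carrier protein
print(nrps.is_complete(), trans_at.is_complete(), partial.is_complete())
```

Under these assumptions the first two modules are complete and the third is partial, because it lacks a carrier protein.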

Modules are shown in HTML output, with partial modules having dashed outlines and complete modules having a "lid" shown by default (hidden when no NRPS/PKS predictions were made or the mouse is over the lid). The lid includes the loaded monomer with modifications, e.g. D-NMe-ala for a hypothetical adenylation domain loading alanine with both an N-methyltransferase domain and an epimerase in the module. If there is no consensus prediction for a specific gene, the loaded monomer will be shown as ?. All module lids can be hidden/shown using a toggle on the page (hotkey: m).
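As a rough illustration of how such a lid label could be assembled, the sketch below builds the string from a substrate and its modifications. The function `monomer_label` and its parameters are hypothetical, and the prefix order (epimerisation first, then other modifications, then the substrate) is inferred only from the single `D-NMe-ala` example above; antiSMASH's own naming logic may differ.

```python
from typing import Optional, Sequence


def monomer_label(substrate: Optional[str],
                  modifications: Sequence[str] = (),
                  epimerised: bool = False) -> str:
    """Build a lid-style label such as 'D-NMe-ala' (illustrative only)."""
    if substrate is None:
        # No consensus prediction: shown as "?".
        return "?"
    parts = []
    if epimerised:
        parts.append("D")         # assumed prefix for an epimerase in the module
    parts.extend(modifications)   # e.g. "NMe" for an N-methyltransferase
    parts.append(substrate)
    return "-".join(parts)


print(monomer_label("ala", ["NMe"], epimerised=True))
print(monomer_label(None))
```

The first call reproduces the `D-NMe-ala` example and the second gives `?` for a missing consensus.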

Modules with an iterative PKS_KS domain will also show a special iterative indicator.

The SMILES generated for a candidate cluster now use these monomers modified from substrates by the module, instead of just the loaded substrates. Only complete modules contribute their monomers to the product.
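The difference from the earlier behaviour can be illustrated with a small sketch. It does not generate SMILES. It only shows which monomers would be passed on to that step under the rule above. `ModuleResult` and `monomers_for_product` are hypothetical names, not part of antiSMASH.

```python
from typing import List, NamedTuple


class ModuleResult(NamedTuple):
    """Hypothetical per-module summary; field names are illustrative."""
    substrate: str   # what the loader domain is predicted to load
    monomer: str     # the substrate after the module's modifications
    complete: bool   # whether the module has all required parts


def monomers_for_product(modules: List[ModuleResult]) -> List[str]:
    # Only complete modules contribute, and they contribute the modified
    # monomer rather than the raw loaded substrate.
    return [module.monomer for module in modules if module.complete]


modules = [
    ModuleResult("ala", "D-NMe-ala", True),
    ModuleResult("gly", "gly", False),   # partial module: skipped
    ModuleResult("val", "val", True),
]
print(monomers_for_product(modules))
```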

More information regarding the display and interpretation of these modules is available here.

Minor

Analysis:

  • added a domain for fungal non-reducing PKS product template (PT) domains (TIGR04532)
  • added detection of TfuA-related RiPPs as described here
  • added the PP-binding profile for carrier proteins as a fallback for when neither PCP nor ACP match
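The carrier protein fallback in the last item can be pictured as a simple preference order. The sketch below is an assumption-laden illustration: `pick_carrier_profile` is a hypothetical helper, hits are reduced to bare profile names, and checking PCP before ACP is an arbitrary choice made here, where a real implementation would presumably compare hit scores.

```python
from typing import Iterable, Optional


def pick_carrier_profile(hit_names: Iterable[str]) -> Optional[str]:
    """Prefer the specific PCP/ACP profiles; fall back to PP-binding."""
    hits = set(hit_names)
    for specific in ("PCP", "ACP"):   # order between these two is assumed
        if specific in hits:
            return specific
    # Neither specific profile matched: use the generic profile if it hit.
    return "PP-binding" if "PP-binding" in hits else None


print(pick_carrier_profile(["PP-binding", "ACP"]))
print(pick_carrier_profile(["PP-binding"]))
```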

Visualisation:

  • added keyboard navigation shortcuts (w,e to change region; a,s to change detail tab; d,f to change sidepanel tab)

Fixes and small changes

Analysis:

  • updated cutoffs for the Chal_sti_synth profile to be consistent
  • fixed NRPS/PKS Enedyne_KS profile missing a trusted cutoff
  • fixed NRPS/PKS PKS_KS domains not recording their subtype (e.g. iterative or enediyne)
  • fixed an edge case where protoclusters could be lost when forming a hybrid region
  • updated KnownClusterBlast database to MIBiG 2.0 dataset

Input handling:

  • improved error messages
    • better differentiation between no records and all records too small
    • more information in errors arising from GFF files with invalid formats
    • more information in some errors arising from bad feature locations
    • better information if file paths supplied are directories and not files
  • improved performance when reading GFF input files with many records
  • --hmmdetection-strictness setting retained when reusing previous results
  • files in commandline arguments checked earlier for permission/readability problems
  • allowed GenBank parsing to be more flexible in reading antiSMASH qualifiers with spacing altered by external tools
  • changed features with negative locations to raise errors instead of potentially giving incorrect results
  • fixed a crash related to existing NRPS/PKS annotations in GenBank inputs
  • fixed some input errors caused by a change in regex behaviour introduced in Python 3.7
  • fixed a crash when some prerequisite binaries were missing but attempts were made to use them
  • fixed a crash when reusing results and the output directory structure wasn't present
  • fixed record-level annotations being lost when --start or --end were provided

Output:

  • improved antiSMASH meta-comment in GenBank output files

Visualisation:

  • changed resistance gene marking to use a symbol underneath instead of overriding gene colour
  • symbol-based annotations now use the same symbol in the legend
  • CDS identifiers, Stachelhaus codes and other easily confused text labels now use a serif font
  • strictness level used now shown in main overview page
  • added labels to Aminotran domains in the NRPS/PKS view
  • improved display when web results are slow to load or viewing results inside a zip file
  • ACPS domains are no longer coloured the same as carrier protein domains in the NRPS/PKS page
  • fixed NRPS/PKS domains tab being created with no results
  • miscellaneous small styling changes for more consistent appearance in different browsers
  • fixed missing colouring for some clusterblast SVGs

Other:

  • changed the RiPP modules to rebuild their RODEO classifiers when necessary instead of forcing a specific scikit-learn version
  • allowed for changed behaviour when running tests on macOS
  • fixed git commit version not being updated correctly when running development versions

Numerous other small changes and fixes were made internally; for a full list, see the git shortlog.

5.1.1

New features

  • included the PKS_PP domain from SMART when detecting ACP domains for NRPS/PKS domains

Fixes and small changes

  • fixed an issue with prepeptides in region-specific GenBank outputs not adjusting locations of components correctly
  • fixed NRPS modules without condensation domains being considered a starter module when not the first module in a gene
  • fixed PKS modules not being considered complete if they were the first in a gene and were missing a PKS_KS domain