New features#

Major

NRPS/PKS Modules

For NRPS/PKS analysis in versions prior to 5.1, the loading domains (PKS_AT/AMP-binding/A-OX) have been used as the basis of all NRPS/PKS results, from loaded substrate to SMILES generation. For PKS_AT domains, there was also a modification system, but none existed for NRPS domains, nor were methyltransferases taken into account. Especially for trans-AT PKSs, this meant that modifications would not be made, since the modifications weren't in the same gene as the PKS_AT domain.

With antiSMASH 5.1 this has changed to instead use NRPS/PKS modules, each with:

an optional starter domain (a ketosynthase, condensation, or a more specific starter domain)
a loader domain (optional in the case of a PKS module, if the ketosythase is a trans-AT variety)
zero or more modification domains (methyltransferases, ketoreductases, etc)
a carrier protein
and, optionally, an epimerase, thioesterase or terminal reductase

A module is considered complete when it has all required parts, otherwise it's considered partial. Not all domains detected by the NRPS/PKS domain detection are considered part of a module (e.g. the docking terminal domains).

Modules are shown in HTML output, with partial modules having dashed outlines and complete modules having a "lid" shown by default (hidden when no NRPS/PKS predictions were made or the mouse is over the lid). The lid includes the loaded monomer with modifications, e.g. D-NMe-ala for a hypothetical adenylation domain loading alanine with both an N-methyltransferase domain and an epimerase in the module. If there is no consensus prediction for a specific gene, the loaded monomer will be shown as ?. All module lids can be hidden/shown using a toggle on the page (hotkey: m).

Modules with an iterative PKS_KS domain will also show a special iterative indicator.

The SMILES generated for a candidate cluster now use these monomers modified from substrates by the module, instead of just the loaded substrates. Only complete modules contribute their monomers to the product.

More information regarding the display and interpretation of these modules is available here.

Minor

Analysis:

added a domain for fungal non-reducing PKS product template (PT) domains (TIGR04532)
added detection of TfuA-related RiPPs as described here
added the PP-binding profile for carrier proteins as a fallback for when neither PCP nor ACP match

Visualisation:

added keyboard navigation shortcuts (w,e to change region; a,s to change detail tab; d,f to change sidepanel tab)

Fixes and small changes#

Analysis:

updated cutoffs for the Chal_sti_synth profile to be consistent
fixed NRPS/PKS Enedyne_KS profile missing a trusted cutoff
fixed NRPS/PKS PKS_KS domains not recording their subtype (e.g. iterative or enedyne)
fixed an edge case where protoclusters could be lost when forming a hybrid region
updated KnownClusterBlast database to MIBiG 2.0 dataset

Input handling:

improved error messages
- better differentiation between no records and all records too small
- more information in errors arising from GFF files with invalid formats
- more information in some errors arising from bad feature locations
- better information if file paths supplied are directories and not files
improved performance when reading GFF input files with many records
--hmmdetection-strictness setting retained when reusing previous results
files in commandline arguments checked earlier for permission/readability problems
allowed GenBank parsing to be more flexible in reading antiSMASH qualifiers with spacing altered by external tools
changed features with negative locations to raise errors instead of potentially giving incorrect results
fixed a crash related to existing NRPS/PKS annotations in GenBank inputs
fixed some input errors caused by a change in regex behaviour introduced in Python 3.7
fixed a crash when some prerequiste binaries were missing but attempts were made to use them
fixed a crash when reusing results and output directory structure wasn't present
fixed record-level annotations being lost when --start or --end were provided

Output:

improved antiSMASH meta-comment in GenBank output files

Visualisation:

changed resistance gene marking to use a symbol underneath instead of overriding gene colour
symbol-based annotations now use the same symbol in the legend
CDS identifiers, Stachelhaus codes and other easily confused text labels now use a serif font
strictness level used now shown in main overview page
added labels to Aminotran domains in the NRPS/PKS view
improved display when web results are slow to load or viewing results inside a zip file
ACPS domains are no longer coloured the same as carrier protein domains in the NRPS/PKS page
fixed NRPS/PKS domains tab being created with no results
miscellaneous small styling changes for more consistent appearance in different browsers
fix missing colouring for some clusterblast SVGs

Other:

changed the RiPP modules to rebuild their RODEO classifiers when necessary instead of forcing a specific scikit-learn version
allowed for changed behaviour when running tests on MacOS
fixed git commit version not being updated correctly when running development versions

Numerous other small changes and fixes were made internally, for a full list see the git shortlog.

5.1.1#

New features

included the PKS_PP domain from SMART when detecting ACP domains for NRPS/PKS domains

Fixes and small changes

fixed an issue with prepeptides in region-specific genbank outputs not adjusting locations of components correctly
fixed NRPS modules without condensation domains being considered a starter module when not the first module in a gene
fixed PKS modules not being considered complete if they were the first in a gene and were missing a PKS_KS domain