The following changes include all those made between antiSMASH 7.0 and antiSMASH 8.0.

Important notes#

This release increases the minimum python version to 3.11.

New features#

Major#

Full circular record support

Previously, antiSMASH would treat all inputs as linear, cutting any cross-origin features into two parts. For circular sequences, this risked splitting and/or missing cluster predictions, especially in cases where sequences weren't rotated to the biological origin.

In antiSMASH 8.0, all inputs in Genbank format with a topology annotation of circular will have new handling. This allows for protoclusters and genes, if genefinding is used, to be detected over the origin. All analyses also handle these cross-origin regions, where relevant.

Any cross-origin regions will be displayed in the overview visualisation, paired with additional changes to the nucleotide axis in the region view.

In some special cases, e.g. dense circular plasmids, the full contig can be covered by protoclusters. If one or more of those protoclusters crosses the origin, then it will be shown in the record overview as a full contig region, and the cross-origin protoclusters marked in the region view.

Terpene analysis

A new terpene analysis module has been added. This extends the predictions made by antiSMASH to include details about particular synthases and even compounds, where possible.

Gene function updates

Gene function classification has been extended to detect previously undetected functions, as well as extending details of the predicted function(s).

  • A new comparison dataset, using the MITE database
  • More accurate and more detailed predictions of halogenases
  • Broader detection of other functions not previously covered by SMCoGs

A new results panel is included in the HTML output. This lists all categories and subcategories of tailoring functions present in each region. Further details about each particular gene, and the tools contributing to those categories are also available when expanding the details within the displayed results.

Deprecations and removals

GlimmerHMM has been removed as a fungal genefinding option, as it performs much worse than other fungal genefinding tools that can use other data about the provided sequence. This functionality has not been replaced. It is expected that users will perform genefinding with these more context-aware tools prior to antiSMASH analysis.

The MEME suite of software, sepcifically MEME and FIMO, are no longer assumed to be present by default. Users can still provide their own binaries for local installations of these, but the webservice and docker containers will no longer use these tools. This change disables the fungal-only CASSIS border prediction tool and affects RODEO RiPP precursor analysis.

Other removals include raw diamond results for Clusterblast files have been removed, along with SVG files, as they have either been replaced or were redundant.

Minor#

Detection rules

  • Added rules: archeael-RiPP, nitropropanoic acid, azoxy-dimer, azoxy-crosslink, polyyne, deazapurine, polyhalogenated-pyrrole, hydroxytropolone, hydrogen-cyanide, darobactin, isocyanide, fungal_CDPS, triceptide, lysine, CDPS, and HR-T2PKS
  • Updated rules: terpene, mycosporine_like, NI-siderophore, transAT-PKS, NRPS, NRPS-like, and fatty_acid
  • Added support for "extender" conditions to rules, for cases where a continguous section of some core genes might be too far from the other core genes to be properly marked
  • Added extenders to trans-AT PKS rule to mark all genes in long chains of trans-AT modules as core

Additionally, new command line options are available to limit detected cluster types to a subset:

  • --hmmdetection-limit-to-rule-names: restricts protoclusters detected to the named types
  • --hmmdetection-limit-to-rule-categories: restricts protoclusters detected to the named categories

For more detailed information on rules themselves, see the glossary.

Analysis:

  • New functionality for detection rules, allowing rules with many possible repeated elements (e.g. transAT-PKS) to extend further away from other required components
  • New NRPS/PKS domains, including Interface and IBH_Asp domains which improve confidence in NRPS modules involving multiple genes
  • New activity predictions for NRPS condensation domains (see the full list for details)
  • Updated the TFBS module with new binding sites based on the CollectTF database
  • Updated ClusterCompare database to MIBiG 4.0
  • Updated Clusterblast known database to MIBiG 4.0
  • Updated the CompaRiPPson database to MIBiG 4.0
  • Improved ClusterCompare scoring of references for which only a single gene match is present
  • Included NRPS/PKS modules containing CoA-ligase domains in polymer predictions
  • Improved merging of multi-gene Type I PKS modules
  • Improved detection of lanthipeptide precursors
  • Updated Pfam version to 35.0, with matching GO mappings updated to 2023-07-05 release

Output:

  • Removed Clusterblast SVG output files due to file sizes and counts. The equivalent functionality is present in the HTML visualisation, from where individual SVGs can be exported.
  • Removed Clusterblast results from Genbank output file annotations
  • Added additional results and information to JSON output

Visualisation:

  • Added links to submit an antiSMASH-database BLAST query for each gene's details view
  • Replaced quantitative cluster similarity metric on main overview page with a qualitative value
  • Added links out to PARAS, another adenylation domain substrate predictor, have been added to NRPS domain tooltips
  • Improved interactivity of ClusterBlast SVGs, at the same time removing the many generated SVG files
  • Improved visualisation of CompaRiPPson results
  • Improved visualisation of sactipeptide and lassopeptide precursor sequences

Other:

Fixes and small changes#

Analysis:

  • Changed protocluster-to-region ClusterCompare scoring for regions with a single protocluster to match that of region-to-region comparisons
  • Fixed an issue leading to lassopeptide precursor detection being too generous
  • Changed rule categories for larylpolyenes and PUFAs
  • Changed ladderane rule to be more specific
  • Changed the neighbourhood range of NI-siderophores to be wider in order to cover all of the known clusters

Input handling:

  • Added support for reusing previous results compressed with bzip2
  • Added handling of inputs with extremely long gene names to prevent crashing in third-party dependencies, however it is strongly recommended to avoiding annotating genes in this way
  • Added an absolute limit to contig identifiers which would cause output filenames to have lengths longer than operating systems support
  • Improved error messages for some types of input errors
  • Improved auto-generation of gene naming when genefinding to avoid some issues in HTML output

Output:

  • Added parent record annotations to per-region Genbank output files, including "source" features, if present

Visualisation:

  • Removed NCBI genomic context links by default, due to being non-functional for non-NCBI inputs (NOTE: enabled on the webservice for inputs fetched directly from the NCBI)
  • Removed links to the defunct SEARCHGTr website

Other:

  • Modified some command line arguments to be explicitly disabled as well as enabled (e.g. --verbose and --no-verbose)
  • Improved performance, particularly for multi-contig inputs
  • Added graceful exiting on keyboard interrupt

Numerous other small changes and fixes were made internally, for a full list see the git shortlog.