New features

Major

ClusterCompare

ClusterCompare is a new algorithm for cluster comparison. It takes into account the presence of biosynthetic profiles, NRPS/PKS module counts and layouts, and gene functions, in addition to the sequence identity and synteny used by the clusterblast module.

The score for any single pairing ranges from 0 to 1, with 1 being a theoretical perfect score. When multiple protoclusters are compared against a single region, the score shown may be higher than 1; mousing over each protocluster's result will show the 0-1 score for that particular protocluster.

ClusterCompare is initially released with a comparison database for MIBiG 2.0 (--cc-mibig), and also includes a script for generating custom databases, which can be used with --cc-custom.

RREfinder

The precision mode of RREfinder has been added and can be run with --rre. It detects and annotates RiPP Recognition Elements within RiPP protoclusters.

TIGRfam detection

Annotations based on the TIGRfam database can now be added to a run (with --tigrfam). These annotations will be present in the GenBank output and will also be shown in the gene detail panel on the top right of the HTML output.
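As a rough illustration of how the optional analyses above might be combined in one run, the following Python sketch assembles a command line from the flags named in these notes (--cc-mibig, --rre, --tigrfam). The executable name `antismash`, the helper `build_antismash_command` and the input file `genome.gbk` are assumptions made for the example, not something these notes specify.

```python
import shlex


def build_antismash_command(input_path, cc_mibig=False, rre=False, tigrfam=False):
    """Assemble an argument list enabling the selected optional analyses.

    Hypothetical helper: only the flag names come from the release notes;
    the executable name "antismash" is an assumption.
    """
    command = ["antismash"]
    if cc_mibig:
        command.append("--cc-mibig")  # ClusterCompare against MIBiG 2.0
    if rre:
        command.append("--rre")  # RREfinder precision mode
    if tigrfam:
        command.append("--tigrfam")  # TIGRfam annotations
    command.append(input_path)  # placeholder input file
    return command


command = build_antismash_command("genome.gbk", cc_mibig=True, rre=True, tigrfam=True)
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run` without shell quoting concerns; `shlex.join` (Python 3.8+) is only used here to print a copyable command.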

Annotation sideloading

antiSMASH analyses can now be run on genomic areas in which antiSMASH itself does not detect a cluster, by using sideloaded annotations. These annotations, provided in JSON format, can contain protoclusters and/or subregions along with extra details to attach to those regions. One or more external annotation files can be sideloaded with --sideloader.

Clusterfinder

The clusterfinder module has been removed because its results were often misinterpreted or misleading.

Minor

Analysis:

  • a number of additions and changes were made to detected RiPP protoclusters; see the glossary page for details

Output:

  • JSON output now includes a simple representation of all detected areas
  • output filenames can be customised with --output-basename instead of defaulting to the input name

Visualisation:

  • Pfam domains for the region are shown in a detail tab similar to the existing NRPS/PKS domains
  • Pfam domain hits for a single gene are now included in the gene detail panel of the HTML output
  • added a button to download SVG images in the HTML output

Other:

  • detection rules can now create and use aliases for groups of profiles

Fixes and small changes

Input handling:

  • improved error messages for invalid inputs
  • improved handling of non-standard GenBank inputs

Other:

  • a workaround has been added for a macOS-specific issue when using Python 3.8+ with --cpus set higher than 1

Numerous other small changes and fixes were made internally; for a full list, see the git shortlog.