Automated genome mining · candidate prioritization

Surfacing the bacterial chemistry we haven't characterized yet

ALCHEMY scans bacterial genomes for biosynthetic gene clusters — the factories that make natural products — and ranks the ones with no match in MIBiG, the curated reference of experimentally studied clusters. The output is a shortlist of candidates worth investigating, not finished discoveries.

Candidate BGCs flagged: 6
Hits in curated MIBiG: 0
Genomes in pilot batch: 5 (3 runs completed)
Clusters MIBiG actually covers: 3,059

// The top-ranked candidates

Two clusters worth a closer look

Both came back with no hit against MIBiG (dual ClusterBlast + ClusterCompare). Important caveat: MIBiG only holds 3,059 hand-curated, experimentally studied clusters — a tiny slice of the millions in nature — so "no MIBiG hit" means under-characterized, not proven novel. They haven't yet been checked against the full predicted universe (antiSMASH-DB, BiG-FAM). These are leads to investigate.
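The "dual" check can be read as a conjunction: a region is flagged only when both comparisons independently return nothing. The sketch below illustrates that logic under stated assumptions. The `MibigComparison` structure, its field names, and the second example region are hypothetical and do not reflect antiSMASH's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MibigComparison:
    """Per-region comparison summary (hypothetical structure, for illustration)."""
    region_id: str
    clusterblast_hits: int    # number of MIBiG hits from the ClusterBlast comparison
    clustercompare_hits: int  # number of MIBiG hits from the ClusterCompare comparison


def has_no_mibig_hit(result: MibigComparison) -> bool:
    """Return True only when both comparisons report zero MIBiG hits."""
    return result.clusterblast_hits == 0 and result.clustercompare_hits == 0


def label(result: MibigComparison) -> str:
    """Describe the result without claiming novelty."""
    if has_no_mibig_hit(result):
        return "no MIBiG hit (under-characterized lead)"
    return "matches a characterized cluster"


# First entry mirrors the top-ranked candidate; the second is an invented example.
results = [
    MibigComparison("GCF_042055075.2 region 3", 0, 0),
    MibigComparison("hypothetical region", 0, 2),
]
for r in results:
    print(r.region_id, "->", label(r))
```

Requiring agreement between the two comparisons keeps a single missed hit from promoting a region to the shortlist. It still says nothing about the predicted clusters in antiSMASH-DB or BiG-FAM.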

NO MIBiG HIT

"phycolactam"

GCF_042055075.2 (NCBI: Roseobacter phycocola) · region 3
antiSMASH classifies this as a hybrid β-lactone + NRPS cluster (3 modules plus a glycosyltransferase) with no MIBiG match, which makes it the top-ranked candidate to investigate. The code name is a provisional label, not a structural claim.
Cluster size: 52,664 bp
Coding genes: 43 CDS
antiSMASH class: β-lactone / NRPS
vs MIBiG: No match

NO MIBiG HIT

"silazactam"

GCF_055394375.1 (NCBI: Marinobacter alkaliphilus) · region 4
A compact β-lactone cluster (propionyl-CoA synthetase + leuA signature) with no MIBiG match. Provisional code-name only — nothing here has been expressed, isolated or structurally confirmed.
Cluster size: 24,249 bp
Coding genes: 22 CDS
antiSMASH class: β-lactone
vs MIBiG: No match

// The method

A discovery engine, fully automated

Five scripts take a marine genome from a public database to a ranked shortlist of under-characterized candidate clusters, with no manual lab work and no local installs.

01 🧬 Pick genomes

Pull marine bacterial assemblies from NCBI by ecology + novelty potential.

02 🔬 antiSMASH

Submit to the antiSMASH web service to detect every biosynthetic gene cluster.

03 📊 MIBiG diff

Dual ClusterBlast + ClusterCompare comparison against MIBiG, the curated catalogue of experimentally studied clusters.

04 🏆 Rank

Product-class-weighted scoring surfaces the strongest zero-hit candidates.

05 📄 Assemble

Auto-build a preprint-ready manuscript with gene tables and figures.
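As a rough illustration of step 04, product-class-weighted ranking might look like the sketch below. The weights, the `Candidate` fields, the additive score, and the third example cluster are all assumptions made for illustration. ALCHEMY's actual scoring scheme is not described here.

```python
from dataclasses import dataclass

# Hypothetical weights per antiSMASH product class (illustrative values only).
CLASS_WEIGHTS = {"NRPS": 3.0, "β-lactone": 2.0}
DEFAULT_WEIGHT = 1.0  # assumed fallback for classes without an explicit weight


@dataclass(frozen=True)
class Candidate:
    name: str
    classes: tuple  # antiSMASH product classes assigned to the region
    cds_count: int  # number of coding genes in the region
    mibig_hits: int  # number of MIBiG matches (0 means no hit)


def score(candidate: Candidate) -> float:
    """Sum the class weights; clusters with any MIBiG hit score zero."""
    if candidate.mibig_hits > 0:
        return 0.0
    return sum(CLASS_WEIGHTS.get(c, DEFAULT_WEIGHT) for c in candidate.classes)


def rank(candidates):
    """Keep zero-hit candidates and sort them by descending score."""
    zero_hit = [c for c in candidates if c.mibig_hits == 0]
    return sorted(zero_hit, key=score, reverse=True)


phycolactam = Candidate("phycolactam", ("β-lactone", "NRPS"), 43, 0)
silazactam = Candidate("silazactam", ("β-lactone",), 22, 0)
known = Candidate("hypothetical known cluster", ("NRPS",), 30, 1)  # invented example

for c in rank([silazactam, known, phycolactam]):
    print(c.name, score(c))
```

With these assumed weights the hybrid β-lactone + NRPS cluster ranks above the β-lactone-only cluster, and the cluster with a MIBiG match is dropped. A different weighting would change the order, so the ranking expresses a follow-up priority rather than evidence of novelty.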

// The scan

Genomes put through the pipeline

A pilot batch of five marine bacterial genomes was submitted to antiSMASH; two runs are still queued. The three completed runs alone surfaced six zero-hit candidate clusters.

Organism                       Assembly          Status
Roseobacter phycocola          GCF_042055075.2   top-ranked candidate
Marinobacter alkaliphilus      GCF_055394375.1   candidate BGC (no MIBiG hit)
Pseudoalteromonas sp. TO-2024  GCA_055398285.1   complete
Salinispora sp. CH2A1_3        GCF_056820995.1   queued
Salinispora sp. CH2A1_6        GCF_056820975.1   queued

// What this is — and isn't

A candidate, not a discovery

Being honest about the science is the whole point. Here's exactly what a "no MIBiG hit" does and doesn't mean.

📉 Absence ≠ novelty

MIBiG curates only 3,059 experimentally studied clusters. Nature holds millions. "No MIBiG hit" is the normal outcome for most clusters: it flags a cluster as under-characterized, not as new.

🔬 Predicted, not observed

These are antiSMASH predictions from genome data. Nothing has been expressed, isolated or had its structure determined. You can't legitimately name a molecule that's never been seen.

🧭 A triage engine

The honest framing: ALCHEMY ranks under-characterized clusters for follow-up. The next step is comparing them against antiSMASH-DB / BiG-FAM and, eventually, wet-lab work.
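One way to keep these distinctions explicit in a pipeline is to attach an evidence level to each cluster and derive the permitted wording from it. The sketch below is hypothetical. The level names, their ordering, and the claim strings are assumptions for illustration and are not part of ALCHEMY.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Assumed evidence ladder; higher values mean stronger support."""
    PREDICTED = 1            # antiSMASH prediction from genome sequence only
    NO_MIBIG_HIT = 2         # no match among curated MIBiG clusters
    NO_DATABASE_HIT = 3      # also no match in antiSMASH-DB / BiG-FAM
    EXPRESSED = 4            # cluster expressed in the lab
    STRUCTURE_CONFIRMED = 5  # product isolated and structure determined


def allowed_claim(level: Evidence) -> str:
    """Map an evidence level to the strongest wording it supports."""
    if level >= Evidence.STRUCTURE_CONFIRMED:
        return "characterized natural product"
    if level >= Evidence.EXPRESSED:
        return "expressed cluster, product structure unconfirmed"
    if level >= Evidence.NO_DATABASE_HIT:
        return "candidate with no match in predicted-cluster databases"
    if level >= Evidence.NO_MIBIG_HIT:
        return "under-characterized candidate"
    return "predicted cluster"


# Both shortlisted clusters currently sit at the NO_MIBIG_HIT level.
print(allowed_claim(Evidence.NO_MIBIG_HIT))
```

Under this scheme, naming a molecule only becomes legitimate at the top level, which matches the caveats above.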