Name: Stable Audio 3 review
Item: Stable Audio 3
Rating: 7.8
Author: Marcus Feld

7.8

Good

Dubspot Review

Stable Audio 3

Stability AI · Open weights (free) / API

Sound: 8.0
Workflow: 8.0
Value: 9.0
Innovation: 8.0

The verdict: Stability AI's most useful audio release yet. It is genuinely fast, mostly open-weight, and strong on ambient, foley, and loops. The big caveat is unchanged: no vocals, no lyrics.


Pros

Open-weight and runs locally — seconds per generation on an M4 MacBook Pro
Generations beyond six minutes (Medium) with strong long-form harmonic coherence
Excellent for ambient, foley, SFX, and instrumental loops
ComfyUI integration fits straight into existing pipelines

Cons

No vocals and no lyrics — the headline caveat
Rhythmic genres flatten: you get what you ask for, not the happy accidents
Best speed still wants a capable GPU

Buy it if

You want fast, open, local generative audio for ambient, SFX, and loops

Skip it if

You need vocals or full song structure — reach for Suno or Udio instead


Stability AI's audio program has always sat slightly off to the side of the company's image-generation work, and earlier Stable Audio releases reflected that. The models were interesting, but they were short, GPU-hungry, and not the kind of thing you would actually slot into a working session. Stable Audio 3, released on May 20, is a different proposition. It is the first time the company has shipped audio models that feel built for studio use rather than for a research demo.

The headline news is straightforward. Three of the four models in the family ship as open weights, the small variants run on a regular CPU, and the medium model writes coherent tracks past the six-minute mark. The catch is also straightforward. None of the models do vocals or lyrics. If that disqualifies them for your work, the release is a non-starter. If it doesn't, Stable Audio 3 is one of the more genuinely usable AI audio tools to land this year.

The four models, and where each one fits

Stable Audio 3 is a model family rather than a single model, and the differences between them matter. Small SFX and Small Music are both 459 million parameters, run on CPU alone, and generate up to two minutes of audio. Medium is a 1.4-billion-parameter GPU model that handles tracks beyond six minutes. Large is the 2.7-billion-parameter flagship and the only model that does not ship as open weights; it is available exclusively through Stability AI's API and self-hosting tier.
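To make those trade-offs concrete, here is a minimal Python sketch that encodes the four models as this review describes them and filters for what can run locally. The `AudioModel` type and `local_candidates` helper are hypothetical, not part of any Stability AI tooling. Medium's 360-second figure is a conservative stand-in, because the review only says it goes beyond six minutes; Large's length and hardware needs are not stated, so they are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioModel:
    name: str
    params_millions: int
    needs_gpu: Optional[bool]          # None = not stated in the review
    open_weights: bool                 # False = API / self-hosting tier only
    documented_seconds: Optional[int]  # length the review cites; None = not stated


# Figures from the review. Medium is capped at 360 s here only because the
# review says "beyond six minutes" without giving the real ceiling.
FAMILY = [
    AudioModel("Small SFX", 459, needs_gpu=False, open_weights=True, documented_seconds=120),
    AudioModel("Small Music", 459, needs_gpu=False, open_weights=True, documented_seconds=120),
    AudioModel("Medium", 1400, needs_gpu=True, open_weights=True, documented_seconds=360),
    AudioModel("Large", 2700, needs_gpu=None, open_weights=False, documented_seconds=None),
]


def local_candidates(duration_s: float, has_gpu: bool) -> list[str]:
    """Models that can run locally for a clip of `duration_s` seconds (conservative)."""
    picks = []
    for m in FAMILY:
        if not m.open_weights:
            continue  # Large is API / self-hosting only, so never a local pick
        if m.needs_gpu and not has_gpu:
            continue  # Medium is described as a GPU model
        if m.documented_seconds is None or duration_s > m.documented_seconds:
            continue  # only trust lengths the review actually cites
        picks.append(m.name)
    return picks


print(local_candidates(90, has_gpu=False))  # ['Small SFX', 'Small Music']
print(local_candidates(300, has_gpu=True))  # ['Medium']
```

The filter is deliberately conservative: a model is only suggested when the review gives a length that covers the request.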

Small SFX is the workhorse of the family for sound designers. It generates short sound effects, foley elements, and ambient textures fast enough to make rapid iteration practical. Small Music handles loops, beds, and short cues. Medium is where the family starts to feel like a real composition tool, with the headroom to build something with structure rather than a single mood. Large extends that headroom further, but it is also where the licensing complexity kicks in.

The two small models are the most quietly important part of this release. The fact that they run on CPU without a discrete GPU means a producer with a recent MacBook Pro or a modest desktop can generate music locally, with no API call, no cloud cost, and no waiting in a queue. That is the kind of accessibility shift that changes how a tool gets used day-to-day.

On-device generation, finally fast enough

Stable Audio 2.0 was capable but slow, especially on consumer hardware. Stable Audio 3 closes that gap. The Medium model generates audio in under two seconds on an H200 GPU, and runs in a few seconds on a MacBook Pro M4. The Small models, on CPU only, fall a little behind that pace but stay inside the kind of latency that makes iteration feel natural rather than punishing.
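Those figures do not say how much audio the two-second H200 run produces, so the more portable number to measure on your own machine is the real-time factor: generation time divided by the length of audio produced. A minimal sketch follows. `generate` is a hypothetical stand-in for whatever you actually call (a ComfyUI run, a local script, an API request), assumed to return the produced duration in seconds, and `fake_generate` is a placeholder so the example runs on its own.

```python
import statistics
import time
from typing import Callable

# Hypothetical signature: generate(prompt, seconds) -> seconds of audio produced.
Generator = Callable[[str, float], float]


def real_time_factor(generate: Generator, prompt: str, seconds: float, runs: int = 3) -> float:
    """Median of (wall-clock time / audio duration) over several runs.

    Below 1.0 means faster than real time. The median damps one-off
    outliers such as a first run that includes model loading.
    """
    factors = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt, seconds)
        elapsed = time.perf_counter() - start
        if produced <= 0:
            raise ValueError("generator reported no audio")
        factors.append(elapsed / produced)
    return statistics.median(factors)


def fake_generate(prompt: str, seconds: float) -> float:
    """Placeholder that pretends inference took 50 ms; swap in a real call."""
    time.sleep(0.05)
    return seconds


rtf = real_time_factor(fake_generate, "rain on a tin roof, distant thunder", 10.0)
print(f"real-time factor: {rtf:.4f}")
```

Measuring the same prompt at a few durations on each machine gives a like-for-like comparison between the Small models on CPU and Medium on a GPU.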

That speed unlocks an interactive workflow that previous Stable Audio releases didn't support. You can prompt, listen, tweak the description, regenerate, and decide in under a minute. For a sound designer browsing for a foley element or a producer auditioning ambient beds for a transition, the inference time stops being the bottleneck. The bottleneck moves back to the prompt and to how clearly you can describe what you actually want.

Licensed training data and what you own

This is the area where Stable Audio 3 most clearly distinguishes itself from the competition. Stability AI says all four models were trained exclusively on fully licensed data, and the company has pinned its commercial positioning to that point. Under the Community License, users own the audio they generate and can commercialize it freely. Organizations with more than one million dollars in annual revenue need to step up to the Enterprise License, which includes legal indemnification.

For producers and small studios who have watched the AI music lawsuits unfold over the last two years, that clarity is meaningful. Stable Audio 3 is being offered with an explicit, written promise that the data underneath it has been licensed rather than scraped, and that the outputs are yours to use. Whether that promise matters more than the limitations the model imposes elsewhere is a personal call, but it is on the table in a way it is not with most competing tools.

ComfyUI integration on day one

ComfyUI day-zero support is the integration that turns Stable Audio 3 from an interesting model release into a usable production tool. Stability AI partnered with the ComfyUI team to ship official workflow templates for the Small SFX, Small Music, and Medium models when the release went live. The templates require ComfyUI version 0.22.0 or later, and they expose the model's text prompt and duration parameters in a way that a producer who has never written a line of Python can still drive.
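For anyone who does want to script the templates, ComfyUI's local server accepts a workflow exported in its API format through a POST to the `/prompt` endpoint (port 8188 by default). The sketch below patches the prompt text and duration on a copy of such a workflow before it is queued. The node IDs, the `CLIPTextEncode` and `EmptyLatentAudio` class names, and the `text` and `seconds` input names are assumptions borrowed from earlier ComfyUI audio workflows; copy the real ones from your own export of the Stable Audio 3 template.

```python
import copy
import json
import urllib.request


def find_nodes(workflow: dict, class_type: str) -> list[str]:
    """IDs of nodes in an API-format workflow whose class_type matches."""
    return [nid for nid, node in workflow.items() if node.get("class_type") == class_type]


def set_node_inputs(workflow: dict, node_id: str, **inputs) -> dict:
    """Return a patched copy; the original workflow is left untouched."""
    if node_id not in workflow:
        raise KeyError(f"node {node_id!r} not in workflow")
    patched = copy.deepcopy(workflow)
    patched[node_id].setdefault("inputs", {}).update(inputs)
    return patched


def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """Send the workflow to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Stand-in for an exported template. Node IDs, class names, and input
# names are assumptions; replace them with the ones in your export.
template = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "11": {"class_type": "EmptyLatentAudio", "inputs": {"seconds": 30.0, "batch_size": 1}},
}

text_node = find_nodes(template, "CLIPTextEncode")[0]
length_node = find_nodes(template, "EmptyLatentAudio")[0]
job = set_node_inputs(template, text_node, text="creaking wooden door, close-miked")
job = set_node_inputs(job, length_node, seconds=8.0)
# queue_prompt(job)  # uncomment with ComfyUI running locally
```

In a real template with both a positive and a negative prompt node, patch by node ID rather than by class name, which is why the helper takes an ID.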

The practical value of that integration is the rest of the ComfyUI ecosystem. Once Stable Audio 3 is a node in your graph, you can chain it with conditional logic, batch generation, custom processing, and other ComfyUI nodes that were not built with audio in mind but happen to be useful for it anyway. The model becomes part of a graph rather than a standalone app, and the workflow becomes reproducible.
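The reproducibility point can be made concrete: if each job in a batch is a fully specified workflow with an explicit seed, a hash of that workflow identifies exactly what produced a given file. A minimal sketch, assuming a sampler node with a `seed` input; the `KSampler` class name, node ID `"3"`, and `steps` value are hypothetical placeholders for whatever your template actually contains.

```python
import copy
import hashlib
import json


def workflow_hash(workflow: dict) -> str:
    """SHA-256 of the workflow serialized as canonical JSON (sorted keys)."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seed_sweep(workflow: dict, sampler_id: str, seeds: list[int]) -> list[dict]:
    """One independent copy of the workflow per seed, each with its fingerprint."""
    jobs = []
    for seed in seeds:
        wf = copy.deepcopy(workflow)  # never mutate the template itself
        wf[sampler_id]["inputs"]["seed"] = seed
        jobs.append({"seed": seed, "hash": workflow_hash(wf), "workflow": wf})
    return jobs


# Hypothetical template fragment: node ID "3" and its inputs are placeholders.
template = {"3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 50}}}
batch = seed_sweep(template, "3", [101, 102, 103])
for job in batch:
    # Store this alongside each rendered file to know exactly what produced it.
    print(job["seed"], job["hash"][:12])
```

Sorting keys before hashing means two workflows that differ only in JSON key order get the same fingerprint.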

What it actually sounds like

The most consistently praised use cases from working producers are the ones that play to the model's strengths rather than fight them. Ambient drones, foley and SFX, instrumental loops, and atmospheric beds come out well. Many of those generations exhibit the kind of harmonic coherence that older Stable Audio releases struggled with, especially over longer durations. Drum-and-bass and other rhythmic genres respond reasonably to prompting, but the rhythmic surprise that defines the genre tends to flatten into the average — you get what you ask for rather than what you didn't know you wanted.

The recurring complaint is that the music can feel correct without feeling interesting. Song structure and progression are weaker than what the competition delivers at the full-track level, and seed variation is more limited than you might hope — the same prompt across multiple seeds often produces near-identical melodies with slightly different decorations. None of this is fatal, but it is the honest read on where the model sits today.
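One rough way to check the seed-variation complaint on your own prompts is to extract a feature vector per seed (for example a 12-bin chroma profile averaged over the clip, computed with an audio-analysis library of your choice) and compare the seeds pairwise. The sketch below only does the comparison. Cosine similarity of averaged chroma is an assumed proxy for melodic sameness, not a rigorous measure; a mean close to 1.0 suggests the seeds are mostly redecorating the same material.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must be the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero vector")
    return dot / norm


def mean_pairwise_similarity(features: dict[int, list[float]]) -> float:
    """Average cosine similarity over every pair of seeds (seed -> feature vector)."""
    pairs = list(combinations(features.values(), 2))
    if not pairs:
        raise ValueError("need at least two seeds")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

Comparing the same score against a handful of different prompts gives a baseline for how much similarity is normal for the model.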

The model also does not do vocals or lyrics. This is the single biggest editorial decision in the entire release. For producers who use AI tools to generate full songs with singing, Stable Audio 3 is not a competitor to the Suno or Udio family. For sound designers, instrumental producers, and anyone who treats AI audio as a sketching and source-material tool, the absence of vocals matters less.

LoRA fine-tuning and audio inpainting

The two features that most distinguish Stable Audio 3 from a straightforward generation model are LoRA fine-tuning and audio inpainting. LoRA, the lightweight fine-tuning technique that became standard in image generation, now has documented support for the Small and Medium models. That means a small studio can take the base model and adapt it to a particular library of source material (your own foley collection, your own ambient drone palette) without retraining from scratch.
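For readers new to LoRA, the idea is that a pretrained weight matrix W stays frozen while training learns two small matrices, B (d×r) and A (r×k), whose product is added as a low-rank update scaled by alpha/r. The plain-Python sketch below shows that generic formulation and the parameter savings for one layer. It is not Stability AI's training recipe, and the 1024×1024 layer and rank 8 are illustrative numbers only.

```python
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of x (n×m) and y (m×p)."""
    cols = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def lora_merge(W: Matrix, B: Matrix, A: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A) with B: d×r and A: r×k.

    During training W stays frozen and only A and B are updated; merging
    afterwards folds the adapter back into a single weight matrix.
    """
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * dv for w, dv in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """(full fine-tune, LoRA) trainable parameter counts for one d×k layer."""
    return d * k, r * (d + k)


full, lora = trainable_params(1024, 1024, 8)
print(full, lora, full // lora)  # 1048576 16384 64
```

At rank 8 the adapter trains 64 times fewer parameters than the layer it modifies, which is why adapting to a personal sound library is feasible without full retraining.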

Inpainting is the other quietly powerful feature. The model supports both single-segment and multi-segment edits, plus causal continuation, which means extending a generated track past its original endpoint without re-rolling from scratch. For producers building out longer compositions or trying to repair a section that almost worked, those tools are more useful than another round of full-track generation.
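One common way to frame masked inpainting and continuation is as a per-frame mask over the timeline: frames marked for regeneration are resampled while the rest are held fixed, and continuation is the special case where the masked region sits entirely after the original audio. The sketch below builds such masks. The review does not describe Stable Audio 3's actual inpainting interface, so the 1-means-regenerate convention and both function names are assumptions.

```python
def inpaint_mask(duration_s: float, rate_hz: float, segments: list[tuple[float, float]]) -> list[int]:
    """Frame mask over a clip: 1 = regenerate, 0 = keep the existing audio.

    `rate_hz` is frames per second of whatever representation is masked
    (audio samples or model latent frames). Several segments are allowed.
    """
    n = round(duration_s * rate_hz)
    mask = [0] * n
    for start_s, end_s in segments:
        if not 0 <= start_s < end_s <= duration_s:
            raise ValueError(f"bad segment {(start_s, end_s)}")
        for i in range(round(start_s * rate_hz), min(n, round(end_s * rate_hz))):
            mask[i] = 1
    return mask


def continuation_mask(duration_s: float, extra_s: float, rate_hz: float) -> list[int]:
    """Causal continuation: keep all of the original, generate only the new tail."""
    return [0] * round(duration_s * rate_hz) + [1] * round(extra_s * rate_hz)
```

A multi-segment repair is then just a list of ranges, for example `inpaint_mask(10, 10, [(2, 4), (7, 8)])` for two problem spots in a ten-second clip at a toy rate of 10 frames per second.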

Where it lands

Stable Audio 3 is the most directly useful audio release Stability AI has shipped. It runs on hardware most producers already own, it integrates with ComfyUI on day one, it ships with documented LoRA support and inpainting, and the licensing posture is clear. The trade-off the company has made is to focus the entire family on instrumental music and sound design, leaving vocal generation to other tools.

That trade-off is what makes the release worth taking seriously. Stability AI has resisted the temptation to compete directly with Suno and Udio on full-song generation and built something narrower and more useful for the specific producer who cares about generative SFX, ambient material, loops, and instrumental sketches. For that audience, this release is one of the most credible open-weight audio tools currently available.

Producers who want to pair generative material with traditional sample libraries should look at Loopcloud and Loopmasters, both of which still cover the ground Stable Audio 3 deliberately doesn't. For the rest of the AI-adjacent production toolset, Plugin Boutique continues to be the most reliable starting point.

For broader context on the AI-music legal landscape that frames how releases like this one get read, our piece on the Suno and Udio lawsuits and what they mean for producers in 2026 walks through the cases and settlements that are still shaping the field.

Stable Audio 3 Is Stability AI's Most Useful Audio Release Yet — With One Big Caveat


More in Software

Ethno World 7 Complete Review: One of the Best Kontakt Libraries Ever Made

Native Instruments SuperStarSaw: A.G. Cook's Supersaw Synth

Melatonin Sine Machine: 10,000 Oscillators, No Knobs

Roland ZENOLOGY GX Brings ZEN-Core to iPad, Free for Now