Stable Audio 2.0, Stability AI's audio generation model, now lets users upload their own audio samples and transform them with prompts into AI-generated songs. But the songs will not win any Grammys just yet.
The first version of Stable Audio, released in September 2023, offered only up to 90 seconds of generated audio to some paying users, enough for short sound clips to experiment with. Stable Audio 2.0 produces a full three-minute clip, the length of most radio-friendly songs. All uploaded audio must be copyright-free.
Unlike OpenAI's voice generation model Voice Engine, which is only available to a select group of users, Stability AI has made Stable Audio free and publicly available through its website and, soon, its API.
One big difference between Stable Audio 2.0 and its earlier iteration, Stability AI says, is the ability to create songs that sound like songs, complete with an intro, a progression, and an outro.
The company let me play a bit with Stable Audio to see how it works, and let's just say there is still a long way to go before I can channel my inner Beyoncé. With the prompt "folk pop song with American vibes" (I meant Americana, by the way), Stable Audio generated a song that, in some parts, does sound like it belongs in my Mountain Vibes Listening Wednesday Morning Spotify playlist. But it also added what I guess are vocals? Another Verge reporter says they sound like whale noises. I'm more worried I have accidentally summoned an entity into my home.
I theoretically could tweak the audio to make it more to my taste, as new features in Stable Audio 2.0 let users customize a project by adjusting prompt strength (that is, how closely the output should follow the prompt) and how much of any uploaded audio the model will modify. Users can also add sound effects like the roar of a crowd or keyboard taps.
Strange Gregorian whale noises aside, it's no surprise that AI-generated songs still feel soulless and weird. My colleague Wes Davis ruminated on this after listening to a song generated by Suno. Other companies, like Meta and Google, have also been dabbling in AI audio generation but have not released their models publicly while they gather feedback from developers to address the soulless sound problem.
Stability AI said in a press release that Stable Audio is trained on data from AudioSparx, which has a library of more than 800,000 audio files. Stability AI maintains that artists under AudioSparx were allowed to opt out of having their material used to train the model. Training on copyrighted audio was one of the reasons Stability AI's former vice president for audio, Ed Newton-Rex, left the company shortly after the launch of the first Stable Audio. For this version, Stability AI says it partnered with Audible Magic to use its content recognition technology to track and block copyrighted material from entering the platform.
Stable Audio 2.0 is better than its previous version at making songs sound like songs, but it's not quite there yet. If the model insists on adding some sort of vocals, maybe the next version will have more discernible language.