Millions of Songs Were Used to Train AI Music
The Atlantic published databases showing tens of millions of tracks — including Taylor Swift and Bad Bunny — were fed into AI music tools like Suno and Udio.
Evgenii Arsentev · PhD
Many millions of recorded songs, including work by artists ranging from Taylor Swift to Bad Bunny, were used to train AI music generators without the artists' permission, according to an investigation The Atlantic published on June 15. The magazine released four searchable databases documenting the material: the largest lists 12 million tracks, a second holds 9 million, and two more contain roughly 100,000 songs each. The tools fed by this data include Suno, Udio and models from Google.
The reporting, by staff writer Alex Reisner, is the music equivalent of the book-scraping investigations the same outlet ran earlier — except the scale here is larger and the catalog more recognizable. The databases don't just name obscure demos; they reach into the commercial back catalog that streaming services sell every day. Suno and Udio are already being sued by major record labels for copyright infringement, and both have leaned on a 'fair use' defense — the same argument that is being fought over in courtrooms across the AI industry.
Why this matters beyond the lawsuits
The legal stakes are not abstract. A parallel copyright fight in book publishing produced a $1.5 billion settlement involving Anthropic, which gives you a sense of the numbers labels will be chasing. But the part that touches ordinary listeners is quieter: AI tracks trained on real artists are starting to compete with those artists inside the same playlists you already use. Streaming platforms have noticed — Spotify is testing tools to help artists manage AI content, Deezer has built AI-music detection, and Apple Music flags AI tracks when distributors opt in — but those measures are early and uneven, and they haven't stopped bad actors from cloning a singer's style on demand.
What makes this investigation useful, rather than just alarming, is that it turns a vague worry into something you can check. Until now, an artist who suspected their catalog had been swallowed by a model had no way to confirm it; a searchable list changes that, and it's exactly the kind of evidence that moves a copyright case from 'we think' to 'here it is.' My one honest take: the music industry spent two years debating whether this was happening, and the more productive question now is on what terms it gets licensed — because the data is plainly already inside the models, and no settlement un-trains them.
If you make music, search the databases for your own name before you do anything else — concrete proof is worth more than outrage, and it's the thing labels and lawyers can act on. If you only listen, get in the habit of checking the artist label on a track before you add it to a playlist; the 'AI-generated' tags are imperfect, but they're the first practical line between a song a person wrote and one a model assembled from millions of stolen ones.
Author
Evgenii Arsentev
PhD · Chief Product Officer at a tech company
Source: engadget.com