The Atlantic has published a searchable database identifying millions of songs that appear in AI training sets, spotlighting the extent to which commercial music has been swept into machine-learning research. Reporter Alex Reisner traced four datasets (two comprising 12 million and 9 million tracks, and two smaller sets exceeding 100,000 songs) that have been widely downloaded and, in some cases, used by researchers at companies including Google and Stability AI. Many of the collections are distributed as lists of links to YouTube or Spotify, with developers relying on automated tools to extract the audio. Such methods can violate platform terms and sidestep monetization mechanisms for artists. The database includes works by major performers such as Lady Gaga, Radiohead, Wu-Tang Clan and Bruce Springsteen, underscoring rising legal and licensing tensions as AI spreads through the music industry.
Related articles:
MusicLM: Generating Music From Text (audio examples)
Artificial Intelligence and Copyright (U.S. Copyright Office)
Artificial Intelligence Act (EU)