More than 170,000 YouTube videos are part of a massive dataset that was used to train AI systems for some of the biggest technology companies, according to an investigation by Proof News and copublished with Wired. Apple, Anthropic, Nvidia, and Salesforce are among the tech firms that used the “YouTube Subtitles” data that was ripped from the video platform without permission. The training dataset is a collection of subtitles taken from YouTube videos belonging to more than 48,000 channels — it does not include imagery from the videos.
...