
Apple, NVIDIA and Anthropic reportedly used YouTube transcripts without permission to train AI models

Some of the world’s largest tech companies trained their AI models on a dataset that included transcripts of more than 173,000 YouTube videos without permission, a new investigation from Proof News has found. The dataset, which was created by a nonprofit company called EleutherAI, contains transcripts of YouTube videos from more than 48,000 channels and was used by Apple, NVIDIA and Anthropic among other companies. The findings of the investigation spotlight AI’s uncomfortable truth: the technology is largely built on the backs of data siphoned from creators without their consent or compensation.

The dataset doesn’t include any videos or images from YouTube, but contains video transcripts from the platform’s biggest creators including Marques Brownlee and MrBeast, as well as large news publishers like The New York Times, the BBC, and ABC News. Subtitles from videos belonging to Engadget are also part of the dataset.

“Apple has sourced data for their AI from several companies,” Brownlee posted on X. “One of them scraped tons of data/transcripts from YouTube videos, including mine,” he added. “This is going to be an evolving problem for a long time.”

A Google spokesperson told Engadget that previous comments by YouTube CEO Neal Mohan, who said that companies using YouTube’s data to train AI models would violate the platform’s terms of service, still stand. Apple, NVIDIA, Anthropic and EleutherAI did not respond to a request for comment from Engadget.

So far, AI companies haven’t been transparent about the data used to train their models. Earlier this month, artists and photographers criticized Apple for failing to reveal the source of training data for Apple Intelligence, the company’s own spin on generative AI coming to millions of Apple devices this year.

YouTube, the world’s largest repository of videos, is a goldmine of not only transcripts but also audio, video, and images, making it an attractive dataset for training AI models. Earlier this year, OpenAI’s chief technology officer, Mira Murati, evaded questions from The Wall Street Journal about whether the company used YouTube videos to train Sora, OpenAI’s upcoming AI video generation tool. “I’m not going to go into the details of the data that was used, but it was publicly available or licensed data,” Murati said at the time. Alphabet CEO Sundar Pichai has also said that companies using data from YouTube to train their AI models would violate the platform’s terms of service.

If you want to see whether subtitles from your YouTube videos or from your favorite channels are part of the dataset, head over to Proof News’ lookup tool.

Update, July 16, 2024, 3:17 PM PT: This story has been updated to include a statement from Google.

This article originally appeared on Engadget at https://www.engadget.com/apple-nvidia-and-anthropic-reportedly-used-youtube-transcripts-without-permission-to-train-ai-models-170827317.html?src=rss





