
Search engines that don’t pay up can’t index Reddit content

When Reddit said last month that it would block unauthorized data scraping from its site, everyone’s (rightful) first reaction was “AI, AI, AI.” However, now that the change has taken effect, chatbot makers aren’t the only ones being locked out. The widely used forum also appears to be blocking all search engines other than Google, which reportedly inked a deal earlier this year with Reddit worth $60 million annually.

404 Media reported on Wednesday (and Engadget confirmed in our queries) that searching for Reddit results from the past week on rival engine Bing (using “site:reddit.com”) returns empty results. The publication reported that DuckDuckGo produced seven links without any descriptions, only providing the note, “We would like to show you a description here but the site won’t allow us.” The engine now appears to have removed even those, as our test only produced an empty page, reading, “no results found.”

Reddit said last month that it would update its Robots Exclusion Protocol file (robots.txt) to block automated data scraping. It’s now apparent that the change wasn’t only meant to thwart AI companies like Perplexity and its controversial “answer engine.” Currently, Google appears to be the only search engine allowed to crawl Reddit and produce results from “the front page of the internet.”

Ironically, part of the forum website’s robots.txt file reads, “Reddit believes in an open internet, but not the misuse of public content.” The file for Reddit now essentially says, “Do not scrape.” Apparently, it now considers search engines that don’t buy into exclusive deals to be misusing its content.

The ubiquitous robots.txt is the web standard that communicates which parts of a site can be crawled. Although many crawlers are known to ignore its instructions, Google’s standard procedure is to respect it. So, on the technical side, the companies in cahoots on the lucrative deal appear to have deployed some manual override.
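For readers who haven’t looked inside one, here is a minimal sketch of how a well-behaved crawler might interpret a robots.txt file, using Python’s standard `urllib.robotparser` module. The two file bodies below are hypothetical illustrations, not Reddit’s actual file, and the crawler names `ExampleBot` and `OtherBot` and the `example.com` URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies for illustration only (not Reddit's real file).
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Hypothetical variant that names one crawler as an exception.
ONE_EXCEPTION = """\
User-agent: ExampleBot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/technology/"  # placeholder URL
    for label, body in (("blanket block", BLANKET_BLOCK),
                        ("one exception", ONE_EXCEPTION)):
        for agent in ("ExampleBot", "OtherBot"):
            print(f"{label}: {agent} allowed = {allowed(body, agent, url)}")
```

Under the blanket rule, every crawler that honors the file is told to stay out, whatever its name. An exception only exists if the file names one, as in the second hypothetical body. Either way, robots.txt is a request rather than an enforcement mechanism: it only works on crawlers that choose to obey it.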

Of course, the saga is a trickle-down effect of AI chatbots scraping the live web for results. With courts slow to determine how much of the open web is fair use to train chatbots on, companies like Reddit, whose bottom lines now depend on safeguarding their data from those who don’t pay, are building walls at the expense of the open web. (Although, given the integral role Microsoft has played in this AI era, cozying up with OpenAI early on, it seems ironic that Bing finds itself on the losing end of at least one aspect of the fallout.)

Colin Hayhurst, CEO of lesser-known “no-tracking” search engine Mojeek, told 404 Media that Reddit is “killing everything for search but Google.” In addition, the executive said his attempts to contact Reddit were ignored. “It’s never happened to us before,” he said. “Because this happens to us, we get blocked, usually because of ignorance or stupidity or whatever, and when we contact the site you certainly can get that resolved, but we’ve never had no reply from anybody before.”

Engadget asked Google and Reddit for comment and confirmation, but we hadn’t heard back by publication. 404 Media reported running into a similar wall of silence from the companies.

Reddit has made no secret of its desire to block AI companies from scraping its treasure trove of data in this burgeoning age of AI. Last year, CEO Steve Huffman risked alienating large portions of Reddit’s user base by blocking third-party API requests, leading to the demise of beloved apps like Christian Selig’s Apollo. Despite widespread protests among moderators and forum-goers, the company lost only a negligible number of users, and only temporarily.

The gamble appeared to pay off, and Reddit recovered. It went public in March.

This article originally appeared on Engadget at https://www.engadget.com/search-engines-that-dont-pay-up-cant-index-reddit-content-172949170.html?src=rss





