Reddit sues Perplexity AI, alleging it illicitly scraped the platform's content to feed its AI.

Posted by qimuai on 2025-10-23 09:01 | Views: 181 | First-hand compilation

Source: https://www.theverge.com/news/804660/reddit-suing-perplexity-data-scrapers-ai-lawsuit

Summary:

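Social media platform Reddit has sued the AI company Perplexity and three data-scraping service providers, alleging that they unlawfully circumvented its data protections on an industrial scale to obtain copyrighted content from the platform.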

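In its complaint, Reddit says the data-scraping companies Oxylabs, SerpApi, and AWMProxy circumvented the platform's data protections on an "industrial scale", and it likens them to "would-be bank robbers". The complaint alleges that Perplexity is a customer of at least one of these companies and "will apparently do anything" to get the Reddit data that fuels its "answer engine", other than sign a licensing agreement with Reddit directly, as some of its competitors have done.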

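Notably, Reddit has already struck data-licensing deals with AI companies including OpenAI and Google. Ben Lee, Reddit's chief legal officer, said in a statement: "AI companies are locked in an arms race for quality human content, and that pressure has fueled an industrial-scale 'data laundering' economy." He added that the defendants mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search results.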

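The complaint describes a key piece of evidence: after Reddit sent Perplexity a cease-and-desist letter in May 2024 demanding that it stop scraping Reddit data, the volume of Reddit citations on Perplexity rose rather than fell. Reddit also ran a test: it created a post that only Google could crawl, and within hours Perplexity's "answer engine" produced the contents of that post.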

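Jesse Dwyer, Perplexity's head of communication, said the company had not yet received the lawsuit and that it "will always fight vigorously for users' rights to freely and fairly access public knowledge". The lawsuit highlights how the legal boundaries of data acquisition are becoming a new focal point of competition in the AI industry.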

Translation:

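According to the complaint, Reddit is suing the AI company Perplexity and three "data-scraping service providers" to "stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit."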

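Reddit sues Perplexity for allegedly ripping its content to feed AI
Reddit is selling users' posts to AI companies, and now it is suing to stop the ones that won't pay.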

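The complaint likens the three data-scraping companies, SerpApi, Oxylabs, and AWMProxy, to "would-be bank robbers" who, "knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead." Reddit alleges that Perplexity is a customer of "at least one" of the scraping companies, saying that it "will apparently do anything to get the Reddit data it desperately needs to fuel its 'answer engine'", that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.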

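According to the lawsuit, Reddit sent Perplexity a cease-and-desist letter in May 2024 demanding that it stop scraping Reddit data. Perplexity told Reddit at the time that it did not use Reddit content to train AI models and that it would respect Reddit's robots.txt, yet after that letter the volume of Reddit citations in Perplexity's answers increased rather than decreased. Reddit also created a post that only Google could crawl, and the company says that "within hours" Perplexity "produced the contents" of that post.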

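"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," Reddit writes in the complaint.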

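Reddit knows that its data, a vast body of posts on all sorts of topics written and ranked by its users, is highly valuable for training AI models. The API changes that sparked the 2023 protests were positioned as a way for the company to be compensated for that data. Reddit has struck licensing deals with AI companies including OpenAI and Google, and it reportedly wants better terms. Reddit has also previously taken legal action against Anthropic, alleging that Anthropic's bots accessed Reddit's platform even after Anthropic said they would not.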

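"AI companies are locked in an arms race for quality human content, and that pressure has fueled an industrial-scale 'data laundering' economy," Ben Lee, Reddit's chief legal officer, says in a statement. "Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it's one of the largest and most dynamic collections of human conversation ever created."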

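"Defendants Oxylabs UAB, AWM Proxy, and SerpAI (a Lithuanian data scraper, a former Russian botnet, and a company that openly advertises its shady circumvention tactics) are textbook examples of this illegal behavior," Lee says. "Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself."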

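"Perplexity has not yet received the lawsuit, but we will always fight vigorously for users' rights to freely and fairly access public knowledge," Jesse Dwyer, Perplexity's head of communication, tells The Verge. "Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest."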

English source:

Reddit is suing Perplexity and three “data-scraping service providers” to “stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit,” according to the complaint.
Reddit sues Perplexity for allegedly ripping its content to feed AI
Reddit is selling your posts to AI companies, and now it’s suing to stop the ones who won’t pay.
The company equates the data scraping companies — SerpApi, Oxylabs, and AWMProxy — to “would-be bank robbers” who “knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.” Reddit alleges that Perplexity is a customer of “at least one” of the data scraping companies, saying that it “will apparently do anything to get the Reddit data it desperately needs to fuel its ‘answer engine’ — that is, anything other than enter into an agreement with Reddit directly, as some of its competitors have done.”
According to the lawsuit, Reddit sent a cease-and-desist letter to Perplexity in May 2024 “demanding that it stop scraping Reddit data.” While Perplexity told Reddit at the time that it didn’t use Reddit content to train AI models and that it would respect Reddit’s robots.txt, after that letter, the volume of Reddit citations on Perplexity actually increased. Reddit also created a post that could only be crawled by Google, and “within hours,” Perplexity “produced the contents” of that post, the company says.
“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine,” Reddit writes.
Reddit’s data (posts on all sorts of topics written by and ranked by humans) is hugely helpful for training AI models, and the company knows it; the API changes that sparked the 2023 protests were positioned as a way for the company to be compensated for that data. Reddit has struck deals with AI companies including OpenAI and Google, and it reportedly wants better ones. And Reddit has previously taken legal action against Anthropic, alleging that Anthropic’s bots accessed Reddit’s platform even after Anthropic said they wouldn’t be doing that.
“AI companies are locked in an arms race for quality human content — and that pressure has fueled an industrial-scale ‘data laundering’ economy,” Ben Lee, Reddit’s chief legal officer, says in a statement. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material. Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.
“Defendants Oxylabs UAB, AWM Proxy, and SerpAI — a Lithuanian data scraper, a former Russian botnet, and a company that openly advertises its shady circumvention tactics — are textbook examples of this illegal behavior,” Lee says. “Unable to scrape Reddit directly, they mask their identities, hide their locations, and disguise their web scrapers to steal Reddit content from Google Search. Perplexity is a willing customer of at least one of these scrapers, choosing to buy stolen data rather than enter into a lawful agreement with Reddit itself.”
“Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge,” Jesse Dwyer, Perplexity’s head of communication, tells The Verge. “Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”