Reddit vs Perplexity AI: Lawsuit Over Data Scraping

November 12, 2025

On Oct. 22, 2025, Reddit filed a lawsuit in the Southern District of New York accusing Perplexity AI and three scraping intermediaries, Oxylabs UAB, AWMProxy and SerpApi (the “intermediaries”), of participating in an “industrial-scale” operation to harvest Reddit comments and use them for commercial purposes.

The complaint seeks damages and an injunction blocking further use of Reddit data, alleging that the defendants’ conduct amounts to unfair competition, unjust enrichment, and violations of the DMCA’s anti-circumvention provisions.

What Reddit Alleges

According to Reddit, the intermediaries deliberately circumvented Reddit’s anti-scraping and technical controls (e.g. registered user-identification limits, IP-rate limits, captcha bot protection, and anomaly-detection tools), then sold the harvested corpus to Perplexity, which used the material to power its “answer engine” (i.e. its chatbot).

The intermediaries allegedly used tools designed to bypass two levels of security: first, to evade Reddit’s own anti-scraping measures, and second, to circumvent Google’s controls by scraping Reddit content directly from Google’s search engine results.

Reddit effectively likens the defendants to would-be bank robbers who, knowing they cannot get into the vault, break into the armoured truck carrying the cash instead.

Legal Grounds

In legal terms, Reddit has claimed violations on the following counts:

DMCA – Circumvention of Technological Control Measures (17 U.S.C. § 1201(a)(1)(A))
DMCA – Trafficking of Technology, Product, Service, or Device for Use in Circumventing Technological Measure Controlling Access (17 U.S.C. § 1201(a)(2))
DMCA – Trafficking of Technology, Product, Service, or Device for Use in Circumventing Technological Measure Protecting Right of Copyright Owner (17 U.S.C. § 1201(b))
Unfair Competition
Unjust Enrichment

The “Robbery”

To support its claims, Reddit issued a subpoena to Google to learn how it prevents AI companies from scraping content from its Search Engine Results Pages (SERPs). Google explained that it uses a system called SearchGuard, which blocks automated systems from collecting large amounts of search results or indexed data, while still allowing normal human users to view Google’s search results (including those that feature Reddit content).

Reddit alleges that the intermediaries masked their identities, accessed content even when direct crawling was blocked, and laundered the data through search results so Perplexity could ingest it without a direct license.

Reddit also alleges that Perplexity continued to use Reddit content even after a cease-and-desist letter was issued in May 2024, which it claims is evidence that Perplexity purchased or otherwise relied on illicitly obtained copies instead of negotiating a license.

The “Trap”

Reddit specifically claims that it caught Perplexity red-handed by using the digital equivalent of marked bills (to extend the bank robbery analogy) to confirm that Perplexity was using Reddit data acquired by scraping Google SERPs.

In other words, Reddit posted content that was deliberately only reachable via Google search (content not directly crawlable from Reddit or anywhere else), then observed that Perplexity’s outputs reproduced that same content within hours.

Reddit characterizes this as evidence that Perplexity (or the intermediaries) scraped Google SERPs to reconstitute Reddit posts, effectively circumventing Reddit’s direct access controls. This “trap” is central to Reddit’s account of how scraping shifted from opportunistic crawling to a coordinated, third-party-driven data market.
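The “marked bills” idea can be illustrated with a minimal sketch. It is not Reddit’s actual method, which the description above does not detail. It assumes a unique marker string is planted in content exposed through one channel only, and that a third party’s output is later checked for that marker. The function names (make_canary, find_canaries) and the channel labels are hypothetical and chosen purely for illustration.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique, hard-to-guess marker string to embed in a test post."""
    return f"{prefix}-{secrets.token_hex(8)}"


def find_canaries(output_text: str, planted: dict[str, str]) -> list[str]:
    """Return the channels whose planted marker appears in the observed output."""
    return [channel for channel, marker in planted.items() if marker in output_text]


# Hypothetical setup: one marker per exposure channel.
planted = {
    "google_serp_only": make_canary(),  # content reachable only via search results
    "direct_crawl": make_canary(),      # content reachable only by crawling the site
}

# Simulated third-party output that reproduces the search-only marker.
observed = f"Some users say the fix is {planted['google_serp_only']} and it works."

hits = find_canaries(observed, planted)
print(hits)  # ['google_serp_only']
```

If a marker that was only ever exposed through one channel turns up in another party’s output, that channel is the likely path the content travelled. This is the inference Reddit draws from its test post.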

Comparison to Reddit’s Anthropic Case

Reddit’s earlier lawsuit against Anthropic, filed in June 2025, shares core themes with the Perplexity matter, but the Perplexity lawsuit differs in that it confronts not just an AI company but also the lesser-known services the AI industry relies on to acquire the online material needed to train AI chatbots.

The Anthropic complaint centres on breach of Reddit’s user agreement, trespass to chattels, unjust enrichment, tortious interference and unfair competition. It mainly argues that Anthropic’s bots repeatedly accessed Reddit after being told not to, and that Anthropic’s conduct therefore violated contractual protections owed to Reddit and its users.

While the Perplexity lawsuit also pleads unfair competition and unjust enrichment, it stresses the role of intermediaries and makes a stronger allegation of organized “data laundering” (i.e., that third parties systematically bypassed technical blocks and then sold the results).

Reddit’s Licensing Deals

In 2024, Reddit reached licensing arrangements with OpenAI and Google to provide structured access to Reddit content for product and model use. Those deals are evidence Reddit points to when framing Perplexity and Anthropic not as inadvertent users of public web content but as companies that could have sought a commercial license and did not. The existence of large licensing deals also sharpens Reddit’s claim that unauthorized scraping undermines a developing revenue stream tied to content licensing.

Practically speaking, the licensing deals give Reddit both standing to seek monetary relief and a narrative – “we license our data to some AI companies; others take it without paying.”

What Has Perplexity Said?

Perplexity responded on Reddit the same day that the lawsuit was filed, denying Reddit’s allegations and arguing that Reddit’s suit is less about legal infringement and more about strengthening its negotiating position with partners like Google and OpenAI.

The company emphasized that it does not train AI models on content and therefore doesn’t need or qualify for a data licensing deal. Instead, Perplexity says it merely summarizes and cites public Reddit threads, claiming its citation feature drives users back to Reddit and promotes transparency.

Bottom Line

Taken together, the Anthropic and Perplexity suits form a pattern: Reddit is asserting ownership and control over the material on its platform, using both litigation and licensing as tools to monetize and police that asset. The Perplexity case, however, adds a new angle: it targets the data-resale chain and seeks to make licensing the only permitted route to using Reddit content.

Authors: Shantanu Mukherjee, Varun Alase