After striking deals with Google and OpenAI, Reddit CEO Steve Huffman is calling on Microsoft and others to pay if they want to continue scraping the site’s data.
“Without these agreements, we don’t have any say or knowledge of how our data is displayed and what it’s used for, which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used,” Huffman said in an interview this week. He specifically named Microsoft, Anthropic, and Perplexity for refusing to negotiate, saying it has been “a real pain in the ass to block these companies.”
Reddit has been escalating its fight against crawlers in recent months. At the beginning of July, it updated its robots.txt file to block web crawlers it doesn’t have agreements with. Then people began noticing that Reddit pages were showing up only in Google search results, where Reddit is paid for its data to be shown, and not in other search engines like Bing.
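For readers unfamiliar with how robots.txt works, here is a rough sketch of the check a well-behaved crawler runs before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt contents, the bot names, and the URL are all made up for illustration and are not Reddit's actual file or any real crawler's name. Keep in mind that robots.txt is a request, not an enforcement mechanism: it only works if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT Reddit's actual file): one named crawler
# is allowed everywhere, and every other crawler is blocked from the site.
ROBOTS_TXT = """\
User-agent: ExampleLicensedBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/r/some_forum/"  # placeholder URL
    for bot in ("ExampleLicensedBot", "OtherBot"):  # hypothetical bot names
        print(bot, "allowed:", is_allowed(bot, page))
```

In this sketch the named crawler gets through and any crawler without its own entry falls back to the wildcard rule, which blocks the whole site.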
Huffman said that Microsoft has been using Reddit’s data to train its AI and summarizing its content in Bing results “without telling us” and that Reddit’s data has also been sold through the Bing API to other search engines. In the interview, he referenced Microsoft AI CEO Mustafa Suleyman’s recent comment at a conference that public data on the internet is “freeware.”
“We’ve had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use,” Huffman said. “That’s their real position.”
In response to Reddit results recently disappearing from Bing, Microsoft’s head of search, Jordi Ribas, said on X that “Reddit has blocked Bing from crawling their site for search, favoring another search engine and impacting competition from Bing and Bing-powered engines.” Microsoft spokesperson Caitlin Roulston separately told The Verge last week that “we honor the directions provided by websites that do not want content on their pages to be used with our generative AI models.”
Huffman pointed to OpenAI’s recent announcement of SearchGPT, which will be able to show Reddit results thanks to a deal the two companies reached earlier this year, as the model he wants to replicate. None of the content licensing deals Reddit has struck to date include exclusive use cases for its data, according to spokesperson Tim Rathschmidt.
By calling for licensing deals, Reddit is joining more traditional media publishers (including The Verge’s parent company, Vox Media) in seeking payment for letting their content feed generative AI. “I think the traditional value exchange from search engines has changed,” said Huffman. “Search and summarization and training are merging, and the value exchange of crawling in exchange for traffic back is becoming muddied.”
After this story was published, Anthropic spokesperson Jennifer Martinez sent the following statement: “Reddit has been on our block list for web crawling since mid-May and we haven’t added any URLs from Reddit to our crawler since then. We respect robots.txt, the industry accepted signal for blocking web crawling.”
Microsoft declined to comment for this story. Perplexity didn’t respond to a request for comment.
Update, July 31: Added statement from Anthropic and noted that Microsoft declined to comment and Perplexity didn’t respond.