Perplexity, which offers an AI search product that it calls an “answer engine,” is a buzzy AI startup embroiled in scandal following accusations that it rips off content, doesn’t respect robots.txt files, and even plagiarizes articles.
The company, which has already received funding from the likes of Jeff Bezos and is in talks to raise hundreds of millions of dollars more, advertises on its website that “every answer” is “backed by citations from trusted news outlets, academic papers, and established blogs.”
However, plagiarism and paywall problems have made Perplexity a lightning rod for media industry frustrations as it tries to overtake Google in the race to define the future of internet search.
Here’s our coverage of the ongoing developments.
Perplexity’s grand theft AI
In every hype cycle, certain patterns of deceit emerge. In the last crypto boom, it was “ponzinomics” and “rug pulls.” In self-driving cars, it was “just five years away!” In AI, it’s seeing just how much unethical shit you can get away with.
Perplexity, which is in ongoing talks to raise hundreds of millions of dollars, is trying to create a Google Search competitor. Perplexity isn’t trying to create a “search engine,” though — it wants to create an “answer engine.” The idea is that instead of combing through a bunch of results to answer your own question with a primary source, you’ll simply get an answer Perplexity has found for you. “Factfulness and accuracy is what we care about,” Perplexity CEO Aravind Srinivas told The Verge.
AI is eating its own tail, Perplexity edition.
Uh oh!
In multiple scenarios, Perplexity relied on AI-generated blog posts, among other seemingly authentic sources, to provide health information. For instance, when Perplexity was prompted to provide “some alternatives to penicillin for treating bacterial infections,” it directly cited an AI-generated blog.
Reddit escalates its fight against AI bots
In the coming weeks, Reddit will start blocking most automated bots from accessing its public data. You’ll need to make a licensing deal, like Google and OpenAI have done, to use Reddit content for model training and other commercial purposes.
While this has technically been Reddit’s policy already, the company is now enforcing it by updating its robots.txt file, a long-standing web convention that tells web crawlers which parts of a site they may access. “It’s a signal to those who don’t have an agreement with us that they shouldn’t be accessing Reddit data,” the company’s chief legal officer, Ben Lee, tells me. “It’s also a signal to bad actors that the word ‘allow’ in robots.txt doesn’t mean, and has never meant, that they can use the data however they want.”
Perplexity continues to piss off publishers.
Wired and Robb Knight, a developer at MacStories, found that the AI search engine seems to ignore requests not to scrape their websites. They both blocked Perplexity in their robots.txt file — a standard instruction document for web crawlers — and found that Perplexity still managed to access their content. They’re not the only ones annoyed.
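Nothing in robots.txt itself enforces these rules: a crawler has to choose to read the file and obey it. The minimal Python sketch below, built on the standard library's `urllib.robotparser`, shows the check a compliant crawler would run before fetching a page. The robots.txt contents, the `example.com` URL, and the `is_allowed` helper are hypothetical, and the `PerplexityBot` user-agent string is used for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: blocks a crawler that identifies
# itself as "PerplexityBot" and allows every other user agent.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical helper: return True if a crawler that honors robots.txt may fetch url."""
    parser = RobotFileParser()
    # Parse the rules from a string instead of downloading them from the site.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-article"
    print(is_allowed("PerplexityBot", url))  # False
    print(is_allowed("SomeOtherBot", url))   # True
```

Both results depend entirely on the crawler calling a check like this. A client that skips it can still request the page, which is why publishers who blocked Perplexity in robots.txt could find it accessing their content anyway.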
Perplexity will research and write reports
AI search platform Perplexity is launching a new feature called Pages that will generate a customizable webpage based on user prompts. The new feature feels like a one-stop shop for making a school report, since Perplexity does the research and writing for you.
Pages taps Perplexity’s AI search models to find information and then creates what I can loosely call a research presentation that can be published and shared with others. In a blog post, Perplexity says it designed Pages to help educators, researchers, and “hobbyists” share their knowledge.