Perplexity, which positions its product as a “free AI-powered search engine”, has found itself at the center of a scandal. After Forbes accused the company of stealing its material and republishing it across various platforms, Wired reported that Perplexity ignores the robots.txt exclusion protocol and scrapes data without authorization from the websites of Wired and other publications owned by the Condé Nast media group. The technology site The Shortcut has made similar accusations.
Now, according to Reuters, Perplexity is not the only company ignoring robots.txt and crawling sites for content that is then used to train its technology. The agency cites a letter from TollBit, a startup that helps publishers strike licensing deals with AI companies. The letter states that “AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol to retrieve content from sites.”
Robots.txt is a simple but effective tool that lets site owners control how search crawlers index their sites. Compliance is voluntary, yet the protocol has been in use since 1994.
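To illustrate what "voluntary" means in practice, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before requesting a page, using Python's standard `urllib.robotparser` module. The robots.txt rules, the bot names `ExampleBot` and `OtherBot`, and the `example.com` URLs are all hypothetical and chosen only for illustration; they do not describe any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named crawler is disallowed everywhere on the site.
example_bot_article = parser.can_fetch("ExampleBot", "https://example.com/articles/1")

# Any other crawler may fetch articles but not the private section.
other_bot_article = parser.can_fetch("OtherBot", "https://example.com/articles/1")
other_bot_private = parser.can_fetch("OtherBot", "https://example.com/private/report")

print(example_bot_article, other_bot_article, other_bot_private)
```

Note that the check runs inside the crawler itself. Nothing on the server enforces the answer, so a crawler that skips the `can_fetch` call, or ignores its result, can still request the page.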
TollBit did not name specific companies. However, Business Insider reported that OpenAI and Anthropic, the creators of the ChatGPT and Claude chatbots respectively, also ignore robots.txt signals. Both developers had previously said that they comply with the “do not crawl” instructions specified in robots.txt files.
In its own investigation, Wired found that a machine on an Amazon server, “definitely controlled by Perplexity”, bypassed the robots.txt instructions on the publication’s website. To confirm that Perplexity was scraping its content, Wired gave the tool the headlines of its articles and brief descriptions of them. In response, the tool produced texts “strongly reminiscent” of Wired’s own and “almost without indication of authorship.”