Open projects face a parasitic load from AI crawlers
The problems stem from the fact that these crawlers behave aggressively, fetch content in many parallel streams, and ignore the access rules that sites publish in their robots.txt files. The situation is aggravated by the sheer number of companies around the world working on machine learning, each trying to collect as much data as its resources allow. Every company runs its own crawler, and together they impose an enormous parasitic load on infrastructure.
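For context, a well-behaved crawler is expected to consult robots.txt before fetching anything; the sketch below shows that check using Python's standard urllib.robotparser (the site URL, agent name, and paths are illustrative, not taken from the article):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site and crawler name, for illustration only.
SITE = "https://example.org"
AGENT = "ExampleAIBot"

rp = RobotFileParser()
rp.set_url(SITE + "/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

# A polite crawler checks every URL before requesting it; the
# aggressive AI crawlers described above skip this step entirely.
for path in ("/", "/some-repo/blame/main/file.c"):
    allowed = rp.can_fetch(AGENT, SITE + path)
    print(f"{path}: {'allowed' if allowed else 'disallowed'}")

# robots.txt can also advertise a crawl delay, which well-behaved
# bots honor between requests (None if the site sets no delay).
print("crawl-delay:", rp.crawl_delay(AGENT))
```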
After such traffic started being blocked, some crawlers began masquerading as ordinary browsers to bypass filtering by the User-Agent header, and began using distributed networks spanning a large number of hosts to get around limits on the request rate from any single IP address. The infrastructure of open projects such as Git hosting services, forums, and wikis, which was never designed to handle heavy load, suffers especially from this crawler activity.
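The per-IP limits that such distributed networks defeat are typically implemented as a sliding window or token bucket; a minimal sliding-window sketch in Python (the window size and threshold are made-up values):

```python
import time
from collections import defaultdict, deque

WINDOW = 60.0      # seconds per window (illustrative value)
MAX_REQUESTS = 30  # allowed requests per IP per window (illustrative value)

_history = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip: str) -> bool:
    """Sliding-window limiter: at most MAX_REQUESTS per WINDOW per IP."""
    now = time.monotonic()
    q = _history[ip]
    # Drop timestamps that have fallen outside the window.
    while q and now - q[0] > WINDOW:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False  # over the limit: reject, delay, or tarpit
    q.append(now)
    return True
```

A crawler spread across thousands of addresses stays under the threshold on every individual IP, so each host passes this check even though the combined load is crushing, which is exactly the evasion described above.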
Problems have hit the SourceHut collaborative development platform, created by Drew DeVault, author of the Sway user environment. Drew complains that once again, instead of working on the platform itself, he has to spend most of his time digging out of unexpected problems. Four years ago the issue for SourceHut was abuse of its CI infrastructure for cryptocurrency mining. Two years ago he had to cope with a flood of "git clone" requests generated by the Go Module Mirror service. Last year the platform was knocked offline for a week by DDoS attacks. Now a new misfortune has arrived: AI crawlers.
According to Drew, work on several priority tasks has been postponed for weeks or even months because SourceHut's developers are constantly distracted by blocking AI bots. To avoid outages, the blocking rules have to be revised several times a day. To reduce the number of requests reaching resource-intensive handlers, SourceHut introduced
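Whatever the specific measures were, blocking rules of the kind that must be revised several times a day are commonly maintained as a user-agent denylist in front of expensive endpoints; a hypothetical sketch (the agent strings and function are examples, not SourceHut's actual rules):

```python
# Hypothetical user-agent denylist: the sort of rule set that needs
# frequent review as new crawlers appear or rename themselves.
BLOCKED_AGENT_SUBSTRINGS = [
    "gptbot",
    "ccbot",
    "bytespider",
]

def is_blocked(user_agent: str) -> bool:
    """Return True if the request's User-Agent matches the denylist."""
    ua = user_agent.lower()
    return any(s in ua for s in BLOCKED_AGENT_SUBSTRINGS)
```

As noted above, crawlers that spoof mainstream browser user agents slip past exactly this kind of filter, which is why such rules demand constant maintenance.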