Juzh wrote: ↑15 Mar 2025, 21:36
mwillems wrote: ↑15 Mar 2025, 16:46
Hopefully this can get fixed, it's crazy.
Is it new bots each time evading Robots.txt, or do they bypass its scope entirely?
robots.txt is merely a suggestion; it's entirely up to the bot whether to obey it or not. To be sure, you have to block their IPs.
Yeah, but I thought Amazon, Google and co. largely behaved unless they went through a link. And sure, you can block the IPs, but if new ones pop up you'll be exposed again.
https://github.com/ai-robots-txt/ai.rob ... metrics.md
ByteSpider is one of the few that doesn't respect it. I assumed it would be from China, and it is. Apparently it bombards sites with more requests than any other crawler and constantly changes IP. The only way to block it seems to be firewall rules, and even then there are reports that it uses fake user-agent names.
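For anyone wondering what that kind of blocking looks like in practice, here's a rough sketch of a server-side check that rejects a request by IP range or by user-agent string. Everything in it is made up for illustration: the `is_blocked` helper is hypothetical, the CIDR ranges are reserved documentation ranges rather than real crawler addresses, and the agent substring is just an example. A real setup would more likely do this in the firewall or web server config.

```python
import ipaddress

# Assumption: placeholder ranges from the reserved documentation blocks,
# NOT real crawler addresses. Replace with ranges seen in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

# Assumption: example substring only; matched case-insensitively.
BLOCKED_AGENT_SUBSTRINGS = ("bytespider",)


def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if the request should be rejected."""
    addr = ipaddress.ip_address(remote_ip)
    # IP check first: it still works when the user agent is faked.
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # User-agent check: trivially spoofable, so only a first filter.
    ua = user_agent.lower()
    return any(sub in ua for sub in BLOCKED_AGENT_SUBSTRINGS)


if __name__ == "__main__":
    print(is_blocked("192.0.2.15", "Mozilla/5.0"))
    print(is_blocked("203.0.113.9", "Mozilla/5.0 (compatible; Bytespider)"))
    print(is_blocked("203.0.113.9", "Mozilla/5.0"))
```

The limits are the ones already mentioned: the user-agent check fails as soon as the crawler lies about its name, and the IP list goes stale whenever new addresses show up.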
I had no idea how much of an issue this was.