The site has been getting hammered for six hours. I got an alert four hours ago but was remote and there was just no way I could get to it.
Anyway, I got home and looked at the logs, and there was a constant barrage of hits on the site, all from Singapore. It's behaving like what I would call a "site-rip", where someone tries to get a copy of every thread without having access to the database. Normally site-rips are easy to stop because they come from one or very few IPs, but after running some scripts on the logs I got this result:
Code:
[LTG ~]$ wc 2023-Spider-Attack-Sorted.txt
4004 4004 54358 2023-Spider-Attack-Sorted.txt
That output means that the file of IP addresses, which has been sorted and de-duped, contains 4004 unique IP addresses. That is an immense number of sources spread out over an entire /16 (65,536 possible addresses), and some searching shows that they're using a huge number of AWS (Amazon Web Services) cloud IPs.
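For anyone wanting to produce a similar list, here is a rough sketch of how it could be done. It is not necessarily the scripts used here. It assumes an access log where the client IP is the first whitespace-separated field (as in the common and combined log formats), and the function names `unique_ips` and `hits_per_slash16` are made up for illustration.

```shell
# Hypothetical helper: print the sorted, de-duplicated client IPs from a log.
# Assumes the client IP is the first whitespace-separated field on each line.
unique_ips() {
    awk '{ print $1 }' "$1" | sort -u
}

# Hypothetical helper: count requests per /16 (first two octets), busiest first.
# Useful for seeing whether the sources cluster in one network range.
hits_per_slash16() {
    awk '{ split($1, o, "."); print o[1] "." o[2] ".0.0/16" }' "$1" \
        | sort | uniq -c | sort -rn
}
```

Running `unique_ips access.log > 2023-Spider-Attack-Sorted.txt` (where `access.log` is a placeholder for the real log path) and then `wc` on the result would give a line count like the one above.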
It's probably nothing more than a crawl (the program doing it is called a spider, hence the file name), which is what a search engine does when it wants to be able to use your site for search results, but it was consuming a huge amount of bandwidth.
My server outputs an average of about 2.5 Mbps with typical peaks and valleys, so when it's pushing out a significant multiple of that, something is wrong. I have currently blocked it all, which is a very "sledgehammer" approach, but it stopped the abuse for now. Any users in Singapore might suffer, but for now everything should be stable.
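A narrower block is possible in principle. The sketch below is only one option and not what was done here: it reads a file with one IPv4 address per line and prints (without running them) `ipset` commands for every /24 that contains at least a given number of listed addresses. The function name `print_block_rules`, the set name `spider_block`, and the threshold are all assumptions for illustration.

```shell
# Hypothetical helper: $1 is a file with one IPv4 address per line,
# $2 is the minimum number of listed addresses a /24 must contain.
# Prints ipset commands only; nothing is applied to the firewall.
print_block_rules() {
    awk -v min="$2" -F. '
        { count[$1 "." $2 "." $3 ".0/24"]++ }   # tally addresses per /24
        END {
            for (net in count)
                if (count[net] >= min)
                    print "ipset add spider_block " net
        }
    ' "$1" | sort
}
```

To use the output, the set would first need to exist (`ipset create spider_block hash:net`) and be referenced by a firewall rule such as `iptables -I INPUT -m set --match-set spider_block src -j DROP`. The trade-off is that a threshold set too low blocks innocent neighbours, and one set too high lets part of the crawl through.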