As I mentioned in Complete Blog Remake, Part 2, there are lots of evil bots out there. They are relentless in their automated search for known exploits, and a lot of those target WordPress installations and plugins. Most of them come in over the normal HTTP protocol, trying to find URLs that are routed to some badly written, exploitable PHP code. In my logs, I find thousands of calls to /wp-content/uploads/locate.php and other paths where current or older plugin versions expose known SQL injection or script injection vulnerabilities.
Because of how my routing is built, all of these requests are interpreted as possible article titles and sent to the Single(string postname) method, which searches for an article with that weird name, doesn't find it, and responds with a 404 page. Azure logs each of these requests, and when many bots are active (or just one unusually intense one), Azure alerts me that there have been many client errors in a short time period.
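For context, the catch-all route that funnels everything into Single could look something like the sketch below. This is an assumption for illustration only; the controller and route names ("Blog", "BlogPost") are mine, not necessarily the ones in my actual code:

```csharp
// Hypothetical sketch of a catch-all ASP.NET MVC route: any unmatched
// single-segment URL is treated as a post name and sent to Single.
// The names "BlogPost" and "Blog" are assumed for this example.
routes.MapRoute(
    name: "BlogPost",
    url: "{postname}",
    defaults: new { controller = "Blog", action = "Single" });
```

With a route like this, /wp-content/uploads/locate.php never matches a real article, so it ends up in the 404 handling described below.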
In the beginning, I used these logs to double-check that I hadn't missed any incoming links, but with the huge number of bots out there, the requests I'm really interested in get drowned out by all the noise.
Building the naughty list
Some requests could be people or crawlers (Google, Yahoo, Baidu, ...) just doing their job, following links that may or may not lead somewhere, so I don't want to blindly and automatically block the IP address of everyone who mistypes a URL or follows a misspelled link. But if there are a few bad requests from the same IP address (say eight in 24 hours), I will block it.
Other requests are just blatant attempts at finding exploits. I block the IP address behind those calls instantly. The Single method makes use of the PageNotFound method of the base class, so the result is really straightforward:
public ActionResult Single(string postname)
{
    if (postname.StartsWith("xmlrpc.php") ||
        ... /* Edited out: more checks for known exploit URLs */)
    {
        // Blatant exploit attempt: treat it as a confident suspicion.
        return PageNotFound(403);
    }

    /* Edited out: Code that searches for the requested article */

    if (article == null)
        return PageNotFound();

    /* Edited out: Code that renders the article */
}
The PageNotFound method of the base class isn't too complicated either. It calls the ApplicationData class, which handles the list of suspicious or blocked IP addresses:
public ActionResult PageNotFound(int statusCode = 404)
{
    // A 403 from the caller means the suspicion is already a confident one.
    if (applicationData.SuspectUserAddress(Request.UserHostAddress, statusCode == 403))
        return new HttpStatusCodeResult(403);

    /* Edited out: Code that gives a nice 404 page */
}
And here, finally, is the code that keeps track of suspicious IP addresses:
internal bool SuspectUserAddress(string address, bool confidentSuspicion)
{
    // Is this address already blocked? Just return true.
    if (BlockedAddresses.Contains(address)) return true;

    // If I'm not sure yet, check some more rules
    if (!confidentSuspicion)
    {
        // How many times has this address acted suspiciously already?
        int count = SuspiciousRequestAddresses.Count(sra => sra == address);

        // Do a reverse DNS lookup. Is it NOT a known nice crawler?
        // Then this suspicion is a confident one!
        if (count >= 5 && !IsNiceCrawler(address))
            confidentSuspicion = true;
    }

    // Are we sure now?
    if (confidentSuspicion)
    {
        // Remove from list of suspicious requests
        SuspiciousRequestAddresses.RemoveWhere(sra => sra == address);
        // Add to list of blocked addresses
        BlockedAddresses.Add(address);
        return true;
    }

    // We are not sure... That means this request should be stored as a suspicious one
    SuspiciousRequestAddresses.Add(address);
    return false;
}
private bool IsNiceCrawler(string address)
{
    var parsed = IPAddress.Parse(address);
    var hostInfo = Dns.GetHostEntry(parsed);

    // Something like (google.com$)|(googlebot.com$)|(msn.com$)|(crawl.baidu.com$)
    string validationRegex = ConfigurationManager.AppSettings["NiceCrawlersRegex"];

    // Check all of hostInfo's aliases for one that matches the regex
    bool isNice = hostInfo.Aliases.Any(
        alias => Regex.IsMatch(alias, validationRegex, RegexOptions.IgnoreCase));

    return isNice;
}
After doing this, the number of 404s went down a lot, but the number of 403s started rising. I checked a few times to verify that the blocked requests really are exploit attempts, and I feel comfortable with this solution. I also changed my Azure alerts to separate the different 4xx responses: I still want unhandled 404s to generate an alert so that I can fix broken links. This works really well for me.
Complete Blog Remake, Part 1
Complete Blog Remake, Part 2
403s for the Naughty List (this part)