I got an email from Media Temple (my hosting provider for CSS-Tricks) telling me that I was going to exceed my “GPU” limit for the month. Wha? Turns out a GPU is a “Grid Performance Unit,” a Media Temple-specific way of calculating how much of the server’s resources you are using. I think everything but database stuff is included in this. They provide a “GPU Tool” as part of their interface to show you which parts of your website are causing the most usage. I figured I’d better check it out to make sure everything was normal.
The top three usages for this site are the homepage, the RSS feed, and the Atom feed. Makes sense to me… but the fourth on the list stuck out. It was the URL “css-tricks.com/images/ajax-loader.gif”.
I don’t really use any AJAX on this site, save for a little simple jQuery stuff. My first thought was the poll as that has had some weird issues in the past and does indeed use a little spinner thing to show results. But that wasn’t it… different file name. What’s weird is that the file doesn’t exist, which is why it was throwing a 404 error when accessed.
Now when you have WordPress (or some other CMS) installed on your site, chances are you have a 404 page built into it, to provide a more user-friendly experience than a blank white browser-default error page. Check out my 404 page. OK, so it could be a bit more user-friendly, but that’s another story. The point is, it takes the server a heck of a lot more resources to cough up that whole page than it would to serve up a little GIF image. I found a spinning GIF image I had lying around with that name and threw it in my images directory.
- Total size of 404 page: 767 KB
- Total size of ajax-loader.gif: 4 KB
So where are these weird requests coming from? I’m 99% sure they are something random and possibly malicious, not legitimate requests coming from this site. What I do know is that I can save myself 763 KB of bandwidth/resources on every single request for this file if I just serve up the image and not the 404 page. When it’s getting hit over 16,000 times a month, that adds up!
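A quick back-of-the-envelope sketch of that saving, using only the figures quoted above. Treating 1 GB as 1,000,000 KB is an assumption made here for simplicity, so read the result as a rough estimate rather than a billing figure.

```python
# Figures quoted in the post.
PAGE_404_KB = 767            # total size of the custom 404 page
GIF_KB = 4                   # total size of ajax-loader.gif
REQUESTS_PER_MONTH = 16_000  # "over 16,000 times a month"

# Bandwidth saved each time the GIF is served instead of the 404 page.
saved_per_request_kb = PAGE_404_KB - GIF_KB

# Monthly saving; 1 GB = 1,000,000 KB is an assumption (decimal units).
saved_per_month_kb = saved_per_request_kb * REQUESTS_PER_MONTH
saved_per_month_gb = saved_per_month_kb / 1_000_000

print(f"Saved per request: {saved_per_request_kb} KB")
print(f"Saved per month:   about {saved_per_month_gb:.1f} GB")
```

That works out to roughly 12 GB a month for the /images URL alone.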
But this wasn’t the only URL that was getting hit with requests for this weird file! All kinds of URLs were getting hit with it, all css-tricks.com URLs. Mostly different random posts, like “css-tricks.com/random-post-title/images/ajax-loader.gif”. All added up, these random ones were even more stressful than the main one in /images.
I needed a catch-all solution to re-direct all these weird requests to get to the right place. As luck would have it, Jeff Starr over at Perishable Press had written a timely article about a similar problem he was having: Redirect All Requests for a Nonexistent File to the Actual File. His problem was originally for weird requests to non-existent favicon.ico files. I was seeing a bit of that as well, and also some robots.txt requests.
Time to stop all this madness and fix this!
With some .htaccess voodoo, I was able to get ANY request for this “ajax-loader.gif” file, as well as favicon.ico files, to redirect to their ACTUAL locations on the server and stop throwing bandwidth-sucking 404 errors.
Here is the final code:
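# REDIRECT FAVICON.ICO
<IfModule mod_rewrite.c>
# Make sure rewriting is on (harmless if it is already enabled elsewhere in the file)
RewriteEngine On
# Leave requests for the real file alone, so the redirect cannot loop
RewriteCond %{REQUEST_URI} !^/favicon\.ico [NC]
# Match any other URI that contains the file name
RewriteCond %{REQUEST_URI} favicon\.ico [NC]
RewriteRule (.*) https://css-tricks.com/favicon.ico [R=301,L]
</IfModule>
# REDIRECT AJAX-LOADER
<IfModule mod_rewrite.c>
RewriteEngine On
# Same pattern: skip the real location, redirect everything else
RewriteCond %{REQUEST_URI} !^/images/ajax\-loader\.gif [NC]
RewriteCond %{REQUEST_URI} ajax\-loader\.gif [NC]
RewriteRule (.*) https://css-tricks.com/images/ajax-loader.gif [R=301,L]
</IfModule>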
HUGE thanks to Jeff Starr for working with me on this and getting it working properly. You can see it in action by trying to request the file like this:
https://css-tricks.com/blahblahblah/ajax-loader.gif
You will get instantly re-directed to the proper location.
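For anyone who finds the two RewriteCond lines hard to read, here is a minimal Python sketch of the same decision logic. It is only an illustration of how the conditions combine, not how Apache works internally; the function name `redirect_target` is made up for this example.

```python
import re
from typing import Optional

CANONICAL_URL = "https://css-tricks.com/images/ajax-loader.gif"


def redirect_target(request_uri: str) -> Optional[str]:
    """Return the 301 target for a request URI, or None if no redirect applies.

    Hypothetical helper that mirrors the two RewriteCond lines for ajax-loader.gif.
    """
    # First condition (negated): the URI must NOT already start with the real path.
    # Without this check, the redirect would point back at itself forever.
    if re.search(r"^/images/ajax-loader\.gif", request_uri, re.IGNORECASE):
        return None
    # Second condition: the URI must contain the file name somewhere.
    if re.search(r"ajax-loader\.gif", request_uri, re.IGNORECASE):
        return CANONICAL_URL
    # Anything else is left alone.
    return None


print(redirect_target("/blahblahblah/ajax-loader.gif"))
print(redirect_target("/images/ajax-loader.gif"))
print(redirect_target("/some-post/"))
```

The first call yields the canonical URL, while the other two yield None: the real file is served directly and unrelated URLs are untouched.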
Note: If you are using WordPress and have special perma-links (like I do), there is a chunk of WordPress-specific stuff already in your root-level .htaccess file. This stuff needs to go BEFORE that to work right.
Did you find out why the requests were occurring to begin with?
Will this work on a Microsoft IIS server? Or only on Apache web servers?
Dave Samuels wrote in to help answer this question…
@Brian Lang: This would only be for Apache web servers. I did some quick research (AKA a Google search) and unfortunately there isn’t a direct equivalent to a .htaccess file in IIS. I’m not sure what kind of access you have to your hosting site, but there are some “plugins” that can mimic .htaccess settings in IIS. Good luck with your site.
Thanks for this post, Chris. I’ve been having some weird access requests on my Media Temple site as well, and this should help with that.
@Marcus, We host the website ourselves, so access to the server is no problem. I’ll try and find some of those “plugins”
I turned on a proxy and observed a request for ajax-loader.gif for your video articles, specifically #33 and #31 but not #31. Probably the same sort of thing for RSS feeds with embedded video.
That is, “not #32”.
Chris,
As of 11:40AM Pacific Time, I just clicked on the final link in the article, for the ajax-loader.gif, and got the loader image. It didn’t re-direct. Thought you should know.
@John S. – That’s the idea, that you actually GET the loader image, not the 404 page which is much more resource-intensive to serve up.
DOH! Ignore that last post! Apparently, I’m still asleep and didn’t really think this through. OF COURSE I got the loader… that was the whole point! DEE-DEE-DEE! I’ll crawl back into bed, now!
Another way to curb this, and avoid the .htaccess hit, is to simply put “blank” files in necessary spots (favicon.ico, for one).
@David: The problem is there are hundreds of posts, and they were all getting the weird requests. And with my permalink structure, those posts don’t actually have folders on my server, so I would be unable to actually place a blank file where the request wanted one.
That’s exactly the point with this technique. Rather than chasing down non-existent files at virtual locations and fiddling around with increasing numbers of meaningless “dummy” files, we use a simple rewrite rule to solve the problem in one fell swoop. Clean, simple, and effective.
It’s possible someone (or some people) have linked your image in an img tag or w/e from their own website, somewhere often used like a home page or every page or something.
It might be worth writing a PHP script to replace the image and log the referring URL to a text file? Unless your web host provides full Apache logs, of course!
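If full Apache logs are available, a small script can tally where the requests claim to come from. The sketch below uses Python instead of the PHP suggested above and assumes the standard Apache "combined" log format. The sample lines, addresses, and referrers are invented purely for illustration, and `referrers_for` is a hypothetical helper name.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referrers_for(lines, needle="ajax-loader.gif"):
    """Count referrers for requests whose path contains `needle` (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in match.group("path"):
            counts[match.group("referer")] += 1
    return counts


# Invented sample data; a real script would iterate over the access log file instead.
sample_log = [
    '192.0.2.10 - - [10/Oct/2008:13:55:36 -0700] "GET /some-post/images/ajax-loader.gif HTTP/1.1" 404 767 "http://example.com/page" "ExampleBot/1.0"',
    '192.0.2.11 - - [10/Oct/2008:13:56:01 -0700] "GET /other-post/images/ajax-loader.gif HTTP/1.1" 404 767 "-" "ExampleBot/1.0"',
    '192.0.2.12 - - [10/Oct/2008:13:57:12 -0700] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

counts = referrers_for(sample_log)
for referer, hits in counts.most_common():
    print(hits, referer)
```

A referrer of "-" means the client sent none, which is common for bots. The same regular expression also captures the remote host, so the tally could be keyed on that instead.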
But then again, instead of hacking the problem, u should just go and investigate why that error occurs so u can fix it in a normal way. Doing a rewrite will also cost unnecessary server usage.
@V1: I bet this is not an error in the way you mean. I think this is indeed a malicious spider thingie looking for ajax-enabled websites.
If an ajax-loader.gif image is found on a site, it is most likely to have some ajax going on… Then the owner of the spider can go and look for security holes in the ajax functionality…
Pretty interesting though… I haven’t seen nor heard anything like this happening elsewhere.
@V1: I am no expert, but in my experience Joost is correct in his conclusion: such behavior is intentional, malicious, and persistent. This conclusion comes after many hours of doing exactly what you say: investigation of the errors in question.
There are bad bots out there navigating directories looking for potential exploits. In this case, ajax functionality is the target. As Chris mentions in his article, serving up tens, hundreds, or even thousands of customized 404 pages in response to such requests requires far more server resources than a simple htaccess redirect.
If you happen to know of a “normal way” of responding to such attacks, please help out by sharing it with us so that we may forego “hacking the problem.”
@Jeff Starr: This is not a spider crawling your site looking for malicious things. The reason this is happening is that the ajax-loader file is being called via a relative link and not an absolute path. Try to find where it’s being loaded in a JavaScript or CSS file and change ‘images/ajax-loader’ to ‘/images/ajax-loader’.
I’d much rather see a 200 response than a 301 => 200.
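The difference between the two path styles can be shown with Python’s standard `urllib.parse.urljoin`, which resolves references against a base URL in much the same way a browser resolves links on a page. The page URL below, including its trailing slash, is an assumption chosen to match the permalink example from the article.

```python
from urllib.parse import urljoin

# Assumed permalink-style page URL (note the trailing slash).
page_url = "https://css-tricks.com/random-post-title/"

# A relative reference is resolved against the current "directory" of the page.
relative_result = urljoin(page_url, "images/ajax-loader.gif")

# A root-relative reference (leading slash) always resolves from the site root.
absolute_result = urljoin(page_url, "/images/ajax-loader.gif")

print(relative_result)  # https://css-tricks.com/random-post-title/images/ajax-loader.gif
print(absolute_result)  # https://css-tricks.com/images/ajax-loader.gif
```

The relative form produces exactly the kind of per-post URL seen in the usage report, while the leading slash pins the request to the one real file.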
@Mike: I don’t think so. First of all, the type of behavior that Chris describes is seen for many different files (e.g., robots.txt, favicon.ico, ajax-loader.gif, etc.) in many different locations. A majority of these 404 requests target URLs that only exist virtually, as created by Apache’s mod_rewrite functionality (i.e., permalinks). For example, many have seen entries such as these in their error logs:
http://domain.tld/some-great-post/robots.txt
http://domain.tld/another-great-post/robots.txt
http://domain.tld/yet-another-great-post/robots.txt
Not only do the robots.txt files not exist, but the directories themselves do not exist; these types of URLs are generated dynamically, based on database content. So, when you have a site such as this one that is employing some jQuery to create a nice sliding effect, it is doing so from a local directory that does exist, as seen in this hypothetical setup:
[site root]
    [css]
    [images]
    [javascript]
    index.php
    head.php
    .
    .
    .
Given this typical folder structure, the jQuery script that calls the ajax-loader.gif file would be located in the “javascript” directory. All calls for the file are made on the server from this location, not from random virtual “permalink” directories. The only occasion (that I can think of) where a permalink 404 requesting a non-existent file such as ajax-loader.gif would appear in the logs happens when a web page at some permalink URL is saved as an offline copy through the browser.
Thus, your solution may eliminate a few of the errors (the ones caused by saving offline copies of the pages), but the many others will persist because they are not of local origin. An easy way to verify the malicious nature of these errors is to examine the remote address associated with the requests.
Not to get off-topic, but there was a side comment about robots.txt not existing… imo, it should always exist!
You can use this file to restrict (well behaved) bots to only searching parts of your site that you want them to. Giving them full access to every file and every image is a HUGE waste of server resources.
I also set out “spider traps” for the not-so-well-behaved bots :) I set up a directory containing an index.php file with no links to it whatsoever and list it at the top of my disallow list. Any bad bot (or hacker looking at robots.txt to get a directory list) that sees the name and goes to that directory is redirected to a file that adds their IP to a list and informs them that they have been banned. Then my header (used on every page) calls a file that checks each user’s IP against that list and if found, redirects them to a file stating that they have been banned.
Gets rid of bad bots. Gets rid of hackers. Saves server resources. :)
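The trap described above can be sketched in a few lines. This is a simplified, in-memory Python illustration, not the commenter’s actual PHP setup: the trap path is a made-up name, the ban list is a plain set, not a file, and status codes stand in for the redirects to a “you have been banned” page.

```python
# Hypothetical directory listed under Disallow in robots.txt and linked from nowhere.
TRAP_PATH = "/secret-area/"


class SpiderTrap:
    """In-memory sketch of a robots.txt spider trap (illustrative only)."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.banned_ips = set()

    def handle(self, ip, path):
        """Return an (HTTP status, message) pair for a request."""
        # Anyone who requests the disallowed directory gets added to the ban list.
        if path.startswith(self.trap_path):
            self.banned_ips.add(ip)
        # Every request is checked against the list, like the shared header check.
        if ip in self.banned_ips:
            return 403, "You have been banned."
        return 200, "OK"


trap = SpiderTrap()
print(trap.handle("192.0.2.50", "/some-post/"))    # normal visitor
print(trap.handle("192.0.2.99", "/secret-area/"))  # bot ignores robots.txt
print(trap.handle("192.0.2.99", "/some-post/"))    # same bot, now banned everywhere
```

Well-behaved crawlers never see the trap because they honor the Disallow rule. A real version would need to persist the list and consider shared or changing IP addresses.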