I returned from a trip out of town Monday to crashing web servers that ate my lunch all week long. For several days, I used the top command in Linux and watched helplessly as two servers ground to a halt with load averages higher than 100.

Top reports the processes that are taking up the most CPU, memory, and time. On the server running Workbench, the culprit was always httpd, the Apache web server. This didn't make sense, because Apache serves web pages, images, and other files with incredible efficiency. You have to hose things pretty badly to make Apache suck.

If you know the process ID of a server hog, Apache can tell you what that process is doing in its server status report, a feature that requires the mod_status module. The report for Apache's own web site shows what these reports look like.

Using this report, I found the culprit: A PHP script I wrote to receive trackback pings was loading the originating site before accepting the ping, which helps ensure it's legit:

// make sure the trackback ping's URL links back to us
$handle = fopen($url, "r");
$tb_page = '';
while (!feof($handle)) {
    $tb_page .= fread($handle, 8192);
}
fclose($handle);
$pos = strpos($tb_page, "http://www.cadenhead.org/workbench");
if ($pos === false) {
    $error_code = 1;
    send_response(1, "No link found to this site.");
    exit;
}

Most trackback pings are not legit -- I've received 600 from spammers in just the past three hours. Each ping required Apache to check the spammer's site, download a page if it existed, and look for a link to Workbench. A single process performing this task could occupy more than 50 percent of the CPU and run for a minute or more.
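One way to bound that cost -- a sketch of my own, not the script's actual code -- is to give each verification fetch a timeout and a size ceiling, so a slow or bloated spammer site can't pin an Apache process for a minute. The `fetch_capped()` name and the five-second/64 KB limits here are illustrative:

```php
<?php
// Sketch only, not the original script: bound how long and how much a
// single verification fetch can cost before scanning for the link.
function fetch_capped($url, $max_bytes = 65536, $timeout = 5) {
    $context = stream_context_create(array(
        "http" => array("timeout" => $timeout)  // give up on slow hosts
    ));
    $handle = @fopen($url, "r", false, $context);
    if ($handle === false) {
        return '';  // an unreachable source reads as "no link found"
    }
    $page = '';
    // stop reading once $max_bytes have arrived, even on a huge page
    while (!feof($handle) && strlen($page) < $max_bytes) {
        $page .= fread($handle, 8192);
    }
    fclose($handle);
    return $page;
}

// In the trackback handler, the fopen() loop would become:
//   $tb_page = fetch_capped($url);
//   $pos = strpos($tb_page, "http://www.cadenhead.org/workbench");
```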

I'm surprised Apache ran at all after I added trackback a couple months ago. I was beginning to think the web server software was idiot-proof, but I've proven otherwise.

-- Rogers Cadenhead

Comments

I was beginning to think the web server software was idiot-proof, but I've proven otherwise.

I'm confused. Why is it Apache's fault that you wrote code causing it to crash? A search through all the source of a page would be extremely processor intensive with hundreds of requests an hour.

Can I blame IIS if I wrote an infinite loop in VBScript?


 

Who said it was Apache's fault?


 

Ah, my apologies! I misread and took the quote (that I quoted) as meaning it was Apache's fault. Silly me glossed over:

I found the culprit: A PHP script

Keep up the good work!


 

> I was beginning to think the web server software was idiot-proof, but I've proven otherwise.

It's like that saying in Poker... If you've been playing for a while and you haven't spotted the chump, then you are the chump.


 

Just accept the trackbacks (returning a "202 Accepted") and queue them into a file or database table; then separately, via a cron job or some other process, run through the queue and purge the 99.999957281% of it that is spam.

I changed my trackback script to a PHP script that always replies "202 Accepted" but files all the data away to be processed later. Out of maybe 5,000 trackbacks I received last year, not a single one was valid.
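An illustrative sketch of that "accept now, verify later" handler -- the `queue_ping()` function and the queue-file path are mine, not the commenter's actual code:

```php
<?php
// Sketch of the deferred approach described above: record every ping
// instantly and let a cron job do the expensive verification later.
function queue_ping($queue_file, $url, $title) {
    // LOCK_EX keeps simultaneous pings from interleaving their lines
    $line = date("c") . "\t" . $url . "\t" . $title . "\n";
    file_put_contents($queue_file, $line, FILE_APPEND | LOCK_EX);
}

// The endpoint then answers in milliseconds instead of holding an
// Apache process while a spammer's site is fetched:
//   queue_ping("/var/spool/trackbacks.txt", $_POST["url"], $_POST["title"]);
//   header("HTTP/1.1 202 Accepted");
// A cron job later walks the queue, fetches each URL, and drops
// entries with no link back to the site.
```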


 

I'm having trouble figuring out how the script you posted was using up 50% of the CPU for up to a minute. It shouldn't be nearly that bad. What kind of hardware are we dealing with? The download of the page is essentially computationally free, and scanning one string for a single occurrence of another string is very efficient. I'm just not seeing why it would be running slowly.


 

This server is an Intel 2.4 GHz P4 Celeron with 1 GB of DDR 266 memory, running Red Hat Enterprise Linux 3.


 

I would bet it's the DNS lookup on the fopen() of the URL. DNS is typically thread-safe but not reentrant. Under a threaded server like Apache 2, each DNS call will block until the hostname has been resolved; if you get a slew of trackbacks at the same time, the following requests will stall waiting for the hostname to resolve. Assuming everything is working perfectly, this shouldn't be a problem, but if you're getting slammed by trackbacks that list fake web sites for the trackback URL, your DNS may hang just long enough on each request to cause the server to fall over.

This is pure speculation on my part -- I haven't stared at Apache or resolver source code in years.
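If that DNS theory holds, one cheap mitigation -- my sketch, nothing from the post -- is to resolve the ping's hostname before ever calling fopen() and reject pings whose host doesn't resolve. It leans on the fact that gethostbyname() returns its argument unchanged when the lookup fails:

```php
<?php
// Sketch assuming the DNS theory above: reject a ping early if its host
// never resolves, so fopen() doesn't stall on a fake URL.
// Caveat: an IP-literal URL "resolves to itself" and would also be
// rejected by this test.
function host_resolves($url) {
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === false || $host === null) {
        return false;  // not even a parseable URL with a host
    }
    // gethostbyname() returns the name unchanged on lookup failure
    return gethostbyname($host) !== $host;
}
```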


 

If it helps, I just crashed a server too by simply requesting 100 or so PHP files in just a few minutes via AJAX. It seems each of the PHP files had loops which ran a thousand or more times, causing the crash by overloading the machine.


 

It's not Apache's fault the server crashed? How preposterous! The software shouldn't crash. It should issue some form of useful error message and then carry on.


 

Post #9: I too just crashed a W2K3 server by running a PHP script that processes a lot of data. The crash dump shows that PHP is at fault. This is rather annoying.


 

The problem with the example code is that it does not check for errors returned by fopen().

If fopen() fails, it returns false instead of a valid handle, and the while loop will run forever, since feof($handle) will always be false.
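Following that diagnosis, the missing guard might look like this sketch, wrapped as a function for clarity; it assumes the send_response() helper from the original script exists:

```php
<?php
// Sketch of the missing check: a failed fopen() returns false, and the
// read loop below would otherwise spin forever on that failed handle.
function fetch_or_fail($url) {
    $handle = @fopen($url, "r");
    if ($handle === false) {
        // assumes the original script's send_response() helper
        send_response(1, "Could not fetch the source URL.");
        exit;
    }
    $page = '';
    while (!feof($handle)) {
        $page .= fread($handle, 8192);
    }
    fclose($handle);
    return $page;
}
```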