For several years, I've been unable to find a suitable Web server log statistics program for this server, which hosts several dozen virtual domains for myself and a few friends and relatives.
The commercial options such as WebTrends and Wusage cost more than I want to pay for a server-wide solution. The open-source and free-beer programs I have found are either skimpy on stats or can't handle sites that get millions of hits a year.
I've decided to write my own program in Java, a project I'm naming Logfreak. The initial goal is to write an application and class library that can read logs in Apache common and combined log formats and store statistics in a database reached through JDBC or ODBC. Once that works, the stats can be retrieved for presentation on a JavaServer Pages or PHP front end.
The first thing I've learned is that I'll never deal with text again without using regular expressions. It isn't pretty to look at a pattern-matching expression like this:
^(.+)\s(.+)\s(.+)\s\[(.+)\]\s"(.+)"\s(.+)\s(.+)\s"(.*)"\s"(.*)"$
However, it pulls nine fields out of server log entries that look like this:
209.240.205.61 - - [11/Dec/2003:15:17:04 -0500] "GET /visit.php HTTP/1.1" 302 5 "http://www.uroulette.com/" "Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)"
When it does that without being hosed by goofy user-supplied referrer and user-agent strings, I can appreciate the beauty of such ugly syntax.
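Here is a minimal sketch of how a pattern like the one above might be applied with java.util.regex, written with a single \s between each pair of fields. The class name LogLineParser and its parse method are made up for illustration and are not Logfreak's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // The combined-log pattern, with backslashes and quotes escaped
    // for a Java string literal. It has nine capturing groups.
    private static final Pattern COMBINED = Pattern.compile(
        "^(.+)\\s(.+)\\s(.+)\\s\\[(.+)\\]\\s\"(.+)\"\\s(.+)\\s(.+)\\s\"(.*)\"\\s\"(.*)\"$");

    // Hypothetical helper: returns the captured fields in order
    // (host, ident, user, timestamp, request, status, bytes,
    // referrer, user agent), or null if the line does not match.
    public static String[] parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // group 0 is the whole match
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "209.240.205.61 - - [11/Dec/2003:15:17:04 -0500] "
            + "\"GET /visit.php HTTP/1.1\" 302 5 "
            + "\"http://www.uroulette.com/\" "
            + "\"Mozilla/3.0 WebTV/1.2 (compatible; MSIE 2.0)\"";
        String[] fields = parse(line);
        if (fields == null) {
            System.out.println("no match");
            return;
        }
        for (int i = 0; i < fields.length; i++) {
            System.out.println((i + 1) + ": " + fields[i]);
        }
    }
}
```

Because parse returns null for anything that doesn't fit the pattern, a caller can count and skip malformed lines instead of choking on them.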
