mod_log_sql and old logs

I’ve converted one of our Linux Apache webservers over to logging with mod_log_sql and adapted some previously existing php code to display the data.

While this works great for data after the conversion, I’ve still got several megabytes of older logfiles that I’d like to load into the same database. I figured someone else had done this before, but all my googling lead to lots of code snippets, but nothing exactly like I wanted. So, perl to the rescue!

I found several good bits of code, and hacked something together.

#!/usr/bin/perl
#
use DBI;
use Time::Local;
my $dsn = 'DBI:mysql:apachelogs:localhost';
my $db_username = 'user';
my $db_password = 'password';
my $dbh = DBI->connect($dsn, $db_username, $db_password);
%MONTHS=(
  'Jan','00','Feb','01','Mar','02','Apr','03','May','04','Jun','05',
  'Jul','06','Aug','07','Sep','08','Oct','09','Nov','10','Dec','11'
  );
$LOGFILE = "access.log";
open(LOGFILE) or die("Could not open log file.");
while (<logfile>) {
        $count++;
        chomp;
        s/s+/ /go;
        ( $l_clientAddress
        , $l_rfc1413
        , $l_username
        , $l_localTime
        , $l_httpRequest
        , $l_statusCode
        , $l_bytesSentToClient
        , $l_referer
        , $l_clientSoftware
        )
        = /^(S+) (S+) (S+) [(.+)] "(.+)" (S+) (S+) "(.*)" "(.*)"/o;
        if ( $l_bytesSentToClient =~ /D/) { $l_bytesSentToClient = 0; }
        # get all the time info
        ($date,$month,$year,$hour,$min,$sec) = ($l_localTime =~ m|(..)/(...)/(....):(..):(..):(..)|o);
        $month = $MONTHS{$month};
        $year = $year - 1900;
        ($method, $uri, $protocol) = split(" ", $l_httpRequest);
        $time = timelocal($sec,$min,$hour,$date,$month,$year);
        # do line-by-line processing.
        #print "$l_clientAddress, $l_rfc1413, $l_username, $l_localTime, $method, $uri, $protocol, $l_statusCode, $l_bytesSentToClient, $l_clientSoftwaren";
        #print "$l_localTime, $date,$month,$year,$hour,$min,$sec,  $timen";
        #insert into table for host
        $sql = qq(insert into access_host (agent, bytes_sent, machine_id, request_file, referer, remote_host,
        request_line, request_method, request_protocol, request_time, request_uri, server_port, status,
        time_stamp, virtual_host) values('$l_clientSoftware', '$l_bytesSentToClient', 'hostname', '/var/www/htdocs$uri', '$l_referer', '$l_clientAddress',
        '$l_httpRequest', '$method', '$protocol', '$l_localTime', '$uri', 80, $l_statusCode, $time, 'hostname')
        );
        #print "$month $countn";
        $dbh->do($sql);
}
close(LOGFILE);
$dbh->disconnect();
print "Total lines inserted: $countn";

(Sorry about the crap-tac-u-lar code display, between WordPress’s insertion of <p> and my clunky stylesheet, it just looks bad.)