The Apache HTTP server allows a system administrator to configure how
it should log requests. This is good in terms of flexibility, but it’s horrid
in terms of parsing: every installation can be different.
I was tasked with getting Apache logs into Graylog and discovered that $CUST
has different Apache log formats even between Apache instances which run on a
single machine. I certainly didn’t want to have to write extractors for all of those,
and I can’t imagine people here wanting to maintain those …
People have tried
submitting JSON directly from Apache, but I find that a bit cumbersome to
write, and I have the feeling it’s brittle: an unexpected brace in the request
(which ought to be possible) could render the JSON invalid.
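
A small sketch of the kind of breakage meant here. It assumes a hand-written JSON LogFormat and relies on the documented mod_log_config behaviour of writing non-printable request bytes as \xhh, which is not a legal JSON escape. The sample lines and field names are made up for illustration.

```python
import json

# Hypothetical line from a hand-written JSON LogFormat. The request
# contained a non-printable byte, which Apache logged as \x01.
json_line = r'{"method": "GET", "request": "GET /\x01 HTTP/1.1"}'

try:
    json.loads(json_line)
    json_ok = True
except json.JSONDecodeError:
    # \x is not a valid JSON escape sequence, so the whole record is lost.
    json_ok = False

# The same data as TAB-separated key=value pairs: split on TAB,
# then on the first "=" only. The escape stays as plain text.
kv_line = "method=GET\t" + r"request=GET /\x01 HTTP/1.1"
fields = dict(pair.split("=", 1) for pair in kv_line.split("\t"))

print(json_ok)
print(fields["request"])
```

The key=value line survives because nothing in it has to be valid according to a stricter grammar. Apache writes whitespace inside values in C-style notation, so a literal TAB should only ever appear as a separator.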
I settled on what I think is a much simpler and rather flexible format: a
TAB-separated (\t) list of key=value pairs configured like this in
httpd.conf:
The apache-logger program splits those up, adds fields required for GELF, and fires that off to a Graylog server configured with an appropriate GELF input.
Graylog effectively receives something like this (the geolocation having been added by apache-logger):
You’ll have noted that the LogFormat allows me to specify any number of fields (e.g. instance) and values.
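
The splitting step described above can be sketched roughly as follows. This is not the apache-logger source, just a minimal illustration under several assumptions: the LogFormat in the comment, the field names (`instance`, `clientaddr`, `status`, `request`), the host name `web01`, the choice of `request` as `short_message`, and the server address are all hypothetical. The geolocation lookup is left out.

```python
import json
import socket
import time

def parse_line(line):
    """Split one TAB-separated log line into a dict of key=value pairs."""
    fields = {}
    for pair in line.rstrip("\n").split("\t"):
        key, sep, value = pair.partition("=")
        if sep and key:          # skip anything that is not key=value
            fields[key] = value
    return fields

def to_gelf(fields, host):
    """Build a GELF 1.1 message; additional fields get a leading underscore."""
    msg = {
        "version": "1.1",
        "host": host,
        # Assumption: use the request line as the mandatory short_message.
        "short_message": fields.get("request", "-"),
        "timestamp": time.time(),
    }
    for key, value in fields.items():
        if key != "id":          # GELF does not allow an _id field
            msg["_" + key] = value
    return msg

def send_udp(msg, server="127.0.0.1", port=12201):
    """Send one uncompressed GELF message as a single UDP datagram."""
    payload = json.dumps(msg).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))

# Hypothetical LogFormat that could produce such a line:
#   LogFormat "instance=www1\tclientaddr=%h\tstatus=%>s\trequest=%r" gelf
line = "instance=www1\tclientaddr=192.0.2.10\tstatus=200\trequest=GET / HTTP/1.1\n"
gelf = to_gelf(parse_line(line), host="web01")
print(json.dumps(gelf, indent=2))
```

Because the parser does not know any field names in advance, adding another `key=value` pair to the LogFormat needs no change on the receiving side. Calling `send_udp(gelf)` would hand the message to a GELF UDP input, assumed here to listen on the default port 12201.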