Log Templater (Artificial Ignorance Utility)

During the last security incident that I worked on, I needed to grind through 20 GB of log files looking for any odd log lines that would indicate the point where the bad guys got in. If I had done it manually, I would still be looking at log data. Instead, I built a tool that converted logs into pattern templates and looked for templates that I had never seen before. This allowed me to zero in on just a few hundred log lines out of all the data.

Templater is a small and fast log processor that provides simple artificial ignorance capabilities. You use the tool to process past log data and store templates that represent normal log line structures. You then run the tool against current or target logs, and all normal patterns are automatically ignored. The parser is fast and capable of processing millions of lines per minute. For weblogs and firewall logs, I average 9M lines per minute on a 2GHz x86 machine running *NIX. The template strategy was originally proposed in 2003 by a friend of mine, who later built a tool called Never Before Seen (NBS), which also provides artificial ignorance for arbitrary text data as well as text structures.
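To make the idea concrete, here is a minimal Python sketch of the template approach. It is not the tool's actual code: the placeholder names (%i, %d), the regular expressions, and the function names to_template, learn, and unseen are all assumptions chosen for illustration. It replaces IPv4 addresses and integers with placeholders, stores the resulting templates from past logs, and then reports only the lines whose template has not been seen.

```python
import re

# Hypothetical field patterns; the real tool's template syntax may differ.
_IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
_INT = re.compile(r"\b\d+\b")


def to_template(line):
    """Replace variable-looking fields with placeholders, keep the structure."""
    line = _IPV4.sub("%i", line)  # addresses first, so their octets are not split
    line = _INT.sub("%d", line)   # then any remaining integers
    return line.strip()


def learn(lines):
    """Build the set of 'normal' templates from past log lines."""
    return {to_template(line) for line in lines}


def unseen(lines, known):
    """Return only the lines whose template is not in the known set."""
    return [line for line in lines if to_template(line) not in known]
```

In this sketch, words such as user names stay literal, so a login by a different user counts as a new template. A real implementation would need to decide which token classes to treat as variables.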

Log patterns

Hi,

For finding log patterns you can also have a look at SLCT or a distributed version of it that I've been working on, disco-slct.

This website also has a couple of other log tools: http://ristov.users.sourceforge.net/

Regards,
Jens

Thanks for posting this

Jens, thanks for posting this. I enjoyed the papers, and your algorithm is interesting. Did you do any testing on really large log sets? If so, was the increase in memory and processing time linear?

Ignorance

I took a closer look at your tool. The one issue I am unclear about is the precision. I think you are throwing away too much. What you are doing is encoding the pattern, the look of a log entry, and ignoring the values. So, if you have an SSH login, for example, you might deem that as okay. However, if someone tries a brute-force attack, you will ignore it, because the pattern is the same. Is that correct? What you will see is new types of log messages, ones that do not occur much. Am I missing something?

Correct

That is exactly what it does. It is not meant to be a detector for known suspicious patterns; it is meant to be a simple filter that makes few assumptions about the content or meaning of the log data.

I have an unreleased version of the utility that keeps track of both the template (structure) of each log line and its variables. It stores the variables in binary trees specific to each template, and it is not as clear or simple to use.

The latter tool could be used for filtering as well as for detecting known suspicious patterns and strings.
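As a rough illustration of tracking variables alongside templates, here is a small Python sketch. It is not the unreleased utility: it swaps the per-template binary trees for plain counters, and the names split_line, tally, and the %v placeholder are assumptions. It counts how often each value appears at each variable position of a template, which is enough to notice one source address repeating under an otherwise normal SSH failure template.

```python
import re
from collections import Counter, defaultdict

# Hypothetical definition of a "variable": an IPv4 address or an integer.
_FIELD = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b|\b\d+\b")


def split_line(line):
    """Return (template, variables) for one log line."""
    line = line.strip()
    variables = _FIELD.findall(line)
    template = _FIELD.sub("%v", line)
    return template, variables


def tally(lines):
    """Count values per variable position: counts[template][position][value]."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for line in lines:
        template, variables = split_line(line)
        for position, value in enumerate(variables):
            counts[template][position][value] += 1
    return counts
```

With counts like these, a threshold on any single value (for example, many failures from one address) could flag a brute-force attempt that the template-only filter would treat as normal.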

Pattern Creation

Would visualization be useful to find your 'normal' patterns? Or how do you go about finding them generally?

How to visualize log 'patterns'

I have been thinking about how the 'pattern' of log data could be visualized without taking the time to parse and normalize it. I have played a bit with extracting key attributes (fixed-time, line length, inter-line timing, Bayesian characteristics), and now the 'line structure', and then plotting numeric representations of them over time to build histograms of this non-key-value data.

I think visualization in this fashion could be another useful filter to separate the significant from the background noise.
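One of those attributes, line length, is easy to try without any parsing. The Python sketch below is only an illustration of the idea; the function names length_histogram and render and the bin width are assumptions. It buckets lines by length and draws a text histogram, so an unusual cluster of very long or very short lines stands out.

```python
from collections import Counter


def length_histogram(lines, bin_width=20):
    """Count lines per length bucket; keys are each bucket's lower bound."""
    hist = Counter()
    for line in lines:
        length = len(line.rstrip("\n"))
        hist[(length // bin_width) * bin_width] += 1
    return dict(sorted(hist.items()))


def render(hist, width=40):
    """Draw one row of '#' per bucket, scaled to the largest bucket."""
    if not hist:
        return ""
    peak = max(hist.values())
    rows = []
    for low, count in hist.items():
        bar = "#" * max(1, count * width // peak)
        rows.append(f"{low:>5} {bar} {count}")
    return "\n".join(rows)
```

Running this per hour or per day of log data and comparing the shapes would be one cheap way to see when the background changes.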