No correlation without normalisation

The SANS Institute recently released a whitepaper entitled SANS Log Management Survey: Mid-Sized Businesses Respond, based on its annual log management survey but sliced to focus on businesses with fewer than 2,000 employees.

In the early years of the survey, collection of log data was identified as the most significant challenge for this group; today it ranks seventh. That shift is a testament to how far log management products have come in a relatively short time: they are now proficient at collecting and managing log data.

But as we often find in IT, improving one part of a system can highlight inherent weaknesses in other parts. In this case the weaknesses being highlighted have to do with making effective use of the collected log data. The top challenges this year include searching through data, analysis and reporting, near real-time retrieval of historic logs, indexing and access of log data, and normalisation of data.

All of these challenges demonstrate a desire on the part of the user community for better speed and flexibility in the use of log data, and each is a topic in its own right. For this post I’d like to make some observations about normalisation together with correlation, one of the top reasons identified in the report for having log management in the first place.

Correlation in the context of log management and SIEM can be described as the ability to access, analyse and relate different attributes of events from multiple sources to bring something to the attention of an analyst that would otherwise have gone unnoticed. Correlation introduces a much higher level of efficiency and effectiveness into the security workflow.

Given the volume of events being generated in even a modest-sized system, it is not practical for analysts to do this work by manual or semi-automated means: big data problems require an automated approach. In addition to data volume we also need to deal with the diverse nature of the data being collected.

Picture a typical enterprise environment, which consists of many different types of hardware and software “devices” ranging from border routers and VPN devices to firewalls and authentication servers, along with an even wider range of application servers such as web servers, email servers and database servers.

These devices generate logs that are critical to an analyst because collectively they provide the complete picture of activity needed for thorough analysis. Yet it is seldom, if ever, the case that two manufacturers will use the same logging mechanism or format their logs identically.
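To make that concrete, here is a minimal sketch in Python of what normalisation might look like. The log lines, field names and parsing rules here are hypothetical (loosely modelled on common firewall and sshd messages), and a real product would use far more robust parsers and a much richer schema, but the idea is the same: whatever the source format, every event ends up with the same fields.

```python
import re

# Two hypothetical raw log lines, in completely different vendor formats.
FIREWALL_LINE = "Oct 12 14:03:27 fw01 %FW-6-DENY: access-list outside denied tcp 203.0.113.9/4431 -> 10.0.0.5/22"
SSHD_LINE = "2010-10-12T14:03:29Z srv01 sshd[2214]: Failed password for admin from 203.0.113.9 port 52811 ssh2"

def normalise_firewall(line):
    """Parse a firewall deny message into the common event schema."""
    m = re.search(r"(\S+) %FW-\d-DENY: access-list \S+ denied (\w+) ([\d.]+)/\d+ -> ([\d.]+)/(\d+)", line)
    if not m:
        return None
    host, proto, src, dst, dport = m.groups()
    return {
        "source_device": host,
        "event_type": "firewall.denied",
        "protocol": proto,
        "src_ip": src,
        "dst_ip": dst,
        "dst_port": int(dport),
    }

def normalise_sshd(line):
    """Parse an sshd failed-login message into the same schema."""
    m = re.search(r"Z (\S+) sshd\[\d+\]: Failed password for (\S+) from ([\d.]+)", line)
    if not m:
        return None
    host, user, src = m.groups()
    return {
        "source_device": host,
        "event_type": "auth.failure",
        "src_ip": src,
        "user": user,
    }

print(normalise_firewall(FIREWALL_LINE))
print(normalise_sshd(SSHD_LINE))
```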

Because the formats are all different, it is virtually impossible to store the log data in a common location such as a database, and so provide that complete picture, without normalising the events first. Without this base to start from, security analysts will never be able to carry out the level of correlation and forensic analysis necessary to detect and remediate sophisticated malware or insider threats. Correlation and normalisation go hand in hand. In fact, there is no correlation without normalisation.
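Once events share a schema, correlation becomes a straightforward operation over common fields. As an illustrative sketch (the rule here is deliberately trivial; real correlation engines add time windows, thresholds and far richer conditions), here is a rule that flags any source IP reported by more than one device, working over normalised events like those produced above:

```python
from collections import defaultdict

# Normalised events, e.g. as produced by the parsers sketched above.
events = [
    {"source_device": "fw01",  "event_type": "firewall.denied", "src_ip": "203.0.113.9"},
    {"source_device": "srv01", "event_type": "auth.failure",    "src_ip": "203.0.113.9"},
    {"source_device": "srv01", "event_type": "auth.failure",    "src_ip": "198.51.100.7"},
]

def correlate_by_src_ip(events, min_devices=2):
    """Flag source IPs seen by at least min_devices distinct devices.

    This grouping is only possible because every event, whatever its
    original format, exposes the same src_ip and source_device fields.
    """
    seen = defaultdict(set)
    for ev in events:
        seen[ev["src_ip"]].add(ev["source_device"])
    return {ip: devices for ip, devices in seen.items() if len(devices) >= min_devices}

print(correlate_by_src_ip(events))  # -> {'203.0.113.9': {'fw01', 'srv01'}}
```

An attacker probing the firewall and then failing to log in to a server shows up as a single correlated finding rather than two unrelated log lines in two different formats.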

While log collection and management is the foundation, two key areas to look at when evaluating log management solutions are the scope and deployment of the normalisation mechanisms, and the ability of the analysis and reporting tools to exploit that normalisation through enterprise-wide event correlation.