Classifying data breaches using 'visibility'

Assemble a group of information security professionals to discuss data classification terminology, and you'll likely end up with a long discussion on what is meant by ‘restricted’, ‘confidential’, ‘secret’, ‘personal’, ‘company confidential’, ‘classified’ and ‘top secret’, with little agreement.

Following the maxim that “you can't manage what you don't measure”, it's not surprising that we have data breaches when there's no generally accepted way of specifying the degree of confidentiality that's required for a particular item of data. This is especially true when the data is passed to another organisation.

So I'd propose a new approach, namely “visibility”, symbol “V”. Visibility is defined as the log (base 10) of the number of people who can see a data item. In case you've forgotten logarithms, for powers of 10 (10, 100, 1000, 10000) you count the number of zeros in the number to get the logarithm.

So a true secret, known only to one person, has a visibility of 0, since log(1) = 0. If it's a secret shared between two people, V is about 0.3, since log(2) ≈ 0.3. If it's the answer to one of those password reset questions, such as “What's my favourite breakfast cereal?”, the visibility will be around 1, assuming 10 people might know it (log(10) = 1). If it's data on your company intranet, V might be around 3 or 4 for a company of 1,000 to 10,000 people. And if you publish it on the Internet, the potential visibility is up to 9.8 (taking a world population of 6 billion people).
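The definition and the examples above fit in a few lines of Python. This is a minimal illustrative sketch, not a tool the article describes: the function name `visibility` and the example labels are assumptions, and the audience sizes are the rough figures used in the text.

```python
import math

def visibility(people: int) -> float:
    """Visibility V: base-10 logarithm of the number of people who can see a data item."""
    if people < 1:
        raise ValueError("at least one person must be able to see the data")
    return math.log10(people)

# Audience sizes taken from the examples in the text (labels are illustrative)
examples = {
    "true secret": 1,
    "shared secret": 2,
    "password reset answer": 10,
    "intranet, 1,000 staff": 1_000,
    "intranet, 10,000 staff": 10_000,
    "public Internet": 6_000_000_000,  # world population of 6 billion
}

for label, n in examples.items():
    print(f"{label:<24} V = {visibility(n):.1f}")
```

Each tenfold increase in the audience adds exactly 1 to V, which is why counting zeros works for powers of 10.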

Now we have a way of comparing confidentiality: calculate the visibility, which is the inverse of confidentiality. We can ask “how much visibility do we need for an employee's bank account details?”, and then measure whether we exceed it.

Let’s assume we settle on a desired visibility of 0.7, meaning 5 people have access: the four people in the payroll department and the employee (log(5) = 0.7). But we might find that in reality an additional 27 people have access, namely two managers and 25 IT administrators, making the visibility 1.5 (log(32) = 1.5). That means there is a visibility difference of 0.8 (1.5 - 0.7) between the potential visibility and the desired visibility. So now we have a measure that we can manage, and we can try to reduce this visibility gap with additional controls.
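The payroll example reduces to simple arithmetic. A minimal Python sketch, using the head counts from the text; the variable names are illustrative assumptions:

```python
import math

def visibility(people: int) -> float:
    """Visibility V = log10(number of people who can see the data item)."""
    return math.log10(people)

# Payroll example from the text
desired_people = 4 + 1                         # four payroll staff plus the employee
potential_people = desired_people + 2 + 25     # plus two managers and 25 IT administrators

desired_v = visibility(desired_people)         # log10(5), about 0.7
potential_v = visibility(potential_people)     # log10(32), about 1.5
gap = potential_v - desired_v                  # about 0.8

print(f"desired V = {desired_v:.1f}, potential V = {potential_v:.1f}, gap = {gap:.1f}")
```

Because V is a logarithm, the gap is also log10(32/5): a gap of 0.8 means about 6.4 times as many people can see the bank details as intended.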

But not everyone looks at everything they could, potentially, access. So we have two versions of visibility: potential and actual. Potential visibility is based on the number of people who could see the data if they chose to look; actual visibility is based on the number of people who did look at the data or already know it. In this example our audit logs show that the IT administrators and managers never looked at the bank account details, bringing the actual visibility back to 0.7.
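Actual visibility can be estimated by combining an access-control list with audit logs. The sketch below assumes a hypothetical access list and a hypothetical log format of (user, record) read events; real systems record access very differently, so treat these data structures as placeholders. It counts each person once, however often they looked, and adds the employee, who knows their own bank details without appearing in any log.

```python
import math

def visibility(people: set[str]) -> float:
    """Visibility V = log10 of the number of distinct people in the set."""
    return math.log10(len(people))

# Hypothetical access-control list: who *could* read the bank details
payroll = {f"payroll_{i}" for i in range(1, 5)}     # four payroll staff
employee = {"employee"}
managers = {"manager_1", "manager_2"}
it_admins = {f"admin_{i}" for i in range(1, 26)}    # 25 IT administrators
can_access = payroll | employee | managers | it_admins

# Hypothetical audit log of (user, record) read events:
# in this example only the payroll staff ever opened the record
audit_log = [
    ("payroll_1", "bank_details"),
    ("payroll_2", "bank_details"),
    ("payroll_3", "bank_details"),
    ("payroll_4", "bank_details"),
    ("payroll_1", "bank_details"),   # repeat reads are counted once
]
did_look = {user for user, record in audit_log if record == "bank_details"}

# The employee already knows their own details, logged or not
actual = did_look | employee

potential_v = visibility(can_access)   # log10(32), about 1.5
actual_v = visibility(actual)          # log10(5), about 0.7
print(f"potential V = {potential_v:.1f}, actual V = {actual_v:.1f}")
```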

This is important information to have if a breach occurs: you can now say, with justification, “this was a visibility breach of only 0.3, so it is of minor significance”. It lets you compare releasing fairly well-known information to many extra people with passing secret information to just one other person. It also lets you write into outsourcing contracts “we require this data to be protected with a visibility of 1.5 or less”.
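One way to make the breach comparison and the contract clause concrete is sketched below. It assumes that a breach is measured, like the visibility difference in the payroll example, as the rise in V it causes; the helper names `breach_size` and `meets_contract` are hypothetical, and the 1.5 limit is the example contract figure from the text.

```python
import math

def visibility(people: int) -> float:
    """Visibility V = log10(number of people who can see the data item)."""
    return math.log10(people)

def breach_size(intended: int, actual: int) -> float:
    """Hypothetical helper: rise in visibility caused by a breach, V(actual) - V(intended)."""
    return visibility(actual) - visibility(intended)

def meets_contract(people: int, max_v: float = 1.5) -> bool:
    """Hypothetical contract check: is the data protected with a visibility of max_v or less?"""
    return visibility(people) <= max_v

# A secret known to one person is passed to one other person
secret_leak = breach_size(1, 2)
# Information known to 1,000 people leaks to another 1,000
well_known_leak = breach_size(1_000, 2_000)

print(f"secret leak: {secret_leak:.1f}, well-known leak: {well_known_leak:.1f}")
print(meets_contract(31), meets_contract(32))
```

Because V is logarithmic, doubling an audience always adds about 0.3, so under this assumption both leaks register the same breach size. A limit of 1.5 allows at most 31 people, since 10^1.5 is about 31.6.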

So at last we have a way to measure how well we manage confidentiality of information.

Andrew Yeomans, Jericho Forum board member