Business at risk from mountains of unstructured data

Raphael Reich looks at five sources of unstructured data at risk of being overlooked or under-protected.


You might be surprised to discover just how much of your business data is stored in widely accessible files on shared network storage. At a time when companies are focused on building data warehouses and squeezing as much business intelligence out of their databases as possible, files actually make up as much as 80% of business data, according to market analyst firm IDC. That is a staggering number, and enough to make you wonder: Where is all this unstructured data coming from, and is it relevant business data?

The answer is that file data is unquestionably relevant, as well as valuable. Not only is it valued by businesses and regulators, but it is also prized by malicious insiders. For example, in July 2009, a former Goldman Sachs worker was arrested for downloading software source code with the intention of taking it with him to a new employer. Earlier in 2009, a Microsoft employee was accused of stealing documents he planned to use in a lawsuit against the company. More recently, in March 2010, the Canada Revenue Agency (CRA), Canada's equivalent of the US Internal Revenue Service, disclosed that its employees had been accessing hundreds of files and using the information for everything from financial gain to providing preferential treatment for friends and relatives.

As noted, regulators are also concerned with file-based data. Regulations that address data security, such as Sarbanes-Oxley (SOX), the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA) and others, do not limit their scope based on data format. They apply equally to files, databases and applications. So, even if an organisation uses financial applications and databases to manage its finances, when financial data that is governed by SOX is exported to a spreadsheet for manipulation, the handling of the spreadsheet must also comply.

To answer the question of where all this valuable file data is coming from, here's a quick checklist of sources to consider as you survey your own file data landscape, as well as thoughts on protecting these files.

Business applications and databases

Whether your business applications and databases are running in-house or in the cloud, mid-level managers are probably using them to export interesting data for analysis, reporting, presentations and other legitimate business activities. For example, six months of sales data from your account can be invaluable in assessing sales trends and diagnosing operational issues. But, when the resulting spreadsheets, documents and presentations containing exported information are stored on shared file systems for enhanced communications and collaboration, you have a data security risk that needs to be mitigated. And, if that data is financial in nature, includes credit card information or has customer details, you may also have a SOX, PCI or personally identifiable information (PII)-related compliance issue to address.

Intellectual works

Plenty of file data is never stored in a database or application, but goes straight from the minds of knowledge workers into a file. Software source code is an obvious example, as are legal documents, product roadmaps and strategic planning documents. These files often contain intellectual property and a wealth of detail about market opportunities, partnerships, business operations, future plans and strategic advantage. Sharing this information on file servers and network attached storage devices can be critical for mobilising your company and uniting distributed project teams, but it's just as critical to ensure that the data is protected from intentional or even inadvertent harm.