Waiting on the data

Data, data, data - for many organisations it’s the lifeblood of the company, and as big data policies are added to the agenda, you’ll often hear the phrase mentioned in your office more than once as you walk past colleagues. With the digital age well upon us, data is becoming integral to businesses of all types, but the seductive claims made by big data - better insight, real-time analysis, more accurate predictions - mask the fact that the data deluge is just as likely to make an enterprise slower, less responsive and, in the long term, less ‘intelligent’.

We are living in a world where the amount of data generated and stored is increasing by 35-40 per cent a year, which would be enormous if organisations only kept one copy of all that data; however, the average company makes eight to ten copies of each of its databases. All of these copies are sitting on physical hardware, and for each terabyte of original content created, eight terabytes of duplicate data is produced. The problem isn’t the amount of data. It’s the IT and human capital involved in provisioning and storing copies, masking the data in each of those copies and refreshing databases.
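To put rough numbers on that, here is a minimal back-of-the-envelope sketch. The function name and the three-year horizon are purely illustrative assumptions; only the growth rate (35-40 per cent a year) and the eight duplicate terabytes per original terabyte come from the figures above.

```python
def storage_footprint_tb(original_tb, copies=8, annual_growth=0.35, years=3):
    """Estimate storage in TB after compound growth.

    Assumes every terabyte of original data carries `copies` terabytes
    of duplicates, and that data grows at a fixed annual rate.
    Returns (original, duplicates, total).
    """
    grown = original_tb * (1 + annual_growth) ** years
    duplicates = grown * copies
    return grown, duplicates, grown + duplicates


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB of original data today.
    original, duplicates, total = storage_footprint_tb(10.0)
    print(f"original: {original:.1f} TB, copies: {duplicates:.1f} TB, "
          f"total: {total:.1f} TB")
```

The point of the sketch is simply that the duplicates, not the original data, dominate the total, and that they compound at the same rate.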

These copies are necessary for projects ranging from backup and disaster recovery to testing and training. But making and moving physical copies of databases can cause CIOs plenty of headaches, because the sheer size of the data often means that engineers, developers and analysts have to wait days on end for fresh data or a new database environment.

It takes huge amounts of time and effort to crunch through this massive volume of replicated data, producing a drag on the whole cycle. The inevitable delays caused by database replication only add to the nightmare of increasing app development and testing times. As batch testing lengthens, it also takes longer to develop satisfactory, error-free environments, leading either to severe delays to the project or product release date, or to unacceptable error rates. This can add weeks or months to projects, accruing significant financial and reputational cost to the organisation involved.

The experience of one of the largest ticket sellers in the US provides ample illustration of these hidden costs of waiting for data: The company found itself having to add three weeks to every project, simply to prepare its data. Each project also included an average 20 per cent schedule buffer to offset the delays in updating test data. The firm planned to embrace quality at the expense of speed; however, as data volumes grew it was forced to test with samples instead of the actual data, resulting in more bugs leaking into production. In project management there are always trade-offs between quality, cost and speed; however, in this company’s case, the attempt to embrace quality over speed actually led to lower quality, while the delays and outages meant additional costs.

Until very recently, the only way to combat these problems was for organisations to consciously limit their data sets by working with a smaller, more manageable subset of data. Alternatively, they could be brutal about the types of data chosen for real-time reporting. Neither of these approaches embraces the spirit of ‘big data’, the ethos of which is that all business information can and should bring value to an organisation.

Fortunately, technology now exists that addresses the crux of this problem by harnessing the power of virtualisation and applying it to databases. Database virtualisation gives organisations the ability to create multiple virtual copies of the same database, allowing multiple users to work from the same data sets in tandem. Rather than making the average eight physical copies of data, this technology keeps a single copy of each database or dataset, which is then shared with individuals as a virtual instance, refreshed as each user requires.
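For readers who want a feel for how one stored copy can serve many users, the idea can be sketched as a toy copy-on-write model. This is an illustration under stated assumptions, not a description of any vendor's implementation: the `VirtualCopy` class and its methods are hypothetical, and a plain dictionary stands in for the shared database.

```python
class VirtualCopy:
    """Toy copy-on-write view over a single shared dataset (hypothetical)."""

    def __init__(self, source):
        self._source = source   # the one shared copy; never modified here
        self._changes = {}      # only this user's own modifications

    def read(self, key):
        # Prefer the user's private change, else fall through to the source.
        if key in self._changes:
            return self._changes[key]
        return self._source[key]

    def write(self, key, value):
        # Writes stay private to this virtual copy.
        self._changes[key] = value

    def refresh(self):
        # Discard private changes so reads see the shared source again.
        self._changes.clear()

    def private_size(self):
        # Number of entries this copy stores on its own.
        return len(self._changes)


if __name__ == "__main__":
    shared = {"row1": "alice", "row2": "bob"}
    dev, qa = VirtualCopy(shared), VirtualCopy(shared)
    dev.write("row1", "ALICE")
    print(dev.read("row1"), qa.read("row1"), dev.private_size())
```

Each virtual copy stores only what its user has changed, which is why adding another user costs far less than provisioning another full physical copy.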

This means that almost overnight organisations can not only slash the amount of storage space they require but, even more crucially, massively reduce the processing burden and, in turn, wait times.

Database virtualisation is still in its infancy, and yet is already bringing astounding results. Organisations that have deployed the technology have seen their processing times shrink from weeks to a few hours. This means that they can begin to achieve the types of insight and business intelligence that ‘big data’ has always promised whilst still being able to develop, test and deploy new applications in record time. With the arrival of database virtualisation, it no longer makes sense to make and provision a few physical copies of databases that multiple employees must share. It no longer makes sense for legions of developers, QA engineers, data analysts, and business analysts to wait days on end for stale data. It's time to move databases onto agile infrastructure so that organisations can stop waiting on data and start actually doing business.

Posted by Iain Chidgey, VP and General Manager EMEA, Delphix
