The Test Data Warehouse - managing quality into your data


We established in my last article that quality can’t simply be tested into software; as Philip Crosby (1926-2001) eloquently suggests, it ‘has to be the result of a carefully constructed cultural environment. It has to be the fabric of the organisation, not part of the fabric.’ Whilst the ideal is widely applauded, the practical steps towards it are often harder to identify.

Nowhere is this truer than in software development, where failing to get the ‘right data’ to developers and testers in the ‘right place’, at the ‘right time’, results either in costly delays in delivering the software or in the release of low quality software.

We have already discussed how to create the ‘right data’ - that is, data which provides 100% functional coverage in the smallest possible number of tests - but this is only half the story. After all, just as you can’t make an omelette without cracking a few eggs, you can’t test without data. Yet few organisations have developed a cogent strategy for delivering data to their developers and testers.

Accordingly, these downstream teams can sit idle for weeks waiting for the ‘right data’. This leads some teams to spend considerable amounts of time manually creating data that doesn’t accurately reflect the data they are due to receive. With time and budgetary constraints still to be met, it is quality that is most often compromised.

So what can be done? Well, by adopting the concept of a Test Data Warehouse, it is possible to maximise the value of synthetically created data and help ‘shift testing left’ in the development lifecycle to deliver high quality, valuable software to market much earlier.

Brought to Book
The Bodleian Library in Oxford is one of the world’s most prestigious research institutions and, as such, is in effect a particularly large and complex database. Dispersed across more than 100 buildings (the largest being a depository in Swindon, 30 miles away), the Bodleian holds tens of millions of resources in an almost unrivalled array of formats, from e-journals to medieval manuscripts. Yet, in meeting thousands of data requests each day, the Bodleian experiences none of the difficulties that have become inherent in software development projects: it provides users with the right resources, at the requested locations, within a matter of hours.

How they achieve this is relatively simple. Unlike in most enterprise architectures, all of the library’s holdings are managed as one central collection: although geographically dispersed, the book stacks are a single ‘organism’. This allows the librarian, as is generally their wont, to manage all of the resources in such a way as to enable readers to request, locate and access them quickly and easily.

At the Bodleian, readers use a single online catalogue to locate, track and request resources; this is why a search for Feminist Literature won’t throw up Joseph Conrad’s ‘Heart of Darkness’! Requests can then be delivered to the reading room or email account of choice, meaning that the correct data is provided in the ‘right place’, at the ‘right time’.

However, in the software development world, developers and testers are often still left to ‘scour the shelves’ in search of the data they need. By adopting the concept of a Test Data Warehouse, DBAs and Data Architects can become ‘data librarians’; custodians of high quality data which can be stored, manipulated and managed from one central location. It is at this point that we can draw the parallels between a Test Data Warehouse and a library.

The central premise of a library is to share knowledge and culture. This is what a Test Data Warehouse offers us within a development project - the opportunity to share high quality data across numerous teams and projects throughout the enterprise, in parallel.

Therefore, instead of sitting idle waiting for data to flow down through a traditional ‘waterfall’ lifecycle, teams can test earlier and without reliance on data from other teams. Working in parallel also means that teams can work on different versions of the software without disruption.

A further benefit of working in parallel is that, whether we use synthetic data creation or not, the data is placed straight into the Test Data Warehouse at the requirements phase. Not only does this minimise the risk of miscommunication, it also removes the dependencies between teams for data. Teams can therefore be sure that they are working with the correct data, making their test cycles more meaningful and rigorous, as well as more efficient.

In addition to this, storing and managing data within a Test Data Warehouse means that it can be reused again and again. This is where the library trumps the bookshop - literary greats like Orwell, Dickens and Austen can be accessed by everyone.

Within software development, this means that good work is returned to the Test Data Warehouse for use by other teams or projects. This enables you to maximise the value of work done, particularly any synthetic data creation, which can be used repeatedly without generating extra effort.
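To make the library analogy concrete, here is a minimal sketch in Python of how such a warehouse might behave. It is an illustration only, not a description of any particular product: the `Warehouse` class, its `publish`, `search` and `checkout` methods, and the sample ‘customers’ data are all hypothetical names invented for this example, and a simple in-memory dictionary stands in for a real central data store.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """One versioned 'book' on the shelf (hypothetical structure)."""
    name: str
    version: int
    tags: frozenset
    rows: list


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for a central test data store."""
    _shelves: dict = field(default_factory=dict)  # name -> list of DataSet versions

    def publish(self, name, tags, rows):
        """Store rows as a new version of `name`; earlier versions are kept."""
        versions = self._shelves.setdefault(name, [])
        entry = DataSet(name, len(versions) + 1, frozenset(tags), deepcopy(rows))
        versions.append(entry)
        return entry.version

    def search(self, *tags):
        """The 'catalogue': names whose latest version carries every given tag."""
        wanted = set(tags)
        return sorted(
            name
            for name, versions in self._shelves.items()
            if wanted <= versions[-1].tags
        )

    def checkout(self, name, version=None):
        """Hand out a private copy, so teams working in parallel cannot disturb each other."""
        versions = self._shelves[name]
        entry = versions[-1] if version is None else versions[version - 1]
        return deepcopy(entry.rows)


warehouse = Warehouse()
warehouse.publish("customers", {"payments", "synthetic"}, [{"id": 1, "balance": 0}])

# Team A takes a copy, enriches it, and returns the result for others to reuse.
team_a_rows = warehouse.checkout("customers")
team_a_rows[0]["balance"] = 500
warehouse.publish("customers", {"payments", "synthetic"}, team_a_rows)

# Team B, still testing an older release, keeps working against version 1.
team_b_rows = warehouse.checkout("customers", version=1)
```

Handing out copies rather than the stored data itself is the design choice that matters here: one team’s changes only reach the others when they are deliberately returned to the warehouse as a new version.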

In common ecological parlance, the Test Data Warehouse really helps minimise your ‘data footprint’, and maximises the time, cost and quality benefits associated with doing so.

Ray Scott, Director of Grid Tools Professional Services
