Big data projects have a lot of promise, but the majority fail. A recent study found that just 11 percent of corporate leaders in the UK have generated any cash using data, despite recognising the value it holds.
“People always ask, ‘why did your project work?’” says Thierry Bedos, chief technology officer at Hotels.com.
“We started solving a real issue for the business - which was customer service and personalising what we offer them online - whereas some firms use big data as an innovation project and say ‘we need to play with big data, let’s think of some cool use cases we think will add value’.”
Poor grasp of the technology can kill big data investments, he adds.
“There are enterprises that underestimated the paradigm shift, using the wrong technology for the wrong job. Some put analytical systems in the online world where you are never going to get throughput you need, for example. Some didn’t understand that the change in skillset is very different.”
The confusion Bedos is referring to involves deploying a batch-oriented analytical tool like Apache Hadoop to generate real-time predictions for customers on a website, such as a notification that only two rooms are left at a certain hotel. That is really a job for the likes of Apache Cassandra instead.
Hotels.com uses a combination of Cassandra and Hadoop to balance real-time analytics with an offline capability.
Three years ago it moved away from traditional relational databases like Microsoft SQL Server to become “active-active”.
“We were starting to expand to other datacentres as well as the public cloud and going for an ‘active-active’ scenario where we had systems in multiple places. It became clear that we needed a solution that could scale with us, where traditional databases have struggled.”
After establishing a business case for a NoSQL solution, it opted for Cassandra, distributed by DataStax, using its distributed nodes to ensure that customers get speedy, reliable interactions wherever they are in the world.
Hadoop versus Cassandra
Hotels.com uses Hadoop for huge data storage and offline analytics - that means crunching large amounts of data and not expecting an answer within a millisecond. Cassandra, on the other hand, is used in the online transactional world “where you need an answer below ten milliseconds”. It can also store the data, but it is aimed at online use because of its speed.
The company collects information on customer habits from its website in Hadoop, then computes and analyses it to “try and understand why people have looked at two particular properties on a regular basis - there must be something similar about them”.
“We update the list of properties after deciphering the relation and store it in Cassandra so that a customer can see it straight away”.
Cassandra is responsible for telling you that 1,000 other visitors to Hotels.com have looked at the same hotel in Paris as you, giving a sense of “urgency” to book.
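The split between offline analysis and online lookup can be illustrated with a minimal Python sketch. It is not Hotels.com's implementation: the session data, the property identifiers and the function names `co_view_counts` and `build_related_table` are all hypothetical, and a plain dictionary stands in for the table that would be held in Cassandra. The batch step counts which properties are viewed together, and the serving step is a single key lookup.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions):
    """Batch step (the Hadoop-style job): count property pairs viewed in the same session."""
    pair_counts = Counter()
    for viewed in sessions:
        # De-duplicate and sort so each pair is counted once per session
        for a, b in combinations(sorted(set(viewed)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def build_related_table(pair_counts, top_n=3):
    """Turn pair counts into a per-property list of the most co-viewed properties."""
    related = defaultdict(list)
    for (a, b), n in pair_counts.items():
        related[a].append((n, b))
        related[b].append((n, a))
    # Highest count first; ties broken alphabetically for a stable result
    return {
        prop: [other for _, other in sorted(pairs, key=lambda p: (-p[0], p[1]))[:top_n]]
        for prop, pairs in related.items()
    }


# Hypothetical browsing sessions; real input would be far larger
sessions = [
    ["paris-01", "paris-02", "rome-07"],
    ["paris-01", "paris-02"],
    ["paris-02", "rome-07"],
    ["paris-01", "paris-02", "lyon-03"],
]

# The dict stands in for the precomputed table stored in Cassandra
related_table = build_related_table(co_view_counts(sessions))

# Serving step: one key lookup per page view, no computation at request time
print(related_table["paris-01"])
```

The expensive work happens ahead of time in the batch step, so the online side only reads a precomputed row, which is what makes a response within a few milliseconds plausible.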
‘Big data is all about humans’
Hotels.com has a group of data scientists who look at its data to try and understand trends to make customer experiences “more enjoyable and faster”, Bedos said.
However, an industry-wide skills shortage affected Bedos’ workforce when Cassandra was first deployed.
“We had no choice but to learn it ourselves”, he says. Beginning with a problem that wasn’t critical was a good way “to get our hands dirty”, Bedos advises. The firm created a centre of excellence when it moved from the initial use-case to ensure that there is a reference point for developers if they have an issue and to further acceptance of the technology more widely throughout the firm.
“The way you design data in this world is very different from how you designed data in a traditional database. It requires training, mentoring and coaching for the wider community.”
The firm has “a few hundred” people working on technology, all supported by multiple data centres in different regions, although Bedos declined to reveal where.
Research by analytics vendor Rosslyn Analytics found that 56 percent of CIOs and senior IT managers believe data is inaccessible to business decision-makers.
Hotels.com has got around this problem by training up its employees.
“We have a couple of core teams and wider teams that build features of the website who understand. It’s not just a few individuals anymore - it’s the whole organisation.”
Ready to go?
Preparing the infrastructure for a big data project is the most challenging aspect, Bedos warns. Whether it is through cloud platforms or hosted in your own data centre, prepare for the volume coming from your site.
“The data is not what takes a long time - it is putting the infrastructure in place”, he adds.
Hotels.com is now working on deploying Cassandra and Hadoop across the entire business, and on reusing and implementing algorithms and models so that data is put to work throughout the entire website.