Forrester now says enterprise adoption of Hadoop is "mandatory", so any business that wants to derive value from its data should, at the very least, be looking at the technology.
So, what is Hadoop? The open-source Apache Software Foundation describes Hadoop as “a distributed computing platform” or “a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.”
According to the foundation: “Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”
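To make the "simple programming models" idea concrete, here is a minimal single-machine sketch of the map, shuffle and reduce pattern that Hadoop popularised, written in plain Python. It does not use Hadoop's own API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_with_retry`, `word_count`) are hypothetical and chosen for illustration, and the retry helper only mimics the idea of handling failures in software rather than relying on hardware.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single total."""
    return key, sum(values)


def run_with_retry(task, arg, max_attempts=3):
    """Re-run a failed task (assumed to raise RuntimeError) up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(arg)
        except RuntimeError as err:
            last_error = err
    raise last_error


def word_count(records):
    """Count words across all records using map, shuffle and reduce."""
    pairs = []
    for record in records:
        pairs.extend(run_with_retry(map_phase, record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map and reduce steps would run in parallel on many machines and a failed task would be rescheduled on another node; here everything runs in one process so the shape of the model is easy to see.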
The advantages - speed, reliability, lower costs - are appealing to the enterprise, and businesses are starting to deploy the technology at various scales.
Here is a selection of case studies from businesses deploying Hadoop at enterprise scale, from big banks to airlines and retailers.
1. Hadoop in the enterprise: Royal Bank of Scotland
Royal Bank of Scotland (RBS) has been working with Silicon Valley company Trifacta to get its Hadoop data lake in order, so it can gain insight from the chat conversations its customers are having with the bank online.
RBS stores approximately 250,000 chat logs, plus associated metadata, per month. The bank keeps this unstructured data in Hadoop. Before it turned to Trifacta, however, this was a huge and untapped source of information about its user base.
2. Hadoop in the enterprise: CERN
The Large Hadron Collider in Switzerland is one of the largest and most powerful machines in the world. It is equipped with around 150 million sensors, producing a petabyte of data every second, and the data being delivered is growing all the time.
CERN researcher Manuel Martin Marquez said: “This data has been scaling in terms of amount and complexity, and the role we have is to serve to these scaleable requirements, so we run a Hadoop cluster.”
“From a simplistic manner we run particles through machines and make them collide, and then we store and analyse that data.”
“By using Hadoop we limit the cost in hardware and complexity in maintenance.”
3. Hadoop in the enterprise: Royal Mail
British postal service company Royal Mail has used Hadoop to get the "building blocks in place" for its big data strategy.
Director of the Technology Data Group at Royal Mail, Thomas Lee-Warren, told Computerworld UK that its Hadoop investment is the foundation of a drive to gain more value from internal data. "We have a lot of data,” Lee-Warren explained. “We are about to go up to running in the region of a hundred terabytes, across nine nodes.”
The business uses Hortonworks' Hadoop analytics tools to transform the way it manages data across the organisation, freeing the analytics team to deliver insights on proprietary information held in its data warehouse.
4. Hadoop in the enterprise: British Airways
British Airways deployed its first instance of Hadoop in April 2015, as a data archive for legal cases that were primarily stored, at a high cost, on its enterprise data warehouse (EDW) platform.
Since deploying Hortonworks Data Platform (HDP) 2.2, the airline’s data exploitation manager, Alan Spanos, said his department has seen a return on its investment within a year and can deliver 75 percent more free space for new projects, which translates into cost reductions for the airline’s finance team.
Spanos said: “In business intelligence, if you don’t adopt this technology to do at least part of your job role, you will not exist in a few years' time. You can only go so far with traditional technology. It still has a place within your architecture, but quite frankly, this is where you need to be.”
5. Hadoop in the enterprise: Western Union
Global payments provider Western Union implemented a Hadoop-based data analytics platform from Cloudera in 2014 to provide a more personalised experience for its customers.
Using Cloudera Enterprise, Western Union is able to more efficiently store and process real-time analytics on what the vendor describes as “one of the world’s largest enterprise data sets”.
Cloudera’s Apache Hadoop implementation helps Western Union centralise its global customer data in an enterprise data hub, and supports pattern recognition and predictive modelling. The big data analytics platform is aimed at creating a more personalised experience across multiple products and service delivery channels for Western Union customers.
6. Hadoop in the enterprise: King.com
European gaming giant and creator of Candy Crush, King.com, deployed Cloudera’s Distribution for Apache Hadoop in 2012. The aim was to run analytics for every ‘event’, or action, its millions of users take during gameplay.
The company’s director of data warehousing, Mats-Olov Eriksson, told Computerworld UK that using analytics is vital to its success online.
“Analytics is one of the things that made King.com the thing that it is today,” Eriksson explained. “In the universe that we operate in, gaming online, it is absolutely essential to know as much as possible about the players and optimise everything.”
“Everybody wants a business case for Hadoop, but for me it is simply about the difference between knowing what happens in a game and not knowing.”
7. Hadoop in the enterprise: Yahoo
Yahoo started using data analytics company Splunk’s Hadoop tool, Hunk, in 2015. By analysing its IT operations in real time, the firm has saved millions in hardware costs in a year.
Yahoo analyses 150 terabytes of machine data with Splunk Enterprise every day. This information is used to optimise IT operations, applications delivery and security, as well as business analytics to better understand the customer and personalise search results.
Hundreds of employees now use Hunk, alongside Splunk Enterprise, to analyse and visualise 600 petabytes of data and to monitor the firm's infrastructure cost-effectively.
8. Hadoop in the enterprise: Expedia
Expedia planned to double its Hadoop investment back in 2015 and was an early adopter of the Hortonworks project Apache Falcon to crunch large volumes of numbers.
Expedia previously used a DB2 database in conjunction with various instances of Microsoft SQL Server, which became increasingly expensive to scale as data volumes grew with the business, both organically and through the acquisition of several travel companies including Trivago and
Since moving to Hadoop, the firm has seen costs drop and is able to both store and process data using the cluster.
Woodhead, who is data platform technical lead for Hotels.com, revealed that “hundreds” of employees across different departments and offices, one of which is based in London, used the two-petabyte cluster for web traffic, bookings and travel reviews.
9. Hadoop in the enterprise: Hotels.com
Hotels.com uses Hadoop for huge data storage and offline analytics - that means crunching large amounts of data and not expecting an answer within a millisecond. Cassandra, on the other hand, is used in the online transactional world “where you need an answer below ten milliseconds”.
Cassandra can also store the data, but it is aimed at online workloads because of its speed. The business moved from traditional relational databases like Microsoft SQL Server three years ago to become “active/active”.
Chief technology officer at Hotels.com, Thierry Bedos, said: "We started solving a real issue for the business - which was customer service and personalising what we offer them online - whereas some firms use big data as an innovation project and say 'we need to play with big data, let's think of some cool use cases we think will add value'”.
10. Hadoop in the enterprise: Marks and Spencer
M&S adopted the Cloudera Enterprise Data Hub Edition system in 2015 to analyse data from multiple sources, to better understand customer behaviour.
Jagpal Jheeta, head of business information and customer insight at M&S, said: “Smart and efficient data usage is a key focus at M&S, as it ultimately fuels better customer insight, engagement and loyalty. We needed a scalable, robust and future-proof strategic partner. Cloudera is aiding us in leveraging analytics to better serve the business now and in the future.”
11. Hadoop in the enterprise: Tesla
Tesla is using a Hadoop cluster to collect the increasing amount of data being generated by its connected cars.
CIO Jay Vijayan said: “We are working on a big data platform... The car is connected, but it does not really talk to the network every minute because we want to keep it as smart and efficient as possible. It alerts us if the car is not functioning properly so service teams can take action.”