“Fundamentally the commercial challenge around data is the creation of a single view of a customer,” Daljit Rehal, strategic systems director at Centrica, told Computerworld UK.
British firm Centrica is one of the biggest companies you’ve never heard of, providing electricity and gas to millions of businesses and homeowners. Its UK business comes predominantly through the British Gas subsidiary, which alone has around 20 million customers.
The problem for Centrica was that its customer data was spread out across various sources: 16 data warehouses from a variety of vendors and home-built SQL Server databases, to be exact.
Joining up data
Rehal spelt out the problem for us: “You have to think of how the traditional SQL-based answers can’t really cope with joining the structured and unstructured data together for a unified view of your customers. In this new paradigm of IoT [Internet of Things] and devices you need to go beyond that.”
The speed at which data is being produced, and the variety of formats it arrives in, forced Rehal to look at NoSQL solutions to help join this data into one unified repository for his staff.
After evaluating the alternatives, including vendors such as Teradata and Cloudera as well as MarkLogic and Cassandra-based solutions, Rehal decided on Hortonworks because it “felt the nearest to a true open source environment.”
“We started to sense that some of the competitors to Hortonworks were more on the proprietary side, or at least in danger of going down that route,” he said.
Rehal didn’t want to implement an open source solution and then wake up to “see that what we have built was no longer open source.”
So, why open source?
“There are other ways, but we felt strongly that the best way would be open source because, without mentioning the other vendors, we were being charged a lot of money for every time there was an upgrade, or new tooling, licensing renewals and hardware costs,” Rehal said. “We knew we needed NoSQL technology and, starting from the fundamentals of MapReduce for batch processing, Hadoop was the right answer.”
The results have been positive, not only in terms of solving his enterprise problem around customer data, but also in terms of speed of delivery, storage footprint and cost.
For example: “To buy a bigger appliance for one of our suppliers would have cost me five million quid [pounds], and that would give me twelve nodes. Now I am able to get 250 nodes for £750,000 on Hadoop. If you add to that the getting rid of old solutions and decommissioning them then the whole thing pays for itself.”
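To put Rehal's figures side by side, the short Python sketch below works out the average cost per node for each option. It uses only the numbers quoted above; the function name `cost_per_node` is illustrative, and the comparison assumes nothing about how an appliance node compares with a Hadoop node in capacity or performance, so it is a rough guide rather than a like-for-like benchmark.

```python
def cost_per_node(total_cost_gbp: float, nodes: int) -> float:
    """Return the average cost per node in pounds (hypothetical helper)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_cost_gbp / nodes


# Figures as quoted by Rehal: £5m for 12 appliance nodes,
# £750,000 for 250 Hadoop nodes.
appliance = cost_per_node(5_000_000, 12)
hadoop = cost_per_node(750_000, 250)
ratio = appliance / hadoop

print(f"Appliance: GBP {appliance:,.0f} per node")  # GBP 416,667 per node
print(f"Hadoop:    GBP {hadoop:,.0f} per node")     # GBP 3,000 per node
print(f"Ratio:     about {ratio:.0f}x")             # about 139x
```

On those numbers, each appliance node works out at roughly 139 times the price of a Hadoop node, before any savings from decommissioning the older systems are counted.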
British Gas case study
Starting with Centrica's biggest entity, British Gas, Rehal first implemented a Hadoop data lake for all of its customer data.
“At British Gas our first use case was to set up a Hadoop data lake to centrally collect all of our customer interaction data,” he said.
On top of this, Rehal looked to Hortonworks for simple data solutions, eventually building out the largest live operational Hortonworks Data Platform (HDP) cluster in the UK over the past two years: “We selected HDP for this because of its enterprise-grade features and 100 percent open source make-up.”
Later, Rehal saw the potential to open this data up to his staff and analysts for better self-service: “With that cluster up and running successfully with 250ish nodes and a 2PB capacity, we began to look into additional workloads, analytics and data types to capture. That’s when we started evaluating Hortonworks DataFlow to help easily collect data from edge nodes and bring that back into the data lake.”