Why simplicity, not speed, is key to enterprise Hadoop strategies


Microsoft, EMC and HPE discuss their experience with Hadoop, from implementing connected data platforms to fundamental shifts in data architecture, and, unsurprisingly, simplicity, not speed, emerges as the most important factor in implementation.


At the press and analyst day of the Hadoop Summit in Dublin, three Hortonworks partners told the press what their customers want most when it comes to embracing enterprise-ready, open source big data tools.

Where Hortonworks is keen to speak about the transformational capabilities of Spark, machine learning, real-time and predictive analytics, the partners painted a slightly different picture, of enterprise customers that just want simplicity when it comes to releasing value from their data. (See also: what is a graph database?)

Simplicity

Corporations are looking to reduce their reliance on coding and programming when it comes to their big data strategy, especially with the current talent shortage in data science.

This means that simplicity and enterprise-ready Hadoop deployments are a natural fit for companies looking at open source big data solutions.

Read next: How Hortonworks helped British Gas unify its customer data in "a true open source environment"

As its VP of corporate strategy Shaun Connolly said, Hortonworks at its core is about “productising Apache tech into commoditised enterprise tech”.

Stefan Voss, director of technical marketing at EMC, said: “Simplicity is the most important trend we hear from customers. You will hear [about] all of these nice, pretty new projects, and all enterprise customers struggle to integrate them all and [handle] the complexity.”

EMC's Voss said that he points his customers towards integrated, enterprise-ready solutions like Hortonworks Data Platform (HDP) for data-at-rest and Hortonworks DataFlow (HDF) for data-in-motion.

This “allows the data scientist to pick and choose which tools they need based on the data streams and deploy it in a very fast manner”, he said.
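The data-at-rest versus data-in-motion split Voss describes can be sketched in plain Python. This is an illustrative toy only — HDP and HDF are cluster platforms, not Python libraries, and the event values and running-average metric are invented: the same metric is computed once over a stored batch, and incrementally over an arriving stream.

```python
# Toy contrast between data-at-rest (batch) and data-in-motion (streaming)
# processing. Plain Python stands in for the cluster tooling; the event
# data and the running-average metric are illustrative only.

def batch_average(stored_events):
    """Data-at-rest: every record is already in the store, so we can
    make a single pass over the complete dataset."""
    values = [e["value"] for e in stored_events]
    return sum(values) / len(values)

def streaming_average(event_stream):
    """Data-in-motion: records arrive one at a time, so the metric is
    updated incrementally without holding the full dataset."""
    count, total = 0, 0.0
    for event in event_stream:
        count += 1
        total += event["value"]
        yield total / count  # running average after each event

events = [{"value": v} for v in (10, 20, 30, 40)]

print(batch_average(events))            # one answer over the whole store
print(list(streaming_average(events)))  # one answer per arriving event
```

The streaming version converges on the batch answer once all events have arrived, but can act on every intermediate value — which is what "deploy it in a very fast manner" buys for data-in-motion use cases.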

Voss expects to see more industry-specific tools developed for Hadoop, both in Internet of Things (IoT) use cases and in verticals such as healthcare, making complex data analytics simply plug-and-play, “with the intention of hiding the complexity of the underlying structure,” he said.

Chris Goodfellow, CTO of Haven OnDemand at Hewlett Packard Enterprise, agrees that his customers seek simplicity.

“A lot of our customers, especially in the more traditional enterprises, their data is all over the place and each piece of data is siloed,” he said. 

“You couldn’t have one hundred [Hadoop] projects, with thirty people [working on] each. So making the technology accessible and simplified is a key evolution [that will enable] perhaps two or three developers to say: ‘I have this specific business problem and I will apply this technology to that without having to start from scratch and deploy servers and networking.’”

Releasing data

Hortonworks executives spoke earlier in the day about how their customers generally fall into two camps when it comes to their data: those who want to ‘renovate’ and those who want to ‘innovate’.

‘Renovate’ means taking proprietary data scattered across various silos and bringing it together in a single data lake. This is where Hortonworks HDP comes in.

‘Innovate’ is where companies look to do more with their data once it is all stored in a Hadoop cluster, such as advanced analytics, machine learning and predictive modelling, drawing on the streaming data that HDF brings in.
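The ‘renovate’ step can be sketched minimally in plain Python, with invented silo records and field names: each source's rows land in a single store with their raw payload intact, deferring any unified schema until read time — the defining trait of a data lake as opposed to a warehouse.

```python
import json

# Hypothetical records standing in for two silos: a CRM export and a
# billing system. All field names and values are invented for illustration.
crm_rows = [
    {"customer_id": 1, "name": "Acme Ltd"},
    {"customer_id": 2, "name": "Globex"},
]
billing_rows = [
    {"cust": 1, "invoice_total": 120.0},
    {"cust": 2, "invoice_total": 75.5},
]

def to_lake_record(source, payload):
    # A data lake keeps the raw payload alongside minimal metadata,
    # rather than forcing every source into one upfront schema.
    return {"source": source, "raw": json.dumps(payload)}

lake = ([to_lake_record("crm", r) for r in crm_rows]
        + [to_lake_record("billing", r) for r in billing_rows])

print(len(lake), sorted({rec["source"] for rec in lake}))
```

Because nothing is lost on ingest, the ‘innovate’ stage can later impose whatever schema a given analysis needs on the same records.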

Raghu Ramakrishnan, CTO for data at Microsoft, sees his customers as very much in the renovate stage, with companies wanting to combine their proprietary data with contextual information to drive insight.

“Fundamentally I think what we are seeing is much more data-centricity in all facets of business,” he said.

“What that means is you have the kind of use cases that traditional relational databases enabled. But increasingly you are seeing them want to blend that data with other information from operational data sources that are not relational, from third-party sources like Twitter to IoT devices.”

Ramakrishnan added: “So you need to make everything as easy as it can be, because these systems have gotten complex enough that enterprises are asking us for a platform that [allows] them to focus on the business logic, and to do all of this with the data uniformly governed and audited.”

HPE’s Goodfellow agreed that his customers are getting more savvy about what they can do with their data once it is consolidated.

“With traditional business intelligence (BI) you would want to use that information to see what your sales were like last week. But also they want to start doing things in real time.”

Case study: Predicting traffic patterns

To show the benefit of blending proprietary data with contextual data, EMC’s Stefan Radke gave the example of a government ‘smart city’ project which analysed traffic flows, a project that EMC helped facilitate.

“What kind of data would you have to have to predict traffic on the road?” asked Radke.

“That is not only how many cars on the street but how fast they go. So to predict things you would collect weather data, you would collect data from schools, when they open, when they close.

“All of these sources would be required to develop a predictive model. You have to have as much data as possible, store it in the data lake and decide on-demand on the schema that you want to use.”
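Radke's point about deciding the schema on demand can be illustrated with a toy Python sketch — all sources, dates and values here are invented: each lake source keeps its own shape, and a feature row for the predictive model is assembled only at read time.

```python
# Invented sample data standing in for three lake sources in the
# smart-city example: traffic sensors, weather feeds, school calendars.
traffic = {"2016-04-13": {"cars": 1200, "avg_speed": 42}}
weather = {"2016-04-13": {"rain_mm": 5.0}}
schools = {"2016-04-13": {"in_session": True}}

def feature_row(date):
    # Schema-on-read: merge whatever each source holds for the
    # requested date into one flat row for the predictive model.
    row = {"date": date}
    for source in (traffic, weather, schools):
        row.update(source.get(date, {}))
    return row

print(feature_row("2016-04-13"))
```

A date missing from a source simply contributes no columns, which is why storing “as much data as possible” first, and choosing the schema later, suits this kind of modelling.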

Open source

During the session, Microsoft’s Ramakrishnan channelled the tech giant’s greater acceptance of open source under the leadership of Satya Nadella.

“All of the products need to run on the same machines where the data is, because that customer is not going to be locked into any one solution,” he said.

Ramakrishnan went on to speak about the architectural issues around Hadoop deployments.

“The openness of the sockets we build around is key. I think we are seeing a LEGO-style architecture for data management and analytics and two of the key blocks are the place where we keep the data and manage it: the store, and resource management, that allows us to co-locate our computation as close to the data as possible.”

Microsoft already integrates Hadoop capabilities into a range of its cloud products through HDInsight, including Azure Data Lake, Power BI, Azure Machine Learning and Azure Active Directory.

Conclusion

What all of this shows is a set of enterprise customers that aren’t quite as advanced as some open source proponents would have us think.

Many of these customers are simply looking for a better way to give their data scientists access to the relevant data, rather than for complex machine learning and predictive modelling, just yet.

Neil Winters from Hortonworks customer Markel Insurance put it best when talking about how Hortonworks is viewed within his organisation: “Senior leadership wouldn’t know what Hortonworks is, but they do know that our data warehouse and BI team are running at a faster rate and turning solutions out at a rate they’ve not seen before. 

“The cost is keeping down and the capabilities are going up,” he said, two things that are pretty much guaranteed to win you board-level sponsorship for your big data strategy.
