Data is becoming increasingly integral to businesses, but the vast quantities of information can be hard to manage. Confluent was developed as a central platform for businesses to manage all their strands of streaming data, and the company opened an EMEA HQ in London this March as it plans to boost growth in Europe.
The Confluent team came together at LinkedIn, where they built the real-time streaming technology Kafka as the core platform behind the professional networking site. The scalable messaging system was developed in 2010 to accommodate the growing membership and complexity of LinkedIn.
Initially, all the data was processed through separate pipelines for everything from tracking page views to providing message queuing functionality, with each of them maintained and scaled individually.
They designed Kafka as a single channel to process them all. Today it acts as the central data pipeline at LinkedIn, handling more than one trillion messages a day. The technology earned its name from the labyrinthine manner in which the systems and applications were all connected.
"Usually in companies this is very Kafkaesque, there are lots of twists and turns, and so building a system that did it all we thought it should be called Kafka," Confluent CEO and co-founder Jay Kreps told Computerworld UK.
"We had a distributed database that we also set up that runs as part of LinkedIn even today that was called Voldemort, so Kafka was not the goofiest name."
The system was designed to connect all of an organisation's data streams through a central pipeline that supports the flow of information between them, so that applications can react to and process those tides of data in real-time.
Kreps explains: "If you ever see a diagram of how a Netflix or an Uber or an Airbnb or an eBay works, internally they have this giant real-time stream of everything happening in the company that's running with Kafka, and that's kind of what plugs everything together, that's kind of the central nervous system."
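The idea of a central pipeline can be illustrated with a toy example. The sketch below is not Kafka's API; it is a minimal in-memory stand-in, assuming a hypothetical `MiniLog` class, that shows the general pattern of an append-only log per topic from which several applications read independently, each tracking its own position.

```python
from collections import defaultdict


class MiniLog:
    """Toy stand-in for a central pipeline: one append-only log per topic.

    Hypothetical illustration only; this is not how Kafka is implemented.
    """

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> ordered list of messages
        self._offsets = defaultdict(int)   # (consumer, topic) -> next position to read

    def publish(self, topic, message):
        """Append a message to the end of a topic and return its position."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, consumer, topic):
        """Return messages this consumer has not yet seen and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(consumer, topic)]
        self._offsets[(consumer, topic)] = len(log)
        return log[start:]


pipeline = MiniLog()
pipeline.publish("page-views", {"user": "alice", "page": "/jobs"})
pipeline.publish("page-views", {"user": "bob", "page": "/feed"})

# Two independent applications read the same stream at their own pace.
analytics_batch = pipeline.poll("analytics", "page-views")
pipeline.publish("page-views", {"user": "carol", "page": "/jobs"})
search_batch = pipeline.poll("search-indexer", "page-views")
analytics_next = pipeline.poll("analytics", "page-views")
```

Because each reader keeps its own offset, adding a new application does not require a new pipeline; it simply starts reading the existing stream. The topic and consumer names here are invented for the example.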
Real-time streaming trends
Kafka was released as open source and quickly gained traction within Silicon Valley, where it provided the architecture upon which numerous big tech companies built their systems. Uber, for example, uses Kafka to collect and analyse all the information about its drivers and cities to support its pricing analysis and decide where to route its cars to passengers.
The software soon moved beyond the Valley. In the automotive industry Kafka can be used to support connected car initiatives by feeding continual statistics about the health of the car, traffic information and geolocated services back into the navigational dashboards and connected products.
Big banks, which are typically built around real-time flows of data, can use Kafka as the basis of an event-based architecture that recognises different transaction types and helps the bank react to them appropriately. It can also be used to identify fraud by gathering intelligence in real-time. Telecommunications companies use it to correlate customer behaviour across multiple data sources and identify anomalies in computing systems, which can be integrated in real-time with financial systems.
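As a rough illustration of what an event-based approach to transactions might look like, the sketch below routes each incoming event to a handler chosen by its transaction type and applies one simple fraud check. Everything in it is an assumption made for the example: the event fields, the `handles` decorator, the `WITHDRAWAL_LIMIT` threshold and the rule itself are hypothetical, and a real bank's logic would be far more involved. The list of events stands in for a stream that would normally arrive through a system such as Kafka.

```python
from collections import defaultdict

WITHDRAWAL_LIMIT = 1000  # hypothetical threshold, chosen only for the example

handlers = {}  # transaction type -> handler function
withdrawals_seen = defaultdict(list)  # account -> withdrawal amounts seen so far


def handles(tx_type):
    """Register a function as the handler for one transaction type."""
    def register(func):
        handlers[tx_type] = func
        return func
    return register


@handles("deposit")
def on_deposit(event):
    # Deposits are accepted without further checks in this sketch.
    return "ok"


@handles("withdrawal")
def on_withdrawal(event):
    # Hypothetical rule: flag an account once its total withdrawals
    # in the stream exceed the fixed limit.
    history = withdrawals_seen[event["account"]]
    history.append(event["amount"])
    return "flag" if sum(history) > WITHDRAWAL_LIMIT else "ok"


def process(stream):
    """React to each event as it arrives; unrecognised types are marked unknown."""
    results = []
    for event in stream:
        handler = handlers.get(event["type"])
        results.append(handler(event) if handler else "unknown")
    return results


events = [
    {"type": "deposit", "account": "A1", "amount": 500},
    {"type": "withdrawal", "account": "A1", "amount": 600},
    {"type": "withdrawal", "account": "A1", "amount": 700},
    {"type": "transfer", "account": "B2", "amount": 50},
]
outcomes = process(events)
```

The point of the pattern is that each transaction type gets its own reaction, and a new type can be supported by registering another handler rather than rewriting the processing loop.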
Kafka is today used by a selection of the aforementioned tech giants, global corporations like Goldman Sachs and startups including peer-to-peer lending pioneer Zopa.
"That we know of, over a third of the Fortune 500 companies use Kafka in production," Kreps says. "And that includes seven of the top ten global banks, eight of the top insurance companies, nine of the top ten telecoms companies and six of the top ten travel companies.
"So it's super prevalent across a bunch of industries running with these really large-scale problems, and you would see it for all different types of applications."
The Confluent Platform
In September 2014 the creators of Kafka founded Confluent, the data streaming platform for the technology they invented, and recently launched an EMEA HQ in London to boost the growth they predict in Europe.
The Confluent Platform is the company’s core product. It adds tools designed to help companies get going without having to rely on a team of engineers to build out all the integration.
"I think the big things that we've done is really integrate rich and powerful features for reacting to streams of data, processing streams of data, building applications around streams of data and really integrate that into Kafka in a really cool way," says Kreps.
"We've also been working on just putting together all the tools that companies need to get going with this stuff. We really want this to be not a kind of advanced package for the highest-end problems, but really kind of the default way of working with data.
"As a result we really need to make it easy to plug into all the systems you have, make it easy to kind of manage and monitor Kafka at scale, make it easy to run across data centres, all the things that real companies have to do."
The future of real-time data
The company's subscription bookings grew by more than 700 percent last year, and Kreps believes the technology is now coming to the forefront of data management in a host of different businesses.
"For a long time it was viewed as too difficult to really get right, and kind of a niche as a result, and what people have realised is that it is actually totally possible to build systems of scale that can do everything in real-time," he says.
Many businesses have now gone to the opposite extreme. They generate and work with all of their data continuously and now want a full platform that can stream it in real-time. Kafka is designed to act as a hub for all of that data, covering all of the activities happening inside the company and providing a foundation on which it can build certain applications.
"When I look at what companies are trying to do, increasingly they have more systems, more applications, more types of data," says Kreps. "A lot of companies are now tip-toeing into this whole internet of things domain, where they can implement parts of their business that were really not represented in a digital way at all.
"While they have these new systems and new applications and new types of data, they're also in a totally different operating environment. A lot of companies are starting to move some of their applications into the public cloud, and so they have to bridge totally new data sets, many of them much larger, across many more applications and systems, and now across many environments like the public cloud and on premise.
"That's the problem that we see people adopting Kafka to solve, and that's how I see it playing out as the technology becomes increasingly mainstream in large companies."
"We went into EMEA really just because we saw so much demand," says Kreps. "In the UK and across Europe, we saw the technology being heavily adopted in financial services, in manufacturing, in car companies, and so we felt like we really had to have a presence there."