Nationwide Building Society is currently re-architecting its systems around the streaming data technology Apache Kafka, with the help of enterprise vendor Confluent, in order to speed up its access to transaction data and increase resiliency as it bids to compete with more agile challenger banks like Monzo and Starling.
In September, Nationwide committed to a £4.1 billion investment in technology over the next five years. Speaking during the Kafka Summit in London last week, Rob Jackson, head of application architecture at Nationwide, said that "Kafka and this [streaming] architecture holds probably the number one priority in that investment."
Like most banks, Nationwide has a lot of legacy technology and is being disrupted by agile fintech providers and challenger banks. "If we don't do a good job with our systems and our data and our apps, our competitors will, and they will disintermediate us. So we have got to react to that," Jackson said.
Add on top of that new regulations like open banking, which mean the bank will increasingly be hit with new and more unpredictable data volumes as third parties request access to customer transaction data, and an outdated data architecture just won't cut it.
"As customers log into our apps more, and send more payments, they expect real time data," Jackson said. Customers of modern banking apps like Monzo and Bud expect to see their transactions straight away; they want to categorise them and establish rules for better money management.
All this is great in principle, but for a bank like Nationwide, "our existing platforms are making it hard for us to do those things," Jackson admits.
"We have to make use of our data, we have got lots and lots of data in our systems, it tends to be locked away into silos, hard to use in apps and get insights about you," he added. "Of course, we have a very strong need for agility and innovation to remain competitive."
The Speed Layer
The answer for Nationwide is something it calls a Speed Layer, an old data architecture term that has been appropriated by Jackson and his team to represent what is essentially "a streaming platform with some other bits at the side," he said.
The old architecture is what Jackson characterises as pretty typical for a bank: a channel application at the top (say, your internet banking app) talking to an API gateway and some channel web services for orchestration, with data pulled together and aggregated where those services interact with backend systems from vendors like SAP. Data traversing all of these layers creates the latency and bottlenecks that Nationwide wanted to overcome with the new architecture.
The Speed Layer on the other hand is based around Kafka and aims to deliver near real time data from backend systems to channel applications. "You can think of it a bit like an enterprise cache," Jackson said.
This means that once data is written into Nationwide's Unisys mainframe, the system uses change data capture to pick up that data and push it into Kafka, then uses stream processing techniques to populate Kafka topics that are materialised into data stores such as Redis, Cassandra or MongoDB for querying.
This creates what Jackson calls a "near real time copy of the mainframe data, pushed up into microservices and then when you want to get the data you are just talking to that top layer of microservices and the API gateway and not having to traverse down through the stack."
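The flow Jackson describes can be sketched in miniature. The following is an illustrative, stdlib-only Python simulation, not Nationwide's implementation: a list stands in for a Kafka topic, change events stand in for CDC records off the mainframe, and a dict stands in for the Redis/Cassandra/MongoDB read layer that the microservices query.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ChangeEvent:
    # A CDC-style record emitted whenever the system of record is updated
    account: str
    field: str
    value: object

# Stand-in for a Kafka topic: an append-only log of change events
topic: List[ChangeEvent] = []

def mainframe_write(account: str, field: str, value: object) -> None:
    """Simulate a write to the system of record; change data capture
    appends a corresponding event to the topic."""
    topic.append(ChangeEvent(account, field, value))

# Stand-in for the materialised read layer (Redis/Cassandra/Mongo in the talk)
read_cache: Dict[str, Dict[str, object]] = {}

def process(offset: int) -> int:
    """Stream-process events from `offset` onwards, updating the read
    cache. Returns the new offset; re-running from 0 rebuilds the cache
    from the log (the "re-process your topic" recovery model)."""
    for event in topic[offset:]:
        read_cache.setdefault(event.account, {})[event.field] = event.value
    return len(topic)

# Writes hit the "mainframe"; reads only ever touch the cache
mainframe_write("acc-1", "balance", 250)
mainframe_write("acc-1", "last_txn", "coffee -2.50")
offset = process(0)
print(read_cache["acc-1"]["balance"])  # reads served without traversing the stack
```

The key property is the last line: a read never goes down through the gateway and web-service layers to the mainframe, it hits the near real time copy at the top.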
This new architecture is being deployed on premise for now.
The Speed Layer promises four primary benefits for the bank: high resiliency, agility, scalability and the ability to ingest rich data sets.
In terms of resiliency, this new architecture replicates data, meaning internet banking should be down far less often for Nationwide customers.
"If we have some planned outage on the mainframe today that means the channel applications are down and you can't use internet banking," Jackson explained. "Once we replicate into Kafka you can take a mainframe out and still read the data, so you can still see your transactions, the data is getting stale, but you can still read the data. If Kafka goes down the data is still there to query it. If you lose an entire data centre the data is in the other data centre too. So you have resilience built in to every layer of this architecture."
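The stale-but-readable behaviour Jackson describes can be shown with a toy cache. This is a hedged sketch, not the real system: the class name and staleness flag are invented for illustration, but the principle matches the quote, reads keep working from the last replicated copy even while the source is out.

```python
import time

class SpeedLayerCache:
    """Illustrative stale-read cache: if the system of record is down,
    reads still succeed from the last replicated copy (the data may be
    stale, but it is readable)."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.last_replicated: float | None = None

    def replicate(self, snapshot: dict) -> None:
        # In the real architecture this is CDC + Kafka doing the copying
        self.data = dict(snapshot)
        self.last_replicated = time.time()

    def read(self, key: str, source_up: bool):
        # Reads never traverse down to the source; during an outage the
        # cached value is returned, flagged as potentially stale.
        return self.data.get(key), (not source_up)

cache = SpeedLayerCache()
cache.replicate({"acc-1": 250})
value, stale = cache.read("acc-1", source_up=False)  # mainframe outage
```

During the simulated outage the customer still sees a balance; the `stale` flag is where an app could note that figures may be slightly behind.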
The new architecture is also "smaller, lighter, faster, meaning it will recover and if anything goes down you just re-process your topic," Jackson said.
On the scalability front, Nationwide processes somewhere between one and two hundred requests per second. Compare that to fellow enterprise Kafka user Alibaba, which processes about 425 million transactions a second, "so these technologies scale way beyond what we need to and it's amazing what they can do," Jackson said.
It is also worth noting that the only way to scale the old system would be to buy more mainframe capacity, "but it is not cheap or quick, with no elastic scaling, so we looked to change that," Jackson said.
As customer expectations of their financial apps change, banks like Nationwide need to be able to surface information for users far quicker than before, when a two-day lag was par for the course.
By being event driven, this new architecture opens up the opportunity for the bank to bring these sorts of features to customers more easily. Nationwide's mobile app recently shipped MoneyWatch, a feature that gives customers insights into their spending, but it is still hampered by slow data.
"Now with stream processing that [event] has been pushed to an interesting events topic, and channel apps can subscribe to that topic and maybe push a notification or categorise data," Jackson said. "More and more we want to start pushing stuff to the apps and to the customers that they are interested in and this architecture allows for that, whereas the traditional architecture on mainframe systems meant we couldn't."
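The publish/subscribe pattern behind that quote can be sketched as follows. This is an illustrative, stdlib-only simulation, the topic name, the merchant categories and the `mobile_app` subscriber are all invented, but it mirrors the idea of channel apps subscribing to an "interesting events" topic and reacting by categorising or notifying.

```python
from typing import Callable, Dict, List

# Stand-in for an "interesting events" Kafka topic (illustrative only)
interesting_events: List[Dict] = []
subscribers: List[Callable[[Dict], None]] = []

def publish(event: Dict) -> None:
    """Stream processing has flagged an event as interesting; append it
    to the topic and fan it out to every subscribed channel app."""
    interesting_events.append(event)
    for handler in subscribers:
        handler(event)

notifications: List[str] = []

def mobile_app(event: Dict) -> None:
    # Example subscriber: categorise the transaction, then push a
    # notification to the customer (here, just record the message)
    category = "eating out" if event["merchant"] in {"cafe", "restaurant"} else "other"
    notifications.append(f"{event['merchant']}: £{event['amount']:.2f} ({category})")

subscribers.append(mobile_app)
publish({"merchant": "cafe", "amount": 2.50})
```

The point of the pattern is that new consumers can be added without touching the producer side: another channel app just subscribes to the same topic.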
Progress so far
So far Nationwide has only done a proof of concept (PoC) for the new streaming platform, which will be slowly rolled out across the bank according to the use case.
"We split the initial adoption of this architecture into three use cases, the first one is now completed, two and three should be live shortly," Jackson said, without going into more detail.
The bank also took an unusual approach to the project, for a bank anyway: it was engineer-led to start with, spinning up a formal project team only after the PoC.
Next, the bank wants to slowly roll out the new platform across business units. "We haven't really got consistency of approaches across all the different areas, so that is something we are working on with Confluent," Jackson admitted.
"Next is where we want to head with this architecture, so we are doing a whole bunch of meetings with our internal business stakeholders to say: 'here's the architecture, here's what it can do, what use cases have you got?' Then we pull all of that together to find the capabilities we have and the opportunities for that."
In terms of actual return on investment, Jackson said it is still early days, but the first use case has already removed around 7 million requests a year from its mainframes, which are typically pay-per-request, "so there is a strong business case of mainframe offloading there," he said.