Cloud native is a term increasingly used to describe the combination of IT operations tools such as containers with microservice architectures, producing applications that are suited to distributed cloud infrastructure. The idea is that this approach makes for more resilient and flexible software.
Although much of the concept centres on technology, Ticketmaster executive program director Bindi Belanger says a change in the way teams work is equally important.
“Cloud native transformations within your company require major cultural transformations,” Belanger told attendees at CloudNativeCon in Berlin this week. “Everything from how you define and set goals, how your leadership views the importance of outcomes over outputs, then the skill sets and teams and the way that you organise the work.
“I would argue that the way that you organise your delivery of cloud native solutions is just as important as the technology choices that you make.”
The company has realised significant benefits from its cloud native and devops project. Its tech operations division, which previously relied on outdated systems and processes, can now provide developers with infrastructure in a matter of minutes.
“Because of cloud native solutions we have gone from several months to deploy new infrastructure and environments, to when we were in the middle of our devops transformation we got down to a few weeks. And now with cloud native it is just a matter of minutes,” she said.
There have been similar benefits in the frequency of software releases. In the past, the level of coordination required between operations, application support and software teams meant that releases were infrequent, she explained; in some cases they happened only once every few months.
“With devops we finally got to a more weekly delivery culture, and then with cloud native, teams were able to release new features as often as they need during the day.”
Two years ago, however, this was not the case for Ticketmaster.
The company was founded in 1976 and launched its online ticket sales service in 1996. It has since grown substantially, merging with Live Nation in 2010, and now provides a range of services, such as producing concerts, in addition to its core business, with revenues of $7.6 billion.
It is a large, technology-intensive organisation. One of its main challenges is handling the huge spikes in network traffic that occur when tickets for major acts go on sale. In some cases its systems must scale up to handle 150 million transactions within minutes.
“We invite the entire world to come DDoS our website every time we have a major artist on sale,” Belanger explained.
Supporting its ticket sales are 21 different ticketing systems, which include over 250 different products and services. To support its operations it has relied on a mix of new and legacy technologies amassed over decades. “To build and maintain those products and services we have an organisation of over 1,400 people globally and they build that software on our private cloud, which is about 20,000 virtual machines across seven global data centres.”
Belanger said the company’s infrastructure is large and complex, and has long relied on legacy systems. “We jokingly refer to the tech stack as the tech museum, because we have software from every era,” she said.
With a diverse business, Ticketmaster has numerous competitors. This places huge importance on the ability to move quickly to create reliable software that supports the wider business. Previously, its legacy systems and outdated organisational processes created a bottleneck to new developments.
“We have a lot of competitive pressure across a large market surface area, [but] we have legacy tech which [was] not ready for containers or public cloud,” she said.
The effect was to hold the business back from developing new services, with more time focused on maintaining the stability of legacy systems. “We were spending a lot of our time on constant firefighting, which meant that we had very limited resources to work on projects to add new value and new features to our development teams.
“Those challenges made it very difficult for our developers to work with tech ops,” she said, adding that, because of the complexity of the tech stack, developers were highly dependent on operations and “didn’t have a lot [of] autonomy themselves.”
“To get a new app deployed or a new environment built out, if we didn’t have capacity on our private cloud…it often took several months, especially if it required purchasing additional hardware to build out our on-prem private cloud.”
Two years ago the company started to make changes to its technology operations teams, and began to adopt a devops approach.
“We realised that we need to become much more lean and create autonomous teams,” she said.
There were challenges here too. The company grew its developer team by 250 percent, but did not expand its operations team at the same pace. “Because ops didn't scale to match the growth of the development organisation, eventually all roads led to being blocked by operations. So while we got faster at developing, we didn't get faster at delivering value.”
The company tackled this by embedding its systems engineers in product delivery teams. “By removing that organisational silo, by taking them out of ops and putting them with those people that needed to make those changes, we were hoping to really get out of those barriers.”
Software automation tools also helped streamline processes.
“The goal of all this was to create delivery teams that were self-sufficient. Their jobs would be to build software, run it, own it, operate it, optimise and monetise it.”
Moving to the cloud
The decision to move Ticketmaster’s data centre infrastructure to Amazon Web Services was also a key part of the transition.
“We don't need to spend a lot of time and money building out infrastructure to be always on,” Belanger said. “We wanted infrastructure that was on demand and scalable. But most importantly the decision to move to the public cloud was to force modernisation of our products and services to cloud native standards.”
She added that there were numerous operational advantages from moving to the cloud. “The benefits of moving to the cloud are clear,” she said. “Not just infrastructure resources like compute and storage, but how we are using our human resources.
“If your teams are spending all their time building and maintaining and upgrading infrastructure, they are not spending time adding value and helping development teams move faster.”
The goal was to increase speed. “We wanted to shift our leaders away from focusing on not changing things, towards taking calculated risks so that we could enable speed and continuous delivery, which is another way of saying constant change.”
Cloud native teams
“Our decision to move to the public cloud was a decision to become a cloud native company,” said Belanger.
A variety of measures were put in place to create more efficient technology operations:
- ‘Tech maturity’ and ‘team maturity’ models were put in place as a way to measure effectiveness and target improvements. “We wanted to be able to define and measure team performance and technical performance objectively.”
- Data was used to inform decisions about operational changes. The business started “publishing telemetry on everything from uptime to failed maintenance”. “We were working towards creating a culture where change was normal and not something to fear,” she said.
- Smaller, more agile teams were created. “In order to create the ideal cloud native teams we realised that smaller was better. Two- to five-person teams have proven to be really successful. Two people to a team might seem a little strange, but we found that having fewer people focus on the same problem allowed us to move faster.”
- New staff were recruited to support the cloud native approach. “We wanted people that had a developer background, which is something that is not easy to find if you are looking for people that are also really familiar with infrastructure. We wanted problem solvers. We didn’t want people who were like ‘we have done this for the past five years, let’s keep doing it, the status quo is fine’. We wanted people that were constantly looking to drive and embrace change.”
Overall, the changes meant that the tech teams could move faster.
“Instead of having a weekly iteration or month-long iteration, every morning the team will get together and say what are we delivering today, and at the end of the day you are asked to demo the value that you delivered. So we get away from the two-week iteration, or ‘at the end of the quarter we will have something delivered’.”
Of course, investment in new technologies also played a key role in the operations changes.
The company adopted new tools, deploying Kubernetes container orchestration with CoreOS Tectonic and adding Prometheus for monitoring and Helm for packaging.
Belanger said the rapid pace of change in operations technology means the team has to be prepared to adopt new systems quickly.
“You can’t stick to a single framework and say this is the box we are going to live in and we must live in that box,” she said.
Kubernetes has helped create applications that are much easier to update, she said.
“One of the great use cases that we have seen is the new Ticketmaster web platform that was built on Kubernetes. It is still in its beta phase now. Before Kubernetes, even though we were a modern team with great lean practices and we were building on new technology, it still took about 20 minutes or so to deploy, with low confidence; it would often run into issues.”
With Kubernetes, she said, there are “fully automated updates that can happen within a minute. And because of that it helped to enable our daily delivery culture”.