Amadeus, the airline industry’s technology partner, is implementing monitoring software that it hopes will eventually enable it to identify and fix system errors before they occur.
The company is rolling out predictive analytics software from Netuitive to monitor three sets of systems: its Altéa Departure Control System (DCS); its Flight Management system, which includes aircraft weight and balance data; and its airline, hotel and car rental booking and reservation systems. More than 100 commercial airlines worldwide use the Altéa suite, which includes reservation and inventory management as well as departure control.
“We see a rapid increase in complexity of the technical systems environment that we are managing in order to deliver services to our customers,” said Peter Raven, director of business development at Amadeus.
“We are constantly monitoring many servers and applications. Monitoring the scale of this is getting really quite large, and the management of all of these metrics is becoming very unwieldy.”
For example, Amadeus currently has standard alerts on around 500,000 metrics across its IT environment, which together generate hundreds of gigabytes of monitoring data an hour. This information is then delivered to system managers, who look through the data to work out where problems have occurred and fix them.
Analytics-driven ops management
Amadeus wanted a big data technology that would look at the monitoring data in real time and come up with “high-level” information to tell its ops team that “something has happened here”.
Raven said that the Netuitive software was Amadeus's first foray into “analytics-driven IT operations management”. He said that Netuitive provided everything the company needed out of the box, whereas it would have taken Amadeus three years to develop something in-house.
“We did a proof-of-concept with Netuitive. We took a subset of just under 10 percent, 40,000 metrics, which we fed into our system. To get that installed and tuned took us two months, from proof-of-concept to running it. It was much lower effort on our side,” he said.
Amadeus signed the software licensing agreement with Netuitive around Christmas 2013, and Raven said that the full implementation is “just getting going”. It will monitor about 2,000 servers located in data centres in Europe, Brazil and the US as part of the three-year deal.
“With the full rollout, we are going to integrate Netuitive into our overall [in-house] monitoring and alerting platform, so we don’t have a new interface for our operations people to work with, which reduces the learning on the user side,” he said.
“We’ve got three or four months in integration work, then it will be rolled out to the ops team in the second half of 2014.”
Netuitive is being implemented on-premise, after the proof-of-concept trial was run in the cloud, because of the large amount of data plugged into it. It is being connected to Amadeus's customer-critical platforms first, starting with the Altéa system, before being deployed across the company's primarily Microsoft SQL Server-based e-commerce system and reservation platform. The airline IT systems are based on Linux with Oracle databases, and Amadeus runs its own middleware and software on the Linux environment.
“All the application monitoring is built in-house,” said Raven. “We use a lot of open source on servers. The reason is we are quite large and need to be aware of vendor licence costs.”
He added: “It’s easy for us to hook into Netuitive, and for our alerting system to hook into Netuitive.”
As well as providing real-time monitoring, Amadeus wants to eventually make use of the predictive capabilities of Netuitive.
Raven said that the software learns from the historical monitoring data, and Amadeus sees a future where it might be able to predict when a problem may happen based on a particular pattern that occurs in the data.
“At the moment, it hits us when it stops working. We are hoping we get the information when things change rather than when they stop. You’ve got to get there before the problem,” he said.
“There isn’t much time to get to the problem [once it happens]. We are talking about a small number of minutes to recover.”
“At the moment, it’s always too long. I don’t think this is a one-off programme at all. This will be ongoing and increasingly developed,” Raven said.
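To illustrate the general idea Raven describes, in which software learns what is normal from historical monitoring data and flags a metric when it changes rather than when it stops, here is a minimal Python sketch. It is an assumption-laden illustration of a simple rolling-baseline check, not Netuitive's algorithm or Amadeus's implementation; the names BaselineDetector and find_anomalies, the window size and the threshold are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class BaselineDetector:
    """Hypothetical detector: learns a rolling baseline for one metric."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        # Keep only the most recent `window` samples as the learned history.
        self.history = deque(maxlen=window)
        self.threshold = threshold      # allowed deviation, in standard deviations
        self.min_samples = min_samples  # samples required before judging anything

    def observe(self, value):
        """Return True if `value` deviates from the baseline, then record it."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            mu = mean(self.history)
            sigma = stdev(self.history)
            if sigma > 0:
                anomalous = abs(value - mu) > self.threshold * sigma
            else:
                # Perfectly flat history: any change at all is a deviation.
                anomalous = value != mu
        self.history.append(value)
        return anomalous


def find_anomalies(samples, **kwargs):
    """Return the indexes of samples that deviate from the learned baseline."""
    detector = BaselineDetector(**kwargs)
    return [i for i, v in enumerate(samples) if detector.observe(v)]
```

Fed a series of readings for a single metric, find_anomalies returns the positions at which a value departs sharply from recent behaviour. A production system would also have to handle seasonality, correlations between metrics and far larger data volumes, none of which this sketch attempts.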