Food delivery startup Deliveroo continues to suffer from regular outages at peak times, despite the inherent flexibility and resilience of the AWS cloud infrastructure its core applications run on.
There was a major outage reported in January and again in February, leading to an apology email to customers affected which said: "We've had some problems with our server, which is how we connect your order with the restaurant and the rider who brings it to you.
"This kind of thing can happen to all sorts of businesses, but with hungry people waiting for food, we know it’s super important that we don’t let you down. We've learnt a lot from the incident that affected you, and we are doing our best to ensure it doesn't happen again."
Deliveroo credited those affected with free delivery on the following two orders, but the voucher code was only valid for a week.
Deliveroo raised a massive Series E of $275 million from the private equity firm Bridgepoint in August 2016 and is growing at breakneck speed, operating in 120 cities in 12 countries. However, outages are extremely harmful to a company like Deliveroo, especially as it faces fierce competition from UberEats and JustEat, both of which have had outages of their own.
In a tweet on January 20, 2017, Deliveroo (@Deliveroo_AU) told one customer: "Hey @ClaireAshley31 was just a minor outage, all service should be restored and working as normal again now. Apologies~"
Deliveroo runs its core applications on AWS cloud infrastructure. AWS is favoured by startups and app developers because of its near-constant uptime and ability to scale according to demand, charging users as they consume.
AWS does suffer outages of its own, which can have a disastrous effect on organisations whose applications rely on its S3 storage service, but these tend to hit a large number of companies at once, not just one application.
Carl Brooks, a cloud infrastructure analyst at 451 Research, believes that these outages are a natural symptom of growing pains for a startup expanding at the pace of Deliveroo. "In the overwhelming number of cases we look at with outages, AWS isn't at fault," he told Computerworld UK.
"With a company like this you have to coordinate a lot of things: riders, orders and a lot of apps over a number of cellular networks and tie that back to this central app infrastructure, which they use AWS for. So you take a complex app and start throwing as many users as they can at it, that is probably why it is breaking."
Brooks believes that these outages fundamentally come down to engineering. "You have to architect for resilience and unpredictability in factors like web latency and things like that. Also, if they want to save money by turning servers off at times of low demand and are suddenly getting slammed with a rush you will see an outage or a slow down."
Brooks says the same thing happened at Netflix when it was growing at its fastest. "It is really just growing pains for an app like this and Netflix went through the same thing, often on a Friday evening, and experienced years of outages but haven't had a significant outage for several years, so that is what happens at this scale."
Speaking on stage at the AWS Summit in London last week, Deliveroo's VP of engineering, Dan Webb, was nonetheless glowing about AWS, saying that it has "helped us move from a world where we were struggling to cope with the increasing demands of our data warehouse, and only had limited access to real-time data, to an environment which will continue to support us as we grow at a rapid pace".
Computerworld UK asked Deliveroo why the recent spate of outages has occurred, but the company wasn't willing to discuss the topic.
All startups reliant on an app and cloud infrastructure will suffer from outages from time to time, and Deliveroo has a strong track record of responding quickly when they do. However, this is an engineering issue, not a marketing problem.
Consumers are generally fairly patient, but hungry customers have a low threshold for failure. The company will need to iron out these issues quickly.
Machine learning at Deliveroo
Webb also spoke about some of the data work the company does on top of AWS tools. Deliveroo's core routing engine, called Frank, uses machine learning algorithms to bunch orders and optimise routing in real time.
"Frank is constantly calculating and recalculating the best combination of riders and orders to make predictions on rider travel time and food preparation time using machine learning models trained on our historic data," Webb explained.
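To give a rough sense of the kind of calculation Webb describes, here is a minimal sketch in Python. It is not Deliveroo's code: the rider and order names, the minute values and the brute-force search are all assumptions made for illustration, and the fixed numbers stand in for what the article says are predictions from models trained on historic data.

```python
from itertools import permutations

# Hypothetical predictions, in minutes. In the article these would come from
# machine learning models; here they are hard-coded for illustration.
PREP_TIME = {"order_a": 12, "order_b": 5}
TO_RESTAURANT = {
    ("rider_1", "order_a"): 4,
    ("rider_1", "order_b"): 10,
    ("rider_2", "order_a"): 9,
    ("rider_2", "order_b"): 3,
}
TO_CUSTOMER = {"order_a": 8, "order_b": 6}


def delivery_time(rider, order):
    """Predicted minutes until the order reaches the customer."""
    # The food cannot leave until it is ready AND the rider has arrived.
    pickup = max(PREP_TIME[order], TO_RESTAURANT[(rider, order)])
    return pickup + TO_CUSTOMER[order]


def best_assignment(riders, orders):
    """Try every rider-to-order pairing and keep the lowest total time.

    Assumes there are at least as many riders as orders; otherwise no
    pairing exists and (None, inf) is returned.
    """
    best, best_cost = None, float("inf")
    for perm in permutations(riders, len(orders)):
        cost = sum(delivery_time(r, o) for r, o in zip(perm, orders))
        if cost < best_cost:
            best, best_cost = dict(zip(orders, perm)), cost
    return best, best_cost


if __name__ == "__main__":
    print(best_assignment(["rider_1", "rider_2"], ["order_a", "order_b"]))
```

Trying every pairing only works for a handful of riders and orders. A production system recalculating continuously across a city would need a far more efficient method, which the article does not describe.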
Deliveroo runs all of this on AWS products, including event processing through Kinesis and Lambda, with the results streamed into an S3 data lake. Deliveroo also uses Snowflake for its data warehouse.
The machine learning algorithms run on a "cluster of applications that sit on EC2, essentially pulling data from Snowflake and running that back to S3 for consumption from our application", Webb said.
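For readers unfamiliar with the Kinesis-to-Lambda-to-S3 pattern Webb mentions, the sketch below shows roughly what one such Lambda function might look like in Python. The event layout (base64-encoded payloads under `Records[].kinesis.data`) is the standard shape of a Kinesis-triggered Lambda event; everything else, including the `events` key prefix, the file naming and the `order_id` field in the usage note, is an assumption for illustration and not taken from Deliveroo.

```python
import base64
import json
from datetime import datetime, timezone


def decode_records(event):
    """Decode the base64 JSON payloads in a Kinesis-triggered Lambda event."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event["Records"]
    ]


def to_data_lake_object(events, now, prefix="events"):
    """Build a date-partitioned key and a newline-delimited JSON body.

    The key layout (prefix/YYYY/MM/DD/HHMMSS.jsonl) is a hypothetical
    convention chosen for this sketch.
    """
    key = f"{prefix}/{now:%Y/%m/%d}/{now:%H%M%S}.jsonl"
    body = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    return key, body


def handler(event, context):
    """Lambda entry point: decode the batch and prepare it for S3."""
    events = decode_records(event)
    key, body = to_data_lake_object(events, datetime.now(timezone.utc))
    # A real function would upload here, for example with boto3's S3 client:
    # put_object(Bucket=..., Key=key, Body=body). The upload is left out so
    # the sketch runs without AWS credentials.
    return {"key": key, "count": len(events)}
```

Calling `handler` with a test event containing one encoded record such as `{"order_id": 1}` returns the generated key and a count of 1. Batching records into dated files in this way is one common approach to keeping a data lake easy to query by time range.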