Uber has developed a complex in-house tool for deploying machine learning-as-a-service, which it calls Michelangelo.
In a lengthy blog entry on the Uber Engineering site posted last week, Jeremy Hermann, head of machine learning platform, and Mike Del Balso, a product manager for machine learning at Uber, describe Michelangelo as: "An internal ML-as-a-service platform that democratises machine learning and makes scaling AI to meet the needs of business as easy as requesting a ride."
In short: Michelangelo has enabled Uber to roll out a standardised workflow for new machine learning solutions across the organisation. This takes the form of six steps: manage data, train models, evaluate models, deploy models, make predictions and monitor predictions. The system also supports traditional machine learning models, time series forecasting, and deep learning, the blog post explained.
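The six steps mirror a standard supervised-learning loop. As a loose sketch in Python with scikit-learn (one of the tools the post says Uber's data scientists use, not Michelangelo itself, whose APIs are internal):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 1. Manage data: assemble features (X) and labels (y).
X, y = make_regression(n_samples=400, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Train a model.
model = Ridge().fit(X_train, y_train)

# 3. Evaluate it on held-out data.
mae = mean_absolute_error(y_test, model.predict(X_test))

# 4. Deploy: Michelangelo pushes models to serving containers;
#    here the fitted object simply stands in for a deployed model.

# 5. Make predictions.
preds = model.predict(X_test)

# 6. Monitor predictions over time (here, a one-off error report).
print(f"held-out MAE: {mae:.2f}")
```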
This follows on from the work being done at Uber under former head of machine learning Danny Lange, which we wrote about last year. "The results from Jeremy and Mark are a continuation and evolution of that platform, which we named Michelangelo," an Uber spokesperson explained.
Before Michelangelo, Uber struggled to scale machine learning algorithms across the organisation. Hermann and Del Balso explained: "While data scientists were using a wide variety of tools to create predictive models (R, scikit-learn, custom algorithms, etc.), separate engineering teams were also building bespoke one-off systems to use these models in production.
"Prior to Michelangelo, it was not possible to train models larger than what would fit on data scientists' desktop machines, and there was neither a standard place to store the results of training experiments nor an easy way to compare one experiment to another.
"Most importantly, there was no established path to deploying a model into production – in most cases, the relevant engineering team had to create a custom serving container specific to the project at hand."
Under the covers, Michelangelo is a mix of open source systems and components assembled by Uber engineers, including HDFS, Spark, Samza, Cassandra, MLlib, XGBoost, and TensorFlow. This is deployed on top of Uber's own data lake and compute infrastructure.
When asked if it was considering open sourcing the Michelangelo tool itself, a spokesperson for Uber told Computerworld UK: "We're not open sourcing anything related to Michelangelo at the moment.
"Open sourcing machine learning platforms is difficult and doesn't yield the same value for other organisations to use since these systems depend on proprietary data and customised signals to operate."
Michelangelo at UberEats
Staff within the food-delivery department at the company, UberEats, had "several models running on Michelangelo, covering meal delivery time predictions, search rankings, search autocomplete, and restaurant rankings", the blog post outlined.
"On the Michelangelo platform, the UberEats data scientists use gradient boosted decision tree regression models to predict this end-to-end delivery time."
So the model crunches through time of day, delivery location, average meal prep time for the last seven days and near-realtime meal prep data to make an accurate prediction.
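As a hedged illustration of that approach, here is a gradient boosted regression over stand-ins for those features, written with scikit-learn. The feature names and synthetic data are invented for the example; Uber's actual features, data, and serving code are internal:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1000

# Invented stand-ins for the features the post describes: time of day,
# delivery distance, 7-day average prep time, near-realtime prep signal.
hour_of_day = rng.uniform(0, 24, n)
distance_km = rng.uniform(0.5, 10, n)
avg_prep_7d = rng.uniform(5, 30, n)
live_prep = avg_prep_7d + rng.normal(0, 3, n)

X = np.column_stack([hour_of_day, distance_km, avg_prep_7d, live_prep])
# Synthetic end-to-end delivery time in minutes.
y = live_prep + 4 * distance_km + rng.normal(0, 2, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Predict for one hypothetical order: 7pm, 3 km away, ~20-minute prep.
eta = model.predict([[19.0, 3.0, 20.0, 21.0]])
print(f"predicted delivery time: {eta[0]:.1f} minutes")
```

In production, the post notes, such a model would be served from a container and invoked over the network rather than in-process like this.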
"Models are deployed across Uber's data centres to Michelangelo model serving containers and are invoked via network requests by the UberEats microservices. These predictions are displayed to UberEats customers prior to ordering from a restaurant and as their meal is being prepared and delivered."
The Uber engineering team are constantly tweaking Michelangelo and are currently working on a feature called AutoML.
"This will be a system for automatically searching and discovering model configurations (algorithm, feature sets, hyper-parameter values, etc.) that result in the best performing models for given modelling problems," the blog explains.
"The system would allow data scientists to specify a set of labels and an objective function, and then would make the most privacy- and security-aware use of Uber’s data to find the best model for the problem. The goal is to amplify data scientist productivity with smart tools that make their job easier."
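That kind of automated search can be approximated in miniature with an off-the-shelf hyper-parameter search. The sketch below uses scikit-learn's GridSearchCV as an assumed stand-in; it is not Michelangelo's actual AutoML mechanism, which the blog describes only at a high level:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=4, noise=0.2, random_state=0)

# Search a small hyper-parameter grid for the best-scoring configuration,
# analogous to AutoML searching algorithm/feature/hyper-parameter combinations
# against a user-supplied objective function.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```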
The team is also working on a model visualisation tool to help with debugging deep learning models. It also wants to build out its online learning portal for machine learning to further democratise the use of the technique across the organisation.