How to use data scientists and machine learning in the enterprise

Yandex Data Factory experts explain how to introduce data science into a business


Machine learning has become a buzzword in business technology but the implications of applying it are often overlooked.

"The major problem is that data science is science itself, and businesses aren't very well accustomed to using scientific methods of decision making," says Jane Zavalishina, CEO of machine learning and data analytics specialists Yandex Data Factory.


The company emerged as a spin-out from multinational technology corporation Yandex, the operator of the largest search engine in Russia. In December 2014, Yandex launched Yandex Data Factory to extend the data science capability it had developed for this core product into machine learning-based services for industry.

The Yandex Data Factory team establishes its findings through a process of experimentation, and its success can only be judged once the experiment concludes.

"When you delegate some work to your employee, ideally you expect more or less a complete level of results," Zavalishina explains. "But it works differently with data scientists, because with data science you cannot expect guaranteed results."

Failure is a legitimate outcome of any data science project, and this is a prospect business managers must accept.

What makes a data scientist tick?

Working with data scientists requires an alternative approach to business in which logic overrules creativity and reality trumps belief. In other words, it depends on fact and logic rather than imagining what could be possible.

It’ll be a struggle, then, to task data scientists with questions that they fundamentally consider meaningless.

"It sounds like division by zero, it doesn't make sense," says Zavalishina. "The problem is you can't make them do this; you cannot motivate people to divide by zero. They start thinking you're probably an idiot, which doesn't make your work with them better."

They need to understand the project and believe that it makes sense. If they are approached to use machine learning to improve systems, for example, they will need enough data to measure meaningful results.

"A lot of decisions in business are made by intuition, that's why there is no need to measure everything in regular business," says Yandex Data Factory COO Alexander Khaytin. "But then when it comes to a data science project or to communication with data scientists you can't just tell them, 'do this stuff, I feel it's going to be good.' It doesn't work."

Asking the right questions 

Predictive analytics modelling relies on algorithms that tend to be far more complex than more traditional statistical systems. They can be difficult to explain.

The retail industry often uses data science to better predict stock replenishment requirements for weekly item orders. The results can amaze, but there are so many factors to take in that the process itself is often hard to communicate.

"It's just impossible to explain to someone who cannot grasp data complexity, but because it cannot be explained, you cannot decide how good it is just based on your common sense or business intelligence," says Zavalishina. "You need to make sure that you know what it is you want to improve, and how you measure results. 

"It's not creative. It's specifics and what it tried to predict or optimise. It's like dealing with mathematicians. You ask the question and then you will receive exactly the answer to this question."

If your question is wrong, don't expect the right answer. It's a surprisingly common problem, as companies often fail to plan their objectives thoroughly or to decide how they will measure them.

"We were working with this big retail company and they asked us to build a model which would predict how much of each and every item will sell the next week," Zavalishina recalls. "We tried it with one item, but the problem was they realised that [the prediction] is practically no use for them."

Their model was precise, but the company ordered its product in packs of six rather than as individual items. If the prediction called for seven items next week, the real question was a different one: should they buy one pack or two? It may appear a small change, but it meant they had started in the wrong place. The model became entirely different, because the parameters for optimisation had shifted.
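A minimal sketch of that reframed question, under assumptions that are not from the retailer's project: weekly demand is treated as Poisson-distributed around the forecast, and `overstock_cost` and `understock_cost` are hypothetical per-unit penalties for unsold stock and lost sales. The point is only that the decision variable becomes the number of packs, and that the answer depends on those costs rather than on the forecast alone.

```python
import math

PACK_SIZE = 6  # from the article: the retailer orders in packs of six


def poisson_pmf(k, mean):
    """Probability of exactly k sales when demand is Poisson with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_cost(units_ordered, mean_demand, overstock_cost, understock_cost,
                  max_demand=60):
    """Expected penalty for ordering `units_ordered` units (assumed cost model)."""
    total = 0.0
    for demand in range(max_demand + 1):
        leftover = max(units_ordered - demand, 0)
        shortfall = max(demand - units_ordered, 0)
        penalty = leftover * overstock_cost + shortfall * understock_cost
        total += poisson_pmf(demand, mean_demand) * penalty
    return total


def best_pack_count(mean_demand, overstock_cost, understock_cost, max_packs=5):
    """Number of packs (0..max_packs) with the lowest expected penalty."""
    return min(
        range(max_packs + 1),
        key=lambda packs: expected_cost(
            packs * PACK_SIZE, mean_demand, overstock_cost, understock_cost
        ),
    )


# Forecast of seven units, two hypothetical cost settings.
cheap_shortage = best_pack_count(7, overstock_cost=2.0, understock_cost=1.0)
costly_shortage = best_pack_count(7, overstock_cost=0.5, understock_cost=5.0)
```

With the same forecast of seven units, the first setting favours one pack and the second favours two, which is why a precise per-item prediction alone did not answer the retailer's question.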

Data science requires careful planning. The company received the right answer, but should have asked a different question.

Failure on the route to success

The optimisation model provided to another retailer suggested that the expensive and unusual products they rarely sold weren't worth ordering at all. The decision was mathematically logical, but that doesn't mean it made business sense. Such items can be crucial to the shop's identity and customer base.

"You are pretty much guaranteed that with your first data science project or machine learning project you will need to get back and rethink what the metrics are and what the goals are," says Zavalishina.

Yandex usually recommends customers begin with projects that are very specific and short, to avoid the risk of a long-term investment in a project that could have meaningless results. This method allows companies to make piece-by-piece improvements across the board.

Another company had their own system to determine which customers were sent certain offers. Yandex would use the recommendations of a statistical model produced by a machine learning algorithm to determine how a random slice of the customer base was contacted. The rest of the customers were contacted according to the previous system, and the company then compared the conversion rates of offers into sales.

The only problem was that the offers were sent to the control group on Friday and to the experimental group on the weekend. The diverse patterns of behaviour at the different times of contact made any comparison meaningless.
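One way to avoid this kind of confound is to split customers at random and contact both groups in the same send window, then compare conversion rates with a standard test. The sketch below does that with a two-proportion z-test. The function names, the 10 percent treatment share and any counts passed in are illustrative assumptions, not details of the project described.

```python
import math
import random


def assign_groups(customer_ids, treatment_share=0.1, seed=0):
    """Randomly split customers into (treatment, control).

    Both groups should then be contacted on the same day and at the same
    time, so that the offer-selection method is the only difference.
    """
    rng = random.Random(seed)
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * treatment_share)
    return shuffled[:cut], shuffled[cut:]


def two_proportion_z(conversions_a, size_a, conversions_b, size_b):
    """z statistic for the difference between two conversion rates."""
    rate_a = conversions_a / size_a
    rate_b = conversions_b / size_b
    pooled = (conversions_a + conversions_b) / (size_a + size_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    return (rate_a - rate_b) / std_error


treatment, control = assign_groups(range(1000), treatment_share=0.2)
```

An absolute z value above roughly 1.96 corresponds to the conventional 5 percent significance level, but the comparison only means something if the groups differ in nothing except the model.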

Business managers often ask Yandex whether they should take courses in machine learning or data science to understand how the technology could benefit their organisations.

"What we usually answer is actually no, it doesn't make any sense," says Zavalishina. "It won't make you data scientists, so it won't really help you. If you want to be able to apply the technology in your work, you are much better off learning the scientific method and measuring and experimentation. Basically, we need a more scientific approach in the business if you want this technology to bring results."

Accepting uncertainty

Businesses need to embrace the scientific culture. Negative results don't mean the work has failed; they only prove that the optimisation didn't work.

Responsibility within the corporate structure is another challenge. Yandex was once approached by a client hoping to optimise its advertising spending. The algorithm it developed promised the same level of response while saving 20 percent in costs.

Implementing the results proved more challenging than attaining them. The staff responsible for this project were paid bonuses based on their own plans and decisions about what to buy to achieve optimal results.

"So now they have this model, which provides them with recommendations, and mathematically it is proven that the recommendations were better, but the problem is it's their responsibility," explains Zavalishina. 

Data science projects must account for the different responsibilities and priorities that can exist in the same business. This team was expected to implement a model that could result in cuts to their bonuses.

"When it comes to a scientific approach it's much more rational, much more measurable, and this can be quite a conflicted situation," adds Khaytin.

"The usual decision-making purpose is going to be at least disrupted. For example, an expert can tell you 'I have an intuition, I have an idea, it's going to be that way'. In our hands you have some data science tool, some data science project and it's totally different, there is no intuition, there is no place for it."

Integrating business and scientific approaches is a complicated process that requires patience and understanding. Yandex also worked with a steel manufacturer on optimising the mixture of materials used in the production process. Increasing the quantity of a certain substance improved quality, but the more of it that was added, the more expensive production became.

Yandex used historical data to build an accurate model of how best to balance the quality and cost of the mixture, and returned with a recipe produced by a machine learning algorithm.

"This recipe often doesn't make sense to them," says Zavalishina. "They look and say 'no it won't work, I cannot do that, I'm not accepting this, I'm doing something different'.

"The funny thing is it will bring better optimisation, but on the other hand you have the experts' [preferences], so how do you deal with that? They are basically not using 80 percent of your recommendations.

"We came up with a solution, which would be another algorithm which looks at the recipe we provided, and on top of that builds the prediction of how probable it is to be accepted by that operator. So [we] optimise the recipe so [it becomes] a bit less optimal from a strictly mathematical point of view, but much more probable to be accepted by humans."

Fears have long been expressed that artificial intelligence could destroy humankind, but the marriage between man and machine learning remains at the foundation of data science.
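The acceptance-aware selection Zavalishina describes for the steel recipes can be sketched as choosing the recipe with the highest expected saving, where a recipe only pays off if the operator actually uses it. Everything concrete here is hypothetical: the `Recipe` fields, the candidate names and the numbers. In the project described, the acceptance probability came from a second model; in this sketch it is simply supplied as an input.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    name: str
    saving: float                  # hypothetical saving if the recipe is used
    acceptance_probability: float  # assumed output of a separate acceptance model


def expected_saving(recipe: Recipe) -> float:
    """Saving weighted by the chance that the operator accepts the recipe."""
    return recipe.saving * recipe.acceptance_probability


def pick_recipe(candidates: list[Recipe]) -> Recipe:
    """Recipe with the highest expected saving, not the highest raw saving."""
    return max(candidates, key=expected_saving)


# Illustrative candidates only; none of these figures come from the article.
candidates = [
    Recipe("mathematical optimum", saving=100.0, acceptance_probability=0.2),
    Recipe("closer to habit", saving=80.0, acceptance_probability=0.7),
]
chosen = pick_recipe(candidates)
```

Under these made-up figures the slightly less optimal recipe wins, because a recommendation that is ignored saves nothing.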