Tsonga v Wawrinka. Djokovic v Nadal. Murray v Federer. Sounds pretty straightforward: one wins, the other loses. Right?
Actually, there’s far more to it than that – at least according to Murray Swartzberg, senior vice president of men’s tennis association ATP.
The association has just launched a new website that draws on data from tennis matches over the last 25 years. The data is constantly updated, for example with statistics from the ongoing ATP World Tour.
“We started by asking three simple questions: who is the best server, why that is statistically, then which is the best returner in the world, and who is the best player under pressure, according to the numbers,” Swartzberg tells ComputerworldUK.
These statistics are presented as a leaderboard on the website. However, the online tool allows fans to go much further, comparing players by surface (clay, grass or hard court), by year, across a career, against their own previous performances, and so on.
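To make the idea of a filterable leaderboard concrete, here is a minimal sketch in Python. It is not the ATP's or Infosys's implementation: the record fields, the `leaderboard` function, the sample players and the metric (share of service points won) are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical per-match records; field names and values are illustrative only.
SAMPLE_RECORDS = [
    {"player": "Player A", "surface": "clay", "year": 2015,
     "service_points_won": 60, "service_points_played": 100},
    {"player": "Player B", "surface": "clay", "year": 2015,
     "service_points_won": 70, "service_points_played": 100},
    {"player": "Player A", "surface": "grass", "year": 2015,
     "service_points_won": 90, "service_points_played": 100},
]


def leaderboard(records, surface=None, year=None):
    """Rank players by share of service points won, optionally filtered."""
    totals = defaultdict(lambda: [0, 0])  # player -> [points won, points played]
    for rec in records:
        if surface is not None and rec["surface"] != surface:
            continue
        if year is not None and rec["year"] != year:
            continue
        totals[rec["player"]][0] += rec["service_points_won"]
        totals[rec["player"]][1] += rec["service_points_played"]
    ranked = [
        (player, won / played)
        for player, (won, played) in totals.items()
        if played > 0  # skip players with no points in the filtered set
    ]
    # Highest share first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for player, share in leaderboard(SAMPLE_RECORDS, surface="clay"):
        print(f"{player}: {share:.0%}")
```

Filtering before aggregating is what lets the same records answer different questions: in this made-up sample, the ranking on clay differs from the ranking across all surfaces.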
“We want to make the game more appealing to a broad audience. And a great way to do that is to explain the game of tennis using statistics,” he adds.
The association signed a partnership deal with IT services company (and ATP sponsor) Infosys for the project in October 2015 (the Indian multinational’s CEO Dr Vishal Sikka is understood to be a big tennis fan).
The partners started doing trials of the technology at the 2016 Apia International Sydney in January before launching the website in April. The underlying technology includes Apache Hadoop and machine learning, according to Swartzberg.
“Before we go into machine learning, we have to do a bit of human learning of course. But what it does is give us the ability to correlate vast amounts of data and look for anomalies in data that we wouldn’t otherwise find,” he says.
There have been some challenges along the way, of course. When processing vast amounts of data, the quality of that data and the ability to correlate it become vital. The team have had to manage a wide array of inputs, ranging from structured data pulled from SQL databases and semi-structured data in XML format to unstructured feeds such as news articles and social media.
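The correlation problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual pipeline (which the article says is built on Apache Hadoop): the `serve_stats` table, the XML layout, the field names and the sample values are all hypothetical. The sketch joins structured SQL rows and semi-structured XML records on a shared (match, player) key.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_sql_rows(conn):
    """Read structured serve statistics from a (hypothetical) SQL table."""
    cur = conn.execute("SELECT match_id, player, aces FROM serve_stats")
    return {(match_id, player): {"aces": aces} for match_id, player, aces in cur}


def load_xml_rows(xml_text):
    """Read semi-structured return statistics from a (hypothetical) XML feed."""
    root = ET.fromstring(xml_text)
    rows = {}
    for node in root.iter("return"):
        key = (int(node.get("match_id")), node.get("player"))
        rows[key] = {"return_points_won": int(node.get("points_won"))}
    return rows


def correlate(sql_rows, xml_rows):
    """Merge both sources on the shared (match_id, player) key."""
    merged = {}
    for key in sql_rows.keys() | xml_rows.keys():
        record = {}
        record.update(sql_rows.get(key, {}))
        record.update(xml_rows.get(key, {}))
        merged[key] = record
    return merged


def build_demo():
    """Build a tiny in-memory example with made-up values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serve_stats (match_id INTEGER, player TEXT, aces INTEGER)"
    )
    conn.executemany(
        "INSERT INTO serve_stats VALUES (?, ?, ?)",
        [(1, "Player A", 12), (1, "Player B", 7)],
    )
    xml_text = (
        '<stats><return match_id="1" player="Player A" points_won="31"/></stats>'
    )
    return correlate(load_sql_rows(conn), load_xml_rows(xml_text))


if __name__ == "__main__":
    for key, record in sorted(build_demo().items()):
        print(key, record)
```

Records that appear in only one source are kept with whatever fields are available, which makes gaps in data quality visible rather than silently dropping them. Unstructured inputs such as news articles would need an extra extraction step before they could be keyed this way.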
“Since early on in the process we’ve had challenges in correlating unstructured data. Tennis is a data-driven business and we’ve amassed huge amounts of data historically in many different formats,” Swartzberg explains.
“We want to give fans the tools to answer questions. ‘What does it take to be number one in the world?’ We can evaluate the performance of all previous number ones. Or ‘who’s the best comeback player?’ We can see who’s won after going two points down in a game,” Swartzberg says.
However, the tool isn’t just useful for fans. It also offers new ways to evaluate players, he says.
“A coach can see the breakdown of their player’s performance against the opponent. Or a journalist writing a match report can see where strengths and weaknesses are, or it could be used in the commentary booth. It’s all about how to make the game more attractive,” he explains.
The next step is to keep developing and updating the platform so that it can provide insights while a match is in progress, according to Swartzberg.
“That way we can provide a new level of commentary. Commentators can talk about whether at a given stage of the match, is this player playing above or below their usual performance against this opponent? How does it compare to this player, this surface? It’s all about making tennis more fun.”