The studio behind the hugely popular Xbox game Gears of War is using AI bots and data analytics to cut down on in-game crashes.
Phil Cousins is principal engineer at The Coalition, the first-party Microsoft studio that makes the Gears of War series. It is his job to bring together the artistic and technical sides of the games and make sure they run up to speed on both PC and Xbox. Cousins wants to use data and analytics to catch problems with the game before the end user does.
Cousins’ team uses a complex stack of tools including out-of-the-box products like Adobe Photoshop, Unreal Engine, and Visual Studio along with its own custom software, plus anything coming in from its dozens of outsourcing partners. But logging formats vary from tool to tool, and this creates problems.
Splunk solved a lot of these issues for The Coalition. Speaking at Splunk's .conf user conference this week, Cousins said Splunk provided four features the team really liked.
"The first was that we require no centralised schema for our logs," he explains. "There was also a vast array of universal plugins to get that data flowing instantly without having to write a bunch of stuff. Then there was great searching and visualisation, and lastly it was easy to create alerts and reports for when our servers went down."
In short: "We could start to see key insights into a lot of our tools which we couldn’t before."
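To illustrate what working without a centralised schema can look like, here is a minimal Python sketch that extracts fields from each log line at read time instead of forcing every tool into one format. It is not Splunk's implementation or The Coalition's code; the field names (`tool`, `level`, `msg`), the sample lines and the helper functions are all hypothetical.

```python
import json
import re

# Hypothetical key=value pattern; real tools log in many other shapes.
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Extract whatever fields one raw log line offers, with no fixed schema."""
    line = line.strip()
    if line.startswith("{"):
        try:
            parsed = json.loads(line)
            if isinstance(parsed, dict):
                parsed["_raw"] = line
                return parsed
        except json.JSONDecodeError:
            pass  # not valid JSON after all, fall back to key=value parsing
    fields = {key: value.strip('"') for key, value in KV_PATTERN.findall(line)}
    fields["_raw"] = line  # always keep the original text for later searches
    return fields


def search(events, **criteria):
    """Return the events whose extracted fields match every criterion."""
    return [
        event for event in events
        if all(str(event.get(key)) == str(value) for key, value in criteria.items())
    ]


# Made-up lines from two imaginary tools that log in different formats.
raw_lines = [
    '{"tool": "build", "level": "error", "msg": "link failed"}',
    'tool=editor level=warn msg="texture missing"',
    'tool=editor level=error msg="out of memory"',
]
events = [parse_line(line) for line in raw_lines]
errors = search(events, level="error")
```

Because fields are pulled out when the data is read, a new tool with a new format only needs a new extraction rule, not a change to every existing log source.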
Naturally the studio started small, setting up an instance of Splunk and treating it like an IT operations manager would, feeding it well-formatted logs for things like disks, CPU, network and P4Admin to react quicker to errors.
Cousins says that once the studio had implemented Splunk for logging operational data, others in the company started asking about reports. It has since increased the Splunk licence to allow for the ingestion of 40GB of data a day, which will set you back roughly £1,200 per gigabyte per year on a perpetual licence.
The challenge for Cousins and his team was to present this data in a format that yields actionable insights, rather than as a purely technical dashboard.
Cousins says the team tweaked the metrics into things the company knew would be relevant to people like designers, artists and quality assurance people. "So these became frames-per-second, memory usage, crashes and test coverage, if someone had actually been to an area in the game," he says.
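As a rough illustration of how raw samples could be rolled up into those four metrics, the Python sketch below summarises frames-per-second, memory usage, crashes and coverage per game area. The record layout, the area names and the `summarise` helper are assumptions made for this example, not the studio's actual data model.

```python
from collections import defaultdict


def summarise(samples, all_areas):
    """Roll raw per-sample records up into per-area metrics."""
    by_area = defaultdict(list)
    for sample in samples:
        by_area[sample["area"]].append(sample)

    report = {}
    for area in all_areas:
        rows = by_area.get(area, [])
        report[area] = {
            # Coverage: has anyone (or any bot) actually been to this area?
            "visited": bool(rows),
            "avg_fps": sum(r["fps"] for r in rows) / len(rows) if rows else None,
            "peak_memory_mb": max((r["memory_mb"] for r in rows), default=None),
            "crashes": sum(1 for r in rows if r["crashed"]),
        }
    return report


# Hypothetical samples; the area names are invented for the example.
samples = [
    {"area": "courtyard", "fps": 60.0, "memory_mb": 3100, "crashed": False},
    {"area": "courtyard", "fps": 30.0, "memory_mb": 3400, "crashed": True},
    {"area": "bridge", "fps": 58.0, "memory_mb": 2900, "crashed": False},
]
report = summarise(samples, ["courtyard", "bridge", "sewer"])

# Share of known areas that were visited at least once.
coverage = sum(entry["visited"] for entry in report.values()) / len(report)
```

An area that never appears in the samples shows up as unvisited, which is the kind of gap a designer or QA lead can act on without reading a technical dashboard.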
Bots at war
The Coalition derives relevant game data by using machine learning to spin up artificially intelligent bot players that run through the game outside office hours, with the log data ready for the engineers in the morning.
"We actually play multiple versus tests in a multiplayer game where we spin up ten bots that play against each other, and Horde mode which we run through fifty waves," Cousins says. "It can do the entire coverage now of a quality assurance (QA) team by itself."
Then, to get this into a format that designers and engineers could understand, the studio built its own Splunk app based on IT Service Intelligence. The dashboard features the same metrics to explore, plus a heat map to show where errors are occurring, so engineers can jump to that point in the game and investigate.
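One simple way to build a heat map like this is to bucket error positions into grid cells and count the hits per cell. The Python sketch below assumes each error is logged with 2D map coordinates and uses an arbitrary cell size; both are assumptions for illustration, and the article does not describe how the studio's app computes its heat map.

```python
from collections import Counter


def heat_map(positions, cell_size):
    """Count how many error positions fall into each square grid cell."""
    counts = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical (x, y) map coordinates at which errors were logged.
error_positions = [(12.0, 7.5), (14.9, 9.0), (15.0, 9.0), (103.2, 48.1)]
grid = heat_map(error_positions, cell_size=10.0)

# The busiest cell is the first place an engineer might jump to.
hottest_cell, hits = grid.most_common(1)[0]
```

A coarser cell size gives a smoother overview, while a finer one pinpoints the exact spot at the cost of spreading related errors across neighbouring cells.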
The early teething problems with Splunk mainly revolved around The Coalition’s topology. Cousins had some advice for anyone starting out with Splunk to avoid the mistakes he made.
He explains: "Our indexers started falling behind as we started throwing more data at it. So we rebuilt our topology to have a single search head and a bunch of indexers that index on separate machines. Then we have a separate licence deployment server and a bunch of forwarders with SQL server on the side which we inject into."