With nearly one billion users worldwide and 500 million people visiting its social network every day, Facebook has its work cut out for it in managing its systems. To help do that, the company has been developing its own management and development tools tuned to its specific needs, rather than relying on commercial offerings.
Tools in use include Perflab, for testing site changes committed by engineers; Gatekeeper, for advanced A/B testing of code changes; and Claspin, providing a high-density heatmap for viewing a large set of servers. "We spent a lot of time building up the internal tool stack," said Jay Parikh, Facebook vice president of infrastructure engineering, at the O'Reilly Velocity conference in Santa Clara, California. The conference is focused on web performance and operations, with Facebook serving as a prime example of the demands being made on the web.
With Perflab, Facebook can test every code change committed by engineers. The tool helps Facebook push through thousands of code revisions per week. It also tracks back-end metrics, such as CPU usage and data-fetching. Gatekeeper, Parikh said, is "essentially an A/B testing framework on super steroids." It separates the release of code from the activation of a feature in production. Claspin, meanwhile, gives a view of distributed systems in Facebook's infrastructure. "We're able to spot oncoming or up and coming problems and be able to drill down very quickly with just a couple of clicks."
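The idea of separating code release from feature activation can be illustrated with a small feature-gate sketch. This is not Gatekeeper's actual design, which the talk did not detail; the `FeatureGate` class, the `new_feed_ranking` feature name, and the hash-based bucketing are all assumptions made for illustration. The new code path ships to production switched off, and a rollout percentage decides which users see it.

```python
import hashlib


class FeatureGate:
    """Hypothetical feature gate: code ships dark, activation is a config change."""

    def __init__(self):
        # Maps feature name -> rollout percentage (0 to 100).
        self._rollout = {}

    def set_rollout(self, feature, percent):
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[feature] = percent

    def is_enabled(self, feature, user_id):
        # Unknown features default to 0%, so released code stays inactive.
        percent = self._rollout.get(feature, 0)
        # Hash feature and user together so each user lands in a stable
        # bucket per feature, independent of other features.
        key = f"{feature}:{user_id}".encode("utf-8")
        digest = hashlib.sha256(key).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < percent


def render_feed(gate, user_id):
    # Both code paths are deployed; the gate picks one per user.
    if gate.is_enabled("new_feed_ranking", user_id):
        return "new"
    return "old"
```

Because a user's bucket is fixed, raising the percentage only adds users to the test group, which keeps the comparison between the two groups stable as a rollout widens.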
Facebook has built dozens of its own tools, Parikh said. While Facebook does not commercialise these tools, it does offer them via open source on occasion, as it did with its Phabricator software fabrication tool last year, Parikh said. No decision has been made yet on whether Claspin, Gatekeeper, or Perflab could go this route. "These tools also are very ingrained with our system, so they're not easily generalisable. So we're not sure it would make sense to open source them yet."
Facebook has big tasks to undertake in the data management and coding realms. "Today, we will ingest 10 terabytes of log data into Hadoop" in about 30 minutes, Parikh said. Facebook also will have six million photos uploaded and 160 million newsfeed stories created in that amount of time. The company pushes out 700 million code changes to its users at least once a day.