Internet giants embrace open source: 12 technologies open-sourced by Facebook, Google, Twitter, LinkedIn and others
Update: Google has now joined Facebook's Open Compute Project, offering more insight into the operation of its huge data centres and allowing others to benefit from its energy-efficient hardware designs. Read on to find out how some of the biggest web firms have opened up their technologies to others in the past.
Open source projects often begin as a business's attempt to solve an internal problem, before being offered up to a wider community. In recent years it has increasingly been internet giants such as Facebook, Twitter, LinkedIn and Google that have offered up the technology underpinning their hyper-scale computing environments.
Because they own some of the biggest data centres in the world, these companies must manage data on an unprecedented scale, which has required some pioneering techniques. In some cases, these techniques gradually filter down to a wider range of businesses.
And why reveal their trade secrets to the wider world? In return for offering others a glimpse of their internal systems, these internet firms gain access to a vibrant community of developers that can improve their own technology for free. For example, Facebook has claimed that its Open Compute Project has saved it $2 billion in data centre costs.
Here are some of the innovative technologies from the big web companies that are now seeing wider use:
1. Google – MapReduce
Google has released over 20 million lines of code and hundreds of open source projects.
One of its most notable creations is the MapReduce programming model, which allowed it to crunch huge data sets across large clusters of servers. Although Google no longer uses it, MapReduce, together with the Google File System, inspired the open source Hadoop platform. Hadoop has been widely adopted since its creation by former Yahoo employee Doug Cutting, and a number of IT vendors now sell their own services based on the software.
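To illustrate the programming model, here is a minimal single-machine sketch in Python of the classic word-count example. This is not Google's or Hadoop's code. The function names are hypothetical, and the map, shuffle and reduce steps that a real cluster would spread across many servers simply run one after another in a single process.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # when routing pairs from mappers to reducers.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all the values for one key into a single result.
    return key, sum(values)


def map_reduce(documents: list[str]) -> dict[str, int]:
    # On a cluster each document would be mapped on a different server;
    # here every phase runs sequentially in one process.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["the quick brown fox", "the lazy dog", "The end"]))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

The appeal of the model is that the map and reduce functions contain no distribution logic at all: the framework handles splitting the input, moving data between machines and retrying failed work.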
2. Google - Kubernetes
Containerisation has been one of the biggest buzzwords of recent years, though many have pointed out that the technology is not new. Google reportedly launched around two billion containers a week to manage applications in its data centres, relying on its secretive Borg and Omega technologies to run workloads internally for years.
These platforms provided the basis for its open source Kubernetes container cluster management platform, which has been publicly available since June 2014. Kubernetes has been adopted by a range of large businesses looking for a lightweight alternative to virtual machines.
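Kubernetes works declaratively: users state how many copies of an application should run, and controllers keep nudging the cluster towards that state. The Python sketch below illustrates that reconciliation idea. The Cluster class and reconcile function are hypothetical stand-ins, not the Kubernetes API, and real controllers react to events from the cluster rather than being called by hand.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical stand-in for the state a controller observes:
    # application name -> names of its running containers.
    running: dict[str, list[str]] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count)

    def start(self, app: str) -> None:
        self.running.setdefault(app, []).append(f"{app}-{next(self._ids)}")

    def stop(self, app: str) -> None:
        self.running[app].pop()


def reconcile(cluster: Cluster, desired: dict[str, int]) -> None:
    # Compare the declared replica count with what is actually running,
    # then start or stop containers until the two match.
    for app, want in desired.items():
        have = len(cluster.running.get(app, []))
        for _ in range(want - have):
            cluster.start(app)
        for _ in range(have - want):
            cluster.stop(app)


if __name__ == "__main__":
    cluster = Cluster()
    reconcile(cluster, {"web": 3})
    cluster.running["web"].pop(0)   # simulate a container crashing
    reconcile(cluster, {"web": 3})  # the controller replaces it
    print(cluster.running["web"])   # ['web-1', 'web-2', 'web-3']
```

Because the loop only ever compares desired and actual state, the same code handles a fresh deployment, a crashed container and a scale-down request.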
3. Google - TensorFlow
Google's AI system TensorFlow sits at the heart of capabilities such as search in Google Photos, voice recognition and Google Translate.
The machine learning tool was open-sourced last month to help accelerate wider development of the technology, which is still in its infancy.
"It’s a highly scalable machine learning system," Google CEO Sundar Pichai said of TensorFlow in a blog post. "TensorFlow is faster, smarter, and more flexible than our old system, so it can be adapted much more easily to new products and research."
4. Facebook - Open Compute Project
Facebook has taken an interesting approach to its open source endeavours, focusing to a large degree on hardware rather than software.
The social media giant launched the initiative four years ago to share its data centre design innovations, in an attempt to “revolutionise data centre hardware”. According to the company:
“The result is that today we have open-sourced every major physical component of our data centre stack — a stack that is powerful enough to connect 1.39 billion people around the world and is efficient enough to have saved us $2 billion in infrastructure costs over the last three years. But we’re not finished — not even close.”
Initiatives include Yosemite, announced earlier this year, which Facebook describes as the 'first open source modular chassis for high powered microservers'.
Whether the OCP will filter down to more mainstream businesses remains to be seen. Some large enterprises already use it, notably big banks such as Goldman Sachs, which is represented on the board, and a number of tech firms such as Apple and Microsoft have been won over.
5. Google - Open Compute Project
Google announced on Wednesday that it would join the OCP, adding to the ranks of service providers and some of the world's largest banks, such as Goldman Sachs and Bank of America.
Its first contribution will be a new server rack design that distributes power to servers at 48 volts, compared with the 12 volts common in most data centres, which could help cloud data centres cut their energy bills.
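A quick worked calculation shows why a higher distribution voltage helps. It assumes the same power P is delivered through conductors of the same resistance R and ignores conversion losses at either end. Because current falls as voltage rises, resistive losses fall with the square of the voltage:

```latex
\begin{aligned}
I &= \frac{P}{V}, \qquad P_{\mathrm{loss}} = I^{2}R = \frac{P^{2}R}{V^{2}} \\
\frac{P_{\mathrm{loss}}(48\,\mathrm{V})}{P_{\mathrm{loss}}(12\,\mathrm{V})}
  &= \left(\frac{12\,\mathrm{V}}{48\,\mathrm{V}}\right)^{2} = \frac{1}{16}
\end{aligned}
```

All else being equal, quadrupling the voltage cuts these distribution losses to about one-sixteenth. How much that trims a data centre's total energy bill depends on how large a share of its consumption those losses represent.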
"Today’s launch is a first step in a larger effort. We think there are other areas of possible collaboration with OCP," wrote John Zipfel, technical program manager, Google, in a blog post.
"We’ve recently begun engaging the industry to identify better disk solutions for cloud based applications. And we think that we can work with OCP to go even further, looking up the software stack to standardise server and networking management systems."
6. Facebook - Big Sur
Facebook recently took its open source hardware concept further with Big Sur, a server design targeted directly at AI use cases. By sharing its server blueprints, the social network went a step further than Google's recent decision to open source its AI software library, TensorFlow.
Facebook recently began to build custom servers based around Nvidia GPUs – chips that were originally intended for rendering computer game images but have proved to be well suited to deep learning.
7. Facebook – Torch
Facebook uses deep learning internally, for example to filter information in users' feeds, and has open-sourced some of the modules it created for the Torch deep learning framework. Facebook claims that the algorithms created by its Facebook AI Research (FAIR) team are faster than those already available in Torch, which is also used by others such as Google and Twitter.
8. Facebook – Cassandra
Non-relational database Cassandra was created by Facebook engineers Avinash Lakshman and Prashant Malik as a means to power its inbox search function.
“Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure,” said Lakshman upon its public release in 2008.
Although Facebook no longer uses Cassandra itself, it is used by other large tech firms such as Twitter, Netflix and Apple, while five-year-old software firm DataStax is helping to popularise the technology among more traditional enterprises.
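Cassandra avoids a single point of failure by spreading data around a ring of peer nodes and copying each row to several of them. The Python sketch below shows a simplified version of that idea, assuming one token per node and a simple "next N nodes clockwise" replica placement. The Ring class is hypothetical, and MD5 stands in for the Murmur3 hash used by Cassandra's default partitioner.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    # Place a key (or node) on the ring. MD5 is a stand-in for Cassandra's
    # Murmur3 partitioner, chosen only because it ships with Python.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: list[str], replication_factor: int = 3) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # One token per node, sorted so the ring can be searched with bisect.
        self.tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, key: str) -> list[str]:
        # Start at the first node clockwise from the key's token and take
        # the next N nodes, wrapping round the end of the ring.
        start = bisect.bisect_left(self.tokens, (token(key), ""))
        count = len(self.tokens)
        return [self.tokens[(start + i) % count][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    # Any of these three nodes can serve the row, so losing one is survivable.
    print(ring.replicas("user:42"))
```

Because every node can compute the same placement from the key alone, there is no central directory that could become a bottleneck or a single point of failure.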
9. Twitter – Aurora, Storm
The social media firm is a major open source software user, and has contributed back to the community in a number of ways.
Its Aurora framework was created by Bill Farner, a former Google engineer, drawing on Google's Borg cluster management system.
Aurora builds on top of Apache Mesos and provides common features that allow any site to run large-scale production applications. It is able to make scheduling decisions, such as moving a service onto a healthy machine in the event of a failure, ensuring greater reliability.
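The failover behaviour described above can be illustrated with a toy scheduler. This Python sketch is hypothetical and greatly simplified. Real Aurora works with resource offers from Mesos and tracks far more state; here a "machine" is just a set of task names, and tasks on an unhealthy machine are moved to the least-loaded healthy one.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    healthy: bool = True
    tasks: set[str] = field(default_factory=set)


def place(task: str, machines: list[Machine]) -> Machine:
    # Pick the healthy machine currently running the fewest tasks.
    candidates = [m for m in machines if m.healthy]
    if not candidates:
        raise RuntimeError(f"no healthy machine available for {task}")
    target = min(candidates, key=lambda m: len(m.tasks))
    target.tasks.add(task)
    return target


def handle_failures(machines: list[Machine]) -> dict[str, str]:
    # Move every task off unhealthy machines; return task -> new machine.
    moved: dict[str, str] = {}
    for machine in machines:
        if not machine.healthy and machine.tasks:
            orphaned, machine.tasks = machine.tasks, set()
            for task in sorted(orphaned):
                moved[task] = place(task, machines).name
    return moved


if __name__ == "__main__":
    machines = [Machine("m1"), Machine("m2"), Machine("m3")]
    for task in ["web", "search", "ads"]:
        place(task, machines)
    machines[0].healthy = False        # m1 fails its health check
    print(handle_failures(machines))   # {'web': 'm2'}
```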
“Aurora is software that keeps services running in the face of many types of failure, and provides engineers a convenient, automated way to create and update these services,” the company said in a blog post. A number of other companies are now using the software.
Other projects include Bootstrap and Storm, which Twitter uses to analyse the large-scale data streams generated by millions of Twitter feeds.
10. Netflix – Chaos Monkey
As a major AWS user, Netflix wanted a way to test the resiliency of its applications running in the cloud. Chaos Monkey was built to artificially create problems with virtual machines hosted by the public cloud provider, as part of a "Simian Army" that Netflix unleashes to test that its systems are able to react to random failures on the network.
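The core of the Chaos Monkey idea can be sketched in a few lines: pick random instances to kill, then check whether the service as a whole survives. The Python below is a hypothetical illustration, not Netflix's code. It only chooses "victims" from named groups of instances and does not call any cloud provider's API.

```python
from __future__ import annotations

import random


def pick_victims(groups: dict[str, list[str]], probability: float,
                 rng: random.Random) -> list[str]:
    # For each group of interchangeable instances, decide at random whether
    # to strike; if so, pick one instance in that group to terminate.
    victims = []
    for _, instances in sorted(groups.items()):
        if instances and rng.random() < probability:
            victims.append(rng.choice(instances))
    return victims


def service_survives(groups: dict[str, list[str]], victims: list[str]) -> bool:
    # The service stays up only if every group keeps at least one instance.
    dead = set(victims)
    return all(any(i not in dead for i in instances)
               for instances in groups.values())


if __name__ == "__main__":
    groups = {"api": ["api-1", "api-2", "api-3"], "db": ["db-1"]}
    victims = pick_victims(groups, probability=1.0, rng=random.Random(7))
    print(victims, service_survives(groups, victims))
```

With only one database instance in this example, a single random failure takes the whole service down, which is exactly the kind of weakness such testing is meant to expose before a real outage does.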
11. LinkedIn – Kafka
Kafka was created by business networking site LinkedIn for internal use, before being open sourced in 2011.
The team of engineers that created the real-time, distributed messaging system left the company last year to set up Confluent, a new business focused on Kafka.
12. Airbnb – Airflow
Last year, at its OpenAir engineering summit, the home-rental firm unveiled plans to open source two data mining tools, Airflow and Aerosolve.
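At its heart, Kafka stores each topic as a set of append-only, partitioned logs, and consumers keep track of their own position (offset) in each partition. The in-memory Python sketch below illustrates that design. The Topic and Consumer classes are hypothetical and bear no relation to Kafka's actual client API, and CRC32 stands in for the murmur2 hash that Kafka's default Java partitioner uses for keyed messages.

```python
from __future__ import annotations

import zlib
from collections import defaultdict


class Topic:
    def __init__(self, partitions: int) -> None:
        # Each partition is an append-only list; a message's index is its offset.
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(partitions)]

    def produce(self, key: str, value: str) -> tuple[int, int]:
        # Messages with the same key always go to the same partition, which
        # preserves their relative order. CRC32 stands in for Kafka's murmur2.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1


class Consumer:
    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        # The consumer, not the log, remembers how far it has read.
        self.offsets: defaultdict[int, int] = defaultdict(int)

    def poll(self, partition: int) -> list[tuple[str, str]]:
        # Return everything appended since the last poll and advance the offset.
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        self.offsets[partition] = len(log)
        return log[start:]


if __name__ == "__main__":
    topic = Topic(partitions=3)
    partition, _ = topic.produce("user-1", "login")
    topic.produce("user-1", "logout")
    print(Consumer(topic).poll(partition))  # [('user-1', 'login'), ('user-1', 'logout')]
```

Because reading never removes anything from the log, a second consumer can replay the same messages from the start, which is one reason this design suits feeding the same stream of data to several systems.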
Airflow is a data workflow management framework that is available under the Apache licence, supporting authoring, scheduling and monitoring of data pipelines.
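Airflow models a pipeline as a directed acyclic graph (DAG) of tasks, each of which runs only once its upstream dependencies have finished. The Python sketch below shows that scheduling idea using the standard library's graphlib module (Python 3.9 or later). run_pipeline is a hypothetical helper, not Airflow's API, and it simply stops the whole run at the first failed task.

```python
from __future__ import annotations

from graphlib import TopologicalSorter
from typing import Callable


def run_pipeline(tasks: dict[str, Callable[[], None]],
                 upstream: dict[str, set[str]]) -> dict[str, str]:
    # Build the DAG: every task, mapped to the tasks it depends on.
    graph = {name: upstream.get(name, set()) for name in tasks}
    unknown = set().union(*graph.values()) - set(tasks)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")
    status: dict[str, str] = {}
    # static_order() yields each task after all of its dependencies and
    # raises graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
            break  # stop the whole run; no further tasks are started
    return status


if __name__ == "__main__":
    pipeline = {
        "extract": lambda: print("extracting"),
        "transform": lambda: print("transforming"),
        "load": lambda: print("loading"),
    }
    print(run_pipeline(pipeline, {"transform": {"extract"}, "load": {"transform"}}))
```

The returned status map is a crude stand-in for the monitoring side of a workflow tool: it records which steps ran and where a run broke down.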
It has also opened up Aerosolve, a machine learning tool that it uses internally to support features such as its price recommendations engine for those renting out properties.