that they were finally taking steps to ensure their users selected an open source license for repositories created on the service. Their offering is promising: an additional step in the process of creating a new repository prompts for the selection of an open source license, and for those needing help choosing one they have made a new web micro-site which offers simple (if sketchy and perhaps slightly partisan) license analysis. A full discussion of the changes is also available.
They have more to come: they have staff working on a tool to add licenses to existing projects, and both tools are themselves hosted on Github and accept pull requests from all comers. Anything you don't like, you can help fix. All very pleasing, and a welcome change from the answers I was getting when I wrote the article that exposed the problem.
About the same time this news broke, I heard from source code analysis specialist Black Duck. They told me that, prior to the Github announcement (which they had not expected), they had analysed the projects on Github, using both their own proprietary scanning software and also inspection by their analysts. They asserted that "while 60 percent of OSS projects have a clearly declared license, 40 percent of the world’s OSS projects have no explicitly declared or identifiable license, raising questions for many". This statistic relies heavily on their study of Github; they state that "77 percent of projects on GitHub have no declared license, compared to only seven percent of projects from all other open source forges."
In other words, there's not much of a problem anywhere but Github. That 77% number sounds pretty scary all the same. But their press release needs closer study. It turns out that "of this 77 percent, 42 percent of GitHub projects actually have embedded licenses", a slightly ambiguous statement. I asked Black Duck's Dave Gruber what it meant to have an "embedded license". He told me it meant a valid open source license existed in the project, but was declared only in the source code itself and not in a README or LICENSE file in the file structure. The code was licensed; it just wasn't possible to determine that fact with a cursory scan.
While that makes it harder to see at a glance what license is in use, it doesn't mean the code lacks an open source license or presents any danger. If you were thinking of using or even contributing to the project, you would be sure to open the source files and take a look. So I also asked Gruber to clarify the real numbers, since the percentages had been stated ambiguously and with unclear terminology.
Gruber told me that Black Duck did not scan the whole of Github; rather, it restricted its analysis to projects that had active forks, assuming those to be in collaborative use. When I wrote my original article, community members told me they had used scripts to determine that only 25-30% of Github projects had no license at all, and Gruber acknowledged that Black Duck's expensively achieved analysis actually showed a figure in the same range.
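The distinction between a "declared" license and an "embedded" one is easy to illustrate with a small script. The sketch below is not Black Duck's scanner (which is proprietary) nor the community scripts mentioned above; it is a hypothetical heuristic, assuming that a top-level LICENSE/COPYING file or a license mention in the README counts as "declared", that a license marker found only near the top of source files counts as "embedded", and that anything else counts as "none". The function name `classify_repo`, the file lists and the regular expression are all illustrative assumptions.

```python
import re
from pathlib import Path

# Hypothetical: top-level file names treated as an explicit license declaration.
DECLARATION_FILES = {"license", "license.txt", "license.md", "copying", "unlicense"}
README_FILES = {"readme", "readme.md", "readme.txt", "readme.rst"}
# Hypothetical: source file types inspected for embedded license headers.
SOURCE_SUFFIXES = {".py", ".js", ".c", ".h", ".java", ".go", ".rb", ".rs"}

# Rough markers of a license statement; a real scanner would match full license texts.
LICENSE_PATTERN = re.compile(
    r"SPDX-License-Identifier|Apache License|MIT License"
    r"|GNU (?:Lesser |Affero )?General Public License|\bBSD\b",
    re.IGNORECASE,
)


def classify_repo(root: Path) -> str:
    """Return 'declared', 'embedded' or 'none' for a local checkout (heuristic)."""
    top_level = {p.name.lower(): p for p in root.iterdir() if p.is_file()}

    # 1. A dedicated license file at the top level is an explicit declaration.
    if top_level.keys() & DECLARATION_FILES:
        return "declared"

    # 2. A README that names a license also counts as declared.
    for name in top_level.keys() & README_FILES:
        if LICENSE_PATTERN.search(top_level[name].read_text(errors="ignore")):
            return "declared"

    # 3. Otherwise look for license headers inside the source files themselves.
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in SOURCE_SUFFIXES:
            # Only the head of each file, where license headers usually sit.
            head = path.read_text(errors="ignore")[:4096]
            if LICENSE_PATTERN.search(head):
                return "embedded"

    return "none"
```

Run against a directory of checkouts, a tally of the three results gives the kind of breakdown discussed here; the point is that "no declared license" and "no license" are different buckets.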
Uncertainty and Doubt?
So the real risk is much smaller than the headline numbers suggest. In all this, I can't help feeling Black Duck want us to be afraid. It's very important that Github takes its responsibilities seriously, and its new improvements show it is starting to do so. But the headline "60% of open source is dangerous" number from Black Duck, together with the "77% of Github is dangerous" number, seems overstated. Given that their business model is to apply reassuring consulting and tools to corporate fears about open source, maybe that's not surprising. But it's regrettable.
Open source software is all about developers being able to achieve sufficient certainty to collaborate without the need to spend money on legal advice. OSI's approved licenses deliver that, and the vast majority of active open source projects have this topic sorted. Github's laissez-faire attitude, which pandered to the anti-bureaucratic instincts of a newer generation of developers, has so far made it inconvenient to identify the license in use for projects there. But that is now being sorted, and it never rose to the level of a crisis for most people.
It must have been frustrating for Black Duck to have the PR spin on their new product thwarted by Github; I just wish they had responded by toning down the "danger, danger" message. Open source has a lower compliance burden than proprietary software and its endless, custom EULAs and developer licenses. Let's shout that message, for a change.