CIO

Who really contributes to open source?

New data debunks several myths around which companies lead in open source contributions
  • Matt Asay (InfoWorld)
  • 07 February, 2018 22:00

Microsoft has been nipping at the top open source contributor position for years, but a new analysis by Adobe developer Fil Maj puts Microsoft into a whole other universe of contributions. Or, at least, of contributors.

Using the GitHub REST API to pull public profile information from all 2,060,011 GitHub users who were active in 2017 (“active” meaning ten or more commits to public projects), Maj was able to pull the total number of corporate contributors to GitHub, with results that might surprise you.

Getting at the GitHub truth around open source

Back in October 2017, Googler Felipe Hoffa tried to analyze GitHub PushEvents to understand which companies were most generously contributing to open source projects. By his estimation, Microsoft came out on top in terms of total contributors (about 1,300), compared to second-place Google (about 900 contributors), while Google topped the charts in terms of repositories receiving pushed code (about 1,100 repositories compared to Microsoft’s roughly 825).

It was an excellent attempt, but some of the data didn’t ring true. Why, for example, was Red Hat, a company completely committed to open source, so far behind Microsoft and Google? (Hoffa estimated 442 contributors and 338 repositories from Red Hat.) And though it served the “Amazon is a poor open source steward” tagline that Microsoft and Google prefer, was it possible that Amazon’s open source contributions were really as anemic as Hoffa’s estimate portrayed (134 contributors and 158 repositories)?

Probably not.

In fact, using Maj’s data, definitely not. Maj, unlike Hoffa, analyzed profile information (specifically, the company field) for GitHub’s 2-million-strong developer community. Although it’s not a perfect measure (and no attempt has yet been made to gauge the total number of repositories to which these GitHub contributors push code), it yields a much richer, more accurate data set for figuring out total contributors for any company.
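To make the company-field approach concrete, here is a minimal Python sketch of how such a tally could work. It is not Maj’s actual code: the sample profiles, the `commits` key, and the helper names `normalize_company` and `count_contributors` are hypothetical, and the normalization rule (trim, drop a leading “@”, lowercase) is an assumption about how free-text company entries might be matched. Only the ten-commit threshold and the use of the profile’s company field come from the description above.

```python
from collections import Counter

def normalize_company(raw):
    """Reduce a free-text company field to a comparable key, or None if blank."""
    if not raw:
        return None
    # Assumed rule: trim whitespace, drop a leading "@", lowercase.
    key = raw.strip().lstrip("@").strip().lower()
    return key or None

def count_contributors(profiles, min_commits=10):
    """Count active users per company; users with a blank company are skipped."""
    counts = Counter()
    for profile in profiles:
        # "Active" here means ten or more public commits, per the article.
        if profile.get("commits", 0) < min_commits:
            continue
        company = normalize_company(profile.get("company"))
        if company is not None:
            counts[company] += 1
    return counts

# Hypothetical sample data, not real GitHub users.
sample_profiles = [
    {"login": "user-a", "company": "@Microsoft", "commits": 42},
    {"login": "user-b", "company": "microsoft ", "commits": 12},
    {"login": "user-c", "company": "Google", "commits": 30},
    {"login": "user-d", "company": None, "commits": 55},
    {"login": "user-e", "company": "Red Hat", "commits": 3},
]

ranking = count_contributors(sample_profiles).most_common()
print(ranking)
```

In this sketch an active user who leaves the company field blank (`user-d`) simply drops out of the count, so a company’s total depends on how consistently its employees fill in their profiles.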

Here, then, is the revised ranking of GitHub contributors, with their total number of employees actively contributing to open source projects on GitHub:

It’s possible that, for example, Google employees neglect to add their company to their profile field, while Microsoft employees may be especially scrupulous to do so.

Even with that caveat in mind, we end up with a far bigger population of corporate contributors than Hoffa’s data set includes. (Also of note: neither data set comes up with the 16,000-plus Microsoft contributors that GitHub itself published back in 2016. GitHub’s methodology, however, remains opaque and wasn’t repeated in 2017.)

Microsoft and developers! developers! developers!

Which leaves us with Microsoft having twice as many contributors as its nearest competitor, Google. For those of us who were around when Microsoft castigated open source as a “cancer” and “anti-American,” this is a remarkable change of heart (or, as I’ve argued, a change of business model). Microsoft has long appreciated the value of developers, but Azure has given Microsoft license to embrace open source as a way to attract them to its platform.

Meanwhile, Amazon, so often snubbed as an open source ne’er-do-well, comes in at No. 6 in the rankings, with close to 900 contributors. Amazon has perhaps not worn open source on its sleeve in quite the same way as Google and Microsoft have, but it remains a strong contributor to the projects that feed its developer community.

And Red Hat? Well, Maj’s data finally puts the open source leader in the Top Three contributors, where it belongs. Even fully committed to open source, Red Hat has dramatically fewer engineers on its payroll than Google or Microsoft. As such, it’s doubly impressive that Red Hat would place so highly. The Red Hat data basically reveals what we’ve always assumed: Pretty much every engineer in the company works on open source projects.

Other takeaways? Chinese companies like Baidu, Tencent, and Alibaba, which have long been perceived to be net consumers of open source, actually contribute quite a bit. Ditto Oracle: a company to which I’m generally happy to hand out criticism, it ranks very high among its legacy peers, largely, though not exclusively, due to its contributions to MySQL and Linux.

As for analyst Lawrence Hecht’s thoughtful question as to the right ratio of contributors-to-developers in large companies, based on how much these companies gain from their open source contributions, I think the right answer is … “more.”