Don't judge a project by its GitHub stars alone
Open source is now universally accepted and employed by developers and companies across the world. That popularity, though, has raised many questions about what this new world of open source actually looks like.
What are the most popular open source languages?
Which packages have had the greatest adoption?
How many packages are actively being used?
As we started to ask questions like these, we realized we first needed to ask a simpler one: do we even have reliable ways of answering them?
Last week, I looked at the Libraries.io dependent repositories count and touched on its origins as a way to answer questions about open source package usage. Today, I’d like to dive a little deeper into why this metric is so important, and why it’s hard to judge actual usage with other common stats.
The need for decaying usage metrics
Millions of packages have been starred on GitHub, but this doesn’t quite help us understand overall usage. A user can star a package for many reasons: some do it as a reminder to come back later (like a bookmark), some treat it as a Facebook Like, and others use it to curate a list of their favorite packages. If anything, stars best show the amount of traffic or attention a package receives on GitHub. None of this tells us whether the package is actually being used.
What’s more, GitHub stars have a nondecreasing problem. By that I mean that the number of stars a package or repo has effectively only ever increases or stays flat.
In theory, a GitHub user could un-star a package, but that seldom happens in practice. As a result, stars reflect the community’s sentiment at some point in the past more than its current preferences.
Another commonly accepted measure of popularity is the number of forks a package has. The thinking is that a developer must be interested in the work if they create their own copy of the code, assuming they even end up using that copy.
Yet forks share the same nondecreasing problem as stars (and so do downloads!): they essentially never go down. Because of this, a package that was once super-popular but has fallen by the wayside will still appear popular by each of these counts. Furthermore, most open source software is consumed through ecosystem-specific package managers, not directly via GitHub forks.
This means that, in reality, stars, forks, and downloads act in similar ways: they show a nondecreasing metric of popularity that fails to account for negative changes in the community’s tastes.
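To make the difference concrete, here is a minimal sketch contrasting a cumulative count (how stars behave) with a current-state count (how a dependent repositories count behaves). It assumes a simplified, hypothetical model: each star is just the day it was given (un-stars are ignored, since they rarely happen), and each dependent repo records the day it added the package and, optionally, the day it dropped it. The names `DependentRepo`, `stars_as_of`, and `dependents_as_of` are illustrative only; this is not how Libraries.io actually computes its count.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DependentRepo:
    """A hypothetical repository that depended on a package for some span of days."""
    name: str
    added_on: int                      # day the package was added to the repo's manifest
    dropped_on: Optional[int] = None   # day it was removed; None means still in use


def stars_as_of(star_days: List[int], day: int) -> int:
    """Cumulative count: every star given on or before `day` still counts."""
    return sum(1 for d in star_days if d <= day)


def dependents_as_of(repos: List[DependentRepo], day: int) -> int:
    """Current-state count: only repos that still depend on the package on `day`."""
    return sum(
        1
        for r in repos
        if r.added_on <= day and (r.dropped_on is None or r.dropped_on > day)
    )
```

With made-up data where two of three dependents drop the package, the star count keeps climbing while the dependents count falls:

```python
star_days = [1, 2, 3, 5, 8]
repos = [DependentRepo("a", 1, 6), DependentRepo("b", 2, 7), DependentRepo("c", 4)]
stars_as_of(star_days, 10)      # 5: nothing is ever subtracted
dependents_as_of(repos, 10)     # 1: only "c" still uses the package
```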
Understanding what stars miss
It was with all of this in mind that Libraries.io created its dependent repositories count: to understand which packages are most actively being used, and which are gaining or losing favor. But how different are the lists of top packages by dependent repositories and by GitHub stars? And what kinds of packages tend to be over- or underrepresented by a metric such as stars?
To attempt to understand this relationship, I took the top ~2,500 GitHub packages ranked by stars count and joined them with their dependent repositories count. Some top repos were omitted because they aren’t packages to build on top of. For example, the most starred repository on GitHub is freeCodeCamp, which contains the codebase and curriculum of freeCodeCamp. It’s a wonderful service, but not something most developers build on a daily basis.
This left two final metrics to compare within each language: dependent repositories per package and stars per package. Through this comparison, we can find the most over- and underappreciated packages, regardless of language, by looking at the difference between each package’s ranking by stars per package and its ranking by dependent repositories per package:
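The ranking comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact analysis: it ranks packages across a single pool (the per-language normalization is not modeled), keeps only packages that appear in both datasets (standing in for the join), and breaks ties alphabetically. The functions `rank_by` and `rank_gaps` are hypothetical helpers. A positive gap means a package ranks higher by dependent repositories than by stars (underappreciated by stars); a negative gap means the reverse.

```python
from typing import Dict, List, Tuple


def rank_by(metric: Dict[str, int]) -> Dict[str, int]:
    """Map each package to its 1-based rank, highest value first (ties broken by name)."""
    ordered = sorted(metric, key=lambda pkg: (-metric[pkg], pkg))
    return {pkg: i + 1 for i, pkg in enumerate(ordered)}


def rank_gaps(stars: Dict[str, int], dependents: Dict[str, int]) -> List[Tuple[str, int]]:
    """Stars rank minus dependents rank, for packages present in both datasets.

    Sorted from most underappreciated by stars (largest positive gap)
    to most overappreciated by stars (most negative gap).
    """
    common = stars.keys() & dependents.keys()  # the "join": keep packages in both
    star_rank = rank_by({p: stars[p] for p in common})
    dep_rank = rank_by({p: dependents[p] for p in common})
    gaps = [(p, star_rank[p] - dep_rank[p]) for p in common]
    return sorted(gaps, key=lambda item: (-item[1], item[0]))
```

For example, with made-up counts where `pkg-a` has the most stars but the fewest dependents, `rank_gaps` puts it last with a gap of -3, flagging it as the most overappreciated by stars.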
Packages misrepresented by stars
| Rank | Underappreciated by stars | Overappreciated by stars |
|---|---|---|
As for the packages most overappreciated by stars? Well, the overwhelming commonality is that many of them have seen little to no development or new releases in the past two years (hiredis, big-list-of-naughty-strings, Surge, ratchet, ExSwift).
This is exactly why the dependent repositories count is an important metric: these once-popular packages have seen little activity and, as a result, are little used today.
Looking at their stars count would lead you to believe that these packages are still popular, and perhaps actively maintained and updated, which presents a risk to any potential user. Risks like these are what make decaying usage metrics crucial to understanding what’s going into your open source stack.