Managing the complexities of modular software

Last week, I took a dive into the world of JavaScript and NPM (its largest package manager) exploring the granularity of JavaScript software and the importance of package management in that ecosystem.

However, the trend towards modular software packaging in open source extends far beyond JavaScript.  What we find is that many of the largest languages and package managers exhibit a comparable level of modularity to JavaScript and NPM, tending towards small package sizes with many long tail packages that see minimal use.

But what impact does this have on your software development?

The spread of granular packages

As we can see in the boxplot graph below, many prominent package managers and open source communities are highly granular: the distribution of their repository sizes is largely clustered below 1 MB, with the median repository size often being close to 100 KB.

A quick refresher on boxplots: the colored boxes represent the middle 50% of repository sizes (from the first to the third quartile, also known as the interquartile range), with the interior line showing the median repo size.  The horizontal lines at the ends of the vertical whiskers mark the largest and smallest repo sizes that are not treated as outliers, and the dots above or below those lines show the outliers in the data.
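For readers who want to see how those boxplot pieces are derived, here is a minimal sketch. It assumes the common convention of drawing whiskers at the most extreme points within 1.5 times the interquartile range; the chart in this post may use a different rule. The function name `boxplot_summary` and the sample sizes are made up for illustration and are not taken from the data behind the graph.

```python
import statistics

def boxplot_summary(sizes_kb):
    """Return quartiles, whisker ends, and outliers for a list of sizes."""
    data = sorted(sizes_kb)
    # quantiles(n=4) returns the three cut points: Q1, the median, and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inliers = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        "outliers": outliers,
    }

# Hypothetical repository sizes in KB, with two very large repositories.
sizes = [20, 40, 60, 80, 100, 120, 150, 200, 900, 50000]
summary = boxplot_summary(sizes)
mean_size = statistics.mean(sizes)

print(summary)
print("mean:", mean_size)
```

In this made-up sample the two large repositories pull the mean far above the median, which is why a boxplot is a better fit than an average for data with a long tail of outliers.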

This graphic leads us to a number of interesting conclusions.  For most package managers, individual packages do tend to be small.  However, we see huge diversity in the range of repository sizes: for example, Rubygems repositories are tightly concentrated between 100 KB and 1 MB, but with a huge number of outliers.

Figure: Repository size by package manager

We can also see that some of the older package managers (such as Maven and NuGet) tend to have the largest repository sizes.  Why is this?  It’s hard to say definitively.  Do these repositories simply contain more code?  Or do they tackle more problems within a single repo, as opposed to the more task-specific, granular projects we see today?

All told, this data suggests that granular packaging is not a passing trend confined to JavaScript, but rather a movement that spans much of open source and is here to stay.

The scope of open source

As was the case with JavaScript, because open source ecosystems include so many small packages, many of them go almost unused in development.  This is the long tail of open source.

As we can see below, this is not unique to JavaScript and NPM: only ~30% of packages in the selected package managers have at least one public repository that depends on their code.  Yet because open source is so modular, that still amounts to more than 370,000 actively used packages.  And that says nothing of the packages used in private repositories that can’t be analyzed publicly, meaning that this is an underestimate of how many packages are actually used from each of these communities.

Figure: Package usage in top package managers

All of this is to say that despite the “long tail” of open source, the rise of granular software packaging has resulted in a world where there are hundreds of thousands of packages that are actively used by professional software development teams.  Managing this breadth of software can present challenges.

What does this mean for your software?

Releasing open source software in small packages has many consequences, both intended and unintended.  For example, small packages tend to be more specialized to a specific task, updated more often, and, generally, less complex.

They also introduce more potential points of failure into a build: you require more packages to build your application, and those require more dependencies of their own, introducing a complex dependency tree that could cause trouble for you as the end developer.
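To make the dependency tree concrete, here is a minimal sketch that walks a dependency graph and collects everything an application ends up relying on, directly or indirectly. The package names and the `transitive_dependencies` helper are hypothetical; a real project would read this graph from its package manager's lockfile or manifest instead.

```python
from collections import deque

def transitive_dependencies(graph, root):
    """Collect every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited through another path
        seen.add(package)
        queue.extend(graph.get(package, []))
    seen.discard(root)  # guard against cycles that lead back to root
    return seen

# Hypothetical dependency graph: each key depends on the packages listed.
graph = {
    "my-app": ["web-framework", "date-utils"],
    "web-framework": ["router", "http-parser", "string-pad"],
    "router": ["path-match"],
    "path-match": ["string-pad"],
    "date-utils": ["string-pad"],
}

direct = graph["my-app"]
deps = transitive_dependencies(graph, "my-app")

print(f"direct dependencies: {len(direct)}")
print(f"total dependencies:  {len(deps)}")
```

In this toy graph, two direct dependencies turn into six packages in total, and a single small package (`string-pad`) is pulled in through three different paths. Each of those packages is a place where a breaking change or an abandoned project could affect the build.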

And should you decide to use a package that is a part of the long tail of open source, there is a potentially greater risk that the package becomes unmaintained, leaving you, as the user, in limbo.

A first step to caring for your team’s software is being aware of the potential unintended side effects of modular open source packaging; after that, there are a number of other directions we can take to help our open source code and community.

If you are interested in learning more, consider signing up for our mailing list or following us on Twitter.  And if you love all things package managers, check out The Manifest, a podcast co-hosted by our own Andrew Nesbitt.

Keenan Szulik