Revealing a world of hidden dependencies with Libraries.io

A couple of weeks ago, we announced that Tidelift had joined forces with Libraries.io to make open source software work better for developers and users.

Libraries.io has done a lot of amazing things, many of which Havoc already wrote about, but one of our favorites has been their open data releases, like last week’s release of the largest publicly available dataset of open source software packages in the world!

This dataset is unique in how well it helps us understand the inner workings of the open source universe, but a couple of aspects really stand out for me.

Mapping Open Source Dependencies

A huge portion of the Libraries.io effort has been centered around tracking package dependencies, and, as such, they’ve created an enormous map of millions of dependency interactions across all of open source. With this, we can not only see and analyze the range of dependencies on any given package, but we can also follow each of those dependencies down into their own dependencies.

Exploring these hidden dependencies (also called transitive or nested dependencies) in open source is really hard, but it’s incredibly important for solving many issues that plague the ecosystem, namely three big ones: licensing, security, and versioning.
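To make "hidden" concrete, here is a minimal sketch of how transitive dependencies can be found by walking a dependency map. The package names and the `DIRECT_DEPS` structure are made up for illustration; they are not the Libraries.io data format or API.

```python
from collections import deque

# Hypothetical direct-dependency map: package -> packages it requires directly.
# These names are invented for illustration, not taken from Libraries.io.
DIRECT_DEPS = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["template-engine", "router"],
    "http-client": ["url-parser"],
    "template-engine": ["markup-escaper"],
    "router": ["url-parser"],
    "url-parser": [],
    "markup-escaper": [],
}


def transitive_dependencies(package, direct_deps):
    """Return every package reachable from `package`, excluding itself."""
    seen = set()
    queue = deque(direct_deps.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also protects against cycles
        seen.add(dep)
        queue.extend(direct_deps.get(dep, []))
    seen.discard(package)  # a dependency cycle could lead back to the start
    return seen


direct = set(DIRECT_DEPS["my-app"])
everything = transitive_dependencies("my-app", DIRECT_DEPS)
hidden = everything - direct  # dependencies the app never declared itself
print(sorted(hidden))
```

In this toy graph the application declares two dependencies but actually relies on six packages, and any licensing, security, or versioning problem in the four hidden ones still affects it.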

By mapping the dependencies between packages, Libraries.io was able to create a stat called “dependent repositories count,” which does exactly what it sounds like: it looks at a given application-level package and counts the total number of code repositories that require that package as a dependency.
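As a rough sketch of the idea (not how Libraries.io actually computes it), the count can be thought of as a tally over a snapshot of repository manifests. The repository names and the `REPO_MANIFESTS` structure below are hypothetical, and the resulting numbers are illustrative only.

```python
from collections import Counter

# Hypothetical snapshot: repository -> packages declared in its manifests.
# Repository names and contents are invented for illustration.
REPO_MANIFESTS = {
    "acme/storefront": ["express", "lodash"],
    "acme/api": ["express", "body-parser"],
    "zed/cli-tool": ["lodash"],
    "zed/blog": ["express", "lodash", "body-parser", "express"],
}


def dependent_repositories_count(repo_manifests):
    """Count, for each package, how many repositories depend on it."""
    counts = Counter()
    for packages in repo_manifests.values():
        # Use a set so a repository counts once per package,
        # even if the package appears in several of its manifests.
        counts.update(set(packages))
    return counts


counts = dependent_repositories_count(REPO_MANIFESTS)
for package, count in sorted(counts.items()):
    print(package, count)
```

The key design choice is counting repositories rather than declarations: a repository that lists the same package twice still contributes one to that package's total.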

This might seem straightforward, but in reality dependent repositories count is perhaps the single best measure of the popularity of an open source package. Unlike some existing metrics (downloads, GitHub stars and forks) which are non-decreasing—meaning that the total count only ever increases or stays flat—the dependent repositories count is an active measurement that can go up and down based on present-day usage.
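The difference between the two kinds of metric can be sketched with a few made-up numbers. Below, a cumulative download total is compared with a dependent repositories count that is recomputed at each snapshot; all figures are hypothetical and chosen only to show the shape of the two series.

```python
from itertools import accumulate

# Hypothetical monthly figures for a single package whose usage is fading.
monthly_downloads = [500, 300, 100, 0]
dependents_per_snapshot = [40, 55, 30, 12]

# A lifetime download total is a running sum, so it can only rise or stay flat.
cumulative_downloads = list(accumulate(monthly_downloads))


def is_non_decreasing(series):
    """True if no value in the series is lower than the one before it."""
    return all(a <= b for a, b in zip(series, series[1:]))


print(cumulative_downloads, is_non_decreasing(cumulative_downloads))
print(dependents_per_snapshot, is_non_decreasing(dependents_per_snapshot))
```

Even when nobody downloads the package in the final month, the cumulative total stays at its peak, while the snapshot-based count falls as repositories drop the dependency.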

Why is this important?

There are a couple of key reasons why this really matters.

The first is that stats such as downloads, stars, and forks don’t tell you how many developers are actually using a piece of software; just because they downloaded it or liked it doesn’t mean it’s running in their application.

The second is that dependent repositories count is the only metric that actively responds to the community’s preference, and it’s the only measure that will decrease if the community stops using a package. This is incredibly powerful! It uniquely leverages the collective knowledge of open source developers across the globe, letting their universal wisdom and actions determine which packages are the most critically interconnected.

What this looks like in practice

Here’s a real-world example. Below, I’ve included a table of the top 10 most-depended-upon packages in four popular open source languages: JavaScript, Python, Ruby, and PHP.

Of particular interest is the makeup of the most-used packages in each language: we see some large, comprehensive frameworks (express, Django, rails, phpunit), but also a lot of smaller parsers and utilities. What’s more, many of these packages would be overlooked by other attention metrics.

Top 10 Open Source Software Dependencies

Rank	JavaScript	Python	Ruby	PHP
1	express	requests	rake	phpunit/phpunit
2	uglifier	Django	activesupport	psr/log
3	mocha	Flask	i18n	monolog/monolog
4	gulp	six	rack	laravel/framework
5	grunt	Jinja2	builder	symfony/console
6	lodash	MarkupSafe	tzinfo	doctrine/inflector
7	body-parser	Werkzeug	rails	mockery/mockery
8	grunt-contrib-watch	gunicorn	multi_json	swiftmailer/swiftmailer
9	babel-core	mock	rack-test	symfony/yaml
10	chai	Sphinx	thor	symfony/event-dispatcher

It’s worth noting that this isn’t a perfect metric either: some communities don’t track dependencies at all, and others have weaker data aggregation. Like any statistic, it can’t paint a flawless picture of the entire open source ecosystem. What it is, though, is the most reliable and up-to-date measurement of the community’s current attitude about package usage.

Over the coming weeks and months, we’ll begin to dive in a little deeper to analyze some of the data that Libraries.io is collecting to help the world better understand open source software.

If you are interested in learning more, consider signing up for updates or following us on Twitter.
