Using Neo4j to Find the Most Powerful Package


At 2:30 PM (Pacific Time) on March 22, 2016, npm packages broke for a few hours because Left-pad was unpublished. This showed us, as developers, that sometimes one person holds the power to break a lot. Left-pad was not only popular in its own right; it was also used in other popular packages, which were in turn used in yet other packages. This is the kind of powerful package I want to find in Python.

I had built a graph database with Neo4j for another project, and the same database can answer this question. For each of its 5,317 packages, I count how many packages depend on it directly or indirectly, and then list the packages with the highest counts.

MATCH (n:Package)<-[:DEPENDS_ON*0..]-(m)
WITH n as n, count(DISTINCT m) as num_deps
RETURN n.name as Name, num_deps ORDER BY num_deps DESC LIMIT 25
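To make clear what the query counts, here is a minimal Python sketch of the same idea on a tiny, made-up dependency graph. The `DEPENDS_ON` dictionary and the function names are hypothetical and are not part of the original database. The sketch assumes that a zero-hop match (the `*0..` in the pattern) includes the package itself, so every count is at least 1.

```python
from collections import deque

# Hypothetical toy graph: each key depends on the packages in its list.
DEPENDS_ON = {
    "app": ["requests", "six"],
    "tool": ["requests"],
    "requests": ["urllib3", "idna"],
    "urllib3": [],
    "idna": [],
    "six": [],
}


def count_dependents(depends_on):
    """Return {package: number of packages that reach it in zero or more hops}."""
    # Reverse the edges so we can walk from a package to the packages using it.
    dependents = {name: set() for name in depends_on}
    for pkg, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    counts = {}
    for start in dependents:
        seen = {start}  # zero hops: the package itself, mirroring *0..
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for parent in dependents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        counts[start] = len(seen)  # plays the role of count(DISTINCT m)
    return counts


def ranking(depends_on, limit=25):
    """Sort by count (descending), then by name, and keep the first `limit`."""
    counts = count_dependents(depends_on)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


if __name__ == "__main__":
    for name, num_deps in ranking(DEPENDS_ON):
        print(name, num_deps)
```

In this toy graph, `urllib3` and `idna` rank highest because everything that uses `requests` reaches them as well, which is the same effect that pushes small utility packages to the top of the real list.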

The 25 packages with the most dependents:

╒═════════════════════╤══════════╕
│"Name"               │"num_deps"│
╞═════════════════════╪══════════╡
│"typing-extensions"  │1518      │
├─────────────────────┼──────────┤
│"six"                │1310      │
├─────────────────────┼──────────┤
│"zipp"               │1004      │
├─────────────────────┼──────────┤
│"idna"               │1002      │
├─────────────────────┼──────────┤
│"urllib3"            │983       │
├─────────────────────┼──────────┤
│"charset-normalizer" │960       │
├─────────────────────┼──────────┤
│"certifi"            │954       │
├─────────────────────┼──────────┤
│"requests"           │902       │
├─────────────────────┼──────────┤
│"importlib-metadata" │822       │
├─────────────────────┼──────────┤
│"python-dateutil"    │701       │
├─────────────────────┼──────────┤
│"pyparsing"          │627       │
├─────────────────────┼──────────┤
│"colorama"           │624       │
├─────────────────────┼──────────┤
│"attrs"              │596       │
├─────────────────────┼──────────┤
│"packaging"          │569       │
├─────────────────────┼──────────┤
│"importlib-resources"│474       │
├─────────────────────┼──────────┤
│"pytz"               │472       │
├─────────────────────┼──────────┤
│"numpy"              │463       │
├─────────────────────┼──────────┤
│"setuptools"         │428       │
├─────────────────────┼──────────┤
│"pyyaml"             │380       │
├─────────────────────┼──────────┤
│"click"              │353       │
├─────────────────────┼──────────┤
│"markupsafe"         │336       │
├─────────────────────┼──────────┤
│"jinja2"             │328       │
├─────────────────────┼──────────┤
│"dataclasses"        │301       │
├─────────────────────┼──────────┤
│"typing"             │278       │
├─────────────────────┼──────────┤
│"wrapt"              │252       │
└─────────────────────┴──────────┘

Figure: Six and 100 of the 327 packages that depend on it directly

This gives us the result. Typing-extensions has the most dependents, but it is maintained by the Python project itself. So if Guido van Rossum ever goes crazy, 100% of Python is unsafe anyway.

But after that comes Six. Six is a Python 2 and 3 compatibility library used directly by 327 of the 5,317 packages (one relationship hop away), and directly or indirectly by 1,310 of them (any number of hops). Looking at the Insights tab of Six’s GitHub repository, we see only one person with a significant influence. Let’s hope Benjamin Peterson has mercy upon us all!
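The split between direct and indirect use can be illustrated with a short Python sketch that records how many hops away each dependent is. The graph, the package names, and the function names below are hypothetical and only stand in for the real data. Unlike the `*0..` pattern in the query, this sketch leaves the starting package out of the counts.

```python
from collections import deque

# Hypothetical reverse edges: package -> packages that depend on it directly.
DEPENDENTS = {
    "core": ["lib-a", "lib-b"],
    "lib-a": ["app-1"],
    "lib-b": ["app-1", "app-2"],
    "app-1": [],
    "app-2": [],
}


def hops_to_dependents(dependents, start):
    """Map every dependent of `start` to its shortest hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in distance:
                distance[parent] = distance[current] + 1
                queue.append(parent)
    del distance[start]  # the package is not its own dependent here
    return distance


def direct_and_indirect(dependents, start):
    """Return (dependents one hop away, dependents more than one hop away)."""
    distance = hops_to_dependents(dependents, start)
    direct = sum(1 for hops in distance.values() if hops == 1)
    return direct, len(distance) - direct


if __name__ == "__main__":
    print(direct_and_indirect(DEPENDENTS, "core"))
```

A breadth-first search is used so that a package reachable both directly and through another package is counted once, at its shortest distance.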

So it is time to stop asking “Why is Six afraid of Seven?” and start asking “Why is Python afraid of Six?”


Using Neo4j to Find the Most Powerful Package was originally published in Neo4j Developer Blog on Medium.