As their names suggest, the two are closely related. Lodash started out as a fork of Underscore, and their APIs and functionality overlap significantly. Some developers have suggested that both utility belts have become less useful over the last few years. The argument draws on the "death-by-success" pattern: their functionality is so essential that much of it has been absorbed into the language itself (in particular since ECMAScript version 6), so programmers might not need to import the packages anymore. On the other hand, proponents give plenty of reasons for not abandoning the utility belts: clarity, convenience, simplicity, speed, or access to functionality that is still not available in ECMAScript.
So what's really happening in the community? What do the professionals decide? Are they abandoning the utility belts or are they depending on them more and more?
LGTM's large-scale analysis of open-source projects can help answer these questions.
There are many ways to include a library: for example, you can import it, require it, or include it via a script tag. You can also download a library and then rename it. Conversely, you can mention a library in your package.json without actually using it in your code.
LGTM's use of QL makes it possible to cut through this thicket. The following QL query will check whether a project depends on Lodash or Underscore:
LGTM's commit history currently reaches back only as far as June 2015. That means that the first commit we see is not necessarily the first ever commit of a project. In some cases this results in missing data. In those cases I've assumed that the first record of a dependency is not a new one. For example, if the first commit we see comes after 40 days, and at that stage the project depends on Lodash, I count it as having depended on Lodash during those first 40 days as well. This is much more likely than Lodash having been introduced just when the data collection started.
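The back-filling assumption can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function name `backfill_first_state` and the representation of a project as a list of monthly states (with `None` for months without data) are assumptions made for the example.

```python
from __future__ import annotations

from typing import Optional


def backfill_first_state(monthly_states: list[Optional[str]]) -> list[Optional[str]]:
    """Copy the first observed state back over the leading months without data."""
    # First month for which we actually have a record (None if there is none at all).
    first_seen = next((s for s in monthly_states if s is not None), None)
    if first_seen is None:
        return list(monthly_states)

    filled = list(monthly_states)
    for i, state in enumerate(filled):
        if state is not None:
            break  # only the leading gap is filled; later gaps are left alone
        filled[i] = first_seen
    return filled


# Hypothetical project: no data for two months, then Lodash, later both belts.
history = [None, None, "lodash", "lodash", "both"]
print(backfill_first_state(history))
# ['lodash', 'lodash', 'lodash', 'lodash', 'both']
```

Only the gap before the first record is filled, which matches the assumption that an existing dependency was already there before data collection began.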
It's not very surprising that projects would migrate from Underscore to Lodash more often than the other way around. Advice and blogs tend to favor Lodash by a large margin. And even a quick Google search indicates that this direction seems to be the more popular. At the time of writing, "from lodash to underscore" has 10 Google hits, while "from underscore to lodash" has 340. Dropping the "from" makes it 429 versus 1810.
But in fact, the majority of the shift is caused by something else: It's not so much projects swapping Underscore for Lodash. If we look at the data more closely, we see that projects that previously depended on Underscore often stop using utility belts altogether. On the other hand, projects that start using utility belts often turn to Lodash.
True switches from using purely Underscore to using purely Lodash (the dark blue areas) account for only a small part of the projects that changed their utility belt portfolio over the course of the two years. The converse is even rarer: not a single project in our data used Lodash at the beginning of our timeframe and only Underscore at the end. (However, there are several that started out with Lodash and then moved to using both.)
Planck famously said:
"A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it."
Modeling the past and future
We can try to explore how the trends might continue by fitting Markov models to the data [4]. Such models are based on a simple idea. Each project has a state: [using Underscore | using Lodash | using both | using none]. Each month, a project may transition from one state to another. The probability that it does so is determined solely by its current state and the so-called transition matrix. To build the model, we need to estimate the probabilities in this matrix. Then we can predict the percentage of projects in each category at any given time in the future.
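As a rough sketch of the idea, the transition matrix can be estimated by counting month-to-month transitions and then applied repeatedly to a starting mix. The project histories below are invented for illustration, and the helper names (`fit_transition_matrix`, `step`, `forecast`) are not taken from the original analysis.

```python
from collections import Counter

STATES = ["underscore", "lodash", "both", "none"]


def fit_transition_matrix(histories):
    """Estimate P(next state | current state) from month-to-month counts."""
    counts = Counter()
    for history in histories:
        for current, nxt in zip(history, history[1:]):
            counts[(current, nxt)] += 1

    matrix = {}
    for current in STATES:
        total = sum(counts[(current, nxt)] for nxt in STATES)
        if total == 0:
            # No observations for this state: assume the project stays put.
            matrix[current] = {nxt: float(nxt == current) for nxt in STATES}
        else:
            matrix[current] = {nxt: counts[(current, nxt)] / total for nxt in STATES}
    return matrix


def step(distribution, matrix):
    """Advance the share of projects in each state by one month."""
    return {
        nxt: sum(distribution[cur] * matrix[cur][nxt] for cur in STATES)
        for nxt in STATES
    }


def forecast(distribution, matrix, months):
    """Apply the monthly transition matrix `months` times."""
    for _ in range(months):
        distribution = step(distribution, matrix)
    return distribution


# Invented monthly histories for four hypothetical projects.
histories = [
    ["underscore", "underscore", "none", "none"],
    ["none", "none", "lodash", "lodash"],
    ["lodash", "lodash", "lodash", "lodash"],
    ["underscore", "both", "lodash", "lodash"],
]
matrix = fit_transition_matrix(histories)
start = {"underscore": 0.5, "lodash": 0.25, "both": 0.0, "none": 0.25}
print(forecast(start, matrix, months=24))
```

Because each prediction depends only on the current state, a two-year forecast is just the one-month step applied 24 times.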
Hidden Markov models can be seen as an advanced version of Markov models. However, it turns out that they don't add much value beyond normal Markov models in this situation. So Occam's razor tells us to use the simpler method.
On the basis of individual projects, such a model explains 94.7% of the month-to-month variance [5]. A high number is not surprising, since projects are not very volatile: a project not using a utility belt in March will likely not use one in April either. However, projects are more volatile over longer periods. So let's use the model to predict where an individual project ends up after two years, given only its starting state. This is a harder task: the uncertainty increases with each successive month the prediction extends into the future. Still, the Markov model explains 73.1% of the variance over that longer time frame.
The explained variance is defined as 1 minus the error sum of squares for the model in question divided by the error sum of squares for a model that always predicts the base rate. The error sum of squares sums over each month, project and possible dependency setup (Lodash, Underscore, any and both). The model makes a prediction for each combination. For example, it might predict for April and project 1 that the chance for Lodash is 0.3 and for Underscore is 0.7. If the project uses Lodash that month, the model is penalized (1 − 0.3)^2 for its Lodash prediction (which would ideally have been 1) and 0.7^2 for its Underscore prediction (which would ideally have been 0).
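The definition above can be written out as a short Python sketch. It assumes the four dependency setups are the four states of the Markov model, and that each prediction is a dictionary mapping setups to probabilities; both the data layout and the function names are choices made for this example.

```python
SETUPS = ["lodash", "underscore", "both", "none"]


def squared_error(predicted, actual):
    """Sum of squared errors over all setups for one project in one month."""
    return sum(
        (float(setup == actual) - predicted.get(setup, 0.0)) ** 2
        for setup in SETUPS
    )


def explained_variance(predictions, actuals):
    """1 - SSE(model) / SSE(baseline that always predicts the base rate)."""
    n = len(actuals)
    base_rate = {s: sum(a == s for a in actuals) / n for s in SETUPS}
    sse_model = sum(squared_error(p, a) for p, a in zip(predictions, actuals))
    sse_base = sum(squared_error(base_rate, a) for a in actuals)
    if sse_base == 0:
        raise ValueError("all observations are identical; the ratio is undefined")
    return 1.0 - sse_model / sse_base


# The example from the text: Lodash predicted at 0.3, Underscore at 0.7,
# and the project actually uses Lodash that month.
penalty = squared_error({"lodash": 0.3, "underscore": 0.7}, "lodash")
print(round(penalty, 2))  # (1 - 0.3)^2 + 0.7^2 = 0.98
```

A model that only ever predicts the base rate scores 0 on this measure, and a model that is always certain and always right scores 1.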
The fitted transition probabilities per month are shown here:
This corresponds to a mix that appears stable, because its composition changes only slowly. Should the circumstances stay constant [6], the model predicts that the total share of utility belt projects will eventually settle at 21%, a higher number than the current one. Most of these will be Lodash projects:
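The long-run mix of a Markov model is the distribution that no longer changes when the transition matrix is applied. A minimal sketch of how to find it by repeated application follows. The matrix `P` below is invented purely for illustration and is not the fitted matrix from this analysis, so its output will not reproduce the 21% figure.

```python
STATES = ["underscore", "lodash", "both", "none"]

# Invented monthly transition probabilities (rows: from state, columns: to state).
P = [
    [0.970, 0.005, 0.005, 0.020],  # from underscore
    [0.001, 0.980, 0.004, 0.015],  # from lodash
    [0.010, 0.030, 0.950, 0.010],  # from both
    [0.001, 0.004, 0.000, 0.995],  # from none
]


def step(dist, matrix):
    """One month: new share of state j is the inflow from every state i."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(matrix, tolerance=1e-12, max_months=1_000_000):
    """Iterate from a uniform mix until the shares stop changing."""
    dist = [1.0 / len(matrix)] * len(matrix)
    for _ in range(max_months):
        nxt = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tolerance:
            return nxt
        dist = nxt
    raise RuntimeError("did not converge")


pi = stationary_distribution(P)
for state, share in zip(STATES, pi):
    print(f"{state}: {share:.1%}")
```

The same equilibrium could also be obtained from the leading left eigenvector of the matrix; plain iteration is used here to keep the sketch free of dependencies.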
Who uses what?
I looked for differences between projects that use one of the two utility belts and those that use neither.
It turns out that the simplest way to tell whether a project uses either of the two is to look at the number of other things it uses. That makes sense: if a project uses just about any library there is, it will probably also include a utility belt.
However, there is a very useful second dimension to look at: how often a project updates its dependencies. Of course, projects with many dependencies generally change their library portfolio more often. But that correlation is not very tight, and this leaves room for a big effect: if a project is flexible in its dependencies, it's much more likely to use Lodash than Underscore. Most likely, the higher dependency churn indicates a desire to optimize one's dependencies together with an open mind for new ones.
The following table shows that many such projects have already found Lodash. It crudely splits the data into three buckets of equal size for each dimension. It then counts the frequency of Lodash and Underscore usage in each combination of buckets.
| Lodash / Underscore | Few dependencies | Average | Many dependencies |
|---|---|---|---|
| Dependencies static | 3% / 5% | 9% / 9% | 18% / 22% |
| Average | 3% / 4% | 8% / 7% | 22% / 12% |
| Dependencies variable | 6% / 0% | 12% / 5% | 33% / 13% |
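A table like this can be built by ranking projects on each dimension, cutting each ranking into three equal-sized buckets, and counting usage per cell. The sketch below shows one way to do that. The six projects are invented, and the field names (`dependencies`, `changes`, `lodash`, `underscore`) are assumptions for this example rather than the schema of the actual data.

```python
from collections import defaultdict


def tertile_labels(values):
    """Label each value 0, 1 or 2 by rank, so the three buckets are equal-sized."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * 3 // len(values)  # ties are split by input order
    return labels


def usage_table(projects):
    """Map (churn bucket, dependency bucket) -> (Lodash share, Underscore share)."""
    dep_bucket = tertile_labels([p["dependencies"] for p in projects])
    churn_bucket = tertile_labels([p["changes"] for p in projects])

    cells = defaultdict(lambda: {"total": 0, "lodash": 0, "underscore": 0})
    for project, dep, churn in zip(projects, dep_bucket, churn_bucket):
        cell = cells[(churn, dep)]
        cell["total"] += 1
        cell["lodash"] += project["lodash"]          # True counts as 1
        cell["underscore"] += project["underscore"]
    return {
        key: (c["lodash"] / c["total"], c["underscore"] / c["total"])
        for key, c in cells.items()
    }


# Invented sample of six projects.
projects = [
    {"dependencies": 2, "changes": 0, "lodash": False, "underscore": False},
    {"dependencies": 3, "changes": 1, "lodash": False, "underscore": True},
    {"dependencies": 10, "changes": 2, "lodash": True, "underscore": False},
    {"dependencies": 12, "changes": 5, "lodash": False, "underscore": False},
    {"dependencies": 40, "changes": 9, "lodash": True, "underscore": False},
    {"dependencies": 55, "changes": 30, "lodash": True, "underscore": True},
]
for (churn, dep), (lodash, underscore) in sorted(usage_table(projects).items()):
    print(f"churn bucket {churn}, dependency bucket {dep}: {lodash:.0%} / {underscore:.0%}")
```

A project that uses both libraries counts towards both percentages in its cell, which is why the two shares in a cell need not add up to the share of utility belt users.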
The actual relationship can be distilled quite nicely using a two-tiered logistic regression. The first tier predicts the probability that a project uses any utility belt from its total number of dependencies. The second tier then predicts the probability that, if a project has a utility belt dependency, it's a Lodash project [7]. For this, it uses both the total number of dependencies and the number of changes to the dependencies over our time frame of two years.
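To make the two tiers concrete, here is a minimal sketch of how such a model produces its predictions once it has been fitted. The coefficients are made up for illustration and are not the fitted values from this analysis. The log transform of the counts is also an assumption of this sketch.

```python
import math


def sigmoid(x):
    """Logistic function: maps any real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical coefficients, chosen only so that the example runs.
TIER_1 = {"intercept": -3.0, "log_deps": 0.8}
TIER_2 = {"intercept": -1.0, "log_deps": 0.1, "log_changes": 0.6}


def predict(n_dependencies, n_changes):
    """Return (P(any utility belt), P(Lodash | exactly one utility belt))."""
    log_deps = math.log1p(n_dependencies)
    log_changes = math.log1p(n_changes)

    # Tier 1: does the project use a utility belt at all?
    p_any = sigmoid(TIER_1["intercept"] + TIER_1["log_deps"] * log_deps)

    # Tier 2: given a single utility belt, is it Lodash rather than Underscore?
    p_lodash = sigmoid(
        TIER_2["intercept"]
        + TIER_2["log_deps"] * log_deps
        + TIER_2["log_changes"] * log_changes
    )
    return p_any, p_lodash


for deps, changes in [(3, 0), (60, 2), (60, 45)]:
    p_any, p_lodash = predict(deps, changes)
    print(f"{deps} deps, {changes} changes: any belt {p_any:.2f}, Lodash given a belt {p_lodash:.2f}")
```

Keeping the two questions separate means the number of dependency changes only influences the choice between the libraries, not whether a utility belt is used at all.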
It turns out that there is quite a difference between the different regions on that graph:
- Small: Projects that don't use many dependencies. They rarely use either Underscore or Lodash.
- Big and slow: Projects that have a lot of dependencies but don't fiddle around with them. They often use Underscore.
- Large but limber: Projects with a large portfolio of dependencies, which they keep updating. They often use Lodash.
1. LGTM analyses commits back in time, currently as far back as June 2015.
2. This is statistically significant at a level of 5% using a two-sided binomial test.
3. This is not actually statistically significant, probably due to the lower number of Underscore projects in total.
4. I did play around with hidden Markov models as well.
5. This refers to the R-squared value of 0.95.
6. They will not.
7. Projects using both Lodash and Underscore were not counted for this second question.