While we have submitted our original Git Metric Correlation Research to researchers, professors, and the ESEC/FSE 2021 Industry Papers track (held in mid-2021), a research paper as detailed as ours takes time to work its way through the academic review circuit. That's why we were very excited to get our first feedback on the research this week from Dr. Alain Abran, retired Professor and Director of the Software Engineering Research Laboratory at Université du Québec. Here's how Google summarizes Dr. Abran, and here is his profile from a speaking engagement at the International Conference on Information and Communication Systems in 2018:
Dr. Abran was a presenting speaker and researcher at the International Conference on Information and Communication Systems in 2018; above is a consolidated roll-up of his profile page from the event.
In short, Dr. Abran is legendary in the field of software metrics and effort estimation. He has published three widely-circulated academic textbooks on this topic in the past 10 years, he was the Chairman of the Common Software Measurement International Consortium, and he has 500 peer-reviewed papers and 13,000 research citations.
Dr. Abran's opinion on whether our data and methodology make a compelling case for Line Impact as the metric that best correlates with Story Points?
I have looked closely at the research report you sent me: it is quite impressive, well structured, research well conducted and a very careful methodology for data collection, data sets constructions, and data analysis.
For your intended purpose to provide an evidence base that your measurement approach (e.g. 'Line Impact') is better than other alternatives, I am of the opinion that you have clearly demonstrated its superiority.
To have a researcher of Dr. Abran's renown endorse Line Impact in this manner should be a big step toward more widespread study and adoption of Line Impact among software teams. Businesses that continue to rely on legacy software metrics like Commit Count and Lines of Code will find themselves at an increasing disadvantage as Line Impact's reputation spreads.
Beyond his judgment that we succeeded in demonstrating superior correlation between Story Points and Line Impact, Dr. Abran had several thoughtful reservations about the use of Story Points as a proxy for "software effort." He has researched this subject for decades, so we were eager to probe his wisdom for better alternatives. He believes that part of the problem with Story Points is that they don't offer the consistency of a great metric. As he put it,
Typically you expect from 'measurements': repeatability, reproducibility, consistency, etc: that is, when you use a 'measurement method' you, as a customer, expect 'fairness' and 'accountability' and ability to compare across contexts. You do not get any of that with 'Story Points': to the contrary, you do not get 'repeatability, reproducibility, consistency, etc.'
We strongly agree that the estimation method used to choose Story Points contributes significantly to their consistency or lack thereof. This was clearly evident in our dataset, where the correlation varied widely from team to team. In particular, the challenge of comparing Story Points across contexts forced us to assess the correlation on a per-repo basis, since the constant that translates a "Story Point estimate" into "days of effort spent" is different for every team. We believe that teams can (and sometimes do) strive to ensure that their Story Points correlate with effort, but accurate Story Point estimation doesn't happen without a concerted effort by talented developers and managers.
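To illustrate the per-repo approach described above, here is a minimal sketch of computing the correlation separately for each repository. The `pearson_r` helper and the repo data are purely hypothetical stand-ins, not values from our dataset or our actual analysis pipeline:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-repo data: (Story Points, Line Impact) per resolved issue.
repos = {
    "repo-a": ([1, 2, 3, 5, 8], [40, 95, 120, 210, 330]),
    "repo-b": ([2, 3, 5, 8, 13], [50, 60, 200, 180, 400]),
}

# Correlate within each repo, since each team's Story Point scale differs;
# pooling all repos together would mix incompatible estimation scales.
for name, (points, impact) in repos.items():
    r = pearson_r(points, impact)
    print(f"{name}: r = {r:.3f}, r^2 = {r * r:.3f}")
```

Keeping the computation per-repo sidesteps the cross-context comparison problem: each team's "points per day of effort" constant cancels out inside its own correlation.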
To that end, Dr. Abran points out another shortcoming of Story Points:
Another issue for research using 'story points' is that such information is collected 'up front' at estimation time, but it is not re-measured at the end of a project-iteration-sprint! And typically there is no analysis to verify how many 'story points' have indeed been 'delivered' at the end of a project, or did they really correspond to the 'complexity' of the software delivered, and they certainly do not correspond to the real effort that was required to complete the project-commit-etc.
He is correct that Story Points are usually estimated up front and seldom evaluated afterward. The implication is that measuring Story Point correlation is more like measuring how closely tasks align with what the team expected the effort would be, rather than with what the effort actually turned out to be. Teams that want to maximize the correlation between Story Points and other git metrics can use a postmortem after each Sprint to update the Story Points for issues that turned out to require different levels of effort than had initially been estimated.
His final assessment of Story Points reemphasizes his belief that they can be a volatile metric to use as a foundation:
Therefore, any analysis with 'story points' is that it is using an 'estimate' which typically has a large margin of errors: this is far from a solid foundation - it is more like using quick-sands as a foundation
We concur that Story Points can be a tricky independent variable to match with a high r². At the same time, we believe it's possible for teams to choose Story Points in a way that reflects software effort; indeed, this is how all of the top Google articles recommend choosing Story Points. Teams are implored to capture their "effort" or "complexity" estimate in Story Points, but the extent to which they succeed at this is difficult to prove.
In general, given the variability of Story Points, it seems prudent to explore additional "effort" variables when establishing correlation of Line Impact during the next round of research we sponsor.
We will continue to communicate with Dr. Abran and other leading researchers to understand and incorporate the best possible methods of software estimation. Our hypothesis is that the better an independent variable can represent the true effort required to resolve an issue, the better that variable will align with Line Impact. Dr. Abran offered us links to a number of contemporary academic research papers exploring best practices for estimating effort, so we will be reviewing these in the weeks to come in pursuit of research that continues to explore the correlation between cognitive energy and Line Impact.