Judging by the almost 1,000 points accumulated by the HN story about Microsoft's "productivity tracking," and the associated 400+ comment thread, there seems to be an undercurrent of apprehension toward measurement. It's true that there can be good reasons to fear measurement, especially a metric calling itself "Productivity Score," a label that practically invites ridicule. But there's also a defense of metrics worth making alongside all the skepticism that bubbles to the surface when a metric like Microsoft's makes news.
The right types of measurements are central to understanding the world as it is, which helps to get things done and improve with time. What types of measurements might even "measurement haters" acknowledge as "useful"?
Presidential election polling
Stock portfolio results
Gender/diversity demographics in tech
Temperature (body, home, predicted outdoor)
Crime rates / Incarceration rate
Hours slept per night
High school & college GPA / in-major GPA
Zillow / house prices
Quarterback rating / QBR
[Insert your favorite video game/sports leaderboard here]
Article word count / estimated time to read
Packages shipped / delivered
Covid tests administered / positive %
Even if one doesn't appreciate these metrics, or accept them as totally valid, consider a world without them. After all, these are all measurements somebody invented a way to quantify at some point. Prior to the existence of measurements like these, all that was available to guide decisions was "gut feeling" or self-invented metrics. 😨 Hopefully it is not controversial that measurements like the above confer decision-making benefits relative to "gut feeling."
That's why the metrics themselves are seldom the main problem. Even new measurements, which inevitably carry a high ratio of (vocal detractors / vocal supporters) in comment threads, can be useful when appropriately interpreted. When metrics aren't "appropriately interpreted," it's usually not because the metric fails to measure something of interest. More often, it's due to some combination of careless metric labeling/explanation, under-informed metric practitioners, or rationale-backfilling managers.
Those problems sometimes lead smart people to "throw out the baby with the bathwater," reverting to making decisions based on their gut. It's the wrong conclusion. Good metrics exist to light the path toward improvement. Trying to get better results without measurement is like trying to drive across the United States without a map -- to whatever extent it's possible, it's ridiculously inefficient.
Maybe by highlighting the specific ways that measurements are most often misused, readers can draw a finer line to distinguish between "generalized mistrust of measurement" (less useful) and "wariness of misused measurement" (more useful).
Problem #1: Careless metric labeling

The more useful and profound the metric, and the more difficult it is to derive, the greater the onus on the metric provider to precisely describe their creation's limitations. A provider who chooses to label their metric something dumb like "Productivity Score" sets an extremely high bar for how much explanation will be needed to describe the myriad ways the label falls short of reasonable user expectations.
I believe the right approach for a metric creator is to start by picking a label that's precise enough that its users don't start with unreasonable expectations about what the metric conveys. But more explanation usually still needs to happen. A recent attempt I made toward this end was my article Toward Developer Measurement of the Greatest Shared Benefit. The article basically says that GitClear's "Line Impact" measures one specific thing, and it's not "developer productivity," though it is related in some ways. The article makes the case for why a manager would be foolish to use the metric in isolation, and includes a quote where I concisely list limitations of the metric in a format that anybody can reference.
It's a sometimes-tedious obligation on the part of a metric creator to carefully explain what their measurement doesn't do. But skimping on this step leaves the metric creator in a very dangerous position with regard to how their metric can get misused in combination with Problems #2 and #3.
Problem #2: Under-informed metric practitioners

People are busy. There's never time to read the manual, so we satisfice. And that's the basis for how most git metrics come to be hated. A metric like "commit count" does carry some predictive signal, but it can't be used to compare across committers, and its predictive power diminishes as the analyzed time range shrinks. The metric will misinform any manager who hasn't taken the time to realize it holds almost no signal across developers or across teams.
When managers make bad decisions based on their over-estimation of what "commit count" means, it's unfair to pin the result on GitHub. GitHub isn't telling managers that this metric should be used to make decisions; it's simply making it easier to see how a core git metric accumulates over time. In these cases, I believe the primary error was on the part of the manager who took it upon themselves to ascribe a significance that neither GitHub nor anyone of sound mind would ascribe to a metric like "commit count."
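A toy illustration of why commit count carries so little signal across developers. The data below is entirely hypothetical: it sketches two developers who land the same total volume of change, but with very different commit granularity, so commit count ranks them in opposite order from their actual output.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    author: str
    lines_changed: int

# Hypothetical history: Alice squashes her work into one large commit;
# Bob lands the same amount of change as ten small commits.
history = [Commit("alice", 500)] + [Commit("bob", 50) for _ in range(10)]

def commit_count(history, author):
    """The naive metric: how many commits did this author make?"""
    return sum(1 for c in history if c.author == author)

def total_lines(history, author):
    """A different lens: total volume of change landed."""
    return sum(c.lines_changed for c in history if c.author == author)

# By commit count, Bob looks 10x "more productive" than Alice...
assert commit_count(history, "bob") == 10
assert commit_count(history, "alice") == 1
# ...yet both delivered exactly the same volume of change.
assert total_lines(history, "alice") == total_lines(history, "bob") == 500
```

Neither number here is a productivity measure on its own; the point is only that a metric shaped by personal workflow habits (commit granularity) cannot support comparisons between people who have different habits.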
Problem #3: Rationale-backfilling managers

Letting go of employees is hard, and excepting violations of company policy, it's always a judgment call. If you were a boss with a gut feeling that somebody should be let go, how would you make that happen? You would need to backfill a justification for what you already wanted to do.
Now of course, "backfilling an explanation" is the very thing a good metric helps avoid. With accurate measurement, you're like a baseball team: the reason you let go of players is to find other players with better stats who give you a better chance of winning next year. But in the world of "measurement-hating managers," the conclusion "somebody must be let go" is reached by gut feeling. They will take any tangible manifestation that aligns with that feeling.
If you can find a metric that implies a promise as big as Microsoft's "Productivity Score," and the number for the person being let go falls below average, that's a convenient way to justify the decision. It's hard to say how common such cases are, but this willful misuse of measurement to justify gut feelings surely produces some subset of people who grow to hate measurement in general.
How are we supposed to know we're moving in the right direction if we don't have reliable data to guide us? If you're not using data, embodied in appropriately interpreted metrics, I don't know how you could expect to consistently move closer to your goals. You're driving across the United States without a map.
Good measurements help to create narratives out of a real world that is often hopelessly complex. Leaders who expect to "follow intuition" or "trust their gut" to achieve long-term success will lose when pitted against those who follow data-backed measurements, interpreted appropriately.
Individuals who ignore measurements because of their potential to be misinterpreted are leaving predictive power on the table. Unless you choose where to eat based on the sign above the restaurant's door, you already intuitively understand that measurements tend to produce better decisions. In a sense, "getting wiser" is the process by which people slowly become more discriminating in which metrics they choose, and more knowledgeable about how to apply them to move toward the future they're after.
tl;dr: Developer metrics (or any productivity metrics) are not problematic unto themselves. Good measurement is key to improving. Problems begin when managers don't understand the limitations of their preferred metric. No single metric can tell a full story, but good ones can start a conversation.
Even when a metric has been proven to accurately convey an underlying phenomenon, that still doesn't mean it is going to be able to serve as a target to optimize for. In the context of developer metrics, I explore that distinction at depth in my article on metrics that resist being gamed.