One of the top 10 questions we receive about GitClear's Diff Delta metric is some form of "why would I trust a metric, given Goodhart's Law?" The law, which has gained widespread acceptance among developers, is commonly stated as "when a measure becomes a target, it ceases to be a good measure." Short-sighted metrics have been so common historically that the internet offers no shortage of examples that seem to prove it.
What do people miss when they subscribe religiously to Goodhart's Law? The value that can be extracted from a metric designed to be gamed.
For a metric to get better as it's "gamed," pursuing the metric must translate into long-term wellbeing. Think of "daily step count," a metric that nudges people to do more of the exercise they should be doing anyway. Banking reports that make it exciting to watch your 401k balance grow are another example, and there are more.
So, is maximizing "Diff Delta" more like maximizing "daily step count" (long-term benefit) or "lines of code added" (long-term suffering)?
You be the judge.
When customers ask us if their developers can game Diff Delta, we answer "yes, if you want improved long-term code health." As a developer comes to learn the contours of what pushes their Diff Delta up, they find clues about how to outcompete their past selves:
If there's one theme to the work that earns a high Diff Delta per minute, it's deleting old code. Why would you want to encourage developers to find opportunities to remove or revise legacy systems (code that hasn't been touched in years)?
To start, the more code exists in your repo, the more "surface area" exists for future maintainers to understand. A big reason that startups ship code so much faster than mid-sized or enterprise companies is that they simply have less code to consider when they're looking to add another feature. If companies, as they matured, proactively sought out opportunities to reduce the amount of code they maintain, they might never lose the speed they once enjoyed in their youth.
In practice, there's almost never time to pursue a specific task of "removing X feature." But there are often opportunities to incrementally remove cruft from systems when those systems are being evolved into their second and third versions. The relatively high per-line value that GitClear assigns to work that revises or removes legacy code (code that hasn't been touched in 1-2+ years) means that developers have a strong incentive to always be aware of when they can trim down and reuse a v1 system to implement a feature, rather than heaping another redundant module into the code base that will need to be maintained in perpetuity.
Conversely, when developers add new code, Diff Delta applies a "greenfield code penalty" that assigns progressively less value per line as a developer adds a large volume of new code. Much like the EPA's advice to "reduce, reuse, recycle," this steers developer behavior in a direction that improves both their happiness and the velocity of product evolution.
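To make the shape of a "greenfield code penalty" concrete, here is a minimal sketch of diminishing per-line value. The square-root curve and the function name `greenfield_value` are hypothetical choices for illustration only; the text does not say how GitClear actually computes the penalty.

```python
import math


def greenfield_value(lines_added: int, base_per_line: float = 1.0) -> float:
    """Hypothetical diminishing-returns curve for newly added lines.

    Total value grows with the square root of the number of new lines, so
    each additional line is worth less than the one before it. This is an
    illustrative assumption, NOT GitClear's actual formula.
    """
    if lines_added <= 0:
        return 0.0
    return base_per_line * math.sqrt(lines_added)


for n in (10, 100, 1000):
    total = greenfield_value(n)
    print(f"{n:>5} new lines -> total {total:6.2f}, per line {total / n:.3f}")
```

Under this assumed curve, the value per line drops from about 0.32 at 10 new lines to about 0.03 at 1,000 new lines, while the total still grows: large additions are rewarded, but at a decreasing rate.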
Deleting code that stays deleted is the blessed path to sustaining startup velocity as years pass.
Another long-term desirable way for developers to "game the system" is to write and update documentation, write tests to back up their features, and/or avoid deprecated behaviors that management has determined are ill-suited to the repo's health. GitClear allows managers to incentivize all of these behaviors through Diff Delta multipliers:
The Diff Delta multiplier we use for documentation is 2.5, whereas most code is valued at 1.0.
The GitClear team ascribes 2.5x as much value to each line of documentation that is meaningfully changed (not just moved, or deleted and re-added), since we have learned through experience that updated documentation is key to helping new developers onboard rapidly. Likewise, Diff Delta incentivizes tests for similar reasons. Any team can choose what sort of code they want to see more or less of, and tilt their team's own Diff Delta toward that preference.
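A multiplier scheme like the one described above can be sketched as a weighted count of meaningfully changed lines. Only the 2.5 documentation multiplier and the 1.0 default come from the text; the `LineChange` type, the `meaningful` flag, and the domain names are hypothetical. The real Diff Delta calculation also accounts for factors such as the greenfield penalty and legacy code, which this sketch ignores.

```python
from dataclasses import dataclass

# Per-domain multipliers. Only "documentation" (2.5) and the 1.0 default are
# taken from the text; a team could add its own entries (e.g. for tests).
DOMAIN_MULTIPLIERS = {
    "documentation": 2.5,
    "default": 1.0,
}


@dataclass
class LineChange:
    domain: str       # hypothetical label for the code domain of the line
    meaningful: bool  # False for lines that were merely moved, or deleted and re-added


def weighted_value(changes: list[LineChange]) -> float:
    """Sum the domain multiplier of every meaningfully changed line."""
    total = 0.0
    for change in changes:
        if not change.meaningful:
            continue  # moved or churned lines earn nothing in this sketch
        total += DOMAIN_MULTIPLIERS.get(change.domain, DOMAIN_MULTIPLIERS["default"])
    return total


changes = [
    LineChange("documentation", True),
    LineChange("documentation", False),  # moved, not meaningfully changed
    LineChange("application", True),     # unknown domain falls back to 1.0
    LineChange("application", True),
]
print(weighted_value(changes))
```

Here the sample changes score 2.5 + 1.0 + 1.0 = 4.5, since the moved documentation line earns nothing.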
If you want to see more React components, more devops config files, or any other target that engineering execs have deemed prudent, you can incentivize it by applying a multiplier in Code Domain settings.
Before a developer can game specific targets like the above, the first incentive they need to experience is the incentive to show up and be focused on their task. Since Diff Delta is a metric that empirically correlates with developer effort, the best way to start "gaming the metric" is to put in a sustained effort, day after day. As a developer reviews and considers their past history, they can begin to spot patterns in the forces at play when a day passes with no commits.
When choosing a metric, pick one that leads to more of what you or your organization aspire to accomplish, like happier customers, fewer bugs, and faster release cycles. When an engineering manager helps their team optimize over a long time horizon, having shared, transparent measurement that can group together the "95th percentile weeks" allows each developer to identify the meta-patterns that fueled their most fertile creative and productive phases.
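Grouping a developer's "95th percentile weeks" amounts to finding the weeks whose score sits at or above the 95th percentile of their own history. Here is a minimal sketch using Python's `statistics.quantiles`, assuming weekly Diff Delta totals are already available as a dict keyed by week label (a hypothetical input format; `top_percentile_weeks` is likewise a hypothetical name).

```python
import statistics


def top_percentile_weeks(weekly_scores: dict[str, float], pct: int = 95) -> list[str]:
    """Return the week labels whose score is at or above the pct-th percentile.

    weekly_scores is a hypothetical mapping such as {"2024-W01": 312.0, ...}.
    At least two weeks of data are needed for statistics.quantiles.
    """
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    # n=100 yields 99 cut points; cut point pct-1 is the pct-th percentile.
    cuts = statistics.quantiles(weekly_scores.values(), n=100, method="inclusive")
    threshold = cuts[pct - 1]
    return sorted(week for week, score in weekly_scores.items() if score >= threshold)


# Made-up scores for 20 weeks, used only to exercise the function.
sample = {f"week-{i:02d}": float(i) for i in range(1, 21)}
print(top_percentile_weeks(sample))
```

Comparing the weeks this returns against the rest of a developer's history is one way to start looking for the meta-patterns behind their best work.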