Diff Delta is the foundational metric GitClear employs to interpret "how much meaningful change is occurring" in a repo over time.
On this page, we'll illustrate the types of code operations GitClear recognizes when calculating how much "meaningful change" is happening. As our "95% of lines of code are noise" data suggests, much care is required to transform "changed lines" into a stable representation of meaningful change.
If you're interested in understanding how Diff Delta is calculated, and what it means, there are a few other resources to consider:
Visual Guide to Understanding Diff Delta (PDF). Shows how churn is interpreted, and the interplay between Commit Groups and Diff Delta.
Diff Delta from First Principles. Places Diff Delta in context relative to the most popular existing metrics.
Research on correlation between Story Points and git metrics (PDF). This was the research that inspired celebrated, lifelong software researcher Alain Abran to declare "For your intended purpose to provide an evidence base that your measurement approach ("Diff Delta") is better than other alternatives, I am of the opinion that you have clearly demonstrated its superiority."
Diff Delta in Context: What does it mean to accrue 10, 100, 1000, 10,000 Diff Delta?
Even without visiting any of these, you'll understand the core tenets of Diff Delta calculation by the end of this page.
There are 7 different operations that we recognize in commits. Each operation below is accompanied by a screenshot of how the operation looks in the GitClear diff viewer.
Each added line of code counts for up to 10 points.
Each deleted line of code can count for up to 25 points. In our experience, deleted code means a smaller codebase, which pays off over the long term, so these changes are weighted more heavily.
Moved code (about 30% of all changed lines) is assigned no Diff Delta.
When a line changes in part, we consider this an "update." Updates can count for up to 10 points.
When a developer applies the same change to several lines en masse, this is detected as "Find & replace." Such lines are worth up to 3 points.
When a developer repeatedly adds ("pastes") the same line in multiple locations, across one or more commits, this is detected as "Copy/paste." Copy/pasted code is assigned no Diff Delta.
One of the most common types of code change is the "no-op." This encompasses all changes to white space, blank lines added, and lines whose only change was their line number.
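To make the point values above concrete, here is a minimal sketch that tallies the highest Diff Delta a set of changed lines could earn. The operation names and the `diff_delta_ceiling` helper are hypothetical, chosen for illustration. The figures are the "up to" ceilings quoted above, and the value of 0 for no-ops is an assumption, since this page does not give a number for them. Actual Diff Delta also depends on context, so treat the result as an upper bound only.

```python
from collections import Counter

# Per-line ceilings quoted on this page ("up to N points").
MAX_POINTS = {
    "added": 10,
    "deleted": 25,
    "moved": 0,
    "updated": 10,
    "find_replace": 3,
    "copy_paste": 0,
    "no_op": 0,  # assumption: the page states no figure for no-ops
}


def diff_delta_ceiling(operations):
    """Return the largest Diff Delta a list of per-line operations could earn."""
    counts = Counter(operations)
    return sum(MAX_POINTS[op] * n for op, n in counts.items())


# Four added lines, two deleted lines, five moved lines, three no-ops.
commit = ["added"] * 4 + ["deleted"] * 2 + ["moved"] * 5 + ["no_op"] * 3
print(diff_delta_ceiling(commit))  # 4*10 + 2*25 = 90
```

Note that the five moved lines and three no-ops make up most of the changed lines in this example but contribute nothing to the ceiling.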
Assuming you're using one of the 40 programming languages we have built a custom parser for, Diff Delta will additionally recognize a few other concepts.
About 5-10% of all lines of code are language keywords. Regardless of whether they're added, deleted, or moved, they're transparent to Diff Delta.
Because it's trivial to add or delete large swaths of comments, and because much boilerplate comes in the form of comments, they are afforded negligible Diff Delta.
As a matter of taste, some projects and committers prefer to spread declarations across multiple lines. From the standpoint of Diff Delta, the entire declaration is treated as a single line, since it could have been represented as such.
Some languages make heavy use of include statements to make functionality available within modules. These statements do not contribute value in themselves, and so are afforded negligible Diff Delta.
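The multi-line declaration rule can be sketched with a naive bracket counter that folds physical lines into logical ones. This is not GitClear's parser: `logical_lines` is a hypothetical helper, and it ignores brackets that appear inside strings or comments, which a real language parser would handle.

```python
def logical_lines(lines):
    """Join physical lines into logical lines by tracking open brackets."""
    openers, closers = "([{", ")]}"
    result, buffer, depth = [], [], 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no content
        buffer.append(stripped)
        depth += sum(stripped.count(ch) for ch in openers)
        depth -= sum(stripped.count(ch) for ch in closers)
        if depth <= 0:
            # All brackets are closed, so the declaration is complete.
            result.append(" ".join(buffer))
            buffer, depth = [], 0
    if buffer:
        # Unbalanced input: keep whatever was collected as one line.
        result.append(" ".join(buffer))
    return result


source = ["config = {", "    'a': 1,", "    'b': 2,", "}", "x = 3"]
print(len(logical_lines(source)))  # 2 logical lines from 5 physical lines
```

Under this sketch, a declaration spread over four lines is counted once, the same as if it had been written on a single line.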
The "action detected" is combined with our assessment of the line's context to yield our final estimate of cognitive load. Here are a few types of context recognized by GitClear:
Proximal changes. A line that is changed alongside 10 other lines generally requires less cognitive load than a standalone line that is changed. The former situation describes most new features implemented, whereas the latter implies a targeted bug fix that probably took some research to pinpoint.
Churn. Has the line been changed previously within the last couple of days? If so, its Diff Delta is spread across all the commits that changed the line.
File type. It's generally easier to write a line of CSS than a line of Python. We estimate scalars for each file type based on the level of redundancy we detect in that file within your project. Admins can modify these scalars on a per-project basis to account for their own sense of the relative difficulty of contributing a line to various types of files.
Keywords and syntactic lines. We detect usage of known keywords in file types, and allow the admin of a GitClear project to define additional patterns born of language or project convention. Such lines are assigned little to no Diff Delta.
Comments. As with "keywords and syntactic lines," we detect lines of code that are comments and adjust Diff Delta for such lines. By default, comments are considered to require less cognitive load than code.
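As an illustration of the churn rule in the list above, the sketch below spreads one line's Diff Delta across the commits that changed it. The even split is an assumption: the text says only that the value is "spread across" those commits, not how it is apportioned. The `spread_churn` name and the commit IDs are hypothetical.

```python
def spread_churn(line_delta, commit_ids):
    """Split one line's Diff Delta evenly across the commits that changed it."""
    # Deduplicate while preserving the order in which commits were seen.
    unique = list(dict.fromkeys(commit_ids))
    if not unique:
        return {}
    share = line_delta / len(unique)  # assumption: equal shares
    return {commit_id: share for commit_id in unique}


# A line worth 9 points that was rewritten in three commits in quick succession.
print(spread_churn(9, ["a1", "b2", "c3"]))  # {'a1': 3.0, 'b2': 3.0, 'c3': 3.0}
```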
While the above list is not exhaustive, it reflects the philosophical approach to assigning Diff Delta to a commit. The cornerstone of calculating Diff Delta is to 1) detect the action that occurred, and 2) scale its value relative to the context of the changed line. Wherever possible, we expose the context scalars used, so they can be modified based on the judgment of project admins.
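The two-step approach can be sketched as a base value for the detected action multiplied by context scalars. Every number in this block is made up for illustration: the page does not publish GitClear's default scalars, and `FILE_TYPE_SCALAR`, `COMMENT_SCALAR`, and `scale_by_context` are hypothetical names.

```python
# Hypothetical context scalars; a project admin would tune values like these.
FILE_TYPE_SCALAR = {"py": 1.0, "css": 0.5}
COMMENT_SCALAR = 0.1


def scale_by_context(action_points, file_type, is_comment=False):
    """Step 2: scale the points from the detected action by the line's context."""
    scalar = FILE_TYPE_SCALAR.get(file_type, 1.0)  # unknown types are left unscaled
    if is_comment:
        scalar *= COMMENT_SCALAR
    return action_points * scalar


# Step 1 (action detection) is assumed to have produced 10 points for an added line.
print(scale_by_context(10, "css"))                  # 5.0
print(scale_by_context(10, "py", is_comment=True))  # 1.0
```

Keeping the scalars in plain lookup tables mirrors the idea that admins can adjust them per project without touching the detection step.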
When GitClear's Commit Cruncher processes commits, it weaves together a network of line ancestry that allows GitClear to delineate between "churn"-like changes vs. "legacy code updates." The Visual Guide to Understanding Diff Delta (PDF) depicts this process on pages 13-14 and 17-18.