When customers are getting acquainted with GitClear, it can be hard to build an instinct for what "normal" Line Impact fluctuations look like, and what expectations to hold around data consistency over time. This article describes the "life cycle" of Line Impact: when it's initially evaluated, when it gets re-evaluated, and how it usually changes over time. Finally, if you find what you believe to be an anomaly in Line Impact, we'll offer some tips to help get the issue investigated and resolved promptly.
Wherein we review how Line Impact is initially evaluated, how it's propagated, and how it changes over time.
The Line Impact factors page describes much of what GitClear evaluates when making a first-pass approximation of Line Impact. The single biggest factor in the Line Impact assigned is the age of the code being changed or replaced. Code that has been in the repo for years (possibly getting shuffled between files) requires the most cognitive energy to understand, so deleting and refactoring such code tends to earn a premium per line, all other factors being equal.
At the opposite end of the spectrum, changes that delete or replace lines which were themselves changed earlier the same day are what is usually known as "churn," and these lines are ascribed very little Line Impact. Understanding how GitClear handles churned code is the foundation for building expectations around how Line Impact will change after its initial evaluation.
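To make the age effect concrete, here is a minimal sketch of an age-based multiplier. The tiers and multiplier values below are entirely hypothetical and chosen only for illustration; GitClear's real weighting combines many factors and is not reproduced here. The sketch shows only the shape of the idea: replacing long-lived code earns more per line than replacing same-day churn.

```python
from datetime import timedelta

# HYPOTHETICAL age tiers: (minimum age of the replaced line, multiplier).
# Illustrative numbers only; not GitClear's actual weighting.
AGE_TIERS = [
    (timedelta(days=365), 2.0),  # code that has survived a year or more
    (timedelta(days=30), 1.0),   # established code
    (timedelta(days=1), 0.5),    # recently written code
    (timedelta(0), 0.1),         # same-day "churn"
]


def age_multiplier(line_age: timedelta) -> float:
    """Return the illustrative multiplier for replacing a line of the given age."""
    for min_age, multiplier in AGE_TIERS:
        if line_age >= min_age:
            return multiplier
    # Negative ages (clock skew, bad data) fall back to the churn tier.
    return AGE_TIERS[-1][1]


print(age_multiplier(timedelta(days=730)))  # two-year-old line
print(age_multiplier(timedelta(hours=3)))   # line written earlier today
```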
It's common that, in the course of developing a PR, a developer might add, remove, or change a particular line across several commits. GitClear detects this common development pattern and "stretches" the work that ultimately got done across those commits. For example, if we follow a single line through an entire PR, the final Line Impact calculated might be "10 points for deleting an old line" and "10 points for replacing it with a totally new line of code" that took 6 commits to finalize. In this example, the line would be ascribed 20 Line Impact in total, divided across the 6 commits in which it changed: a little more than 3 Line Impact (20 / 6 ≈ 3.3) per commit.
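The arithmetic in this example can be written as a minimal sketch. The function name `impact_per_commit` is hypothetical, and the even split is a simplification of the "stretching" described above, not GitClear's actual implementation.

```python
def impact_per_commit(total_line_impact: float, commit_count: int) -> float:
    """Spread a line's total Line Impact evenly across the commits that changed it.

    Hypothetical helper: assumes an even split, which simplifies GitClear's real logic.
    """
    if commit_count < 1:
        raise ValueError("a line must be changed in at least one commit")
    return total_line_impact / commit_count


# 10 points for deleting the old line + 10 for its replacement, finalized over 6 commits
per_commit = impact_per_commit(10 + 10, 6)
print(round(per_commit, 2))  # 3.33
```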
When a line is first authored, GitClear can't yet know whether it is finished. It might already be perfect: some developers only push commits after thorough QA and testing. So in the example above, on the first day the line was changed, it would have been assigned the full 20 points. Each subsequent day the line is modified, however, the Line Impact ascribed to that first day's work is diminished. If the line ultimately changed 10 more times, the work from that first day would eventually be worth only about 2 Line Impact. It might be a week or two before that outcome is known, because it depends on how much more polishing the code needed before it made its way into master and stopped changing.
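The way the first day's credit shrinks can be illustrated with a short sketch. It assumes, for illustration only, that the line's 20 points are split evenly across every commit that has touched it so far; `first_commit_value_over_time` is a hypothetical name, not part of GitClear.

```python
def first_commit_value_over_time(total_line_impact: float, total_commits: int) -> list[float]:
    """Value credited to the first commit as each later commit on the same line arrives.

    Hypothetical model: after k commits have touched the line, the first
    commit's share is total_line_impact / k.
    """
    return [total_line_impact / k for k in range(1, total_commits + 1)]


history = first_commit_value_over_time(20, 11)  # first commit plus 10 more changes
print(history[0])             # 20.0
print(round(history[-1], 2))  # 1.82
```

Under this even-split assumption, the first commit starts at the full 20 points and settles at roughly 1.8, which is the "about 2 Line Impact" described above.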
After Line Impact is a month old, it generally shouldn't change much. There are, however, a handful of reasons that Line Impact might change after one month:
Date ranges, commits, or active committers are added to the entity. Evaluating more code means that the per-time-interval values will change for the entity or organization in which the affected repo(s) reside.
Committers are exiled. Removing active committers reduces the Line Impact reported for those repos.
Branches are discarded. If code in a branch is never merged to master, the Line Impact initially ascribed to it will eventually be stripped (about 2 months after the branch's activity has been deemed stale). Additionally, if a user later force pushes a branch with some of its commits removed, the Line Impact for those commits will no longer count ("force pushed to oblivion" is how GitClear messages describe this), unless the commits also exist on the main branch.
Multipliers are changed (e.g., to create incentives). Line Impact is a configurable quantity that is often used to incentivize desired behavior within a team. When you change Line Impact multipliers (through Code Domains, code file types, or other mechanisms), all of the Line Impact for the affected repos will eventually be recalculated.
Commits are duplicated. If a change is later duplicated (i.e., we detect identical changes made to a file across branches), then the value of one of the commits is negated. Which commit keeps its value depends on the lineage of authorship: the earlier-authored commit is preferred, even if it was committed after the later-authored duplicate.
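The duplicate rule above ("earlier authorship is preferred") can be sketched as follows. This is an illustration under stated assumptions, not GitClear's implementation: `Commit`, `change_fingerprint`, and `commits_to_negate` are hypothetical names, and the fingerprint stands in for however identical changes are actually detected.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    sha: str
    change_fingerprint: str  # HYPOTHETICAL: identifies identical file changes across branches
    authored_at: datetime
    committed_at: datetime


def commits_to_negate(commits: list[Commit]) -> set[str]:
    """For each group of identical changes, keep the earliest-AUTHORED commit
    and return the SHAs whose Line Impact would be negated."""
    keepers: dict[str, Commit] = {}
    for commit in commits:
        current = keepers.get(commit.change_fingerprint)
        if current is None or commit.authored_at < current.authored_at:
            keepers[commit.change_fingerprint] = commit
    kept = {c.sha for c in keepers.values()}
    return {c.sha for c in commits if c.sha not in kept}


# "aaa" was authored first but committed later than its duplicate "bbb".
a = Commit("aaa", "fp1", datetime(2024, 1, 1), datetime(2024, 1, 10))
b = Commit("bbb", "fp1", datetime(2024, 1, 5), datetime(2024, 1, 6))
print(commits_to_negate([b, a]))  # {'bbb'}
```

Keying on authorship rather than commit time mirrors the article's rule: the original author keeps the credit even when their copy lands later.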
Beyond the typical life cycle changes of Line Impact, there are a couple of other mechanisms whereby its results can be perceived to "change" over time.
Different reports on GitClear consume Line Impact data through various caching mechanisms that can sometimes take up to 24 hours to catch up with the Line Impact ascribed to the most recently calculated commits. Treat chart values less than a day or two old with a grain of salt.
All Line Impact on GitClear is ascribed to the date on which the commit was "authored" (originally created on the developer's local machine), not the date on which it was "committed," which is the timestamp providers like GitHub use to describe when the commit happened. Since the "committed at" time can change frequently based on arbitrary factors (e.g., rebasing), GitClear believes it makes the most sense to use authorship time as the definitive source of truth for when the work was done.
This decision means that Line Impact can sometimes appear to "pop up" on days or weeks in the past, if the committer chose not to push their work for a protracted period.
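You can see the two timestamps for yourself with git's real `%aI` (author date) and `%cI` (committer date) format placeholders. The sketch below is a hypothetical helper, not GitClear code: it parses output from `git log --format='%H%x09%aI%x09%cI'` and buckets commits by author date, which is why a commit authored weeks ago still lands on its original day.

```python
from collections import defaultdict
from datetime import date, datetime


def commits_by_author_date(git_log_output: str) -> dict[date, list[str]]:
    """Group commit SHAs by the calendar day they were authored.

    Expects lines produced by: git log --format='%H%x09%aI%x09%cI'
    (SHA, author date, committer date; tab-separated, strict ISO 8601).
    """
    buckets: dict[date, list[str]] = defaultdict(list)
    for line in git_log_output.strip().splitlines():
        sha, authored, _committed = line.split("\t")
        # Bucket by authorship time; the committer date is ignored because it
        # changes on rebase, amend, or cherry-pick.
        buckets[datetime.fromisoformat(authored).date()].append(sha)
    return dict(buckets)


# abc123 was authored on March 1, but its committer date moved to March 20
# (e.g., after a rebase); it is still credited to March 1.
sample = (
    "abc123\t2024-03-01T09:15:00+00:00\t2024-03-20T17:02:00+00:00\n"
    "def456\t2024-03-20T10:00:00+00:00\t2024-03-20T10:00:00+00:00\n"
)
print(commits_by_author_date(sample))
```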
The Line Impact for any commit can be manually changed by any user who is designated as a "Lead Developer" or above in the User settings. If the value of a commit is explicitly changed, that will be indicated in a prominent popup when the commit is visited.
Sometimes, even after accounting for the normal reasons that Line Impact changes, a manager might observe changes that seem suspect, or outright wrong. Here are some options we recommend for these situations.
Line Impact is only valuable inasmuch as users can learn to trust it. We created this help page to help our users become experts in the different paths by which Line Impact can be expected to change. But this does not mean that everything is always perfect. We rely on our users to help us reproduce the anomalies they observe, so we can fix those anomalies with tests and prevent them from ever recurring.
Please email email@example.com with at least one screenshot of the incorrect report and the URL at which we can observe the anomaly you have spotted. We will usually respond to all such reports within one business day. If further investigation is required, it may take up to a week, but resolving anomalies in Line Impact, or any of the reports that present Line Impact, is our highest priority (i.e., before adding new features, we fix all reproducible data anomalies with tests).
Since Line Impact is denormalized into various cached formats to be shown in contexts like the Directory Browser and Hourly Line Impact, the delay between when Line Impact is calculated and when it is reflected in these graphs is a common source of perceived anomalies. If you find a report that doesn't seem to include Line Impact you can see should be present (e.g., by visiting a developer's Line Impact historical graph), please email us the URL as described above so that we can evaluate why the report has failed to stay in sync.
If you can't wait for the data to catch up on its own, GitClear provides a button under Settings -> Data Processing to "Regenerate cached stats." Upon clicking it, we will begin regenerating your Line Impact stats from scratch. Please allow 1-3 days for this reprocessing to complete.