For most customers, the biggest impediment to using data to gauge the efficacy of policy experiments is confounding variables.
In a perfect world, managers could set up an experiment with an "experiment condition," such as "Developers can work remote 100%," and a "control condition," such as "Developers work from office." After letting some months pass, they would ostensibly get a data-backed answer comparing what happens when developers work from home versus from the office.
But, back in the real world, experiments are often spoiled by changing conditions outside of what the experimenter wanted to measure. Looking specifically at "repo evolution speed," by far the largest confounder that GitClear customers encounter is "changing team size or composition."
It's rare in the 2020s for a team of 5-10 developers to keep all of its members unchanged over a 12-month period. Most managers expect that the rate at which stuff gets done on a development team is proportional to the number of people doing it, so if the team grows or shrinks, graphs like the Delivery Velocity stats (Diff Delta History) suddenly become misleading.
So, to run experiments, managers need data that is normalized to stay consistent even when team membership doesn't. That is what the Per-Contributor Velocity Stats offer.
Note that in this document, as on the internet in general, the term "Contributor" is used interchangeably with "Committer" and "Developer."
Any Pro or Elite subscriber can access their Per-Contributor Stats from within the Velocity top-level tab. Per-Contributor Stats show various quantities, divided by (or "normalized to") the number of contributors who were active on the date in question.
Traditional Diff Delta Stats show 2,966 Diff Delta for GitClear for the week of June 12
Per-Contributor Stats show 1,483 Diff Delta for GitClear for the week of June 12, given 2 committers
Comparing the "Historic Diff Delta" (cumulative) stats with the "Diff Delta Per-Contributor" (normalized) stats, the two graphs have different shapes because the number of contributors active in the repos varied from date to date. Since this example shows the past 6 months, each data point covers one week. A data point on the Per-Contributor graph therefore becomes smaller when more contributors were active in the repo that week.
Consider a simple scenario: the team grows from 2 contributors to 3 contributors, while the Diff Delta remains a constant 1,000. The cumulative graph will show a flat line over the two weeks, since the Diff Delta stayed at 1,000. The normalized Per-Contributor graph, however, will show a dip from 500 (1,000 Diff Delta divided by 2) down to 333 (1,000 Diff Delta divided by 3).
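Both examples boil down to one division per data point. The Python sketch below reproduces the figures above; `per_contributor` is a hypothetical helper for illustration, not part of any GitClear API.

```python
def per_contributor(diff_delta, active_contributors):
    """Divide one data point's team-wide Diff Delta by the contributors active then."""
    if active_contributors <= 0:
        raise ValueError("need at least one active contributor")
    return diff_delta / active_contributors


# Week of June 12 from the screenshots: 2,966 Diff Delta across 2 committers
print(per_contributor(2966, 2))  # 1483.0

# Team grows from 2 to 3 contributors while Diff Delta stays at 1,000
weekly = [(1000, 2), (1000, 3)]
print([round(per_contributor(dd, n)) for dd, n in weekly])  # [500, 333]
```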
Currently, the denominator for each data point on the Per-Contributor stats includes any committer who made even a single commit to the repo during that period. To keep the Per-Contributor graph from showing an unfairly reduced value, we recommend using it with a selected team whose committers tend to commit consistently to the same repos.
You can also use our feature voting board to request a configurable minimum number of commits per week before a committer counts as "active" for this graph.
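To see how much a single drive-by commit moves the denominator, here is a minimal sketch under assumed data. `min_commits=1` mimics the current rule described above; any higher value models the requested threshold, which is hypothetical and not an existing GitClear setting. The committer names and the weekly Diff Delta of 1,200 are made up.

```python
from collections import Counter


def active_contributors(commit_authors, min_commits=1):
    """Count committers with at least `min_commits` commits in the period.

    min_commits=1 mirrors the current rule (any single commit counts);
    larger values illustrate a hypothetical minimum-activity threshold.
    """
    counts = Counter(commit_authors)
    return sum(1 for n in counts.values() if n >= min_commits)


# Hypothetical week: two regular committers plus one drive-by commit
week_commits = ["alice"] * 6 + ["bob"] * 5 + ["carol"]
diff_delta = 1200

for threshold in (1, 3):
    n = active_contributors(week_commits, threshold)
    print(threshold, n, diff_delta / n)
# 1 3 400.0
# 3 2 600.0
```

Under the current rule, the one-off commit pulls the weekly per-contributor value from 600 down to 400.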
Below the main selected graph, we also show stats aggregated over the entire time period.
These stats cover the same time range and resource as the screenshot above, where 256k total Diff Delta were registered over the selected time range. Dividing that total by the number of active contributors gives the ~32,000 Diff Delta per developer shown in the summary stats (implying roughly eight active contributors over the period). All of the other cumulative stats are adjusted the same way: each is divided by the total number of active contributors.
Which contributors count toward the denominator depends on where the number appears. Here are the precise rules, so you can cite this data to managers without ambiguity:
On a per-date basis in the graph, each value shown in the tooltip is divided by the number of active contributors for that date.
For example, if you are showing the last year of activity, each date in the graph corresponds to a month. Each data point is then divided by the number of committers who made a commit to the graphed resource during that month.
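A rough sketch of the monthly bucketing described above, assuming calendar-month buckets and a hypothetical `(commit_date, author, diff_delta)` record shape; GitClear's actual bucket boundaries and data model may differ.

```python
from collections import defaultdict
from datetime import date


def per_contributor_by_month(commits):
    """Return {(year, month): Diff Delta per active committer} for each month.

    `commits` is an iterable of (commit_date, author, diff_delta) tuples,
    a hypothetical record shape used only for this illustration.
    """
    delta = defaultdict(float)  # total Diff Delta per month
    authors = defaultdict(set)  # distinct committers per month
    for day, author, dd in commits:
        bucket = (day.year, day.month)  # one graph point per calendar month
        delta[bucket] += dd
        authors[bucket].add(author)
    return {b: delta[b] / len(authors[b]) for b in sorted(delta)}


commits = [
    (date(2023, 5, 3), "alice", 300),
    (date(2023, 5, 20), "bob", 500),
    (date(2023, 5, 28), "alice", 200),
    (date(2023, 6, 2), "alice", 400),
]
print(per_contributor_by_month(commits))
# {(2023, 5): 500.0, (2023, 6): 400.0}
```

Note that alice's two May commits count her only once in May's denominator.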
When you are graphing a quantity per repo (e.g., Diff Delta), each data value is divided by the number of committers who were active in that repo on that date.
When you are graphing a quantity that is not divided by repo (e.g., Diff Delta by Operation, or by Code Age), each data value is divided by the number of committers who were active in the resource on that date. If your selected resource is an "Entity" or "Organization," we use the total number of developers across that collection of repos.
Contributors are not double-counted, even when a committer's work on a given date spans many repos.
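The per-repo rule can be sketched as follows, assuming hypothetical repos named `api` and `web` and a per-date mapping from each repo to the set of committers active in it. This is an illustration of the rule, not GitClear's implementation.

```python
def per_repo_values(diff_delta_by_repo, committers_by_repo):
    """Divide each repo's Diff Delta by the committers active in that same repo."""
    return {
        repo: dd / len(committers_by_repo[repo])
        for repo, dd in diff_delta_by_repo.items()
        if committers_by_repo.get(repo)  # skip repos with no active committers
    }


# Hypothetical date: two repos with different numbers of active committers
diff_delta_by_repo = {"api": 900, "web": 400}
committers_by_repo = {"api": {"alice", "bob", "carol"}, "web": {"dave"}}
print(per_repo_values(diff_delta_by_repo, committers_by_repo))
# {'api': 300.0, 'web': 400.0}
```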
For example, imagine you're graphing stats for an organization in which CommitterA was active in two of its repos, including RepoB. For a particular date, we would show the sum of (for example) "Library" code made within both repos, but we would divide that summed "Library" code by one, assuming CommitterA was the only member of the selected team who made a commit on that date.
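A minimal sketch of the deduplication in this example, with hypothetical repo names `repo_1` and `repo_2` and made-up "Library" line counts. Taking the set union of committers, rather than adding up per-repo counts, is what keeps CommitterA from being counted twice.

```python
def resource_value(values_by_repo, committers_by_repo):
    """Sum a quantity across every repo in the resource, then divide by the
    number of distinct committers active anywhere in it on that date."""
    total = sum(values_by_repo.values())
    distinct = set().union(*committers_by_repo.values())
    return total / len(distinct) if distinct else 0.0


# CommitterA wrote "Library" code in two repos of the organization on the same date
library_by_repo = {"repo_1": 120, "repo_2": 80}
committers_by_repo = {"repo_1": {"CommitterA"}, "repo_2": {"CommitterA"}}

naive = sum(len(c) for c in committers_by_repo.values())  # 2: counts CommitterA twice
print(naive, resource_value(library_by_repo, committers_by_repo))
# 2 200.0
```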
For the summary stats, an "Active Contributor" is any non-duplicated committer on the selected team, who made a commit to the resource at any time during the interval that is being graphed.
For this reason, the summary stats tend to be artificially low: over the course of, e.g., a year, it's common for some committers to have made only a single commit or a handful of commits. Each of these committers was technically active during the selected time range, so each one increases the denominator used in the Diff Delta normalization.
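To illustrate the effect, here is a sketch of the summary calculation under assumed data: the whole interval's Diff Delta divided by every distinct committer with at least one commit in it. The names and numbers are hypothetical.

```python
def summary_per_contributor(commits):
    """commits: (author, diff_delta) pairs for the whole selected interval.

    Anyone with at least one commit in the interval counts once.
    """
    total = sum(dd for _, dd in commits)
    contributors = {author for author, _ in commits}
    return total / len(contributors) if contributors else 0.0


# Hypothetical year: four steady committers...
core = [(name, 10_000) for name in ("alice", "bob", "carol", "dave")]
# ...plus two people who each made a single small commit
drive_by = [("erin", 50), ("frank", 50)]

print(round(summary_per_contributor(core)))             # 10000
print(round(summary_per_contributor(core + drive_by)))  # 6683
```

Two single-commit contributors add almost nothing to the total but raise the denominator from 4 to 6, which is why the summary figure drops by about a third.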