"Commit Groups" are a signature GitClear feature: a set of commits that combine related work together into a single diff. When reviewing commit history or in-progress pull requests, commit groups save time by hiding the "in progress" churn that a developer generates as they iterate toward a final, tested implementation.
On GitClear, code reviewers have several options for how their commit history should be grouped. These options can be toggled in the Commit Activity Browser:
Group by heuristic ("Auto" group). This groups commits together by assessing several attributes of the commits. In descending order of importance, "Commit A" and "Commit B" are likely to be grouped if: B revises a line from A, they reference the same issue/PR, they were authored at a proximal time, they change the same files, or they change files from the same directory.
Group by branch. All commits on the same branch will be grouped, up to the value the user chooses for the "Duration" of the commit group (1 day, 3 days, etc.).
Group by issue. All commits implementing the same issue tracker ticket are grouped.
Group by pull request. All commits bundled in the same pull request are grouped.
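As an illustration of the "Group by branch" option, here is a minimal Python sketch. It is not GitClear's implementation. The Commit class, the group_by_branch function, and the choice to measure "Duration" from the first commit in each group are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Commit:
    # Hypothetical minimal commit record for this sketch
    sha: str
    branch: str
    authored_at: datetime


def group_by_branch(commits: List[Commit], duration: timedelta) -> List[List[Commit]]:
    """Group commits per branch, starting a new group once the duration is exceeded.

    Assumption: the duration window is measured from the first commit of the group.
    """
    groups: List[List[Commit]] = []
    open_group: Dict[str, List[Commit]] = {}  # branch name -> its current group
    for commit in sorted(commits, key=lambda c: c.authored_at):
        group = open_group.get(commit.branch)
        if group is None or commit.authored_at - group[0].authored_at > duration:
            group = []
            groups.append(group)
            open_group[commit.branch] = group
        group.append(commit)
    return groups
```

With a one-day duration, two commits made on the same branch twelve hours apart land in one group, while a commit made on that branch two days later starts a new group.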
Here is a demo of the Commit Activity Browser, which shows Commit Groups in their natural habitat:
Since GitClear recognizes moved code and other granular operations, the total changed lines to review are, on average, reduced by 25-30% compared to how they would be presented by other git platforms.
When a commit group is opened, you will get all the affordances of the GitClear Diff Viewer.
When we began building GitClear, one of the biggest problems we hoped to solve was aptly described by Daniel Janus in his blog post, Things I wish git had: Commit Groups.
In its default flavor, git forces teams to make a tricky decision with regard to "how much work should each commit contain?"
Some teams choose to optimize for a concise git history. Practitioners of the "concise history" approach often recommend using git reset --soft and PR squash merge to combine developer "save points" together into some larger unit of work like "issue tracker ticket." Advocates of this approach get the benefit of "less noise" in their git commit history, at the costs we describe in our recommendation to adopt minimalistic committing: fewer saved versions to return to when something goes wrong, a less targeted git blame history to explain why a specific line changed, and less indication whether a teammate is "stuck" or just silently revising work for days.
Other teams lean toward minimalistic committing, making a commit any time there is enough work to be described. The benefits of this are summarized in our aforementioned guide: more concise git blame, more clarity about progress or lack thereof, more edification for the developer themselves. The drawback is that, if you're using classic git tooling, minimalistic committing leaves a trail of numerous, potentially redundant, artifacts as the work for a feature progresses.
Commit Groups sneak around the usual trade-offs by automatically bundling together similar work. By bundling together up to 200 commits (though groups usually fall in the range of 10-25 commits), a reviewer gets the most concise possible summary of what has changed lately. The reviewer using Commit Groups also gets to skip past all the developer's "in progress" churned lines, to see only the latest incarnation of what the developer has changed.
Commit list on GitHub | Commit list on GitClear |
Besides requiring much less clicking, the GitClear version of this diff omits the lines from earlier commits that were subsequently changed in later commits. When a later commit changes lines from an earlier commit, that is the strongest signal GitClear uses when deciding whether to group the commits together.
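To make the "hidden churn" effect concrete, the following Python sketch compares the number of changed lines a reviewer sees when reading each commit separately against the number seen in one diff from the starting version to the final version. It uses Python's standard difflib as a stand-in differ; it does not reproduce GitClear's diff engine, and the file versions are invented for the example.

```python
import difflib
from typing import List


def changed_lines(before: List[str], after: List[str]) -> int:
    """Count removed plus added lines between two versions of a file."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    count = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            count += (i2 - i1) + (j2 - j1)
    return count


# Invented history: a base version followed by three "in progress" commits
versions = [
    ["def total(items):", "    return 0"],
    ["def total(items):", "    result = 0", "    return result"],
    ["def total(items):", "    result = sum(items)", "    return result"],
    ["def total(items):", "    return sum(items)"],
]

# Lines to review when every commit is read on its own
per_commit = sum(changed_lines(a, b) for a, b in zip(versions, versions[1:]))

# Lines to review when the three commits are viewed as one group
grouped = changed_lines(versions[0], versions[-1])

print(per_commit, grouped)
```

In this toy history, the intermediate "result" variable is introduced and then removed again, so it never appears in the grouped diff.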
The "Best Github Alternative Pull Request" landing page shows specific diff examples where grouping commits together allows a set of changes to be viewed in fewer lines than would be needed on traditional (non-GitClear) diff providers, all of which use the Myers Diff Algorithm to derive which lines to show as "changed." According to pull request research conducted with 49 developers, pull request diffs that were built as Commit Groups could be understood faster, with a comprehension level that matched traditional diffs to within our margin of measurement error.
Perhaps the most common benefit that emerges from viewing diffs as Commit Groups is that the reviewer avoids the familiar dread of "commenting on a line that was already changed by a later commit." On GitClear, such lines earn a special icon to tip off the reviewer that there are better uses of their time:
When you are reviewing a commit with the "X" icon, you know that you're reviewing outdated code.
If you have a specific dimension on which you would prefer to group commits, that can be controlled by clicking the "Gear" icon in the Commit Activity Browser:
As described above, commits can be grouped by "Issue Tracker Ticket (Jira)," "Pull Request," "Branch," or "Auto" (based on heuristics such as "time" and "directory of code change").
No branches? No problem. Many freelancers, solopreneurs, and other individual developers use GitClear to make sense of their commit stream without using the usual demarcations in play on a scaled development team.
Over several years spent iterating Commit Groups, the "commit similarity evaluation" engine has accumulated numerous indicators to understand when two commits are likely working toward a similar purpose:
Does Commit B revise lines recently authored in Commit A?
Did Commit A and Commit B change the same files?
Did Commit A and Commit B change files from the same directories?
Do Commit A and Commit B use a large amount of overlapping language in their commit messages?
Were Commits A and B authored at proximal times, relative to the developer's usual commit schedule?
[Contraindication] Do Commit A or Commit B change several files not found in the other?
These factors, along with about five others retained as proprietary knowledge, are combined to produce a number that predicts whether the commits are sufficiently similar to be described and visually represented (by Committer Changelogs) as a unified group.
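The way several indicators might be combined into one number can be sketched in Python as a weighted sum. Everything specific here is an assumption made for illustration: the CommitInfo class, the weights, the four-hour "usual gap," and the 0.5 threshold are hypothetical, and the actual engine uses additional factors that are not public.

```python
import posixpath
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Set


@dataclass
class CommitInfo:
    # Hypothetical commit record for this sketch
    sha: str
    message: str
    authored_at: datetime
    files: Set[str]
    revises: Set[str] = field(default_factory=set)  # shas whose lines this commit revises


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(a: CommitInfo, b: CommitInfo, usual_gap: timedelta = timedelta(hours=4)) -> float:
    """Score how likely it is that commit b continues the work of commit a."""
    dirs_a = {posixpath.dirname(path) for path in a.files}
    dirs_b = {posixpath.dirname(path) for path in b.files}
    words_a = set(a.message.lower().split())
    words_b = set(b.message.lower().split())
    gap = abs(b.authored_at - a.authored_at)

    score = 0.40 * (1.0 if a.sha in b.revises else 0.0)  # B revises lines from A
    score += 0.20 * max(0.0, 1.0 - gap / usual_gap)      # authored at proximal times
    score += 0.15 * jaccard(a.files, b.files)            # same files
    score += 0.10 * jaccard(dirs_a, dirs_b)              # same directories
    score += 0.10 * jaccard(words_a, words_b)            # overlapping commit messages
    if len(a.files ^ b.files) >= 3:                      # contraindication: unshared files
        score -= 0.15
    return score


GROUP_THRESHOLD = 0.5  # hypothetical cutoff for "similar enough to group"
```

The weights are ordered so that a revision of an earlier commit's lines dominates, mirroring the article's statement that this is the strongest signal; the exact values carry no other meaning.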
You can optionally review these "ad hoc groups" when looking at your "Annual Review" tab, to see what the biggest projects you've undertaken during the past year were.
Whether a Commit Group is graduating to become an entry in a Snap Changelog, or is just being described when you hover over it in the Commit Activity Browser, it often saves time to distill the work from a Commit Group into a one-liner.
If the Commit Group is built around a Jira ticket or GitHub Issue, the title of the issue usually offers a good summary of what the Commit Group is working to implement.
When the Commit Group is aggregating work that's not on a pre-existing ticket, describing it gets more interesting! Some of the sources that GitClear consults to describe such groups:
Send the list of commit titles to an LLM, with the relative amount of work that each commit represented (as measured by its Diff Delta). Let the LLM mash up the commit titles into a coherent theme.
Check if one commit in the group is significantly larger than the others. If so, and if that work was committed with an author message that is long enough to plausibly describe the set of commits, the title (first line) from the largest commit may be used to summarize the group.
If neither of the first two methods yields an adequate summary, we utilize a set of "Backup LLMs" that can put a different spin on the JSON array of data that GitClear produces to describe the commits that compose the commit group. Since we can describe the relative amount of work per-commit, per-directory and per-file, AI has proven reliable at combining these data points into a human-recognizable summary.
In most contexts, a team member can click a Commit Group title and polish its language, should they so desire.
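The sources above can be pictured as a fallback chain. The Python sketch below assumes they are consulted in the order listed, which the article implies but does not state outright. The LLM callables, the 60% share needed for a commit to count as "significantly larger," and the 20-character minimum title length are hypothetical placeholders, not GitClear's values.

```python
from typing import Callable, List, Optional, Sequence, Tuple

CommitEntry = Tuple[str, float]  # (commit title, Diff Delta)
Summarizer = Callable[[List[CommitEntry]], Optional[str]]


def dominant_title(commits: List[CommitEntry],
                   min_share: float = 0.6,
                   min_chars: int = 20) -> Optional[str]:
    """Return the largest commit's title if it dominates the group and is descriptive."""
    if not commits:
        return None
    total = sum(delta for _, delta in commits)
    title, delta = max(commits, key=lambda entry: entry[1])
    if total > 0 and delta / total >= min_share and len(title) >= min_chars:
        return title
    return None


def summarize_group(commits: List[CommitEntry],
                    issue_title: Optional[str] = None,
                    primary_llm: Optional[Summarizer] = None,
                    backup_llms: Sequence[Summarizer] = ()) -> Optional[str]:
    """Try each summary source in turn and return the first usable one-liner."""
    if issue_title:                      # a ticket title usually summarizes the group
        return issue_title
    if primary_llm is not None:          # LLM merges commit titles weighted by Diff Delta
        summary = primary_llm(commits)
        if summary:
            return summary
    summary = dominant_title(commits)    # one commit much larger than the rest
    if summary:
        return summary
    for llm in backup_llms:              # backup LLMs, tried in order
        summary = llm(commits)
        if summary:
            return summary
    return None
```

Passing the LLMs in as plain callables keeps the sketch runnable without any network access; in practice each one would wrap a request to a model endpoint.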
In addition to the top-line Commit Group summary, GitClear also generates a one-paragraph description that digs deeper into what work was done across the Commit Group. This description is generated from either the Diff Delta-sorted list of unique commit titles (if there are enough unique commit titles to tell a story) or by feeding the commit data back to our LLM endpoints with amended instructions to generate a slightly less concise description of what was changed in the Commit Group, and why, to the extent that the developer or their referenced ticket offers clues.
Keeping a Commit Group's diff consistent with the diff shown by the git provider can be tricky. It's not uncommon that the lines GitHub chooses to show as "added" or "deleted" in the diff for an individual commit do not match the lines it considers "added" and "deleted" when it renders the full set of commits that make up a large pull request, especially when it comes to designating whether the blank line "before" or "after" a method is the "new line."
To prevent misinterpretations, GitClear queries the git provider (e.g., GitHub) to get their complete diff. If the contents of GitHub's diff don't correspond to GitClear's interpretation, we will change the contents of the Commit Group to force it to match the diff at the git provider.
In the 4+ years since launching Commit Groups, GitClear has amassed more than 100 unit tests to ensure that our diff processing engine generates results that correspond with the git provider.
That said, it is still possible that certain particularly large or complex diffs may challenge the Commit Group builder. To confirm that the diff you're viewing matches the "before" and "after" contents at the git provider, we show a "Commit Group build details" box when viewing a pull request or commit group. When the Commit Group is built as expected, the build details will show that it is confirmed to match:
The "Commit Group build details" confirm that contents of the compiled Commit Group match the contents at the git provider
If there is a problem confirming that the Commit Group contents match the diff content at the git provider, the "Commit Group build details" will explain the current status of the build: either "group is built, but pending verification" or "group is not yet built." When we receive reports of Commit Groups that don't match, we investigate them in turn and ensure a unit test confirms the problem can't happen again.
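The build statuses described above could be derived from a check like the following Python sketch. It assumes the "after" contents of each side are available as a mapping from file path to file text, and compares them by hash. The function names and the final "mismatch" status string are hypothetical; only the other three statuses come from this article.

```python
import hashlib
from typing import Dict, Optional

FileContents = Dict[str, str]  # file path -> full text after the change


def fingerprint(files: FileContents) -> str:
    """Hash paths and contents in a stable order so equal trees hash equally."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


def build_details(group_after: Optional[FileContents],
                  provider_after: Optional[FileContents]) -> str:
    """Report whether the compiled group matches the git provider's result."""
    if group_after is None:
        return "group is not yet built"
    if provider_after is None:
        return "group is built, but pending verification"
    if fingerprint(group_after) == fingerprint(provider_after):
        return "confirmed to match"
    # Hypothetical status: the group would be rebuilt to match the provider
    return "mismatch: rebuild from provider diff"
```

Because the comparison is on final file contents rather than on the diff lines themselves, a blank line attributed to the "wrong" side of a method does not cause a false mismatch so long as the resulting file is identical.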