2023 was a very good year for GitHub Copilot. In little more than a year's time, it has grown to be used by millions of developers and tens of thousands of businesses. The popularity of Copilot proves that 2023 marked the beginning of a new era in how code is authored.
Less understood is what impact all these AI-generated lines might have on code quality and maintainability. In this paper, GitClear looks at more than 100 million changed lines of code to decipher how the patterns of code authorship have changed since Copilot's debut.
We find several concerning trends for Lead Developers hoping to maintain a pliable code base over the long term. Code churn -- the percentage of lines that are changed or reverted less than two weeks after being authored -- is projected to double in 2024 relative to its 2021 level. We also find that the percentages of "code added" and "code copy/pasted," especially among junior developers, are increasing at a much higher rate than code that is updated, deleted, or moved to a consolidated location (aka "DRY code").
We conclude with suggestions for managers who seek to maintain high code quality in spite of the forces increasingly working against that end.
With numbers like these, it's little wonder that GitHub's own CEO, Thomas Dohmke, would carve time out of his schedule to write about the AI revolution, in a blog post (and research paper) he published on GitHub in 2023. Surely, there were less busy people who could have written about the magnitude of Copilot's impact. But, considering this is the most popular product GitHub has launched since its inception, you can understand why Dohmke would choose basking in the glow of his hit product over the usual day-to-day rigors of running a GitHub-sized company.
From Dohmke's 2023 blog post, "The economic impact of the AI-powered developer lifecycle and lessons from GitHub Copilot"
When we saw these numbers, it confirmed our intuition that Copilot is already making a measurable impact on the development ecosystem. In the same blog post, Dohmke asserts that more than 20,000 organizations are already using GitHub Copilot for Business. This is in addition to the "more than one million people" that GitHub stated were already using Copilot on a Personal license as of February 2023, when Copilot for Business was released.
At this point, it's not hard to imagine that at least a third of all global developers have access to either GitHub Copilot or an AI coding assistant like it. In 2024 and beyond, the proliferation of AI-assisted code seems likely to continue, or even accelerate.
Developers wouldn't be adopting Copilot if they didn't believe that it accelerated their ability to produce code. GitHub's "75% more fulfilled" measurement attests that Copilot certainly succeeds on this count. The question is, "what cost are teams paying for this convenience?"
Developer researchers are concerned about the impact of AI-assisted programming
GitHub claims that code is written "55% faster" with Copilot. But what about code that shouldn't be written in the first place? That, too, is written 55% faster.
That is the first of several challenges facing developers who use an AI assistant. Others include:
Being inundated with suggestions for added code, but never presented with suggestions for updating, moving, or deleting code
Time required to evaluate code suggestions can become costly, especially when the developer works in an environment with competing auto-suggest mechanisms
Code suggestions are generally optimized for the likelihood that the developer will accept them. They are not optimized for whether the code is correct, or whether it will even run
These drawbacks presumably account for a portion of the difference in Suggestion Acceptance Rate between Junior and Senior Developers:
GitHub's own data suggests that Junior Developers use Copilot around 20% more than experienced developers
GitClear classifies code changes (operations) into seven categories, six of which are analyzed in this research:
Added code. Newly committed lines of code that are distinct, excluding lines that incrementally change an existing line (labeled "Updates"). "Added code" also does not include lines that are added, removed, and then re-added (these lines are labeled as "Updated" and "Churned")
Deleted code. Lines of code that are removed, committed, and not subsequently re-added for at least the next two weeks.
Moved code. A line of code that is cut and pasted to a new file, or a new function within the same file. By definition, the content of a "Moved" operation doesn't change within a commit, except for (potentially) the white space that precedes the content.
Updated code. A committed line of code, based on an existing line, that modifies that line by approximately three words or fewer.
Find/replaced code. A pattern of code change where the same string is removed from 3+ locations and substituted with consistent replacement content.
Copy/pasted code. Line content, excluding programming language keywords (e.g., "end", "})", "["), that is committed to multiple files or functions within a day.
No-op code. Trivial code changes, such as changes to white space, or changes in line number within the same code block. No-op code is excluded from this research.
Specific examples of GitClear's code operations can be found in the Diff Delta documentation. GitClear has been classifying git repos by these operations since 2020. As of January 2024, GitClear has analyzed and classified around a billion lines of code over four years.
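The "Updated" vs. "Added" distinction above can be sketched with a word-level diff. This is an illustrative approximation of the "approximately three words or less" rule, not GitClear's actual classifier; the function names `words_changed` and `classify_line` are hypothetical.

```python
# Illustrative sketch of the "Updated vs. Added" rule described above.
# The real GitClear classifier is not public; this only demonstrates the idea
# that a line differing from an existing line by <= 3 words is an "Update."
import difflib
from typing import Optional

def words_changed(old: str, new: str) -> int:
    """Count words that differ between two lines, via a word-level diff."""
    sm = difflib.SequenceMatcher(a=old.split(), b=new.split())
    changed = 0
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op != "equal":
            changed += max(i2 - i1, j2 - j1)
    return changed

def classify_line(old_line: Optional[str], new_line: str) -> str:
    """Label a committed line "Updated" (<= 3 words changed from an existing
    line) or "Added" (new, distinct content)."""
    if old_line is None:
        return "Added"
    return "Updated" if words_changed(old_line, new_line) <= 3 else "Added"

print(classify_line("total = price * qty", "total = price * qty * tax_rate"))  # Updated
print(classify_line(None, "def compute_invoice(items):"))                      # Added
```

In this framing, an entirely rewritten line (more than ~3 words changed) counts as an "Add" plus a "Delete," rather than an "Update."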
For this research, we are also exploring the change in "Churned code." This is not treated as a code operation, because a churned line can simultaneously be an "Added," "Deleted," or "Updated" operation. For a line to qualify as "churned," it must have been authored, pushed to the main branch, and then revised within the subsequent two weeks. Churn is best understood as "changes that were either incomplete or erroneous when the author initially accepted, committed, and pushed them."
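The churn rule can be stated compactly: a pushed line counts as "churned" if a revision to it lands within two weeks of authorship. The sketch below is illustrative; the function and parameter names are hypothetical, not GitClear's actual schema.

```python
# Minimal sketch of the two-week churn window described above.
from datetime import datetime, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(weeks=2)

def is_churned(authored_at: datetime, revised_at: Optional[datetime]) -> bool:
    """True when a later revision lands within two weeks of authorship."""
    return revised_at is not None and revised_at - authored_at <= CHURN_WINDOW

print(is_churned(datetime(2023, 6, 1), datetime(2023, 6, 9)))   # True: revised after 8 days
print(is_churned(datetime(2023, 6, 1), datetime(2023, 7, 15)))  # False: outside the window
```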
As a first approximation of how Copilot has changed development, we analyzed the number of different line operations that GitClear has observed, segmented by the year in which the code was authored (using the authored_at date within the git commit header). The raw numbers for this analysis are included in the Appendix. Here are the percentages by year:
Year | Added | Deleted | Updated | Moved | Copy/pasted | Find/replaced | Churn |
2020 | 39.18% | 19.47% | 5.19% | 24.99% | 8.26% | 2.92% | 3.32% |
2021 | 39.49% | 19.03% | 4.99% | 24.69% | 8.43% | 3.37% | 3.63% |
2022 | 41.05% | 20.15% | 5.22% | 20.46% | 9.43% | 3.68% | 3.97% |
2023 | 42.34% | 21.12% | 5.50% | 16.92% | 10.49% | 3.63% | 5.53% |
2024 | 43.63% | 22.09% | 5.78% | 13.38% | 11.55% | 3.58% | 7.09% |
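These yearly percentages can be reproduced from the raw line counts in the Appendix by dividing each operation's count by that year's "Lines changed" total (and likewise for Churn). A quick check against the 2023 row:

```python
# Reproduce the 2023 percentages from the raw Appendix counts.
raw_2023 = {
    "Added": 22_626_714, "Deleted": 11_288_962, "Updated": 2_938_800,
    "Moved": 9_040_659, "Copy/pasted": 5_607_373, "Find/replaced": 1_942_194,
}
lines_changed_2023 = 53_444_702
churn_2023 = 2_952_912  # churn overlaps the operations, so it is tracked separately

pct = {op: round(100 * n / lines_changed_2023, 2) for op, n in raw_2023.items()}
pct["Churn"] = round(100 * churn_2023 / lines_changed_2023, 2)
print(pct)  # Added: 42.34, Churn: 5.53 -- matching the 2023 row above
```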
Here is how these look in graph form, where the left axis illustrates the prevalence of code change operations (which, as percentages, sum to 100%). The right axis tracks the change in "Churn" code:
The projections for 2024 utilize OpenAI's gpt-4-1106-preview Assistant to run a quadratic regression over the existing data, extrapolating from how the percentages changed between 2022 and 2023. The full output of the OpenAI Assistant is provided in the Appendix. Given the exponential growth of Copilot reported by GitHub, it seems likely that 2024's numbers will continue the trends that began to take form in 2022.
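A plain quadratic least-squares fit over the four observed churn percentages lands in a similar range. Note this is a rough reproduction, not the paper's exact method (which used an OpenAI Assistant), so the projected value differs slightly from the 7.09% shown in the table:

```python
# Rough reproduction of the 2024 churn projection: fit a quadratic to the
# observed 2020-2023 churn percentages, then extrapolate one year forward.
import numpy as np

years = np.array([2020, 2021, 2022, 2023])
churn_pct = np.array([3.32, 3.63, 3.97, 5.53])  # from the table above

coeffs = np.polyfit(years - 2020, churn_pct, deg=2)  # quadratic least squares
projected_2024 = np.polyval(coeffs, 2024 - 2020)
print(round(projected_2024, 1))  # ~7.4, in the same ballpark as the paper's 7.09
```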
Looking only at the differences in operation frequency between 2022 and 2023, we find:
Operation | YoY change |
Added | +3.1% |
Deleted | +4.8% |
Updated | +5.2% |
Moved | -17.3% |
Copy/pasted | +11.3% |
Find/replaced | -1.3% |
Churn | +39.2% |
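These YoY figures are relative changes in each operation's share of all line operations. They are best computed from the raw Appendix counts, since the rounded percentages in the yearly table can shift the last digit:

```python
# Reproduce the YoY table from the raw Appendix counts for 2022 and 2023.
raw = {
    2022: {"Added": 16_868_378, "Deleted": 8_280_031, "Updated": 2_146_768,
           "Moved": 8_407_677, "Copy/pasted": 3_873_240,
           "Find/replaced": 1_512_708, "Churn": 1_630_703,
           "Lines changed": 41_088_802},
    2023: {"Added": 22_626_714, "Deleted": 11_288_962, "Updated": 2_938_800,
           "Moved": 9_040_659, "Copy/pasted": 5_607_373,
           "Find/replaced": 1_942_194, "Churn": 2_952_912,
           "Lines changed": 53_444_702},
}

def share(year: int, op: str) -> float:
    """An operation's fraction of that year's total changed lines."""
    return raw[year][op] / raw[year]["Lines changed"]

yoy = {op: round(100 * (share(2023, op) / share(2022, op) - 1), 1)
       for op in ["Added", "Deleted", "Updated", "Moved", "Copy/pasted",
                  "Find/replaced", "Churn"]}
print(yoy)  # {'Added': 3.1, ..., 'Moved': -17.3, 'Churn': 39.2}
```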
The most significant changes observed to correlate with the proliferation of Copilot are "Churn," "Moved," and "Copy/pasted." The implications for each change are reviewed in turn.
Recall that "Churn" is the percentage of code that was pushed to the repo, then subsequently removed or updated within 2 weeks. This was a relatively infrequent outcome when developers authored all their own code -- only 3-4% of code was churned prior to 2023.
Coinciding with the growth of Copilot, there is a surge in how often "mistake code" is being pushed to the repo, running the risk of being deployed to production. If the current pattern continues into 2024, more than 7% of all code changes will be reverted within two weeks, double the rate of 2021. The implications for growth in Google DORA's "Change Failure Rate" are likely to manifest when the 2024 State of Devops report is released later in the year.
Moved code is typically observed when refactoring an existing code system. Refactored systems in general, and moved code in particular, are key to enabling code reuse. As a product grows in scope, developers traditionally rearrange existing code into new modules that can be reused by newly added features. The benefits of code reuse are familiar to experienced developers. Compared with newly added code, reused code has already been tested & proven stable in production. Often, reused code has been touched by multiple developers, which increases the likelihood that the code comes with documentation. This accelerates the interpretation of the module by developers who are new to it.
Combined with the growth in code labeled "Copy/Pasted," it seems clear that the current implementation of AI Assistants discourages code reuse. Instead of refactoring and working to DRY ("Don't Repeat Yourself") code, these Assistants suggest authoring new code that repeats existing code.
There is perhaps no greater scourge to long-term code maintainability than copy/pasted code. In effect, when a non-keyword line of code is repeated, the code author is admitting "I didn't have the time or inclination to evaluate the previous implementation." By re-adding code instead of reusing it, the chore is left to future maintainers to figure out how to consolidate parallel code paths that implement some repeatedly-needed functionality.
Since most developers derive much greater satisfaction from "implementing new features" than they do "interpreting potentially reusable legacy code," copy/pasted code often persists long past its expiration date. Especially on less experienced teams, there may be no code maintainer with the moral authority to mandate code reuse. Even when there are Senior Developers possessing such authority, the willpower cost of understanding code well enough to consolidate it is hard to overstate.
If there isn't a CTO or VP of Engineering who actively schedules time to reduce tech debt, you can add "executive-driven time pressures" to the long list of reasons that the copy/pasted code will never be consolidated into the component libraries that underpin long-term development velocity.
Another way to assess how Copilot is influencing code quality is to extract the data from GitClear's Code Provenance derivation, which measures how old a line of code was when it was revised. This provides an independent, secondary check of whether the patterns observed in the Code Operation analysis hold up.
Year | Less than 2 weeks | Less than one month | Less than one year | 1-2 years |
2020 | 65.9% | 8.7% | 21.8% | 3.6% |
2021 | 66.7% | 9.0% | 20.5% | 3.8% |
2022 | 64.7% | 9.9% | 21.1% | 4.4% |
2023 | 71.3% | 9.3% | 16.4% | 3.0% |
2024 | 74.4% | 9.1% | 14.1% | 2.4% |
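These provenance shares follow directly from the raw Appendix counts: each age bucket divided by the year's total of revised lines. Checking the 2023 row:

```python
# Reproduce the 2023 code-age shares from the raw Appendix counts.
buckets_2023 = {
    "Less than 2 weeks": 1_941_351,
    "Less than one month": 254_082,
    "Less than one year": 445_869,
    "1-2 years": 82_405,
}
total = sum(buckets_2023.values())
shares = {age: round(100 * n / total, 1) for age, n in buckets_2023.items()}
print(shares)  # {'Less than 2 weeks': 71.3, ..., '1-2 years': 3.0}
```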
In its visualized graph form:
The trend in this data corroborates the patterns observed in the previous Code Operation analysis. When code gets updated, the code being updated is younger with each passing year. Since the 2020 start of our data set, the steepest drop has been in code that gets revised more than one month, but less than a year, after it was initially authored.
The trend suggests that, absent AI Assistants, developers were more likely to find recently authored code in their repo to target for refinement and reuse. Around 70% of products built in the early 2020s use the Agile Methodology, per a Techreport survey [5]. In Agile, features are typically planned and executed per-Sprint. A typical Sprint lasts 2-3 weeks. It aligns with the data to surmise that teams circa 2020 were more likely to convene post-Sprint, to discuss what was recently implemented and how to build upon it in a proximal Sprint. Judging by the data, that seems to be happening much less often lately.
Moving in the opposite direction is code that gets revised less than two weeks after being initially authored. After hovering between 65%-67% through the period before widespread AI Assistants, in 2023 it popped up to 71%, a 6.6% increase that came at the expense of refactoring more seasoned code.
Can incentives be created to counteract the "add it and forget it" tendency that today's AI systems promote? It is hard to imagine what sort of developer experience could guide a developer toward preferring reuse over reinventing the wheel. It's conceivable that AI could be trained to identify opportunities where similar code could be consolidated, and offer the developer tools to consolidate the copy/pasted mess that is currently propagating throughout repos. But even if this hypothetical "consolidation AI" were built, when would it be invoked? The same pressures that generally prevent teams from scheduling time to reduce tech debt would generally prevent them from stopping the feature pipeline for cleanup.
Another salient question in light of this data: at what rate does development progress become inhibited by additional code? Especially when it comes to copy/pasted code, which tends to seed indecision about which utility method to use among multiple similar choices, there is almost certainly an inverse correlation between "the number of lines of code in a repo" and "the velocity at which developers can modify those lines." The current uncertainty is "when is the accumulated cruft too great to be tolerated?" Knowing the rate at which slowdown takes hold would allow future tools to highlight when a manager should consider cutting back time on new features.
By all measures we evaluate, AI tooling exerted negative pressure on code quality throughout 2023.
Developer assessments, like GitHub's 2023 survey with Wakefield Research, suggest that developers already perceive the decrease in code quality. When asked "What metrics should you be evaluated on, absent AI?" their top response was "Collaboration and Communication," with "Code Quality" in second place. When the question switched to "What metrics should you be evaluated on, when actively using AI?" their responses shifted: "Code Quality" became the top concern, and "Number of production incidents" the #3 concern:
While individual developers lack the data to substantiate why "code quality" and "production incidents" become critical concerns when using AI, our data suggests a possible backstory. When developers are inundated with quick and easy suggestions that will probably work in the short term, it becomes a constant temptation to add more lines of code without checking whether an existing system could be refined for reuse.
To the extent that inexperienced developers continue to be offered easy copy/paste suggestions, the fix for this situation won't be easy. In the age of Copilot, it is incumbent on engineering leaders to monitor incoming data and consider its implications for future product maintenance. There are a growing number of tools, including GitClear, that offer Developer Analytics. When evaluating these, we recommend managers consider adopting tools that can help detect when problematic code is festering.
When it comes to building a product, there's no question that AI assistance leads to more lines of code being added. The better question for 2024: who's on the hook to clean up the mess?
Data used to build this research is included below.
Year | Added | Deleted | Updated | Moved | Copy/pasted | Find/replaced | Lines changed | Churn |
2020 | 9,071,731 | 4,508,098 | 1,202,480 | 5,786,718 | 1,911,855 | 676,000 | 23,156,882 | 769,493 |
2021 | 14,464,864 | 6,969,778 | 1,826,579 | 9,043,649 | 3,087,530 | 1,234,213 | 36,626,613 | 1,331,278 |
2022 | 16,868,378 | 8,280,031 | 2,146,768 | 8,407,677 | 3,873,240 | 1,512,708 | 41,088,802 | 1,630,703 |
2023 | 22,626,714 | 11,288,962 | 2,938,800 | 9,040,659 | 5,607,373 | 1,942,194 | 53,444,702 | 2,952,912 |
2024 | 28,708,803 | 14,535,353 | 3,803,275 | 8,804,121 | 7,599,970 | 2,355,662 | 65,800,602 | 4,665,263 |
Here are some secondary characteristics of the data set analyzed, to aid in evaluating its validity and applicability relative to existing data sets the reader may possess:
Year | Commit count | Committer count | Repos analyzed | Code files changed |
2020 | 381,347 | 12,761 | 497 | 1,368,549
2021 | 623,264 | 17,577 | 643 | 2,207,498
2022 | 723,823 | 18,446 | 993 | 2,616,263
2023 | 1,019,680 | 21,700 | 1,294 | 3,414,136
In CSV pasteable form, for your reanalysis convenience (2024 omitted since it is a projection you can replace with your own):
The data was stored in a Postgres database and was queried via Ruby on Rails' ActiveRecord.
Year | Less than 2 weeks | Less than one month | Less than one year | 1-2 years |
2020 | 550362 | 72471 | 182420 | 30074 |
2021 | 891008 | 120029 | 274125 | 50825 |
2022 | 1136604 | 173370 | 369925 | 77463 |
2023 | 1941351 | 254082 | 445869 | 82405 |
In CSV pasteable form, for your own reanalysis convenience:
The data was stored in a Postgres database and was queried via Ruby on Rails' ActiveRecord.