Will Larson, aka @Lethain (recently recovered from a stroke), is perhaps the most widely read author of recent years on the topic of how to maximize an engineering team's productivity. I recently came across a post of his from late 2020 called "My skepticism towards current developer meta-productivity tools," where Will geeks out on what the perfect developer learning tool might offer. Finding and building the best dev tool for learning has been an ongoing interest throughout my career as a Programmer (20 years), CEO (10 years) and various in-between roles, so I found Will's post very interesting, to say the least.

I'm going to apply my research in the subject matter (part formal research, part ad hoc) to try to piece together how the tool Will describes might be brought to life. Then I'll add some of my own ideas about what the ideal developer learning tool might do.

I'm intentionally going to avoid mentioning my company's dev tool GitClear, because I want this discussion to be strictly about the hypothetical "best case for a developer learning tool" -- independent of what exists today. While I won't talk about our company, it's probably relevant that I've been cataloguing products and research on dev productivity as a hobby for the past 5+ years.

"Each pull request like a request trace" brainstorm

As a starting point, I recommend reading Will's original post in its entirety (not a long read) to fully digest his proposal. Here is the paragraph that I think most specifically describes the ideal developer learning tool that he's after:

The fundamental workflow I’d like to see these systems offer is the same as a request trace. Each pull request has a unique identifier that’s passed along the developer productivity tooling, and you can see each step of their journey in your dashboarding tooling. Critically, this lets you follow the typical trace instrumentation workflow of starting broad (maybe just the PR being created, being merged, and then being deployed) and add more spans into that trace over time to increase insight about the problematic segments.
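
To make the quoted workflow a bit more concrete, here is a minimal Python sketch of a trace keyed by pull request. Everything in it is my own illustration rather than anything from Will's post: the `Span` and `PullRequestTrace` names, the span names, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Span:
    """One step in a pull request's journey (hypothetical structure)."""
    name: str
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


@dataclass
class PullRequestTrace:
    """All spans recorded under one pull request identifier."""
    pr_id: str
    spans: List[Span] = field(default_factory=list)

    def add_span(self, name: str, start: datetime, end: datetime) -> None:
        self.spans.append(Span(name, start, end))

    def slowest_span(self) -> Span:
        # The "problematic segment" is taken here to be the longest span.
        return max(self.spans, key=lambda s: s.duration)


# Start broad: created -> merged -> deployed (made-up timestamps).
t0 = datetime(2021, 1, 4, 9, 0)
trace = PullRequestTrace(pr_id="example-pr")
trace.add_span("created_to_merged", t0, t0 + timedelta(hours=30))
trace.add_span("merged_to_deployed", t0 + timedelta(hours=30), t0 + timedelta(hours=32))

print(trace.slowest_span().name)
```

Starting with two broad spans and splitting the slow one into finer spans later mirrors the "start broad, then add spans" approach described in the quote.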

There is no shortage of cool "what if it could do X?" thought experiments that follow from the idea of marrying a PR to downstream metrics in a configurable way. Where I struggle with the idea is trying to envision how it could incorporate all the variability that real-world dev teams bring when it comes to commit/PR hygiene.

My experience suggests that "PR" won't prove to be a viable key by which to index the observed data. Issues and commits won't work either (seems similar to Will's conclusion?). The only "input units" or "request trace key" that I could imagine working to index the "request trace" would be groups of commits (honorable mention: ranges of lines of code).

Having a poorly chosen key for request traces sets the cap on all potential downstream benefits. That is, even if the tool could reference 100 services by PR, the first time someone writes code that's not in a PR, or code that's in multiple PRs, or code where the PR is ambiguous to the tool for whatever reason -- the possible benefits of the tool are squandered. The world I've observed through our git product is a very noisy place when it comes to the different conventions used for PRs, issues, and commits.
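
As a rough sketch of what a commit-group key could look like, the snippet below derives an identifier from the commit SHAs themselves, so the key exists whether or not a PR does. How commits get grouped in the first place is left open here; `commit_group_key` and the placeholder SHAs are hypothetical names of mine, not part of any existing tool.

```python
import hashlib
from typing import Iterable


def commit_group_key(shas: Iterable[str]) -> str:
    """Derive a stable, order-independent key from a group of commit SHAs.

    Assumption: the group is fully described by its set of SHAs, so the key
    does not depend on any PR or issue being attached to the commits.
    """
    canonical = "\n".join(sorted(set(shas)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Placeholder SHAs; the same group yields the same key in any order.
key_a = commit_group_key(["c3", "a1", "b2"])
key_b = commit_group_key(["a1", "b2", "c3"])
print(key_a == key_b)
```

Downstream events could then be indexed by this key instead of by PR number, which sidesteps the "no PR" and "multiple PRs" cases, though it says nothing about how to pick the groups well.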

Let's say that "groups of commits" works as the key. Now that we've solved that: what might a developer or manager learn by tying their downstream metrics to groups of commits?

My intuition is that how useful this tool can be for learning will depend most on how good it is at filtering out noise.

In every exception tracking or performance monitoring tool I've adopted, the lion's share of the incidents are outliers or best ignored. Different teams have different ways they capture errors, and different tolerances for how many can accumulate, but I've never heard of a team that solves as many "event incidences" as they generate across their infrastructure monitoring. So, much rides on how well the tool can shush noisy events. Managers and developers are almost guaranteed to be skeptical of any new dev productivity tool from the get-go, so even a few "false positive" events will get the baby thrown out with the bathwater.

Where might we look to try to find low-noise, high-signal events that merit a request trace from the group of commits? If Will thinks it would be fun to make a list of what he sees as good service->metrics candidates, I'd be eager to read and consider it. In a perfect world, it would be helpful to have a list of 3-5 events that would generate a "request trace" for review after occurring. If I could better envision specific types of events that the developer learning tool is observing, it would be easier to formulate how noise could be dialed down for those events. Assuming the noise level could be managed, it would be interesting to think about the specific ways that a team would learn from this tool.

What other learning opportunities would benefit developer teams?

Lest we get lost in the weeds of implementation details, Will offers a more high-level starting point to reason about how the perfect "developer meta-productivity tool" might work:

The real need here is capturing the data to support learning, and learning happens in batches... The right tool here should be designed exclusively from a learning perspective.

This seems like the right spirit to look for ideas about what the perfect tool should offer. To that end, here are 11 specific developer learning opportunities (aka pain points) I think the perfect "developer meta-productivity tool" might fix:

How to know if new developers are getting the resources they need to learn? This is question #1 on most managers' minds when their team is in growth mode. The first 6 months of onboarding are critical for integrating a new developer into the team, so the best learning tool would provide optics and assistance relative to hiring cohort, with special emphasis on helping the most recent cohort ramp up at least as quickly as the last.

Is developer time going toward planned or unplanned work? And where is the "unplanned" energy being spent? A quick query of about 1,000 repos in our git product suggests that, in the past month, 51% of commits authored have an associated PR. The other half of the work? Hard to pin down. That implies a deep well of time to possibly recapture.

Who are the subject matter experts? Similar to #1: when I'm a developer who is new to System X (but not necessarily new to the company), who would be the best to ask about it? Or if they've left the company, who would be the next best?

What's the desired level of test cases and documentation? How far are we from hitting it? Several productivity articles advocate "Lead time" as a top-tier measurement to observe, but the metric creates a strong incentive to cut corners on documentation and testing, especially to the extent that work happens outside PRs, where there's no backstop to catch newly added tech debt.

How did line X come to be? When one is considering changing an old or confusing line of code, wouldn't it be nice to be able to request the backstory about how the line came to be? Currently this can be done ad hoc on Slack, but that approach won't help future developers who are confused by the same line.

Where is the tech debt, in terms of directories, files, and methods? CTOs and Engineering Managers slowly build an intuition for this, but wouldn't it be better if one could browse through a directory structure like on GitHub, but have each folder or file labeled by how much technical debt it's estimated to contain?

How can we avoid invoking deprecated methods or methods on the shite list, especially for new developers? Or: how can we ensure that developers (especially new team members) discover the best practices even if they didn't read all the docs, or the docs are outdated?

How much does adding new devs slow down existing ones? The Mythical Man-Month makes a persuasive case that adding more developers does not proportionally increase output. Some incremental amount of productivity will be lost to coordination--wouldn't it be nice to know how much?

How much do meetings distract from (or add to) GSD? Meetings are kind of the bogeyman of developer productivity since they break flow state. But what if they add clarity that reduces churn?

What is the minimum level of PR review a company can “get away with” while avoiding bugs and tech debt? Most every team uses PRs at least sometimes. But the extent to which a robust pull request review adds value is relative to a bunch of factors.

Which Jira tickets did the company most underestimate the difficulty of? And what can be learned from those misses?
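
Several of these questions reduce to simple ratios once commits are the unit of record. As one example, the planned-versus-unplanned question above could start from the share of commits that have an associated PR. The sketch below uses made-up toy records (not the repos mentioned above), and `pr_coverage` is a hypothetical helper name.

```python
from typing import Optional, Sequence, Tuple

# Each record is (commit_sha, pr_number or None); toy data for illustration.
CommitRecord = Tuple[str, Optional[int]]


def pr_coverage(commits: Sequence[CommitRecord]) -> float:
    """Return the fraction of commits that have an associated PR."""
    if not commits:
        return 0.0
    with_pr = sum(1 for _, pr in commits if pr is not None)
    return with_pr / len(commits)


toy_commits = [("a1", 12), ("b2", None), ("c3", 12), ("d4", None)]
print(pr_coverage(toy_commits))
```

The commits that fall outside the ratio are the interesting part: they are the "other half" that a learning tool would need to categorize some other way.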

Presuming these questions could be answered by software (I think so), how would they rate in usefulness for problems you've heard about or experienced?

Realistically speaking, one has to acknowledge that even a tool specifically designed for learning could be misused. "Fear of misuse" seems like it might already be an active deterrent to companies like GitHub who would otherwise consider advancing the status quo regarding how commit activity is consumed.

I'm not sure how much the ideal tool needs to obsess over preventing tool misuse. It's a tough question, because almost any meaningful insight identified could be translated to some flavor of blame. That said, GitHub already offers nominal "commit stats," and git itself offers a blame invocation, so I don't think reasonable people are totally averse to using data-backed systems to infer fault when something goes wrong. Almost all the "most helpful" ideas on my list could be misused or misconstrued, so I think the ideal tool would employ whatever measures it can to deter misuse. Possible ideas.


Shout out to: Ness Labs for the "learning in public" idea that inspired me to post a thought exploration like this, and Scott Alexander for reminding me that blog posts tend to move things forward more than tweets.