Strangers on the Internet, or Why PRs Are Conversations, Not Notifications
Here’s something I say a lot that usually gets a reaction: we are not strangers on the internet.
The open-source PR model — push your work, notify the team, hope someone picks it up, iterate based on comments — makes total sense when contributors don’t know each other and are working across time zones and organizations with no shared context. It was built for that. It works great for that.
It does not work great for a team of people who sit (or video call) together every week, have shared context on the codebase, and are accountable to each other for delivery outcomes.
When I watch internal teams use the open-source model, I see a few things consistently:
PRs get big before anyone looks at them. The engineer works in isolation until they think the thing is done, then opens a PR with thirty changes. The reviewer now has to reconstruct the entire intent of the work from the diff. This is expensive for the reviewer, and the engineer has already invested too much to pivot meaningfully on structural feedback.
Reviews become rubber stamps or rabbit holes. Either the reviewer approves because the thing looks roughly right and they don’t want to slow it down, or they find a fundamental problem and now you have a PR comment thread that should have been a twenty-minute conversation three days ago.
Coverage and quality go sideways late. By the time someone notices a testing gap or an architectural issue, the engineer is context-switched out and getting back up to speed costs time neither of you wanted to spend.
The model I’ve had the most success with is straightforward: have the conversation early, stage the review, and use the PR to confirm shared understanding rather than initiate it.
Specifically — if you’re working on something that someone else will need to sign off on, loop them in before you start. “I’m going to do this thing. Here’s my plan. Any concerns?” Then, at the midpoint, show them the architecture — not the implementation details, just the structure. Names, boundaries, where things live. Get that directional feedback when it’s cheap to act on. Then finish the implementation and let the PR be the final confirmation: here’s what we talked about, here’s what I built, here’s the test coverage that says it works.
If the PR fails automated checks (coverage below threshold, build broken, linting issues) it doesn’t even need to get to a human. It goes back to the engineer with a clear signal. Fix the basics first. Don’t ask me to spend time on work that doesn’t meet the baseline.
PRs as the ends of conversations. It’s a small mental shift with a surprisingly large impact on how smoothly a team moves.
More Code, More Risk ... More Automation
One of the things I keep saying to engineering teams right now is: this tool should improve the process on all fronts. Not just the building. All of it.
AI-assisted development is genuinely remarkable. Engineers are shipping features faster, clearing backlogs that felt immovable, and getting real leverage from tooling that would’ve taken months to build before. I’m a fan. I’m actively encouraging it.
But faster output without better validation is just faster bugs. I’ve seen it — and heard it from peers managing teams at scale. Engineers trust the output, skip the second look, and something breaks in production. The specific failure mode varies. The root cause almost never does: the code moved faster than the confidence in it.
The issue isn’t that AI makes mistakes. It does, and so do humans. The issue is that when the volume of work goes up, the surface area for mistakes goes up with it. If your validation process doesn’t scale at the same rate, you end up cleaning up behind yourself constantly.
The answer isn’t to slow down. It’s to automate your safety net.
Fast builds. Fast test runs. Automated coverage thresholds that fail a PR before it ever hits a human reviewer. If the test coverage isn’t there, the PR doesn’t get reviewed. Period. That sounds harsh, but it’s actually generous — it tells the engineer clearly and immediately what they need to do before asking anyone else to spend time on their work.
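As a sketch of what that gate can look like: the script below reads overall line coverage from a Cobertura-style coverage report and fails before a human ever sees the PR. The report filename, the `line-rate` attribute, and the 80% threshold are illustrative assumptions about your pipeline, not a prescription.

```python
"""A minimal coverage gate: fail the pipeline before requesting review.

Assumes a Cobertura-style XML report, which exposes overall line
coverage as a 'line-rate' attribute on the root element. The 80%
threshold is a hypothetical team baseline.
"""
import sys
import xml.etree.ElementTree as ET

THRESHOLD = 0.80  # hypothetical baseline; set this to your team's bar


def coverage_gate(report_path: str, threshold: float = THRESHOLD) -> bool:
    """Return True if overall line coverage meets the threshold."""
    root = ET.parse(report_path).getroot()
    line_rate = float(root.get("line-rate", 0.0))
    if line_rate < threshold:
        print(f"Coverage {line_rate:.0%} is below the {threshold:.0%} "
              "baseline. Fix the basics before asking for review.")
        return False
    print(f"Coverage {line_rate:.0%} meets the baseline.")
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit nonzero so the CI step (and therefore the PR) fails fast.
    sys.exit(0 if coverage_gate(sys.argv[1]) else 1)
```

Wired into CI as a required check, this gives the engineer an immediate, unambiguous signal with no reviewer time spent.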
I also think there’s something important here about how we use AI in the validation loop, not just the construction loop. Use it to generate test cases. Use it to review test coverage for gaps. Have it challenge your assumptions about how a feature might fail. A lot of what a good QA person does — thinking adversarially about software — can be augmented with the same tools you’re using to write the code.
The teams that win with AI aren’t the ones moving fastest. They’re the ones who figured out how to move fast and land cleanly.
Testing Is a Team Sport
I’ve had some version of this conversation with multiple engineering leaders lately, and I keep landing in the same place: quality is a team sport, and most teams haven’t figured that out yet.
Here’s the dynamic I see a lot. You’ve got developers who write their code, write their tests (maybe), and hand the whole thing off. Somewhere downstream, a tester or QA person picks it up and starts poking holes. When they find something, it goes back. When they don’t, it ships. Everyone’s doing their job. And yet somehow, bugs still make it to production, morale is still low, and the whole thing feels slower than it should.
The problem isn’t the people. The problem is the model.
Separating “building” from “quality” creates distance — between the person who knows the most about how something was constructed and the process of verifying that it works correctly. That distance is where bugs hide.
The fix isn’t to hire more testers. It’s to collapse that distance. Make the people accountable for delivery also accountable for correctness. Pair them up. If two engineers worked on a feature together, they’re both on the hook if it breaks. Not “well, they were supposed to test it.” No. Both of them own it.
What I’ve seen work is building that expectation into the culture from the top. Not as a punitive thing, but as a shared ownership thing. *We* built it. *We* stand behind it. If it’s wrong, we fix it, we learn from it, and we make sure it doesn’t happen that way again.
The corollary to this is that no single person (or department) in an org should carry “quality” as their sole responsibility. One QA lead against a team of fifteen engineers is not a quality strategy. It’s a bottleneck with a job title. The engineers and their leadership (the people accountable for delivery) also have to be accountable for what they deliver working correctly. That’s not a radical idea. It’s just engineering maturity.
Quality isn’t a department. It’s an expectation.
2023 Year in Review
My team has seen a lot of changes in the last year. These are things that we didn’t really have in 2022 but became a part of our day-to-day in 2023.
Feature flags
We started to introduce the concept of flags in late 2022 but didn’t adopt them until 2023. We’ve rewritten the framework a few times. The team has created guidelines for flag creation, management, and removal. We’ve introduced over 200 flags in 2023. The adoption of our feature flag process has led to…
Deploying multiple times per day
In May of 2023, we moved from a structured 2-week deployment cadence to hourly deploys. The 2-week cadence came with some specific challenges: maintaining the “release branch,” being beholden to the release schedule and whatever work was or wasn’t done in time, deploying a bundle of 2 weeks of work at once, and hotfixes bypassing all the process. We currently deploy on the hour and will be moving to full continuous deployment in January. Production incident remediation times are now tracked in minutes, not hours.
DDoS protection
In 2023, we moved our WAF to Cloudflare. This has given us DDoS protection and a CDN. The DDoS mitigation has proved extremely valuable, as our system has been able to withstand attacks of over 10M requests per minute.
WASM
We’ve introduced Blazor to our stack to add frontend code quickly and reliably. We’re using Blazor WASM, which is C# and HTML compiled to WebAssembly. This allows us to use our C# knowledge and best practices (including automated testing) for browser code.
Running on Linux in prod
In the first half of 2023, we migrated our production servers to Linux. In the second half of the year, we migrated our remaining dev and staging servers to Linux. We’ve also migrated our build servers to Linux. These migrations saved costs on the computing side, allowing us to scale up our data side without any overall cost increase.
Latest .NET
Staying on the latest version of the framework is uncommon in most .NET shops. In 2022, we migrated to .NET 6. In 2023, we did it again and migrated to .NET 7. In early 2024, we’ll move to the newly released .NET 8.
Increased automated testing
In August, we increased our expectations around automated testing. We’re now near 40% total line coverage across all codebases. We’ve adopted behavioral testing across all of the backend code. We’ve introduced Playwright, which allows us to test our frontend code in a more automated fashion.
Codified SDLC
In 2022, our SDLC was very loose and ad-hoc. In 2023, we’ve codified our SDLC. Our SDLC is meant to be flexible while maintaining consistency across the department. Our SDLC guidelines represent sensible defaults, and we hope they will continue to evolve to best serve the teams leveraging them.
Structured teams
At the end of 2023, we had one team of 12, one team of 5, and one team of 2 with QA floating across teams. We’ve since restructured into 3 teams of even size and even staffing.
Job descriptions
I know the engineering team had been working on some job descriptions/matrices, but they never quite came to fruition. This year, Engineering leadership created measurable job expectations for software engineering levels 1-4. We’ve published these to our team and are using them in our 1:1s and reviews. This gives clarity to both our team members and managers. We’ll be creating similar documents for our managers and QA and DevOps teams in 2024.
Consistent meeting schedule
In addition to the meeting guidelines of our SDLC, we’ve also established a monthly department-wide meeting. This meeting is an opportunity to showcase the great work done each month, share department-level information, and keep each other accountable for our organizational goals.
Company-wide bug reporting
Open bug reporting is a sign of engineering team maturity, and in May of 2023, we opened up our bug reporting process to the whole company. We previously had two competing processes. Not only did this reduce transparency and create confusion, but issues reported in the support team’s system had to be verified and triaged before being added to the engineering backlog. This dual process limited visibility into the bug backlog and also skewed reporting.
This has been one of the most remarkable years of my career. Teams rarely see this much evolution in such a short time. I can’t wait to see what interesting enhancements 2024 delivers.
Improving Software Team Metrics
A healthy engineering organization (or any healthy team, for that matter) should be tracking itself across a variety of metrics. This is not covered by the standard CS curriculum but is readily encountered in the real world. Once someone is paying for software, there will invariably be questions about how that money is being spent. The most common metrics are bug count and velocity, followed by automated code coverage. These are common because they’re the cheapest to produce. Bugs are, unfortunately, the most visible part of engineering output; counting them is the start of reducing them. Code coverage is freely available in every modern build pipeline, although not always enabled. And velocity is the treasured metric of any young engineering leader, the end-all answer to the question “How much work are we getting done!?”
However, once you start looking, there is so much more insight you can gain and so many more things to track and compare. And, eventually, when you’re answering to very clever investors, you’ll need to provide the metrics that they care about. One of those, which I have come to appreciate, is the sprint completion percentage. This builds on velocity by comparing the actual value delivered to the estimated or planned value. A high velocity is excellent, but accurate forecasting is even better for the overall business. This metric is easy enough to retrieve: Azure DevOps (ADO) has it baked into its velocity dashboards. The granularity is, obviously, at the sprint level.
With a little API magic, we can easily get:
| Team | Iteration Path | StartDate | EndDate | Planned | Completed | Completed Late | Incomplete | Total |
|---|---|---|---|---|---|---|---|---|
| Avengers | 21 | 2023-10-10 | 2023-10-23 | 87 | 58 | 0 | 0 | 58 |
| Avengers | 20 | 2023-09-26 | 2023-10-09 | 46 | 38 | 0 | 0 | 38 |
| Avengers | 19 | 2023-09-12 | 2023-09-25 | 51 | 50 | 0 | 0 | 50 |
| X-Men | 21 | 2023-10-10 | 2023-10-23 | 51 | 41 | 0 | 0 | 41 |
| X-Men | 20 | 2023-09-26 | 2023-10-09 | 66 | 79 | 0 | 3 | 79 |
| X-Men | 19 | 2023-09-12 | 2023-09-25 | 18 | 30 | 0 | 0 | 30 |
| Justice League | 21 | 2023-10-10 | 2023-10-23 | 90 | 75 | 0 | 0 | 75 |
| Justice League | 20 | 2023-09-26 | 2023-10-09 | 120 | 121 | 8 | 0 | 129 |
| Justice League | 19 | 2023-09-12 | 2023-09-25 | 108 | 77 | 0 | 0 | 77 |
The definitions for these states can be found here.
We need to do a little more math, though, for this to become a valuable reporting metric. Unfortunately, the rest of the business and the investors don’t care about your sprints; they care about monthly and quarterly aggregates.
So, let’s start there with the math that rolls up sprints to a monthly value. It’s pretty fun. We need to determine what month a sprint falls into. My calculation assigns the sprint to the month that contains more of its days; if the split is even, it uses the month the sprint starts in.
| Team | Iteration Path | StartDate | EndDate | Planned | Completed | Completed Late | Incomplete | Total | Completion % | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Avengers | 21 | 2023-10-10 | 2023-10-23 | 87 | 58 | 0 | 0 | 58 | 67% | 10 | 2023 |
| Avengers | 20 | 2023-09-26 | 2023-10-09 | 46 | 38 | 0 | 0 | 38 | 83% | 10 | 2023 |
| Avengers | 19 | 2023-09-12 | 2023-09-25 | 51 | 50 | 0 | 0 | 50 | 98% | 9 | 2023 |
| X-Men | 21 | 2023-10-10 | 2023-10-23 | 51 | 41 | 0 | 0 | 41 | 80% | 10 | 2023 |
| X-Men | 20 | 2023-09-26 | 2023-10-09 | 66 | 79 | 0 | 3 | 79 | 120% | 10 | 2023 |
| X-Men | 19 | 2023-09-12 | 2023-09-25 | 18 | 30 | 0 | 0 | 30 | 167% | 9 | 2023 |
| Justice League | 21 | 2023-10-10 | 2023-10-23 | 90 | 75 | 0 | 0 | 75 | 83% | 10 | 2023 |
| Justice League | 20 | 2023-09-26 | 2023-10-09 | 120 | 121 | 8 | 0 | 129 | 108% | 10 | 2023 |
| Justice League | 19 | 2023-09-12 | 2023-09-25 | 108 | 77 | 0 | 0 | 77 | 71% | 9 | 2023 |
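The month-assignment rule can be sketched in a few lines of Python. This is an illustration of the tie-breaking logic described above, not our actual reporting code:

```python
from datetime import date, timedelta


def sprint_month(start: date, end: date) -> tuple[int, int]:
    """Assign a sprint to the (year, month) containing more of its days.

    On an even split, the month the sprint starts in wins.
    """
    days: dict[tuple[int, int], int] = {}  # (year, month) -> day count
    d = start
    while d <= end:
        key = (d.year, d.month)
        days[key] = days.get(key, 0) + 1
        d += timedelta(days=1)
    start_key = (start.year, start.month)
    # Highest day count wins; ties are broken in favor of the start month.
    return max(days, key=lambda k: (days[k], k == start_key))
```

For sprint 20 above (2023-09-26 through 2023-10-09), September holds 5 days and October holds 9, so `sprint_month(date(2023, 9, 26), date(2023, 10, 9))` lands on `(2023, 10)`, matching the table.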
Aggregating these values can be done in a few different ways. We’re combining teams and sprints to get a monthly representation for the group as a whole. I’ve found four reasonable ways to calculate this value across teams and sprints:
- Basic Average
- Unweighted Average
- Weighted Average
- “Inverted”
Basic Average
The most basic average. This would be the average of all the values for the Completion % column for a given month and year. While this is a straightforward value to calculate, I’ve found it gives too much weight to the individual sprints. For example, one lousy sprint, even with a minimal planned value, can drastically change this calculation.
Unweighted
This is the sum of the Total column divided by the sum of the Planned column for a given month and year. This assigns too little weight to individual sprints and doesn’t address the discrepancies in point values across teams.
Weighted
This has been my go-to calculation for years. This is a two-phased calculation. First, we roll up the value for the individual teams. We do this with the unweighted model but filter by Team in addition to month and year. Then, we average those values. This handles a team having a lousy sprint but recovering in the next, as well as the differences in point values.
But what about the team that fell short? The numbers don’t feel like they represent reality when the work left undone was high value or high visibility. The first phase of the weighted model allows for a disappointing sprint, and if the team is working ahead or catching up, we’re sweeping that bad sprint under the rug. While this hadn’t always directly worried me, colleagues who had been expecting certain deliverables and not seeing them, despite 100%+ completion rates, were getting a little frustrated.
So I’ve come up with a new number to properly represent just that: how much work we aren’t getting done every month.
“Inverted”
“Inverted” may be more representative of the commitment to the business. It shows if we did what we committed to but discounts the value of above and beyond work. This calculation has a maximum of 100%. The calculation is multi-phased. The first phase is the same as weighted. Then, we “invert” the monthly team values. If the number is less than 100%, we report the difference; otherwise, we report 0. Next, we average those shortfall percentages. And finally, we subtract that value from 100%.
The inverted value is more representative of our accountability to the business. It should be noted that this value doesn’t entirely neglect above and beyond work but severely discounts it. Namely, when the X-Men go above and beyond, it won’t outweigh the shortcomings of the Avengers that month.
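To make the four calculations concrete, here is a small sketch that computes each of them over the October 2023 rows from the table above. The function names are mine; the planned and total values come straight from the table:

```python
# Each sprint: (team, planned, total). These are the October 2023
# rows from the table above.
october = [
    ("Avengers", 87, 58), ("Avengers", 46, 38),
    ("X-Men", 51, 41), ("X-Men", 66, 79),
    ("Justice League", 90, 75), ("Justice League", 120, 129),
]


def basic_average(sprints):
    """Mean of the per-sprint completion percentages."""
    return sum(t / p for _, p, t in sprints) / len(sprints)


def unweighted(sprints):
    """Sum of totals divided by sum of planned."""
    return sum(t for _, _, t in sprints) / sum(p for _, p, _ in sprints)


def per_team(sprints):
    """Phase one of weighted/inverted: unweighted value per team."""
    teams: dict[str, tuple[int, int]] = {}
    for team, planned, total in sprints:
        p, t = teams.get(team, (0, 0))
        teams[team] = (p + planned, t + total)
    return {team: t / p for team, (p, t) in teams.items()}


def weighted(sprints):
    """Average of the per-team unweighted values."""
    values = per_team(sprints).values()
    return sum(values) / len(values)


def inverted(sprints):
    """100% minus the average per-team shortfall.

    A team's overdelivery is capped at zero shortfall, so it can
    never offset another team's miss.
    """
    shortfalls = [max(0.0, 1.0 - v) for v in per_team(sprints).values()]
    return 1.0 - sum(shortfalls) / len(shortfalls)
```

On this data, the methods land at roughly 90.0% (basic), 91.3% (unweighted), 90.6% (weighted), and 89.8% (inverted). Note how the X-Men’s 120% sprint props up the first two numbers, while the inverted value stays lowest because that overdelivery can’t offset the Avengers’ shortfall.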
Conclusion
Tracking software team metrics is an essential aspect of maintaining a healthy engineering organization. While common metrics such as bug count and velocity provide a basic understanding of team performance, they often fall short in providing a comprehensive view of the team’s efficiency and productivity. This article has explored the concept of sprint completion percentage as a more insightful metric, offering a comparison of actual work done against planned work.
In essence, the choice of metric and calculation method should align with the team’s objectives and the expectations of stakeholders. By adopting a more nuanced approach to tracking software team metrics, organizations can gain deeper insights into team performance, improve forecasting accuracy, and ultimately drive better business outcomes.