Measuring technical debt's ongoing impact
Jan 26, 2023
Interest rates are rising 157%! No, I'm not talking about the Fed's latest decision, but about the slowdown Fictional Inc. faced after releasing version 3.0 of their platform. Fortunately for them, the product release was incredibly successful and they're starting to see revenue growth pick up quickly, but they now need to think about how they're going to deal with the technical debt they introduced as part of the release. New tech debt introduced as part of a release can be thought of as increasing the interest rate, and with it the slowdown the team faces in the future.
(I'm going to assume you're fairly familiar with the concept of technical debt here, but if you need a refresher to get up to speed here's a quick primer)
Ok, you’re probably not actually going to hear an engineering manager talk like this about their technical debt. But why not? Being able to measure and quantify the ongoing impact of your tech debt is critical if you want to put together an actionable plan to address it.
When we think about technical debt, the interest is the amount of time lost on current and future development due to your existing levels of technical debt. This makes it the critical piece of the debt to consider when deciding whether to pay back the principal (the cost of rewriting, refactoring, or fixing the code responsible for the debt) - since we'll only ever consider doing so if the interest is high enough.
Technical debt clearly slows down new development - but that on its own doesn't mean that we should be fixing tech debt everywhere. Going back to rewrite or refactor existing code can be significantly costly - so we only want to pay back our technical debt principal if our interest (the amount it's slowing us down moving forward) exceeds our principal.
One obvious problem immediately crops up - the principal is a fixed amount of time (hours to fix/rewrite) but the interest is a rate (hours lost per unit of time). To account for this we need to introduce the idea of an impact interval - the time over which we care about whether the future slowdowns from technical debt exceed the cost of the rewrite. The impact interval you care about will depend heavily on your company, your typical planning process, and the stage of your codebase's lifecycle - but I'll personally usually look at a 3 month impact interval. At our early stage as a company, looking at anything on year-plus timescales is too broad, but anything shorter than 2–3 months will heavily underestimate the impact of technical debt, as we'll see later.
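This means that our level of tech debt is not worth addressing if the interest we expect to pay over the impact interval is less than the principal: interest rate × impact interval < principal.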
For example: if we had a small project with tech debt that slowed us down by 2 hours/week, it would take us 4 days to refactor away that debt, and we cared about a 3 month impact interval, then we wouldn't spend the time to pay off that debt yet. Over roughly 13 weeks we'd lose about 26 hours, which is less than the 32 hours (four 8-hour days) the refactor would cost.
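The check in this example can be sketched in a few lines of Python. This is only an illustration: the function name, the 8-hour working day, and the 13-week approximation of a 3 month interval are assumptions, not part of any established tool.

```python
HOURS_PER_DAY = 8  # assumption: one working day is 8 hours


def worth_paying_off(interest_hours_per_week, principal_hours, interval_weeks):
    """Return True if the interest accrued over the impact interval exceeds the principal."""
    interest_hours = interest_hours_per_week * interval_weeks
    return interest_hours > principal_hours


# The example above: 2 hours/week of slowdown, 4 days to refactor,
# and a 3 month impact interval (assumed here to be 13 weeks).
principal = 4 * HOURS_PER_DAY
print(worth_paying_off(2, principal, 13))  # 26 hours of interest vs 32 hours of principal
```

With a longer impact interval the same debt can tip over into being worth fixing, which is why the choice of interval matters so much.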
Now, this doesn't actually answer the question of what a healthy level of technical debt is - because I think we can all agree that facing huge slowdowns on the team isn't healthy. Instead we now have a quick way to determine when we should or shouldn't focus on rewriting and refactoring. We'll take a look at what a healthy level of debt actually is a bit later in the article.
Determining our tech debt interest rate requires us to figure out how much we are being slowed down by different decisions we've made. Unfortunately, there's not an obvious or trivial way to track how much your development is being slowed down - but there are three different approaches you can take to get good approximations.
The most direct way to tie slowdowns to different sections of our codebase is to look at how velocity varies over time and across different sections of the project. Comparing areas of the codebase, you can start to identify variations between them (e.g. any development touching this analysis section takes 3x as long as anything else). Looking at how a single area varies over time also indicates how new development has affected the rate of future development, and so gives a sense of the level of interest your team is dealing with.
For example: If we have a relatively simple project with 4 different areas that we'll work on then we can look at how the velocity changes over time (here we're tracking velocity in story points/developer month).
From here we can see that D has always taken ~3x as long to work on as any of the other areas for similarly complex tasks. This implies that it carries 3x the interest of the other sections of the codebase. B used to be roughly on par with A & C, but starting in month 4 it suddenly jumped to taking 2x the time. This suggests that we introduced some debt here that doubled B's interest rate compared to what it was before.
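As a rough sketch of how those multiples can be derived, the snippet below divides a baseline area's velocity by each area's velocity. The velocity numbers are hypothetical (the original figures are not reproduced here), chosen only to match the 2x and 3x slowdowns described above, and the function name is an assumption.

```python
def relative_interest(velocity_by_area, baseline_area):
    """Slowdown multiple of each area relative to a baseline area.

    velocity_by_area maps area name -> story points per developer month.
    A result of 3.0 means work there takes about 3x as long as in the baseline.
    """
    baseline = velocity_by_area[baseline_area]
    return {area: baseline / v for area, v in velocity_by_area.items()}


# Hypothetical velocities for the most recent month, in story points/developer month.
velocity = {"A": 30.0, "B": 15.0, "C": 30.0, "D": 10.0}
multiples = relative_interest(velocity, "A")
print(multiples)
```

Running the same calculation month by month would surface jumps like the one B shows from month 4 onwards.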
One thing to call out is that we're not talking about the interest rate for the entire codebase, but about the interest rate for individual components/areas. That's because the slowdowns introduced by a build-up of technical debt don't usually affect everything we do - they're localized to one part of the codebase.
There are some important caveats to think about when it comes to velocity based measurement.
One quick proxy for velocity based measurement is to ask your engineering team to estimate relatively how long it takes to complete a project/task in each major area of your codebase. Agree on some established baseline for a well understood/frequently used area and then have everyone estimate each other area as a percentage or multiple of that baseline. While not quite as rigorous as a full velocity based measurement approach it can give a quick idea of your relative tech debt interest based on your team's insight and intuition.
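One way to turn those team estimates into a single number per area is to take the median of everyone's answers, so that one outlier doesn't dominate. The sketch below assumes each engineer reports a multiple of the baseline area; the data and the function name are hypothetical.

```python
from statistics import median


def team_estimate(multiples_by_area):
    """Collapse each area's individual estimates into one median multiple."""
    return {area: median(values) for area, values in multiples_by_area.items()}


# Hypothetical survey answers: time to finish a task as a multiple of baseline area A.
survey = {
    "B": [2.0, 1.5, 2.5],
    "C": [1.0, 1.0, 1.5],
    "D": [3.0, 3.0, 4.0],
}
estimates = team_estimate(survey)
print(estimates)
```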
A different approach is to identify specific instances of technical debt within your project and estimate how much they may be each slowing you down. Part of this can be done by using automated tools, such as static analysis tools, to find common issues around code quality that may have impacts on readability, extendability, or maintainability of a project. For each type of issue you can assign it an interest cost (eg. 5 min/week or 1%) based on your team's experience in dealing with these types of issues.
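A minimal sketch of that bookkeeping: give each issue type a cost in minutes per week, count the issues found in each area, and sum. The issue types, costs, and counts below are all hypothetical placeholders; in practice the costs would come from your team's experience and the counts from your tooling.

```python
# Hypothetical interest cost per issue, in minutes lost per week.
ISSUE_COST_MIN_PER_WEEK = {
    "duplicated_block": 5,
    "long_function": 10,
    "missing_tests": 15,
}


def issue_interest(issue_counts):
    """Total minutes per week lost to the issues counted in one area."""
    return sum(ISSUE_COST_MIN_PER_WEEK[kind] * n for kind, n in issue_counts.items())


# Hypothetical issue counts per area of the codebase.
counts_by_area = {
    "B": {"duplicated_block": 4, "long_function": 2},
    "D": {"long_function": 3, "missing_tests": 2},
}
interest_by_area = {area: issue_interest(c) for area, c in counts_by_area.items()}
print(interest_by_area)
```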
But, this will only capture a subset of technical debt causing issues - others will be more subtle or more bespoke to your codebase and will only be observed while your team is actively working on that area of the code. In this instance, you'll want to record the specific issue (tied to an area of your codebase) along with the estimated impact it has in slowing down development. For tracking these issues we recommend using some sort of issue tracker - either in an issue backlog in GitHub, Jira, etc or using a purpose built tech debt issue tracker like Stepsize.
Some of the drawbacks to this approach are:
There are a variety of code quality metrics that you can use to get an overall sense of the status of your codebase and, in turn, get an estimate of how much technical debt you currently have impacting future development in each area. At Sourcery, we tend to look at Complexity, Working Memory, and Method Length as the critical metrics when looking within a function or class - but you can also look at things like Maintainability Index, Doc & Test Coverage, Duplication Rate, and many others.
Similar to the Issue based approach, you can assign a relative impact of different scores (or of an overall quality or health score) to the ongoing and future slowdown in development due to technical debt. Research has shown strong negative correlation between things like Complexity & Velocity as well as with code quality and bug risk, maintenance load, and more.
Looking at an example - let's revisit the simple 4 part codebase we looked at in the section on Velocity Based Measurement.
We can easily see the problem sections of this project in the table (highlighted in red) and calculating the Interest estimate is relatively straightforward - simply sum up the interest impacts of the different components.
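The summing step might look something like the sketch below. The thresholds, slowdown percentages, and metric scores are hypothetical and would need calibrating against your own codebase; they are not Sourcery's actual scoring rules.

```python
# Hypothetical rules: (metric, limit, slowdown in % if the score exceeds the limit).
RULES = [
    ("complexity", 15, 10),
    ("working_memory", 12, 5),
    ("method_length", 50, 5),
]


def quality_interest_percent(scores):
    """Sum the slowdown percentages of every metric whose score exceeds its limit."""
    return sum(pct for metric, limit, pct in RULES if scores.get(metric, 0) > limit)


# Hypothetical average metric scores for two areas of the project.
scores_by_area = {
    "A": {"complexity": 8, "working_memory": 6, "method_length": 20},
    "D": {"complexity": 22, "working_memory": 14, "method_length": 35},
}
quality_interest = {a: quality_interest_percent(s) for a, s in scores_by_area.items()}
print(quality_interest)
```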
Some of the drawbacks to this approach are:
This quality based measurement approach is the least precise of the three approaches we've looked at - but it's very effective at giving a holistic look at different areas of your project over time. You can combine this approach with the issue based approach we just discussed to balance tracking specific issues alongside tracking general quality and health issues across each section of your project.
For all three of these approaches we need to have a way to map impact in different sections of our codebase against the frequency with which we actually touch that section of the codebase. If there is a section of our project which is a nightmare to deal with but which no one ever touches any more then it's not actually heavily impacting our ongoing technical debt. Conversely, a small but persistent slowdown on a section of the codebase that's contributed to every day can result in huge time losses very quickly.
To account for this we'll need to look at how often each area of our project is contributed to. There are a few different approaches we can take here - looking at the Git history to understand which area is broadly touched most often, using more focused tooling like Codescene to get more digestible historical contribution data, or using a forward facing approach by looking at our upcoming plans and priorities to understand where our team will be spending most of their time over the next several months.
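For the Git history option, a rough starting point is to count how often files under each top-level directory appear in the log. The sketch below parses the text produced by `git log --name-only --pretty=format:` and treats the first path segment as the "area", which is an assumption about how your project is laid out (files at the repository root are counted under their own name).

```python
from collections import Counter


def contribution_share(name_only_log):
    """Share of file touches per top-level directory.

    name_only_log is the text output of `git log --name-only --pretty=format:`,
    i.e. one changed file path per line with blank lines between commits.
    """
    counts = Counter()
    for line in name_only_log.splitlines():
        path = line.strip()
        if not path:
            continue  # blank separator between commits
        area = path.split("/", 1)[0]
        counts[area] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {area: n / total for area, n in counts.items()}


# Hypothetical log output covering two commits.
sample_log = "api/a.py\napi/b.py\n\nanalysis/x.py\napi/a.py\n"
shares = contribution_share(sample_log)
print(shares)
```

Counting file touches is a crude proxy for time spent, so treat the result as a first approximation rather than a measurement.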
Regardless of how we get the data - we can then get a breakdown of what percentage of our time we will be spending interacting with each section of our codebase. Combining this together with the Interest contribution that we've already determined we can now see exactly how much we expect to be slowed down by when dealing with each section of our codebase moving forward.
Continuing our example from earlier - suppose we took a forward facing view, knew all of the projects we were going to work on in the next 3 months, and could confidently estimate the amount of time we'd spend in each area of the codebase (remember this is a very simple project):
We now can tie that back to the Interest estimates we had from our Velocity Based approach and our Quality Metric Based approach and get a good idea of where we're being slowed down the most.
Here we're using the slowdowns in velocity we saw for work on sections B & C when we looked at velocity based measurement earlier, and using them to calculate how much time we expect to lose to tech debt in the next three months. Overall we expect more than 28 extra developer months of effort to be spent just on the debt. An important thing to keep in mind is that this approach looks at relative velocity - so the baseline areas are treated as having effectively no debt, which isn't normally accurate. The other issue is that it doesn't take into account changes in future debt levels. Those are likely to occur, but they're hard to predict, so it's easier to disregard them.
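One way to express this projection in code is shown below. It assumes the planned time for each area is the actual time the team will spend there, so the time lost is the planned time minus what the same work would have taken at baseline speed. Both that interpretation and the numbers are assumptions for illustration, and they do not reproduce the figures above.

```python
def lost_time(planned_months, slowdown_multiple):
    """Developer months lost in one area compared with working at baseline speed."""
    return planned_months - planned_months / slowdown_multiple


# Hypothetical plan: area -> (planned developer months, slowdown multiple vs baseline).
plan = {"A": (3, 1), "B": (6, 2), "D": (9, 3)}
lost_by_area = {area: lost_time(months, m) for area, (months, m) in plan.items()}
total_lost = sum(lost_by_area.values())
print(lost_by_area, total_lost)
```

An area with a mild slowdown but a lot of planned work can lose more time than a much slower area that is rarely touched.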
Here we've taken the typical projected delays based on the code quality of each section and projected them out over the next three months. Across the board we see significantly lower debt impact projections than when we used the velocity method. This is because the slowdown estimates from the quality metrics are lower than what we were seeing in the velocity case - we might need to do some more calibrating here! Just as with the velocity based approach, this isn't taking into account future changes in tech debt - but both estimates can help us determine how we should prioritize rewriting and refactoring the different sections of our project.
We've looked at a few different ways to account for how much our technical debt is impacting us on an ongoing basis, but we haven't fully answered the question as to what's a healthy level of technical debt.
Unfortunately, there isn't really a precise number. In the short term, accepting some debt might be pragmatic. In the long run, though, we'll want to aim to keep our debt close to zero, because lingering interest is going to prove very costly and tech debt keeps building upon itself. At the same time, we don't want to spend all of our time on refactors and rewrites that give us only marginal gains.
As we discussed earlier, we generally don't want to spend time addressing areas of the codebase where the interest we'd pay over our impact interval is less than the principal.
But taken to the extreme, this rule could leave us massively slowed down by a high level of debt that is extremely costly to fix - which isn't a good situation either.
The best approach is to fall somewhere in the middle. Set aside time in your ongoing planning to address technical debt issues and refactor existing code - prioritizing by what is currently the costliest to you. And continue to push that until you see significant diminishing returns from reducing your debt.
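A simple way to order that work is to rank areas by net saving, meaning the interest expected over the impact interval minus the principal, and drop anything where the saving is not positive. The sketch below uses hypothetical numbers and a hypothetical function name; it is one possible prioritization, not a prescribed method.

```python
def prioritize(areas, interval_weeks):
    """Rank areas by net hours saved over the impact interval, largest first.

    areas maps area name -> (interest in hours/week, principal in hours).
    Areas where paying off the debt would not pay for itself are left out.
    """
    ranked = []
    for name, (interest_hours_per_week, principal_hours) in areas.items():
        net = interest_hours_per_week * interval_weeks - principal_hours
        if net > 0:
            ranked.append((name, net))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical interest and principal estimates, with a 13 week impact interval.
areas = {"A": (1, 40), "B": (6, 60), "D": (10, 80)}
ranking = prioritize(areas, 13)
print(ranking)
```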