The link between complexity, cognitive overload, and velocity.
Feb 14, 2022
Typical measures of complexity in code focus heavily on nesting and logical complexity. These are critically important, but they’re not the only factors we should be considering when we think about complexity and code quality. Consider the two functions below:
def payroll(employee_database):
    for name, employee in employee_database.items():
        if (
            has_passed_probation(employee)
            and (is_paid_monthly(employee) and is_end_of_month)
            or (is_paid_weekly(employee) and is_end_of_week)
        ):
            run_payroll(name, employee.salary)
            letter = write_letter(name)
            send_to_printer(letter)
def minimum_odd_number(numbers):
    min_value = 9999
    for number in numbers:
        if number < min_value:
            if number % 2 != 0:
                min_value = number
    return min_value
Both of these functions have a cognitive complexity score of 6, but the top function is fairly obviously harder to understand than the bottom one. Why is this the case?
In the second function, we can basically step our way through the function, only checking one variable at a time: for each number we check whether number < min_value, then whether number % 2 != 0, and if both hold we update min_value. At any point we only need to keep number and min_value in our heads, and at the end we simply return min_value.

Compare that with the first function: by the time we reach the call to run_payroll we need to keep track of name and employee from the loop, the pay-schedule conditions that gate the call, and the arguments to the call itself (2 more variables).

Looking at it this way suggests that a big piece of the reason that Function 1 is harder to understand than Function 2 is the number of different variables, states, and values that we need to keep track of while we work our way through the function. This sparked us to create a new metric - what we term Working Memory - to account for the number of distinct pieces of program state that a developer needs to keep in their head when analysing a function, and for the impact this has on the understandability and maintainability of code.
Note: We sometimes hear from developers who assume that Working Memory is directly related to the amount of computer memory a function takes up, or to the performance of the function. While there are cases where a higher Working Memory will result in the function taking up more computer memory or having worse performance, these are incidental, not the goal of the metric. (If you have a suggestion for a better name for the metric - let us know!)
The Working Memory for your code changes from line to line. It increases with complex lines and as you need to retain information from surrounding lines, and it decreases as you reduce the amount of information you need to actively consider. There are 5 rules behind the calculation of the Working Memory metric:
Rule 1: Each variable, constant, and function call in a line increases the Working Memory by 1.
Rule 2: Any variables declared above the line that are used below the line increase the Working Memory by 1 per variable. Even though these variables are not used directly on the line, you need to remember what they are doing while reading the line to understand the full code.
Rule 3: If there are any conditional checks that impact the line of code in question then the variables, constants, or function calls in that conditional check each increment the Working Memory by 1.
Rule 4: Complex variables which involve a base object as well as a variable, such as self.price or self.price.currency, add 2 to the Working Memory rather than the standard 1 point increase for a variable. This is due to the fact that we need to account for both the full variable and the base object.
Rule 5: If we have a list of expressions, such as the arguments to a function call or the items in a dictionary or list, we take the peak Working Memory from the list rather than adding them all up.
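To see how these rules combine on a single line, here is a small hypothetical example (compute_tax, price, rate, and self.discount are invented names, and the annotations reflect one plausible reading of the rules):

# Rule 1: total, compute_tax, price, and rate each add 1.
# Rule 4: self.discount involves a base object, so it adds 2 rather than 1.
# Rule 5: price and rate form a list of arguments, so they contribute
#         their peak (1) rather than their sum (2).
total = compute_tax(price, rate) - self.discount

Under this reading the line has a Working Memory of 5: 1 for total, 1 for the compute_tax call, 1 for the peak of its arguments, and 2 for self.discount.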
We take the peak Working Memory within a method as the relevant complexity score for the whole method, since this peak is what determines whether the function is overly complex or not.
Let’s take a quick look at an example function to see how we would calculate the Working Memory line by line:
def example_function(self, var, b, c):
    if condition_holds(var):
        a = b + c
    else:
        self.alter_vars(b, c)
        a = b - c
    return a
Line 1 (if condition_holds(var):): Working Memory of 2 - 1 for the condition_holds() function call and 1 for var.
Line 2 (a = b + c): Working Memory of 5 - 1 each for a, b, and c, as well as the 2 from Line 1 that we must retain because of Rule 3.
Line 4 (self.alter_vars(b, c)): Working Memory of 7 - 2 for self.alter_vars() (Rule 4), 1 each for b and c, 2 for the conditions from Line 1, and 1 more because a is declared above the line and referenced again below (Rule 2).
The peak Working Memory for this function comes from Line 4, giving the whole function a Working Memory of 7.
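If you want to experiment with the idea yourself, the sketch below is a rough, illustrative approximation rather than the metric’s actual implementation: it covers only Rule 1 and Rule 4 by counting the names, constants, and attribute accesses on each line, because Rules 2, 3, and 5 need control- and data-flow tracking across lines.

import ast
from collections import defaultdict

def rough_line_scores(source):
    # Rule 1: each variable, constant, or function name adds 1.
    # Rule 4: an attribute access like self.price adds 1 here on top of
    # the 1 that its base name contributes, giving 2 in total (chains
    # like a.b.c are slightly over-counted by this approximation).
    counts = defaultdict(int)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Name, ast.Constant, ast.Attribute)):
            counts[node.lineno] += 1
    return dict(sorted(counts.items()))

def rough_peak_working_memory(source):
    # The method's overall score is the peak across its lines.
    return max(rough_line_scores(source).values())

snippet = '''\
def example_function(self, var, b, c):
    if condition_holds(var):
        a = b + c
    else:
        self.alter_vars(b, c)
        a = b - c
    return a
'''
print(rough_line_scores(snippet))          # {2: 2, 3: 3, 5: 4, 6: 3, 7: 1}
print(rough_peak_working_memory(snippet))  # 4

Because the sketch ignores Rules 2, 3, and 5, it scores the self.alter_vars line at 4 rather than the 7 we calculated by hand; the state retained from earlier lines is exactly what the full metric adds on top.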
If you’d like to see more examples of how to calculate Working Memory, you should check out Nick’s previous blog post on the subject.
So far, we’ve largely discussed the value of Working Memory in an intuitive sense - that methods with fewer variables, constants, and function calls to keep track of seem to be, and should be, easier to understand. But why is this the case?
To answer this question, let’s quickly turn to psychology & neuroscience. George Miller, Eugene Galanter, and Charles Pribram are credited with coining the term Working Memory in the 1960s as a measure of human cognitive recall capacity. Working Memory in this context is similar to what is often discussed as short-term memory, and much as with our use of it for code, it refers to the capacity to retain and manipulate information for a limited period of time.
Experiments over the past 60 years have repeatedly shown that there is a clear capacity limit to Working Memory, and that beyond that capacity humans are not able to retain or manipulate additional information. For example, in one of Miller’s experiments he looked at how many digits a person could remember and repeat back in order. He found that the limit for this task (as well as for many other tasks involving Working Memory) was consistently 7 ± 2. This threshold has been termed “the magical number seven, plus or minus two” and represents a good benchmark for the amount of information that the average person can retain in a short window.
When it comes to software, Working Memory (both in the neuroscience sense and in the code quality sense we discussed earlier) plays a crucial role in both the speed and quality of future development. Complex code with a high Working Memory score will push the limit of what a developer reading the code can easily understand, forcing them to frequently revisit or reassess the code to make sure they know what it is doing. This leads to slowdowns both in new development and in code reviews, as reviewers also struggle with the increased burden of high Working Memory.
If we want to reduce the Working Memory for our code, we need to reduce the number of active contributors to Working Memory that we have to track throughout the method. One of the simplest ways to do this is by grouping or chunking together multiple pieces of state that would normally each contribute to the code’s Working Memory, so that we now only need to track that single chunk.
For example, if we return to the function that we started out with:
def original_function(employee_database):
    for name, employee in employee_database.items():
        if (
            has_passed_probation(employee)
            and (is_paid_monthly(employee) and is_end_of_month)
            or (is_paid_weekly(employee) and is_end_of_week)
        ):
            run_payroll(name, employee.salary)
            letter = write_letter(name)
            send_to_printer(letter)
We can group together that initial set of conditional checks into a single chunk by introducing a variable paid_today:
def introducing_variable_version(employee_database):
    for name, employee in employee_database.items():
        paid_today = (
            has_passed_probation(employee)
            and (is_paid_monthly(employee) and is_end_of_month)
            or (is_paid_weekly(employee) and is_end_of_week)
        )
        if paid_today:
            run_payroll(name, employee.salary)
            letter = write_letter(name)
            send_to_printer(letter)
Here we still need to handle the complexity of figuring out the value of paid_today, but once we do, we can just use the variable without having to worry about the details of calculating it.
We can then separate out that complexity even further if we move the determination of the value of paid_today into its own function:
def extracted_function_version(employee_database):
    for name, employee in employee_database.items():
        if is_paid_today(employee):
            run_payroll(name, employee.salary)
            letter = write_letter(name)
            send_to_printer(letter)

def is_paid_today(employee):
    return (
        has_passed_probation(employee)
        and (is_paid_monthly(employee) and is_end_of_month)
        or (is_paid_weekly(employee) and is_end_of_week)
    )
By breaking out complex analyses into their own methods, we can simplify the Working Memory of our main function, because we normally don’t need to worry about what goes into is_paid_today when trying to understand the main function. When we want to dive into those details we can look at the extracted function separately - effectively splitting the cognitive load that the high Working Memory function would have created into two discrete steps.
In general, the best way to reduce the Working Memory of your code is to separate out the logic of your program so that each section deals with a specific responsibility and those responsibilities aren’t intertwined.
We all have a limit to the amount of information we can store, process, and manipulate in our heads in the short term. Looking at your code’s Working Memory metric allows you to see where your code creates an undue burden on this capacity - slowing down new feature development and code reviews, and increasing the risk of bugs and errors being introduced into your code.
This is the second part of a seven-part series on code quality, technical debt, and development velocity. Check out how Sourcery can help to automatically improve & refactor your code to reduce tech debt, improve code quality, and increase velocity. Part 1 covered multiple measures of complexity - cyclomatic vs cognitive complexity.