Ensuring Sourcery improves code without changing its functionality
Apr 28, 2020
Refactoring is the process of improving the quality of code without changing what it does.
Our challenge in building Sourcery has been to automate this, and we've made some big strides which you can try out here. A question we're often asked is how we ensure that the changes Sourcery makes haven't affected the code's functionality. This is key - having trust that Sourcery won't break anything allows users to accept the refactorings with confidence.
The approach we take relies on three main pillars:
Our static analysis toolset borrows heavily from compiler theory. Initial analysis of each piece of code to be refactored generates an extended reaching definition to identify where variables are read and written.
a = {"example": 1}
b = a
this_could_add_to(a)
print(b)
For example this analysis picks up that the variable b
has been linked to the
variable a
. This means that the print
statement and the call to
this_could_add_to
cannot be swapped without possibly changing the code's
functionality.
Other types of analysis we do include type inference, analysis of logical statements to infer which conditions are true at each statement, and analysis of control-flow and loops.
All of this underpins Sourcery's refactoring engine, allowing us to determine when it is safe to make changes and which changes are safe to make.
We believe unit testing to be an essential feature of a good quality codebase, and we use them extensively.
For each refactoring we generate many variations of positive and negative tests.
Positive tests are of the form: given this source code, it should be refactored to produce this expected source code.
Negative tests are of the form: this source code should not be changed by this type of refactoring.
For example - here we know that the inner a
is truthy:
a = get_a()
if a:
b = func()
if a:
print(b)
so we can remove the inner conditional:
a = get_a()
if a:
b = func()
print(b)
On the other hand there are many similar pieces of source code that should not be refactored:
a = get_a()
if a:
a = func()
if a:
print(b)
Even though the outer a
was truthy, we can say nothing about the value of the
inner a
and so cannot remove the inner if.
These negative tests are just as important, as they capture all the patterns that would results in functionality changing. These include properties like: changing or using non local state, variables not having the required type, and dependencies between statements not allowing reordering. This type of test ensures that our static analysis is functioning correctly.
We rely extensively on our analysis and unit testing, but operate under the principle of trust, but verify. No matter how extensive our analysis and comprehensive our testing, it's still possible for changes in functionality to slip through the gaps.
That's why we also test Sourcery on open source libraries.
As part of our Continuous Integration pipeline (run using GitHub actions) we test all refactoring changes on a suite of open source libraries. The steps are the same for all of them:
Each test failure that is found is reviewed to identify which refactoring caused it. This always results in an improvement to our static analysis or refactoring definitions, and new tests are added so that the problem cannot reoccur.
As we stated at the start, refactoring is the art of improving code without changing the functionality. We consider it a major bug if Sourcery ever breaks working code and so we never release a version that does not pass all the above testing.
If you ever discover an example where Sourcery changes the functionality of source code, please raise an issue. We will fix it and add it to our test suite so it doesn't happen again.
We hope you found this interesting and it gives you some extra confidence. You can use Sourcery for free on GitHub or through your IDE!