Six examples of ways to refactor your Python code, and why they are improvements
May 11, 2020
Writing clean, Pythonic code is all about making it as understandable, yet concise, as possible. This is the first part of a series on Python refactorings, based on those that Sourcery can perform automatically. The next part can be found here.
The focus here is on why these changes are good ideas, not just on how to do them.
Too much nesting can make code difficult to understand, and this is especially true in Python, where there are no brackets to help out with the delineation of different nesting levels.
Reading deeply nested code is confusing, since you have to keep track of which conditions relate to which levels. We therefore strive to reduce nesting where possible, and the situation where two nested if conditions can be combined with the and operator is an easy win.
Before:
if a:
    if b:
        return c
After:
if a and b:
    return c
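The merge preserves behaviour because the and operator short-circuits: the second condition is only evaluated when the first is truthy, just as in the nested form. It only applies when the inner if is the sole statement inside the outer one and neither has an else branch. A minimal sketch, using hypothetical function names and inputs that are not from the original example:

```python
# Hypothetical example: these names and values are illustrative only.
def price_nested(user, basket):
    if user is not None:
        if user.get("is_member"):
            return basket * 90 // 100
    return basket


def price_merged(user, basket):
    # `and` short-circuits, so user.get() is never called when user is None,
    # exactly as in the nested version.
    if user is not None and user.get("is_member"):
        return basket * 90 // 100
    return basket


# Both versions agree for every kind of input.
for user in (None, {"is_member": False}, {"is_member": True}):
    assert price_nested(user, 100) == price_merged(user, 100)
```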
We should always be on the lookout for ways to remove duplicated code. An opportunity for code hoisting is a nice way of doing so.
Sometimes code is repeated on both branches of a conditional. This means that the code will always execute. The duplicate lines can be hoisted out of the conditional and replaced with a single line.
if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
    label = f"Total: {total}"
else:
    total = sold * PRICE
    label = f"Total: {total}"
By taking the assignment to label outside of the conditional we have removed a duplicate line of code, and made it clearer what the conditional is actually controlling, which is the total.
if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
else:
    total = sold * PRICE
label = f"Total: {total}"
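To try the hoisted version out, here is a small runnable sketch. The constant values and the make_label wrapper are assumptions made for illustration, since the snippet above does not define them:

```python
# Assumed values for illustration; the original snippet does not define them.
DISCOUNT_AMOUNT = 10
DISCOUNT_PRICE = 8
PRICE = 10


def make_label(sold):
    # The conditional now controls only the thing that differs: the total.
    if sold > DISCOUNT_AMOUNT:
        total = sold * DISCOUNT_PRICE
    else:
        total = sold * PRICE
    # Hoisted: the label is built once, whichever branch ran.
    return f"Total: {total}"


print(make_label(5))
print(make_label(20))
```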
One little trick that often gets missed is that Python's yield keyword has a corresponding yield from form for iterables, so there's no need to write a for loop just to yield each item in turn. This makes the code slightly shorter and removes the mental overhead and extra variable used by the for loop. Eliminating the for loop also makes the yield from version about 15% faster.
Before:
def get_content(entry):
    for block in entry.get_blocks():
        yield block
After:
def get_content(entry):
    yield from entry.get_blocks()
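A quick way to convince yourself that the two forms produce the same items is to run them side by side. In this sketch the Entry class is a hypothetical stand-in for whatever object provides get_blocks():

```python
class Entry:
    """Hypothetical stand-in for the entry object used above."""

    def __init__(self, blocks):
        self._blocks = blocks

    def get_blocks(self):
        return self._blocks


def get_content_loop(entry):
    # Original form: an explicit loop and a throwaway variable.
    for block in entry.get_blocks():
        yield block


def get_content(entry):
    # Refactored form: delegate to the iterable directly.
    yield from entry.get_blocks()


entry = Entry(["title", "body", "footer"])
assert list(get_content_loop(entry)) == list(get_content(entry))
```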
A common pattern is that we need to find if some condition holds for one or all of the items in a collection. This can be done with a for loop such as this:
found = False
for thing in things:
    if thing == other_thing:
        found = True
        break
A more concise way, that clearly shows the intentions of the code, is to use Python's any() and all() built-in functions.
found = any(thing == other_thing for thing in things)
any() will return True when at least one of the elements evaluates to True, while all() will return True only when all the elements evaluate to True.
These will also short-circuit execution where possible. If the call to any() finds an element that evaluates to True it can return immediately. This can lead to performance improvements if the code wasn't already short-circuiting.
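The sketch below shows all() in use and makes the short-circuiting visible by recording which items any() actually inspects. The data and the is_big helper are made up for illustration. Note that the early exit relies on passing a generator expression: a list comprehension would evaluate every item before any() sees the first one.

```python
things = [3, 8, 12, 5]

# all(): does the condition hold for every item?
all_positive = all(thing > 0 for thing in things)

# Record each item that gets tested, to show where any() stops.
checked = []


def is_big(thing):
    checked.append(thing)
    return thing > 10


found = any(is_big(thing) for thing in things)

print(all_positive, found, checked)
```

Here any() stops at 12, so the final item is never tested.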
The most concise and Pythonic way to create a list is to use the [] notation.
x = []
This fits in with the way we create lists with elements, saving a bit of mental energy that might be taken up with thinking about two different ways of creating lists.
x = ["first", "second"]
Doing things this way has the added advantage of being a nice little performance improvement.
Here are the timings before and after the change:
$ python3 -m timeit "x = list()"
5000000 loops, best of 5: 63.3 nsec per loop
$ python3 -m timeit "x = []"
20000000 loops, best of 5: 15.8 nsec per loop
Similar reasoning and performance results hold for replacing dict() with {}.
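If you would rather check the dict() case on your own machine than take it on trust, the standard library's timeit module can be called from Python as well as from the command line. This is only a rough sketch: the absolute numbers depend on your hardware and Python version, so compare the two results with each other rather than with the figures above.

```python
import timeit

# Time each way of creating an empty dict, one million runs apiece.
call_time = timeit.timeit("x = dict()", number=1_000_000)
literal_time = timeit.timeit("x = {}", number=1_000_000)

print(f"dict(): {call_time:.3f}s for 1,000,000 runs")
print(f"{{}}:     {literal_time:.3f}s for 1,000,000 runs")
```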
Another type of hoisting is pulling invariant statements out of loops. If a statement just sets up some variables for use in the loop, it doesn't need to be inside it. Loops are inherently complex, so making them shorter and easier to understand should be on your mind while writing them.
In this example the city variable is assigned inside the loop, but it receives the same value on every iteration and is otherwise only read, never altered.
for building in buildings:
    city = "London"
    # append() takes a single argument, so the pair goes in as a tuple
    addresses.append((building.street_address, city))
It's therefore safe to hoist it out, and this makes it clearer that the same city value will apply to every building.
city = "London"
for building in buildings:
    addresses.append((building.street_address, city))
This also improves performance - any statement in a loop is going to be executed every time the loop runs. The time spent on these multiple executions is being wasted, since it only needs to be executed once. This saving can be significant if the statements involve calls to databases or other time-consuming tasks.
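To make the saving concrete, the sketch below swaps the string literal for a function call and counts how often it runs. The lookup_city function and the street addresses are hypothetical, standing in for something slow such as a database query. The hoist is only safe when the call returns the same value every time and the loop does not depend on its side effects.

```python
calls = 0


def lookup_city():
    """Hypothetical stand-in for a slow call, such as a database query."""
    global calls
    calls += 1
    return "London"


street_addresses = ["1 High Street", "2 Station Road", "3 Mill Lane"]

# Hoisted: the lookup runs once rather than once per building.
city = lookup_city()
addresses = []
for street_address in street_addresses:
    addresses.append((street_address, city))

print(calls, addresses)
```

Left inside the loop, lookup_city() would have run three times here, and the gap widens with every extra building.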
As mentioned, each of these is a refactoring that Sourcery can automatically perform for you. We're planning on expanding this blog series out and linking them in as additional documentation, with the aim of turning Sourcery into a great resource for learning how to improve your Python skills. You can read the next part in the series here.
If you have any thoughts on how to improve Sourcery or its documentation please do email us or hit me up on Twitter