Python Refactorings - Part 1

Six examples of ways to refactor your Python code, and why they are improvements

Written by Nick Thapen on May 11, 2020

Writing clean, Pythonic code is all about making it as understandable, yet concise, as possible. This is the first part of a series on Python refactorings, based on those that Sourcery can perform automatically. The next part can be found here.

The focus here is on why these changes are good ideas, not just on how to do them.

1. Merge nested if conditions

Too much nesting can make code difficult to understand, and this is especially true in Python, where there are no brackets to help out with the delineation of different nesting levels.

Reading deeply nested code is confusing, since you have to keep track of which conditions relate to which levels. We therefore strive to reduce nesting where possible, and the situation where two if conditions can be combined using and is an easy win.

Before:

if a:
    if b:
        return c

After:

if a and b:
    return c
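
As a runnable sketch of the same idea, the snippet below uses a hypothetical User type and function names that are not part of the original example. Because and short-circuits, the second condition is still only evaluated when the first is truthy, so the merged form should behave like the nested one, provided neither if has an else branch or extra statements of its own.

```python
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical type, used only for illustration.
    is_active: bool
    has_paid: bool


def access_nested(user):
    if user.is_active:
        if user.has_paid:
            return "granted"
    return "denied"


def access_merged(user):
    # `and` short-circuits: has_paid is only read when is_active is truthy.
    if user.is_active and user.has_paid:
        return "granted"
    return "denied"


# Compare both versions over every combination of the two flags.
users = [User(a, p) for a in (True, False) for p in (True, False)]
results_match = all(access_nested(u) == access_merged(u) for u in users)
```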

2. Hoist repeated code outside conditional statement

We should always be on the lookout for ways to remove duplicated code. An opportunity for code hoisting is a nice way of doing so.

Sometimes code is repeated on both branches of a conditional. This means that the code will always execute. The duplicate lines can be hoisted out of the conditional and replaced with a single line.

if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
    label = f"Total: {total}"
else:
    total = sold * PRICE
    label = f"Total: {total}"

By taking the assignment to label outside of the conditional we have removed a duplicate line of code, and made it clearer what the conditional is actually controlling, which is the total.

if sold > DISCOUNT_AMOUNT:
    total = sold * DISCOUNT_PRICE
else:
    total = sold * PRICE
label = f"Total: {total}"
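
To try this out, here is a minimal self-contained version. The constant values and the make_label function name are assumptions made for the sketch; the original example does not define them.

```python
# Hypothetical values; the original example does not define these constants.
DISCOUNT_AMOUNT = 10
DISCOUNT_PRICE = 8
PRICE = 10


def make_label(sold):
    # The conditional now controls only the thing that differs: the total.
    if sold > DISCOUNT_AMOUNT:
        total = sold * DISCOUNT_PRICE
    else:
        total = sold * PRICE
    # Built on every path, so it lives outside the conditional.
    return f"Total: {total}"


small_order = make_label(5)
large_order = make_label(20)
```

With these assumed values, an order of 5 is charged the full price and an order of 20 the discounted one, and the label is built in exactly one place either way.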

3. Replace yield inside for loop with yield from

One little trick that often gets missed is that Python’s yield keyword has a corresponding yield from for iterables, so there’s no need to write a for loop just to yield each item in turn. This makes the code slightly shorter and removes the mental overhead and extra variable used by the for loop. Eliminating the for loop also makes the yield from version about 15% faster.

Before:

def get_content(entry):
    for block in entry.get_blocks():
        yield block

After:

def get_content(entry):
    yield from entry.get_blocks()
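
The two versions can be compared side by side with a small stand-in for entry. The Entry class below is hypothetical and exists only to make the sketch runnable; any object whose get_blocks() returns an iterable would do.

```python
class Entry:
    # Hypothetical stand-in for the `entry` object in the example above.
    def __init__(self, blocks):
        self._blocks = blocks

    def get_blocks(self):
        return self._blocks


def get_content_loop(entry):
    for block in entry.get_blocks():
        yield block


def get_content(entry):
    # Delegates to the iterable returned by get_blocks().
    yield from entry.get_blocks()


entry = Entry(["title", "body", "footer"])
loop_blocks = list(get_content_loop(entry))
delegated_blocks = list(get_content(entry))
```

Both generators produce the same items in the same order. yield from accepts any iterable, including another generator, in which case it also forwards values passed in with send() and exceptions to that sub-generator, which the plain for loop does not.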

4. Use any() instead of for loop

A common pattern is that we need to find if some condition holds for one or all of the items in a collection. This can be done with a for loop such as this:

found = False
for thing in things:
    if thing == other_thing:
        found = True
        break

A more concise way, which clearly shows the intent of the code, is to use Python’s any() and all() built-in functions.

found = any(thing == other_thing for thing in things)

any() returns True when at least one of the elements evaluates to True; all() returns True only when every element evaluates to True.

These will also short-circuit execution where possible. If the call to any() finds an element that evaluates to True it can return immediately. This can lead to performance improvements if the code wasn’t already short-circuiting.
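
The all() case follows the same shape. The sketch below uses made-up sample data and an "is even" check purely for illustration, and also shows how both functions treat an empty iterable, which is easy to forget.

```python
things = [2, 4, 6, 7]  # Hypothetical sample data.

# Loop version: is every item even?
all_even_loop = True
for thing in things:
    if thing % 2 != 0:
        all_even_loop = False
        break

# Equivalent one-liners.
all_even = all(thing % 2 == 0 for thing in things)
has_odd = any(thing % 2 != 0 for thing in things)

# Edge cases for an empty iterable.
empty_any = any([])  # False: there is no truthy element
empty_all = all([])  # True: there is no falsy element
```

Passing a generator expression, rather than a list comprehension, is what lets the short-circuiting pay off: a list comprehension would evaluate the condition for every item before any() or all() sees the first result.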

5. Replace list() with []

The most concise and Pythonic way to create a list is to use the [] notation.

x = []

This fits in with the way we create lists with elements, saving a bit of mental energy that might be taken up with thinking about two different ways of creating lists.

x = ["first", "second"]

Doing things this way has the added advantage of being a nice little performance improvement.

Here are the timings before and after the change:

$ python3 -m timeit "x = list()"
5000000 loops, best of 5: 63.3 nsec per loop

$ python3 -m timeit "x = []"
20000000 loops, best of 5: 15.8 nsec per loop

Similar reasoning and performance results hold for replacing dict() with {}.
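
One likely explanation for the gap, at least on CPython, is that the literal compiles to a dedicated list-building instruction, while list() has to look up the name list and then call it. The sketch below uses the standard dis module to list the instructions for each form; opnames is a helper name invented for this example, and the exact instruction names vary between Python versions, so the output is printed rather than assumed.

```python
import dis


def opnames(source):
    """Return the bytecode instruction names for a snippet of source."""
    code = compile(source, "<example>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


literal_ops = opnames("x = []")
call_ops = opnames("x = list()")

print(literal_ops)  # expected to include BUILD_LIST
print(call_ops)     # expected to include a name lookup and a call instruction
```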

6. Hoist statements out of for/while loops

Another type of hoisting is pulling invariant statements out of loops. If a statement just sets up some variables for use in the loop, it doesn’t need to be inside it. Loops are inherently complex, so making them shorter and easier to understand should be on your mind while writing them.

In this example the city variable gets assigned inside the loop, but it is only read and not altered.

for building in buildings:
    city = "London"
    addresses.append((building.street_address, city))

It’s therefore safe to hoist it out, and this makes it clearer that the same city value will apply to every building.

city = "London"
for building in buildings:
    addresses.append((building.street_address, city))

This also improves performance: any statement in a loop is executed on every iteration. When the statement only needs to run once, the time spent on the repeat executions is wasted. The saving can be significant if the statements involve calls to databases or other time-consuming tasks.
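
To make the cost visible, the sketch below swaps the string literal for a function call and counts how often it runs. Building, lookup_city and call_count are hypothetical names introduced for the illustration, with lookup_city standing in for something slow such as a database query. Each address is stored as a (street, city) tuple, since list.append takes a single argument.

```python
from dataclasses import dataclass


@dataclass
class Building:
    # Hypothetical type standing in for the `building` objects above.
    street_address: str


call_count = 0


def lookup_city():
    # Stand-in for a slow call, such as a database query (an assumption).
    global call_count
    call_count += 1
    return "London"


buildings = [Building("1 High Street"), Building("2 Station Road")]

city = lookup_city()  # Hoisted: runs once, not once per building.
addresses = []
for building in buildings:
    addresses.append((building.street_address, city))
```

Had lookup_city() stayed inside the loop, call_count would grow with the number of buildings. Hoisting is only safe when the statement does not depend on the loop variable and has no side effects the loop relies on.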

Conclusion

As mentioned, each of these is a refactoring that Sourcery can automatically perform for you. We’re planning on expanding this blog series and linking the posts in as additional documentation, with the aim of turning Sourcery into a great resource for learning how to improve your Python skills. You can read the next part in the series here.

If you have any thoughts on how to improve Sourcery or its documentation, please do email us or hit me up on Twitter.