Python Refactorings

Python code — Photo by Chris Ried on Unsplash

Writing clean, Pythonic code is all about making it as understandable, yet concise, as possible. This is the second part of a series on Python refactorings, based on those that can be done automatically by Sourcery. Catch the first part here and the next part here.

The focus of this series is on why these changes are good ideas, not just on how to do them.

1. Convert for loop into list/dictionary/set comprehension

A pattern that we encounter time and again when coding is that we need to create a collection of values.

A standard way of doing it in most languages would be as follows:

cubes = []
for i in range(20):
    cubes.append(i**3)

We create the list and then iteratively fill it with values. Here we build a list of cubed numbers, which ends up in the cubes variable.

In Python we have access to list comprehensions. These can do the same thing on one line, cutting out the clutter of declaring an empty list and then appending to it:

cubes = [i**3 for i in range(20)]

We've turned three lines of code into one, which is a definite win: it means less scrolling back and forth when reading methods and helps keep things manageable.

Squeezing code onto one line can make it more difficult to read, but for short comprehensions this usually isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.

Another point is that the assignment is now more of an atomic operation - we're declaring what cubes is rather than giving instructions on how to build it. This makes the code read like more of a narrative, since going forward we will care more about what cubes is than the details of its construction.

Finally, comprehensions will usually execute more quickly than building the collection in a loop, since they avoid looking up and calling append on every iteration. This is another factor if performance is a consideration.

If you need a deeper dive into comprehensions check out my post here. It goes into exactly how they work, how to use them for filtering, and dictionary and set comprehensions.
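As a brief sketch of the variants just mentioned, the snippet below shows a filtered list comprehension alongside dictionary and set comprehensions. The names even_cubes, cube_by_number, last_digits and cubes_loop are made up for this illustration.

```python
# Filtering: keep only the cubes of even numbers.
even_cubes = [i**3 for i in range(10) if i % 2 == 0]

# Dictionary comprehension: map each number to its cube.
cube_by_number = {i: i**3 for i in range(5)}

# Set comprehension: the distinct last digits of the first ten cubes.
last_digits = {i**3 % 10 for i in range(10)}

# The loop version from earlier builds the same list as the comprehension.
cubes_loop = []
for i in range(20):
    cubes_loop.append(i**3)
cubes = [i**3 for i in range(20)]
```

In each case the shape of the brackets tells you what kind of collection is being built, and the optional if clause at the end does the filtering.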

2. Replace assignment with augmented assignment

Augmented assignments are a quick and easy bit of Python syntax to include.

Wherever there's code like this:

count = count + other_value

we can replace it with:

count += other_value

This is a bit shorter and clearer - we don't need to think about the count variable twice. Other operators that can be used include -=, *=, /= and **=.

One thing to be slightly careful of is that the type you're assigning to has to support the operator you use. For instance, NumPy arrays of integers raise an error on /=, because true division produces floats that cannot be written back into an integer array in place.
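A related subtlety, sketched below with plain lists: for mutable types, += usually modifies the object in place, whereas the long form builds a new object and rebinds the name. The Money and Opaque classes are hypothetical. Money shows a type that defines + but no in-place operator, in which case Python falls back to __add__. Opaque shows what happens when no suitable operator exists at all.

```python
# For lists, += extends the existing object, so other references see the change.
original = [1, 2]
alias = original
original += [3]
# alias is now [1, 2, 3] because both names point at the same list.

# The long form creates a new list and leaves the old one untouched.
first = [1, 2]
other = first
first = first + [3]
# other is still [1, 2].


class Money:
    """Hypothetical value type that defines + but not +=."""

    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Money(self.amount + other.amount)


total = Money(5)
total += Money(10)  # Falls back to __add__, rebinding total to a new Money.


class Opaque:
    """Hypothetical type with no arithmetic operators at all."""


try:
    broken = Opaque()
    broken += 1
except TypeError as exc:
    # Python reports that the operands do not support +=.
    error_message = str(exc)
```

For numbers and strings the two forms behave the same, so the refactoring is safe there. It is only with shared mutable objects that the in-place behaviour is worth keeping in mind.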

3. Inline variable that is only used once

Something that we often see in people's code is assigning to a result variable and then immediately returning it.

def state_attributes(self):
    """Return the state attributes."""
    state_attr = {
        ATTR_CODE_FORMAT: self.code_format,
        ATTR_CHANGED_BY: self.changed_by,
    }
    return state_attr

Here it's better to just return the result directly:

def state_attributes(self):
    """Return the state attributes."""
    return {
        ATTR_CODE_FORMAT: self.code_format,
        ATTR_CHANGED_BY: self.changed_by,
    }

This shortens the code and removes an unnecessary variable, reducing the mental load of reading the function.

Intermediate variables are useful when they then get used as a parameter or a condition, because the name acts like a comment on what the value represents. When you're returning the value from a function, the function name is already there to tell you what the result is. In the example above it's the state attributes, and the state_attr name wasn't providing any extra information.
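To illustrate the distinction, here is a small sketch with made-up names (qualifies_for_discount, is_weekend, has_discount_code, discounted_total). In the first function the intermediate variables document a condition. In the second a result variable would only repeat what the function name already says, so the expression is returned directly.

```python
def qualifies_for_discount(day_of_week, code):
    # Named intermediates act like comments on what each check means.
    is_weekend = day_of_week in ("Saturday", "Sunday")
    has_discount_code = code is not None
    return is_weekend and has_discount_code


def discounted_total(total, rate):
    # No intermediate needed: the function name already describes the result.
    return total * (1 - rate)
```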

4. Replace if statement with if expression

It often happens that you want to set a variable to one of two different values, depending on some program state.

if condition:
    x = 1
else:
    x = 2

This can be written on one line using Python's conditional expression syntax (its version of the ternary operator):

x = 1 if condition else 2

This is definitely more concise, but it is one of the more controversial refactorings (along with list comprehensions). Some coders dislike these expressions and find them slightly harder to parse than writing them out fully.

Our view is that as long as the conditional expression is short and fits on one line it is an improvement. There's only one statement where x is defined as opposed to having to read two statements plus the if-else lines. Similarly to the comprehension example, when we're scanning the code we usually won't need to know the details of how x gets assigned, and can just see that it's being assigned and move on.
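As a rough illustration of where we would draw the line, the sketch below uses hypothetical names (parity, shipping_cost). The first function fits comfortably on one line. The second has enough branches that plain if statements stay easier to scan than chained conditional expressions.

```python
def parity(number):
    # Short and symmetric: a conditional expression reads well.
    return "even" if number % 2 == 0 else "odd"


def shipping_cost(weight_kg):
    # Three outcomes: chaining conditional expressions here, e.g.
    #     0 if weight_kg == 0 else 5 if weight_kg < 2 else 12
    # is legal but harder to read than the statement form.
    if weight_kg == 0:
        return 0
    if weight_kg < 2:
        return 5
    return 12
```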

5. Replace unneeded comprehension with generator

One tip is that functions like any, all and sum allow you to pass in a generator rather than a collection. This means that instead of doing this:

hat_found = any([is_hat(item) for item in wardrobe])

you can write:

hat_found = any(is_hat(item) for item in wardrobe)

This removes a pair of brackets, making the intent slightly clearer. It will also return immediately if a hat is found, rather than having to build the whole list. This lazy evaluation can lead to performance improvements.
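The early return can be observed with a small sketch. The wardrobe contents and the call-recording is_hat function are assumptions made for this illustration.

```python
calls = []


def is_hat(item):
    # Record every item that gets checked so we can count the calls.
    calls.append(item)
    return item == "hat"


wardrobe = ["scarf", "hat", "coat", "gloves"]

# List comprehension: every item is checked before any() sees the list.
hat_found_list = any([is_hat(item) for item in wardrobe])
calls_with_list = len(calls)

calls.clear()

# Generator: any() stops consuming items as soon as one is truthy.
hat_found_gen = any(is_hat(item) for item in wardrobe)
calls_with_generator = len(calls)
```

With this wardrobe the list version checks all four items, while the generator version stops after the second. Both give the same answer.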

Note that we are actually passing a generator into any() so strictly speaking the code would look like this:

hat_found = any((is_hat(item) for item in wardrobe))

but Python allows you to omit this pair of brackets when the generator is the only argument to the call.
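Here is a short sketch, using an invented word list, of when the brackets can be dropped and when they cannot. The generator needs its own pair whenever another argument is passed alongside it.

```python
words = ["fedora", "cap", "beret", "cap"]

# The generator is the sole argument, so no extra brackets are needed.
total_length = sum(len(word) for word in words)
longest = max(len(word) for word in words)
unique_words = set(word.upper() for word in words)

# With a second argument the generator needs its own brackets;
# sum(len(word) for word in words, 100) would be a SyntaxError.
total_with_start = sum((len(word) for word in words), 100)
```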

Built-in functions that accept generators include:

"all", "any", "enumerate", "frozenset", "list", "max", "min", "set", "sum", "tuple"

6. Simplify conditional into return statement

The last refactoring to look at is where we reach the end of a method and want to return True or False. A common way of doing so is like this:

def function(a, b):
    if isinstance(a, b) or issubclass(b, a):
        return True
    return False

However, it's neater just to return the result directly like so:

def function(a, b):
    return isinstance(a, b) or issubclass(b, a)

This can only be done if the expression evaluates to a boolean. In this example:

def any_hats(self):
    hats = [item for item in wardrobe if is_hat(item)]
    if hats or self.wearing_hat():
        return True
    return False

We can't do this exact refactoring, since now we could be returning the list of hats rather than True or False. To make sure we're returning a boolean, we can wrap the return in a call to bool():

def any_hats(self):
    hats = [item for item in wardrobe if is_hat(item)]
    return bool(hats or self.wearing_hat())
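The reason the bool() call is needed is that or returns one of its operands rather than True or False. A minimal sketch, using standalone values in place of the wardrobe lookup and self.wearing_hat():

```python
hats = ["fedora"]
wearing_hat = False

# `or` hands back the first truthy operand, here the list itself.
raw_result = hats or wearing_hat

# Wrapping the expression in bool() guarantees a real boolean.
bool_result = bool(hats or wearing_hat)

# With no hats and no hat being worn, `or` returns the last operand.
empty_result = [] or wearing_hat
```

Callers that only test the result in an if statement would not notice the difference, but anything that compares against True, serialises the value or stores it will.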

Conclusion

As mentioned, each of these is a refactoring that Sourcery can automatically perform for you. We're planning on expanding this blog series out and linking them in as additional documentation, with the aim of turning Sourcery into a great resource for learning how to improve your Python skills. You can read the next part in the series here.

If you have any thoughts on how to improve Sourcery or its documentation, please do email us or hit me up on Twitter.