Six more examples of ways to refactor your Python code, and why they are improvements
Jun 10, 2020
Writing clean, Pythonic code is all about making it as understandable, yet concise, as possible. This is the second part of a series on Python refactorings, based on those that can be done automatically by Sourcery. Catch the first part here and the next part here.
The focus of this series is on why these changes are good ideas, not just on how to do them.
A pattern that we encounter time and again when coding is that we need to create a collection of values.
A standard way of doing it in most languages would be as follows:
```python
cubes = []
for i in range(20):
    cubes.append(i**3)
```
We create the list, and then iteratively fill it with values. Here we create a list of cubed numbers that ends up in the `cubes` variable.
In Python we have access to list comprehensions. These can do the same thing on one line, cutting out the clutter of declaring an empty list and then appending to it:
```python
cubes = [i**3 for i in range(20)]
```
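As a quick sanity check, the loop and the comprehension build exactly the same list. This sketch wraps each version in a small function so the results can be compared; the function names are just for illustration.

```python
def cubes_with_loop(n):
    # Build the list step by step, appending one cube at a time
    cubes = []
    for i in range(n):
        cubes.append(i**3)
    return cubes


def cubes_with_comprehension(n):
    # Declare the whole list in a single expression
    return [i**3 for i in range(n)]


print(cubes_with_comprehension(5))  # [0, 1, 8, 27, 64]
print(cubes_with_loop(20) == cubes_with_comprehension(20))  # True
```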
We've turned three lines of code into one, which is a definite win: it means less scrolling back and forth when reading methods and helps keep things manageable.
Squeezing code onto one line can make it more difficult to read, but for comprehensions this isn't the case. All of the elements that you need are nicely presented, and once you are used to the syntax it is actually more readable than the for loop version.
Another point is that the assignment is now more of an atomic operation: we're declaring what `cubes` is rather than giving instructions on how to build it. This makes the code read more like a narrative, since going forward we will care more about what `cubes` is than about the details of its construction.
Finally, comprehensions usually execute more quickly than building the same collection with repeated `append` calls in a loop, which is another factor if performance is a consideration.
If you need a deeper dive into comprehensions check out my post here. It goes into exactly how they work, how to use them for filtering, and dictionary and set comprehensions.
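As a brief taste of what that post covers, the same syntax extends to filtering with an `if` clause and to dictionary and set comprehensions. The variable names below are arbitrary.

```python
numbers = range(10)

# Filtering: keep only the cubes of even numbers
even_cubes = [i**3 for i in numbers if i % 2 == 0]

# Dictionary comprehension: map each number to its cube
cube_lookup = {i: i**3 for i in numbers}

# Set comprehension: the distinct last digits of the cubes
last_digits = {i**3 % 10 for i in numbers}

print(even_cubes)      # [0, 8, 64, 216, 512]
print(cube_lookup[3])  # 27
```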
Augmented assignments are a quick and easy bit of Python syntax to include.
Wherever there's code like this:
```python
count = count + other_value
```
we can replace it with:
```python
count += other_value
```
This is a bit shorter and clearer: we don't need to think about the `count` variable twice. Other operators that can be used include `-=`, `*=`, `/=` and `**=`.
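A short sketch of those other operators applied to plain numbers, plus `+=` on a string; the variable names are arbitrary.

```python
total = 10
total -= 4   # same as total = total - 4, giving 6
total *= 3   # same as total = total * 3, giving 18
total /= 4   # true division always gives a float, 4.5
total **= 2  # same as total = total ** 2, giving 20.25
print(total)  # 20.25

greeting = "Hello"
greeting += ", world"  # works for any type that supports +
print(greeting)  # Hello, world
```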
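One thing to be slightly careful of is that the type you're assigning to has to support the operation, and that for mutable types an augmented assignment may modify the existing object in place rather than create a new one. For instance, `/=` on a NumPy integer array raises an error, because the division produces floats that can't be stored back into the integer array.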
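Both cautions can be seen with built-in types alone. For a list, `+=` extends the existing object in place, so every name bound to that list sees the change, whereas the long form `x = x + y` builds a new list. And a type that doesn't support the operator at all fails with `TypeError` in the augmented form just as in the long form:

```python
# In-place: list.__iadd__ extends the same list object
original = [1, 2]
alias = original
original += [3]
print(alias)  # [1, 2, 3]: both names refer to the one list

# Long form: builds a new list and rebinds only 'rebuilt'
rebuilt = [1, 2]
other_alias = rebuilt
rebuilt = rebuilt + [3]
print(other_alias)  # [1, 2]: the old list is untouched

# Unsupported operation: str defines neither - nor -=
name = "hats"
try:
    name -= "s"
except TypeError as error:
    print("TypeError:", error)
```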
Something that we often see in people's code is assigning to a result variable and then immediately returning it.
```python
def state_attributes(self):
    """Return the state attributes."""
    state_attr = {
        ATTR_CODE_FORMAT: self.code_format,
        ATTR_CHANGED_BY: self.changed_by,
    }
    return state_attr
```
Here it's better to just return the result directly:
```python
def state_attributes(self):
    """Return the state attributes."""
    return {
        ATTR_CODE_FORMAT: self.code_format,
        ATTR_CHANGED_BY: self.changed_by,
    }
```
This shortens the code and removes an unnecessary variable, reducing the mental load of reading the function.
Intermediate variables can be useful where they then get used as a parameter or a condition, since the name can act like a comment on what the value represents. When you're returning the value from a function, the function name is there to tell you what the result is: in the example above it's the state attributes, and the `state_attr` name wasn't providing any extra information.
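For contrast, here is a sketch of the case where an intermediate variable earns its keep: its name documents a condition that would otherwise be a bare comparison. The `Token` class, its `expires_at` field and the `describe` function are hypothetical, invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Token:
    # Hypothetical token type used only for this illustration
    value: str
    expires_at: datetime


def describe(token, now):
    # The variable name explains what the comparison means
    is_expired = token.expires_at <= now
    if is_expired:
        return "please log in again"
    return "welcome back"


now = datetime(2020, 6, 10, 12, 0)
print(describe(Token("abc", now - timedelta(hours=1)), now))  # please log in again
print(describe(Token("abc", now + timedelta(hours=1)), now))  # welcome back
```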
It often happens that you want to set a variable to one of two different values, depending on some program state.
```python
if condition:
    x = 1
else:
    x = 2
```
This can be written on one line using Python's conditional expression syntax (its version of the ternary operator):
```python
x = 1 if condition else 2
```
This is definitely more concise, but it is one of the more controversial refactorings (along with list comprehensions). Some coders dislike these expressions and find them slightly harder to parse than writing them out fully.
Our view is that as long as the conditional expression is short and fits on one line, it is an improvement. There's only one statement where `x` is defined, as opposed to having to read two statements plus the `if`/`else` lines. As with the comprehension example, when we're scanning the code we usually won't need to know the details of how `x` gets assigned, and can just see that it's being assigned and move on.
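A sketch of conditional expressions in a couple of typical settings, checked against the longhand form. The shipping rule is hypothetical. The second function relies on the fact that only the selected branch is evaluated, so `a / b` never runs when `b` is zero.

```python
def shipping_cost_longhand(order_total):
    if order_total >= 50:
        cost = 0
    else:
        cost = 5
    return cost


def shipping_cost(order_total):
    # Hypothetical rule: free shipping at or above 50
    return 0 if order_total >= 50 else 5


def safe_ratio(a, b):
    # Only the chosen branch is evaluated, so there is no ZeroDivisionError
    return a / b if b != 0 else 0.0


print(shipping_cost(20), shipping_cost(80))  # 5 0
print(safe_ratio(3, 2), safe_ratio(1, 0))  # 1.5 0.0
```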
One tip is that functions like `any`, `all` and `sum` accept any iterable, so you can pass them a generator expression rather than building a list first. This means that instead of doing this:
```python
hat_found = any([is_hat(item) for item in wardrobe])
```
you can write:
```python
hat_found = any(is_hat(item) for item in wardrobe)
```
This removes a pair of square brackets, making the intent slightly clearer. It also lets `any` stop as soon as a hat is found, rather than first building the whole list and calling `is_hat` on every item. This lazy evaluation can lead to performance improvements.
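To see the laziness in action, this sketch counts how many times `is_hat` is called. The `is_hat` function and `wardrobe` list here are simple stand-ins for whatever the real code uses.

```python
calls = 0


def is_hat(item):
    # Stand-in check that also records how often it is called
    global calls
    calls += 1
    return item.endswith("hat")


wardrobe = ["top hat", "scarf", "gloves", "coat", "boots"]

calls = 0
found_with_list = any([is_hat(item) for item in wardrobe])
list_calls = calls  # 5: the whole list is built before any() runs

calls = 0
found_with_generator = any(is_hat(item) for item in wardrobe)
generator_calls = calls  # 1: any() stops at the first hat

print(list_calls, generator_calls)  # 5 1
```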
Note that we are actually passing a generator expression into `any()`, so strictly speaking the code would look like this:
```python
hat_found = any((is_hat(item) for item in wardrobe))
```
but Python allows you to omit this pair of parentheses when the generator expression is the only argument to the call.
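A few built-ins used this way, with a made-up `words` list. One detail worth knowing: the extra parentheses can only be dropped when the generator expression is the sole argument, so a call that also passes something else, such as `max` with its `default` keyword, needs them back.

```python
words = ["cap", "beanie", "fedora", "beret"]

total_letters = sum(len(word) for word in words)
longest = max(len(word) for word in words)
shortest = min(len(word) for word in words)

# With a second argument the parentheses are required:
# max(len(word) for word in [], default=0) is a SyntaxError
longest_or_zero = max((len(word) for word in []), default=0)

print(total_letters, longest, shortest, longest_or_zero)  # 20 6 3 0
```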
Built-in functions that accept any iterable, and so work well with generator expressions, include `all`, `any`, `enumerate`, `frozenset`, `list`, `max`, `min`, `set`, `sum` and `tuple`.
The last refactoring to look at is where we reach the end of a method and want to return `True` or `False`. A common way of doing so is like this:
```python
def function():
    if isinstance(a, b) or issubclass(b, a):
        return True
    return False
```
However, it's neater just to return the result directly like so:
```python
def function():
    return isinstance(a, b) or issubclass(b, a)
```
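A runnable version of the refactored function, with `a` and `b` turned into parameters so it can be called with concrete classes; the name `is_related` is just for illustration. Because `isinstance` and `issubclass` both return booleans, the whole `or` expression is already a boolean and can be returned directly.

```python
def is_related(a, b):
    # Both operands are bools, so `or` also produces a bool
    return isinstance(a, b) or issubclass(b, a)


print(is_related(int, bool))  # True: bool is a subclass of int
print(is_related(str, int))   # False
```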
This can only be done if the expression evaluates to a boolean. In this example:
```python
def any_hats(self):
    hats = [item for item in wardrobe if is_hat(item)]
    if hats or self.wearing_hat():
        return True
    return False
```
We can't do this exact refactoring, since we could then be returning the list of hats rather than `True` or `False`. To make sure we're returning a boolean, we can wrap the return value in a call to `bool()`:
```python
def any_hats(self):
    hats = [item for item in wardrobe if is_hat(item)]
    return bool(hats or self.wearing_hat())
```
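The reason `bool()` is needed comes from how `or` works: it returns one of its operands rather than a fresh boolean. In this self-contained sketch the `Person` class, its `wardrobe` attribute and the "ends with hat" test are hypothetical stand-ins for the code above.

```python
class Person:
    # Hypothetical class standing in for the one any_hats() belongs to
    def __init__(self, wardrobe, wearing_hat):
        self.wardrobe = wardrobe
        self._wearing_hat = wearing_hat

    def wearing_hat(self):
        return self._wearing_hat

    def any_hats(self):
        hats = [item for item in self.wardrobe if item.endswith("hat")]
        # Without bool() this could return the list itself
        return bool(hats or self.wearing_hat())


print(["top hat"] or False)  # ['top hat']: `or` returns the first truthy operand
print(Person(["top hat"], False).any_hats())  # True
print(Person(["scarf"], False).any_hats())  # False
```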
As mentioned, each of these is a refactoring that Sourcery can automatically perform for you. We're planning on expanding this blog series out and linking them in as additional documentation, with the aim of turning Sourcery into a great resource for learning how to improve your Python skills. You can read the next part in the series here.
If you have any thoughts on how to improve Sourcery or its documentation, please do email us or hit me up on Twitter.