How we've been adjusting the Google Python Style Guide for the Sourcery monorepo.
Oct 11, 2022
In this series, we're exploring how real-world projects use the Google Python Style Guide a.k.a GPSG. Let's start with the most obvious candidate: the Sourcery repository itself.
At Sourcery, we work with a monorepo that contains the vast majority of our code:
We've been using various code quality tools since the very beginning:
Applying the GPSG felt like a logical next step to make our codebase even more maintainable and more fun to work with.
Although it's a huge and versatile repo, our coding conventions are pretty much
the same everywhere. If you also work with a monorepo, you might need more
"local" rules for various parts of your codebase. For example, if different
parts of your codebase use different frameworks. In that case, you can use the
of the rules to restrict their scope.
A first review with the GPSG gave a great picture of the current status of our repo.
.sourcery.yamlfile of our repo below our existing rules.
Tip: If your repo doesn't have a
.sourcery.yaml file, generate it with
sourcery init. This will add some sensible defaults, e.g. ignoring directories
.git and virtual environments.
sourcery review .
From this, we already learned several things:
We quickly agreed on the following goals:
We generally recommend taking an incremental approach regarding legacy issues. It's essential to be aware of them, but trying to clean them up at once isn't a good idea.
A tempting but dangerous approach is to include such fixes in a random PR that was supposed to add a new feature or fix a bug. Those "quick adjustments" can easily distract both the developers and the reviewers from the actual scope of the PR. Of course, if you detect some issues that are very important or straightforward to fix, you're encouraged to fix them. But do it in a separate PR that is dedicated to cleaning up this tech debt instead of mixing it with other work.
Based on these goals, we chose a fairly common implementation approach:
To achieve this, we have the following setup:
Both use the
--diff option of the Sourcery CLI to check only the currently
We set up a team meeting to discuss which GPSG rules we wanted to follow in our repo. We scheduled it for 1 hour, but it turned out to be quicker.
Again, we copied the whole GPSG rule set into our
.sourcery.yaml file below
our existing custom rules. Then we went through them group by group or, in some
cases, rule by rule.
This was an easy one. The Import Guidelines section of the GPSG contains widely accepted best practices, and we are happy to follow them. Avoiding wildcard and relative imports makes the code easier to grasp both for humans and static analysis tools. Obviously, we care about both here. ;-)
Establishing our naming conventions was less straightforward. The 3.16 Naming Rules section of the GPSG is quite versatile. You'll probably decide rule by rule whether to follow it.
We quickly agreed that we didn't want single-character names, so we kept those 2 rules: one for variable names, one for function names. We also added all the rules that enforce the PEP8 naming conventions:
However, we had an interesting discussion about the
name-type-suffix rule. We
knew that our codebase contained several such suffixes but weren't sure how many
and where exactly. So, we ran another review, this time only for this single
sourcery review --include name-type-suffix .
We found that the 72 names with type suffix appear at various parts of the codebase. We randomly looked at some of them and the names looked sensible. So, we decided to skip this rule.
This is a complex rule where you have several options to take a less extreme approach than our team. 😀
# sourcery skipcomment to them.
_strmakes sense, you can keep the rule and remove the
or name.ends_with("_str")part of its
We had a quick agreement here. Although our code base has a few functions that are longer than 40 lines, we regard those as technical debt.
Function length has a strong correlation with other code quality metrics:
We encourage you to keep
even if it feels strict for your current codebase. Consider adding a
# sourcery skip comment to the existing too long functions. And if it feels to
obvious, you can experiment with setting the limit lower than 40. 😉
We had the most prolonged discussion about the docstrings rules. This is not a surprise, as there are valid arguments in both directions in the broader community as well. :-)
We decided to give it a try and included all the docstring rules from the GPSG in our config. We added the rules for modules and classes as they were. However, we tweaked the docstring rule for functions:
leave_. (Those are frequent constructs in our Sourcery core engine.)
condition: not ((f.starts_with("_") and body.statement_count() < 11) or pattern.in_function_scope()) and not f.starts_with("enter") and not f.starts_with("leave")
Adding the docstring rules also means that we've also started to add docstrings to already existing functions and modules whenever we touch them. I'm curious whether they will make our life easier.
Adapting the type annotation rules really depends on your codebase and on your preference: How many type hints do you already have and do you want to use them more? (For a great discussion about the advantages and disadvantages of type hints, check out this interview with Luciano Ramalho at The Real Python Podcast.)
If your code doesn't include many type hints and you want to gradually increase their number, the GPSG might be a great choice because of its flexibility. For example, you can start with only a subset of the rules or apply them only for public functions.
Our team was in a different position: We already use
mypy for enforcing type
hints and weren't sure whether we wanted to add more checks from another tool
about the same topic. So, after some back and forth, we decided to skip these
rules for now.
Generally, we regard the outcome of this meeting as version 1 of our coding
conventions. It isn't a law set in stone but a
yaml file in our repo.
Everybody is encouraged to contribute to it via our usual pull request process.
Having these rules written down and spelled out as code allows you to have better and more explicit discussions around them. (Instead of revisiting the same topic again and again at code reviews.) This is valuable both for best practices and for cases where you choose between equally good alternatives for consistency.
Finding the tailored conventions for your codebase is an iterative process. Evaluate continuously whether a specific rule helps you write more robust and more maintainable code. Feel free to ditch it if it adds too much overhead.
Have you tried out the GPSG? Have you defined some custom rules for your codebase? Reach out at firstname.lastname@example.org We'd be happy to chat about your (written and unwritten) code conventions.