Making the Sourcery CLI More Robust with Rich

sourcery review summary

Over the last couple of months, we've been working to make the Sourcery CLI more robust. Two factors drove this effort:

We wanted to make Sourcery more powerful for legacy code. When you're writing new code, you want immediate feedback, so something like an IDE plugin is great. But when you're working with a large codebase, that's not an ideal way to review your code. The CLI is more suitable: it gives you a better overview of the changes and lets you fix dozens of issues in a single command.
We introduced several features into core Sourcery that needed new capabilities from our CLI. Custom rules and rulesets like the Google Python Style Guide didn't fit into the CLI's setup.

Rich appeared on our radar when we were searching for a way to display Markdown in the terminal. We had just added the custom rules. The fields description and explanation support Markdown and often contain links, listings, etc. In the IDEs, it was straightforward to display those Markdown elements, but what about the command line? We considered implementing the Markdown functionality ourselves but recognized that it's far from our core competence. We had heard some positive reviews about Rich and decided to give it a try.

This turned out to be an excellent choice. Introducing the Markdown element for custom rules was only the first step. Over the last few months, our CLI has kept growing. We've added several building blocks that make our CLI application more robust, user-friendly, and modern: displaying feedback during long-running operations, structuring the output, and showing both detailed information and a summary. It turned out that Rich has a neat solution for all of these. In this post, we would like to share our implementation for these common CLI use cases and some lessons we've learned on the way.

Show Some Status or Progress for Operations that MIGHT Take Longer

While it sounds obvious that you need to show some status during long-running operations, it's often not so clear which operations fall into this category. For example, it took us a while to recognize that authentication is a tricky step. In the local environment with a local server, it happens instantaneously. Connecting to the production server is a different story.

While testing the Sourcery CLI during a train ride, Tim observed that it was unusually unresponsive. It turned out that the authentication could take multiple minutes with a wobbly internet connection. We fixed that issue by adding some timeouts and used the status method of the Rich console to indicate that the authentication is in progress. It's handy that the Status can be used with a context manager:

def _authenticate(self) -> bool:
    with self.stderr_console.status("Authenticating"):
        ...  # Authentication logic here.

sourcery review step: Authenticating
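The combination of a timeout and a status indicator could look roughly like this. It's a simplified sketch rather than Sourcery's actual code: slow_login is a hypothetical stand-in for the network call, and the timeout is enforced here with a thread pool from the standard library.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

from rich.console import Console

# Status messages go to stderr so that stdout stays clean for the real output.
stderr_console = Console(stderr=True)


def slow_login(delay: float) -> bool:
    """Hypothetical stand-in for a request to an authentication server."""
    time.sleep(delay)
    return True


def authenticate(delay: float, timeout: float) -> bool:
    """Show a status while logging in and give up after `timeout` seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        with stderr_console.status("Authenticating"):
            future = executor.submit(slow_login, delay)
            try:
                return future.result(timeout=timeout)
            except FutureTimeout:
                stderr_console.print("Authentication timed out.")
                return False
    finally:
        # Don't block on a request that is still hanging.
        executor.shutdown(wait=False)
```

Calling authenticate(0.01, 1.0) succeeds, while authenticate(0.5, 0.05) gives up early instead of leaving the user staring at a frozen terminal.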

During the crucial step, the review of the code, we went one step further: We added a Progress element to show how many files have already been processed and how many are waiting. We defined custom columns but for many use cases the default Progress element will be fine:

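import os

from rich.progress import (
    BarColumn,
    MofNCompleteColumn,
    Progress,
    TextColumn,
    TimeRemainingColumn,
)

# progress_console and file_asts are defined elsewhere in the CLI.
with Progress(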
    TextColumn("{task.description}"),
    BarColumn(),
    MofNCompleteColumn(),
    TimeRemainingColumn(),
    console=progress_console,
    disable=not progress_console.is_terminal or os.getenv("PRE_COMMIT") == "1",
    expand=True,
    transient=True,
) as progress:
    review_files_task = progress.add_task("Reviewing files", total=len(file_asts))
    for file_ast in file_asts:
        progress.update(
            review_files_task,
            description=f"Reviewing {file_ast.module_path}",
            refresh=True,
        )
        # File review logic called here.

sourcery review in progress
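For comparison, here is a minimal sketch that relies on the default Progress columns. The review_all function and the file names are made up for illustration; the review logic itself is only a placeholder.

```python
from rich.console import Console
from rich.progress import Progress

# Progress output goes to stderr in this sketch.
console = Console(stderr=True)


def review_all(paths: list[str]) -> list[str]:
    """Pretend to review each file while a default progress bar is shown."""
    reviewed = []
    # Progress() without column arguments uses Rich's default columns.
    with Progress(console=console, transient=True) as progress:
        task_id = progress.add_task("Reviewing files", total=len(paths))
        for path in paths:
            progress.update(task_id, description=f"Reviewing {path}")
            reviewed.append(path)  # Placeholder for the real review logic.
            progress.advance(task_id)  # Mark one more file as done.
    return reviewed
```

The explicit advance call is what moves the bar forward; update on its own only changes the description here.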

When introducing such elements, it makes sense to consider which factors influence the execution time. Some examples from the Sourcery CLI:

The duration of the authentication depends on various factors, including your internet connection.
The duration of the review itself depends on the number and size of the files, but also on the number of violations Sourcery finds.

Test with Redirected Input and Output

The first two principles of the Command Line Interface Guidelines are human-first design and composability. A good CLI tool is practical for both humans and automation. It can be easily called in scripts and combined with other commands.

We started to pay more attention to composability after we had added the Status and Progress elements mentioned above. Will they mess up the output if we've redirected it to a file?

With the Progress element, you can tweak various options depending on the environment:

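console: Do you display it on stdout or stderr?
disable: Do you switch the progress display off completely, e.g. when the output isn't a terminal?
transient: Do you remove the progress display once the operation has finished?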

For example, the Progress during sourcery review is displayed on stdout or stderr depending on whether stdout is a terminal:

progress_console = (
    self.stdout_console if self.stdout_console.is_terminal else self.stderr_console
)
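To try this selection logic outside of a real terminal, you can force the is_terminal value when creating the consoles. The following is a small sketch; pick_progress_console is a hypothetical helper name, and force_terminal is used only to simulate the two situations.

```python
import io

from rich.console import Console


def pick_progress_console(stdout_console: Console, stderr_console: Console) -> Console:
    """Use stdout for progress only if it is an interactive terminal."""
    return stdout_console if stdout_console.is_terminal else stderr_console


# Simulate stdout being redirected to a file vs. attached to a terminal.
redirected_stdout = Console(file=io.StringIO(), force_terminal=False)
terminal_stdout = Console(file=io.StringIO(), force_terminal=True)
stderr_console = Console(stderr=True)

console_when_redirected = pick_progress_console(redirected_stdout, stderr_console)
console_in_terminal = pick_progress_console(terminal_stdout, stderr_console)
```

With redirected stdout, the progress ends up on stderr and the file receives only the actual results.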

One caveat is that Rich, by default, resizes the content to fit the available width. This can lead to nicer output for human users but also to unexpected behaviour if the output gets redirected. The sourcery review command has a --csv option. When it's used, the output often gets redirected into a CSV file. For this reason, we ensure that each violation is displayed on exactly one line by setting soft_wrap:

self.stdout_console.print(
    f"{file_name}:{affected_line}:0: {proposal.id()} line {line_nr_in_diff}/{diff_length}",
    # This output should be machine-readable.
    # We need to ensure that Rich doesn't introduce any line breaks.
    soft_wrap=True,
)
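The effect of soft_wrap is easy to check with a deliberately narrow console that writes into a buffer. This is a small sketch; the width of 20 and the sample line are arbitrary values chosen for illustration.

```python
import io

from rich.console import Console

line = "example.py:12:0: some-rule-id line 3/40 plus a fairly long explanation"


def render(text: str, soft_wrap: bool) -> str:
    """Print `text` to a 20-column console and return what was written."""
    buffer = io.StringIO()
    console = Console(file=buffer, width=20, force_terminal=False)
    console.print(text, soft_wrap=soft_wrap)
    return buffer.getvalue()


wrapped = render(line, soft_wrap=False)  # Rich breaks the text to fit the width.
single = render(line, soft_wrap=True)  # The text stays on one line.
```

Only the soft-wrapped variant keeps the one-line-per-item assumption that tools like wc -l or a CSV parser rely on.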

In the case of the sourcery review command, the --csv option is an obvious candidate for redirected output. Even if your application doesn't contain such a feature, it's highly recommended to ensure that it works well with redirected input and output. Some common scenarios your users might try:

search the output with grep
count the lines of the output with wc -l (Often with the underlying assumption that each line represents one item.)

As the Command Line Interface Guidelines puts it: "Whatever software you're building, you can be absolutely certain that people will use it in ways you didn't anticipate. Your software will become a part in a larger system — your only choice is over whether it will be a well-behaved part."

Display Partial Results for Long-running Operations

Besides displaying some status, it's also helpful to show partial results, as soon as you have them. This way, users get some information and can even detect some errors, before the whole operation has finished. A frequent pattern to implement this:

A logic module returns a generator.
As soon as an item is yielded, the CLI module processes it and displays some output.

The sourcery review command displays a violation as soon as it has been detected. This way, users can quickly see which issues were found in the first files. If they interrupt a long-running review, that information also helps them decide on a reasonable next step, e.g. running a review only for a subdirectory, excluding some noisy rules, or including only a subset of high-priority rules.
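The generator pattern described above can be sketched as follows. The Violation type, the find_violations generator, and the report function are hypothetical names for illustration; the real review logic is replaced by a lookup in a dictionary.

```python
import io
from typing import Iterator, NamedTuple

from rich.console import Console


class Violation(NamedTuple):
    path: str
    line: int
    rule_id: str


def find_violations(files: dict[str, list[Violation]]) -> Iterator[Violation]:
    """Logic module: yield each violation as soon as it is found."""
    for violations in files.values():
        # Placeholder for the real, potentially slow, per-file review.
        yield from violations


def report(console: Console, files: dict[str, list[Violation]]) -> int:
    """CLI module: print every violation immediately and return the count."""
    count = 0
    for violation in find_violations(files):
        console.print(
            f"{violation.path}:{violation.line}: {violation.rule_id}",
            soft_wrap=True,
        )
        count += 1
    return count
```

Because report consumes the generator lazily, the first violations are already on the screen while later files are still being processed.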

sourcery review gif

Structure Your Output

The more features you add, the higher the chance that the output becomes overwhelming. This might be difficult to notice, because individually each piece makes a lot of sense. But you need some structure to form a coherent output from these various pieces.

After adding several new features and tweaks, we recognized that the sourcery review command had become quite overwhelming. As a first step, we introduced a Console.rule that draws a line to separate the in-progress output from the summary:

self.stderr_console.rule("Overview")

Structuring the summary was relatively straightforward, as long as we added each new piece to the same Markdown element. When we introduced a separate Table showing the number of issues per rule, we needed a small workaround to ensure that the two are styled consistently:

table = Table(title="Issues by Rule ID", title_style="bold", title_justify="left")

sourcery review summary
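Filling such a table could look like the following sketch. The column names and the issues_by_rule_table helper are assumptions made for illustration; only the Table arguments are taken from the line above.

```python
import io
from collections import Counter

from rich.console import Console
from rich.table import Table


def issues_by_rule_table(rule_ids: list[str]) -> Table:
    """Build a table with one row per rule, most frequent rule first."""
    table = Table(title="Issues by Rule ID", title_style="bold", title_justify="left")
    table.add_column("Rule ID")
    table.add_column("Count", justify="right")
    for rule_id, count in Counter(rule_ids).most_common():
        table.add_row(rule_id, str(count))
    return table


def render(table: Table) -> str:
    """Render the table into a string, as it would appear with 80 columns."""
    buffer = io.StringIO()
    Console(file=buffer, width=80).print(table)
    return buffer.getvalue()
```

Passing one rule ID per detected issue, for example ["no-long-functions", "no-long-functions", "example-rule"], yields a two-row table.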

In the output of the sourcery login command, we include some tips about possible next steps. These are displayed in a Panel to separate them from the command output:

def _show_tip(self, message: str) -> None:
    if not (is_ci_environment() or is_pre_commit()):
        self.stderr_console.print(Panel(Markdown(message), style="gray46", title="tip"))

sourcery login with tip
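The two environment checks aren't shown in the snippet above. One possible way to write them is sketched below; it is a guess, not Sourcery's implementation. It assumes that CI systems set a CI environment variable (a common convention) and reuses the PRE_COMMIT variable from the Progress example. The environ parameter is only there to make the functions easy to test.

```python
import io
import os
from typing import Mapping

from rich.console import Console
from rich.markdown import Markdown
from rich.panel import Panel


def is_ci_environment(environ: Mapping[str, str] = os.environ) -> bool:
    # Assumption: the CI system sets a non-empty CI variable.
    return bool(environ.get("CI"))


def is_pre_commit(environ: Mapping[str, str] = os.environ) -> bool:
    return environ.get("PRE_COMMIT") == "1"


def show_tip(
    console: Console, message: str, environ: Mapping[str, str] = os.environ
) -> bool:
    """Print the tip for interactive use only; return whether it was shown."""
    if is_ci_environment(environ) or is_pre_commit(environ):
        return False
    console.print(Panel(Markdown(message), title="tip"))
    return True
```

Skipping tips in automated environments keeps logs short, since nobody is there to act on a suggested next step.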

Don't Print Large Code Blocks

After we had added the Google Python Style Guide (GPSG) rules, we were surprised to see that the sourcery review command took much longer than before. The difference wasn't that big for test files with dummy code, but as soon as we reviewed a small repo with real code, the execution time went up.

After some experimentation with various test cases, we found the culprit: the no-long-functions rule. In that initial version, whenever Sourcery detected a violation of this rule, we printed the whole function. Which was, by definition, long. It turned out that while the Syntax element of Rich is awesome, it isn't optimized for printing several hundred lines of code. Which is understandable, because such an output would be unreadable anyway. :-)

We revisited our more complex and function-level rules to ensure that we print only a sensible amount of code for each of them. If your application displays code coming from user input, validation and cropping are probably good ideas.
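Cropping can be as simple as cutting the snippet after a fixed number of lines before handing it to Syntax. In this sketch, the limit of 10 lines, the helper names, and the wording of the marker comment are arbitrary choices for illustration.

```python
from rich.syntax import Syntax

MAX_LINES = 10  # Hypothetical limit; pick what suits your output.


def crop_code(code: str, max_lines: int = MAX_LINES) -> str:
    """Keep at most `max_lines` lines and note how many were left out."""
    lines = code.splitlines()
    if len(lines) <= max_lines:
        return code
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines] + [f"# ... {omitted} more lines omitted"])


def render_snippet(code: str) -> Syntax:
    """Build a Syntax element from the cropped code only."""
    return Syntax(crop_code(code), "python")
```

A 300-line function then costs about as much to print as an 11-line one, and the reader still sees where the violation starts.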

Test with Multiple Color Schemes

The terminal is supposed to be quite consistent. But the fancier the elements and the more custom styling you use, the more the output might differ between color schemes.

Rich supports a style option for various elements. Use that with care. :-) If you start defining hard-coded colours, you might learn that they don't look that great with, for example, the "Green on Black" color scheme. The default styling usually looks good with various color schemes.

sourcery review verbose light theme

sourcery review verbose dark theme

+1: Use Rich :-)

This is probably the most important of all lessons. Introducing Rich was a major improvement benefiting both the users of the CLI and the developers working on it.

For example, we were able to get rid of a lot of custom code we used to display coloured diffs. The Syntax element is both more convenient and more reliable across various terminals and color schemes.

Try out the new & improved Sourcery CLI and let us know what you think. Have any ideas about what we should improve? Rich features we should be aware of? Reach out via email at hello@sourcery.ai or on Twitter @SourceryAI.

Resources

Rich on GitHub
Rich's documentation
Command Line Interface Guidelines: An open-source, language-agnostic guide to help you write better command-line programs.