Avoiding Performance Killers in Loops


Why?

In his article Reflections on Software Performance, Nelson Elhage claims that "performance is a feature".

I’ve really come to appreciate that performance isn’t just some property of a tool independent from its functionality or its feature set. Performance — in particular, being notably fast — is a feature in and of its own right, which fundamentally alters how a tool is used and perceived. ... It’s probably fairly intuitive that users prefer faster software, and will have a better experience performing a given task if the tools are faster rather than slower. What is perhaps less apparent is that having faster tools changes how users use a tool or perform a task. Users almost always have multiple strategies available to pursue a goal — including deciding to work on something else entirely — and they will choose to use faster tools more and more frequently. Fast tools don’t just allow users to accomplish tasks faster; they allow users to accomplish entirely new types of tasks, in entirely new ways.

He showcases a great example where performance enabled users to use a search tool in an unexpected way: interactively.

We've seen this first hand while developing Sourcery: a speed-up made completely new features possible. In its initial, IDE-focused version, Sourcery needed a lot of time to analyze long files (roughly 2,000 lines and above), especially when it detected thousands of issues in them. Our workaround was to set a timeout. This was a reasonable solution for an IDE plugin, even if it meant that Sourcery sometimes missed refactoring opportunities at the end of a huge file.

However, this kind of inconsistency isn't acceptable for a tool running in CI. It was also clear that the more numerous and diverse the rules Sourcery has to handle, the bigger this issue would become. So, we decided to focus on performance, which brought a few architecture changes and several micro-optimizations. This speedup made it possible to introduce the major features of the last year:

allowing teams to define their custom, project-specific rules
a robust CLI tool that can be used in a CI system

When?

After laying out arguments for the significance of performance, Nelson Elhage makes the point that "Performance needs effort throughout a project’s lifecycle". He brings up three main reasons for this:

"Architecture strongly impacts performance": Several design decisions have a big influence on performance and are difficult to change afterwards.
"Performant foundations simplify architecture": That claim surprised me at first. But I had to agree with the reasoning that caching is a frequent source of complexity. The more performant the foundation is, the less we need to rely on complex caching solutions.
"Performance isn’t just about hot spots": The distribution of execution time can differ a lot from one piece of software to another. In some systems, there are outliers, and optimizing those brings a huge gain. In the typechecker he wrote, time is divided relatively evenly. There, performance can be improved by a bunch of micro-optimizations.

Avoiding performance killers in advance can save you a ton of work later.

What?

As discussed above, performance improvements can come in two flavors:

Improving hot spots where even a small change can bring a big gain.
Micro-optimizations where the sum of them can have a huge impact.

In our interview, Will McGugan said that micro-optimizations can often cut a piece of code's running time in half. And he mentioned loops as the first place to look:

Inner loops, loops doing the most iterations and doing the most work are the usual suspects. And it tends to be a piece of code that does more than one thing.

So, what are those functions that you shouldn't call in loops?

I/O-bound: talking to some external service over a network: databases, APIs, etc.
CPU-bound: complex calculations
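As a minimal sketch of the I/O-bound case, here is a database lookup done once per loop iteration next to a version that fetches everything in a single query. The customers table, its contents, and the function names are made up for illustration. An in-memory SQLite database stands in for a real database server, where every query would also cost a network round trip.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Linus")],
    )
    return conn


def names_query_per_item(conn, customer_ids):
    """Performance killer: one query per loop iteration."""
    names = {}
    for customer_id in customer_ids:
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is not None:
            names[customer_id] = row[0]
    return names


def names_single_query(conn, customer_ids):
    """Same result with a single query, issued before any looping."""
    ids = list(customer_ids)
    if not ids:
        return {}
    # One "?" placeholder per id keeps the query parameterized.
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)
```

Both functions return the same dictionary. The second one pays the query overhead once instead of once per id. For very long id lists, the single query would have to be split into chunks, because databases limit the number of parameters per statement.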

Where can you find potential candidates for these types of functions?

Documentation

Library docs are great sources of ideas for optimization. Make sure to check out:

the reference docs of the functions you are using
the how-to guides related to your use case

They both can provide valuable insight about possible limitations and more performant alternatives.

A frequent pattern is that a library provides two ways for the same operation:

A convenient and not too performant way with a simple syntax, e.g. with a function call. This is great for getting started with the library and for one-off experimentation.
A more sophisticated way, e.g. using a context manager with performance optimizations and configuration options.

For example, with the httpx library, you can send requests in two ways:

With functions in the top-level API, like httpx.get(), httpx.post(), etc.
With an httpx.Client instance.

Regarding when to use which, the TLDR on the Advanced Usage docs says:

If you do anything more than experimentation, one-off scripts, or prototypes, then you should use a Client instance.

Unlike the top-level functions, the Client uses connection pooling, "which can bring significant performance improvements".
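Applied to a loop, this could look like the sketch below. The helper name fetch_status_codes and the example.org URLs are placeholders we picked for illustration. The helper only assumes an object that behaves like httpx.Client, and httpx is imported inside main() so that the helper itself has no third-party dependency.

```python
from typing import Iterable, List


def fetch_status_codes(client, urls: Iterable[str]) -> List[int]:
    """Send one GET per URL through a single, shared client.

    `client` is expected to behave like `httpx.Client`: it needs a
    `get(url)` method whose return value has a `status_code` attribute.
    """
    # Slower variant, for comparison: calling `httpx.get(url)` here would
    # establish a new connection on every iteration.
    return [client.get(url).status_code for url in urls]


def main() -> None:
    import httpx  # third-party dependency: pip install httpx

    # Placeholder URLs, chosen only for illustration.
    urls = [f"https://www.example.org/?page={n}" for n in range(3)]

    # The client is created once, outside the loop, and reused for every request.
    with httpx.Client() as client:
        print(fetch_status_codes(client, urls))
```

Calling main() sends real requests, so it needs httpx installed and network access. The point is the structure: the client is created once, outside the loop, and only the cheap get() calls happen inside it.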

Deprecations

A common reason for deprecation is speed. Again, consult the documentation.

Even if you decide that detecting and replacing all occurrences takes too much effort, it might be worth replacing the deprecated calls at least in performance hot spots and loops.
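One possible way to detect such calls at runtime, assuming the library announces its deprecations via Python's standard warnings module, is to record the DeprecationWarnings a piece of code triggers. In this sketch, old_api() and find_deprecated_calls() are hypothetical names used only for illustration.

```python
import warnings


def old_api():
    """Hypothetical stand-in for a deprecated library function."""
    warnings.warn(
        "old_api() is deprecated; use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


def find_deprecated_calls(func):
    """Run `func` and return the messages of the DeprecationWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every DeprecationWarning, even ones that are hidden by default.
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [
        str(warning.message)
        for warning in caught
        if issubclass(warning.category, DeprecationWarning)
    ]
```

Running this over the code path of a hot loop gives a short list of calls to look up in the docs first.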

Intuition

When Will was talking about the possible performance killers in loops (see above), he mentioned one more problem besides the usual suspects: "code that does more than one thing".

This is tricky to "formalize". It's difficult to give a reliable marker for recognizing code that is doing too much. (If you have a good code smell or rule of thumb, let us know.) Some argue that an "and" in a function's name should be suspicious.

But if you know the codebase well, your educated guesses might be spot on. As Will put it:

If you understand the code, you have an idea where things can be slow.

How?

Let's say you have an internal library, customer_management, and you know that the function customer_management.loyalty_points.calculate_balance() takes a lot of time. Now, you can create a rule to ensure that this function doesn't get called in a loop.

For this, you can use the Sourcery Rules Generator. You can install it with:

pip install sourcery-rules-generator

To create "expensive loop" rules, run the command:

sourcery-rules expensive-loop create

You'll be prompted to provide the fully qualified name of the expensive function. Here, you can enter:

customer_management.loyalty_points.calculate_balance


Two rules will be generated:

one for "for" loops
one for "while" loops

rules:
  - id: no-customer_management-loyalty_points-calculate_balance-for
    description: Don't call `customer_management.loyalty_points.calculate_balance()`
      in loops.
    pattern: |
      for ... in ... :
          ...
          customer_management.loyalty_points.calculate_balance(...)
          ...
    tags:
      - performance
      - no-customer_management-loyalty_points-calculate_balance-in-loops
  - id: no-customer_management-loyalty_points-calculate_balance-while
    description: Don't call `customer_management.loyalty_points.calculate_balance()`
      in loops.
    pattern: |
      while ... :
          ...
          customer_management.loyalty_points.calculate_balance(...)
          ...
    tags:
      - performance
      - no-customer_management-loyalty_points-calculate_balance-in-loops
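To illustrate the kind of code these rules are aimed at, here is a small sketch. Since customer_management is a made-up library, a stand-in object replaces the real import so that the example runs on its own. The rewards data and both function names are assumptions for illustration.

```python
from types import SimpleNamespace

calls = []  # records every call, so we can see how often the slow function runs


def _calculate_balance(customer_id):
    """Hypothetical stand-in for the slow library function."""
    calls.append(customer_id)
    return 100  # pretend this value took a long time to compute


# Stand-in for `import customer_management`, so the sketch is self-contained.
customer_management = SimpleNamespace(
    loyalty_points=SimpleNamespace(calculate_balance=_calculate_balance)
)


def affordable_rewards_in_loop(customer_id, rewards):
    """The shape the rules target: the expensive call sits in the loop body."""
    affordable = []
    for name, cost in rewards:
        balance = customer_management.loyalty_points.calculate_balance(customer_id)
        if cost <= balance:
            affordable.append(name)
    return affordable


def affordable_rewards_hoisted(customer_id, rewards):
    """Same result: the loop-invariant call runs once, before the loop."""
    balance = customer_management.loyalty_points.calculate_balance(customer_id)
    affordable = []
    for name, cost in rewards:
        if cost <= balance:
            affordable.append(name)
    return affordable
```

With three rewards, the first version calls calculate_balance() three times and the second only once. Hoisting works here because the balance doesn't depend on the loop variable. When it does, batching or caching are the usual alternatives.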

Conclusion

If an application has performance hot spots, it absolutely makes sense to focus on those. But often, there isn't an obvious culprit for the slowness. Even then, there's probably still a lot of room for improvement. Micro-optimizations can have a surprisingly big cumulative effect.

A good place to start is the ultimate multiplier: loops.

It's a good practice to review your code regularly with a performance focus. And it's also a good practice to set up some rules to avoid some costly structures in advance.

If you're looking to hunt down performance killers in your projects but aren't sure where to start, our team is happy to have a look with you.

Related Sources

Nelson Elhage: Reflections on Software Performance
interview with Will McGugan in our blog
Will McGugan: 7 things I've learned building a modern TUI framework