Fixing bugs and production issues - Sentinel is coming soon

How can you resolve production issues faster?

Date: Mar 06, 2025

[Image: Sentinel in action]

A couple of months ago, we were looking at our backlog of Sentry issues and wishing there was a way to work through them faster. We're a small team without much time or bandwidth to investigate every issue that crops up while supporting multiple products. We jump on high-severity issues when they appear, but even those pull significant resources away from current projects.

This got us thinking - what if we could have something automatically handle these issues for us?

And so we started building Sentinel - our AI on-call engineer to help deal with issues faster.

Fixing an issue is just as much about the investigation as the solution

When we first set out to help teams handle production issues, we focused on how we could fix each issue as quickly as possible. But as we thought about it more, we realised that most issues don't have a single, obvious fix. And sometimes the investigation yields as much value as the fix itself.

So we rethought our approach to Sentinel and asked how it could mimic the way we, as developers, think through bugs and issues:

  1. Figure out whether this is even an issue, or whether it's something that gets flagged but can't be fixed or addressed.
  2. Investigate the issue and come up with a few hypotheses about what's going on.
  3. Look at the evidence to see which hypothesis is most likely.
  4. Explore the best way(s) to fix the underlying issue, or decide whether a quick fix for the symptom is the right call in this case.
  5. Actually make the code change.

Sentinel is fixing itself (sometimes)

We now use Sentinel extensively in-house, and for some issues it can go all the way from identification to a PR almost immediately. Let's take a quick look at one of these issues:

[Image: Sentry stack trace]

We ran into a bug while expanding the information we pull from the Sentry API about each issue. Some Sentry issues didn't have a lastSeen field, which caused an error because we had made it a required field on our SentryIssue object.

It's a relatively straightforward error, but let's see how Sentinel handled it.

[Image: Sentinel summarizing an issue]

First, Sentinel gives us a high-level summary of the issue. It's pretty much spot on with the description I gave above.

[Image: Sentinel investigating an issue]

Now for the fun part.

Sentinel proposes a few hypotheses about what might be causing the issue. For each one, it gathers evidence and uses it to judge which hypothesis is most likely.

Again, this isn't a particularly gnarly issue, and the top hypothesis is pretty straightforward: the Sentry API returned an issue without a lastSeen field, but our SentryIssue object requires it to be a datetime value.

[Image: Sentinel proposing fixes]

Sentinel suggested four ways to handle a missing lastSeen field, but all four boiled down to two main approaches:

  1. Make the field optional
  2. Create a default value for the field

Since we don't actually need a lastSeen value on every issue, making it optional is the most straightforward option, and that's what Sentinel recommended.
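The post doesn't show our actual model code, so here is a minimal sketch of what the "make it optional" fix could look like, assuming a Python dataclass. `SentryIssue` is named after the object mentioned above, but its fields, the `StrictSentryIssue` class, the `from_api` constructor, and the `parse_timestamp` helper are all hypothetical. It also assumes the API payload carries `lastSeen` as an ISO 8601 string that may be missing or null.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


def parse_timestamp(value: str) -> datetime:
    # Hypothetical helper. ISO 8601 strings often end in "Z"; fromisoformat()
    # only accepts that suffix from Python 3.11, so normalise it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class StrictSentryIssue:
    # Hypothetical "before" shape: last_seen is required.
    id: str
    title: str
    last_seen: datetime

    @classmethod
    def from_api(cls, payload: dict[str, Any]) -> "StrictSentryIssue":
        # Raises KeyError when the payload has no "lastSeen" key.
        return cls(
            id=payload["id"],
            title=payload["title"],
            last_seen=parse_timestamp(payload["lastSeen"]),
        )


@dataclass
class SentryIssue:
    # Hypothetical "after" shape: last_seen is optional and defaults to None.
    id: str
    title: str
    last_seen: Optional[datetime] = None

    @classmethod
    def from_api(cls, payload: dict[str, Any]) -> "SentryIssue":
        # .get() tolerates a missing key; the truthiness check also covers null.
        raw = payload.get("lastSeen")
        return cls(
            id=payload["id"],
            title=payload["title"],
            last_seen=parse_timestamp(raw) if raw else None,
        )


issue = SentryIssue.from_api({"id": "1", "title": "Example issue"})
print(issue.last_seen)  # None
```

With the optional field, an issue without `lastSeen` simply ends up with `last_seen=None`, so any code reading it has to handle `None`. The default-value approach would instead fill in a placeholder timestamp, which avoids `None` checks but makes a made-up value look like real data.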

All we had to do was click the "Create pull request" button, and the fix was waiting for us in our GitHub repo, just a couple of minutes after the bug first popped up.

As I mentioned, this is a straightforward issue. For more complex bugs, we still have work to do before Sentinel can investigate and fix them consistently.

Still, we're starting to see Sentinel open PRs that fix real issues for us multiple times a day (hopefully that number goes down soon as we have fewer Sentry issues!).

Setting up Sentinel for your codebase

Today, Sentinel is in closed beta, but we're looking to expand access to more teams in the coming weeks.

If you'd like to get early access, you can sign up for our waitlist today. Or, if you'd like to see a bit more of what it looks like in action, feel free to set up a call with me to walk through it.

One caveat: we're starting with Sentry as the main error monitoring platform we support. If you're using another tool, you can still sign up for the waitlist, but it might be a few weeks before we can roll it out to you.