How can you resolve production issues faster?
Mar 06, 2025
A couple of months ago, we were looking at our backlog of Sentry issues and wishing there was a way to work through them faster. We're a small team without much time or bandwidth to investigate all the issues that crop up while supporting multiple products. We jump on high-severity issues when they appear, but even those pull significant resources away from current projects.
This got us thinking - what if we could have something automatically handle these issues for us?
And so we started building Sentinel - our AI on-call engineer to help deal with issues faster.
When we first set out to help teams handle production issues, we were focused on how we could fix each issue as quickly as possible. But as we thought about it more, we realised that most issues don't have a single, obvious fix, and sometimes the investigation yields as much value as the fix itself.
So we rethought our approach to Sentinel, modelling it on how we think about bugs and issues as developers:
We're now using Sentinel extensively in-house, and it can take some issues all the way from identification to PR - pretty much immediately. Let's take a quick look at one of these issues:
We ran into a bug when we were expanding what info we grabbed from the Sentry API about an issue. Sometimes a Sentry issue didn't have a Last Seen field, and this was causing an error because we had it as a required field on our SentryIssue object.
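To make the failure concrete, here is a minimal Python sketch of how a required timestamp field can break when the API omits it. Our actual model isn't shown in this post, so everything here is an assumption for illustration: the SentryIssue dataclass, its id, title and last_seen fields, and the parse_issue helper are all hypothetical, and the payload is made-up data shaped loosely like a Sentry issue.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SentryIssue:
    # Hypothetical model: field names are illustrative, not our real schema.
    id: str
    title: str
    last_seen: datetime  # required: every issue must have a value


def parse_issue(payload: dict) -> SentryIssue:
    # Indexing with [] raises KeyError when "lastSeen" is absent,
    # the kind of failure described above.
    raw = payload["lastSeen"]
    # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
    return SentryIssue(
        id=payload["id"],
        title=payload["title"],
        last_seen=datetime.fromisoformat(raw.replace("Z", "+00:00")),
    )


# Hypothetical payload for an issue that has no lastSeen field.
payload = {"id": "123", "title": "ValueError in worker"}
try:
    parse_issue(payload)
except KeyError as exc:
    print(f"missing required field: {exc}")
```

Issues that do include lastSeen parse fine, which is why a bug like this only shows up intermittently.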
It's a relatively straightforward error, but let's see how Sentinel handled it.
First, Sentinel gives us a high-level summary of the issue. It's pretty much spot on with the description I gave earlier.
Now for the more fun side of things.
Sentinel proposes a couple of different hypotheses about what might be causing this issue. For each one, it looks for different pieces of evidence and uses those to evaluate which hypothesis is the most likely.
Again, this isn't a particularly gnarly issue, and the top hypothesis is pretty straightforward - a Sentry issue returned by the Sentry API is missing its lastSeen field, but we require it to be a datetime value.
Sentinel suggested four different ways for us to improve how we handle cases where the lastSeen field is missing, but all four boiled down to two main approaches:
Since we don't actually need a lastSeen value on every issue, making it optional is the most straightforward option, and that's what Sentinel suggested.
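As a rough illustration of what "making it optional" means, here is a Python sketch using a stdlib dataclass. This is not the PR Sentinel opened; the SentryIssue model, its fields, and the parse_timestamp and parse_issue helpers are hypothetical, and we assume lastSeen arrives as an ISO 8601 string when it is present.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SentryIssue:
    # Hypothetical model: field names are illustrative, not our real schema.
    id: str
    title: str
    last_seen: Optional[datetime] = None  # optional: the API may omit it


def parse_timestamp(raw: Optional[str]) -> Optional[datetime]:
    # Convert an ISO 8601 string to a datetime; pass None through unchanged.
    if raw is None:
        return None
    # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def parse_issue(payload: dict) -> SentryIssue:
    # .get() returns None instead of raising KeyError when the key is absent.
    return SentryIssue(
        id=payload["id"],
        title=payload["title"],
        last_seen=parse_timestamp(payload.get("lastSeen")),
    )


# Hypothetical payloads: one with lastSeen, one without.
with_ts = parse_issue({"id": "1", "title": "Timeout", "lastSeen": "2025-03-06T10:00:00Z"})
without_ts = parse_issue({"id": "2", "title": "ValueError in worker"})
print(with_ts.last_seen, without_ts.last_seen)
```

The trade-off is that any code reading last_seen now has to handle None, which is fine when, as here, not every issue needs the value.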
All we had to do was click the "Create pull request" button, and the fix was waiting for us in our GitHub repo - just a couple of minutes after the bug first popped up.
As I said earlier, this is a straightforward issue. For more complex bugs, we still have work to do before Sentinel can investigate and fix them consistently.
But we're starting to see Sentinel open PRs that fix real issues for us multiple times a day (hopefully that number goes down soon as we have fewer Sentry issues!).
Today, Sentinel is in closed beta, but we're looking to expand access to more teams in the coming weeks.
If you'd like to get early access, you can sign up for our waitlist today. Or, if you'd like to see a bit more of what it looks like in action, feel free to set up a call with me to walk through it.
One caveat - we're starting with Sentry as the main error monitoring platform we support. If you're using another tool, you can still sign up for the waitlist, but it might be a few weeks before we're able to roll Sentinel out to you.