Advice for Someone Moving From SRE to Backend Engineering

Someone recently posted on Reddit asking for advice about moving from Site Reliability Engineering to backend engineering. I started writing a reply, it got long, and so I turned it into a blog post.

In the post, the OP gives two reasons for wanting to make the move. One is a concern that they may be losing their development skills because they spend so much time writing scripts and automating. The other is that they’re having trouble adjusting to on-call life.

Incident Management in 2021: From Basics to Best Practices

Covering the Basics

What Is Incident Management?

Incident management is the process that development and IT operations teams use to respond to system failures (incidents) and restore normal service operation as quickly as possible.

What Is an Incident?

"Incident" is a broad term for any event that causes either a complete disruption of a given service or a decrease in its quality. Incidents usually require an immediate response from the development or operations team, often referred to in incident management as the on-call or response team.
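
To make the definition concrete, here is a minimal Python sketch of how a health check result might be sorted into the two kinds of incident described above: a complete disruption or a drop in quality. Every name in it (`Impact`, `ServiceCheck`, `classify`) and the 1% error-rate threshold are assumptions made up for illustration, not part of any particular incident management tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Impact(Enum):
    """The two kinds of incident in the definition above (hypothetical names)."""
    OUTAGE = "complete disruption"
    DEGRADATION = "decreased quality"


@dataclass
class ServiceCheck:
    """One health check result for a service (illustrative fields only)."""
    service: str
    available: bool    # did the service respond at all?
    error_rate: float  # fraction of failed requests, from 0.0 to 1.0


def classify(check: ServiceCheck, max_error_rate: float = 0.01) -> Optional[Impact]:
    """Return the kind of incident this check indicates, or None if healthy.

    The 1% default threshold is an arbitrary assumption; a real team would
    derive it from its own service level objectives.
    """
    if not check.available:
        return Impact.OUTAGE
    if check.error_rate > max_error_rate:
        return Impact.DEGRADATION
    return None


if __name__ == "__main__":
    checks = [
        ServiceCheck("checkout", available=False, error_rate=1.0),
        ServiceCheck("search", available=True, error_rate=0.08),
        ServiceCheck("login", available=True, error_rate=0.001),
    ]
    for check in checks:
        impact = classify(check)
        status = impact.value if impact else "healthy"
        print(f"{check.service}: {status}")
```

In practice, anything other than `None` here is what would page the on-call or response team.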