A Day On-Call in a DevOps Team

It is morning. You are at your workplace and your on-call shift has just begun. The chatbot helpfully announces this in the team channel, along with the time the shift ends and the name of the engineer who will take over from you.

You take a look at the monitoring dashboards. There are no ongoing incidents and no alarms firing. The application has served over 120,000 users in the last 24 hours. In that time, 0.09% of user interactions ended in failure, and over 6% of user engagement time was wasted waiting for stuff to load. That's above normal, but the issue was already dealt with yesterday. Apparently, queries to a shared MySQL database became slow while a table optimization was running. A few even timed out. Ironically, the maintenance was meant to make queries faster.