Michael Spradlin

Baseline Neglect

June 27, 2023

A baseline is a general standard or reference point that needs constant inspection and monitoring.

In business or systems management, neglecting baselines can lead to poor or flawed analysis, misinformation, bad decision-making, and a lack of understanding of system or operational performance.

Properly managed baselines provide a clear understanding of "normal" or expected conditions. When a neglected baseline hits a team/organization, it hurts.

So, why does neglect happen? Usually it's a case of:

Systems that run well go unnoticed. Companies are filled with busy people, and generally people think mostly about their list of things to be done. Over time, the work of maintaining and supporting the systems that generate baselines - "keeping the lights on" - can drift to the background of what a team or organization does, until that work pops right back to the forefront.

When an important baseline experiences a change in trend, up or down, attention inevitably shifts to that baseline. Thinking through what the right baseline(s) are for your team and how to appropriately staff them - support, maintain, develop - helps everyone.

Is this right?

When a baseline trend changes, there's usually an immediate response: “is this right?”

If that question isn't answered quickly and confidently, it's a blinking red indicator light of baseline neglect.

“Is this right?” has many forms:

In the lifespan of a baseline, it's likely that it will deviate, spike, plunge, blip, convulse, and eventually stabilize. Someone has to explain with confidence what happened and what will probably happen next.

Systems should enable workflows and processes, and the baselines those workflows and processes generate should enable high-bandwidth conversations. Good work on those baselines keeps discussions centered on business outcomes. Beyond healthier discussion, baselines are essential to planning anything new. You build on baselines.

Shifting from neglect to care and attention.

There's an important determination before you decide how to give the right care and attention to something:

Simple steps can shift a team toward a regular understanding of a baseline.

Define the baseline to monitor: Orient around an indicator of good performance toward a goal you want to achieve. In a revenue team, it could be dollars, units, etc. In a software engineering team it could be error rate, latency, etc.

Assign an owner: Assign a team or person to directly own monitoring the baseline.

Establish monitoring systems: This one’s obvious, but just…track and record your baseline metrics.

Here is a fork: do you need to understand the baseline over time, or do you need an inspection forum? Charts and time series are easy enough to spin up, and early-career team members love to create them, but I've often found that a regular inspection mechanism does more to prevent issues over time. One example is a zero dash: a dashboard where you only want to see zeroes across the board.
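To make the zero dash idea concrete, here is a minimal sketch in Python. The `Check` type, the `nonzero_checks` helper, and the check names are all hypothetical, made up for illustration; in practice each count would come from a query against your own systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    name: str
    count: int  # number of problems found; zero means healthy


def nonzero_checks(checks: list[Check]) -> list[Check]:
    """Return only the checks that need attention (count is not zero)."""
    return [c for c in checks if c.count != 0]


# Hypothetical checks, for illustration only.
checks = [
    Check("orders_missing_customer_id", 0),
    Check("invoices_with_negative_total", 3),
    Check("unprocessed_events_older_than_1h", 0),
]

# On a healthy day this loop prints nothing.
for check in nonzero_checks(checks):
    print(f"{check.name}: {check.count}")
```

The appeal of this shape is that the inspection itself is trivial: anything that shows up is, by definition, something to look at.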

Build Real-Time Alerts: Setting up real-time alerts forces a team to explicitly set deviation thresholds, which sets the bar for immediate attention and action. For example, you can configure alerts to trigger when an error rate surpasses a predefined value, or when an unexpected data configuration occurs.
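As a rough sketch of the error-rate example, assuming a threshold of 2% (an arbitrary value picked for illustration, as are the function names), the alert condition can be as small as this:

```python
# Assumed threshold for illustration: alert above a 2% error rate.
ERROR_RATE_THRESHOLD = 0.02


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def should_alert(rate: float, threshold: float = ERROR_RATE_THRESHOLD) -> bool:
    """True when the rate strictly exceeds the agreed threshold."""
    return rate > threshold


# Hypothetical counts for one monitoring window.
rate = error_rate(errors=30, requests=1000)
if should_alert(rate):
    print(f"ALERT: error rate {rate:.1%} exceeds {ERROR_RATE_THRESHOLD:.1%}")
```

The code is the easy part. The value is in the conversation it forces: someone has to write down a number for `ERROR_RATE_THRESHOLD` and defend it.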

Visualize: Create a glanceable, quick overview of the system's health and performance trends.

Examine changes in trends over common calendar periods: Zoom out and take a long look at historical patterns and trends.
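One way to do this, sketched below with only the Python standard library, is to roll daily values up into ISO calendar weeks and compare each week with the one before it. The function names and the sample data are hypothetical, and the weekly grain is an assumption; months or quarters would follow the same pattern.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily: dict[date, float]) -> dict[tuple[int, int], float]:
    """Sum daily values into (ISO year, ISO week) buckets."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for day, value in daily.items():
        iso = day.isocalendar()
        totals[(iso[0], iso[1])] += value
    return dict(totals)


def week_over_week_change(
    totals: dict[tuple[int, int], float],
) -> dict[tuple[int, int], float]:
    """Fractional change of each week versus the previous week present in the data.

    Weeks with no data are not filled in, and a previous total of zero is skipped.
    """
    weeks = sorted(totals)
    changes = {}
    for prev, cur in zip(weeks, weeks[1:]):
        if totals[prev]:
            changes[cur] = (totals[cur] - totals[prev]) / totals[prev]
    return changes


# Hypothetical daily values spanning two calendar weeks.
daily = {
    date(2023, 6, 19): 100,
    date(2023, 6, 20): 100,
    date(2023, 6, 26): 150,
    date(2023, 6, 27): 100,
}

for week, change in week_over_week_change(weekly_totals(daily)).items():
    print(f"{week}: {change:+.0%}")
```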

Raise the water line: Conduct root cause analysis for any meaningful deviations or anomalies and document/assign the follow-ups to one person. Don't have the same error twice/thrice.

Document: Document monitoring processes and any/all changes made over time. Pay it forward to your future self. It’s difficult to get to a shared understanding of metrics, goals, and actions, so ensure everything is documented and highlight every documentation gap.