Lesson 2

Defining “Good”: Building Quality Measures Teams Can Run

The New Operating System for Customer Conversations

Core Question

How do teams define quality in a way that is consistent, explainable, and coachable?

Quality only becomes operational when “good” is defined in terms of observable conversation behaviors, tied to customer outcomes, and measured with evidence. The goal is not a perfect scorecard but a stable, shared standard that supervisors can coach to, QA can apply consistently, and leaders can trust at scale.

Many teams believe they have defined quality because they have a scorecard. In practice, the scorecard is often a mix of reasonable intentions and unclear measurements. It produces numbers, but it does not consistently produce agreement, coaching, or improvement.

Defining “good” is not a documentation exercise. It is an operating decision. If the definition is vague, quality becomes subjective. If it is too rigid, teams optimize for compliance with the form rather than outcomes in the conversation. If it cannot be explained with evidence, it will not hold up under scrutiny.

The goal is a definition of quality that teams can run every day.

Quality Is Not a Score; It Is a Standard

A score is an output. A standard is what the organization agrees to measure and improve. Without a shared standard, quality programs drift into personal preference and local norms.

A workable standard has three properties.

  • It can be observed in a conversation
  • It can be explained with evidence
  • It can be coached without ambiguity

If any of these are missing, teams will argue about interpretation, agents will distrust feedback, and coaching will turn into opinion.

A quality program that scales is a program that creates alignment.

What “Good” Should Be Made Of

When teams struggle to define quality, it is often because they mix different kinds of “good” in the same category. The result is confusion: agents receive feedback they cannot act on, and managers cannot separate skill issues from process issues.

A practical definition of “good” includes three layers.

Customer outcome

Did the customer get what they needed, clearly and correctly?

This is the highest-level measure, but it is not enough on its own. Outcomes often depend on factors outside the agent’s control. Still, outcomes matter because they anchor the purpose of the work.

Conversation behaviors

What did the agent do in the conversation that increased or reduced the chance of a good outcome?

Behaviors are where coaching lives. Behaviors are also where consistency is possible because they can be observed and evidenced.

Policy and process adherence

Did the agent follow required steps and deliver required disclosures?

This layer matters, but it should not dominate the definition of quality. Many programs overweight adherence because it is easy to score. When that happens, teams optimize for checklists rather than effective communication.

A stable quality model keeps these layers separate so the organization can see what is really happening.
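
For teams that keep evaluations in a structured form, this separation can be made concrete in the data itself. The sketch below, in Python, is one hypothetical way to record the three layers as distinct fields rather than one blended score; the field names and scoring shape are illustrative assumptions, not a prescribed schema.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Evaluation:
    """One evaluated interaction, with the three layers kept separate.

    Field names and scales are illustrative assumptions, not a standard.
    """
    interaction_id: str
    # Layer 1: customer outcome (may depend on factors outside the agent's control)
    outcome_achieved: Optional[bool] = None
    # Layer 2: conversation behaviors observed in the transcript (where coaching lives)
    behaviors_observed: dict[str, bool] = field(default_factory=dict)
    # Layer 3: required steps and disclosures, scored separately as adherence
    adherence_items: dict[str, bool] = field(default_factory=dict)

example = Evaluation(
    interaction_id="call-0412",
    outcome_achieved=True,
    behaviors_observed={
        "stated_purpose_in_first_minute": True,
        "confirmed_customer_request": True,
        "summarized_next_steps": False,
    },
    adherence_items={"recording_disclosure": True},
)

Keeping the layers in separate fields means a low outcome with strong behaviors can be read as a process or system problem rather than an agent problem.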

Make “Good” Observable

The quickest way to ruin a quality definition is to use words that sound right but cannot be scored consistently.

Common examples:

  • “Professional”
  • “Confident”
  • “Helpful”
  • “Empathetic”
  • “Clear”

These are not wrong. They are incomplete. They describe a feeling, not an observable behavior.

To make quality operational, definitions need observable evidence. The simplest method is to convert abstract concepts into measurable behaviors.

From abstract to observable

Instead of “clear communication,” define “clear” as behaviors like:

  • States the purpose of the call within the first minute
  • Confirms understanding of the customer’s request
  • Uses plain language rather than internal terms
  • Summarizes the resolution and next steps before closing

Instead of “empathy,” define behaviors like:

  • Acknowledges the customer’s situation in one sentence
  • Uses confirming language before giving instructions
  • Avoids blame language when describing limitations

The point is not to reduce conversations to scripts. The point is to give supervisors and agents a shared language for what “good” looks like.

If “good” cannot be pointed to in the transcript, it will not scale.
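
One hypothetical way to hold this translation is to store each abstract label alongside the observable behaviors that stand in for it, so reviewers score behaviors rather than labels. In the sketch below, the behavior descriptions echo the examples above, but the structure and the reviewable check are assumptions for illustration.

# Each abstract label maps to behaviors a reviewer can point to in a transcript.
OBSERVABLE_DEFINITIONS = {
    "clear": [
        "States the purpose of the call within the first minute",
        "Confirms understanding of the customer's request",
        "Uses plain language rather than internal terms",
        "Summarizes the resolution and next steps before closing",
    ],
    "empathetic": [
        "Acknowledges the customer's situation in one sentence",
        "Uses confirming language before giving instructions",
        "Avoids blame language when describing limitations",
    ],
}

def reviewable(label: str) -> bool:
    """A label is scoreable only if it resolves to at least one observable behavior."""
    return bool(OBSERVABLE_DEFINITIONS.get(label))

assert reviewable("clear")
assert not reviewable("professional")  # still abstract: no observable definition yet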

Tie Measures to Coaching, Not Just Scoring

Quality measures should exist primarily to drive coaching and improvement. If a measure cannot produce a clear coaching conversation, it is usually the wrong measure.

A coaching-ready measure has these characteristics:

  • It describes a behavior the agent can change
  • It is specific enough to be actionable
  • It can be demonstrated with evidence from the interaction
  • It aligns with a real outcome or risk

This is also how you avoid “vanity quality.” Vanity measures create a sense of control while producing little improvement.

A simple test works well.

If a supervisor cannot explain the score using one or two short pieces of evidence, the measure is not ready.
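
The test can be applied mechanically before a measure goes live. The sketch below is one hypothetical check: a scored measure is accepted only if it carries one or two short pieces of evidence from the interaction. The class and field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ScoredMeasure:
    """A single scored measure with its supporting evidence (illustrative shape)."""
    name: str
    score: int           # e.g. 0/1 or a small scale
    evidence: list[str]  # short quotes or timestamps from the interaction

def coaching_ready(measure: ScoredMeasure, max_items: int = 2) -> bool:
    """Apply the simple test: the score must be explainable with 1-2 pieces of evidence."""
    return 1 <= len(measure.evidence) <= max_items

ok = ScoredMeasure(
    name="summarized_next_steps",
    score=0,
    evidence=["Call ends at 06:40 with no recap of the replacement timeline."],
)
not_ready = ScoredMeasure(name="professionalism", score=1, evidence=[])

assert coaching_ready(ok)
assert not coaching_ready(not_ready)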

Avoid Mixing Measurement Types

Many scorecards fail because they combine incompatible kinds of evaluation in the same category. A common example is a single “Communication” category that includes:

  • tone and politeness
  • accuracy of information
  • completeness of process steps
  • resolution quality

When these are mixed, teams cannot tell whether an agent needs training, whether the workflow is broken, or whether the knowledge base is wrong.

Good measurement design separates:

  • Skill problems
  • Knowledge problems
  • Process problems
  • Policy problems

This separation matters operationally. Skill coaching looks different from process repair. Knowledge fixes look different from compliance remediation. When categories are blended, everything becomes “agent performance,” and the organization misses systemic causes.

If quality is meant to improve the operation, it must be able to distinguish human performance from system design.
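
One hypothetical way to keep that separation operational is to tag every finding with the kind of problem it represents, so skill findings route to coaching while knowledge, process, and policy findings route to the teams that own those fixes. The categories below mirror the list above; the routing targets are illustrative assumptions.

from enum import Enum

class IssueType(Enum):
    SKILL = "skill"          # agent behavior that coaching can change
    KNOWLEDGE = "knowledge"  # wrong or missing information
    PROCESS = "process"      # broken or missing workflow step
    POLICY = "policy"        # compliance gap

# Illustrative routing: each issue type has a different owner and a different fix.
REMEDIATION_OWNER = {
    IssueType.SKILL: "supervisor coaching",
    IssueType.KNOWLEDGE: "knowledge base team",
    IssueType.PROCESS: "operations / workflow owner",
    IssueType.POLICY: "compliance remediation",
}

def route(finding_type: IssueType) -> str:
    """Send a finding to the owner who can actually fix it."""
    return REMEDIATION_OWNER[finding_type]

assert route(IssueType.PROCESS) == "operations / workflow owner"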

Set the Bar for Consistency, Not Perfection

A quality model does not need to capture everything. It needs to capture what matters most and do so consistently.

At scale, consistency beats completeness.

A practical approach is to define:

  • A small set of core measures that apply broadly
  • A limited number of call-type measures that apply only when relevant
  • A small number of non-negotiables for compliance or safety

This keeps the standard stable while allowing reasonable variation.
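
As a sketch of what that structure might look like, the configuration below keeps a small core set, call-type measures that apply only when relevant, and a short list of non-negotiables. The measure names and call types are hypothetical examples, not a recommended scorecard.

# Hypothetical scorecard configuration: small, stable core plus scoped additions.
SCORECARD = {
    "core": [                      # applies to every interaction
        "confirmed_customer_request",
        "plain_language",
        "summarized_next_steps",
    ],
    "by_call_type": {              # applies only when the call type is relevant
        "billing_dispute": ["explained_adjustment_amount"],
        "cancellation": ["offered_retention_options_per_policy"],
    },
    "non_negotiables": [           # compliance or safety items, never optional
        "identity_verification",
        "recording_disclosure",
    ],
}

def measures_for(call_type: str) -> list[str]:
    """Assemble the measures that apply to one interaction."""
    return (
        SCORECARD["core"]
        + SCORECARD["by_call_type"].get(call_type, [])
        + SCORECARD["non_negotiables"]
    )

print(measures_for("billing_dispute"))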

A scorecard that is too large becomes un-runnable. A scorecard that changes constantly becomes untrustworthy. A scorecard that is stable becomes an operating system component.

What This Enables Next

Once “good” is defined as observable behavior, you can measure it more consistently. Once it is measurable, you can attach evidence. Once evidence exists, you can build trust.

That trust is what allows quality to become less subjective, compliance to become more defensible, and coaching to become faster and more effective.

The next lesson focuses on the evidence requirement directly: why scores without context fail, and what makes evaluations trustworthy enough to run at scale.

In Practice

  • Teams often say they have a quality model, but reviewers still disagree because definitions are not observable.
  • Scorecards frequently mix outcomes, behaviors, and compliance steps, which makes coaching confusing and inconsistent.
  • Abstract labels like “professional” or “clear” produce debate unless they are defined as specific behaviors.
  • Measures that cannot be explained with evidence do not earn trust from agents or supervisors.
  • Quality programs improve faster when categories separate skill issues from process, policy, and knowledge issues.
