Most contact centre quality programmes I've reviewed look the same: a QA analyst pulls four calls per agent per month, scores them against a 40-line spreadsheet, drops a calibration meeting in everyone's diary, and the same agents make the same mistakes the next month. The scorecard hasn't been updated since 2019. Nobody's looked at the payment portion of the call in detail because PCI sits with a different team. The CSAT number ticks along, the QA number ticks along, and nothing actually changes.
That's not call quality monitoring. That's call quality theatre.
We work with contact centres that take card payments over the phone, so we see QA programmes from a particular angle. The same calls that affect customer satisfaction also touch card data, and the same gaps that let an agent fumble a refund script let them mishandle a CVV. In 2026, with PCI DSS v4.0.1 in full effect and AI-assisted QA finally usable, it's worth rebuilding the QA framework from scratch. Here's how we'd do it.
What call quality monitoring is actually for
QA exists to do three things, in order of importance:
- Catch behaviour that puts the customer or the company at risk before it becomes a complaint, a fine, or a regulator letter.
- Give agents specific, repeatable feedback they can act on this week.
- Feed product, process, and training teams with patterns they can fix at the root.
If your QA programme can't show you did any of those three in the last quarter, it's a reporting exercise wearing a coaching badge. Everything below is built around making it do the work.
The 2026 framework
The framework has five parts: sampling, scoring, coaching, calibration, and evidence. Each one has changed since the last time most QA leads rebuilt theirs.
1. Sampling — stop pulling four random calls per agent
Random sampling at four-per-agent-per-month is a relic. Statistical noise dominates. An agent with a real coaching need can look fine for two months because you happened to score the easy calls. Three approaches work better in 2026, and you should run them together:
- Risk-weighted sampling. Every call that took a card payment, every call where a vulnerable-customer flag fired, every call over a duration threshold, every transferred call. These are the calls where something can actually go wrong. Pull them at a higher rate than baseline.
- Speech-analytics triggered sampling. Agents using the word "definitely" near a compliance script, customers raising voices, silences over 45 seconds, repeat callers within 48 hours. Modern speech tools flag these accurately enough that human QA can spend its time on actual judgement calls.
- CSAT-anchored sampling. Every call that scored 1 or 2 on a post-call survey, every call where the customer asked to escalate. These cost you the most when they're handled badly.
You'll still want a small random sample to avoid blind spots in the triggers, but it should be a fifth of your QA volume, not all of it.
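To make the mix concrete, here is a minimal sketch of how the three trigger types and the one-fifth random slice could be combined into one monthly pull. The `Call` fields, the 900-second duration threshold, and the `build_sample` function are all assumptions for illustration; map them onto whatever your telephony and speech-analytics platforms actually export.

```python
import random
from dataclasses import dataclass
from typing import Optional

LONG_CALL_SECONDS = 900  # assumed duration threshold; set your own


@dataclass
class Call:
    call_id: str
    took_payment: bool = False
    vulnerable_flag: bool = False
    transferred: bool = False
    duration_seconds: int = 0
    analytics_trigger: bool = False   # e.g. long silence, raised voice, repeat caller
    csat: Optional[int] = None        # post-call survey score, None if no response
    escalation_requested: bool = False


def is_triggered(call: Call) -> bool:
    """True if any risk, speech-analytics, or CSAT trigger fired on the call."""
    risk = (call.took_payment or call.vulnerable_flag or call.transferred
            or call.duration_seconds > LONG_CALL_SECONDS)
    low_csat = call.csat is not None and call.csat <= 2
    return risk or call.analytics_trigger or low_csat or call.escalation_requested


def build_sample(calls, budget, random_share=0.2, seed=None):
    """Pick `budget` calls: triggered calls first, then a random slice of the rest."""
    rng = random.Random(seed)
    random_quota = round(budget * random_share)
    triggered = [c for c in calls if is_triggered(c)]
    picked = rng.sample(triggered, min(budget - random_quota, len(triggered)))
    picked_ids = {c.call_id for c in picked}
    # The random slice is drawn from everything not already picked,
    # so it can still catch calls that no trigger would have flagged.
    remaining = [c for c in calls if c.call_id not in picked_ids]
    picked += rng.sample(remaining, min(budget - len(picked), len(remaining)))
    return picked
```

With a budget of 20 reviews, this reserves 4 for the random slice and fills the other 16 from triggered calls. If fewer than 16 calls triggered, the random draw makes up the difference.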
2. Scoring — fewer lines, weighted by harm
If your scorecard has more than 25 items, agents don't remember it and analysts don't apply it consistently. Cut it. Group the remaining items into three weighted buckets:
- Compliance and risk (50% of the score). Identity verification done correctly. Card data handled in line with PCI. Vulnerable-customer policy followed. Disclosures read. Recording paused at the right moments — or, if you've moved to DTMF masking, the masking flow triggered cleanly.
- Resolution and accuracy (30%). Did the customer get the right answer? Was the case noted correctly? Were the follow-up actions actually completed?
- Manner and empathy (20%). Tone, listening, acknowledgement. Important, but you can train these. Compliance and accuracy errors are the ones that bite.
Weighting matters because an agent who's lovely but misquotes a refund policy is doing more harm than an agent who's flat but accurate. The score should reflect that.
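A short sketch of the arithmetic, assuming each scorecard item is a simple pass or fail and each bucket is scored as its pass rate. The bucket names and the `weighted_score` function are illustrative, not taken from any particular QA tool.

```python
# Bucket weights from the scorecard design above.
BUCKET_WEIGHTS = {"compliance": 0.5, "resolution": 0.3, "manner": 0.2}


def weighted_score(results):
    """results maps bucket name -> list of booleans (True = item passed).

    Returns a score from 0 to 100, rounded to one decimal place.
    """
    total = 0.0
    for bucket, weight in BUCKET_WEIGHTS.items():
        items = results.get(bucket, [])
        if not items:
            raise ValueError(f"no scored items in bucket: {bucket}")
        total += weight * sum(items) / len(items)
    return round(100 * total, 1)


# Pleasant agent who misquotes the refund policy: one resolution item failed.
lovely_but_wrong = {
    "compliance": [True, True, True],
    "resolution": [False, True, True],
    "manner": [True, True, True],
}

# Flat but accurate agent: one manner item failed.
flat_but_accurate = {
    "compliance": [True, True, True],
    "resolution": [True, True, True],
    "manner": [True, True, False],
}

print(weighted_score(lovely_but_wrong))   # 90.0
print(weighted_score(flat_but_accurate))  # 93.3
```

With three items per bucket, a single accuracy failure costs 10 points and a single manner failure costs about 6.7, so the flat but accurate agent scores higher.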
3. Coaching — within seven days or don't bother
The single biggest predictor of whether QA changes agent behaviour isn't the scorecard or the calibration session. It's the gap between the call happening and the agent hearing about it. Beyond seven days, the agent can't remember what they were thinking, and the coaching turns into "you should have done X" instead of "why did you do Y".
Make coaching weekly, short, and specific. Two calls, fifteen minutes, one behaviour to change. Track the behaviour, not the score, in the next sample. If the score went up but the behaviour didn't change, the score is lying to you.
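The seven-day rule is easy to enforce when sessions are being prepared. A minimal sketch, assuming scored calls arrive as `(call_id, call_date)` pairs; the function names are hypothetical.

```python
from datetime import date

COACHING_WINDOW_DAYS = 7


def is_coachable(call_date: date, session_date: date) -> bool:
    """True if the session is no more than seven days after the call."""
    return 0 <= (session_date - call_date).days <= COACHING_WINDOW_DAYS


def pick_calls_for_session(scored_calls, session_date, limit=2):
    """Return up to `limit` call ids inside the window, most recent first."""
    fresh = [c for c in scored_calls if is_coachable(c[1], session_date)]
    fresh.sort(key=lambda c: c[1], reverse=True)
    return [call_id for call_id, _ in fresh[:limit]]
```

Calls that fall outside the window are simply left out of the session rather than coached late.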
4. Calibration — agents in the room
QA analysts calibrate with each other regularly enough. They rarely calibrate with the agents being scored. That's where most of the disagreement actually lives.
Run a monthly calibration where two analysts and three agents score the same five calls blind, then talk through the differences. You'll find scorecard items that read one way to QA and another way to the floor. Fix the wording. The act of involving agents in the standard is half of why they accept it.
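One way to find the items that "read one way to QA and another way to the floor" is to compare pass rates per scorecard item between the two groups after the blind scoring. This is a sketch under assumed data shapes; the nested dictionary layout and the `item_gaps` name are illustrative.

```python
from statistics import mean


def item_gaps(scores, roles):
    """Compare how analysts and agents marked each scorecard item.

    scores: {rater: {call_id: {item: 0 or 1}}}
    roles:  {rater: "analyst" or "agent"}
    Returns {item: analyst pass rate minus agent pass rate}.
    """
    by_role = {"analyst": {}, "agent": {}}
    for rater, calls in scores.items():
        for marks in calls.values():
            for item, mark in marks.items():
                by_role[roles[rater]].setdefault(item, []).append(mark)
    return {
        item: mean(by_role["analyst"][item]) - mean(by_role["agent"][item])
        for item in by_role["analyst"]
        if item in by_role["agent"]
    }
```

Items with a gap close to zero are being read the same way by both groups. Items with a large gap in either direction are the wording candidates to bring to the discussion.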
5. Evidence — for PCI, for audits, and for yourself
This is the part most QA programmes ignore until an audit shows up. Under PCI DSS v4.0.1, the controls around how card data is handled during a call need evidence — not just "we have a policy" but "here's a sample of recent calls and the QA findings against the policy". If your QA programme doesn't sample payment calls specifically, the evidence won't be there when an assessor asks.
The simplest fix: tag every call that took a payment in your QA system, sample them at a higher rate than baseline, and keep the QA findings in a format you can export to an auditor without rebuilding the data. Future-you will thank you.
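As a rough illustration of "a format you can export", the sketch below filters QA findings down to tagged payment calls and writes them as CSV. The field names are assumptions; what an assessor actually asks for will vary, and the findings themselves should never contain card data.

```python
import csv
import io

# Assumed column set; adjust to what your QA system records.
FIELDS = ["call_id", "call_date", "took_payment", "item", "result", "analyst"]


def export_payment_findings(findings):
    """Return CSV text containing only the findings from payment calls.

    findings: iterable of dicts keyed by FIELDS (extra keys are ignored).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in findings:
        if row["took_payment"]:
            writer.writerow(row)
    return buffer.getvalue()
```

Because the payment tag is stored on each finding, the export is a filter over existing data rather than a reconstruction job before the audit.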
Where AI helps in 2026, and where it doesn't
AI-assisted QA is finally useful, three years after every vendor started claiming it was. Speech-to-text accuracy on contact centre audio is now in the high 90s for clean lines, low 90s for noisy ones. Sentiment analysis is good enough to flag escalation risk but not good enough to score empathy. Automated scoring on objective items — script adherence, disclosure presence, hold-time policy — works well. Automated scoring on subjective items — tone, judgement, problem-solving — doesn't, and probably won't for a while.
The right split: let AI do the first pass on every call for the objective items, surface the calls that need human review, and let your QA analysts spend their time on the bits machines are bad at. We've seen QA teams double their effective coverage this way without adding headcount. The risk is letting the AI score everything and assuming the score is true; the items it's bad at are the items that matter most for coaching.
What we wouldn't trust AI to do yet
- Score empathy or emotional intelligence on a granular scale.
- Make agent-performance management decisions on its own.
- Replace human judgement on complex compliance calls — particularly anything involving vulnerable customers.
- Decide whether a payment call followed PCI requirements. The audio is too easy to misread, and the consequence of getting it wrong is too high.
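Put together, the split and the exclusions above amount to a routing rule. The sketch below is one hypothetical way to write it down: the item names, the dictionary keys, and the `route` function are all assumptions, not the interface of any particular QA product.

```python
# Assumed item names; only objective checks are trusted to the machine.
OBJECTIVE_ITEMS = {"script_adherence", "disclosure_present", "hold_time_policy"}


def route(call: dict) -> str:
    """Return "human_review" or "auto_first_pass" for one call record."""
    # Payment and vulnerable-customer calls always get a human reviewer.
    if call.get("took_payment") or call.get("vulnerable_flag"):
        return "human_review"
    # Sentiment is used only as an escalation-risk flag, never as a score.
    if call.get("escalation_risk"):
        return "human_review"
    auto = {k: v for k, v in call.get("auto_scores", {}).items()
            if k in OBJECTIVE_ITEMS}
    # A missing or failed objective item is surfaced, not assumed fine.
    if set(auto) != OBJECTIVE_ITEMS or not all(auto.values()):
        return "human_review"
    return "auto_first_pass"
```

Any machine score for a subjective item, such as empathy, is deliberately ignored here. A call marked `auto_first_pass` has only had its objective items checked and can still be pulled into the human sample for coaching.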
Payments are the QA blind spot most centres still have
The pattern we see again and again: QA covers the conversation, the resolution, the tone, the script, and stops the moment the agent says "I'll take your card now". The recording gets paused (often badly, sometimes not at all), the card details are read out, and QA picks back up after "that's gone through". The payment portion, the part with the highest regulatory weight, gets the least QA attention.
Two ways to fix this. The hard way: build a separate QA process for payment calls, train analysts on PCI, and accept that you're doubling the QA effort on every transactional call. The easier way: move the card data out of the conversation entirely. With channel separation, the customer keys their card data into their phone, the agent hears tones not numbers, and the call recording never contains the PAN or CVV. QA can listen to the whole call without redaction, score everything in context, and the PCI evidence is the system, not the script.
That's the model we run for contact centres taking telephone payments. It changes what QA can do — because every call is fully listenable, every call is fully scorable, and the compliance question gets answered by the architecture not by the agent's discipline on the day.
Metrics that matter, metrics that don't
The QA metrics that actually predict business outcomes are narrower than most reporting suggests:
- Behaviour change rate. Of the specific behaviours coached in the last cycle, what percentage actually changed in the next sample? This is the one number that tells you if your QA programme is working.
- Compliance error rate on payment calls. This rate, and whether it is trending down, should be a board-level metric in any centre that takes cards.
- Repeat coaching items per agent. If the same agent is being coached on the same item three months running, the coaching isn't working — or the agent needs a different role.
- QA-to-CSAT correlation. If your QA scores don't correlate with customer satisfaction, the scorecard is measuring the wrong things. Fix the scorecard.
Metrics that consume reporting time without changing anything: average QA score, sample volume, percentage of agents scored. They're inputs, not outcomes. Report them once a month and don't make decisions from them.
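Two of the outcome metrics above reduce to simple arithmetic. The sketch below shows one way to compute the behaviour change rate and a Pearson correlation between QA scores and CSAT. The input shapes are assumptions, and a correlation on a small sample should be read with caution.

```python
from math import sqrt


def behaviour_change_rate(coached):
    """coached: list of (behaviour, changed_in_next_sample) pairs.

    Returns the fraction of coached behaviours that actually changed.
    """
    if not coached:
        raise ValueError("no coached behaviours to evaluate")
    return sum(1 for _, changed in coached if changed) / len(coached)


def pearson(xs, ys):
    """Pearson correlation between paired values, e.g. QA score and CSAT."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

If three of four coached behaviours changed in the next sample, `behaviour_change_rate` returns 0.75. A `pearson` result near zero between per-call QA scores and CSAT is the signal that the scorecard is measuring the wrong things.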
A practical starting point
If you're rebuilding QA in a centre that takes phone payments, the order of operations matters. Do these in sequence and you'll see movement within a quarter:
1. Cut the scorecard to 25 items, weighted as above.
2. Tag payment calls and sample them at twice the baseline rate.
3. Move to weekly coaching, two calls, one behaviour.
4. Add agents to one calibration session per month.
5. Move card capture off the recording (with secure telephone payments or a payment link) so QA can listen to every call without redaction.
6. Layer AI for first-pass scoring on objective items only.
You'll know it's working when coaching items stop repeating, payment-call compliance errors trend down, and your QA team starts spending its time on judgement calls rather than transcription. If you want to talk through what this looks like in a centre that takes telephone payments, get in touch — happy to share what we've seen across the centres we work with.



