How many calls per agent should we monitor each month?

Random sampling at four calls per agent is statistical noise. We'd run risk-weighted sampling (every payment call, every escalation, every vulnerable-customer flag), speech-analytics triggered sampling, and CSAT-anchored sampling together, with maybe 20% random to catch blind spots. Volume per agent ends up varying — that's the point. Agents handling more high-risk calls get more attention.

Can AI replace human QA analysts in 2026?

No. AI does a good first pass on objective items — script adherence, disclosure presence, hold time, sentiment flags — but it's still poor at empathy, judgement, and complex compliance calls. The right model is AI for the first pass and humans for the calls that need actual judgement. That roughly doubles effective coverage without adding headcount.

How does call quality monitoring interact with PCI DSS v4.0.1?

PCI v4.0.1 expects evidence that the controls around card data on calls actually work, not just that there's a policy. If your QA programme doesn't sample payment calls specifically and keep the findings in an exportable format, the evidence isn't there when an assessor asks. Tag payment calls, sample them at a higher rate, and store findings in a format you can hand over.

Why is pause-and-resume call recording a QA problem?

Because the part of the call with the highest regulatory weight — the card capture — is the part QA can't listen to. Agents pause manually, sometimes badly, sometimes not at all. Channel separation (the customer keys the card on their phone) keeps the recording continuous and PAN-free, so QA can score the whole call without redaction.

What's the single best QA metric for predicting business outcomes?

Behaviour change rate. Of the specific behaviours your coaches asked agents to change last cycle, what percentage actually changed in the next sample? Most centres don't measure this. Average QA score and sample volume are inputs, not outcomes — they look good on dashboards but don't predict anything.

How quickly should coaching happen after a scored call?

Within seven days. Beyond that the agent doesn't remember what they were thinking and the conversation becomes "you should have done X" rather than "why did you do Y". Weekly coaching cycles — two calls, fifteen minutes, one behaviour — outperform monthly reviews by a wide margin in our experience.

Should agents be part of QA calibration?

Yes, and most centres don't do this. Run a monthly session where two analysts and three agents score the same five calls blind and discuss the differences. You'll find scorecard items that read one way to QA and another way to the floor, and you'll get buy-in on the standard because the floor helped shape it.

Call quality monitoring: how to score calls that matter

Most contact centre quality programmes I've reviewed look the same: a QA analyst pulls four calls per agent per month, scores them against a 40-line spreadsheet, drops a calibration meeting in everyone's diary, and the same agents make the same mistakes the next month. The scorecard hasn't been updated since 2019. Nobody's looked at the payment portion of the call in detail because PCI sits with a different team. The CSAT number ticks along, the QA number ticks along, and nothing actually changes.

That's not call quality monitoring. That's call quality theatre.

At Paytia we work with contact centres that take card payments over the phone, so we see QA programmes from a particular angle. The same calls that affect customer satisfaction also touch card data, and the same gaps that let an agent fumble a refund script let them mishandle a CVV. In 2026, with PCI DSS v4.0.1 in full effect and AI-assisted QA finally usable, it's worth rebuilding the QA framework from scratch. For the broader picture of running a secure cloud contact centre, see our cloud contact centre guide. Here's how we'd do it.

What call quality monitoring is actually for

QA exists to do three things, in order of importance:

Catch behaviour that puts the customer or the company at risk before it becomes a complaint, a fine, or a regulator letter.
Give agents specific, repeatable feedback they can act on this week.
Feed product, process, and training teams with patterns they can fix at the root.

If your QA programme can't show you did any of those three in the last quarter, it's a reporting exercise wearing a coaching badge. Everything below is built around making it do the work.

The 2026 framework

The framework has five parts: sampling, scoring, coaching, calibration, and evidence. Each one has changed since the last time most QA leads rebuilt theirs.

1. Sampling — stop pulling four random calls per agent

Random sampling at four-per-agent-per-month is a relic. Statistical noise dominates. An agent with a real coaching need can look fine for two months because you happened to score the easy calls. Three approaches work better in 2026, and you should run them together:

Risk-weighted sampling. Every call that took a card payment, every call where a vulnerable-customer flag fired, every call over a duration threshold, every transferred call. These are the calls where something can actually go wrong. Pull them at a higher rate than baseline.
Speech-analytics triggered sampling. Agents using the word "definitely" near a compliance script, customers raising voices, silences over 45 seconds, repeat callers within 48 hours. Modern speech tools flag these accurately enough that human QA can spend its time on actual judgement calls.
CSAT-anchored sampling. Every call that scored 1 or 2 on a post-call survey, every call where the customer asked to escalate. These cost you the most when they're handled badly.

You'll still want a small random sample to avoid blind spots in the triggers, but it should be a fifth of your QA volume, not all of it.
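As a rough illustration of how the three trigger types and the random top-up could fit together, here is a minimal Python sketch. The `Call` fields, the 900-second long-call threshold, and the function names are all assumptions made for the example, not fields from any particular QA platform. For simplicity it reviews every triggered call; in practice you might sample within each trigger group at whatever rate your team can sustain.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    # Hypothetical call record; map these to whatever your platform exposes.
    call_id: str
    took_payment: bool = False
    vulnerable_flag: bool = False
    transferred: bool = False
    duration_s: int = 0
    speech_trigger: bool = False      # set by your speech-analytics tool
    csat: Optional[int] = None        # post-call survey score, if any
    asked_to_escalate: bool = False


def selection_reason(call: Call, long_call_s: int = 900) -> Optional[str]:
    """Return why a call should be reviewed, or None if no trigger fired."""
    if (call.took_payment or call.vulnerable_flag or call.transferred
            or call.duration_s > long_call_s):
        return "risk"
    if call.speech_trigger:
        return "speech"
    if (call.csat is not None and call.csat <= 2) or call.asked_to_escalate:
        return "csat"
    return None


def build_sample(calls, random_share=0.2, seed=None):
    """Map call_id -> reason: all triggered calls plus a random top-up."""
    rng = random.Random(seed)
    sample, untriggered = {}, []
    for call in calls:
        reason = selection_reason(call)
        if reason:
            sample[call.call_id] = reason
        else:
            untriggered.append(call)
    # For random calls to be `random_share` of the final sample:
    # n_random = n_triggered * share / (1 - share)
    n_random = round(len(sample) * random_share / (1 - random_share))
    n_random = min(n_random, len(untriggered))
    for call in rng.sample(untriggered, n_random):
        sample[call.call_id] = "random"
    return sample
```

Recording the reason alongside each call makes it easy to check later that the random share really is about a fifth of the volume.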

2. Scoring — fewer lines, weighted by harm

If your scorecard has more than 25 items, agents don't remember it and analysts don't apply it consistently. Cut it. Group the remaining items into three weighted buckets:

Compliance and risk (50% of the score). Identity verification done correctly. Card data handled in line with PCI. Vulnerable-customer policy followed. Disclosures read. Recording paused at the right moments — or, if you've moved to DTMF masking, the masking flow triggered cleanly.
Resolution and accuracy (30%). Did the customer get the right answer? Was the case noted correctly? Were the follow-up actions actually completed?
Manner and empathy (20%). Tone, listening, acknowledgement. Important, but you can train these. Compliance and accuracy errors are the ones that bite.

Weighting matters because an agent who's lovely but misquotes a refund policy is doing more harm than an agent who's flat but accurate. The score should reflect that.
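To make the weighting concrete, here is a small Python sketch that scores a call from pass/fail items grouped into the three buckets. The bucket keys and the pass/fail representation are assumptions for the example; your scorecard may use graded items instead.

```python
# Bucket weights from the framework above: 50% / 30% / 20%.
WEIGHTS = {"compliance": 0.5, "resolution": 0.3, "manner": 0.2}


def weighted_score(results):
    """Return a 0-100 score.

    `results` maps each bucket name to a non-empty list of booleans,
    one per scorecard item (True = passed).
    """
    total = 0.0
    for bucket, weight in WEIGHTS.items():
        items = results[bucket]
        if not items:
            raise ValueError(f"no items scored for bucket {bucket!r}")
        total += weight * (sum(items) / len(items))
    return round(100 * total, 1)


# An agent who is lovely but gets policy wrong...
lovely_but_wrong = {
    "compliance": [True, True, False, False],
    "resolution": [True, False],
    "manner": [True, True, True],
}
# ...versus one who is flat but accurate.
flat_but_accurate = {
    "compliance": [True, True, True, True],
    "resolution": [True, True],
    "manner": [True, False, False],
}

print(weighted_score(lovely_but_wrong))   # 60.0
print(weighted_score(flat_but_accurate))  # 86.7
```

With equal weights across the three buckets, the two agents would both average about 67%. The harm weighting is what separates them.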

3. Coaching — within seven days or don't bother

The single biggest predictor of whether QA changes agent behaviour isn't the scorecard or the calibration session. It's the gap between the call happening and the agent hearing about it. Anything beyond seven days, the agent can't remember what they were thinking. The coaching turns into "you should have done X" instead of "why did you do Y".

Make coaching weekly, short, and specific. Two calls, fifteen minutes, one behaviour to change. Track the behaviour, not the score, in the next sample. If the score went up but the behaviour didn't change, the score is lying to you.
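If you want to keep the seven-day window honest, a simple status check on each scored call is enough. This is a sketch only; the function name and status labels are made up for illustration.

```python
from datetime import date, timedelta

COACHING_WINDOW = timedelta(days=7)


def coaching_status(call_date, coached_on, today):
    """Classify a scored call against the seven-day coaching window.

    `coached_on` is None if the coaching conversation hasn't happened yet.
    """
    if coached_on is not None:
        return "on_time" if coached_on - call_date <= COACHING_WINDOW else "late"
    return "pending" if today - call_date <= COACHING_WINDOW else "overdue"


print(coaching_status(date(2026, 3, 2), None, date(2026, 3, 20)))  # overdue
```

A weekly count of "late" and "overdue" calls per team lead shows quickly where the coaching rhythm is slipping.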

4. Calibration — agents in the room

QA analysts calibrate with each other regularly enough. They rarely calibrate with the agents being scored. That's where most of the disagreement actually lives.

Run a monthly calibration where two analysts and three agents score the same five calls blind, then talk through the differences. You'll find scorecard items that read one way to QA and another way to the floor. Fix the wording. The act of involving agents in the standard is half of why they accept it.
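One way to find the items that read differently to QA and to the floor is to compare pass rates per scorecard item after the blind scoring. The sketch below assumes each score is recorded as a `(role, call_id, item, passed)` tuple, which is a format invented for the example.

```python
from collections import defaultdict
from statistics import mean


def item_gaps(scores):
    """Return {item: gap}, largest gap first.

    `scores` is an iterable of (role, call_id, item, passed) tuples where
    role is "analyst" or "agent". The gap is the absolute difference in
    pass rate between the two groups for that scorecard item.
    """
    by_item = defaultdict(lambda: {"analyst": [], "agent": []})
    for role, _call_id, item, passed in scores:
        by_item[item][role].append(1.0 if passed else 0.0)
    gaps = {}
    for item, roles in by_item.items():
        if roles["analyst"] and roles["agent"]:
            gaps[item] = round(
                abs(mean(roles["analyst"]) - mean(roles["agent"])), 2
            )
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))
```

Items at the top of the result are the ones to talk through in the session, and usually the ones whose wording needs fixing.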

5. Evidence — for PCI, for audits, and for yourself

This is the part most QA programmes ignore until an audit shows up. Under PCI DSS v4.0.1, the controls around how card data is handled during a call need evidence — not just "we have a policy" but "here's a sample of recent calls and the QA findings against the policy". If your QA programme doesn't sample payment calls specifically, the evidence won't be there when an assessor asks.

The simplest fix: tag every call that took a payment in your QA system, sample them at a higher rate than baseline, and keep the QA findings in a format you can export to an auditor without rebuilding the data. Future-you will thank you.
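As a sketch of what "a format you can export" might look like, the Python below filters findings to payment-tagged calls and writes them as CSV. The field names are assumptions for illustration; PCI DSS does not prescribe this layout, so agree the columns with your assessor.

```python
import csv
import io

# Hypothetical export columns; adapt to what your assessor asks for.
FIELDS = ["call_id", "call_date", "took_payment", "item", "result", "reviewer"]


def export_payment_findings(findings, out):
    """Write findings for payment-tagged calls to `out` as CSV.

    `findings` is an iterable of dicts; keys not in FIELDS are ignored.
    Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for finding in findings:
        if finding.get("took_payment"):
            writer.writerow(finding)
            rows += 1
    return rows


buf = io.StringIO()
export_payment_findings(
    [{"call_id": "c1", "call_date": "2026-03-02", "took_payment": True,
      "item": "card_data_handling", "result": "pass", "reviewer": "analyst_1"}],
    buf,
)
print(buf.getvalue())
```

Passing an open file instead of `io.StringIO()` writes the same rows to disk.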

Where AI helps in 2026, and where it doesn't

AI-assisted QA is finally useful, three years after every vendor started claiming it was. Speech-to-text accuracy on contact centre audio is now in the high 90s percent for clean lines and the low 90s for noisy ones. Sentiment analysis is good enough to flag escalation risk but not good enough to score empathy. Automated scoring on objective items (script adherence, disclosure presence, hold-time policy) works well. Automated scoring on subjective items (tone, judgement, problem-solving) doesn't, and probably won't for a while.

The right split: let AI do the first pass on every call for the objective items, surface the calls that need human review, and let your QA analysts spend their time on the bits machines are bad at. We've seen QA teams double their effective coverage this way without adding headcount. The risk is letting the AI score everything and assuming the score is true; the items it's bad at are the items that matter most for coaching.

What we wouldn't trust AI to do yet

Score empathy or emotional intelligence on a granular scale.
Make agent-performance management decisions on its own.
Replace human judgement on complex compliance calls — particularly anything involving vulnerable customers.
Decide whether a payment call followed PCI requirements. The audio is too easy to misread, and the consequence of getting it wrong is too high.
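The split described in this section can be expressed as a simple routing rule: keep the AI's scores only for objective items, and send anything on the list above to a person. The sketch below is illustrative; the shape of `ai_result`, the item names, and the 0.7 escalation threshold are all assumptions, not the output of any specific tool.

```python
# Items we would let a machine score on its own (hypothetical names).
OBJECTIVE_ITEMS = {"script_adherence", "disclosure_present", "hold_time_ok"}


def route_call(ai_result, escalation_threshold=0.7):
    """Return (auto_scores, needs_human) for one call.

    `ai_result` is a dict with "item_scores" (item -> bool) and optional
    "escalation_risk" (0-1), "took_payment" and "vulnerable_flag" keys.
    """
    # Keep only the objective items; discard AI scores for anything else.
    auto_scores = {
        item: passed
        for item, passed in ai_result["item_scores"].items()
        if item in OBJECTIVE_ITEMS
    }
    needs_human = bool(
        ai_result.get("took_payment", False)       # PCI is a human call
        or ai_result.get("vulnerable_flag", False)  # so is vulnerability
        or ai_result.get("escalation_risk", 0.0) >= escalation_threshold
        or not all(auto_scores.values())            # an objective item failed
    )
    return auto_scores, needs_human
```

Subjective items such as empathy are dropped from the automatic scores on purpose, so an analyst scores them on the calls that get reviewed.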

Payments are the QA blind spot most centres still have

The pattern we see again and again: QA covers the conversation, the resolution, the tone and the script, then stops the moment the agent says "I'll take your card now". The recording gets paused (often badly, sometimes not at all), the agent reads the card numbers, and QA picks back up after "that's gone through". The payment portion, the part with the highest regulatory weight, gets the least QA attention.

Two ways to fix this. The hard way: build a separate QA process for payment calls, train analysts on PCI, and accept that you're doubling the QA effort on every transactional call. The easier way: move the card data out of the conversation entirely. With channel separation, the customer keys their card data into their phone, the agent hears tones not numbers, and the call recording never contains the PAN or CVV. QA can listen to the whole call without redaction, score everything in context, and the PCI evidence is the system, not the script.

That's the model we run for contact centres taking telephone payments. It changes what QA can do — because every call is fully listenable, every call is fully scorable, and the compliance question gets answered by the architecture not by the agent's discipline on the day.

Metrics that matter, metrics that don't

The QA metrics that actually predict business outcomes are narrower than most reporting suggests:

Behaviour change rate. Of the specific behaviours coached in the last cycle, what percentage actually changed in the next sample? This is the one number that tells you if your QA programme is working.
Compliance error rate on payment calls. This rate, and whether it is trending down, should be a board-level metric in any centre that takes cards.
Repeat coaching items per agent. If the same agent is being coached on the same item three months running, the coaching isn't working — or the agent needs a different role.
QA-to-CSAT correlation. If your QA scores don't correlate with customer satisfaction, the scorecard is measuring the wrong things. Fix the scorecard.

Metrics that consume reporting time without changing anything: average QA score, sample volume, percentage of agents scored. They're inputs, not outcomes. Report them once a month and don't make decisions from them.
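Two of the outcome metrics above, behaviour change rate and QA-to-CSAT correlation, are easy to compute once the data is recorded. The Python sketch below assumes coached behaviours are stored as `(agent, behaviour, changed)` tuples and that you have paired QA and CSAT scores per call; both layouts are assumptions for the example. It uses a plain Pearson correlation, which is one reasonable choice rather than the only one.

```python
from math import sqrt


def behaviour_change_rate(coached):
    """Share of coached behaviours that changed in the next sample.

    `coached` is a list of (agent, behaviour, changed) tuples.
    Returns None when nothing was coached.
    """
    if not coached:
        return None
    changed = sum(1 for _agent, _behaviour, did_change in coached if did_change)
    return changed / len(coached)


def qa_csat_correlation(pairs):
    """Pearson correlation between QA score and CSAT for the same calls.

    `pairs` is a list of (qa_score, csat) tuples. Returns None if there are
    fewer than two pairs or either series does not vary.
    """
    if len(pairs) < 2:
        return None
    qa, csat = zip(*pairs)
    n = len(pairs)
    mean_qa, mean_csat = sum(qa) / n, sum(csat) / n
    cov = sum((q - mean_qa) * (c - mean_csat) for q, c in pairs)
    sd_qa = sqrt(sum((q - mean_qa) ** 2 for q in qa))
    sd_csat = sqrt(sum((c - mean_csat) ** 2 for c in csat))
    if sd_qa == 0 or sd_csat == 0:
        return None
    return cov / (sd_qa * sd_csat)
```

A correlation near zero across a reasonable number of calls is the signal that the scorecard is measuring the wrong things.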

A practical starting point

If you're rebuilding QA in a centre that takes phone payments, the order of operations matters. Do these in sequence and you'll see movement within a quarter:

Cut the scorecard to 25 items or fewer, weighted as above.
Tag payment calls and sample them at twice the baseline rate.
Move to weekly coaching, two calls, one behaviour.
Add agents to one calibration session per month.
Move card capture off the recording — with secure telephone payments or a payment link — so QA can listen to every call without redaction.
Layer AI for first-pass scoring on objective items only.

You'll know it's working when coaching items stop repeating, payment-call compliance errors trend down, and your QA team starts spending its time on judgement calls rather than transcription. If you want to talk through what this looks like in a centre that takes telephone payments, get in touch — happy to share what we've seen across the centres we work with.
