Payment Security | 29 May 2026 | 19 min read

Vishing Detection — Voice Phishing in Call Centres

How vishing detection actually works in a UK contact centre — voice biometrics, behavioural analytics, agent training and PCI-grade payment masking.

TL;DR

Vishing detection in a contact centre is a layered job: voice biometrics and liveness checks to spot synthetic audio, behavioural analytics to catch social-engineering patterns, agent training to slow the call down, and PCI-grade payment capture so a stolen call can't become a stolen card. No single tool stops voice phishing on its own — but together they cut successful attacks to a fraction of what they were.

Last updated: 29 May 2026

If you run a contact centre in 2026 you've already seen vishing detection move from "nice idea" to "thing the board asks about every quarter". And rightly so. The economics flipped in the last 18 months: a fraudster can clone a voice from 30 seconds of audio, spin up a script in a free LLM, and dial 200 numbers a day from a softphone that costs nothing. Defending against that with a 2019 playbook — agent intuition plus a knowledge-based authentication question — doesn't work anymore.

This guide is the practical one. We're not going to spend 5,000 words explaining what voice phishing is (we cover that in the fraud prevention glossary). We'll show you what actually catches a vishing attack in a real contact centre, what the detection stack looks like end to end, and where most teams have a gap that fraudsters are already exploiting. If you want the wider context, the contact centre fraud detection guide covers the broader threat landscape — this post zooms in on the voice phishing piece.

What good vishing detection actually looks like

Most vendors describe vishing detection like it's a single product you buy. It isn't. It's a workflow that starts when a call connects and ends when the agent either completes the transaction or escalates. The good ones run four layers in parallel: voice biometrics, behavioural analytics, agent process, and payment-layer controls. Skip any one and you've got a gap.

The voice biometric layer compares incoming audio against a stored voiceprint and runs a liveness check to spot synthetic speech. The behavioural layer scores the call against the caller's normal pattern: what device they're on, where they're calling from, what they're asking for, how they're answering questions. The agent layer is the human pause: trained staff who know the social-engineering tells and have the authority to slow a transaction down. The payment layer is where most teams under-invest, and where Paytia sits. If the customer's card number never reaches the agent's ears or the call recording in the first place, a successful vish doesn't translate into a successful card-not-present attack.

The reason layering matters: every single one of these defences has a known bypass. Voice biometrics get fooled by good deepfakes. Behavioural scoring throws false positives during seasonal spikes. Agents get tired or chatty. PCI scope reduction doesn't stop someone impersonating a customer to change a delivery address. But chain them together and the fraudster has to beat four systems on the same call. That's a different conversation.

The voice biometric layer — what it catches and what it misses

Voice biometrics turns a few seconds of speech into a numerical voiceprint, then compares incoming audio to that print. The good systems also run a passive liveness check that looks for the spectral fingerprints of synthetic speech: the artefacts left by most text-to-speech engines and voice cloners currently shipping. Done well, this catches the easy 60–70% of attacks: pre-recorded audio, low-quality clones, accent mismatches, and audio replayed over a phone line.

What it misses: high-quality real-time voice clones generated by a frontier model, and human attackers who have done their research. The arms race here is loud. Every six months a new generation of voice cloning ships that beats the previous generation of detectors, and every six months the detectors catch up. If you're buying voice biometrics in 2026, the question isn't "does it work today" — it's "how fast does the vendor ship model updates, and what's their false-positive rate on UK regional accents". We've seen perfectly competent systems flag legitimate Glaswegian callers as suspicious because the training data was thin north of Manchester.

The integration story matters too. Voice biometrics has to plug into your call platform without adding latency the customer notices, and it has to score the call early enough that the agent can act on it before the payment step. A risk score that shows up after the card capture is closed isn't detection — it's an audit log.

Behavioural analytics — the layer that catches the patient attacker

A well-prepared vishing attack will defeat voice biometrics on day one. What it can't easily defeat is the cumulative behavioural signal: the device fingerprint of the calling number, the geographic origin, the time of day, the velocity of contact attempts across the customer base, the specific phrases the caller uses, and the way they navigate the IVR. Behavioural analytics aggregates all of that and assigns a risk score on a 0–100 scale that updates throughout the call.

The signals that matter most in our experience: caller ID reputation (is this number associated with previous fraud reports across the wider telecom ecosystem), call origin (is the routing path consistent with the customer's home country), agent-script deviation (is the caller pushing the conversation toward payment changes, address updates, or refund requests faster than a normal customer would), and verification-question latency (is the caller pausing in places that suggest they're reading off a screen). None of these is conclusive on its own. Combined, they paint a picture an experienced fraud analyst would recognise immediately.

Behavioural analytics works best when it's plumbed back into the agent's screen in real time. A risk score that floats above 70 should pop a visible flag — not a soft "caller may be suspicious" hint, but a clear instruction to step through a hardened verification flow before any payment, address change, or account access. We've watched contact centres install excellent behavioural scoring engines and then leave the score buried in a separate analytics dashboard the agent never sees. That's not detection, that's reporting.
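As a rough illustration of how a score like this might drive the agent's screen, here's a minimal Python sketch. The signal names, the weights and the `agent_instruction` helper are all hypothetical; only the 0–100 scale and the threshold of 70 come from the text above.

```python
# Hypothetical weights for the four signals discussed above.
# Illustrative only, not tuned on real data.
WEIGHTS = {
    "caller_id_reputation": 0.35,
    "call_origin_mismatch": 0.25,
    "script_deviation": 0.25,
    "verification_latency": 0.15,
}
FLAG_THRESHOLD = 70  # a score above this should raise a visible flag


def risk_score(signals: dict[str, float]) -> float:
    """Combine per-signal values (each clamped to 0.0-1.0) into a 0-100 score."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(100 * total, 1)


def agent_instruction(score: float) -> str:
    """Turn the score into an instruction for the agent, not a soft hint."""
    if score > FLAG_THRESHOLD:
        return "HARDENED_VERIFICATION_REQUIRED"
    return "STANDARD_FLOW"


score = risk_score({
    "caller_id_reputation": 0.9,
    "call_origin_mismatch": 1.0,
    "script_deviation": 0.8,
    "verification_latency": 0.5,
})
print(score, agent_instruction(score))
```

In a real deployment the score would be recomputed as the call progresses and pushed to the agent desktop; the point of the sketch is only that the output is an instruction the agent sees, not a number in a dashboard.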

The agent process — why training still beats every algorithm

Talk to any fraud team that's been through a serious vishing incident and they'll tell you the same thing: the call that broke through was the one where the agent felt rushed. Vishers know this. The scripts they run are built to compress the agent's decision time — a sob story, a pretend manager, a hard deadline, a partial answer to a verification question that creates plausible deniability. If the agent has 90 seconds of unstructured pressure before they touch the payment screen, the social-engineering side of the attack has already done its work.

Process changes that actually move the needle: a mandatory pause before any account modification (we like a 5-second confirmation screen that the agent can't skip), a hard rule that no caller can talk an agent into bypassing the verification flow regardless of seniority claims, and a one-button escalation route that gets a supervisor on the call without making the customer feel accused. The escalation route matters because it removes the social cost of being suspicious — the agent isn't accusing the caller, they're following procedure. That's the framing that lets junior agents do the right thing on a Friday afternoon.

Training works when it includes recorded examples of real attacks. Not made-up scenarios — actual recordings (anonymised) of vishes that succeeded against your own team or peer organisations. The patterns are eerily consistent across industries, and once an agent has heard three of them, they recognise the fourth in the first 30 seconds. We cover the broader pattern recognition piece in social engineering attack defence for contact centres — vishing is one branch of a wider social-engineering tree and the defensive techniques overlap heavily.

Why payment-layer controls finish the job

Here's the bit most vishing detection guides skip: even a perfect detection stack has a non-zero false-negative rate. Some attacks will get through. The question is what they're worth to the attacker once they're past the gate. If your agent is reading card numbers off a screen the customer dictates over the phone, a successful vish translates directly into a successful payment fraud. If your agent never touches the card data — because the customer types it into their own keypad and the digits are masked from both the agent and the call recording — then a vishing attack that gets past detection still can't extract a card to use.

That's the principle behind DTMF masking and channel separation: take the payment data out of the conversation entirely. The customer hears the agent, the agent hears the customer, but when it's time to pay, the customer's keypad tones are intercepted before they reach the agent's headset or your call recording system. The payment processor gets the digits, the rest of your environment gets flat tones. Whether the caller is genuine or a deepfake doesn't matter at that point — the card data never lives in a place the attacker can reach.
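To make the routing concrete, here's a toy Python sketch of the idea. It models the call as a list of events rather than real audio, and the event format and function name are made up for illustration. Real masking happens in the telephony layer, not in application code like this.

```python
MASK = "*"  # stand-in for the flat tone the agent and the recording hear


def route_events(events, capturing_payment):
    """Split a call's event stream into what each destination receives.

    events: list of ("speech", text) or ("dtmf", digit) tuples
            (a hypothetical format for this sketch).
    Returns (agent_leg, recording, processor_digits).
    """
    agent_leg, recording, processor_digits = [], [], []
    for kind, value in events:
        if kind == "dtmf" and capturing_payment:
            processor_digits.append(value)     # the real digit goes only to the processor
            agent_leg.append(("dtmf", MASK))   # the agent hears a flat tone
            recording.append(("dtmf", MASK))   # the recording stores a flat tone
        else:
            agent_leg.append((kind, value))
            recording.append((kind, value))
    return agent_leg, recording, "".join(processor_digits)


events = [("speech", "ready to pay"), ("dtmf", "4"), ("dtmf", "2")]
agent_leg, recording, digits = route_events(events, capturing_payment=True)
print(agent_leg)  # speech passes through, digits are masked
```

Notice that nothing in the function asks whether the caller is genuine. The containment holds either way, which is the whole argument for this layer.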

This is the layer that turns vishing detection from "prevent every attack" to "contain every attack that gets through". It's also the layer that takes most of your contact-centre environment out of PCI scope. Descoping isn't just an audit benefit; it's a fraud control. Data you don't capture can't be stolen by a social engineer who's already inside the call.

The 12 vishing patterns we see most often

Across the UK contact centres we work with, the attempted vishes cluster into a dozen recurring patterns. Knowing the shapes makes them easier to spot. We're listing them as prose rather than a bullet wall — every one of these has nuance a checklist can't carry.

The fake supervisor. Caller claims to be a senior employee of your company, demanding the agent bypass verification because of a "system issue". Often paired with a spoofed internal extension as the caller ID. Defeat: hard rule that no internal caller can request verification bypass, full stop, supervisor or not.

The bereaved relative. Caller says the account holder has died and they need to close the account / change the address / process a refund. Plays on agent sympathy. Defeat: a specific bereavement protocol that requires documentation regardless of how upset the caller is.

The IT support follow-up. Caller references a fake earlier conversation with your IT team and asks the agent to confirm or reset credentials. Defeat: zero-trust rule that no inbound call can drive a credential reset.

The partial answer. Caller answers verification questions with plausible-but-incomplete information — the right town but the wrong postcode, the right date of birth but a one-digit slip — and asks the agent to "check the system" for the right answer. The agent often confirms it accidentally. Defeat: agents trained to never confirm or correct verification answers, only accept or reject.

The deepfake call-back. Caller dials, claims their line is bad, asks the agent to call them back on a number that's slightly different from the one on file. The call-back hits a fraudster's number. Defeat: only ever call back the number on file, never one given mid-call.

The urgency script. Caller invents a hard deadline — flight in 30 minutes, court appearance, hospital admission — that requires the agent to skip verification. Defeat: trained agents who know the script and a process that doesn't accept urgency as a verification bypass.

The senior citizen impersonation. Younger fraudster impersonates an elderly customer, often using a voice changer or a quickly cloned audio sample. Voice biometrics catches most of these. Behavioural analytics catches the rest via call origin and device fingerprint.

The refund scam. Caller claims to have been charged twice and asks for the refund to go to a new card. Often paired with a partial verification answer. Defeat: refunds only ever route to the original payment method.

The change-of-address attack. Caller updates the delivery address to a drop point, then orders goods or requests a replacement card. The address change is the actual fraud — the order is just the payout. Defeat: address changes require step-up authentication and a delay window before they take effect.

The MFA reset. Caller claims to have lost access to their authenticator app and asks the agent to reset multi-factor authentication. Defeat: MFA resets require either in-person verification or a multi-channel verification flow that the original customer controls.

The friendly chat. No urgency, no hard ask. Caller spends 10 minutes building rapport with the agent, then drops a small request near the end — confirm an email, update a phone number. The small request is the foothold for the next call. Defeat: log every small change for review, especially if the call had no clear purpose.

The chained attack. Multiple short calls across days, each one extracting one piece of information. By call four, the attacker has enough to pass verification cleanly. Defeat: behavioural analytics that correlates across calls, not just within them.
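The cross-call correlation that defeats the chained attack can be sketched in a few lines of Python. This is a simplified illustration: the record format, the 7-day window, the three-call threshold and the definition of a "short" call are all assumptions for the example, not recommended values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)   # assumed look-back window
MAX_SHORT_CALLS = 3          # assumed threshold
SHORT_CALL_SECONDS = 180     # assumed definition of a "short" call


def accounts_with_chained_calls(calls):
    """calls: iterable of (account_id, start, duration_seconds) tuples.

    Returns the set of accounts that received MAX_SHORT_CALLS or more
    short calls inside any period of length WINDOW.
    """
    by_account = defaultdict(list)
    for account_id, start, duration in calls:
        if duration <= SHORT_CALL_SECONDS:
            by_account[account_id].append(start)

    flagged = set()
    for account_id, starts in by_account.items():
        starts.sort()
        left = 0
        for right, current in enumerate(starts):
            # Move the left edge forward until the span fits inside WINDOW.
            while current - starts[left] > WINDOW:
                left += 1
            if right - left + 1 >= MAX_SHORT_CALLS:
                flagged.add(account_id)
                break
    return flagged


base = datetime(2026, 5, 1, 10, 0)
calls = [
    ("A", base, 60),
    ("A", base + timedelta(days=2), 90),
    ("A", base + timedelta(days=4), 45),
    ("B", base, 60),
    ("B", base + timedelta(days=10), 60),
    ("B", base + timedelta(days=20), 60),
]
print(accounts_with_chained_calls(calls))
```

A production system would correlate on more than the account (calling number, device, the information requested on each call), but even a crude count like this surfaces a pattern that no single-call score can see.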

How to score a vishing detection vendor

If you're shopping for vishing detection, the marketing pages all sound the same. Here's what we'd actually ask in a vendor evaluation, in the order we'd ask it.

Start with the false-positive rate on your actual customer base, not the vendor's training set. A system that flags 5% of legitimate calls as suspicious is unworkable in a contact centre handling 50,000 calls a week — that's 2,500 false alarms, which means agents will start ignoring the flags. Ask for a pilot on 10,000 of your calls, not theirs, and look at false-positive rate by demographic. Regional accent bias is real and it ends careers if it ships to a callbase your vendor didn't train for.
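The arithmetic is worth running against your own volumes before a vendor meeting. A minimal Python sketch, where the function name is ours and the 1% and 0.5% rates are only there for comparison:

```python
def weekly_false_alarms(calls_per_week: int, false_positive_rate: float) -> int:
    """Expected number of legitimate calls flagged as suspicious each week."""
    return round(calls_per_week * false_positive_rate)


# The 5% figure is the example from the text; the other rates are for comparison.
for rate in (0.05, 0.01, 0.005):
    print(f"{rate:.1%} of 50,000 calls -> {weekly_false_alarms(50_000, rate)} false alarms a week")
```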

Then ask about latency. The risk score needs to be available to the agent within the first 30 seconds of the call. Any later and the social-engineering work has already started; the agent has built rapport, the customer has set the agenda, and a late risk flag feels like an accusation rather than guidance. If the vendor can't show you a sub-30-second score, walk away.

Then ask about model update cadence. Voice cloning is improving every quarter. Detection models that aren't being retrained on the latest synthetic audio go stale fast. Ask when the model was last refreshed, what the refresh cycle is, and whether you get the new model automatically or have to redeploy. Annual model updates aren't enough for 2026.

Finally — and most teams forget this — ask about the integration with your payment capture. A vishing detection system that doesn't talk to your payment layer is half a defence. You want a vendor whose risk score can trigger an automatic step-up to a hardened payment flow, not just an alert in a fraud queue someone reviews tomorrow. We dig into the tooling tradeoffs in our fraud detection tools comparison.

What we built at Paytia and why it sits at this layer

Paytia isn't a vishing detection vendor in the strict sense — we don't sell voice biometrics. What we sell is the payment layer that turns vishing detection from prevention into containment. When your agent is taking a card payment from a caller, the customer's keypad input is intercepted before it reaches the agent's headset, the call recording, or anything else in your contact-centre environment. The agent hears the customer's voice, the customer hears the agent's voice, and the card digits never appear in the same audio stream as the conversation.

That sits naturally alongside whatever vishing detection stack you've got. If a vish gets past your voice biometrics and your behavioural scoring and your agent training, the attacker still can't extract a usable card because the card never entered a reachable place. We work with contact centres running Nuance, Pindrop, Pinpoint, and the major behavioural-analytics vendors — payment masking is downstream of all of them.

The Pinnacle Group case is the clearest illustration. When we deployed channel separation across their UK operations, 95% of their PCI scope dropped out of the assessment — and so did the corresponding attack surface for any fraudster who got past the front-end controls. The vishing detection layer kept catching attacks. The payment layer made the residual attacks worthless.

The compliance angle — why this matters beyond fraud

Vishing detection isn't just a fraud-loss control. It's increasingly a compliance control. PCI DSS v4.0.1 expanded the requirements around customer-not-present authentication, and the FCA's Consumer Duty regime has made it harder to argue that fraud losses are "just the cost of doing business". If a customer rings your contact centre and a fraudster impersonates them successfully because your verification process was weak, the regulator will want to know what you did about it.

That's another reason the payment layer matters. By the time a vish gets to the payment step, the call has either been detected (good) or not (bad). Either way, if the payment capture is happening in a PCI-compliant environment that the agent can't see into, the regulatory worst case shrinks dramatically. We've covered the PCI DSS v4.0.1 changes in detail elsewhere — for vishing purposes, the relevant point is that masked, descoped payment capture is now table stakes, not a nice-to-have.

What to do in the next 30 days

If you're reading this with the uneasy sense that your current vishing detection isn't enough, here's the order we'd suggest tackling it in. None of these need a six-month transformation programme.

First, audit your last 90 days of fraud cases for vishing patterns. You'll likely find that 30–60% of card-not-present fraud loss has a social-engineering call somewhere in its history. That number changes the budget conversation immediately.

Second, look at your agent process. Specifically: can an agent skip verification under pressure, and what's the social cost of escalating a suspicious call? If the answer to the first is "yes, easily" or the answer to the second is "high", that's a free fix (a process change, a button on the screen, a 30-minute training session) that buys you 80% of the gain at zero capex.

Third, look at where card data lives during a phone payment. If your agents see the digits, your call recording captures them, or your CRM stores them in a free-text field, that's where a successful vish becomes a successful fraud. PCI-compliant phone payment removes that surface entirely.

Fourth, then look at voice biometrics and behavioural analytics. The detection layer is the most expensive and the slowest to deploy. Doing it before the process and payment work is wrong-way-round — you'll spend six months catching attacks that wouldn't have succeeded anyway if your downstream controls were tight.

What a real vishing incident looks like end to end

It helps to walk through what an actual incident timeline looks like, because the abstract "detect, contain, escalate" framing skips the parts where contact centres usually go wrong. We'll describe a composite case based on patterns we've seen across multiple UK clients — the specifics are anonymised but the shape is real.

Tuesday afternoon, mid-call volume. A caller dials the customer service line of a mid-sized utilities provider. The caller ID matches a record on file. It's spoofed, but the agent's screen pops the matching customer record automatically. The voiceprint score comes back at 78 (the system's threshold for a soft flag is 70), but the agent is two minutes into the call before that score actually lands on the screen because the integration is asynchronous and the audio buffer hasn't filled yet. By then the caller has already explained that they're moving house and need the bill copied to a new address, and has asked whether a top-up payment can go on a different card because their main one is expiring.

The agent — who has been on the phones for eight weeks — registers the soft flag but doesn't act on it because the caller is calm, the story is internally consistent, and the request is the kind they handle 40 times a day. Address change goes through. The new card details are read aloud and entered into the CRM. The call ends. Six days later, three deliveries arrive at the new address — none of them ordered by the genuine customer. The actual customer rings to complain. The fraud team pulls the call recording and the voiceprint score and realises they had every piece of information they needed at minute three of the original call.

The post-mortem points to four breakdowns, in order of cost-to-fix. The score arrived too late, so the agent didn't have it when the decisive moment came. The agent had no clear protocol for what to do with a soft flag versus a hard flag: the training said "use judgement", which means under pressure most agents will continue serving the customer. The address change had no holding window, so the fraudster's new delivery address was active immediately. And the card details were read aloud and typed by the agent into a free-text CRM field, which means the digits ended up both in the call recording and in the CRM.

The fix wasn't a new detection vendor. It was tightening the integration so the score lands inside 20 seconds, rewriting the soft-flag protocol to require a hardened verification step, adding a 24-hour delay window on address changes for any flagged call, and routing the payment through a masked capture flow so the card digits never touch the CRM. None of those fixes cost six figures. All of them stopped the same attack repeating two weeks later.
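The holding-window rule is the simplest of those fixes to express in code. Here's a minimal Python sketch, assuming the 24-hour delay from the case above applies only to flagged calls; the function name and signature are hypothetical.

```python
from datetime import datetime, timedelta

HOLD_FOR_FLAGGED = timedelta(hours=24)  # the delay window from the case above


def address_change_effective_at(requested_at: datetime, call_was_flagged: bool) -> datetime:
    """Return when a requested address change should take effect."""
    if call_was_flagged:
        return requested_at + HOLD_FOR_FLAGGED
    return requested_at


requested = datetime(2026, 5, 26, 14, 30)
print(address_change_effective_at(requested, call_was_flagged=True))
```

The delay only helps if something happens during it, such as a notification to the contact details already on file, so the genuine customer has a chance to object before the change goes live.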

Metrics that matter — and the ones that don't

Most contact centre fraud dashboards measure the wrong things. Volume of flagged calls, average risk score, total alerts raised — those are activity metrics, not outcome metrics. They tell you the system is doing something, not whether the something is working. The metrics we actually look at when we audit a vishing detection programme are short.

Successful-attack rate per million calls handled. This is the headline. Everything else is a leading indicator for this. If you can't tell us this number for the last quarter, your detection programme isn't measurable and you can't tell whether it's improving. It needs to be calculated from confirmed fraud cases that included a contact-centre call somewhere in their attack chain, not just calls that were flagged at the time.
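For clarity on the calculation, here's a minimal Python sketch. The function name is ours and the figures in the example are invented: 13 confirmed cases over a quarter of 650,000 calls (50,000 a week for 13 weeks).

```python
def attacks_per_million(confirmed_cases_with_call: int, calls_handled: int) -> float:
    """Successful-attack rate per million calls handled.

    confirmed_cases_with_call: confirmed fraud cases whose attack chain
    included a contact-centre call, whether or not the call was flagged.
    """
    if calls_handled <= 0:
        raise ValueError("calls_handled must be positive")
    return confirmed_cases_with_call * 1_000_000 / calls_handled


print(attacks_per_million(13, 650_000))  # invented example figures
```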

False-positive rate by customer segment. Aggregate false positives lie. You'll see a healthy-looking 1.8% aggregate figure that masks an 8% false-positive rate among elderly callers and a 0.3% rate among under-30s. The 8% figure is where complaints come from, and it's where regulators will look first if you ever face a Consumer Duty challenge. Slice the false-positive rate by age band, region, and time of day before you trust it.
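Slicing the rate is a small amount of code once calls are labelled. The Python sketch below assumes each call record carries a segment label, whether it was flagged, and whether it was later confirmed as fraud; that record shape and the segment names are assumptions for the example.

```python
from collections import Counter


def false_positive_rates(calls):
    """calls: iterable of (segment, was_flagged, was_fraud) tuples.

    Returns {segment: false-positive rate} plus an "ALL" entry for the
    aggregate. Only legitimate (non-fraud) calls count towards the rate.
    """
    legit = Counter()
    false_pos = Counter()
    for segment, was_flagged, was_fraud in calls:
        if was_fraud:
            continue
        for key in (segment, "ALL"):
            legit[key] += 1
            if was_flagged:
                false_pos[key] += 1
    return {key: false_pos[key] / legit[key] for key in legit}


# Invented sample: 100 legitimate calls from over-65s (8 flagged),
# 900 legitimate calls from under-30s (9 flagged).
sample = [("65+", i < 8, False) for i in range(100)]
sample += [("under-30", i < 9, False) for i in range(900)]
rates = false_positive_rates(sample)
for segment, rate in rates.items():
    print(f"{segment}: {rate:.1%}")
```

In this invented sample the aggregate looks healthy while one segment is flagged at several times the rate of the other, which is exactly the pattern the aggregate figure hides.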

Time-to-score and time-to-action. The first is how long after call connection the risk score reaches the agent's screen. The second is how long it takes the agent to step through a hardened flow after receiving a flag. Both should be measured per agent and reviewed weekly. Agents who consistently take more than 60 seconds to act on a hard flag need either re-training or relief from queue pressure.
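A weekly time-to-action review could look something like the Python sketch below. Two assumptions to flag: we read "consistently" as "the agent's median delay exceeds the limit", and the event format (seconds into the call when the flag was shown and when the hardened flow started) is hypothetical.

```python
from collections import defaultdict
from statistics import median

ACTION_LIMIT_SECONDS = 60  # the limit used in the text


def slow_agents(events):
    """events: iterable of (agent_id, flag_shown_at, hardened_flow_started_at),
    with both times in seconds from call connection.

    Returns {agent_id: median delay} for agents whose median time-to-action
    exceeds ACTION_LIMIT_SECONDS.
    """
    delays = defaultdict(list)
    for agent_id, shown_at, started_at in events:
        delays[agent_id].append(started_at - shown_at)

    result = {}
    for agent_id, values in delays.items():
        typical = median(values)
        if typical > ACTION_LIMIT_SECONDS:
            result[agent_id] = typical
    return result


week = [
    ("a1", 10, 40), ("a1", 5, 50), ("a1", 20, 45),
    ("a2", 10, 100), ("a2", 0, 75), ("a2", 5, 70),
]
print(slow_agents(week))
```

Treat the output as a prompt for a conversation about queue pressure or training, not as a league table.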

Payment-layer containment ratio. Of the vishes that get past detection, what proportion result in zero card data exposure because the payment was masked anyway? In a well-built stack this should be 100% — every payment, flagged or not, should be captured in a way that doesn't expose the card to the agent or the recording. If it's below 100%, you've got a leak somewhere that doesn't depend on whether the call is fraudulent.

Vishing in voice channels other than phone

The word "vishing" originally meant voice phishing over the public phone network, but the threat surface has widened. Vishing-style attacks now arrive over WebRTC voice on a customer portal, over voice-enabled chat widgets, through embedded calling in mobile apps, and, increasingly, through outbound voicebot interactions where the fraudster impersonates the bot's customer to trick a downstream human escalation. The defensive principles transfer, but the implementation details differ in three important ways.

First, the audio quality is more variable. WebRTC voice from a customer's browser is compressed differently from PSTN audio, and the spectral artefacts your voice biometric model trained on may not be present. Some detection vendors handle this transparently; others quietly lose accuracy on web-originated voice and don't tell you. Ask specifically about WebRTC performance before relying on a system for browser-originated calls.

Second, the metadata is richer. A web call carries device fingerprint, IP, browser ID, session history, and previous click patterns — none of which a PSTN call gives you. That's a gift for behavioural analytics: a fraudster who's spent the last 10 minutes navigating the help centre in a suspicious pattern before clicking "call us" is a much easier flag than the same person hitting your inbound phone number cold. Make sure your fraud platform actually consumes that pre-call telemetry rather than treating the voice channel as a standalone event.

Third — and this catches teams off guard — the attack surface includes any agent-assisted channel where audio is exchanged. Video-call payments, screen-share sessions, embedded conferencing in customer support tools, all of them carry the same risk that a voice impersonation succeeds and a card gets read aloud. The same masking principle applies: route payment capture out of the agent-facing audio path, regardless of whether it's a phone call, a browser session, or a video meeting.

The honest limits of vishing detection

We'd be doing you a disservice if we ended this guide without the limits. No combination of these controls catches every attack. The state of the art for voice cloning will continue to outpace the state of the art for detection, with about a quarter's lag between them. Agents will continue to be human. Customers will continue to share information in ways that help attackers. That's why containment matters more than prevention. The contact centres that handle this well aren't the ones with the best detection — they're the ones whose worst-case loss when a vish succeeds is small enough that the loss doesn't make the news.

Get the payment layer right and the detection layer becomes a residual risk control rather than a single point of failure. Get it wrong and every detection improvement you buy will need to be perfect, because the consequence of one missed call is a chargeback queue.

One more honest note. Vishing detection is a fast-moving field, and any guide written today will be partially out of date in six months. The principles in this piece — layered defence, containment beats prevention, agent process is cheap and effective, payment masking removes the prize — hold up across model generations. The specific tools and the specific attack patterns will keep shifting. Build the programme around the principles, not the products, and you'll keep up.

Next steps

If you want to talk through your vishing detection stack and where Paytia's payment layer would sit alongside it, book a 20-minute call with our team — bring your current fraud-loss numbers and we'll show you what containment would change. If you'd rather see the payment-masking layer in action first, the live demo walks through DTMF masking with a real call. Either route, we'll be straight about what we do and don't fix.
