What is DTMF masking?

When a customer types card details on their phone keypad, every keypress generates a DTMF tone in the audio. DTMF masking replaces those tones with a flat sound in real time, before they reach your agent or your call recording. The card data goes straight from the customer's handset to Paytia and on to your payment gateway. You'll also hear it called DTMF suppression — it's the same thing.

What's the difference between DTMF masking and DTMF suppression?

Nothing — they're two names for the same technology. Vendors differ on which one they use in their marketing. We used to call it DTMF suppression ourselves; most of our customers search for DTMF masking, so that's what we lead with now. Both describe the same thing: intercepting the keypad tones in real time so they never reach your agent's audio or your call recording.

How is it different from Channel Separation?

Both keep card data out of your business and drop you to SAQ A. The difference is what the agent does. With DTMF masking the agent stays on the live audio throughout — they can talk the customer through the capture and pick up the conversation immediately afterwards. With Channel Separation the agent's audio path goes off-line during capture and voice prompts run the flow. Pick DTMF masking if you want the agent engaged through the payment step.

Does it work with my phone system?

Yes — modern CCaaS platforms (Genesys, Five9, Amazon Connect, NICE CXone, 8x8, RingCentral, Talkdesk), traditional PBX, and plain SIP/VoIP trunks. Integration is via API or SIP. Most setups are live within a week.

How does it reduce PCI DSS scope?

Card data never enters your network, your agents, or your call recording. Most businesses move from SAQ D (329 controls) to SAQ A (22 controls). The recording system stops being in scope because there's no card data in it to begin with.

Is agent training required?

A little — there's a one-click action per call. The agent enters the amount, presses one key to start the capture, then watches a progress indicator on screen until the payment authorises. That's the whole behaviour change; most teams pick it up inside a single shift. If you want zero agent training, Channel Separation is the variant to look at — the platform drives the capture automatically, and the agent does nothing during the payment step.

Can DTMF masking be used for MOTO payments?

Yes. It's built for card-not-present transactions over the phone — agent-assisted sales, mail-order, telephone-order, anywhere a customer would otherwise read card details over a call.

DTMF Masking Solution — How It Works & PCI DSS Impact

How DTMF masking actually works on a live call

Every keypress on a phone produces a pair of audio frequencies: one low, one high. That's where the “dual-tone multi-frequency” name comes from. Press “5” and the handset emits 770 Hz alongside 1336 Hz. The low group runs 697, 770, 852 and 941 Hz; the high group runs 1209, 1336, 1477 and 1633 Hz. Every key on the pad maps to a unique pair, and the two-tone design exists because a single tone could be imitated by a human voice or background noise, whereas a precise simultaneous pair almost never is. Telephone networks have decoded those tones into digits since the 1960s, which is why a card number can travel over a phone call at all. It's also why the agent can hear it, why the call recorder captures it, and why a sticky note left by the desk three hours later is still a card-data leak waiting to be audited.
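To make the frequency pairs concrete, here is a minimal Python sketch that builds the key-to-pair map from the two groups above, synthesises a keypress, and decodes it again with the Goertzel algorithm. The function names, the 8 kHz sample rate and the 50 ms tone length are assumptions chosen for illustration; this is not Paytia's detector, and a production decoder would add level thresholds and timing checks.

```python
import math

LOW_HZ = (697, 770, 852, 941)        # one frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)   # one frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Every key maps to a unique (low, high) pair.
KEY_TO_PAIR = {
    key: (LOW_HZ[r], HIGH_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}
PAIR_TO_KEY = {pair: key for key, pair in KEY_TO_PAIR.items()}


def synth_tone(key, sample_rate=8000, duration_s=0.05):
    """Return the samples of one keypress: two sine waves summed."""
    low, high = KEY_TO_PAIR[key]
    n = int(sample_rate * duration_s)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]


def goertzel_power(samples, freq, sample_rate=8000):
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest frequency in each group and look up the key.

    Illustration only: it always returns a key, even for silence or speech.
    """
    low = max(LOW_HZ, key=lambda f: goertzel_power(samples, f, sample_rate))
    high = max(HIGH_HZ, key=lambda f: goertzel_power(samples, f, sample_rate))
    return PAIR_TO_KEY[(low, high)]


print(KEY_TO_PAIR["5"])             # (770, 1336)
print(detect_key(synth_tone("5")))  # 5
```

The decoder only ever asks "which of four low tones, which of four high tones", which is why the digit is trivial to recover from any audio path that carries the original tones.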

We sit in the audio path between the customer and your agent. When the agent decides it's time to take payment, they enter the amount in our agent terminal and press one key — usually 729 — to start the capture. The customer hears a short voice prompt asking them to enter their card on their handset. From that moment, every DTMF tone the customer generates gets caught by our platform, decoded into the actual digit, and sent over an encrypted channel to your payment gateway. The agent's leg of the call gets a flat replacement tone instead of the real one — a soft, identical-sounding chirp that tells them “something was pressed” but reveals nothing about which digit. The recording layer captures that same flat tone. So does any QA platform that's pulling audio from the trunk.
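The split described above can be sketched as a small routing rule: once capture starts, each decoded digit goes one way and a flat replacement goes the other. This is a toy model under stated assumptions. The class name `MaskingSession`, the frame tuples and the `FLAT_TONE` marker are hypothetical, and real audio handling, encryption and the gateway call are left out.

```python
from dataclasses import dataclass, field

FLAT_TONE = "flat-tone"  # hypothetical stand-in for the single replacement sound


@dataclass
class MaskingSession:
    """Toy model of one call: routes customer frames during a capture."""
    capturing: bool = False
    captured_digits: list = field(default_factory=list)  # destined for the gateway only
    agent_leg: list = field(default_factory=list)         # what agent and recorder receive

    def start_capture(self):
        # Corresponds to the agent starting the payment step.
        self.capturing = True

    def on_customer_frame(self, frame):
        """frame is ("speech", payload) or ("dtmf", digit)."""
        kind, value = frame
        if kind == "dtmf" and self.capturing:
            self.captured_digits.append(value)           # digit leaves via the payment path
            self.agent_leg.append(("tone", FLAT_TONE))   # same sound whatever the digit
        else:
            self.agent_leg.append(frame)                 # speech passes through untouched


session = MaskingSession()
session.on_customer_frame(("speech", "Yes, I'd like to pay now"))
session.start_capture()
for digit in "41":
    session.on_customer_frame(("dtmf", digit))
session.on_customer_frame(("speech", "What was the total again?"))

print(session.agent_leg)
```

Because the recorder and any QA platform read from the same agent-side stream in this model, they see the flat marker too, never the digit.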

The voice channel stays open the whole time. If the customer asks “is this the right card?” or “what was the total again?” the agent answers normally — they just can't hear the digits being entered. The agent screen shows a masked indicator — **** **** **** 1234 once the full PAN, expiry and CVV are in — so they can see the capture is progressing without seeing the numbers themselves. When the gateway returns an authorisation, the agent sees the result on screen (paid, declined, or retry) and picks the conversation back up: “That's gone through, thanks — I'll email your receipt now.” The whole capture takes 20 to 30 seconds for most cards. There's no transfer to an IVR, no “please hold while I connect you to our payment system” — the call doesn't go anywhere.
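As an illustration of the masked indicator, the helper below renders a card number in the **** **** **** 1234 style. The function name and the length check are assumptions made for this sketch. In the flow described above, the agent's screen is only ever sent the masked form, so nothing like this would run on the agent's workstation with a real PAN.

```python
def masked_indicator(pan: str) -> str:
    """Render a card number with only the last four digits visible."""
    digits = [ch for ch in pan if ch.isdigit()]
    if len(digits) < 12:
        # Assumed lower bound for this sketch; refuse rather than reveal a short input.
        raise ValueError("too few digits to mask safely")
    masked = "*" * (len(digits) - 4) + "".join(digits[-4:])
    # Group into blocks of four for display.
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))


print(masked_indicator("4111 1111 1111 1234"))  # **** **** **** 1234
```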

The technical detail that matters here is where the interception happens. A naive implementation that detects tones and then replaces them can let a fraction of the first tone (a “bleed”) escape into the agent path before the replacement kicks in. The PCI Security Standards Council's telephone-payments guidance flags DTMF bleed specifically. Our implementation clamps the audio stream at both ends so that no tone can leak under any timing condition, which is the difference between a control that passes on audit day and one that holds up when a QSA actually starts pulling spectrograms.
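To show why bleed happens, the sketch below compares a detect-then-replace masker with one that holds frames back until the detector has made up its mind. It is a generic look-ahead technique shown for illustration, not a description of Paytia's implementation. The frame labels, the `latency` parameter (frames the detector needs to confirm a tone) and both function names are assumptions.

```python
from collections import deque

FLAT = "flat"


def naive_mask(frames, latency=2):
    """Replace a tone only once the detector has confirmed it: early frames leak."""
    out, run = [], 0
    for frame in frames:
        run = run + 1 if frame.startswith("tone") else 0
        out.append(FLAT if run >= latency else frame)
    return out


def lookahead_mask(frames, latency=2):
    """Hold back latency - 1 frames so a confirmed tone can be flattened from its start."""
    held, out, run = deque(), [], 0
    for frame in frames:
        run = run + 1 if frame.startswith("tone") else 0
        held.append(frame)
        if run >= latency:
            # Tone confirmed: flatten every frame of this run that is still held.
            for i in range(len(held) - min(run, len(held)), len(held)):
                held[i] = FLAT
        # Release only frames older than the confirmation window.
        while len(held) > latency - 1:
            out.append(held.popleft())
    out.extend(held)  # flush at end of stream
    return out


call = ["speech", "tone:5", "tone:5", "tone:5", "speech"]
print(naive_mask(call))      # ['speech', 'tone:5', 'flat', 'flat', 'speech']
print(lookahead_mask(call))  # ['speech', 'flat', 'flat', 'flat', 'speech']
```

The look-ahead version trades a small fixed delay on the agent leg for the guarantee that the first frame of a confirmed tone never gets out. The sketch only covers tones lasting at least `latency` frames.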

A couple of details people ask about. The card number never sits in our systems beyond the few hundred milliseconds it takes to forward it to your gateway: we're a transmission point, not a vault. If you want to take repeat payments from the same customer later, the gateway tokenises the card and gives you a token to keep; the original PAN still never lives in your environment. And because we're intercepting at the audio layer rather than relying on agent behaviour, there's no “the agent forgot to start it” failure mode. If the agent skipped the capture and asked the customer to read the digits aloud the old way, you'd hear that on your recording. But the moment they press 729, masking is on, and there's nothing for the agent to remember to do correctly.

What this does to your PCI DSS scope

PCI DSS scope is determined by where cardholder data flows. Anywhere card data is stored, processed, or transmitted is in scope, and every system in scope has to meet the standard's controls. Without DTMF masking, a typical contact-centre payment puts a lot of systems in the data path: the agent's headset and workstation, the agent's desktop applications, the call recording server, the QA platform that ingests recordings, the WFM tools that touch them, the network segments connecting all of those, the firewalls in front, and the directory services authenticating the agents. That's the SAQ D scenario — 329 controls covering everything from network segmentation to vulnerability scans to formal incident-response plans, audited annually.

With masking on, card data never reaches any of those systems. The customer's handset talks to our PCI DSS Level 1 platform, and we talk to your gateway. That's the cardholder data environment, and you're not in it. The recording stops being in scope because there are no card digits in the audio. The agent workstation stops being in scope because no card data reaches it. Your CRM stops being in scope because the agent never has digits to paste. Most of your network drops out for the same reason. You move to SAQ A, which is 22 controls, most of them about picking a certified service provider, validating their attestation, and keeping your own organisational policies in order. It's the same SAQ used by businesses that take payment only via a hosted iFrame on their website. The PCI Security Standards Council's information supplement Protecting Telephone-Based Payment Card Data recognises DTMF masking as a method that takes the telephony environment, the agent environment, and the CRM out of scope, and that's the entire commercial logic of the control.

There's a quieter version of the same risk that organisations underweight: the rogue agent. Most contact-centre fraud doesn't come from external attackers — it comes from someone on the inside who can hear or see card digits and chooses to write them down. Once the data is in the agent's environment, no audit log will tell you a number went into a notebook. The only reliable defence is making sure the data never reaches the agent in the first place, which is exactly what masking does. You're not asking your staff to behave well around card data — you're making it impossible for them to misbehave because the data isn't there.

A few things stay in scope and it's worth being honest about which. You're still responsible for the systems that handle the result of the payment: order numbers, refund records, the customer record where you log “paid £49.99 on 12 March.” That's not card data, but if you store the auth code, last 4 digits, or anything else returned from the gateway, those records still need basic protection. You're also still responsible for vetting Paytia: pulling our Attestation of Compliance every year, checking we're still on the PCI Council's registered service-provider list, and including the validation in your own SAQ A submission. That's real work but it's an afternoon, not a quarter.

The numbers people care about: most of our customers report PCI compliance costs falling 80–90% in the first year, the audit itself going from a multi-week QSA engagement to a self-assessment, and recording retention becoming a non-issue. There's also a category of cost that's harder to quantify — the agent training, the “don't write that down” posters, the periodic spot-checks of the contact-centre floor — that simply stops being needed. If there's no card data for an agent to mishandle, you don't need a programme to stop them mishandling it.

What changed under PCI DSS 4.0.1

PCI DSS 4.0.1 has been mandatory since 31 March 2025. Two of the changes hit telephone payments directly. The first is the treatment of call recording: any recording that captures sensitive authentication data after authorisation completes is now an explicit control failure. Pause-and-resume was already a weak control under earlier versions — under 4.0.1, an environment that depends on it has very little rope when an auditor finds the inevitable handful of recordings where the pause didn't fire. The reliable answer is to ensure the data isn't in the audio at all, which is what DTMF masking does at an architectural level rather than a behavioural one.

The second is segmentation. 4.0.1 expects clearer documented separation between in-scope and out-of-scope systems, and a properly descoped contact centre is structurally easier to defend in this respect — card data never crosses the boundary, so the boundary is straightforward to evidence. UK contact centres also have to square this with the FCA's SYSC 10A taping rules, which require recording of regulated client calls. The two obligations collide every time a customer reads digits aloud or presses them on an unmasked keypad. Masking is the architecture that lets you satisfy both — your recording stays complete and continuous, with the digits simply absent from the audio.

Why customers actually feel the difference

The compliance argument is the one that wins budget. The customer-experience argument is the one that turns a payment line from a friction point into a revenue channel. Both matter, and one of them is harder to measure than the other.

Customers know when an agent can hear them entering card details. They feel it. We've had customers tell us they used to go outside, close their office door, or wait until they got home before making a payment call — because reading their card aloud or hearing the tones echo across the line in a quiet office felt like undressing in public. Some would avoid the phone payment entirely and ask for an invoice, adding days or weeks to the cycle while you chased a payment that should have closed on the call. That friction translates directly into lost or delayed revenue, and it's invisible if you only look at conversion at the call level — the customer didn't abandon, they just asked to pay later, and a third of them never come back to it.

Open-plan offices are where this bites hardest. A finance manager paying a supplier from a shared workspace doesn't want everyone within three desks to hear the keypad tones — even if the actual digits can't be reconstructed by ear, the discomfort is enough to make them stop the call and pay another way. Public spaces present the same problem, sharper. A customer on a train platform, in a hospital waiting room, or sitting in a coffee shop doesn't have the option of finding a private room. With masking on, they enter their card silently while the agent stays on the line to confirm the amount and answer questions — and the awkwardness simply isn't in the call any more.

Remote and hybrid working sharpened the same edge from the agent's side. Since 2020, a meaningful proportion of contact-centre agents take payments from spare bedrooms and kitchen tables, with partners, children and housemates in earshot. Without masking, you're trusting every one of those agents to maintain physical security in a home they don't control. With masking, card data never reaches the agent's location regardless of where they happen to be sitting — the privacy protection is consistent across head-office desks, home studies and co-working spaces. Agents tell us they find this easier too: they're no longer worrying about whether they accidentally wrote down a digit, whether the recording captured something it shouldn't have, or whether they'll be blamed if a customer's card is later compromised. That reduction in anxiety has a real effect on staff retention in a sector where turnover is already painful.

If you want a single metric to track, look at completion rate at the point of payment: the number of calls where payment is actually captured before the customer hangs up, as a percentage of the calls where payment is offered. Most customers we work with see this lift after masking goes live. The other useful number is time-to-payment: the days between a service being agreed and the cash arriving. If that drops from weeks to zero because the payment closes on the call, the privacy advantage is doing what it's supposed to.
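Both metrics are simple arithmetic, sketched below so the definitions are unambiguous. The function names and the sample figures are made up for illustration; they are not customer data.

```python
from datetime import date


def completion_rate(calls_offered: int, calls_captured: int) -> float:
    """Share of calls where payment was offered and actually captured."""
    if calls_offered <= 0:
        raise ValueError("no calls where payment was offered")
    return calls_captured / calls_offered


def days_to_payment(agreed_on: date, paid_on: date) -> int:
    """Days between the service being agreed and the cash arriving."""
    return (paid_on - agreed_on).days


# Hypothetical figures: 200 calls offered payment, 150 captured on the call.
print(f"{completion_rate(200, 150):.0%}")                      # 75%
print(days_to_payment(date(2025, 3, 1), date(2025, 3, 15)))   # 14
print(days_to_payment(date(2025, 3, 1), date(2025, 3, 1)))    # 0, paid on the call
```

Tracking both before and after go-live, over the same call types, keeps the comparison fair.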

Where it fits in your stack

We don't rip and replace. You keep your CCaaS, your gateway, your CRM, your QA platform, your headsets. We slot into the audio path and change one thing — what reaches the agent during the seconds the customer is keying digits. Everything else carries on as before.

On modern CCaaS — Genesys, Five9, Amazon Connect, NICE CXone, 8x8, RingCentral, Talkdesk, 3CX — we connect via API or SIP and the integration is a few hours of work on our side and a sign-off on yours. Traditional PBXes (Mitel, Avaya, Cisco) and plain SIP trunks are the same shape with slightly more co-ordination on routing. Your gateway can be Stripe, Worldpay, Adyen, Opayo, NMI, Trust Payments, Braintree, Cybersource or any of the long tail — if it has an API, we talk to it. We support seven integration patterns in total: network-level divert, PBX-level divert, IVR menu, agent transfer or conference, PSTN outbound via Paytia, Paytia WebRTC, and direct SIP trunk integration. Between them, that covers almost every contact-centre telephony stack we've walked into.

For the agent, the behaviour change is genuinely small: enter the amount, press 729, watch a progress indicator until you see the green light, carry on the conversation. A 15-minute team huddle covers it for most teams. The bigger piece of work is usually social: deciding how the agent introduces the capture (“I'm going to take your card details now. When you're ready, please enter your long card number on your phone keypad and I'll see when you're done”), making sure the script feels natural, and getting team leaders comfortable with the new audit trail. Edge cases that occasionally surface: old softphones with off-frequency DTMF (a five-minute job to widen the detection tolerance on our side), poor mobile signal degrading detection (the retry prompt handles it cleanly), a SIP carrier converting tones to RFC 2833 events (different handler, same outcome), or a recorder forking the stream at the SBC layer rather than the application layer (we just need to confirm masking sits upstream). None of these are projects.
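The RFC 2833 case is worth a short illustration, because there the digit travels as a data packet rather than as audio. The sketch below parses the four-byte telephone-event payload defined in RFC 2833 and its successor RFC 4733: an event code, an end flag, a volume field and a duration. The function name and the returned dictionary are assumptions for this example, and RTP header handling is left out.

```python
import struct

# Event codes 0-15 map to the sixteen DTMF keys (RFC 4733).
EVENT_TO_KEY = "0123456789*#ABCD"


def parse_telephone_event(payload: bytes) -> dict:
    """Decode one telephone-event payload (the RTP payload, not the RTP header)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload is 4 bytes")
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError("not a DTMF event code")
    return {
        "key": EVENT_TO_KEY[event],
        "end": bool(flags_volume & 0x80),      # E bit: final packet of this keypress
        "volume_dbm0": -(flags_volume & 0x3F), # 6-bit field, sign dropped on the wire
        "duration": duration,                  # in RTP timestamp units
    }


# Example payload: key "5", end bit set, volume -10 dBm0, duration 800 units.
print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

Since the digit sits in plain view in the event code, masking on such a trunk means intercepting these packets before they reach the agent leg or the recorder, rather than filtering audio.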

If you want to see how this lands in your specific environment, book a 15-minute demo and we'll map it against your stack. If you'd rather see it side-by-side against the alternative architecture, channel separation takes a different route to the same compliance outcome — the comparison page walks through where each one lands in practice.

Area	Without Paytia	With Paytia
Self-assessment	SAQ D (329 controls)	SAQ A (22 controls)
Network in scope	Most of your stack	None
Call recordings	Pause-and-resume or redact	No restrictions
Staff training	Mandatory and recurring	None required
