If your contact centre takes card payments by phone, the card number is the problem. The moment a customer reads it aloud, or even keys it in where your agent can hear the tones, that number is inside your business — on the call, in the recording, across your telephony. DTMF masking is the technique that stops that happening. Done properly, the digits never reach your agent, your recording, or anything else you own.
This guide explains what DTMF masking actually is, how it works at a technical level, how it compares to the older alternatives, and what it does to your PCI DSS scope. We've run it as a PCI DSS Level 1 service provider since 2016, so where it matters we'll tell you what works rather than what sells.
What DTMF masking is
DTMF stands for Dual-Tone Multi-Frequency — the tones your phone keypad makes when you press a key. Each digit is a distinct pair of frequencies, which means a number typed on a keypad can be decoded by anything listening to the audio. That's the whole problem with phone payments: if a customer types their card number and the tones travel down the same line as the conversation, your agent's phone, your call recorder, and your telephony platform can all reconstruct the card number.
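To make that concrete, here is a minimal Python sketch showing how little it takes to read digits back out of keypad audio. It synthesises the standard DTMF frequency pairs and recovers each key with the Goertzel algorithm, a common single-frequency detector. The 8 kHz sample rate, the 50 ms tone length and the function names are illustrative assumptions, not any particular product's code.

```python
import math

# Standard DTMF grid: each key is one low-group (row) tone plus one high-group (column) tone.
LOW_FREQS = [697, 770, 852, 941]
HIGH_FREQS = [1209, 1336, 1477, 1633]
KEYPAD = [
    ["1", "2", "3", "A"],
    ["4", "5", "6", "B"],
    ["7", "8", "9", "C"],
    ["*", "0", "#", "D"],
]
SAMPLE_RATE = 8000  # assumed narrowband telephony audio


def dtmf_tone(key, duration=0.05, rate=SAMPLE_RATE):
    """Synthesise the two-tone signal a keypad produces for `key`."""
    for r, row in enumerate(KEYPAD):
        if key in row:
            lo, hi = LOW_FREQS[r], HIGH_FREQS[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(duration * rate)
    return [
        0.5 * math.sin(2 * math.pi * lo * i / rate)
        + 0.5 * math.sin(2 * math.pi * hi * i / rate)
        for i in range(n)
    ]


def goertzel_power(samples, freq, rate=SAMPLE_RATE):
    """Signal power at a single frequency, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_key(samples, rate=SAMPLE_RATE):
    """Recover the key by picking the strongest low-group and high-group tone."""
    lo = max(range(4), key=lambda i: goertzel_power(samples, LOW_FREQS[i], rate))
    hi = max(range(4), key=lambda i: goertzel_power(samples, HIGH_FREQS[i], rate))
    return KEYPAD[lo][hi]


card = "4111111111111111"  # a well-known test card number, not a real card
heard = "".join(decode_key(dtmf_tone(d)) for d in card)
print(heard)  # 4111111111111111
```

Nothing here needs access to the payment system: any component that receives the audio, whether a phone system, a recorder or a desktop client, could run the same few lines.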
DTMF masking removes those tones from the parts of the call you can hear and keep. The customer types their card number on their own keypad while staying on the line with your agent. The digits are intercepted and sent straight into a secure payment environment; what's left on the agent's leg of the call is either silence or a flat, meaningless monotone. The agent keeps talking the customer through the payment — they just never receive the number.
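The masking step itself can be sketched as a filter on the agent's leg of the call. This is a minimal illustration under stated assumptions: audio arrives in short frames, a tone detector is supplied as a function, and a plain list stands in for the secure payment environment. `MaskingFilter`, `detect` and `secure_sink` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Frame = Tuple[float, ...]  # one short chunk of audio samples


@dataclass
class MaskingFilter:
    """Hypothetical in-line filter: key presses go to a secure sink, silence goes to the agent."""

    detect: Callable[[Frame], Optional[str]]  # returns the key heard in a frame, or None
    secure_sink: List[str] = field(default_factory=list)  # stand-in for the payment environment
    _last_key: Optional[str] = field(default=None, init=False)

    def process(self, frame: Frame) -> Frame:
        """Return the frame to forward to the agent's leg of the call."""
        key = self.detect(frame)
        if key is None:
            self._last_key = None  # no tone: the next tone counts as a new press
            return frame  # ordinary speech passes through untouched
        if key != self._last_key:
            self.secure_sink.append(key)  # record each press once, however many frames it spans
        self._last_key = key
        return (0.0,) * len(frame)  # the agent hears silence instead of the tone


# Toy demo: frames are tiny tuples and a dict lookup stands in for a real detector.
speech = (0.1, -0.2, 0.3)
tone_4 = (0.9, -0.9, 0.9)  # pretend this frame carries the '4' tone
tone_1 = (0.8, -0.8, 0.8)  # pretend this frame carries the '1' tone
filt = MaskingFilter(detect={tone_4: "4", tone_1: "1"}.get)

call = [speech, tone_4, tone_4, speech, tone_1, speech, tone_1, tone_1, speech]
agent_leg = [filt.process(frame) for frame in call]
print(filt.secure_sink)  # ['4', '1', '1']
print(agent_leg[1])      # (0.0, 0.0, 0.0)
```

The `_last_key` check matters because one key press usually spans several frames. Without it, a single press would be recorded several times. The sketch assumes there is at least one tone-free frame between presses, and it returns silence where a real system might emit a flat monotone instead.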
How it works, and why where it happens matters
Not all DTMF masking is equal, and the difference comes down to one question: where in the call are the tones intercepted?
Some approaches mask the tones at the agent's desktop — a piece of software on the agent's machine flattens the digits before they're played to the agent. That protects the agent's ears, but the tones have already travelled through your telephony to reach the desktop. Your phone system, your SIP trunk, and anything tapping that audio have already seen the card number. Your telephony stack is still in scope.
The stronger approach intercepts at the SIP/media layer — out on the network, before the audio ever reaches anything you run. We sit between your customer and your telephony stack and pull the card-entry tones out there. They never reach your phone system, your recorder, your CRM, or your agent. There's no software on the agent's machine and nothing for them to do differently. The card data goes directly into our PCI DSS Level 1 environment and is passed to your payment gateway; your business never touches it.
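The argument in the last two paragraphs comes down to which components receive the tones before they are removed. A toy Python model makes that explicit. The component list is a hypothetical, simplified call path: a real recorder usually taps the audio rather than sitting in line. Treat it as an illustration of the reasoning, not a map of any real deployment.

```python
from typing import List, Tuple

# Hypothetical, simplified call path from customer to agent: (component, who runs it).
CALL_PATH: List[Tuple[str, str]] = [
    ("carrier network", "carrier"),
    ("SIP trunk", "you"),
    ("phone system", "you"),
    ("call recorder", "you"),
    ("agent desktop", "you"),
    ("agent", "you"),
]


def your_exposed_components(intercept_before: str) -> List[str]:
    """Your components that receive the tones if they are removed just before `intercept_before`."""
    names = [name for name, _ in CALL_PATH]
    cut = names.index(intercept_before)
    return [name for name, owner in CALL_PATH[:cut] if owner == "you"]


print(your_exposed_components("agent"))      # desktop masking
# ['SIP trunk', 'phone system', 'call recorder', 'agent desktop']
print(your_exposed_components("SIP trunk"))  # interception on the network
# []
```

Moving the interception point upstream is the only change between the two calls, yet it empties the list of your systems that ever see a digit.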
That distinction isn't academic. It's the difference between masking what the agent hears and removing the card data from your environment entirely — and only the second one meaningfully cuts your PCI scope.
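Intercepting out on the network is easiest to picture when digits travel out of band. On many SIP calls, keypad presses are sent as RTP telephone-event packets defined in RFC 4733 (the successor to RFC 2833) rather than as tones in the audio. Assuming that transport, a minimal Python sketch of the digit-capture step might look like this. `DigitCollector` is a hypothetical name, and RTP header parsing, timestamp wraparound and in-band tone handling are deliberately left out.

```python
import struct
from typing import Dict, List, Tuple

# RFC 4733 event codes 0-15 map to the sixteen DTMF keys.
EVENT_KEYS = "0123456789*#ABCD"


def parse_telephone_event(payload: bytes) -> Tuple[str, bool, int]:
    """Decode the 4-byte RFC 4733 payload into (key, end_of_event, duration)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_KEYS):
        raise ValueError(f"not a DTMF event: {event}")
    end = bool(flags & 0x80)  # E bit; bit 6 is reserved and the low 6 bits are volume
    return EVENT_KEYS[event], end, duration


class DigitCollector:
    """Hypothetical interceptor state: collect each digit exactly once per key press."""

    def __init__(self) -> None:
        self.digits: List[str] = []
        self._seen: Dict[int, str] = {}  # RTP timestamp -> key already recorded

    def on_packet(self, rtp_timestamp: int, payload: bytes) -> None:
        key, end, _ = parse_telephone_event(payload)
        # All packets for one press share an RTP timestamp, and the final
        # end-of-event packet is normally retransmitted, so dedupe on the timestamp.
        if end and rtp_timestamp not in self._seen:
            self._seen[rtp_timestamp] = key
            self.digits.append(key)


def make_event(key: str, end: bool, duration: int, volume: int = 10) -> bytes:
    """Test helper: build a telephone-event payload for one key."""
    return struct.pack("!BBH", EVENT_KEYS.index(key), (0x80 if end else 0) | volume, duration)


collector = DigitCollector()
for ts, key in [(1000, "4"), (2600, "1")]:
    collector.on_packet(ts, make_event(key, False, 160))
    collector.on_packet(ts, make_event(key, False, 320))
    for _ in range(3):  # the end packet arrives several times
        collector.on_packet(ts, make_event(key, True, 480))
print("".join(collector.digits))  # 41
```

Because the digits are read from the packets themselves, the interceptor can simply not forward those packets towards your telephony, which is what keeps them out of everything downstream.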
DTMF masking vs pause-and-resume vs secure speech
Three approaches get talked about in the same breath. They're not equivalent.
Pause-and-resume is the oldest. The call recording is paused while the customer reads their card details aloud, then resumed afterwards. It's better than nothing, but it only protects the recording. The customer still reads the number out loud, the agent still hears it, and the audio still passes through your telephony, so your agent, your phone system, and your live call are all still handling card data. The pause itself also flags exactly where in the recording the sensitive moment was. It treats one symptom and leaves the disease untouched.
Secure speech (speech-recognition capture) has the customer speak the digits, which are transcribed by an automated system rather than typed. It can keep the agent out of the loop, but accents, background noise and misrecognition make it less reliable for something as unforgiving as a 16-digit card number, and, depending on the design, the spoken number may still travel through audio your agent can hear.
DTMF masking, intercepted at the network layer, is the cleanest of the three: the customer enters the number themselves on a keypad they already know how to use, the tones are removed before they reach anything you own, and the agent stays on the line throughout. For most contact centres it's the right default. Where it gets interesting is the choice between keeping the agent on the live audio (DTMF suppression) and splitting the call so the agent's audio drops out entirely during capture (channel separation) — we cover that trade-off in detail in DTMF masking vs channel separation.
What it does to your PCI DSS scope
This is the part that actually changes your compliance position. PCI DSS scope covers every system, network and person that stores, processes or transmits cardholder data, plus the systems connected to them. The more of your business that handles the card number, the longer your Self-Assessment Questionnaire and the more controls you have to evidence, patch, scan and document.
When card data never enters your environment — because it was intercepted at the network layer and delivered straight to a Level 1 service provider — the systems and people that used to be in scope come out of it. In practical terms that typically moves you from SAQ D, the full questionnaire with 329 requirements, to SAQ A, the shortest at 22. That's not a discount on the same work; it's a different, far smaller amount of work, because most of your environment is no longer in scope at all. You can read how this maps onto the current standard in how we handle PCI DSS v4, and there's a plain-English primer on the standard itself in our PCI DSS glossary entry.
How it behaves with your contact centre platform
Because the interception happens at the SIP/media layer rather than on the agent's desktop, DTMF masking doesn't care much what's running on top of it. It works with a traditional landline or PBX, with VoIP, and with cloud contact-centre platforms — Five9, Genesys, Amazon Connect, NICE and the like — because it sits on the call path, not inside the platform. There's no agent software to roll out and no change to how agents log in or handle calls. Most customers go live within a week with nothing installed on their side.
If you run a specific platform and want to see the shape of a real deployment, our Aircall DTMF masking setup guide walks through one end to end. For the product detail and pricing, see DTMF masking, and for a shorter primer there's what is DTMF.
Where DTMF masking fits
DTMF masking earns its place anywhere a person is on a live call when a card payment happens — inbound order lines, outbound collections, account top-ups, renewals. It's the control that lets a contact centre keep the human conversation while taking the card data out of the room. If your agents currently hear card numbers, pause recordings by hand, or work to a locked-down clean-room setup that's painful to run, this is the architectural fix that makes all three problems go away at once. It's one of the building blocks behind every channel we secure for contact centres.




