What is SIP (Session Initiation Protocol)?

Session Initiation Protocol (SIP) is the signalling protocol that sets up, manages, and tears down voice, video, and messaging sessions over IP networks. Standardised in RFC 3261 in 2002, it's the layer that makes a modern VoIP call possible — when one phone dials another over the internet, SIP is what handles the ring, the answer, and the hang-up. It doesn't carry the audio itself; it negotiates the connection and then media (typically RTP) flows separately. SIP runs on port 5060 unencrypted, or 5061 over TLS — that secured variant is called SIPS.

SIP, sometimes called SIP signalling, is the control language of internet telephony. Think of it like the instructions on an envelope rather than the letter inside — it tells the network who's calling whom, what kind of session they want, and how to find each other, but the actual conversation rides on a separate stream. That separation is why SIP is so flexible: the same protocol can set up a phone call, a video conference, a chat session, or a screen share, and the media format is negotiated on the fly.

How SIP actually works on a call

When you pick up a SIP phone and dial, your handset sends an INVITE message to a SIP server, which routes it to the recipient. Responses come back along the same path: first a 100 Trying (usually from the server, to confirm the INVITE is being processed), then 180 Ringing while the recipient's phone rings, then 200 OK once it's answered. Your phone sends an ACK to confirm, and at that point the signalling has done its job. The INVITE and the 200 OK each carry a session description (SDP), so the two endpoints now know each other's media IP address, which audio codec they've agreed on, and which port to send the audio to. Voice packets start flowing as RTP, completely separate from the SIP messages.
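To make the "IP address, codec, and port" part concrete, here is a minimal Python sketch of what an endpoint can read out of the session description (SDP) carried in an INVITE or 200 OK. The address, port, and codec values are invented for illustration, and parse_sdp_audio is a hypothetical helper rather than part of any SIP library. It handles a single audio stream only.

```python
# Illustrative SDP body; every value here is made up for the example.
EXAMPLE_SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 8 101",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:101 telephone-event/8000",
]) + "\r\n"


def parse_sdp_audio(sdp: str) -> dict:
    """Pull the media address, audio port, transport and codec map out of an SDP body."""
    info = {"address": None, "port": None, "transport": None, "codecs": {}}
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            info["address"] = line.split()[2]
        elif line.startswith("m=audio"):
            # m=<media> <port> <proto> <payload types...>
            _media, port, transport, *_formats = line[2:].split()
            info["port"] = int(port)
            info["transport"] = transport
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            info["codecs"][int(payload_type)] = encoding
    return info


audio = parse_sdp_audio(EXAMPLE_SDP)
print(audio)
```

In this example the far end would send its RTP audio to 192.0.2.10 on port 49170, using G.711 A-law (PCMA) for voice and payload type 101 for keypad events.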

Hanging up is just as structured: one side sends a BYE, the other replies 200 OK, and the session ends. This whole conversation (INVITE, Ringing, OK, ACK, BYE) is what SIP does. Everything else (the actual voice quality, the call recording, and in most deployments the DTMF tones the caller presses) happens in the media stream rather than in the signalling.
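The call flow described above can be sketched as a tiny state machine, seen from the caller's side. This is a deliberately simplified illustration: the state names and the run_call helper are made up for this example, and the real transaction rules in RFC 3261 also cover timers, retransmissions, CANCEL, and error responses.

```python
# Allowed steps for the caller's side of a basic, successful call.
# Keys are (current state, event); values are the next state.
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 100 Trying"): "calling",
    ("calling", "recv 180 Ringing"): "ringing",
    ("calling", "recv 200 OK"): "answered",
    ("ringing", "recv 180 Ringing"): "ringing",
    ("ringing", "recv 200 OK"): "answered",
    ("answered", "send ACK"): "in_call",
    ("in_call", "send BYE"): "ending",
    ("ending", "recv 200 OK"): "ended",
}


def run_call(events):
    """Walk through a list of events and return the final state."""
    state = "idle"
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


BASIC_CALL = [
    "send INVITE",
    "recv 100 Trying",
    "recv 180 Ringing",
    "recv 200 OK",
    "send ACK",
    "send BYE",
    "recv 200 OK",
]
final_state = run_call(BASIC_CALL)
print(final_state)
```

Feeding it an out-of-order sequence, such as a BYE before any INVITE, raises a ValueError, which is roughly how a real SIP stack treats a message that doesn't fit the current dialog.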

Why SIP matters for contact centres

BT is switching off the UK's analogue and ISDN networks by January 2027, which means every contact centre still on legacy lines has to migrate. Almost all of them are migrating to SIP — usually via SIP trunking, where a provider delivers virtual phone lines over your internet connection instead of physical copper. A modern cloud contact centre is essentially a SIP-aware platform with agent software, queueing, and reporting built on top.

That matters for payments because DTMF tones (the beeps from the caller's keypad) can travel three ways on a SIP call: in-band, as actual audio tones inside the RTP stream; as RFC 4733 named telephone events, which also ride in the RTP stream but as dedicated event packets negotiated during SIP call setup; or out-of-band, as SIP INFO messages in the signalling itself. Knowing which method a contact centre uses is the first question any DTMF-masking solution has to answer, because the masking technique is different for each.
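For a sense of what an RFC 4733 named telephone event looks like on the wire, here is a short Python sketch that decodes the four-byte event payload carried inside an RTP packet. The field layout follows RFC 4733; the function name, the returned dictionary, and the sample bytes are assumptions made for this example.

```python
import struct

# RFC 4733 event codes 0-15 map to the DTMF keys in this order.
EVENT_TO_KEY = "0123456789*#ABCD"


def decode_telephone_event(payload: bytes) -> dict:
    """Decode the 4-byte telephone-event payload: event, E/R/volume, duration."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return {
        # Event codes above 15 are not DTMF keys, so report None for them.
        "key": EVENT_TO_KEY[event] if event < len(EVENT_TO_KEY) else None,
        # The top bit of the second byte marks the end of the key press.
        "end": bool(flags & 0x80),
        # The low six bits carry the tone power level (0-63).
        "volume": flags & 0x3F,
        # Duration is counted in RTP timestamp units (8000 per second here).
        "duration": duration,
    }


# Made-up sample: key "5", end bit set, volume 10, duration 800 units (100 ms at 8 kHz).
sample = bytes([5, 0x8A, 0x03, 0x20])
decoded = decode_telephone_event(sample)
print(decoded)
```

Because the digit sits in a plain integer field rather than in audio, a masking system working on this method rewrites or withholds event packets instead of filtering tones out of the sound.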

SIP, SBCs, and security

Most enterprise SIP deployments sit behind a Session Border Controller, which acts as a firewall and traffic manager for SIP traffic. SBCs handle the messy business of NAT traversal, codec translation, and protecting the SIP infrastructure from attacks like registration floods or toll fraud. They're also frequently where DTMF masking happens at the network layer — the SBC intercepts the RTP stream during card capture, replaces the keypad tones with a flat tone for the agent and the call recorder, and forwards the real digits to the payment gateway.

Securing SIP itself means using SIPS (SIP over TLS) for the signalling and SRTP for the media. Without those, a determined attacker on the network can read the signalling messages and even reconstruct the audio. For phone payments that obviously isn't acceptable, so encrypted SIP is the baseline expectation in any PCI-relevant deployment.
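As a rough illustration, the Python sketch below checks three fields from a call setup for signs of encryption: the request URI scheme, the transport named in a Via header, and the protocol on the SDP media line. The function name and the sample values are hypothetical, and the check is only a hint: a Via header describes a single hop, and an SRTP profile in the SDP says nothing about how the keys were exchanged.

```python
def assess_call_security(request_uri: str, via_transport: str, sdp_media_line: str) -> dict:
    """Report whether signalling and media look encrypted, based on three fields."""
    # A sips: URI or a TLS transport in the Via header points to TLS signalling.
    signalling = request_uri.lower().startswith("sips:") or via_transport.upper() == "TLS"
    # SDP media line format: m=<media> <port> <proto> <payload types...>
    proto = sdp_media_line.split()[2].upper()
    # SRTP profiles include RTP/SAVP, RTP/SAVPF and UDP/TLS/RTP/SAVP(F);
    # plain RTP uses RTP/AVP or RTP/AVPF.
    media = "SAVP" in proto
    return {
        "signalling_encrypted": signalling,
        "media_encrypted": media,
        "both_encrypted": signalling and media,
    }


secure = assess_call_security("sips:bob@example.com", "TLS", "m=audio 49170 RTP/SAVP 8 101")
plain = assess_call_security("sip:bob@example.com", "UDP", "m=audio 49170 RTP/AVP 8 101")
print(secure)
print(plain)
```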

SIP versus the old PSTN world

The world SIP is replacing, the legacy PSTN with its ISDN circuits and telco-operated signalling (Q.931 on ISDN lines, SS7 between exchanges), was a closed system controlled by telcos. SIP is an open IETF standard, which is why it spread so fast. Any phone, any softphone, any call-centre platform can speak SIP, and they can all interoperate. The trade-off is complexity: SIP has more than a dozen request methods, dozens of response codes, a long list of header fields, and an even longer list of optional extensions (RFC 3261 alone is 269 pages). Most of that complexity is hidden from end users, but if you're integrating a payment platform with a SIP-based contact centre, you'll spend a lot of time reading RFCs.

How Paytia Uses This

Paytia plugs into the SIP layer of your contact-centre platform to deliver DTMF masking without changing how your agents work. When a caller's about to key in their card details, the agent triggers a secure capture session. At that point we intercept the RTP stream carrying the keypad tones, mask them as a flat beep for the agent and the call recorder, and route the real digits straight to the payment gateway over an encrypted channel.

Because the integration happens at the SIP and media layer, we don't care whether you're running a cloud contact centre, an on-premise IP-PBX, or a hybrid. We support SIP trunking, SBCs, and the major contact-centre platforms, and we handle both in-band and RFC 4733 DTMF. The result is the same regardless of the underlying setup: card data never reaches the agent's headset or the call-recording system, which takes most contact centres from PCI DSS SAQ D to SAQ A.

Frequently Asked Questions

Is SIP the same thing as VoIP?

No. VoIP is the broad category of voice-over-internet calling, while SIP is one specific signalling protocol used to make VoIP work. There are other VoIP signalling protocols (H.323, IAX2, proprietary systems like Skype's), but SIP has won the open-standards race for business telephony.

What's the difference between SIP and RTP?

SIP handles the signalling — setting up the call, ringing, hanging up. RTP carries the actual audio packets once the call is connected. They run as separate streams on different ports, which is why you can do things like mask DTMF tones in the RTP stream without touching the SIP signalling.
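To show how little RTP has in common with SIP's text messages, here is a small Python sketch that unpacks the fixed 12-byte binary header at the start of every RTP packet. The field layout comes from RFC 3550; the function name and the sample packet (payload type 101, sequence number 1234) are invented for this example.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the fixed 12-byte RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("RTP packet must be at least 12 bytes")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": first >> 6,            # always 2 for current RTP
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,    # maps to a codec agreed during call setup
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,                     # identifies the sending stream
    }


# Made-up packet: version 2, payload type 101, followed by a 4-byte payload.
sample_packet = struct.pack("!BBHII", 0x80, 101, 1234, 160000, 0xDEADBEEF) + b"\x05\x8a\x03\x20"
header = parse_rtp_header(sample_packet)
print(header)
```

The payload type is the link back to the signalling: it only means something because the two endpoints agreed during call setup which codec or event format each number stands for.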

Is SIP secure?

SIP itself sends signalling in plain text by default, which is why SIPS (SIP over TLS, port 5061) and SRTP for the media exist. Any modern contact-centre deployment handling card payments should be using both — unencrypted SIP on a network an attacker can reach is a serious problem.

What ports does SIP use?

SIP uses port 5060 for unencrypted signalling (over UDP or TCP) and port 5061 for TLS-encrypted signalling (SIPS). The RTP media streams use a separate range of UDP ports negotiated during the SIP handshake, typically somewhere between 10000 and 20000 depending on the platform.

How does SIP relate to DTMF and card payments?

On a SIP call, DTMF tones travel in one of three ways: in-band as audio inside the RTP stream, as RFC 4733 named events (dedicated event packets that also travel in the RTP stream), or out-of-band as SIP INFO messages in the signalling. DTMF-masking solutions like ours hook into the SIP and RTP layers to intercept those tones during card capture, so the keypad digits never reach the agent or the call recording.

See how Paytia handles Session Initiation Protocol (SIP)

Book a personalised demo and we'll show you how our platform works with your setup.

PCI DSS Level 1
Cyber Essentials Plus

Trusted by law firms, insurers, healthcare providers and regulated businesses worldwide.