- INTRODUCTION

H.323 and Session Initiation Protocol (SIP) are the two standardized protocols for the realization of Voice over Internet Protocol (VoIP).1,2 The multimedia conference protocol H.323 of the International Telecommunication Union (ITU) consists of multiple separate protocols, such as H.245 for control signaling and H.225 for call signaling. H.323 is difficult to implement because of its complexity and the bulk that it adds to the client application.3 In contrast, SIP is simpler than H.323 and leaner on the client side. SIP uses a human-readable text (ASCII) encoding instead of H.323’s binary signal coding.
Voice Over Internet Protocol Basics

SIP is the Internet Engineering Task Force (IETF) standard for multimedia communications in an IP network. It is an application layer control protocol used for creating, modifying, and terminating sessions between one or more SIP user agents (UAs). It was primarily designed to support user location discovery, user availability, user capabilities, and session setup and management. In SIP, the end devices are called UAs, and they send SIP requests and SIP responses to establish media sessions, send and receive media, and send other SIP messages (for example, to send short text messages to each other or to subscribe to an event notification service). A UA can be a SIP phone or SIP
client software running on a personal computer (PC) or personal digital assistant (PDA).

Typically, a collection of SIP UAs belongs to an administrative domain, which forms a SIP network. Each administrative domain has a SIP proxy, which is the point of contact for UAs within the domain and for UAs or SIP proxies outside the domain. All SIP signaling messages within a domain are routed through the domain’s own SIP proxy. SIP routing is performed using Uniform Resource Identifiers (URIs) for addressing UAs. Two types of SIP URIs are supported: the SIP URI and the TEL URI. A SIP URI begins with the keyword sip or sips, where sips indicates that the SIP signaling must be sent over a secure channel, such as TLS.4 The SIP URI is similar to an email address and contains a user’s identifier and the domain at which the user can be found. For example, it could contain a username such as sip:email@example.com, a global E.164 telephone number5 such as sip:11-972-310- firstname.lastname@example.org;user=phone, or an extension such as sip:email@example.com. The TEL URI only contains an E.164 telephone number and does not contain a domain name, for example, tel:+1.408.555.1234.

A SIP proxy server is an entity that receives SIP requests, performs various database lookups, and then forwards (“proxies”) the request to the next-hop proxy server. In this way, SIP messages are routed to their ultimate destination. Each proxy may perform some specialized function, such as external database lookups, authorization checks, and so on. Because the media does not flow through the SIP proxies (only SIP signaling does), SIP proxies are no longer needed after the call is established. In many SIP proxy designs, the proxies are stateless,
1. ITU-T Recommendation H.323, Packet-Based Multimedia Communications System, 1998. www.itu.int/rec/T-REC-H.323-200606-I/en.
2. J. Rosenberg, H. Schulzrinne, G. Camarillo, J. Peterson, R. Sparks, M. Handley, E. Schooler, SIP: Session Initiation Protocol, IETF RFC 3261, June 2002.
3. H. Schulzrinne, J. Rosenberg, A Comparison of SIP and H.323 for Internet telephony, in: Proceedings of NOSSDAV, Cambridge, UK, July 1998.
4. S. Fries, D. Ignjatic, On the applicability of various MIKEY modes and extensions, IETF Draft, March 31, 2008.
5. F. Audet, The use of the SIPS URI scheme in the Session Initiation Protocol (SIP), IETF Draft, February 23, 2008.
which allows alternative intermediate proxies to resume processing for a failed (or overloaded) proxy.

One type of SIP proxy, called a redirect server, receives a SIP request, performs a database query operation, and returns the lookup result to the requester (which is often another proxy). Another type of SIP proxy is a SIP registrar server, which receives and processes registration requests. Registration binds a SIP (or TEL) URI to the user’s device, which is how SIP messages are routed to a user agent. Multiple UAs may register the same URI, which causes incoming SIP requests to be routed to all of those UAs, a process termed forking, which raises some interesting security concerns.

The typical SIP transactions can be seen in the call flow of a SIP session setup, as shown in Fig. 60.1. The term SIP trapezoid is often used to describe this message flow, in which the SIP signaling is sent to SIP proxies and the media is sent directly between the two UAs. If Alice wants to initiate a session with Bob, she sends an initial SIP message (INVITE) to the local proxy for her domain (Atlanta.com). Her INVITE has Bob’s URI (bob@biloxi.com) as the Request-URI, which is used to route the message. Upon receiving the initial message from Alice, her domain’s proxy sends a provisional 100 Trying message to Alice, which indicates that the message was received without error. The Atlanta.com proxy looks at
the SIP Request-URI in the message and decides to route the message to the Biloxi.com proxy. The Biloxi.com proxy receives the message and delivers the INVITE to Bob’s SIP phone to alert Bob of an incoming call. Bob’s SIP phone sends a provisional 180 Ringing message, which is routed all the way back to Alice; this causes Alice’s phone to generate a ringback tone, audible to Alice. When Bob answers his phone, a 200 OK message is sent to his proxy, and Bob can immediately start sending media (“Hello?”) to Alice. Meanwhile, Bob’s 200 OK is routed from his proxy to Alice’s proxy and finally to Alice’s UA. Alice’s UA responds with an ACK message to Bob, and then Alice can begin sending media (audio and/or video) to Bob. Real-time media is almost exclusively sent using the Real-time Transport Protocol (RTP).6 At this point the proxies are no longer involved in the call, and hence the media will typically flow directly between Alice and Bob. That is, the media takes a different path through the network than the signaling. Finally, when either Alice or Bob wants to end the session, that party sends a BYE message to the local proxy, which is routed to the other party and is acknowledged.
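The call setup just described can be sketched as an ordered list of hops. This is a minimal illustration, not an implementation of SIP: the function names `domain_of` and `trapezoid_flow` and the `proxy.<domain>` host names are assumptions made for the example, and the URI parsing handles only the simple `sip:user@domain` form.

```python
def domain_of(sip_uri: str) -> str:
    """Return the domain part of a simple sip:user@domain URI."""
    scheme, _, rest = sip_uri.partition(":")
    if scheme not in ("sip", "sips") or "@" not in rest:
        raise ValueError(f"not a user@domain SIP URI: {sip_uri!r}")
    host = rest.split("@", 1)[1]
    return host.split(";", 1)[0].lower()  # drop any URI parameters


def trapezoid_flow(caller_uri: str, callee_uri: str):
    """List (sender, receiver, message) hops for a basic call setup."""
    p1 = f"proxy.{domain_of(caller_uri)}"  # hypothetical proxy host names
    p2 = f"proxy.{domain_of(callee_uri)}"
    a, b = caller_uri, callee_uri
    return [
        (a, p1, "INVITE"), (p1, a, "100 Trying"),
        (p1, p2, "INVITE"), (p2, b, "INVITE"),
        (b, p2, "180 Ringing"), (p2, p1, "180 Ringing"), (p1, a, "180 Ringing"),
        (b, p2, "200 OK"), (p2, p1, "200 OK"), (p1, a, "200 OK"),
        (a, b, "ACK"),
        (a, b, "RTP media (direct)"),  # media bypasses both proxies
    ]
```

Calling `trapezoid_flow("sip:alice@atlanta.com", "sip:bob@biloxi.com")` yields the signaling hops through both proxies, followed by the media hop that goes directly between the two UAs.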
One of the challenging tasks faced by the industry today is secure deployment of VoIP. During the initial design of SIP, the focus was more on providing new dynamic and powerful services along with simplicity rather than security. For this reason, a lot of effort is under way in the industry and among researchers to enhance SIP’s security. The subsequent sections of this chapter deal with these issues.
- OVERVIEW OF THREATS

Attacks can be broadly classified as attacks against specific users (SIP UAs), large-scale attacks (where VoIP is part of the network), and attacks against network infrastructure (SIP proxies or other network components and resources necessary for VoIP, such as routers, DNS servers, and bandwidth).7 This chapter does not cover attacks against infrastructure; the interested reader is referred to the literature.8 The subsequent parts of this chapter deal with attacks targeted at specific hosts and with issues related to social engineering. The taxonomy of attacks is shown in Fig. 60.2.
Reconnaissance of Voice Over Internet Protocol Networks

Reconnaissance refers to intelligence gathering or probing to assess the vulnerabilities of a network in order to launch a later attack successfully; it includes footprinting the target (also known as profiling or information gathering). The two forms of reconnaissance techniques are passive and active. Passive reconnaissance attacks include the collection of network information through indirect or direct methods but without probing the target; active reconnaissance attacks involve generating traffic with the intention of eliciting responses from the target. Passive reconnaissance techniques involve searching for publicly available SIP URIs in databases provided by VoIP service providers or on webpages, and looking for publicly accessible SIP proxies or SIP UAs. Example tools include dig and nslookup. Although passive reconnaissance techniques can be effective, they are time intensive.

If an attacker can watch SIP signaling, the attacker can perform number harvesting. Here, an attacker passively monitors all incoming and outgoing calls to build a database of legitimate phone numbers or extensions within an organization. This type of database can be used in more advanced VoIP attacks such as signaling manipulation or Spam over Internet Telephony (SPIT) attacks.
Active reconnaissance uses technical tools to discover information on the hosts that are active on the target network. The drawback to active reconnaissance, however, is that it can be detected. The two most common active reconnaissance attacks are call walking attacks and port-scanning attacks.

Call walking is a type of reconnaissance probe in which a malicious user initiates sequential calls to a block of telephone numbers to identify what assets are available for further exploitation. This is a modern version of wardialing, common in the 1980s to find modems on the Public Switched Telephone Network (PSTN). Performed during nonbusiness hours, call walking can provide information useful for social engineering, such as voicemail announcements that disclose the called party’s name.

SIP UAs and proxies listen on UDP/5060 and/or TCP/5060, so it can be effective to scan IP addresses looking for such listeners. Once the attacker has accumulated a list of active IP addresses, he can start to investigate each address further. The Nmap tool is a robust port scanner that is capable of performing many types of scans.9

In addition, honeypots and honeynets are becoming increasingly popular because they help detect, prevent, or prepare a response to attacks. A honeypot is a trap in which vulnerabilities are deliberately introduced to lure attackers (hackers) so that their activity (probing, security attack, or compromise) can be analyzed. A honeynet is a collection of honeypots. In the domain of VoIP, honeynets can be useful in preventing SPIT and VoIP Phishing (Vishing).
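Because call walking dials a block of numbers in sequence, it leaves a recognizable pattern in call records. The following is a minimal detection sketch under stated assumptions: the call log format (pairs of source and dialed digits), the function name `find_call_walkers`, and the threshold `min_run` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def find_call_walkers(call_log, min_run=5):
    """Return sources that dialed at least `min_run` consecutive numbers.

    call_log is an iterable of (source, dialed_number) pairs, where
    dialed_number is a string of digits.
    """
    dialed = defaultdict(set)
    for source, number in call_log:
        dialed[source].add(int(number))

    walkers = set()
    for source, numbers in dialed.items():
        run, previous = 0, None
        for n in sorted(numbers):
            # Extend the run if this number directly follows the last one.
            run = run + 1 if previous is not None and n == previous + 1 else 1
            previous = n
            if run >= min_run:
                walkers.add(source)
                break
    return walkers
```

A real deployment would also need a time window and would have to tolerate legitimate sequential dialing (for example, an operator console), so the threshold is a tuning decision rather than a fixed rule.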
Denial of Service

A denial-of-service (DoS) attack deprives a user or an organization of services or resources that are normally available. In SIP, DoS attacks can be classified as malformed request DoS and load-based DoS.
Malformed Request Denial of Service

In this type of DoS attack, the attacker crafts a SIP request (or response) that exploits a vulnerability in the target’s SIP proxy or SIP UA, resulting in a partial or complete loss of function. For example, it has been found that some UAs allow remote attackers to cause a DoS (“486 Busy” responses or device reboot) via a sequence of SIP INVITE transactions in which the Request-URI lacks a username.10 Attackers have also shown that the IP implementations of some hard phones are vulnerable to IP fragmentation attacks [CAN-2002-0880] and Dynamic Host Configuration Protocol (DHCP)-based DoS attacks [CAN-2002-0835], demonstrating that normal infrastructure protection (such as firewalls) is valuable for VoIP equipment. DoS attacks can also be initiated against other network services, such as DHCP and DNS, which serve VoIP devices.
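One defensive habit suggested by the missing-username example is to check the request line before handing a message to the full parser. The sketch below is an assumption-laden illustration: the function name `check_request_line` is hypothetical, the check covers only the request line, and rejecting INVITEs without a user part is a local policy choice rather than something the SIP specification requires.

```python
def check_request_line(line: str) -> str:
    """Return 'ok' or a short reason for rejecting a SIP request line."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        return "malformed request line"
    method, uri, version = parts
    if version != "SIP/2.0":
        return "unsupported version"
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme not in ("sip", "sips", "tel"):
        return "unsupported URI scheme"
    if method == "INVITE" and scheme != "tel":
        # Assumed local policy: a UA expects user@host in an INVITE.
        user, at, host = rest.partition("@")
        if not at or not user or not host:
            return "INVITE Request-URI lacks a username"
    return "ok"
```

The point of the sketch is that unexpected input produces a controlled rejection instead of reaching code paths that may crash or reboot the device.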
Load-Based Denial of Service

In this case, an attacker directs large volumes of traffic at a target (or set of targets) and attempts to exhaust resources such as central processing unit (CPU) time, network bandwidth, or memory. SIP proxies and session border controllers (SBCs) are primary targets for attackers because of their critical role in providing voice service and the complexity of the software running on them. A common type of load-based attack is a flooding attack. In the case of VoIP, we categorize flooding attacks into these types:
• Control packet floods
• Call data floods
• Distributed DoS attack
Control Packet Floods

In this case, the attacker floods SIP proxies with SIP packets, such as INVITE messages, bogus responses, or the like. The attacker might purposefully craft authenticated messages that fail authentication, forcing the victim to spend resources validating each message. The attacker might also spoof the IP address of a legitimate sender so that rate limiting the attack also rate limits the legitimate user.
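Rate limiting per source address, mentioned above, is often built on a token bucket. The following is a minimal sketch under assumptions: the class name `PerSourceRateLimiter` and its parameters are hypothetical, and the current time is passed in explicitly so the behavior is easy to reason about.

```python
class PerSourceRateLimiter:
    """Token bucket per source address; time is passed in explicitly."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.buckets = {}  # source -> (tokens, time of last update)

    def allow(self, source: str, now: float) -> bool:
        tokens, last = self.buckets.get(source, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[source] = (tokens, now)
        return allowed
```

Because the bucket is keyed on the source address, an attacker who spoofs a legitimate sender’s address drains that sender’s bucket, which is exactly the side effect described in the paragraph above.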
Call Data Floods

The attacker floods the target with RTP packets, with or without first establishing a legitimate RTP session, in an attempt to exhaust the target’s bandwidth or processing power, leading to degradation of VoIP quality for other users on the same network or just for the victim. Other common forms of load-based attacks that could affect the VoIP system are buffer overflow attacks, TCP SYN floods, User Datagram Protocol (UDP) floods, fragmentation attacks, smurf attacks, and general overload attacks. Though VoIP equipment needs to protect itself from these attacks, they are not specific to VoIP.

A SIP proxy can also be overloaded with excessive legitimate traffic, as in the classic “Mother’s Day” problem, when the telephone system is at its busiest. Large-scale disasters (such as earthquakes) can cause similar spikes, which are not attacks. Thus, even when not under attack, the system could be under high load. If the server or the end user is not fast enough to handle incoming loads, it will experience an outage or misbehave in such a way as to become ineffective at processing SIP messages. A load-based attack is very difficult to detect because it is hard to distinguish legitimate users from illegitimate users who generate the same kind of traffic.
Distributed Denial-of-Service Attack

Once an attacker has gained control of a large number of VoIP-capable hosts and formed a network of “zombies” under the attacker’s control, the attacker can launch interesting VoIP attacks, as illustrated in Fig. 60.3. Each zombie can send up to thousands of messages to a single location, resulting in a barrage of packets that incapacitates the victim’s computer through resource exhaustion.
Loss of Privacy

The major eavesdropping attacks are:
• Trivial File Transfer Protocol (TFTP) configuration file sniffing
• Traffic analysis
• Conversation eavesdropping
Trivial File Transfer Protocol Configuration File Sniffing

Most IP phones rely on a TFTP server to download their configuration file after powering on. The configuration file can sometimes contain passwords that can be used to connect directly back to the phone and administer it, or to access other services (such as the company directory). An attacker who sniffs the configuration file as the phone downloads it can glean these passwords and potentially reconfigure and control the IP phone. To thwart this attack vector, vendors variously encrypt the configuration file or use HTTPS and authentication.
Traffic Analysis

Traffic analysis involves determining who is talking to whom, which can be done even when the actual conversation is encrypted, and can even be done (to a lesser degree) between organizations. Such information can be beneficial to law enforcement and to criminals committing corporate espionage and stock fraud.
Conversation Eavesdropping

An important threat for VoIP users is eavesdropping on a conversation. In addition to the obvious problem of confidential information being exchanged between people, eavesdropping is also useful for credit-card fraud and identity theft. This is because some phone calls (especially to certain institutions) require users to enter credit-card numbers, PIN codes, or national identity numbers (Social Security numbers), which are sent as Dual-Tone Multifrequency (DTMF) digits in RTP. An attacker can use tools like Wireshark, Cain & Abel, voice over misconfigured Internet telephones (vomit), VoIPong, and Oreka to capture RTP packets and extract the conversation or the DTMF digits.11
Man-in-the-Middle Attacks

The man-in-the-middle attack is a classic form of attack in which the attacker has managed to insert himself between two hosts. It refers to an attacker who is able to read, and modify at will, messages between two parties without either party knowing that the link between them has been compromised. As such, the attacker has the ability to inspect or modify packets exchanged between two hosts, insert new packets, or prevent packets from being sent to hosts. Any device that handles SIP messages in the normal course of its function could be a man-in-the-middle, for example, a compromised SIP proxy server or session border controller. If SIP messages are not authenticated, an attacker can also compromise a DNS server or use DNS poisoning techniques to cause SIP messages to be routed to a device under the attacker’s control.

In a conventional enterprise network, VoIP phones are placed on a different Virtual Local Area Network (VLAN) than data devices. In such situations, the attacker would initially access the network by connecting his laptop to the existing data VLAN and then hop to the designated voice VLAN. This attack can be achieved in two ways: switch spoofing or double tagging. In the first method, if a network switch is configured for autotrunking, the attacker makes his own machine appear to be a switch that needs to trunk. In the second method, the attacker sends data from one switch to another by sending frames with two 802.1Q headers (one for the victim’s switch and the other for the attacking switch). The victim’s switch accepts any incoming frames, while the target switch forwards the second frame (embedded with a false tag) to the destination host based on the VLAN identifier present in the second 802.1Q header. Once inside the desired voice VLAN, the attacker could use Address Resolution Protocol (ARP) poisoning against the designated phones, which would result in a man-in-the-middle attack.
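Double tagging depends on a frame that carries two stacked 802.1Q headers, which is something a monitoring point can look for. The sketch below is illustrative only: the function names `vlan_ids` and `is_double_tagged` are hypothetical, and it recognizes only the standard 802.1Q tag type (0x8100), not other tag types a network may use.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_ids(frame: bytes):
    """Return the VLAN IDs of the stacked 802.1Q tags, outermost first."""
    ids = []
    offset = 12  # skip destination and source MAC addresses
    while len(frame) >= offset + 4:
        tpid, tci = struct.unpack_from("!HH", frame, offset)
        if tpid != TPID_8021Q:
            break
        ids.append(tci & 0x0FFF)  # low 12 bits of the TCI are the VLAN ID
        offset += 4
    return ids


def is_double_tagged(frame: bytes) -> bool:
    """True if the frame carries two or more stacked 802.1Q tags."""
    return len(vlan_ids(frame)) >= 2
```

On access ports facing end hosts, where tagged frames are not normally expected at all, a double-tagged frame is a strong hint that someone is attempting VLAN hopping.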
Replay Attacks

Replay attacks are often used to impersonate an authorized user. A replay attack is one in which an attacker captures a valid packet sent between the SIP UAs or proxies and resends it at a later time (perhaps a second later, perhaps days later). As an example with classic unauthenticated telnet, an attacker who captures a telnet username and password can replay that same username and password. In SIP, an attacker would capture and replay valid SIP requests. (Capturing and replaying SIP responses is usually not valuable, as SIP responses are discarded if their Call-ID does not match a currently outstanding request, which is one way SIP protects itself from replay attacks.) If RTP is used without authenticating Real-time Transport Control Protocol (RTCP) packets and without sampling synchronization source (SSRC), an attacker can inject RTCP packets into a multicast group, each with a different SSRC, and force the group size to grow exponentially. A variant on a replay attack is the cut-and-paste attack. In this scenario, an attacker combines part of a captured packet with a generated packet. For example, a security
credential can be copied from one request to another, resulting in a successful authorization without the attacker even discovering the user’s password.
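The protection noted above, discarding responses whose Call-ID matches no outstanding request, can be sketched as a small lookup. This is a simplification with hypothetical names (`OutstandingRequests` and its methods); a real SIP stack matches responses to transactions using more than the Call-ID alone.

```python
class OutstandingRequests:
    """Tracks Call-IDs of requests that are still awaiting a response."""

    def __init__(self):
        self._pending = set()

    def sent(self, call_id: str) -> None:
        """Record that a request with this Call-ID has been sent."""
        self._pending.add(call_id)

    def completed(self, call_id: str) -> None:
        """Forget a Call-ID once its transaction has finished."""
        self._pending.discard(call_id)

    def accept_response(self, call_id: str) -> bool:
        """Accept a response only if it matches a pending request."""
        return call_id in self._pending
```

A response replayed after the transaction has completed finds no pending entry and is dropped, which is why replaying responses is rarely useful to an attacker.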
Impersonation

Impersonation is described as a user or host pretending to be another user or host, especially one that the intended victim trusts. In the case of a phishing attack, the attacker continues the deception to make the victim disclose his banking information, employee credentials, and other sensitive information. In SIP, the From header is displayed to the called party, so authentication and authorization of the values used in the From header are important to prevent impersonation. Unfortunately, call forwarding in SIP (called retargeting) makes simple validation of the From header impossible. For example, imagine Bob has forwarded his phone to Carol and they are in different administrative domains (Bob is at work, Carol is his wife at home). Then Alice calls Bob. When Alice’s INVITE is routed to Bob’s proxy, her INVITE will be retargeted to Carol’s UA by rewriting the Request-URI to point to Carol’s URI. Alice’s original INVITE is then routed to Carol’s UA. When it arrives at Carol’s UA, the INVITE needs to indicate that the call is from Alice. The difficulty is that if Carol’s SIP proxy were to perform simplistic validation of the From header in the INVITE when it arrived from Bob’s SIP proxy, Carol’s SIP proxy would reject it, because it contains Alice’s From header. However, such retargeting is a legitimate function of SIP networks.
Redirection Attack

If compromised by an attacker or via a SIP man-in-the-middle attack, the intermediate SIP proxies responsible for SIP message routing can falsify any response. In this section, we describe how the attacker could use this ability to launch a redirection attack. If an attacker can fabricate a reply to a SIP INVITE, the media session can be established with the attacker rather than the intended party. In SIP, a proxy or UA can respond to an INVITE request with a 301 Moved Permanently or 302 Moved Temporarily response. The 302 response will also include an Expires header line that communicates how long the redirection should last. The attacker can respond with a redirection response, effectively denying service to the called party and possibly tricking the caller into communicating with, or through, a rogue UA.
Session Disruption

Session disruption describes any attack that degrades or disrupts an existing signaling or media session. For example, in a SIP scenario, if an attacker is able to inject session-terminating messages such as BYE into the signaling path, he can cause sessions to fail when there is no legitimate reason why they should not continue.
For this to be successful, the attacker has to include the Call-ID of an active call in the BYE message. Alternatively, if an attacker introduces bogus packets into the media stream, he can disrupt packet sequence, impede media processing, and disrupt a session. Delay attacks are those in which an attacker can capture and resend RTP SSRC packets out of sequence to a VoIP endpoint and force the endpoint to waste its processing cycles in resequencing packets and degrade call quality. An attacker could also disrupt a Voice over Wireless Local Area Network (WLAN) service by disrupting IEEE 802.11 WLAN service using radio spectrum jamming or a Wi-Fi Protected Access (WPA) Message Integrity Check (MIC) attack. A wireless access point will disassociate stations when it receives two invalid frames within 60 s, causing loss of network connectivity for 60 s. A 1-min loss of service is hardly tolerable in a voice application.
Exploits

Cross-Site Scripting (XSS) attacks are possible with VoIP systems because call logs contain header fields, and administrators (and other privileged users) view those call logs. In this attack, specially crafted From (or other) fields are sent by an attacker in a normal SIP message (such as an INVITE). Later, when someone such as the administrator looks at the call logs using a web browser, the specially crafted From field causes an XSS attack against the administrator’s web browser, which can then do malicious things with the administrator’s privileges. This can be a damaging attack if the administrator has already logged into other systems (HR databases, the SIP call controller, the firewall) and her web browser has a valid cookie (or active session in another window) for those other systems.
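The usual defense against this kind of attack is to escape header values before they are placed into the call log page. A minimal sketch using Python’s standard `html.escape` follows; the function name `render_call_log_row` and the table layout are assumptions made for the example.

```python
import html


def render_call_log_row(from_header: str, to_header: str) -> str:
    """Return one HTML table row with the header values escaped."""
    return "<tr><td>{}</td><td>{}</td></tr>".format(
        html.escape(from_header), html.escape(to_header)
    )
```

With escaping in place, a From header that contains markup is displayed as literal text in the administrator’s browser instead of being interpreted as script.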
Social Engineering

SPIT is classified as a social threat because the callee can treat the call as unsolicited, and the term unsolicited is strictly bound to a user-specific preference, which makes it hard for the system to identify this kind of transaction. SPIT can be telemarketing calls used for guiding callees to a service deployed to sell products. IM spam and presence spam could also be launched via SIP messages. IM spam is very similar to email spam; presence spam is defined as a set of unsolicited presence requests for the presence package. A subtle variation of SPIT called vishing is an attack that aims to collect personal data by redirecting users toward an interactive voice responder that could collect personal information such as the PIN for a credit card. From a signaling point of view, unsolicited communication is technically a correct transaction.

Unfortunately, many of the mechanisms that are effective for email spam are ineffective with VoIP, for many
reasons. First, the email with its entire contents arrives at a server before it is seen by the user. Such a mail server can therefore apply many filtering strategies, such as Bayesian filters, URL filters, and so on. In contrast, in VoIP, human voices are transmitted rather than text. To recognize voices and to determine whether the message is spam or not is still a very difficult task for the end system. A recipient of a call only learns about the subject of the message when he is actually listening to it. Moreover, even if the content is stored on a voice mailbox, it is still difficult for today’s speech recognition technologies to understand the context of the message enough to decide whether it is spam or not.

One mechanism to fight automated systems that deliver spam is to challenge such suspected incoming calls with a Turing test. These methods include:
• Voice menu. Before a call is put through, a computer asks the caller to press certain key combinations, for example, “Press #55.”
• Challenge models. Before a call is put through, a computer asks the caller to solve a simple equation and to type in the answer, for example, “Divide 10 by 2.”
• Alternative number. Under the main number a computer announces an alternative number. This number may even be changed permanently by a call management server.

All these methods can be strengthened further by enriching the audio signal with noise or music. This prevents SPIT bots from using speech recognition. Such Turing tests are attractive, since it is often hard for computers to decode audio questions. However, these puzzles cannot be made too difficult, because human beings must always be able to solve them.

One of the solutions to the SPIT problem is the whitelist. In a whitelist, a user explicitly states which persons are allowed to contact him. A similar technique is also used in Skype: when Alice wants to call Bob, she first has to add Bob to her contact list and send a contact request to Bob. Only when Bob has accepted this request can Alice make calls to Bob.
In general, whitelists have an introduction problem, since it is not possible to receive calls from someone who is not already on the whitelist. Blacklists are the opposite of whitelists but have limited effectiveness at blocking spam because new identities (which are not on the blacklist) can be easily created by anyone, including spammers. Authentication mechanisms can be used to provide strong authentication, which is necessary for strong whitelists and reputation systems, which form the basis of SPIT prevention. Strong authentication is generally Public Key Infrastructure (PKI) dependent. Proactive publishing of incorrect information, namely SIP addresses, is a possible way to fill up spammers’ databases with nonexistent contacts. Consent-based communication is the other solution. Address obfuscation could be an alternative wherein spam bots are unable to identify the SIP URIs.
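The list-based and challenge-based defenses discussed in this section can be combined into one screening decision. The sketch below is an illustration under assumptions: the function names (`screen_call`, `make_challenge`, `passes_challenge`) are hypothetical, the lists are plain sets of caller identities, and the challenge is the simple division puzzle described earlier, answered with DTMF digits.

```python
import random


def screen_call(caller: str, whitelist: set, blacklist: set) -> str:
    """Return 'accept', 'reject', or 'challenge' for an incoming caller."""
    if caller in blacklist:
        return "reject"
    if caller in whitelist:
        return "accept"
    return "challenge"  # unknown caller: neither list applies


def make_challenge(rng: random.Random):
    """Return (spoken_prompt, expected_dtmf_digits) for a division puzzle."""
    divisor = rng.randint(2, 9)
    answer = rng.randint(2, 9)
    return f"Divide {divisor * answer} by {divisor}", str(answer)


def passes_challenge(expected: str, dtmf_digits: str) -> bool:
    """Compare the caller's keyed digits (optional trailing #) to the answer."""
    return dtmf_digits.strip("#") == expected
```

Challenging only unknown callers softens the introduction problem of a pure whitelist, since a first-time human caller can still get through. None of this helps unless the caller identity itself is authenticated, as the text notes.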
- SECURITY IN VOICE OVER INTERNET PROTOCOL

Much existing VoIP equipment is dedicated to VoIP, which allows placing such equipment on a separate network. This is typically accomplished with a separate VLAN. Depending on the vendor of the equipment, this can be automated using Cisco Discovery Protocol (CDP), Link Layer Discovery Protocol (LLDP), or 802.1x, all of which will place equipment into a separate “voice VLAN” to assist with this separation. This provides a reasonable level of protection, especially within an enterprise where employees lack much incentive to interfere with the telephone system.
Preventative Measures

However, the use of VLANs is not an ideal solution because it does not work well with softphones that are not dedicated to VoIP: placing those softphones onto the “voice VLAN” destroys the security and management advantage of the separate network. A separate VLAN can also create a false sense of security that only benign voice devices are connected to the VLAN. Even though 802.1x provides the best security, it is still possible for an attacker to gain access to the voice VLAN (with a suitable hub between the phone and the switch). Mechanisms that provide less security, such as CDP or LLDP, can be circumvented by software on an infected computer. Some vendors’ Ethernet switches can be configured to require clients to request inline Ethernet power before allowing clients to join certain VLANs (such as the voice VLAN), which provides protection from such infected computers. But, as mentioned previously, such protection of the voice VLAN prevents deployment of softphones, which is a significant reason that most companies are interested in deploying VoIP.
Eavesdropping

To counter the threat of eavesdropping, the media can be encrypted. The method to encrypt RTP traffic is Secure RTP (SRTP; RFC 3711), which does not encrypt the IP, UDP, or RTP headers but does encrypt the RTP payload (the “voice” itself). SRTP’s advantage of leaving the RTP headers unencrypted is that header compression protocols (cRTP,12 ROHC13) and protocol analyzers (looking for RTP packet loss and (S)RTCP reports) can still function with SRTP-encrypted media.
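Because SRTP leaves the RTP header in the clear, an analyzer can still decode the fixed header fields of an encrypted stream. The following sketch decodes the 12-byte fixed RTP header; the function name `parse_rtp_header` and the dictionary keys are choices made for this example, and header extensions and CSRC entries are not handled.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header at the start of a packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,          # top two bits of the first byte
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,             # gaps here reveal packet loss
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The sequence number and timestamp are enough to measure loss and jitter without any access to the keys, which is the operational benefit described above.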
The drawback of SRTP is that approximately 13 incompatible mechanisms exist to establish the SRTP keys. These mechanisms are at various stages of deployment, industry acceptance, and standardization. Thus at this point in time it is unlikely that two SRTP-capable systems from different vendors will have a compatible SRTP keying mechanism. A brief overview of some of the more popular keying mechanisms is provided here.

One of the popular SRTP keying mechanisms, Security Descriptions, requires a secure SIP signaling channel (SIP over TLS) and discloses the SRTP key to each SIP proxy along the call setup path. This means that a passive attacker, able to observe the unencrypted SIP signaling and the encrypted SRTP, would be able to eavesdrop on a call. S/MIME is SIP’s end-to-end security mechanism, which Security Descriptions could use to its benefit, but S/MIME has not been well deployed and, due to specific features of SIP (primarily forking and retargeting), it is unlikely that S/MIME will see deployment in the foreseeable future.

Multimedia Internet Keying (MIKEY) has approximately eight incompatible modes defined; these allow establishing SRTP keys.4 Almost all these MIKEY modes are more secure than Security Descriptions because they do not carry the SRTP key directly in the SIP message but rather encrypt it with the remote party’s public key or perform a Diffie-Hellman exchange. Thus, for most of the MIKEY modes, the attacker would need to actively participate in the MIKEY exchange and obtain the encrypted SRTP to listen to the media.

Zimmermann Real-time Transport Protocol (ZRTP)14 is another SRTP key exchange mechanism, which uses a Diffie-Hellman exchange to establish the SRTP keys and detects an active attacker by having the users (or their computers) validate a short authentication string with each other.
It affords useful security properties, including perfect forward secrecy and key continuity (which allows the users to verify authentication strings once, and never again), and the ability to work through SBCs.

In 2006, the IETF decided to reduce the number of IETF standard key exchange mechanisms and chose DTLS-SRTP. DTLS-SRTP uses Datagram TLS (a mechanism to run TLS over a nonreliable protocol such as UDP) over the media path. To detect an active attacker, the TLS certificates exchanged over the media path must match the signed certificate fingerprints sent over the SIP signaling path. The certificate fingerprints are signed using SIP’s identity mechanism.15
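The fingerprint comparison at the heart of DTLS-SRTP can be sketched as follows. This is a simplified illustration, not a complete check: the function names are hypothetical, only SHA-256 fingerprints are handled, the certificate is assumed to be available as raw DER bytes, and verifying the signature over the signaling is outside the scope of the example.

```python
import hashlib
import hmac


def sha256_fingerprint(cert_der: bytes) -> str:
    """Format a certificate hash as colon-separated uppercase hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_matches(sdp_attribute: str, cert_der: bytes) -> bool:
    """Compare an 'a=fingerprint:sha-256 <hex>' line with a certificate."""
    prefix = "a=fingerprint:"
    if not sdp_attribute.startswith(prefix):
        return False
    algorithm, _, value = sdp_attribute[len(prefix):].strip().partition(" ")
    if algorithm.lower() != "sha-256":
        return False  # this sketch handles only SHA-256
    expected = sha256_fingerprint(cert_der)
    return hmac.compare_digest(value.strip().upper().encode(), expected.encode())
```

If an active attacker substitutes a certificate on the media path, its hash will not match the fingerprint that arrived over the signed signaling path, and the endpoint can refuse to set up media.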
A drawback with SRTP is that it is imperative (for some keying mechanisms) or very helpful (with other keying mechanisms) for the SIP user agent to encrypt its SIP signaling traffic with its SIP proxy. The only standard for such encryption today is SIP over TLS, which runs over TCP. To date, many vendors have avoided TCP on their SIP proxies because they have found that SIP-over-TCP scales worse than SIP-over-UDP. It is anticipated that if this cannot be overcome we may see SIP-over-DTLS standardized. Another viable option, especially in some markets, is to use IPsec ESP (encapsulating security payload) to protect SIP.
Another drawback of SRTP is that diagnostic and troubleshooting equipment cannot listen to the media stream. This may seem obvious, but it can cause difficulties when technicians need to listen to and diagnose echo, gain, or other anomalies that cannot be diagnosed by examining SRTP headers (which are unencrypted) but can only be diagnosed by listening to the decrypted audio itself.
Identity
As described in the “Threats” section, it is important to have strong identity assurance. Today there are two mechanisms to provide for identity: P-Asserted-Identity,16 which is used within a trust domain (within a company or between a service provider and its paying customers) and is simply a header inserted into a SIP request, and SIP identity,16 which is used between trust domains (between two companies) and creates a signature over some of the SIP headers and over the SIP body.
SIP identity is useful when two organizations connect via SIP proxies, as was originally envisioned as the SIP architecture for intermediaries between two organizations, often a SIP service provider. Many of these service providers operate SBCs rather than SIP proxies for a variety of reasons. One of the drawbacks of SIP identity is that an SBC, by its nature, will rewrite the SIP body (specifically the m=/c= lines), which destroys the original signature. Thus, an SBC would need to rewrite the From header and sign the new message with the SBC’s own private key. This effectively creates hop-by-hop trust; each SBC that needs to rewrite the message in this way is also able to manipulate the SIP headers and SIP body in other ways that could be malicious or could allow the SBC to eavesdrop on a call. Alternative cryptographic identity mechanisms are being pursued, but it is not yet known whether this weakness can be resolved.
Traffic Analysis
The most useful protection from traffic analysis is to encrypt your SIP traffic. This would require the attacker to gain access to your SIP proxy (or its call logs) to determine who you called. Additionally, your (S)RTP traffic itself could provide useful traffic analysis information. For example, someone may learn valuable information just by noticing where (S)RTP traffic is being sent (the company’s in-house lawyers are calling an acquisition target several times a day). Forcing traffic to be concentrated to a device can help prevent this sort of traffic analysis. In some network topologies this can be achieved using a Network Address Translation (NAT) device, and in all cases it can be achieved with an SBC.
Reactive
An intrusion prevention system (IPS) is a useful way to react to VoIP attacks against signaling or media. An IPS with generic rules and with VoIP-specific rules can detect an attack and block or rate-limit traffic from the offender.
Intrusion Prevention System (IPS)
Because SIP is derived from and related to many well-deployed and well-understood protocols (such as HTTP), IDS/IPS vendors are able to create products to protect against SIP attacks quite readily. Often an IDS/IPS function can be built into a SIP proxy, SBC, or firewall, reducing the need for a separate IDS/IPS appliance. An IDS/IPS is marginally effective for detecting media attacks, primarily to notice that an excessive amount of bandwidth is being consumed and to throttle it or raise an alarm.
A drawback of an IPS is that it can cause false positives and deny service to a legitimate endpoint, thus causing a DoS in an attempt to prevent a DoS. An attacker who is knowledgeable of the rules or behavior of an IPS may also be able to spoof the identity of a victim (the victim’s source IP address or SIP identity) and trigger the IDS/IPS into reacting against the victim. Thus, it is important to deny attackers that avenue by using standard best practices against IP address spoofing17 and employing strong SIP identity. Using a separate network (VLAN) for VoIP traffic can help reduce the chance of false positives, as the IDS/IPS rules can be more finely tuned for that one application running on the voice VLAN.
Rate limiting is often naïvely performed by simply rate limiting the traffic to the SIP proxy and allowing excess traffic to be dropped. Though this does effectively reduce the transactions per second the SIP proxy needs to perform, it interferes with processing of existing calls to a significant degree. For example, a normal call is established with an INVITE, which is reliably acknowledged when the call is established. If the simplistic rate limiting were to drop the acknowledgment message, the INVITE would be retransmitted, incurring additional processing while the system is under high load.
A separate problem with rate limiting is that both attackers and legitimate users are subject to the rate limiting; it is more useful to direct the rate limiting at the users causing the high rate. This can be done by distributing the simple rate limiting toward the users rather than doing the simple rate limiting near the server. On the server, more intelligent rate limiting is useful. These are usually proprietary rate-limiting schemes, but they attempt to process existing calls before processing new calls. For example, such a scheme would allow processing the acknowledgment message for a previously processed INVITE; process the BYE associated with an active call, to free up resources; or process high-priority users’ calls (the vice president’s office is allowed to make calls, but the janitorial staff is blocked from making calls).
By pushing rate limiting toward users, effective use can be made of simple packet-based rate limiting. For example, even a very active call center phone does not need to send 100 Mb of SIP signaling traffic to its SIP proxy; even 1 Mb would be an excessive amount of traffic. By deploying simplistic, reasonable rate limiting very near the users, ideally at the Ethernet switch itself, bugs in the call processing application or malicious attacks by unauthorized software can be mitigated.
A similar situation occurs with the RTP media itself. Even high-definition video does not need to send or receive 100 Mb of traffic to another endpoint and can be rate-limited based on the applications running on the dedicated device. This sort of policing can be effective at the Ethernet switch itself, or in an IDS/IPS (watching for excessive bandwidth), a firewall, or an SBC.
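The difference between simple rate limiting and a scheme that favors existing calls can be sketched in Python. This is only an illustration under stated assumptions, since the text notes that real server-side schemes are usually proprietary: the `TokenBucket` class, the `admit` policy, and the rates used are all hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Simple token bucket: 'rate' tokens per second, burst up to 'capacity'."""
    rate: float
    capacity: float
    tokens: float = 0.0
    last: float = 0.0

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def admit(method: str, call_id: str, active_calls: set,
          new_call_bucket: TokenBucket, now: float) -> bool:
    """Hypothetical server policy: existing calls first, new calls rate-limited."""
    if method in ("ACK", "BYE") and call_id in active_calls:
        if method == "BYE":
            active_calls.discard(call_id)  # the call ends, freeing resources
        return True
    if method == "INVITE":
        if new_call_bucket.allow(now):
            active_calls.add(call_id)
            return True
        return False
    # Anything else competes with new calls for the remaining budget.
    return new_call_bucket.allow(now)


# Hypothetical usage: one new call per second, burst of one.
bucket = TokenBucket(rate=1.0, capacity=1.0, tokens=1.0)
calls = set()
print(admit("INVITE", "call-a", calls, bucket, now=0.0))  # admitted
print(admit("INVITE", "call-b", calls, bucket, now=0.0))  # over budget
print(admit("ACK", "call-a", calls, bucket, now=0.0))     # existing call
```

A plain `TokenBucket` placed near each user corresponds to the simple packet-based limiting described above, while the `admit` function shows why the server-side limiter needs to know about call state.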
Challenging
A more sophisticated rate-limiting technique is to provide additional challenges to a high-volume user. This could be done when it is suspected that the user is sending spam or when the user has initiated too many calls in a certain time period. A simple mechanism is to complete the call with an interactive voice response (IVR) system that requests the user to enter some digits (“Please enter 5, 1, 8 to complete your call”). Though this technique suffers from some problems (it does not work well for hearing-impaired users or if the caller does not understand the IVR’s language), it is effective at reducing the calls per second from both internal and external callers.
- FUTURE TRENDS
Certain SIP proxies have the ability to forward SIP requests to multiple UAs. These SIP requests can be sent in parallel, in series, or in a combination of both. Such proxies are called forking proxies.
Forking Problem in Session Initiation Protocol
The forking proxy expects a response from all the UAs that received the request; the proxy forwards only the “best” final response back to the caller. This behavior causes a situation known as the heterogeneous error response forking problem (HERFP), which is illustrated in Fig. 60.4.18 Alice initiates an INVITE request that includes a body format that is understood by UAS2 but not UAS1. For example, the user agent client (UAC) might have used a MIME type of multipart/mixed with a session description and an optional image or sound. As UAS1 does not support this MIME format, it returns a 415 (Unsupported Media Type) response. Unfortunately, the proxy has to wait until all the branches generate a final response and then pick the “best” response, depending on the criteria mentioned in RFC 3261. In many cases the proxy has to wait long enough that the human operating the UAC abandons the call. The proxy informs UAS2 that the call has been canceled, which is acknowledged by UAS2. It then returns the 415 (Unsupported Media Type) response to Alice, an error that Alice could have repaired by sending an appropriate session description.
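The waiting behavior behind HERFP can be sketched in Python. This is a simplified illustration loosely modeled on the response-selection guidance in RFC 3261, not a complete forking proxy: it ignores 2xx handling (which a real proxy forwards immediately), timers, and CANCEL processing. The function names, the branch labels, and the use of 487 for the canceled branch are assumptions made for the example.

```python
from typing import Dict, List, Optional

# 4xx codes that tell the caller how to repair and resubmit the request.
RESUBMISSION_HINTS = (401, 407, 415, 420, 484)


def pick_best_response(final_codes: List[int]) -> int:
    """Choose one non-2xx final response to forward upstream (simplified)."""
    if not final_codes:
        raise ValueError("no final responses collected")
    global_failures = [c for c in final_codes if 600 <= c <= 699]
    if global_failures:
        return global_failures[0]
    lowest_class = min(c // 100 for c in final_codes)
    candidates = [c for c in final_codes if c // 100 == lowest_class]
    if lowest_class == 4:
        for code in candidates:
            if code in RESUBMISSION_HINTS:
                return code
    return candidates[0]


def response_to_forward(branches: Dict[str, Optional[int]]) -> Optional[int]:
    """Return None while any branch is still pending: the root of HERFP."""
    if any(code is None for code in branches.values()):
        return None
    return pick_best_response(list(branches.values()))


# UAS1 answers 415 at once, but nothing is forwarded while UAS2 is pending.
print(response_to_forward({"UAS1": 415, "UAS2": None}))
# Only after UAS2's branch is canceled and completes does Alice see the 415.
print(response_to_forward({"UAS1": 415, "UAS2": 487}))
```

The first call shows the problem: the repairable 415 error is held back for as long as the other branch keeps ringing.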
Security in Peer-to-Peer Session Initiation Protocol
Originally, SIP was specified as a client/server protocol, but recent proposals suggest using SIP in a peer-to-peer (P2P) setting.19 One of the major reasons for using SIP in a P2P setting is its robustness, since there is no centralized control. As defined, “peer to peer (P2P) systems are distributed systems without any centralized control or hierarchical organization.” This definition describes pure P2P systems. Even though many networks are considered P2P, they employ a central authority or use supernodes. Early systems used flooding to route messages, which was found to be highly inefficient. To improve lookup time for a search request, structured overlay networks have been developed that provide load balancing and efficient routing of messages. They use distributed hash tables (DHTs) to provide efficient lookup.20 Examples of structured overlay networks are CAN, Chord, Pastry, and Tapestry.21-24
We focus on the Chord protocol because it is used as a prototype in most proposals for P2P-SIP. Chord has a ring-based topology in which each node stores at most log(N) entries in its finger table, which is like an application-level routing table, to point to other peers. Every node’s IP address is mapped to an m-bit Chord identifier with a predefined hash function h. The same hash function h is also used to map any key of data onto a key ID that forms the distributed hash table. Every node maintains a finger table of log(N) = 6 entries, pointing to the next-hop node location at distance $2^{i-1}$ (for $i = 1, 2, \ldots, m$) from this node identifier. Each node in the ring is responsible for storing the content of all key IDs that are greater than the identifier of the node’s predecessor in the Chord ring and less than or equal to the node’s own identifier. In a Chord ring each node n stores the IP address of m successor nodes plus its predecessor in the ring. The m successor entries in the routing table point to nodes at increasing distance from n. Routing is done by forwarding messages to the largest node-ID in the routing table that precedes the key ID until the direct successor of a node has a larger ID than the key ID.
Singh and Schulzrinne envision a hierarchical architecture in which multiple P2P networks are represented by a DNS domain. A global DHT is used for interdomain routing of messages.
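The finger-table construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a Chord implementation: it assumes m = 6 (a ring of 64 identifiers), uses SHA-1 reduced modulo 2^m as a stand-in for the hash function h, and works from a global list of node IDs that a real node would not have. The node IDs in the usage example are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import List

M = 6  # identifier bits, so the ring has 2**M = 64 positions


def chord_id(value: str, m: int = M) -> int:
    """Map an IP address or data key onto the m-bit ring (assumed hash h)."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(ident: int, node_ids: List[int]) -> int:
    """First node whose ID is equal to or follows ident, wrapping around."""
    nodes = sorted(node_ids)
    return nodes[bisect_left(nodes, ident) % len(nodes)]


def finger_table(n: int, node_ids: List[int], m: int = M) -> List[int]:
    """Entry i points to the successor of n + 2**(i-1), for i = 1..m."""
    return [successor((n + 2 ** (i - 1)) % (2 ** m), node_ids)
            for i in range(1, m + 1)]


# Hypothetical ring membership for illustration.
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(finger_table(8, nodes))   # fingers of node 8
print(successor(54, nodes))     # node responsible for key ID 54
print(0 <= chord_id("192.0.2.10") < 2 ** M)
```

Because each finger doubles the distance covered, a lookup can roughly halve the remaining distance to the key at every hop, which is where the logarithmic lookup cost comes from.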
Join/Leave Attack
Security of structured overlay networks is based on the assumption that joining nodes are assigned node-IDs at random due to random assignment of IP addresses. This could lead to a join/leave attack in which the malicious attacker would want to control O(log N) nodes out of N nodes, as search is done on O(log N) nodes to find the desired key ID. With the adoption of IPv6, the join/leave attack can be more massive because the attacker will have more IP addresses. But even with IPv4, join/leave attacks are possible if the IP addresses are assigned dynamically. Node-ID assignment in Chord is inherently deterministic, thereby allowing the attacker to compute node-IDs in advance where the attack could be launched by spoofing IP addresses. A probable solution would be to authenticate nodes before allowing them to join the overlay, which can involve authenticating the node before assigning the IP address.
Attacks on Overlay Routing
Any malicious node within the overlay can drop, alter, or wrongly forward a message it receives instead of routing it according to the overlay protocol. This can result in severe degradation of the overlay’s availability. Therefore an adversary can perform one of the following:
Registration Attacks
One of the existing challenges to P2P-SIP registration is to provide confidentiality. This also includes providing message integrity for registration messages.
Man-in-the-Middle Attacks
Let’s consider the case where a node with ID 80 and a node with ID 109 conspire to form a man-in-the-middle attack, as shown in Fig. 60.5. The honest node responsible for the key is node 180. Let’s assume that a recursive approach is used for finding the desired key ID, wherein each routing node would send the request message to the appropriate node-ID until it reaches the node-ID responsible for the desired key ID. The source node (node 30) has no control over the request packet, nor can it trace the packet as it traverses the Chord ring. Therefore node 32 will establish a dialogue with node 119, and node 80 would impersonate node 32 and establish a dialogue with node 108. This attack can be detected if an iterative routing mechanism is used wherein a source node checks whether the hash value is closer to the key ID than the node-ID it received on the previous hop.25 Therefore, the source node (32) would get suspicious if node 80 redirected it directly to node 119, because it assumes that there exists a node with ID lower than key ID 107.
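One simple reading of the iterative-routing check can be sketched in Python. This is an illustrative assumption rather than the mechanism from the cited work: the source node accepts a referral only if it lies clockwise between the previous hop and the key ID, and flags any referral that jumps past the key for further verification, since only the key’s true successor may legitimately sit beyond it. The ring size, function names, and node IDs are hypothetical.

```python
from typing import List, Optional

RING_BITS = 8  # assumed identifier size for this example (256 positions)


def clockwise_distance(from_id: int, to_id: int, m: int = RING_BITS) -> int:
    """Number of steps from from_id to to_id moving clockwise on the ring."""
    return (to_id - from_id) % (2 ** m)


def makes_progress(prev_hop: int, next_hop: int, key_id: int,
                   m: int = RING_BITS) -> bool:
    """True if next_hop lies after prev_hop and not beyond key_id."""
    step = clockwise_distance(prev_hop, next_hop, m)
    return 0 < step <= clockwise_distance(prev_hop, key_id, m)


def first_suspicious_hop(path: List[int], key_id: int,
                         m: int = RING_BITS) -> Optional[int]:
    """Return the first referral that fails the progress check, if any."""
    for prev_hop, next_hop in zip(path, path[1:]):
        if not makes_progress(prev_hop, next_hop, key_id, m):
            return next_hop
    return None


# Referrals that approach key ID 107 step by step pass the check.
print(first_suspicious_hop([32, 64, 96, 104], key_id=107))
# A referral that jumps from 80 straight past the key is flagged.
print(first_suspicious_hop([32, 80, 119], key_id=107))
```

Iterative routing makes this possible because the source node sees every referral itself, which it cannot do when intermediate nodes forward the request recursively.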
Attacks on Bootstrapping Nodes
Any node wanting to join the overlay needs to be bootstrapped with a static node or cached node or discover the bootstrap node through broadcast mechanisms (SIP-multicast). In any case, if an adversary gains access to the bootstrap node, the joining node can easily be attacked. Securing the bootstrap node is still an open question.
Duplicate Identity Attacks
Preventing duplicate identities, whereby the hashes of two IP addresses lead to the same node-ID, is one of the open problems. The Singh and Schulzrinne approach reduces this problem somewhat by using a P2P network for each domain. Further, they suggest email-based authentication in which a joining node would receive a password via email and then use the password to authenticate itself to the network.
Free Riding
In a P2P system there is a risk of free riding in which nodes use services but fail to provide services to the network. Nodes use the overlay for registration and location service but drop other messages, which could eventually result in a reduction of the overlay’s availability.
The other major challenges that are presumably even harder to solve for P2P-SIP are as follows:
• Prioritizing signaling for emergency calls in an overlay network and ascertaining the physical location of users in real time may be very difficult.
• With the highly dynamic nature of P2P systems, there is no predefined path for signaling traffic, and therefore it is impossible to implement a surveillance system for law enforcement agencies with P2P-SIP.
End-to-End Identity With Session Border Controllers
As discussed earlier, SIP identity16 provides identity for SIP requests by signing certain SIP headers and the SIP body [which typically contains the Session Description Protocol (SDP)]. This identity is destroyed if the SIP request travels through an SBC, because the SBC has to rewrite the SDP as part of the SBC’s function (to force media to travel through the SBC). Today, nearly all companies that provide SIP trunking (Internet telephony service providers, ITSPs) utilize SBCs. In order to work with SIP identity,16 those SBCs would have to validate incoming requests (which is new), modify the SDP and create a new identity (which they are doing today), and sign the new identity (which is new). As of this writing, it appears unlikely that ITSPs will have any reason to perform these new functions.
A related problem is that, even if we had end-to-end identity, it is impossible to determine whether a certain identity can rightfully claim a certain E.164 phone number in the From header. Unlike domain names, which can have their ownership validated (the way email address validation is performed on myriad websites today), there is no de facto or written standard to determine whether an identity can rightfully claim to “own” a certain E.164 number.
It is anticipated that as SIP trunking becomes more commonplace, SIP spam will grow with it, and the growth of SIP spam will create the necessary impetus for the industry to solve these interrelated problems. Solving the end-to-end identity problem and the problem of attesting E.164 ownership would allow domains to immediately create meaningful whitelists. Over time these whitelists could be shared among SIP networks, end users, and others, eventually creating a reputation system. But as long as spammers are able to impersonate legitimate users, even creating a whitelist is fraught with the risk of a spammer guessing the contents of that whitelist (your bank, family member, or employer).
Session Initiation Protocol Security Using Identity-Based Cryptography
Authentication in SIP has been a major concern, and existing authentication schemes depend on PKI or shared secrets (passwords). Although PKI has existed for decades, the cost of maintaining the infrastructure has prevented enterprises from harnessing it to its fullest potential. In a PKI, each certificate contains a preset expiration date, and if the certificate expires, or if the sender refreshes his keys, then the end user (callee) would have to obtain a new certificate from a public key repository. This retrieval process would involve the onerous tasks of certificate path construction and path validation. In such cases, identity-based cryptography can be extremely useful, as it eliminates the generation and maintenance of public key certificates. The basic idea behind an identity-based cryptosystem is that end users can choose an arbitrary string (such as a SIP URI) that represents their identity to compute their public key. As a result, it eliminates the need for certificates from a certificate authority (CA).26 In addition, concatenation of the user identity and a Universally Unique Identifier (UUID) to generate a public key would greatly simplify the revocation process.
- SUMMARY
With today’s dedicated VoIP handsets, a separate voice VLAN provides a reasonable amount of security. Going forward, as nondedicated devices become more commonplace, more rigorous security mechanisms will gain importance. This will begin with encrypted signaling and encrypted media and will evolve to include spam protection and enhancements to SIP to provide cryptographic assurance of SIP call and message routing.
As VoIP continues to grow, VoIP security solutions (see checklist, “An Agenda for Action for VoIP’s Security Challenges”) will have to consider consumer, enterprise, and policy concerns. Some VoIP applications commonly installed on PCs, such as Skype, may be against corporate security policies. One of the biggest challenges with enabling encryption is maintaining a PKI and the complexities involved in distributing public key certificates that would extend to end users27 and key synchronization between various devices belonging to the same end user agent.28
Using IPsec for VoIP tunneling across the Internet is another option; however, it is not without substantial overhead.29 Therefore, end-to-end mechanisms such as SRTP are specified for encrypting media and establishing session keys.
VoIP network designers should take extra care in designing intrusion detection systems that are able to identify never-before-seen activities and react according to the organization’s policy. They should follow industry best practices for securing endpoint devices and servers. Current softphones and consumer-priced hardphones use the “haste-to-market” implementation approach and therefore become vulnerable to VoIP attacks. Therefore VoIP network administrators may evaluate VoIP endpoint technology, identify devices or software that will meet business needs and can be secured, and make these the corporate standards.
With P2P-SIP, the lack of central authority makes authentication of users and nodes difficult. Providing central authority would dampen the spirit of P2P-SIP and would conflict with the inherent features of distributed networks. A decentralized solution such as a reputation management system, where trust values are assigned to nodes in the network based on prior behavior, would lead to a weak form of authentication because the credibility used to distribute trust values could vary in a decentralized system. Reputation management systems have been more focused on file-sharing applications and have not yet been applied to P2P-SIP.
Finally, let’s move on to the real interactive part of this chapter: review questions/exercises, hands-on projects, case projects, and optional team case project. The answers and/or solutions by chapter can be found in the Online Instructor’s Solutions Manual.
CHAPTER REVIEW QUESTIONS/EXERCISES
True/False
1. True or False? H.323 and Session Initiation Protocol (SIP) are the two substandardized protocols for the realization of VoIP.
2. True or False? SIP is the Internet Engineering Task Force (IETF) substandard for multimedia communications in an IP network.
3. True or False? Attacks can be broadly classified as attacks against specific users (SIP UAs), large scale (VoIP is part of the network) and against network infrastructure (SIP proxies or other network components and resources necessary for VoIP, such as routers, DNS servers, and bandwidth).
4. True or False? Reconnaissance refers to intelligent gathering or probing to assess the vulnerabilities of a network, to successfully launch a later attack; it includes footprinting the target (also known as profiling or information gathering).
5. True or False? A denial-of-service (DoS) attack deprives a user or an organization of services or resources that are not normally available.
Published @ September 27, 2021 5:09 am