1. Introduction
Customers want quality voice and video. This can be readily provided using engineered links -- i.e., paths that prioritize, reserve, or otherwise guarantee that the real-time voice and video packets will be delivered within the required timing constraints. But because Internet bandwidth has become so much cheaper, customers would prefer to get quality voice and video services across the Internet, and not be forced to buy a special link to get that quality. Expecting the public Internet to provide adequate quality to make VoIP work reliably is unreasonable. This topic always brings to mind the George Bernard Shaw quote:
The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.
At SIP Forum's SIPNOC 2014, we held a BoF ("Birds of a Feather") session and Google Hangout to discuss ways to improve the quality of voice and video over the Internet. We also had a conversation on the VoiceOps mailing list about this subject. The notes below reflect the comments made by folks in both fora.
2. Terminology
"Customer" refers to the customer of the ITSP.
"CPE" (Customer Premise Equipment) refers to equipment installed at a customer's site. In the context of this discussion, all CPE is for use exclusively by the Customer at whose site it is hosted.
"ISP" refers to Internet Service Providers. In this context, both the ITSP and the Customer purchase service from an ISP to connect to the Internet and thereby communicate with one another over the Internet.
"ITSP" refers to an "Internet Telephony Service Provider"; but in general it's any provider of Real Time, Interactive, N-way (N>1), Human media sessions.
"OTT" (Over the Top) and "BYOB" (Bring Your Own Broadband) refer to a method of delivering service where the connection between the ITSP and the Customer is made over the public Internet, as opposed to a link where guarantees of performance are provided.
3. Miscellaneous Observations
3.1. Some methods of improving the media experience across the Internet are going to be more expensive than others. Some are thus going to be most applicable to larger customer sites, while others can be efficiently applied even to individual users/customers.
3.2. One major OTT/BYOB carrier reported seeing that some major companies (apparently ISPs) appear to be de-prioritizing VoIP traffic, making it work worse than typical IP traffic.
3.3. The techniques discussed here should apply well to Residential, Hosted PBX, and also to IP NNI (Network-to-Network Interconnections).
4. Detection methods
4.1. Individual Calls
4.1.1. RTCP or RTCP-XR received from individual calls
4.1.2. Monitoring of packets via a box in the middle (e.g., ITSP Border Element / SBC, or CPE edge device)
4.1.3. RTCP Extended data sent via SIP
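As an illustration of 4.1.1, here is a minimal sketch of pulling the loss and jitter figures out of a single RTCP Receiver Report, using the packet layout from RFC 3550. The function names and the 8000 Hz default clock rate are assumptions for illustration; compound RTCP packets, Sender Reports, and RTCP-XR blocks are not handled.

```python
import struct

def parse_receiver_report(packet: bytes) -> list:
    """Return the report blocks of one RTCP Receiver Report (PT=201)."""
    if len(packet) < 8:
        raise ValueError("packet too short for an RTCP header")
    first, ptype, _length = struct.unpack("!BBH", packet[:4])
    version = first >> 6          # top two bits
    count = first & 0x1F          # number of report blocks
    if version != 2 or ptype != 201:
        raise ValueError("not an RTCP Receiver Report")
    blocks = []
    offset = 8                    # 4-byte header + 4-byte reporter SSRC
    for _ in range(count):
        ssrc, lost_word, highest_seq, jitter, _lsr, _dlsr = struct.unpack(
            "!IIIIII", packet[offset:offset + 24])
        cumulative = lost_word & 0xFFFFFF
        if cumulative & 0x800000:     # 24-bit signed value
            cumulative -= 0x1000000
        blocks.append({
            "ssrc": ssrc,
            "fraction_lost": (lost_word >> 24) / 256.0,  # since last report
            "cumulative_lost": cumulative,
            "highest_seq": highest_seq,
            "jitter": jitter,         # in RTP timestamp units
        })
        offset += 24
    return blocks

def jitter_ms(jitter_units: int, clock_rate: int = 8000) -> float:
    """Convert interarrival jitter from timestamp units to milliseconds."""
    return jitter_units * 1000 / clock_rate
```

An ITSP-side monitor could feed each block into a per-call or per-site threshold check; which thresholds count as "poor" is a local policy decision, not something the RTCP format defines.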
4.2. Customer Sites
4.2.1. ICMP ping of all customer sites to detect which customer sites are having problems (and by extension, which customers' ISPs are having problems).
4.2.2. Test calls to the customer site, e.g., to a loopback function
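The inference in 4.2.1, from per-site ping results to a suspect ISP, can be sketched as a simple aggregation. This is only an illustration: the input tuples, the 5% loss threshold, and the "half of the sites" rule are all assumptions, and collecting the ping counts themselves (raw sockets or a system ping tool) is left out.

```python
from collections import defaultdict

def flag_problem_isps(results, loss_threshold=0.05, min_fraction=0.5):
    """results: iterable of (site, isp, pings_sent, pings_received).

    Returns (bad_sites, bad_isps): sites whose loss exceeds the threshold,
    and ISPs where at least min_fraction of the probed sites are bad.
    """
    bad_sites = []
    per_isp = defaultdict(lambda: [0, 0])   # isp -> [bad sites, total sites]
    for site, isp, sent, received in results:
        loss = 1.0 - received / sent if sent else 1.0
        per_isp[isp][1] += 1
        if loss > loss_threshold:
            bad_sites.append(site)
            per_isp[isp][0] += 1
    bad_isps = sorted(isp for isp, (bad, total) in per_isp.items()
                      if bad / total >= min_fraction)
    return bad_sites, bad_isps
```

A single bad site points at that site's access link; many bad sites on one ISP point at the ISP or at the path between it and the ITSP.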
5. Quality Improvement Techniques
5.1. Involving multiple ISPs
5.1.1. Multiple ISPs or IP Peering Points at ITSP
5.1.1.1. Maximize the number of ISPs or peering points, to increase the number of options for reaching customers.
5.1.1.2. Ability to manually tune routing to an ITSP customer that has a static public IP address, to avoid a particular ISP that is having problems. Example: End customer is on ISP 3. ITSP uses ISP 1 and ISP 2. A dispute or routing problem between ISP 3 and ISP 1 causes poor performance. So the ITSP changes routing so that traffic destined to the customer always exits the ITSP network via ISP 2.
5.1.1.3. Advertise smaller blocks of IP space with BGP path preferences to control how customers' traffic enters the network; i.e., try to groom all VoIP traffic to enter the ITSP's network via the better ISP du jour.
5.1.1.4. On the ITSP border elements (e.g., SBC), assign different IP addresses that each route in exclusively via a single ISP. E.g., the SBC has customer-facing IP 1 that is advertised only via ISP 1, and a separate customer-facing IP 2 that is advertised only via ISP 2. Then configure Customer A's SIP CPE to REGISTER via IP 1, with SRV failover to IP 2. For Customer B, it may be better to register via IP 2 with failover to IP 1.
5.1.2. Multiple ISPs at Customer Premise. Use one ISP as the preferred option at the customer premise, then fail over to another if the first one is too degraded, as measured by SLA monitoring in the CPE router.
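The per-customer SRV arrangement in 5.1.1.4 can be sketched as follows. The host names are hypothetical, and the ordering shown is a simplification of RFC 2782: targets are tried in ascending priority, and the weight field (used only to balance among equal priorities) is ignored.

```python
from typing import NamedTuple

class SrvRecord(NamedTuple):
    priority: int   # lower value is tried first
    weight: int     # ignored in this simplified sketch
    port: int
    target: str

def failover_order(records):
    """Return SRV targets in the order a CPE would attempt them."""
    return [r.target for r in sorted(records, key=lambda r: r.priority)]

# Hypothetical names: sbc-ip1 is reachable only via ISP 1, sbc-ip2 only via ISP 2.
customer_a = [
    SrvRecord(20, 0, 5060, "sbc-ip2.itsp.example"),
    SrvRecord(10, 0, 5060, "sbc-ip1.itsp.example"),
]
customer_b = [
    SrvRecord(10, 0, 5060, "sbc-ip2.itsp.example"),
    SrvRecord(20, 0, 5060, "sbc-ip1.itsp.example"),
]
```

Publishing a separate SRV name per customer (or per group of customers) lets the ITSP choose each customer's preferred entry ISP without touching the CPE configuration again.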
5.2. Codecs
5.2.1. Traditional codecs
5.2.1.1. G.729 instead of G.722 or G.711, simply to reduce the bandwidth required.
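To put rough numbers on the bandwidth saving, the following sketch computes the IP-layer bandwidth of one direction of an RTP stream. It assumes 40 bytes of per-packet overhead (20 IPv4 + 8 UDP + 12 RTP), ignores Layer 2 framing, SRTP, and silence suppression, and uses the nominal codec bit rates of 64 kbit/s for G.711 and G.722 and 8 kbit/s for G.729. The function name is illustrative.

```python
def stream_kbps(codec_kbps, ptime_ms=20, overhead_bytes=40):
    """IP-layer bandwidth of one RTP stream direction, in kbit/s."""
    packets_per_second = 1000 / ptime_ms
    # Bytes of encoded audio carried in each packet.
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000

g711 = stream_kbps(64)   # 160-byte payload per 20 ms packet -> 80.0 kbit/s
g729 = stream_kbps(8)    # 20-byte payload per 20 ms packet  -> 24.0 kbit/s
```

At 20 ms packetization the headers dominate a G.729 stream (40 of every 60 bytes), which is why the saving is closer to 3x than the 8x that the codec bit rates alone would suggest.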
5.2.2. Adaptive Codecs
5.2.2.1. AMR / AMR-WB. Popular with vendors to implement, but relatively expensive because of intellectual property licensing fees.
5.2.2.2. Opus. Generally considered equal or superior to AMR-WB, but newer and thought to be regarded as immature by vendors.
5.2.3. Media Transport tricks.
5.2.3.1. Packetization Time (ptime) changes. It *might* be possible to reduce packet loss by reducing the packets/second rate of a media stream, but any reduction in packet loss will bring a proportionate increase in payload delivery. Much of the equipment in use won't support ptime != 20 ms.
5.2.3.2. Media over TCP. WebRTC includes Media over TCP. There are studies showing that media over TCP, e.g., in TCP-based VPNs, improves something. But this probably means that jitter buffers are maximized. The group's outspoken view is that using today's codecs over TCP is likely to make delay much worse. [http://www.voip-info.org/wiki/view/VOIP+and+VPN] claims that a study by Sirrix (a German security firm) reported no ill effects of VPNs on VoIP, but the study is no longer online.
5.3. Changing call path mid-call. Detect the problem (e.g., via RTCP), then re-INVITE to change the media flow to another IP address that uses a different routing path from the customer.
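For the mid-call path change in 5.3, the SDP offer carried in the re-INVITE needs a new connection address and, per the offer/answer rules in RFC 3264, an incremented session version in the o= line. The sketch below shows only that SDP rewrite; the function name is hypothetical, IPv4 is assumed, and the SIP dialog handling around the re-INVITE is out of scope.

```python
def sdp_for_reinvite(sdp: str, new_ip: str) -> str:
    """Return a copy of an SDP body that moves media to new_ip."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("o="):
            # o=<username> <sess-id> <sess-version> <nettype> <addrtype> <addr>
            fields = line[2:].split(" ")
            fields[2] = str(int(fields[2]) + 1)   # bump the session version
            line = "o=" + " ".join(fields)
        elif line.startswith("c=IN IP4 "):
            # Point media at the address that routes via the other ISP.
            line = "c=IN IP4 " + new_ip
        out.append(line)
    return "\r\n".join(out) + "\r\n"
```

With the per-ISP border addresses described in 5.1.1.4, new_ip would be the SBC address advertised via the ISP that is currently performing better.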
5.4. Customer Premise Equipment
5.4.1. Use a device at Customer Premise that can prioritize Voice traffic as it exits the VoIP CPE going toward the PE router. This can be done with an application-aware device (SIP ALG) or via DSCP marking and prioritization.
5.4.2. Use a device at Customer Premise that can manage the PE-to-CE (ITSP to CPE) TCP flows to ensure that TCP flows slow down enough to allow the RTP flows. This likely requires an ALG, or, at least, a device that can recognize the distinct RTP flows that it is trying to protect.
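For the DSCP option in 5.4.1, the marking itself is a small operation. The sketch below shows a voice endpoint marking its own media socket with Expedited Forwarding (DSCP 46), assuming an operating system that honors the IP_TOS socket option for IPv4; the helper names are illustrative. The marking only helps if the customer-premise router is configured to prioritize on it.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding (RFC 3246), commonly used for voice media

def dscp_to_tos(dscp: int) -> int:
    """DSCP occupies the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2

def open_marked_udp_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Open a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```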
5.5. Traditional QoS
5.5.1. Use QoS prioritization in the core of the network.
5.5.2. Use QoS prioritization at the customer premise, even though the link between ITSP and Customer is not protected. And use a QoS policy at the customer premise that matches what's in use in the core.
5.5.3. Strip DSCP markings as packets arrive from untrusted networks, i.e., ISPs and customer sites outside your control.
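To make 5.5.3 concrete, this sketch shows what stripping a DSCP marking does to an IPv4 header: the upper six bits of the second byte are cleared and the header checksum is recomputed. In practice this is done by router or SBC configuration rather than application code; the function names are illustrative, and preserving the two ECN bits is a choice made for this sketch.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def strip_dscp(header: bytes) -> bytes:
    """Return the IPv4 header with DSCP reset to 0 and a fresh checksum."""
    hdr = bytearray(header)
    hdr[1] &= 0x03                  # clear DSCP (upper 6 bits), keep ECN
    hdr[10:12] = b"\x00\x00"        # checksum field is zero while summing
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

After stripping, the border device can re-mark the packet according to its own classification, so that only traffic the ITSP has identified as media receives priority in the core.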
5.6. Miscellaneous Recommendations.
5.6.1. Avoid messing with the media as much as possible; in particular minimize codec conversions.
5.6.2. Stay on the same network end-to-end, whenever possible. E.g., if you have customers that connect to you using Comcast at their customer premise, then try to connect the ITSP border element to Comcast.
6. Contributors
Thanks for contributions in this discussion go to: Alex Balashov (Evariste Systems), Chris Boyd (Gizmo Partners), Chris Brown (ACS Alaska), Frank Bulk (Premier Communications), Ryan Delgrosso, Jim Gast (TDS Telecom), Gavin Henry (SureVoIP), Jesse Howard (Shoretel), Faisal Imtiaz (Snappy Telecom), Eric Jastak, Brandon Lehmann (BitRadius), Mark R Lindsey (ECG), Anthony Orlando, Gernot Scheichl (Edgewater Networks), Dan York, as well as other anonymous participants at SIPNOC and on VoiceOps.