200 OK Surprise! Some Hosted PBX Features Could Ruin Business Model, Complicate Network Planning


Advanced IP PBX Features Can Radically Change the Network Engineering, Support, and Costs for Hosted IP PBX Providers

Many VoIP-based Hosted PBX providers are surprised by the network load caused by the SIP features "Shared Call Appearance" (SCA), "Shared Line Appearance", "Simultaneous Ring", "Busy Lamp Field" (BLF), or "Line State Monitoring". Both of the big Application Server players, Metaswitch and BroadSoft, offer these features. And carriers are starting to deploy them in spades. They're often deployed together, so I'll refer to them collectively as BLF/SCA/Simring.

All these features are fundamental departures in the underlying signaling model. I'll explain why, and what a VoIP Service Provider needs to do.

1,000% Increase in Signaling

Normal modern call control requires only a few SIP signaling messages to set up or end a telephone call: INVITE, 100, 180, PRACK, 200, 200, ACK, BYE, 200. You'll get more messages when calls are put on hold or switch to fax mode, and Metaswitch does session audits with a re-INVITE every 30 seconds. But 10 or 20 messages are typical for a phone call.
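To make the baseline concrete, here is a minimal Python sketch that tallies the basic call flow listed above. The list is copied from the article; the split into requests and responses is just an illustration, treating numeric tokens as SIP response codes, and the variable names are my own.

```python
# Basic call flow as listed in the article: method names are requests,
# three-digit codes are responses.
CALL_FLOW = ["INVITE", "100", "180", "PRACK", "200", "200", "ACK", "BYE", "200"]

# Numeric tokens are response status codes; everything else is a request method.
requests = [m for m in CALL_FLOW if not m.isdigit()]
responses = [m for m in CALL_FLOW if m.isdigit()]

print(f"total messages: {len(CALL_FLOW)}")
print(f"requests:  {len(requests)} {requests}")
print(f"responses: {len(responses)} {responses}")
```

That is nine messages for a plain call, which is why 10 or 20 is a reasonable planning figure once holds and audits are included.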

Enter BLF/SCA. Now every phone in the group can get 6 or more NOTIFY-200 SIP messages for every call placed by other people in the group. Plus they get a call setup attempt for every single call for every person in their group.

The SIP signaling load per user grows enormously. A user who has 20 calls per day might only need 200 signaling messages for call control, but a user in a 5-person BLF/SCA group could have 1,100 messages per day. Raise that to a ten-person group, and now it's 2,180 messages per day.

The signaling load grows superlinearly – in this case, the total load for the group grows as n² with the number of users in the group!
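A quick way to check the quadratic claim is to multiply the per-user figures quoted above by the group size. The sketch below uses only the article's three data points; it does not try to derive them from a per-call message model, and the names are hypothetical.

```python
# Per-user daily SIP message counts quoted in the article, keyed by group size.
# A group size of 1 stands in for a user with no BLF/SCA group.
MESSAGES_PER_USER_PER_DAY = {1: 200, 5: 1_100, 10: 2_180}


def group_total(group_size: int) -> int:
    """Total daily messages for one group: per-user count times group size."""
    return MESSAGES_PER_USER_PER_DAY[group_size] * group_size


totals = {n: group_total(n) for n in MESSAGES_PER_USER_PER_DAY}

# Effect of doubling the group from 5 to 10 users.
per_user_ratio = MESSAGES_PER_USER_PER_DAY[10] / MESSAGES_PER_USER_PER_DAY[5]
total_ratio = totals[10] / totals[5]

for n, total in totals.items():
    per_user = MESSAGES_PER_USER_PER_DAY[n]
    print(f"{n:>2} users: {per_user:>5} per user, {total:>6} per group")
print(f"5 -> 10 users: per-user x{per_user_ratio:.2f}, per-group x{total_ratio:.2f}")
```

Doubling the group roughly doubles each user's load and roughly quadruples the group's load, which is the quadratic behavior described above.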

Surprise! Your system is full earlier than expected.

The real danger is not the messages per day – it's the messages per second at your peak. If your customers are clustered into one geographic area, that peak probably happens around 10:00am or 3:00pm local time. In any case, for typical users, peak load tracks the daily workload.
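As a rough illustration of turning messages per day into messages per second, the sketch below averages the load over a single busy hour. The busy-hour fraction (15% of daily traffic) and the 5,000-user population are assumptions chosen for the example, not figures from any vendor planning tool; the per-user message counts are the ones quoted earlier.

```python
SECONDS_PER_HOUR = 3600


def busy_hour_messages_per_second(
    users: int,
    messages_per_user_per_day: float,
    busy_hour_fraction: float,
) -> float:
    """Average SIP messages per second during the busy hour.

    busy_hour_fraction is the assumed share of a day's traffic that falls
    in the busiest hour (a hypothetical planning input).
    """
    busy_hour_messages = users * messages_per_user_per_day * busy_hour_fraction
    return busy_hour_messages / SECONDS_PER_HOUR


ASSUMED_USERS = 5_000          # hypothetical customer base
ASSUMED_BUSY_FRACTION = 0.15   # hypothetical: 15% of daily traffic in one hour

plain = busy_hour_messages_per_second(ASSUMED_USERS, 200, ASSUMED_BUSY_FRACTION)
grouped = busy_hour_messages_per_second(ASSUMED_USERS, 2_180, ASSUMED_BUSY_FRACTION)

print(f"no BLF/SCA groups:    {plain:7.1f} msg/s")
print(f"ten-person groups:    {grouped:7.1f} msg/s")
```

This is an average across the hour; short bursts inside that hour will run higher, so treat the result as a floor rather than a sizing target.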

And the problem is typically not in the routers and switches; signaling load is just more IP traffic. Solving problems in the transport network would be easy. But signaling load affects the application plane – i.e., the devices that process the SIP. Solving application-layer problems is much more complex, because (obviously) the application has to track the state and progress of all of the user-oriented business logic.

So what is most likely to get overloaded during the peak?

  1. The Session Border Controller. Even some folks at SBC vendors have been caught off guard by this. In one case, a sales engineer told me that he had planned for a single SBC installation, but needed three SBC systems to handle all the BLF/SCA traffic. "The planning tool we had was all wrong," he said.
  2. The Application Server. When you're using BLF/SCA, the AS or Call Feature Server (CFS) may have to process 10x as many SIP dialogs and handle 10x as many SIP messages. "The most expensive thing an Application Server does is process a SIP packet," a BroadSoft Systems Engineer once told me.

BLF/SCA and Simultaneous Ring are certainly very useful – but they come with a price.

Cool features – Big cost differences.

For example: suppose an SBC costs $67,500 for a non-redundant system and three years of support. Without BLF/SCA/Simultaneous Ring, you could expect to support 30,000 users on the system; i.e., that's $0.75/user/year CapEx for the SBC. But with BLF/SCA/Simultaneous Ring, your efficiency could drop significantly – reasonably down to 5,000 users – i.e., $4.50/user/year CapEx. That's a 6x difference in cost per user – but still a great price for all the features an SBC provides.
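The arithmetic behind those per-user figures can be reproduced in a few lines. This sketch uses the example prices and capacities from the paragraph above; the function and constant names are my own.

```python
SBC_COST_USD = 67_500   # non-redundant system plus three years of support
SUPPORT_YEARS = 3


def capex_per_user_per_year(supported_users: int) -> float:
    """Spread the SBC cost evenly over its users and the support period."""
    return SBC_COST_USD / supported_users / SUPPORT_YEARS


without_features = capex_per_user_per_year(30_000)
with_features = capex_per_user_per_year(5_000)

print(f"without BLF/SCA/Simring: ${without_features:.2f}/user/year")
print(f"with BLF/SCA/Simring:    ${with_features:.2f}/user/year")
print(f"cost multiple:           {with_features / without_features:.0f}x")
```

Swapping in your own SBC price and measured capacity gives a quick sanity check on how much of the feature's revenue the signaling load eats.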

The revenues from BLF/SCA/Simultaneous Ring are also great, but probably not proportional to the signaling load. Can you really charge 6x or 10x the price for a BLF/SCA/Simultaneous Ring customer? Of course not; but remember that the signaling load is not the only cost of the system. Technical talent, network transport, Customer Premises Equipment, sales and marketing, etc., are all significant expenses that are largely unaffected by these features.

Not all SIP Phones are Created Equal

You've got to consider the Customer Premises Equipment selection. BLF/SCA are advanced new features, and do not enjoy the robust, mature, time-and-customer-tested support of ordinary call control. While Polycom has been supporting BLF/SCA for years and in wide deployment, many other phones do not have that history. There's nothing magical about Polycom – they just have a head start on software maturity and reliability. And they have the experience with SIP-over-TCP necessary to make BLF work well.

Finally, test at scale before you deploy. Testing a three-member BLF/SCA group in the lab is not adequate preparation to sell a ten-member BLF/SCA group. Because of the non-linear growth in signaling, and the requirement to use SIP over TCP for reliable BLF/Line State updates, these features do not scale up in clean, intuitive ways.

Further, you've got to test the transport. Very low levels of packet loss can seriously affect SIP over UDP deployments of Busy Lamp Field. Your Gigabit-Ethernet VoIP lab probably isn't the best place to evaluate the reliability of your service if that service is delivered to customers over T1s.

Before you deploy a big Simultaneous Ring / BLF / SCA group, you need to test and prove the reliability of that big group. You'll be sorry if you don't, and your customer will let you know just how sorry.

Proceed – with Caution.

Busy Lamp Field, Shared Call Appearance, Shared Line Appearance, and Simultaneous Ring are cool features, and well worth selling. But they constitute a genuine disruption to the ordinary Hosted IP PBX model that has been built with BroadWorks and Metaswitch for years. Network engineering, costs, and support all change in serious ways.