200 OK Meaningless telecom statistics are a pain in the asr.


In the 1960s, General Telephone and Electronics Corporation, GTE, employed my father to visit their Central Offices in Georgia to connect measurement equipment to their telephone switches. The purpose was to monitor the system for evidence of problems. They came up with all kinds of measurements in that era; one of the most enduring was Answer-Seizure Ratio, ASR. The ASR is 100*(number of calls answered)/(number of calls attempted).
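As a minimal sketch, the ratio can be computed like this, taking ASR as answered calls over attempted calls (seizures), expressed as a percentage. The function name `answer_seizure_ratio` is just an illustrative choice, and the sample figures match the worked example in this post.

```python
def answer_seizure_ratio(answered: int, attempted: int) -> float:
    """Return ASR as a percentage: answered calls over attempted calls."""
    if attempted <= 0:
        # With no attempts there is no ratio to report.
        raise ValueError("attempted must be positive")
    return 100.0 * answered / attempted


asr = answer_seizure_ratio(answered=1650, attempted=1750)
print(f"{asr:.1f}%")  # prints 94.3%
```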

So if we place 1,750 calls in a day, and 1,650 of them are answered, then the ASR is 100*1650/1750 = 94.3%.

ASR is formally documented in ITU-T document E.411: "International network management - Operational guidance". One of the earliest written references to Answer-Seizure Ratio, from a 1985 CCITT "Red Book", is a sort of apology for the measurement:

The answer/seizure ratio should be also based on historical records or, if available, on measurements taken during the period the route was used. 

Two key points here:

  • An ASR value for a specific route (call path) is meaningful only when compared to other values from the same call path
  • An ASR is meaningful only if it expresses a period of normal traffic

Let's look at what has happened since 1985:

There are lots of devices automatically answering calls. Recipients have good ways to avoid answering calls. And lots of the people calling are people we don't want to talk to.

For example: suppose a political campaign starts calling a lot of your subscribers. Those subscribers really don't want to hear from the "COMMITTEE TO REELEC" or "TOLL-FREE CALL" that caller-ID shows them. So the calls go unanswered, or go to voicemail. If you're measuring ASR for those users (based on the actual calls answered by the endpoints), then the ASR may plummet during the calling campaign.

A low ASR doesn't necessarily mean the network broke. The problem could just be that some doofus is making a lot of calls.

Using ASR as a measurement of network behavior is not a useful way to assess problems. Instead of trying to classify "success" (as in "an answered call"), look specifically for the problems in the signaling protocol:

  • Errors. How many calls failed with some sort of error? Here you have to be intelligent. For example, an INVITE that "fails" with a 401 is no failure at all; it is just an authentication challenge, and the caller retries with credentials. On the other hand, an INVITE that failed with 606 is probably a real failure.
  • Timeouts. How many calls failed due to some sort of timeout? This would indicate that the called device failed to reply to some sort of signaling message.
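
The two counts above could be tallied from the final SIP response to each INVITE. The following is only a rough sketch: the bucket names, the `classify` and `summarize` helpers, and above all the choice of which status codes land in which bucket are assumptions for illustration, and you would tune them for your own network.

```python
from collections import Counter

# Assumed groupings, not a standard; adjust to taste.
AUTH_CHALLENGES = {401, 407}               # auth challenge; the INVITE gets retried
TIMEOUTS = {408, 504}                      # Request Timeout, Server Time-out
USER_OUTCOMES = {480, 486, 487, 600, 603}  # unavailable, busy, cancelled, declined


def classify(status: int) -> str:
    """Map the final SIP response to an INVITE onto a coarse bucket."""
    if 200 <= status < 300:
        return "answered"
    if status in AUTH_CHALLENGES:
        return "challenge"
    if status in TIMEOUTS:
        return "timeout"
    if status in USER_OUTCOMES:
        return "user"
    if 400 <= status < 700:
        return "error"  # everything else in 4xx-6xx, e.g. 606
    return "other"      # 1xx or 3xx: not counted as success or failure here


def summarize(statuses):
    """Count how many calls fall into each bucket."""
    return Counter(classify(s) for s in statuses)


counts = summarize([200, 401, 200, 606, 408, 486])
print(counts["error"], counts["timeout"])  # prints 1 1
```

Watching the "error" and "timeout" buckets over time, rather than the share of answered calls, keeps a burst of unwanted calls from looking like a network fault.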