We traced the issue to one of our upstream providers. It affected only traffic passing through that connection: the provider failed to route our outgoing traffic, while their uplink remained open and appeared normal. This made it nearly impossible to detect immediately, since no alarms are raised when nothing appears to be down or broken.
As call traffic built up (many calls couldn't get out), we noticed a spike on one of our main call gateways and wrongly assumed it was frozen. We forced a soft(ware) restart and, when that did not help, issued a cold reboot. The reboot dropped existing calls and rerouted traffic to a backup gateway, which produced the same outcome. We then isolated our upstream providers and turned off the affected connection. All services immediately resumed normal operation.
Posted over 2 years ago. Mar 23, 2017 - 16:30 EDT
The issue has been identified and resolved. We're placing it into Monitoring for the next 2 hours to watch for any additional errors.
Posted over 2 years ago. Mar 23, 2017 - 14:45 EDT
We have just received reports from some customers who are unable to make or receive calls. Investigating...