Assuring Service Quality End-to-End Across Third-Party, Partner and Peering Networks

By Juniper Networks Oct 4, 2021 8:00am

As a communications service provider (CSP), you may find it very challenging to guarantee quality when you work with business partners through their third-party partner networks, interconnects, peering networks and the Internet. Your customer’s perception of your business hinges on your ability to continuously deliver the expected level of service quality so that their applications run smoothly without performance lags. In the cloud and 5G era this problem has become even more difficult to solve because of increasingly demanding quality and performance requirements for over-the-top and cloud applications, as well as new ultra-low latency 5G applications with extremely strict SLAs.

So why is it such a problem to handle performance across these end-to-end services that run over hybrid networks that may include SD-WAN, multi-domain WAN and multi-cloud?

Firstly, the configuration and policies of the end-to-end path are complex. Any non-optimal configuration along the way, such as an inappropriate QoS setting, queue size, routing policy or traffic engineering metric, will ultimately impact the end performance. Even misconfigured firewall rules can lead to silent performance issues that are difficult to pinpoint. Traditional monitoring tools and telemetry solutions do not let operators see these types of issues.

Secondly, there is insufficient visibility and insight for troubleshooting network performance in third-party and partner networks, because their network equipment is outside the scope of your operations team’s management. Silent failures and performance issues that impact end-to-end services often lead to blame games and finger pointing due to siloed domains that include both external and internal networks.

For example, maintaining service quality for cloud services is extremely difficult when an increasingly large part of your network is no longer under your operations team’s own control. Such end-to-end services are typically implemented with VPNs and SD-WAN overlays, so your customer’s services now transit not only your network, but possibly the Internet as well as third-party cloud and partner networks. In addition, the cloud applications that end users connect to are typically distributed and may reside in multiple data centers in a multi-cloud architecture, where end-to-end services may be delivered across multiple public and/or private clouds. Within the data center, these cloud services are also typically implemented as virtualized service chains, which are extended to include multiple value-added services as part of the end-to-end connectivity service path. The end-to-end customer service delivery path can become quite complex, and without end-to-end visibility and insight for troubleshooting network performance, some of these network domains and service segments become blind spots when service-impacting network issues occur.

Most operations staff find this a daunting, almost herculean problem to solve when today’s tools do not provide service quality visibility end-to-end across the whole delivery chain. Many tools offer siloed monitoring for various domains, but unfortunately cannot help operations with blind spots across third-party clouds, partner networks, and unsupervised internal domains that are not within the direct scope of management.

Even with fast and robust network telemetry collection, data aggregation lakes and network analytics, a major issue remains: it is extremely difficult to feed in all the required data from every device that application service traffic passes through.

Device telemetry data is often incomplete due to a lack of access to all devices along the service path, such as when the path traverses partner domains and over-the-top services. In addition, it is difficult to map statistics from underlay to overlay, especially when the goal goes beyond fault and service impact towards service quality.

In general, device telemetry often does not provide end-to-end network and service Key Performance Indicators (KPIs) directly. Instead, these KPIs need to be inferred from a multitude of device-specific statistics. Due to this inherent limitation, device telemetry is unable to truly measure and guarantee end-to-end network quality for services.

What is required to address the challenges of today’s network operations is a service-centric solution that allows the customer’s service quality and end-user experience to be continually and proactively tested across the end-to-end service delivery chain.

Active Assurance is the ideal solution for assuring end-to-end service quality. It measures what matters directly by actively sending a small amount of synthetic traffic on the data plane to simulate an end user. No management plane integration is required to perform active testing, and the services being tested may also traverse unmanaged devices and third-party partner and peering networks. Synthetic active testing can be done across layers 2 to 7 to offer a comprehensive solution for proactively measuring unique per-service SLAs.
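To make "measuring what matters directly" more concrete, here is a minimal Python sketch of how the results of a batch of synthetic probes might be turned into end-to-end KPIs. It is an illustration only, not how any particular product works: the names `ProbeKpis` and `summarize_probes` are hypothetical, the input is assumed to be one round-trip time per probe sent (with `None` marking a lost probe), and jitter is simplified to the mean absolute difference between consecutive answered probes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProbeKpis:
    """End-to-end KPIs derived from one batch of synthetic probes (hypothetical type)."""
    sent: int
    received: int
    loss_pct: float
    avg_rtt_ms: Optional[float]
    max_rtt_ms: Optional[float]
    jitter_ms: Optional[float]


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> ProbeKpis:
    """Summarize probe results; rtts_ms holds one entry per probe, None if lost."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")

    answered = [rtt for rtt in rtts_ms if rtt is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(answered)) / sent

    if not answered:
        # Every probe was lost, so no latency figures can be reported.
        return ProbeKpis(sent, 0, loss_pct, None, None, None)

    # Simplified jitter (an assumption of this sketch): mean absolute
    # difference between consecutive answered probes.
    deltas = [abs(b - a) for a, b in zip(answered, answered[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0

    return ProbeKpis(sent, len(answered), loss_pct,
                     mean(answered), max(answered), jitter_ms)


if __name__ == "__main__":
    # Four probes sent, the third one lost.
    print(summarize_probes([10.0, 12.0, None, 11.0]))
```

Because the KPIs come from traffic that actually crossed the service path, nothing in this calculation depends on access to the devices along the way.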

Performance monitoring, troubleshooting and handling when working with business partners and third parties are made easy when Active Assurance is used to enable:

Agreed-upon partner network service-level objectives (SLOs)
Active network testing across partner networks during fulfillment
Continual SLO monitoring for real-time insight across partner networks

All of this removes the blame game with partners and shortens mean-time-to-resolution (MTTR) for troubleshooting network performance issues across end-to-end network services.
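As a rough illustration of how agreed SLOs and continual monitoring can take the guesswork out of "whose segment is it?", the Python sketch below compares per-segment measurements against per-segment SLOs and reports only the segments in breach. Everything named here is an assumption made for the example: the `Slo` type, the `find_violations` function, the segment names and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class Slo:
    """Agreed service-level objective for one network segment (hypothetical)."""
    max_latency_ms: float
    max_loss_pct: float


def find_violations(
    slos: Mapping[str, Slo],
    measurements: Mapping[str, Mapping[str, float]],
) -> Dict[str, List[str]]:
    """Return {segment: [reasons]} for every segment that breaches its SLO."""
    violations: Dict[str, List[str]] = {}
    for segment, slo in slos.items():
        measured = measurements.get(segment)
        if measured is None:
            # A segment without data is itself a blind spot worth flagging.
            violations[segment] = ["no measurement available"]
            continue

        reasons: List[str] = []
        if measured["latency_ms"] > slo.max_latency_ms:
            reasons.append(
                f"latency {measured['latency_ms']} ms exceeds SLO {slo.max_latency_ms} ms"
            )
        if measured["loss_pct"] > slo.max_loss_pct:
            reasons.append(
                f"loss {measured['loss_pct']}% exceeds SLO {slo.max_loss_pct}%"
            )
        if reasons:
            violations[segment] = reasons
    return violations


if __name__ == "__main__":
    # Hypothetical segments and thresholds, for illustration only.
    slos = {"own-core": Slo(20.0, 0.1), "partner-a": Slo(30.0, 0.5)}
    measurements = {
        "own-core": {"latency_ms": 12.0, "loss_pct": 0.0},
        "partner-a": {"latency_ms": 45.0, "loss_pct": 0.2},
    }
    print(find_violations(slos, measurements))
```

Run on a schedule against fresh measurements, a check like this gives both parties the same evidence about which segment is out of bounds.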

For example, one typical use case for extending Active Assurance testing to an external part of the network is assuring applications that run over multi-cloud and data center interconnect.

With an Active Assurance solution, virtual test agents can be deployed in both private data centers and public cloud platforms. These test agents measure how peering and transit with those public cloud platforms are performing. Deployment is very simple: spinning up a test agent as a virtual machine or a Docker container at each service endpoint in these data centers can be automated, so that testing and measurement of the communications between these points start immediately. These test agents can also be deployed at the customer location to measure across the end-to-end service.
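One way such automation could decide what to measure once the agents are up is to pair every agent with every other agent. The Python sketch below builds that full mesh of test pairs. It is only a sketch under stated assumptions: the function name `build_test_mesh` and the agent names are hypothetical, and a real deployment might test only selected pairs, or test each pair in both directions.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def build_test_mesh(agents: Iterable[str]) -> List[Tuple[str, str]]:
    """Return every unordered pair of distinct test agents, sorted by name."""
    unique_agents = sorted(set(agents))  # drop duplicates, keep output stable
    return list(combinations(unique_agents, 2))


if __name__ == "__main__":
    # Hypothetical agent locations, for illustration only.
    agents = [
        "private-dc-1",
        "public-cloud-a",
        "public-cloud-b",
        "customer-site",
    ]
    for source, destination in build_test_mesh(agents):
        print(f"measure {source} <-> {destination}")
```

A full mesh of n agents yields n(n-1)/2 pairs, so four agents produce six measured paths. That growth is worth keeping in mind when deciding how many endpoints to instrument.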

Multi-cloud and private cloud use cases typically have network services that run over third-party partner networks or networks that are not in the direct scope of the operations team’s management. Operations teams mostly have little to no insight into exactly how traffic transits from their network into the partner network. Simply put, Active Assurance shows what is happening in terms of service quality inside this hidden part of the network service, where visibility would otherwise not be possible.

With more and more applications moving into public clouds from private clouds and other internet services, keeping track of these hidden parts of the network is becoming extremely important to keeping customers happy, even though this part of the network has been outside of the scope of network operations.

To learn more about this topic and “Service assurance in the 5G and cloud era,” please check out our white paper and TM Forum webinar.