Cost optimization, convergence and productivity improvement are critical factors driving enterprises and network operators to adopt unified communications (UC), VoIP, telepresence/videoconferencing, cloud-based services, collaboration and virtualization. The underlying IP/MPLS transport is a key ingredient of this migration, since it lets enterprises and network operators launch services quickly and cost-effectively over managed or contracted cloud architectures. As enterprises and network operators place more critical services on IP networks, the ability to assure the delivery of these services over IP is becoming crucial.
Along with the growth in real-time IP service adoption, we have witnessed the evolution of solutions geared toward assuring these services. However, Heavy Reading's analysis shows significant gaps still exist between the needed capabilities and those currently available in the market. As human beings, not just computers, are typically the ultimate end-users of these real-time services, the emphasis is changing from a focus on "how the network is performing" to "what is the user experience" for these applications.
Business-critical real-time services are defined as those communication and collaboration services that are extremely sensitive to "conversational quality" – where the timing of the transmitted and received information is an essential component of the perceived service quality. Latency, packet drops and jitter typically are used to measure this conversational quality in IP networks – but many factors, ranging from the simple to the complex, contribute to service-quality degradation in IP networks. Examples of simple factors are network congestion, hardware faults and incorrect equipment configuration. Complex factors are usually routing related, typically of short duration, and difficult to isolate: route flaps, BGP peer resets, no routes/black holes, routing loops, etc.
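To make the three measurements concrete, here is a minimal Python sketch that derives packet loss, mean one-way latency and smoothed jitter from a list of per-packet timestamps. The `Packet` record and the `quality_metrics` function are hypothetical names introduced for illustration. The jitter estimate follows the running-average formula for interarrival jitter in RFC 3550, and the latency figure assumes sender and receiver clocks are synchronized, which real deployments cannot take for granted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    """Hypothetical per-packet record for one media stream."""
    seq: int                   # sequence number assigned by the sender
    sent_ms: float             # send timestamp in milliseconds
    recv_ms: Optional[float]   # receive timestamp in milliseconds, None if lost


def quality_metrics(packets: List[Packet]) -> Dict[str, float]:
    """Return loss percentage, mean one-way latency and smoothed jitter."""
    received = [p for p in packets if p.recv_ms is not None]
    loss_pct = 100.0 * (len(packets) - len(received)) / len(packets) if packets else 0.0

    # One-way transit time per received packet (assumes synchronized clocks).
    transit = [p.recv_ms - p.sent_ms for p in received]
    mean_latency = sum(transit) / len(transit) if transit else 0.0

    # RFC 3550 style jitter: a running average of the absolute change in
    # transit time between consecutive received packets, with gain 1/16.
    jitter = 0.0
    for prev, cur in zip(transit, transit[1:]):
        jitter += (abs(cur - prev) - jitter) / 16.0

    return {"loss_pct": loss_pct, "mean_latency_ms": mean_latency, "jitter_ms": jitter}


sample = [
    Packet(seq=1, sent_ms=0.0, recv_ms=20.0),
    Packet(seq=2, sent_ms=20.0, recv_ms=40.0),
    Packet(seq=3, sent_ms=40.0, recv_ms=None),   # dropped in the network
    Packet(seq=4, sent_ms=60.0, recv_ms=96.0),   # delayed by 16 ms more than usual
]
metrics = quality_metrics(sample)
```

Numbers like these describe what the network did to one stream. On their own they say nothing about why it happened, which is the gap the rest of this column is concerned with.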
The underlying topology of IP networks is constantly changing. Heavy Reading's research shows that the paths the routing protocols form through the network are always adapting to real-time events. The resulting instability can be long-lived, persistent, short-lived or periodic. Regardless, all these events can have a significant impact on the paths taken by packets through the network and on the ensuing end-user QoE. Network instabilities can be caused by software/hardware defects, configuration errors, and transient physical/data-link issues. These triggers result in routing convergence events in which the IP routing protocols must recalculate shortest paths. Convergence can be slow, and packet delivery is neither guaranteed nor deterministic while it is in progress. Additionally, IP networks are susceptible to oscillating routing changes, called route flaps, which cause re-convergence to occur repeatedly. The result is poor end-to-end network performance: more packet drops, out-of-order packet delivery, and less efficient forwarding through the network.
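As a rough illustration of what spotting a route flap involves, the following Python sketch flags any prefix whose reachability changes state repeatedly within a short window. The event format (timestamp, prefix, "announce" or "withdraw"), the function name `find_flapping_prefixes`, and the default window and threshold are all assumptions made for this example. They are not taken from any routing protocol specification or monitoring product.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Set, Tuple

# Hypothetical event format: (timestamp in seconds, prefix, "announce" | "withdraw")
RouteEvent = Tuple[float, str, str]


def find_flapping_prefixes(
    events: Iterable[RouteEvent],
    window_s: float = 60.0,
    min_changes: int = 4,
) -> Set[str]:
    """Return prefixes with at least min_changes state changes inside window_s."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)  # prefix -> change times
    last_state: Dict[str, str] = {}
    flapping: Set[str] = set()

    for ts, prefix, state in sorted(events):
        if last_state.get(prefix) == state:
            continue  # repeated update with the same state is not a change
        # The first event seen for a prefix also counts as a change.
        last_state[prefix] = state

        changes = recent[prefix]
        changes.append(ts)
        # Forget changes that fell out of the sliding window.
        while changes and ts - changes[0] > window_s:
            changes.popleft()

        if len(changes) >= min_changes:
            flapping.add(prefix)

    return flapping


updates = [
    (0.0, "10.0.0.0/24", "announce"),
    (5.0, "10.0.0.0/24", "withdraw"),
    (10.0, "10.0.0.0/24", "announce"),
    (15.0, "10.0.0.0/24", "withdraw"),
    (0.0, "10.1.0.0/24", "announce"),
    (200.0, "10.1.0.0/24", "withdraw"),
]
unstable = find_flapping_prefixes(updates)
```

The first prefix changes state four times in 15 seconds and is flagged. The second changes once after more than three minutes and is not. Detecting this requires seeing the routing updates themselves rather than only their effect on traffic.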
Legacy Internet services, such as email, HTTP (Web), and file transfers, are more tolerant than real-time services of delay and retransmissions, so they are not as significantly affected by these instabilities. Real-time services, however, require guaranteed and deterministic packet delivery. They do not have the luxury of retransmissions and are extremely sensitive to the degraded forwarding that routing instabilities cause.
Current mechanisms for IP traffic monitoring and data collection are not sufficient for analyzing these issues in real-time services. Monitoring based on log analysis, SNMP polling, periodic KPI gathering, probes, etc., cannot see the network events themselves, only their end result. So, when the service management/operations team tries to determine the root cause, it is almost impossible to accurately establish causality between the network and the service. As a result, skilled IP and services staff spend hours, if not days, trying to manually correlate data from multiple network domains.
Real-time services represent a significant change in the nature of traffic carried by IP networks. Because real-time services are extremely susceptible to packet loss, delay, jitter, and transient routing changes, assuring them requires extra rigor in monitoring both the communication and the underlying network. It also demands the ability to correlate across multiple network layers and organizational silos to provide insight into service performance and end-user experience.
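One way to picture correlation between the service layer and the network layer is the Python sketch below, which attaches to each degraded session the routing events that overlapped it in time and covered its destination address. The `Session` and `RoutingEvent` records, the `correlate` function and the five-second guard interval are hypothetical and chosen only for illustration. A real system would also have to account for the forwarding path, the media layer and clock alignment between data sources.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    """Hypothetical service-layer record for one call or video session."""
    session_id: str
    dst_ip: str
    start_s: float
    end_s: float
    degraded: bool   # True if the user-facing quality fell below target


@dataclass
class RoutingEvent:
    """Hypothetical network-layer record for one routing change."""
    ts_s: float
    prefix: str
    kind: str        # e.g. "withdraw" or "announce"


def correlate(
    sessions: List[Session],
    events: List[RoutingEvent],
    guard_s: float = 5.0,
) -> Dict[str, List[RoutingEvent]]:
    """Map each degraded session to routing events that may explain it."""
    suspects: Dict[str, List[RoutingEvent]] = {}
    for s in sessions:
        if not s.degraded:
            continue
        addr = ipaddress.ip_address(s.dst_ip)
        suspects[s.session_id] = [
            e for e in events
            # Event happened during the session, allowing a small guard interval...
            if s.start_s - guard_s <= e.ts_s <= s.end_s + guard_s
            # ...and its prefix covers the session's destination address.
            and addr in ipaddress.ip_network(e.prefix)
        ]
    return suspects


sessions = [
    Session("call-1", "10.0.0.7", 100.0, 160.0, degraded=True),
    Session("call-2", "10.9.0.1", 100.0, 160.0, degraded=False),
]
events = [
    RoutingEvent(120.0, "10.0.0.0/24", "withdraw"),
    RoutingEvent(500.0, "10.0.0.0/24", "announce"),
    RoutingEvent(130.0, "192.168.0.0/16", "withdraw"),
]
suspects = correlate(sessions, events)
```

Here only the withdrawal at 120 seconds is linked to `call-1`: it falls inside the session and its prefix contains the destination. Doing this join by hand across separate tools is the manual effort described above.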
Managing IP service problems created by events in the network requires service assurance solutions to evolve into what we define as Real-Time IP Service Assurance (RISA). RISA is a class of technology that addresses key pain points and concerns that enterprises and network operators currently face in deploying real-time services. RISA solutions must understand and manage real-time IP services based on the user's experience, not the network manager's view. That means providing end-to-end session visibility and real-time IP network visibility, and assuring services based on real-time correlation and analysis of both the network and service layers in a single platform.
In Heavy Reading's opinion, RISA should be treated as a specialized segment of the service assurance category, and it is equally applicable in enterprise and network operator markets. RISA solutions specialize in monitoring and assuring high-value IP services, such as VoIP, desktop video, HD telepresence, and HD video conferencing, which are easily impacted by events in the IP network because of their low tolerance for delay, jitter, and packet loss.
Existing solutions are available for some aspects of RISA. The ideal RISA solution, however, takes a cohesive, integrated approach to monitoring and automatically correlating currently disparate data from the network, media, and session layers. This automatic correlation should be done on a per-session basis, so that per-session user experience analysis is available to enterprises and network operators in real time and issues are visible before the end customer complains. Comprehensive RISA solutions will be important for enabling the continued growth of real-time voice and video services for both enterprises and network operators. As demand for business-critical real-time IP services expands, we predict a bright future for this new market sub-segment.
Ari Banerjee is a senior analyst at Heavy Reading.