Multi-CDN real-time data unlocks new possibilities for streaming performance

Nabil Kanaan, Principal Product Manager, Video, Verizon Media,
and Terri Allegretto, Product Marketing Manager, Verizon Media

As streaming video services mature and proliferate, consumers increasingly expect flawless video delivery. Multi-CDN was initially created to improve the viewer experience by dynamically balancing workloads across different CDN providers. But multi-CDN can also expose information that unlocks new possibilities in how, when, and why streams are delivered through different providers.

This information includes variables such as CDN price commitments and preferences, content type (live vs. VOD), device type, geographical footprint, and ISP. When the multi-CDN platform can evaluate these variables and control CDN selection based on them, providers can deliver the best-quality video, improve decision accuracy, reduce false positives, and make decisions that optimize for both cost and quality.

Streaming services need to deliver quality experiences consistently, regardless of the issues that may occur, such as a regional or global CDN outage, a last-mile ISP issue, or a sudden traffic spike. Multi-CDN can not only mitigate viewer disruptions but also give streaming services more client-side control over the cost/quality trade-off, and it adds the intelligence needed to automate these decisions. Automating them through a self-healing, intelligent network extends this protection to all content types and events, not just the once-a-year, high-visibility events where operations personnel keep “eyes on glass” throughout.

Let’s use our integrated multi-CDN implementation, Smartplay Stream Routing, to explore how our customers use flexible, sophisticated rulesets to control CDN distribution, and how we can augment those rulesets with dynamic data decisioning to deliver the best-quality stream to the end user at the lowest cost.

Rulesets are configured to determine what content is routed to which CDNs at any time. Combinations of content, playback, client, and usage rules are available to handle various use cases.

  • Client rules identify the playback session by attributes of the viewer, including, but not limited to, their city, state, or country, the OS they are using, their ISP, or their user agent (for example, Roku).
  • Playback rules identify the playback session by its playback session ID.
  • Content rules identify the playback session by the content being played, such as a live event, channel ID, asset ID, event ID, or external ID.
  • Usage rules identify aggregated data thresholds for specific CDNs that align with contractual commitments and volume-based price discounts.

Figure 1. Combinations of content, playback, client, and usage rules can be configured in the Streaming Control Center to handle various use cases.
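To make the rule types concrete, here is a minimal Python sketch of how client, playback, content, and usage rules might be modeled as predicates and combined into a ruleset. The `Session` fields, the function names, and the "every rule must match" semantics are assumptions for illustration, not the Smartplay Stream Routing API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Session:
    # Hypothetical playback-session attributes; field names are illustrative.
    session_id: str
    city: str = ""
    country: str = ""
    os: str = ""
    isp: str = ""
    user_agent: str = ""
    content_id: Optional[str] = None
    is_live: bool = False

Rule = Callable[[Session], bool]

def client_rule(**expected: str) -> Rule:
    """Client rule: every named viewer attribute must equal the expected value."""
    return lambda s: all(getattr(s, k) == v for k, v in expected.items())

def playback_rule(session_ids: frozenset[str]) -> Rule:
    """Playback rule: match specific playback session IDs."""
    return lambda s: s.session_id in session_ids

def content_rule(content_ids: frozenset[str] = frozenset(),
                 live: Optional[bool] = None) -> Rule:
    """Content rule: match by content ID and/or by live vs. VOD."""
    return lambda s: ((not content_ids or s.content_id in content_ids)
                      and (live is None or s.is_live == live))

def ruleset(*rules: Rule) -> Rule:
    """A ruleset matches a session only if all of its rules match."""
    return lambda s: all(rule(s) for rule in rules)

def under_commitment(delivered_gb: dict[str, float],
                     commit_gb: dict[str, float], cdn: str) -> bool:
    """Usage rule: True while a CDN's aggregated volume is below its commitment."""
    return delivered_gb.get(cdn, 0.0) < commit_gb.get(cdn, 0.0)

# Example: Philadelphia viewers on Roku devices using the Comcast ISP.
philly_roku = ruleset(client_rule(city="Philadelphia", user_agent="Roku", isp="Comcast"))
s = Session("abc123", city="Philadelphia", user_agent="Roku", isp="Comcast")
print(philly_roku(s))  # True
```

Treating each rule as a plain predicate keeps the rule types independent, so a new viewer or content attribute can be matched without changing how rulesets are combined.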

Profiles determine how playback sessions are distributed to each CDN and consist of one or more rulesets. For example, a service provider could set up a profile stipulating that 90% of Philadelphia viewers using Roku devices on the Comcast ISP receive content from CDN A and the remaining 10% receive content from CDN B. Or a service provider could specify that 75% of UK-based Chrome users watching a live event receive traffic from CDN C and the rest receive traffic from CDN D. By identifying traffic through rulesets and then allocating playback sessions among one or more CDNs, profiles give service providers fine-grained control over CDN distribution.
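One way to implement a profile’s percentage split is to hash each playback session ID into a bucket and map buckets to CDNs in proportion to their weights. The sketch below assumes integer weights (such as 90/10) and SHA-256 hashing; `assign_cdn` is a hypothetical helper, and the platform’s actual allocation method may differ.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from itertools import accumulate

def assign_cdn(session_id: str, weights: dict[str, int]) -> str:
    """Pick a CDN for a session in proportion to integer profile weights.

    Hashing the session ID (instead of drawing a random number) means the
    same session always maps to the same CDN for a given profile.
    """
    cdns = sorted(cdn for cdn, w in weights.items() if w > 0)  # stable order
    if not cdns:
        raise ValueError("profile needs at least one CDN with a positive weight")
    cumulative = list(accumulate(weights[cdn] for cdn in cdns))  # e.g. [90, 100]
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % cumulative[-1]  # 0 .. total-1
    return cdns[bisect_right(cumulative, bucket)]

# Hypothetical profile: 90% of matching sessions to CDN A, 10% to CDN B.
profile = {"cdn_a": 90, "cdn_b": 10}
shares = Counter(assign_cdn(f"session-{i}", profile) for i in range(10_000))
print(shares)
```

Because the choice is derived from the session ID, a session that asks again gets the same CDN, while the overall split converges on the configured percentages as the number of sessions grows.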

As confident as you may be in your rulesets, issues such as a regional CDN outage will arise, and you’ll need to override your preconfigured business rules. Data intelligence lets you automate CDN distribution decisions to maintain performance and service reliability. The Verizon Media Platform aggregates real-time data from different sources to dynamically shift traffic based on capacity, utilization, and QoS metrics, so your content is always available and delivered with the best quality. The decisioning engine also has APIs that complement the data already in the platform with customer-specific data, enabling even better-informed decisions, automatically and without operator intervention.

Defining a CDN’s capacity caps its bandwidth utilization so that it does not exceed a specified level. Real-time utilization data aggregated from third-party CDN providers gives an immediate, comprehensive view of CDN capacity and informs the decisioning engine as it selects the appropriate CDN.

Figure 2. API-level server-to-server integration with (near) real-time data obtained from CDN partner integrations dynamically allocates traffic based on capacity and utilization.
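A capacity limit can be thought of as a filter applied before a profile’s split is used. The sketch below assumes hypothetical per-CDN figures (a configured cap and the latest utilization reported by the CDN partner, both in Gbps) plus an estimated per-session bitrate; CDNs with no capacity data are treated as uncapped. None of these names come from the platform.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CdnCapacity:
    # Hypothetical fields; real partner feeds will use their own formats.
    cap_gbps: float       # configured ceiling for this CDN
    current_gbps: float   # latest (near) real-time utilization reported

def has_headroom(c: CdnCapacity, session_mbps: float) -> bool:
    """True if one more session at this bitrate stays within the cap."""
    return c.current_gbps + session_mbps / 1000 <= c.cap_gbps

def capped_weights(weights: dict[str, int],
                   capacity: dict[str, CdnCapacity],
                   session_mbps: float) -> dict[str, int]:
    """Keep profile weights only for CDNs that still have headroom.

    CDNs without capacity data are treated as uncapped.
    """
    return {cdn: w for cdn, w in weights.items()
            if cdn not in capacity or has_headroom(capacity[cdn], session_mbps)}

capacity = {
    "cdn_a": CdnCapacity(cap_gbps=100.0, current_gbps=99.99),  # nearly full
    "cdn_b": CdnCapacity(cap_gbps=50.0, current_gbps=20.0),
}
print(capped_weights({"cdn_a": 90, "cdn_b": 10}, capacity, session_mbps=20))  # {'cdn_b': 10}
```

If every CDN is at its cap, this sketch returns an empty profile; a real system would need an explicit policy for that case, such as admitting the session to the least-utilized CDN.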

With quality of service (QoS) rules, video streaming traffic is dynamically rerouted to keep video quality consistent throughout each viewer’s session. For example, a QoS rule could dictate that traffic shift to another CDN when a CDN’s round-trip time (RTT) rises above 300 ms and/or its availability falls below 90%. This capability is CDN-agnostic: routing decisions are based on performance metrics derived from client-side data.

The figure below shows an example of a CDN (in yellow) whose RTT rose above 300 ms. This triggers the QoS rule, which shifts delivery to the other two CDNs until the issue is resolved.

Figure 3. Dynamic QoS display: the top graphs measure CDN performance by average RTT, and the bottom graphs show the real-time rerouting of CDN distribution.
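The QoS rule described above can be sketched as a health check that removes breaching CDNs from a profile and lets the remaining CDNs absorb their share in proportion to their existing weights. The thresholds match the example in the text (300 ms RTT, 90% availability); the data structures, the reading of “and/or” as “either condition is enough,” and the fallback when every CDN breaches are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QosSample:
    avg_rtt_ms: float     # average round-trip time measured on the client side
    availability: float   # share of successful requests, 0.0 to 1.0

@dataclass(frozen=True)
class QosRule:
    max_rtt_ms: float = 300.0
    min_availability: float = 0.90

    def breached(self, s: QosSample) -> bool:
        # Treats "and/or" as "either condition is enough".
        return s.avg_rtt_ms > self.max_rtt_ms or s.availability < self.min_availability

def apply_qos(weights: dict[str, int],
              metrics: dict[str, QosSample],
              rule: QosRule) -> dict[str, int]:
    """Drop CDNs that breach the rule; survivors keep their relative weights.

    CDNs without metrics are left in place. If every CDN breaches the rule,
    the original weights are returned rather than routing to no CDN at all.
    """
    healthy = {cdn: w for cdn, w in weights.items()
               if cdn not in metrics or not rule.breached(metrics[cdn])}
    return healthy or dict(weights)

# Hypothetical three-CDN profile in which cdn_2 exceeds 300 ms RTT.
weights = {"cdn_1": 34, "cdn_2": 33, "cdn_3": 33}
metrics = {
    "cdn_1": QosSample(avg_rtt_ms=120.0, availability=0.99),
    "cdn_2": QosSample(avg_rtt_ms=340.0, availability=0.98),
    "cdn_3": QosSample(avg_rtt_ms=150.0, availability=0.97),
}
print(apply_qos(weights, metrics, QosRule()))  # {'cdn_1': 34, 'cdn_3': 33}
```

Re-running the check as fresh metrics arrive is what returns traffic to the degraded CDN once its RTT falls back under the threshold.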

CDNs can experience partial or even system-wide outages despite “self-healing” tactics such as shifting traffic across PoPs to recover. Outages can stem from many factors, including cyberattacks targeting the CDN itself, and as a result the risk of a CDN outage is a growing concern. Because it’s unlikely that multiple CDNs would experience outages at the same time, a dynamic multi-CDN strategy can keep video sessions going without affecting quality of experience (QoE).
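The claim that simultaneous outages are unlikely can be made concrete with a little arithmetic, under the simplifying assumption that CDN outages are independent. The outage probabilities below are hypothetical; correlated failures (for example, a shared last-mile ISP problem or an attack aimed at several providers) would make the real joint probability higher.

```python
from math import prod

def all_down_probability(outage_probs: list[float]) -> float:
    """Probability that every CDN is down at once, assuming independent outages."""
    return prod(outage_probs)

def multi_cdn_availability(outage_probs: list[float]) -> float:
    """Probability that at least one CDN is up, under the same assumption."""
    return 1.0 - all_down_probability(outage_probs)

# Hypothetical: each CDN is unavailable 0.1% of the time.
single = multi_cdn_availability([0.001])
dual = multi_cdn_availability([0.001, 0.001])
print(f"one CDN: {single:.6f}, two CDNs: {dual:.6f}")
```

Under independence, adding a second CDN with the same outage rate cuts the chance of having no working CDN from 1 in 1,000 to roughly 1 in 1,000,000.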

The diagram below shows a fairly typical failure and recovery at a global CDN. Across the U.S., availability fell rapidly from 100% at around 4:40 p.m. to almost 0% by 4:55 p.m. Recovery from such a failure typically takes time and happens in stages; in this case, the CDN was not fully recovered and healthy until 5:30 p.m.

Figure 4. Example of regional CDN outage.

Our Stream Routing data aggregator captured the details of this failure from average RTT and CDN availability data (see diagram below) at state-level granularity. Because the data comes from such a broad cross-section of sessions, the RTT and availability measurements are reliable at scale. This enabled the system to react automatically and shift traffic mid-session to a better-performing CDN, demonstrating how real-time data can augment business rulesets to deliver higher performance and availability. And because the switch happens mid-session, without the viewer having to refresh the playback session, your subscribers get the best possible viewing experience.

Figure 5. Report showing average RTT across multiple CDNs, highlighting problem with CDN 5.
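The state-level view described above can be approximated by rolling client-side beacons up into (CDN, state) cells and flagging cells that breach the RTT or availability thresholds. The `Beacon` format, the thresholds, and the minimum-sample guard (a simple way to avoid false positives from thin data) are assumptions for illustration, not how the Stream Routing data aggregator is built.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Beacon:
    # Hypothetical client-side measurement reported by the player.
    cdn: str
    state: str
    rtt_ms: float   # ignored when the request failed
    ok: bool        # did the request succeed?

@dataclass(frozen=True)
class CellStats:
    avg_rtt_ms: float    # over successful requests only
    availability: float  # successes / all requests
    samples: int

def aggregate(beacons: Iterable[Beacon]) -> dict[tuple[str, str], CellStats]:
    """Roll client beacons up to (CDN, state) cells."""
    rtt_sum: dict[tuple[str, str], float] = defaultdict(float)
    successes: dict[tuple[str, str], int] = defaultdict(int)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for b in beacons:
        key = (b.cdn, b.state)
        counts[key] += 1
        if b.ok:
            successes[key] += 1
            rtt_sum[key] += b.rtt_ms
    return {
        key: CellStats(
            avg_rtt_ms=rtt_sum[key] / successes[key] if successes[key] else float("inf"),
            availability=successes[key] / n,
            samples=n,
        )
        for key, n in counts.items()
    }

def degraded_cells(stats: dict[tuple[str, str], CellStats],
                   max_rtt_ms: float = 300.0,
                   min_availability: float = 0.90,
                   min_samples: int = 20) -> list[tuple[str, str]]:
    """Cells that breach a threshold and have enough samples to trust."""
    return sorted(
        key for key, s in stats.items()
        if s.samples >= min_samples
        and (s.avg_rtt_ms > max_rtt_ms or s.availability < min_availability)
    )

beacons = (
    [Beacon("cdn_1", "CA", 110.0, True)] * 30
    + [Beacon("cdn_5", "CA", 420.0, True)] * 30
    + [Beacon("cdn_2", "NY", 0.0, False)] * 5   # too few samples to act on
)
print(degraded_cells(aggregate(beacons)))  # [('cdn_5', 'CA')]
```

The minimum-sample guard trades a little reaction speed for fewer false alarms; the more sessions feed each cell, the lower that threshold can safely be set.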

Adoption of multi-CDN remains low, driven in part by the inherent complexity of implementing a solution that optimizes performance on a global scale. That complexity stems from the operational “eyes on glass” approach typically applied to high-profile events, and from the need to source the components of a multi-CDN architecture (including the data for dynamic decisioning) from different vendors and then integrate and maintain them over time. With Smartplay Stream Routing, service providers can distribute traffic across several CDNs to increase capacity, optimize economics, or improve performance in certain parts of the world. Distributing traffic by fixed rules in this way is a static approach; it may be sufficient for some companies, and it is comparatively simple to set up and manage.

For companies that need more, our video traffic system also supports a dynamic strategy, tapping into performance data from across our global network to address problems before they affect QoE. When network outages or other problems are identified, the system automatically reroutes around them. The system is completely CDN-agnostic and makes decisions based solely on performance metrics.

Formerly Verizon Media Platform, Edgecast enables companies to deliver high-performance, secure digital experiences at scale worldwide.