Low-latency live streaming with a faster CDN
Harkeerat Bedi, Research Scientist, Verizon Media, and Scott Yeager, Software Engineer, Verizon Media
Live sports are exciting to watch, especially during pivotal moments, like when a shot comes out of nowhere to win the game. These moments can also be exciting for the technical team responsible for delivering fluid, real-time action. Live sports streams, which must balance a number of technical considerations and trade-offs, average around 30 seconds behind the live game on the field. Why the delay?
While content delivery networks are essential, they cannot reduce the latency caused by other parts of the video workflow. For example, latency is added from the moment of ingest when an image is converted into a signal. The raw signal then must be converted into a compressed format and transmitted to the video processing center, usually off-site and often in the cloud, which can be impacted by available bandwidth and infrastructure. Next comes the work of transcoding and packaging the content for various devices and bandwidths. Finally, as the stream is playing, advertising may be dynamically inserted into the stream just before it moves through the last mile of the Internet to the viewer’s device. It’s here that the player buffers, decodes, decompresses, and renders the final video segment. That’s a lot of steps in between the team’s game-winning goal and the content delivery network. And they can add up, especially when they have to happen for millions of viewers all at once. Latency in Super Bowl live streams, for example, averages 28 to 47 seconds.
Reducing latency has become a focus for streaming service providers. For sports tied to split-second gaming bets, such as horse racing, delayed streams put remote participants at a disadvantage to those at the venue. Live tweets from viewers and newscasters at the venue can spoil exciting moments for fans watching on TV and live streaming. And with more and more viewers using a second screen while viewing live sports, it’s no wonder reducing the time behind live is becoming an important requirement for staying competitive and delivering a great viewing experience.
Reducing latency is an area of focus for us at Verizon Media. This effort involves researching and implementing incremental improvements across each of the processing steps and the other factors involved with delivering live streams. In this post, we look at one specific aspect of latency reduction: how our content delivery network handles the increased request volume that results from an increasingly popular low-latency strategy, reducing segment size.
In the quest to reduce time behind live, streaming service providers are starting to embrace the use of shorter HLS or DASH segment durations. This can significantly reduce latency, but it has trade-offs such as additional overhead and increased risk of rebuffering. Whether these trade-offs are worthwhile depends entirely on the priority placed on latency compared to other QoE considerations. In some situations, as noted above, low latency is a top priority, while in others, current latency levels may be an acceptable price for delivering personalized advertising or 4K programming, or for allowing live content to be edited.
The role of segment size in latency
The streaming video industry has long used adaptive bitrate (ABR) technologies that break a stream into many individual video segments or chunks. Each segment is the same duration or size but encoded at different video quality levels or bit rates so the stream can adapt to the viewer’s available bandwidth as new segments are requested. Both of the main ABR protocols, Apple’s HLS and MPEG-DASH, provide controls for adjusting segment size.
Segment size plays a major role in latency because the player has to download a preset number of segments before it can start playing the live stream. This is done so the client video player can buffer enough video to ensure smooth video playback without rebuffering when there is congestion in the network. However, this also puts the stream behind live from the outset. Generally, embedded video players on iOS devices and web browsers buffer three video segments before starting playback. If a segment is four seconds long and the player has to buffer three segments before it can start playing, then the client is already 12 seconds behind live. The DASH protocol provides some flexibility by allowing manifest files to specify how much of a file needs to be buffered, but many DASH players and devices have yet to implement this functionality.
Reducing time behind live
Since buffering three segments is the de facto standard, the most popular technique for reducing the time behind live is to shrink the size or duration of each segment. For example, reducing the segment size from four seconds to two seconds shrinks the time behind live to just six seconds, half of what it would be with four-second segments.
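The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the buffer's contribution to time behind live; the function name is hypothetical, and the three-segment default reflects the de facto standard described in the text rather than any particular player's API.

```python
def time_behind_live(segment_seconds, buffered_segments=3):
    """Seconds behind live contributed by the player's startup buffer alone.

    Assumes the player buffers a fixed number of whole segments before
    playback starts; encoding, packaging, and network delays are ignored.
    """
    return segment_seconds * buffered_segments


for duration in (4, 2):
    delay = time_behind_live(duration)
    print(f"{duration}-second segments: {delay} seconds behind live")
```

Halving the segment duration halves this part of the delay, which is why segment size is such a popular lever.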
Smaller segments can cause rebuffers
When using smaller segment sizes, the video workflow has to be more responsive to deliver a buffer-free live-video streaming experience. This is due to two factors:
First, by reducing the segment size, the player, which stores a fixed number of segments, is now storing less video. Because there’s less video buffered in the player, network congestion is more likely to cause a rebuffer, and the player is now sensitive to even smaller congestion events. And since shorter segments mean more files, your video workflow and, most importantly, the content delivery network must process and deliver more file requests from players over a given stream duration: twice as many when the segment duration is halved.
Second, as we explained in a recent tech article, Optimizing the CDN for Live Streaming, it’s common in live sports to see surges in viewers when popular events start, or when a close game nears the final minutes. As the number of file requests goes up, the CDN needs to accommodate more file requests in the same amount of time. This task is compounded by a myriad of device types and connection speeds, as specified by adaptive bitrate parameters.
To illustrate the increase in file volume, Figure 2 shows 16 seconds of video delivered in segments of different lengths. With four-second segments, only four files are needed to deliver the 16 seconds of video. But when we move to two-second segments, we need eight separate files, twice as many files that need to be processed through the CDN.
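To make the effect on the CDN concrete, here is a rough sketch of how file counts and request rates scale with segment duration. The function names and the viewer count are hypothetical, and the request-rate estimate assumes each viewer fetches exactly one segment per segment duration from a single rendition, ignoring manifest requests and retries.

```python
import math


def files_needed(content_seconds, segment_seconds):
    """Number of segment files required to cover a span of video."""
    return math.ceil(content_seconds / segment_seconds)


def segment_requests_per_second(viewers, segment_seconds):
    """Steady-state segment request rate for an audience of a given size."""
    return viewers / segment_seconds


for duration in (4, 2):
    files = files_needed(16, duration)
    rate = segment_requests_per_second(100_000, duration)  # hypothetical audience
    print(f"{duration}s segments: {files} files per 16s, {rate:.0f} requests/s")
```

Under these assumptions, moving from four-second to two-second segments doubles both the number of files and the request rate for the same audience.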
Improve segment delivery performance with Hot Filing
We’ve created a feature called Hot Filing to deal with the so-called “flash crowd” phenomenon, when many live viewers join a stream at the same time. Hot Filing quickly replicates a popular segment, or “hot file,” to additional servers within a PoP (point of presence), so it can be delivered to viewers as fast as possible as demand rapidly increases.
By spreading the load to many servers, Hot Filing keeps any one server from getting overwhelmed as file requests suddenly spike. When a server gets overloaded, similar to a denial of service attack, the server will be slow to respond to file requests, potentially leading to rebuffering in the client player. By quickly identifying and replicating hot files, the risk of overloading a single server is much lower. Sudden changes in demand can now be met without adding latency.
Figure 3 shows how Hot Filing (Fig. 3.b) improves performance by preventing server overload. Without Hot Filing (Fig. 3.a), all traffic for a segment goes to Server 1 (S1). As audience demand spikes, the additional traffic continues to flow to S1, pushing it above its 100-user capacity. The situation continues to worsen as S1 serves 200 viewers at the peak. In contrast, Hot Filing (Fig. 3.b) handles this additional load by replicating files to two servers (S1 plus S2) and re-routing file requests to the newly available server.
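The scenario in Figure 3 can be approximated with a small sketch. It assumes requests are spread evenly across the servers holding a copy of the file, which is a simplification; the function names are hypothetical, and the 100-user capacity and 200-viewer peak come from the example above.

```python
import math


def servers_needed(viewers, capacity_per_server):
    """Smallest number of replicas that keeps every server within capacity."""
    return max(1, math.ceil(viewers / capacity_per_server))


def spread_load(viewers, server_count):
    """Per-server viewer counts, assuming an even split across replicas."""
    base, extra = divmod(viewers, server_count)
    return [base + (1 if i < extra else 0) for i in range(server_count)]


peak_viewers, capacity = 200, 100

without_hot_filing = spread_load(peak_viewers, 1)
with_hot_filing = spread_load(peak_viewers, servers_needed(peak_viewers, capacity))

print("Without Hot Filing:", without_hot_filing)
print("With Hot Filing:   ", with_hot_filing)
```

With a single copy, S1 carries all 200 viewers; with the file replicated to a second server, each server stays at its 100-user capacity.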
Faster hot file identification
We recently enhanced Hot Filing by decreasing the time to move hot files to multiple servers to one second. We improved reaction time by changing the way hot files are identified within a PoP. We use a central process to aggregate file requests and byte counts for analysis. Previously, the data was pulled from the web server process on each server. While this worked fine, we discovered that a slow web server could slow down the aggregation of hot file data. To address this problem, the servers now write their request and byte counts out to disk every second. As a result, when the central process pulls data, it doesn’t have to wait on the web server processes since the data is already written to a solid-state disk. This change alone is sufficient to accommodate the load for most live events.
The critical importance of a fast reaction time for live events is shown in Figure 3.c, which offers insight into how the Hot Filing process works to recruit additional resources. In the example shown in Figure 3.c, as the S1 server reaches its 100-user capacity, files are quickly replicated to S2, which then absorbs the additional requests. This lets the system accommodate all 200 users promptly and efficiently use the full capacity of available servers.
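The pattern described above, where servers write counters to disk and a central process reads them without touching the web server processes, might look roughly like the following. This is a simplified sketch, not our production code: the JSON file layout, function names, file paths, and the hot-file threshold are all assumptions made for illustration, and only request counts are tracked (byte counts would be aggregated the same way).

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

HOT_REQUESTS_PER_SECOND = 100  # hypothetical threshold for calling a file "hot"


def write_snapshot(stats_dir, server_name, request_counts):
    """What each server would do once per second: dump its counters to disk."""
    (stats_dir / f"{server_name}.json").write_text(json.dumps(request_counts))


def find_hot_files(stats_dir, threshold=HOT_REQUESTS_PER_SECOND):
    """Central process: sum the on-disk counters and flag busy files.

    Reads only files on disk, so a slow web server process cannot stall it.
    """
    totals = Counter()
    for snapshot in stats_dir.glob("*.json"):
        totals.update(json.loads(snapshot.read_text()))
    return sorted(path for path, count in totals.items() if count >= threshold)


with tempfile.TemporaryDirectory() as tmp:
    stats_dir = Path(tmp)
    write_snapshot(stats_dir, "s1", {"/live/seg_101.ts": 80, "/vod/intro.ts": 3})
    write_snapshot(stats_dir, "s2", {"/live/seg_101.ts": 45})
    hot_files = find_hot_files(stats_dir)
    print(hot_files)
```

The design point is the decoupling: the aggregator depends only on the most recent snapshot on disk, so its reaction time is bounded by the write interval rather than by the slowest web server.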
Hot Filing on multiple ports
For extremely popular events, such as professional football playoff games or major international soccer matches, spikes and surges in traffic can be very significant. Meeting this level of demand also requires changing how file segments are replicated to increase server capacity. Previously, the content delivery network was limited to replicating segments to one port per server. But now we can replicate files to multiple ports in each server. This increases the throughput of each server substantially so each PoP can handle more requests and thus much larger live events than before.
In our system, load balancing is handled by Cache Array Routing Protocol (CARP) hashing. For regular content, our approach has been to replicate the files across multiple servers, and we use CARP hashing to select a single port from each server. We do this to keep duplicate requests from being sent to the origin server and to limit inter-process communication.
Now, when a file gets hot enough, CARP starts selecting multiple ports per server to support even more requests. This approach works well for hot files since they are a small fraction of the unique files served by a server. The number of ports opened depends on the level of demand for a given hot file. This approach not only increases the data volume served per server but also increases the amount of CPU power available to process these requests.
As streaming service providers reduce the size of video segments to deliver a lower latency live stream, the Verizon Media Platform’s Hot Filing capabilities help the content delivery network manage the increased volume of requests for video files, especially as the audience size grows for popular sporting events.
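The idea of selecting one port for regular content and several for hot files can be sketched with a simplified highest-score hash. Note the assumptions: this uses SHA-256 for scoring rather than the actual CARP hash combination function, and the port names, capacity figures, and the rule for how many ports to open are hypothetical.

```python
import hashlib


def score(url, port):
    """Deterministic score for a (URL, port) pair; a stand-in for CARP's hash."""
    digest = hashlib.sha256(f"{port}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_ports(url, ports, count=1):
    """Return the `count` highest-scoring ports for this URL."""
    ranked = sorted(ports, key=lambda port: score(url, port), reverse=True)
    return ranked[:count]


def ports_for_demand(requests_per_second, per_port_capacity, available_ports):
    """Hypothetical rule: open just enough ports to cover current demand."""
    needed = -(-requests_per_second // per_port_capacity)  # ceiling division
    return max(1, min(available_ports, needed))


ports = [f"s1:{number}" for number in (8001, 8002, 8003, 8004)]
url = "/live/seg_101.ts"

regular_ports = pick_ports(url, ports)
hot_ports = pick_ports(url, ports, ports_for_demand(2500, 1000, len(ports)))

print("Regular file:", regular_ports)
print("Hot file:    ", hot_ports)
```

Because the ranking is deterministic, a file's original port stays first in the list as it becomes hot, so additional ports are added without moving existing traffic, and duplicate requests to the origin stay limited to the ports that are actually opened.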