Video Over IP for IT Engineers

Welcome to the world of real-time video and audio over IP, a dynamic field of media for the masses that fuels entertainment, sports, news, sales, and big business. This article focuses on the networking essentials needed to understand, manage, and master live video delivery. We will list and explain the network faults that affect video as it traverses the IP network, and how to overcome these typical challenges.


Why is Live Video so sensitive?

Live video is sensitive because it depends on packets arriving in the exact order they were sent. Like a convoy of trucks carrying special goods, the receiver must keep track of the order, speed, and payload of the trucks in order to reconstruct the original object. If one fragment fails to arrive, a lot of data is lost. Video data also travels through packet-switched networks, which don’t guarantee delivery the way older telco networks do. Instead, they try to deliver as many packets as possible and rely on transport protocols to deal with problems. Because packet-switched networks share resources, there is no guarantee that data will arrive on time. This makes video data more sensitive to delays and interruptions.


What’s the difference between live video and OTT video?

OTT services like Netflix, Apple TV, Hulu, Amazon, and YouTube are not live video; the content they deliver is already prepared for viewing. The video is broken up into chunks of frames, and each chunk is encoded in several versions that carry the same visual content at different qualities and resolutions. These chunks are then sent to caching servers that are close to the users. When a user asks for a video, the player reads a manifest that lists the available chunks and downloads them from the caching server as quickly as possible. If a download fails, the player may try a smaller version of the chunk to give it a better chance of succeeding. The downloaded chunks are put back together locally before they are played. Live video is sent at a set speed, so this download option is not available for it. The OTT receiver also has to assemble and decode the data, which includes commercials, graphics, subtitles, and other parts. Because of this, most OTT services are 2 to 45 seconds behind the original source.

In contrast, live video is far more demanding: the receiver must buffer incoming IP packets in their original order, detect and recover any lost packets, read packets at a constant rate for decoding, and manage packet burstiness or buffer depletion. In some applications, decoding must also be synchronized across multiple locations or streams so that the video stays in sync. Most live video services are delayed by only a few frames to 500 milliseconds in order to remain relevant.

Your task is to ensure that the network does not interfere with these goals by providing sufficient bandwidth, appropriate QoS, low latency, low jitter, and minimal packet loss.

Know your organization’s protocol

The first thing you should learn and understand is the type of video services available in your organization. Your organization may use one or more of the following protocols or services:

Video conferencing

Video conferencing solutions have been around for a while; notable examples include Webex, Zoom, GoTo, Teams, and Google Meet. They all share the same basic concept: users initiate a session by sending a connection request to a remote server. Once a call (peer-to-peer or peer-to-group) is initiated, the video becomes available. The video stream is only a few megabits per second, and each frame is encoded individually. The stream passes through a remote server, which also forwards video from peers to you. The size of the uploaded stream can vary between frames to accommodate network conditions, or the stream can be stalled completely to cope with low network bandwidth. Because of the dynamic nature of frame encoding, the decoder can drop frames and pause the video to keep the session going. Overall, video conferencing is extremely sensitive to latency (the video may trail other peers), jitter (which can result in dropped frames or unwanted quality drops), and packet loss.


When latency, jitter, and packet loss hampered the service, the professional video industry turned to UDP and RTP to deliver live video over the IP network. What it wanted was a pseudo-TCP without the TCP sending window limitation. Several solutions have been developed over the years, including source packet buffering, destination buffering, and a messaging system to request packet retransmission. Most organizations use one or more of the following: SRT, RIST, and Zixi. What all of these protocols have in common is that they run over UDP with packet identification and timestamps, that the receiver has a predefined buffer size to cope with jitter and reordering, and that the same buffer allows an application-level protocol to request retransmission of a packet that may have been lost in delivery. The time available to deliver the missing packet is limited by the size of the receiving buffer. If the lost packet is not delivered and placed in the buffer in time, the decoder will experience some sort of decoding problem. However, most solutions take this into account and try to leave room for multiple retransmissions. These ARQ protocols are excellent at overcoming normal network packet loss, jitter, and latency, but they may fail due to a lack of bandwidth, high jitter spikes, or heavy packet loss. Most solutions send encoded traffic ranging from 1 Mbit/s to 80 Mbit/s per stream.


NDI refers to lightly compressed video from a camera or video mixer. There are several flavors, which may use TCP for local short-haul delivery and UDP with ARQ for long-haul delivery. Each frame is lightly compressed and delivered to its destination, which allows for a 1-3 frame delay when editing locally. Most people use it for local content editing before sending the result via one of the ARQ protocols listed above. NDI is used for sports, news, and church video carriage with as little video degradation and delay as possible. NDI is extremely sensitive to packet loss, jitter, and latency when achieving the low-delay target. Most solutions send encoded traffic ranging from 50 Mbit/s to 120 Mbit/s per stream.


JPEG XS is used for ultra-low-delay encoding, in which a few lines of video are encoded and immediately sent to the destination. This results in less than one frame of delay. JPEG XS can be found at local production events such as sports, which require minimal delay and high video quality. JPEG XS is very sensitive to packet loss, jitter, and latency when achieving the low-delay target. Most solutions send encoded traffic ranging from 125 Mbit/s to 1.5 Gbit/s per stream.


SMPTE2022-6 is classified as a legacy generation of uncompressed video over IP. It takes the SDI digital video signal and packs its bytes into UDP/RTP packets that can be sent over a local network. Because no compression is used, the generated stream has a very high bitrate, typically ranging from 2.5 Gbit/s to 10 Gbit/s depending on the source video’s original resolution and frame rate. Uncompressed video is extremely sensitive to jitter and packet loss, so only the strictest and highest QoS networks are used to transport such streams. Error recovery uses dual-path stream delivery to overcome some packet loss events; see ‘Seamless switching’.


SMPTE2110 is considered the next generation of uncompressed video over IP, with support for compressed traffic as well. Like SMPTE2022-6, it is designed for ultra-low latency delivery, but with more robust timing synchronization. When no compression is used, the generated stream has a very high bitrate, typically ranging from 5 Gbit/s to 25 Gbit/s depending on the source video’s original resolution and frame rate. Uncompressed video is extremely sensitive to jitter and packet loss, so only very strict and high QoS networks are used to transport such streams.


What can cause issues with live video

We need to understand what issues are causing our organization’s Live video to break so that we can find a solution to overcome them. As every IT professional knows, knowledge and understanding are essential for controlling network behavior and are a valuable tool in the book of tricks.


Packet Loss

Several factors contribute to packet loss, including routing changes, congestion, electrical interference, faulty connectors, and bandwidth limits. Lost packets can cause a variety of visual and audio problems during the decoding process. Packet loss can be characterized as salt and pepper (isolated, scattered losses) or burst (runs of consecutive lost packets). Bursty packet loss can also result in early buffer depletion, which forces the receiver to resynchronize. It is then the responsibility of the packet recovery protocol to attempt to recover the lost packets, either by requesting retransmission or by using another technique.


Jitter is defined as the difference in delay between two consecutive packets. If this value exceeds the time covered by the receiving buffer, the last buffered packet will be read and the buffer will run empty. Jitter is the worst enemy of buffer control and maintenance; many people don’t consider it a problem, but this is a major oversight. Because the video receiver is so sensitive to depletion, a long jitter event can drain the buffer and force the receiver to resynchronize, resulting in additional packet loss. Jitter is caused by network elements pushing or stalling packets as they pass through. A stalled packet increases the delay between it and the previous packet, resulting in a large jitter value. Such events occur when other services with higher priority are served first by a specific network element. Other flows on the same link may also burst and get in between consecutive packets of the same flow.
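As a rough illustration of the definition above, the sketch below computes per-packet jitter from send and receive timestamps. The function name and the sample timestamps are made up for the example, and it assumes both clocks use the same time base in milliseconds.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Return the change in transit delay between consecutive packets."""
    # One-way delay of each packet (assumes synchronized clocks).
    delays = [r - s for s, r in zip(send_times_ms, recv_times_ms)]
    # Jitter is the absolute difference between neighboring delays.
    return [abs(b - a) for a, b in zip(delays, delays[1:])]


# Hypothetical stream: one packet every 10 ms, the third one is stalled.
send = [0, 10, 20, 30, 40]
recv = [50, 60, 95, 81, 90]
jitter = interarrival_jitter(send, recv)
print(jitter)       # [0, 25, 24, 1]
print(max(jitter))  # 25
```

Comparing the largest value against the time covered by the receiving buffer gives a first hint of whether a jitter event could drain it.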


Latency is defined as the time it takes for a packet to travel from source to destination. Latency matters for streaming protocols and applications that require a consistent latency throughout the delivery. Latency is the result of the physical path length combined with the delay of each network element. Latency can vary due to route changes that send the packet over a path of different length, or congestion events that increase the delay of a network element.


Round Trip Time (RTT) is the sum of the time a packet spends traveling from source to destination and the time the reply spends returning. RTT is important in retransmission protocols, which send messages back to report a lost packet or general statistics. The two directions do not always take the same time, so RTT is not necessarily twice the one-way latency. RTT is critical for low-latency streaming and for links with high packet loss because it governs the retransmission request, which may require multiple iterations. If the RTT exceeds the duration of the buffer, retransmission will fail. For error-free delivery, most best practice guides recommend a buffer of at least 2 RTT, preferably 7.
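The relationship between RTT and buffer size can be sketched with simple arithmetic. The helper names below are hypothetical, and the model is deliberately simplified: it assumes each retransmission round costs one full RTT and ignores loss-detection and processing time.

```python
def recommended_buffer_ms(rtt_ms, multiplier):
    """Buffer duration for a given RTT multiplier (2 to 7 per the guidance above)."""
    return rtt_ms * multiplier


def max_retransmit_rounds(buffer_ms, rtt_ms):
    """How many request/resend rounds fit inside the buffer (simplified model)."""
    return int(buffer_ms // rtt_ms)


rtt = 40  # hypothetical measured RTT in milliseconds
print(recommended_buffer_ms(rtt, 2))    # 80
print(recommended_buffer_ms(rtt, 7))    # 280
print(max_retransmit_rounds(280, rtt))  # 7
print(max_retransmit_rounds(30, rtt))   # 0, the buffer is shorter than one RTT
```

The last line shows the failure case described above: when the buffer is shorter than the RTT, no retransmission can arrive in time.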

Dynamic Routing

Dynamic routing occurs when routing protocols implement routing changes on a network (local or WAN segment). Dynamic routing can also be controlled by an IP load balancer, which attempts to reduce traffic on a network segment. The primary issue with dynamic routing changes is the creation of faster or slower paths for segments of a video to travel. It can also generate a void path, causing some packets to be lost. The same changes cause jitter and latency changes. Live video prefers a constant fixed latency route whenever possible, so dynamic changes are undesirable, but they may be unavoidable.  

Rogue flows

Rogue flows are flows unknown to the network planner or architect that appear unexpectedly and cause network degradation such as jitter, packet loss, and bandwidth drainage. A rogue flow could be another service, an FTP backup, or any high-bitrate data transfer over the same link used for live video delivery. Rogue flows must be avoided to ensure continuous and uninterrupted video delivery.


Bandwidth refers to the physical or contractual limit of a particular NIC or link. If the bandwidth is equal to or less than the bitrate of the intended video service, some packets may be dropped by a limiter. To avoid packet loss, bandwidth limitations must be tested and verified both before and during streaming.
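A quick planning check can make this concrete. The sketch below is only an illustration: the function name is invented, and the 20% default headroom is an assumption for the example, not a figure from this article.

```python
def has_headroom(stream_mbps, link_mbps, headroom=0.2):
    """True if the summed stream bitrates stay strictly below the usable link rate.

    `headroom` is the fraction of the link kept free (assumed value).
    """
    usable = link_mbps * (1.0 - headroom)
    return sum(stream_mbps) < usable


print(has_headroom([80, 80], 1000))              # True
print(has_headroom([80, 120], 200))              # False
print(has_headroom([100], 100, headroom=0.0))    # False, equal is not enough
```

The comparison is strict on purpose, since a stream whose bitrate equals the link limit is already at risk of drops.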


How do protocols handle network errors?

Over the last few years, the broadcast industry has worked hard to find ways to enable continuous delivery of live video over IP networks while also addressing network challenges head on. Several protocols and techniques are used, each tailored to the type of video profile (compressed, ultra-low compressed, lightly compressed with low latency, and uncompressed).  

Jitter buffer

Many protocols and techniques rely on jitter buffers as their first line of defense. The jitter buffer’s job is to decouple network behavior from the receiver, which reads the buffered data at a constant rate. Most jitter buffers protect against the effects of network jitter and restore the original order of received packets to correct re-ordering. The same packet indexing is used for packet recovery, whether by inserting retransmitted packets or by forward error correction. However, the jitter buffer has a drawback: it adds delay to the video traffic, which is more noticeable when lightly compressed flows are used.
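The reordering role of a jitter buffer can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the class and function names are hypothetical, the buffer releases packets by count rather than by time, and sequence-number wraparound is ignored.

```python
import heapq


class JitterBuffer:
    """Holds `depth` packets and releases the lowest sequence number first."""

    def __init__(self, depth):
        self.depth = depth
        self.heap = []

    def push(self, seq, payload):
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        # Release packets only while more than `depth` are buffered.
        out = []
        while len(self.heap) > self.depth:
            out.append(heapq.heappop(self.heap))
        return out

    def flush(self):
        # Drain whatever is left, still in sequence order.
        return [heapq.heappop(self.heap) for _ in range(len(self.heap))]


def reorder(arrivals, depth):
    """Feed (seq, payload) pairs through the buffer and return the output order."""
    buf = JitterBuffer(depth)
    out = []
    for seq, payload in arrivals:
        buf.push(seq, payload)
        out.extend(buf.pop_ready())
    out.extend(buf.flush())
    return [seq for seq, _ in out]


arrivals = [(1, "a"), (3, "c"), (2, "b"), (5, "e"), (4, "d"), (6, "f")]
print(reorder(arrivals, depth=2))  # [1, 2, 3, 4, 5, 6]
```

A deeper buffer tolerates more reordering but adds delay, which is the trade-off described above.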


Automatic Repeat reQuest (ARQ) is a technique in which the sender and receiver cooperate to recover packets by resending one or more packets that have been reported missing. The technique relies on the sender buffering recently transmitted packets so that they are ready to be resent if the original packet fails to arrive at the receiver. The receiver, in turn, scans the incoming packets, and if it finds that one is missing, it sends a message asking the sender to resend it. It sounds simple, quick, and straightforward, but it is not. While the sender can buffer any number of packets, the receiver is limited by the size of its receiving buffer (which is typically shared with the jitter buffer). The sender must also avoid creating bursts of traffic that can aggravate network elements and cause further damage. ARQ messages take an RTT to complete, and most applications require a buffer of 2 RTT to 7 RTT (or even more) for optimal recovery.
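The two halves of ARQ described above can be sketched as follows. This is a simplified model rather than the behavior of SRT, RIST, or Zixi: the names are hypothetical, there is no timing or pacing, and sequence numbers do not wrap.

```python
from collections import OrderedDict


class SenderHistory:
    """Keeps the most recent `capacity` packets available for resending."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = OrderedDict()

    def record(self, seq, payload):
        self.packets[seq] = payload
        if len(self.packets) > self.capacity:
            self.packets.popitem(last=False)  # drop the oldest packet

    def resend(self, seq):
        # Returns None when the packet has already aged out of the history.
        return self.packets.get(seq)


def find_missing(received_seqs):
    """Receiver side: list the sequence numbers absent from the received range."""
    seen = set(received_seqs)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


history = SenderHistory(capacity=3)
for seq in range(1, 6):
    history.record(seq, f"pkt{seq}")

print(find_missing([10, 11, 13, 16]))  # [12, 14, 15]
print(history.resend(4))               # pkt4
print(history.resend(2))               # None, too old to recover
```

The last call shows why the sender's history and the receiver's buffer must be sized together: a request that arrives after the packet has aged out cannot be served.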

Forward Error Correction

Forward Error Correction (FEC) is a technique in which the Sender continuously adds additional data to the original video stream via IP packets or an adjacent IP flow. The extra information allows the receiver to recover some packets. The amount and method vary by technique, and the percentage of recovered packets ranges from 1 to 10%, depending on the type of FEC used and the amount of original data received. The main advantage of FEC is that there is no need for messaging between the Sender and Receiver, which is why it is the preferred technique for low-latency live streaming. The main disadvantage is the excessive continuous overhead to the IP stream, which may cause bandwidth limitations and aggravate network elements due to increased traffic.
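To show the idea in its simplest form, the sketch below uses a single XOR parity packet per group, which can rebuild exactly one lost packet in that group. It is an illustrative scheme with made-up function names; production FEC schemes are more elaborate, and the example assumes all packets in a group have the same length.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Sender side: XOR all packets in the group into one parity packet."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = xor_bytes(parity, p)
    return parity


def recover(packets, parity):
    """Receiver side: rebuild the single missing packet (marked as None)."""
    missing = [i for i, p in enumerate(packets) if p is None]
    if len(missing) != 1:
        return list(packets)  # nothing lost, or too many lost to recover
    rebuilt = parity
    for p in packets:
        if p is not None:
            rebuilt = xor_bytes(rebuilt, p)
    restored = list(packets)
    restored[missing[0]] = rebuilt
    return restored


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(group)
damaged = [group[0], None, group[2], group[3]]
print(recover(damaged, parity)[1])  # b'BBBB'
```

One parity packet for every four data packets costs 25% extra bandwidth in this toy setup, which illustrates the continuous overhead mentioned above, and two losses in the same group cannot be repaired.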

Seamless switching

Seamless switching, also known as SMPTE2022-7, is a technique used in a variety of protocols that may include ARQ, FEC, or nothing at all. The technique involves sending two or more duplicate streams from sender to receiver. The streams share the same payload and indexing for each packet, and they may have a small time difference between them. Both streams reach the receiver, which uses a small buffer equal to the time difference between the two streams plus some extra headroom. The receiver then checks each packet as it arrives; if a packet is missing from one stream, it can be taken from the other because both carry the same payload. Because the streams travel via different routes, an event on one stream will most likely not appear on the other, and the time difference means that loss events tend to hit different packets on each stream. This method achieves a very low delay at the expense of sending a duplicate stream, which is 100% overhead when compared to other techniques. The main drawback is that if the same packets are lost on both streams, there is no way to recover them without an additional technique such as FEC. The technique is also very sensitive to latency differences and jitter, as these can alter the time relationship between the two streams.
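The merge step at the receiver can be sketched as below. This is a simplified illustration with hypothetical names: it works on complete lists rather than a live, time-aligned buffer, and it assumes both paths use the same sequence numbers for the same payloads.

```python
def merge_streams(path_a, path_b):
    """Combine two copies of a stream, taking each packet from whichever path has it."""
    merged = dict(path_b)
    merged.update(dict(path_a))  # prefer path A when both copies arrived
    seqs = sorted(merged)
    # Sequence numbers lost on BOTH paths cannot be recovered here.
    missing = [s for s in range(seqs[0], seqs[-1] + 1) if s not in merged]
    return [(s, merged[s]) for s in seqs], missing


path_a = [(1, "p1"), (2, "p2"), (5, "p5")]  # lost 3 and 4
path_b = [(1, "p1"), (3, "p3"), (5, "p5")]  # lost 2 and 4
packets, missing = merge_streams(path_a, path_b)
print([seq for seq, _ in packets])  # [1, 2, 3, 5]
print(missing)                      # [4]
```

Packet 4 was lost on both paths, which is the unrecoverable case described above.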

Redundant pathways

This legacy technique uses redundant paths to stream identical copies, or different representations, of the same content over two different routes. One route is referred to as the A path and the other as the B path. The management system determines the primary path to use and, if it detects an excessive number of errors on this path, switches to the other. The main disadvantage of this technique is the ‘glitch’ that occurs prior to the switch (the lost packets that caused the switch) and the time required to switch and resync.


What to look for?

One of the long-standing issues is the sampling time and the data that is produced. Many people use short-term network testing tools like iperf or ping. That is what we all learned and prepared for. However, these tools give only a brief snapshot of the network, and the information does not correlate a result with any event: did a packet loss occur because a network element was congested, or was it caused by a route change? For many years, the video industry has advocated continuous monitoring of its workflow at the packet and content levels.

Let’s look at the big picture first

This screen capture displays one hour of real-time network statistics. One can see both the event and its root cause in one view. Routing changes, as indicated by the HOP count, result in error rate spikes and in latency and RTT changes.

Packet loss, error rate, and burst loss


As previously stated, it is critical to detect packet loss events, whether long-term, short-term, or sporadic. The IT team should investigate and monitor short transient events related to the IP stream. To achieve high precision, the sampling interval should be set to a few milliseconds. Packet loss can be expressed as the number of packets lost in a given time period or as a percentage of the stream, depending on how the transmission protocol in use handles such error rates. For example, 50 lost packets out of 4,000 (1.25%) is not the same as 50 lost packets out of 20,000 (0.25%). In addition, singular events such as packet loss bursts should be monitored because they pose a challenge to the majority of ARQ, FEC, and seamless switching recovery techniques.
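Both views of packet loss, the overall percentage and the burst length, can be derived from received sequence numbers. The sketch below is a hypothetical helper for illustration; it assumes the first and last expected sequence numbers of the sampling window are known and that sequence numbers do not wrap.

```python
def loss_report(received_seqs, first_seq, last_seq):
    """Summarize loss in one sampling window from the sequence numbers received."""
    seen = set(received_seqs)
    expected = last_seq - first_seq + 1
    bursts = []  # length of each run of consecutive lost packets
    run = 0
    for seq in range(first_seq, last_seq + 1):
        if seq in seen:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)
    lost = sum(bursts)
    return {
        "lost": lost,
        "loss_pct": 100.0 * lost / expected,
        "longest_burst": max(bursts, default=0),
    }


# Window of 20 packets: one isolated loss (5) and one burst (10-12).
received = [s for s in range(1, 21) if s not in (5, 10, 11, 12)]
print(loss_report(received, 1, 20))
# {'lost': 4, 'loss_pct': 20.0, 'longest_burst': 3}
```

Tracking the longest burst alongside the percentage matters because two windows with the same loss ratio can stress a recovery technique very differently.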


Several examples:

In this example, we can see the difference between a low average packet loss ratio and transient packet loss events that are ten times the average.

This example shows two spikes of short-length packet loss that occur consecutively. The overall error rate is extremely low.



Jitter must be closely monitored so that any deviation from the protocol’s assumptions is detected and raises an alarm. Jitter is never fixed and changes all the time.

Latency, or RTT

Latency and RTT variations are caused by network route changes and congestion at one or more HOP.

This example shows harsh latency spikes that the delivery protocol must address and adjust the jitter buffer for. The same holds true for RTT variations.

Dynamic Routing changes, HOP slows down

Routes are dynamic, so it is critical to review and stay on top of network elements and dynamic routing changes that have a significant impact on end-to-end service. In the example, we see the route from A to B, and then from B to A. The two routes clearly differ in length and complexity. The top route also includes a midpoint split or IP load balancer.


The RTT of each HOP should be monitored and evaluated for changes, as an increase indicates network or device congestion.

Source-to-destination flow monitoring 

A visual comparison of the IP flow between the Sender and the Receiver can provide a good indication of the network impact on the received stream (if it has a lower bitrate or burstiness).


Ultimately, the journey through the complex terrain of Video over IP for IT Engineers emphasizes the critical role network infrastructure plays in delivering high-quality live video content. As we navigate the myriad of protocols, network errors, and technological nuances, it becomes clear that the ability to fine-tune and adapt network behavior to the specific demands of video transmission is not only advantageous but also necessary. By implementing the insights and strategies discussed, IT professionals can significantly improve their organization’s video delivery capabilities, resulting in robust, seamless, and efficient video streams. Armed with this knowledge, you will be better able to troubleshoot, optimize, and innovate within your network environments, keeping up with the ever-changing landscape of digital media.