Video Perspectives
Measuring Quality of Experience for Over-the-Top Video Services
INTRODUCTION
Over-the-Top (OTT) video traffic volumes continue to double every year and are predicted
to represent a full two-thirds of all mobile traffic by 2019. This staggering growth rate
is fueled by the unstoppable popularity of OTT video content from sites such as YouTube
and Netflix, and the proliferation of smart devices with larger screens and more powerful
processors which are well suited to play short- or long-form video content.
The lines between traditional television and mobile broadband are blurring as smart
televisions support built-in Wi-Fi connectivity and smartphones and tablets support 1080p
video and include HDMI output ports. This convergence not only drives a hybridization of
media devices, it skews viewer expectations. Mobile viewers are quickly coming to expect
the highly reliable, high-quality delivery they demand from traditional broadcast television
or IPTV services.
Mobile and fixed broadband service providers face the challenge of satisfying subscriber
expectations while struggling to manage the growing volume of OTT video traffic on their
networks. While video services offer the promise for new revenue streams, poor video
Quality of Experience (QoE) can quickly become a leading cause of subscriber churn.
Understanding QoE in the context of various service scenarios is key to offering revenue-generating
video services or managing subscriber churn based on consistently poor QoE.
As video traffic has grown on mobile and fixed broadband networks, the tools to measure
the quality of video have not developed sufficiently to provide an accurate and scalable (i.e.
network-wide) means of measuring how subscribers experience video and their resulting
satisfaction levels. Traditional broadcast methods of measuring video quality are highly
accurate but not scalable. Network QoS KPIs on the other hand are scalable but do not
accurately reflect QoE. Conventional IPTV video quality measurement standards cannot cope
with the multitude of technologies and standards in the OTT environment.
Furthermore, as video traffic growth continues to outpace increases in network capacity,
video QoE is degrading. Figure 1 illustrates that during congested periods of the day,
sessions exhibiting poor QoE can represent over 50% of overall video traffic on the network.
With video traffic representing roughly 50% of all traffic, this means that, at times when
bandwidth is most expensive, on the order of a quarter or more of all traffic is arguably being
wasted, as it is being used to deliver a poor experience.
This paper explores the need for a scalable technology that can provide perceptual QoE
scores that accurately represent what every subscriber on the network is experiencing.
Furthermore, the paper provides an explanation of how QoE can be measured for OTT video in
an accurate and scalable manner, and includes background concepts relating to the OTT video
lifecycle, streaming technologies and common video quality issues.
Figure 1
OTT VIDEO: MEDIA STREAMING FUNDAMENTALS
With the popularity of OTT video sites, such as YouTube and Netflix, video has become
pervasive and easy to access for all subscribers. But few people ever stop to consider how it
gets created and eventually viewed on a mobile device.
OTT VIDEO CONTENT LIFECYCLE
The lifecycle of any video content begins with content creation. Once a video has been shot
or captured, the content may be edited and one or more video clips may be authored from
that edited content. Authoring typically involves compressing the video (and audio) to reduce
the file size, followed by wrapping the compressed video (and audio) in a specific file type
(or container). The file is then hosted on a (media) server or Content Delivery Network
(CDN) where other users can access it. From the server or CDN, the file can then be
streamed to a client, where it is decoded and played on a device for viewing.
The source, and therefore the quality of the content, can range from user-generated videos
shot using a smartphone to major studio productions shot by a professional camera crew
using commercial-grade equipment. With social media sites, video tends to be user-generated,
which means the content is typically shot using lower-grade equipment, and the
resulting source quality tends to be lower.
Content is authored, either by a user who does their own editing, or automatically. The
processes that automatically author the content are often hidden to the user as they are
done directly on the capture device or transparently as part of uploading to the content
provider. YouTube uses the latter approach and encodes the video using a codec, such as H.264. Encoding reduces the bandwidth required to stream the video content over a network in real-time. All of the information pertaining to this video (metadata, video and audio) is then
put into a media container, such as FLV, MP4, MPEG2-TS or WebM. Some sites may encode
and store the content in multiple formats, with different resolutions, bitrates, codecs and
containers.
The video file is stored on a server or CDN where users can access it. The video file is
either downloaded (as is the case with iTunes) or streamed (as is the case with Netflix).
In order to watch downloaded video, the entire file must be received before playback can
begin, which can take a long time. The file is stored ‘more permanently’ and is available
for future consumption. For streaming video, playback begins almost immediately after the
user requests the content. The file is typically stored in a ‘more temporary’ location, and is
generally not available for future consumption. It is important to note that video files tend to
be streamed and not downloaded so that playback can begin before the entire file has been
received. When a subscriber requests the video content it is delivered from the content source
by streaming across a packet network. The client buffers sufficient incoming data to enable
continuous real-time decoding and playback.
This process may sound simple enough, but it is complex and full of opportunities for issues
to arise; issues that affect the quality of the delivery and/or the quality of the presentation.
Even a small percentage of packet loss can cause quality degradation. Some streaming
technologies, such as adaptive streaming, include mechanisms for the device to switch to a
lower bitrate profile in order to use less bandwidth on the network and increase the
probability of successful delivery.
STREAMING OTT VIDEO
In order to watch video on an internet-connected device, several components are required.
A media server on the network (or CDN) provides a source for the video content, which is
streamed over the network to an Internet-connected device with a client, which can receive
and display the video.
The content should be streamed in real time or faster. That is to say, the video data must
be sent at a rate equal to or faster than the rate required to sustain real-time playback.
When the content is streamed faster than the playback rate, video data accumulates in the
client’s buffer. This buffering helps prevent playback interruptions such as stalling and
can compensate for changes in network throughput. More recently, adaptive streaming
technologies have been introduced which enable clients to respond to changes in network
throughput and switch to lower bitrate streams when the network is congested.
With sufficient network throughput, a client receives the video data at a faster rate than
playback. Brief outages or reductions in throughput can therefore be tolerated without
impacting QoE, as long as the buffer does not run empty. However, during times of congestion or poor
connectivity, the video buffer may become empty which will result in stalling (hourglass) and
therefore poor QoE. If an adaptive streaming protocol is in use, the client can try switching to
a lower bandwidth stream, which may reduce stalling, but will degrade visual quality through
a reduction in resolution and bitrate.
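The buffering behavior described above can be illustrated with a minimal sketch. It is a simplified model and not any particular client's implementation: the function name `simulate_stalls`, the one-second time step and the two-second start threshold are all assumptions chosen for illustration.

```python
def simulate_stalls(throughput_kbps, video_bitrate_kbps,
                    start_threshold_s=2.0, step_s=1.0):
    """Return the seconds spent stalled (re-buffering) after playback first starts.

    throughput_kbps: one network throughput sample per time step.
    Hypothetical model: playback (re)starts once start_threshold_s of video
    is buffered and stalls when less than one step of video is available.
    """
    buffered_s = 0.0       # seconds of video currently held in the buffer
    playing = False
    has_started = False    # time before the first start is startup delay, not stalling
    stalled_s = 0.0
    for kbps in throughput_kbps:
        # Data received this step, expressed as seconds of playable video.
        buffered_s += kbps * step_s / video_bitrate_kbps
        if not playing and buffered_s >= start_threshold_s:
            playing = True
            has_started = True
        if playing and buffered_s >= step_s:
            buffered_s -= step_s          # one step of video is played out
        else:
            playing = False               # buffer too low: wait for more data
            if has_started:
                stalled_s += step_s
    return stalled_s


# A 1000 kbps clip delivered at 2000 kbps builds up a buffer and never stalls.
steady = simulate_stalls([2000] * 8, 1000)
# A four-step outage drains the buffer and forces re-buffering.
outage = simulate_stalls([2000, 2000, 0, 0, 0, 0, 2000, 2000], 1000)
```

In this model the first two steps of the outage are absorbed by the buffer, and only the remainder shows up as stalling.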
There are numerous OTT video streaming technologies. The most popular protocols are
described below.
Figure 2
The OTT video lifecycle
PROGRESSIVE DOWNLOAD
Progressive download uses standard HTTP/TCP to stream (not download) content to the
client as quickly as possible, maximizing buffering potential for smooth playback. This
protocol is relatively simple and widely adopted across the Internet, most notably by YouTube.
Progressive download is suited to unicast applications only and does not support multicasting
(for live events). The most popular container formats delivered via progressive
download are FLV and MP4.
Figure 3
A comparison of typical buffering strategies for conventional RTSP versus Progressive Download
OTT video streaming technologies
REAL-TIME STREAMING PROTOCOL
Real-time Streaming Protocol (RTSP), along with Real-time Transport Protocol (RTP) and
Real-time Transport Control Protocol (RTCP), is commonly used to deliver live and on-deck
content as well as Video-on-Demand (VOD) services. RTSP, which is delivered over TCP, is used
to establish and control the media session and to issue commands during the session. Most
RTSP servers use RTP to deliver the media streams, typically over unreliable UDP. The media
is therefore prone to significant quality degradation due to packet loss. Because of this, another approach called RTSP interleaved (which interleaves the RTP and RTCP data with
the RTSP data) can be used. Instead of having one flow for RTSP and separate flows for audio
and video tracks, a single RTSP/TCP flow is used. The RTSP data is sent as is, while RTP and
RTCP are multiplexed through virtual channels.
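As an illustration of the virtual channels, the sketch below splits a buffer of interleaved data into per-channel payloads. It assumes the binary framing defined in RFC 2326 (section 10.12): a '$' byte, a one-byte channel identifier and a two-byte big-endian length, followed by the payload. The function name `split_interleaved` is hypothetical, the channel-to-track mapping is negotiated at session setup and is not modeled, and RTSP text messages that may appear between frames are not handled.

```python
import struct


def split_interleaved(buf: bytes):
    """Split RTSP-interleaved binary data into (channel, payload) pairs.

    Returns the list of complete frames and any trailing bytes that do not
    yet form a complete frame (for example, a partially received packet).
    """
    frames = []
    pos = 0
    while pos + 4 <= len(buf) and buf[pos] == 0x24:      # 0x24 is '$'
        # One-byte channel id, then two-byte payload length in network order.
        channel, length = struct.unpack_from("!BH", buf, pos + 1)
        end = pos + 4 + length
        if end > len(buf):
            break                                        # incomplete frame
        frames.append((channel, buf[pos + 4:end]))
        pos = end
    return frames, buf[pos:]


# Two frames: 3 bytes on channel 0, then 2 bytes on channel 1.
data = b"$\x00\x00\x03abc" + b"$\x01\x00\x02hi"
frames, rest = split_interleaved(data)
```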
REAL-TIME MESSAGING PROTOCOL
Real-time Messaging Protocol (RTMP) is a protocol developed by Macromedia (now Adobe)
for streaming audio, video and data to a Flash player. Common variants include RTMPE
(encrypted) and RTMPS, which runs over an SSL connection. RTMP is supported by both Flash
Media Server and Flash clients.
ADAPTIVE OR DYNAMIC STREAMING
With adaptive streaming, the client detects network bandwidth availability and dynamically
switches across multiple streams of differing bitrates to seamlessly deliver the content.
These protocols are founded on the premise that smooth delivery is the biggest contributor
to overall high video QoE. How the client decides which stream to select is specific to
the client. Some clients are more aggressive and will select the best quality stream first,
whereas others are more conservative and will select lower-quality streams and monitor
performance before switching to improve quality. There are many examples of this technology
including HTTP Live Streaming (HLS), HTTP Dynamic Streaming, Microsoft Silverlight Smooth
Streaming, and Netflix Streaming Service.
The impact of dynamic streaming protocols is to offer a real-time tradeoff between the visual
fidelity of the video and the throughput. However, because clients only become
aware of network congestion after the fact, dynamic streaming tends to be reactive and
causes a high degree of visual quality variation, which in itself can lead to an overall worse
QoE for the subscriber.
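Because stream selection is client-specific, the following is only one possible heuristic, sketched under stated assumptions: the client knows the available bitrates (the "ladder") and has a recent throughput measurement. The function name `choose_bitrate` and the `safety` margin are hypothetical. Note that the decision is driven by a past measurement, which is why this style of adaptation is reactive.

```python
def choose_bitrate(ladder_kbps, measured_throughput_kbps, safety=0.8):
    """Pick the highest available bitrate that fits the measured throughput.

    safety < 1 leaves headroom so the buffer can still grow; a more
    aggressive client would use a value closer to 1 (an assumption here,
    not a documented behavior of any specific player).
    """
    budget = measured_throughput_kbps * safety
    candidates = [rate for rate in sorted(ladder_kbps) if rate <= budget]
    # If even the lowest stream exceeds the budget, fall back to it anyway.
    return candidates[-1] if candidates else min(ladder_kbps)


ladder = [400, 800, 1500, 3000]           # hypothetical encoded bitrates, kbps
normal = choose_bitrate(ladder, 2000)     # selects 1500
congested = choose_bitrate(ladder, 300)   # selects 400, the lowest stream
```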
HTML5 VIDEO
HTML5 augments and expands the HTML standard to include a method to natively embed
video on a website. This approach eliminates the dependence upon third-party browser
plug-ins. HTML5 is supported by newer browsers such as Internet Explorer 9, Firefox 3.5,
Safari 3.0, Chrome and Opera. While the standard is open, there are competing interests,
the standard is in flux, and browser vendors are free to support any video format they deem
appropriate. YouTube uses HTML5 to deliver content to Apple iOS devices such as the iPhone
and iPad.
PEER-2-PEER TV
Peer-2-Peer TV (P2PTV) delivers media over multiple peer connections. In P2PTV, each client
downloads a video stream while concurrently uploading that stream to other P2P users. This
approach is akin to a real-time BitTorrent. Streams are typically time-delayed by several
minutes compared to the original source content. Video quality is a function of the number
of subscribers in the peer network, with quality improving as the number of subscribers
increases. There are many P2PTV networks including PPLive, SopCast, StreamTorrent, Veetle,
and SwarmPlayer.
VIDEO CONFERENCING AND VIDEO CHAT
Video chat applications such as Skype or Apple FaceTime are introducing a whole new
set of OTT video use cases. The key difference between video chat and streaming is that
video chat needs to be delivered at a very low latency in order to satisfy real-time two-way
communication and it must be streamed bi-directionally. Popular video chat services
include Skype (proprietary, RC4 encrypted signaling protocols) and Apple’s FaceTime (SIP and
RTP based).
VIDEO QUALITY ISSUES
Every step of the video content lifecycle can contribute to video quality issues, affecting the
subscriber’s QoE.
CAPTURE
Poor video capture can be the result of a poor capture environment, e.g. lighting, a
low-quality lens, poor focus, low resolution, camera motion, etc. With the exception of
sophisticated pre- and post-processing of the captured video, it is very difficult for any future
step in the lifecycle to improve the quality of poorly captured content.
AUTHORING/ENCODING
The authoring step can introduce additional quality issues due to the use of lossy
compression algorithms, which are necessary in order to bring the required bandwidth down
to usable levels for real-time streaming of the video content. For audio, these quality issues
can appear as a result of reducing the sampling rate or number of channels relative to the
original (captured) content as well as the codec itself. Some of these audio artifacts include
ringing, echo, dropouts and hissing. For video, these quality issues appear as a result of
reduced bitrate, resolution or frame rate relative to the original (captured) content. Some
of these artifacts include blocking, blurring, jerkiness, trailing artifacts, ‘mosquito’ noise,
ringing, contouring, beating and breathing.
TRANSMISSION
There are two major network factors that affect video quality: congestion and connectivity.
The volume of data required to deliver media (with acceptable QoE) is significantly more than
for voice or other data forms such as email or static web content. The maximum amount of
traffic that can be simultaneously delivered to subscribers represents the total capacity of
the network. Congestion can lead to packets being dropped, delayed delivery of data or even
service interruption.
Congestion (where demand exceeds capacity) depends on the number of concurrent subscribers
in a given cell sector, backhaul link or otherwise limited aggregation point, and on the
amount of network traffic that they generate. For TCP-based (i.e. reliable) non-adaptive
streaming sessions, congestion can result in long delays in initial playback. Stalling for
TCP-based ABR (adaptive bit rate) streaming sessions can be mitigated, but not eliminated,
by switching to clips authored with a lower encoded bitrate that requires less bandwidth.
On wireless networks, signal issues due to coverage, handoff, interference or resource
contention can lead to degraded throughput and therefore produce video quality issues
similar to those that appear under congestion. In this case though, the quality degrades only
for the subscribers experiencing the signal issues and not necessarily for all subscribers
on that network node. Under congestion, all subscribers on a particular network node are
impacted.
PLAYBACK
Due to the diversity of device types and display resolutions, the playback device itself has
a significant impact on the subscriber’s perception of video quality issues. On smaller
screens, artifacts can become imperceptible at common viewing distances because of the limits
of visual acuity. Artifacts noticeable on an HD display can be imperceptible on smartphone-type
devices. Increasing display resolutions on mobile devices (for example, due to the emergence
of tablet devices) raise the minimum fidelity (and therefore bandwidth) requirements
necessary to satisfy subscriber video quality expectations.
COMMON VIDEO QUALITY ISSUES
As discussed above, there are many issues that contribute to poor video quality and affect the
QoE. The most significant issues are described in more detail below.
STALLING
Stalling (sometimes referred to as re-buffering) typically occurs during reliably
delivered media sessions. When the network fails to deliver the media content in
time for playback, due to insufficient throughput, playback will stall while the client
waits for additional content. Generally the client waits to receive a certain amount of
content in its internal buffer before resuming playback.
BLURRING
Blurring refers to a lack of sharpness and is the result of insufficient detail for the
display size and resolution. It is often due to content encoded at low resolution that
is displayed in high resolution on the playback device, e.g. in full-screen mode. As
device screen sizes increase, content will need to be captured and encoded at higher
resolutions (and therefore require more bandwidth) to mitigate this issue. Blurring
can also be caused by pre-processing prior to encoding where low-pass filtering may
be applied to smooth the content details permitting more aggressive compression.
Finally, blurring can be caused by encoding stages, including quantization as well as
de-blocking and de-noising filters.
BLOCKING
The main cause of blocking artifacts is the application of overly aggressive
quantization by the encoding algorithm to blocks of pixels. Typically this occurs when
the encoding algorithm is trying to compress to an aggressively low bitrate. The result
is that finer details within blocks of pixels and subtle differences in the values of
neighboring pixels are lost due to the removal of high frequency components in the
transform domain. This makes all the pixels within the block (in the spatial domain)
appear to have the same or similar value.
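The effect of quantization step size can be illustrated with a small sketch. It is a deliberate simplification: it uses a one-dimensional, 8-sample DCT and a single uniform step for every coefficient, whereas real codecs use two-dimensional transforms and per-frequency quantization matrices. The function names and sample values are assumptions for illustration only.

```python
import math

N = 8  # samples per block in this one-dimensional illustration


def dct(block):
    """Orthonormal DCT-II: pixel values to transform-domain coefficients."""
    return [
        math.sqrt((1 if k == 0 else 2) / N)
        * sum(x * math.cos(math.pi / N * (n + 0.5) * k) for n, x in enumerate(block))
        for k in range(N)
    ]


def idct(coeffs):
    """Inverse of dct(): transform-domain coefficients back to pixel values."""
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / N) * c
            * math.cos(math.pi / N * (n + 0.5) * k)
            for k, c in enumerate(coeffs)
        )
        for n in range(N)
    ]


def quantize(coeffs, step):
    """Uniform quantization: snap each coefficient to a multiple of step."""
    return [round(c / step) * step for c in coeffs]


pixels = [100, 102, 104, 106, 108, 110, 112, 114]   # a gentle gradient
fine = idct(quantize(dct(pixels), step=1))     # gradient is preserved closely
coarse = idct(quantize(dct(pixels), step=50))  # all non-DC coefficients become 0
```

With the coarse step, only the DC coefficient survives, so every reconstructed sample in the block takes the same value and the gradient is replaced by a flat block.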
JERKINESS
Jerkiness refers to a class of temporal artifacts that result in a perceived lack of
smoothness or continuity of motion. This can be due to a reduced frame rate during
capture or encoding. This artifact will typically persist throughout the media session
and the subscriber may grow accustomed to it and even accept it, particularly at lower
resolutions and/or on smaller device screen sizes.
Other causes of jerkiness include loss of video content during unreliably delivered
(generally UDP-based) media sessions, frequent stalling during reliably delivered
(generally TCP-based) media sessions, and other client- or device-related
performance issues during playback. These result in more sporadic jerkiness, which
is very difficult for a subscriber to grow accustomed to.
DAMAGED BLOCKS
When video content is lost or corrupted during transmission and cannot be
retransmitted or reliably recovered (generally UDP-based), clients may take different
approaches when it comes to displaying the pictures containing the damaged blocks.
This is generally referred to as error concealment. On some clients the damaged
block may be omitted entirely, while on others it may be replaced by spatially or
temporally neighboring blocks or combinations thereof. Worse, this initial artifact or
discontinuity continues to propagate temporally and spatially until the entire picture
is refreshed (e.g. via an I-frame).
LOSS OF SERVICE
Loss of service refers to the failure to deliver video content and is akin to dropping a
voice call. This is caused by the congestion and signal issues previously described.
Generally the media session is terminated and cannot be recovered.
LOSS OF AUDIO-VIDEO SYNCHRONIZATION
Audio-video synchronization issues are easily noticed in scenes with dialog where lip
movements do not match the timing of audio delivery. This can be introduced in the
capture or authoring stage, although this is rare. More often, this is due to packet loss
in unreliably delivered (i.e. UDP-based) media sessions or client- or device-related
issues during playback.
VIDEO QUALITY MEASUREMENT
Mobile and fixed broadband service providers face the challenge of satisfying subscriber
expectations while struggling to manage the growing volume of video traffic on their
networks. Video services offer the promise for new revenue streams. Inability to measure
(and assure) video QoE makes it difficult to offer revenue-generating video services.
Moreover, adding visibility into video QoE is key to managing increasing amounts of
subscriber churn due to consistently poor QoE.
As OTT video traffic has grown on mobile and fixed broadband networks, the technologies and
solutions used to measure video quality have not developed sufficiently to provide an accurate
and scalable means of determining the level of subscriber satisfaction when consuming
this OTT video. This section of the paper looks at how video quality has traditionally been
measured and scored and the applicability of such techniques to the OTT video domain.
SUBJECTIVE QUALITY ASSESSMENT
The ‘gold standard’ for assessing media quality is subjective experiments. These represent
the most accurate method for obtaining quality scores and ratings. In subjective video
experiments, a number of viewers—typically 15-30—are asked to watch a set of clips and
rate their quality. There are a wide variety of subjective testing methods and procedures,
which are beyond the scope of this paper. The most common and concise way to reflect the result
of the experiment is through the average rating over all viewers. Note that, in some cases,
additional data processing, including normalization and outlier removal, may be required.
This average rating is referred to as a Mean Opinion Score (MOS), shown in Table 1. One well-known
application of MOS principles is in the evaluation of voice call quality, based on
various speech codecs and transmission parameters.
Table 1
MOS Scores
SCORE | QUALITY | IMPAIRMENT
5 | Excellent | Imperceptible
4 | Good | Perceptible, but not annoying
3 | Fair | Slightly annoying
2 | Poor | Annoying
1 | Bad | Very annoying, unwatchable
It is always challenging to quantify a qualitative characteristic because perception is
individualistic and generally conveyed only as an opinion based on shared comparisons.
Subjectivity and variability of viewer ratings cannot be completely eliminated. Subjective
experiments try to minimize these factors through precise instructions, training and
controlled environments. It is still important to remember that a quality score is a noisy
measurement that is defined by a statistical distribution rather than an exact measurement.
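To make the statistical nature of a MOS concrete, the sketch below computes the mean rating together with an approximate 95% confidence interval. The ratings are invented for illustration, the function name is hypothetical, and the interval uses a normal approximation (z = 1.96), which is an assumption; for panels of 15 to 30 viewers a Student's t multiplier would give a slightly wider interval.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    ratings: one score per viewer on the 1-5 scale of Table 1.
    z=1.96 assumes a normal approximation (an illustrative simplification).
    """
    mos = statistics.mean(ratings)
    spread = statistics.stdev(ratings)            # sample standard deviation
    half_width = z * spread / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from a 15-viewer panel for a single clip.
panel = [4, 4, 3, 5, 4, 3, 4, 5, 4, 4, 3, 4, 5, 4, 4]
mos, half_width = mean_opinion_score(panel)
```

For this invented panel the MOS is 4, with an interval of roughly plus or minus 0.33, so two clips whose scores differ by less than that cannot be reliably ranked from this panel alone.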
OBJECTIVE QUALITY METRICS
Objective quality metrics are algorithms designed to characterize the quality of video and
predict subjective quality or viewer MOS. There are a wide variety of objective quality metrics,
from both academia and industry standardization activities. These metrics can be categorized
as being full-reference, reduced-reference, or no-reference, based on the amount of information
required about the reference video.
FULL-REFERENCE
Full-reference (FR) quality measurement techniques, illustrated in Figure 8,
compare a transformed version of the video to a reference version of the video. The
transformed version is typically the video as output from some system, which could
be an encoder, transcoder, lossy channel or other video processing system, while the
reference version is the input to the system. They operate in the spatial (i.e. pixel)
domain as opposed to the compressed domain.
Figure 8
Full-Reference measure
These measures are generally very accurate at reflecting how closely the transformed
video resembles the reference video, and some of the more complex methods also try
to find common artifacts such as blocking, blurring and related artifacts. Typically,
none of the other lifecycle stage impairments are accounted for.
When measuring video quality in a lab, this approach makes sense for several reasons:
• Scalability is not required and computational complexity can be very high as
the measurement is being performed on a few streams.
• The reference video is usually accessible.
• Precise, often manual, spatial and temporal alignment of the reference and
transformed video can be performed.
• The delivery network is reliable and uncongested. Therefore accounting for
transmission impairments is not a requirement.
However, when measuring OTT video in a network, the above conditions do not apply.
• There are potentially many concurrent video sessions, so scalability to an
entire subscriber base is required, thus computational complexity must be
constrained.
• Access to the reference video is difficult if not impossible.
• Automatic spatial-temporal alignment is error-prone and computationally
expensive.
• The delivery networks are much less reliable and often congested, thus
it is important to incorporate transmission impairments into the quality
measurement.
Several popular full-reference measures
Several popular full-reference measures are described below. They all operate in the spatial
domain and require access to the reference video. As such, they have all the deficiencies
identified above related to full-reference measures when it comes to measuring the quality of
OTT video.
PEAK SIGNAL-TO-NOISE RATIO
Peak Signal-to-Noise Ratio (PSNR) is a measure that quantifies how much a signal
is degraded or corrupted by distortion or ‘noise’; the higher the ratio, the better
the quality of the signal. In the case of compressed video, the distortion is the loss
of information introduced by a lossy encoding process. In the case of transmission
channels, the distortion is the loss of information introduced by a lossy channel.
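For reference, PSNR is conventionally computed as 10·log10(MAX² / MSE), where MAX is the largest possible sample value and MSE is the mean squared error between reference and transformed samples. A minimal sketch follows, assuming 8-bit samples (MAX = 255) supplied as flat lists; the function name is illustrative.

```python
import math


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf          # identical signals: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)


reference = [100, 100, 100, 100]
distorted = [101, 99, 101, 99]        # every sample is off by one (MSE = 1)
score = psnr(reference, distorted)    # about 48.13 dB
```

The sketch also shows why the approach is hard to apply to OTT traffic in a network: it needs the reference samples, aligned one-to-one with the distorted ones.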
STRUCTURAL SIMILARITY INDEX
Structural Similarity Index (SSIM) is based on the principle that the human visual
system is highly trained to identify shapes; thus the metric focuses on the amount
of structural similarity between video frames. This is counter to most other metrics
(like PSNR) which are generally based on differences between images. This score is
typically in the range of -1 to 1, with 1 being the best score.
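The commonly published SSIM formula combines the means, variances and covariance of the two signals with two small stabilizing constants. The sketch below applies it once over a whole signal, which is a simplification: practical implementations compute it over small local windows and average the results. The function name is hypothetical and 8-bit samples are assumed.

```python
def global_ssim(x, y, dynamic_range=255):
    """Single-window SSIM between two equal-length sample lists.

    Simplified sketch: one window covering the entire signal, rather than
    the averaged local windows used by practical implementations.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    c1 = (0.01 * dynamic_range) ** 2      # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2      # stabilizes the contrast/structure term
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    )


frame = [10, 50, 90, 130]
same = global_ssim(frame, frame)                                # best score, 1
inverted = global_ssim([0, 255, 0, 255], [255, 0, 255, 0])      # close to -1
```

The inverted pattern has the same mean and contrast as its reference but opposite structure, which is what pushes the score toward the bottom of the range.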
VIDEO QUALITY METRIC
Video Quality Metric (VQM) is a video quality measure based on a human visual
model of perceived effects of blurring, jerkiness (local and global), noise, and
video distortions. It is a computationally intensive procedure composed of four steps:
calibration, feature extraction, parameter calculation, and final score calculation. A
different VQM model is used for each scenario; for example, two different models
are needed for television and video conferencing.
PERCEPTUAL EVALUATION OF VIDEO QUALITY
Perceptual Evaluation of Video Quality (PEVQ) is another video quality measurement
based on a human visual model of the perceived effect of spatial and temporal
distortions. It is another very computationally intensive procedure, composed of
similar steps as VQM: calibration/alignment, perceptual difference calculation,
classification of differences, and final score calculation. It has been standardized as
part of ITU-T J.247. It employs five indicators and uses region-of-interest (ROI) to
limit complexity.
NO-REFERENCE
No-reference (NR), also referred to as zero-reference, quality measurement
techniques, illustrated in Figure 9, do not compare transformed to reference content.
Rather, no-reference techniques estimate quality by analyzing only the post-encoded
content, using algorithms and heuristics that are based on indicative encoding
parameters and/or inferred encoding artifacts. There are two sub-categories of no-reference
approaches:
• Bitstream-based methods, which typically parse various headers and
payloads to varying depths.
• Pixel-based methods, which fully decode the compressed video to baseband
and are superior at detecting and quantifying encoding artifacts.
Figure 9
No-Reference measurement
No-reference measures are not as accurate as full-reference measures; however, they
are generally less computationally complex and are therefore more scalable in
terms of deployment in a service provider network. One attractive attribute is
that computational complexity can be traded off against accuracy by controlling
the depth of parsing. Access to reference content is not a requirement. Similar
to full-reference measures, conventional no-reference measures do not account
for transmission impairments. However, given their relatively low computational
complexity, they can be extended to incorporate network impairments and still
provide acceptable scalability and performance.
There are many no-reference approaches currently under study and/or development,
both within standardization bodies as well as academia. Within the standards
community, ITU-T SG-12 P.NAMS and P.NBAMS are developing non-intrusive
parametric models for the assessment of performance of multimedia streaming.
The former uses only header information while the latter uses the codec bit stream.
None have been approved or widely adopted to this point.
REDUCED-REFERENCE
Reduced-reference (RR), also referred to as partial-reference, quality measurement
techniques, illustrated in Figure 10, are a compromise between the full-reference
and no-reference approaches, in which only partial information about the reference
video is available for quality estimation. Reduced reference can be quite suitable in
situations where the overhead of storage or transmission of the reference video is
prohibitive but the accuracy of a no-reference approach is too low. The disadvantage
of this technique is that additional storage or transmission of side information is
necessary. This side information typically includes parameters summarizing the
quality of the reference video. The main advantage of this approach is that the quality
parameters for the reference video are computed only once, making it much more
scalable than full-reference approaches. The main problem with reduced-reference
quality measurement is that most OTT video does not contain the side information
required by this approach.
Figure 10
Reduced-Reference measure
BROADCAST VS. OTT VIDEO QUALITY MEASUREMENT
Some of the key differences between broadcast video and OTT video are outlined in Table 2 below.
These differences illustrate why expensive and complex solutions that make sense in the broadcast
video world are not viable for OTT video, outside of isolated use cases (e.g. lab trials or ‘shoot-outs’).
Effective OTT video quality measurement requires normalization of quality scores for a wide variety
of content, streaming technologies, and display devices. Timely support of emerging formats and
devices is essential. Consideration of the impact of network impairment on QoE is crucial. These
requirements suggest a scalable, low-complexity approach, i.e. no-reference.
QOS VS. QOE MEASUREMENT
Quality of Service (QoS) metrics or Key Performance Indicators (KPIs) are often used interchangeably
with QoE, or used to infer it. For example, if enough data is delivered in time (high network throughput)
then a high QoE may be inferred. While there is a relationship between QoS and QoE, it is not a
direct relationship. QoS KPIs can often be good while QoE is unsatisfactory. QoS measurement tends
to be focused on the quality of a network, whereas QoE tends to be focused on user intent and
performance of an application (video playback). User intent and expectation vary significantly
depending on the characteristics of the client device, the type of content they are viewing and many
other variables.
Table 2
Broadcast vs. OTT and video quality measurement considerations
ELEMENT | BROADCAST VIDEO | OTT VIDEO
Source content & formats | Limited; small set of known standards | Vast; large number of changing ‘standards’
Delivery channel | Broadcast; reliable; uncongested | Unicast; unreliable; congested
Ecosystem | Unified; single entity has end-to-end control | Fragmented; multiple competitive entities serving user
Client viewing device | Limited number of vendors; static; similar; small set of known standards | Large number of vendors; rapidly changing capabilities; widely varying devices; no unified standards
Measurement solution scalability requirements | Can be deployed “per channel”; low scalability needs | Should be deployed “per concurrent subscriber”; high scalability needs
Measurement solution computational requirements | Tightly constrained feature set; high complexity remains cost-effective | Wide and evolving feature set; high complexity no longer cost-effective
NETWORK IMPACT ON VIDEO QUALITY
Traditional IP networks (without an end-to-end QoS architecture) provide best-effort service
over a common, shared infrastructure. Any link or node in the network can experience
congestion. The primary mitigation strategy is to drop packets proactively. Reliable
networking protocols account for this and have built-in congestion avoidance algorithms and