In this guest post, Arkady Bari, Principal Engineer II at Comcast Digital Home, walks through how telecommunications enterprise Comcast Corporation set up monitoring for Amazon Kinesis Video Streams. To support their business requirements and performance SLAs, they added aggregated metrics, vended by the Kinesis Video Streams service, to Amazon CloudWatch.

Comcast offers a home security solution called Xfinity Home Security. Among other features, customers can record, store, and play back videos from security cameras installed at various locations. Originally, this feature was built and run on Amazon EC2 and Amazon S3. Demand for the solution grew fast, so Comcast decided to migrate to Kinesis Video Streams, a fully managed video streaming service on AWS.

Like Comcast, customers using Kinesis Video Streams want to monitor their connected devices and be alerted quickly of any quality of service issues. CloudWatch is a monitoring and observability service that natively integrates with more than 70 AWS services and automatically publishes detailed one-minute metrics and custom metrics with up to one-second granularity. It also enables you to set alarms and automate actions based on predefined thresholds or on machine learning algorithms that identify anomalous behavior in the collected metrics.

Business Requirements

“An important consideration when selecting a video streaming service was operational visibility. To assess overall health of the service required the ability to aggregate various metrics across thousands of individual streams and trigger alerts when those metrics were outside of normal range,” said Arkady Bari, a Principal Engineer at Comcast.

High-performance video capture and low-latency video playback are paramount for high customer satisfaction with the Xfinity Home Security system.

With the transition to Kinesis Video Streams, Comcast wanted to deploy a solution that would help them monitor performance and other run-time health information of their camera fleet at scale.

Following are some of the critical business requirements identified by Comcast to meet key SLA requirements:

  • Total number of cameras actively sending video frames to Kinesis Video Streams (KVS).
  • Health status of all cameras (hundreds of thousands), with an alarm triggered if more than a certain percentage of the cameras are not healthy at any given time.
  • Upload success rate of video frames across the whole fleet.
  • Playback metrics.

Approach

Comcast and AWS worked together to add new monitoring capabilities for Kinesis Video Streams: aggregated metrics vended by the Kinesis Video Streams service into Amazon CloudWatch.

The following aggregated metrics help provide deep performance insights critical to monitoring service quality:

  • Active connections (PutMedia.ActiveConnections)
  • Get HLS streaming session success rate (GetHLSStreamingSessionURL.Success)
  • Master manifest retrieval success rate (GetHLSMasterPlaylist.Success)
  • MP4 init retrieval success rate (GetMP4InitFragment.Success)
  • MP4 fragment retrieval success rate (GetMP4MediaFragment.Success)
  • p99 fragment upload success rate (PutMedia.Success)
  • p99 fragment ingestion latency (PutMedia.FragmentIngestionLatency)
  • p99 fragment persistence latency (PutMedia.FragmentPersistLatency)
  • p99 get HLS master manifest duration (GetHLSMasterPlaylist.Latency)
  • p99 MP4 init retrieval duration (GetMP4InitFragment.Latency)
  • p99 MP4 fragment retrieval duration (GetMP4MediaFragment.Latency)
  • p99 TS fragment retrieval duration (GetTSFragment.Latency)
  • p99 get HLS streaming session duration (GetHLSStreamingSessionURL.Latency)
  • Playlist manifest retrieval success rate (GetHLSMediaPlaylist.Success)
  • TS fragment retrieval success rate (GetTSFragment.Success)
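
As a rough illustration of how metrics like these could be read programmatically, the following Python sketch builds the MetricDataQueries list for the CloudWatch GetMetricData API. It assumes the aggregated metrics live in the AWS/KinesisVideo namespace and carry no dimensions (in line with the "Metrics with no dimensions" view described later in this post). The helper name, the subset of metrics, the statistic chosen for each metric, and the 60-second period are illustrative assumptions, not Comcast's configuration.

```python
NAMESPACE = "AWS/KinesisVideo"  # assumed namespace for Kinesis Video Streams metrics

# Metric name -> statistic to request. The statistics are assumptions based on
# how the post labels each metric (success rates vs. p99 latencies).
AGGREGATED_METRICS = {
    "PutMedia.Success": "Average",
    "PutMedia.FragmentIngestionLatency": "p99",
    "PutMedia.FragmentPersistLatency": "p99",
    "GetHLSStreamingSessionURL.Success": "Average",
    "GetHLSMasterPlaylist.Latency": "p99",
}


def build_queries(metrics, period_seconds=60):
    """Build a MetricDataQueries list for account-level (dimensionless) metrics."""
    queries = []
    for index, (name, stat) in enumerate(sorted(metrics.items())):
        queries.append({
            # Query IDs must start with a lowercase letter.
            "Id": f"m{index}",
            "Label": name,
            "MetricStat": {
                # No "Dimensions" key: this targets the aggregated metric.
                "Metric": {"Namespace": NAMESPACE, "MetricName": name},
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


queries = build_queries(AGGREGATED_METRICS)
```

With boto3 installed and credentials configured, this list could be passed to the CloudWatch client's get_metric_data call as MetricDataQueries, together with StartTime and EndTime values.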

Stream metrics for total number of cameras

On the Amazon CloudWatch console, on the All Metrics tab, you can see individual Kinesis video stream metrics under Stream Metrics, as shown in the following screenshot.

If you search for a relevant Kinesis Video Stream name, you can look at individual metrics for that stream, as shown in the following screenshot.

The following screenshot shows p99 for PutMedia.ActiveConnections for a sample stream.
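
The same per-stream view can be approximated in code. This minimal sketch builds a single GetMetricData query scoped to one stream, assuming the per-stream metrics use a StreamName dimension in the AWS/KinesisVideo namespace. The function name and the stream name are hypothetical.

```python
def per_stream_query(stream_name, metric_name="PutMedia.ActiveConnections",
                     stat="p99", period_seconds=60):
    """Build one GetMetricData query scoped to a single video stream."""
    return {
        "Id": "stream_metric",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/KinesisVideo",
                "MetricName": metric_name,
                # The dimension restricts the query to one stream.
                "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
            },
            "Period": period_seconds,
            "Stat": stat,
        },
        "ReturnData": True,
    }


query = per_stream_query("front-door-camera")  # hypothetical stream name
```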

In large-scale video stream deployments like the one here, you might have thousands of devices sending streams to Kinesis Video Streams at any given moment. To see the active number of connections for all the streams at the AWS account level, you can use aggregated KVS metrics. To see aggregated KVS metrics within CloudWatch, select Metrics with no dimensions, as shown in the following screenshot.

On the next screen, select the relevant metric. For example, the aggregated metric PutMedia.ActiveConnections gives you the approximate number of client connections (cameras and other devices) that are continuously sending video fragments to Kinesis Video Streams, as shown in the following screenshot.
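
One way to turn this metric into the fleet health alarm from the business requirements is a metric math alarm that compares active connections against the expected fleet size. The sketch below only builds the keyword arguments for the CloudWatch put_metric_alarm API. It rests on several assumptions: that an active connection is a reasonable proxy for a healthy camera, that the expected fleet size is known and supplied by you, and that the Average statistic is appropriate for the aggregated metric. The alarm name and the numbers are hypothetical.

```python
def unhealthy_percent(active_connections, expected_cameras):
    """Share of the expected fleet that is not currently connected, in percent."""
    if expected_cameras <= 0:
        raise ValueError("expected_cameras must be positive")
    missing = max(expected_cameras - active_connections, 0)
    return 100.0 * missing / expected_cameras


def fleet_health_alarm(expected_cameras, max_unhealthy_percent, stat="Average"):
    """Build put_metric_alarm keyword arguments using a metric math expression."""
    return {
        "AlarmName": "kvs-fleet-unhealthy-percent",  # hypothetical name
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": float(max_unhealthy_percent),
        "EvaluationPeriods": 3,
        "TreatMissingData": "breaching",  # no data at all is treated as unhealthy
        "Metrics": [
            {
                "Id": "connections",
                "MetricStat": {
                    "Metric": {"Namespace": "AWS/KinesisVideo",
                               "MetricName": "PutMedia.ActiveConnections"},
                    "Period": 60,
                    "Stat": stat,
                },
                "ReturnData": False,  # input to the expression only
            },
            {
                "Id": "unhealthy",
                # Same arithmetic as unhealthy_percent, without the clamp at zero.
                "Expression": f"100 * ({expected_cameras} - connections) / {expected_cameras}",
                "Label": "Percent of expected cameras not connected",
                "ReturnData": True,
            },
        ],
    }


alarm = fleet_health_alarm(expected_cameras=200_000, max_unhealthy_percent=5)
```

The pure-Python unhealthy_percent function mirrors the alarm expression so the arithmetic can be checked locally before the alarm is created.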

Success rate of frame uploads across the entire fleet of devices

To keep customers satisfied, Comcast had to ensure that the cameras could send video frames without any frame loss. To track this performance goal, you can use the aggregated CloudWatch metric PutMedia.Success, which gives the success rate of frame uploads across the entire fleet of devices, as shown in the following screenshot.
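
An alarm on this metric might look like the following sketch, which builds keyword arguments for the CloudWatch put_metric_alarm API. It assumes the Average statistic of PutMedia.Success is reported as a fraction between 0 and 1; if your graphs show a 0 to 100 scale, adjust the threshold accordingly. The alarm name, evaluation settings, and optional SNS topic are hypothetical.

```python
def success_rate_alarm(min_success_rate, sns_topic_arn=None):
    """Build put_metric_alarm keyword arguments for fleet-wide upload success."""
    if not 0.0 < min_success_rate <= 1.0:
        raise ValueError("min_success_rate must be in the range (0, 1]")
    params = {
        "AlarmName": "kvs-putmedia-success-low",  # hypothetical name
        "Namespace": "AWS/KinesisVideo",
        "MetricName": "PutMedia.Success",
        # No "Dimensions" key: the alarm watches the account-level metric.
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,  # alarm when 3 of the last 5 periods breach
        "Threshold": min_success_rate,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "missing",
    }
    if sns_topic_arn:
        # Notify a topic only when the caller provides one.
        params["AlarmActions"] = [sns_topic_arn]
    return params


alarm = success_rate_alarm(0.99)
```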

Video ingestion latency and customer playback metrics

You can track aggregated video ingestion latency using the p99 values of the PutMedia.FragmentIngestionLatency and PutMedia.FragmentPersistLatency metrics, as shown in the following graph.
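
Once p99 values have been fetched, they can be compared against latency budgets. The sketch below works on a dictionary shaped like a GetMetricData response (MetricDataResults entries with Label, Timestamps, and Values). The sample response, the budgets, and the assumption that latencies are in milliseconds are made up for illustration and are not Comcast's data or thresholds.

```python
from datetime import datetime, timezone


def latest_values(response):
    """Map each result label to the value of its most recent datapoint."""
    latest = {}
    for result in response.get("MetricDataResults", []):
        points = list(zip(result["Timestamps"], result["Values"]))
        if points:
            # Do not rely on the response ordering; pick the newest timestamp.
            latest[result["Label"]] = max(points, key=lambda p: p[0])[1]
    return latest


def over_budget(latest, budgets_ms):
    """Return the sorted names of metrics whose latest value exceeds its budget."""
    return sorted(name for name, value in latest.items()
                  if value > budgets_ms.get(name, float("inf")))


t0 = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2020, 1, 1, 0, 1, tzinfo=timezone.utc)

# Hand-written sample shaped like a GetMetricData response (not real data).
sample_response = {
    "MetricDataResults": [
        {"Label": "PutMedia.FragmentIngestionLatency",
         "Timestamps": [t1, t0], "Values": [850.0, 400.0]},
        {"Label": "PutMedia.FragmentPersistLatency",
         "Timestamps": [t0, t1], "Values": [1200.0, 2600.0]},
    ]
}

budgets_ms = {  # hypothetical p99 budgets, assumed to be in milliseconds
    "PutMedia.FragmentIngestionLatency": 1000.0,
    "PutMedia.FragmentPersistLatency": 2000.0,
}

breaches = over_budget(latest_values(sample_response), budgets_ms)
```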

Using the preceding steps, Comcast created a CloudWatch dashboard that brings together all the relevant metrics their operations teams use to track the performance and health of the solution.

The following screenshots show an example of a CloudWatch dashboard for Kinesis Video Streams generated with sample data.

The following screenshot shows the CloudWatch dashboard that the Comcast operations team uses to monitor their camera fleet.
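
A dashboard along these lines can also be defined in code. The following sketch generates a dashboard body as JSON using the CloudWatch dashboard widget structure, with metric entries that name only the namespace and metric so that the account-level aggregated metrics are selected. The widget layout, titles, Region, and statistics are assumptions for illustration and do not reproduce Comcast's dashboard.

```python
import json


def metric_widget(title, metric_names, stat, x, y, region="us-east-1"):
    """Build one metric widget; the dashboard grid is 24 units wide."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "view": "timeSeries",
            "region": region,
            "stat": stat,
            "period": 60,
            # [namespace, metric name] with no dimension pairs after it.
            "metrics": [["AWS/KinesisVideo", name] for name in metric_names],
        },
    }


def dashboard_body(region="us-east-1"):
    """Return the JSON string expected by the put_dashboard DashboardBody argument."""
    widgets = [
        metric_widget("Active connections",
                      ["PutMedia.ActiveConnections"], "Average", 0, 0, region),
        metric_widget("Fragment upload success",
                      ["PutMedia.Success"], "Average", 12, 0, region),
        metric_widget("Ingestion latency (p99)",
                      ["PutMedia.FragmentIngestionLatency",
                       "PutMedia.FragmentPersistLatency"], "p99", 0, 6, region),
        metric_widget("HLS playback success",
                      ["GetHLSStreamingSessionURL.Success",
                       "GetHLSMasterPlaylist.Success",
                       "GetHLSMediaPlaylist.Success"], "Average", 12, 6, region),
    ]
    return json.dumps({"widgets": widgets})


body = dashboard_body()
```

With boto3, the resulting string could be supplied as the DashboardBody argument of the CloudWatch client's put_dashboard call, alongside a DashboardName of your choosing.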

Comcast was able to deliver a high-quality security solution to their customers while meeting the performance goals and SLAs set for successful operation of the service.

Summary

This post demonstrated how Comcast worked closely with AWS to set up monitoring on Kinesis Video Streams, which helped them deliver a high-performing, observable large-scale security solution to their customers that met critical performance requirements. Amazon Kinesis Video Streams makes it easy to securely stream video from connected devices to AWS for analytics, machine learning (ML), playback, and other processing. When you use it in conjunction with CloudWatch, you can also easily set up monitoring to track performance metrics at a large scale.

About the Authors

Amit Kalawat is a Senior Solutions Architect at Amazon Web Services based out of New York. He works with enterprise customers as they transform their business and journey to the cloud.

Arkady Bari is a Principal Engineer II at Comcast Digital Home and the tech lead on the migration of Comcast's proprietary Continuous Video Recording solution to Kinesis Video Streams.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

from AWS Management & Governance Blog: https://aws.amazon.com/blogs/mt/understanding-amazon-kinesis-video-streams-behavior-using-amazon-cloudwatch-aggregated-metrics/