This post provides a blueprint for a fault-tolerant network between your ground networks and your Amazon Virtual Private Cloud (Amazon VPC) for 24/7 broadcast environments. The design guide is vendor-neutral for on-premises networking and covers three areas:
- On-premises networking
- Amazon VPC
- AWS Direct Connect
From the time the first pair of SMPTE 2022-7 packets leaves the contribution encoder (referred to throughout this document as “the sender”) until the packets reach their destination (referred to throughout this document as “the receiver”), we need to influence the path so that the two copies never traverse the same network infrastructure. We can control this through network design, from the first hop at the access layer to the edge of the VPC.
To create diverse paths at the access layer, the sender needs to be configured with two interfaces, each in its own subnet and VLAN. Each of the sender’s interfaces needs to be patched into a separate access-layer device: one into a “path A” first-hop router and the other into a “path B” first-hop router.
Static routes may be required to direct the flows to the appropriate first-hop router. In this example workflow, the two flow destinations are in subnets 10.10.92.0/23 and 10.10.94.0/23. To direct one flow to each first-hop router, the following static routes need to be configured:
- 10.10.92.0/23 via 10.0.10.1 on device eth1
- 10.10.94.0/23 via 10.0.20.1 on device eth2
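The intent of these two routes can be checked with a short longest-prefix-match sketch. It is a model only, not sender configuration: the `STATIC_ROUTES` table and the `next_hop_for` helper are hypothetical names, and the sample destination addresses are assumed to be receiver interfaces inside the two example subnets.

```python
import ipaddress

# Static routes from the example: (destination, first-hop router, sender interface).
STATIC_ROUTES = [
    (ipaddress.ip_network("10.10.92.0/23"), "10.0.10.1", "eth1"),  # path A
    (ipaddress.ip_network("10.10.94.0/23"), "10.0.20.1", "eth2"),  # path B
]


def next_hop_for(destination):
    """Return (gateway, interface) for the most specific matching route, or None."""
    address = ipaddress.ip_address(destination)
    matches = [route for route in STATIC_ROUTES if address in route[0]]
    if not matches:
        return None
    _, gateway, interface = max(matches, key=lambda route: route[0].prefixlen)
    return gateway, interface


# Hypothetical receiver addresses, one in each destination subnet.
print(next_hop_for("10.10.92.25"))
print(next_hop_for("10.10.95.25"))
```

Each destination should resolve to a different first-hop router and a different sender interface. If both resolve to the same gateway, the flows are not diverse at the access layer.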
Between the first-hop routers and the network’s edge, be sure that the paths meet the following criteria:
- Physical diversity – the two paths should never transit the same routers or fiber.
- Undersubscription – the total bandwidth of traffic expected on a path should not exceed the capacity of the smallest link on that path.
- Pre-determined – the path taken by each flow should be pre-determined and verifiable, even if you use a dynamic routing protocol. For this reason, using overlay networking or tunneling is discouraged as it significantly complicates meeting these criteria and future troubleshooting.
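The first two criteria can be audited against a documented inventory of each path. The following is a minimal sketch, assuming you keep a list of every router and fiber span per path. The device names, fiber names, and traffic figures are hypothetical.

```python
def shared_elements(path_a, path_b):
    """Return the routers or fiber spans that appear in both paths."""
    return set(path_a) & set(path_b)


def is_undersubscribed(expected_mbps, link_capacities_mbps):
    """True when the expected traffic fits within the smallest link on the path."""
    return expected_mbps <= min(link_capacities_mbps)


# Hypothetical inventory: every hop and fiber span between first-hop router and edge.
PATH_A = ["access-a", "fiber-a1", "core-a", "fiber-a2", "edge-a"]
PATH_B = ["access-b", "fiber-b1", "core-b", "fiber-b2", "edge-b"]

# An empty set means the two paths share no documented element.
print(shared_elements(PATH_A, PATH_B))
# 4 Gbps of expected traffic on a path built from 10-Gbps links.
print(is_undersubscribed(4000, [10000, 10000]))
```

A check like this is only as good as the inventory behind it, which is one reason the pre-determined criterion matters: a path that cannot be written down cannot be audited.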
Similar to the ground network configuration, the VPC should have two distinct Classless Inter-Domain Routing (CIDR) blocks. It is critical to understand that this is different from creating subnets. Direct Connect Gateways advertise entire CIDR blocks, not individual subnets. You need to configure Border Gateway Protocol (BGP) import filters later on based on the CIDR blocks, or else your Direct Connects won’t have path diversity. Additionally, it is important to work with your AWS Solutions Architect (AWS SA) and your network carriers to guarantee that path diversity exists from your location to and from AWS. We have specific AWS Direct Connect Point of Presence (POP) recommendations for guaranteed diversity to associated Regions. Please connect with your AWS SA to discuss how to build diverse systems.
Configured properly, this allows you to create Amazon Elastic Compute Cloud (Amazon EC2) instances or AWS Elemental Media Services configurations with two elastic network interfaces in the same Availability Zone but diverse CIDR blocks.
In the AWS Management Console, create a new VPC. Initially, the wizard lets you add only a single CIDR block. After the VPC is created, select it and, in the “Actions” menu, choose Edit CIDRs to add the second CIDR block.
Once you save this configuration, you see “2 CIDRs” on your VPC overview page.
Next, create a subnet in each Availability Zone (AZ) in each CIDR. For a Region that has four AZs, you end up with eight subnets.
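The subnet count is easy to plan ahead of time. The sketch below assumes the two example CIDR blocks, a Region with four AZs, and an even split of each /23 into /25 subnets. The AZ names and the /25 size are illustrative assumptions, not a recommendation.

```python
import ipaddress

CIDR_BLOCKS = ["10.10.92.0/23", "10.10.94.0/23"]
# Hypothetical AZ names for a four-AZ Region.
AVAILABILITY_ZONES = ["az-a", "az-b", "az-c", "az-d"]


def plan_subnets(cidr_blocks, zones, new_prefix=25):
    """Return (cidr_block, zone, subnet) tuples, one subnet per AZ per CIDR block."""
    plan = []
    for block in cidr_blocks:
        subnets = list(ipaddress.ip_network(block).subnets(new_prefix=new_prefix))
        if len(subnets) < len(zones):
            raise ValueError(f"{block} cannot supply a /{new_prefix} for every zone")
        for zone, subnet in zip(zones, subnets):
            plan.append((block, zone, str(subnet)))
    return plan


for entry in plan_subnets(CIDR_BLOCKS, AVAILABILITY_ZONES):
    print(entry)
```

With two CIDR blocks and four AZs, the plan contains eight entries, and every AZ has one subnet on each diverse path.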
Create a virtual private gateway and attach it to your VPC. For most customers, leaving the default Amazon-side Autonomous System Number (ASN) is fine.
Finally, locate the routing table for your VPC and enable propagation from your virtual private gateway.
At this point, your VPC is configured and ready. Before launching instances, you may want to stage additional configuration for your specific needs, including modifying DHCP options sets, or staging security groups to be used later.
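If you prefer to script these console steps, they map onto a handful of Amazon EC2 API calls. The following is a rough sketch rather than a finished tool: `build_diverse_vpc` is a hypothetical helper, the `ec2` argument is assumed to behave like `boto3.client("ec2")`, and waiting for resources to become available and error handling are left out.

```python
def build_diverse_vpc(ec2, primary_cidr, secondary_cidr, subnet_plan):
    """Create a two-CIDR VPC, its subnets, and a virtual private gateway.

    `ec2` is expected to behave like boto3.client("ec2").
    `subnet_plan` is a list of (availability_zone, subnet_cidr) tuples.
    A production script would also wait for the VPC and the CIDR association
    to become available before creating subnets; that is omitted here.
    """
    vpc_id = ec2.create_vpc(CidrBlock=primary_cidr)["Vpc"]["VpcId"]
    # The second CIDR block can only be associated after the VPC exists.
    ec2.associate_vpc_cidr_block(VpcId=vpc_id, CidrBlock=secondary_cidr)

    subnet_ids = []
    for zone, cidr in subnet_plan:
        response = ec2.create_subnet(
            VpcId=vpc_id, AvailabilityZone=zone, CidrBlock=cidr
        )
        subnet_ids.append(response["Subnet"]["SubnetId"])

    # Omitting AmazonSideAsn keeps the default Amazon-side ASN.
    gateway = ec2.create_vpn_gateway(Type="ipsec.1")
    vgw_id = gateway["VpnGateway"]["VpnGatewayId"]
    ec2.attach_vpn_gateway(VpcId=vpc_id, VpnGatewayId=vgw_id)

    # Enable route propagation on the VPC's main route table.
    tables = ec2.describe_route_tables(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "association.main", "Values": ["true"]},
        ]
    )["RouteTables"]
    route_table_id = tables[0]["RouteTableId"]
    ec2.enable_vgw_route_propagation(GatewayId=vgw_id, RouteTableId=route_table_id)

    return {
        "vpc_id": vpc_id,
        "subnet_ids": subnet_ids,
        "vgw_id": vgw_id,
        "route_table_id": route_table_id,
    }
```

Passing the client in as an argument keeps the function easy to exercise against a stub before you run it against a real account.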
AWS Direct Connect
The final piece of the design is to connect the properly configured ground and cloud networks. Generally speaking, the design will adhere to the AWS Direct Connect Resiliency Recommendations guide, Maximum Resiliency option. In addition to this, selecting the right Direct Connect POPs where AWS guarantees path diversity on top of resiliency is important.
Work with your AWS Account Manager or Solutions Architect to determine which two Direct Connect locations are geographically proximate to your location and can provide path diversity (zero shared physical infrastructure between the POP and the AWS data center).
Once you select two Direct Connect locations, source service providers for the point-to-point circuits to connect your on-premises routers or firewalls to the Direct Connect handoff. Ensure that these service providers do not share physical pathways to avoid total loss of connectivity in the event of a fiber cut or similar event. Note that the process of securing these circuits often takes several months from initial quote to completed construction.
You can now begin configuring your Direct Connects in the AWS Management Console. The service providers need the Letter of Authorization generated in these steps to complete the connection.
In the Direct Connect service in your console, create a new Maximum Resiliency connection with the connection wizard.
Select the bandwidth for each site, the locations to be used, and the service providers you selected. Note that “Bandwidth” is the speed of a single link at each location. The Maximum Resiliency model creates two connections at each of the two locations, so selecting 10 Gbps in this example creates four 10-Gbps links. The bandwidth you select should exceed the total bandwidth required for one of the diverse paths of your 2022-7 flows.
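The sizing rule is simple arithmetic: add up the bitrates of every flow assigned to one path and pick a link speed that exceeds the total. The sketch below assumes a hypothetical workload of twenty 250-Mbps flows, and the list of candidate speeds is supplied by the caller as an assumption rather than taken from a service catalog.

```python
def smallest_sufficient_speed(flow_rates_mbps, candidate_speeds_mbps):
    """Return the lowest candidate link speed that exceeds the path's total traffic."""
    required = sum(flow_rates_mbps)
    for speed in sorted(candidate_speeds_mbps):
        if speed > required:
            return speed
    return None  # No single candidate link can carry the path.


# Hypothetical workload: twenty contribution flows at 250 Mbps each on one path.
flows_on_one_path = [250] * 20
# Assumed candidate speeds in Mbps; confirm what is offered at your locations.
candidates = [1000, 10000, 100000]

print(sum(flows_on_one_path))
print(smallest_sufficient_speed(flows_on_one_path, candidates))
```

Size against a single path, not against the combined capacity of all four links, because each 2022-7 copy must fit on its own path when the other one is down.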
Finally, review your configuration choices and complete the order.
The next screen allows you to download all of the Letters of Authorization (LoA) to provide to the service providers.
Hand off the LoAs to the appropriate service providers and wait for them to complete your cross connections.
Once complete, configure your ground-to-cloud connectivity.
First, create a Direct Connect Gateway. This is the object that connects your four Direct Connects to the VPC.
Next, associate the Direct Connect Gateway with the Virtual Private Gateway you attached to your VPC.
Next, configure private virtual interfaces for each of the four Direct Connects. In the configuration settings, ensure they are connected to the gateway you just created.
In the additional settings, manually select the IP addresses for the virtual interface and for the corresponding interface on your edge device, and manually set the BGP authentication key. For IP address management purposes, choosing your own IPs is recommended.
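One way to keep the peering addresses manageable is to carve them from a single block you reserve for this purpose. The following sketch assumes one /30 per virtual interface, with the first host address on the AWS side and the second on your edge device. The supernet, the connection names, and that ordering are all hypothetical choices. The dictionary keys mirror the `amazonAddress` and `customerAddress` fields used by the Direct Connect API.

```python
import ipaddress


def plan_peering_addresses(supernet, connection_names):
    """Assign one /30 per virtual interface from a reserved block."""
    blocks = list(ipaddress.ip_network(supernet).subnets(new_prefix=30))
    if len(blocks) < len(connection_names):
        raise ValueError("supernet is too small for the number of connections")
    plan = {}
    for name, block in zip(connection_names, blocks):
        amazon, customer = list(block.hosts())  # a /30 has exactly two host addresses
        plan[name] = {
            "amazonAddress": f"{amazon}/{block.prefixlen}",
            "customerAddress": f"{customer}/{block.prefixlen}",
        }
    return plan


# Hypothetical reserved block and connection names for the four Direct Connects.
plan = plan_peering_addresses(
    "169.254.100.0/28", ["site1-a", "site1-b", "site2-a", "site2-b"]
)
for name, addresses in plan.items():
    print(name, addresses)
```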
Download a sample configuration for common vendors from the Actions menu, or configure BGP peering yourself based on the values shown here.
When the BGP session is configured successfully, the BGP status flips from down to up. Note that there may be a delay of several minutes before the status updates.
Finally, you are ready to enforce path diversity over your Direct Connects. In order to do this, your edge routers must filter the incoming BGP advertisements from the Direct Connects. As illustrated in the following diagram, Customer Edge Firewall 1 receives advertisements for both CIDR blocks, 10.10.92.0/23 and 10.10.94.0/23. Configure it to import 10.10.92.0/23, but reject 10.10.94.0/23. To enforce path diversity in the other direction, Customer Edge Firewall 1 must advertise only one of the sender’s subnets, in this example 10.0.10.0/24.
Configure the other edge device to import the remaining routes and advertise the other sender subnet.
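Because the actual filter syntax depends on your edge vendor, the policy is easier to reason about as data first. The sketch below models it in Python. The device names are hypothetical, and the second sender subnet, 10.0.20.0/24, is an assumption inferred from the 10.0.20.1 first-hop address used earlier.

```python
# Per-device policy: which VPC CIDR block to import, which sender subnet to advertise.
POLICY = {
    "edge-1": {"import": {"10.10.92.0/23"}, "advertise": {"10.0.10.0/24"}},
    "edge-2": {"import": {"10.10.94.0/23"}, "advertise": {"10.0.20.0/24"}},  # assumed subnet
}

# Both CIDR blocks arrive on every Direct Connect BGP session.
RECEIVED = ["10.10.92.0/23", "10.10.94.0/23"]


def apply_import_filter(device, received_prefixes):
    """Keep only the prefixes the device's policy allows it to import."""
    allowed = POLICY[device]["import"]
    return [prefix for prefix in received_prefixes if prefix in allowed]


def policy_is_diverse(policy):
    """True when no prefix is imported or advertised by more than one device."""
    for direction in ("import", "advertise"):
        prefixes = [p for rules in policy.values() for p in rules[direction]]
        if len(prefixes) != len(set(prefixes)):
            return False
    return True


print(apply_import_filter("edge-1", RECEIVED))
print(apply_import_filter("edge-2", RECEIVED))
print(policy_is_diverse(POLICY))
```

If the same prefix shows up in the import or advertise set of both devices, traffic for that prefix can take either path, and the two 2022-7 copies are no longer guaranteed to stay apart.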
Inspect the routing tables on the edge devices and in your VPC to confirm that all of the routes are propagating as expected.
If the edge devices and your VPC routing table show all of the expected routes, your configuration is complete and you can begin failover testing and (finally!) production workflows in the environment.
Additional notes on the workflow
From a network engineering perspective, it can seem counterintuitive to intentionally reduce the number of available paths in an effort to guarantee diversity. In practice, however, this can be preferable. If both copies of the 2022-7 flow are routed through a device that shuts down ungracefully, both flows lose at least the packets that were on the wire at the time of the crash, in addition to any packets that cannot be buffered while the routing protocol selects the next-best path. Because both flows briefly disappear, the 2022-7 receiver is unable to recover the missing packets and there is a visible disruption to the content.
In this shared-device scenario, you may lose tens or hundreds of packets while the dynamic routing protocol recovers (more, if your routing protocol is not tuned properly). The visual interruption likely lasts only a few frames.
If the same failure occurred in the fully diverse network design, you would likely lose millions or tens of millions of packets on the affected path while it remains down, but there would be no visual disruption because the 2022-7 receiver can select any missing packets from the non-impacted flow.
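The difference between the two failure cases can be shown with a toy model of the receiver. This is a deliberate simplification: it treats each packet as a bare sequence number and ignores timing, buffering, and skew between the paths. The `unrecoverable` helper and the loss ranges are hypothetical.

```python
def unrecoverable(path_a_seqs, path_b_seqs, expected_seqs):
    """Return the sequence numbers that arrived on neither path."""
    received = set(path_a_seqs) | set(path_b_seqs)
    return [seq for seq in expected_seqs if seq not in received]


expected = range(100)

# Fully diverse design: path A is down for packets 40-79, path B is unaffected.
diverse_a = [seq for seq in expected if not 40 <= seq < 80]
diverse_b = list(expected)

# Shared device: both copies lose packets 40-44 at the same moment.
shared_a = [seq for seq in expected if not 40 <= seq < 45]
shared_b = list(shared_a)

print(len(unrecoverable(diverse_a, diverse_b, expected)))
print(len(unrecoverable(shared_a, shared_b, expected)))
```

The diverse case loses far more packets on one path yet leaves nothing unrecoverable, while the much shorter simultaneous loss leaves a gap the receiver cannot fill.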
In this post, we shared a blueprint for a fault-tolerant network between ground networks and your VPC for 24/7 broadcast environments. This design guide is vendor-neutral for on-premises networking.