AWS Hybrid Connectivity: Direct Connect, Site-to-Site VPN, and TGW Integration
Hybrid cloud is not a buzzword when your ERP still lives in a colo and your new checkout service runs in AWS. The networking problem is concrete: reliable, secure, routable connectivity between on-prem CIDRs and VPCs, with failover that doesn't require a bridge call at 2 AM.
This guide covers AWS Direct Connect (DX), Site-to-Site VPN, how they attach to Transit Gateway, and the BGP/route-policy decisions that separate a clean design from an outage waiting to happen.
The Three Hybrid Options
| Method | Throughput | Latency | Best for |
|---|---|---|---|
| Site-to-Site VPN | Up to ~1.25 Gbps per tunnel (pair for HA) | Internet-dependent | Quick start, backup path, remote sites |
| Direct Connect | 1 Gbps – 100 Gbps dedicated | Consistent, low | Production, bulk data, compliance |
| DX + VPN (backup) | DX primary, VPN failover | Best of both | What most enterprises actually run |
Private VIF vs. Public VIF on DX:
- Private VIF: BGP to your VPCs through a VGW, either directly or via a DX Gateway (reaching a TGW takes a transit VIF instead). This is your RFC 1918 traffic.
- Public VIF — reach AWS public endpoints (S3, DynamoDB public, etc.) without hairpinning through the internet from on-prem. Useful for S3-heavy pipelines.
Site-to-Site VPN to Transit Gateway
Modern pattern: VPN attaches to TGW, not the legacy Virtual Private Gateway (VGW) on a single VPC.
[ On-prem router ]══IPsec══► [ VPN attachment on TGW ]◄──► [ VPC spokes ]
Terraform sketch
resource "aws_customer_gateway" "onprem" {
  bgp_asn    = 65001          # on-prem BGP ASN
  ip_address = "203.0.113.10" # public IP of the on-prem VPN device
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "onprem" {
  transit_gateway_id  = aws_ec2_transit_gateway.hub.id
  customer_gateway_id = aws_customer_gateway.onprem.id
  type                = "ipsec.1"
  static_routes_only  = false # BGP: preferred at scale

  tunnel1_phase1_encryption_algorithms = ["AES256"]
  tunnel1_phase2_encryption_algorithms = ["AES256"]
}
Static vs. BGP: Static routes are fine for a single prefix. The moment you have more than a handful of CIDRs or need dynamic failover, use BGP. The ANS (Advanced Networking Specialty) exam will ask about BGP attributes on DX, and the same logic applies to VPN when using dynamic routing.
Tunnel redundancy
AWS provides two tunnels per VPN connection. On-prem must terminate both (active/standby or ECMP depending on vendor). A single-tunnel deployment is a study guide answer, not a production design.
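As a rough sketch of what "terminate both tunnels" could look like on the AWS side, the variant below configures tunnel 1 and tunnel 2 identically and exposes both outside addresses for the on-prem team. It assumes the `aws_ec2_transit_gateway.hub` and `aws_customer_gateway.onprem` resources from the sketch above; the resource name `onprem_ha`, the IKEv2-only setting, and the DPD action are illustrative assumptions, not recommendations for your hardware.

```hcl
# Variant of the earlier VPN connection with both tunnels configured the same way.
resource "aws_vpn_connection" "onprem_ha" {
  transit_gateway_id  = aws_ec2_transit_gateway.hub.id
  customer_gateway_id = aws_customer_gateway.onprem.id
  type                = "ipsec.1"
  static_routes_only  = false

  # Tunnel 1
  tunnel1_ike_versions                 = ["ikev2"]
  tunnel1_phase1_encryption_algorithms = ["AES256"]
  tunnel1_phase2_encryption_algorithms = ["AES256"]
  tunnel1_dpd_timeout_action           = "restart" # assumption: re-initiate after a DPD timeout

  # Tunnel 2 mirrors tunnel 1 so a failover does not change crypto behavior
  tunnel2_ike_versions                 = ["ikev2"]
  tunnel2_phase1_encryption_algorithms = ["AES256"]
  tunnel2_phase2_encryption_algorithms = ["AES256"]
  tunnel2_dpd_timeout_action           = "restart"
}

# Both AWS-side endpoints the on-prem device needs to peer with.
output "vpn_tunnel_outside_addresses" {
  value = [
    aws_vpn_connection.onprem_ha.tunnel1_address,
    aws_vpn_connection.onprem_ha.tunnel2_address,
  ]
}
```

Keeping the two tunnels symmetric is a design choice: the second tunnel then gets exercised with the same proposals as the first, which makes drift easier to spot.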
Direct Connect Architecture
[ On-prem ]──DX circuit──► [ DX Location ]──► [ DX Gateway ]──► [ TGW or VGW ]
Components:
- DX connection — physical port at a DX location (you or a partner cross-connects)
- Virtual Interface (VIF) — VLAN on that port; BGP session
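- DX Gateway: global resource that sits between VIFs and AWS gateways. A private VIF reaches VGWs in one or more Regions through it, and a transit VIF reaches TGWs through it (without a DX Gateway, a private VIF only reaches a VGW in the DX location's Region)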
- TGW association — propagates on-prem routes to VPC attachments
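The components above might be wired together roughly as follows. This is a sketch under stated assumptions: `var.dx_connection_id`, the VLAN, the names, the ASNs, and the `10.0.0.0/8` summary are all placeholders, and it uses a transit VIF because that is the VIF type that pairs with a TGW. The two Amazon-side ASNs are deliberately different, since a DX Gateway and the TGW it associates with cannot share one.

```hcl
variable "dx_connection_id" {
  description = "ID of an existing DX connection (placeholder; not created here)"
  type        = string
}

resource "aws_ec2_transit_gateway" "hub" {
  amazon_side_asn = 64513 # assumption: must differ from the DX Gateway ASN
}

resource "aws_dx_gateway" "hub" {
  name            = "hybrid-dxgw"
  amazon_side_asn = "64512"
}

# VLAN on the physical port, carrying the BGP session to the DX Gateway.
resource "aws_dx_transit_virtual_interface" "primary" {
  connection_id  = var.dx_connection_id
  dx_gateway_id  = aws_dx_gateway.hub.id
  name           = "transit-vif-primary"
  vlan           = 100
  address_family = "ipv4"
  bgp_asn        = 65001 # on-prem ASN, matching the customer gateway sketch
}

# Associates the DX Gateway with the TGW; allowed_prefixes is what
# AWS advertises toward on-prem over the transit VIF.
resource "aws_dx_gateway_association" "tgw" {
  dx_gateway_id         = aws_dx_gateway.hub.id
  associated_gateway_id = aws_ec2_transit_gateway.hub.id
  allowed_prefixes      = ["10.0.0.0/8"]
}
```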
BGP on Direct Connect
BGP session parameters (typical):
- AWS-side ASN: 64512 (varies; check the router configuration for your VIF)
- BGP auth (MD5): supply your own key, or AWS generates one for the VIF
You advertise to AWS:
- Only the prefixes you intend AWS to reach — never leak your ISP's full table
- Use AS_PATH prepending on backup DX or backup VPN to influence inbound path selection
You receive from AWS:
- VPC CIDRs (and routes from other VPCs attached to TGW if propagated)
Local Preference on your edge router controls whether DX or VPN carries outbound traffic. Common: prefer DX (higher local-pref), VPN as backup.
DX + VPN Failover Pattern
- Primary: DX transit VIF → DX Gateway → TGW → spokes
- Backup: VPN → same TGW → same spokes
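- TGW route table: the same on-prem prefixes are learned via both attachments. For equal-length prefixes, TGW prefers routes propagated from the DX Gateway over routes propagated from VPN, so DX wins on the AWS side without extra tuning (a static route in the route table would override both, but BGP is cleaner)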
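One way to express "same prefixes via both attachments" is to propagate both the DX Gateway attachment and the VPN attachment into a single TGW route table. The sketch below assumes the `aws_ec2_transit_gateway.hub` and `aws_vpn_connection.onprem` resources from the earlier Terraform, plus a DX Gateway resource named `aws_dx_gateway.hub` that is already associated with the TGW; that name and the route table name `hybrid` are hypothetical.

```hcl
resource "aws_ec2_transit_gateway_route_table" "hybrid" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
}

# Looks up the attachment AWS creates when the DX Gateway is associated with the TGW.
data "aws_ec2_transit_gateway_dx_gateway_attachment" "dx" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  dx_gateway_id      = aws_dx_gateway.hub.id
}

# Primary path: on-prem routes learned over Direct Connect.
resource "aws_ec2_transit_gateway_route_table_propagation" "dx" {
  transit_gateway_attachment_id  = data.aws_ec2_transit_gateway_dx_gateway_attachment.dx.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.hybrid.id
}

# Backup path: the same on-prem routes learned over the VPN attachment.
resource "aws_ec2_transit_gateway_route_table_propagation" "vpn" {
  transit_gateway_attachment_id  = aws_vpn_connection.onprem.transit_gateway_attachment_id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.hybrid.id
}
```

If the TGW has default route table propagation enabled, attachments may already propagate into the default table, so an explicit table like this is mostly useful when you want the hybrid routes kept separate.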
When DX fails:
- BGP withdraws DX routes
- VPN routes remain
- Convergence depends on BGP timers: tune hold-time/keepalive with your provider; AWS defaults are conservative
Test failover quarterly. DX failures are rare; VPN backup with expired crypto profiles is common.
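Between quarterly tests, an alarm on the backup tunnels can catch a VPN that has quietly gone down. The following is a sketch, assuming the `aws_vpn_connection.onprem` resource from the earlier Terraform and a notification topic passed in as `var.alarm_topic_arn` (a hypothetical variable); the period and evaluation counts are arbitrary starting points.

```hcl
variable "alarm_topic_arn" {
  description = "SNS topic that receives alarm notifications (placeholder)"
  type        = string
}

# One alarm per tunnel, keyed by the tunnel's AWS-side outside address.
resource "aws_cloudwatch_metric_alarm" "vpn_tunnel_down" {
  for_each = {
    tunnel1 = aws_vpn_connection.onprem.tunnel1_address
    tunnel2 = aws_vpn_connection.onprem.tunnel2_address
  }

  alarm_name  = "backup-vpn-${each.key}-down"
  namespace   = "AWS/VPN"
  metric_name = "TunnelState" # 1 = up, 0 = down
  dimensions = {
    TunnelIpAddress = each.value
  }

  statistic           = "Minimum"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  treat_missing_data  = "breaching" # no data from a tunnel is treated as a failure
  alarm_actions       = [var.alarm_topic_arn]
}
```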
Security and Inspection
Hybrid traffic often must pass a firewall:
- On-prem: traffic hairpins through your edge firewall before DX — common for compliance
- In AWS: inspection VPC on TGW (see Transit Gateway article)
- MACsec: line-rate encryption on DX at Layer 2 — required for some financial workloads
Don't forget that security groups are stateful but NACLs are stateless: asymmetric hybrid paths fail at NACLs first when someone tightens a temporary rule at 5 PM on a Friday.
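Because NACLs are stateless, every allowed flow needs a rule in each direction. A minimal sketch of that pairing for HTTPS from on-prem, assuming a hypothetical NACL ID variable and an example on-prem CIDR of `10.100.0.0/16`; the rule numbers and port ranges are illustrative only.

```hcl
variable "workload_nacl_id" {
  description = "NACL on the workload subnets (placeholder)"
  type        = string
}

variable "onprem_cidr" {
  description = "On-prem address space (example value)"
  type        = string
  default     = "10.100.0.0/16"
}

# Inbound: on-prem clients reaching HTTPS in the VPC.
resource "aws_network_acl_rule" "onprem_https_in" {
  network_acl_id = var.workload_nacl_id
  rule_number    = 200
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.onprem_cidr
  from_port      = 443
  to_port        = 443
}

# Outbound: return traffic to the clients' ephemeral ports.
# A security group would allow this implicitly; a NACL does not.
resource "aws_network_acl_rule" "onprem_return_out" {
  network_acl_id = var.workload_nacl_id
  rule_number    = 200
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.onprem_cidr
  from_port      = 1024
  to_port        = 65535
}
```

Removing either rule breaks the flow, which is why a late-day "tightening" of only one direction tends to show up as a hybrid outage.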
Troubleshooting Hybrid Paths
| Symptom | Likely layer | Check |
|---|---|---|
| BGP up, no traffic | Route policy / prefix filter | show ip bgp, AWS route table |
| One AZ works | TGW ENI subnet / AZ affinity | Attachment subnets per AZ |
| Intermittent drops | MTU / fragmentation | ping -M do -s 1472 end-to-end |
| Slow after failover | VPN bandwidth ceiling | DX vs VPN capacity planning |
| DNS works, apps don't | Split-horizon DNS | Resolver endpoints, on-prem conditional forwarders |
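Route 53 Resolver endpoints cover both directions: inbound endpoints let on-prem resolvers query private hosted zones, and outbound endpoints forward VPC queries to on-prem DNS. Forgotten DNS is the silent killer of hybrid designs.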
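A sketch of both resolver directions in Terraform, assuming hypothetical variables for the VPC, subnets, security group, and on-prem DNS servers, and using `corp.example.com` as a stand-in for an on-prem zone. Resolver endpoints need IP addresses in at least two subnets, so `var.resolver_subnet_ids` is expected to hold two or more.

```hcl
variable "vpc_id" {
  type = string
}

variable "resolver_subnet_ids" {
  description = "Two or more subnets, ideally in different AZs"
  type        = list(string)
}

variable "resolver_sg_id" {
  description = "Security group allowing DNS (TCP/UDP 53) to and from on-prem"
  type        = string
}

variable "onprem_dns_ips" {
  description = "On-prem DNS servers (placeholder)"
  type        = list(string)
}

# Inbound: on-prem conditional forwarders point at these IPs.
resource "aws_route53_resolver_endpoint" "inbound" {
  name               = "hybrid-inbound"
  direction          = "INBOUND"
  security_group_ids = [var.resolver_sg_id]

  dynamic "ip_address" {
    for_each = var.resolver_subnet_ids
    content {
      subnet_id = ip_address.value
    }
  }
}

# Outbound: VPC queries for the on-prem zone leave through these IPs.
resource "aws_route53_resolver_endpoint" "outbound" {
  name               = "hybrid-outbound"
  direction          = "OUTBOUND"
  security_group_ids = [var.resolver_sg_id]

  dynamic "ip_address" {
    for_each = var.resolver_subnet_ids
    content {
      subnet_id = ip_address.value
    }
  }
}

resource "aws_route53_resolver_rule" "corp" {
  name                 = "forward-corp"
  domain_name          = "corp.example.com"
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id

  dynamic "target_ip" {
    for_each = var.onprem_dns_ips
    content {
      ip = target_ip.value
    }
  }
}

resource "aws_route53_resolver_rule_association" "corp" {
  resolver_rule_id = aws_route53_resolver_rule.corp.id
  vpc_id           = var.vpc_id
}
```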
# DX virtual interface state
aws directconnect describe-virtual-interfaces

# VPN tunnel status
aws ec2 describe-vpn-connections --vpn-connection-ids vpn-abc123

# Effective routes on TGW
aws ec2 search-transit-gateway-routes \
  --transit-gateway-route-table-id tgw-rtb-xxxx \
  --filters Name=type,Values=propagated,static
Cost and Capacity Planning
- DX: port hours + outbound data transfer (often cheaper than internet egress at scale)
- VPN: per-connection hourly charge + data transfer out, with no port fee; expensive at sustained Gbps
- Direct Connect Gateway itself carries no charge; TGW adds per-attachment hourly and per-GB data processing charges
Right-size: a 1 Gbps DX with 200 Mbps average utilization beats a 10 Gbps port you can't justify — but leave headroom for backup windows and DR failovers.
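To keep an eye on that headroom, one option is an alarm on sustained port utilization. This sketch assumes a 1 Gbps port, an 80% threshold, and a hypothetical `var.monitored_connection_id`; all three are placeholders to adjust, and the one-hour window is an arbitrary choice.

```hcl
variable "monitored_connection_id" {
  description = "DX connection to watch (placeholder)"
  type        = string
}

locals {
  port_bps       = 1000000000 # 1 Gbps port
  alarm_fraction = 0.8        # alert at 80% sustained utilization
}

resource "aws_cloudwatch_metric_alarm" "dx_egress_headroom" {
  alarm_name  = "dx-egress-above-80-percent"
  namespace   = "AWS/DX"
  metric_name = "ConnectionBpsEgress"
  dimensions = {
    ConnectionId = var.monitored_connection_id
  }

  statistic           = "Average"
  period              = 300
  evaluation_periods  = 12 # 12 x 5 minutes = one hour above threshold
  comparison_operator = "GreaterThanThreshold"
  threshold           = local.port_bps * local.alarm_fraction
}
```

A matching alarm on `ConnectionBpsIngress` would cover the other direction, which matters for backup windows that pull data into AWS.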
ANS Exam Quick Hits
- VPN supports IPv4 and IPv6 (know tunnel inside/outside addressing)
- DX connection vs. hosted connection vs. hosted VIF (partner model)
- Transit VIF connects DX to TGW (not the older VGW-only model)
- Accelerated Site-to-Site VPN (if still in curriculum) — performance option over standard VPN
- PrivateLink is not a hybrid connectivity replacement — it's consumer-to-service without VPC peering
Related Reading
- AWS Transit Gateway — hub-and-spoke and inspection VPC
- VPC Fundamentals — CIDR, subnets, route tables
- BGP Notes — path attributes and troubleshooting
- Troubleshooting L1 to L7 — systematic debugging when hybrid paths fail