Versatile Routing and Services with BGP: Understanding and Implementing BGP in SR-OS (2014)
Appendix A. Path Selection Process
This appendix explains the BGP path selection process and provides some additional detail about MED comparison based on parameter settings.
Best-Path Selection Algorithm
The BGP path decision process compares paths to the same destination prefix that are held in the Adj-RIB-In and defines a degree of preference for a path (or paths), which in turn is advertised to peers (subject to Adj-RIB-Out policy).
If the Next-Hop attribute of a BGP route is an address that is not reachable (resolvable), the route is not considered as part of the decision process. The process follows these steps:
1. Select the route based on the protocol it was learned from. In SR-OS this is indicated as preference, and the route learned through the protocol with the lowest preference value is considered the best. (Note that IBGP and EBGP both have a preference value of 170.)
2. Select the route with the highest Local Preference (LOCAL-PREF attribute).
3. Select the route with the fewest Autonomous Systems in its AS_PATH attribute (unless as-path-ignore is configured). An AS_SET counts as one AS.
4. Select the route with the lowest ORIGIN attribute (where IGP = 0, EGP = 1, and incomplete = 2).
5. Select the route with the lowest MED value if one of the following applies:
i. Both routes have the MED attribute and were advertised by the same neighbor AS (the leftmost AS in the AS_PATH).
ii. Both routes have the MED attribute and were advertised by different neighbor ASs, but always-compare-med is configured without the strict-as option.
iii. One or both routes do not have the MED attribute, but always-compare-med is configured with a value (zero or infinity) to assume for routes that lack the attribute and, if strict-as is also configured, both routes come from the same neighbor AS.
6. Select a route learned from an EBGP peer over a route learned from an IBGP peer.
7. Select the route with the lowest IGP distance to the BGP Next-Hop of the route (unless ignore-nh-metric is configured). If the BGP Next-Hop is resolved by an LSP (for example, IGP shortcuts or BGP-VPN routes), the cost from the tunnel-table is used.
8. Select the route received from the peer with the lowest BGP Identifier, where the ORIGINATOR_ID is used in place of the BGP Identifier if the route carries that attribute (unless ignore-router-id is configured and the routes being compared are EBGP routes).
9. Select the route with the shortest CLUSTER_LIST. An empty cluster list is considered to have a length of 0.
10. Select the route received from the lowest peer IP address.
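The steps above can be sketched as a pairwise comparison function. The following Python fragment is a minimal illustration under stated assumptions, not the SR-OS implementation: the `Route` fields and helper names are hypothetical, AS_SETs and the as-path-ignore, ignore-nh-metric, and ignore-router-id options are not modeled, and step 5 applies only the default MED rule (always-compare-med disabled).

```python
from dataclasses import dataclass, field
from functools import cmp_to_key
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class Route:
    """Hypothetical per-path record for one prefix; not an SR-OS data structure."""
    preference: int = 170                 # step 1: protocol preference (BGP = 170)
    local_pref: int = 100                 # step 2
    as_path: List[int] = field(default_factory=list)  # step 3 (AS_SETs not modeled)
    origin: int = 0                       # step 4: IGP=0, EGP=1, incomplete=2
    med: Optional[int] = None             # step 5: None means no MED attribute
    ebgp: bool = False                    # step 6
    igp_cost: int = 0                     # step 7: IGP cost to the BGP next hop
    router_id: IPv4Address = IPv4Address("0.0.0.0")  # step 8: ORIGINATOR_ID or peer BGP ID
    cluster_list_len: int = 0             # step 9
    peer_ip: IPv4Address = IPv4Address("0.0.0.0")    # step 10
    nh_resolvable: bool = True            # unresolvable next hop: excluded up front


def neighbor_as(r: Route) -> Optional[int]:
    # Leftmost AS in AS_PATH; None stands for the local AS when the path is empty.
    return r.as_path[0] if r.as_path else None


def first_difference(pairs) -> int:
    # Walk (a_key, b_key) pairs in order; the lower key wins at the first difference.
    for x, y in pairs:
        if x != y:
            return -1 if x < y else 1
    return 0


def compare(a: Route, b: Route) -> int:
    """Return <0 if a is preferred, >0 if b is preferred, 0 on a full tie."""
    result = first_difference([
        (a.preference, b.preference),         # 1: lowest preference
        (-a.local_pref, -b.local_pref),       # 2: highest LOCAL_PREF
        (len(a.as_path), len(b.as_path)),     # 3: shortest AS_PATH
        (a.origin, b.origin),                 # 4: lowest ORIGIN
    ])
    if result:
        return result
    # 5: default rule only: both paths carry a MED and share the same neighbor AS.
    if (a.med is not None and b.med is not None
            and neighbor_as(a) == neighbor_as(b) and a.med != b.med):
        return -1 if a.med < b.med else 1
    return first_difference([
        (not a.ebgp, not b.ebgp),             # 6: EBGP (False) sorts before IBGP (True)
        (a.igp_cost, b.igp_cost),             # 7: lowest IGP cost to the next hop
        (a.router_id, b.router_id),           # 8: lowest ORIGINATOR_ID / BGP Identifier
        (a.cluster_list_len, b.cluster_list_len),  # 9: shortest CLUSTER_LIST
        (a.peer_ip, b.peer_ip),               # 10: lowest peer address
    ])


def best_path(routes: List[Route]) -> Optional[Route]:
    # Drop routes whose next hop cannot be resolved, then keep the winner of
    # successive pairwise comparisons in list order.
    candidates = [r for r in routes if r.nh_resolvable]
    return min(candidates, key=cmp_to_key(compare)) if candidates else None
```

Because `best_path` keeps the winner of successive pairwise comparisons in the order the routes are listed, its result can depend on that order whenever the MED rule applies to some pairs but not to others.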
Always-Compare-MED
As indicated previously in Step 5, the MED attribute is typically used in the decision process only if both routes have the attribute present and come from the same neighboring AS. There are, however, some exceptions depending on router configuration, notably the use of the always-compare-med command and the strict-as keyword. Table A.1 shows the influence each command/keyword has on the path selection algorithm.
By default, MED values of VPN-IPv4 routes are always compared, even when always-compare-med is disabled (its default setting). This behavior is historical and allows sites of the same VPN to use different Autonomous System numbers. If this behavior is undesirable, you can disable it by using the always-compare-med strict-as command.
Table A.1 MED Comparison with always-compare-med
Command | MED Comparison
always-compare-med disabled | Only compare the MED of two paths if they come from the same neighbor AS and both paths have a MED attribute. Otherwise, skip the step.
always-compare-med | Only compare the MED of two paths (whether or not they are from the same neighbor AS) if they both have a MED attribute. Otherwise, skip the step.
always-compare-med zero | Always compare the MED of two paths, even if they are from a different neighbor AS. If one or both paths do not have a MED attribute, consider the MED to be zero.
always-compare-med infinity | Always compare the MED of two paths, even if they are from a different neighbor AS. If one or both paths do not have a MED attribute, consider the MED to be infinite.
always-compare-med strict-as zero | Only compare the MED of two paths if they come from the same neighbor AS. If one or both paths do not have a MED attribute, consider the MED to be zero.
always-compare-med strict-as infinity | Only compare the MED of two paths if they come from the same neighbor AS. If one or both paths do not have a MED attribute, consider the MED to be infinite.
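Table A.1 can also be read as a small decision function. This Python sketch encodes each row; the function and parameter names are hypothetical, an infinite MED is modeled as `math.inf` rather than any specific value SR-OS may use internally, and the VPN-IPv4 default described above is not modeled. It returns the pair of MED values to compare, or `None` when the MED step is skipped.

```python
import math
from typing import Optional, Tuple


def med_comparison(med_a: Optional[int], med_b: Optional[int],
                   same_neighbor_as: bool,
                   always_compare_med: bool = False,
                   missing: Optional[str] = None,
                   strict_as: bool = False) -> Optional[Tuple[float, float]]:
    """Return the (med_a, med_b) values to compare, or None if the step is skipped.

    missing is None, "zero", or "infinity": the value assumed for an absent MED.
    A MED of None means the path has no MED attribute.
    """
    if missing not in (None, "zero", "infinity"):
        raise ValueError("missing must be None, 'zero', or 'infinity'")
    if not always_compare_med:
        # Default row: same neighbor AS and both MEDs present.
        if same_neighbor_as and med_a is not None and med_b is not None:
            return (med_a, med_b)
        return None
    if strict_as and not same_neighbor_as:
        # strict-as rows: never compare across different neighbor ASs.
        return None
    if missing is None:
        # Plain always-compare-med row: both MEDs must be present.
        if med_a is None or med_b is None:
            return None
        return (med_a, med_b)
    # zero / infinity rows: substitute the configured value for an absent MED.
    fill = 0 if missing == "zero" else math.inf
    return (fill if med_a is None else med_a, fill if med_b is None else med_b)
```

For example, `med_comparison(5, None, same_neighbor_as=False, always_compare_med=True, missing="zero")` returns `(5, 0)`, so in this model the path without a MED is preferred at this step.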
Deterministic MED
In some environments the outcome of the BGP path selection process can be unpredictable and potentially lead to route oscillation because it depends on the order in which routes are learned. Consider the example shown in Figure A.1 where three external peers are advertising the prefix 172.16.32.0/20 with different AS paths and MED values.
Figure A.1 Deterministic MED
Using router R3 as the calculating router, assume that routes are learned from peers in the order A, then B, then C.
When route A is received, it is the only route to the prefix 172.16.32.0/20, so it is automatically the best route. When route B arrives, it is compared to route A (the current best path). Because the neighbor ASs of routes A and B are different, the always-compare-med configuration option determines whether the MEDs in the two routes are comparable. For the sake of example, assume that always-compare-med is not enabled and that all other attributes are equal, so route A remains the best path because of its lower BGP identifier. When route C arrives, it is compared to route A, and because the neighbor ASs are the same, route C is selected as the new best path because it has the lower MED value.
Output A-1: Best Path with Routes Received A-B-C
*A:R3# show router bgp routes
==================================================================
BGP Router ID:192.168.0.11 AS:200 Local AS:200
==================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
Origin codes : i - IGP, e - EGP, ? - incomplete, > - best, b - backup
==================================================================
BGP IPv4 Routes
==================================================================
Flag Network LocalPref MED
Nexthop Path-Id Label
As-Path
------------------------------------------------------------------
u*>i 172.16.32.0/20 100 2
192.168.0.22 33 -
64509
*i 172.16.32.0/20 100 5
192.168.0.13 31 -
64509
*i 172.16.32.0/20 100 10
192.168.0.21 32 -
64510
------------------------------------------------------------------
Routes : 3
==================================================================
Next, consider an example where the routes are received in the order A, then C, then B. When route A is received, it is the only route to the prefix 172.16.32.0/20, so it is automatically the best route. When route C arrives, it is compared to route A (the current best path), and because the neighbor AS is the same, route C becomes the best path because of its lower MED. When route B arrives, it is compared to route C. Because the neighbor AS is different and always-compare-med is not enabled, the MED is not compared, and route B becomes the best path because of its lower BGP identifier.
Output A-2: Best Path with Routes Received A-C-B
*A:R3# show router bgp routes
=========================================================================
BGP Router ID:192.168.0.11 AS:200 Local AS:200
=========================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
Origin codes : i - IGP, e - EGP, ? - incomplete, > - best, b - backup
=========================================================================
BGP IPv4 Routes
=========================================================================
Flag Network LocalPref MED
Nexthop Path-Id Label
As-Path
-------------------------------------------------------------------------
u*>i 172.16.32.0/20 100 10
192.168.0.21 36 -
64510
*i 172.16.32.0/20 100 2
192.168.0.22 35 -
64509
*i 172.16.32.0/20 100 5
192.168.0.13 34 -
64509
-------------------------------------------------------------------------
Routes : 3
======================================================================
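The order dependence in Outputs A-1 and A-2 can be reproduced with a simplified model. This Python sketch assumes that all attributes other than the neighbor AS, MED, and BGP Identifier are equal, that always-compare-med is disabled, and that each peer's BGP Identifier equals the next-hop address shown in the outputs (routes A, B, and C are mapped to 192.168.0.13, 192.168.0.21, and 192.168.0.22 based on the outputs and the text). The names `Path`, `better`, and `incremental_best` are hypothetical.

```python
from functools import reduce
from ipaddress import IPv4Address
from itertools import permutations
from typing import List, NamedTuple


class Path(NamedTuple):
    name: str
    neighbor_as: int
    med: int
    bgp_id: IPv4Address  # assumption: equal to the next hop shown in the outputs


A = Path("A", 64509, 5, IPv4Address("192.168.0.13"))
B = Path("B", 64510, 10, IPv4Address("192.168.0.21"))
C = Path("C", 64509, 2, IPv4Address("192.168.0.22"))


def better(current: Path, new: Path) -> Path:
    # always-compare-med disabled: MEDs count only within the same neighbor AS.
    if current.neighbor_as == new.neighbor_as and current.med != new.med:
        return new if new.med < current.med else current
    # All other attributes assumed equal, so the lower BGP Identifier wins.
    return new if new.bgp_id < current.bgp_id else current


def incremental_best(arrivals: List[Path]) -> Path:
    # Compare each arriving path only against the current best path.
    return reduce(better, arrivals)


print(incremental_best([A, B, C]).name)  # C
print(incremental_best([A, C, B]).name)  # B
print(sorted({incremental_best(list(o)).name for o in permutations([A, B, C])}))  # ['A', 'B', 'C']
```

Trying all six arrival orders in this model shows that each of A, B, and C can end up as the best path, depending only on the order in which the routes were learned.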
The Deterministic MED feature overcomes this problem by changing how MED comparisons are done so that best-path selection is deterministic. The main change is to always group received routes by neighbor AS (the first AS in the AS_PATH, or the local AS if the AS_PATH is empty). Within each group, BGP selects the best path (the always-compare-med configuration does not matter for this step). Finally, BGP compares all the “group-best paths,” and for this step the always-compare-med configuration is relevant. If one path remains after this final MED comparison step, it is the overall best path. If multiple paths remain, the remaining steps of the decision process are evaluated.
Consider again the preceding example. Routes A and C belong to the same neighbor-AS group, and comparing these two paths always selects route C as the group-best path (lowest MED). With always-compare-med disabled, the MEDs of the group-best paths cannot be compared, so further steps must be evaluated. Route B is ultimately selected over route C as the best path in this example because of its lower BGP identifier. When deterministic MED is enabled, route B is always selected as best, regardless of the arrival order of routes A, B, and C.
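The grouping described above can be sketched in a few lines. This Python model assumes that all attributes other than the neighbor AS, MED, and BGP Identifier are equal, and that each BGP Identifier equals the route's next-hop address from Outputs A-1 and A-2 (A = 192.168.0.13, B = 192.168.0.21, C = 192.168.0.22). The hypothetical `deterministic_best` function first picks the lowest-MED path within each neighbor-AS group and then compares only the group-best paths, using MED at that stage only when always-compare-med is enabled.

```python
from ipaddress import IPv4Address
from itertools import permutations
from typing import Dict, List, NamedTuple


class Path(NamedTuple):
    name: str
    neighbor_as: int
    med: int
    bgp_id: IPv4Address  # assumption: equal to the next hop shown in the outputs


A = Path("A", 64509, 5, IPv4Address("192.168.0.13"))
B = Path("B", 64510, 10, IPv4Address("192.168.0.21"))
C = Path("C", 64509, 2, IPv4Address("192.168.0.22"))


def deterministic_best(paths: List[Path], always_compare_med: bool = False) -> Path:
    # Group the received paths by neighbor AS (leftmost AS in the AS_PATH).
    groups: Dict[int, List[Path]] = {}
    for p in paths:
        groups.setdefault(p.neighbor_as, []).append(p)
    # Within a group the MEDs are always comparable; ties fall through to
    # the lowest BGP Identifier (all other attributes are assumed equal).
    group_best = [min(g, key=lambda p: (p.med, p.bgp_id)) for g in groups.values()]
    # Across groups, MED is used only when always-compare-med is enabled.
    if always_compare_med:
        lowest = min(p.med for p in group_best)
        group_best = [p for p in group_best if p.med == lowest]
    # Remaining decision steps reduced to the lowest BGP Identifier.
    return min(group_best, key=lambda p: p.bgp_id)


print({deterministic_best(list(order)).name for order in permutations([A, B, C])})  # {'B'}
print(deterministic_best([A, B, C], always_compare_med=True).name)  # C
```

With always-compare-med disabled, route B wins for every arrival order in this model. Passing `always_compare_med=True` makes the MEDs of the group-best paths comparable, so route C (MED 2) wins instead.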
The deterministic-med function and always-compare-med are both enabled in the best-path-selection node of the base BGP context. As indicated previously, enabling deterministic MED can be considered best practice to provide deterministic path selection and also avoid potential route oscillation.
Output A-3: Deterministic MED Configuration
router
    bgp
        best-path-selection
            always-compare-med strict-as zero
            deterministic-med
        exit
References and Glossary
References
1 | RFC 4271 | A Border Gateway Protocol 4 (BGP-4)
2 | RFC 5492 | Capabilities Advertisement with BGP-4
3 | RFC 4760 | Multi-Protocol Extensions for BGP-4
4 | RFC 4360 | BGP Extended Communities Attribute
5 | RFC 4364 | BGP/MPLS IP-VPNs
6 | RFC 2918 | Route Refresh Capability for BGP-4
7 | RFC 4684 | Constrained Route Distribution for BGP/MPLS IP-VPNs
8 | RFC 3107 | Carrying Label Information in BGP-4
9 | draft-ietf-mpls-seamless-mpls | Seamless MPLS Architecture
10 | RFC 4724 | Graceful Restart Mechanism for BGP
11 | RFC 4761 | VPLS Using BGP for Auto-Discovery and Signaling
12 | draft-ietf-l2vpn-vpls-multihoming | BGP based Multi-Homing in VPLS
13 | RFC 6624 | Layer-2 VPNs Using BGP for Auto-Discovery and Signaling
14 | RFC 6513 | Multicast in BGP/MPLS IP-VPNs
15 | RFC 6037 | Cisco Systems' Solution for Multicast in BGP/MPLS IP-VPNs
16 | draft-ietf-grow-ops-reqs-for-bgp-error-handling | Operational Requirements for Enhanced Error Handling Behavior in BGP-4
17 | RFC 4798 | Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider Edge Routers
18 | draft-ietf-idr-best-external | Advertisement of the Best External Route in BGP
19 | draft-ietf-pwe3-dynamic-ms-pw | Dynamic Placement of Multi-Segment Pseudowires
20 | draft-ietf-l2vpn-evpn | BGP/MPLS Based Ethernet VPN
21 | RFC 5575 | Flow Specification
22 | RFC 3704 | Ingress Filtering for Multi-Homed Networks
23 | RFC 5082 | The Generalized TTL Security Mechanism (GTSM)
24 | RFC 5331 | Upstream Label Assignment and Context-Specific Label Space
25 | draft-ietf-idr-bgp-gr-notification | Notification Support for BGP Graceful Restart
26 | draft-ietf-idr-ls-distribution | Advertising Link State Information in BGP
27 | draft-ietf-sidr-pfx-validate | BGP Prefix Origin Validation
28 | RFC 6810 | RPKI-Router Protocol
29 | draft-ietf-sidr-origin-validation-signaling | BGP Prefix Origin Validation State Extended Community
Glossary
ABR | Area Border Router
AD | (or A-D) Auto-Discovery
AF | Assured Forwarding
AFI | Address Family Identifier
AGI | Attachment Group Identifier
AGN | Aggregation Node
ALTO | Application Layer Transport Optimization
AN | Access Node
ARP | Address Resolution Protocol
AS | Autonomous System
ASBR | Autonomous System Border Router
ASN | Autonomous System Number
BE | Best Effort
BFD | Bidirectional Forwarding Detection
BGP | Border Gateway Protocol
BNG | Broadband Network Gateway
CE | Customer Edge
CMS | Cloud Management System
CSC | Carrier Supporting Carrier
CSV | Circuit Status Vector
DF | Designated Forwarder
DHCP | Dynamic Host Configuration Protocol
EBGP | Exterior BGP
ECMP | Equal Cost Multi-Path
EF | Expedited Forwarding
EOR | End-of-RIB (Marker)
ERO | Explicit Route Object
ESI | Ethernet Segment Identifier
EVI | Ethernet VPN Instance
FC | Forwarding Class
FDB | Forwarding Database
FEC | Forwarding Equivalence Class
FIB | Forwarding Information Base
FSM | Finite State Machine
GR | Graceful Restart
GRE | Generic Routing Encapsulation
GUA | Globally Unique Address
I-PMSI | Inclusive PMSI
IBGP | Interior BGP
IGP | Interior Gateway Protocol
IMM | Integrated Media Module
IOM | Input Output Module
KVM | Kernel-Based Virtual Machine
LAG | Link Aggregation Group
LB | Label Base
LDP | Label Distribution Protocol
LSP | Label Switched Path
MDT | Multicast Distribution Tree
MEP | Maintenance Endpoint
MH-ID | Multi-Homed Identifier
MP2MP | MultiPoint to MultiPoint
MRAI | Minimum Route Advertisement Interval
MS-PW | Multi-Segment Pseudowire
MSDP | Multicast Source Discovery Protocol
MTU | Maximum Transmission Unit
MVPN | Multicast VPN
NAT | Network Address Translation
NCP | Network Control Protocol
NHLFE | Next-Hop Label Forwarding Entry
NLRI | Network Layer Reachability Information
NSF | Non-Stop Forwarding
NSH | Next Signalling Hop
NSR | Non-Stop Routing
NVE | Network Virtualization Edge
NVO | Network Virtualization Overlay
ORF | Outbound Route Filtering
ORR | Optimal Route Reflection
P2MP | Point to MultiPoint
PCE | Path Computation Element
PDU | Protocol Data Unit
PE | Provider Edge
PIC | Prefix Independent Convergence
PIM | Protocol Independent Multicast
PLR | Point of Local Repair
PMSI | Provider Multicast Service Instance
PPP | Point-to-Point Protocol
QOS | Quality of Service
QPPB | QOS Policy Propagation Using BGP
RD | Route Distinguisher
RG | Residential Gateway
RIB | Routing Information Base
ROA | Route Origin Authorization
RP | Rendezvous Point
RPF | Reverse Path Forwarding
RPKI | Resource Public Key Infrastructure
RPT | Rendezvous Point Tree
RR | Route-Reflector
RSVP | Resource Reservation Protocol
RT | Route Target
RTBH | Remote Triggered Black-Holing
RTM | Route Table Manager
S-PE | Switching PE
S-PMSI | Selective PMSI
S2L | Source To Leaf
SA | Source Active
SAII | Source Attachment Individual Identifier
SAFI | Sub Address Family Identifier
SAP | Service Access Point
SDP | Service Distribution Point
SPT | Shortest Path Tree
SR-OS | Service Router Operating System
SSM | Source-Specific Multicast
STP | Spanning Tree Protocol
T-PE | Terminating PE
TAII | Target Attachment Individual Identifier
TCP | Transmission Control Protocol
TED | Traffic Engineering Database
TLV | Type Length Value
TTL | Time-To-Live
UMH | Upstream Multicast Hop
URPF | Unicast RPF
VA | Virtual Application
VBO | VE Block Offset
VBS | VE Block Size
VE | VPLS Edge
VE ID | VPLS Edge Identifier
VID | VLAN Identifier
VM | Virtual Machine
VPLS | Virtual Private LAN Service
VPRN | Virtual Private Routed Network
VPWS | Virtual Private Wire Service
VRF | VPN Routing and Forwarding
VRP | Validated ROA Payload
VRR | Virtual Route-Reflector
VRRP | Virtual Router Redundancy Protocol
VSI | VPLS Switch Instance
VTEP | VXLAN Tunnel End Point
VXLAN | Virtual eXtensible Local Area Network