Versatile Routing and Services with BGP: Understanding and Implementing BGP in SR-OS (2014)
Chapter 3. Using BGP in VPLS
The setup, maintenance, and teardown of pseudowires forming a VPLS can be achieved using either an LDP or BGP control plane, although the functional behavior of VPLS is the same across both models. The LDP control plane model deals exclusively with the pseudowire signaling aspects, while the BGP model (RFC 4761) is broken into the sub-components of Auto-Discovery and pseudowire signaling; both can be fulfilled within a single BGP UPDATE message.
The Auto-Discovery and signaling sub-components are enabled independently in SR-OS, which leads to support of a further “hybrid” model where Auto-Discovery is managed by BGP but pseudowire signaling is managed by LDP. It's even possible that within a single VPLS instance, some parts of the service use LDP signaling and some use BGP signaling (for example, a Hierarchical-VPLS implementation with a core using BGP, and metro areas using LDP).
To identify the other PE routers forming part of the VPLS (as part of the Auto-Discovery mechanism), the Route Target Extended Community (RFC 4360) is used, with exactly the same semantics as in BGP-MPLS IP-VPN. If a VPLS is fully meshed, a single Route Target suffices. A VPLS NLRI, carried with the L2VPN AFI (25) and VPLS SAFI (65), is also introduced to declare VPLS membership and to exchange demultiplexors.
To declare VPLS membership, a PE router belonging to a given VPLS announces its VPLS NLRI with the relevant Route-Target and accepts VPLS NLRI from other PE routers that contain the same Route Target.
To exchange demultiplexors, the fields VE ID, VE Block Offset, VE Block Size, and Label Base are used. The way that the demultiplexor is derived requires a little explanation. When establishing a pseudowire between two endpoints, an MPLS label is exchanged that serves as a demultiplexor to identify traffic of that given pseudowire among a number of pseudowires that might be carried in a single (MPLS or GRE) tunnel. For a VPLS service, that same demultiplexor requirement exists in order to do the following:
i. Identify the specific VPLS instance to which the packet belongs for packet forwarding.
ii. Identify the ingress PE for the purpose of MAC learning and populating the VPLS Forwarding Database (FDB).
To facilitate MAC learning, the demultiplexor label must be unique to a given ingress PE. To achieve that, the BGP VPLS control plane uses the concept of “label blocks” defined by a label base (LB) and a VE block size (VBS). A label block is the contiguous set of labels LB, LB+1, ..., LB+VBS-1. When a PE advertises its VPLS NLRI with a label block to the other PE routers in the same VPLS, each receiving PE infers the label it must use for forwarding by adding its own unique VPLS Edge ID (VE ID) to the label base (adjusted by the VE Block Offset, as described next). Because every receiving PE has a different VE ID, each derives a different demultiplexor, which the advertising PE can use both to forward traffic in the correct VPLS instance and to learn MAC addresses against the correct ingress PE.
SR-OS always uses a VBS of 8 for BGP-VPLS.
To simplify the administration of label blocks, it may be beneficial to assign multiple smaller label blocks, each to a specific set of VE IDs (known as a “remote VE set”), instead of a single larger label block assigned to all VE IDs in the VPLS. This is achieved with VE Block Offsets (VBOs): a block with label base LB, offset VBO, and size VBS maps the VE IDs VBO, VBO+1, ..., VBO+VBS-1 to the labels LB, LB+1, ..., LB+VBS-1 respectively. A PE may therefore advertise multiple VPLS NLRIs, each with a different VBO and label block, where the VBO defines the set of VE IDs (the remote VE set) that should use that block.
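The block arithmetic above can be sketched in a few lines of Python. This is a minimal illustration that assumes nothing beyond the LB, VBO, and VBS semantics just described; the function name label_block is hypothetical and not part of any SR-OS interface.

```python
def label_block(lb: int, vbo: int, vbs: int = 8) -> dict:
    """Map each VE ID of one remote VE set to its demultiplexor label.

    A block with label base lb, VE block offset vbo and VE block size vbs
    serves VE IDs vbo .. vbo+vbs-1 with labels lb .. lb+vbs-1.
    The default vbs of 8 mirrors the block size SR-OS uses for BGP-VPLS.
    """
    return {vbo + i: lb + i for i in range(vbs)}


# One of the blocks used in the worked example that follows:
block = label_block(131015, 17)
print(block[17], block[24])  # 131015 131022
```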
Suppose PE1 is part of a VPLS and sends a VPLS NLRI with VE ID “V”, VE Block Offset “VBO”, VE Block Size “VBS”, and label base “LB”. If PE2 is part of the same VPLS (determined by Route-Target) and has VE ID W, it implements the following to compute the label to be used:
i. First, PE2 checks whether VE ID W is part of PE1's intended “remote VE set”: W is in the set if VBO <= W < VBO + VBS. If it is not, PE2 ignores the message.
ii. If PE2 is part of PE1's remote VE set, it sets up a pseudowire to PE1 where the demultiplexor label to send traffic is computed as (LB + W – VBO).
By way of example, assume PE25 advertises an UPDATE message containing the following VPLS NLRI:
RD:64496:600, VE-ID=25, VBO=25, VBS=8, LB=131056
RD:64496:600, VE-ID=25, VBO=17, VBS=8, LB=131015
If PE24, with VE ID 24, is in the same VPLS (determined by Route-Target), it must first determine whether it is part of PE25's remote VE set by identifying the NLRI for which VBO <= 24 < VBO + VBS. This is the NLRI with label base 131015 (VBO 17 covers VE IDs 17 to 24, whereas VBO 25 covers VE IDs 25 to 32). PE24 then sets up a pseudowire with a demultiplexor label computed as 131015 (LB) + 24 (VE ID) - 17 (VBO) = 131022.
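Steps i and ii, applied to every NLRI received from a given PE, can be sketched as follows. This is a minimal illustration that assumes the NLRI fields have already been decoded from the UPDATE; the VplsNlri type and the demux_label and select_label helpers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class VplsNlri:
    """Decoded fields of one VPLS NLRI that matter for label computation."""
    ve_id: int  # VE ID of the advertising PE
    vbo: int    # VE Block Offset
    vbs: int    # VE Block Size
    lb: int     # Label Base


def demux_label(nlri: VplsNlri, w: int) -> Optional[int]:
    """Return the label to use towards the advertiser, or None to ignore."""
    # Step i: is our VE ID W inside the advertiser's remote VE set?
    if not (nlri.vbo <= w < nlri.vbo + nlri.vbs):
        return None
    # Step ii: demultiplexor label = LB + W - VBO
    return nlri.lb + w - nlri.vbo


def select_label(nlris: Iterable[VplsNlri], w: int) -> Optional[int]:
    """Use the first NLRI (label block) whose remote VE set covers W."""
    for nlri in nlris:
        label = demux_label(nlri, w)
        if label is not None:
            return label
    return None


# The two NLRIs advertised by PE25 in the example above, seen from PE24.
pe25 = [VplsNlri(25, 25, 8, 131056), VplsNlri(25, 17, 8, 131015)]
print(select_label(pe25, 24))  # 131022
```

Applying select_label to the two NLRIs from PE3 shown later in Debug 3-3 (VBO 9 with label base 262102, and VBO 17 with label base 262094) with PE1's VE ID of 22 gives 262094 + 22 - 17 = 262099, which matches the egress label for PE3 in Output 3-6.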
The BGP VPLS specification also introduces a new BGP Extended Community attribute, the “Layer-2 Information Extended Community”, for signaling control information about the pseudowires to be set up for a given VPLS. The Extended Community type value allocated by IANA is 0x800a, and Encapsulation Type value 19 is defined for VPLS.
The Control Flags field contains a number of bits with the following meanings (the D and F bits are relevant to BGP multi-homing and will be discussed in further detail later in this chapter):
C When C is set to 1, it defines that a Control word must be present when sending VPLS packets to this PE.
S When S is set to 1, sequenced delivery of frames must be used when sending VPLS packets to this PE (not supported in SR-OS).
D When D is set to 1, it indicates that all attachment circuits connecting a VPLS CE and PE are down.
F Indicates when to flush MAC state. A designated forwarder must set the F bit and a non-designated forwarder must clear the F bit when sending BGP Multi-Homing advertisements. A state transition from 1 to 0 for the F bit can be used by a remote PE to flush all the MACs learned from the PE that is transitioning from designated forwarder to non-designated forwarder.
The VPLS preference is used to influence which PE is selected as Designated Forwarder (DF) when multiple PEs are assigned the same multi-homing Site ID. A dedicated VPLS preference field is needed because the well-known LOCAL_PREF attribute is not propagated between autonomous systems and is therefore inadequate for Inter-AS operations. The higher VPLS preference value is preferred.
To illustrate the configuration requirements and control plane aspects of BGP-VPLS and its various models, I'll use the simple logical topology depicted in Figure 3-3. In this figure, PE1, PE2, PE3, and PE4 are all supporting sites of a VPLS instance. CE1, CE2, CE3, and CE4 form part of the same VPLS instance, with CE1 and CE2 representing a dual-homed site. I'll look at BGP multi-homing later in this section as a mechanism to provide redundant access connectivity with a loop-free topology. All the PE routers are IBGP peered with RR1 in AS 64496.
Before BGP VPLS services can be configured, the PE routers participating in the VPLS (and any other BGP speakers, such as Route-Reflectors) must be configured to support the L2VPN Address Family for the exchange of VPLS NLRI. Address Families are negotiated as capabilities in the OPEN exchange, so adding the L2VPN Address Family to an established session causes the router to send a NOTIFICATION message to the associated peers, followed by an OPEN message containing the new capability.
Output 3-1: Addition of L2VPN Address Family
router
    bgp
        group "IBGP"
            family l2vpn
            peer-as 64496
            neighbor …
BGP Auto-Discovery with LDP Signaling
For the purpose of BGP Auto-Discovery, a VPLS requires two additional parameters: the VPLS-ID and the VSI-ID.
The VPLS-ID is a unique network-wide identifier with the same value assigned for all VPLS Switch Instances (VSIs) belonging to the same VPLS. This identifier is encoded as a two-octet AS specific or IPv4 address specific Extended Community attribute. This is the only Auto-Discovery parameter that requires explicit configuration. It is defined using the vpls-id command in the bgp-ad node.
The VSI-ID is a unique identifier for each individual VSI and is carried in the VPLS NLRI. Where Auto-Discovery is used without BGP signaling, a simplified version of the VPLS NLRI is used; only the Route-Distinguisher field and the next four bytes (VE ID and VE Block Offset) are used to identify the VSI-ID. (There is no requirement for Block Size and Label Base fields of the NLRI because LDP is used for signaling of demultiplexor labels.)
The VSI-ID requires no explicit configuration. The Route-Distinguisher is automatically derived from the global ASN and the VPLS service ID (in the form ASN:service-ID). The next four bytes default to the system IP address, but they can be modified using the vsi-id command, also in the bgp-ad node.
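To make the layout of this simplified NLRI concrete, here is a minimal Python sketch that encodes it. It assumes a 2-octet NLRI length field in front of the Route-Distinguisher and a type 0 (2-octet ASN) Route-Distinguisher encoding; the function names are hypothetical. The assumption is at least consistent with Debug 3-1, whose 23-octet MP_REACH_NLRI attribute equals 9 octets of AFI, SAFI, next-hop length, IPv4 next hop, and reserved fields plus a 14-octet NLRI.

```python
import ipaddress
import struct


def rd_type0(asn: int, assigned: int) -> bytes:
    """Type 0 Route Distinguisher: 2-octet type (0), 2-octet ASN, 4-octet number."""
    return struct.pack("!HHI", 0, asn, assigned)


def bgp_ad_nlri(asn: int, service_id: int, system_ip: str) -> bytes:
    """Build a simplified BGP-AD VPLS NLRI (sketch).

    Assumed layout: 2-octet length, 8-octet RD (ASN:service-id), then the
    4 octets in the VE ID and VE Block Offset positions carrying the VSI-ID,
    which defaults to the system IP address.
    """
    body = rd_type0(asn, service_id) + ipaddress.IPv4Address(system_ip).packed
    return struct.pack("!H", len(body)) + body


nlri = bgp_ad_nlri(64496, 600, "192.0.2.13")
# AFI (2) + SAFI (1) + next-hop length (1) + IPv4 next hop (4) + reserved (1) = 9
print(9 + len(nlri))  # 23, the MP_REACH_NLRI length in Debug 3-1
```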
The minimum required configuration is shown in Output 3-2. Aside from the vpls-id already discussed, the other notable requirement is the pw-template configuration in the service context. This provides a template configuration defining the parameters of the pseudowire and contains local behavioral characteristics such as MAC-aging timers and split-horizon-groups, as well as signaled parameters such as vc-type and support for control-word. The pw-template is subsequently referenced within the bgp node of the VPLS service.
Output 3-2: BGP-AD with LDP Signaling Configuration
service
    pw-template 1 create
        split-horizon-group "SHG"
    exit
    vpls 600 customer 20 create
        bgp
            pw-template-binding 1
            exit
        exit
        bgp-ad
            vpls-id 64496:600
            no shutdown
        exit
        stp
            shutdown
        exit
        sap 1/1/3:600.1 create
        exit
        no shutdown
    exit
When the bgp-ad node is placed in a no shutdown state, the Auto-Discovery process commences and the VPLS NLRI is advertised. Debug 3-1 shows the VPLS NLRI generated by PE3 as a result of the preceding configuration. As described above, BGP Auto-Discovery uses a simplified version of the VPLS NLRI containing the Route-Distinguisher and the system IP address encoded into the VE ID and VE Block Offset fields; this NLRI is displayed with the prefix [AD]. Note also that although the Layer-2 Information Extended Community attribute is present, it does not contain an Encapsulation Type, Layer-2 MTU, Control Flags, or VPLS Preference field.
When using BGP-AD, the Route-Distinguisher and Route-Target values are automatically derived from the vpls-id configured under the bgp-ad node. This automation yields a common export and import Route-Target value and effectively delivers a VPLS topology with any-to-any connectivity. If the requirement is to construct a different topology, such as hub and spoke, you can explicitly configure the Route-Target values (and Route-Distinguisher value) under the bgp node of the VPLS service.
Debug 3-1: VPLS NLRI for BGP-AD only
5 2013/05/22 13:55:28.59 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 81
Flag: 0x90 Type: 14 Len: 23 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.13
[AD] 192.0.2.13/32, RD 64496:600
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:64496:600"
The attributes of the L2VPN VPLS NLRI received at a remote PE can be viewed using conventional RIB-IN commands, but the service-level L2 route table shown in Output 3-3 is at least as useful. It contains the SDP ID of the pseudowire that was signaled in LDP as a result of the associated BGP Auto-Discovery UPDATE message. The pseudowire is set up using the Generalized PWid FEC Element (FEC 129), where the Attachment Group Identifier (AGI) can be used to signal membership information, and the Source Attachment Individual Identifier (SAII) and Target Attachment Individual Identifier (TAII) represent the VSI-IDs of the two PE routers at either end of the pseudowire.
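The pairing of FEC 129 identifiers can be illustrated with a small Python model. This is only a sketch: it assumes the AGI carries the VPLS-ID and that each VSI-ID is the PE's system IP address (the default described earlier), and the Fec129 type and helper functions are hypothetical names, not an LDP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fec129:
    """Identifiers carried in a Generalized PWid FEC element (illustrative)."""
    agi: str   # Attachment Group Identifier, assumed here to carry the VPLS-ID
    saii: str  # Source AII: VSI-ID of the sending PE
    taii: str  # Target AII: VSI-ID of the PE at the far end


def label_mapping_fec(vpls_id: str, local_vsi: str, remote_vsi: str) -> Fec129:
    """FEC a PE would place in its Label Mapping towards a discovered peer."""
    return Fec129(agi=vpls_id, saii=local_vsi, taii=remote_vsi)


def same_pseudowire(ours: Fec129, theirs: Fec129) -> bool:
    """Two mappings describe the same pseudowire when the AGI matches and
    each side's SAII is the other side's TAII."""
    return (ours.agi == theirs.agi
            and ours.saii == theirs.taii
            and ours.taii == theirs.saii)


# Hypothetical pairing of PE1 (192.0.2.22) and PE3 (192.0.2.13) in VPLS-ID 64496:600
pe1 = label_mapping_fec("64496:600", "192.0.2.22", "192.0.2.13")
pe3 = label_mapping_fec("64496:600", "192.0.2.13", "192.0.2.22")
print(same_pseudowire(pe1, pe3))  # True
```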
Output 3-3: L2 Route-Table for BGP-AD
*A:PE1# show service l2-route-table bgp-ad
-----------------------------------------------------------------------
======================================================================
Services: L2 Route Information - Summary
======================================================================
Svc Id L2-Routes (RD-Prefix) Next Hop Origin
Sdp Bind Id PW Temp Id
----------------------------------------------------------------------
600 *64496:600-192.0.2.13 192.0.2.13 BGP-L2
17407:4294967285 1
----------------------------------------------------------------------
No. of L2 Route Entries: 1
======================================================================
When a PE router no longer participates in the VPLS service, an MP_UNREACH_NLRI is advertised containing the VPLS NLRI. This causes remote PEs to automatically commence pseudowire teardown by sourcing LDP Label Withdraw/Release messages.
Debug 3-2: VPLS NLRI MP_UNREACH NLRI
7 2013/05/22 13:56:41.11 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 21
Flag: 0x90 Type: 15 Len: 17 Multiprotocol Unreachable NLRI:
Address Family L2VPN
[AD] 192.0.2.13/32, RD 64496:600
"
BGP Auto-Discovery and Signaling
BGP-VPLS for Auto-Discovery and Signaling again requires that a pw-template be created and referenced in the bgp node of the VPLS service. The bgp node also provides the context for configuring the route-distinguisher and route-target values, which are signaled in the VPLS NLRI and in the Extended Community attribute respectively.
The max-ve-id command in the bgp-vpls node configures the maximum allowed ve-id value, whether configured locally or received in an incoming VPLS NLRI. The CLI rejects a locally configured ve-id higher than max-ve-id, and if a VPLS NLRI is received with a ve-id higher than max-ve-id, the NLRI is dropped and an error message is generated. The ve-name is a locally significant name for the VPLS instance, while the ve-id is used to derive demultiplexor labels and is signaled in the VPLS NLRI. Note that before the bgp-vpls node can be placed into a no shutdown state, either a route-distinguisher value must be configured in the bgp node or a vpls-id must exist in the bgp-ad node. Output 3-4 shows the complete configuration at PE3.
Output 3-4: BGP-VPLS with AD and Signaling Configuration
service
    pw-template 1 create
        split-horizon-group "SHG"
    exit
    vpls 600 customer 20 create
        bgp
            route-distinguisher 64496:600
            route-target export target:64496:600 import target:64496:600
            pw-template-binding 1
            exit
        exit
        bgp-vpls
            max-ve-id 512
            ve-name "PE-3"
                ve-id 13
            exit
            no shutdown
        exit
        stp
            shutdown
        exit
        sap 1/1/3:600.1 create
        exit
        no shutdown
Debug 3-3 shows the VPLS NLRI as received at PE1 from PE3. The notable difference from the NLRI in Debug 3-1 (where only Auto-Discovery was used) is that there are now two NLRIs, each with a different VE Block Offset and Label Base. In SR-OS, a VPLS PE always advertises a label block covering at least its own veid. If it has received VPLS NLRI containing other veids, it also advertises any additional label blocks needed to cover them. In this example, PE3 (veid 13) sends enough label blocks in the UPDATE to cover all of the veids it has seen in the VPLS NLRI it has received. Consequently, PE3 sends one VPLS NLRI with a VBO of 9 (covering veids 9 to 16, which include its own veid 13 and veids 11 and 12 for PE2 and PE4 respectively) and one VPLS NLRI with a VBO of 17 (covering veids 17 to 24, which include veid 22 for PE1).
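The choice of label blocks can be reproduced with a short Python sketch. It assumes that, with a VBS of 8, blocks are aligned at offsets 1, 9, 17, 25, and so on; this alignment is inferred from the debug outputs in this chapter rather than from any SR-OS documentation, and both function names are hypothetical.

```python
def vbo_for(ve_id: int, vbs: int = 8) -> int:
    """VE Block Offset of the block containing ve_id.

    Assumption: blocks are aligned at 1, 9, 17, 25, ... as seen in Debug 3-3.
    """
    return ((ve_id - 1) // vbs) * vbs + 1


def blocks_to_advertise(own_ve_id: int, seen_ve_ids: set, vbs: int = 8) -> list:
    """VBOs a PE advertises: a block for its own veid plus blocks for every
    veid learned from received VPLS NLRI."""
    ve_ids = {own_ve_id} | set(seen_ve_ids)
    return sorted({vbo_for(v, vbs) for v in ve_ids})


# PE3 (veid 13) has seen veids 11 (PE2), 12 (PE4) and 22 (PE1).
print(blocks_to_advertise(13, {11, 12, 22}))  # [9, 17]
```

The same rule also reproduces PE2's two NLRIs (VBO 9 and 17) in Debug 3-6 and the VBO values of 17 and 25 used in the PE25 example earlier in the chapter.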
Note also that the Layer-2 Information Extended Community contains the Encapsulation Type, MTU, and Preference fields. It does not, however, contain any non-zero flags in the Control flags field. This becomes important when I discuss Multi-Homing in the next section.
Output 3-5 shows the service-level L2 route table at PE1, listing the VPLS NLRI received from PE3 and PE4 and the associated dynamically created SDP IDs. Output 3-6 shows the same SDP IDs within the VPLS service with the type “BGP”, together with the calculated (egress) demultiplexor labels.
Debug 3-3: VPLS NLRI with AD and Signaling
6 2013/06/04 08:49:11.57 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 105
Flag: 0x90 Type: 14 Len: 47 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.13
[VPLS] veid: 13, vbo: 9, vbs: 8, label-base: 262102, RD 64496:600
[VPLS] veid: 13, vbo: 17, vbs: 8, label-base: 262094, RD 64496:600
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:Encap=19: Flags=none: MTU=1514: PREF=0
"
Output 3-5: BGP-VPLS L2-Route-Table
*A:PE1# show service l2-route-table bgp-vpls
==================================================================
Services: L2 Bgp-Vpls Route Information - Summary
==================================================================
Svc Id L2-Routes (RD) Next Hop Ve-Id
Sdp Bind Id PW Temp Id
------------------------------------------------------------------
600 *64496:600 192.0.2.12 12
17405:4294967275 1
600 *64496:600 192.0.2.13 13
17404:4294967274 1
------------------------------------------------------------------
No. of L2 Bgp-Vpls Route Entries: 2
==================================================================
Output 3-6: BGP-VPLS Service-Level SDPs
*A:PE1# show service id 600 sdp
==========================================================================
Services: Service Destination Points
==========================================================================
SdpId Type Far End addr Adm Opr I.Lbl E.Lbl
--------------------------------------------------------------------------
17404:4294967274 Bgp* 192.0.2.13 Up Up 131026 262099
17405:4294967275 Bgp* 192.0.2.12 Up Up 131025 262074
--------------------------------------------------------------------------
Number of SDPs : 2
--------------------------------------------------------------------------
==========================================================================
Debug 3-4: MP_REACH NLRI with D-bit set
7 2013/06/04 08:50:08.58 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 105
Flag: 0x90 Type: 14 Len: 47 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.13
[VPLS] veid: 13, vbo: 9, vbs: 8, label-base: 262102, RD 64496:600
[VPLS] veid: 13, vbo: 17, vbs: 8, label-base: 262094, RD 64496:600
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:Encap=19: Flags=D: MTU=1514: PREF=0
"
If a PE router has a single attachment circuit (SAP) that transitions from operationally up to operationally down (or has multiple SAPs that all transition from up to down), it sources an MP_REACH VPLS NLRI with the D-bit set to 1 in the Layer-2 Information Extended Community attribute, as shown in Debug 3-4. This triggers the PE routers receiving this UPDATE message to flush any MAC addresses learned from the advertising PE router from their FDBs.
When a PE router no longer participates in the VPLS service, an MP_UNREACH_NLRI is advertised containing the VPLS NLRI. Other PEs participating in the VPLS receiving the UPDATE remove the originating PE from the VPLS and tear down any BGP-signaled SDPs.
Debug 3-5: MP_UNREACH_NLRI for BGP AD and Signaling
78 2013/05/22 15:59:42.09 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 45
Flag: 0x90 Type: 15 Len: 41 Multiprotocol Unreachable NLRI:
Address Family L2VPN
[VPLS] veid: 13, vbo: 9, vbs: 0, label-base: 0, RD 64496:600
[VPLS] veid: 13, vbo: 17, vbs: 0, label-base: 0, RD 64496:600
"
BGP Multi-Homing
When delivering business services it is often a requirement for a Service Provider to give redundant connectivity to one or more sites, often referred to as “multi-homing.” In the case of a VPLS service, the Service Provider has a requirement to ensure that this multi-homed redundant access provides a loop-free topology to protect both the customer network and the Service Provider network infrastructure. The VPLS multi-homing specification detailed in draft-ietf-l2vpn-vpls-multihoming provides a BGP-based multi-homing solution that is applicable to both BGP and LDP VPLS technologies, and in the case of LDP VPLS can be used without the use of the BGP Auto-Discovery solution. Using BGP to provide a loop-free multi-homing solution has significant benefits over other multi-homing technologies, such as Spanning Tree Protocol or Multi-Chassis (active/standby) Link Aggregation Group (LAG), because it is entirely transparent to the customer (there is no requirement for the CPE to run any control plane protocol), uses a standard and scalable protocol that is familiar to most if not all Service Providers, and in many cases just requires the addition of the L2-VPN Address Family if it is not already in use.
The BGP multi-homing specification describes procedures for electing a Designated Forwarder (DF) among the PEs that are multi-homed to a customer site. The DF is responsible for forwarding traffic from the VPLS service to the Attachment Circuit and vice versa, while the non-DF forwards no traffic, creating a loop-free topology in which only one attachment circuit is in forwarding state. The DF election process uses a Multi-Homed ID (MH-ID) to identify the attachment circuits that connect to the same customer site, so the same MH-ID must be assigned on all VPLS PEs that are multi-homed to that site. When two or more VPLS PEs advertise their NLRI with the same MH-ID, they are identified as candidates for DF selection.
The MH-ID is carried in the VE-ID field of the VPLS NLRI shown in Figure 3-1, and when it is present the VE Block Offset, VE Block Size, and Label Base fields are all set to 0. In addition, each of the PEs supporting the multi-homed site must be provisioned with a unique Route Distinguisher, which keeps VPLS advertisements from different VPLS PEs distinct even if they carry the same VE-ID. This is exactly the situation that arises with multi-homing, because every PE attached to the site advertises the same MH-ID in the VE-ID field.
Figure 3-1 BGP VPLS NLRI
Figure 3-2 Layer-2 Information Extended Community
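The NLRI layout in Figure 3-1, and its MH variant, can be sketched as a Python encoder. It assumes the RFC 4761 field order (2-octet length, 8-octet RD, 2-octet VE ID, VE Block Offset and VE Block Size, 3-octet Label Base), a type 0 Route Distinguisher, and that the 20-bit label occupies the high-order bits of the Label Base field; all function names are hypothetical. The attribute lengths in Debug 3-3 (47 octets for two NLRIs) and Debug 3-7 (28 octets for one MH NLRI) are consistent with a 19-octet NLRI.

```python
import struct


def rd_type0(asn: int, assigned: int) -> bytes:
    """Type 0 Route Distinguisher: 2-octet type (0), 2-octet ASN, 4-octet number."""
    return struct.pack("!HHI", 0, asn, assigned)


def vpls_nlri(asn: int, assigned: int, ve_id: int, vbo: int, vbs: int,
              label_base: int) -> bytes:
    """Encode one VPLS NLRI: length, RD, VE ID, VBO, VBS, Label Base."""
    # Assumption: 20-bit label in the high-order bits, low 4 bits left at zero.
    label_field = (label_base << 4).to_bytes(3, "big")
    body = rd_type0(asn, assigned) + struct.pack("!HHH", ve_id, vbo, vbs) + label_field
    return struct.pack("!H", len(body)) + body


def mh_nlri(asn: int, assigned: int, site_id: int) -> bytes:
    """MH NLRI: the MH-ID rides in the VE ID field; VBO, VBS and LB are 0."""
    return vpls_nlri(asn, assigned, site_id, 0, 0, 0)


MP_FIXED = 9  # AFI (2) + SAFI (1) + next-hop length (1) + IPv4 next hop (4) + reserved (1)

pe3 = [vpls_nlri(64496, 600, 13, 9, 8, 262102),
       vpls_nlri(64496, 600, 13, 17, 8, 262094)]
print(MP_FIXED + sum(len(n) for n in pe3))       # 47, as in Debug 3-3
print(MP_FIXED + len(mh_nlri(64496, 601, 600)))  # 28, as in Debug 3-7
```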
The DF election algorithm is a two-stage process, applicable to all VPLS NLRI (not just MH NLRI), that operates at the BGP and VPLS levels. The first stage groups relevant and comparable advertisements into buckets, while the second stage picks a single winner from each bucket by repeatedly applying a tie-breaking algorithm to pairs of advertisements from that bucket. The algorithm is fairly complex and, for conciseness, is not described here, but it is detailed in Section 3.3 of draft-ietf-l2vpn-vpls-multihoming. The result of the election is that any PE elected as non-DF for a given Site-ID places its attachment circuits in non-forwarding state. Because a VPLS builds its forwarding tables in the data plane, there is no explicit requirement to signal this non-forwarding state to the CPE, but SR-OS allows this state to be linked to a Y.1731 Maintenance Endpoint (MEP) on the attachment circuit, and either suspends the generation of Continuity Check Messages (CCMs) or sets the Interface-TLV to down toward the peer MEP.
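As a rough illustration only, the second stage can be approximated in Python by choosing one winner per Site-ID bucket. This is a simplified stand-in, not the draft's algorithm: the tie-breaker order (D bit clear, then higher VPLS preference, then lower PE address) is an assumption, chosen because it reproduces the outcomes later in this section, where PE2 (192.0.2.11) wins over PE1 (192.0.2.22) at equal preference and PE1 takes over once PE2 signals D. The MhAdvert type and elect_df function are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MhAdvert:
    """One multi-homing advertisement as seen by the electing PE."""
    pe: str          # advertising PE address
    site_id: int     # MH-ID carried in the VE-ID field
    d_bit: bool      # all attachment circuits for the site are down
    preference: int  # VPLS preference from the Layer-2 Info Extended Community


def elect_df(adverts: List[MhAdvert], site_id: int) -> Optional[str]:
    """Pick the DF for one site (bucket) using assumed tie-breakers:
    D bit clear first, then higher VPLS preference, then lower PE address."""
    bucket = [a for a in adverts if a.site_id == site_id]
    if not bucket:
        return None
    best = min(bucket, key=lambda a: (a.d_bit, -a.preference,
                                      ipaddress.IPv4Address(a.pe)))
    return best.pe


pe1 = MhAdvert("192.0.2.22", 600, d_bit=False, preference=0)
pe2 = MhAdvert("192.0.2.11", 600, d_bit=False, preference=0)
print(elect_df([pe1, pe2], 600))  # 192.0.2.11 (PE2)
pe2_down = MhAdvert("192.0.2.11", 600, d_bit=True, preference=0)
print(elect_df([pe1, pe2_down], 600))  # 192.0.2.22 (PE1)
```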
In addition to the MH-ID encoded in the VE-ID field of the MH NLRI, BGP multi-homing uses the D and F bits of the Layer-2 Information Extended Community to signal forwarding state. Their purpose was described earlier in this section, and examples of these bits in use are provided later in this section.
Once again, I use the topology shown in Figure 3-3 to demonstrate BGP multi-homing, where PE1 and PE2 support a dual-homed site consisting of CE1 and CE2. An example of the configuration applied at PE1 is given in Output 3-7; the only addition over the configuration used for BGP Auto-Discovery and Signaling is the site object. The site context provides the site-id representing the MH-ID. As previously described, the site-id must be the same on all PEs supporting a common multi-homed site. The site context also references the SAPs that will be placed in forwarding or non-forwarding state depending on the outcome of the DF election.
Figure 3-3 BGP-VPLS Topology
The boot-timer defines how long the system waits after a node reboot before executing the DF election algorithm. This delay is to allow for BGP sessions to be established and for NLRI to be exchanged so that the executing PE has received all relevant information before executing the algorithm. The site-activation-timer serves a similar function and is used to define a time period during which the system will keep any local sites in standby status after they have become operational. Like the boot-timer, the purpose is to allow some time to receive BGP UPDATEs from remote PEs and therefore make a more informed decision when executing the DF election algorithm. Note that the site “DUAL-HOME” in Output 3-7 is placed in a shutdown state at present. This is intentional so that you can distinguish between VPLS and MH NLRI as you move through this section.
Output 3-7: BGP Multi-Homing Configuration
service
    pw-template 1 create
        split-horizon-group "SHG"
    exit
    vpls 600 customer 20 create
        bgp
            route-distinguisher 64496:600
            route-target export target:64496:600 import target:64496:600
            pw-template-binding 1
            exit
        exit
        bgp-vpls
            max-ve-id 512
            ve-name "PE-1"
                ve-id 22
            exit
            no shutdown
        exit
        stp
            shutdown
        exit
        site "DUAL-HOME" create
            shutdown
            site-id 600
            sap 1/1/3:600.22
            boot-timer 360
            site-activation-timer 30
        exit
        sap 1/1/3:600.22 create
        exit
        no shutdown
Both PE1 and PE2 have the VPLS site context in an administratively shutdown state when the VPLS is enabled. Debug 3-6 shows the VPLS NLRI advertised by PE2 at this point. Note that in the NLRI field the VE Block Offset, VE Block Size, and Label Base all have non-zero values and that none of the flags in the flags field in the Layer-2 Information Extended Community are set.
Debug 3-6: PE2 VPLS NLRI
22 2013/06/04 12:50:34.59 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 105
Flag: 0x90 Type: 14 Len: 47 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.11
[VPLS] veid: 11, vbo: 9, vbs: 8, label-base: 262087, RD 64496:601
[VPLS] veid: 11, vbo: 17, vbs: 8, label-base: 262079, RD 64496:601
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.11
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:Encap=19: Flags=none: MTU=1514: PREF=0
"
The site context at PE1 and PE2 is subsequently enabled, and an MH NLRI is then sourced by both VPLS PEs. Debug 3-7 shows the MH NLRI sourced by PE2, where the VE Block Offset, VE Block Size, and Label Base fields are all set to 0 and the VE-ID field carries the MH-ID of 600 configured under the site context at PE1 and PE2. In addition, the flags field of the Layer-2 Information Extended Community shows DF (the F bit is set), meaning that PE2 is the elected DF.
Debug 3-7: MH NLRI Sourced by PE2
23 2013/06/04 12:52:07.59 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 86
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.11
[MH] site-id: 600, RD 64496:601
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.11
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:Encap=19: Flags=-DF: MTU=0: PREF=0
"
Using the service-level l2-route-table output shown in Output 3-8, you can verify the presence of BGP VPLS NLRI from all PE routers in the VPLS domain, bound to the associated BGP-signaled SDPs. Note also that PE2 (192.0.2.11) is the DF for Site-ID 600 (DF set), while PE1 (192.0.2.22) is the non-DF (DF clear). As previously discussed, this means that PE1 must place all attachment circuits associated with Site-ID 600 into non-forwarding state. This is shown in Output 3-9, where the SAP is operationally down with the flag StandByForMHProtocol.
Output 3-8: VPLS Domain L2-Route-Table at PE3
*A:PE3# show service id 600 l2-route-table multi-homing bgp-vpls
==========================================================================
Services: L2 Multi-Homing Route Information - Summary
==========================================================================
Svc Id L2-Routes (RD-Prefix) Next Hop SiteId State DF
--------------------------------------------------------------------------
600 64496:600 192.0.2.22 600 up(0) clear
600 64496:601 192.0.2.11 600 up(0) set
--------------------------------------------------------------------------
No. of L2 Multi-Homing Route Entries: 2
==========================================================================
==========================================================================
Services: L2 Bgp-Vpls Route Information - Service 600
==========================================================================
L2-Routes (RD) Next Hop Ve-Id Sdp Bind Id
--------------------------------------------------------------------------
*64496:601 192.0.2.11 11 17400:4294967267
*64496:600 192.0.2.12 12 17404:4294967274
*64496:600 192.0.2.22 22 17399:4294967266
--------------------------------------------------------------------------
No. of L2 Bgp-Vpls Route Entries: 3
==========================================================================
Output 3-9: PE1 Attachment Circuit in Non-Forwarding State
*A:PE1# show service id 600 sap 1/1/3:600.22
==================================================================
Service Access Points(SAP)
==================================================================
Service Id : 600
SAP : 1/1/3:600.22 Encap : qinq
QinQ Dot1p : Default
Description : (Not Specified)
Admin State : Up Oper State : Down
Flags : StandByForMHProtocol
Multi Svc Site : None
Last Status Change : 06/04/2013 12:53:42
Last Mgmt Change : 06/04/2013 07:14:10
==================================================================
To verify the switchover capabilities of the BGP multi-homing feature, you can simulate a failure of the attachment circuit at the current DF, PE2. Because reconvergence time largely depends on BGP UPDATE propagation time, SR-OS allows the optional rapid propagation of BGP UPDATEs (bypassing the conventional Minimum Route Advertisement Interval, or MRAI, timer) using the rapid-update l2-vpn command in the global BGP context.
Output 3-10: L2-VPN Rapid Update Feature
router
    bgp
        rapid-update l2-vpn
        group "IBGP"
            family l2vpn
            peer-as 64496
            neighbor …
When the attachment circuit at PE2 fails, PE2 sources the MP_REACH MH NLRI shown in Debug 3-8, with the D bit set in the flags field of the Layer-2 Information Extended Community and the F bit clear. To avoid black-hole conditions, remote BGP VPLS PEs must interpret an F (DF) bit transition from 1 to 0 as an implicit MAC “flush-all-from-me” indication. As a result, MAC addresses learned from the transitioning VPLS PE are flushed, and flooding and relearning take place until the MAC tables are repopulated.
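The flush rules described in this chapter (flush when the D bit is set, and flush when the F bit moves from 1 to 0) can be sketched as follows. This is a minimal model that assumes the flags have already been decoded and that the FDB is a simple MAC-to-PE mapping; L2InfoFlags, should_flush, and flush_from_pe are hypothetical names, and the MAC addresses are from the documentation range.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class L2InfoFlags:
    """The two control flags relevant to MAC flushing (illustrative model)."""
    d: bool = False  # D: all attachment circuits of the advertiser are down
    f: bool = False  # F: the advertiser is the designated forwarder


def should_flush(previous: Optional[L2InfoFlags], current: L2InfoFlags) -> bool:
    """Flush MACs learned from a PE when its D bit is set, or when its
    F bit transitions from 1 to 0 (DF to non-DF)."""
    lost_df = previous is not None and previous.f and not current.f
    return current.d or lost_df


def flush_from_pe(fdb: Dict[str, str], pe: str) -> Dict[str, str]:
    """Return a copy of the FDB (MAC -> learned-from PE) without entries from pe."""
    return {mac: src for mac, src in fdb.items() if src != pe}


# PE2 (192.0.2.11) moves from DF (F set) to D set with F clear, as in Debug 3-8.
fdb = {"00:00:5e:00:53:01": "192.0.2.11", "00:00:5e:00:53:02": "192.0.2.13"}
if should_flush(L2InfoFlags(f=True), L2InfoFlags(d=True)):
    fdb = flush_from_pe(fdb, "192.0.2.11")
print(fdb)  # {'00:00:5e:00:53:02': '192.0.2.13'}
```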
Debug 3-8: PE2 Attachment Circuit Failure
26 2013/06/04 13:59:23.62 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 86
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.11
[MH] site-id: 600, RD 64496:601
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.11
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:Encap=19: Flags=D: MTU=0: PREF=0
"
The BGP UPDATE from PE2 indicating a transition to non-DF (attachment circuit down) means that PE1, which is part of the same Site-ID, must transition to DF. As a result, PE1 sources an MH NLRI with the DF (F) bit set in the flags field of the Layer-2 Information Extended Community, as shown in Debug 3-9.
Debug 3-9: Transition to DF at PE1
27 2013/06/04 13:59:23.62 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 86
Flag: 0x90 Type: 14 Len: 28 Multiprotocol Reachable NLRI:
Address Family L2VPN
NextHop len 4 NextHop 192.0.2.22
[MH] site-id: 600, RD 64496:600
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.22
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 16 Extended Community:
target:64496:600
l2-vpn/vrf-imp:Encap=19: Flags=-DF: MTU=0: PREF=0
"
Note that when the attachment circuit at PE2 recovers, restoration is revertive (PE2 becomes DF again), and this behavior cannot be changed.
The topology used to illustrate BGP multi-homing has a single SAP at each of the multi-homed VPLS PEs, so the state of the Site-ID is a simple binary up/down condition. But what happens if, in our case, the following occurs:
· PE1 and PE2 have more than one SAP belonging to Site-ID 600, and
· At the DF, one of those SAPs fails, so that some of the SAPs of the Site-ID are operationally up and some are operationally down.
The action taken in this kind of scenario is controlled by the failed-threshold command in the site context, which specifies the number of objects (1 to 1000) that must be down for the site to be declared down.
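The effect of failed-threshold can be sketched as a simple counting rule in Python. This is an illustration under stated assumptions: site_is_down is a hypothetical function, and treating an unset threshold as "all objects must be down" is an assumption that matches the single-SAP behavior shown earlier in this section, not a documented SR-OS default.

```python
from typing import List, Optional


def site_is_down(objects_up: List[bool], failed_threshold: Optional[int] = None) -> bool:
    """Decide whether a multi-homing site is declared down (sketch).

    objects_up: operational state of each SAP or SDP binding in the site.
    failed_threshold: number of objects that must be down for the site to be
    declared down (1 to 1000). None stands in for an assumed default of
    "all objects down".
    """
    down = sum(1 for up in objects_up if not up)
    if failed_threshold is None:
        return down == len(objects_up)
    if not 1 <= failed_threshold <= 1000:
        raise ValueError("failed-threshold must be between 1 and 1000")
    return down >= failed_threshold


# Two SAPs in Site-ID 600 at the DF, one of which has failed:
print(site_is_down([True, False]))                      # False: site stays up
print(site_is_down([True, False], failed_threshold=1))  # True: site declared down
```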
The use of BGP multi-homing detailed in this section used redundantly connected SAPs to illustrate the concept of Site-ID and DF election to provide a redundant loop-free topology. It is, however, equally possible to use SDPs within the context of a site, meaning that BGP multi-homing can also be used in a Hierarchical VPLS topology as shown in Figure 3-4. An example of this is to interconnect fully-meshed metro VPLS domains through a core VPLS domain where each metro VPLS domain is represented as a Multi-Homed site to the core, thus creating a loop-free topology.
Figure 3-4 Hierarchical VPLS with BGP Multi-Homing