Versatile Routing and Services with BGP: Understanding and Implementing BGP in SR-OS (2014)
Chapter 7. Multicast
Delivery of multicast traffic from one or more sources to potentially many receivers requires that the correct multicast forwarding state is established on all routers in the path from source to receiver. This includes performing packet replication where necessary (for example, receive traffic for group (S1, G1) on interface i, and forward that traffic on interfaces i1 and i2). Primarily, Protocol Independent Multicast (PIM) has been used to create this multicast state, which, unlike some of its lesser-deployed predecessors, is abstracted from the underlying protocols used to exchange unicast reachability. However, PIM needs that unicast reachability information to determine the Reverse Path Forwarding (RPF) interface toward the source (or next-hop toward the source), and this is where BGP first became important in multicast environments. In intra-AS environments this information could be obtained from the IGP, but in inter-AS environments this information was exchanged in BGP, and so the IPv4 Multicast Address Family was introduced.
As extensions for multicast in BGP-MPLS IP-VPNs were defined, originally using the draft-rosen architecture, BGP was again used as an Auto-Discovery mechanism for PIM neighbors of the same multicast domain. More recent developments in Multicast VPN technology have subsumed the draft-rosen architecture as a subset of its capabilities and have extended the role of BGP so that it can be used to create multicast forwarding state, thereby effectively replacing PIM in the core of the network.
Inter-Domain IPv4-IPv6 PIM
The use of PIM Sparse Mode (PIM-SM) and Any Source Multicast (ASM) presents some challenges when the sources and receivers are situated in different Autonomous Systems. Within a common AS, an active source registers to the Rendezvous Point (RP). When any interested receivers have joined to the same RP, PIM (S,G) state is created between the source and the RP, and traffic flows down the (*,G) shared tree from the RP toward the receiver. Assume, however, that the source for group G is situated in AS 64510 and an interested receiver for group G is situated in AS 64511. The source registers to the RP in AS 64510, and the interested receiver joins to the RP in AS 64511, but because the RP in AS 64511 isn't aware that the source in AS 64510 is sending to group G, the Any Source Multicast model is broken.
The Multicast Source Discovery Protocol (MSDP) was conceived to allow RPs to notify other RPs in different domains when a multicast source is active, using Source-Active (SA) messages. When an RP first learns of the presence of a source in its own AS, it encapsulates the first Register packet in an SA message and advertises it to each of its MSDP peers. The SA message identifies the source, the group to which the source is sending, and the RP's own address. The SA messages are flooded to other MSDP peers using peer-RPF Flooding, which uses the originating RP's address (in the incoming SA message) to determine whether an incoming SA message was received on the correct interface toward the originator. If this is the case, the SA message is then readvertised to each of the RP's MSDP peers.
When a receiving RP has interested receivers for the active source, it creates (S,G) state and joins directly toward the source. When multicast packets subsequently arrive from the source, they are flooded down the shared tree (RPT) by the receiving RP. When the last-hop router receives the first packet, it optionally can create (S,G) state toward the source. The originating RP thereafter periodically sends SA messages with a list of all the sources that are currently registered to the originating RP.
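The peer-RPF check applied to incoming SA messages can be pictured with a short Python sketch. This is a simplified model and not SR-OS code: the names SourceActive, process_sa, and rpf_peer_for are hypothetical, the map from originating RP to expected peer is assumed to be precomputed, and the sketch assumes an accepted SA is not sent back to the peer it arrived from. The full peer-RPF rules in the MSDP specification have more cases than are modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceActive:
    source: str      # S: address of the active source
    group: str       # G: group the source is sending to
    origin_rp: str   # address of the RP that originated the SA


def process_sa(sa, received_from, rpf_peer_for, peers):
    """Return the peers an SA is re-advertised to, or [] if it fails peer-RPF.

    rpf_peer_for is a hypothetical, precomputed map from an originating RP
    address to the MSDP peer that lies on the path back toward that RP.
    """
    if rpf_peer_for.get(sa.origin_rp) != received_from:
        return []  # arrived from the wrong peer: discard, do not flood
    # Assumption: the SA is not echoed back to the peer that sent it.
    return [peer for peer in peers if peer != received_from]


# An RP with two MSDP peers; 198.51.100.1 is a hypothetical second peer.
sa = SourceActive("10.1.1.10", "239.255.194.222", "192.0.2.23")
peers = ["192.0.2.23", "198.51.100.1"]
rpf_peer_for = {"192.0.2.23": "192.0.2.23"}

accepted = process_sa(sa, "192.0.2.23", rpf_peer_for, peers)
rejected = process_sa(sa, "198.51.100.1", rpf_peer_for, peers)
```

In this model the copy received from the originating RP is flooded onward to the other peer, while the copy of the same SA arriving from the other peer is discarded, which is what prevents SA messages from looping.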
The use of MSDP solves one of the problems of inter-domain multicast; it notifies other domains of an active source. However, another problem remains. Using the previous example, assume the RP in AS 64511 receives an SA message from the RP in AS 64510 indicating source S is active for group G. Next, assume the RP in AS 64511 has an interested receiver for group G so it needs to send an (S,G) join toward the source S. The first thing the RP needs to do is to determine the RPF interface toward source S. However, source S is not in the IGP of AS 64511 and so the RPF check fails. This is the purpose of Multicast IPv4 BGP (and Multicast IPv6 BGP). It is used to advertise IPv4 prefixes across AS boundaries for RPF resolution and is carried in Multi-Protocol BGP using AFI 1 SAFI 2 (or AFI 2 for IPv6).
To illustrate the use of MSDP and Multicast IPv4 I'll use the test topology illustrated in Figure 7-1. Routers R1 and R2 belong to AS 64496 and are peered with RR1 for Address Families IPv4 and Multicast IPv4 only. Router R1 has a single source connected at address 10.1.1.10 sending to group 239.255.194.222.
Figure 7-1 Inter-Domain IPv4 Multicast
Routers R3 and R4 belong to AS 64510 and are peered with RR2, again for Address Families IPv4 and Multicast IPv4 only. Both AS 64496 and AS 64510 run PIM-SM, and each has an RP that is peered in MSDP with the RP of the neighboring AS. Routers R2 and R3 are the peering routers between the ASs and peer with a single EBGP session supporting the Multicast-IPv4 Address Family only, through which the source (10.1.1.0/24) and RP (192.0.2.23) prefixes are advertised from R2 to R3 for RPF determination in AS 64510. The source (10.1.1.0/24) prefix is used for RPF lookup in the event of a PIM join toward the source, while the RP (192.0.2.23) prefix is used for RPF lookup of incoming MSDP Source-Active messages. The interconnect link is also configured for PIM to be able to pass multicast traffic between the domains.
Output 7-1 shows the configuration of RR1-RP1 within AS 64496. From a BGP perspective, there is very little to configure with the exception of the mcast-ipv4 Address Family. Note that if this Address Family is being added, it must be negotiated as a capability in an OPEN message and therefore triggers a NOTIFICATION/OPEN exchange with the associated peers. The msdp node provides the context to configure remote peers together with the local-address that will be used for each peering session.
Output 7-1: RR1-RP1 Configuration
router
    bgp
        group "CLIENTS"
            family ipv4 mcast-ipv4
            cluster 192.0.2.23
            peer-as 64496
            neighbor 192.0.2.21
            exit
            neighbor 192.0.2.22
            exit
        exit
        no shutdown
    msdp
        peer 192.0.2.12
            active-source-limit 512
            local-address 192.0.2.23
        exit
    exit
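To make the capability negotiation concrete, the following Python sketch builds the Multiprotocol Extensions capability that a BGP speaker places in its OPEN message for each address family it supports. It assumes the layout defined for Multi-Protocol BGP (capability code 1, length 4, then a 2-octet AFI, a reserved octet, and a 1-octet SAFI); the function and constant names are hypothetical and the sketch does not build the surrounding OPEN message.

```python
import struct

AFI_IPV4, AFI_IPV6 = 1, 2
SAFI_UNICAST, SAFI_MULTICAST = 1, 2


def mp_capability(afi, safi):
    """Encode one Multiprotocol Extensions capability (code 1, length 4)."""
    # Fields: capability code, capability length, AFI, reserved, SAFI.
    return struct.pack("!BBHBB", 1, 4, afi, 0, safi)


# The two families enabled by "family ipv4 mcast-ipv4".
ipv4_unicast = mp_capability(AFI_IPV4, SAFI_UNICAST)
ipv4_multicast = mp_capability(AFI_IPV4, SAFI_MULTICAST)
```

Because the set of capabilities is only exchanged in the OPEN message, adding a family to an established session means the session has to be re-opened, which is the NOTIFICATION/OPEN exchange described above.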
Recall that router R2 advertises the source subnet 10.1.1.0/24 and the RP address 192.0.2.23/32 to router R3 for RPF determination within AS 64510. To do that, R2 uses a conventional route policy like that shown in Output 7-2, but those prefixes must exist in the Multicast RIB. SR-OS separates the RIB and FIB for unicast and multicast to allow for incongruent routing of unicast and multicast traffic, and by default IGP routes populate the unicast routing table and not the multicast table. Because R2 knows the source subnet and RP address in the IGP, you can populate the multicast routing table with those routes using the multicast-import ipv4 command under the IGP node (in this case IS-IS) as shown in Output 7-3.
Output 7-2: Router R2 EBGP Route Policy
*A:R2# show router policy "RP-Source"
    entry 10
        from
            prefix-list "RP-Source"
        exit
        to
            protocol bgp
        exit
        action accept
        exit
    exit
    default-action reject
Output 7-3: Multicast-Import Configuration at R2
router
    isis
        multicast-import ipv4
        etc
When the advertised prefixes are learned by router R3 in AS 64510 in Multicast-IPv4 Address Family, they are automatically populated into the multicast routing table and propagated in IBGP to other BGP speakers in AS 64510 as shown in Output 7-4. When routers in AS 64510 execute a multicast RPF check, by default they look only in the unicast routing table. Because the AS 64496 source subnet and RP address are learned in Multicast-IPv4, they are not present in the unicast routing table. Therefore you must configure the routers in AS 64510 to execute multicast RPF checks using both the multicast and unicast routing tables with the rpf-table both command under the pim node. When the both keyword is used, SR-OS looks up the route in the multicast route table first. If the route is not present there, SR-OS looks up the route in the unicast route table. For completeness, the configuration is shown in Output 7-5.
Output 7-4: R3 Multicast Routing Table
*A:R3# show router route-table mcast-ipv4 protocol bgp
=================================================================
Multicast IPv4 Route Table (Router: Base)
=================================================================
Dest Prefix[Flags] Type Proto Age Pref
Next Hop[Interface Name] Metric
-----------------------------------------------------------------
10.1.1.0/24 Remote BGP 00h02m11s 170
192.168.148.49 0
192.0.2.23/32 Remote BGP 00h02m11s 170
192.168.148.49 0
-----------------------------------------------------------------
No. of Routes: 2
Flags: L = LFA nexthop available B = BGP backup route available
n = Number of times nexthop is repeated
=================================================================
Output 7-5: Multicast RPF Table Configuration in AS 64510
router
    pim
        rpf-table both
        etc
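The lookup order implied by rpf-table both can be sketched in a few lines of Python. This is an illustrative model only, not the SR-OS implementation: the function names and the mode labels "unicast" and "both" are hypothetical, the multicast table entries are taken from Output 7-4, and the single unicast entry (with next hop 198.51.100.1) is invented for the example.

```python
import ipaddress


def longest_match(table, address):
    """Return the next hop of the most specific prefix covering address."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]


def rpf_lookup(address, mcast_table, ucast_table, mode="unicast"):
    """Resolve an RPF next hop; with mode "both", try the multicast table first."""
    if mode == "both":
        next_hop = longest_match(mcast_table, address)
        if next_hop is not None:
            return ("multicast", next_hop)
    next_hop = longest_match(ucast_table, address)
    return None if next_hop is None else ("unicast", next_hop)


# Multicast table as in Output 7-4; the unicast entry is hypothetical.
mcast_table = {"10.1.1.0/24": "192.168.148.49", "192.0.2.23/32": "192.168.148.49"}
ucast_table = {"192.0.2.12/32": "198.51.100.1"}

default_result = rpf_lookup("10.1.1.10", mcast_table, ucast_table)
both_result = rpf_lookup("10.1.1.10", mcast_table, ucast_table, mode="both")
```

With the default behavior the lookup for the source 10.1.1.10 finds nothing, because the prefix exists only in the multicast table. With "both", the same lookup resolves through the Multicast-IPv4 route, while addresses known only to the IGP still resolve through the unicast table.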
With the required configurations in place, the receiver connected to R4 joins to group 239.255.194.222 and establishes (*,G) state at RP2. The source at 10.1.1.10 connected to R1 begins to send traffic to group address 239.255.194.222, which causes the first hop router R1 to register to RP1 in AS 64496. In turn, RP1 in AS 64496 sends an MSDP Source-Active message to RP2 in AS 64510. This is shown in Output 7-6.
Output 7-6: Source-Active Message sourced by RP1
*A:RP2# show router msdp source-active
======================================================================
MSDP Source Active Info
======================================================================
Grp Address Src Address Origin RP Peer Address State Timer
----------------------------------------------------------------------
239.255.194.222 10.1.1.10 192.0.2.23 192.0.2.23 58
----------------------------------------------------------------------
MSDP Source Active : 1
======================================================================
When RP2 receives the Source-Active message it recognizes that it has an interested receiver and therefore sends an (S,G) join to create a shortest-path tree between itself and the source. The RPF check for this join uses the multicast routing table because the source 10.1.1.0/24 is not present in the unicast routing table. When the (S,G) tree is established between RP2 in AS 64510 and R1 in AS 64496, traffic from source 10.1.1.10 flows along the (S,G) state from R1 to RP2, and then down the (*,G) tree from RP2 to R4. The result is that RP2 has two PIM states for group 239.255.194.222 as shown in Output 7-7.
Output 7-7: PIM State for Group 239.255.194.222 at RP2
*A:RP2# show router pim group 239.255.194.222
==========================================================
PIM Groups ipv4
==========================================================
Group Address Type Spt Bit Inc Intf No.Oifs
Source Address RP Inc Intf(S)
----------------------------------------------------------
239.255.194.222 (*,G) 1
* 192.0.2.12
239.255.194.222 (S,G) to-R3 0
10.1.1.10 192.0.2.12
----------------------------------------------------------
Groups : 2
==========================================================
When router R4 receives the first packet down the shared tree from RP2, it optionally can create (S,G) state directly between itself and the source (thus moving off the shared tree between itself and RP2). By default, SR-OS moves to the shortest path tree when the first packet is received on the shared tree, and as shown in Output 7-8 this is indeed what R4 does. Again, the RPF check for this (S,G) join uses the multicast routing table populated by the Multicast-IPv4 routes advertised by AS 64496.
Output 7-8: Router R4 PIM State for Group 239.255.194.222
*A:R4# show router pim group
==========================================================
PIM Groups ipv4
==========================================================
Group Address Type Spt Bit Inc Intf No.Oifs
Source Address RP Inc Intf(S)
----------------------------------------------------------
239.255.194.222 (*,G) to-RP2 1
* 192.0.2.12
239.255.194.222 (S,G) spt to-R3 1
10.1.1.10 192.0.2.12
----------------------------------------------------------
Groups : 2
==========================================================
Multicast in MPLS/BGP IP-VPNs
The original proposal for extending BGP/MPLS IP-VPNs to support IP multicast came from the draft-rosen-vpn-mcast framework. This became a subset of the wider-reaching Multicast-VPN specification (frequently referred to as Next-Generation Multicast-VPN) (RFC 6513), retiring the draft-rosen architecture to historic status (RFC 6037). Although draft-rosen is a subset of the Multicast-VPN specification, it's useful to be able to distinguish between them. I'll refer to the original draft-rosen specification simply as draft-rosen, and the remainder of Multicast in BGP/MPLS IP-VPNs work simply as Multicast-VPN or MVPN.
Draft-Rosen
The draft-rosen framework uses PIM to extend an (S,G) or (*,G) multicast distribution tree from a customer site, through the Service Provider network, to one or more customer sites within the same VPN. Each PE router runs an instance of PIM for each multicast-enabled VRF (or MVPN). In each of these MVPN instances, the PE router maintains a PIM adjacency with connected CE routers and maintains separate MVPN multicast routing tables. Entities contained within these MVPNs are generically referred to as “C-instance.” For example, multicast state is referred to as C-multicast and represented as (C-*,C-G) or (C-S,C-G). The PE router also runs a global instance of PIM known as the Service Provider instance or “P-instance,” with which it forms adjacencies with each of its IGP neighbors such as P routers and/or other PE routers.
Each MVPN is assigned to a multicast domain that defines a set of PE routers supporting a VPN that are able to send multicast traffic to each other. Each multicast domain is configured with a multicast group address belonging to the Service Provider P-instance, which is used to create multicast tunnels between PE routers forming part of the MVPN. The encapsulation technique for these multicast tunnels is GRE; the source address is the PE system address and the destination address is the P-instance multicast domain group address. Customer control plane (PIM) and data-plane traffic is subsequently encapsulated within these multicast tunnels so that it remains transparent to the Service Provider core.
So, with PIM in the C-instance and PIM in the P-instance, where did BGP become useful in the draft-rosen architecture? The answer is for Auto-Discovery of PE routers forming part of the same multicast domain (MVPN). The PE routers belonging to the same multicast domain discover each other using an NLRI known as the Multicast Distribution Tree SAFI (MDT SAFI) (RFC 6037), which utilizes AFI 1 SAFI 66. The RD:IPv4 Address field contains the system address of the advertising PE router, while the Group Address field contains the P-instance group address assigned to this multicast domain. The MDT-SAFI NLRI is sent in a BGP UPDATE message together with a Route-Target Extended Community attribute (described in Chapter 2) used to define the members of the MVPN.
Figure 7-2 MDT-SAFI NLRI
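The fields of the MDT-SAFI NLRI can be illustrated with a small Python encoder and decoder. This is a sketch under stated assumptions: it handles only the 16-octet body (8-octet RD, 4-octet PE address, 4-octet group address), it supports only a Type 0 RD (2-octet AS number plus 4-octet assigned number), it does not model how the NLRI is framed inside the MP_REACH_NLRI attribute, and the function names are hypothetical. The sample values are those shown in Debug 7-1.

```python
import socket
import struct


def encode_mdt_nlri(rd_asn, rd_number, pe_address, group):
    """Pack RD (Type 0) + PE IPv4 address + P-instance group address."""
    rd = struct.pack("!HHI", 0, rd_asn, rd_number)
    return rd + socket.inet_aton(pe_address) + socket.inet_aton(group)


def decode_mdt_nlri(data):
    """Unpack the 16-octet body produced by encode_mdt_nlri."""
    if len(data) != 16:
        raise ValueError("expected 16 octets")
    rd_type, asn, number = struct.unpack("!HHI", data[:8])
    if rd_type != 0:
        raise ValueError("only Type 0 RDs are handled in this sketch")
    return {
        "rd": f"{asn}:{number}",
        "pe": socket.inet_ntoa(data[8:12]),
        "group": socket.inet_ntoa(data[12:16]),
    }


body = encode_mdt_nlri(64496, 202, "192.0.2.13", "239.255.1.1")
fields = decode_mdt_nlri(body)
```

The decoded fields give a receiving PE everything it needs for an SSM join in the P-instance: the PE address is the source and the group address is the Default-MDT group.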
Figure 7-3 Draft-Rosen MVPN Architecture
Upon receiving a BGP UPDATE with the MDT-SAFI NLRI and relevant Route Target values, PE routers in the multicast domain establish multicast tunnels to create a Multicast Distribution Tree (MDT) by sending PIM joins to the multicast domain P-instance address. In the case of using PIM-SM in the P-instance, the PIM joins are sent toward the Rendezvous Point (RP) to create a (*,G) shared tree. In the case of PIM-SSM in the P-instance, the MDT-SAFI NLRI provides the receiving PE with the source address to which it should join, and so the PIM join is directed toward the source to create (S,G) multicast state in the core where each advertising PE router is the root of the tree and all of the receiving PE routers are leaves of the tree.
The MDT effectively creates a broadcast domain where all traffic forwarded onto the MDT by a given PE router is seen by every other PE router in the domain and is referred to as the Default-MDT. The Default-MDT is useful for deployments of multicast, such as broadcast TV, where the intention is that all PE routers receive all content. It is inefficient, however, when some PE routers have no interested receivers: those PEs still receive the traffic and silently discard it, which wastes bandwidth in the core of the Service Provider network. The draft-rosen framework and subsequent Multicast in BGP/MPLS IP-VPNs specification both detail how “data” distribution trees can be created to increase optimality. While PIM C-Join/Prune messages are always passed over the Default-MDT, separate Data-MDTs can be created to pass traffic only to PE routers that have interested receivers. Therefore, every PE in the multicast domain joins the Default-MDT, but a PE does not join a non-default distribution tree unless it is connected to an MVPN site that explicitly needs to receive traffic from a group that has been assigned to that tree. Within the draft-rosen framework, the method for signaling these Data-MDTs is through extensions to PIM.
To illustrate the use of BGP and the MDT-SAFI for draft-rosen Multicast-VPN Auto-Discovery I'll use the topology shown in Figure 7-4. PE1, PE2, and PE4 form part of the MVPN multicast domain and all have connected CE routers running PIM. The P-instance uses PIM-SSM, while the C-instance runs both PIM-SSM and PIM-SM with an RP (Bootstrap Router) running on CE1. All the PE routers are IBGP peered with a Route-Reflector RR1 and are configured to support the MDT-SAFI Address Family. As with any Address Family supported in Multi-Protocol BGP, its use is negotiated as a capability during the OPEN exchange; therefore the addition of MDT-SAFI causes the router to send a NOTIFICATION message to its peer, followed by an OPEN message containing the new capability.
Figure 7-4 Draft-Rosen MVPN Topology
Output 7-9 shows the configuration requirements for the addition of draft-rosen Multicast VPN to a unicast VPN. The unicast parameters are not explained here because they are described in Chapter 2. The Multicast VPN parameters are all configured under the mvpn node, but before configuring any parameters within this context PIM must be administratively enabled. In the example, PIM is running on the PE to CE interface toward CE1, but even if PIM isn't actually required toward a multicast receiver (that is, the receiver is directly connected and only IGMP is required), PIM still must be administratively enabled. The configuration syntax within the mvpn node is exactly the same for draft-rosen and next-generation Multicast-VPNs but uses the terminology from the latter. The first parameter is the auto-discovery command which, for draft-rosen, is set to mdt-safi. An alternative, default option is to use the MVPN-IPv4 Address Family defined in Multicast VPN for Auto-Discovery, but MDT-SAFI predated this NLRI and is historically used for draft-rosen.
The provider-tunnel provides the context for configuring the Default-MDT using the inclusive keyword. Within the inclusive node, the example shows that PIM SSM is in use and the P-instance group address for the Default-MDT is 239.255.1.1. The provider-tunnel also provides the context for configuring one or more Data-MDTs using the selective keyword. Data-MDTs provide the capability to optimize bandwidth utilization by sending traffic only to PE routers that have interested receivers, but have the disadvantage that they create more state in the provider backbone. The data-threshold command therefore provides the means to restrict the creation of Data-MDTs to certain C-groups, and only once they have exceeded a certain bandwidth threshold. In Output 7-9 the C-group range is unconstrained (224.0.0.0/4) and the bandwidth threshold is set at 100 kb/s. The system monitors C-groups with (S,G) state. When a connected source has crossed this threshold, the PE router signals all other PEs through the Default-MDT to indicate that a Data-MDT is being created for this C-group. This Data-MDT consumes another P-instance group, so to constrain how much state a given Multicast VPN can consume, the selective context allows for a prefix/mask to be defined. In this case the pim-ssm 239.255.16.0/24 command allows creation of up to 256 Data-MDTs.
Output 7-9: PE1 MVPN Configuration
service
    vprn 202 customer 1 create
        vrf-target target:64496:202
        autonomous-system 64496
        route-distinguisher 64496:202
        auto-bind ldp
        interface "PE-to-CE" create
            address 192.168.0.1/30
            sap 1/1/3:20.298 create
            exit
        exit
        pim
            interface "PE-to-CE"
            exit
            no shutdown
        exit
        mvpn
            auto-discovery mdt-safi
            provider-tunnel
                inclusive
                    pim ssm 239.255.1.1
                    exit
                exit
                selective
                    data-threshold 224.0.0.0/4 100
                    pim-ssm 239.255.16.0/24
                exit
            exit
        exit
        no shutdown
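The interaction between the data-threshold and the selective group range can be sketched in Python as follows. This is a toy model rather than a description of SR-OS internals: the class and method names are hypothetical, groups are assumed to be handed out in ascending order, a flow is assumed to stay on the Default-MDT when the pool is exhausted, and the C-source address used in the example is invented.

```python
import ipaddress


class DataMdtAllocator:
    """Assign a P-instance group to each C-(S,G) that exceeds a rate threshold."""

    def __init__(self, pool, threshold_kbps):
        # Every address in the prefix is a candidate Data-MDT group.
        self.free = [str(a) for a in ipaddress.ip_network(pool)]
        self.threshold_kbps = threshold_kbps
        self.assigned = {}

    def observe(self, c_source, c_group, rate_kbps):
        """Return the Data-MDT group for this flow, or None for the Default-MDT."""
        key = (c_source, c_group)
        if key in self.assigned:
            return self.assigned[key]
        if rate_kbps <= self.threshold_kbps or not self.free:
            return None  # below threshold, or pool exhausted (assumed behavior)
        self.assigned[key] = self.free.pop(0)
        return self.assigned[key]


allocator = DataMdtAllocator("239.255.16.0/24", threshold_kbps=100)
pool_size = len(allocator.free)

# Hypothetical C-source 172.31.101.10 sending to C-group 239.255.174.1.
below = allocator.observe("172.31.101.10", "239.255.174.1", rate_kbps=50)
above = allocator.observe("172.31.101.10", "239.255.174.1", rate_kbps=150)
```

The /24 prefix yields 256 candidate groups, which is where the limit of 256 Data-MDTs comes from. Each group consumed represents additional (S,G) state in the provider core, so the size of the prefix is effectively the upper bound on that state for this MVPN.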
When the MVPN configuration is complete and the service has been enabled, the process of MVPN Auto-Discovery starts in addition to conventional unicast prefix distribution. Debug 7-1 shows the MDT-SAFI NLRI advertised by PE4 as received at PE1. Note that the NLRI shows the configured RD of the VPRN and the IPv4 system address together with the Default-MDT configured within the mvpn inclusive node.
Debug 7-1: MDT-SAFI Sourced from PE4
4 2013/06/07 15:04:59.85 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 65
Flag: 0x90 Type: 14 Len: 26 Multiprotocol Reachable NLRI:
Address Family MDT-SAFI
NextHop len 4 NextHop 192.0.2.13
[MDT-SAFI] Addr 192.0.2.13, Group 239.255.1.1, RD 64496:202
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23"
Because PE1 now has the source (192.0.2.13) and group address (239.255.1.1), it joins directly to that source-specific group so that PE4 is the root and PE1 and PE2 are leaves of the multicast tree. Equally, PE1 and PE2 become the root of their own source-specific trees. Once the multicast state is created, each PE router sends PIM Hellos into the Default-MDT to discover and maintain PIM adjacencies. In the case of PE1 in Output 7-10, there are three PIM neighbors: one to CE1 and two through the Default-MDT to PE2 and PE4.
Although I don't intend to show the complete functionality and operation of draft-rosen multicast VPN, I'll take a moment to illustrate the basic ability to join a multicast group within the C-instance. To that end, the receiver behind CE4 joins to the C-group 239.255.174.1. Recall that the C-instance is running PIM-SM and that the RP is hosted on CE1 at 172.31.101.1. Output 7-11 shows the PIM group state for group 239.255.174.1 at CE4 with a Rendezvous Point tree hosted at 172.31.101.1 (CE1) showing that multicast RPT state is correctly established.
Output 7-10: PE1 MDT PIM Neighbors
*A:PE1# show router 202 pim neighbor
========================================================================
PIM Neighbor ipv4
========================================================================
Interface Nbr DR Prty Up Time Expiry Time Hold Time
Nbr Address
------------------------------------------------------------------------
PE-to-CE 1 4d 03:57:13 0d 00:01:17 105
192.168.0.2
202-mt-239.255.1.1 1 3d 22:09:56 0d 00:01:18 105
192.0.2.13
202-mt-239.255.1.1 1 4d 03:25:56 0d 00:01:19 105
192.0.2.21
------------------------------------------------------------------------
Neighbors : 3
========================================================================
Output 7-11: CE4 PIM Group State
*A:CE4# show router 298 pim group 239.255.174.1 detail
===============================================================
PIM Source Group ipv4
===============================================================
Group Address : 239.255.174.1
Source Address : *
RP Address : 172.31.101.1
Advt Router : 192.168.0.5
Flags : Type : (*,G)
MRIB Next Hop : 192.168.0.5
MRIB Src Flags : remote
Keepalive Timer : Not Running
Up Time : 0d 00:21:06
Resolved By : rtable-u
Up JP State : Joined
Up JP Expiry : 0d 00:00:53
Up JP Rpt : Not Joined StarG
Up JP Rpt Override : 0d 00:00:00
Rpf Neighbor : 192.168.0.5
Incoming Intf : CE-to-PE
Outgoing Intf List: loopback
Curr Fwding Rate : 0.0 kbps
Forwarded Packets : 0 Discarded Packets : 0
Forwarded Octets : 0 RPF Mismatches : 0
Spt threshold : 0 kbps ECMP opt threshold : 7
Admin bandwidth : 1 kbps
---------------------------------------------------------------
Groups : 1
===============================================================
Inter-AS Draft-Rosen
Before moving on from draft-rosen multicast VPN, consider the use case of extending a multicast VPN across an Autonomous System boundary. To extend the multicast VPN through this interconnect, two extensions are required. In Figure 7-5, two Autonomous Systems are interconnected through the use of a Type B Interconnect (RFC 4364, Section 10). I'll use this as a reference model to describe those two extensions.
Figure 7-5 Inter-AS Draft-Rosen Topology
(i) Connector Attribute
As previously described, the Default-MDT is a PIM-enabled interface. When a PE needs to send a C-multicast PIM join through the MDT to an Upstream Multicast Hop (UMH), it must determine the Reverse Path Forwarding (RPF) interface toward the particular C-address. The PE looks up the C-address in the VRF; if the C-address is learned through MP-BGP, its Next-Hop address is one of the PEs that is a PIM adjacency over the Default-MDT. That given multicast tunnel is the RPF interface. However, when VPN-IPv4 prefixes are advertised through a Type B interconnect, both ASBRs impose themselves as Next-Hops on the UPDATE message but do not actually participate in the VPN or Multicast VPN. As a result, the Next-Hop attribute cannot be used to correlate to a PIM adjacency on the MDT, and the RPF interface toward the C-source cannot be determined.
The BGP Connector attribute solves this problem. The connector attribute is an optional transitive attribute that is carried with VPN-IPv4 prefixes in a draft-rosen Multicast VPN and contains the originating PE system address (the same address that is used to forward multicast packets onto the MDT). Therefore, when a PE looks up a C-address in the VRF for RPF determination, it uses the Connector attribute instead of the Next-Hop attribute.
In SR-OS, VPN-IPv4 prefixes are advertised with the Connector attribute present when auto-discovery has been configured for mdt-safi within the mvpn context of the VPRN service.
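The effect of the Connector attribute on RPF determination can be shown with a short Python sketch. It is a simplified model with hypothetical names (mdt_rpf_neighbor and the route dictionary keys), not SR-OS code. The addresses are those that appear in Output 7-13: the Next-Hop is ASBR1 (192.0.2.22) and the Connector originator is the remote PE (192.0.2.11).

```python
def mdt_rpf_neighbor(route, mdt_neighbors):
    """Pick the PIM neighbor on the Default-MDT that is upstream for a C-address.

    route is a hypothetical dict with a "next_hop" and an optional "connector".
    The Connector originator, when present, is preferred over the Next-Hop.
    """
    upstream = route.get("connector") or route["next_hop"]
    return upstream if upstream in mdt_neighbors else None


# PIM adjacencies formed over the Default-MDT: only PEs, never ASBRs.
mdt_neighbors = {"192.0.2.11"}

without_connector = mdt_rpf_neighbor({"next_hop": "192.0.2.22"}, mdt_neighbors)
with_connector = mdt_rpf_neighbor(
    {"next_hop": "192.0.2.22", "connector": "192.0.2.11"}, mdt_neighbors
)
```

Without the Connector attribute the lookup yields no neighbor, because the ASBR that rewrote the Next-Hop is not a PIM adjacency on the MDT. With it, the originating PE is found and the multicast tunnel toward that PE becomes the RPF interface.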
(ii) PIM RPF Vector
In Figure 7-5, PE1 signals an UPDATE message with an MDT-SAFI NLRI containing its system address toward PE2 as part of the Auto-Discovery process. As a result, PE2 sends a P-instance PIM join toward that system address to create hop-by-hop PIM state for the multicast distribution tree. If you assume that PE2's IGP next-hop is a P router, that router receives a PIM join destined toward PE1, but PE1's IP address isn't known to this router because it is not known within the IGP of AS 64510. Therefore the P router cannot forward the PIM join and it is dropped.
The RPF Vector TLV is an extension to PIM that specifies the IP address of the ASBR on the path to the root of the multicast distribution tree. When a PE router sends a PIM join message into the core, it includes the PIM RPF Vector within the PIM Join Attribute. Each core router thereafter does its RPF check on the address contained in the RPF Vector TLV and propagates the join toward the specified ASBR (Vector) to create the multicast distribution tree.
PIM RPF Vector must be enabled on all PE routers, P routers, and ASBRs that will be part of the multicast distribution tree. It is configured using the rpfv mvpn command under the global PIM node.
Output 7-12: RPF Vector Configuration
router
    pim
        interface "system"
        exit
        interface "to-Core"
        exit
        rpfv mvpn
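How a core router uses the RPF Vector can be sketched in Python. This is a minimal model with hypothetical function names, and it omits the case where the router itself owns the vector address. The example assumes a router in AS 64496 whose IGP knows ASBR1 (192.0.2.22) but not the remote PE (192.0.2.11); the next hop 192.0.2.130 is borrowed from Output 7-14 for illustration.

```python
import ipaddress


def rpf_target(source, rpf_vector=None):
    """Address used for the RPF lookup: the vector if the join carries one."""
    return rpf_vector if rpf_vector is not None else source


def resolve_upstream(igp_routes, source, rpf_vector=None):
    """Longest-prefix match of the RPF target against the IGP routes."""
    target = ipaddress.ip_address(rpf_target(source, rpf_vector))
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in igp_routes.items()
        if target in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None  # no route: the join cannot be propagated
    return max(matches, key=lambda match: match[0].prefixlen)[1]


igp_routes = {"192.0.2.22/32": "192.0.2.130"}

plain_join = resolve_upstream(igp_routes, "192.0.2.11")
vector_join = resolve_upstream(igp_routes, "192.0.2.11", rpf_vector="192.0.2.22")
```

A join for the remote source on its own cannot be resolved, whereas the same join carrying the vector is forwarded toward the ASBR, and the (S,G) state is still created for the real source address.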
Again using Figure 7-5 as reference, PE2 distributes the VPN prefix 172.31.102.0/24 into VPN-IPv4, and because auto-discovery mdt-safi is configured within the VPRN, the UPDATE message contains the Connector attribute. Output 7-13 shows the VPN-IPv4 prefix 172.31.102.0/24 sourced by PE2 as received at PE1. Note that the Next-Hop attribute is set to ASBR1 (192.0.2.22) and that the Connector attribute is present, consisting of the Route-Distinguisher and system address of the originating PE. PE1 subsequently uses this attribute for RPF interface determination.
The P-instance group address used to form the Default-MDT in Figure 7-5 is 239.255.1.1, which is an SSM group. As a result, when PE1 receives an MDT-SAFI NLRI from PE2, it sends a PIM join directly toward the source (192.0.2.11), which includes the RPF Vector TLV. This is shown in Output 7-14. The source is PE2 (192.0.2.11), and the Advertising Router is ASBR1 (192.0.2.22). The resulting RPF Vector is also set to ASBR1 (192.0.2.22).
Output 7-13: BGP Connector Attribute
*A:PE1# show router bgp routes vpn-ipv4 172.31.102.0/24 detail
==================================================================
BGP Router ID:192.0.2.21 AS:64496 Local AS:64496
==================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
Origin codes : i - IGP, e - EGP, ? - incomplete, > - best, b - backup
==================================================================
BGP VPN-IPv4 Routes
==================================================================
------------------------------------------------------------------
Original Attributes
Network : 172.31.102.0/24
Nexthop : 192.0.2.22
Route Dist. : 64510:100 VPN Label : 262139
Path Id : None
From : 192.0.2.23
Res. Nexthop : n/a
Local Pref. : 100 Interface Name : to-ASBR1
Aggregator AS : None Aggregator : None
Atomic Aggr. : Not Atomic MED : None
AIGP Metric : None
Connector : RD 64510:100, Originator 192.0.2.11
Community : target:64496:100
Cluster : 192.0.2.23
Originator Id : 192.0.2.22 Peer Router Id : 192.0.2.23
Fwd Class : None Priority : None
Flags : Used Valid Best IGP
Route Source : Internal
AS-Path : 64510
VPRN Imported : 100
....snip
------------------------------------------------------------------
Routes : 1
==================================================================
Output 7-14: PE1 PIM RPF Vector
*A:PE1# show router pim group 239.255.1.1 source 192.0.2.11 detail
======================================================================
PIM Source Group ipv4
======================================================================
Group Address : 239.255.1.1
Source Address : 192.0.2.11
RP Address : 0
Advt Router : 192.0.2.22
Upstream RPFV Nbr : 192.0.2.130
RPFV Type : Mvpn 64496:10 RPFV Proxy : 192.0.2.22
Flags : spt Type : (S,G)
MRIB Next Hop : 192.0.2.130
MRIB Src Flags : remote
Keepalive Timer Exp: 0d 00:03:08
Up Time : 0d 16:03:5 Resolved By : rtable-u
Up JP State : Joined Up JP Expiry : 0d 00:00:06
Up JP Rpt : Not Joined StarG Up JP Rpt Override : 0d 00:00:00
Register State : No Info
Reg From Anycast RP: No
Rpf Neighbor : 192.0.2.130
Incoming Intf : to-ASBR1
Outgoing Intf List : system
Curr Fwding Rate : 0.0 kbps
Forwarded Packets : 1924 Discarded Packets : 0
Forwarded Octets : 150072 RPF Mismatches : 0
Spt threshold : 0 kbps ECMP opt threshold : 7
Admin bandwidth : 1 kbps
----------------------------------------------------------------------
Groups : 1
======================================================================
The preceding example of a Type B Interconnect requires both the BGP Connector Attribute and the PIM RPF Vector TLV in order to build the multicast distribution tree. In a Type C Interconnect, the Next-Hop attribute of the MDT-SAFI route remains the originating PE rather than the ASBR, so the Connector attribute is superfluous. The PIM RPF Vector TLV must still be included in the PIM Join, however, and should be set to the Next-Hop of the labeled BGP route for the originating PE.
Multicast VPN
The architecture defined in RFC 6513 introduces the notion of a P-Multicast Service Interface (PMSI) to define the entity that connects a set of PE routers forming a Multicast VPN. The PMSI is considered an overlay on the P-network used for sending to all or some PEs in the MVPN. The PMSI can be constructed using a number of transport mechanisms such as PIM, mLDP, or RSVP-TE. There is, however, a very clear distinction between the multicast service defined by the PMSI and the transport mechanism used to instantiate that PMSI, referred to as P-tunnels. This allows the MVPN architecture to easily facilitate the use of various transport protocols.
An Inclusive PMSI (I-PMSI) is one that enables a packet sent onto a particular MVPN to be received by all other PEs attached to the same MVPN. An I-PMSI is analogous to the Default-MDT of the draft-rosen Multicast VPN. A notable difference between draft-rosen and MVPN, however, is that draft-rosen establishes MVPN-specific PIM adjacencies between PE routers forming that MVPN and uses those adjacencies to propagate C-multicast routing information from CE to CE using PIM. Because MVPN I-PMSIs can be constructed using transport protocols other than PIM, an option is provided to maintain a PIM-free provider core by implementing a PIM-BGP interworking function at the PE and using BGP to propagate C-multicast routing information between PE routers within the MVPN.
A Selective PMSI (S-PMSI) provides a mechanism where a packet sent onto the MVPN is received by a subset of the other PEs of that MVPN, and there may be multiple S-PMSIs per MVPN. Again, it is analogous to a Data-MDT of the draft-rosen multicast VPN.
The MVPN specification defines a new NLRI, the MCAST-VPN NLRI, carried in Multi-Protocol BGP using AFI 1 SAFI 5. The information carried in the MCAST-VPN NLRI can be broken into two categories: information used for Auto-Discovery (A-D), and information used for distribution of C-multicast routing information. The format of the MCAST-VPN NLRI is shown in Figure 7-6. Table 7.1 lists the seven possible Route-Types. This section describes their use.
Figure 7-6 MCAST-VPN NLRI Format
Table 7.1 MCAST-VPN NLRI Route Types
Route-Type | Category | Purpose
1 | A-D Route | Intra-AS I-PMSI A-D route
2 | A-D Route | Inter-AS I-PMSI A-D route
3 | A-D Route | S-PMSI A-D route
4 | A-D Route | Leaf A-D route
5 | A-D Route | Source-Active A-D route
6 | C-Multicast | Shared-Tree Join route
7 | C-Multicast | Source-Tree Join route
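To make the NLRI layout concrete, the following minimal Python sketch splits one MCAST-VPN NLRI into its Route-Type, Length, and type-specific fields. It assumes the layout of a 1-octet Route-Type followed by a 1-octet Length, and it decodes only the Intra-AS I-PMSI A-D route (an 8-octet RD followed by an IPv4 originator). The function names and the hand-built sample are illustrative and are not taken from SR-OS.

```python
import ipaddress
import struct

# Route-Type names as listed in Table 7.1.
ROUTE_TYPES = {
    1: "Intra-AS I-PMSI A-D",
    2: "Inter-AS I-PMSI A-D",
    3: "S-PMSI A-D",
    4: "Leaf A-D",
    5: "Source-Active A-D",
    6: "Shared-Tree Join",
    7: "Source-Tree Join",
}


def format_rd(rd: bytes) -> str:
    """Render an 8-octet Route Distinguisher (type 0 and type 1 only)."""
    rd_type = struct.unpack("!H", rd[:2])[0]
    if rd_type == 0:  # 2-octet AS number : 4-octet assigned number
        asn, number = struct.unpack("!HI", rd[2:8])
        return f"{asn}:{number}"
    if rd_type == 1:  # IPv4 address : 2-octet assigned number
        address, number = struct.unpack("!4sH", rd[2:8])
        return f"{ipaddress.IPv4Address(address)}:{number}"
    raise ValueError(f"unsupported RD type {rd_type}")


def parse_mcast_vpn_nlri(data: bytes) -> dict:
    """Split one NLRI into Route-Type, Length, and type-specific fields."""
    if len(data) < 2:
        raise ValueError("NLRI shorter than its 2-octet header")
    route_type, length = data[0], data[1]
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("truncated NLRI")
    route = {
        "type": route_type,
        "name": ROUTE_TYPES.get(route_type, "unknown"),
        "length": length,
    }
    if route_type == 1 and length == 12:
        # Intra-AS I-PMSI A-D: RD (8 octets) + originating router IPv4 address.
        route["rd"] = format_rd(body[:8])
        route["originator"] = str(ipaddress.IPv4Address(body[8:12]))
    else:
        route["raw"] = body  # other Route-Types are left undecoded in this sketch
    return route


# Hand-built sample: Route-Type 1, RD 64496:202, originator 192.0.2.13.
SAMPLE = (
    bytes([1, 12])
    + struct.pack("!HHI", 0, 64496, 202)
    + ipaddress.IPv4Address("192.0.2.13").packed
)

if __name__ == "__main__":
    print(parse_mcast_vpn_nlri(SAMPLE))
```

The 12-octet length in the sample is simply the 8-octet RD plus the 4-octet IPv4 originator.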
The test topology in Figure 7-7 will be used to illustrate the use of MVPN. PE1, PE2, and PE4 form part of the MVPN and use mLDP for I-PMSI and S-PMSI transport tunnels. All of the PE routers in the MVPN belong to a single AS peering in IBGP with RR1, and all IBGP sessions are configured to support the MVPN-IPv4 Address Family. Like every other Address Family, MVPN-IPv4 Address Family is negotiated as a capability during the OPEN message exchange. C-PIM is run between the PE routers and CE routers, but because the intention is to use mLDP as the PMSI transport, PIM is not enabled within the P-instance.
Figure 7-7 MVPN Test Topology
Output 7-15 shows an example configuration for MVPN, and again I don't cover unicast parameters in this chapter because they have been described in Chapter 2. The first notable parameter is the auto-discovery command, which for Multicast-VPN is set to default. This essentially means that an MCAST-VPN A-D route (Route Type 1 or Route Type 2) is used for auto-discovery. Because PIM is not used within the PMSI in this example, the c-mcast-signaling bgp command is configured to enable the use of MCAST-VPN C-Multicast routes (Route Type 6 or 7) for propagating C-Multicast signaling. Within SR-OS, propagation of C-Multicast signaling using BGP is the only option when mLDP or RSVP-TE is used for the I-PMSI transport tunnel. However, this does not completely eradicate the use of PIM in the provider core. For example, if PIM-SM is used in the C-Multicast instance and PIM Bootstrap messages are used to propagate Group-to-RP mappings, these are propagated natively through the I-PMSI.
The provider-tunnel command provides the context for configuring the I-PMSI using the inclusive keyword. Within the inclusive node, the example shows that mLDP is used as the transport protocol for the P-tunnel, and all that is required here is that mldp is placed in a no-shutdown state. The provider-tunnel context is also where one or more S-PMSIs are configured using the selective keyword, but note that SR-OS enforces the use of the same tunnel technology for both S-PMSI and I-PMSI. Therefore, mLDP is configured here. The data-threshold command defines which range of C-instance groups is allowed to trigger the creation of one or more S-PMSIs, as well as the bandwidth threshold that must be crossed. Finally, the vrf-target command allows configuration of a Route-Target value that may or may not be the same value as that used for unicast VPN membership. The Route-Target value is signaled as part of the Auto-Discovery process (detailed later in this section), and its function is exactly the same as for a unicast VPN; it allows each PE to discover the PEs that belong to a given MVPN. If any other PE has the advertised Route-Target value configured for import into a VRF, it treats the advertising PE as a member of the MVPN.
Output 7-15: PE1 MVPN Configuration
service
vprn 202 customer 1 create
vrf-target target:64496:202
autonomous-system 64496
route-distinguisher 64496:202
auto-bind ldp
interface "PE-to-CE" create
address 192.168.0.1/30
sap 1/1/3:20.298 create
exit
exit
pim
interface "PE-to-CE"
exit
no shutdown
exit
mvpn
auto-discovery default
c-mcast-signaling bgp
provider-tunnel
inclusive
mldp
no shutdown
exit
exit
selective
data-threshold 224.0.0.0/4 100
mldp
no shutdown
exit
exit
vrf-target target:64496:202
exit
exit
no shutdown
If an RSVP-TE P2MP LSP is used to construct the I-PMSI (and potentially one or more S-PMSIs), the configuration requirement is largely the same as mLDP with a couple of notable differences. Although the example continues to use mLDP as a transport tunnel, Output 7-16 shows the differences between mLDP and RSVP-TE. Within the inclusive and selective PMSI contexts, the rsvp keyword is used to indicate the use of RSVP-TE P2MP LSPs. Within the same context, an lsp-template is also referenced. The lsp-template is configured under the mpls context and simply defines the characteristics that should be used to signal the P2MP Source-to-Leaf (S2L) sub-LSPs. For P2MP LSPs the lsp-template must contain the creation-time attribute p2mp and must reference a default-path that defines the use of strict and/or loose hops or a dynamically computed path for the S2L sub-LSP. In this example the lsp-template references a default-path called “dynamic,” which has no strict/loose hops defined and therefore uses a dynamically computed path for each S2L sub-LSP.
Output 7-16: PE1 MVPN Configuration using RSVP-TE
router
mpls
path "dynamic"
no shutdown
exit
lsp-template "vrf202-p2mp" p2mp
default-path "dynamic"
cspf
no shutdown
exit
service
vprn 202 customer 1 create
mvpn
auto-discovery default
c-mcast-signaling bgp
provider-tunnel
inclusive
rsvp
lsp-template "vrf202-p2mp"
no shutdown
exit
exit
selective
rsvp
lsp-template "vrf202-p2mp"
no shutdown
exit
data-threshold 224.0.0.0/4 1
exit
exit
vrf-target target:64496:202
exit
exit
To commence Auto-Discovery/tunnel binding within a common AS, PE routers within an MVPN each source an Intra-AS I-PMSI A-D route (Route Type 1). Debug 7-2 shows the Intra-AS I-PMSI A-D route sourced by PE4. The route carries a single NLRI with the RD set to the RD of the VRF and the originator field set to the system address (in this case PE4, 192.0.2.13). The Extended Community Route Target value determines whether the received route is eligible for import to a given MVPN. The semantics of the use of Route Target are exactly the same as used for BGP-MPLS IP-VPN. To constrain distribution of intra-AS membership/binding information, the UPDATE message also carries the well-known community NO_EXPORT.
Because the purpose of the Intra-AS I-PMSI A-D route is to enable P-tunnel binding, the UPDATE must also carry a PMSI Tunnel attribute. The PMSI Tunnel attribute is an optional transitive attribute that instructs the receiving PE or PEs how to construct the P-tunnel. Figure 7-8 shows its format.
Figure 7-8 PMSI Tunnel Attribute
The L flag is currently the only flag specified in the Flags field, and is used to indicate “Leaf Information Required.” When this flag is set, the receiving PE must respond by generating a Leaf A-D Route (Route Type 4) indicating that it needs to join, or be joined to, the signaled PMSI Tunnel. This allows the originating PE to elicit the leaves of the tree and is useful when the PE wants to know who the receivers are before selecting a P-tunnel, referred to as Explicit Tracking. I'll discuss a potential use-case for this later in this section.
The MPLS Label field offers the ability to carry a label in the high-order 20 bits of this 3-octet field. When multiple MVPNs are aggregated onto a single P-multicast tree, some demultiplexing information is required to allow the egress PE to determine to which MVPN the packet belongs. This is the function of the upstream assigned MPLS label signaled in the PMSI Tunnel attribute. When the demultiplexing label is used, it is placed immediately beneath the P-multicast tree header. Aggregate trees are not supported in SR-OS. While the intention of aggregate trees is to reduce P-multicast tree state, they are useful only when there is a high congruency of customer sites between different MVPN instances.
The Tunnel Type field identifies the tunneling technology that will be used to establish the PMSI tunnel, and the Tunnel Identifier field carries the identifier specific to that technology. Table 7.2 lists the possible tunnel types and the tunnel identifier for each type. In the example of Debug 7-2, the tunnel type is P2MP mLDP (value 2). The identifier 0x2001, representing the P2MP FEC ID 8193, is locally assigned by PE4 to this LSP.
Table 7.2 PMSI Tunnel Identifiers
Value | Tunnel Type | Tunnel Identifier
0 | No Tunnel Information | Not present (used for Explicit Tracking)
1 | RSVP-TE P2MP LSP | <Tunnel ID, Extended Tunnel ID, P2MP ID>
2 | mLDP P2MP LSP | Opaque P2MP FEC element
3 | PIM-SSM Tree | <P-root Node Address, P-Multicast Group>
4 | PIM-SM Tree | <Sender Address, P-Multicast Group>
5 | BIDIR-PIM Tree | <Sender Address, P-Multicast Group>
6 | Ingress Replication | Unicast tunnel endpoint IP address
7 | mLDP MP2MP LSP | Opaque MP2MP FEC element
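As a rough illustration of the PMSI Tunnel attribute fields just described, the Python sketch below decodes the fixed 5-octet header (Flags, Tunnel Type, and the 3-octet MPLS Label field) and returns the Tunnel Identifier as raw bytes. The position of the L flag in the least-significant bit and the byte layout of the sample mLDP P2MP FEC element are my reading of the relevant specifications and should be treated as assumptions, as should all function names.

```python
import ipaddress
import struct

# Tunnel types as listed in Table 7.2.
TUNNEL_TYPES = {
    0: "No Tunnel Information",
    1: "RSVP-TE P2MP LSP",
    2: "mLDP P2MP LSP",
    3: "PIM-SSM Tree",
    4: "PIM-SM Tree",
    5: "BIDIR-PIM Tree",
    6: "Ingress Replication",
    7: "mLDP MP2MP LSP",
}

L_FLAG = 0x01  # assumed: "Leaf Information Required" is the low-order flag bit


def parse_pmsi_tunnel(value: bytes) -> dict:
    """Decode the fixed header of a PMSI Tunnel attribute value."""
    if len(value) < 5:
        raise ValueError("PMSI Tunnel attribute shorter than 5 octets")
    flags, tunnel_type = value[0], value[1]
    label_field = int.from_bytes(value[2:5], "big")
    return {
        "leaf_info_required": bool(flags & L_FLAG),
        "tunnel_type": tunnel_type,
        "tunnel_type_name": TUNNEL_TYPES.get(tunnel_type, "unknown"),
        "mpls_label": label_field >> 4,  # label sits in the high-order 20 bits
        "tunnel_identifier": value[5:],
    }


def mldp_p2mp_identifier(root: str, lsp_id: int) -> bytes:
    """Build an mLDP P2MP FEC element with a generic LSP identifier (assumed layout)."""
    root_bytes = ipaddress.IPv4Address(root).packed
    opaque = struct.pack("!BHI", 1, 4, lsp_id)  # opaque type 1, length 4, LSP ID
    return (
        struct.pack("!BHB", 0x06, 1, len(root_bytes))  # P2MP type, IPv4, addr length
        + root_bytes
        + struct.pack("!H", len(opaque))
        + opaque
    )


# Sample resembling the attribute in Debug 7-2: no flags, tunnel type 2, label 0.
SAMPLE = bytes([0x00, 2]) + (0).to_bytes(3, "big") + mldp_p2mp_identifier("192.0.2.13", 0x2001)

if __name__ == "__main__":
    print(parse_pmsi_tunnel(SAMPLE))
```

Under these assumptions the sample is 22 octets long, which is consistent with the PMSI attribute length reported in Debug 7-2: 5 octets of header plus a 17-octet identifier.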
Debug 7-2: A-D route with PMSI Tunnel Attribute
12 2013/06/12 11:13:26.04 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 105
Flag: 0x90 Type: 14 Len: 23 Multiprotocol Reachable NLRI:
Address Family MVPN_IPV4
NextHop len 4 NextHop 192.0.2.13
Type: Intra-AD Len: 12 RD: 64496:202 Orig: 192.0.2.13
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 8 Len: 4 Community:
no-export
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 8 Extended Community:
target:64496:202
Flag: 0xc0 Type: 22 Len: 22 PMSI:
Tunnel-type LDP P2MP LSP (2)
Flags [Leaf not required]
MPLS Label 0
Root-Node 192.0.2.13, LSP-ID 0x2001 "
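The lengths reported in Debug 7-2 can be cross-checked with a little arithmetic. The sketch below assumes the standard BGP path attribute header (1 octet of flags, 1 octet of type code, and a 1-octet length that becomes 2 octets when the Extended Length flag 0x10 is set) and sums the encoded size of each attribute listed in the debug output. The variable names are illustrative.

```python
# (flags, type code, value length) for each path attribute in Debug 7-2.
ATTRIBUTES = [
    (0x90, 14, 23),  # MP_REACH_NLRI
    (0x40, 1, 1),    # Origin
    (0x40, 2, 0),    # AS Path
    (0x80, 4, 4),    # MED
    (0x40, 5, 4),    # Local Preference
    (0xC0, 8, 4),    # Community
    (0x80, 9, 4),    # Originator ID
    (0x80, 10, 4),   # Cluster ID
    (0xC0, 16, 8),   # Extended Community
    (0xC0, 22, 22),  # PMSI Tunnel
]

EXTENDED_LENGTH = 0x10


def encoded_size(flags: int, value_length: int) -> int:
    """Size of one attribute on the wire: header plus value."""
    header = 4 if flags & EXTENDED_LENGTH else 3
    return header + value_length


total_path_attr_length = sum(encoded_size(f, length) for f, _, length in ATTRIBUTES)

# MP_REACH_NLRI value: AFI (2) + SAFI (1) + next-hop length (1) + next-hop (4)
# + reserved (1), followed by the NLRI: Route-Type (1) + Length (1) + 12 octets.
mp_reach_value_length = (2 + 1 + 1 + 4 + 1) + (1 + 1 + 12)

if __name__ == "__main__":
    print(total_path_attr_length, mp_reach_value_length)
```

Both results agree with the values shown in the debug output (105 and 23 respectively).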
You can verify the presence of Intra-AS I-PMSI A-D routes in the RIB-IN from participating PEs as shown in Output 7-17.
Output 7-17: Intra-AS I-PMSI routes at PE1
*A:PE1# show router bgp routes mvpn-ipv4 type intra-ad rd 64496:202
================================================================================
BGP Router ID:192.0.2.22 AS:64496 Local AS:64496
================================================================================
Legend -
Status codes : u - used, s - suppressed, h - history, d - decayed, * - valid
Origin codes : i - IGP, e - EGP, ? - incomplete, > - best, b - backup
================================================================================
BGP MVPN-IPv4 Routes
================================================================================
Flag RouteType OriginatorIP LocalPref MED
RD SourceAS Label
Nexthop SourceIP
As-Path GroupIP
--------------------------------------------------------------------------------
u*>i Intra-Ad 192.0.2.13 100 0
64496:202 - -
192.0.2.13 -
No As-Path -
u*>i Intra-Ad 192.0.2.21 100 0
64496:202 - -
192.0.2.21 -
No As-Path -
i Intra-Ad 192.0.2.22 100 0
64496:202 - -
192.0.2.22 -
No As-Path -
--------------------------------------------------------------------------------
Routes : 3
================================================================================
When the Intra-AS I-PMSI A-D route carries the PMSI tunnel attribute and the tunnel type is set to mLDP or PIM, the receiving PE joins the tree signaled in the PMSI tunnel attribute. With mLDP, it advertises a P2MP FEC for the advertised {Root Node, LSP ID} upstream toward the root; with PIM, it sends a PIM Join for the advertised P-multicast group. Either way, the receiver becomes a leaf of the tree, and the tree is said to be “receiver-driven.” If the tunnel type is set to RSVP-TE, each PE participating in the MVPN learns of all the leaf PEs through I-PMSI A-D routes and then signals P2MP S2L sub-LSPs to all of the leaves using RSVP-TE. Each PE subsequently adds or removes S2L sub-LSPs as PEs are added to the MVPN or removed from it. In this case the tree is said to be “source-driven.” Assuming a successful exchange of Intra-AS I-PMSI A-D routes takes place and LSPs are successfully set up, each PE forming the MVPN becomes the root of a P-multicast tree with all other participating PEs forming leaves of that tree.
Because P2MP LDP is used in the test topology, you can verify the presence of P2MP LSPs as shown at PE4 in Output 7-18. In this output at PE4, the first two entries show PE4 as the root of a P2MP tree with two sub-LSPs that egress on different interfaces towards different IGP next-hops. The last two entries show PE4 as the leaf of trees rooted at PE2 (192.0.2.21) and PE1 (192.0.2.22), respectively.
Output 7-18: PE4 P2MP LDP FECs
*A:PE4# show router ldp bindings fec-type p2mp
======================================================================
LDP LSR ID: 192.0.2.13
======================================================================
Legend: U - Label In Use, N - Label Not In Use, W - Label Withdrawn
WP - Label Withdraw Pending, BU - Alternate For Fast Re-Route
======================================================================
LDP Generic P2MP Bindings
======================================================================
P2MP-Id RootAddr
Interface Peer IngLbl EgrLbl EgrIntf/ EgrNextHop
LspId
----------------------------------------------------------------------
8193 192.0.2.13
73732 192.0.2.11 -- 262102 1/1/1:100 192.0.2.150
8193 192.0.2.13
73732 192.0.2.12 -- 262108 1/1/2:100 192.0.2.145
8193 192.0.2.21
73734 192.0.2.11 262109U -- -- --
8193 192.0.2.22
73733 192.0.2.11 262110U -- -- --
----------------------------------------------------------------------
No. of Generic P2MP Bindings: 4
======================================================================
Recall that in the draft-rosen implementation where PIM is used to construct the PMSI (MDT), VRF-specific PIM adjacencies are established across the P-multicast instance (see Output 7-10) to discover MVPN neighbors and maintain adjacencies. In the MVPN specification, when the PMSI is instantiated using non-PIM transport protocols such as mLDP or RSVP-TE, PIM is clearly not present. Moreover, even where PIM is used to instantiate the PMSI, the Auto-Discovery process is achieved using BGP, and so there is no requirement to send PIM Hellos to discover neighbors and maintain adjacencies. Output 7-19 shows this from PE1's perspective. The adjacency to CE1 (192.168.0.2) is PIM and has an expiration time suitable to that protocol, but the adjacencies to PE4 (192.0.2.13) and PE2 (192.0.2.21) have expiration times of never. This is because Auto-Discovery was facilitated by an MP_REACH_NLRI containing an Intra-AS I-PMSI A-D route, and if a PE is no longer available it simply withdraws that UPDATE by sending a corresponding MP_UNREACH_NLRI. There is no requirement for PIM to maintain the adjacency.
Output 7-19: MVPN PIM Neighbors at PE1
*A:PE1# show router 202 pim neighbor
======================================================================
PIM Neighbor ipv4
======================================================================
Interface Nbr DR Prty Up Time Expiry Time Hold Time
Nbr Address
----------------------------------------------------------------------
PE-to-CE 1 0d 21:48:54 0d 00:01:32 105
192.168.0.2
mpls-if-73731 1 0d 21:46:56 never 65535
192.0.2.13
mpls-if-73732 1 0d 21:46:30 never 65535
192.0.2.21
----------------------------------------------------------------------
Neighbors : 3
======================================================================
Next, consider how C-Multicast routing information is propagated through the MVPN. As shown in Figure 7-7, CE2 has a source connected at address 10.1.1.10 that is sending traffic to group address 239.255.194.222. At CE4 there is an interested receiver, so as the last hop router, CE4 sends a PIM (S,G) join toward the source. When this (S,G) join is received at PE4 it is not propagated any further toward the source using PIM, but is mapped into a C-Multicast Source-Tree Join route (Route Type 7) and forwarded in an MP_REACH_NLRI as shown in Debug 7-3. The Source-Tree Join contains the RD and Source AS, together with the C-Source and C-group address that is being joined.
Note that the “Target” Extended Community present in the Source-Tree Join is not the same as the Route Target used for unicast routing and/or MVPN Auto-Discovery (in this case 64496:202). So, how does the receiving PE know to which MVPN this C-Multicast UPDATE belongs? The MVPN specification introduces a new BGP Extended Community called the “VRF Route Import.” It is constructed as an IP address (system address) followed by a 2-octet integer and allows the community to uniquely identify a VRF on a given PE. The VRF Route Import Extended Community is appended to every unicast VPN-IP prefix sourced by PEs belonging to the VPN unless it is known that this VPN-IP address could never be a multicast source and/or RP. For example, the Source-Tree Join advertised by PE4 is destined toward a source 10.1.1.10 connected to CE2, which in turn is connected to PE2. In Output 7-20, the VPN-IPv4 prefix advertised by PE2 for the prefix 10.1.1.0/24 carries the VRF Route Import community value 192.0.2.21:4, which uniquely identifies PE2 and the specific VRF expressed as an integer at PE2. This is the value that is used in the Route Target Extended Community by PE4 when it sends the Source-Tree Join and ensures that the UPDATE is imported only at PE2. In this case, PE2 is referred to as the Upstream Multicast Hop (UMH) from the perspective of PE4 or any other receiver PEs.
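The relationship between the VRF Route Import community and the Route Target carried in the C-multicast route can be sketched in a few lines of Python. The example assumes that both are encoded as 8-octet IPv4-address-specific extended communities (type 0x01) that differ only in sub-type, with 0x0b for VRF Route Import and 0x02 for Route Target. Those code points reflect my understanding of the specifications rather than anything shown in the outputs here, and the helper names are hypothetical.

```python
import ipaddress
import struct

TYPE_IPV4_SPECIFIC = 0x01        # assumed extended community type
SUBTYPE_ROUTE_TARGET = 0x02      # assumed sub-type for Route Target
SUBTYPE_VRF_ROUTE_IMPORT = 0x0B  # assumed sub-type for VRF Route Import


def encode_ipv4_community(subtype: int, address: str, local: int) -> bytes:
    """Encode an IPv4-address-specific extended community (8 octets)."""
    return struct.pack(
        "!BB4sH",
        TYPE_IPV4_SPECIFIC,
        subtype,
        ipaddress.IPv4Address(address).packed,
        local,
    )


def describe(community: bytes) -> str:
    """Render the address:number part of an IPv4-address-specific community."""
    _type, _subtype, address, local = struct.unpack("!BB4sH", community)
    return f"{ipaddress.IPv4Address(address)}:{local}"


def join_route_target(vrf_route_import: bytes) -> bytes:
    """Derive the Route Target for a C-multicast join from the UMH's VRF Route Import."""
    if len(vrf_route_import) != 8:
        raise ValueError("extended community must be 8 octets")
    if (vrf_route_import[0], vrf_route_import[1]) != (
        TYPE_IPV4_SPECIFIC,
        SUBTYPE_VRF_ROUTE_IMPORT,
    ):
        raise ValueError("not a VRF Route Import community")
    # Same address and local number; only the sub-type changes.
    return bytes([TYPE_IPV4_SPECIFIC, SUBTYPE_ROUTE_TARGET]) + vrf_route_import[2:]


# VRF Route Import advertised by PE2 with the unicast route for 10.1.1.0/24.
PE2_VRF_IMPORT = encode_ipv4_community(SUBTYPE_VRF_ROUTE_IMPORT, "192.0.2.21", 4)

if __name__ == "__main__":
    print("target:" + describe(join_route_target(PE2_VRF_IMPORT)))
```

Because the address portion names a single PE and the local number names a single VRF on it, only that VRF imports the resulting join.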
The RD value used in the NLRI of the Source-Tree Join can also vary. In the test setup of Figure 7-7, the PEs in the MVPN share a common RD (64496:202). However, if different RDs were in use within the MVPN, the NLRI in the Source-Tree Join advertised by PE4 would contain the RD value of the VPN-IPv4 prefix advertised by PE2 for the prefix 10.1.1.0/24. The reason for using the RD of the Upstream Multicast Hop is to try to alleviate load on the PE connected to the source. For example, if the source PE and receiver PEs peer to a Route-Reflector, the Route-Reflector receives multiple Source-Tree Joins but only forwards the best path to the source. This use of Route Target and Route Distinguisher values derived from the VPN-IPv4 prefix sourced by the Upstream Multicast Hop is applicable to Source-Tree Joins, Shared-Tree Joins, and Leaf A-D Routes, where there is only one intended recipient.
A few words about Upstream Multicast Hop (UMH) selection. In the example, the route toward the C-source 10.1.1.10 is known only via PE2, so the process of selecting the UMH is straightforward. However, in the event that a number of PEs have advertised the same VPN-IPv4 prefix for the C-source (or C-RP) subnet, you need a mechanism to ensure that Source-Tree/Shared-Tree Joins from all PEs are sent to a single PE to avoid duplicate packets being sent onto the PMSI.
This is known as “Single Forwarder Selection,” and it relies on the UMH selection process to ensure that a single PE is selected. When determining the UMH for a particular C-source/C-RP, a PE groups all PEs that have advertised a unicast route for the C-source/C-RP subnet (not just the PE with the preferred unicast route) and places them into a “UMH-route-candidate-set.” From this set, the default behavior for UMH selection is to use the numerically highest IP address, which ensures that all PEs within the multicast domain select the same UMH. (This, in turn, avoids duplicate packets being sent onto the PMSI.) Consistent selection may not be guaranteed if, for example, UMH selection is based on the preferred unicast route toward the C-source/C-RP, particularly if the preferred unicast route selection is based on IGP distance to the BGP Next-Hop. However, SR-OS provides for alternative UMH selection criteria using one of the following:
· A hash-based selection algorithm. This allows for load-balancing of UMH per (C-root, C-G) state.
· Tunnel-status monitoring. This uses a unidirectional BFD-like mechanism between root and leaf that allows the leaf to monitor the tunnel health and switch to a backup UMH upon failure, but is applicable only to RSVP I-PMSIs.
· Unicast route preference.
You can configure these options within the mvpn context using the command umh-selection.
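The default highest-IP-address rule is simple enough to sketch. The Python fragment below is an illustration only: the candidate addresses are hypothetical PEs that have all advertised the C-source subnet, and the function name is mine. The point is that the comparison must be numeric and must not depend on the order in which routes were learned, so that every PE arrives at the same answer.

```python
import ipaddress


def select_umh(candidates):
    """Pick the UMH from a UMH-route-candidate-set: numerically highest address."""
    if not candidates:
        raise ValueError("empty UMH-route-candidate-set")
    return max(candidates, key=ipaddress.IPv4Address)


# Hypothetical candidate set: three PEs advertising the same C-source subnet.
CANDIDATES = ["192.0.2.21", "192.0.2.13", "192.0.2.9"]

if __name__ == "__main__":
    print(select_umh(CANDIDATES))
    # A plain string comparison would wrongly prefer "192.0.2.9".
    print(max(CANDIDATES))
```

Comparing parsed addresses rather than strings matters here, because a lexical comparison ranks 192.0.2.9 above 192.0.2.21.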
Debug 7-3: Source-Tree Join route from PE4
1 2013/06/13 11:58:44.03 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 83
Flag: 0x90 Type: 14 Len: 33 Multiprotocol Reachable NLRI:
Address Family MVPN_IPV4
NextHop len 4 NextHop 192.0.2.13
Type: Source-Join Len:22 RD: 64496:202 SrcAS: 64496 Src: 10.1.1.10 Grp: 239.255.194.222
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0x80 Type: 9 Len: 4 Originator ID: 192.0.2.13
Flag: 0x80 Type: 10 Len: 4 Cluster ID:
192.0.2.23
Flag: 0xc0 Type: 16 Len: 8 Extended Community:
target:192.0.2.21:4 "
Output 7-20: VRF Route Import Extended Community
*A:PE4# show router bgp routes vpn-ipv4 10.1.1.0/24 detail | match post-lines 1 "Community"
Community : target:64496:202 l2-vpn/vrf-imp:192.0.2.21:4
source-as:64496:0
When PE2 receives the Source-Tree Join shown in Debug 7-3, it creates the associated PIM state for group 239.255.194.222 within the MVPN and propagates the join upstream toward the source (CE2) using PIM. Because CE2 is the first-hop router, it subsequently forwards multicast traffic downstream toward PE2, at which point PE2 recognizes that a multicast source is active.
If the MVPN instance supports both shared trees and source trees (or just shared trees), the PE detecting an active source advertises a Source Active A-D route (Route Type 5) to all other PE routers in the MVPN. The purpose of this Source Active A-D route is to avoid packet duplication onto the PMSI in certain circumstances. Previously, this section discussed the selection of a common and consistent Upstream Multicast Hop (UMH) to avoid packet duplication onto the PMSI. However, when an MVPN receiver switches from a shared tree (C-RP based) to a shortest path tree, this consistent UMH selection may not be enough. This transition can cause packet duplication where both the RP and the source are forwarding traffic onto the PMSI.
For example, assume that PE1, PE2, PE3, PE4, and PE5 belong to the same MVPN, for which a multipoint-to-multipoint LSP is used to construct a Multi-Directional Inclusive-PMSI (MI-PMSI). (Multipoint-to-multipoint LSPs are not currently supported in SR-OS, but the explanation is still valid.) PE1, PE2, and PE3 are on the (C-*, C-G) tree, where PE4 is the selected UMH for this group (because the C-RP is connected to PE4). PE2 subsequently switches to the shortest path tree and generates a Source-Tree Join C-multicast route toward PE5, which is the selected UMH for the C-source. This results in PE1, PE2, and PE3 receiving duplicate traffic for the (C-S, C-G): once from PE4 on the C-RP shared tree and once from PE5 on the source tree. The RPF check at the receiver PEs includes checking that the traffic is arriving on the correct interface, but does not check if the traffic is arriving from the anticipated PE (the UMH).
To avoid this situation, a PE detecting an active C-multicast source advertises the Source Active A-D route for the (C-S, C-G) entry. When other PEs in the MVPN receive this Source Active A-D route, they check if the C-group advertised in that Source Active A-D route matches an active (C-*, C-G) entry in the MVPN. If it does, the receiving PE sends a Source-Tree Join toward the PE originating the Source Active A-D route to receive the (C-S, C-G) traffic. If the receiving PE also has the PMSI as an outgoing interface in the MVPN (C-*, C-G) entry, it transitions that entry to represent (C-*, C-G, rpt) state. In other words, the PE acts as if it had received a PIM (C-*, C-G, rpt) prune from the PMSI, without actually having received one.
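The receiver-side handling described above can be summarized as a small decision procedure. The Python sketch below models only the logic in the preceding paragraph. The state layout, class, and function names are hypothetical and do not correspond to SR-OS internals.

```python
from dataclasses import dataclass, field


@dataclass
class StarGEntry:
    """A (C-*, C-G) entry in the MVPN."""
    pmsi_is_oif: bool                              # PMSI is an outgoing interface
    rpt_pruned: set = field(default_factory=set)   # sources in (C-*, C-G, rpt) state


@dataclass
class MvpnState:
    star_g: dict = field(default_factory=dict)            # C-group -> StarGEntry
    source_tree_joins: set = field(default_factory=set)   # (UMH PE, C-S, C-G)


def on_source_active(state: MvpnState, originator: str, c_source: str, c_group: str) -> bool:
    """Process a received Source Active A-D route; return True if it changed state."""
    entry = state.star_g.get(c_group)
    if entry is None:
        return False  # no matching (C-*, C-G) entry, so nothing to do
    # Join the source tree toward the PE that originated the Source Active A-D route.
    state.source_tree_joins.add((originator, c_source, c_group))
    if entry.pmsi_is_oif:
        # Behave as if a (C-S, C-G, rpt) prune had been received from the PMSI.
        entry.rpt_pruned.add(c_source)
    return True


if __name__ == "__main__":
    state = MvpnState(star_g={"239.255.194.222": StarGEntry(pmsi_is_oif=True)})
    on_source_active(state, "192.0.2.21", "10.1.1.10", "239.255.194.222")
    print(state)
```

A PE without a matching (C-*, C-G) entry ignores the route entirely, which is why the function returns early in that case.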
The advertising of Source Active A-D routes is not required when an MVPN supports only source-based trees (PIM-SSM) and can be disabled using the no intersite-shared command within the mvpn context. Note that although a PIM-SSM range may be defined within the VPRN PIM instance, this does not preclude PIM-SM from also operating within that same PIM instance; it only precludes PIM-SM from operating within that defined group range.
Because PE2 has detected that C-source 10.1.1.10 is active, and this MVPN instance has the capability to support shared trees and source trees, it advertises a Source Active A-D route as shown in Debug 7-4 containing the C-S address 10.1.1.10 and C-G address 239.255.194.222.
Debug 7-4: Source Active A-D Route sourced by PE2
2 2013/06/13 11:58:50.23 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 65
Flag: 0x90 Type: 14 Len: 29 Multiprotocol Reachable NLRI:
Address Family MVPN_IPV4
NextHop len 4 NextHop 192.0.2.21
Type: Source-AD Len: 18 RD: 64496:202 Src: 10.1.1.10 Grp: 239.255.194.222
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 8 Extended Community:
target:64496:202 "
If a PE router that has joined to a C-Multicast group no longer has interested receivers, it sources a withdraw (MP_UNREACH_NLRI) for the previously advertised Source-Tree (or Shared-Tree) Join route. Debug 7-5 shows an example of this. CE4 sends a PIM prune for group 239.255.194.222 toward PE4, which then sources the MP_UNREACH_NLRI for the Source-Tree Join route.
Debug 7-5: PE4 Source-Tree Join Withdraw
4 2013/06/13 14:31:14.07 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Received BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 31
Flag: 0x90 Type: 15 Len: 27 Multiprotocol Unreachable NLRI:
Address Family MVPN_IPV4
Type: Source-Join Len:22 RD: 64496:202 SrcAS: 64496 Src: 10.1.1.10 Grp: 239.255.194.222 "
The A-D route signaling, subsequent P-tunnel instantiation, and C-Multicast routing information so far have been restricted to the I-PMSI, where every PE participating in the MVPN receives every packet forwarded onto the I-PMSI. As previously discussed, in the presence of C-multicast traffic where only a subset of the PEs in the MVPN have interested receivers, the option exists to optimize bandwidth utilization and move to a Selective PMSI where only that subset of PEs receives the traffic.
Referring again to Figure 7-7, the host 10.1.1.10 behind CE2 is sending to group 239.255.194.222 and an interested receiver behind CE4 has joined to the group. In this scenario, PE1 has no interested receivers but receives the C-multicast traffic from 10.1.1.10 over the I-PMSI (in a P2MP mLDP LSP) and simply discards the traffic. To optimize the traffic distribution, PE2 monitors the data-threshold configured in the MVPN together with the permitted C-group range. When the threshold has been crossed, PE2 signals the move from I-PMSI to S-PMSI using an S-PMSI A-D route (Route Type 3). Debug 7-6 shows the S-PMSI A-D route sourced from PE2 when host 10.1.1.10 crosses the data-threshold, where the NLRI contains the RD and originating PE system address as well as the C-source and C-multicast group. PE routers in the MVPN that receive this UPDATE message check to see if they have interested receivers, and if the answer is yes they establish the S-PMSI P-tunnel indicated in the PMSI tunnel attribute. In this case, the P-tunnel transport protocol is mLDP and the LSP ID is 8194 (0x2002). After signaling the S-PMSI A-D route, the originating PE connected to the source waits the data-delay-interval (default three seconds) before switching the C-multicast traffic from the I-PMSI to the advertised S-PMSI.
Debug 7-6: PE2 S-PMSI A-D Route
10 2013/06/13 14:41:50.23 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 94
Flag: 0x90 Type: 14 Len: 33 Multiprotocol Reachable NLRI:
Address Family MVPN_IPV4
NextHop len 4 NextHop 192.0.2.21
Type: SPMSI-AD Len: 22 RD: 64496:202 Orig: 192.0.2.21 Src: 10.1.1.10 Grp: 239.255.194.222
Flag: 0x40 Type: 1 Len: 1 Origin: 0
Flag: 0x40 Type: 2 Len: 0 AS Path:
Flag: 0x80 Type: 4 Len: 4 MED: 0
Flag: 0x40 Type: 5 Len: 4 Local Preference: 100
Flag: 0xc0 Type: 16 Len: 8 Extended Community:
target:64496:202
Flag: 0xc0 Type: 22 Len: 22 PMSI:
Tunnel-type LDP P2MP LSP (2)
Flags [Leaf not required]
MPLS Label 0
Root-Node 192.0.2.21, LSP-ID 0x2002
"
Pause for a moment and imagine that the preceding S-PMSI A-D Route was constructed using an RSVP-TE P2MP LSP instead of an mLDP P2MP LSP. Recall that P2MP mLDP LSPs use receiver-driven trees, so upon receipt of an S-PMSI A-D route specifying P2MP LDP as the tunnel-type, the receiving PE has all the information it needs (Root-Node, LSP-ID) to join as a leaf of the tree rooted at the originating PE. In contrast, P2MP RSVP LSPs use source-driven trees, and so some information is needed at the source in order for it to know which PEs it should signal the P2MP S2L sub-LSPs to. In other words, the source needs to know who the leaves of the tree are before it can join them to the tree. In the case of the I-PMSI, that information is derived from the Intra-AS I-PMSI A-D Routes, because they are sourced by every PE in the Multicast VPN and contain each PE's originating IP address. In the case of the S-PMSI, however, the PE connected to the source can't know the subset of remote PEs in the Multicast VPN that have interested receivers and that therefore need to join the S-PMSI tree. So, when using RSVP-TE P2MP LSPs for S-PMSI creation, the S-PMSI A-D Route has the L flag (Leaf Information Required) set in the PMSI tunnel attribute. Remote PEs that have interested receivers subsequently respond with a Leaf A-D Route (Route Type 4) indicating a requirement to join to the PMSI tunnel. This provides the PE connected to the source with the information about who the leaves of the tree should be, and allows RSVP-TE P2MP S2L sub-LSPs to be signaled to each of those leaves.
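For a source-driven tree, the root's job reduces to keeping its set of S2L sub-LSPs in step with the set of known leaves. The sketch below illustrates that bookkeeping under the assumptions of this discussion: for an I-PMSI the leaves are the originators of the received Intra-AS I-PMSI A-D routes, and for an S-PMSI they are the originators of the received Leaf A-D routes. The function names and the choice of which PE responds are hypothetical.

```python
def ipmsi_leaves(intra_as_ad_originators, local_pe):
    """Leaves of the I-PMSI tree: every PE in the MVPN except the root itself."""
    return set(intra_as_ad_originators) - {local_pe}


def spmsi_leaves(leaf_ad_originators):
    """Leaves of the S-PMSI tree: only PEs that answered with a Leaf A-D route."""
    return set(leaf_ad_originators)


def s2l_delta(current, desired):
    """Return (sub-LSPs to signal, sub-LSPs to tear down) as sorted lists."""
    current, desired = set(current), set(desired)
    return sorted(desired - current), sorted(current - desired)


# PE2 (192.0.2.21) as root, using the system addresses from the test topology.
MVPN_ORIGINATORS = ["192.0.2.13", "192.0.2.21", "192.0.2.22"]

if __name__ == "__main__":
    inclusive = ipmsi_leaves(MVPN_ORIGINATORS, "192.0.2.21")
    selective = spmsi_leaves(["192.0.2.13"])  # assumed: only PE4 sent a Leaf A-D route
    print(s2l_delta(set(), inclusive))
    print(s2l_delta(set(), selective))
```

The same delta calculation covers later membership changes, because a withdrawn A-D route simply removes its originator from the desired set.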
In the example using mLDP, the PE connected to the source continues to monitor the data threshold of the C-source even after the S-PMSI has been signaled. If the monitored rate falls below the configured threshold, the PE withdraws the advertised S-PMSI A-D route in an MP_UNREACH_NLRI and thereafter reverts the C-multicast traffic back to the I-PMSI.
Debug 7-7: PE2 S-PMSI A-D route Withdraw
12 2013/06/13 14:42:58.46 UTC MINOR: DEBUG #2001 Base Peer 1: 192.0.2.23
"Peer 1: 192.0.2.23: UPDATE
Peer 1: 192.0.2.23 - Send BGP UPDATE:
Withdrawn Length = 0
Total Path Attr Length = 31
Flag: 0x90 Type: 15 Len: 27 Multiprotocol Unreachable NLRI:
Address Family MVPN_IPV4
Type: SPMSI-AD Len: 22 RD: 64496:202 Orig: 192.0.2.21 Src: 10.1.1.10 Grp: 239.255.194.222 "
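The switch between I-PMSI and S-PMSI at the source PE can be modeled as a small state machine. The following Python sketch is a simplification under stated assumptions: the measured rate and threshold share the same (unspecified) unit, timestamps are in seconds, and the action strings are purely illustrative. It is not a description of how SR-OS implements the behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpmsiMonitor:
    threshold: float                       # data-threshold for the C-group range
    data_delay_interval: float = 3.0       # seconds to wait before switching
    advertised_at: Optional[float] = None  # time the S-PMSI A-D route was sent
    on_spmsi: bool = False                 # traffic currently on the S-PMSI

    def observe(self, now: float, rate: float) -> List[str]:
        """Feed one rate sample for a (C-S, C-G) and return the resulting actions."""
        actions = []
        if rate > self.threshold:
            if self.advertised_at is None:
                self.advertised_at = now
                actions.append("advertise S-PMSI A-D route")
            elif not self.on_spmsi and now - self.advertised_at >= self.data_delay_interval:
                self.on_spmsi = True
                actions.append("switch traffic to S-PMSI")
        elif rate < self.threshold and self.advertised_at is not None:
            actions.append("withdraw S-PMSI A-D route")
            if self.on_spmsi:
                actions.append("revert traffic to I-PMSI")
            self.advertised_at = None
            self.on_spmsi = False
        return actions


if __name__ == "__main__":
    monitor = SpmsiMonitor(threshold=100)
    for now, rate in [(0, 150), (1, 150), (3, 150), (10, 50)]:
        print(now, monitor.observe(now, rate))
```

Keeping the advertisement time separate from the switchover flag is what lets the sketch model the data-delay-interval: receivers get time to join the new tree before traffic moves onto it.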
Using BGP to propagate C-Multicast routing information can have a significant impact on the PIM join/leave latency times because, by default, the MCAST-VPN NLRI is subject to the Minimum Route Advertisement Interval (MRAI) at each BGP speaker. This makes the problem even more acute where Route-Reflection is used. To help maintain join/leave latency times, the rapid-update command under the global BGP context can be used for the mvpn-ipv4 Address Family, which essentially bypasses the configured min-route-advertisement interval and sends UPDATE messages as soon as they are originated or received.
Output 7-21: Rapid-Update for MCAST-VPN NLRI
bgp
rapid-update mvpn-ipv4
group "IBGP"
family vpn-ipv4 vpn-ipv6 mvpn-ipv4
peer-as 64496
…etc
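To put a rough number on the latency concern, consider the path a Source-Tree Join takes in the test topology: PE4 originates it and RR1 reflects it to PE2, so two speakers advertise the route. The sketch below computes a simple upper bound on the delay added by MRAI, assuming each advertising speaker can hold the route for at most one full interval. The 30-second figure is an assumed value chosen for illustration and is not taken from the configuration shown here.

```python
def worst_case_mrai_delay(mrai_per_speaker):
    """Upper bound on added delay: one full MRAI at each advertising speaker."""
    return sum(mrai_per_speaker)


ASSUMED_MRAI_SECONDS = 30  # illustrative value only

# PE4 (originator) and RR1 (reflector) both advertise the join toward PE2.
without_rapid_update = worst_case_mrai_delay([ASSUMED_MRAI_SECONDS, ASSUMED_MRAI_SECONDS])

# With rapid-update, the interval is bypassed for this address family.
with_rapid_update = worst_case_mrai_delay([0, 0])

if __name__ == "__main__":
    print(without_rapid_update, with_rapid_update)
```

Under that assumption a join could be delayed by up to a minute before reaching the PE connected to the source, which is far longer than typical PIM join latency.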