Thursday, February 14, 2013

CCIE DC Multicast Part 4

Hi guys, Part 4 of my CCIE DC multicast series is below. In this article we are going to look at rendezvous point (RP) discovery mechanisms, namely Auto-RP and Bootstrap Router (BSR), and then Anycast RP. You have probably seen in my blog posts so far the standard method of assigning an RP:

ip pim rp-address A.B.C.D


But as your multicast network grows, configuring the RP statically on every router becomes less scalable than you might like, so let's look at some other options.


Auto RP
 

Auto-RP is a Cisco-proprietary method that, to be honest, is not used much anymore now that PIMv2 is available, but for completeness we will go over it.

The way it works is that all PIM-enabled routers automatically join the Cisco-RP-Discovery multicast group (224.0.1.40) in order to receive RP mapping information.

RP mapping information is sent (sourced) to this group by a Cisco router configured as a mapping agent. Multiple mapping agents can be configured for redundancy.

A mapping agent also joins the Cisco-RP-Announce group (224.0.1.39) in order to learn which routers in the network are potential RPs. Candidate RPs advertise their candidacy by sending RP_Announce messages to this group.

If multiple RPs announce their candidacy for the same group range, the highest IP address wins.

The mapping agent then distributes this information to 224.0.1.40 as an RP_Discovery message.
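Here is a quick Python sketch of the "highest IP address wins" rule described above. It only illustrates the comparison; it is not how IOS implements the mapping agent, and the function name and the candidate addresses other than 3.3.3.3 are made up for the example.

```python
import ipaddress

def elect_rp(candidates):
    """Return the candidate RP a mapping agent would pick for one group range.

    candidates: iterable of dotted-decimal strings heard in RP_Announce messages.
    Addresses are compared numerically, not as text.
    """
    return str(max(ipaddress.IPv4Address(c) for c in candidates))

# Hypothetical candidates announcing for the same group range.
print(elect_rp(["3.3.3.3", "2.2.2.2", "10.0.0.1"]))
```

Converting to `IPv4Address` first matters: compared as plain strings, "9.255.255.255" would sort above "10.0.0.1", which is the wrong way around.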

Without further explanation let's dive right in!

For this lab we will go back to our original topology,








In this lab we will make our RP router the candidate RP via Auto-RP and make sure all our other devices discover it.

PIM2 will act as our mapping agent.

First of all, if we look on PIM1:

PIM1#show ip mroute
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry, E - Extranet,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group,
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 224.0.1.40), 00:26:20/00:02:36, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1/0, Forward/Sparse, 00:22:19/00:02:32


You can see that, as mentioned, we have already joined the multicast group 224.0.1.40. But look at those flags: one of them is D for Dense, even though we specified sparse mode!

The problem with the Auto-RP mechanism is that the RP information is itself distributed via multicast. If we are in sparse mode and don't forward multicast because we don't have an RP yet, we have a chicken-and-egg problem. So these groups are automatically put into dense mode so that the traffic can propagate across the network (potential attack vector? You decide).


So, let's make RP announce itself.


RP(config)#ip pim send-rp-announce lo1 scope 4
 


The scope keyword sets the TTL value of the announcement packets, to make sure that we don't announce our RP candidacy too far into the network (in case you have separate RPs for different areas of your network).
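To get a feel for what a scope of 4 means, here is a small Python sketch that walks a topology and lists which routers a packet with a given TTL can reach. It is a simplification under stated assumptions: the chain of routers R0 to R5 is hypothetical (it is not the lab topology), and the model ignores RPF checks and any per-interface TTL thresholds.

```python
from collections import deque

def routers_in_scope(adjacency, origin, ttl):
    """Return the routers that receive a packet sent by `origin` with this TTL.

    adjacency: dict mapping router name -> list of neighbor names.
    Each forwarding router decrements the TTL, so in this model a router
    N hops away sees the packet only if N <= ttl.
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if hops[node] == ttl:
            continue  # TTL exhausted: received here, but not forwarded further
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return sorted(name for name in hops if name != origin)

# Hypothetical chain: R0 - R1 - R2 - R3 - R4 - R5
chain = {
    "R0": ["R1"], "R1": ["R0", "R2"], "R2": ["R1", "R3"],
    "R3": ["R2", "R4"], "R4": ["R3", "R5"], "R5": ["R4"],
}
print(routers_in_scope(chain, "R0", 4))
```

In this model R5, five hops away, never hears the announcement, which is the whole point of scoping.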

Let's look at RP's multicast routing table:


RP#show ip mro
(*, 224.0.1.39), 00:00:46/stopped, RP 0.0.0.0, flags: DP
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list: Null

(3.3.3.3, 224.0.1.39), 00:00:46/00:02:13, flags: PT
  Incoming interface: Loopback1, RPF nbr 0.0.0.0
  Outgoing interface list: Null



You can see from above that the RP now has state for the multicast group 224.0.1.39 and is in fact showing an (S,G) entry sourced from 3.3.3.3!

RP#debug ip pim
PIM debugging is on
*Feb 14 12:27:07.627: PIM(0): check pim_rp_announce 1
*Feb 14 12:27:07.627: PIM(0): send rp announce

You can see that the RP has even started announcing, but because there are no outgoing interfaces, the announcements are not being seen anywhere else. Let's configure our mapping agent (PIM2) next.

PIM2(config)#ip pim send-rp-discovery scope 4



This makes PIM2 act as the mapping agent. While we were entering this command, the following debug output showed up on the RP:


*Feb 14 12:28:07.627: PIM(0): check pim_rp_announce 1
*Feb 14 12:28:07.627: PIM(0): send rp announce
*Feb 14 12:29:07.627: PIM(0): check pim_rp_announce 1
*Feb 14 12:29:07.627: PIM(0): send rp announce
*Feb 14 12:29:07.743: PIM(0): Initiating register encapsulation tunnel creation for RP 3.3.3.3
*Feb 14 12:29:07.751: PIM(0): Initial register tunnel creation succeeded for RP 3.3.3.3
*Feb 14 12:29:07.755: PIM(0): Initiating register decapsulation tunnel creation for RP 3.3.3.3
*Feb 14 12:29:07.759: PIM(0): Initial register tunnel creation succeeded for RP 3.3.3.3
*Feb 14 12:29:08.059: PIM(0): Received v2 Join/Prune on GigabitEthernet2/0 from 10.2.0.1, to us
*Feb 14 12:29:08.067: PIM(0): Join-list: (*, 239.1.1.1), RPT-bit set, WC-bit set, S-bit set
*Feb 14 12:29:08.075: PIM(0): Check RP 3.3.3.3 into the (*, 239.1.1.1) entry
*Feb 14 12:29:08.083: PIM(0): Adding register decap tunnel (Tunnel1) as accepting interface of (*, 239.1.1.1).
*Feb 14 12:29:08.091: PIM(0): Add GigabitEthernet2/0/10.2.0.1 to (*, 239.1.1.1), Forward state, by PIM *G Join
*Feb 14 12:29:08.807: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel0, changed state to up
*Feb 14 12:29:08.903: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up


You can see that the mapping agent has joined the multicast group 224.0.1.39, because the RP now has an outgoing interface for it:

RP:
(*, 239.1.1.1), 00:01:15/00:03:12, RP 3.3.3.3, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet2/0, Forward/Sparse, 00:01:15/00:03:12

(*, 224.0.1.39), 00:05:15/stopped, RP 0.0.0.0, flags: DC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet2/0, Forward/Sparse, 00:01:27/00:02:25 

 
The RP forwards traffic out Gi2/0, which faces PIM2.

On PIM2:

(3.3.3.3, 224.0.1.39), 00:01:52/00:02:07, flags: LT
  Incoming interface: GigabitEthernet1/0, RPF nbr 10.2.0.2
  Outgoing interface list:
    GigabitEthernet2/0, Forward/Sparse, 00:01:52/00:02:50
    GigabitEthernet3/0, Forward/Sparse, 00:01:52/00:02:25



We can see that the mapping agent is forwarding traffic for 224.0.1.39 from this RP.

On PIM1 there is now a source for traffic to 224.0.1.40:



(10.2.0.1, 224.0.1.40), 00:02:53/00:02:05, flags: LT
  Incoming interface: GigabitEthernet2/0, RPF nbr 10.1.0.2
  Outgoing interface list:
    GigabitEthernet1/0, Forward/Sparse, 00:02:53/00:02:23


The source is 10.2.0.1, which is PIM2, so we can see here that PIM1 is learning about the RP over this multicast group.

The following command verifies these mappings:


PIM1#show ip pim rp mapping
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 3.3.3.3 (?), v2v1
    Info source: 10.2.0.1 (?), elected via Auto-RP
         Uptime: 00:04:02, expires: 00:02:52


This concludes Auto-RP for our purposes. Let's move on to BSR.


Bootstrap Router Mechanism


BSR is very similar to Auto-RP, except that it does not depend on routed multicast groups to distribute RP information. Instead it uses a hop-by-hop flooding mechanism (bootstrap messages are sent to the link-local all-PIM-routers address, 224.0.0.13, and re-originated by each router).

The key to the BSR mechanism is the bootstrap router itself: one of the routers is elected the BSR.

The candidate RPs then inform the BSR of their candidacy via unicast.


The BSR then floods this information out all interfaces every 60 seconds. The BSR floods ALL the candidates it has received, and the PIM routers all run the same hash algorithm to select the most appropriate RP for each group range, so every router arrives at the same choice. The reasoning behind this is that if an RP fails, every router in the PIM network already has the information it needs to select another RP straight away, reducing failover time.
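The "same hash on every router" idea can be sketched in Python. The formula below is the hash function from the PIM-SM specification (RFC 7761, section 4.7.2); everything around it is an assumption for illustration: the function names are mine, 5.5.5.5 is a made-up second candidate, the default mask length of 30 is the value the RFC suggests for IPv4 (IOS lets you set it on the BSR), and the sketch assumes all candidates advertise the same group range so no longest-match step is needed.

```python
import ipaddress

def bsr_hash(group, candidate_rp, mask_len=30):
    """Hash value of one candidate RP for one group (PIM-SM spec formula)."""
    g = int(ipaddress.IPv4Address(group))
    c = int(ipaddress.IPv4Address(candidate_rp))
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    inner = (1103515245 * (g & mask) + 12345) ^ c
    return (1103515245 * inner + 12345) % 2**31

def select_rp(group, candidates, mask_len=30):
    """Pick the RP for `group` from (address, priority) pairs.

    Lowest priority value is preferred, then highest hash, then highest address.
    """
    best_priority = min(priority for _, priority in candidates)
    finalists = [addr for addr, priority in candidates if priority == best_priority]
    return max(
        finalists,
        key=lambda addr: (bsr_hash(group, addr, mask_len),
                          int(ipaddress.IPv4Address(addr))),
    )

# 3.3.3.3 is the lab RP; 5.5.5.5 is a hypothetical second candidate.
rp_set = [("3.3.3.3", 0), ("5.5.5.5", 0)]
for grp in ["239.1.1.1", "239.1.1.9", "239.2.2.2"]:
    print(grp, "->", select_rp(grp, rp_set))
```

Because the result depends only on the group, the mask and the candidate list, any router holding the same RP set picks the same RP without talking to anyone, and if a candidate disappears from the set the survivors are re-ranked locally.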


Multiple candidate BSRs may also be configured. The active BSR is elected in a similar way to how the root of a spanning tree is elected, and the BSR priority can be used to determine a primary BSR.
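A minimal Python sketch of that election, assuming the rule from the BSR specification (RFC 5059): the highest priority wins and the highest IP address breaks ties. Note that this is the opposite direction to spanning tree, where the lowest bridge ID wins. The function name is mine, and 10.1.0.1 as a second candidate BSR is purely illustrative.

```python
import ipaddress

def elect_bsr(candidates):
    """Return the winning BSR address from (address, priority) pairs.

    Highest priority wins; ties are broken by the highest IP address.
    """
    winner = max(
        candidates,
        key=lambda c: (c[1], int(ipaddress.IPv4Address(c[0]))),
    )
    return winner[0]

# Equal priorities: the higher address wins.
print(elect_bsr([("10.2.0.1", 0), ("10.1.0.1", 0)]))
# Raising the priority of the other candidate makes it the primary BSR.
print(elect_bsr([("10.2.0.1", 0), ("10.1.0.1", 100)]))
```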

If no BSR is available, the routers will switch to the statically configured RP address. If none is configured, all groups will switch over to dense mode. So you can see that a lot of focus has been placed on high availability with BSR.


Let's check out how this works

In this example, RP is going to be our candidate RP, and PIM2 will be our BSR.

RP(config)#ip pim rp-candidate lo1

RP#PIM(0): rp adv timer expired
*Feb 14 14:00:19.627: PIM-BSR(0): Build v2 Candidate-RP advertisement for 3.3.3.3 priority 0, holdtime 150
*Feb 14 14:00:19.627: PIM-BSR(0):  Candidate RP's group prefix 224.0.0.0/4
*Feb 14 14:00:19.631: PIM-BSR(0): no bootstrap router address


As you can see from above, there is no BSR yet, so the candidate RP has no one to advertise to. We need to fix that.

Next, we configure PIM2 as a BSR Candidate:

PIM2(config)#ip pim bsr-candidate gi1/0


Suddenly things go a little crazy:

PIM2#
*Feb 14 14:01:35.507: PIM-BSR(0): Bootstrap message for 10.2.0.1 originated
*Feb 14 14:01:35.571: PIM(0): Received v2 Bootstrap on GigabitEthernet3/0 from 10.0.0.1
*Feb 14 14:01:35.575: PIM-BSR(0): bootstrap (10.2.0.1) on non-RPF path GigabitEthernet3/0 or from non-RPF neighbor 10.2.0.1 discarded
 


Here we have PIM2 sending out its BSR candidacy. Remember that bootstrap messages are flooded out all interfaces: you can see that PIM2 actually receives a bootstrap message for itself on Gi3/0 from PIM1! It discards it because it arrived on a non-RPF path, but the point is clear: the BSR's availability is flooded out all interfaces.

*Feb 14 14:01:36.071: %SYS-5-CONFIG_I: Configured from console by console
*Feb 14 14:01:38.203: PIM(0): Received v2 Candidate-RP-Advertisement on GigabitEthernet1/0 from 10.2.0.2
*Feb 14 14:01:38.207: PIM-BSR(0):  RP 3.3.3.3, 1 Group Prefixes, Priority 0, Holdtime 150
*Feb 14 14:01:38.211: (0): pim_add_prm:: 224.0.0.0/240.0.0.0, rp=3.3.3.3, repl = 0, ver =2, is_neg =0, bidir = 0, crp = 0
*Feb 14 14:01:38.215: PIM(0): Added with
*Feb 14 14:01:38.219:  prm_rp->bidir_mode = 0 vs bidir = 0 (224.0.0.0/4, RP:3.3.3.3), PIMv2
 

Here you can see the BSR (PIM2) received a Candidate-RP advertisement from the RP, so it adds it as an RP and starts advertising it.

*Feb 14 14:01:38.219: PIM(0): Initiating register encapsulation tunnel creation for RP 3.3.3.3
*Feb 14 14:01:38.219: PIM(0): Initial register tunnel creation succeeded for RP 3.3.3.3
*Feb 14 14:01:38.219: PIM(0): Check RP 3.3.3.3 into the (*, 239.1.1.1) entry
*Feb 14 14:01:38.235: PIM-BSR(0): RP-set for 224.0.0.0/4
*Feb 14 14:01:38.235: PIM-BSR(0):   RP(1) 3.3.3.3, holdtime 150 sec priority 0
*Feb 14 14:01:38.239: PIM-BSR(0): Bootstrap message for 10.2.0.1 originated

Now that PIM2 itself has an RP, it creates the RP tunnels for multicast traffic delivery, and originates a BSR message so all the other routers can learn about the RP.


On PIM1 we can confirm this:

PIM1#show ip pim rp mapping
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 3.3.3.3 (?), v2
    Info source: 10.2.0.1 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:04:10, expires: 00:02:16


As you can see the router has learnt about the RP via BSR.


Finally, let's look at anycast RP.

Anycast RP

The thing about both of the above protocols is that RP recovery takes quite a while! Should the RP die, things take quite a while to switch over, and what's worse, both Auto-RP and BSR produce a lot of control plane load just to provide redundancy. So the ultra-smart internet engineers said to themselves, "What if we could use the unicast routing tables to provide RP redundancy?"

And voilà, Anycast RP was born.

Anycast RP is not _really_ a protocol (well, it is, but more on that later, bear with me here!). What happens is that we specify an RP address that actually exists on two routers as a loopback!

What this means is that each of our PIM devices will route multicast to the RP closest to it, and in the event that one of the RPs dies, the unicast routing protocol we know and love will direct each device to the surviving one. However, things are not quite this simple, as this introduces a little problem, which we will cover soon.


Let's check it out!


For this example, we need to modify our topology slightly



Now we have two RPs and a single source connected to both. This will become important shortly!

On RP1 and RP2, define the same loopback address:

interface Loopback1
 ip address 4.4.4.4 255.255.255.255
end






Next, we go to our source and check its route to 4.4.4.4:

source#show ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 10.1.0.1 on GigabitEthernet1/0, 00:00:12 ago
  Routing Descriptor Blocks:
  * 10.2.0.1, from 10.2.0.1, 00:00:59 ago, via GigabitEthernet2/0
      Route metric is 2, traffic share count is 1
    10.1.0.1, from 10.1.0.1, 00:00:12 ago, via GigabitEthernet1/0
      Route metric is 2, traffic share count is 1



As you can see, its route to 4.4.4.4 lists both routers as equal-cost paths, so let's make Source prefer RP2's link:


!
interface GigabitEthernet1/0
 ip ospf cost 20000
end

source#show ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 10.2.0.1 on GigabitEthernet2/0, 00:01:10 ago
  Routing Descriptor Blocks:
  * 10.2.0.1, from 10.2.0.1, 00:01:57 ago, via GigabitEthernet2/0
      Route metric is 2, traffic share count is 1



Now our preferred path to 4.4.4.4 is via Gi2/0, Great!


(Please note that all of this is optional at this point. I am just doing it to show you what Anycast RP "breaks". You will see in a minute.)

So, let's go to each router and add 4.4.4.4 as an RP statically. Note that Anycast RP can also work with Auto-RP or BSR for advertising the RP; the only real trick to Anycast RP is that we have the same IP address on multiple RPs.

On Receiver1 and Receiver2, let's join the multicast group 239.1.1.1:

Receiver2(config-if)#ip igmp join-group 239.1.1.1

Done, let's now look at the routing tables of RP1 and RP2:


RP1#show ip mroute
IP Multicast Routing Table

(*, 239.1.1.1), 00:00:44/00:02:15, RP 4.4.4.4, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1/0, Forward/Sparse, 00:00:44/00:02:15






RP2#show ip mroute
(*, 239.1.1.1), 04:34:41/00:02:11, RP 4.4.4.4, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet2/0, Forward/Sparse, 04:34:41/00:02:11




Both RP1 and RP2 are showing an outgoing interface for the traffic for 239.1.1.1, Great! Let's try a ping from the source now.

source#ping 239.1.1.1
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:

Reply to request 0 from 2.2.2.1, 52 ms

Hmmm... that's weird, I only got a response from one receiver?

The reason is that the RPs are not aware of each other's sources. When traffic is delivered from the source up to the RP, only one RP gets a copy of it, and that RP can only deliver the traffic to its own receivers. Since the two RPs are not aware of each other's receivers and sources, problems like the one above will occur!
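The failure can be modelled in a few lines of Python. This is only a toy model of the behaviour seen in the lab, not of PIM itself: the function names are mine, the costs are the OSPF metrics from the source's point of view, and the receiver addresses are the ones that answer the pings in this post.

```python
def nearest_rp(costs):
    """Return the RP with the lowest unicast cost (ties broken by name)."""
    return min(sorted(costs), key=lambda rp: costs[rp])

def receivers_reached(source_costs, receivers_by_rp):
    """Receivers that get the traffic when the RPs share no source information.

    The source's first-hop router registers only with the nearest instance of
    the anycast address, so only that RP's receivers are served.
    """
    return receivers_by_rp[nearest_rp(source_costs)]

receivers = {"RP1": ["1.1.1.1"], "RP2": ["2.2.2.1"]}

# Gi1/0 (towards RP1) has cost 20000, so 4.4.4.4 is reached via RP2.
print(receivers_reached({"RP1": 20001, "RP2": 2}, receivers))
# Penalise Gi2/0 (towards RP2) even more and the other receiver answers instead.
print(receivers_reached({"RP1": 20001, "RP2": 30001}, receivers))
```

Whichever way the unicast routing leans, exactly one RP learns about the source, so exactly one set of receivers is served.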

If we make source prefer the route via Gi1/0...


source(config-if)#int gi2/0
source(config-if)#ip ospf cost 30000

source#show ip route 4.4.4.4
Routing entry for 4.4.4.4/32
  Known via "ospf 1", distance 110, metric 20001, type intra area
  Last update from 10.1.0.1 on GigabitEthernet1/0, 00:00:28 ago
  Routing Descriptor Blocks:
  * 10.1.0.1, from 10.1.0.1, 00:00:28 ago, via GigabitEthernet1/0
      Route metric is 20001, traffic share count is 1


When we ping the multicast group 239.1.1.1, only Receiver1 responds, even though both receivers have joined the group:

source#ping 239.1.1.1 source gi1/0
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 10.1.0.2

Reply to request 0 from 1.1.1.1, 36 ms


To resolve this problem on IOS we use Multicast Source Discovery Protocol (MSDP), which, as the name implies, helps discover sources. It was originally used for inter-ISP multicast routing; here we use it to help us with our multiple-RP setup.

RP2(config)#ip msdp peer 1.1.1.2 connect-source gi2/0
RP2(config)#ip msdp originator-id gi2/0

And on RP1 we do the opposite:


RP1(config)#ip msdp peer 2.2.2.2 connect-source gi1/0
*Feb 14 16:57:10.258: %MSDP-5-PEER_UPDOWN: Session to peer 2.2.2.2 going up
RP1(config)#ip msdp originator-id gi1/0


We can now see a peer relationship between the two over MSDP:


RP1#show ip msdp sum
MSDP Peer Status Summary
Peer Address     AS    State    Uptime/  Reset SA    Peer Name
                                Downtime Count Count
2.2.2.2          ?     Up       00:00:25 0     0     ?

Let's see what happens when we ping now...

source#ping 239.1.1.1 source gi1/0
Type escape sequence to abort.
Sending 1, 100-byte ICMP Echos to 239.1.1.1, timeout is 2 seconds:
Packet sent with a source address of 10.1.0.2

Reply to request 0 from 1.1.1.1, 52 ms
Reply to request 0 from 2.2.2.1, 92 ms


Success!
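Why that works can be sketched as a toy model in Python. The class and method names are mine and this is nowhere near a real MSDP implementation; it only shows the idea that the RP which hears the source sends a Source-Active (SA) message to its peers, and a peer with interested receivers can then pull the traffic. Each RP is identified by its originator ID (1.1.1.2 and 2.2.2.2 in this lab), which has to be unique because both RPs share 4.4.4.4.

```python
class Rp:
    """Toy model of an anycast RP that shares active sources with MSDP peers."""

    def __init__(self, originator_id):
        self.originator_id = originator_id
        self.peers = []        # other Rp objects we have an MSDP session with
        self.receivers = {}    # group -> receiver addresses joined via this RP
        self.sa_cache = set()  # (source, group, originating RP) learned from peers

    def register(self, source, group):
        """A source registered here: advertise it to every MSDP peer."""
        for peer in self.peers:
            peer.sa_cache.add((source, group, self.originator_id))

    def delivers(self, source, group, registered_here):
        """Receivers this RP can serve for (source, group)."""
        knows_source = registered_here or any(
            (s, g) == (source, group) for s, g, _ in self.sa_cache
        )
        return self.receivers.get(group, []) if knows_source else []

rp1, rp2 = Rp("1.1.1.2"), Rp("2.2.2.2")
rp1.peers, rp2.peers = [rp2], [rp1]
rp1.receivers["239.1.1.1"] = ["1.1.1.1"]
rp2.receivers["239.1.1.1"] = ["2.2.2.1"]

# The source (10.1.0.2) reaches RP1 via Gi1/0, so only RP1 sees the register.
rp1.register("10.1.0.2", "239.1.1.1")
reached = (rp1.delivers("10.1.0.2", "239.1.1.1", True)
           + rp2.delivers("10.1.0.2", "239.1.1.1", False))
print(reached)
```

Without the `register` call RP2's cache stays empty and only RP1's receiver is served, which is the broken behaviour from before the MSDP peering was configured.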



Let's check out the peer status command:

RP2#show ip msdp peer
MSDP Peer 1.1.1.2 (?), AS ?
  Connection status:
    State: Up, Resets: 0, Connection source: GigabitEthernet2/0 (2.2.2.2)
    Uptime(Downtime): 00:00:17, Messages sent/received: 1/2
    Output messages discarded: 0
    Connection and counters cleared 00:01:17 ago
  SA Filtering:
    Input (S,G) filter: none, route-map: none
    Input RP filter: none, route-map: none
    Output (S,G) filter: none, route-map: none
    Output RP filter: none, route-map: none
  SA-Requests:
    Input filter: none
  Peer ttl threshold: 0
  SAs learned from this peer: 1
  Number of connection transitions to Established state: 1
    Input queue size: 0, Output queue size: 0
  MD5 signature protection on MSDP TCP connection: not enabled
  Message counters:
    RPF Failure count: 0
    SA Messages in/out: 1/0
    SA Requests in: 0
    SA Responses out: 0
    Data Packets in/out: 0/0

We can see from the above that we have peered with the other RP and that there is one active source that we are caching:


RP2#show ip msdp sa-cache
MSDP Source-Active Cache - 1 entries
(10.1.0.2, 239.1.1.1), RP 1.1.1.2, AS ?,00:01:18/00:05:40, Peer 1.1.1.2

Now if we check show ip mroute for that entry:



(10.1.0.2, 239.1.1.1), 00:01:34/00:01:25, flags: M
  Incoming interface: GigabitEthernet3/0, RPF nbr 10.0.0.1
  Outgoing interface list:
    GigabitEthernet2/0, Forward/Sparse, 00:01:34/00:02:36


We can see that there is an entry for this source with an interesting new flag we have not seen before, "M", which means "MSDP created entry".


Couldn't have said it easier myself


Now! This is one way to do it, with MSDP. However, on the Nexus operating system, NX-OS (what we will be using in the CCIE DC when we all pass ;)), we actually DO have a protocol called Anycast RP. It allows two Anycast RPs to share information about active sources and is NOT part of MSDP.


To configure it on Nexus, issue the following commands:


Nexus:
ip pim anycast-rp 172.16.1.1 192.168.10.1
ip pim anycast-rp 172.16.1.1 192.168.10.2 


172.16.1.1 is your actual RP address, 192.168.10.1 is an IP address of the Nexus itself (you must specify yourself as being an RP candidate), and 192.168.10.2 is the other RP candidate.
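Since every RP in the set needs the full list of members, itself included, it can be handy to generate the lines rather than type them. Here is a tiny Python helper as a sketch; the function name is mine, and the only syntax it relies on is the `ip pim anycast-rp` form shown above.

```python
def anycast_rp_config(anycast_address, rp_set_members):
    """Render one 'ip pim anycast-rp' line per member of the RP set.

    The same lines go on each RP, so the local router's own address
    must be in rp_set_members too.
    """
    return [
        f"ip pim anycast-rp {anycast_address} {member}"
        for member in rp_set_members
    ]

for line in anycast_rp_config("172.16.1.1", ["192.168.10.1", "192.168.10.2"]):
    print(line)
```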


I hope you enjoyed this blog entry. Now that we have covered pretty much the whole nine yards of multicast, I promise the next one will cover how all of this ties into Nexus, the CCIE DC exam and OTV :)



7 comments:

  1. Best multicast explanation i've ever read!
    Thanks a lot!

  2. Peter, excellent series on multicast. On you comment: "Now that we have covered pretty much the whole nine yards of Multicast I promise the next one will cover how all of this ties into Nexus, the CCIE DC exam and OTV :)", have you posted this yet or any plans to post it?

  3. Hi Tracy Part 5 is now available :)
    http://www.ccierants.com/2013/05/ccie-dc-multicast-part-5-multicast-and.html

  4. Amazing explanations!

    Dan ///ccie34827

  5. Aaaah, I'm going to Brussels in two days to do the DC lab and I had overlooked how NX-OS's anycast-rp worked in relation to MSDP! I was careless. Thanks for pointing that out.
