Sunday, May 26, 2013

CCIE DC: Multicast Part 5, Multicast and its relation to OTV

Hi Guys

So in my four part Multicast Series:
http://www.ccierants.com/2013/02/ccie-dc-multicast-part-1.html
http://www.ccierants.com/2013/02/ccie-dc-multicast-part-2.html
http://www.ccierants.com/2013/02/ccie-dc-multicast-part-3.html
http://www.ccierants.com/2013/02/ccie-dc-multicast-part-4.html


We went in depth on multicast, multicast with SSM, multicast with Bidir, and how our RPs work for us. I promised in Part 1 that I would link it all back to OTV, and I am happy to say I have now done so. Let's take a look!

The thing to remember about OTV is that eventually you might have multiple data centres connected with it. So when Cisco designed OTV, they needed a way to make it easy for OTV DCs to discover each other (the OTV control-group), and a way for multicast that is distributed to multiple OTV DCs to use an underlying multicast network (the OTV data-group).

Let's look at a typical OTV interface configuration:

 interface Overlay1
  otv join-interface Ethernet1/9
  otv control-group 224.1.1.2
  otv data-group 232.0.0.0/8
  otv extend-vlan 10
  no shutdown

!

So first, let's look at the control-group. The control-group command above means that when OTV is trying to establish an adjacency with its neighbor, it uses this mcast group to send out the requests to see if anyone wants to establish a peer relationship with it.

So if you can't establish a peer adjacency over a network you're sure is enabled for multicast, look at this control-group multicast address and check for reachability end to end using show ip mroute and your knowledge from Parts 1 to 4 of my mcast tutorial :).

Here is an example of this in the show ip mroute table:


N7K3# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.2/32), uptime: 00:22:02, otv ip
  Incoming interface: Ethernet1/9, RPF nbr: 169.254.0.71
  Outgoing interface list: (count: 1)
    Overlay1, uptime: 00:22:02, otv


You can see from the above that this little guy has received traffic incoming for this group via the eth1/9 interface, and then passes it on to the overlay interface so that the ISIS adjacency can establish.

This is all fairly straightforward at this point, so let's check out what the data-group is all about:

The astute reader may have noticed the IP addressing for the data-group:

interface Overlay1
  otv data-group 232.0.0.0/8

!

This data-group range, 232.0.0.0/8, is the range reserved for Source Specific Multicast (SSM), which as we know from our previous tutorials means receivers join a specific (source, group) pair rather than just a group.

This is why, when configuring OTV, we are told to enable IGMP version 3 on our join interface:

interface Ethernet1/9
  ip address 169.254.0.71/24
  ip igmp version 3
  no shutdown

!

If we did not have this, source-specific multicast would not work correctly, since only IGMPv3 lets a receiver name the source it wants.
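If you want a quick sanity check on whether a data-group range you are about to configure actually sits inside the SSM block, something like the sketch below does the job. It uses only Python's standard ipaddress module; the function name is_ssm_range is my own and not anything from NX-OS.

```python
import ipaddress

# 232.0.0.0/8 is the block reserved for Source Specific Multicast.
SSM_RANGE = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_range(data_group: str) -> bool:
    """Return True if the whole data-group prefix sits inside 232.0.0.0/8."""
    return ipaddress.ip_network(data_group).subnet_of(SSM_RANGE)


if __name__ == "__main__":
    # The first prefix is the data-group from this post; the others are
    # non-SSM ranges for comparison.
    for prefix in ("232.0.0.0/8", "231.0.0.0/24", "239.1.1.2/32"):
        print(prefix, "SSM" if is_ssm_range(prefix) else "not SSM")
```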

Now we know how this part works, let's generate some mcast traffic



Side note: be very careful when testing with iperf. Be sure to run an ip igmp snooping events debug so that you can see the multicast receiver (the server side) sending the appropriate IGMP join messages, so that the mroute table can be updated.

N7K3# 2013 May 26 09:04:42.248659 igmp: SN: 10 Noquerier timer expired, remove all the groups in this vlan.
2013 May 26 09:04:42.249253 igmp: SN: 10 Removing group entry (*, 239.1.1.2) which has no oifs


So on iperf we simply set up our source and receiver, then we can see the mcast traffic flow:

bin/iperf.exe -s -u -P 0 -i 1 -p 5001 -w 41.0K -B 239.1.1.2 -l 1000.0B -f k
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 10.0.0.1
Receiving 1000 byte datagrams
UDP buffer size: 41.0 KByte
------------------------------------------------------------
[148] local 10.0.0.1 port 5001 connected with 10.0.0.2 port 53766



Let's look at what that has done to our ip mroute table:



N7K3# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.2/32), uptime: 00:27:48, otv ip
  Incoming interface: Ethernet1/9, RPF nbr: 169.254.0.71
  Outgoing interface list: (count: 1)
    Overlay1, uptime: 00:27:48, otv

(169.254.0.72/32, 232.0.0.0/32), uptime: 00:00:01, otv ip
  Incoming interface: Ethernet1/9, RPF nbr: 169.254.0.72
  Outgoing interface list: (count: 1)
    Overlay1, uptime: 00:00:01, otv



What the heck? We sent mcast traffic to 239.1.1.2, but the mcast routing table shows it under the 232.0.0.0 mcast address! What is going on?

Here it is: OTV encapsulates multicast traffic... inside of multicast, then delivers it using the SSM range (the data range) you specified, then decapsulates it at the other end and delivers it to your host:


N7K3# show otv mroute

OTV Multicast Routing Table For Overlay1

(10, *, 239.1.1.2), metric: 0, uptime: 00:00:07, igmp
  Outgoing interface list: (count: 1)
    Eth2/11, uptime: 00:00:07, igmp
(10, 10.0.0.2, 239.1.1.2), metric: 0, uptime: 00:01:47, overlay(s)
  Outgoing interface list: (count: 0)



You can see from the above that once OTV receives the mcast traffic (since the overlay is listed as an outgoing interface for the 232.0.0.0/32 route as per our show ip mroute), it hands it to its own internal otv mroute table, which then actually sends the mcast traffic to the correct interface (in our case Eth2/11).

What a hack, it works but wow quite genius!
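To make the "multicast inside multicast" idea concrete, here is a toy model of it in Python. It is only a sketch of the two tables we just looked at: SiteFlow stands in for an entry in show otv mroute, CoreFlow for an entry in show ip mroute. The class and function names are mine, and how OTV actually picks an address out of the data-group range is not modelled (the address is just passed in by hand).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteFlow:
    """What the hosts see: (VLAN, source, group), as in 'show otv mroute'."""
    vlan: int
    source: str
    group: str


@dataclass(frozen=True)
class CoreFlow:
    """What the DCI sees: (join-interface IP, data-group address)."""
    source: str
    group: str


def encapsulate(flow: SiteFlow, join_ip: str, data_group_addr: str):
    """Wrap the site flow in an outer SSM (S, G) taken from the data-group."""
    return CoreFlow(join_ip, data_group_addr), flow


def decapsulate(packet) -> SiteFlow:
    """Strip the outer header and hand back the original site flow."""
    _outer, inner = packet
    return inner


if __name__ == "__main__":
    # Values taken from the lab output in this post.
    site = SiteFlow(vlan=10, source="10.0.0.2", group="239.1.1.2")
    packet = encapsulate(site, join_ip="169.254.0.72", data_group_addr="232.0.0.0")
    print("core sees :", packet[0])
    print("hosts see :", decapsulate(packet))
```

The point of the model is simply that the core network only ever routes the outer (S, G), while the inner group survives untouched for the receiving site.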

Why do it this way? Because OTV wants to use Source Specific Multicast for distributing the traffic over the Layer 3 DCI. Why specifically it has to be SSM I am not sure, so let's try a data-group range outside the SSM range and see what happens:

N7K4(config)# int overlay1
N7K4(config-if-overlay)#   otv data-group 231.0.0.0/24
N7K4(config-if-overlay)# exit
N7K4# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.2/32), uptime: 00:30:35, ip otv
  Incoming interface: Ethernet1/9, RPF nbr: 169.254.0.72
  Outgoing interface list: (count: 1)
    Overlay1, uptime: 00:00:20, otv

(169.254.0.72/32, 231.0.0.0/32), uptime: 00:00:05, igmp ip
  Incoming interface: Ethernet1/9, RPF nbr: 169.254.0.72
  Outgoing interface list: (count: 1)
    Ethernet1/9, uptime: 00:00:05, igmp, (RPF)

N7K4# show otv mroute

OTV Multicast Routing Table For Overlay1

(10, *, 239.1.1.2), metric: 0, uptime: 00:00:14, overlay(r)
  Outgoing interface list: (count: 1)
    Overlay1, uptime: 00:00:14, isis_otv-default

(10, 10.0.0.2, 239.1.1.2), metric: 0, uptime: 00:00:09, site
  Outgoing interface list: (count: 1)
    Overlay1, uptime: 00:00:09, otv



It still works. Let's try removing the ip igmp version 3 command:


interface Ethernet1/9
  ip address 169.254.0.72/24
  no shutdown



(Showing no ip igmp version 3)

Multicast traffic still flows just fine. Note I even tried a new group:


bin/iperf.exe -s -u -P 0 -i 1 -p 5001 -w 41.0K -B 239.1.1.3 -l 1000.0B -f k
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 10.0.0.1
Receiving 1000 byte datagrams
UDP buffer size: 41.0 KByte
------------------------------------------------------------
[148] local 10.0.0.1 port 5001 connected with 10.0.0.2 port 53770
[ ID] Interval       Transfer     Bandwidth       Jitter   Lost/Total Datagrams
[148]  0.0- 1.0 sec   121 KBytes   992 Kbits/sec  0.024 ms 1600613993/  124 (1.3e+009%)
[148]  1.0- 2.0 sec   121 KBytes   992 Kbits/sec  0.007 ms    0/  124 (0%)
[148]  2.0- 3.0 sec   120 KBytes   984 Kbits/sec  0.009 ms    0/  123 (0%)
[148]  3.0- 4.0 sec   121 KBytes   992 Kbits/sec  0.001 ms    0/  124 (0%)
[148]  4.0- 5.0 sec   121 KBytes   992 Kbits/sec  0.001 ms    0/  124 (0%)
[148]  5.0- 6.0 sec   121 KBytes   992 Kbits/sec  0.001 ms    0/  124 (0%)
[148]  6.0- 7.0 sec   120 KBytes   984 Kbits/sec  0.002 ms    0/  123 (0%)
[148]  7.0- 8.0 sec   121 KBytes   992 Kbits/sec  0.001 ms    0/  124 (0%)
[148]  8.0- 9.0 sec   121 KBytes   992 Kbits/sec  0.016 ms    0/  124 (0%)
[148]  9.0-10.0 sec   124 KBytes  1016 Kbits/sec  0.001 ms    0/  127 (0%)
[148]  0.0-10.1 sec  1214 KBytes   988 Kbits/sec  0.001 ms    0/ 1243 (0%)



I can only assume Cisco wants you to use SSM because it has some benefits in larger topologies; I can't think of any other reason. I would obviously recommend sticking with 232.0.0.0/8 as the data-group and ip igmp version 3 on your join interface, as per Cisco's recommendation.


What if our provider or DCI link doesn't support multicast? Can we not establish OTV? We can: since NX-OS 5.2(1), I believe, you are able to use an "adjacency-server", which is basically a central point for the OTV interconnects. The configuration for this is shown below:


Non-adjacency server side:

interface Overlay1
  otv isis authentication-type md5
  otv isis authentication key-chain OTV
  otv join-interface Ethernet1/9
  otv extend-vlan 10
  otv use-adjacency-server 169.254.0.72 unicast-only
  no shutdown


Adjacency server side:



interface Overlay1
  otv isis authentication-type md5
  otv isis authentication key-chain OTV
  otv join-interface Ethernet1/9
  otv extend-vlan 10
  otv adjacency-server unicast-only
  no shutdown


Multicast traffic still works over this, it is just encapsulated inside unicast over the overlay:


N7K3# show otv mroute

OTV Multicast Routing Table For Overlay1


(10, *, 239.1.1.4), metric: 0, uptime: 00:00:08, igmp
  Outgoing interface list: (count: 1)
    Eth2/11, uptime: 00:00:08, igmp
N7K3# show ip mroute
IP Multicast Routing Table for VRF "default"



As you can see, there are no entries in the multicast routing table when using the adjacency server, because all multicast traffic stays within the OTV tunnel. OTV itself has an mroute table, which takes that unicast-encapsulated multicast and spits it out of the IGMP-joined interfaces.
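One consequence worth keeping in mind: with a unicast-only core there is nothing in the transport to replicate packets for you, so the sending edge device has to send one copy per remote site. The little sketch below just counts copies. It is a simplification that assumes every remote site has an interested receiver, and the function name is made up for illustration.

```python
def copies_sent_by_source_site(num_sites: int, multicast_core: bool) -> int:
    """Copies of each multicast packet the sending edge device puts on the DCI.

    Simplifying assumption: every remote site has a receiver for the group.
    """
    if num_sites < 2:
        return 0  # no remote site to send to
    if multicast_core:
        return 1  # the core replicates along the multicast tree
    return num_sites - 1  # unicast-only: one copy per remote site


if __name__ == "__main__":
    for sites in (2, 4, 8):
        print(sites, "sites:",
              copies_sent_by_source_site(sites, multicast_core=True), "copy with multicast,",
              copies_sent_by_source_site(sites, multicast_core=False), "with unicast-only")
```

For two or three data centres the difference is small, which is probably why unicast-only is a perfectly reasonable choice for small deployments.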


I hope this helps guys!

Saturday, May 25, 2013

CCIE DC: Control Plane Policing, speeding up FTP and ensuring pings are not dropped to your sup engine

Hi Guys

So this is a super quick blog post just because it was something that always bothered me, plus it's a way to show you some Control Plane Policing :).


If you have ever pinged an NX-OS device you will have noticed that it drops packets, which can cause you concern if you're pinging the device directly (it won't drop them if you are pinging something BEHIND the device, only if you're pinging the control plane itself):


--- 10.150.99.114 ping statistics ---
1000 packets transmitted, 996 packets received, 0.40% packet loss
round-trip min/avg/max = 1.147/2.313/45.684 ms


As you can see above, it's not many, around 4 packets every 1000, but it's annoying enough to bother me.

So I was learning about Control Plane Policing. Your NX-OS device comes with a bunch of control-plane policing policies by default, which you can check out by issuing show run all.

The ones relevant to what I am working on are below:

Here is an ACL that defines the traffic:

ip access-list copp-system-p-acl-icmp
  10 permit icmp any any echo
  20 permit icmp any any echo-reply


Here is a class-map that matches this traffic (along with some other traffic types)



class-map type control-plane match-any copp-system-p-class-monitoring
  match access-group name copp-system-p-acl-icmp
  match access-group name copp-system-p-acl-icmp6
  match access-group name copp-system-p-acl-traceroute


Here is the relevant part of the policy-map that controls this traffic:


policy-map type control-plane copp-system-p-policy-strict

  class copp-system-p-class-monitoring
    set cos 1
    police cir 130 kbps bc 1000 ms conform transmit violate drop


As you can see from the above, ping packets are rate-limited to a lowly 130 kilobits per second. For me this is quite low, and I don't think ping packets are always attack vectors. So at this point I could manually modify the existing policy, or create a new policy and then apply it like so:


control-plane
  service-policy input

!

Or alternatively, use the "copp profile" command to configure one of the preconfigured COPP Profiles


mudcswp02core(config)# copp profile ?
  dense     The Dense   Profile
  lenient   The Lenient Profile
  moderate  The Moderate Profile
  strict    The Strict Profile

 

The CoPP profile looks after everything: your BGP traffic, OSPF traffic, all sorts of traffic types, to ensure that traffic cannot overload the supervisor engine. So be careful when modifying these CoPP values, but if you see things like ping being dropped, or you can't push a reasonable amount of traffic to the box (maybe FTP, SSH or secure copy), this is where you can sort it out.

Speaking of which, check out the defaults for FTP:

FTP falls under the following class-map:

class-map type control-plane match-any copp-system-p-class-management
  match access-group name copp-system-p-acl-ftp



Which has the following policy set:


  class copp-system-p-class-management
    set cos 2
    police cir 10000 kbps bc 250 ms conform transmit violate drop 


Which, as you can see, is 10 megabits per second. Maybe you want your file transfers to the flash of the sup to go faster? Modify this value :).
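To get a feel for what those two policers mean in practice, here is some back-of-the-envelope arithmetic in Python. Treat it as a rough sketch: it ignores the burst (bc) value entirely, and exactly which header bytes the policer counts is an assumption on my part. The 84-byte ping is 56 data bytes plus 8 bytes of ICMP and 20 bytes of IP header, and the 200 MB file is just a made-up example size.

```python
def packets_per_second(cir_kbps: float, packet_bytes: int) -> float:
    """Sustained packets per second that fit under a policer rate (burst ignored)."""
    return cir_kbps * 1000 / (packet_bytes * 8)


def transfer_seconds(file_megabytes: float, cir_kbps: float) -> float:
    """Best-case seconds to move a file at exactly the policed rate."""
    return file_megabytes * 1_000_000 * 8 / (cir_kbps * 1000)


if __name__ == "__main__":
    # Monitoring class: 130 kbps shared by ping, ICMPv6 and traceroute.
    print("84-byte pings per second at 130 kbps:",
          round(packets_per_second(130, 84)))
    # Management class: 10000 kbps. File size is a hypothetical example.
    print("seconds to copy a 200 MB file at 10000 kbps:",
          transfer_seconds(200, 10000))
```

Remember the monitoring class is shared by everyone pinging or tracerouting to the box, so a couple of monitoring systems plus your own ping can get closer to that ceiling than you might expect.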






Saturday, May 18, 2013

CCIE DC: The VPC Topology you never want to implement, the "Pink Slip" Topology

Hi Guys

So one of my most popular blog posts of all time is my "vPC - The gotcha's you need to know", with a whole.. 10.. maybe even 15!!! unique visitors, no I kid it's a few more than that and easily the most popular on the site.


Let's take a fresh approach to my blog here and write about a topology for vPC that you would NEVER want to implement. It will be fun for us to identify the mistakes made in this topology, plus we can learn all about vPC failure scenarios too!

Just a few quick shout-outs: a certain someone, who can post a reply if they want to take credit, is responsible for getting me access to a lab so I can actually show you all this. So cheers, Anonymous Person!

The INE CCIE DC videos were invaluable in the creation of this blog post; the advanced vPC videos are exceptionally good.

The blog post will contain "ALWAYS" tips that you can use to make sure you're implementing a secure, robust vPC topology first time, every time.

It's worth noting that almost every step of the way, protection mechanisms in NXOS and vPC prevented me from trying to configure it the wrong way, for example, I was unable to configure the peer link before I configured the keepalive.

Anyway now that's out the way let's look at our Topology.









(As you can clearly see, I was actually meant to be a graphic artist but got lumped into networking ;))

OK so now we have our simple diagram showing our very simple topology. Our hero network engineer, let's call him leethax0r, decides to implement a vPC topology. "How hard can it be?" he says to himself. "After all, I have seen the Cisco certified guys do it tons of times, it's just like port-channels with a stacked switch, right?" Of course leethax0r is too proud to learn how to do it properly; if it was up to him, they'd be implementing an ABC (Anything But Cisco) network anyway.

Through a series of serious mistakes our hero leethax0r (or hax0r for short) implements a terrible configuration, leading to a terrible vPC implementation (which, invariably, he will blame on "cisco bugs" rather than his own misunderstanding and misconfiguration).


The first thing hax0r decides to do, is implement a peer keepalive, he knows this is the first step in implementing vPC


(Config shown in red so that hopefully people don't copy paste it and try and implement it!)

feature interface-vlan
feature lacp
feature vpc

int vlan 10
 no shut
 ip add 10.10.10.2/24
!





hax0r does the same thing on each side and changes the IP address. So far he hasn't really done anything wrong. It's perfectly valid to use an SVI for your peer keepalives, although it is strongly recommended to place them in their own VRF so they are not affected by the global IP routing table. It's also recommended to create a dedicated link between the two switches that carries only the VLAN for this SVI, and this is where our hero hax0r goes horribly wrong...

Hax0r decides that he wants to implement OSPF and rely on the network upstream from his 5k to provide alternative paths for the peer keepalives to reach each other, what could possibly go wrong? he says to himself:

ALWAYS: Implement a totally separate VRF for the vPC peer keepalive and (where possible) have a back-to-back interface between the vPC peers, a directly connected interface carrying only the peer keepalive. The peer-keepalive link prevents dual-active, the most catastrophic situation that can occur with vPC!

This is the config that hax0r implements:

Switch 2

interface Vlan10
  no shutdown
  ip address 10.10.10.2/24
  ip router ospf 1 area 0.0.0.0

interface Vlan20
  no shutdown
  ip address 10.20.20.1/24
  ip router ospf 1 area 0.0.0.0

 

Switch 1:
interface Vlan10
  no shutdown
  ip address 10.10.10.1/24
  ip router ospf 1 area 0.0.0.0

interface Vlan20

interface Vlan30
  no shutdown
  ip address 10.30.30.1/24
  ip router ospf 1 area 0.0.0.0



He then checks he can reach Switch 1 from Switch 2 on their separate VLAN interfaces, which will be routed via VLAN 10 (the shared VLAN):

SIWTCH2# ping 10.30.30.1
PING 10.30.30.1 (10.30.30.1): 56 data bytes
64 bytes from 10.30.30.1: icmp_seq=0 ttl=254 time=0.823 ms
64 bytes from 10.30.30.1: icmp_seq=1 ttl=254 time=0.619 ms
64 bytes from 10.30.30.1: icmp_seq=2 ttl=254 time=0.61 ms
64 bytes from 10.30.30.1: icmp_seq=3 ttl=254 time=7.755 ms
64 bytes from 10.30.30.1: icmp_seq=4 ttl=254 time=9.645 ms

L33thax0r pats himself on the back at his correct implementation of OSPF.

Next, hax0r implements the vPC Peer Keepalive:

vpc domain 1
  peer-keepalive destination 10.20.20.1 source 10.30.30.1 vrf default

!

hax0r checks show vpc to see if the peer is up


SIWTCH2# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer link not configured
vPC keep-alive status             : peer is alive
Configuration consistency status  : failed
Per-vlan consistency status       : failed
Configuration inconsistency reason: vPC peer-link does not exist
Type-2 consistency status         : failed
Type-2 inconsistency reason       : vPC peer-link does not exist
vPC role                          : none established
Number of vPCs configured         : 0
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Disabled (due to peer configuration)
Auto-recovery status              : Disabled


Hax0r is now happy that his keepalive is showing the peer as alive. Out of his depth, he notices something about a "peer link". Some quick googling leads him to some sample configuration, so hax0r implements the config he found on a popular ABC podcast:


interface Ethernet1/20
  switchport mode trunk
  channel-group 1 mode active
!

SIWTCH2(config)# int po1
SIWTCH2(config-if)# vpc peer-link


This is dead wrong: hax0r didn't use more than one link for the peer-link, and he didn't set the spanning-tree port type to network (thankfully, the switch will do this last one automatically for him).

ALWAYS: Implement multiple links, bundled together, for your vPC peer-link. If you're using a Nexus 7000, this should be across multiple linecards (and if you don't have multiple linecards in your Nexus 7000, GET THEM!)

ALWAYS: Use spanning-tree port type network on the vPC Peer Link (Switch will do this automatically for you)

Hax0r now checks the vPC Output:

SIWTCH2# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : primary
Number of vPCs configured         : 0
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    up     1,10


Hax0r has done it! The VPC is up! Truly his l33tness should  be known everywhere, maybe he'll even be able to convince the bosses to take out this evil Cisco network and put ABC vendor in, now he's shown his l33t skills.

However, little known to hax0r, his shoddy vPC config is a timebomb waiting to go off.

L33t Hax0r calls his co-worker and system administrator, LulzAdmin (or Lulz for short), to come and plug his servers in: the vPC is ready.

LulzAdmin (Who is almost as good as l33t hax0r at his job) has his server ready. The server is MISSION Critical, the entire business depends on this server.

L33t hax0r and LulzAdmin can't work out how to dual attach their server. They can't seem to get the teaming working across two ports, and after blaming "cisco bugs" they decide that the server will be fine single attached.

ALWAYS: Dual Attach your servers to both vPC Peers, dual attach EVERYTHING, you never want single-attached devices.
 
This mission critical server, is about to have some major problems...

L33t hax0r plugs the server into one of the ports on the switch:

int eth1/1
 channel-group 10 mode active

!

int po10
 switchport mode access
 switchport access vlan 50
!


Examine the below output:


SWITCH1# show vpc orphan-ports
Note:
--------::Going through port database. Please be patient.::--------

VLAN           Orphan Ports
-------        -------------------------
1              Eth1/10, Eth1/15, Eth2/1, Eth2/2
10             Eth1/10
50             Eth1/1, Eth1/10


Here is our first problem: as mentioned, we have some single-attached (orphan) ports, in this case Eth1/1 in VLAN 50. Now look at the MAC address table for VLAN 10:



SIWTCH2# show mac address-table vlan 10
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link
   VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 10       547f.eeaf.1cbc    static    0          F    F  Router
* 10       547f.eeaf.3a3c    static    0          F    F  Po1

 



What is wrong with this output? Let's see: the OSPF neighbor relationship between the two switches is via the port-channel link (Po1)! That means if the vPC peer-link goes down, the keepalives will go down as well, which means the vPC secondary will suspend its vPC member ports for fear of a dual-active situation!


After configuring this mess, hax0r and Lulz go for a "hard-earned" drink at the local bar. Meanwhile, the data centre hosting the Nexus switches has a scheduled power outage for the UPS B feed. If you have your equipment dual attached to two power supplies on two different feeds, you should be good, but take a guess as to whether our hero hax0r did such a thing ;).

The Secondary nexus, connected to UPS Feed B, powers off.

Lulz and Hax0r receive a phone call, it's BigBossMan, the Webserver has gone Offline! What have you done?


Lulz and Hax0r rush to the data centre to investigate the problem, upon logging into the powered up nexus, Lulz and Hax0r see the following:


SWITCH1# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1
Peer status                       : peer link is down
vPC keep-alive status             : Suspended (Destination IP not reachable)
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : success
vPC role                          : secondary, operational primary
Number of vPCs configured         : 1
Peer Gateway                      : Disabled
Dual-active excluded VLANs        : -
Graceful Consistency Check        : Enabled
Auto-recovery status              : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    down   -

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
10     Po10        down   failed      Peer-link is down          -

The port channel to the server, Po10, is down! Why? What is this "peer link down" all about? Just because the other switch died, why should this one fail?

The problem is that the vPC peer-link can't come up until the peer keepalive is up, and the peer keepalive can't come up until the vPC peer-link comes up. Even though VLAN 10 is configured to go upstream to the rest of the network (as per hax0r's plan to use the global routing table to carry the keepalives), when a vPC peer-link goes down, any SVIs for VLANs carried on that peer-link are shut down.

ALWAYS: Make sure the L3 link which the two vPC peers use for their routing protocol adjacency does NOT travel over the vPC peer-link, because if the vPC peer-link dies, these SVIs will be suspended!
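The chicken-and-egg problem above is really just a cycle in a dependency graph, and you can spot it on paper before you ever touch the CLI. Here is a small sketch in Python that walks a hand-written "X needs Y" map and reports a loop if there is one. The two maps are my own reading of hax0r's design versus the recommended one, not something pulled from the switches.

```python
def find_cycle(depends_on):
    """Depth-first search; return one dependency cycle as a list, or None."""
    visiting, done, path = set(), set(), []

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # We came back to a node still on the current path: that's a loop.
            return path[path.index(node):] + [node]
        visiting.add(node)
        path.append(node)
        for dep in depends_on.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in depends_on:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


# hax0r's design: the keepalive is routed over an SVI that lives on the peer-link.
HAX0R = {
    "peer-link": ["peer-keepalive"],
    "peer-keepalive": ["SVI Vlan10"],
    "SVI Vlan10": ["peer-link"],
}

# Recommended design: the keepalive has its own link and VRF.
RECOMMENDED = {
    "peer-link": ["peer-keepalive"],
    "peer-keepalive": ["dedicated link in keepalive VRF"],
}

if __name__ == "__main__":
    print("hax0r      :", find_cycle(HAX0R))
    print("recommended:", find_cycle(RECOMMENDED))
```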




Lulz and hax0r do some quick googling and seem to think they need a command called "auto-recovery". As quickly as they can, they configure auto-recovery on the switches and reload them...

NOW, they truly are hosed, by implementing auto-recovery which tells both switches to go active after a reload if they can't see the peer, and by not having the peer-link able to come up because the keepalives can't get through, they will now cause a dual-active scenario.

The Dual Active Scenario leads the entire network to come down, Cisco Certified Engineers come in and inspect what happened, proving the misconfiguration on hax0r's part, hax0r is summarily shown his pink slip, still cursing "cisco bugs" for his downfall.


These are just some of the mistakes that can be made when you're configuring vPC that can lead to major problems. Now that we know what we shouldn't do, let's examine the different commands in vPC and what they protect us from.

First of all, I explain the peer-gateway command in detail in my "vPC - The gotcha's you need to know" article. You should always implement peer-gateway, especially if you have NetApp or F5 devices.

What about auto-recovery, what exactly does that do?

As we saw in the previous example, auto-recovery matters when both switches power off but only one comes back on (maybe the other was hit by an electrical surge or something else). In this scenario, without auto-recovery the vPC would never establish, so the switch that is now on would never bring up its vPC member ports, because the vPC peer-link would never have come up.

Auto-recovery will resolve this, Auto-recovery says that after a certain period (default is 240 seconds), the switch will assume the peer has died and bring up the ports.

The Most important thing, when enabling Auto-recovery, is to be damn sure that if both switches reset, they will always be able to get the vPC Peer Link up and/or the vPC Peer Keepalive so that they can detect a dual-active scenario.

Therefore, it is recommended to turn on auto-recovery, and as long as you can satisfy the above criteria, you are safe to turn this on.

NOTE: You can turn on auto-recovery retroactively, so if you ever walk into a situation where this is occurring and it hadn't been turned on previously, turn it on and 240 seconds later the vPC will become active.

Auto Recovery also assists in another situation, if you have a vPC Peer-link go down:

SWITCH2(config)# int po1
SWITCH2(config-if)# shut
2013 May 18 15:23:35 SWITCH2 %$ VDC-1 %$ %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going
down, suspending all vPCs on secondary


The secondary will suspend all vPC member ports, as it should. This is the behavior vPC uses to prevent dual-active scenarios and to ensure correct forwarding, because the peer-link is quite important.

So, now what happens if the primary now dies in this situation, the secondary vPC will STILL leave the ports down:


vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    down   -

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
10     Po10        down   failed      Peer-link is down          -


With auto-recovery, the secondary switch can realise that the primary is not coming back and enable the vPC member ports.

Again, however, it's exceptionally important that you're positive the vPC peer keepalive will always work reliably so as to avoid dual-active scenarios. This is the reason auto-recovery is not enabled by default: it could potentially lead to dual-active.
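If it helps, here is the secondary's decision as I have described it, written out as a tiny Python function. This is a toy model of the cases covered in this post only, not of everything NX-OS actually does; the function and parameter names are my own, and the 240 seconds is the default mentioned above.

```python
AUTO_RECOVERY_DELAY = 240  # seconds, the default quoted in this post


def secondary_vpc_ports_up(peer_link_up: bool,
                           keepalive_alive: bool,
                           auto_recovery: bool,
                           seconds_peer_unseen: int) -> bool:
    """Should the vPC secondary have its vPC member ports up? (toy model)"""
    if peer_link_up:
        return True  # normal operation
    if keepalive_alive:
        # Peer is alive but the peer-link is down: suspend to avoid dual-active.
        return False
    # Peer-link down and no keepalive: only auto-recovery brings the ports
    # back, and only after the delay has expired.
    return auto_recovery and seconds_peer_unseen >= AUTO_RECOVERY_DELAY


if __name__ == "__main__":
    print(secondary_vpc_ports_up(False, True, True, 1000))    # peer still alive
    print(secondary_vpc_ports_up(False, False, False, 1000))  # no auto-recovery
    print(secondary_vpc_ports_up(False, False, True, 240))    # auto-recovery kicks in
```

Notice that the whole thing hinges on keepalive_alive being trustworthy. If the keepalive can be lost while the peer is actually still up, that last branch is exactly where dual-active comes from.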


The following output is typically what you will see when a vPC PO has been enabled due to auto-recovery:

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
10     Po10        up     success     Type checks were bypassed  50
                                      for the vPC



Let's look at two other commands, graceful consistency check and Peer-switch.

Graceful consistency check helps in the following situations, Let's pretend our friend hax0r has been rehired at a company, much to his chagrin they have vPC, our friend hax0r is asked to reconfigure a vPC port-channel from an access port to a trunk port.

Whoops! The ports have just gone down because the parameters don't match. Our friend hax0r is not having much luck!

With the graceful consistency check command configured, one end of the link will suspend but the other will remain up, giving you a chance to match the parameters on both sides. You don't have to bring the link down to make changes, therefore this is highly recommended.


With peer-switch, unfortunately my topology does not have enough switches to show you the scenario where it helps, but to cut a long story short: say you have a switch connected via a vPC port-channel to your two vPC peers, and one of those peers is the spanning-tree root. Peer-switch ensures that if that root bridge fails and later recovers, there is no delay in forwarding while waiting for spanning-tree to reconverge, because both vPC peers appear as one giant bridge from the perspective of spanning-tree (they share the same bridge ID). I recommend you turn this command on; I cannot think of any reason not to.


So In Conclusion, I personally recommend the following as default config for vPC:

Graceful Consistency Check
Auto Recovery (Just be sure your peer-keepalive or peer-link will always work after a reboot)
Peer-Gateway
Peer-Switch
 

Finally, there are a few interesting options for your vPC that could allow you to do some naughty configuration if you wanted to.


The dual-active exclude interface-vlan command under the vPC config allows you to specify that a particular SVI will NOT go down if the peer-link fails, but is instead excluded from suspension. This could have been useful to our friend hax0r for VLAN 10, but it's still not recommended to configure it this way. You should not need this command in most situations, except in certain orphan port situations where you want to ensure that if an orphan port exists and the peer-link dies, the orphan port on the secondary can still get to its SVI.


I hope you enjoyed this blog post. I spent a while on it, and I hope I covered off your questions :)


















Sunday, May 12, 2013

CCIE DC: First Official Rack Rental! Spanning-tree Bridge Assurance and LACP suspend-individual

Hi Guys!

Today marked an important occasion as I had my first "official" rack rental (I have had others thanks to some generous people, but this was my first with the "big two" training vendors).

I concentrated on:

- CFS
- Port Channels (LACP)
- Spanning-tree Bridge Assurance
- FEX stuff


Here are the first few useful things I found.


I am sure we have all done this before:


You configure a few member interfaces for your etherchannel:


SW3(config)# int eth1/4, eth1/2
SW3(config-if-range)# channel-group 2 mode active



You configure a few options for the Port channel:

interface port-channel2
  description ### I am L33t ###
  switchport mode trunk
  switchport trunk allowed vlan 1
  spanning-tree port type network
  speed 10000



These then apply on your member ports:

interface Ethernet1/4
  switchport mode trunk
  switchport trunk allowed vlan 1
  channel-group 2 mode active



But whoops, you forgot a port. You meant to add Eth1/1 too!



SW3(config)# int eth1/1
SW3(config-if)# channel-group 2 mode active
command failed: port not compatible [Ethernet Layer]



Damn, what a pain in the ass! Now I have to go and add all the options to the port, like the spanning-tree mode etc... or do I?

 SW3(config-if)# channel-group 2 force mode active


Let's take a look at the config now:


 SW3(config-if)# show run int eth1/1
 

interface Ethernet1/1
  switchport mode trunk
  switchport trunk allowed vlan 1
  channel-group 2 mode active



Awesome! All the appropriate config has been applied without me having to put it all in manually. A bit of a time saver in the lab, where every second will count!

Let's talk more about LACP. There are two LACP commands on the Nexus that are not available on non-Nexus platforms. They are enabled by default, and they can actually be quite a pain:

lacp suspend-individual

and

lacp graceful-convergence


Let's talk about suspend-individual.

The idea behind lacp suspend-individual is this: if a member port of a port-channel does not receive any LACP PDUs, then in the normal case (without the command) that port would be placed into the "Individual" state:


SW1# show port-channel sum
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        S - Switched    R - Routed
        U - Up (port-channel)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
1     Po1(SD)     Eth      LACP      Eth3/1(I)    Eth3/3(I)   


 

This means that the switch will treat these as two independent links, not as an etherchannel. But let's say you had the other end of the link misconfigured, with the port-channel in ON mode:


SW2# show run int eth1/1

interface Ethernet1/1
  switchport mode trunk
  channel-group 10



Suddenly you have a potential loop in the network, and you will see some very strange spanning-tree behavior:

SW2# show spanning-tree vlan 30

VLAN0030
  Spanning tree enabled protocol rstp
  Root ID    Priority    4126
             Address     547f.eec2.7d01
             This bridge is the root
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    4126   (priority 4096 sys-id-ext 30)
             Address     547f.eec2.7d01
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po10             Desg FWD 1         128.4105 P2p
SW2# show spanning-tree vlan 30

VLAN0030
  Spanning tree enabled protocol rstp
  Root ID    Priority    4126
             Address     547f.eec2.7d01
             This bridge is the root
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    4126   (priority 4096 sys-id-ext 30)
             Address     547f.eec2.7d01
             Hello Time  2  sec  Max Age 20 sec  Forward Delay 15 sec

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Po10             Desg BLK 1         128.4105 Dispute P2p




The port will block/unblock as it keeps seeing "Designated" BPDUs.

We can resolve this by telling the upstream switch to suspend any port that is part of an etherchannel but is not receiving the LACP PDUs we expect on it:


SW1(config)# int po1
SW1(config-if)# lacp suspend-individual
ERROR: Cannot set/reset lacp suspend-individual for port-channel1 that is admin up
SW1(config-if)# shut
SW1(config-if)# lacp suspend-individual
SW1(config-if)# no shut




Now the ports will show as suspended:

SW1(config-if)# end
SW1# show port-channel sum
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        S - Switched    R - Routed
        U - Up (port-channel)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
1     Po1(SD)     Eth      LACP      Eth3/1(s)    Eth3/3(s)   



As soon as we reconfigure our etherchannel correctly on the other side:


SW2(config)# int eth1/1, eth1/3
SW2(config-if-range)# channel-group 10 mode active

The ports come out of the suspended state and traffic will flow:

SW1# show port-channel sum
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        S - Switched    R - Routed
        U - Up (port-channel)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
1     Po1(SU)     Eth      LACP      Eth3/1(P)    Eth3/3(P)   


So as you can see, you didn't need to no shut the ports or anything on the SW2 side, you just had to get it to start advertising LACP PDUs.

The PROBLEM is that some Linux hosts, and some other hosts, will not bring up LACP until they receive LACP PDUs first. This can make the switch place the ports into the suspended state indefinitely: the switch is expecting LACP PDUs, but the host never sends them, so the port-channel remains down.

So for Linux hosts and other devices that may not send PDUs straight away, turn off lacp suspend-individual. For all other ports you're perfectly safe having it enabled.
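
A minimal sketch of turning it off on a host-facing port-channel follows. The port-channel number 20 is hypothetical, and the shut / no shut is there because, as we saw earlier, the switch refuses to change this setting on a port-channel that is admin up:

```
! Sketch only: port-channel20 is a hypothetical host-facing port-channel
interface port-channel20
  shutdown
  no lacp suspend-individual
  no shutdown
```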


Let's talk about bridge assurance.


Bridge assurance is a feature on the Nexus platforms that uses BPDUs to protect against unidirectional links and, as a side effect, to perform a kind of "pruning" of unwanted VLANs (although this is more of an unforeseen benefit of the design).

The way bridge assurance works is this: if you specify a port as spanning-tree port type network (which is NOT set by default, by the way, except on vPC peer-links), bridge assurance forces both ends of the link to constantly send BPDUs in both directions, as a sort of keepalive. If bridge assurance notices that these BPDUs go missing on either end, it knows there is a unidirectional fault (or some other fault) on the link and immediately blocks the port via spanning-tree so that an alternative path can be taken.

The added advantage of this technology is that, when using rapid spanning-tree (Rapid PVST+), each VLAN has its own BPDUs, right? Let's say we have a config like this:



Switch 1 has VLAN 1, 10, and 30

Switch 3 has VLAN 1 and 10


On the switches' port-channels to each other, we specify that these are "network" ports:



SW3# show run int po2

interface port-channel2
  description ### I am L33t ###
  switchport mode trunk
  switchport trunk allowed vlan 1
  spanning-tree port type network
  speed 10000

!

SW1# show run int po2
interface port-channel2
  switchport
  switchport mode trunk
  spanning-tree port type network


As you can see here, we have port type network on both switches. Let's see what spanning-tree on SW1 has to say about this:

SW1# show spanning-tree int po2

Vlan             Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
VLAN0001         Desg FWD 1         128.4097 Network P2p

VLAN0010         Desg BKN*1         128.4097 Network P2p *BA_Inc

VLAN0030         Desg BKN*1         128.4097 Network P2p *BA_Inc


As you can see, bridge assurance is blocking VLANs 10 and 30 from going out on this link. On SW3 we have said switchport trunk allowed vlan 1, so the upstream switch (Switch 1) is not receiving any BPDUs for VLANs 10 and 30, and as far as it is concerned there's no good reason to send that traffic down.

If we add VLAN 10 to the Switch 3 trunk interface:



SW3(config)# int po2
SW3(config-if)# switchport trunk allowed vlan add 10



As soon as the BPDUs for VLAN 10 are advertised, this unblocks the port on the upstream switch:


SW1# 2013 May 12 10:02:47 SW1 %$ VDC-1 %$ %STP-2-BRIDGE_ASSURANCE_UNBLOCK: Bridge Assurance unblocking port port-channel2 VLAN0010.

Now we have seen how spanning-tree bridge assurance works, let's see what can happen if we misconfigure it.


In this example, we have a trunk between SW1 and SW2:

SW1# show run int po1


interface port-channel1
  switchport
  switchport mode trunk
  spanning-tree port type network

On Switch2:

interface port-channel10
  switchport mode trunk
  speed 10000


Just to make this example a bit easier to follow, we have made Switch 2 the root of the spanning tree for VLAN 30.

Let's take a look at the show spanning-tree on Switch 1:

SW1# show spanning-tree interf po1

Vlan             Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
VLAN0001         Desg BKN*1         128.4096 Network P2p *BA_Inc

VLAN0010         Desg BKN*1         128.4096 Network P2p *BA_Inc

VLAN0030         Root FWD 1         128.4096 Network P2p 



As you can see from the above example, the switch has placed the port into blocking for VLANs 1 and 10 based on bridge assurance (BA_Inc).

It has however kept vlan 30 unblocked, why?


SW2# show spanning-tree inter po10

Vlan             Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
VLAN0001         Root FWD 1         128.4105 P2p

VLAN0030         Desg FWD 1         128.4105 P2p




Because for VLAN 30, SW2 is the root switch. Since SW2 is the root for VLAN 30, the port facing upstream is a designated port from SW2's perspective. BPDUs are always sent out designated ports, therefore SW1 is receiving BPDUs from SW2:


SW2# debug spanning-tree bpdu_tx

2013 May 12 06:59:11.492578 stp: RSTP(30): transmitting RSTP BPDU on port-channel10
2013 May 12 06:59:11.492608 stp: vb_vlan_shim_send_bpdu(1977): VDC 1 Vlan 30 port port-channel10 enc_type 1 len 42
2013 May 12 06:59:13.492584 stp: RSTP(30): transmitting RSTP BPDU on port-channel10
2013 May 12 06:59:13.492615 stp: vb_vlan_shim_send_bpdu(1977): VDC 1 Vlan 30 port port-channel10 enc_type 1 len 42
2013 May 12 06:59:15.492581 stp: RSTP(30): transmitting RSTP BPDU on port-channel10
2013 May 12 06:59:15.492610 stp: vb_vlan_shim_send_bpdu(1977): VDC 1 Vlan 30 port port-channel10 enc_type 1 len 42
2013 May 12 06:59:17.490094 stp: RSTP(30): transmitting RSTP BPDU on port-channel10
2013 May 12 06:59:17.490124 stp: vb_vlan_shim_send_bpdu(1977): VDC 1 Vlan 30 port port-channel10 enc_type 1 len 42
2013 May 12 06:59:19.490097 stp: RSTP(30): transmitting RSTP BPDU on port-channel10
2013 May 12 06:59:19.490127 stp: vb_vlan_shim_send_bpdu(1977): VDC 1 Vlan 30 port port-channel10 enc_type 1 len 42




As you can see from the above debug output, SW2 is sending BPDUs out po10 on VLAN 30. Since SW1 is receiving BPDUs for this VLAN, the bridge assurance feature says: well, I'm receiving BPDUs, so we are good to go here, let's unblock this port.


What is missing from this output, though, is SW2 sending BPDUs for VLAN 1 and 10. It will NOT send these. Why? Because for VLAN 1 and 10, Po10 is SW2's root port (the port where it can find the root bridge), and without bridge assurance on that port spanning-tree does not transmit BPDUs up the root port. Bridge assurance on SW1 is therefore not receiving any BPDUs for these VLANs and is blocking the port.


This shows that if you are going to use bridge assurance, you need to make sure you set spanning-tree port type network on BOTH ends of the link. If you're connecting to a device that can't do bridge assurance (a 6500 on software without support for it, for example), you want to turn it off for any ports facing that device (or just don't specify spanning-tree port type network, because bridge assurance only runs on ports configured as spanning-tree port type network).
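
If the far end really can't run bridge assurance, a minimal sketch of the other way out is to take the Nexus side back to a normal port type. I am using Po1 on SW1 from this example purely for illustration:

```
! Sketch only: alternative to enabling "network" on the far end
interface port-channel1
  spanning-tree port type normal
```

With the port no longer of type network, bridge assurance does not run on it, so it will not be blocked for missing BPDUs.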


Let's fix up the spanning-tree port type network on switch 2:

SW2(config)# int po10
SW2(config-if)# spanning-tree port type network


As soon as we do this, SW1 unblocks:


SW1# 2013 May 12 10:14:05 SW1 %$ VDC-1 %$ %STP-2-BRIDGE_ASSURANCE_UNBLOCK: Bridge Assurance unblocking port port-channel1 VLAN0001.