Having spent a lot of time with customers working on vPC deployments, I have found quite a few vPC gotchas that I want to share with you now. There are plenty of guides out there on the internet, including from Cisco themselves, but I have found a lot of them to be dated, as improvements are constantly being made to vPC.
This blog post addresses vPC considerations for the following version:
NX-OS 6.0(2) on Nexus 7000 Hardware
Now we have that out of the way :)
So, if you don't know what vPC is and have never even looked at the basics of how to configure it, this is not the blog post for you. This blog post assumes you have vPC enabled and are maybe experiencing strange behavior, or you have been through the basics of vPC and are about to deploy but just want to know the gotchas.
Let's talk about one vPC design caveat, addressed very well by Brad Hedlund in his blog post.
Layer 3 Considerations
This particular vPC design caveat could end up causing you lots of grief if you are unaware of it.
To understand this caveat, you must understand the following rule:
vPC will not allow traffic that was RECEIVED over a vPC peer-link to be sent out a vPC member port.
This is a loop prevention method; keep that in the back of your head as you read this.
So, let's say you have two Nexus 7k's. To make things really simple, say you have two VLANs: VLAN 99, your server/router VLAN, and VLAN 100, your user VLAN.
You have a router connected to the first Nexus. From a routing point of view, it peers with both Nexus switches over VLAN 99.
Your router is not etherchanneled to the Nexus; it's just connected via a normal access port.
You then have a server on a vPC port channel called vPC 1, which has a configuration like so:
switchport access vlan 100
switchport mode access
Pretty simple config, but it will do for what we are trying to show. The server is connected to both Nexus switches.
Now, for some reason your router, even though it is physically connected to the primary Nexus, decides to use the secondary Nexus as its next hop for the VLAN 100 subnet. Maybe something happened with the routing protocol on the first Nexus, or it was simply misconfigured from the start. Whatever the case, you have now broken the golden rule for vPC loop prevention I mentioned above.
Think about it: you have a router (let's say 22.214.171.124) trying to get to a server (let's say 10.1.1.2), but its next hop is the Nexus reached OVER the vPC peer-link, so the second Nexus would need to route the packet out a vPC MEMBER PORT.
The traffic will be dropped by the loop prevention mechanism.
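To make the golden rule concrete, here is a minimal Python sketch of the scenario above. It is only a toy model, not how NX-OS implements the check: the `Port` class, the port names, and the `kind` labels are all made up for illustration, and it ignores everything else a real switch looks at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str  # hypothetical interface name, for readability only
    kind: str  # "peer-link", "vpc-member", or "orphan" (made-up labels)


def vpc_allows(ingress: Port, egress: Port) -> bool:
    """Golden rule: traffic received on the peer-link may not leave via a vPC member port."""
    return not (ingress.kind == "peer-link" and egress.kind == "vpc-member")


# First Nexus: the router arrives on a normal access port and the
# next hop sits on the other side of the peer-link.
n7k1_in = Port("Eth1/10 (router)", "orphan")
n7k1_out = Port("Po1 (peer-link)", "peer-link")

# Second Nexus: the packet arrives on the peer-link and would have to
# leave on vPC 1 towards the server.
n7k2_in = Port("Po1 (peer-link)", "peer-link")
n7k2_out = Port("Po10 (vPC 1)", "vpc-member")

first_hop_ok = vpc_allows(n7k1_in, n7k1_out)
second_hop_ok = vpc_allows(n7k2_in, n7k2_out)
print(first_hop_ok, second_hop_ok)
```

The first hop is fine, the second is the one that gets dropped. If the router had used the first Nexus as its next hop, the packet would have gone straight from the access port to the vPC member port and the rule would never have come into play.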
There are several solutions to this, most of which are well addressed in Brad Hedlund's document. You could create a VLAN for the router and the two Nexus switches to establish their peer relationship on, and make sure that VLAN is not trunked to any vPC member ports. You could create an entirely separate link between the two Nexus switches to carry the Layer 3 traffic. You could run the router into both chassis and use Layer 3 ports. Lots of options. But if you ever have problems and the routing is not working, go back to that golden rule: am I coming in over the vPC peer-link and then trying to go out a vPC member port?
The next Layer 3 caveat is an odd one, but worth talking about. Apparently some SANs out there from EMC and NetApp implement something they call "fast routing", which basically means that whenever they receive a packet from an IP address, they store the MAC address and IP address combination in their ARP table. By the end of it, their ARP table is full of entries for remote IP addresses that all point at the same MAC address, say aaaa.bbbb.cccc, which is the MAC address of their default gateway.
The idea behind this is that the SAN does not have to perform a route lookup or ARP request, which should save it some time. In my humble opinion it would shave maybe a fraction of a millisecond on most modern SAN CPUs, and in return it horribly breaks the RFC (is it acceptable as part of the RFC? Am I dead wrong? It would not be the first time; leave a reply below or ping me on Twitter @ccierants).
Anyway, regardless of the merits, this causes problems for the Nexus when used in combination with VRRP.
The problem is that with VRRP the default gateway has a VRRP-defined MAC, but the actual reply when it comes back to the NetApp will be sourced from the burnt-in MAC address of the Nexus that routed it. This can cause problems! Now when the NetApp does its lookup in its ARP table, it sends the traffic to that burnt-in MAC. If the frame lands on the Nexus that does not own that MAC (say the burnt-in MAC belongs to the non-active neighbor, the non-VRRP-master), it gets forwarded over the vPC peer-link to the owner, and if it is then destined for a vPC member port... guess what, we just broke the golden rule again.
So in order to fix this, Cisco implemented the peer-gateway command. It tells each Nexus 7k to route a frame itself, rather than forwarding it over the vPC peer-link, when the frame is addressed to the MAC address of either Nexus 7k. Easy peasy!
Here is how to configure it. I can't see a single downside to configuring peer-gateway, so I recommend you always turn this on :)
Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# peer-gateway
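If it helps to see the decision spelled out, here is a rough Python sketch of what peer-gateway changes. It is a simplified model under my own assumptions: the function name, the return strings, and the MAC addresses are all invented for the example, and it only looks at the destination MAC.

```python
def handle_routed_frame(dst_mac: str, local_mac: str, peer_mac: str, peer_gateway: bool) -> str:
    """Decide what one vPC peer does with a frame, based only on its destination MAC."""
    if dst_mac == local_mac:
        return "route locally"
    if dst_mac == peer_mac:
        # With peer-gateway enabled, this switch also acts as a gateway
        # for frames addressed to its peer's MAC address.
        return "route locally" if peer_gateway else "forward over peer-link"
    return "switch at Layer 2"


LOCAL = "0000.0000.000a"  # made-up burnt-in MAC of this Nexus
PEER = "0000.0000.000b"   # made-up burnt-in MAC of the other Nexus

without_pg = handle_routed_frame(PEER, LOCAL, PEER, peer_gateway=False)
with_pg = handle_routed_frame(PEER, LOCAL, PEER, peer_gateway=True)
print(without_pg, "|", with_pg)
```

Without peer-gateway the frame has to cross the peer-link before it is routed, which is exactly the situation that trips the golden rule. With peer-gateway it never touches the peer-link.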
OK, on to a few more caveats.
Making changes to your vPCs
This is not strictly an issue with the version of NX-OS we are running in our example, as the feature that stops this from causing problems is turned on by default. However, it is included here in case someone turned it off :)
Let's say you had a simple vPC that looked like this on both switches:
switchport mode access
switchport access vlan 50
Simple, easy. But for some reason you want to change the MTU. This would be considered a type 1 mismatch, and as soon as you changed it on one switch, the vPC would be brought down across BOTH Nexus 7k's!!!
"What the hell just happened? I was careful and I only changed one port, now my server has gone offline. Since it was etherchannel'd I should have been fine!" <- this is what you would have been saying to yourself prior to NX-OS 5.2, as a feature called "graceful consistency check" did not exist. To see if you have graceful consistency check enabled:
Nexus# show vpc
(*) - local vPC is down, forwarding via vPC peer-link
vPC domain id : 1
Peer status : peer adjacency formed ok
If this is not set as enabled, trust me, set it as enabled:
Nexus# conf t
Enter configuration commands, one per line. End with CNTL/Z.
Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# graceful consistency-check
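Here is a small Python sketch of why this setting matters. It assumes, as I understand the feature, that with graceful consistency check a type 1 mismatch only suspends the vPC on the secondary peer while the primary keeps forwarding. The function and the parameter dictionaries are hypothetical and are not a list of the real type 1 parameters.

```python
def type1_mismatch_action(local: dict, peer: dict, graceful: bool) -> dict:
    """Compare two peers' parameters and report which side keeps the vPC up."""
    mismatched = sorted(
        key for key in local.keys() | peer.keys() if local.get(key) != peer.get(key)
    )
    if not mismatched:
        return {"mismatched": [], "primary_up": True, "secondary_up": True}
    # Mismatch: without the graceful check both sides go down,
    # with it only the secondary is suspended (assumption, see above).
    return {"mismatched": mismatched, "primary_up": graceful, "secondary_up": False}


# Illustrative values only: someone changed the MTU on one switch.
primary_cfg = {"mode": "access", "vlan": 50, "mtu": 9216}
secondary_cfg = {"mode": "access", "vlan": 50, "mtu": 1500}

result_old = type1_mismatch_action(primary_cfg, secondary_cfg, graceful=False)
result_new = type1_mismatch_action(primary_cfg, secondary_cfg, graceful=True)
print(result_old)
print(result_new)
```

In the first case the server loses both links, in the second it keeps the one to the primary, which is the whole point of having it etherchannel'd in the first place.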
OK, excellent, let's keep going :)
The next thing to talk about quickly is the difference between the peer-link and the peer-keepalive link.
The peer-link is an important part of the vPC puzzle; the peer-keepalive link is actually not so important. You could unplug the peer-keepalive link and your vPC peers would continue to function quite happily. You would get messages that the peer keepalive had failed, but you would be able to continue working. In previous NX-OS releases you would have been unable to make configuration changes, but this is not the case anymore.
What the peer-keepalive does do, however, is prevent a split-brain scenario in the event your peer-link fails. If your peer-links die but the chassis itself actually remains up, you will get a message like so:
Nexus %ETHPORT-3-IF_ERROR_VLANS_SUSPENDED: VLANs 110, 99,on Interface port-channel10 are being suspended. (Reason: vPC peer is not reachable over cfs
This is to prevent loops: any vPC member ports are shut down on the secondary vPC peer.
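As a rough illustration, the decision a peer makes at the moment the peer-link goes down could be sketched like this in Python. The function name and return strings are mine, and the last branch (no peer-link and no keepalive, so the secondary presumes the primary is dead) is my assumption about the behavior rather than something shown above.

```python
def on_peer_link_down(role: str, keepalive_alive: bool) -> str:
    """What a vPC peer does with its member ports when the peer-link fails."""
    if role not in ("primary", "secondary"):
        raise ValueError(f"unknown vPC role: {role}")
    if role == "primary":
        return "keep vPC member ports up"
    if keepalive_alive:
        # The keepalive proves the primary is still alive, so the
        # secondary stands down to avoid a split brain.
        return "suspend vPC member ports"
    # Assumption: with no keepalive either, the secondary presumes the
    # primary is gone and carries on forwarding.
    return "keep vPC member ports up"


secondary_action = on_peer_link_down("secondary", keepalive_alive=True)
primary_action = on_peer_link_down("primary", keepalive_alive=True)
print(secondary_action, "|", primary_action)
```

This is why the keepalive earns its place even though you can unplug it day to day: it is the only thing that lets the secondary tell "the peer-link died" apart from "the other chassis died".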
OK, next it is time to talk about a command called auto-recovery. This is NOT SET BY DEFAULT in this NX-OS release, although I would argue strongly that it should be.
Let's say for some reason you are in a situation where both your Nexus switches have been turned off, and you can only bring one of them back. Maybe you had a power outage and only have enough power to bring up one (a UPS on a particular feed has died), or maybe a power spike blew up one chassis and you're waiting for Cisco to deliver the spares in the meantime. Whatever the situation may be, if the end result is that you are turning on one chassis but not the other, you need the auto-recovery command. This command is NOT relevant if you had two Nexus switches up and, let's say, the power failed to one of them. If you restored the power to that Nexus, you would not need to worry about this command: the two Nexus switches would see each other and restore their relationship, and while one of them was offline, the vPC would have kept working.
By default, if a Nexus has been turned on with vPC configuration and vPC port channels configured, and it cannot see its vPC peer, it will not bring the vPC port channels up!
You can tell the Nexus to wait a certain amount of time after bootup before deciding that, hey, the other Nexus involved in my vPC is not coming back any time soon, he is on an extended lunch break or something, so let's get those vPCs up so we can start forwarding traffic.
Here is how to turn it on:
Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# auto-recovery
Enables restoring of vPCs in a peer-detached state after reload, will wait for 240 seconds to determine if peer is un-reachable
As per the warning, the default time to wait before bringing up the vPCs if you can't see a peer is 240 seconds. This timer can be adjusted with the reload-delay parameter of the auto-recovery command.
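Boiled down, the bootup logic looks something like the Python sketch below. It is a simplification with names I made up; the only number taken from the switch is the 240 second default from the warning.

```python
def vpcs_may_come_up(peer_seen: bool, auto_recovery: bool,
                     seconds_since_boot: int, reload_delay: int = 240) -> bool:
    """Can a freshly booted Nexus bring its vPC port channels up yet?"""
    if peer_seen:
        # Normal case: the peers found each other, no timer involved.
        return True
    # Peer never showed up: only auto-recovery lets us go it alone,
    # and only once the delay has expired.
    return auto_recovery and seconds_since_boot >= reload_delay


stuck_forever = vpcs_may_come_up(peer_seen=False, auto_recovery=False, seconds_since_boot=3600)
still_waiting = vpcs_may_come_up(peer_seen=False, auto_recovery=True, seconds_since_boot=100)
recovered = vpcs_may_come_up(peer_seen=False, auto_recovery=True, seconds_since_boot=240)
print(stuck_forever, still_waiting, recovered)
```

Without auto-recovery the lone chassis can sit there for an hour and the vPCs still stay down, which is not what you want to discover during a power outage.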
I will now bring you to one final caveat
Mis-configured port-channel on the end device.
So you're probably used to the fact that if you enable two interfaces for a port channel using LACP and the other end doesn't have port-channel turned on, or there is some other problem, no worries, right? LACP will just place the port(s) into standalone mode and spanning-tree will just choose an active path.
Unfortunately with the Nexus, by default there is no such thing as standalone: the port is either bundled into the port channel or it is suspended, as the following output shows:
Nexus# show port-channel sum
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
S - Switched R - Routed
U - Up (port-channel)
M - Not in use. Min-links not met
Group Port- Type Protocol Member Ports
6 Po6(SD) Eth LACP Eth1/1(s)
Nexus# show int eth1/1
Ethernet1/1 is down (suspended(no LACP PDUs))
Easy to fix:
Nexus(config-if)# lacp ?
max-bundle Configure the port-channel max-bundle
min-links Configure the port-channel min-links
suspend-individual Configure lacp port-channel state. Disabling this will
cause lacp to put the port to individual state and not
suspend the port in case it does not get LACP BPDU from
the peer ports in the port-channel
So if we enter:
Nexus(config-if)# no lacp suspend-individual
Warning: !! Disable lacp suspend-individual only on port-channel with edge ports. Disabling this on network port port-channel could lead to loops.!
Nexus(config-if)# no shut
As per the warning, guys, this could cause you HUGE problems if you disable suspend-individual on a port that is part of a vPC, so I would only use no lacp suspend-individual on ports that are not part of a vPC port channel. (In which case, why are you port channeling? Why not just fix the fact that the other end is not doing port-channel, or remove the port channel config from the Nexus?)
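To sum up the behavior, here is a tiny Python sketch that maps the two inputs to the member port flags from the show port-channel summary legend. The function itself is hypothetical and ignores every other reason a port can be down.

```python
def member_port_flag(lacp_pdus_received: bool, suspend_individual: bool = True) -> str:
    """Return the member port flag: P (bundled), s (suspended) or I (individual)."""
    if lacp_pdus_received:
        return "P"  # the other end speaks LACP, port joins the bundle
    # No LACP PDUs from the other end: the default suspends the port,
    # "no lacp suspend-individual" lets it run as an individual port.
    return "s" if suspend_individual else "I"


default_flag = member_port_flag(lacp_pdus_received=False)
relaxed_flag = member_port_flag(lacp_pdus_received=False, suspend_individual=False)
print(default_flag, relaxed_flag)
```

The default case is the Eth1/1(s) you saw in the output above. The second case is the standalone behavior you may be used to from other platforms.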
I hope this helps someone out there!