Thursday, June 27, 2013

CCIE DC: Nested NPV

Hi Guys!

(Note: Guys, when I first tried this nested NPV setup it appeared to be working, but I eventually found out that what I was trying to do, multiple levels of NPV nesting, is not supported.)

Really quickly, if you have something like this:

Cisco 5K (normal FC switch) - NPV 5K - NPV 5K - FI

This is not supported, because the FI itself is actually a nested NPV device: it provides nested NPV to your downstream servers, so this topology is one level too deep for nested NPV.
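To make the rule concrete, here is a minimal Python sketch that checks a north-to-south chain of devices against the rule as I understand it from this lab: a full FC switch (with NPIV) at the top, and at most two NPV tiers in series below it (one level of nesting), counting the UCS FI as an NPV tier. The role labels, `check_chain` and `MAX_NPV_TIERS` are hypothetical names for illustration; this models the observation in this post, not Cisco's actual validation code.

```python
from typing import List

# Hypothetical role labels, listed from the core switch downwards (north to south).
CORE = "fc-switch"  # full FC switch with NPIV enabled
NPV = "npv"         # NPV device: a Nexus 5K in NPV mode, or a UCS FI in FC end-host mode

# Assumption based on this post: NPV can be nested only one level deep,
# i.e. at most two NPV tiers in series below the core switch.
MAX_NPV_TIERS = 2


def check_chain(chain: List[str]) -> str:
    """Return 'supported' or a reason string for a north-to-south device chain."""
    if not chain or chain[0] != CORE:
        return "unsupported: the top of the chain must be a full FC switch"
    tiers = chain[1:]
    if any(role != NPV for role in tiers):
        return "unsupported: this sketch only models NPV tiers below the core"
    if len(tiers) > MAX_NPV_TIERS:
        return f"unsupported: {len(tiers)} NPV tiers, nesting too deep"
    return "supported"


# 5K (FC switch) - NPV 5K - NPV 5K - FI: the topology from this post
print(check_chain([CORE, NPV, NPV, NPV]))
# 5K (FC switch) - NPV 5K - FI: a single level of nesting
print(check_chain([CORE, NPV, NPV]))
```

The first call reports the four-box topology as too deep; the second, with only one NPV 5K between the core and the FI, passes.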

I discovered this using the incredibly useful command below (plus Rick Mur's help in pointing out that the topology is not supported ;)):

show npv internal errors

Here is the relevant output:

2) Event:E_DEBUG, length:274, at 587815 usecs after Thu Jun 27 14:50:41 2013
    [118] S(0,fc1/31) Nested NPV Validation failed(Nested NPV connectivity is not supported)
for FLOGI with fWWN: 20:41:00:0d:ec:f4:cc:c0, nWWN: 20:01:00:0d:ec:f4:cc:c1, sWWN from nWWN: 20:00:00:0d:ec:f4:cc:c0, sWWN from fWWN: 20:00:00:0d:ec:f4:cc:c0 on server interface: fc1/31
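If you are collecting these events from several switches, it can help to pull out the interesting fields automatically. Below is a minimal sketch, assuming the event text looks like the output above; note that the CLI wraps long lines at a fixed width and splits words and WWNs in the middle, so the sketch removes the line breaks entirely before matching. `parse_npv_error` and the field names are hypothetical.

```python
import re

# An 8-byte WWN written as colon-separated hex pairs, e.g. 20:41:00:0d:ec:f4:cc:c0
WWN = r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}"

# Raw event as the CLI printed it, wrapped mid-word and mid-WWN.
SAMPLE = """\
2) Event:E_DEBUG, length:274, at 587815 usecs after Thu Jun 27 14:50:41 2013
    [118] S(0,fc1/31) Nested NPV Validation failed(Nested NPV connectivity is no
t supported)
for FLOGI with fWWN: 20:41:00:0d:ec:f4:cc:c0, nWWN: 20:01:00:0d:ec:
f4:cc:c1, sWWN from nWWN: 20:00:00:0d:ec:f4:cc:c0, sWWN from fWWN: 20:00:00:0d:e
c:f4:cc:c0 on server interface: fc1/31
"""


def parse_npv_error(event_text: str) -> dict:
    """Extract interface, failure reason, fWWN and nWWN from one NPV error event."""
    # Undo the terminal wrapping by joining the stripped lines with no separator.
    flat = "".join(line.strip() for line in event_text.splitlines())
    patterns = {
        "interface": r"S\(\d+,(?P<v>[^)]+)\)",
        "reason": r"Validation failed\((?P<v>[^)]*)\)",
        "fwwn": rf"FLOGI with fWWN:\s*(?P<v>{WWN})",
        "nwwn": rf", nWWN:\s*(?P<v>{WWN})",
    }
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, flat)
        if match is None:
            raise ValueError(f"could not find {field} in event text")
        result[field] = match.group("v")
    return result


print(parse_npv_error(SAMPLE))
```

Anchoring the WWN patterns on "FLOGI with fWWN:" and ", nWWN:" avoids accidentally picking up the later "sWWN from ..." fields.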



So, in summary, four things I learnt from this:

1. Nested NPV can only go one level deep.
2. show npv internal errors is a lifesaver.
3. Rick Mur rocks.
4. The Cisco UCS FI itself is actually performing nested NPV.


At first I thought this was not possible, but it is: it's called nested NPV.

Basically the idea is something like the following diagram:



While we do this, I am also going to spend some time looking at UCS and its own treatment of FC and FCoE.


OK, first of all, before I go any further, I want to clarify something with 100 percent certainty, for my own knowledge and for others too :).

In UCS 2.0, there is NO NORTHBOUND FCoE SUPPORT.

Your northbound connections must ALWAYS be FC. The only exception is appliance storage FCoE, which we will cover later.


There, glad we got that out of the way.

Here is my config on my main switch, the most northbound switch:

version 5.1(3)N2(1a)

feature fcoe
feature npiv
interface Ethernet1/17
  switchport mode trunk

!
interface vfc1
  bind interface Ethernet1/17
  no shutdown
!

As you can see, pretty straightforward.

Next, here is my config on the middle switch:

Pod3-5548-B# show run | inc feature
feature fcoe
feature telnet
no feature http-server
feature lldp
feature npv
feature npiv


The key features here are npv and npiv: you enable BOTH NPV _AND_ NPIV at the same time. If you look at a UCS, this is exactly what a UCS is also doing.
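As a quick sanity check when you are jumping between boxes in a lab, the role of each switch can be read straight off the `show run | inc feature` output. This is a minimal sketch: `switch_role` and the role labels are hypothetical, and it only looks at the npv and npiv features discussed in this post.

```python
SAMPLE = """\
Pod3-5548-B# show run | inc feature
feature fcoe
feature telnet
no feature http-server
feature lldp
feature npv
feature npiv
"""


def switch_role(show_run_features: str) -> str:
    """Classify a switch from the 'feature' lines of its running config."""
    features = set()
    for line in show_run_features.splitlines():
        words = line.split()
        # Only exact "feature <name>" lines count; "no feature ..." and the prompt are ignored.
        if len(words) == 2 and words[0] == "feature":
            features.add(words[1])
    npv, npiv = "npv" in features, "npiv" in features
    if npv and npiv:
        return "nested NPV (middle tier)"  # like this middle 5548, or a UCS FI
    if npv:
        return "NPV edge"
    if npiv:
        return "NPIV core"                 # like the main, most northbound switch
    return "FC switch without NPIV"


print(switch_role(SAMPLE))
print(switch_role("feature fcoe\nfeature npiv"))
```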

Pod3-5548-B# show run int vfc1

!Command: show running-config interface vfc1
!Time: Thu Jun 27 14:07:24 2013

version 5.1(3)N2(1a)

interface vfc1
  bind interface Ethernet1/17
  switchport mode NP
  no shutdown

Pod3-5548-B# show int vfc1
vfc1 is trunking (Not all VSANs UP on the trunk)
    Bound interface is Ethernet1/17
    Hardware is Ethernet
    Port WWN is 20:00:54:7f:ee:43:c5:3f
    Admin port mode is NP, trunk mode is on
    snmp link state traps are enabled
    Port mode is TNP
    Port vsan is 1
    Trunk vsans (admin allowed and active) (1,88)
    Trunk vsans (up)                       (88)
    Trunk vsans (isolated)                 ()
    Trunk vsans (initializing)             (1)


As you can see, the NP port is up and trunking (port mode TNP) as we expect. VSAN 88 is up on the trunk, while VSAN 1 is still initializing.
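The "Trunk vsans" lines are the part of `show int vfc1` worth watching: comparing the allowed list against the up list tells you which VSANs are not yet usable. Here is a minimal sketch that does that comparison; `trunk_vsans` and `expand_vsans` are hypothetical helpers, and the assumption that ranges are printed as `1-10` is mine, not something shown in the output above.

```python
import re
from typing import Dict, List

SAMPLE = """\
vfc1 is trunking (Not all VSANs UP on the trunk)
    Port mode is TNP
    Trunk vsans (admin allowed and active) (1,88)
    Trunk vsans (up)                       (88)
    Trunk vsans (isolated)                 ()
    Trunk vsans (initializing)             (1)
"""

TRUNK_LINE = re.compile(r"\s*Trunk vsans \(([^)]+)\)\s*\(([^)]*)\)")


def expand_vsans(text: str) -> List[int]:
    """Turn '1,88' (or, assumed, '1-3,88') into a list of VSAN IDs."""
    vsans: List[int] = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            vsans.extend(range(int(first), int(last) + 1))
        else:
            vsans.append(int(part))
    return vsans


def trunk_vsans(show_int_output: str) -> Dict[str, List[int]]:
    """Map each 'Trunk vsans (<state>)' line to the VSAN IDs it lists."""
    states: Dict[str, List[int]] = {}
    for line in show_int_output.splitlines():
        match = TRUNK_LINE.match(line)
        if match:
            states[match.group(1)] = expand_vsans(match.group(2))
    return states


states = trunk_vsans(SAMPLE)
not_up = sorted(set(states["admin allowed and active"]) - set(states["up"]))
print(states["up"], not_up)  # [88] [1]
```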

(Note: this section had to be left blank; I got it working temporarily, but annoyingly and inexplicably it stopped working. :( )






NPV Trunking with UCS

Let's say, for example, you want to use multiple VSANs on your UCS and you want to keep the UCS in end-host mode. That is fine, no problem at all, but you do need to configure a few things.

First, you need to enable trunking on the Fabric Interconnect:




This will actually change the configuration on the FC Port:


myucs-A(nxos)# show run int fc2/1

!Command: show running-config interface fc2/1
!Time: Thu Jun 27 11:29:43 2013

version 5.0(3)N2(2.11e)

interface fc2/1
  switchport mode NP
  switchport trunk mode on
  no shutdown


This is pretty much what you would expect. Now check out what you can see on the upstream switch:

interface fc1/31
  switchport mode F
  no shutdown

You can see the device logged into the fabric:

Pod3-5548-A# show flogi database
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
fc1/31           1     0x900000  20:41:00:0d:ec:f4:ca:c0 20:01:00:0d:ec:f4:ca:c1
fc1/31           77    0x310000  20:00:00:25:b5:11:11:11 20:00:00:25:b5:00:00:01
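Notice that a single physical port (fc1/31) carries two logins: the FI's own NP port in VSAN 1 and the server vHBA (20:00:00:25:b5:11:11:11) in VSAN 77, which is NPIV at work. If you want to compare flogi databases across switches, a small parser helps. This is a minimal sketch assuming the simple five-column layout shown above; `parse_flogi` and `FlogiEntry` are hypothetical names, and other NX-OS releases may print extra lines that this sketch simply skips.

```python
from typing import List, NamedTuple

SAMPLE = """\
Pod3-5548-A# show flogi database
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
fc1/31           1     0x900000  20:41:00:0d:ec:f4:ca:c0 20:01:00:0d:ec:f4:ca:c1
fc1/31           77    0x310000  20:00:00:25:b5:11:11:11 20:00:00:25:b5:00:00:01
"""


class FlogiEntry(NamedTuple):
    interface: str
    vsan: int
    fcid: int
    port_name: str
    node_name: str


def parse_flogi(output: str) -> List[FlogiEntry]:
    """Return one FlogiEntry per login row of 'show flogi database'."""
    entries: List[FlogiEntry] = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have five columns with a hex FCID in the third; this skips
        # the prompt, the header, the dashed rules and the "Total number" line.
        if len(fields) == 5 and fields[2].startswith("0x"):
            intf, vsan, fcid, pwwn, nwwn = fields
            entries.append(FlogiEntry(intf, int(vsan), int(fcid, 16), pwwn, nwwn))
    return entries


for e in parse_flogi(SAMPLE):
    # The top byte of an FCID is the domain ID of the switch that assigned it.
    print(e.interface, e.vsan, hex(e.fcid >> 16), e.port_name)
```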



If we jump onto the fabric interconnect, we can get some useful info too:



myucs-A(nxos)# show npv status

npiv is enabled

disruptive load balancing is disabled

External Interfaces:
====================
  Interface:  fc2/1, State: Trunking
        VSAN:    1, State: Up, FCID: 0x900000
        VSAN:   77, State: Up
  Interface:  fc2/2, State: Trunking
        VSAN:    1, State: Up, FCID: 0x900001
        VSAN:   77, State: Up

  Number of External Interfaces: 2

Server Interfaces:
==================
  Interface: vfc693, VSAN:   77, State: Up

  Number of Server Interfaces: 1

myucs-A(nxos)#



So you can see that the host has logged in, and that the VSAN is up and ready to go!

Woo Hoo!
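The output above shows both uplinks (fc2/1 and fc2/2) trunking with VSANs 1 and 77 up, and one server interface (vfc693) up in VSAN 77. Here is a minimal sketch that turns `show npv status` into data so you can check that every uplink carries every VSAN; `parse_npv_status` is a hypothetical helper and assumes the exact layout printed above.

```python
import re

SAMPLE = """\
External Interfaces:
====================
  Interface:  fc2/1, State: Trunking
        VSAN:    1, State: Up, FCID: 0x900000
        VSAN:   77, State: Up
  Interface:  fc2/2, State: Trunking
        VSAN:    1, State: Up, FCID: 0x900001
        VSAN:   77, State: Up

  Number of External Interfaces: 2

Server Interfaces:
==================
  Interface: vfc693, VSAN:   77, State: Up
"""


def parse_npv_status(output: str):
    """Return ({uplink: {"state": ..., "vsans": {vsan: state}}}, [(server_if, vsan, state)])."""
    external = {}
    servers = []
    current = None  # uplink whose VSAN lines we are currently reading
    for line in output.splitlines():
        m = re.match(r"\s*Interface:\s*([^,\s]+), VSAN:\s*(\d+), State: (\w+)", line)
        if m:  # server interface line
            servers.append((m.group(1), int(m.group(2)), m.group(3)))
            current = None
            continue
        m = re.match(r"\s*Interface:\s*([^,\s]+), State: (\w+)", line)
        if m:  # external (uplink) interface line
            current = m.group(1)
            external[current] = {"state": m.group(2), "vsans": {}}
            continue
        m = re.match(r"\s*VSAN:\s*(\d+), State: (\w+)", line)
        if m and current is not None:  # per-VSAN line belonging to the current uplink
            external[current]["vsans"][int(m.group(1))] = m.group(2)
    return external, servers


uplinks, servers = parse_npv_status(SAMPLE)
for name, info in uplinks.items():
    print(name, info["state"], info["vsans"])
print(servers)
```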

If you were to break a link on the upstream switch:

Pod3-5548-A(config)# int fc1/31
Pod3-5548-A(config-if)# shut
Pod3-5548-A(config-if)# end
Pod3-5548-A# show flogi database
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
fc1/32           1     0x900001  20:42:00:0d:ec:f4:ca:c0 20:01:00:0d:ec:f4:ca:c1
fc1/32           77    0x310000  20:00:00:25:b5:11:11:11 20:00:00:25:b5:00:00:01

As you can see, everything just moves to the other interface. Great!
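To spell out what moved: the server vHBA (20:00:00:25:b5:11:11:11) kept its FCID 0x310000 in VSAN 77 but now sits on fc1/32, while the FI's VSAN 1 login is now from a different NP port WWN (20:42... instead of 20:41...). A minimal sketch of that before/after comparison, keyed on (VSAN, port WWN); `compare_logins` is a hypothetical helper and the rows are copied from the two flogi outputs in this post.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, str]  # (interface, vsan, port WWN)
Key = Tuple[int, str]       # (vsan, port WWN)


def by_login(rows: Iterable[Row]) -> Dict[Key, str]:
    """Index logins by (vsan, pwwn) so the same login can be found in both snapshots."""
    return {(vsan, pwwn): intf for intf, vsan, pwwn in rows}


def compare_logins(before: Iterable[Row], after: Iterable[Row]):
    b, a = by_login(before), by_login(after)
    moved = {k: (b[k], a[k]) for k in b.keys() & a.keys() if b[k] != a[k]}
    gone: List[Key] = sorted(b.keys() - a.keys())  # logins that disappeared
    new: List[Key] = sorted(a.keys() - b.keys())   # logins that only exist afterwards
    return moved, gone, new


before = [("fc1/31", 1, "20:41:00:0d:ec:f4:ca:c0"), ("fc1/31", 77, "20:00:00:25:b5:11:11:11")]
after = [("fc1/32", 1, "20:42:00:0d:ec:f4:ca:c0"), ("fc1/32", 77, "20:00:00:25:b5:11:11:11")]

moved, gone, new = compare_logins(before, after)
print("moved:", moved)
print("gone:", gone)
print("new:", new)
```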

Put the link back on:

Pod3-5548-A(config)# int fc1/31
Pod3-5548-A(config-if)# no shut
Pod3-5548-A(config-if)# show flogi database
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
fc1/32           1     0x900001  20:42:00:0d:ec:f4:ca:c0 20:01:00:0d:ec:f4:ca:c1
fc1/32           77    0x310000  20:00:00:25:b5:11:11:11 20:00:00:25:b5:00:00:01

Total number of flogi = 2.

You can see the logins won't move back to fc1/31. This is because the FI doesn't have the following NPV command configured (shown here on Pod3-5548-B, the 5548 running NPV):

Pod3-5548-B(config)# npv auto-load-balance ?
  disruptive  Enable disruptive auto load balancing among external links
Pod3-5548-B(config)# npv auto-load-balance disruptive


This is not configurable on the FI.
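The difference can be pictured with a toy model. This is not the NX-OS algorithm: the least-loaded placement policy, the function names and the vHBA names `vhba1`/`vhba2` are all hypothetical. It only illustrates the behaviour described above: without disruptive auto load balancing, logins stay where they are when a failed uplink comes back; with it, existing logins are redistributed, which means those hosts have to log in again (hence "disruptive").

```python
from typing import Dict, List


def assign(logins: List[str], uplinks_up: List[str]) -> Dict[str, str]:
    """Toy policy: put each login on the up uplink with the fewest logins so far."""
    load = {uplink: 0 for uplink in uplinks_up}
    placement: Dict[str, str] = {}
    for login in logins:
        # Ties are broken by uplink name so the result is deterministic.
        target = min(uplinks_up, key=lambda u: (load[u], u))
        placement[login] = target
        load[target] += 1
    return placement


def uplink_recovers(placement: Dict[str, str], all_uplinks: List[str],
                    disruptive: bool) -> Dict[str, str]:
    """Return the placement after a failed uplink comes back up."""
    if not disruptive:
        # Non-disruptive (what the FI does): existing logins stay put and the
        # recovered uplink is only used for new logins.
        return dict(placement)
    # Disruptive: existing logins are torn down and re-placed across all uplinks.
    return assign(sorted(placement), all_uplinks)


# Both vHBAs failed over to fc2/2 while fc2/1 was down.
after_failover = assign(["vhba1", "vhba2"], ["fc2/2"])
print(uplink_recovers(after_failover, ["fc2/1", "fc2/2"], disruptive=False))
print(uplink_recovers(after_failover, ["fc2/1", "fc2/2"], disruptive=True))
```

The design trade-off is the same one the command name hints at: rebalancing spreads the load again, but only by briefly logging hosts out.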



Troubleshoot SAN connectivity

Let's quickly talk about troubleshooting on UCS, especially SAN troubleshooting and boot from SAN. Check out the following sequence of commands:


myucs-A# connect adapter 1/1/1
adapter 1/1/1 # connect

adapter 1/1/1 (top):1# attach-fls
adapter 1/1/1 (fls):1# login

 

With the above commands, you are attached to the FLS on the actual FC adapter! Very nice! Now you can get lots of useful info:

adapter 1/1/1 (fls):2# vnic
---- ---- ---- ------- -------
vnic ecpu type state   lif   
---- ---- ---- ------- -------
5    1    fc   active  2 


The vnic command gives us the LIF ID (2), which we use in our next command:
 
adapter 1/1/1 (fls):3# lif 2
               lifid : 2
               state : active
                wwpn : 20:00:00:25:b5:11:11:11
             lif_mac : 00:25:b5:11:11:11
                port : 0
         flogi state : init
             fcf_mac : 00:00:00:00:00:00
              ha_mac : 00:00:00:00:00:00
                 fid : 0x000000
                vlan : 0
                 cos : 3
         flogi_retry : 8
         plogi_retry : 8
       flogi_timeout : 4000
       plogi_timeout : 20000
               tx_wq : 10
        exch_wq_base : 11
        exch_wq_size : 4
               rx_rq : 1016
            fcmgt_wq : 15
            fcmgt_rq : 1018
       fcmgt_cq_base : 17
       fcmgt_cq_size : 2
     fcmgt_intr_base : 34
     fcmgt_intr_size : 1
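The lif output is just "key : value" pairs, so it is easy to capture and check. A minimal sketch, assuming one field per line as above (a trimmed copy of the output is used); `parse_lif` and `logged_in` are hypothetical helpers, and treating "flogi est" plus a non-zero fid as "logged in" is my reading of the outputs in this post.

```python
SAMPLE = """\
adapter 1/1/1 (fls):3# lif 2
               lifid : 2
               state : active
                wwpn : 20:00:00:25:b5:11:11:11
         flogi state : init
             fcf_mac : 00:00:00:00:00:00
                 fid : 0x000000
"""


def parse_lif(output: str) -> dict:
    """Turn 'key : value' lines into a dict; lines without ' : ' (like the prompt) are skipped."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def logged_in(fields: dict) -> bool:
    """Assumed check: FLOGI established and a real FCID (fid) assigned."""
    return fields.get("flogi state") == "flogi est" and int(fields.get("fid", "0"), 16) != 0


lif = parse_lif(SAMPLE)
print(lif["wwpn"], lif["flogi state"], logged_in(lif))
```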








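If you repeat this command, you can watch the login progress: the flogi state moves from init to wait_flogi_rsp and then to flogi est, at which point fcf_mac, ha_mac and fid (0x310000, the same FCID we saw in the flogi database for VSAN 77) are filled in.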


adapter 1/1/1 (fls):4# lif 2
               lifid : 2
               state : active
                wwpn : 20:00:00:25:b5:11:11:11
             lif_mac : 00:25:b5:11:11:11
                port : 0
         flogi state : wait_flogi_rsp
             fcf_mac : 00:00:00:00:00:00
              ha_mac : 00:00:00:00:00:00
                 fid : 0x000000
                vlan : 0
                 cos : 3
         flogi_retry : 8
         plogi_retry : 8
       flogi_timeout : 4000
       plogi_timeout : 20000
               tx_wq : 10
        exch_wq_base : 11
        exch_wq_size : 4
               rx_rq : 1016
            fcmgt_wq : 15
            fcmgt_rq : 1018
       fcmgt_cq_base : 17
       fcmgt_cq_size : 2
     fcmgt_intr_base : 34
     fcmgt_intr_size : 1

adapter 1/1/1 (fls):5# lif 2
               lifid : 2
               state : active
                wwpn : 20:00:00:25:b5:11:11:11
             lif_mac : 00:25:b5:11:11:11
                port : 0
         flogi state : flogi est
             fcf_mac : 00:0d:ec:f4:ca:f0
              ha_mac : 0e:fc:00:31:00:00
                 fid : 0x310000
                vlan : 0
                 cos : 3
         flogi_retry : 8
         plogi_retry : 8
       flogi_timeout : 4000
       plogi_timeout : 20000
               tx_wq : 10
        exch_wq_base : 11
        exch_wq_size : 4
               rx_rq : 1016
            fcmgt_wq : 15
            fcmgt_rq : 1018
       fcmgt_cq_base : 17
       fcmgt_cq_size : 2
     fcmgt_intr_base : 34
     fcmgt_intr_size : 1


adapter 1/1/1 (fls):6# lif 2
               lifid : 2
               state : active
                wwpn : 20:00:00:25:b5:11:11:11
             lif_mac : 00:25:b5:11:11:11
                port : 0
         flogi state : flogi est
             fcf_mac : 00:0d:ec:f4:ca:f0
              ha_mac : 0e:fc:00:31:00:00
                 fid : 0x310000
                vlan : 0
                 cos : 3
         flogi_retry : 8
         plogi_retry : 8
       flogi_timeout : 4000
       plogi_timeout : 20000
               tx_wq : 10
        exch_wq_base : 11
        exch_wq_size : 4
               rx_rq : 1016
            fcmgt_wq : 15
            fcmgt_rq : 1018
       fcmgt_cq_base : 17
       fcmgt_cq_size : 2
     fcmgt_intr_base : 34
     fcmgt_intr_size : 1


adapter 1/1/1 (fls):7# lif 2
               lifid : 2
               state : active
                wwpn : 20:00:00:25:b5:11:11:11
             lif_mac : 00:25:b5:11:11:11
                port : 0
         flogi state : flogi est
             fcf_mac : 00:0d:ec:f4:ca:f0
              ha_mac : 0e:fc:00:31:00:00
                 fid : 0x310000
                vlan : 0
                 cos : 3
         flogi_retry : 8
         plogi_retry : 8
       flogi_timeout : 4000
       plogi_timeout : 20000
               tx_wq : 10
        exch_wq_base : 11
        exch_wq_size : 4
               rx_rq : 1016
            fcmgt_wq : 15
            fcmgt_rq : 1018
       fcmgt_cq_base : 17
       fcmgt_cq_size : 2
     fcmgt_intr_base : 34
     fcmgt_intr_size : 1
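Polling `lif 2` a few times, as above, gives a sequence of flogi states. Collapsing the repeats shows the progression at a glance. A minimal sketch, assuming you have already extracted the "flogi state" value from each poll; `summarize_polls`, `diagnose` and the messages are hypothetical.

```python
from typing import List

# The progression seen in the polls above.
EXPECTED = ["init", "wait_flogi_rsp", "flogi est"]


def summarize_polls(states: List[str]) -> List[str]:
    """Collapse repeated polls into transitions, e.g. init -> wait_flogi_rsp -> flogi est."""
    transitions: List[str] = []
    for state in states:
        if not transitions or transitions[-1] != state:
            transitions.append(state)
    return transitions


def diagnose(states: List[str]) -> str:
    """Describe where the vHBA's fabric login currently stands."""
    if not states:
        return "no data"
    last = states[-1]
    if last == "flogi est":
        return "FLOGI established"
    if last == "wait_flogi_rsp":
        return "waiting for FLOGI response"
    return "not logged in yet"


# flogi state from the five 'lif 2' polls in this post
polls = ["init", "wait_flogi_rsp", "flogi est", "flogi est", "flogi est"]
print(" -> ".join(summarize_polls(polls)), "|", diagnose(polls))
```

If the sequence stops at wait_flogi_rsp, the adapter sent its FLOGI but never got an answer, which is the point where the upstream checks earlier in this post (flogi database, NPV status, internal errors) come in.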




5 comments:

  1. Hi Peter,

    Great article. So are you saying we can do the following:

    N5K (NPIV) -> N5K (NPV & NPIV) -> UCS FI (NPV)

    and this works?

    Thanks

    Dominic

  2. Does this imply that a single Nexus switch operating as the northbound switch for a given FI can operate in NPIV mode to cater to it, as well as operate as an NPV switch for other operational requirements?

  3. I'd have to test it myself, but I don't think nested NPV works.

    I'm afraid I don't agree with the statement "FI itself, is actually a Nested NPV device". It's not, it's a regular NPV device when it's at its default FC End Host Mode.
    This means that the upstream N5K has to be running NPIV.

    If you then attempt to configure that same N5K also as an NPV switch and add a second N5K to the hierarchy running NPIV, my guess is that the vHBAs of the UCS blades will not be able to log in to that new N5K at the top of the hierarchy. I think the output above clarifies it: "Nested NPV Validation failed(Nested NPV connectivity is not supported)".
