TECH::Data LIF best practices for NAS in cDOT 8.3


NOTE: Some of the documents linked in this blog are in the process of being updated for 8.3, so they may contain older information about clustered Data ONTAP 8.2.x. Check back on these links occasionally to get the most up-to-date content.

Clustered Data ONTAP 8.3 allows storage administrators to provide the following benefits:

  • Seamless scale-out storage
  • Multiprotocol Unified Access (NFS, CIFS and SAN)
  • Non-disruptive operations

This is done by way of a secure multi-tenant architecture built on Storage Virtual Machines (SVMs). This blog will cover logical interface (LIF) considerations for clustered Data ONTAP 8.3 as they pertain to NAS storage operations. These considerations will eventually be added to TR-4067. For a complete list of networking best practices, see TR-4182: Ethernet Storage Best Practices for cDOT Configurations.

Storage Virtual Machines (SVMs)

SVMs are logical storage containers that own storage resources such as flexible volumes, logical interfaces (LIFs), exports, CIFS shares, etc. Think of them as a storage “blade center” in your cluster. These SVMs share the cluster’s physical hardware resources with one another, such as network ports/VLANs, aggregates with physical disks, CPU, RAM, switches, etc. As a result, load for SVMs can be balanced across a cluster for maximum performance and efficiency, or to leverage SaaS functionality, among other benefits.
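For instance, you can list the logical resources a given SVM owns right from the cluster shell. A minimal sketch (the SVM name “SVM1” is just a placeholder):

  ::> vserver show -vserver SVM1 -fields ipspace,allowed-protocols,rootvolume
  ::> volume show -vserver SVM1 -fields aggregate,node
  ::> network interface show -vserver SVM1 -fields home-node,home-port,address
  ::> vserver cifs share show -vserver SVM1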

Cluster considerations

A cluster can be composed of multiple HA pairs of nodes (4 HA pairs/8 nodes with SAN, 12 HA pairs/24 nodes with NAS). Each node in the cluster has its own copy of a replicated database containing the cluster and SVM configuration information. Additionally, each node has its own set of user space applications that handle cluster operations and node-specific caches, not to mention its own RAM, CPU, disks, etc. So while a cluster operates as a single entity, it is built from individual components underneath. As a result, it makes sense to take the physical hardware in a cluster into consideration when designing and implementing.
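For instance, the individual nodes, HA pairs, and replicated database (RDB) rings that make up a cluster can be viewed from the cluster shell. A minimal sketch (cluster ring show requires advanced privilege):

  ::> cluster show
  ::> storage failover show
  ::> set -privilege advanced
  ::*> cluster ring show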

Data LIF considerations

Data LIFs can live on any physical port in a cluster that has been added to a valid broadcast domain. These data LIFs are configured with SVM-aware routing mechanisms that allow for the correct pathing of Ethernet traffic in an SVM, regardless of where a valid data LIF lives in the cluster. Prior to 8.3, SVMs routed at a node level, so traffic could only travel via the node that owned a data LIF. In cDOT 8.3, traffic will route from the data LIF even if it is a non-local path.
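As a quick illustration, placing a NAS data LIF on a port in a broadcast domain looks something like this (a minimal sketch; the SVM, node, port, and IP values are placeholders):

  ::> network port broadcast-domain show
  ::> network interface create -vserver SVM1 -lif nas_lif_node1 -role data -data-protocol nfs,cifs -home-node cluster01-01 -home-port e0c -address 10.10.10.10 -netmask 255.255.255.0
  ::> network interface show -vserver SVM1 -lif nas_lif_node1 -fields home-node,curr-node,address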

However, despite this enhancement in cDOT 8.3, it is still worth considering the original best practice recommendation for data LIFs participating in NAS operations…

One data LIF per node, per SVM

With the introduction of IP Spaces in clustered Data ONTAP, this recommendation is more of a reality, as storage administrators no longer have to use unique IP addresses in the cluster for SVMs. With IP Spaces, IP addresses can be duplicated in the cluster on a per-SVM basis to allow for true secure multi-tenancy architecture. For more information on IP Spaces, see TR-4182.
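As a rough sketch of what that looks like (the IPspace, SVM, aggregate, and port names here are made up), each tenant gets its own IPspace and broadcast domain, and the SVMs created in them can then reuse identical IP addresses without conflict:

  ::> network ipspace create -ipspace tenant_a
  ::> network ipspace create -ipspace tenant_b
  ::> network port broadcast-domain create -ipspace tenant_a -broadcast-domain bd_a -mtu 1500 -ports cluster01-01:e0d
  ::> network port broadcast-domain create -ipspace tenant_b -broadcast-domain bd_b -mtu 1500 -ports cluster01-01:e0e
  ::> vserver create -vserver svm_a -ipspace tenant_a -rootvolume root_a -aggregate aggr1 -rootvolume-security-style unix
  ::> vserver create -vserver svm_b -ipspace tenant_b -rootvolume root_b -aggregate aggr1 -rootvolume-security-style unix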

Thus, in a 24-node cluster, 24 data LIFs per SVM (one per node) would be ideal for the following reasons:

  • Ability to leverage data locality features – Features such as NFS referrals, CIFS auto location and pNFS ensure data locality for NAS traffic regardless of where the volumes live in a cluster. These help balance load better, but also make use of local caches and fastpath mechanisms for NAS traffic. For more information, see TR-4067.
  • Ability to reduce cluster network traffic – While cluster network traffic is generally not an issue (you’re more likely to peg the CPU or disk before you saturate a 10Gb network), it is better to limit the amount of traffic on a cluster network as much as possible.
  • Ability to ensure data locality in the event of a volume move – If you move a volume to another node, you can ensure you still have a local path to the data if every node has a data LIF for the SVM (a quick way to check this is sketched after this list).
  • Ability to spread the load out across nodes and leverage all the available hardware (CPU, RAM, etc) – If you load up all your NAS traffic on one node via one data LIF, you aren’t realizing the value of the other nodes in the cluster. Spreading network traffic ensures all available physical entities are being utilized. Why pay for hardware you aren’t using?
  • Ability to balance network connections across multiple cluster nodes – Clusters are single entities, as are SVMs. But they do have underlying hardware components with their own maximums, such as the number of connections. For info on hardware maximums in cDOT, see the configuration information for your version of ONTAP.
  • Ability to reduce the impact of storage failovers/givebacks (SFO) – Fewer clients are impacted when SFO events happen, whether they are planned or unplanned.
  • Ability to leverage features such as on-box DNS – On-box DNS allows data LIFs to act as DNS servers and honor forwarded zone requests. Once a zone request is received, the cluster determines the ideal node to service that request based on that node’s CPU and throughput, providing intelligent DNS load balancing (as opposed to round robin DNS, which simply cycles through addresses). For more information regarding on-box DNS (and how to configure it), see TR-4182 and TR-4073; a configuration sketch follows this list.
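For the on-box DNS item above, the configuration boils down to tagging data LIFs with a DNS zone and letting them listen for DNS queries. A minimal sketch (the LIF and zone names are examples; the site DNS server still needs a delegation or conditional forwarder for that zone pointing at the data LIF IPs, per TR-4182):

  ::> network interface modify -vserver SVM1 -lif nas_lif_node1 -dns-zone svm1.example.com -listen-for-dns-query true
  ::> network interface show -vserver SVM1 -fields dns-zone,listen-for-dns-query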

Keep in mind that the above are merely recommendations and not requirements unless using data locality features such as pNFS.
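As noted in the volume move bullet above, a quick way to sanity check data locality is to compare where an SVM’s volumes live against where its data LIFs live. A minimal sketch (the SVM, volume, and aggregate names are placeholders):

  ::> volume show -vserver SVM1 -volume vol1 -fields aggregate,node
  ::> network interface show -vserver SVM1 -role data -fields curr-node,curr-port,address
  ::> volume move start -vserver SVM1 -volume vol1 -destination-aggregate aggr1_node2

If every node already has a data LIF for the SVM, the volume move still leaves a local path to the data.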

Any questions? Comments? Suggestions for blog topics? Feel free to comment or contact me on Twitter.


16 thoughts on “TECH::Data LIF best practices for NAS in cDOT 8.3”

  1. I am curious what you mean by… “Prior to 8.3, SVMs routed at a node level, so traffic could only travel via the node that owned a data LIF. In cDOT 8.3, traffic will route from the data LIF even if it is a non-local path.”

    Did Nodes ever own LIFs? I thought Nodes owned Ports and LIFs were owned by an SVM and could be moved to any (appropriate) PORT in the cluster. Also curious what you mean by traffic could only travel by the node that owned the data LIF, but in 8.3 traffic will route from the data LIF even if it is a non-local path. I’m having a hard time wrapping my head around that.

    That doesn’t seem correct to me. I thought direct and indirect I/O were part of Clustered ONTAP. If the request came in on a LIF but the volume was on a different node, the I/O crosses the cluster network and the response traverses it again to exit via the same LIF it came in on. I’m speaking of NAS traffic: SMB 1/2/2.1 and even 3, without doing autolocation. As far as I know, the request coming in on LIF1 but exiting on LIF2 won’t/can’t happen.

    • The phrasing was a bit off. I was referring to auth requests. In 8.2.x and prior, if you issued an auth request on a node with a data LIF, that request would travel on that LIF. If you issued an auth request on a node with no data LIF in the SVM, the request would fail, as it was unable to route out of the node. In 8.3, auth requests can route anywhere in the cluster for an SVM if a routable data LIF is present on any node.

      Nodes never “own” LIFs, as you state. But they do live on physical ports owned by physical nodes, which have individual processes.

      Regular NAS traffic (NFS, SMB) will arrive on a data LIF and leverage the owning node’s processes.

      Hope that helps.

  2. That makes a lot more sense to me. I thought you were talking about Data LIFs, not Mgmt LIFs, and that’s what confused me. Thanks for clarifying.

  3. So, for 8.3.x – a system that runs up to 20 VMware NFS datastores… Can I use a single LIF for all of them, or should I, for some reason, have a separate LIF for each datastore?
    No plans in the near or far future to scale out the cluster.
    thanks,
    Itay

  4. From TR-4333 (April 2016):
    Best Practice
    For each NFS datastore, provision a LIF for the SVM on each node.

    I have about 80 NFS datastores to migrate from 7-mode to cluster mode. What is the real best practice from your point of view? Thanks.

    • As far as I know, that best practice has been changed to a data LIF per node, per SVM, regardless of the number of datastores. So no more 1:1 LIF-to-datastore mapping needed. The next release of that TR should cover it.

  5. Pingback: Why Is the Internet Broken: Greatest Hits | Why Is The Internet Broken?

  6. Pingback: Spreading the love: Load balancing NAS connections in ONTAP | Why Is The Internet Broken?

  7. Any idea on the next release date for that TR? I want to host 100 NFS vols on 4-6 nodes and then be able to vol move them individually to other aggrs (nodes) without having to change their mount IP. I was about to set up 1 mount IP per LIF per datastore instead.

  8. Hi Justin,
    Is there a way to tell if a volume is being accessed via an indirect path? I’m analyzing performance of a cluster and see plenty of cluster sent/received traffic in statistics show-periodic. I don’t see a data LIF on every node, and in some cases volumes are named for the node/aggregate upon which they were originally created but are now living on a different node that does not contain a data LIF for the corresponding vserver. Like anything else, I have to prove/quantify the problem, so a more granular report (other than cluster sent/received) would be very helpful.

    PS. Client stats show remote_ops, but they don’t tell me from which LIF the operation originated, nor do they tell me where it was going.

    Thanks in advance.

    Jim

    • There is currently no way to tie a data LIF to a volume for NFS mounts. cifs session show -instance does a little better.

      Vserver: DEMO

      Node: ontap9-tme-8040-01
      Session ID: 4365395413805563906
      Connection ID: 707037117
      Incoming Data LIF IP Address: 10.193.67.237
      Workstation IP Address: 10.193.67.236
      Authentication Mechanism: Kerberos
      User Authenticated as: domain-user
      Windows User: NTAP\Administrator
      UNIX User: root
      Open Shares: 2
      Open Files: 3
      Open Other: 0
      Connected Time: 4d 3h 46m 58s
      Idle Time: 21h 50m 51s
      Protocol Version: SMB3
      Continuously Available: Yes
      Is Session Signed: false
      NetBIOS Name: demo
      SMB Encryption Status: unencrypted
      Connection Count: 1

      As for finding remote traffic for a specific volume, remote_ops in the per client stats is your best bet.

      You can also leverage network connections active show to display active client connections.

      But there’s no simple way to map an NFS export to a data LIF.
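      For example, filtering on the vserver and node from the output above gives a rough idea of which LIFs and nodes the client connections landed on (a sketch; the names come from the session output above):

        ::> network connections active show -node ontap9-tme-8040-01 -vserver DEMO
        ::> network connections active show-lifs -node ontap9-tme-8040-01
        ::> network connections active show-clients -node ontap9-tme-8040-01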

      • Hi Justin,

        Thanks for the reply. Unfortunately, there is no clean way to see what I want to see. A variation of the “cifs session” command may be the best method. Figuring out the shares, to which volume they belong, and then the corresponding aggregates may be the first step. Next, use the “cifs session file show -vserver -node -share ”. The output of this command, when combined with the share-to-volume-to-aggregate output, can be used to determine indirect access. As an example, a share named Test in vserver1, on vol1, that is on aggregate1 of node1 would have indirect access if there are entries in the output from “cifs session file show -vserver vserver1 -node node2 -share Test”. Note that I’ve specified node2 to indicate that I want to see all sessions coming from the HA partner, not the node that owns the actual data being accessed. There is still the question of how many ops and how much data is being accessed indirectly, but hey, 1 battle at a time.

        Thanks again.

  9. I’d like to add one more detail to the aforementioned “cifs session file show” command. If you know the volume/aggregate/node of a particular share, you can use “!node” in the -node argument. This will show you sessions from all nodes that are not the node which owns the volume/aggregate.
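    Putting that workflow together might look something like this (a rough sketch; the share, SVM, junction path, and node names are placeholders):

      ::> vserver cifs share show -vserver vserver1 -share-name Test -fields path
      ::> volume show -vserver vserver1 -junction-path /Test -fields aggregate,node
      ::> vserver cifs session file show -vserver vserver1 -node !node1 -share Test

    Any files listed by the last command are being opened through a node other than node1, which owns the volume, i.e., via an indirect path.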
