How to find average file size and largest file size using XCP

If you use NetApp ONTAP to host NAS shares (CIFS or NFS) and have too many files and folders to count, then you know how challenging it can be to figure out file information in your environment in a quick, efficient and effective way.

This becomes doubly important when you are thinking of migrating NAS data from FlexVol volumes to FlexGroup volumes, because there is some work up front that needs to be done to ensure you size the capacity of the FlexGroup and its member volumes correctly. TR-4571 covers some of that in detail, but it basically says “know your average file size.” It currently doesn’t tell you *how* to do that (though it will eventually). This blog attempts to fill that gap.

XCP

I’ve written previously about XCP on this blog. Generally speaking, it’s been to tout the data migration capabilities of the tool. But in this case, I want to highlight the “xcp scan” capability.

XCP scan allows you to use multiple, parallel threads to analyze an unstructured NAS share much more quickly than you could with basic tools like rsync, du, etc.
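
For contrast, here’s what the single-threaded approach looks like – a minimal sketch, assuming the share is mounted at /mnt/share. On millions of files this can run for hours, which is exactly the problem XCP’s parallel scan solves:

# walk the tree printing each file's size in bytes, then compute average and max
find /mnt/share -type f -printf '%s\n' | awk '
 { total += $1; count++; if ($1 > max) max = $1 }
 END { if (count) printf "files: %d avg: %.1f KiB max: %.1f KiB\n", count, total/count/1024, max/1024 }'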

The NFS version of XCP also allows you to output this scan to a file (HTML, XML, etc) to generate a report about the scanned data. It even does the math for you and finds the largest (max) file size and average file size!

[Image: XCP scan report showing maximum and average file size]

The command I ran to get this information was:

# xcp scan -md5 -stats -html SERVER:/volume/path > filename.html

That’s it! XCP will scan and write to a file. You can also get info about the top five file consumers (by number and capacity) by owner, as well as get some nifty graphs. (Pro tip: Managers love graphs!)

[Image: XCP report graphs showing top file consumers by owner]

What if I only have SMB/CIFS data?

Currently, XCP for SMB doesn’t support output to HTML files. But that doesn’t mean you can’t have fun, too!

You can stand up a VM using CentOS or whatever your favorite Linux distribution is and use XCP for NFS to scan the data – provided the client has the necessary access to do so and you can score an XCP NFS license (even if it’s an eval). XCP scans are read-only, so you shouldn’t have issues running them.
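
Before scanning, it’s worth a quick sanity check that the client can actually see the export – a sketch, where SERVER stands in for your SVM’s NFS data LIF:

# list the exports the SVM advertises
showmount -e SERVER
# then kick off a read-only scan
xcp scan -stats SERVER:/volume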

Just keep in mind the following:

Shares that have traditionally been SMB/CIFS-only are likely on NTFS security style volumes. This means the UNIX user you access the data as (for example, root) should map to a valid Windows user that has read access to the data; NFS clients that access NTFS security style volumes map to Windows users to figure out permissions. I cover that here:

Mixed perceptions with NetApp multiprotocol NAS access

You can check the volume security style in two ways:

  • CLI with the command
    ::> volume show -volume [volname] -fields security-style
  • OnCommand System Manager under the “Storage -> Qtrees” section (yea, yea… I know. Volumes != Qtrees)

[Image: OnCommand System Manager Qtrees view showing the volume security style]

To check whether the user you’re accessing the volume with via NFS maps to a valid and expected Windows user, use this CLI command from diag privilege:

::> set diag
::*> diag secd name-mapping show -node node1 -vserver DEMO -direction unix-win -name prof1

'prof1' maps to 'NTAP\prof1'

To see what Windows groups this user would be a member of (and thus would get access to files and folders that have those groups assigned), use this diag privilege command:

::*> diag secd authentication show-creds -node ontap9-tme-8040-01 -vserver DEMO -unix-user-name prof1

UNIX UID: prof1 <> Windows User: NTAP\prof1 (Windows Domain User)

GID: ProfGroup
 Supplementary GIDs:
 ProfGroup
 group1
 group2
 group3
 sharedgroup

Primary Group SID: NTAP\DomainUsers (Windows Domain group)

Windows Membership:
 NTAP\group2 (Windows Domain group)
 NTAP\DomainUsers (Windows Domain group)
 NTAP\sharedgroup (Windows Domain group)
 NTAP\group1 (Windows Domain group)
 NTAP\group3 (Windows Domain group)
 NTAP\ProfGroup (Windows Domain group)
 Service asserted identity (Windows Well known group)
 BUILTIN\Users (Windows Alias)
 User is also a member of Everyone, Authenticated Users, and Network Users

Privileges (0x2080):
 SeChangeNotifyPrivilege

If you want to run XCP as root and want it to have administrator level access, you can create a name mapping. This is what I have in my SVM:

::> vserver name-mapping show -vserver DEMO -direction unix-win

Vserver: DEMO
Direction: unix-win
Position Hostname         IP Address/Mask
-------- ---------------- ----------------
1        -                -                Pattern: root
                                           Replacement: DEMO\\administrator

To create a name mapping for root to map to administrator:

::> vserver name-mapping create -vserver DEMO -direction unix-win -position 1 -pattern root -replacement DEMO\\administrator

Keep in mind that backup software often has this level of rights to files and folders, and the XCP scan is read-only, so there shouldn’t be any issue. If you are worried about making root an administrator, create a new Windows user for it to map to (for example, DOMAIN\xcp) and add it to the Backup Operators Windows Group.
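
As a hypothetical example, the mapping from above could send root to that dedicated user instead (the DOMAIN\xcp user and its Backup Operators membership would need to exist in Active Directory first):

::> vserver name-mapping create -vserver DEMO -direction unix-win -position 1 -pattern root -replacement DOMAIN\\xcp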

In my lab, I ran a scan on a NTFS security style volume called “xcp_ntfs_src”:

::*> vserver security file-directory show -vserver DEMO -path /xcp_ntfs_src

Vserver: DEMO
 File Path: /xcp_ntfs_src
 File Inode Number: 64
 Security Style: ntfs
 Effective Style: ntfs
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 0
 UNIX Mode Bits: 777
 UNIX Mode Bits in Text: rwxrwxrwx
 ACLs: NTFS Security Descriptor
 Control:0x8014
 Owner:NTAP\prof1
 Group:BUILTIN\Administrators
 DACL - ACEs
 ALLOW-BUILTIN\Administrators-0x1f01ff-OI|CI
 ALLOW-DEMO\Administrator-0x1f01ff-OI|CI
 ALLOW-Everyone-0x100020-OI|CI
 ALLOW-NTAP\student1-0x120089-OI|CI
 ALLOW-NTAP\student2-0x120089-OI|CI

I used this command and nearly 600,000 objects were scanned in 25 seconds:

# xcp scan -md5 -stats -html 10.x.x.x:/xcp_ntfs_src > xcp-ntfs.html
XCP 1.3D1-8ae2672; (c) 2018 NetApp, Inc.; Licensed to Justin Parisi [NetApp Inc] until Tue Sep 4 13:23:07 2018

126,915 scanned, 85,900 summed, 43.8 MiB in (8.75 MiB/s), 14.5 MiB out (2.89 MiB/s), 5s
 260,140 scanned, 187,900 summed, 91.6 MiB in (9.50 MiB/s), 31.3 MiB out (3.34 MiB/s), 10s
 385,100 scanned, 303,900 summed, 140 MiB in (9.60 MiB/s), 49.9 MiB out (3.71 MiB/s), 15s
 516,070 scanned, 406,530 summed, 187 MiB in (9.45 MiB/s), 66.7 MiB out (3.36 MiB/s), 20s
Sending statistics...
 594,100 scanned, 495,000 summed, 220 MiB in (6.02 MiB/s), 80.5 MiB out (2.56 MiB/s), 25s
594,100 scanned, 495,000 summed, 220 MiB in (8.45 MiB/s), 80.5 MiB out (3.10 MiB/s), 25s.

This was the resulting report:

[Image: XCP HTML report for the xcp_ntfs_src scan]

Happy scanning!


Workaround for Mac Finder errors when unzipping files in ONTAP

ONTAP allows you to mount volumes to other volumes in a Storage Virtual Machine, which provides a way for storage administrators to create their own folder structures across multiple nodes in a cluster. This is useful when you want to ensure the workload gets spread across nodes, but you can’t use FlexGroup volumes for whatever reason.

This graphic shows how that can work:

[Image: diagram of volumes junctioned to other volumes across cluster nodes]

In NAS environments, a client asks for a file or folder location and ONTAP redirects the traffic to wherever that object lives. This is supposed to be transparent to the client, provided they follow standard NAS deployment steps.

However, not all NAS clients are created equal. Linux can serve up SMB and does things differently than Microsoft does; Windows can do NFS, but it doesn’t entirely follow the NFS RFCs. So, occasionally, a client handles something in a way ONTAP doesn’t expect, and stuff breaks.

Mac Finder

If you’ve ever used a Mac, you’ll know that the Finder can do some things a little differently than the Terminal does. In this particular issue, we’ll focus on how Finder unzips files (when you double-click the file) in volumes that are mounted to other volumes in ONTAP.

One of our customers hit this issue, and after poking around a little bit, I figured out how to work around it.

Here’s what they were doing:

  • SMB to Mac clients
  • Shares at the parent FlexVol level (i.e., /vol1)
  • FlexVols mounted to other FlexVols several levels deep (i.e., /vol1/vol2/vol3)

When a user accesses a share at a higher level, drills down into other folders (which are actually FlexVols mounted to other FlexVols), and then tries to unzip a file there, the double-click unzip in Finder fails.

When the shares are mounted at the same level as the FlexVol where the unzip is attempted, unzip works. When the Terminal is used to unzip, it works.

However, when your users refuse to use/are unable to use the Terminal and you don’t want to create hundreds of shares just to work around one issue, it’s an untenable situation.

So, I decided to dig into the issue…

Reproducing the issue

The best way to troubleshoot a problem is to set up a lab environment and try to recreate it. This gives you the freedom to gather logs, packet traces, etc. without bothering your customer or end user. So, I brought out my trusty old 2011 MacBook running macOS Sierra and mounted the SMB share in question.

These are the volumes and their junction paths:

vserver volume junction-path
------- ------ --------------
DEMO    inodes /shared/inodes
DEMO    shared /shared

This is the share:

 Vserver: DEMO
 Share: shared
 CIFS Server NetBIOS Name: DEMO
 Path: /shared
 Share Properties: oplocks
 browsable
 changenotify
 show-previous-versions
 Symlink Properties: symlinks
 File Mode Creation Mask: -
 Directory Mode Creation Mask: -
 Share Comment: -
 Share ACL: Everyone / Full Control
 File Attribute Cache Lifetime: -
 Volume Name: shared
 Offline Files: manual
 Vscan File-Operations Profile: standard
 Maximum Tree Connections on Share: 4294967295
 UNIX Group for File Create: -

I turned up debug logging on the cluster (engage NetApp Support if you want to do this), got a packet trace on the Mac and reproduced the issue right away. Lucky me!

[Image: Mac Finder unzip error dialog]

I also tried a third-party unzip utility (StuffIt Expander), and it unzipped fine. So this was definitely a Finder/ONTAP/NAS interaction problem, which allowed me to focus my efforts there.

Packet traces showed that the Finder was looking for a folder called “.TemporaryItems/folders.501/Cleanup At Startup” but couldn’t find it – and, apparently, couldn’t create it either. Instead, it would create folders named “BAH.XXXX”, and those never got cleaned up.

So, I thought, why not manually create the folder path, since it wasn’t able to do it on its own?

You can do this through the Terminal or via the Finder. Keep in mind that the path above has “folders.501” – 501 is my uid, so check your user’s uid on the Mac and make sure the folder path is created using that uid. If you have multiple users that access the share, you may need to create multiple folders.xxx directories in .TemporaryItems.
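
From the Terminal, that might look like the following sketch, assuming the share is mounted at /Volumes/shared and your uid is 501:

# confirm your uid first
id -u
# then pre-create the folder path Finder was looking for
mkdir -p "/Volumes/shared/.TemporaryItems/folders.501/Cleanup At Startup"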

If you do it via Finder, you may want to enable hidden files. I learned how to do that via this article:

https://ianlunn.co.uk/articles/quickly-showhide-hidden-files-mac-os-x-mavericks/

So I did that, and then I unmounted and re-mounted the share to make sure there wasn’t a weird cache issue lingering. You can check CIFS/SMB sessions, versions, etc. with the following command if you want to make sure they are closed:

cluster::*> cifs session show -vserver DEMO -instance

Vserver: DEMO

Node: node1
 Session ID: 16043510722553971076
 Connection ID: 390771549
 Incoming Data LIF IP Address: 10.x.x.x
 Workstation IP Address: 10.x.x.x
 Authentication Mechanism: NTLMv2
 User Authenticated as: domain-user
 Windows User: NTAP\prof1
 UNIX User: prof1
 Open Shares: 1
 Open Files: 1
 Open Other: 0
 Connected Time: 7m 49s
 Idle Time: 6m 2s
 Protocol Version: SMB3
 Continuously Available: No
 Is Session Signed: true
 NetBIOS Name: -
 SMB Encryption Status: unencrypted
 Connection Count: 1

Once I reconnected with the newly created folder path, double-click unzip worked perfectly!


Note: You *may* have to enable the option is-use-junctions-as-reparse-points-enabled on your CIFS server. I haven’t tested with it off and on thoroughly, but I saw some inconsistency when it was disabled. For the record, it’s on by default.

You can check with:

::*> cifs options show -fields is-use-junctions-as-reparse-points-enabled

Give it a try and let me know how it works for you in the comments!

Life hack: Change the UID of all files in NAS volume… without actually changing anything!


There are a ton of hidden gems in ONTAP that go unnoticed because they get added without a lot of fanfare. For example, the volume recovery queue got added in 8.3 and no one really knew what it was, what it did, or why the volumes they deleted didn’t seem to actually get deleted for 24 hours.

I keep my ears open for these features so I can promote them, and I ran across a pretty slick, simple gem at the NetApp Converge (sales kickoff) conference, from an old colleague from my support days who now does SE work. (Shout out to Maarten Lippmann!)

But, features are only as good as their use cases.

Here’s the scenario…

Let’s say you have a Git code repository with millions of files, owned by a number of different people, that one of your developers wants to access and make changes to. The developer doesn’t have access to some of those files by way of permissions, but there are way too many files to re-permission effectively and in a timely manner. Plus, if you change the access on those files, you might break the code repo horribly.

So, how do you:

  • Create a usable copy of the entire code repo in a reasonable amount of time without eating up a ton of space
  • Assign a new owner to all the files in the volume quickly and easily
  • Keep the original repo intact

It’s pretty easy in ONTAP, actually – in fact, it’s a single command. All you need is a FlexClone license, and you can make an instant copy of a volume with a new file owner without impacting the source volume and without using up any new space. Additionally, if you want to keep those changes, you can split the clone into its own unique volume.

In the following example, I have an existing volume that has a ton of files and folders, all owned by root:

[root@XCP nfs4]# ls -la
total 8012
d------r-x. 102 root root 8192 Apr 11 11:41 .
drwxr-xr-x. 5 root root 4096 Apr 12 17:20 ..
----------. 1 root root 0 Apr 11 11:29 file
d---------. 1002 root root 77824 Apr 11 11:47 topdir_0
d---------. 1002 root root 77824 Apr 11 11:47 topdir_1
...
d---------. 1002 root root 77824 Apr 11 11:47 topdir_99

I want the new owner of the files in the cloned volume to be a user named “prof1” and the GID to be 1101.

cluster::*> getxxbyyy getpwbyname -node ontap9-tme-8040-01 -vserver DEMO -username prof1
 (vserver services name-service getxxbyyy getpwbyname)
pw_name: prof1
pw_passwd:
pw_uid: 1100
pw_gid: 1101
pw_gecos:
pw_dir:
pw_shell:

So, I do the following:

cluster::*> vol clone create -vserver DEMO -flexclone clone -type RW -parent-vserver DEMO -parent-volume flexvol -junction-active true -foreground true -junction-path /clone -uid 1100 -gid 1101
[Job 12606] Job succeeded: Successful

cluster::*> vol show -vserver DEMO -volume clone -fields clone-volume,clone-parent-name,clone-parent-vserver
vserver volume clone-volume clone-parent-vserver clone-parent-name
------- ------ ------------ -------------------- -----------------
DEMO clone true DEMO flexvol

That command took literally 10 seconds to complete. There are over 1.8 million objects in that volume.

cluster::*> df -i /vol/clone
Filesystem iused ifree %iused Mounted on Vserver
/vol/clone/ 1824430 4401487 29% /clone DEMO

Then, I check the owner of the files:

cluster::*> vserver security file-directory show -vserver DEMO /clone/nfs4

Vserver: DEMO
 File Path: /clone/nfs4
 File Inode Number: 96
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 1100
 UNIX Group Id: 1101
 UNIX Mode Bits: 5
 UNIX Mode Bits in Text: ------r-x
 ACLs: NFSV4 Security Descriptor
 Control:0x8014
 DACL - ACEs
 ALLOW-user-prof1-0x1601ff-FI|DI|IO
 ALLOW-user-student1-0x21-FI|DI|IO
 ALLOW-group-ProfGroup-0x1200a9-FI|DI|IO|IG
 ALLOW-EVERYONE@-0x1200a9

cluster::*> vserver security file-directory show -vserver DEMO /clone/nfs4/topdir_99

Vserver: DEMO
 File Path: /clone/nfs4/topdir_99
 File Inode Number: 3556
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 1100
 UNIX Group Id: 1101
 UNIX Mode Bits: 0
 UNIX Mode Bits in Text: ---------
 ACLs: NFSV4 Security Descriptor
 Control:0x8004
 DACL - ACEs
 ALLOW-user-prof1-0x1601ff-FI|DI
 ALLOW-user-student1-0x21-FI|DI
 ALLOW-group-ProfGroup-0x1200a9-FI|DI|IG

And from the client:

[root@XCP nfs4]# pwd
/clone/nfs4

[root@XCP nfs4]# ls -la
total 8012
d------r-x. 102 1100 1101 8192 Apr 11 11:41 .
drwxr-xr-x. 5 1100 1101 4096 Apr 12 17:20 ..
----------. 1 1100 1101 0 Apr 11 11:29 file
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_0
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_1
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_10
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_11
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_12

It shouldn’t be that easy, should it?

If I wanted to split the volume off into its own volume (such as when a dev makes changes and wants to keep them, but doesn’t want to change the source volume):

cluster::*> vol clone split
 estimate show start status stop

If I want to delete the clone after I’m done, I just run “volume destroy.”
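
As a sketch, using the clone from above (a split runs in the background, and a volume has to be taken offline before “volume destroy” will work):

cluster::*> vol clone split start -vserver DEMO -flexclone clone
cluster::*> vol clone split status -vserver DEMO -flexclone clone

cluster::*> vol offline -vserver DEMO -volume clone
cluster::*> vol destroy -vserver DEMO -volume clone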

Questions? Hit me up in the comments!

Backing up/restoring ONTAP SMB shares with PowerShell


A while back, I posted an SMB share backup and restore PowerShell script written by one of our SMB developers. Later, Scott Harney added some scripts for NFS exports. You can find those here:

https://github.com/DatacenterDudes/cDOT-CIFS-share-backup-restore

That was back in the ONTAP 8.3.x timeframe. They’ve worked pretty well for the most part, but since then, we’re up to ONTAP 9.3 and I’ve occasionally gotten feedback that the scripts throw errors sometimes.

While the idea of an open script repository is for other people to send updates and keep it a living, breathing, evolving entity, that’s not how this script has ended up. Instead, it’s gotten old and crusty and in need of an update. The inspiration was a Reddit thread on the subject.

So, I’ve done that. You can find the updated versions of the script for ONTAP 9.x at the same place as before:

https://github.com/DatacenterDudes/cDOT-CIFS-share-backup-restore

However, other than for testing purposes, it may not have been necessary to do anything at all. I actually ran the original restore script without changing anything of note (just some comments) and it ran fine. The errors most people see come down to the version of the NetApp PowerShell Toolkit, a syntax error in their copy/paste, or their version of PowerShell. Make sure they’re all up to date, or you’ll run into errors. I used:

  • Windows 2012R2
  • ONTAP 9.4 (yes, I have access to early releases!)
  • PowerShell 4.0.1.1
  • Latest NetApp PowerShell toolkit (4.5.1 for me)

When should I use these scripts?

These were created as a way to fill the gap that SVM-DR now fills. Basically, before SVM-DR existed, there was no way to backup and restore CIFS configurations. Even with SVM-DR, these scripts offer some nice granular functionality to backup and restore specific configuration areas and can be modified to include other things like CIFS options, SAN configuration, etc.

As for how to run them…

Backing up your shares

1) Download and install the latest PowerShell toolkit from https://mysupport.netapp.com/tools/info/ECMLP2310788I.html?productID=61926

[Image: NetApp PowerShell Toolkit download page]

2) Import the DataONTAP module with “Import-Module DataONTAP”

(be sure that the PowerShell window is closed and re-opened after you install the toolkit; otherwise, Windows won’t find the new module to import)
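
Once the module is imported, connecting to the cluster management LIF looks something like this (a sketch; Connect-NcController is the toolkit’s connect cmdlet, and the address and credentials here are placeholders):

PS C:\> Import-Module DataONTAP
PS C:\> Connect-NcController 10.x.x.x -Credential (Get-Credential)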

3) Back up the desired shares as per the usage comments in the script. (see below)

# Usage:
# Run as: .\backupSharesAcls.ps1 -server <mgmt_ip> -user <mgmt_user> -password <mgmt_user_password> -vserver <vserver name> -share <share name or * for all> -shareFile <xml file to store shares> -aclFile <xml file to store acls> -spit <none,less,more depending on info to print>
#
# Example
# 1. If you want to save only a single share on vserver vs2.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share test2 -shareFile C:\share.xml -aclFile C:\acl.xml -spit more 
#
# 2. If you want to save all the shares on vserver vs2.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share * -shareFile C:\share.xml -aclFile C:\acl.xml -spit less
#
# 3. If you want to save only shares that start with "test" and share1 on vserver vs2.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share "test* | share1" -shareFile C:\share.xml -aclFile C:\acl.xml -spit more
#
# 4. If you want to save shares and ACLs into .csv format for examination.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share * -shareFile C:\shares.csv -aclFile C:\acl.csv -csv true -spit more

If you use “-spit more” you’ll get verbose output:

[Image: verbose output from the share backup script]

4) Review the shares/ACLs via the XML files.

That’s it for backup. Pretty straightforward. However, our backups are only as good as our restores…

Restoring the shares using the script

I don’t recommend testing this script the first time on a production system. I’d suggest creating a test SVM, or even leveraging SVM-DR to replicate the SVM to a target location.

In my lab, however… who cares! Let’s blow it all away!

[Image: deleting the existing shares on the test SVM]

Now, run your restore.

[Image: output from the share and ACL restore script]

That’s it! Happy backing up/restoring!

Tips for running the script

  • Before running the script, copy and paste it into the “PowerShell ISE” to verify that the syntax is correct. From there, save the script to the local client. Syntax errors can cause problems with the script’s success.
  • Use the latest available NetApp PowerShell Toolkit and ensure the PowerShell version on your client matches what is in the release notes for the toolkit.
  • Test the script on a dummy SVM before running in production.
  • Ensure the DataONTAP module has been imported; if import fails after installing the toolkit, close the PowerShell window and re-open it.

Questions?

If you have any questions or comments, leave them here. Also, if you customize these at all, please do share with the community! Add them to the Github repository or create your own repo!

Cache Rules Everything Around Me: New Global Name Service Cache in ONTAP 9.3


In an ONTAP cluster made up of individual nodes with individual hardware resources, it’s useful if a storage administrator can manage the entire cluster as a monolithic entity, without having to worry about what lives where.

Prior to ONTAP 9.3, name service caches were, for the most part, node-centric. This could create scenarios where a cache had gone stale on one node while it was freshly populated on another, so a client might get different results depending on which physical node its network connection landed on.

The following is pulled right out of the new name services best practices technical report (https://www.netapp.com/us/media/tr-4668.pdf), which acts as an update to TR-4379. I wrote some of this, but most of what’s written here is by the new NFS/Name Services TME, Chris Hurley (@averageguyx). This is basically a copy/paste, but I thought it was a cool enough feature to highlight on its own.

Global Name Services Cache in ONTAP 9.3

ONTAP 9.3 offers a new caching mechanism that moves name service caches out of memory and into a persistent cache that is replicated asynchronously between all nodes in the cluster. This provides more reliability and resilience in the event of failovers, as well as offering higher limits for name service entries due to being cached on disk rather than in node memory.

The name service cache is enabled by default. If legacy cache commands are attempted in ONTAP 9.3 with name service caching enabled, an error will occur, such as the following:

Error: show failed: As name service caching is enabled, "Netgroups" caches no longer exist. Use the command "vserver services name-service cache netgroups members show" (advanced privilege level) to view the corresponding name service cache entries.

The name service caches are controlled in a centralized location, below the name-service cache command set. This provides easier cache management, from configuring caches to clearing stale entries.

The global name service cache can be disabled for individual caches using the vserver services name-service cache commands in advanced privilege, but doing so is not recommended. For more detail, see the later sections of TR-4668.
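
For example, checking or disabling the hosts cache might look like this sketch from advanced privilege (exact field names can vary by release):

::*> vserver services name-service cache hosts settings show -vserver DEMO
::*> vserver services name-service cache hosts settings modify -vserver DEMO -is-enabled false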

ONTAP also offers the additional benefit of using the caches while external name services are unavailable. If there is an entry in the cache, regardless of whether its TTL has expired, ONTAP will use that cache entry when external name service servers cannot be reached, thereby providing continued access to data served by the SVM.

Hosts Cache

There are two individual host caches, forward-lookup and reverse-lookup, but the hosts cache settings are controlled as a whole. When a record is retrieved from DNS, the TTL of that record is used for the cache TTL; otherwise, the default TTL in the host cache settings (24 hours) is used. The default for negative entries (host not found) is 60 seconds. Changing DNS settings does not affect the cache contents in any way.

  • The network ping command does not use the name services hosts cache when using a hostname.

User and Group Cache

The user and group caches consist of three categories: passwd (user), group, and group membership.

  • Cluster RBAC access does not use any of the caches

Passwd (User) Cache

The user cache consists of two caches, passwd and passwd-by-uid. The caches only store the name, uid, and gid aspects of the user data to conserve space, since other data such as home directory and shell are irrelevant for NAS access. When an entry is placed in the passwd cache, the corresponding entry is created in the passwd-by-uid cache. By the same token, when an entry is deleted from one cache, the corresponding entry is deleted from the other. If you have an environment with disjointed username-to-uid mappings, there is an option to disable this behavior.

Group Cache

Like the passwd cache, the group cache consists of two caches, group and group-by-gid. When an entry is placed in the group cache, the corresponding entry is created in the group-by-gid cache; when an entry is deleted from one cache, the corresponding entry is deleted from the other. The full group membership is not cached, to conserve space, and is not necessary for NAS data access, so only the group name and gid are cached. If you have an environment with disjointed group-name-to-gid mappings, there is an option to disable this behavior.

Group Membership Cache

In file and NIS environments, there is no efficient way to gather the list of groups a particular user is a member of, so for these environments ONTAP has a group membership cache. It consists of a single cache containing the list of groups a user is a member of.

Netgroup Cache

Beginning in ONTAP 9.3, the various netgroup caches have been consolidated into two caches: netgroup.byhost and netgroup.byname. The netgroup.byhost cache is the first cache consulted for the netgroups a host is a part of. If that information is not available, the query reverts to gathering the full netgroup membership and comparing that to the host. If the information is not in the cache, the same process is performed against the netgroup ns-switch sources. If a host requesting access via a netgroup is found via the netgroup membership lookup process, that IP-to-netgroup mapping is always added to the netgroup.byhost cache for faster future access. This also means the members cache needs a lower TTL, so that changes in netgroup membership are reflected in the ONTAP caches within the TTL timeframe.

Viewing cache entries

Each of the above name service caches can be viewed. This can be used to confirm whether expected results are being returned from name service servers. Each cache has its own individual options that you can use to filter the results and find what you are looking for. To view a cache, use the vserver services name-service cache <cache> <subcache> show command.

Caches are unique per vserver, so it is suggested to view them on a per-vserver basis. Below are some examples of the caches and their options.

ontap9-tme-8040::*> name-service cache hosts forward-lookup show ?
  (vserver services name-service cache hosts forward-lookup show)
  [ -instance | -fields <fieldname>, ... ]
  [ -vserver <vserver name> ]                                                   *Vserver
  [[-host] <text>]                                                              *Hostname
  [[-protocol] {Any|ICMP|TCP|UDP}]                                              *Protocol (default: *)
  [[-sock-type] {SOCK_ANY|SOCK_STREAM|SOCK_DGRAM|SOCK_RAW}]                     *Sock Type (default: *)
  [[-flags] {FLAG_NONE|AI_PASSIVE|AI_CANONNAME|AI_NUMERICHOST|AI_NUMERICSERV}]  *Flags (default: *)
  [[-family] {Any|Ipv4|Ipv6}]                                                   *Family (default: *)
  [ -canonname <text> ]                                                         *Canonical Name
  [ -ips <IP Address>, ... ]                                                    *IP Addresses
  [ -ip-protocol {Any|ICMP|TCP|UDP}, ... ]                                      *Protocol
  [ -ip-sock-type {SOCK_ANY|SOCK_STREAM|SOCK_DGRAM|SOCK_RAW}, ... ]             *Sock Type
  [ -ip-family {Any|Ipv4|Ipv6}, ... ]                                           *Family
  [ -ip-addr-length <integer>, ... ]                                            *Length
  [ -source {none|files|dns|nis|ldap|netgrp_byname} ]                           *Source of the Entry
  [ -create-time <"MM/DD/YYYY HH:MM:SS"> ]                                      *Create Time
  [ -ttl <integer> ]                                                            *DNS TTL

ontap9-tme-8040::*> name-service cache unix-user user-by-id show
  (vserver services name-service cache unix-user user-by-id show)
Vserver    UID         Name         GID            Source  Create Time
---------- ----------- ------------ -------------- ------- -----------
SVM1       0           root         1              files   1/25/2018 15:07:13
ch-svm-nfs1
           0           root         1              files   1/24/2018 21:59:47
2 entries were displayed.

If there are no entries in a particular cache, the following message will be shown:

ontap9-tme-8040::*> name-service cache netgroups members show
  (vserver services name-service cache netgroups members show)
This table is currently empty.

There you have it! New cache methodology in ONTAP 9.3. If you’re using NAS and name services in ONTAP, it’s highly recommended to go to ONTAP 9.3 to take advantage of this new feature.

Behind the Scenes: Episode 126 – Komprise

Welcome to Episode 126, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we bring in Komprise (@Komprise) CEO Kumar Goswami (@KumarKGoswami) to chat about data management and how their software helps get the most out of your NetApp storage systems!


For more information about Komprise, check out komprise.com!

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

XCP SMB/CIFS support available!

If you’re not familiar with what XCP is, I covered it in a previous blog post, Migrating to ONTAP – Ludicrous speed! as well as in the XCP podcast. Basically, it’s a super-fast way to scan and migrate data.

One of the downsides of the tool was that it only supported NFSv3 migrations, which also meant it couldn’t handle NTFS-style ACLs. Handling those would require an SMB/CIFS version of XCP. Today, we get that with XCP SMB/CIFS 1.0:

https://mysupport.netapp.com/tools/download/ECMLP2357425DT.html?productID=62115&pcfContentID=ECMLP2357425

XCP for SMB/CIFS supports the following:

  • “show” – Displays information about the CIFS shares of a system
  • “scan” – Reads all files and directories found on a CIFS share and builds assessment reports
  • “copy” – Recursively copies everything from source to destination
  • “sync” – Performs multiple incremental syncs from source to target
  • “verify” – Verifies that the target state matches the source, including attributes and NTFS ACLs
  • “activate” – Activates the XCP license on Windows hosts
  • “help” – Displays detailed information about XCP commands and options

Right now, it’s CLI only, but be on the lookout for a GUI version.

“Installing” XCP on Windows

XCP on Windows is a simple executable file that runs via cmd or a PowerShell window. One of the prerequisites for the software is the Microsoft Visual C++ Redistributable for Visual Studio 2017. If you don’t install it, trying to run the program results in an error that calls out a specific DLL that isn’t registered.

When I copied the file to my Windows host, I created a new directory called “C:\XCP.” You can put that directory anywhere. To run the utility from cmd, either navigate to the directory and run “xcp” or add the directory to your system Path to run it from anywhere.

For example:

[Image: adding C:\XCP to the Windows Path environment variable]
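
If you’d rather not edit the system Path, you can also add the directory for just the current PowerShell session – a sketch, assuming XCP lives in C:\XCP:

PS C:\> $env:Path += ";C:\XCP"
PS C:\> xcp help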

Once that’s done, run XCP from any location:

[Image: running xcp from cmd and from PowerShell]

Licensing XCP

XCP is a licensed feature. That doesn’t mean you have to pay for it; the license is only used for tracking purposes. But you do have to apply a license. In Windows, that’s pretty easy.

  1. Download a license from xcp.netapp.com
  2. Copy the license into the C:\NetApp\XCP folder
  3. Run “xcp activate”

[Image: xcp activate output]

XCP show

The command “xcp show \\server” can give some useful information for an ONTAP SMB/CIFS server, such as:

  • Available shares
  • Capacity (used and available)
  • Current connections
  • Folder path
  • Share attributes and permissions

This output is a good way to get an overall look at what is available on a server.

[Image: xcp show output against an ONTAP CIFS server]

XCP scan

XCP has a number of useful scanning features. These include:

PS C:\XCP> xcp help scan

usage: xcp scan [-h] [-v] [-parallel <n>] [-match <filter>] [-preserve-atime]
                [-depth <n>] [-stats] [-l] [-ownership] [-du]
                [-fmt <expression>]
                source

positional arguments:
  source

optional arguments:
  -h, --help         show this help message and exit
  -v                 increase debug verbosity
  -parallel <n>      number of concurrent processes (default: <cpu-count>)
  -match <filter>    only process files and directories that match the filter
                     (see `xcp help -match` for details)
  -preserve-atime    restore last accessed date on source
  -depth <n>         limit the search depth
  -stats             print tree statistics report
  -l                 detailed file listing output
  -ownership         retrieve ownership information
  -du                summarize space usage of each directory including
                     subdirectories
  -fmt <expression>  format file listing according to the python expression
                     (see `xcp help -fmt` for details)

I scanned my “shared” directory with the -stats option; it scanned over 60,000 files in 31 seconds and gave me the following stats:

== Maximum Values ==
 Size Depth Namelen Dirsize
 2.02KiB 5 15 100

== Average Values ==
 Size Depth Namelen Dirsize
 25.6 5 6 6

== Top File Extensions ==
 .py
 50003 1

== Number of files ==
 empty <8KiB 8-64KiB 64KiB-1MiB 1-10MiB 10-100MiB >100MiB
 3 50001

== Space used ==
 empty <8KiB 8-64KiB 64KiB-1MiB 1-10MiB 10-100MiB >100MiB
 0 1.22MiB 0 0 0 0 0

== Directory entries ==
 empty 1-10 10-100 100-1K 1K-10K >10k
 2 10004 101

== Depth ==
 0-5 6-10 11-15 16-20 21-100 >100
 60111

== Modified ==
 >1 year >1 month 1-31 days 1-24 hrs <1 hour <15 mins future
 60111

== Created ==
 >1 year >1 month 1-31 days 1-24 hrs <1 hour <15 mins future
 60111

Total count: 60111
Directories: 10107
Regular files: 50004
Symbolic links:
Junctions:
Special files:
Total space for regular files: 1.22MiB
Total space for directories: 0
Total space used: 1.22MiB
60,111 scanned, 0 errors, 31s

When I increased the parallel threads to 8, it finished in 18 seconds:

PS C:\XCP> xcp scan -stats -parallel 8 \\demo\shared

Total count: 60111
Directories: 10107
Regular files: 50004
Symbolic links:
Junctions:
Special files:
Total space for regular files: 1.22MiB
Total space for directories: 0
Total space used: 1.22MiB
60,111 scanned, 0 errors, 18s

XCP copy

With xcp copy, I can copy SMB/CIFS data with or without ACLs at a much faster rate than simple robocopy. Keep in mind that this version of XCP doesn’t have BACKUP OPERATOR rights, so you’d need to run the utility as a user with administrator-level access on both source and destination.

In the following example, I used robocopy to copy the same dataset as XCP to a NetApp FlexGroup volume.

Robocopy to FlexGroup results (~20-30 minutes)

         Total Copied Skipped Mismatch FAILED Extras
 Dirs :  10107  10106       1        0      0      0
 Files : 50004  50004       0        0      0      0
 Bytes : 1.21m  1.21m       0        0      0      0
 Times : 0:19:01 0:13:11 0:00:00 0:05:50

Speed : 1615 Bytes/sec.
 Speed : 0.092 MegaBytes/min.

UPDATE: Someone asked if the above robocopy run was done with the /MT flag, which would make for a fairer apples-to-apples comparison, since XCP is multithreaded. It wasn’t. The syntax used was:

PS C:\XCP> robocopy /S /COPYALL source destination

So, I re-ran it using /MT:8, against an empty FlexGroup, after restoring the base snapshot and converting the security style to NTFS to ensure the ACLs came over as well. The multithreading of robocopy cut the time to completion roughly in half.

Robocopy /MT to FlexGroup results (~8-9 minutes)

 PS C:\XCP> robocopy /S /COPYALL /MT:8 \\demo\shared \\demo\flexgroup\robocopyMT

-------------------------------------------------------------------------------
 ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Tue Aug 22 20:32:54 2017

Source : \\demo\shared\
 Dest : \\demo\flexgroup\robocopyMT\

Files : *.*

Options : *.* /S /COPYALL /MT:8 /R:1000000 /W:30
------------------------------------------------------------------------------
Total Copied Skipped Mismatch FAILED Extras
 Dirs : 10107 10106 1 0 0 0
 Files : 50004 50004 0 0 0 0
 Bytes : 1.21 m 1.21 m 0 0 0 0
 Times : 0:35:21 0:06:23 0:00:00 0:01:59

Ended : Tue Aug 22 20:41:18 2017

Then I re-ran the XCP copy to the FlexGroup after restoring the baseline snapshot and making sure the security style of the volume was NTFS. (It was UNIX before, which would have affected ACLs and overall speed.) The run still finished within 4 minutes, so we’re looking at roughly 2x the speed of robocopy with a small 60k file and folder workload. In addition, the host I used is a Windows 7 client VM with a 1Gb network connection and not a ton of power behind it; XCP works best with more robust hardware.

[Image: Windows 7 client VM specs]

XCP to FlexGroup results – NTFS security style (~4 minutes!)

PS C:\XCP> xcp copy -parallel 8 \\demo\shared \\demo\flexgroup\XCP
1,436 scanned, 0 errors, 0 skipped, 0 copied, 0 (0/s), 5s
4,381 scanned, 0 errors, 0 skipped, 507 copied, 12.4KiB (2.48KiB/s), 10s
5,426 scanned, 0 errors, 0 skipped, 1,882 copied, 40.5KiB (5.64KiB/s), 15s
7,431 scanned, 0 errors, 0 skipped, 3,189 copied, 67.4KiB (5.37KiB/s), 20s
8,451 scanned, 0 errors, 0 skipped, 4,537 copied, 96.1KiB (5.75KiB/s), 25s
9,651 scanned, 0 errors, 0 skipped, 5,867 copied, 123KiB (5.31KiB/s), 30s
10,751 scanned, 0 errors, 0 skipped, 7,184 copied, 150KiB (5.58KiB/s), 35s
12,681 scanned, 0 errors, 0 skipped, 8,507 copied, 178KiB (5.44KiB/s), 40s
13,891 scanned, 0 errors, 0 skipped, 9,796 copied, 204KiB (5.26KiB/s), 45s
14,861 scanned, 0 errors, 0 skipped, 11,136 copied, 232KiB (5.70KiB/s), 50s
15,966 scanned, 0 errors, 0 skipped, 12,464 copied, 259KiB (5.43KiB/s), 55s
18,031 scanned, 0 errors, 0 skipped, 13,784 copied, 287KiB (5.52KiB/s), 1m0s
19,056 scanned, 0 errors, 0 skipped, 15,136 copied, 316KiB (5.80KiB/s), 1m5s
20,261 scanned, 0 errors, 0 skipped, 16,436 copied, 342KiB (5.21KiB/s), 1m10s
21,386 scanned, 0 errors, 0 skipped, 17,775 copied, 370KiB (5.65KiB/s), 1m15s
23,286 scanned, 0 errors, 0 skipped, 19,068 copied, 397KiB (5.36KiB/s), 1m20s
24,481 scanned, 0 errors, 0 skipped, 20,380 copied, 424KiB (5.44KiB/s), 1m25s
25,526 scanned, 0 errors, 0 skipped, 21,683 copied, 451KiB (5.35KiB/s), 1m30s
26,581 scanned, 0 errors, 0 skipped, 23,026 copied, 479KiB (5.62KiB/s), 1m35s
28,421 scanned, 0 errors, 0 skipped, 24,364 copied, 507KiB (5.63KiB/s), 1m40s
29,701 scanned, 0 errors, 0 skipped, 25,713 copied, 536KiB (5.70KiB/s), 1m45s
30,896 scanned, 0 errors, 0 skipped, 26,996 copied, 561KiB (5.15KiB/s), 1m50s
31,911 scanned, 0 errors, 0 skipped, 28,334 copied, 590KiB (5.63KiB/s), 1m55s
33,706 scanned, 0 errors, 0 skipped, 29,669 copied, 617KiB (5.52KiB/s), 2m0s
35,081 scanned, 0 errors, 0 skipped, 30,972 copied, 644KiB (5.44KiB/s), 2m5s
36,116 scanned, 0 errors, 0 skipped, 32,263 copied, 671KiB (5.30KiB/s), 2m10s
37,201 scanned, 0 errors, 0 skipped, 33,579 copied, 698KiB (5.48KiB/s), 2m15s
38,531 scanned, 0 errors, 0 skipped, 34,898 copied, 726KiB (5.65KiB/s), 2m20s
40,206 scanned, 0 errors, 0 skipped, 36,199 copied, 753KiB (5.36KiB/s), 2m25s
41,371 scanned, 0 errors, 0 skipped, 37,507 copied, 780KiB (5.39KiB/s), 2m30s
42,441 scanned, 0 errors, 0 skipped, 38,834 copied, 808KiB (5.63KiB/s), 2m35s
43,591 scanned, 0 errors, 0 skipped, 40,161 copied, 835KiB (5.47KiB/s), 2m40s
45,536 scanned, 0 errors, 0 skipped, 41,445 copied, 862KiB (5.31KiB/s), 2m45s
46,646 scanned, 0 errors, 0 skipped, 42,762 copied, 890KiB (5.56KiB/s), 2m50s
47,691 scanned, 0 errors, 0 skipped, 44,052 copied, 916KiB (5.30KiB/s), 2m55s
48,606 scanned, 0 errors, 0 skipped, 45,371 copied, 943KiB (5.45KiB/s), 3m0s
50,611 scanned, 0 errors, 0 skipped, 46,518 copied, 967KiB (4.84KiB/s), 3m5s
51,721 scanned, 0 errors, 0 skipped, 47,847 copied, 995KiB (5.54KiB/s), 3m10s
52,846 scanned, 0 errors, 0 skipped, 49,138 copied, 1022KiB (5.32KiB/s), 3m15s
53,876 scanned, 0 errors, 0 skipped, 50,448 copied, 1.02MiB (5.53KiB/s), 3m20s
55,871 scanned, 0 errors, 0 skipped, 51,757 copied, 1.05MiB (5.42KiB/s), 3m25s
57,011 scanned, 0 errors, 0 skipped, 53,080 copied, 1.08MiB (5.52KiB/s), 3m30s
58,101 scanned, 0 errors, 0 skipped, 54,384 copied, 1.10MiB (5.39KiB/s), 3m35s
59,156 scanned, 0 errors, 0 skipped, 55,714 copied, 1.13MiB (5.57KiB/s), 3m40s
60,111 scanned, 0 errors, 0 skipped, 57,049 copied, 1.16MiB (5.52KiB/s), 3m45s
60,111 scanned, 0 errors, 0 skipped, 58,483 copied, 1.19MiB (6.02KiB/s), 3m50s
60,111 scanned, 0 errors, 0 skipped, 59,907 copied, 1.22MiB (5.79KiB/s), 3m55s
60,111 scanned, 0 errors, 0 skipped, 60,110 copied, 1.22MiB (5.29KiB/s), 3m56s

XCP sync and verify

Sync and verify can be used during data migrations to ensure the source and target match up before cutting over. These use the same multiprocessing capabilities as copy, so they should also be fast. Keep in mind that sync could also potentially be used to do incremental backups with XCP!

[Image: xcp verify output]
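
In a migration workflow, that might look like the following sketch – same source and destination as the copy above; check “xcp help sync” and “xcp help verify” for the exact options in your version:

PS C:\XCP> xcp sync \\demo\shared \\demo\flexgroup\XCP
PS C:\XCP> xcp verify \\demo\shared \\demo\flexgroup\XCP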

Behind the Scenes: Episode 100 – XCP

Welcome to Episode 100, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week is our 100th episode! In true Tech ONTAP Podcast fashion, we didn’t celebrate it at all.

Instead, we stuck to the tech and brought in Bogdan Minciu and Joshey Lazer of the XCP team to discuss XCP and the upcoming release that supports CIFS/SMB.

Also, check out the podcast episode on migrations (where we chat about XCP) and this XCP blog.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

You can listen here:

How to host FTP-accessed data in ONTAP

I’m an official ONTAP chef!


That’s right; I created an ONTAP recipe, which shows you how to use NetApp ONTAP storage systems to host data for FTP shares.

You can find it here:

https://community.netapp.com/t5/Data-ONTAP-Discussions/ONTAP-Recipes-Work-around-the-lack-of-FTP-support-in-ONTAP-with-CIFS-SMB-or-NFS/m-p/132757

Keep in mind that native FTP support is not in ONTAP, so for the foreseeable future, use FTP VMs in front of CIFS/NFS shares. You can also use a similar approach to host HTTP data.
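
The general shape of the workaround, as a sketch: a small Linux VM runs the FTP daemon and serves a directory that is really an NFS mount of the ONTAP volume. (The paths and the vsftpd option shown here are illustrative.)

# on the FTP VM: mount the ONTAP export where the FTP daemon will serve it
mount -t nfs svm1:/ftpdata /srv/ftp
# then point the FTP daemon at it, e.g. in /etc/vsftpd/vsftpd.conf:
#   local_root=/srv/ftp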

If you absolutely need native FTP/HTTP services in ONTAP, 7-Mode will be the way to go.

If you have any questions about the recipe, leave a comment here or on the recipe.

Case study: Using OSI methodology to troubleshoot NAS

Recently, I installed some 10Gb cards into an AFF8040 so I could run some FlexGroup performance tests (stay tuned for those). I was able to install the cards myself, but to get them connected to a network here at NetApp’s internal labs, you have to file a ticket. This should sound familiar to many people, as it’s how real-world IT works.

So I filed the ticket and, eventually, the cards were connected. However, just like in real-world IT, the network team has no idea what the storage team (me) has configured, and the storage team (me) has no idea how the network team has things configured. So we had to troubleshoot a bit to get the cards to ping correctly. It turns out they had a VLAN tag on the ports that wasn’t needed. We removed that, fixed the port channel, and cool! We now had two 10Gb LACP interfaces on a 2-node cluster!

Not so fast…

Turns out, ping is a great test for basic connectivity, but it’s awful for checking whether stuff *actually works*. In this case, I could ping via the 10Gb interfaces and even mount via NFSv3, list directories, etc. But those are lightweight metadata operations.

Whenever I tried a heavier operation like a READ, WRITE or READDIRPLUS (incidentally, tab completion for a path when typing a command on an NFS mount? That’s a READDIRPLUS call), the client would hang indefinitely. When I would CTRL+C out of the command, the process would sometimes also hang, and subsequent operations, including GETATTR, LOOKUP, etc., would hang as well.

So, now I had a robust network that couldn’t even perform tab completions.

Narrowing down the issue

I like to start with a packet trace, as that gives me a hint where to focus my efforts. In this issue, I started a packet capture on both the client (10.63.150.161) and the cluster (10.193.67.218). In the traces, I saw some duplicate ACKs, as well as packets being sent but not replied to:

[Image: client-side trace showing duplicate ACKs and an unanswered READDIRPLUS]

In the corresponding filer trace, I saw the READDIRPLUS call come in and get replied to, and then re-transmitted a bunch of times. But, as the trace above shows, the client never receives it.

[Image: filer-side trace showing the READDIRPLUS reply being retransmitted]

That means the filer is doing what it’s supposed to. The client is doing what it’s supposed to. But the network is blocking or dropping the packet for some reason.

When troubleshooting any issue, you have to start with a few basic steps (even though I like to start with the more complicated packet capture).

For instance…

What changed?

Well, this one was easy – I had added an entire new network into the mix, end to end. My previous ports were 1Gb and worked fine. This was 10Gb infrastructure, with LACP and jumbo frames, and I had no control over that network. Thus, I was left with client and server troubleshooting for now. I didn’t want to file another ticket before I had done my due diligence, in case I had done something stupid (totally within the realm of possibility, naturally).

So where did I go from there?

Start at layers 1, 2 and 3

The OSI model is something I used to take for granted as something interviewers asked because it seemed like a good question to stump people on. However, over the course of the last 10 years, I’ve come to realize it’s useful. What I was troubleshooting was NFS, which is all the way at layer 7 (the application layer).


So why start at layers 1-3? Why not start where my problem is?

Because with years of experience, you learn that the issue is rarely at the layer you’re seeing the issue manifest. It’s almost always farther down the stack. Where do you think the “Is it plugged in?” joke comes from?


Layer 1 means, essentially, is it plugged in? In this case, yes, it was. But it also means “are we seeing errors on the interfaces that are plugged in?” In ONTAP, you can see that with this command:

ontap9-tme-8040::*> node run * ifstat e2a
Node: ontap9-tme-8040-01

-- interface e2a (8 days, 23 hours, 14 minutes, 30 seconds) --

RECEIVE
 Frames/second: 1 | Bytes/second: 30 | Errors/minute: 0
 Discards/minute: 0 | Total frames: 84295 | Total bytes: 7114k
 Total errors: 0 | Total discards: 0 | Multi/broadcast: 0
 No buffers: 0 | Non-primary u/c: 0 | L2 terminate: 9709
 Tag drop: 0 | Vlan tag drop: 0 | Vlan untag drop: 0
 Vlan forwards: 0 | CRC errors: 0 | Runt frames: 0
 Fragment: 0 | Long frames: 0 | Jabber: 0
 Error symbol: 0 | Illegal symbol: 0 | Bus overruns: 0
 Queue drop: 0 | Xon: 0 | Xoff: 0
 Jumbo: 0 | JMBuf RxFrames: 0 | JMBuf DrvCopy: 0
TRANSMIT
 Frames/second: 82676 | Bytes/second: 33299k | Errors/minute: 0
 Discards/minute: 0 | Total frames: 270m | Total bytes: 1080g
 Total errors: 0 | Total discards: 0 | Multi/broadcast: 4496
 Queue overflows: 0 | No buffers: 0 | Xon: 0
 Xoff: 0 | Jumbo: 13 | TSO non-TCP drop: 0
 Split hdr drop: 0 | Pktlen: 0 | Timeout: 0
 Timeout1: 0 | Stray Cluster Pk: 0
DEVICE
 Rx MBuf Sz: Large (3k)
LINK_INFO
 Current state: up | Up to downs: 22 | Speed: 10000m
 Duplex: full | Flowcontrol: none

In this case, the interface is pretty clean. No errors, no “no buffers,” no CRC errors, etc. I can also see that the port is “up.” The up-to-downs are high, but that’s because I’ve been adding and removing this port from the ifgrp multiple times, which leads me to the next step…

Layer 2/3

Layer 2 includes the LACP/port channel, as well as the MTU settings. Layer 3 can include pings and routing, as well as some switches.

Since the port channel was a new change, I had the networking team verify that it was configured properly, with the correct ports added to the channel. I also made sure that the MTU was 9216 on the switch ports, as well as on the ports on the client and storage. Those all checked out.
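
On the ONTAP side, a quick way to eyeball MTU across all ports is:

::> network port show -fields mtu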

However, that doesn’t mean we’re done with layer 2; remember, basic pings worked fine, but those use small packets that fit comfortably within a 1500 MTU, so they don’t actually test jumbo frames. The issue was that any non-metadata NFS operation never made it back to the client; that suggests a network issue somewhere.

I didn’t mention it before, but this cluster also has a properly working 1Gb network at 1500 MTU on the same subnet, so that told me routing was likely not an issue. And because the client was able to send information just fine and had used its 10Gb connection for a while, the issue likely wasn’t on the network segment the client was connected to. The problem resided somewhere between the filer’s 10Gb ports and the new switch those ports were connected to. (Remember… what changed?)

Jumbo frames

From my experience with troubleshooting and general IT knowledge, I knew that for jumbo frames to work properly, they had to be configured up and down the entire stack of the network. I knew the client was configured for jumbo frames properly because it was a known entity that had been chugging along just fine. I also knew that the filer had jumbo frames enabled because I had control over those ports.

What I wasn’t sure of was if the switch had jumbo frames configured for the entire stack. I knew the switch ports were fine, but what about the switch uplinks?

Luckily, ping can tell us. Did you know you could ping using specific packet sizes?

Pinging MTU in Windows

To ping using a packet size in Windows, use:

ping -f -l [size] [address]

-f means “don’t fragment the packet”: if I am sending a jumbo-sized packet, don’t break it up into pieces to fit. If you ping using -f and a large packet size, that packet needs to be able to squeeze through the network MTU. If it can’t, you’ll see this:

C:\>ping -f -l 9000 10.193.67.218

Pinging 10.193.67.218 with 9000 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

Ping statistics for 10.193.67.218:
 Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)

Then, try pinging with only -l (which specifies the packet size and allows fragmentation). If that works while the -f ping fails, you have a good idea that your issue is MTU size. Note: My Windows client didn’t have jumbo frames enabled, so I didn’t bother using it to troubleshoot.

Pinging MTU in Linux

To ping using a packet size in Linux, use:

ping [-M do] [-s <packet size>] [host]

In Linux, -M do takes the place of -f: it prohibits fragmentation, so a jumbo frame won’t be broken up into pieces to fit and has to squeeze through the network MTU as-is.

-M <hint>: Select Path MTU Discovery strategy.

<hint> may be either “do” (prohibit fragmentation, even local one), “want” (do PMTU discovery, fragment locally when packet size is large), or “dont” (do not set DF flag).

Keep in mind that the packet size you specify won’t be *exactly* 9000; there’s some overhead involved. In the case of Linux, we’re dealing with 28 bytes (20 bytes of IP header plus 8 bytes of ICMP header). So a size of 9000 will actually come across as 9028 and complain about the packet being too long:

# ping -M do -s 9000 10.193.67.218
PING 10.193.67.218 (10.193.67.218) 9000(9028) bytes of data.
ping: local error: Message too long, mtu=9000

Instead, ping jumbo frames using 9000 – 28 = 8972:

# ping -M do -s 8972 10.193.67.218
PING 10.193.67.218 (10.193.67.218) 8972(9000) bytes of data.
^C
--- 10.193.67.218 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1454ms

In this case, I lost 100% of my packets. Now, let’s ping using 1500 – 28 = 1472:

# ping -M do -s 1472 10.193.67.218
PING 10.193.67.218 (10.193.67.218) 1472(1500) bytes of data.
1480 bytes from 10.193.67.218: icmp_seq=1 ttl=249 time=0.778 ms
^C
--- 10.193.67.218 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 590ms
rtt min/avg/max/mdev = 0.778/0.778/0.778/0.000 ms

All good! Just to make sure, I pinged a known working client that has jumbo frames enabled end to end:

# ping -M do -s 8972 10.63.150.168
PING 10.63.150.168 (10.63.150.168) 8972(9000) bytes of data.
8980 bytes from 10.63.150.168: icmp_seq=1 ttl=64 time=1.12 ms
8980 bytes from 10.63.150.168: icmp_seq=2 ttl=64 time=0.158 ms
^C
--- 10.63.150.168 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1182ms
rtt min/avg/max/mdev = 0.158/0.639/1.121/0.482 ms

Looks like I have data pointing to jumbo frame configuration as my issue. And if you’ve ever dealt with a networking team, you’d better bring data. 🙂

Resolving the issue

The network team confirmed that the switch uplink was indeed not set to support jumbo frames. The change was going to take a bit of time, so rather than wait until then, I switched my ports to 1500 in the interim and everything was happy again. Once the jumbo frames get enabled on the cluster’s network segment, I can re-enable them on the cluster.

Where else can this issue crop up?

MTU mismatch is a colossal PITA. It’s hard to remember to look for it and hard to diagnose, especially if you don’t have access to all of the infrastructure.

In ONTAP, specifically, I’ve seen MTU mismatch break:

  • CIFS setup/performance
  • NFS operations
  • SnapMirror replication

Pretty much anything you do over a network can be affected, so if you run into a problem all the way up at the application layer, remember the OSI model and start with the following:

  • Check layers 1-3
  • Ask yourself “what changed?”
  • Compare against working configurations, if possible