How to see data transfer on an ONTAP cluster network

ONTAP clusters use a backend cluster network so that multiple HA pairs can communicate, which provides more scale for performance and capacity. You can nondisruptively add new nodes (and, as a result, capacity and compute) to a cluster, and data remains accessible regardless of where you connect in the cluster. You can scale up to 24 nodes for NAS-only clusters and mix different HA pair types in the same cluster if you want to offer different service levels for storage (such as performance tiers, capacity tiers, etc.).

Network interfaces that serve data to clients live on physical ports on nodes and are floating/virtual IP addresses that can move to/from any node in the cluster. File systems for NAS are defined by Storage Virtual Machines (SVMs) and volumes. The SVMs own the IP addresses you would use to access data.

When using NAS (CIFS/SMB/NFS) for data access, you can connect to a data interface in the SVM that lives on any node in the cluster, regardless of where the data volume resides.

When you access a NAS volume on a data interface on the same node as the data volume, ONTAP can “cheat” a little and directly interact with that volume without having to do extra work.

If that data interface is on a different node than the one where the volume resides, then the NAS packet gets packaged up in a proprietary protocol and shipped over the cluster network backend to the node where the volume lives. This volume/node relationship is stored in an internal database in ONTAP so we always have a map to find volumes quickly. Once the NAS packet arrives on the destination node, it gets unpackaged and processed, and then the response to the client goes back out the way it came.
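To make the local-versus-remote decision concrete, here is a toy model in Python. It is only a sketch: ONTAP's actual volume location database and cluster protocol are internal, and the volume names, node names, and function below are all invented for illustration.

```python
# Toy model only: ONTAP's real volume location database is internal.
# The volume and node names here are hypothetical.
VOLUME_LOCATION = {
    "vol_data": "node1",  # volume -> node that owns it
    "vol_logs": "node2",
}


def route_request(lif_node, volume):
    """Return 'local' if the data interface and the volume share a node,
    otherwise 'remote' (the request would cross the cluster network)."""
    owner = VOLUME_LOCATION[volume]
    return "local" if owner == lif_node else "remote"


if __name__ == "__main__":
    print(route_request("node1", "vol_data"))
    print(route_request("node1", "vol_logs"))
```

The point of the sketch is simply that the lookup is cheap; the extra cost of a remote request comes from the packaging and the trip across the cluster network.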

Traversing the cluster network has a bit of a latency cost, however, as the packaging/unpackaging/traversal takes more time than a local request. This shows up as slightly lower performance for those workloads. The impact is negligible in most environments, but latency-sensitive applications might see some noticeable performance degradation.

There are protocol features that help mitigate the remote I/O that can occur in a cluster, such as SMB node referrals and pNFS. But in scenarios where you can’t use either of those (SMB node referrals didn’t use Kerberos in earlier Windows versions; pNFS needs NFSv4.1 or later), you’re likely to have remote cluster traffic. As mentioned, in most cases this isn’t an issue, but it’s useful to have an easy way to find out whether an ONTAP cluster is sending traffic over the cluster network.

Cluster level – Statistics show-periodic

To get a cluster-wide view of whether there is remote traffic on the cluster, you can use the advanced privilege command “statistics show-periodic.” This command gives a wealth of information by default, such as:

  • CPU average/busy
  • Total ops/NFS ops/CIFS ops
  • FlexCache ops
  • Total data received/sent (Data and cluster network throughput)
  • Data received/sent (Data throughput only)
  • Cluster received/sent (Cluster throughput only)
  • Cluster busy % (how busy the cluster network is)
  • Disk reads/writes
  • Packets sent/received

We also have options to limit the intervals, define SVMs/vservers, etc.

::*> statistics show-periodic ?
[[-object] ] *Object
[ -instance ] *Instance
[ -counter ] *Counter
[ -preset ] *Preset
[ -node ] *Node
[ -vserver ] *Vserver
[ -interval ] *Interval in Seconds (default: 2)
[ -iterations ] *Number of Iterations (default: 0)
[ -summary {true|false} ] *Print Summary (default: true)
[ -filter ] *Filter Data

But for backend cluster traffic, we only care about a few of those, so we can filter the iterations for only what we want to view. In this case, I just want to look at the data sent/received and the cluster busy %.

::*> statistics show-periodic -counter total-recv|total-sent|data-recv|data-sent|cluster-recv|cluster-sent|cluster-busy

When I do that, I get a cleaner, easier-to-read capture. This is what it looks like when we have remote traffic. This is an NFSv4.1 workload without pNFS, using a mount wsize of 64K.

cluster1: cluster.cluster: 5/11/2021 14:01:49
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
    157MB   4.85MB    148MB   3.46MB      0%   8.76MB   1.39MB
    241MB   70.2MB    197MB   4.68MB      1%   43.1MB   65.5MB
    269MB    111MB    191MB   4.41MB      4%   78.1MB    107MB
    329MB   92.5MB    196MB   4.52MB      4%    133MB   88.0MB
    357MB    117MB    246MB   5.68MB      2%    111MB    111MB
    217MB   27.1MB    197MB   4.55MB      1%   20.3MB   22.5MB
    287MB   30.4MB    258MB   5.91MB      1%   28.7MB   24.5MB
    205MB   28.1MB    176MB   4.03MB      1%   28.9MB   24.1MB
cluster1: cluster.cluster: 5/11/2021 14:01:57
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
Minimums:
    157MB   4.85MB    148MB   3.46MB      0%   8.76MB   1.39MB
Averages for 8 samples:
    258MB   60.3MB    201MB   4.66MB      1%   56.5MB   55.7MB
Maximums:
    357MB    117MB    258MB   5.91MB      4%    133MB    111MB

As we can see, an average of 55.7MB is sent and 56.5MB is received over the cluster network each second; this accounts for an average of 1% of the available bandwidth, which means we have plenty of cluster network bandwidth left over.
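If you want to work out how much of the incoming traffic crossed the cluster network, a small Python sketch like this one can do the arithmetic. The rows are copied from the capture above; the column positions and the unit handling (treating KB/MB/GB as multiples of 1024) are my assumptions about the output format, not something the command documents here.

```python
import re

# Sample rows copied from the remote-traffic capture above.
# Columns: total recv, total sent, data recv, data sent,
#          cluster busy, cluster recv, cluster sent
ROWS = """\
    157MB   4.85MB    148MB   3.46MB      0%   8.76MB   1.39MB
    241MB   70.2MB    197MB   4.68MB      1%   43.1MB   65.5MB
    269MB    111MB    191MB   4.41MB      4%   78.1MB    107MB
    329MB   92.5MB    196MB   4.52MB      4%    133MB   88.0MB
    357MB    117MB    246MB   5.68MB      2%    111MB    111MB
    217MB   27.1MB    197MB   4.55MB      1%   20.3MB   22.5MB
    287MB   30.4MB    258MB   5.91MB      1%   28.7MB   24.5MB
    205MB   28.1MB    176MB   4.03MB      1%   28.9MB   24.1MB
"""

# Assumption: units are binary multiples. Only MB appears in these rows.
UNIT_TO_MB = {"B": 1 / (1024 * 1024), "KB": 1 / 1024, "MB": 1.0, "GB": 1024.0}


def to_mb(value):
    """Convert a string such as '8.76MB' or '555KB' to megabytes."""
    match = re.fullmatch(r"([\d.]+)(B|KB|MB|GB)", value)
    if match is None:
        raise ValueError(f"unrecognized throughput value: {value!r}")
    return float(match.group(1)) * UNIT_TO_MB[match.group(2)]


def column_average(rows, index):
    """Average one whitespace-separated column across all non-empty rows."""
    values = [to_mb(line.split()[index]) for line in rows.splitlines() if line.strip()]
    return sum(values) / len(values)


total_recv = column_average(ROWS, 0)
cluster_recv = column_average(ROWS, 5)
cluster_share = cluster_recv / total_recv

if __name__ == "__main__":
    print(f"total recv avg:   {total_recv:.1f} MB/s")
    print(f"cluster recv avg: {cluster_recv:.1f} MB/s")
    print(f"share of received traffic on the cluster network: {cluster_share:.0%}")
```

The averages are computed from the rounded values that were printed, so they can differ slightly from the summary rows that ONTAP reports.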

When we look at the latency for this workload, this is what we see (using qos statistics latency show):

Policy Group            Latency
-------------------- ----------
-total-                364.00us
extreme-fixed          364.00us
-total-                619.00us
extreme-fixed          619.00us
-total-                490.00us
extreme-fixed          490.00us
-total-                409.00us
extreme-fixed          409.00us
-total-                422.00us
extreme-fixed          422.00us
-total-                474.00us
extreme-fixed          474.00us
-total-                412.00us
extreme-fixed          412.00us
-total-                372.00us
extreme-fixed          372.00us
-total-                475.00us
extreme-fixed          475.00us
-total-                436.00us
extreme-fixed          436.00us
-total-                474.00us
extreme-fixed          474.00us

This is what the cluster network looks like when I use pNFS for data locality:

cluster1: cluster.cluster: 5/11/2021 14:18:19
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
    208MB   6.24MB    206MB   4.76MB      0%   1.56MB   1.47MB
    214MB   5.37MB    213MB   4.85MB      0%    555KB    538KB
    214MB   6.27MB    213MB   4.80MB      0%   1.46MB   1.47MB
    219MB   5.95MB    219MB   5.40MB      0%    572KB    560KB
    318MB   8.91MB    317MB   7.44MB      0%   1.46MB   1.47MB
    203MB   5.16MB    203MB   4.62MB      0%    560KB    548KB
    205MB   6.09MB    204MB   4.64MB      0%   1.44MB   1.45MB
cluster1: cluster.cluster: 5/11/2021 14:18:26
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
Minimums:
    203MB   5.16MB    203MB   4.62MB      0%    555KB    538KB
Averages for 7 samples:
    226MB   6.28MB    225MB   5.22MB      0%   1.08MB   1.07MB
Maximums:
    318MB   8.91MB    317MB   7.44MB      0%   1.56MB   1.47MB

There is barely any cluster traffic other than the normal cluster operations. The “data” and “total” sent/received values are nearly identical.

And the latency averaged about 0.1 ms lower (roughly 330us with pNFS versus roughly 450us without it).

Policy Group            Latency
-------------------- ----------

-total-                323.00us
extreme-fixed          323.00us
-total-                323.00us
extreme-fixed          323.00us
-total-                325.00us
extreme-fixed          325.00us
-total-                336.00us
extreme-fixed          336.00us
-total-                325.00us
extreme-fixed          325.00us
-total-                328.00us
extreme-fixed          328.00us
-total-                334.00us
extreme-fixed          334.00us
-total-                341.00us
extreme-fixed          341.00us
-total-                336.00us
extreme-fixed          336.00us
-total-                330.00us
extreme-fixed          330.00us
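The latency difference between the two runs can be checked with a few lines of Python. The samples below are the -total- values copied from the two qos statistics latency show captures above; nothing else is assumed.

```python
from statistics import mean

# -total- latency samples in microseconds, copied from the captures above.
remote_us = [364, 619, 490, 409, 422, 474, 412, 372, 475, 436, 474]  # no pNFS
pnfs_us = [323, 323, 325, 336, 325, 328, 334, 341, 336, 330]         # with pNFS

remote_avg_us = mean(remote_us)
pnfs_avg_us = mean(pnfs_us)
delta_ms = (remote_avg_us - pnfs_avg_us) / 1000

if __name__ == "__main__":
    print(f"without pNFS: {remote_avg_us:.1f} us")
    print(f"with pNFS:    {pnfs_avg_us:.1f} us")
    print(f"difference:   {delta_ms:.2f} ms")
```

Beyond the average, the pNFS samples are also much tighter (323us to 341us) than the remote samples (364us to 619us).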

Try it out and see for yourself! If you have questions or comments, enter them below.

Behind the Scenes – Episode 288 – ONTAP System Manager 9.9.1

Welcome to Episode 288, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week, NetApp Principal TME Chris Gebhardt (@chrisgeb), PM Aniket Singh (aniket.singh@netapp.com) and TME Yizhao Zhuang (yizhao.zhuang@netapp.com) join us to discuss the latest ONTAP System Manager 9.9.1 changes.

For more information about System Manager:

https://docs.netapp.com/us-en/ontap/

Podcast Transcriptions

If you want a searchable transcript of the episode, check it out here (just set expectations accordingly):

Episode 288 – ONTAP System Manager 9.9.1 – Transcript

Just use the search field to look for words you want to read more about. (For example, search for “storage”)

Be sure to give us feedback on the transcription (or let us know if you need a full text transcript, since Gong does not support sharing those yet) in the comments here or via podcast@netapp.com! If you have requests for other previous episode transcriptions, let me know!

Tech ONTAP Community

We also now have a presence on the NetApp Communities page. You can subscribe there to get emails when we have new episodes.

Tech ONTAP Podcast Community

Finding the Podcast

You can find this week’s episode here:

You can also find the Tech ONTAP Podcast on:

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Running PowerShell from Linux to Query SMB Shares in NetApp ONTAP

I recently got a question about how to perform the following scenario:

  • Run a script from Linux that calls PowerShell on a remote Windows client using Kerberos
  • Remote Windows client uses PowerShell to authenticate against an ONTAP SMB share

That’s some Inception-style IT work.

The issue they were having was that the credentials used to connect to the Windows client weren’t passing through to the ONTAP system. As a result, they’d get “Access Denied” in their script when attempting to access the share. I figured out how to get this working and rather than let that knowledge rot in the far reaches of my brain, I’m writing this up, since in my Google hunt, I found lots of people had similar issues with Linux PowerShell (not necessarily to ONTAP).

This is a known issue with some workarounds listed here:

Making the second hop in PowerShell Remoting

One workaround is to use “Resource-based Kerberos constrained delegation,” where you basically tell the 3rd server to accept delegated credentials from the 2nd server via the PrincipalsAllowedToDelegateToAccount parameter in the ADComputer cmdlets. We’ll cover that in a bit, but first…

WAIT. I can run PowerShell on Linux???

Well, yes! And this article tells you how to install it:

Installing PowerShell on Linux

Now, the downside is that not all PowerShell modules are available from Linux (for example, ActiveDirectory isn’t currently available). But it works!

PS /> New-PSSession -ComputerName COMPUTER -Credential administrator@NTAP.LOCAL -Authentication Kerberos

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **

Id Name     Transport ComputerName ComputerType  State  ConfigurationName    Availability
-- ----     --------- ------------ ------------  ----- --------------------- ------------
9 Runspace9 WSMan     COMPUTER     RemoteMachine Opened Microsoft.PowerShell Available

In that document, they don’t list CentOS/RHEL 8, which can be problematic, as you might run into some issues with the SSL libraries (This blog calls one of those issues out, as well as a few others).

On my CentOS 8.3 box, I ran into this issue:

New-PSSession: This parameter set requires WSMan, and no supported WSMan client library was found. WSMan is either not installed or unavailable for this system.

Using the guidance from the blog listed earlier, I found that there were a couple of files not found:

# ldd /opt/microsoft/powershell/7/libmi.so
…
libssl.so.1.0.0 => not found
libcrypto.so.1.0.0 => not found
…
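If you want to script that check, here is a minimal Python sketch that pulls the missing library names out of ldd output. The function names are mine, and the parsing assumes the "name => not found" layout shown above.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd against a file and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=False)
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Same file as in the example above.
    print(check_binary("/opt/microsoft/powershell/7/libmi.so"))
```

An empty list means every dependency resolved; anything else is a candidate for the workaround described next.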

That blog lists 1.0.2 as what is needed and looks to be using a different Linux flavor. You can find the files you need/where they live with:

# find / -name '*libssl.so.1.*'
/usr/lib64/.libssl.so.1.1.hmac
/usr/lib64/libssl.so.1.1
/usr/lib64/libssl.so.1.1.1g
/usr/lib64/.libssl.so.1.1.1g.hmac
/opt/microsoft/powershell/7/libssl.so.1.0.0

Then you can use the symlink workaround and those files show up properly with ldd:

ln -s libssl.so.1.1 libssl.so.1.0.0
ln -s libcrypto.so.1.1 libcrypto.so.1.0.0
ldd /opt/microsoft/powershell/7/libmi.so
...
libssl.so.1.0.0 => /lib64/libssl.so.1.0.0 (0x00007f41ce3fc000)
libcrypto.so.1.0.0 => /lib64/libcrypto.so.1.0.0 (0x00007f41cdf16000)
...

However, authenticating with the server also requires an additional step.

Authenticating Linux PowerShell with Windows

You can authenticate to Windows servers with Linux PowerShell using the following methods:

  • Basic
  • Default
  • Kerberos
  • CredSSP
  • Digest
  • Negotiate

Here’s how each auth method works (or doesn’t) without doing anything else.

Method     Result
---------  -----------------------------------------------------------------------
Basic      New-PSSession: Basic authentication is not supported over HTTP on Unix.
Default    New-PSSession: MI_RESULT_ACCESS_DENIED
CredSSP    New-PSSession: MI_RESULT_ACCESS_DENIED
Digest     New-PSSession: MI_RESULT_ACCESS_DENIED
Kerberos   Authorization failed Unspecified GSS failure.
Negotiate  Authorization failed Unspecified GSS failure.

Auth methods with Linux PowerShell and results with no additional configs

In several places, I’ve seen the recommendation to install gssntlmssp on the Linux client, which works fine for “Negotiate” methods:

PS /> New-PSSession -ComputerName SERVER -Credential administrator@NTAP.LOCAL -Authentication Negotiate

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **

Id Name Transport ComputerName ComputerType State ConfigurationName Availability
-- ---- --------- ------------ ------------ ----- ----------------- ------------
2 Runspace2 WSMan SERVER       RemoteMachine Opened Microsoft.PowerShell Available

But not for Kerberos:

PS /> New-PSSession -ComputerName SERVER -Credential administrator@NTAP.LOCAL -Authentication Kerberos

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **

New-PSSession: [SERVER] Connecting to remote server SERVER failed with the following error message : Authorization failed Unspecified GSS failure. Minor code may provide more information Configuration file does not specify default realm For more information, see the about_Remote_Troubleshooting Help topic.
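The "Configuration file does not specify default realm" part of that error points at the Kerberos client configuration. As a quick sanity check, this Python sketch looks for a default_realm setting in krb5.conf text. It is a simplification: it assumes the usual /etc/krb5.conf location and does not verify that the setting sits inside the [libdefaults] section.

```python
import re


def default_realm(krb5_conf_text):
    """Return the default_realm value from krb5.conf text, or None if unset."""
    for line in krb5_conf_text.splitlines():
        stripped = line.strip()
        # Lines starting with '#' or ';' are comments in krb5.conf.
        if stripped.startswith("#") or stripped.startswith(";"):
            continue
        match = re.match(r"default_realm\s*=\s*(\S+)", stripped)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Assumption: the client uses the standard config path.
    with open("/etc/krb5.conf") as conf:
        print(default_realm(conf.read()))
```

If this prints None, the Kerberos library has no realm to fall back on, which matches the failure above.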

The simplest way to get around this is to add the Linux client to the Active Directory domain. Then you can use Kerberos for authentication to the client (at least with a user that has the correct permissions, such as a domain administrator).

*Alternately, you could do all this manually, which I don’t recommend.

**For non-AD KDCs, config methods will vary.

# realm join NTAP.LOCAL
Password for Administrator:

# pwsh
PowerShell 7.1.3
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS /> New-PSSession -ComputerName SERVER -Credential administrator@NTAP.LOCAL -Authentication Kerberos

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **********


 Id Name            Transport ComputerName    ComputerType    State         ConfigurationName     Availability
 -- ----            --------- ------------    ------------    -----         -----------------     ------------
  1 Runspace1       WSMan     SERVER          RemoteMachine   Opened        Microsoft.PowerShell     Available


So, now that we know we can establish a session to the Windows server where we want to leverage PowerShell, now what?

Double-hopping with Kerberos using Delegation

One way to use Kerberos across multiple servers (including NetApp ONTAP) is to leverage the PrincipalsAllowedToDelegateToAccount parameter.

The script I’ll use does a basic “Get-Content” call to a file in an SMB/CIFS share in ONTAP (similar to “cat” in Linux).

If I don’t set the PrincipalsAllowedToDelegateToAccount parameter, a credential passed from Linux PowerShell to a Windows server to ONTAP will use Kerberos -> NTLM (with a NULL user) for the authentication and this is the end result:

# pwsh test.ps1

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **********

Test-Path: Access is denied
False
Get-Content: Access is denied
Get-Content: Cannot find path '\\DEMO\files\file-symlink.txt' because it does not exist.

In a packet capture, we can see the session setup uses NULL with NTLMSSP:

14    0.031496   x.x.x.x   x.x.x.y      SMB2 289  Session Setup Request, NTLMSSP_AUTH, User: \

And here’s what the ACCESS_DENIED looks like:

20    0.043026   x.x.x.x   x.x.x.y      SMB2 166  Tree Connect Request Tree: \\DEMO\files
21    0.043217   x.x.x.y   x.x.x.x      SMB2 131  Tree Connect Response, Error: STATUS_ACCESS_DENIED
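When a capture is large, it can help to flag the NULL-user session setups automatically. This Python sketch scans packet summary lines (for example, text exported from Wireshark or tshark) for an NTLMSSP session setup with an empty domain and user. The regular expression is based only on the summary format shown above, so treat it as an assumption about how your tool prints the line.

```python
import re

# Matches the Info text for an NTLMSSP session setup: "User: DOMAIN\user".
# A NULL session shows up as "User: \" (empty domain and empty user).
NTLM_USER = re.compile(r"Session Setup Request, NTLMSSP_AUTH, User: (\S*)\\(\S*)")


def is_null_ntlm_session(summary_line):
    """True if the line is an NTLMSSP session setup with no domain or user."""
    match = NTLM_USER.search(summary_line)
    return bool(match) and match.group(1) == "" and match.group(2) == ""


if __name__ == "__main__":
    lines = [
        "14 0.031496 x.x.x.x x.x.x.y SMB2 289 Session Setup Request, NTLMSSP_AUTH, User: \\",
        "20 0.043026 x.x.x.x x.x.x.y SMB2 166 Tree Connect Request Tree: \\\\DEMO\\files",
    ]
    for line in lines:
        print(is_null_ntlm_session(line), line)
```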

To use Kerberos passthrough/delegation, I run this PowerShell command to set the parameter on the destination (ONTAP) CIFS server:

Set-ADComputer -Identity DEMO -PrincipalsAllowedToDelegateToAccount SERVER$

That allows the SMB session to ONTAP to set up using Kerberos auth:

2603 26.877660 x.x.x.x x.x.x.y SMB2 2179 Session Setup Request
2673 26.909735 x.x.x.y x.x.x.x SMB2 326 Session Setup Response
supportedMech: 1.2.840.48018.1.2.2 (MS KRB5 - Microsoft Kerberos 5)

And the tree connect succeeds (you may need to run klist purge on the Windows client):

2674 26.910117 x.x.x.x x.x.x.y SMB2 154 Tree Connect Request Tree: \\demo\files
2675 26.910630 x.x.x.y x.x.x.x SMB2 138 Tree Connect Response

This is the result from the Linux client:

# pwsh test.ps1

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **********

True
This is a file symlink.

So, how do we work around this issue if we can’t delegate Kerberos?

Using the NULL user and NTLM

Remember when I said the request without Kerberos delegation used the NULL user and NTLMSSP?

14    0.031496   x.x.x.x   x.x.x.y      SMB2 289  Session Setup Request, NTLMSSP_AUTH, User: \ 

The reason we saw “Access Denied” to the ONTAP CIFS/SMB share is because ONTAP disallows the NULL user by default. However, in ONTAP 9.0 and later, you can enable NULL user authentication, as described in this KB article:

How to grant access to NULL (Anonymous) user in Clustered Data ONTAP

Basically, it’s a simple two-step process:

  1. Create a name mapping rule for ANONYMOUS LOGON
  2. Set the Windows default NULL user in the CIFS options

Here’s how I did it in my SVM (address is the Windows client IP):

::*> vserver name-mapping create -vserver DEMO -direction win-unix -position 3 -pattern "ANONYMOUS LOGON" -replacement pcuser -address x.x.x.x/24

The name you set for the NULL user needs to be a valid Windows user.

::*> cifs options modify -vserver DEMO -win-name-for-null-user NTAP\powershell-user

You can verify ONTAP can find it with:

::*> access-check authentication translate -node node1 -vserver DEMO -win-name NTAP\powershell-user
S-1-5-21-3552729481-4032800560-2279794651-1300

Once that’s done, we authenticate with NTLM and get access with the NULL user:

14 0.009012 x.x.x.x x.x.x.y SMB2 289 Session Setup Request, NTLMSSP_AUTH, User: \
27 0.075264 x.x.x.x x.x.x.y SMB2 166 Tree Connect Request Tree: \\DEMO\files
28 0.075747 x.x.x.y x.x.x.x SMB2 138 Tree Connect Response

And the Linux client is able to run the PowerShell calls:

# pwsh test.ps1

PowerShell credential request
Enter your credentials.
Password for user administrator@NTAP.LOCAL: **********

True
This is a file symlink.

Questions? Comments? Add them below!

Behind the Scenes – Episode 287 – Splunk SmartStore and NetApp StorageGrid

Welcome to Episode 287, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week, Josh Atwell (@josh_atwell), Steven Pruchniewski (@webscalesteve), James Bradshaw (james.bradshaw@netapp.com), Raj Grewal (raj.grewal@netapp.com) and Joseph Kandatilparambil (joseph.kandatilparambil@netapp.com) all join us to discuss Splunk SmartStore and the certification work being done with NetApp StorageGrid.

For more information about StorageGrid:

Podcast Transcriptions

If you want a searchable transcript of the episode, check it out here (just set expectations accordingly):

Episode 287 – Splunk SmartStore with NetApp StorageGrid – Transcript

How to Map File and Folder Locations to NetApp ONTAP FlexGroup Member Volumes with XCP

The concept behind a NetApp FlexGroup volume is that ONTAP presents a single large namespace for NAS data, while ONTAP handles the balance and placement of files and folders to the underlying FlexVol member volumes, rather than a storage administrator needing to manage that.

I cover it in more detail in:

There’s also this USENIX presentation:

However, while not knowing/caring where files and folders live in your cluster is nice most of the time, there are occasions where you may need to figure out where a file or folder *actually* lives in the cluster – such as if a member volume has a large imbalance of capacity usage and you need to know what files need to be deleted/moved out of that volume. Previously, there’s been no real good way to do that, but thanks to the efforts of one of our global solutions architects (and one of the inventors of XCP), we now have a way and we don’t even need a treasure map.

What is NetApp XCP?

If you’re unfamiliar with NetApp XCP, it’s NetApp’s FREE copy utility/data mover that can also be used to do file analytics. There are other use cases, too:

Using XCP to delete files en masse: A race against rm

How to find average file size and largest file size using XCP

Because XCP can run in parallel from a client, it can perform tasks (such as find) much faster in high file count environments, so you’re not sitting around waiting for a command to finish for minutes/hours/days.

Since a FlexGroup is pretty much made for high file count environments, we’d want a way to quickly find files and their locations.

ONTAP NAS and File Handles

In How to identify a file or folder in ONTAP in NFS packet traces, I covered how to find inode information and a little bit about how ONTAP file handles are created/presented. The deep details aren’t super important here, but the general concept – that FlexGroup member volume information is stored in file handles that NFS can read – is.

Using that information and some parsing, there’s a Python script that can be used as an XCP plugin to translate file handles into member volume index numbers and present them in easy-to-read formats.

That Python script can be found here:

FlexGroup File Mapper

How to Use the “FlexGroup File Mapper” plugin with XCP

First of all, you’d need a client that has XCP installed. The version isn’t super important, but the latest release is generally the best release to use.

There are two methods we’ll use here to map files to member volumes.

  1. Scan All Files/Folders in a FlexGroup and Map Them All to Member Volumes
  2. Use a FlexGroup Member Volume Number and Find All Files in that Member Volume

To do this, I’ll use a FlexGroup that has ~2 million files.

::*> df -i FGNFS
Filesystem iused ifree %iused Mounted on Vserver
/vol/FGNFS/ 2001985 316764975 0% /FGNFS DEMO

Getting the XCP Host Ready

First, copy the FlexGroup File Mapper plugin to the XCP host. The file name isn’t important, but when you run the XCP command, you’ll either want to specify the plugin’s location or run the command from the folder the plugin lives in.

On my XCP host, I have the plugin named fgid.py in /testXCP:

# ls -la | grep fgid.py
-rw-r--r-- 1 502 admin 1645 Mar 25 17:34 fgid.py
# pwd
/testXCP

Scan All Files/Folders in a FlexGroup and Map Them All to Member Volumes

In this case, we’ll map all files and folders to their respective FlexGroup member volumes.

This is the command I use:

xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' 10.10.10.10:/exportname

You can also include -parallel (n) to control how many processes spin up to do this work, and you can use > filename at the end to redirect the output to a file (recommended).
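If you run these scans often, you could wrap the command in a small Python helper. This is only a sketch: the helper names are hypothetical, and it uses nothing beyond the xcp options shown in this post (diag -run, scan, -fmt, -parallel), with standard output redirected to a file the same way > filename does.

```python
import subprocess

# Same -fmt expression as the command above; xcp evaluates it per file.
FMT_EXPR = '"{} {}".format(x, fgid(x))'


def build_scan_command(export, plugin="fgid.py", parallel=None):
    """Build the argument list for an xcp scan that prints path and member index."""
    cmd = ["xcp", "diag", "-run", plugin, "scan", "-fmt", FMT_EXPR]
    if parallel is not None:
        cmd += ["-parallel", str(parallel)]
    cmd.append(export)
    return cmd


def run_scan(export, output_path, **options):
    """Run the scan and write its standard output to output_path."""
    with open(output_path, "w") as out:
        return subprocess.run(build_scan_command(export, **options), stdout=out, check=True)


if __name__ == "__main__":
    print(build_scan_command("10.10.10.10:/exportname", parallel=10))
```

Passing the arguments as a list avoids the shell quoting gymnastics around the -fmt expression.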

For example, scanning ~2 million files in this volume took just 37 seconds!

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' 10.10.10.10:/FGNFS > FGNFS.txt
402,061 scanned, 70.6 MiB in (14.1 MiB/s), 367 KiB out (73.3 KiB/s), 5s
751,933 scanned, 132 MiB in (12.3 MiB/s), 687 KiB out (63.9 KiB/s), 10s
1.10M scanned, 193 MiB in (12.2 MiB/s), 1007 KiB out (63.6 KiB/s), 15s
1.28M scanned, 225 MiB in (6.23 MiB/s), 1.14 MiB out (32.6 KiB/s), 20s
1.61M scanned, 283 MiB in (11.6 MiB/s), 1.44 MiB out (60.4 KiB/s), 25s
1.91M scanned, 335 MiB in (9.53 MiB/s), 1.70 MiB out (49.5 KiB/s), 31s
2.00M scanned, 351 MiB in (3.30 MiB/s), 1.79 MiB out (17.4 KiB/s), 36s
Sending statistics…

Xcp command : xcp diag -run fgid.py scan -fmt "{} {}".format(x, fgid(x)) 10.10.10.10:/FGNFS
Stats : 2.00M scanned
Speed : 351 MiB in (9.49 MiB/s), 1.79 MiB out (49.5 KiB/s)
Total Time : 37s.
STATUS : PASSED

The file created was 120MB, though… that’s a LOT of text to sort through.

-rw-r--r--. 1 root root 120M Apr 27 15:28 FGNFS.txt

So, there’s another way to do this, right? Correct!

If I know the folder I want to filter, or even a matching of file names, I can use -match in the command. In this case, I want to find all folders named dir_33.

This is the command:

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' -match "name=='dir_33'" 10.10.10.10:/FGNFS > dir_33_FGNFS.txt

This is the output of the file. Two folders – one in member volume 3, one in member volume 4:

# cat dir_33_FGNFS.txt
x.x.x.x:/FGNFS/files/client1/dir_33 3
x.x.x.x:/FGNFS/files/client2/dir_33 4

If I want to use pattern matching for file names (i.e., I know I want all files with “moarfiles3” in the name), then I can do this using regex and/or wildcards. More examples can be found in the XCP user guides.

Here’s the command I used. It found 444,400 files with that pattern in 27s.

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' -match "fnm('moarfiles3*')" 10.10.10.10:/FGNFS > moarfiles3_FGNFS.txt

507,332 scanned, 28,097 matched, 89.0 MiB in (17.8 MiB/s), 465 KiB out (92.9 KiB/s), 5s
946,796 scanned, 132,128 matched, 166 MiB in (15.4 MiB/s), 866 KiB out (80.1 KiB/s), 10s
1.31M scanned, 209,340 matched, 230 MiB in (12.8 MiB/s), 1.17 MiB out (66.2 KiB/s), 15s
1.73M scanned, 297,647 matched, 304 MiB in (14.8 MiB/s), 1.55 MiB out (77.3 KiB/s), 20s
2.00M scanned, 376,195 matched, 351 MiB in (9.35 MiB/s), 1.79 MiB out (48.8 KiB/s), 25s
Sending statistics…

Filtered: 444400 matched, 1556004 did not match

Xcp command : xcp diag -run fgid.py scan -fmt "{} {}".format(x, fgid(x)) -match fnm('moarfiles3*') 10.10.10.10:/FGNFS
Stats : 2.00M scanned, 444,400 matched
Speed : 351 MiB in (12.6 MiB/s), 1.79 MiB out (65.7 KiB/s)
Total Time : 27s.

And this is a sample of some of those entries (the file is 27MB):

x.x.x.x:/FGNFS/files/client1/dir_45/moarfiles3158.txt 3
x.x.x.x:/FGNFS/files/client1/dir_45/moarfiles3159.txt 3
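With output files this large, a per-member summary is easier to read than the raw list. Here is a minimal Python sketch that counts entries per member volume. It assumes the "path index" layout produced by the -fmt expression used above, and it splits on the last space so that paths containing spaces still work.

```python
from collections import Counter


def files_per_member(lines):
    """Count entries per FlexGroup member index from 'path index' lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        # Split on the LAST space so paths with spaces stay intact.
        _path, _sep, member = line.rpartition(" ")
        if member.isdigit():
            counts[int(member)] += 1
    return counts


if __name__ == "__main__":
    # File name taken from the example above.
    with open("moarfiles3_FGNFS.txt") as listing:
        for member, count in sorted(files_per_member(listing).items()):
            print(f"member {member}: {count} entries")
```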

I can also look for files over a certain size. In this volume, the files are all 4K in size, but in my TechONTAP volume, I have varying file sizes. In this case, I want to find all .wav files greater than 100MB. This command didn’t seem to redirect to a file for me, but the output was only 16 files.

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' -match "fnm('*.wav') and size > 100*M" 10.10.10.11:/techontap > TechONTAP_ep.txt

10.10.10.11:/techontap/Episodes/Episode 20x - Genomics Architecture/ep20x-genomics-meat.wav 4
10.10.10.11:/techontap/archive/combine.band/Media/Audio Files/ep104-webex.output.wav 5
10.10.10.11:/techontap/archive/combine.band/Media/Audio Files/ep104-mics.output.wav 3
10.10.10.11:/techontap/archive/Episode 181 - Networking Deep Dive/ep181-networking-deep-dive-meat.output.wav 6
10.10.10.11:/techontap/archive/Episode 181 - Networking Deep Dive/ep181-networking-deep-dive-meat.wav 2

Filtered: 16 matched, 7687 did not match

Xcp command : xcp diag -run fgid.py scan -fmt "{} {}".format(x, fgid(x)) -match fnm('*.wav') and size > 100*M 10.10.10.11:/techontap
Stats : 7,703 scanned, 16 matched
Speed : 1.81 MiB in (1.44 MiB/s), 129 KiB out (102 KiB/s)
Total Time : 1s.
STATUS : PASSED

But what if I know that a member volume is getting full and I want to see what files are in that member volume?

Use a FlexGroup Member Volume Number and Find All Files in that Member Volume

In the case where I know what member volume needs to be addressed, I can use XCP to search using the FlexGroup index number. The index number lines up with the member volume numbers, so if the index number is 6, then we know the member volume is 6.

In my 2 million file FG, I want to filter by member 6, so I use this command, which shows there are 95,019 files in member 6:

# xcp diag -run fgid.py scan -match 'fgid(x)==6' -parallel 10 -l 10.10.10.10:/FGNFS > member6.txt

 615,096 scanned, 19 matched, 108 MiB in (21.6 MiB/s), 563 KiB out (113 KiB/s), 5s
 1.03M scanned, 5,019 matched, 180 MiB in (14.5 MiB/s), 939 KiB out (75.0 KiB/s), 10s
 1.27M scanned, 8,651 matched, 222 MiB in (8.40 MiB/s), 1.13 MiB out (43.7 KiB/s), 15s
 1.76M scanned, 50,019 matched, 309 MiB in (17.3 MiB/s), 1.57 MiB out (89.9 KiB/s), 20s
 2.00M scanned, 62,793 matched, 351 MiB in (8.35 MiB/s), 1.79 MiB out (43.7 KiB/s), 25s

Filtered: 95019 matched, 1905385 did not match

Xcp command : xcp diag -run fgid.py scan -match fgid(x)==6 -parallel 10 -l 10.10.10.10:/FGNFS
Stats       : 2.00M scanned, 95,019 matched
Speed       : 351 MiB in (12.5 MiB/s), 1.79 MiB out (65.0 KiB/s)
Total Time  : 28s.
STATUS      : PASSED

When I check against the files-used for that member volume, it lines up closely (95,019 matched versus 95,120 files-used):

::*> vol show -vserver DEMO -volume FGNFS__0006 -fields files-used
vserver volume      files-used
------- ----------- ----------
DEMO    FGNFS__0006 95120

And the output file shows not just the file names, but also the sizes!

rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1232.txt
rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1233.txt
rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1234.txt
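
Once the listing is in a file like member6.txt, it can be summarized with a few lines of Python. This is a minimal sketch, not part of XCP: the column order (mode, ACL marker, user, group, size, space used, age, path) is an assumption based on the sample lines above, the size suffixes are assumed to be 1024-based, and the helper names parse_size, parse_listing_line and summarize are made up for this example.

```python
import re
from collections import Counter

# Multipliers for the size suffixes seen in the listing (assumed 1024-based).
UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_size(text):
    """Turn '4KiB', '596MiB' or a bare byte count like '12' into bytes."""
    match = re.fullmatch(r"([\d.]+)([KMGT]iB)?", text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS.get(unit, 1))


def parse_listing_line(line):
    """Return (size_in_bytes, path) for a listing line, or None for anything else."""
    fields = line.split(None, 7)  # the path stays whole even if it has spaces
    if len(fields) < 8 or not re.fullmatch(r"[rwx-]{9}", fields[0]):
        return None  # blank lines, progress lines, summary lines
    try:
        return parse_size(fields[4]), fields[7].rstrip("\n")
    except ValueError:
        return None


def summarize(lines):
    """Count files per parent directory and total up their sizes."""
    per_dir = Counter()
    total_bytes = 0
    for line in lines:
        parsed = parse_listing_line(line)
        if parsed is None:
            continue
        size, path = parsed
        per_dir[path.rsplit("/", 1)[0]] += 1
        total_bytes += size
    return per_dir, total_bytes


sample = [
    "rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1232.txt",
    "rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1233.txt",
    "rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1234.txt",
]
per_dir, total_bytes = summarize(sample)
print(per_dir.most_common(5), total_bytes)
```

To run it against the real output, pass an open file handle for member6.txt to summarize() instead of the sample list.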

And, if I choose, I can filter further with the sizes. Maybe I just want to see files in that member volume that are 4K or less (in this case, that’s all of them):

# xcp diag -run fgid.py scan -match 'fgid(x)==6 and size <= 4*K' -parallel 10 -l 10.10.10.10:/FGNFS

In my “TechONTAP” volume, I look for files larger than 500MB in member 6:

# xcp diag -run fgid.py scan -match 'fgid(x)==6 and size > 500*M' -parallel 10 -l 10.10.10.11:/techontap

rw-r--r-- --- 501 games 596MiB 598MiB 3y219d techontap/Episodes/Episode 1/Epidose 1 GBF.band/Media/Audio Files/Tech ONTAP Podcast - Episode 1 - AFF with Dan Isaacs v3_1.aif
rw-r--r-- --- 501 games 885MiB 888MiB 3y219d techontap/archive/Prod - old MacBook/Insight 2016_Day2_TechOnTap_JParisi_ASullivan_GDekhayser.mp4
rw-r--r-- --- 501 games 787MiB 790MiB 1y220d techontap/archive/Episode 181 - Networking Deep Dive/ep181-networking-deep-dive-meat.output.wav

Filtered: 3 matched, 7700 did not match

Xcp command : xcp diag -run fgid.py scan -match fgid(x)==6 and size > 500*M -parallel 10 -l 10.10.10.11:/techontap
Stats : 7,703 scanned, 3 matched
Speed : 1.81 MiB in (1.53 MiB/s), 129 KiB out (109 KiB/s)
Total Time : 1s.
STATUS : PASSED

So, there you have it! A way to find files in a specific member volume inside of a FlexGroup! Let me know if you have any comments or questions below.

How to identify a file or folder in ONTAP in NFS packet traces

When you’re troubleshooting NFS issues, sometimes you have to collect a packet capture to see what’s going on. But the issue is, packet captures don’t really tell you the file or folder names. I like to use Wireshark for Mac and Windows, and regular old tcpdump for Linux. For ONTAP, you can run packet captures using this KB (requires NetApp login):

How to capture packet traces (tcpdump) on ONTAP 9.2+ systems

By default, Wireshark shows NFS packets like this ACCESS call. We see a FH, which is in hex, and then we see another filehandle that’s even more unreadable. We’ll occasionally see file names in the trace (like copy-file below), but if we need to find out why an ACCESS call fails, we’ll have difficulty:

Luckily, Wireshark has some built-in stuff to crack open those NFS file handles in ONTAP.

Also, check out this new blog:

How to Map File and Folder Locations to NetApp ONTAP FlexGroup Member Volumes with XCP

Changing Wireshark Settings

First, we’d want to set the NFS preferences. That’s done via Edit -> Preferences and then by clicking on “Protocols” in the left-hand menu and selecting NFS:

Here, you’ll see some options that you can read more about by mousing over them:

I just select them all.

When we go to the packet we want to analyze, we can right click and select “Decode As…”:

This brings up the “Decode As” window. Here, we have “NFS File Handle Types” pre-populated. Double-click (none) under “Current” and you get a drop down menu. You’ll get some options for NFS, including… ONTAP! In this case, since I’m using clustered ONTAP, I select ontap_gx_v3. (GX is what clustered ONTAP was before clustered ONTAP was clustered ONTAP):

If you click “OK” it will apply to the current session only. If you click “Save” it will keep those preferences every time.

Now, when the ACCESS packet is displayed, I get WAY more information about the file in question, and the values are translated to decimal.

Those still don’t mean a lot to us, but I’ll get to that.

Mapping file handle values to files in ONTAP

Now, we can use the ONTAP CLI and the packet capture to discern exactly what file has that ACCESS call.

Every volume in ONTAP has a unique identifier called a “Master Set ID” (or MSID). You can see the volume’s MSID with the following diag priv command:

cluster::*> vol show -vserver DEMO -volume vol2 -fields msid
vserver volume  msid
------- ------- -----------
DEMO    vol2    2163230318

If you know the volume name you’re troubleshooting, then that makes life easier – just use find in the packet details.

If you don’t, the MSID can be found in a packet trace in the ACCESS reply as the “fsid”:

You can then find the volume name and exported path with the MSID in the ONTAP CLI with:

cluster::*> set diag; vol show -vserver DEMO -msid  2163230318 -fields volume,junction-path
vserver volume  junction-path
------- ------- ----------- 
DEMO    vol2    /vol2 

File and directory handles are constructed using that MSID, which is why each volume is considered a distinct filesystem. But we don’t care about that, because Wireshark figures all that out for us and we can use the ONTAP CLI to figure it out as well.

The pertinent information in the trace, as it maps to files and folders, is:

  • Spin file id = inode number in ONTAP
  • Spin file unique id = file generation number
  • File id = inode number as seen by the NFS client

If you know the volume and file or folder’s name, you can easily find the inode number in ONTAP with this command:

cluster::*> set advanced; showfh -vserver DEMO /vol/vol2/folder
Vserver                Path
---------------------- ---------------------------
DEMO                   /vol/vol2/folder
flags   snapid fileid    generation fsid       msid         dsid
------- ------ --------- ---------- ---------- ------------ ------------
0x8000  0      0x658e    0x227ed312 -          -            0x1639

In the above, the values are in hex, but we can translate with a hex converter, like this one:

https://www.rapidtables.com/convert/number/hex-to-decimal.html

So, for the values we got:

  • file ID (inode) 0x658e = 25998
  • generation ID 0x227ed312 = 578736914

In the trace, that matches up:

Finding file names and paths by inode number

But what happens if you don’t know the file name and just have the information from the trace?

One way is to use the nodeshell level command “inodepath.”

::*> node run -node node1 inodepath -v files 15447
Inode 15447 in volume files (fsid 0x142a) has 1 name.
Volume UUID is: 76a69b93-cc2f-11ea-b16f-00a098696eda
[ 1] Primary pathname = /vol/files/newapps/user1-file-smb

This will work with a FlexGroup volume as well, provided you know the node and the member volume where the file lives (see “How to Map File and Folder Locations to NetApp ONTAP FlexGroup Member Volumes with XCP” for a way to figure that info out).

::*> node run -node node2 inodepath -v FG2__0007 5292
Inode 5292 in volume FG2__0007 (fsid 0x1639) has 1 name.
Volume UUID is: 87b14652-9685-11eb-81bf-00a0986b1223
[ 1] Primary pathname = /vol/FG2/copy-file-finder

There’s also a diag privilege command in ONTAP for that. The caveat is it can be dangerous to run, especially if you make a mistake in running it. (And when I say dangerous, I mean best case, it hangs your CLI session for a while; worst case, it panics the node.) If possible, use inodepath instead.

Here’s how we could use the volume name and inode number to find the file name. For a FlexVol volume, it’s simple:

cluster::*> vol explore -format inode -scope volname.inode -dump name

For example:

cluster::*> volume explore -format inode -scope files.15447 -dump name
name=/newapps/user1-file-smb

With a FlexGroup volume, however, it’s a little more complicated, as there are member volumes to take into account and there’s no easy way for ONTAP to discern which FlexGroup member volume has the file, since ONTAP inode numbers can be reused in different member volumes. This is because the file IDs presented to NFS clients are created using the inode numbers and things like the member volume’s MSID (which is different than the FlexGroup’s MSID).

To make this happen with volume explore, we’d be working in reverse – listing the contents of the volume’s files/folders, then using the inode number of the parent folder, listing those, etc. With high file count environments, this is basically an impossibility.

In that case, we’d need to use an NFS client to discover the file name associated with the inode number in question.

From the client, we have two commands to find an inode number for a file. In this case we know the file’s location and name:

# ls -i /mnt/client1/copy-file-finder
4133624749 /mnt/client1/copy-file-finder
# stat copy-file-finder
File: ‘copy-file-finder’
Size: 12 Blocks: 0 IO Block: 1048576 regular file
Device: 2eh/46d Inode: 4133624749 Links: 1
Access: (0555/-r-xr-xr-x) Uid: ( 1102/ prof1) Gid: (10002/ProfGroup)
Access: 2021-04-14 11:47:45.579879000 -0400
Modify: 2021-04-14 11:47:45.588875000 -0400
Change: 2021-04-14 17:34:07.364283000 -0400
Birth: -

In a packet trace, that inode number shows up as “fileid” in replies, such as a GETATTR reply:

If we only know the inode number (as if we got it from a packet trace), we can use the number on the client to find the file name. One way is with “find”:

# find /path/to/mountpoint -inum <inodenumber>

For example:

# find /mnt/client1 -inum 4133624749
/mnt/client1/copy-file-finder
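
The same lookup can be scripted when you want to post-process the results. Here is a minimal Python sketch of what “find -inum” does; the function name find_by_inode is made up for this example, and the demo runs against a throwaway temp directory rather than a real NFS mount.

```python
import os
import tempfile


def find_by_inode(root, inode):
    """Yield every path under root whose inode number equals inode."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                found = os.lstat(path).st_ino
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if found == inode:
                yield path


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "copy-file-finder")
    with open(target, "w") as handle:
        handle.write("hello world\n")
    inode = os.lstat(target).st_ino
    matches = list(find_by_inode(root, inode))
    print(inode, matches)
```

On a real mount you would call find_by_inode("/mnt/client1", 4133624749) instead. Like find, this walks the entire tree, so it is no faster in a high file count environment.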

“find” can take a while, especially in a high file count environment, so we could also use XCP.

# xcp -l -match 'fileid== <inodenumber>' server1:/export

In this case:

# xcp -l -match 'fileid== 4133624749' DEMO:/FG2
XCP 1.6.1; (c) 2021 NetApp, Inc.; Licensed to Justin Parisi [NetApp Inc] until Tue Jun 22 12:34:48 2021

r-xr-xr-x --- 1102 10002 12 0 12d23h FG2/copy-file-finder

Filtered: 8173 did not match

Xcp command : xcp -l -match fileid== 4133624749 DEMO:/FG2
Stats : 8,174 scanned, 1 matched
Speed : 1.47 MiB in (2.10 MiB/s), 8.61 KiB out (12.3 KiB/s)
Total Time : 0s.
STATUS : PASSED

Hope this helps you find files in your NFS filesystem! If you have questions or comments, leave them below.

Behind the Scenes – Episode 286 – Virtual Desktop Service and VMware vSphere with ONTAP

Welcome to Episode 286, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week, we discuss the new Virtual Desktop Service offering from NetApp with Technical Marketing Engineer Suresh Thoppay (@sthoppay, https://www.linkedin.com/in/sureshthoppay/)!

For the VDS trial:

https://cloud.netapp.com/vds-lp-30-day-sandbox-trial

Podcast Transcriptions

If you want a searchable transcript of the episode, check it out here (just set expectations accordingly):

Episode 286 – Virtual Desktop Service and VMware vSphere with ONTAP – Transcript

Just use the search field to look for words you want to read more about. (For example, search for “storage”)

Be sure to give us feedback on the transcription in the comments here or via podcast@netapp.com! (The same goes if you need a full text transcript, since Gong does not support sharing those yet.) If you have requests for other previous episode transcriptions, let me know!

Tech ONTAP Community

We also now have a presence on the NetApp Communities page. You can subscribe there to get emails when we have new episodes.

Tech ONTAP Podcast Community

Finding the Podcast

You can find this week’s episode here:

You can also find the Tech ONTAP Podcast on:

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Behind the Scenes – Episode 285 – Project Astra? Project No More!

Welcome to Episode 285, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week, NetApp Technical Director Garrett Mueller (@innergy) and Senior Director of Cloud Management Sayan Saha (sayans@netapp.com) join us to discuss NetApp’s Kubernetes automation stack, Astra, achieving general availability status!

Podcast Transcriptions

If you want a searchable transcript of the episode, check it out here (just set expectations accordingly):

Episode 285 – NetApp Astra GA – Transcript

Just use the search field to look for words you want to read more about. (For example, search for “storage”)

Be sure to give us feedback on the transcription in the comments here or via podcast@netapp.com! (The same goes if you need a full text transcript, since Gong does not support sharing those yet.) If you have requests for other previous episode transcriptions, let me know!

MacOS NFS Clients with ONTAP – Tips and Considerations

When I’m testing stuff out for customer deployments that I don’t work with a ton, I like to keep notes on the work so I can reference it later for TRs or other things. A blog is a great place to do that, as it might help other people in similar scenarios. This won’t be an exhaustive list, and it is certain to change over time and possibly make its way into a TR, but here we go…

ONTAP and Multiprotocol NAS

Before I get into the MacOS client stuff, we need to understand ONTAP multiprotocol NAS, as it can impact how MacOS clients behave.

In ONTAP, you can serve the same datasets to clients regardless of the NAS protocol they use (SMB or NFS). Many clients can actually do both protocols – MacOS is one of those clients.

The way ONTAP does Multiprotocol NAS (and keeps permissions predictable) is via name mappings and volume “security styles,” which control what kind of ACLs are in use. TR-4887 goes into more detail on how all that works, but at a high level:

NTFS security styles use NTFS ACLs

SMB clients will map to UNIX users and NFS clients will require mappings to valid Windows users for authentication. Then, permissions are controlled via ACLs. Chmod/chown from NFS clients will fail.

UNIX security styles use UNIX mode bits (rwx) and/or NFSv4 ACLs

SMB clients will require mappings to a valid UNIX user for permissions; NFS clients will only require mapping to a UNIX user name if using NFSv4 ACLs. SMB clients can do *some* permissions changes, but on a very limited basis.

Mixed security styles always use either UNIX or NTFS effective security styles, based on last ACL change

Basically, if an NFS client chmods a file, it switches to UNIX security style. If an SMB client changes ownership of the file, it flips back to NTFS security style. This allows you to change permissions from any client, but you need to ensure you have proper name mappings in place to avoid undesired permission behavior. In most cases, we recommend avoiding mixed security styles.

MacOS NFS Client Considerations

When using MacOS as an NFS client, there are a few things I’ve run into over the past week or two of testing that you would want to know to avoid issues.

MacOS can be configured to use Active Directory LDAP for UNIX Identities

When you’re doing multiprotocol NAS (even if the clients will only do NFS, your volumes might have NTFS style permissions), you want to try to use a centralized name service like LDAP so that ONTAP, SMB clients and NFS clients all agree on who the users are, what groups they belong to, what numeric IDs they have, etc. If ONTAP thinks a user has a numeric ID of 1234 and the client thinks that user has a numeric ID of 5678, then you likely won’t get the access you expected. I wrote up a blog on configuring MacOS clients to use AD LDAP here:

MacOS clients can also be configured to use single sign on with AD and NFS home directories

Your MacOS clients – once added to AD in the blog post above – can now log in using AD accounts. There’s also an additional tab in the Directory Utility that allows you to auto-create home directories when a new user logs in to the MacOS client.

But you can also configure the auto-created home directories to leverage an NFS mount on the ONTAP storage system. You can configure the MacOS client to automount homedirs and then configure the MacOS client to use that path. (This process varies based on Mac version; I’m on 10.14.4 Catalina)

By default, the homedir path is /home in auto_master. We can use that.

Then, chmod the /etc/auto_home file to 644:

$ sudo chmod 644 /etc/auto_home

Create a volume on the ONTAP cluster for the homedirs and ensure it’s able to be mounted from the MacOS clients via the export policy rules (TR-4067 covers export policy rules):

::*> vol show -vserver DEMO -volume machomedirs -fields junction-path,policy
vserver volume      policy  junction-path
------- ----------- ------- ----------------
DEMO    machomedirs default /machomedirs

Create qtrees for each user and set the user/group and desired UNIX permissions:

qtree create -vserver DEMO -volume machomedirs -qtree prof1 -user prof1 -group ProfGroup -unix-permissions 755
qtree create -vserver DEMO -volume machomedirs -qtree student1 -user student1 -group group1 -unix-permissions 755

(For best results on Mac clients, use UNIX security styles.)

Then modify the automount /etc/auto_home file to use that path for homedir mounts. When a user logs in, the homedir will auto mount.

This is the line I used:

* -fstype=nfs nfs://demo:/machomedirs/&

And I also add the home mount

Then apply the automount change:

$ sudo automount -cv
automount: /net updated
automount: /home updated
automount: /Network/Servers updated
automount: no unmounts

Now, when I cd to /home/username, it automounts that path:

$ cd /home/prof1
$ mount
demo:/machomedirs/prof1 on /home/prof1 (nfs, nodev, nosuid, automounted, nobrowse)

But if I want that path to be the new homedir path, I would need to log in as that user and then go to “System Preferences -> Users and Groups” and right click the user. Then select “Advanced Options.”

Then you’d need to restart. Once that happens, log in again and when you first open Terminal, it will use the NFS homedir path.

NOTE: You may want to test if the Mac client can manually mount the homedir before testing logins. If the client can’t automount the homedir on login, things will break.

Alternately, you can create a user with the same name as the AD account and then modify the homedir path (this removes the need to log in). The Mac will pick up the correct UID, but the group ID may need to be changed.

If you use SMB shares for your home directories, it’s as easy as selecting “Use UNC path” in the User Experience area of Directory Utility (there’s no way to specify NFS here):

With new logins, the profile will get created in the qtree you created for the homedir (and you’ll go through the typical initial Mac setup screens):

# ls -la
total 28
drwxrwxr-x 6 student1 group1 4096 Apr 14 16:39 .
drwxr-xr-x 6 root root 4096 Apr 14 15:28 ..
drwx------ 2 student1 group1 4096 Apr 14 16:39 Desktop
drwx------ 2 student1 group1 4096 Apr 14 16:35 Downloads
drwxr-xr-x 25 student1 group1 4096 Apr 14 16:39 Library
-rw-r--r-- 1 student1 group1 4096 Apr 14 16:35 ._Library
drwx------ 4 student1 group1 4096 Apr 14 16:35 .Spotlight-V100

When you open Terminal, it automounts the NFS home directory mount for that user and drops you right into your folder!

Mac NFS Considerations, Caveats, Issues

If you’re using NFS on Mac clients, there are two main things to remember:

  • Volumes/qtrees using UNIX security styles work best with NFS in general
  • Terminal/CLI works better than Finder in nearly all instances

If you have to/want to use Finder, or you have to/want to use NTFS security styles for multiprotocol, then there are some things you’d want to keep in mind.

  • If possible, connect the Mac client to the Active Directory domain and use LDAP for UNIX identities as described above.
  • Ensure your users/groups are all resolving properly on the Mac clients and ONTAP system. TR-4887 and TR-4835 cover some commands you can use to check users and groups, name mappings, group memberships, etc.
  • If you’re using NTFS security style volumes/qtrees and want the Finder to work properly for copies to and from the NFS mount, configure the NFS export policy rule to set -ntfs-unix-security-ops to “ignore” – Finder will bail out if ONTAP returns an error, so we want to silently fail those operations (such as SETATTR; see below).
  • When you open a file for reading/writing (such as a text file), Mac creates a ._filename file along with it. Depending on how many files you have in your volume, this can be an issue. For example, if you open 1 million files and Mac creates 1 million corresponding ._filename files, that starts to add up. Don’t worry! You’re not alone: https://apple.stackexchange.com/questions/14980/why-are-dot-underscore-files-created-and-how-can-i-avoid-them
  • If you’re using DFS symlinks, check out this KB: DFS links do not work on MAC OS client, with ONTAP 9.5 and symlinks enabled

I’ve also run into some interesting behaviors with Mac/Finder/SMB and junction paths in ONTAP, as covered in this blog:

Workaround for Mac Finder errors when unzipping files in ONTAP

One issue that I did a pretty considerable amount of analysis on was the aforementioned “can’t copy using Finder.” Here are the dirty details…

Permissions Error When Copying a File to a NFS Mount in ONTAP using Finder

In this case, a file copy worked using Terminal, but was failing with permissions errors when using Finder and complaining about the file already existing.

First, it wants a login (which shouldn’t be needed):

Then it says this:

If you select “Replace” this is the error:

If you select “Stop” it stops and you are left with an empty 0 byte “file” – so the copy failed.

If you select “Keep Both” the Finder goes into an infinite loop of 0 byte file creations. I stopped mine at around 2500 files (forced an unmount):

# ls -al | wc -l
1981
# ls -al | wc -l
2004
# ls -al | wc -l
2525

So why does that happen? Well, in a packet trace, I saw the following:

The SETATTR fails on CREATE (expected in NFS operations on NTFS security style volumes in ONTAP, but not expected for NFS clients as per RFC standards):

181  60.900209  x.x.x.x    x.x.x.y      NFS  226  V3 LOOKUP Call (Reply In 182), DH: 0x8ec2d57b/copy-file-finder << Mac NFS client checks if the file exists
182  60.900558  x.x.x.y   x.x.x.x      NFS  186  V3 LOOKUP Reply (Call In 181) Error: NFS3ERR_NOENT << does not exist, so let’s create it!
183  60.900633  x.x.x.x    x.x.x.y      NFS  238  V3 CREATE Call (Reply In 184), DH: 0x8ec2d57b/copy-file-finder Mode: EXCLUSIVE << creates the file
184  60.901179  x.x.x.y   x.x.x.x      NFS  362  V3 CREATE Reply (Call In 183)
185  60.901224  x.x.x.x    x.x.x.y      NFS  238  V3 SETATTR Call (Reply In 186), FH: 0x7b82dffd
186  60.901564  x.x.x.y   x.x.x.x      NFS  214  V3 SETATTR Reply (Call In 185) Error: NFS3ERR_PERM << fails setting attributes, which also fails the copy of the actual file data, so we have a 0 byte file

Then it REMOVES the file (since the initial operation fails) and creates it again, and SETATTR fails again. This is where that “Keep Both” loop behavior takes place.

229 66.995698 x.x.x.x x.x.x.y NFS 210 V3 REMOVE Call (Reply In 230), DH: 0x8ec2d57b/copy-file-finder
233 67.006816 x.x.x.x x.x.x.y NFS 226 V3 LOOKUP Call (Reply In 234), DH: 0x8ec2d57b/copy-file-finder
234 67.007166 x.x.x.y x.x.x.x NFS 186 V3 LOOKUP Reply (Call In 233) Error: NFS3ERR_NOENT
247 67.036056 x.x.x.x x.x.x.y NFS 238 V3 CREATE Call (Reply In 248), DH: 0x8ec2d57b/copy-file-finder Mode: EXCLUSIVE
248 67.037662 x.x.x.y x.x.x.x NFS 362 V3 CREATE Reply (Call In 247)
249 67.037732 x.x.x.x x.x.x.y NFS 238 V3 SETATTR Call (Reply In 250), FH: 0xc33bff48
250 67.038534 x.x.x.y x.x.x.x NFS 214 V3 SETATTR Reply (Call In 249) Error: NFS3ERR_PERM
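
When a trace is long, it can help to pull out just the failed replies. Here is a minimal Python sketch that does that for packet summaries pasted as text in the same column layout as above (frame, time, source, destination, protocol, length, info). The regular expression, the function name failed_replies, and the choice to skip NFS3ERR_NOENT by default are assumptions for this example, not part of Wireshark.

```python
import re

# Matches the text summary of an NFSv3 reply, for example:
# "186 60.901564 x.x.x.y x.x.x.x NFS 214 V3 SETATTR Reply (Call In 185) Error: NFS3ERR_PERM"
REPLY = re.compile(
    r"^\s*(?P<frame>\d+)\s+\S+\s+\S+\s+\S+\s+NFS\s+\d+\s+"
    r"V3 (?P<op>\w+) Reply \(Call In (?P<call>\d+)\)"
    r"(?: Error: (?P<error>\w+))?"
)


def failed_replies(lines, ignore=("NFS3ERR_NOENT",)):
    """Return (reply frame, call frame, operation, error) for each failed reply.

    Errors listed in `ignore` are skipped; a LOOKUP that returns NOENT before
    a CREATE is normal, so it is ignored by default.
    """
    failures = []
    for line in lines:
        match = REPLY.match(line)
        if match and match["error"] and match["error"] not in ignore:
            failures.append(
                (int(match["frame"]), int(match["call"]), match["op"], match["error"])
            )
    return failures


trace = [
    "181  60.900209  x.x.x.x    x.x.x.y      NFS  226  V3 LOOKUP Call (Reply In 182), DH: 0x8ec2d57b/copy-file-finder",
    "182  60.900558  x.x.x.y   x.x.x.x      NFS  186  V3 LOOKUP Reply (Call In 181) Error: NFS3ERR_NOENT",
    "183  60.900633  x.x.x.x    x.x.x.y      NFS  238  V3 CREATE Call (Reply In 184), DH: 0x8ec2d57b/copy-file-finder Mode: EXCLUSIVE",
    "184  60.901179  x.x.x.y   x.x.x.x      NFS  362  V3 CREATE Reply (Call In 183)",
    "185  60.901224  x.x.x.x    x.x.x.y      NFS  238  V3 SETATTR Call (Reply In 186), FH: 0x7b82dffd",
    "186  60.901564  x.x.x.y   x.x.x.x      NFS  214  V3 SETATTR Reply (Call In 185) Error: NFS3ERR_PERM",
]
failures = failed_replies(trace)
print(failures)
```

For the Finder trace, that leaves only the SETATTR reply in frame 186, which points straight back to the call in frame 185.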

With Terminal, it operates a little differently. Rather than bailing out after the SETATTR failure, it just retries it:

11 19.954145 x.x.x.x x.x.x.y NFS 226 V3 LOOKUP Call (Reply In 12), DH: 0x8ec2d57b/copy-file-finder
12 19.954496 x.x.x.y x.x.x.x NFS 186 V3 LOOKUP Reply (Call In 11) Error: NFS3ERR_NOENT
13 19.954560 x.x.x.x x.x.x.y NFS 226 V3 LOOKUP Call (Reply In 14), DH: 0x8ec2d57b/copy-file-finder
14 19.954870 x.x.x.y x.x.x.x NFS 186 V3 LOOKUP Reply (Call In 13) Error: NFS3ERR_NOENT
15 19.954930 x.x.x.x x.x.x.y NFS 258 V3 CREATE Call (Reply In 18), DH: 0x8ec2d57b/copy-file-finder Mode: UNCHECKED
16 19.954931 x.x.x.x x.x.x.y NFS 230 V3 LOOKUP Call (Reply In 17), DH: 0x8ec2d57b/._copy-file-finder
17 19.955497 x.x.x.y x.x.x.x NFS 186 V3 LOOKUP Reply (Call In 16) Error: NFS3ERR_NOENT
18 19.957114 x.x.x.y x.x.x.x NFS 362 V3 CREATE Reply (Call In 15)
25 19.959031 x.x.x.x x.x.x.y NFS 238 V3 SETATTR Call (Reply In 26), FH: 0x8bcb16f1
26 19.959512 x.x.x.y x.x.x.x NFS 214 V3 SETATTR Reply (Call In 25) Error: NFS3ERR_PERM
27 19.959796 x.x.x.x x.x.x.y NFS 238 V3 SETATTR Call (Reply In 28), FH: 0x8bcb16f1 << Hey let's try again and ask in a different way!
28 19.960321 x.x.x.y x.x.x.x NFS 214 V3 SETATTR Reply (Call In 27)

The first SETATTR tries to chmod to 700:

Mode: 0700, S_IRUSR, S_IWUSR, S_IXUSR

The retry uses 777. Since the file already shows as 777, it succeeds (because it was basically fooled):

Mode: 0777, S_IRUSR, S_IWUSR, S_IXUSR, S_IRGRP, S_IWGRP, S_IXGRP, S_IROTH, S_IWOTH, S_IXOTH
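
To see how those octal values expand into the flag names shown in the trace, here is a small Python sketch using the standard stat module. The helper name describe_mode is made up for this example.

```python
import stat

# Permission flags in the order they appear in the trace output above.
FLAGS = [
    "S_IRUSR", "S_IWUSR", "S_IXUSR",
    "S_IRGRP", "S_IWGRP", "S_IXGRP",
    "S_IROTH", "S_IWOTH", "S_IXOTH",
]


def describe_mode(mode):
    """Return the flag names set in mode and the matching rwx string."""
    names = [name for name in FLAGS if mode & getattr(stat, name)]
    # stat.filemode() expects the file type bits too, so add S_IFREG
    # and drop the leading type character from the result.
    text = stat.filemode(stat.S_IFREG | mode)[1:]
    return names, text


for mode in (0o700, 0o777):
    names, text = describe_mode(mode)
    print(f"{mode:04o} {text} {', '.join(names)}")
```

The first SETATTR asks for owner-only access (rwx------), while the retry asks for the wide open mode that the file already displays.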

Since Finder bails on the error, setting the NFS server to return no error here for this export (ntfs-unix-security-ops ignore) on this client allows the copy to succeed. You can create granular rules in your export policy rules to just set that option for your Mac clients.

Now, why do our files all show as 777?

Displaying NTFS Permissions via NFS

Because NFS doesn’t understand NTFS permissions, the job to translate user identities into valid access rights falls onto the shoulders of ONTAP. A UNIX user maps to a Windows user and then that Windows user is evaluated against the folder/file ACLs.

So “777” here doesn’t mean we have wide open access; we only have access based on the Windows ACL. Instead, it just means “the NFS client can’t view the access level for that user.” In most cases, this isn’t a huge problem. But sometimes, you need files/folders not to show 777 (like for applications that don’t allow 777).

In that case, you can control somewhat how NFS clients display NTFS ACLs in “ls” commands with the NFS server option ntacl-display-permissive-perms.

[-ntacl-display-permissive-perms {enabled|disabled}] - Display maximum NT ACL Permissions to NFS Client (privilege: advanced)
This optional parameter controls the permissions that are displayed to NFSv3 and NFSv4 clients on a file or directory that has an NT ACL set. When true, the displayed permissions are based on the maximum access granted by the NT ACL to any user. When false, the displayed permissions are based on the minimum access granted by the NT ACL to any user. The default setting is false.

The default setting of “false” is actually “disabled.” When that option is enabled, this is the file/folder view:

When that option is disabled (the default):

This option is covered in more detail in TR-4067. Changing it doesn’t require a remount to take effect, although it may take some time for the access caches to clear before you see the results.

Keep in mind that these listings are approximations of the access as seen by the current user. If the option is disabled, you see the minimum access; if the option is enabled, you see the maximum access. For example, the “test” folder above shows 555 when the option is disabled, but 777 when the option is enabled.

These are the actual permissions on that folder:

::*> vserver security file-directory show -vserver DEMO -path /FG2/test
Vserver: DEMO
File Path: /FG2/test
File Inode Number: 10755
Security Style: ntfs
Effective Style: ntfs
DOS Attributes: 10
DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
UNIX User Id: 1102
UNIX Group Id: 10002
UNIX Mode Bits: 777
UNIX Mode Bits in Text: rwxrwxrwx
ACLs: NTFS Security Descriptor
Control:0x8504
Owner:BUILTIN\Administrators
Group:NTAP\ProfGroup
DACL - ACEs
ALLOW-Everyone-0x1200a9-OI|CI (Inherited)
ALLOW-NTAP\prof1-0x1f01ff-OI|CI (Inherited)

Here are the expanded ACLs:

                     Owner:BUILTIN\Administrators
                     Group:NTAP\ProfGroup
                     DACL - ACEs
                       ALLOW-Everyone-0x1200a9-OI|CI (Inherited)
                          0... .... .... .... .... .... .... .... = Generic Read
                          .0.. .... .... .... .... .... .... .... = Generic Write
                          ..0. .... .... .... .... .... .... .... = Generic Execute
                          ...0 .... .... .... .... .... .... .... = Generic All
                          .... ...0 .... .... .... .... .... .... = System Security
                          .... .... ...1 .... .... .... .... .... = Synchronize
                          .... .... .... 0... .... .... .... .... = Write Owner
                          .... .... .... .0.. .... .... .... .... = Write DAC
                          .... .... .... ..1. .... .... .... .... = Read Control
                          .... .... .... ...0 .... .... .... .... = Delete
                          .... .... .... .... .... ...0 .... .... = Write Attributes
                          .... .... .... .... .... .... 1... .... = Read Attributes
                          .... .... .... .... .... .... .0.. .... = Delete Child
                          .... .... .... .... .... .... ..1. .... = Execute
                          .... .... .... .... .... .... ...0 .... = Write EA
                          .... .... .... .... .... .... .... 1... = Read EA
                          .... .... .... .... .... .... .... .0.. = Append
                          .... .... .... .... .... .... .... ..0. = Write
                          .... .... .... .... .... .... .... ...1 = Read

                       ALLOW-NTAP\prof1-0x1f01ff-OI|CI (Inherited)
                          0... .... .... .... .... .... .... .... = Generic Read
                          .0.. .... .... .... .... .... .... .... = Generic Write
                          ..0. .... .... .... .... .... .... .... = Generic Execute
                          ...0 .... .... .... .... .... .... .... = Generic All
                          .... ...0 .... .... .... .... .... .... = System Security
                          .... .... ...1 .... .... .... .... .... = Synchronize
                          .... .... .... 1... .... .... .... .... = Write Owner
                          .... .... .... .1.. .... .... .... .... = Write DAC
                          .... .... .... ..1. .... .... .... .... = Read Control
                          .... .... .... ...1 .... .... .... .... = Delete
                          .... .... .... .... .... ...1 .... .... = Write Attributes
                          .... .... .... .... .... .... 1... .... = Read Attributes
                          .... .... .... .... .... .... .1.. .... = Delete Child
                          .... .... .... .... .... .... ..1. .... = Execute
                          .... .... .... .... .... .... ...1 .... = Write EA
                          .... .... .... .... .... .... .... 1... = Read EA
                          .... .... .... .... .... .... .... .1.. = Append
                          .... .... .... .... .... .... .... ..1. = Write
                          .... .... .... .... .... .... .... ...1 = Read

So, prof1 has Full Control (7) and “Everyone” has Read and Execute (5). That’s where the minimum/maximum permissions show up. So you won’t get *exact* permissions here. If you want exact permission views, consider using UNIX security styles.
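
The hex masks in those ACEs can be decoded by hand, too. Here is a minimal Python sketch that uses the bit positions from the expanded ACL listing above, then collapses each mask into a single rwx digit. That collapse (Read = 4, Write = 2, Execute = 1) is a deliberate simplification for illustration, not ONTAP's actual algorithm, and the names decode and rough_rwx are made up for this example.

```python
# Bit positions taken from the expanded ACL listing above (bit 0 = Read).
ACCESS_BITS = {
    0: "Read", 1: "Write", 2: "Append", 3: "Read EA", 4: "Write EA",
    5: "Execute", 6: "Delete Child", 7: "Read Attributes",
    8: "Write Attributes", 16: "Delete", 17: "Read Control",
    18: "Write DAC", 19: "Write Owner", 20: "Synchronize",
    24: "System Security", 28: "Generic All", 29: "Generic Execute",
    30: "Generic Write", 31: "Generic Read",
}


def decode(mask):
    """List the rights whose bit is set in an access mask, lowest bit first."""
    return [name for bit, name in sorted(ACCESS_BITS.items()) if (mask >> bit) & 1]


def rough_rwx(mask):
    """Collapse a mask to one rwx digit (a simplification for illustration)."""
    digit = 0
    if mask & 0x1:   # Read
        digit += 4
    if mask & 0x2:   # Write
        digit += 2
    if mask & 0x20:  # Execute
        digit += 1
    return digit


# Masks copied from the DACL above.
aces = {"Everyone": 0x1200A9, "NTAP\\prof1": 0x1F01FF}
digits = {who: rough_rwx(mask) for who, mask in aces.items()}

for who, mask in aces.items():
    print(who, hex(mask), digits[who], decode(mask))
print("minimum:", min(digits.values()), "maximum:", max(digits.values()))
```

With these two ACEs, the lowest digit is 5 and the highest is 7, which lines up with the 555 and 777 views described earlier.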

.DS_Store files

Mac will leave these little files lying around as users browse shares. In a large environment, that can start to create clutter, so you may want to consider disabling the creation of these on network shares (such as NFS mounts), as per this:

http://hints.macworld.com/article.php?story=2005070300463515

If you have questions, comments or know of some other weirdness in MacOS with NFS, comment below!