How to identify a file or folder in ONTAP in NFS packet traces

When you’re troubleshooting NFS issues, sometimes you have to collect a packet capture to see what’s going on. But the issue is, packet captures don’t really tell you the file or folder names. I like to use Wireshark for Mac and Windows, and regular old tcpdump for Linux. For ONTAP, you can run packet captures using this KB (requires NetApp login):

How to capture packet traces (tcpdump) on ONTAP 9.2+ systems

By default, Wireshark shows NFS packets like this ACCESS call. We see a FH, which is in hex, and then we see another filehandle that’s even more unreadable. We’ll occasionally see file names in the trace (like copy-file below), but if we need to find out why an ACCESS call fails, we’ll have difficulty:

Luckily, Wireshark has some built-in stuff to crack open those NFS file handles in ONTAP.

Also, check out this new blog:

How to Map File and Folder Locations to NetApp ONTAP FlexGroup Member Volumes with XCP

Changing Wireshark Settings

First, we’d want to set the NFS preferences. That’s done via Edit -> Preferences and then by clicking on “Protocols” in the left-hand menu and selecting NFS:

Here, you’ll see some options that you can read more about by mousing over them:

I just select them all.

When we go to the packet we want to analyze, we can right click and select “Decode As…”:

This brings up the “Decode As” window. Here, we have “NFS File Handle Types” pre-populated. Double-click (none) under “Current” and you get a drop-down menu. You’ll get some options for NFS, including… ONTAP! In this case, since I’m using clustered ONTAP, I select ontap_gx_v3. (GX is what clustered ONTAP was before clustered ONTAP was clustered ONTAP):

If you click “OK” it will apply to the current session only. If you click “Save” it will keep those preferences every time.

Now, when the ACCESS packet is displayed, I get WAY more information about the file in question, and the values are translated to decimal.

Those still don’t mean a lot to us, but I’ll get to that.

Mapping file handle values to files in ONTAP

Now, we can use the ONTAP CLI and the packet capture to discern exactly what file has that ACCESS call.

Every volume in ONTAP has a unique identifier called a “Master Set ID” (or MSID). You can see the volume’s MSID with the following diag priv command:

cluster::*> vol show -vserver DEMO -volume vol2 -fields msid
vserver volume  msid
------- ------- -----------
DEMO    vol2    2163230318

If you know the volume name you’re troubleshooting, then that makes life easier – just use find in the packet details.

If you don’t, the MSID can be found in a packet trace in the ACCESS reply as the “fsid”:

You can then find the volume name and exported path with the MSID in the ONTAP CLI with:

cluster::*> set diag; vol show -vserver DEMO -msid  2163230318 -fields volume,junction-path
vserver volume  junction-path
------- ------- ----------- 
DEMO    vol2    /vol2 

File and directory handles are constructed using that MSID, which is why each volume is considered a distinct filesystem. But we don’t care about that, because Wireshark figures all that out for us and we can use the ONTAP CLI to figure it out as well.

The pertinent fields in the trace map to files and folders as follows:

  • Spin file id = inode number in ONTAP
  • Spin file unique id = file generation number
  • File id = inode number as seen by the NFS client

If you know the volume and file or folder’s name, you can easily find the inode number in ONTAP with this command:

cluster::*> set advanced; showfh -vserver DEMO /vol/vol2/folder
Vserver                Path
---------------------- ---------------------------
DEMO                   /vol/vol2/folder
flags   snapid fileid    generation fsid       msid         dsid
------- ------ --------- ---------- ---------- ------------ ------------
0x8000  0      0x658e    0x227ed312 -          -            0x1639

In the above, the values are in hex, but we can translate with a hex converter, like this one:

https://www.rapidtables.com/convert/number/hex-to-decimal.html

So, for the values we got:

  • file ID (inode) 0x658e = 25998
  • generation ID 0x227ed312 = 578736914
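
If you’d rather not paste values into a web converter, the same conversion is easy to script. Here’s a minimal Python sketch using the fileid and generation values from the showfh output above (the variable names are just for illustration):

```python
# Hex values copied from the showfh output above.
showfh_values = {
    "fileid": "0x658e",
    "generation": "0x227ed312",
}

# int(text, 16) parses a hex string, with or without the 0x prefix.
decimal_values = {name: int(value, 16) for name, value in showfh_values.items()}

for name, value in decimal_values.items():
    print(f"{name}: {showfh_values[name]} = {value}")
```

Going the other way, hex(25998) turns a decimal value from the trace back into the form showfh prints.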

In the trace, that matches up:

Finding file names and paths by inode number

But what happens if you don’t know the file name and just have the information from the trace?

One way is to use the nodeshell level command “inodepath.”

::*> node run -node node1 inodepath -v files 15447
Inode 15447 in volume files (fsid 0x142a) has 1 name.
Volume UUID is: 76a69b93-cc2f-11ea-b16f-00a098696eda
[ 1] Primary pathname = /vol/files/newapps/user1-file-smb

This will work with a FlexGroup volume as well, provided you know the node and the member volume where the file lives (see “How to Map File and Folder Locations to NetApp ONTAP FlexGroup Member Volumes with XCP” for a way to figure that info out).

::*> node run -node node2 inodepath -v FG2__0007 5292
Inode 5292 in volume FG2__0007 (fsid 0x1639) has 1 name.
Volume UUID is: 87b14652-9685-11eb-81bf-00a0986b1223
[ 1] Primary pathname = /vol/FG2/copy-file-finder

There’s also a diag privilege command in ONTAP for that. The caveat is it can be dangerous to run, especially if you make a mistake in running it. (And when I say dangerous, I mean best case, it hangs your CLI session for a while; worst case, it panics the node.) If possible, use inodepath instead.

Here’s how we could use the volume name and inode number to find the file name. For a FlexVol volume, it’s simple:

cluster::*> vol explore -format inode -scope volname.inode -dump name

For example:

cluster::*> volume explore -format inode -scope files.15447 -dump name
name=/newapps/user1-file-smb

With a FlexGroup volume, however, it’s a little more complicated. There are member volumes to take into account, and there’s no easy way for ONTAP to discern which FlexGroup member volume has the file, since ONTAP inode numbers can be reused in different member volumes. This is because the file IDs presented to NFS clients are created using the inode numbers and things like the member volume’s MSID (which is different from the FlexGroup’s MSID).

To make this happen with volume explore, we’d be working in reverse – listing the contents of the volume’s files/folders, then using the inode number of the parent folder, listing those, etc. With high file count environments, this is basically an impossibility.

In that case, we’d need to use an NFS client to discover the file name associated with the inode number in question.

From the client, we have two commands to find an inode number for a file. In this case we know the file’s location and name:

# ls -i /mnt/client1/copy-file-finder
4133624749 /mnt/client1/copy-file-finder
# stat copy-file-finder
File: ‘copy-file-finder’
Size: 12 Blocks: 0 IO Block: 1048576 regular file
Device: 2eh/46d Inode: 4133624749 Links: 1
Access: (0555/-r-xr-xr-x) Uid: ( 1102/ prof1) Gid: (10002/ProfGroup)
Access: 2021-04-14 11:47:45.579879000 -0400
Modify: 2021-04-14 11:47:45.588875000 -0400
Change: 2021-04-14 17:34:07.364283000 -0400
Birth: -

In a packet trace, that inode number is the “fileid” field, which is found in replies, such as the GETATTR reply:

If we only know the inode number (for example, because we got it from a packet trace), we can use the number on the client to find the file name. One way is with “find”:

# find /path/to/mountpoint -inum <inodenumber>

For example:

# find /mnt/client1 -inum 4133624749
/mnt/client1/copy-file-finder
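
If you want to do the same lookup from a script (say, to check a batch of inode numbers pulled from a trace), here’s a rough Python sketch of what “find -inum” does. The function name find_by_inode is my own invention for this example, and like “find” it walks the whole tree, so it will be just as slow in a high file count environment:

```python
import os


def find_by_inode(mount_point, inode_number):
    """Return every path at or under mount_point whose inode number matches."""
    matches = []
    try:
        # Check the starting directory itself, as find does.
        if os.lstat(mount_point).st_ino == inode_number:
            matches.append(mount_point)
    except OSError:
        # The starting path does not exist or is unreadable.
        return matches

    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() reports the entry itself rather than following symlinks.
                if os.lstat(path).st_ino == inode_number:
                    matches.append(path)
            except OSError:
                # The entry vanished or is unreadable; skip it.
                continue
    return matches


# Example call, using the mount point and fileid from this post:
# find_by_inode("/mnt/client1", 4133624749)
```

A list is returned rather than a single path because hard links can give one inode several names.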

“find” can take a while – especially in a high file count environment, so we could also use XCP.

# xcp -l -match 'fileid== <inodenumber>' server1:/export

In this case:

# xcp -l -match 'fileid== 4133624749' DEMO:/FG2
XCP 1.6.1; (c) 2021 NetApp, Inc.; Licensed to Justin Parisi [NetApp Inc] until Tue Jun 22 12:34:48 2021

r-xr-xr-x --- 1102 10002 12 0 12d23h FG2/copy-file-finder

Filtered: 8173 did not match

Xcp command : xcp -l -match fileid== 4133624749 DEMO:/FG2
Stats : 8,174 scanned, 1 matched
Speed : 1.47 MiB in (2.10 MiB/s), 8.61 KiB out (12.3 KiB/s)
Total Time : 0s.
STATUS : PASSED

Hope this helps you find files in your NFS filesystem! If you have questions or comments, leave them below.

Updated FlexGroup Technical Reports now available for ONTAP 9.7!

ONTAP 9.7 is now available, so that means the TRs need to get a refresh.


There are some new features in ONTAP 9.7 for FlexGroup volumes, including:

The TRs cover those features, and there are some updates to other areas that might not have been as clear as they could have been. I also added some new use cases.

Also, check out the newest FlexGroup episode of the Tech ONTAP Podcast:

Behind the Scenes: Episode 219 – FlexVol to FlexGroup Conversion

TR Update List

Here’s the list of FlexGroup TRs that have been updated for ONTAP 9.7:

TR-4678: Data Protection and Backup – FlexGroup volumes

This covers backup and DR best practices/support for FlexGroup volumes.

TR-4557: FlexGroup Volume Technical Overview

This TR is a technical overview, which is intended just to give information on how FlexGroups work.

TR-4571-a is an abbreviated best practice guide for easy consumption.

TR-4571: FlexGroup Best Practice Guide

This is the best practices TR and also offers information on new features, including details on FlexVol to FlexGroup conversion!

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!

Updated FlexGroup Technical Reports now available for ONTAP 9.6!

ONTAP 9.6 is now available, so that means the TRs need to get a refresh.


There are some new features in ONTAP 9.6 for FlexGroup volumes, including:

  • Elastic Sizing
  • MetroCluster support
  • SMB CA shares
  • FlexGroup rename/shrink

The TRs cover those features, and there are some updates to other areas that might not have been as clear as they could have been. I also added some new use cases.

Also, check out the newest FlexGroup episode of the Tech ONTAP Podcast:

TR Update List

Here’s the list of FlexGroup TRs that have been updated for ONTAP 9.6:

TR-4678: Data Protection and Backup – FlexGroup volumes

This covers backup and DR best practices/support for FlexGroup volumes.

TR-4557: FlexGroup Volume Technical Overview

This TR is a technical overview, which is intended just to give information on how FlexGroups work.

TR-4571-a is an abbreviated best practice guide for easy consumption.

TR-4571: FlexGroup Best Practice Guide

This is the best practices TR and also offers:

  • More detailed information about high file count environments and directory structure
  • More information about maxdirsize limits
  • Information on effects of drive failures
  • Workarounds for lack of NFSv4.x ACL support
  • Member volume count considerations when dealing with small and large files
  • Considerations when deleting FlexGroup volumes (and the volume recovery queue)
  • Clarifications on requirements for available space in an aggregate
  • System Manager support updates

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!

Behind the Scenes: Episode 189 – ONTAP 9.6 Overview

Welcome to Episode 189, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we give you the lowdown on the latest ONTAP 9.6 release with ONTAP Systems Group Vice President Octavian Tanase (@octav), Senior Director of Product Management Jeff Baxter (@baxontap), and Technical Product Marketing Manager Skip Shapiro (skip.shapiro@netapp.com)! 

Join us as we talk about how ONTAP 9.6 brings more simplicity, productivity, customer use cases, data protection and security to your datacenter. 

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Sneak Peek! Elastic Sizing for FlexGroup Volumes in ONTAP 9.6

ONTAP 9.6 is coming soon and I recently posted a sneak peek for REST API support. But REST APIs aren’t the only new feature coming with the release. FlexGroup volumes are getting some new enhancements as well.

These include:

  • Ability to rename a FlexGroup volume
  • Ability to shrink a FlexGroup volume
  • Support for MetroCluster with FlexGroup volumes
  • SMB CA share support

One of the bigger features (albeit more under the radar) is a way for ONTAP to help FlexGroup volumes avoid failed writes to volumes due to being out of space – elastic sizing!


Prior to ONTAP 9.6, storage administrators had to be a bit more cognizant of member volume capacity, because if a member volume ran out of space in a FlexGroup volume, the file write would fail. Since files do not stripe across member volumes, a single file could grow over time to cause issues with space allocation.


There are a few reasons a member volume in a FlexGroup might fill up.

  • A client attempts to write a single file that exceeds the available space of a member volume. For example, a 10GB file is written to a member volume with just 9GB available.
  • A file is appended/written to over time and eventually fills up a member volume. For example, if a database resides in a member volume.
  • Snapshots eat into the active file system space available.

FlexGroup volumes generally do a good job of allocating space across member volumes, but if a workload anomaly occurs, it can throw things off (for example, if your volume is mostly a bunch of 4K files but then you zip a lot of them up and create a giant single file).

The usual remediation is to grow volumes or delete data. But admins often won’t notice the issue until it’s too late and “out of space” errors have occurred. That’s where Elastic Sizing comes in handy.

Elastic Sizing – An Airbag for your Data

One of our FlexGroup volume developers refers to elastic sizing as an “airbag” in that it’s not designed to stop you from getting into an accident, but it does help soften the landing when it happens.


In other words, it’s not going to prevent you from writing large files or from running out of space, but it is going to provide a way for those writes to complete.

Here’s how it works…

  1. When a file is written to ONTAP, the system has no idea how large that file will become. The client doesn’t know. The application usually doesn’t know. All that’s known is “hey, I want to write a file.”
  2. When a FlexGroup volume receives a write request, the new file is placed in the best available member volume based on a variety of factors, such as available capacity, inode count, time since last file creation, and member volume performance (new in ONTAP 9.6).
  3. When a file is placed, since ONTAP doesn’t know how big a file will get, it also doesn’t know if the file is going to grow to a size that’s larger than the available space. So, the write is allowed as long as we have space to allow it.
  4. If/when the member volume runs out of space, right before ONTAP sends an error to the client that we’ve run out of space, it will query the other member volumes in the FlexGroup to see if there’s any available space to borrow. If there is, ONTAP will add 1% of the volume’s total capacity (in a range of 10MB to 10GB) to the volume that is full (while taking the same amount from another member volume in the same FlexGroup volume) and then the file write will continue.
  5. While ONTAP is looking for space to borrow, the file write is paused, which will appear to the client as a performance issue. But the overall goal isn’t to finish the write fast – it’s to allow the write to finish at all. In most cases, a member volume will be large enough to provide the 10GB increment (1% of 1TB is roughly 10GB), which is often more than enough to allow a file creation to complete. In smaller member volumes, the performance impact could be greater, as the system will need to query to borrow space more often.
  6. The capacity borrowing will maintain the overall size of the FlexGroup – for example, if your FlexGroup is 40TB in size, it will remain 40TB.
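
To make the arithmetic in steps 4 and 6 concrete, here’s a small Python sketch. This is only an illustration of the numbers described above, not how ONTAP implements it. It assumes the 1% is taken from the full member volume’s capacity, it uses binary units, and the function names and the explicit choice of donor are hypothetical (the steps above only say the space comes from another member volume):

```python
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4


def borrow_increment(member_capacity_bytes):
    """1% of the member's total capacity, clamped to the 10MB to 10GB range."""
    one_percent = member_capacity_bytes // 100
    return max(10 * MB, min(one_percent, 10 * GB))


def borrow_space(member_sizes, full_member, donor_member):
    """Move one increment from the donor to the full member.

    Returns a new dict of sizes; the sum of all sizes is unchanged,
    so the FlexGroup's overall size stays the same.
    """
    increment = borrow_increment(member_sizes[full_member])
    resized = dict(member_sizes)
    resized[full_member] += increment
    resized[donor_member] -= increment
    return resized


# Example: two 10TB members (hypothetical names); "m1" is full, "m2" donates.
sizes = {"m1": 10 * TB, "m2": 10 * TB}
after = borrow_space(sizes, "m1", "m2")
print(after["m1"] - sizes["m1"], "bytes borrowed")
```

With these assumptions, a 100GB member borrows 1GB at a time and anything 1TB or larger borrows the full 10GB, which is why smaller member volumes end up asking for space more often.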


Once files are deleted or volumes are grown and space is available in that member volume again, ONTAP will re-adjust the member volumes back to their original sizes to keep space evenly distributed.

Ultimately, elastic sizing helps remove the admin overhead of managing space, as well as worrying so much about the initial sizing/deployment of a FlexGroup. You can spend less time thinking about how many member volumes you need, what size they should be, etc.

When you combine elastic sizing in ONTAP 9.6 with features like autogrow/shrink, then ONTAP can pretty much manage your capacity in most cases and help avoid emergency space issues.

Elastic sizing = new FlexGroup use cases?

Traditionally, FlexGroup volume use cases have mainly been unstructured NAS data, high file count environments, small files, and so on. I’ve cautioned people against putting larger files into FlexGroup volumes because of the aforementioned issue: large files, or files that grow, can fill up a member volume.

But now, with elastic sizing to mitigate those issues, along with volume autogrow/shrink, the FlexGroup use cases get a bit more expanded and interesting.

Why not put a workload with large files/files that grow on a FlexGroup now? In fact, with SMB support for Continuously Available shares for Hyper-V and SQL server, there is further proof that FlexGroup volumes are becoming more viable solutions for a variety of workloads.

You can find the latest podcast for FlexGroup volumes here:

Behind the Scenes: Episode 188 – FlexGroup Volumes Update

Welcome to Episode 188, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we deliver a long overdue update to Episode 46 of the Tech ONTAP podcast, where we first covered FlexGroup volumes.

We bring back lead developer Richard Jernigan – as well as Technical Director Dan Tennant – to discuss what’s new, what’s changed and what’s coming down the line for FlexGroup volumes.


New White Paper! Media and Entertainment Workloads using NetApp ONTAP! #NAB2019


Every year, the National Association of Broadcasters puts on a show to deliver the latest and greatest in media and entertainment content and technology solutions.

This year, I decided to piggyback on the show and put out a new white paper about how NetApp ONTAP works with media and entertainment workloads. Included in this white paper:

  • DreamWorks Animation case study on NetApp ONTAP
  • Media/entertainment benchmark numbers on NetApp FlexGroup volumes
  • Why you’d want to use NetApp ONTAP

You can find the white paper here:

https://www.netapp.com/us/media/wp-7301.pdf

Leave your feedback in the comments!

Behind the Scenes: Episode 182 – NetApp on NetApp: FlexGroup Volumes and ActiveIQ

Welcome to Episode 182, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invite in the guys from Customer One, who operate the NetApp on NetApp program. NetApp on NetApp is a program where we leverage the latest NetApp technologies within our own organizations. Join Eduardo Rivera (@mredrivera) and Faisal Salaam (https://www.linkedin.com/in/faisal-salam-754a13104/) as we discuss how NetApp is using FlexGroup volumes to power Active IQ.


New and updated FlexGroup Technical Reports now available for ONTAP 9.4!

ONTAP 9.4 is now available, so that means the TRs need to get a refresh.


Here’s what I’ve done for FlexGroup in ONTAP 9.4…

New Tech Report!

First, I moved the data protection section of the best practices TR (TR-4571) into its own dedicated backup and data protection TR, which can be found here:

TR-4678: Data Protection and Backup – FlexGroup volumes

Why? Well, that section is going to grow larger and larger as we add more data protection and backup functionality, so it made sense to proactively create a new one.

Updated TRs!

TR-4557 got an update of mostly just what’s new in ONTAP 9.4. That TR is a technical overview, which is intended just to give information on how FlexGroups work. The new feature payload for FlexGroup volumes in ONTAP 9.4 included:

  • QoS minimums and Adaptive QoS
  • FPolicy and file audit
  • SnapDiff support

TR-4571 is the best practices TR and got the brunt of the updates. Aside from details about new features, I added:

  • More detailed information about high file count environments and directory structure
  • More information about maxdirsize limits
  • Information on effects of drive failures
  • Workarounds for lack of NFSv4.x ACL support
  • Member volume count considerations when dealing with small and large files
  • Considerations when deleting FlexGroup volumes (and the volume recovery queue)
  • Clarifications on requirements for available space in an aggregate
  • System Manager support updates

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!

Behind the Scenes: Episode 88 – Migrating to ONTAP, FlexGroup volumes

Welcome to Episode 88, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invited Hadrian Baron of NetApp’s migration team to talk about moving from 7-Mode and competitor storage over to clustered ONTAP, as well as the advancements made in the simplicity and speed of moving there. We also discuss multiprotocol NAS challenges and FlexGroup volumes and their benefits.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.


You can listen here:

You can also now find us on YouTube. (The uploads are sporadic and we don’t go back prior to Episode 85):