Behind the Scenes: Episode 119 – NFSv4.x Deep Dive

Welcome to the Episode 119, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

tot-gopher

This week on the podcast, I pass the NFS torch on to Chris Hurley (@averageguyx) as we talk about NFSv4.x in ONTAP and why you should be thinking of making the move from NFSv3 in the near future.

Chris Hurley’s blog can be found at: http://averageguyx.blogspot.com

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

 

Advertisements

ONTAP 9.3 NFS sneak preview: Mount and security tracing

aid1871175-v4-728px-trace-step-6-version-2

ONTAP 9.3 is on its way, and with it comes some long-awaited new functionality for NFS debugging, including a way to map volumes to IP addresses!

Mount trace

In ONTAP 7-Mode, you could trace mount requests with an option “nfs.mountd.trace.” That didn’t make its way into ONTAP operating in cluster mode until ONTAP 9.3. I covered a long and convoluted workaround in How to trace NFSv3 mount failures in clustered Data ONTAP.

Now, you can set different levels of debugging for mount traces via the cluster CLI without having to jump through hoops. As a bonus, you can see which data LIF has mounted to which client, to which volume!

To enable it, you would use the following diag level commands:

::*> debug sktrace tracepoint modify -node [node] -module MntTrace -level [0-20] -enabled true

::*> debug sktrace tracepoint modify -node [node] -module MntDebug -level [0-20] -enabled true

When enabled, ONTAP will log the mount trace modules to the sktrace log file, which is located at /mroot/etc/mlog/skrace.log. This file can be accessed via systemshell, or via the SPI interface. Here are a few of the logging levels:

4 – Info
5 – Error
8 – Debug

When you set the trace level to 8, you can see successful mounts, as well as failures. This gives volume info, client IP and data LIF IP. For example, this mount was done from client 10.63.150.161 to data LIF 10.193.67.218 of vserverID 10 on the /FGlocal path:

cluster::*> debug log sktrace show -node node2 -module-level MntTrace_8
Time TSC CPU:INT Module_Level
--------------------- ------------------------ ------- -------------------
 LogMountTrace: Mount access granted for Client=10.63.150.161
 VserverID=10 Lif=10.193.67.218 Path=/FGlocal

With that info, we can run the following command on the cluster to find the SVM and volume:

::*> net int show -address 10.193.67.218 -fields lif
 (network interface show)
vserver lif
------- ---------
DEMO    10g_data1

::*> volume show -junction-path /FGlocal -fields volume
vserver volume
------- ---------------
DEMO    flexgroup_local

The mount trace command can also be used to figure out why mount failures may have occurred from clients. We can also leverage performance information from OnCommand Performance Manager (top clients) and per-client stats to see what volumes might be seeing large performance increases and work our way backward to see what clients are mounting what LIFs, nodes, volumes, etc. with mount trace enabled.

Security trace (sectrace)

In ONTAP 9.2 and prior, you could trace CIFS/SMB permission issues only, using “sectrace” commands. Starting in ONTAP 9.3, you can now use sectrace on SMB and/or NFS. This is useful to troubleshoot why someone might be having access to a file or folder inside of a volume.

With the command, you can filter on:

  • Client IP
  • Path
  • Windows or UNIX name

Currently, sectrace is not supported on FlexGroup volumes, however.

cluster::*> sectrace filter create -vserver DEMO -index 1 -protocols nfs -trace-allow yes -enabled enabled -time-enabled 60

Warning: Security tracing for NFS will not be done for the following FlexGroups because Security tracing for NFS is not supported for FlexGroups: TechONTAP,flexgroupDS,flexgroup_16,flexgroup_local.
Do you want to continue? {y|n}: y

Then, I tested a permission issue.

# mkdir testsec
# chown 1301 testsec/
# chmod 700 testsec
# su user
$ cd /mnt/flexvol/testsec
bash: cd: /mnt/flexvol/testsec: Permission denied

And this was the result:

cluster::*> sectrace trace-result show -vserver DEMO

Node            Index Filter Details             Reason
--------------- ----- -------------------------- ------------------------------
node2           1     Security Style: UNIX       Access is allowed because the
                      permissions                user has UNIX root privileges
                                                 while creating the directory.
                                                 Access is granted for:
                                                 "Append"
                      Protocol: nfs
                      Volume: flexvol
                      Share: -
                      Path: /testsec
                      Win-User: -
                      UNIX-User: 0
                      Session-ID: -
node2           1     Security Style: UNIX       Access is allowed because the
                      permissions                user has UNIX root privileges
                                                 while setting attributes.
                      Protocol: nfs
                      Volume: flexvol
                      Share: -
                      Path: /testsec
                      Win-User: -
                      UNIX-User: 0
                      Session-ID: -
node2           1     Security Style: UNIX       Access is allowed because the
                                                 permissions user has UNIX root privileges
                                                 while setting attributes.
                      Protocol: nfs
                      Volume: flexvol
                      Share: -
                      Path: /testsec
                      Win-User: -
                      UNIX-User: 0
                      Session-ID: -
node2           1     Security Style: UNIX       Access is not granted for:
                      permissions                "Modify", "Extend", "Delete"
                      Protocol: nfs
                      Volume: flexvol
                      Share: -
                      Path: /
                      Win-User: -
                      UNIX-User: 7041
                      Session-ID: -
node2           1     Security Style: UNIX       Access is not granted for:
                      permissions                "Lookup", "Modify", "Extend",
                                                 "Delete", "Read"
                      Protocol: nfs
                      Volume: flexvol
                      Share: -
                      Path: /testsec
                      Win-User: -
                      UNIX-User: 7041
                      Session-ID: -

As you can see above, the trace output gives a very clear picture about who tried to access the folder, which folder had the error and why the permission issued occurred.

Bonus Round: Block Size Histograms!

Now, this isn’t really a “new in ONTAP 9.3” thing; in fact, I found it as far back as 9.1. I just hadn’t ever noticed it before. But in ONTAP, you can see the block sizes for NFS and CIFS/SMB operations in the CLI with the following command:

cluster::> statistics-v1 protocol-request-size show -node nodename

When you run this, you’ll see the average request size, the total count and a breakdown of what block sizes are being written to the cluster node. This can help you understand your NAS workloads better.

For example, this node runs mostly a VMware datastore workload

cluster::> statistics-v1 protocol-request-size show -node node2 -stat-type nfs3_read

Node: node2
Stat Type: nfs3_read
                     Value    Delta
--------------       -------- ----------
Average Size:        30073    -
Total Request Count: 92633    -
0-511:                1950    -
512-1023:                0    -
1K-2047:              1786    -
2K-4095:              1253    -
4K-8191:             18126    -
8K-16383:              268    -
16K-32767:            4412    -
32K-65535:             343    -
64K-131071:           1560    -
128K - :             62935    -

When you run the command again, you get a delta from the last time you ran it.

If you’re interested in more ONTAP 9.3 feature information, check out Jeff Baxter’s blog here:

https://blog.netapp.com/announcing-netapp-ontap-9-3-the-next-step-in-modernizing-your-data-management/

You can also see me dress up all fancy and break down the new features at a high level here:

I’ll also be doing more detailed blogs on new features as we get closer to the release.

New dedicated NFS Kerberos TR is now available!

When I first started as the NFS TME about 5 years ago, I took TR-4073 and expanded upon it to make it into a larger solution document that covered LDAP, NFSv4.x and Kerberos. As a result, it ballooned from 50-60 pages to 275 pages.

It seemed like a good idea at the time.

¯\_(ツ)_/¯

What I discovered was that while people didn’t fully understand Kerberos, LDAP and NFSv4, many also just wanted something to help them set it up, rather than a manifesto on all the quirks and how it works. So, I decided to do that.

some-of-the-best-recurring-gags-in-family-guy-10-photos-10

TR-4616 is a new TR that is dedicated solely to a simplified setup of NFS Kerberos in ONTAP. The TR is a total of 43 pages, and only 10-15 pages of that is the actual set up.

To make it simpler, I did the following:

  • Limited the scope of setup to ONTAP 9.2 and later, Microsoft Windows 2012/2016, RHEL/Centos 6.x and 7.x
  • Less explanations of “what,” more on “how”
  • Fewer screenshots
  • No LDAP/NFS specific information not related to Kerberos

Have a look and let me know what you think!

Using NFSv4.x ACLs with NFSv3 in NetApp ONTAP? You betcha!

One thing I’ve come to realize from being in IT so long is that you should be constantly learning new things. If you aren’t, it’s not because you’re smart or because you know everything; it’s because you’re stagnating.

So, I was not surprised when I heard it was possible to apply NFSv4.x ACLs to files and folders and then mount them via NFSv3 and have the ACLs still work! I already knew that you could do audit ACEs from NFSv4.x for NFSv3 (covered in TR-4067), but had no idea this could extend into the permissions realm. If so, this solves a pretty big problem with NFSv3 in general, where your normal permissions are limited only to owner, group and then everyone else. That makes it hard to do any sort of granular access control for NFSv3 mounts, presents problems for some environments.

It also allows you to keep using NFSv3 for your workloads, whether for legacy application or general performance concerns. NFSv4.x has a lot of advantages over NFSv3, but if you don’t need stateful operations or the NFSv4.x features, or integrated locking, then you are safe to stay with NFSv3.

So, is it possible to use NFSv4.x ACLs with NFSv3 objects?

You betcha!

fargo-film-marge.jpg

The method for doing this is pretty straightforward.

  1. Configure and enable NFSv4.x in ONTAP and on your client
  2. Enable NFSv4.x ACL support in ONTAP
  3. Mount the export via NFSv4.x
  4. Apply the NFSv4.x ACLs
  5. Unmount and then remount the export using NFSv3 and test it out!

 

Configuring NFSv4.x

When you’re setting up NFSv4.x in an environment, there are a few things to keep in mind:

  • Client and NFS server support for NFSv4.x
  • NFS utilities installed on clients (for NFSv4.x functionality)
  • NFSv4.x configured on the client in idmapd.conf
  • NFSv4.x configured on the server in ONTAP (ACLS allowed)
  • Export policies and rules configured in ONTAP
  • Ideally, a name service server (like LDAP) to negotiate the server/client conversation of user identities

One of the reasons NFS4.x is more secure than NFSv3 is the use of user ID strings (such as user@domain.com) to help limit cases of user spoofing in NFS conversations. This ID string is required to be case-sensitive. If the string doesn’t match on both client and server, then the NFSv4.x mounts will get squashed to the defined “nobody” user in the NFSv4.x client. One of the more common issues seen with NFSv4.x mounts is the “nobody:nobody” user and group on files and folders. One of the most common causes of this is when a domain string is mismatched on the client and server.

In a client that domain string is defined in the idmapd.conf file. Sometimes, it will default to the DNS domain. In ONTAP, the v4-id-domain string should be configured to the same value on the client to provide proper NFSv4.x authentication.

Other measures, such as Kerberos encryption, can help lock the NFS conversations down further. NFSv4.x ACLs are a way to ensure that files and folders are only seen by those entities that have been granted access and is considered to be authorization, or, what you are allowed to do once you authenticate. For more complete steps on setting up NFSv4.x, see TR-4067 and TR-4073.

However, we’re only setting up NFSv4.x to allow us to configure the ACLs…

What are NFSv4.x ACLs?

NFSv4.x ACLs are a way to apply granular permissions to files and folders in NFS outside of the normal “read/write/execute” of NFSv3, and across more objects than simple “owner/group/everyone.” NFSv4.x ACLs allow administrators to set permissions for multiple users and groups on the same file or folder and treat NFS ACLs more like Windows ACLs. For more information on NFSv4.x ACLs, see:

http://wiki.linux-nfs.org/wiki/index.php/ACLs

https://linux.die.net/man/5/nfs4_acl

http://www.netapp.com/us/media/tr-4067.pdf

NFSv3 doesn’t have this capability by default. The only way to get more granular ACLs in NFSv3 natively is to use POSIX ACLs, which ONTAP doesn’t support.

Once you’ve enabled ACLs in ONTAP (v4.0-acl and/or v4.1-acl options), you can mount an NFS export via NFSv4.x and start applying NFSv4.x ACLs.

In my environment, I mounted a homedir volume and then set up an ACL on a file owned by root for a user called “prof1” using nfs4_setfacl -e (which allows you to edit a file rather than have to type in a long command).

[root@centos7 /]# mount demo:/home /mnt
[root@centos7 /]# mount | grep mnt
demo:/home on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.193.67.225,local_lock=none,addr=10.193.67.237)

The file lives in the root user’s homedir. The root homedir is set to 755, which means anyone can read them, but no one but the owner (root) can write to them.

drwxr-xr-x 2 root root 4096 Jul 13 10:42 root

That is, unless, I set NFSv4.x ACLs to allow a user full control:

[root@centos7 mnt]# nfs4_getfacl /mnt/root/file
A::prof1@ntap.local:rwaxtTnNcCy
A::OWNER@:rwaxtTnNcCy
A:g:GROUP@:rxtncy
A::EVERYONE@:rxtncy

I can also see those permissions from the ONTAP CLI:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 755
 UNIX Mode Bits in Text: rwxr-xr-x
 ACLs: NFSV4 Security Descriptor
 Control:0x8014
 DACL - ACEs
 ALLOW-user-prof1-0x1601bf
 ALLOW-OWNER@-0x1601bf
 ALLOW-GROUP@-0x1200a9-IG
 ALLOW-EVERYONE@-0x1200a9

I can also expand the mask to translate the hex:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file -expand-mask true

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: 0x20
 ...0 .... .... .... = Offline
 .... ..0. .... .... = Sparse
 .... .... 0... .... = Normal
 .... .... ..1. .... = Archive
 .... .... ...0 .... = Directory
 .... .... .... .0.. = System
 .... .... .... ..0. = Hidden
 .... .... .... ...0 = Read Only
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 755
 UNIX Mode Bits in Text: rwxr-xr-x
 ACLs: NFSV4 Security Descriptor
 Control:0x8014

1... .... .... .... = Self Relative
 .0.. .... .... .... = RM Control Valid
 ..0. .... .... .... = SACL Protected
 ...0 .... .... .... = DACL Protected
 .... 0... .... .... = SACL Inherited
 .... .0.. .... .... = DACL Inherited
 .... ..0. .... .... = SACL Inherit Required
 .... ...0 .... .... = DACL Inherit Required
 .... .... ..0. .... = SACL Defaulted
 .... .... ...1 .... = SACL Present
 .... .... .... 0... = DACL Defaulted
 .... .... .... .1.. = DACL Present
 .... .... .... ..0. = Group Defaulted
 .... .... .... ...0 = Owner Defaulted

DACL - ACEs
 ALLOW-user-prof1-0x1601bf
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .1.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...1 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...1 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .1.. = Append
 .... .... .... .... .... .... .... ..1. = Write
 .... .... .... .... .... .... .... ...1 = Read

ALLOW-OWNER@-0x1601bf
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .1.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...1 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...1 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .1.. = Append
 .... .... .... .... .... .... .... ..1. = Write
 .... .... .... .... .... .... .... ...1 = Read

ALLOW-GROUP@-0x1200a9-IG
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .0.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...0 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...0 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .0.. = Append
 .... .... .... .... .... .... .... ..0. = Write
 .... .... .... .... .... .... .... ...1 = Read

ALLOW-EVERYONE@-0x1200a9
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .0.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...0 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...0 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .0.. = Append
 .... .... .... .... .... .... .... ..0. = Write
 .... .... .... .... .... .... .... ...1 = Read

In the above, I gave prof1 full control over the file. Then, I mounted via NFSv3:

[root@centos7 /]# mount -o nfsvers=3 demo:/home /mnt
[root@centos7 /]# mount | grep mnt
demo:/home on /mnt type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.193.67.219,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.193.67.219)

When I become a user that isn’t on the NFSv4.x ACL, I can’t write to the file:

[root@centos7 /]# su student1
sh-4.2$ cd /mnt/root
sh-4.2$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Jul 13 10:42 .
drwxrwxrwx 11 root root 4096 Jul 10 10:04 ..
-rwxr-xr-x 1 root bin 0 Jul 13 10:23 file
-rwxr-xr-x 1 root root 0 Mar 29 11:37 test.txt

sh-4.2$ touch file
touch: cannot touch ‘file’: Permission denied
sh-4.2$ rm file
rm: remove write-protected regular empty file ‘file’? y
rm: cannot remove ‘file’: Permission denied

When I change to the prof1 user, I have access to do whatever I want, even though the mode bit permissions in v3 say I can’t:

[root@centos7 /]# su prof1
sh-4.2$ cd /mnt/root
sh-4.2$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Jul 13 10:42 .
drwxrwxrwx 11 root root 4096 Jul 10 10:04 ..
-rwxr-xr-x 1 root bin 0 Jul 13 10:23 file
-rwxr-xr-x 1 root root 0 Mar 29 11:37 test.txt

sh-4.2$ vi file
sh-4.2$ cat file
NFSv4ACLS!

When I do a chmod, however, nothing seems to change from the NFSv4 ACL for the user. I set 700 on the file, which shows up in NFSv3 mode bits:

sh-4.2$ chmod 700 file
sh-4.2$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Jul 13 10:42 .
drwxrwxrwx 11 root root 4096 Jul 10 10:04 ..
-rwx------ 1 root bin 11 Aug 11 09:58 file
-rwxr-xr-x 1 root root 0 Mar 29 11:37 test.txt

But notice how the prof1 user still has full control:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 700
 UNIX Mode Bits in Text: rwx------
 ACLs: NFSV4 Security Descriptor
 Control:0x8014
 DACL - ACEs
 ALLOW-user-prof1-0x1601bf
 ALLOW-OWNER@-0x1601bf
 ALLOW-GROUP@-0x120088-IG
 ALLOW-EVERYONE@-0x120088

This is because of an ONTAP option known as “ACL Preservation.”

ontap9-tme-8040::*> nfs show -vserver DEMO -fields v4-acl-preserve
vserver v4-acl-preserve
------- ---------------
DEMO enabled

When I set the option to enabled, the NFSv4.x ACLs will survive mode bit changes. If I disable the option, the ACLs get blown away when a chmod is done:

ontap9-tme-8040::*> nfs modify -vserver DEMO -v4-acl-preserve disabled

ontap9-tme-8040::*> nfs show -vserver DEMO -fields v4-acl-preserve
vserver v4-acl-preserve
------- ---------------
DEMO disabled


[root@centos7 root]# chmod 755 file

And the ACLs are wiped out:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 755
 UNIX Mode Bits in Text: rwxr-xr-x
 ACLs: -

I’d personally recommend setting that option to “enabled” if you want to do v3 mounts with v4.x ACLs.

So, there you have it… a new way to secure your NFSv3 mounts!

NFS Kerberos in ONTAP Primer

Fun fact!

Kerberos was named after Cerberus, the hound of Hades, which protected the gates of the underworld with its three heads of gnashing teeth.

cerberos

Kerberos in IT security isn’t a whole lot different; it’s pretty effective at stopping intruders and is literally a three-headed monster.

In my day to day role as a Technical Marketing Engineer for NFS, I find that one of the most challenging questions I get is regarding NFS mounts using Kerberos. This is especially true now, as IT organizations are focusing more and more on securing their data and Kerberos is one way to do that. CIFS/SMB already does a nice job of this and it’s pretty easily integrated without having to do a ton on the client or storage side.

With NFS Kerberos, however, there are a ton of moving parts and not a ton of expertise that spans those moving parts. Think for a moment what all is involved here when dealing with ONTAP:

  • DNS
  • KDC server (Key Distribution Center)
  • Client/principal
  • NFS server/principal
  • ONTAP
  • NFS
  • LDAP/name services

This blog post isn’t designed to walk you through all those moving parts; that’s what TR-4073 was written for. Instead, this blog is going to simply walk through the workflow of what happens during an NFS mount using Kerberos and where things can fail/common failure scenarios. This post will focus on Active Directory KDCs, since that’s what I see most and get the most questions on. Other UNIX-based KDCs are either not as widely used, or the admins running them are ninjas that never need any help. 🙂

Common terms

First, let’s cover a few common terms used in NFS Kerberos.

Storage Virtual Machine (SVM)

This is what clustered ONTAP uses to present NAS and SAN storage to clients. SVMs act as tenants within a cluster. Think of them as “virtualized storage blades.”

Key Distribution Center (KDC)

The Kerberos ticket headquarters. This stores all the passwords, objects, etc. for running Kerberos in an environment. In Active Directory, domain controllers are KDCs and replicate to other DCs in the environment, which makes Active Directory an ideal platform to run Kerberos on due to ease of use and familiarity. As a bonus, Active Directory is already primed with UNIX attributes for Identity Management with LDAP. (Note: Windows 2012 has UNIX attributes by default; prior to 2012, you had to manually extend the schema.)

Kerberos principals

Kerberos principals are objects within a KDC that can have tickets assigned. Users can own principals. Machine accounts can own principals. However, simply creating a user or machine account doesn’t mean you have created a principal. Those are stored within the object’s LDAP schema attributes in Active Directory. Generally speaking, it’s one of either:

  • servicePrincipalName (SPN)
  • userPrincipalName (UPN)

These get set when adding computers to a domain (including joining Linux clients), as well as when creating new users (every user gets a UPN). Principals include three different components.

  1. Primary – this defines the type of principal (usually a service such as ldap, nfs, host, etc) and is followed by a “/”; Not all principals have primary components. For example, most users are simply user@REALM.COM.
  2. Secondary – this defines the name of the principal (such as jimbob)
  3. Realm – This is the Kerberos realm and is usually defined in ALL CAPS and is the name of the domain your principal was added into (such as CONTOSO.COM)

Keytabs

The keytab file allows a client or server that is participating in an NFS mount to use their keytab to generate AS (authentication service) ticket requests. Think of this as the principal “logging in” to the KDC, similar to what you’d do with a username and password. Keytab files can make their way to clients one of two ways.

  1. Manually creating and copying the keytab file to the client (old school)
  2. Using the domain join tool of your choice (realmd, net ads/samba, adcli, etc.) on the client to automatically negotiate the keytab and machine principals on the KDC (recommended)

Keytab files, when created using the domain join tools, will create multiple entries for Kerberos principals. Generally, this will include a service principal name (SPN) for host/shortname@REALM.COM, host/fully.qualified.name@REALM.COM and a UPN for the machine account such as MACHINE$@REALM.COM. The auto-generated keytabs will also include multiple entries for each principal with different encryption types (enctypes). The following is an example of a CentOS 7 box’s keytab joined to an AD domain using realm join:

# klist -kte
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (arcfour-hmac)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (arcfour-hmac)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (arcfour-hmac)

Encryption types (enctypes)

Encryption types (or enctypes) are the level of encryption used for the Kerberos conversation. The client and KDC will negotiate the level of enctype used. The client will tell the KDC “hey, I want to use this list of enctypes. Which do you support?” and the KDC will respond “I support these, in order of strongest to weakest. Try using the strongest first.” In the example above, this is the order of enctype strength, from strongest to weakest:

  • AES-256
  • AES-128
  • ARCFOUR-HMAC
  • DES-CBC-MD5
  • DES-CBC-CRC

The reason a keytab file would add weaker enctypes like DES or ARCFOUR is for backwards compatibility. For example, Windows 2008 DCs don’t support AES enctypes. In some cases, the enctypes can cause Kerberos issues due to lack of support. Windows 2008 and later don’t support DES unless you explicitly enable it. ARCFOUR isn’t supported in clustered ONTAP for NFS Kerberos. In these cases, it’s good to modify the machine accounts to strictly define which enctypes to use for Kerberos.

What you need before you try mounting

This is a quick list of things that have to be in place before you can expect Kerberos with NFS to work properly. If I left something out, feel free to remind me in the comments. There’s so much info involved that I occasionally forget some things. 🙂

KDC and client – The KDC is a given – in this case, Active Directory. The client would need to have some things installed/configured before you try to join it, including a valid DNS server configuration, Kerberos utilities, etc. This varies depending on client and would be too involved to get into here. Again, TR-4073 would be a good place to start.

DNS entries for all clients and servers participating in the NFS Kerberos operation – this includes forward and reverse (PTR) records for the clients and servers. The DNS friendly names *must* match the SPN names. If they don’t, then when you try to mount, the DNS lookup will file the name hostname1 and use that to look up the SPN host/hostname1. If the SPN was called nfs/hostname2, then the Kerberos attempt will fail with “PRINCIPAL_UNKNOWN.” This is also true for Kerberos in CIFS/SMB environments. In ONTAP, a common mistake people make is they name the CIFS server or NFS Kerberos SPN as the SVM name (such as SVM1), but their DNS names are something totally different (such as cifs.domain.com).

Valid Kerberos SPNs and UPNs – When you join a Linux client to a domain, the machine account and SPNs are automatically created. However, the UPN is not created. Having no UPN on a machine account can create issues with some Linux services that use Kerberos keytab files to authenticate. For example, RedHat’s LDAP service (SSSD) can fail to bind if using a Kerberos service principal in the configuration via the ldap_sasl_authid option. The error you’d see would be “PRINCIPAL_UNKNOWN” and would drive you batty because it would be using a principal you *know* exists in your environment. That’s because it’s trying to find the UPN, not the SPN. You can manage the SPN and UPN via the Active Directory attributes tab in the advanced features view. You can query whether SPNs exist via the setspn command (use /q to query by SPN name) in the CLI or PowerShell.

PS C:\> setspn /q host/centos7.ntap.local
Checking domain DC=NTAP,DC=local
CN=CENTOS7,CN=Computers,DC=NTAP,DC=local
 HOST/centos7.ntap.local
 HOST/CENTOS7

Existing SPN found!

You can view a user’s UPN and SPN with the following PowerShell command:

PS C:\> Get-ADUser student1 -Properties UserPrincipalName,ServicePrincipalName

DistinguishedName : CN=student1,CN=Users,DC=NTAP,DC=local
Enabled : True
GivenName : student1
Name : student1
ObjectClass : user
ObjectGUID : d5d5b526-bef8-46fa-967b-00ebc77e468d
SamAccountName : student1
SID : S-1-5-21-3552729481-4032800560-2279794651-1108
Surname :
UserPrincipalName : student1@NTAP.local

And a machine account’s with:

PS C:\> Get-ADComputer CENTOS7$ -Properties UserPrincipalName,ServicePrincipalName

DistinguishedName : CN=CENTOS7,CN=Computers,DC=NTAP,DC=local
DNSHostName : centos7.ntap.local
Enabled : True
Name : CENTOS7
ObjectClass : computer
ObjectGUID : 3a50009f-2b40-46ea-9014-3418b8d70bdb
SamAccountName : CENTOS7$
ServicePrincipalName : {HOST/centos7.ntap.local, HOST/CENTOS7}
SID : S-1-5-21-3552729481-4032800560-2279794651-1140
UserPrincipalName : HOST/centos7.ntap.local@NTAP.LOCAL

Network Time Protocol (NTP) – With Kerberos, there is a 5 minute default time skew window. If a client and server/KDC’s time is outside of that window, Kerberos requests will fail with “Access denied” and you’d see time skew errors in the cluster logs. This KB covers it nicely:

https://kb.netapp.com/support/s/article/ka11A0000001V1YQAU/Troubleshooting-Workflow-CIFS-Authentication-failures?language=en_US

A common issue I’ve seen with this is time zone differences or daylight savings issues. I’ve often seen the wall clock time look identical on server and client, but the time zones or month/date differ, causing the skew.

The NTP requirement is actually a “make sure your time is up to date and in sync on everything” requirement, but NTP makes that easier.

Kerberos to UNIX name mappings – In ONTAP, we authenticate via name mappings not only for CIFS/SMB, but also for Kerberos. When a client attempts to send an authentication request to the cluster for an AS request or ST (service ticket) request, it has to map to a valid UNIX user. The UNIX user mapping will depend on what type of principal is coming in. If you don’t have a valid name mapping rule, you’d see something like this in the event log:

5/16/2017 10:24:23 ontap9-tme-8040-01
 ERROR secd.nfsAuth.problem: vserver (DEMO) General NFS authorization problem. Error: RPC accept GSS token procedure failed
 [ 8 ms] Acquired NFS service credential for logical interface 1034 (SPN='nfs/demo.ntap.local@NTAP.LOCAL').
 [ 11] GSS_S_COMPLETE: client = 'CENTOS7$@NTAP.LOCAL'
 [ 11] Trying to map SPN 'CENTOS7$@NTAP.LOCAL' to UNIX user 'CENTOS7$' using implicit mapping
 [ 12] Using a cached connection to oneway.ntap.local
**[ 14] FAILURE: User 'CENTOS7$' not found in UNIX authorization source LDAP.
 [ 15] Entry for user-name: CENTOS7$ not found in the current source: LDAP. Ignoring and trying next available source
 [ 15] Entry for user-name: CENTOS7$ not found in the current source: FILES. Entry for user-name: CENTOS7$ not found in any of the available sources
 [ 15] Unable to map SPN 'CENTOS7$@NTAP.LOCAL'
 [ 15] Unable to map Kerberos NFS user 'CENTOS7$@NTAP.LOCAL' to appropriate UNIX user

For service principals (SPNS) such as host/name or nfs/name, the mapping would try to default to primary/, so you’d need a UNIX user named host or nfs on the local SVM or in a name service like LDAP. Otherwise, you can create static krb-unix name mappings in the SVM to map to whatever user you like. If you want to use wild cards, regex, etc. you can do  that. For example, this name mapping rule will map all SPNs coming in as {MACHINE}$@REALM.COM to root.

cluster::*> vserver name-mapping show -vserver DEMO -direction krb-unix -position 1

Vserver: DEMO
 Direction: krb-unix
 Position: 1
 Pattern: (.+)\$@NTAP.LOCAL
 Replacement: root
IP Address with Subnet Mask: -
 Hostname: -

To test the mapping, use diag priv:

cluster::*> diag secd name-mapping show -node node1 -vserver DEMO -direction krb-unix -name CENTOS7$@NTAP.LOCAL

'CENTOS7$@NTAP.LOCAL' maps to 'root'

You can map the SPN to root, pcuser, etc. – as long as the UNIX user exists locally on the SVM or in the name service.

The workflow

Now that I’ve gotten some basics out of the way (and if you find that I’ve missed some, add to the comments), let’s look at how the workflow for an NFS mount using Kerberos would work, end to end. This is assuming we’ve configured everything correctly and are ready to mount, and that all the export policy rules allow the client to mount NFSv4 and Kerberos. If a mount fails, always check your export policy rules first.

Some common export policy issues include:

  • The export policy doesn’t have any rules configured
  • The vserver/SVM root volume doesn’t allow read access in the export policy rule for traversal of the / mount point in the namespace
  • The export policy has rules, but they are either misconfigured (clientmatch is wrong, read access disallowed, NFS protocol or auth method is disallowed) or they aren’t allowing the client to access the mount (Run export-policy rule show -instance)
  • The wrong/unexpected export policy has been applied to the volume (Run volume show -fields policy)

What’s unfortunate about trying to troubleshoot mounts with NFS Kerberos involved is that, regardless of the failures happening, the client will report:

mount.nfs: access denied by server while mounting

It’s a generic error and isn’t really helpful in diagnosing the issue.

In ONTAP, there is a command in admin privilege to check the export policy access for the client for troubleshooting purposes. Be sure to use it to rule out export issues.

cluster::> export-policy check-access -vserver DEMO -volume flexvol -client-ip 10.193.67.225 -authentication-method krb5 -protocol nfs4 -access-type read-write
 Policy Policy Rule
Path Policy Owner Owner Type Index Access
----------------------------- ---------- --------- ---------- ------ ----------
/ root vsroot volume 1 read
/flexvol default flexvol volume 1 read-write
2 entries were displayed.

The mount command is issued.

In my case, I use NFSv4.x, as that’s the security standard. Mounting without specifying a version will default to the highest NFS version allowed by the client and server, via a client-server negotiation. If NFSv4.x is disabled on the server, the client will fall back to NFSv3.

# mount -o sec=krb5 demo:/flexvol /mnt

Once the mount command gets issued and Kerberos is specified, a few (ok, a lot of) things happen in the background.

While this stuff happens, the mount command will appear to “hang” as the client, KDC and server suss out if you’re going to be allowed access.

  • DNS lookups are done for the client hostname and server hostname (or reverse lookup of the IP address) to help determine what names are going to be used. Additionally, SRV lookups are done for the LDAP service and Kerberos services in the domain. DNS lookups are happening constantly through this process.
  • The client uses its keytab file to send an authentication service request (AS-REQ) to the KDC, along with what enctypes it has available. The KDC then verifies if the requested principal actually exists in the KDC and if the enctypes are supported.
  • If the enctypes are not supported, or if the principal exists, or if there are DUPLICATE principals, the AS-REQ fails. If the principal exists, the KDC will send a successful reply.
  • Then the client will send a Ticket Granting Service request (TGS-REQ) to the KDC. This request is an attempt to look up the NFS service ticket named nfs/name. The name portion of the ticket is generated either via what was typed into the mount command (ie, demo) or via reverse lookup (if we typed in an IP address to mount). The TGS-REQ will be used later to allow us to obtain a service ticket (ST). The TGS will also negotiate supported enctypes for later. If the TGS-REQ between the KDC and client negotiates an enctype that ONTAP doesn’t support (for example, ARCFOUR), then the mount will fail later in process.
  • If the TGS-REQ succeeds, a TGS-REP is sent. If the KDC doesn’t support the requested enctypes from the client, we fail here. If the NFS principal doesn’t exist (remember, it has to be in DNS and match exactly), then we fail.
  • Once the TGS is acquired by the NFS client, it presents the ticket to the NFS server in ONTAP via a NFS NULL call. The ticket information includes the NFS service SPN and the enctype used. If the NFS SPN doesn’t match what’s in “kerberos interface show,” the mount fails. If the enctype presented by the client isn’t supported or is disallowed in “permitted enctypes” on the NFS server, the request fails. The client would show “access denied.”
  • The NFS service SPN sent by the client is presented to ONTAP. This is where the krb-unix mapping takes place. ONTAP will first see if a user named “nfs” exists in local files or name services (such as LDAP, where a bind to the LDAP server and lookup takes place). If the user doesn’t exist, it will then check to see if any krb-unix name mapping rules were set explicitly. If no rules exist and mapping fails, ONTAP logs an error on the cluster and the mount fails with “Access denied.” If the mapping works, the mount procedure moves on to the next step.
  • After the NFS service ticket is verified, the client will send SETCLIENTID calls and then the NFSv4.x mount compound call (PUTROOTFH | GETATTR). The client and server are also negotiating the name@domainID string to make sure they match on both sides as part of NFSv4.x security.
  • Then, the client will try to run a series of GETATTR calls to “/” in the path. If we didn’t allow “read” access in the policy rule for “/” (the vsroot volume), we fail. If the ACLs/mode bits on the vsroot volume don’t allow at least traverse permissions, we fail. In a packet trace, we can see that the vsroot volume has only traverse permissions:
    V4 Reply (Call In 268) ACCESS, [Access Denied: RD MD XT], [Allowed: LU DL]

    We can also see that from the cluster CLI (“Everyone” only has “Execute” permissions in this NTFS security style volume):

    cluster::> vserver security file-directory show -vserver DEMO -path / -expand-mask true
    
    Vserver: DEMO
     File Path: /
     File Inode Number: 64
     Security Style: ntfs
     Effective Style: ntfs
     DOS Attributes: 10
     DOS Attributes in Text: ----D---
    Expanded Dos Attributes: 0x10
     ...0 .... .... .... = Offline
     .... ..0. .... .... = Sparse
     .... .... 0... .... = Normal
     .... .... ..0. .... = Archive
     .... .... ...1 .... = Directory
     .... .... .... .0.. = System
     .... .... .... ..0. = Hidden
     .... .... .... ...0 = Read Only
     UNIX User Id: 0
     UNIX Group Id: 0
     UNIX Mode Bits: 777
     UNIX Mode Bits in Text: rwxrwxrwx
     ACLs: NTFS Security Descriptor
     Control:0x9504
    
    1... .... .... .... = Self Relative
     .0.. .... .... .... = RM Control Valid
     ..0. .... .... .... = SACL Protected
     ...1 .... .... .... = DACL Protected
     .... 0... .... .... = SACL Inherited
     .... .1.. .... .... = DACL Inherited
     .... ..0. .... .... = SACL Inherit Required
     .... ...1 .... .... = DACL Inherit Required
     .... .... ..0. .... = SACL Defaulted
     .... .... ...0 .... = SACL Present
     .... .... .... 0... = DACL Defaulted
     .... .... .... .1.. = DACL Present
     .... .... .... ..0. = Group Defaulted
     .... .... .... ...0 = Owner Defaulted
    
    Owner:BUILTIN\Administrators
     Group:BUILTIN\Administrators
     DACL - ACEs
     ALLOW-NTAP\Domain Admins-0x1f01ff-OI|CI
     0... .... .... .... .... .... .... .... = Generic Read
     .0.. .... .... .... .... .... .... .... = Generic Write
     ..0. .... .... .... .... .... .... .... = Generic Execute
     ...0 .... .... .... .... .... .... .... = Generic All
     .... ...0 .... .... .... .... .... .... = System Security
     .... .... ...1 .... .... .... .... .... = Synchronize
     .... .... .... 1... .... .... .... .... = Write Owner
     .... .... .... .1.. .... .... .... .... = Write DAC
     .... .... .... ..1. .... .... .... .... = Read Control
     .... .... .... ...1 .... .... .... .... = Delete
     .... .... .... .... .... ...1 .... .... = Write Attributes
     .... .... .... .... .... .... 1... .... = Read Attributes
     .... .... .... .... .... .... .1.. .... = Delete Child
     .... .... .... .... .... .... ..1. .... = Execute
     .... .... .... .... .... .... ...1 .... = Write EA
     .... .... .... .... .... .... .... 1... = Read EA
     .... .... .... .... .... .... .... .1.. = Append
     .... .... .... .... .... .... .... ..1. = Write
     .... .... .... .... .... .... .... ...1 = Read
    
    ALLOW-Everyone-0x100020-OI|CI
     0... .... .... .... .... .... .... .... = Generic Read
     .0.. .... .... .... .... .... .... .... = Generic Write
     ..0. .... .... .... .... .... .... .... = Generic Execute
     ...0 .... .... .... .... .... .... .... = Generic All
     .... ...0 .... .... .... .... .... .... = System Security
     .... .... ...1 .... .... .... .... .... = Synchronize
     .... .... .... 0... .... .... .... .... = Write Owner
     .... .... .... .0.. .... .... .... .... = Write DAC
     .... .... .... ..0. .... .... .... .... = Read Control
     .... .... .... ...0 .... .... .... .... = Delete
     .... .... .... .... .... ...0 .... .... = Write Attributes
     .... .... .... .... .... .... 0... .... = Read Attributes
     .... .... .... .... .... .... .0.. .... = Delete Child
     .... .... .... .... .... .... ..1. .... = Execute
     .... .... .... .... .... .... ...0 .... = Write EA
     .... .... .... .... .... .... .... 0... = Read EA
     .... .... .... .... .... .... .... .0.. = Append
     .... .... .... .... .... .... .... ..0. = Write
     .... .... .... .... .... .... .... ...0 = Read
  • If we have the appropriate permissions to traverse “/” then the NFS client attempts to find the file handle for the mount point via a LOOKUP call, using the file handle of vsroot in the path. It would look something like this:
    V4 Call (Reply In 271) LOOKUP DH: 0x92605bb8/flexvol
  • If the file handle exists, it gets returned to the client:
    fh.png
  • Then the client uses that file handle to run GETATTRs to see if it can access the mount:
    V4 Call (Reply In 275) GETATTR FH: 0x1f57355e

If all is clear, our mount succeeds!

But we’re not done… now the user that wants to access the mount has to go through another ticket process. In my case, I used a user named “student1.” This is because a lot of the Kerberos/NFSv4.x requests I get are generated by universities interested in setting up multiprotocol-ready home directories.

When a user like student1 wants to get into a Kerberized NFS mount, they can’t just cd into it. That would look like this:

# su student1
sh-4.2$ cd /mnt
sh: cd: /mnt: Not a directory

Oh look… another useless error! If I were to take that error literally, I would think “that mount doesn’t even exist!” But, it does:

sh-4.2$ mount | grep mnt
demo:/flexvol on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.193.67.225,local_lock=none,addr=10.193.67.219)

What that error actually means is that the user requesting access does not have a valid Kerberos AS ticket (login) to make the request for a TGS (ticket granting ticket) to get a service ticket for NFS (nfs/server-hostname). We can see that via the klist -e command.

sh-4.2$ klist -e
klist: Credentials cache keyring 'persistent:1301:1301' not found

Before you can get into a mount that is only allowing Kerberos access, you have to get a Kerberos ticket. On Linux, you can do that via the kinit command, which is akin to a Windows login.

sh-4.2$ kinit
Password for student1@NTAP.LOCAL:
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
05/16/2017 15:54:01 05/17/2017 01:54:01 krbtgt/NTAP.LOCAL@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

Now that I have a my ticket, I can cd into the mount. When I cd into a Kerberized NFS mount, the client will make TGS requests to the KDC (seen in the trace in packet 101) for the service ticket. If that process is successful, we get access:

sh-4.2$ cd /mnt
sh-4.2$ pwd
/mnt
sh-4.2$ ls
c0 c1 c2 c3 c4 c5 c6 c7 newfile2 newfile-nfs4
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
05/16/2017 15:55:32 05/17/2017 01:54:01 nfs/demo.ntap.local@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
05/16/2017 15:54:01 05/17/2017 01:54:01 krbtgt/NTAP.LOCAL@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

Now we’re done. (at least until our tickets expire…)

 

May 2016 update to TR-4073 (the NetApp NFS Kerberos/LDAP manifesto)

It’s time for new technical report updates!

koolaid

Since clustered Data ONTAP 8.3.2 is now available, we are publishing our 8.3.2 updates to our docs. I finally got the updates added to TR-4073: Secure Unified Authentication.

What is Secure Unified Authentication?

Secure Unified Authentication is a solution-based methodology to provide secure (via Kerberos) unified (via central LDAP servers for identity management) authentication for enterprise IT environments.

Security is more important than ever, so using a ticket-based auth process instead of over-the-wire passwords is one way to ensure you have protected your business assets. With AES-256 encryption, you are using the strongest available enctype for Kerberos.

Ease of management is also critical to an ever changing IT landscape. LDAP for Identity Management makes user account management and NAS permissioning easier.

What’s new?

  • PowerShell alternatives to LDIFDE queries
  • Extended GIDs best practices
  • Improved asymmetric name mapping support in LDAP
  • Expanding all hosts in a netgroup information
  • Vserver security command examples and information (fsecurity command parity)
  • Improved RFC-2307bis informati0n
  • Bind DN examples
  • LDAP terminology

Where can I find it?

Technical reports can be found a variety of ways. Google search works, as does looking in the NetApp library. I cover how to be better at NetApp documentation in a separate blog post. I also have a yet-unfinished series on LDAP here:

LDAP::What the heck is an LDAP anyway? – Part 1: Intro

To make it super easy, just follow this link:

TR-4073: Secure Unified Authentication

TECH:: Amazon AWS: File services enter the cloud!

This week at Amazon AWS Summit, the new Amazon Elastic File System (EFS) was announced.

From the EFS product page:

Amazon Elastic File System (Amazon EFS) is a file storage service for Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon EFS is easy to use and provides a simple interface that allows you to create and configure file systems quickly and easily. With Amazon EFS, storage capacity is elastic, growing and shrinking automatically as you add and remove files, so your applications have the storage they need, when they need it.

Amazon EFS supports the Network File System version 4 (NFSv4) protocol, so the applications and tools that you use today work seamlessly with Amazon EFS. Multiple Amazon EC2 instances can access an Amazon EFS file system at the same time, providing a common data source for workloads and applications running on more than one instance.

This is HUGE news – and not just for Amazon. As the Technical Marketing Engineer for NFS at NetApp, you can imagine how excited I am about this.

Why?

Amazon has just validated using NAS in the cloud.

For the longest time, cloud access has been limited to object-based storage and REST APIs. Now, users can access data via file-based NFSv4 through Amazon Web Services. It is time to…

enter-the-cloud

To read more, see my post on DataCenterDude.com!

Amazon Announces EFS

LDAP::What the heck is an LDAP anyway? – Part 1: Intro

Where’d I put that UID? Source: ZDNet.com

In my time in technical support and as the Technical Marketing Engineer at NetApp, I’ve come to realize that LDAP is one misunderstood technology, which is unfortunate, because I am seeing a large uptick in people who are using it in enterprise environments. As a result, I plan on starting a series of LDAP related posts based on this one, which was originally posted in February of 2015. These will be listed in TECH and LDAP Categories on this blog.

Some topics I’ll be covering (which will become links as they are written):

NOTE: I’m going to keep this real high-level and not bore you with all the details because we could get into the weeds real fast. You can always find copious amount of less interesting information out there in my Technical Reports or on your preferred search engine. If you have specific questions or want me to add something to this entry, hit me up on Twitter @NFSDudeAbides.

What is LDAP?

LDAP is also known as “Lightweight Directory Access Protocol.” It is, essentially, a database that acts as a phone book for all sorts of information for clients and servers in enterprise NAS environments. Some of these include:

  • Names
  • Addresses
  • Phone numbers
  • Unique numerical identifiers (SID, UID, GID, UUID, etc)
  • And many more!

Is Microsoft Active Directory LDAP?

Yes. And no. And maybe.

Microsoft Active Directory uses LDAP for its backend. It leverages LDAP RFC-2307 schema standards and populates records in its database with all sorts of good information about its objects.

For example, this machine account:

machineaccount

Notice it has all sorts of data about that machine account, including the servicePrincipalName (SPN) that is critical to leveraging Kerberos in both Microsoft and non-Microsoft KDCs.

However, while Microsoft is a using LDAP and technically could be called LDAP, it’s not LDAP in the traditional sense. That is reserved for the use of UNIX style attributes for users, groups, netgroups and so on. By default, Microsoft does not apply these schema attributes to its implementation of LDAP. However, the schema can be modified or extended to add attributes to populate things like UID, GID, etc.

How does it work?

LDAP functionality works like this, in a simplistic view.

  • A database is populated with objects and schema attributes.
  • LDAP clients are configured to connect to the LDAP server. This normally means they use port 389 or 636 (LDAP over SSL). Sometimes, the Global Catalog port 3268 can be used when using AD LDAP.
  • When a client connects to LDAP, it tries to bind based on the configuration. This can be SASL (such as GSSAPI, DIGEST-MD5, etc), simple or anonymous, depending on what the client and server are configured for/support.
  • After a bind, a lookup is done via RFC standards using ldapsearch commands and filter strings.
  • If the object and requested attributes exist, they are returned to the client.

Pretty straightforward. The hard part is remembering it. 🙂

What is it used for?

LDAP is used for enterprise NAS environments that want to centralize their identity management rather than keeping track of n number of passwd, group, netgroup and other files. Having LDAP service the clients allows standardized and automated client rollout. All atomic updates are done from the server side and replicated across multiple servers. All a client has to do is connect and search.

In TR-4073: Secure Unified Authentication, I cover LDAP as it pertains to NetApp’s clustered Data ONTAP. It focuses mainly on Microsoft Active Directory as an LDAP server, but also includes information on RedHat’s Directory Server (which is THE best UNIX-based LDAP server I’ve used so far).

Additionally, a new name service best practice guide was just released today. Check out TR-4379: Name Services Best Practices for more info.

TECH::What Super Bowl 49 Taught Us About Transitioning from NFSv3 to NFSv4.x

I’m sure most everyone either watched Super Bowl 49 or at least read about it somewhere. The Seattle Seahawks and New England Patriots clashed down to the wire for one of the most thrilling games in recent history. Both sides played fairly well, but not without mistakes. And as most of us know, it all came down to a final decision – do we run or pass from the 1 yard line?

The Seahawks chose to pass and the rest was history.

So what lessons can storage administrators learn about NFS from football (or as some call it, sportball)? In this case, we’ll call NFSv3 “the run” and NFSv4.x? The forward pass.

The forward pass: A little history

American football was not always the game it is today. The sport evolved from rugby and the first American football game was played in 1869. Like rugby, it was a run-heavy, violent game. Keep in mind that these used to be the helmets, and helmets weren’t even worn in the early games:

Yeah… that’s not gonna hurt, right? Source: AntiqueAthlete.com

In 1906, a rule change was added to try to help make the game less… bloody.

Mutant League Football – Source: Game Informer

As a result, the game opened up and became more exciting with higher scores and it added a facet of strategy to the game. Now it wasn’t just about simple brute force – you had to try to guess what the other team was going to do based on the situation.

NFS: Not Football Specific

NFS actually stands for Network File System and is used in enterprise environments across the world for file storage. Everything from music to movies to medical images to seismic data to virtual machines are hosted on NFS storage (preferably NetApp :-P). There are plenty of resources on what it is out there, starting with the Request For Comments (RFC) standards. As far as NFS on NetApp, I cover it in some detail in the following technical reports (Light reading, especially if you’re having trouble sleeping.):

TR-3580: NFSv4 Enhancements and Best Practices Guide: Data ONTAP Implementation

TR-4063: Parallel Network File System Configuration and Best Practices for Clustered Data ONTAP 8.2 and Later

TR-4067: Clustered Data ONTAP NFS Best Practice and Implementation Guide

TR-4073: Secure Unified Authentication

TR-4379: Name Services Best Practices

But NFS, and NAS storage administration, has some things in common with football – you have to decide on what to do based on every situation.

NFSv4.x and the forward pass

If we look at the decision made last night not to run  the ball on 2nd and goal from the 1 yard line with a player nicknamed “Beast Mode” (how do you NOT run it??) in terms of transitioning from the generally rock-solid NFSv3 to NFSv4.x, we can make the following analogies.

  • Do I give the ball to my best player (NFSv3) that has been dependable for years and see what happens?
    I know it could backfire on me, and he does have his flaws…
  • Do I throw a forward pass (NFSv4.x) without really thinking about the immediate consequences of just jumping into that decision?
    I could misconfigure the pass and that would be it for my enterprise data storage…
  • Do I call a timeout and re-think the play design?
    We are running out of time. We need to make the best decision for our business!

Planning the transition to NFSv4.x

Like any good coach or storage administrator, we need to come into every game prepared. We need to think out all possible scenarios and plan for them accordingly, but also be flexible enough to adjust on the fly if unforeseen circumstances occur. After all, unforeseen circumstances are, by definition, unforeseen.

In TR-4067, I cover some considerations that need to be made when deciding to move from NFSv3 to NFSv4.x. The following is a sneak preview. Keep in mind that this section may change in the final release of the TR, but the general concepts remain.

Transitioning from NFSv3 to NFSv4.x: Considerations

The following section covers some considerations that need to be addressed when migrating from NFSv3 to NFSv4.x. When choosing to use NFSv4.x after using NFSv3, you cannot simply turn it on and have it work as expected. There are specific items to address, such as:

  • Domain strings/ID mapping
  • Storage failover considerations
  • Name services
  • Firewall considerations
  • Export policy rule considerations
  • Client support
  • NFSv4.x features and functionality

ID Domain Mapping

While customers prepare to migrate their existing setup and infrastructure from NFSv3 to NFSv4, some environmental changes must be made before moving to NFSv4. One of them is “id domain mapping.”

In clustered Data ONTAP 8.1, a new option called v4-id-numerics was added. With this option enabled, even if the client does not have access to the name mappings, numeric IDs can be sent in the user name and group name fields and the server accepts them and treats them as representing the same user as would be represented by a v2/v3 UID or GID having the corresponding numeric value.

Essentially, this makes NFSv4.x behave more like NFSv3. This also removes the security enhancement of forcing ID domain resolution for NFSv4.x name strings, so whenever possible, keep this option as the default of disabled. If a name mapping for the user is present, however, the name string will be sent across the wire rather than the UID/GID. The intent of this option is to ensure the server never sends “nobody” as a response to credential queries in NFS requests.

To access this command prior to clustered Data ONTAP 8.3, you must be in diag mode. Commands related to diag mode should be used with caution.

Some production environments have the challenge to build new naming service infrastructures like NIS or LDAP for string-based name mapping to be functional in order to move to NFSv4. With the new “numeric_id” option, setting name services does not become an absolute requirement. The “numeric_id” feature must be supported and enabled on the server as well as on the client. With this option enabled, the user and groups exchange UIDs/GIDs between the client and server just as with NFSv3. However, for this option to be enabled and functional, NetApp recommends having a supported version of the client and the server. For client versions that support numeric IDs with NFSv4, please contact the OS vendor.

Note that -v4-id-numerics should be enabled only if the client supports it.

Storage failover considerations

NFSv4.x uses a completely different locking model than NFSv3. Locking in NFSv4.x is a lease-based model that is integrated into the protocol itself rather than separated out as it is in NFSv3 (NLM). From the ONTAP documentation:

In accordance with RFC 3530, Data ONTAP “defines a single lease period for all state held by an NFS client. If the client does not renew its lease within the defined period, all states associated with the client’s lease may be released by the server.” The client can renew its lease explicitly or implicitly by performing an operation, such as reading a file.

Furthermore, Data ONTAP defines a grace period, which is a period of special processing in which clients attempt to reclaim their locking state during a server recovery.

Term Definition (as per RFC-3530)
Lease The time period in which Data ONTAP irrevocably grants a lock to a client.
Grace period The time period in which clients attempt to reclaim their locking state from Data ONTAP during server recovery.
Lock Refers to both record (byte-range) locks as well as file (share) locks unless specifically stated otherwise.

For more information on locking, see the section in this document on NFSv4.x locking. Because of this new locking methodology, as well as the statefulness of the NFSv4.x protocol, storage failover operates differently as compared to NFSv3. For more information, see the section in this document on storage failover and in clustered Data ONTAP.

Name services

If deciding to use NFSv4.x, it is a best practice to centralize the NFSv4.x users in name services such as LDAP or NIS. This allows all clients and clustered Data ONTAP NFS servers to leverage the same resources and guarantees that all names, UID and GIDs will be consistent across the implementation. For more information on name services, see TR-4073: Secure Unified Authentication and TR-4379: Name Services Best Practices.

Firewall considerations

NFSv3 required several ports to be opened for ancillary protocols such as NLM, NSM, etc. in addition to port 2049. NFSv4.x requires only port 2049. If wishing to use NFSv3 and NFSv4.x in the same environment, all relevant NFS ports should be opened. These ports are referenced in this document.

Volume language considerations

In NetApp’s Data ONTAP, volumes can have specific languages set. This is intended to be used for internationalization of file names for languages that use characters not common to English, such as Japanese, Chinese, German, etc. When using NFSv4.x, RFC-3530 states that UTF-8 is recommended.

11.  Internationalization

   The primary issue in which NFS version 4 needs to deal with
   internationalization, or I18N, is with respect to file names and
   other strings as used within the protocol.  The choice of string
   representation must allow reasonable name/string access to clients
   which use various languages.  The UTF-8 encoding of the UCS as
   defined by [ISO10646] allows for this type of access and follows the
   policy described in "IETF Policy on Character Sets and Languages",
   [RFC2277].

If you intend to migrate to clustered Data ONTAP from a 7-mode system and use NFSv4.x, you should be using some form of UTF-8 language support, such as C.UTF-8 (which is the default language of volumes in clustered Data ONTAP). If the 7-mode system is not already using a UTF-8 language, then it should be converted before transitioning to cDOT or when intending to transition from NFSv3 to NFSv4. The exact UTF-8 language specified will depend on the specific requirements of the native language to ensure proper display of character sets.

7-mode allowed volumes that hosted NFSv4.x data to use C language types. Clustered Data ONTAP does not, as it honors the RFC standard recommendation of UTF-8. TR-4160: Secure Multi-tenancy Considerations covers language recommendations in cDOT. When changing a volume’s language, every file in the volume must be accessed after the change to ensure they all reflect the language change. This can be done via a simple “ls -lR” to do a recursive listing of files.

Export policy rules

In clustered Data ONTAP, it is possible to specify which version of NFS is supported for an exported filesystem. If an environment was configured for NFSv3 and the export policy rule option –protocol was limited to allow NFSv3 only, then it would need to be modified to allow NFSv4. Additionally, policy rules could be configured to allow only NFSv4.x clients access.

Example:

cluster::> export-policy rule modify -policy default -vserver NAS -protocol nfs4For more information, consult the product documentation for your specific version of clustered Data ONTAP.

Client considerations

When using NFSv4.x, clients are as important to consider as the NFS server. The following client considerations should be followed when implementing NFSv4.x. Other considerations may be necessary. Please contact the OS vendor for specific questions about NFSv4.x configuration.

  • x is supported.
  • The fstab file and NFS configuration files are configured properly. When mounting, the client will negotiate the highest NFS version available with the NFS server. If NFSv4.x is not allowed by the client or fstab specifies NFSv3, then NFSv4.x will not be used at mount.
  • The idmapd.conf file is configured with the proper settings.
  • The client either contains identical users and UID/GID (including case sensitivity) or is using the same name service server as the NFS server/clustered Data ONTAP SVM.
  • If using name services on the client, the client is configured properly for name services (nsswitch.conf, ldap.conf, sssd.conf, etc.) and the appropriate services are started, running and configured to start at boot.
  • The NFSv4.x service is started, running and configured to start at boot.

NFSv4.x Features and Functionality

NFSv4.x is the next evolution of the NFS protocol and enhances NFSv3 with new features and functionality, such as referrals, delegations, pNFS, etc. These features are covered throughout this document and should be factored in to any design decisions for NFSv4.x implementations.

If done right and you make the right play calls, NFSv4.x could bring home the championship for  your NAS storage environment!