Securing NFS mounts in a Docker container

docker_security-e1530093759599

Setting up Kerberized NFS on a client can be a bit challenging, especially if you’re trying to do it across multiple hosts. So, I decided I wanted to take on the challenge of creating an easy to deploy Docker container, using NetApp’s Trident plugin to make life even easier.

Why do I want Kerberos?

dockersecurity-476x192

With Kerberos on NFS mounts, you can encrypt traffic for authentication (krb5), integrity checking (krb5i) and for end-to end packet encryption (krb5p). I covered the benefits of using krb5p in a previous blog. I also covered how to use NFS with the new FlexGroup driver in Docker + NFS + FlexGroup volumes = Magic!

I also cover krb5p in Encrypt your NFS packets end to end with krb5p and ONTAP 9.2!

But this blog covers FlexVols only, since FlexGroup volumes can’t use NFSv4.x – yet.

Why do I want NFSv4.x?

NFSv3 is a great protocol, but it has some disadvantages when it comes to locking and security. For starters, v3 is stateless. NFSv4.x is stateful and manages locks much better, since it’s done on a lease basis and is integrated in the protocol itself, while v3 has ancillary services that manage locks.

Those ancillary services (like NLM, mountd, portmap) are also what makes NFSv3 less secure than NFSv4.x. More services = more ports to open on a firewall (your network guy hates you, btw). Additionally, standard in-flight encryption for NAS protocols, such as Kerberos, don’t encrypt the ancillary services – it only encrypts the NFS packets. NFSv4.x also has additional layers of security via NFSv4.x ACLs, as well as ID domain name mapping to ensure the client and server agree on who is who when accessing NFS mounts.

The main downside of NFSv4.x is performance. It currently lags behind NFSv3 for a variety of reasons, but mostly because it has to do more in each packet than NFSv3 had to, and being a stateful protocol can be costly to performance. When you lump in encryption, it adds more overhead. Things are getting better, however, and soon, there won’t be any excuse for not using NFSv4.x as your standard NFS version.

What you need before you start

In this example, I’m going to configure Kerberos, NFSv4.1 and LDAP on a single container. This allows me to have all the moving parts I’d need to get it working out of the gate. I’m using CentOS7.x/RHEL7.x as the Docker host and container base, as well as Microsoft Active Directory 2012R2 for LDAP UNIX identities and Kerberos KDC functionality. You can use Linux-based LDAP and KDCs, but that’s outside the scope of what this blog is about.

Before you get started, you need the following.

  • Active Directory configured to use LDAP for UNIX identity mapping
  • A server/VM running the latest CentOS/RHEL version (our Docker host)
  • A NetApp ONTAP cluster with a SVM running NFS on it

Configuring the ONTAP SVM

Before you can get started with NFS Kerberos on the client, you’ll need to configure NFS Kerberos in ONTAP. Doing this essentially comes down to the following steps:

  • Create a Kerberos realm
  • Configure DNS
  • AES encryption types allowed on the NFS server
  • Create a Kerberos interface (this creates a machine object in AD that has your SPN for NFS server ticket functionality and adds the keytab to the cluster/SVM)
  • Create a local UNIX user named “nfs” (this maps to the nfs/service account when Kerberos mounts are attempted)
  • Create a generic name mapping rule for all machine accounts (when you join a container to the Kerberos realm, it creates a new machine account with the format of [imagehexname]$@REALM.COM. Having a generic name mapping rule will eliminate headaches trying to manage that)
  • Create an export policy and rule that allows Kerberos authentication/v4.x for NFS
  • (optional, but recommended) LDAP server configuration that matches the client (this makes life much easier with NFSv4.x)
  • Configure the NFSv4 ID domain option and enable NFSv4.0/4.1

These steps are all covered in pretty good detail in TR-4073 (the unabridged version) and TR-4616 (the more streamlined version), so I won’t cover them here. Instead, I’ll show you how my cluster is configured.

Kerberos realm

::*> kerberos realm show -vserver DEMO -instance
(vserver nfs kerberos realm show)

Vserver: DEMO
Kerberos Realm: NTAP.LOCAL
KDC Vendor: Microsoft
KDC IP Address: 10.x.x.x
KDC Port: 88
Clock Skew: 5
Active Directory Server Name: oneway.ntap.local
Active Directory Server IP Address: 10.x.x.x
Comment: -
Admin Server IP Address: 10.x.x.x
Admin Server Port: 749
Password Server IP Address: 10.x.x.x
Password Server Port: 464
Permitted Encryption Types: aes-128, aes-256

DNS

::*> dns show -vserver DEMO -instance

Vserver: DEMO
Domains: NTAP.LOCAL
Name Servers: 10.x.x.x
Timeout (secs): 5
Maximum Attempts: 1
Is TLD Query Enabled?: true
Require Source and Reply IPs to Match: true
Require Packet Queries to Match: true

AES encryption types allowed on the NFS server

::*> nfs server show -vserver DEMO -fields permitted-enc-types
vserver permitted-enc-types
------- -------------------
DEMO aes-128,aes-256

Kerberos interface

::*> kerberos interface show -vserver DEMO -instance
(vserver nfs kerberos interface show)

Vserver: DEMO
Logical Interface: data
IP Address: 10.x.x.x
Kerberos Enabled: enabled
Service Principal Name: nfs/demo.ntap.local@NTAP.LOCAL
Permitted Encryption Types: aes-128, aes-256
Machine Account Name: -

Vserver: DEMO
Logical Interface: data2
IP Address: 10.x.x.x
Kerberos Enabled: enabled
Service Principal Name: nfs/demo.ntap.local@NTAP.LOCAL
Permitted Encryption Types: aes-128, aes-256
Machine Account Name: -
2 entries were displayed.

UNIX user named NFS

::*> unix-user show -vserver DEMO -user nfs -instance

Vserver: DEMO
User Name: nfs
User ID: 500
Primary Group ID: 500
User's Full Name:

Generic name mapping rule for Kerberos SPNs

::*> vserver name-mapping show -vserver DEMO -direction krb-unix -instance

Vserver: DEMO
Direction: krb-unix
Position: 1
Pattern: (.+)\$@NTAP.LOCAL
Replacement: root
IP Address with Subnet Mask: -
Hostname: -

Export policy rule that allows Kerberos/NFSv4.x

::*> export-policy rule show -vserver DEMO -policyname kerberos -instance

Vserver: DEMO
Policy Name: kerberos
Rule Index: 1
Access Protocol: nfs4
List of Client Match Hostnames, IP Addresses, Netgroups, or Domains: 0/0
RO Access Rule: krb5, krb5i, krb5p
RW Access Rule: krb5, krb5i, krb5p
User ID To Which Anonymous Users Are Mapped: 65534
Superuser Security Types: any
Honor SetUID Bits in SETATTR: true
Allow Creation of Devices: true
NTFS Unix Security Options: fail
Vserver NTFS Unix Security Options: use_export_policy
Change Ownership Mode: restricted
Vserver Change Ownership Mode: use_export_policy
Policy ID: 42949672971

LDAP client config (optional, but recommended if you plan to use NFSv4.x)

::*> ldap client show -client-config DEMO -instance

Vserver: DEMO
Client Configuration Name: DEMO
LDAP Server List: -
(DEPRECATED)-LDAP Server List: -
Active Directory Domain: NTAP.LOCAL
Preferred Active Directory Servers: -
Bind Using the Vserver's CIFS Credentials: true
Schema Template: MS-AD-BIS
LDAP Server Port: 389
Query Timeout (sec): 3
Minimum Bind Authentication Level: sasl
Bind DN (User): mtuser
Base DN: DC=NTAP,DC=local
Base Search Scope: subtree
User DN: -
User Search Scope: subtree
Group DN: -
Group Search Scope: subtree
Netgroup DN: -
Netgroup Search Scope: subtree
Vserver Owns Configuration: true
Use start-tls Over LDAP Connections: false
Enable Netgroup-By-Host Lookup: false
Netgroup-By-Host DN: -
Netgroup-By-Host Scope: subtree
Client Session Security: none
LDAP Referral Chasing: false
Group Membership Filter: -

To test LDAP functionality on the cluster, use the following command in advanced privilege to look up a user. If you get a UID/GID, you’re good to go.

::*> getxxbyyy getpwbyname -node ontap9-tme-8040-01 -vserver DEMO -username prof1
(vserver services name-service getxxbyyy getpwbyname)
pw_name: prof1
pw_passwd:
pw_uid: 1100
pw_gid: 1101
pw_gecos:
pw_dir:
pw_shell:

Configure/enable NFSv4.x

::*> nfs server show -vserver DEMO -fields v4-id-domain,v4.1,v4.0
vserver v4.0 v4-id-domain v4.1
------- ------- ------------ -------
DEMO enabled NTAP.LOCAL enabled

Once the cluster SVM is set up, there shouldn’t be much else, if anything, that needs to be done for Kerberos on the cluster. However, in AD, you’ll want to allow only AES for the NFS server machine account with this simple PowerShell command:

PS C:\> Set-ADComputer NFS-KRB-NAME$ -KerberosEncryptionType AES256,AES128

Configuring the Docker host

To get NFSv4.x to work properly in a container, you’ll need to make a decision about your Docker host. What I found in my testing is that containers running NFSv4.x want to use the Docker host’s ID mappings/users when doing NFSv4.x functions, rather than the container’s. So while the container may be able to pull users from LDAP and write files as those users, you will see NFSv4.x owners and groups that the Docker *host* cannot resolve appear as “nobody.”

sh-4.2$ ls -la
total 8
drwxrwxrwx. 2 root root 4096 Aug 14 21:08 .
drwxr-xr-x. 18 root root 4096 Aug 15 15:06 ..
-rw-r--r--. 1 nobody nobody 0 Aug 13 21:38 newfile

So, if you want NFSv4.x to resolve names properly (and I suspect you do), then you need to do one of the following on the Docker host:

a) Add users and groups locally to the passwd/group files

b) Configure SSSD to query LDAP

Naturally, I like things to be consistent, so I chose option b.

Since we already have an LDAP server in AD, we can just install/configure sssd to use that. Here’s what you’d do…

Install necessary packages

I like using realm and sssd. It’s fun. It’s easy. This is what you need to do that.

yum -y install realmd sssd oddjob oddjob-mkhomedir adcli samba-common krb5-workstation ntp

Configure DNS (/etc/resolv.conf)

This needs to point to the name servers in your AD domain.

Create a generic machine account in AD for LDAP authentication

This will be how our LDAP clients bind. You can use it for more than one client if you like. This can be done via PowerShell.

PS C:\> import-module activedirectory
PS C:\> New-ADComputer -Name [computername] -SAMAccountName [computername] -DNSHostName
[computername.dns.domain.com] -OtherAttributes @{'userAccountControl'= 2097152;'msDSSupportedEncryptionTypes'=27}

Create a keytab file to copy to the Docker host to use for LDAP binds

This is done on the AD domain controller in a CLI window. Use ktpass and this syntax:

ktpass -princ primary/instance@REALM -mapuser [DOMAIN]\machine$ -crypto AES256-SHA1 +rndpass -ptype KRB5_NT_PRINCIPAL +Answer -out [file:\location]

Copy the keytab file to your Docker host

WinSCP is a good tool to do this. It should live as /etc/krb5.keytab

Configure /etc/krb5.conf

Here’s an example (changes in bold):

# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 30d
renew_lifetime = 30d
forwardable = true
rdns = false
# default_realm = EXAMPLE.COM
default_ccache_name = KEYRING:persistent:%{uid}

default_realm = NTAP.LOCAL
[realms]
# EXAMPLE.COM = {
# kdc = kerberos.example.com
# admin_server = kerberos.example.com
# }

NTAP.LOCAL = {
}

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
ntap.local = NTAP.LOCAL
.ntap.local = NTAP.LOCAL

Configure the sssd.conf file to point to LDAP

This is what mine looks like. Note the LDAP URI and SASL authid. I also set “use_fully_qualified_names” to false.

[domain/default]
cache_credentials = False
case_sensitive = False
enumerate = True

[sssd]
config_file_version = 2
services = nss, pam, autofs
domains = NTAP.local
debug_level = 7
[nss]
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd
filter_groups = root
[pam]
[domain/DOMAIN]
auth_provider = krb5
chpass_provider = krb5
id_provider = ldap
ldap_search_base = dc=ntap,dc=local
ldap_schema = rfc2307bis
ldap_sasl_mech = GSSAPI
ldap_user_object_class = user
ldap_group_object_class = group
ldap_user_home_directory = unixHomeDirectory
ldap_user_principal = userPrincipalName
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
ldap_user_search_base = cn=Users,dc=ntap,dc=local
ldap_group_search_base = cn=Users,dc=ntap,dc=local
ldap_sasl_authid = root/krb-container.ntap.local@NTAP.LOCAL
krb5_server = ntap.local
krb5_realm = NTAP.LOCAL
krb5_kpasswd = ntap.local
use_fully_qualified_names = false

Enable authconfig and start SSSD

authconfig --enablesssd --enablesssdauth --updateall

systemctl start sssd

Test LDAP functionality/name lookup

You can use “getent” or “id” to look names up.

# id prof1
uid=1100(prof1) gid=1101(ProfGroup) groups=1101(ProfGroup),1203(group3),1202(group2),1201(group1),1220(sharedgroup)

Configure /etc/idmapd.conf with the NFSv4.x domain

Just this single line needs to be added. Needs to match what’s on the ONTAP SVM.

Domain = [NTAP.LOCAL]

Creating your container

I used the centos/http:latest as the base image, and am running with systemd. I also am copying a few config files to the container to ensure it functions properly and then running a script afterwards.

Here’s the dockerfile I used to create a container that could do NFSv4.x, Kerberos and LDAP. You can also find it on GitHub here:

https://github.com/whyistheinternetbroken/centos-kerberos-nfsv4-sssd

FROM centos/httpd:latest
ENV container docker

# Copy the dbus.service file from systemd to location with Dockerfile
ADD dbus.service /usr/lib/systemd/system/dbus.service

VOLUME ["/sys/fs/cgroup"]
VOLUME ["/run"]

CMD ["/usr/lib/systemd/systemd"]

RUN yum -y install centos-release-scl-rh && \
yum -y install --setopt=tsflags=nodocs mod_ssl
RUN yum -y update; yum clean all
RUN yum -y install --setopt=tsflags=nodocs sssd sssd-dbus adcli krb5-workstation ntp realmd oddjob oddjob-mkhomedir samba-common samba-common-tools nfs-utils; yum clean all

## Systemd cleanup base image
RUN (cd /lib/systemd/system/sysinit.target.wants && for i in *; do [ $i == systemd-tmpfiles-setup.service ] || rm -vf $i; done) & \
rm -vf /lib/systemd/system/multi-user.target.wants/* && \
rm -vf /etc/systemd/system/*.wants/* && \
rm -vf /lib/systemd/system/local-fs.target.wants/* && \
rm -vf /lib/systemd/system/sockets.target.wants/*udev* && \
rm -vf /lib/systemd/system/sockets.target.wants/*initctl* && \
rm -vf /lib/systemd/system/basic.target.wants/* && \
rm -vf /lib/systemd/system/anaconda.target.wants/*

# Copy the local SSSD conf file
RUN mkdir -p /etc/sssd
COPY sssd.conf /etc/sssd/sssd.conf

# Copy the local krb files
COPY krb5.keytab /etc/krb5.keytab
COPY krb5.conf /etc/krb5.conf

# Copy the NFSv4 IDmap file
COPY idmapd.conf /etc/idmapd.conf

#Copy the DNS config
COPY resolv.conf /etc/resolv.conf

# Copy rc.local
COPY rc.local /etc/rc.d/rc.local

# start services
ADD configure-nfs.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/configure-nfs.sh
RUN chmod +x /etc/rc.d/rc.local

You’ll notice that I have several COPY commands in the file. Those are config files you’d need to modify to reflect your own environment. You’ll want to store these in the same folder as your dockerfile. I’ll break down each file here.

dbus.service

This file is the same file as the one on the Docker host. It allows the container to run with systemd. Simply copy it from /usr/lib/systemd/system/dbus.service into your dockerfile folder location.

sssd.conf

This is our LDAP file and specifies how LDAP does its queries. While NFSv4.x will use the Docker host’s users for NFSv4.x mapping, the container will still need to know who a user is to allow us to su, kinit, etc. For this, you can essentially use the same config file you used for your Docker host.

krb5.keytab and krb5.conf

The krb5.keytab file is used to authenticate/bind to LDAP only in this case. So, use the same keytab file you created earlier. Same for the krb5.conf file, unless your containers are going to leverage a different KDC/domain than the Docker host. In that case, it gets a little more complicated. Just copy the Docker host’s krb5.keytab and krb5.conf files from /etc.

idmapd.conf

Again, same file as the Docker host. This defines our idmap domain for NFSv4.x.

resolv.conf

DNS information; should match what’s on the Docker host.

rc.local

This file is useful for running our configuration script. We need the script to run because the container won’t let you start services before it’s running. When you try, you get this error (or something similar):

Failed to get D-Bus connection: No connection to service manager.

This is the line I added to my rc.local:

/usr/local/bin/configure-nfs.sh

That leads us to the script…

configure-nfs.sh

This script starts services. It also joins the container to the Kerberos realm. While I’m using AD KDCs, you can also use realm join to join Linux-based KDCs. Maybe one day I’ll set one up and write up a guide, but for now, read the Linux KDC docs. 🙂

For the realm join, I’m passing the password with the command. It’s in plaintext, so I’d recommend not using a domain admin here. Realm join uses administrator by default, but you have a way to specify a different user with the -U option. So, you can either create a user that *only* has access to create/delete objects in a specific OU or leave the password portion out and have users enter the password when the container starts.

I’d also highly recommend creating a new OU in AD to house all your container machine objects. Otherwise, you’ll see your default OU get flooded with these:

krb-containers

So, configure an OU or CN in AD and then point realm join to use that.

docker-ou.png

Here’s my shell script:

#!/bin/sh
systemctl start dbus
systemctl start rpcgssd
systemctl start rpcidmapd
systemctl restart sssd
echo PASSWORD| realm join -U username --computer-ou OU=Docker NTAP.LOCAL

Realm join caveat

In my config, I’ve done something “clever.” When you join a realm on a Linux client, it will also configure SSSD to pull UNIX IDs from AD. It doesn’t use the uid field by default. Instead, it creates a UID based on the AD SID. Thus, user student1 might look like this from LDAP (as expected):

# id student1
uid=1301(student1) gid=1201(group1) groups=1201(group1),1220(sharedgroup),1203(group3)

But would look like this from SSSD’s algorithm:

# id student1@NTAP.LOCAL
uid=1587401108(student1@NTAP.local) gid=1587400513(domainusers@NTAP.local) groups=1587400513(domainusers@NTAP.local),1587401107(group3@NTAP.local),1587401105(group1@NTAP.local),1587401122(sharedgroup@NTAP.local)

ONTAP doesn’t really know how to query UIDs in the way SSSD does, so we’d need SSSD to be able to look up our UNIX users, but also be able to query AD users that may not have UNIX attributes populated. To control that, I set my sssd.conf file to do the following:

  • When a username is specified without a FQDN, SSSD looks it up in normal LDAP
  • When a username is specified with a FQDN, SSSD uses the algorithm

I controlled this with the SSSD option use_fully_qualified_names. I set it to false for my UNIX users. When realm join is run, it appends to the sssd.conf file and uses the default value of use_fully_qualified_names, which is “true.”

Here’s what realmd adds to the file:

[domain/NTAP.local]
ad_domain = NTAP.local
krb5_realm = NTAP.LOCAL
realmd_tags = manages-system joined-with-samba
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad

Build your container!

That’s pretty much it. Once you have your Docker host and ONTAP cluster configured, Kerberizing NFS in containers is a breeze. Simply build your Docker container using the dockerfile:

docker build -f /dockerfiles/dockerfile.kerb -t parisi/centos-krb-client .

And then run it in privileged mode. The following also shows us specifying a volume that has been created using NetApp Trident. (see below for my Trident config.json file)

docker run --rm -it --privileged -d -v kerberos:/kerberos parisi/centos-krb-client

And then you can exec the container and start using Kerberos!

# docker exec -ti 330e10f7db1d bash

# su student1
sh-4.2$ klist
klist: Credentials cache keyring 'persistent:1301:1301' not found
sh-4.2$ kinit
Password for student1@NTAP.LOCAL:
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
08/16/18 14:52:58 08/17/18 00:52:58 krbtgt/NTAP.LOCAL@NTAP.LOCAL
renew until 08/23/18 14:52:55, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
sh-4.2$ cd /kerberos
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
08/16/18 14:53:09 08/17/18 00:52:58 nfs/demo.ntap.local@NTAP.LOCAL
renew until 08/23/18 14:52:55, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
08/16/18 14:52:58 08/17/18 00:52:58 krbtgt/NTAP.LOCAL@NTAP.LOCAL
renew until 08/23/18 14:52:55, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
sh-4.2$ ls -la
total 8
drwxrwxrwx. 2 root root 4096 Aug 15 15:46 .
drwxr-xr-x. 18 root root 4096 Aug 15 19:02 ..
-rw-r--r--. 1 prof1 ProfGroup 0 Aug 15 15:41 newfile
-rw-r--r--. 1 student1 group1 0 Aug 14 21:08 newfile2
-rw-r--r--. 1 root daemon 0 Aug 15 15:41 newfile3
-rw-r--r--. 1 student1 group1 0 Aug 15 15:46 newfile4
-rw-r--r--. 1 prof1 ProfGroup 0 Aug 13 20:57 prof1file
-rw-r--r--. 1 student1 group1 0 Aug 13 20:58 student1
-rw-r--r--. 1 student2 group2 0 Aug 13 21:12 student2

You can also push the docker image up to your repository and pull it down on any Docker host you like, provided that Docker host is configured as we mentioned.

BONUS ROUND: Trident config.json

tumblr_lndumwicey1qdh05go1_400

Did you know you could control mount options and export policy rules with Trident?

Get Trident here!

Just use the config.json file to do that. In my file, every volume mounts as NFSv4.1 with Kerberos security and a Kerberos export policy.

{
"version": 1,
"storageDriverName": "ontap-nas",
"managementLIF": "10.x.x.x",
"dataLIF": "10.x.x.x",
"svm": "DEMO",
"username": "admin",
"password": "PASSWORD",
"aggregate": "aggr1_node1",
"exportPolicy": "kerberos",
"nfsMountOptions": "-o vers=4.1,sec=krb5",
"defaults": {
"exportPolicy": "kerberos"
}
}

Happy Kerberizing!

Advertisements

Behind the Scenes: Episode 119 – NFSv4.x Deep Dive

Welcome to the Episode 119, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

tot-gopher

This week on the podcast, I pass the NFS torch on to Chris Hurley (@averageguyx) as we talk about NFSv4.x in ONTAP and why you should be thinking of making the move from NFSv3 in the near future.

Chris Hurley’s blog can be found at: http://averageguyx.blogspot.com

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

 

Kerberize your NFSv4.1 Datastores in ESXi 6.5 using NetApp ONTAP

How do I set up Kerberos with NFSv4.1 datastores in ESXi 6.5?

 

homer-rubik

I have gotten this question enough now (and from important folks like Cormac Hogan) to write up an end to end guide on how to do this. I had a shorter version in TR-4073 and then a Linux-specific version in TR-4616, but admittedly, it wasn’t really enough to help people get the job done. I also wrote up some stuff on ESXi Kerberos in a previous blog post that wasn’t nearly as in-depth called How to set up Kerberos on vSphere 6.0 servers for datastores on NFS.

Cormac also wrote up one of his own:

https://cormachogan.com/2017/10/12/getting-grips-nfsv4-1-kerberos/

I will point out that, in general, NFS Kerberos is a pain in the ass for people who don’t set it up on a regular basis and understand its inner workings, regardless of the vendor you’re interacting with. The reason is that there are multiple moving parts involved, and support for various portions of Kerberos (such as encryption types) vary. Additionally, some hosts automate things better than others.

We’re going to set it all up as if we only have created a basic SVM with data LIFs and some volumes. If you have an existing SVM configured with NFS and such, you can retrofit as needed. While this blog covers only NFS Kerberos using ONTAP as the NFS server, the steps for AD and ESXi would apply to other NFS servers as well, in most cases.

Here’s the lab setup:

  • ONTAP 9.3 (but this works for ONTAP 9.0 and onward)
    • SVM is SVM1
  • Windows 2012R2 (any Windows KDC that supports AES will work)/Active Directory
    • DNS, KDC
    • Domain is core-tme.netapp.com
  • ESXi 6.5
    • Hostname is CX400S1-GWNSG6B

ONTAP Configuration

While there are around 10 steps to do this, keep in mind that you generally only have to configure this once per SVM.

We’ll start with ONTAP, via the GUI. I’ll also include a CLI section. First, we’ll start in the SVM configuration section to minimize clicks. This is found under Storage -> SVMs. Then, click on the SVM you want to configure and then click on the NFS protocol.

SVM-protocol

1. Configure DNS

We need this because Kerberos uses DNS lookups to determine host names/IPs. This will be the Active Directory DNS information. This is found under Services -> DNS/DDNS.

dns

2. Enable NFSv4.1

NFSv4.1 is needed to allow NFSv4.1 mounts (obviously). You don’t need to enable NFSv4.0 to use NFSv4.1. ESXi doesn’t support v4.0 anyway. But it is possible to use v3.0, v4.0 and v4.1 in the same SVM. This can be done under “Protocols -> NFS” on the left menu.

v41.png

3. Create an export policy and rule for the NFS Kerberos datastore volume

Export policies in ONTAP are containers for rules. Rules are what defines the access to an exported volume. With export policy rules, you can limit the NFS version allowed, the authentication type, root access, hosts, etc. For ESXi, we’re defining the ESXi host (or hosts) in the rule. We’re allowing NFS4 only and Kerberos only. We’re allowing the ESXi host to have root access. If you use NFSv3 with Kerberos for these datastores, be sure to adjust the policy and rules accordingly. This is done under the “Policies” menu section on the left.

export-rule.png

4. Verify that vsroot has an export policy and rule that allows read access to ESXi hosts

Vsroot is “/” in the namespace. As a result, for clients to mount NFS exports, they must have at least read access via vsroot’s export policy rule and at least traverse permissions (1 in mode bits) to navigate through the namespace. In most cases, vsroot uses “default” as the export policy. Verify whichever export policy is being used has the proper access.

If a policy doesn’t have a rule, create one. This is an example of minimum permissions needed for the ESXi host to traverse /.

vsroot-policy.png

5. Create the Kerberos realm

The Kerberos realm is akin to /etc/krb5.conf on Linux clients. It tells ONTAP where to look to attempt the bind/join to the KDC. After that, KDC servers are discovered by internal processes and won’t need the realm. The realm domain should be defined in ALL CAPS. This is done in System Manager using “Services -> Kerberos Realm” on the left.

realm.png

6. Enable Kerberos on your data LIF (or LIFs)

To use NFS Kerberos, you need to tell ONTAP which data LIFs will participate in the requests. Doing this specifies a service principal for NFS on that data LIF. The SVM will interact with the SVM defined in the Kerberos realm to create a new machine object in AD that can be used for Kerberos. The SPN is defined as nfs/hostname.domain.com and represents the name you want clients to use to access shares. This FQDN needs to exist in DNS as a forward and reverse record to ensure things work properly. If you enable Kerberos on multiple data LIFs with the same name, the machine account gets re-used. If you use different SPNs on LIFs, different accounts get created. You have a 15 character limit for the “friendly” display name in AD. If you want to change the name later, you can. That’s covered in TR-4616.

kerb-interface.png

7. Create local UNIX group and users

For Kerberos authentication, a krb-unix name mapping takes place, where the incoming SPN will attempt to map to a UNIX user that is either local on the SVM or in external name service servers. You always need the “nfs” user, as the nfs/fqdn SPN will map to “nfs” implicitly. The other user will depend on the user you specify in ESXi when you configure Kerberos. That UNIX user will use the same user name. In my example, I used “parisi,” which is a user in my AD domain. Without these local users, the krb-unix name mapping would fail and manifest as “permission denied” errors when mounting. The cluster would show errors calling out name mapping in “event log show.”

Alternatively, you can create name mapping rules. This is covered in TR-4616. UNIX users and groups can be created using the UNIX menu option under “Host Users and Groups” in the left menu.

The numeric GID and UID can be any unused numeric in your environment. I used 501 and 502.

First, create the primary group to assign to the users.

group.png

Then, create the NFS user with the Kerberos group as the primary group.

nfs-user.png

Finally, create the user you used in the Kerberos config in ESXi.

parisi-user.png

Failure to create the user would result in the following similar error in ONTAP. (In the below, I used a user named ‘vsphere’ to try to authenticate):

**[ 713] FAILURE: User 'vsphere' not found in UNIX authorization source LDAP.
 [ 713] Entry for user-name: vsphere not found in the current source: LDAP. Ignoring and trying next available source
 [ 715] Entry for user-name: vsphere not found in the current source: FILES. Entry for user-name: vsphere not found in any of the available sources
 [ 717] Unable to map SPN 'vsphere@CORE-TME.NETAPP.COM'
 [ 717] Unable to map Kerberos NFS user 'vsphere@CORE-TME.NETAPP.COM' to appropriate UNIX user

8. Create the volume to be used as the datastore

This is done from “Storage -> Volumes.” In ONTAP 9.3, the only consideration is that you must specify a “Protection” option, even if it’s “none.” Otherwise, it will throw an error.

vol create

vol-protect.png

Once the volume is created, it automatically gets exported to /volname.

9. Verify the volume security style is UNIX for the datastore volume

The volume security style impacts how a client will attempt to authenticate into the ONTAP cluster. If a volume is NTFS security style, then NFS clients will attempt to map to Windows users to figure out the access allowed on an object. System Manager doesn’t let you define the security style at creation yet and will default to the security style of the vsroot volume (which is / in the namespace). Ideally, vsroot would also be UNIX security style, but in some cases, NTFS is used. For VMware datastores, there is no reason to use NTFS security style.

From the volumes screen, click on the newly created volume and click the “Edit” button to verify UNIX security style is used.

sec-style.png

10. Change the export policy assigned to the volume to the ESX export policy you created

Navigate to “Storage -> Namespace” to modify the export policy used by the datastore.

change-policy.png

11. Configure NTP

This prevents the SVM from getting outside of the 5 minute time skew that can break Kerberos authentication. This is done via the CLI. No GUI support for this yet.

cluster::> ntp server create -server stme-infra02.core-tme.netapp.com -version auto

12. Set the NFSv4 ID domain

While we’re in the CLI, let’s set the ID domain. This ID domain is used for client-server interaction, where a user string will be passed for NFSv4.x operations. If the user string doesn’t match on each side, the NFS user gets squashed to “nobody” as a security mechanism. This would be the same domain string on both ESX and on the NFS server in ONTAP (case-sensitive). For example, “core-tme.netapp.com” would be the ID domain here and users from ESX would come in as user@core-tme.netapp.com. ONTAP would look for user@core-tme.netapp.com to exist.

In ONTAP, that command is:

cluster::> nfs modify -vserver SVM1 -v4-id-domain core-tme.netapp.com

13. Change datastore volume permissions

By default, volumes get created with the root user and group as the owner, and 755 access. In ESX, if you want to create VMs on a datastore, you’d need either root access or to change write permissions. When you use Kerberos, ESX will use the NFS credentials specified in the configuration as the user that writes to the datastore. Think of this as a “VM service account” more or less. So, your options are:

  • Change the owner to a different user than root
  • Use root as the user (which would need to exist as a principal in the KDC)
  • Change permissions

In my opinion, changing the owner is the best, most secure choice here. To do that:

cluster::> volume modify -vserver SVM1 -volume kerberos_datastore -user parisi

That’s all from ONTAP for the GUI. The CLI commands would be (all in admin priv):

cluster::> dns create -vserver SVM1 -domains core-tme.netapp.com -name-servers 10.193.67.181 -timeout 2 -attempts 1 -skip-config-validation true
cluster::> nfs modify -vserver SVM1 -v4.1 enabled
cluster::> export-policy create -vserver SVM1 -policyname ESX
cluster::> export-policy rule create -vserver SVM1 -policyname ESX -clientmatch CX400S1-GWNSG6B.core-tme.netapp.com -rorule krb5* -rwrule krb5* -allow-suid true -ruleindex 1 -protocol nfs4 -anon 65534 -superuser any
cluster::> vol show -vserver SVM1 -volume vsroot -fields policy
cluster::> export-policy rule show -vserver SVM1 -policy [policy from prev command] -instance
cluster::> export-policy rule modify or create (if changes are needed)
cluster::> kerberos realm create -vserver SVM1 -realm CORE-TME.NETAPP.COM -kdc-vendor Microsoft -kdc-ip 10.193.67.181 -kdc-port 88 -clock-skew 5 -adminserver-ip 10.193.67.181 -adminserver-port 749 -passwordserver-ip 10.193.67.181 -passwordserver-port 464 -adserver-name stme-infra02.core-tme.netapp.com -adserver-ip 10.193.67.181
cluster::> kerberos interface enable -vserver SVM1 -lif data -spn nfs/ontap9.core-tme.netapp.com
cluster::> unix-group create -vserver SVM1 -name kerberos -id 501
cluster::> unix-user create -vserver SVM1 -user nfs -id 501 -primary-gid 501
cluster::> unix-user create -vserver SVM1 -user parisi -id 502 -primary-gid 501
cluster::> volume create -vserver SVM1 -volume kerberos_datastore -aggregate aggr1_node1 -size 500GB -state online -policy kerberos -user 0 -group 0 -security-style unix -unix-permissions ---rwxr-xr-x -junction-path /kerberos_datastore 

 ESXi Configuration

This is all driven through the vSphere GUI. This would need to be performed on each host that is being used for NFSv4.1 Kerberos.

1. Configure DNS

This is done under the “Hosts and Clusters -> Manage -> Networking -> TCP/IP config.”

dns-esx.png

2. Configure NTP

This is found in “Hosts and Clusters -> Settings -> Time Configuration”

ntp

3. Join the ESXi host to the Active Directory domain

Doing this automatically creates the machine account in AD and will transfer the keytab files between the host and KDC. This also sets the SPNs on the machine account. The user specified in the credentials must have create object permissions in the Computers OU in AD. (For example, a domain administrator)

This is found in “Hosts and Clusters -> Settings -> Authentication Services.”

join-domain.png

4. Specify NFS Kerberos Credentials

This is the user that will authenticate with the KDC and ONTAP for the Kerberos key exchange. This user name will be the same as the UNIX user you used in ONTAP. If you use a different name, create a new UNIX user in ONTAP or create a name mapping rule. If the user password changes in AD, you must also change it in ESXi.

nfs-creds

With NFS Kerberos in ESX, the ID you specified in NFS Kerberos credentials will be the ID used to write. For example, I used “parisi” as the user. My SVM is using LDAP authentication with AD. That user exists in my LDAP environment as the following:

cluster::*> getxxbyyy getpwbyuid -node ontap9-tme-8040-01 -vserver SVM1 -userID 3629
  (vserver services name-service getxxbyyy getpwbyuid)
pw_name: parisi
pw_passwd: 
pw_uid: 3629
pw_gid: 512
pw_gecos: 
pw_dir: 
pw_shell: /bin/sh

As a result, the test VM I create got written as that user:

drwxr-xr-x   2 3629  512    4096 Oct 12 10:40 test

To even be able to write at all, I had to change the UNIX permissions on the datastore to allow write access to “others.” Alternatively, I could have changed the owner of the volume to the specified user. I mention those steps in the ONTAP section.

If you plan on changing the user for NFS creds, be sure to use “clear credentials,” which will restart the service and clear caches. Occasionally, you may need to restart the nfsgssd service from the CLI if something is stubbornly cached:

[root@CX400S1-03003-B3:/] /etc/init.d/nfsgssd restart
watchdog-nfsgssd: Terminating watchdog process with PID 33613
Waiting for process to terminate...
nfsgssd stopped
nfsgssd started

In rare cases, you may have to leave and re-join the domain, which will generate new keytabs. In one particularly stubborn case, I had to reboot the ESX server after I changed some credentials and the Kerberos principal name in ONTAP.

That’s the extent of the ESXi host configuration for now. We’ll come back to the host to mount the datastore once we make some changes in Active Directory.

Active Directory Configuration

Because there are variations in support for encryption types, as well as DNS records needed, there are some AD tasks that need to be performed to get Kerberos to work.

1. Configure the machine accounts

Set the machine account attributes for the ESXi host(s) and ONTAP NFS server to only allow AES encryption. Doing this avoids failures to mount via Kerberos that manifest as “permission denied” on the host. In a packet trace, you’d potentially be able to see the ESXi host trying to exchange keys with the KDC and getting “unsupported enctype” errors if this step is skipped.

The exact attribute to change is msDS-SupportedEncryptionTypes. Set that value to 24, which is AES only. For more info on encryption types in Windows, click to view this blog.

You can change this attribute using “Advanced Features” view with the attribute editor. If that’s not available or it’s not possible to use, you can also modify using PowerShell.

To modify in the GUI:

msds-enctype.png

To modify using PowerShell:

PS C:\> Set-ADComputer -Identity [NFSservername] -Replace @{'msDS-SupportedEncryptionTypes'=24}

2. Create DNS records for the ESXi hosts and the ONTAP server

This would be A/AAAA records for forward lookup and PTR for reverse. Windows DNS let’s you do both at the same time. Verify the DNS records with “nslookup” commands.

This can also be done via GUI or PowerShell.

From the GUI:

From PowerShell:

PS C:\Users\admin>Add-DnsServerResourceRecordA -IPv4Address 10.193.67.220 -CreatePtr core-tme.netapp.com -Name ontap9

PS C:\Users\admin>Add-DnsServerResourceRecordA -IPv4Address 10.193.67.35 -CreatePtr core-tme.netapp.com -Name cx400s1-gwnsg6b

Mounting the NFS Datastore via Kerberos

Now, we’re ready to create the datastore in ESX using NFSv4.1 and Kerberos.

Simply go to “Add Datastore” and follow the prompts to select the necessary options.

1. Select “NFS” and then “NFS 4.1.”

VMware doesn’t recommend mixing v3 and v4.1 on datastores. If you have an existing datastore that you were mounting via v3, VMware recommends migrating VMs using storage vmotion.

new-ds1new-ds2

2. Specify the name and configuration

The datastore name can be anything you like. The “folder” has to be the junction-path/export path on the ONTAP cluster. In our example, we use /kerberos_datastore.

Server(s) would be the data LIF you enabled Kerberos on. ONTAP doesn’t support NFSv4.1 multi-pathing/trunking yet, so specifying multiple NFS servers won’t necessarily help here.

new-ds3.png

4. Check “Enable Kerberos-based authentication”

Kind of a no-brainer here, but still worth mentioning.

new-ds4.png

5. Select the hosts that need access.

If other hosts have not been configured for Kerberos, they won’t be available to select.

new-ds5.png

6. Review the configuration details and click “Finish.”

This should mount quickly and without issue. If you have an issue, review the “Troubleshooting” tips below.

new-ds6.png

This can also be done with a command from ESX CLI:

esxcli storage nfs41 add -H ontap9 -a SEC_KRB5 -s /kerberos_datastore -v kerberosDS

Troubleshooting Tips

If you follow the steps above, this should all work fine.

mounted-ds.png

But sometimes I make mistakes. Sometimes YOU make mistakes. It happens. 🙂

Some steps I use to troubleshoot Kerberos mount issues…

  • Review the vmkernel logs for vcenter
  • Review “event log show” from the cluster CLI
  • Ensure the ESX host name and SVM host name exist in DNS (nslookup)
  • Use packet traces from the DC to see what is failing during Kerberos authentication (filter on “kerberos” in wireshark)
  • Review the SVM config:
    • Ensure NFSv4.1 is enabled
    • Ensure the SVM has DNS configured
    • Ensure the Kerberos realm is all caps and is created on the SVM
    • Ensure the desired data LIF has Kerberos enabled (from system manager of via “kerberos interface show” from the cli)
    • Ensure the export policies and rules allow access to the ESX datastore volume for Kerberos, superuser and NFSv4. Ensure the vsroot volume allows at least read access for the ESX host.
    • Ensure the SVM has the appropriate UNIX users and group created (nfs user for the NFS SPN; UNIX user name that matches the NFS user principal defined in ESX) or the users exist in external name services
  • From the KDC/AD domain controller:
    • Ensure the machine accounts created use AES only to avoid any weird issues with encryption type support
    • Ensure the SPNs aren’t duplicated (setspn /q {service/fqdn}
    • Ensure the user defined in the NFS Kerberos config hasn’t had a password expire

 

Using NFSv4.x ACLs with NFSv3 in NetApp ONTAP? You betcha!

One thing I’ve come to realize from being in IT so long is that you should be constantly learning new things. If you aren’t, it’s not because you’re smart or because you know everything; it’s because you’re stagnating.

So, I was not surprised when I heard it was possible to apply NFSv4.x ACLs to files and folders and then mount them via NFSv3 and have the ACLs still work! I already knew that you could do audit ACEs from NFSv4.x for NFSv3 (covered in TR-4067), but had no idea this could extend into the permissions realm. If so, this solves a pretty big problem with NFSv3 in general, where your normal permissions are limited only to owner, group and then everyone else. That makes it hard to do any sort of granular access control for NFSv3 mounts, presents problems for some environments.

It also allows you to keep using NFSv3 for your workloads, whether for legacy application or general performance concerns. NFSv4.x has a lot of advantages over NFSv3, but if you don’t need stateful operations or the NFSv4.x features, or integrated locking, then you are safe to stay with NFSv3.

So, is it possible to use NFSv4.x ACLs with NFSv3 objects?

You betcha!

fargo-film-marge.jpg

The method for doing this is pretty straightforward.

  1. Configure and enable NFSv4.x in ONTAP and on your client
  2. Enable NFSv4.x ACL support in ONTAP
  3. Mount the export via NFSv4.x
  4. Apply the NFSv4.x ACLs
  5. Unmount and then remount the export using NFSv3 and test it out!

 

Configuring NFSv4.x

When you’re setting up NFSv4.x in an environment, there are a few things to keep in mind:

  • Client and NFS server support for NFSv4.x
  • NFS utilities installed on clients (for NFSv4.x functionality)
  • NFSv4.x configured on the client in idmapd.conf
  • NFSv4.x configured on the server in ONTAP (ACLS allowed)
  • Export policies and rules configured in ONTAP
  • Ideally, a name service server (like LDAP) to negotiate the server/client conversation of user identities

One of the reasons NFS4.x is more secure than NFSv3 is the use of user ID strings (such as user@domain.com) to help limit cases of user spoofing in NFS conversations. This ID string is required to be case-sensitive. If the string doesn’t match on both client and server, then the NFSv4.x mounts will get squashed to the defined “nobody” user in the NFSv4.x client. One of the more common issues seen with NFSv4.x mounts is the “nobody:nobody” user and group on files and folders. One of the most common causes of this is when a domain string is mismatched on the client and server.

In a client that domain string is defined in the idmapd.conf file. Sometimes, it will default to the DNS domain. In ONTAP, the v4-id-domain string should be configured to the same value on the client to provide proper NFSv4.x authentication.

Other measures, such as Kerberos encryption, can help lock the NFS conversations down further. NFSv4.x ACLs are a way to ensure that files and folders are only seen by those entities that have been granted access and is considered to be authorization, or, what you are allowed to do once you authenticate. For more complete steps on setting up NFSv4.x, see TR-4067 and TR-4073.

However, we’re only setting up NFSv4.x to allow us to configure the ACLs…

What are NFSv4.x ACLs?

NFSv4.x ACLs are a way to apply granular permissions to files and folders in NFS outside of the normal “read/write/execute” of NFSv3, and across more objects than simple “owner/group/everyone.” NFSv4.x ACLs allow administrators to set permissions for multiple users and groups on the same file or folder and treat NFS ACLs more like Windows ACLs. For more information on NFSv4.x ACLs, see:

http://wiki.linux-nfs.org/wiki/index.php/ACLs

https://linux.die.net/man/5/nfs4_acl

http://www.netapp.com/us/media/tr-4067.pdf

NFSv3 doesn’t have this capability by default. The only way to get more granular ACLs in NFSv3 natively is to use POSIX ACLs, which ONTAP doesn’t support.

Once you’ve enabled ACLs in ONTAP (v4.0-acl and/or v4.1-acl options), you can mount an NFS export via NFSv4.x and start applying NFSv4.x ACLs.

In my environment, I mounted a homedir volume and then set up an ACL on a file owned by root for a user called “prof1” using nfs4_setfacl -e (which allows you to edit a file rather than have to type in a long command).

[root@centos7 /]# mount demo:/home /mnt
[root@centos7 /]# mount | grep mnt
demo:/home on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.193.67.225,local_lock=none,addr=10.193.67.237)

The file lives in the root user’s homedir. The root homedir is set to 755, which means anyone can read them, but no one but the owner (root) can write to them.

drwxr-xr-x 2 root root 4096 Jul 13 10:42 root

That is, unless, I set NFSv4.x ACLs to allow a user full control:

[root@centos7 mnt]# nfs4_getfacl /mnt/root/file
A::prof1@ntap.local:rwaxtTnNcCy
A::OWNER@:rwaxtTnNcCy
A:g:GROUP@:rxtncy
A::EVERYONE@:rxtncy

I can also see those permissions from the ONTAP CLI:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 755
 UNIX Mode Bits in Text: rwxr-xr-x
 ACLs: NFSV4 Security Descriptor
 Control:0x8014
 DACL - ACEs
 ALLOW-user-prof1-0x1601bf
 ALLOW-OWNER@-0x1601bf
 ALLOW-GROUP@-0x1200a9-IG
 ALLOW-EVERYONE@-0x1200a9

I can also expand the mask to translate the hex:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file -expand-mask true

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: 0x20
 ...0 .... .... .... = Offline
 .... ..0. .... .... = Sparse
 .... .... 0... .... = Normal
 .... .... ..1. .... = Archive
 .... .... ...0 .... = Directory
 .... .... .... .0.. = System
 .... .... .... ..0. = Hidden
 .... .... .... ...0 = Read Only
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 755
 UNIX Mode Bits in Text: rwxr-xr-x
 ACLs: NFSV4 Security Descriptor
 Control:0x8014

1... .... .... .... = Self Relative
 .0.. .... .... .... = RM Control Valid
 ..0. .... .... .... = SACL Protected
 ...0 .... .... .... = DACL Protected
 .... 0... .... .... = SACL Inherited
 .... .0.. .... .... = DACL Inherited
 .... ..0. .... .... = SACL Inherit Required
 .... ...0 .... .... = DACL Inherit Required
 .... .... ..0. .... = SACL Defaulted
 .... .... ...1 .... = SACL Present
 .... .... .... 0... = DACL Defaulted
 .... .... .... .1.. = DACL Present
 .... .... .... ..0. = Group Defaulted
 .... .... .... ...0 = Owner Defaulted

DACL - ACEs
 ALLOW-user-prof1-0x1601bf
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .1.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...1 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...1 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .1.. = Append
 .... .... .... .... .... .... .... ..1. = Write
 .... .... .... .... .... .... .... ...1 = Read

ALLOW-OWNER@-0x1601bf
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .1.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...1 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...1 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .1.. = Append
 .... .... .... .... .... .... .... ..1. = Write
 .... .... .... .... .... .... .... ...1 = Read

ALLOW-GROUP@-0x1200a9-IG
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .0.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...0 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...0 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .0.. = Append
 .... .... .... .... .... .... .... ..0. = Write
 .... .... .... .... .... .... .... ...1 = Read

ALLOW-EVERYONE@-0x1200a9
 0... .... .... .... .... .... .... .... = Generic Read
 .0.. .... .... .... .... .... .... .... = Generic Write
 ..0. .... .... .... .... .... .... .... = Generic Execute
 ...0 .... .... .... .... .... .... .... = Generic All
 .... ...0 .... .... .... .... .... .... = System Security
 .... .... ...1 .... .... .... .... .... = Synchronize
 .... .... .... 0... .... .... .... .... = Write Owner
 .... .... .... .0.. .... .... .... .... = Write DAC
 .... .... .... ..1. .... .... .... .... = Read Control
 .... .... .... ...0 .... .... .... .... = Delete
 .... .... .... .... .... ...0 .... .... = Write Attributes
 .... .... .... .... .... .... 1... .... = Read Attributes
 .... .... .... .... .... .... .0.. .... = Delete Child
 .... .... .... .... .... .... ..1. .... = Execute
 .... .... .... .... .... .... ...0 .... = Write EA
 .... .... .... .... .... .... .... 1... = Read EA
 .... .... .... .... .... .... .... .0.. = Append
 .... .... .... .... .... .... .... ..0. = Write
 .... .... .... .... .... .... .... ...1 = Read

In the above, I gave prof1 full control over the file. Then, I mounted via NFSv3:

[root@centos7 /]# mount -o nfsvers=3 demo:/home /mnt
[root@centos7 /]# mount | grep mnt
demo:/home on /mnt type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.193.67.219,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.193.67.219)

When I become a user that isn’t on the NFSv4.x ACL, I can’t write to the file:

[root@centos7 /]# su student1
sh-4.2$ cd /mnt/root
sh-4.2$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Jul 13 10:42 .
drwxrwxrwx 11 root root 4096 Jul 10 10:04 ..
-rwxr-xr-x 1 root bin 0 Jul 13 10:23 file
-rwxr-xr-x 1 root root 0 Mar 29 11:37 test.txt

sh-4.2$ touch file
touch: cannot touch ‘file’: Permission denied
sh-4.2$ rm file
rm: remove write-protected regular empty file ‘file’? y
rm: cannot remove ‘file’: Permission denied

When I change to the prof1 user, I have access to do whatever I want, even though the mode bit permissions in v3 say I can’t:

[root@centos7 /]# su prof1
sh-4.2$ cd /mnt/root
sh-4.2$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Jul 13 10:42 .
drwxrwxrwx 11 root root 4096 Jul 10 10:04 ..
-rwxr-xr-x 1 root bin 0 Jul 13 10:23 file
-rwxr-xr-x 1 root root 0 Mar 29 11:37 test.txt

sh-4.2$ vi file
sh-4.2$ cat file
NFSv4ACLS!

When I do a chmod, however, nothing seems to change from the NFSv4 ACL for the user. I set 700 on the file, which shows up in NFSv3 mode bits:

sh-4.2$ chmod 700 file
sh-4.2$ ls -la
total 8
drwxr-xr-x 2 root root 4096 Jul 13 10:42 .
drwxrwxrwx 11 root root 4096 Jul 10 10:04 ..
-rwx------ 1 root bin 11 Aug 11 09:58 file
-rwxr-xr-x 1 root root 0 Mar 29 11:37 test.txt

But notice how the prof1 user still has full control:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 700
 UNIX Mode Bits in Text: rwx------
 ACLs: NFSV4 Security Descriptor
 Control:0x8014
 DACL - ACEs
 ALLOW-user-prof1-0x1601bf
 ALLOW-OWNER@-0x1601bf
 ALLOW-GROUP@-0x120088-IG
 ALLOW-EVERYONE@-0x120088

This is because of an ONTAP option known as “ACL Preservation.”

ontap9-tme-8040::*> nfs show -vserver DEMO -fields v4-acl-preserve
vserver v4-acl-preserve
------- ---------------
DEMO enabled

When I set the option to enabled, the NFSv4.x ACLs will survive mode bit changes. If I disable the option, the ACLs get blown away when a chmod is done:

ontap9-tme-8040::*> nfs modify -vserver DEMO -v4-acl-preserve disabled

ontap9-tme-8040::*> nfs show -vserver DEMO -fields v4-acl-preserve
vserver v4-acl-preserve
------- ---------------
DEMO disabled


[root@centos7 root]# chmod 755 file

And the ACLs are wiped out:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /home/root/file

Vserver: DEMO
 File Path: /home/root/file
 File Inode Number: 8644
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 20
 DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 1
 UNIX Mode Bits: 755
 UNIX Mode Bits in Text: rwxr-xr-x
 ACLs: -

I’d personally recommend setting that option to “enabled” if you want to do v3 mounts with v4.x ACLs.

So, there you have it… a new way to secure your NFSv3 mounts!

Encrypt your NFS packets end to end with krb5p and ONTAP 9.2!

NFS has always had a running joke about security, with a play on the acronym stating that NFS was “Not For Security.”

With NFSv3 and prior, there was certainly truth to that, especially when NFS was mounted without Kerberos. But even using Kerberos in NFSv3 wasn’t necessarily secure, as it only was applied to the NFS packets and not the extraneous services like NLM, NSM, mountd, etc.

NFSv4.x improved NFS security greatly by implementing a single port, ACLs, ID domain names and more tightly integrated support for Kerberos, among other improvements. However, simple krb5 authentication by itself only encrypts the initial mounts and not the NFS packets themselves.

That’s where stronger Kerberos modes like krb5i and krb5p come into play. From the RedHat man pages:

sec=krb5 uses Kerberos V5 instead of local UNIX UIDs and GIDs to authenticate users.

sec=krb5i uses Kerberos V5 for user authentication and performs integrity checking of NFS operations using secure checksums to prevent data tampering.

sec=krb5p uses Kerberos V5 for user authentication, integrity checking, and encrypts NFS traffic to prevent traffic sniffing. This is the most secure setting, but it also involves the most performance overhead.

krb5p = privacy

The p in krb5p stands for “privacy,” and it does that by way of Kerberos encryption of the NFS conversation end-to-end, via the specified encryption strength. The strongest you can currently use is AES-256. ONTAP 9.0 and later supports krb5p and AES-256 encryption. Krb5p is similar to SMB3 encryption/signing and sealing in its functionality.

Krb5p is also similar to SMB3 encryption in its performance impact; doing encryption of thousands of packets is expensive and can create CPU bottlenecks, unless…

AES-NI Offloading

AES-NI offloading is a feature available on specific Intel CPUs that allow encryption processing to use hardware acceleration instructions to offload processing for encryption. This allows the encryption to be done separately to alleviate performance bottlenecks.

From Intel’s site:

Intel® AES New Instructions (Intel® AES NI) is a new encryption instruction set that improves on the Advanced Encryption Standard (AES) algorithm and accelerates the encryption of data in the Intel® Xeon® processor family and the Intel® Core™ processor family.

Comprised of seven new instructions, Intel® AES-NI gives your IT environment faster, more affordable data protection and greater security; making pervasive encryption feasible in areas where previously it was not.

ONTAP 9.1 provided support for AES-NI offloading for SMB3 encryption, which greatly improved performance. But krb5p offloading was only added as of ONTAP 9.2. If you plan on using the end-to-end encryption functionality in NFS with krb5p, use ONTAP 9.2 or later. For more information on what other features are in ONTAP 9.2, see the following post:

ONTAP 9.2RC1 is available!

Krb5p performance in ONTAP 9.0 vs. ONTAP 9.2

Krb5p support was added in ONTAP 9.0, but the performance was pretty awful, due to the lack of AES-NI support.

Here are some graphs using SIO with different flavors of Kerberos and AUTH_SYS in ONTAP 9.0. (All using NFSv4.1)

In ONTAP 9.0, krb5p wasn’t ever able to achieve above 12k IOPS for 4k reads in these SIO tests, and what it was able to achieve, it did it at some pretty severe latency. Krb5i did a little better, but krb5 and auth_sys performed way better.

Test environment was:

  • FAS8080 (AFF numbers coming soon)
  • 12 RHEL 6.7 clients

4K sequential reads in ONTAP 9.0:

krb5-ontap9-4k-read

Writes are even worse for krb5p in ONTAP 9.0 – we didn’t even get to 10k.

4K sequential writes in ONTAP 9.0:

krb5-ontap9-4k-write

For 8K sequential reads in ONTAP 9.0, latency is about the same. Fewer ops, but that’s because we’re doing the same amount of work in bigger I/O chunks.

8K sequential reads in ONTAP 9.0:

krb5-ontap9-8k-read.png

8K sequential writes in ONTAP 9.0:

krb5-ontap9-8k-write.png

NOTE: ONTAP 9.1 was not tested, but I’d expect similar performance, as we don’t do AES-NI offloading for NFS in that release.

ONTAP 9.2 Kerberos 5p Performance – Vastly improved

Now, let’s compare those same tests to ONTAP 9.2 with the AES-NI offloading and other performance enhancements. In the graphs below, there are a few things to point out.

  • Much more predictable performance for krb5i and krb5p as IOPS increase
  • Lower latency in 9.2 at high IOPS for krb5 than in 9.0
  • No real peak IOPS for krb5i/krb5p; these security flavors are able to keep up with sys and krb5 for sheer maximum IOPS
  • Sub millisecond latency for NFS at high IOPS (~50k) in most workloads, regardless of the security flavor
  • AES-NI offloading and NFS performance improvements in ONTAP 9.2 are pretty substantial

4K Sequential Reads in ONTAP 9.2:

krb5-ontap92-4k-read.png

4K Sequential Writes in ONTAP 9.2:

krb5-ontap92-4k-write.png

8K sequential reads in ONTAP 9.2:

krb5-ontap92-8k-read.png

8K sequential writes in ONTAP 9.2:

krb5-ontap92-8k-write.png

Conclusion

With ONTAP 9.2, you can now get enterprise class security with Kerberos 5p along with performance that doesn’t kill your workloads. If you’re doing NFS with any flavor of Kerberos, it makes a ton of sense to upgrade to ONTAP 9.2 to receive the performance benefits from AES-NI offloading. Keep in mind that upgrading ONTAP is non-disruptive to NFSv3, as it’s stateless, but will be slightly disruptive to CIFS/SMB and NFSv4.x workloads, due to the statefulness of the protocols.

NFS Kerberos in ONTAP Primer

Fun fact!

Kerberos was named after Cerberus, the hound of Hades, which protected the gates of the underworld with its three heads of gnashing teeth.

cerberos

Kerberos in IT security isn’t a whole lot different; it’s pretty effective at stopping intruders and is literally a three-headed monster.

In my day to day role as a Technical Marketing Engineer for NFS, I find that one of the most challenging questions I get is regarding NFS mounts using Kerberos. This is especially true now, as IT organizations are focusing more and more on securing their data and Kerberos is one way to do that. CIFS/SMB already does a nice job of this and it’s pretty easily integrated without having to do a ton on the client or storage side.

With NFS Kerberos, however, there are a ton of moving parts and not a ton of expertise that spans those moving parts. Think for a moment what all is involved here when dealing with ONTAP:

  • DNS
  • KDC server (Key Distribution Center)
  • Client/principal
  • NFS server/principal
  • ONTAP
  • NFS
  • LDAP/name services

This blog post isn’t designed to walk you through all those moving parts; that’s what TR-4073 was written for. Instead, this blog is going to simply walk through the workflow of what happens during an NFS mount using Kerberos and where things can fail/common failure scenarios. This post will focus on Active Directory KDCs, since that’s what I see most and get the most questions on. Other UNIX-based KDCs are either not as widely used, or the admins running them are ninjas that never need any help. 🙂

Common terms

First, let’s cover a few common terms used in NFS Kerberos.

Storage Virtual Machine (SVM)

This is what clustered ONTAP uses to present NAS and SAN storage to clients. SVMs act as tenants within a cluster. Think of them as “virtualized storage blades.”

Key Distribution Center (KDC)

The Kerberos ticket headquarters. This stores all the passwords, objects, etc. for running Kerberos in an environment. In Active Directory, domain controllers are KDCs and replicate to other DCs in the environment, which makes Active Directory an ideal platform to run Kerberos on due to ease of use and familiarity. As a bonus, Active Directory is already primed with UNIX attributes for Identity Management with LDAP. (Note: Windows 2012 has UNIX attributes by default; prior to 2012, you had to manually extend the schema.)

Kerberos principals

Kerberos principals are objects within a KDC that can have tickets assigned. Users can own principals. Machine accounts can own principals. However, simply creating a user or machine account doesn’t mean you have created a principal. Those are stored within the object’s LDAP schema attributes in Active Directory. Generally speaking, it’s one of either:

  • servicePrincipalName (SPN)
  • userPrincipalName (UPN)

These get set when adding computers to a domain (including joining Linux clients), as well as when creating new users (every user gets a UPN). Principals include three different components.

  1. Primary – this defines the type of principal (usually a service such as ldap, nfs, host, etc) and is followed by a “/”; Not all principals have primary components. For example, most users are simply user@REALM.COM.
  2. Secondary – this defines the name of the principal (such as jimbob)
  3. Realm – This is the Kerberos realm and is usually defined in ALL CAPS and is the name of the domain your principal was added into (such as CONTOSO.COM)

Keytabs

The keytab file allows a client or server that is participating in an NFS mount to use their keytab to generate AS (authentication service) ticket requests. Think of this as the principal “logging in” to the KDC, similar to what you’d do with a username and password. Keytab files can make their way to clients one of two ways.

  1. Manually creating and copying the keytab file to the client (old school)
  2. Using the domain join tool of your choice (realmd, net ads/samba, adcli, etc.) on the client to automatically negotiate the keytab and machine principals on the KDC (recommended)

Keytab files, when created using the domain join tools, will create multiple entries for Kerberos principals. Generally, this will include a service principal name (SPN) for host/shortname@REALM.COM, host/fully.qualified.name@REALM.COM and a UPN for the machine account such as MACHINE$@REALM.COM. The auto-generated keytabs will also include multiple entries for each principal with different encryption types (enctypes). The following is an example of a CentOS 7 box’s keytab joined to an AD domain using realm join:

# klist -kte
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (arcfour-hmac)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (arcfour-hmac)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (arcfour-hmac)

Encryption types (enctypes)

Encryption types (or enctypes) are the level of encryption used for the Kerberos conversation. The client and KDC will negotiate the level of enctype used. The client will tell the KDC “hey, I want to use this list of enctypes. Which do you support?” and the KDC will respond “I support these, in order of strongest to weakest. Try using the strongest first.” In the example above, this is the order of enctype strength, from strongest to weakest:

  • AES-256
  • AES-128
  • ARCFOUR-HMAC
  • DES-CBC-MD5
  • DES-CBC-CRC

The reason a keytab file would add weaker enctypes like DES or ARCFOUR is for backwards compatibility. For example, Windows 2008 DCs don’t support AES enctypes. In some cases, the enctypes can cause Kerberos issues due to lack of support. Windows 2008 and later don’t support DES unless you explicitly enable it. ARCFOUR isn’t supported in clustered ONTAP for NFS Kerberos. In these cases, it’s good to modify the machine accounts to strictly define which enctypes to use for Kerberos.

What you need before you try mounting

This is a quick list of things that have to be in place before you can expect Kerberos with NFS to work properly. If I left something out, feel free to remind me in the comments. There’s so much info involved that I occasionally forget some things. 🙂

KDC and client – The KDC is a given – in this case, Active Directory. The client would need to have some things installed/configured before you try to join it, including a valid DNS server configuration, Kerberos utilities, etc. This varies depending on client and would be too involved to get into here. Again, TR-4073 would be a good place to start.

DNS entries for all clients and servers participating in the NFS Kerberos operation – this includes forward and reverse (PTR) records for the clients and servers. The DNS friendly names *must* match the SPN names. If they don’t, then when you try to mount, the DNS lookup will file the name hostname1 and use that to look up the SPN host/hostname1. If the SPN was called nfs/hostname2, then the Kerberos attempt will fail with “PRINCIPAL_UNKNOWN.” This is also true for Kerberos in CIFS/SMB environments. In ONTAP, a common mistake people make is they name the CIFS server or NFS Kerberos SPN as the SVM name (such as SVM1), but their DNS names are something totally different (such as cifs.domain.com).

Valid Kerberos SPNs and UPNs – When you join a Linux client to a domain, the machine account and SPNs are automatically created. However, the UPN is not created. Having no UPN on a machine account can create issues with some Linux services that use Kerberos keytab files to authenticate. For example, RedHat’s LDAP service (SSSD) can fail to bind if using a Kerberos service principal in the configuration via the ldap_sasl_authid option. The error you’d see would be “PRINCIPAL_UNKNOWN” and would drive you batty because it would be using a principal you *know* exists in your environment. That’s because it’s trying to find the UPN, not the SPN. You can manage the SPN and UPN via the Active Directory attributes tab in the advanced features view. You can query whether SPNs exist via the setspn command (use /q to query by SPN name) in the CLI or PowerShell.

PS C:\> setspn /q host/centos7.ntap.local
Checking domain DC=NTAP,DC=local
CN=CENTOS7,CN=Computers,DC=NTAP,DC=local
 HOST/centos7.ntap.local
 HOST/CENTOS7

Existing SPN found!

You can view a user’s UPN and SPN with the following PowerShell command:

PS C:\> Get-ADUser student1 -Properties UserPrincipalName,ServicePrincipalName

DistinguishedName : CN=student1,CN=Users,DC=NTAP,DC=local
Enabled : True
GivenName : student1
Name : student1
ObjectClass : user
ObjectGUID : d5d5b526-bef8-46fa-967b-00ebc77e468d
SamAccountName : student1
SID : S-1-5-21-3552729481-4032800560-2279794651-1108
Surname :
UserPrincipalName : student1@NTAP.local

And a machine account’s with:

PS C:\> Get-ADComputer CENTOS7$ -Properties UserPrincipalName,ServicePrincipalName

DistinguishedName : CN=CENTOS7,CN=Computers,DC=NTAP,DC=local
DNSHostName : centos7.ntap.local
Enabled : True
Name : CENTOS7
ObjectClass : computer
ObjectGUID : 3a50009f-2b40-46ea-9014-3418b8d70bdb
SamAccountName : CENTOS7$
ServicePrincipalName : {HOST/centos7.ntap.local, HOST/CENTOS7}
SID : S-1-5-21-3552729481-4032800560-2279794651-1140
UserPrincipalName : HOST/centos7.ntap.local@NTAP.LOCAL

Network Time Protocol (NTP) – With Kerberos, there is a 5 minute default time skew window. If a client and server/KDC’s time is outside of that window, Kerberos requests will fail with “Access denied” and you’d see time skew errors in the cluster logs. This KB covers it nicely:

https://kb.netapp.com/support/s/article/ka11A0000001V1YQAU/Troubleshooting-Workflow-CIFS-Authentication-failures?language=en_US

A common issue I’ve seen with this is time zone differences or daylight savings issues. I’ve often seen the wall clock time look identical on server and client, but the time zones or month/date differ, causing the skew.

The NTP requirement is actually a “make sure your time is up to date and in sync on everything” requirement, but NTP makes that easier.

Kerberos to UNIX name mappings – In ONTAP, we authenticate via name mappings not only for CIFS/SMB, but also for Kerberos. When a client attempts to send an authentication request to the cluster for an AS request or ST (service ticket) request, it has to map to a valid UNIX user. The UNIX user mapping will depend on what type of principal is coming in. If you don’t have a valid name mapping rule, you’d see something like this in the event log:

5/16/2017 10:24:23 ontap9-tme-8040-01
 ERROR secd.nfsAuth.problem: vserver (DEMO) General NFS authorization problem. Error: RPC accept GSS token procedure failed
 [ 8 ms] Acquired NFS service credential for logical interface 1034 (SPN='nfs/demo.ntap.local@NTAP.LOCAL').
 [ 11] GSS_S_COMPLETE: client = 'CENTOS7$@NTAP.LOCAL'
 [ 11] Trying to map SPN 'CENTOS7$@NTAP.LOCAL' to UNIX user 'CENTOS7$' using implicit mapping
 [ 12] Using a cached connection to oneway.ntap.local
**[ 14] FAILURE: User 'CENTOS7$' not found in UNIX authorization source LDAP.
 [ 15] Entry for user-name: CENTOS7$ not found in the current source: LDAP. Ignoring and trying next available source
 [ 15] Entry for user-name: CENTOS7$ not found in the current source: FILES. Entry for user-name: CENTOS7$ not found in any of the available sources
 [ 15] Unable to map SPN 'CENTOS7$@NTAP.LOCAL'
 [ 15] Unable to map Kerberos NFS user 'CENTOS7$@NTAP.LOCAL' to appropriate UNIX user

For service principals (SPNS) such as host/name or nfs/name, the mapping would try to default to primary/, so you’d need a UNIX user named host or nfs on the local SVM or in a name service like LDAP. Otherwise, you can create static krb-unix name mappings in the SVM to map to whatever user you like. If you want to use wild cards, regex, etc. you can do  that. For example, this name mapping rule will map all SPNs coming in as {MACHINE}$@REALM.COM to root.

cluster::*> vserver name-mapping show -vserver DEMO -direction krb-unix -position 1

Vserver: DEMO
 Direction: krb-unix
 Position: 1
 Pattern: (.+)\$@NTAP.LOCAL
 Replacement: root
IP Address with Subnet Mask: -
 Hostname: -

To test the mapping, use diag priv:

cluster::*> diag secd name-mapping show -node node1 -vserver DEMO -direction krb-unix -name CENTOS7$@NTAP.LOCAL

'CENTOS7$@NTAP.LOCAL' maps to 'root'

You can map the SPN to root, pcuser, etc. – as long as the UNIX user exists locally on the SVM or in the name service.

The workflow

Now that I’ve gotten some basics out of the way (and if you find that I’ve missed some, add to the comments), let’s look at how the workflow for an NFS mount using Kerberos would work, end to end. This is assuming we’ve configured everything correctly and are ready to mount, and that all the export policy rules allow the client to mount NFSv4 and Kerberos. If a mount fails, always check your export policy rules first.

Some common export policy issues include:

  • The export policy doesn’t have any rules configured
  • The vserver/SVM root volume doesn’t allow read access in the export policy rule for traversal of the / mount point in the namespace
  • The export policy has rules, but they are either misconfigured (clientmatch is wrong, read access disallowed, NFS protocol or auth method is disallowed) or they aren’t allowing the client to access the mount (Run export-policy rule show -instance)
  • The wrong/unexpected export policy has been applied to the volume (Run volume show -fields policy)

What’s unfortunate about trying to troubleshoot mounts with NFS Kerberos involved is that, regardless of the failures happening, the client will report:

mount.nfs: access denied by server while mounting

It’s a generic error and isn’t really helpful in diagnosing the issue.

In ONTAP, there is a command in admin privilege to check the export policy access for the client for troubleshooting purposes. Be sure to use it to rule out export issues.

cluster::> export-policy check-access -vserver DEMO -volume flexvol -client-ip 10.193.67.225 -authentication-method krb5 -protocol nfs4 -access-type read-write
 Policy Policy Rule
Path Policy Owner Owner Type Index Access
----------------------------- ---------- --------- ---------- ------ ----------
/ root vsroot volume 1 read
/flexvol default flexvol volume 1 read-write
2 entries were displayed.

The mount command is issued.

In my case, I use NFSv4.x, as that’s the security standard. Mounting without specifying a version will default to the highest NFS version allowed by the client and server, via a client-server negotiation. If NFSv4.x is disabled on the server, the client will fall back to NFSv3.

# mount -o sec=krb5 demo:/flexvol /mnt

Once the mount command gets issued and Kerberos is specified, a few (ok, a lot of) things happen in the background.

While this stuff happens, the mount command will appear to “hang” as the client, KDC and server suss out if you’re going to be allowed access.

  • DNS lookups are done for the client hostname and server hostname (or reverse lookup of the IP address) to help determine what names are going to be used. Additionally, SRV lookups are done for the LDAP service and Kerberos services in the domain. DNS lookups are happening constantly through this process.
  • The client uses its keytab file to send an authentication service request (AS-REQ) to the KDC, along with what enctypes it has available. The KDC then verifies if the requested principal actually exists in the KDC and if the enctypes are supported.
  • If the enctypes are not supported, or if the principal exists, or if there are DUPLICATE principals, the AS-REQ fails. If the principal exists, the KDC will send a successful reply.
  • Then the client will send a Ticket Granting Service request (TGS-REQ) to the KDC. This request is an attempt to look up the NFS service ticket named nfs/name. The name portion of the ticket is generated either via what was typed into the mount command (ie, demo) or via reverse lookup (if we typed in an IP address to mount). The TGS-REQ will be used later to allow us to obtain a service ticket (ST). The TGS will also negotiate supported enctypes for later. If the TGS-REQ between the KDC and client negotiates an enctype that ONTAP doesn’t support (for example, ARCFOUR), then the mount will fail later in process.
  • If the TGS-REQ succeeds, a TGS-REP is sent. If the KDC doesn’t support the requested enctypes from the client, we fail here. If the NFS principal doesn’t exist (remember, it has to be in DNS and match exactly), then we fail.
  • Once the TGS is acquired by the NFS client, it presents the ticket to the NFS server in ONTAP via a NFS NULL call. The ticket information includes the NFS service SPN and the enctype used. If the NFS SPN doesn’t match what’s in “kerberos interface show,” the mount fails. If the enctype presented by the client isn’t supported or is disallowed in “permitted enctypes” on the NFS server, the request fails. The client would show “access denied.”
  • The NFS service SPN sent by the client is presented to ONTAP. This is where the krb-unix mapping takes place. ONTAP will first see if a user named “nfs” exists in local files or name services (such as LDAP, where a bind to the LDAP server and lookup takes place). If the user doesn’t exist, it will then check to see if any krb-unix name mapping rules were set explicitly. If no rules exist and mapping fails, ONTAP logs an error on the cluster and the mount fails with “Access denied.” If the mapping works, the mount procedure moves on to the next step.
  • After the NFS service ticket is verified, the client will send SETCLIENTID calls and then the NFSv4.x mount compound call (PUTROOTFH | GETATTR). The client and server are also negotiating the name@domainID string to make sure they match on both sides as part of NFSv4.x security.
  • Then, the client will try to run a series of GETATTR calls to “/” in the path. If we didn’t allow “read” access in the policy rule for “/” (the vsroot volume), we fail. If the ACLs/mode bits on the vsroot volume don’t allow at least traverse permissions, we fail. In a packet trace, we can see that the vsroot volume has only traverse permissions:
    V4 Reply (Call In 268) ACCESS, [Access Denied: RD MD XT], [Allowed: LU DL]

    We can also see that from the cluster CLI (“Everyone” only has “Execute” permissions in this NTFS security style volume):

    cluster::> vserver security file-directory show -vserver DEMO -path / -expand-mask true
    
    Vserver: DEMO
     File Path: /
     File Inode Number: 64
     Security Style: ntfs
     Effective Style: ntfs
     DOS Attributes: 10
     DOS Attributes in Text: ----D---
    Expanded Dos Attributes: 0x10
     ...0 .... .... .... = Offline
     .... ..0. .... .... = Sparse
     .... .... 0... .... = Normal
     .... .... ..0. .... = Archive
     .... .... ...1 .... = Directory
     .... .... .... .0.. = System
     .... .... .... ..0. = Hidden
     .... .... .... ...0 = Read Only
     UNIX User Id: 0
     UNIX Group Id: 0
     UNIX Mode Bits: 777
     UNIX Mode Bits in Text: rwxrwxrwx
     ACLs: NTFS Security Descriptor
     Control:0x9504
    
    1... .... .... .... = Self Relative
     .0.. .... .... .... = RM Control Valid
     ..0. .... .... .... = SACL Protected
     ...1 .... .... .... = DACL Protected
     .... 0... .... .... = SACL Inherited
     .... .1.. .... .... = DACL Inherited
     .... ..0. .... .... = SACL Inherit Required
     .... ...1 .... .... = DACL Inherit Required
     .... .... ..0. .... = SACL Defaulted
     .... .... ...0 .... = SACL Present
     .... .... .... 0... = DACL Defaulted
     .... .... .... .1.. = DACL Present
     .... .... .... ..0. = Group Defaulted
     .... .... .... ...0 = Owner Defaulted
    
    Owner:BUILTIN\Administrators
     Group:BUILTIN\Administrators
     DACL - ACEs
     ALLOW-NTAP\Domain Admins-0x1f01ff-OI|CI
     0... .... .... .... .... .... .... .... = Generic Read
     .0.. .... .... .... .... .... .... .... = Generic Write
     ..0. .... .... .... .... .... .... .... = Generic Execute
     ...0 .... .... .... .... .... .... .... = Generic All
     .... ...0 .... .... .... .... .... .... = System Security
     .... .... ...1 .... .... .... .... .... = Synchronize
     .... .... .... 1... .... .... .... .... = Write Owner
     .... .... .... .1.. .... .... .... .... = Write DAC
     .... .... .... ..1. .... .... .... .... = Read Control
     .... .... .... ...1 .... .... .... .... = Delete
     .... .... .... .... .... ...1 .... .... = Write Attributes
     .... .... .... .... .... .... 1... .... = Read Attributes
     .... .... .... .... .... .... .1.. .... = Delete Child
     .... .... .... .... .... .... ..1. .... = Execute
     .... .... .... .... .... .... ...1 .... = Write EA
     .... .... .... .... .... .... .... 1... = Read EA
     .... .... .... .... .... .... .... .1.. = Append
     .... .... .... .... .... .... .... ..1. = Write
     .... .... .... .... .... .... .... ...1 = Read
    
    ALLOW-Everyone-0x100020-OI|CI
     0... .... .... .... .... .... .... .... = Generic Read
     .0.. .... .... .... .... .... .... .... = Generic Write
     ..0. .... .... .... .... .... .... .... = Generic Execute
     ...0 .... .... .... .... .... .... .... = Generic All
     .... ...0 .... .... .... .... .... .... = System Security
     .... .... ...1 .... .... .... .... .... = Synchronize
     .... .... .... 0... .... .... .... .... = Write Owner
     .... .... .... .0.. .... .... .... .... = Write DAC
     .... .... .... ..0. .... .... .... .... = Read Control
     .... .... .... ...0 .... .... .... .... = Delete
     .... .... .... .... .... ...0 .... .... = Write Attributes
     .... .... .... .... .... .... 0... .... = Read Attributes
     .... .... .... .... .... .... .0.. .... = Delete Child
     .... .... .... .... .... .... ..1. .... = Execute
     .... .... .... .... .... .... ...0 .... = Write EA
     .... .... .... .... .... .... .... 0... = Read EA
     .... .... .... .... .... .... .... .0.. = Append
     .... .... .... .... .... .... .... ..0. = Write
     .... .... .... .... .... .... .... ...0 = Read
  • If we have the appropriate permissions to traverse “/” then the NFS client attempts to find the file handle for the mount point via a LOOKUP call, using the file handle of vsroot in the path. It would look something like this:
    V4 Call (Reply In 271) LOOKUP DH: 0x92605bb8/flexvol
  • If the file handle exists, it gets returned to the client:
    fh.png
  • Then the client uses that file handle to run GETATTRs to see if it can access the mount:
    V4 Call (Reply In 275) GETATTR FH: 0x1f57355e

If all is clear, our mount succeeds!

But we’re not done… now the user that wants to access the mount has to go through another ticket process. In my case, I used a user named “student1.” This is because a lot of the Kerberos/NFSv4.x requests I get are generated by universities interested in setting up multiprotocol-ready home directories.

When a user like student1 wants to get into a Kerberized NFS mount, they can’t just cd into it. That would look like this:

# su student1
sh-4.2$ cd /mnt
sh: cd: /mnt: Not a directory

Oh look… another useless error! If I were to take that error literally, I would think “that mount doesn’t even exist!” But, it does:

sh-4.2$ mount | grep mnt
demo:/flexvol on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.193.67.225,local_lock=none,addr=10.193.67.219)

What that error actually means is that the user requesting access does not have a valid Kerberos AS ticket (login) to make the request for a TGS (ticket granting ticket) to get a service ticket for NFS (nfs/server-hostname). We can see that via the klist -e command.

sh-4.2$ klist -e
klist: Credentials cache keyring 'persistent:1301:1301' not found

Before you can get into a mount that is only allowing Kerberos access, you have to get a Kerberos ticket. On Linux, you can do that via the kinit command, which is akin to a Windows login.

sh-4.2$ kinit
Password for student1@NTAP.LOCAL:
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
05/16/2017 15:54:01 05/17/2017 01:54:01 krbtgt/NTAP.LOCAL@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

Now that I have a my ticket, I can cd into the mount. When I cd into a Kerberized NFS mount, the client will make TGS requests to the KDC (seen in the trace in packet 101) for the service ticket. If that process is successful, we get access:

sh-4.2$ cd /mnt
sh-4.2$ pwd
/mnt
sh-4.2$ ls
c0 c1 c2 c3 c4 c5 c6 c7 newfile2 newfile-nfs4
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
05/16/2017 15:55:32 05/17/2017 01:54:01 nfs/demo.ntap.local@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
05/16/2017 15:54:01 05/17/2017 01:54:01 krbtgt/NTAP.LOCAL@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

Now we’re done. (at least until our tickets expire…)

 

vSphere 6.5: The NFS edition

A while back, I wrote up a blog about the release of vSphere 6.0, with an NFS slant to it. Why? Because NFS!

With VMworld 2016 Barcelona, vSphere 6.5 (where’d .1, .2, .3, .4 go?) has been announced, so I can discuss the NFS impact. If you’d like a more general look at the release, check out Cormac Hogan’s blog on it.

Whither VMFS?

Before I get into the new NFS feature/functionality of the release, let’s talk about the changes to VMFS. One of the reasons people use NFS with VMware (other than its awesomeness) is that VMFS had some… limitations.

With the announcement of VMFS-6, some of those limitations have been removed. For example, VMFS-6 includes something called UNMAP, which essentially is a garbage collector for unused whitespace inside the VMFS datastore. This provides better space efficiency than previous iterations.

Additionally, VMware has added some performance enhancements to VMFS, so it may outperform NFS, especially if using fiber channel.

Other than that, you still can’t shrink the datastore, it’s not that easy to expand, etc. So, minor improvements that shouldn’t impact NFS too terribly. People who love NFS will likely stay on NFS. People who love VMFS will be happy with the improvements. Life goes on…

What’s new in vSphere 6.5 from the NFS perspective?

In vSphere 6.0, NFS 4.1 support was added.

However, it was a pretty minimal stack – no pNFS, no delegations, no referrals, etc. They basically added session trunking/multipath, which is cool – but there was still a lot to be desired. On the downside, that feature isn’t even supported in ONTAP yet. So close, yet so far…

In vSphere 6.5, the NFS 4.1 stack has been expanded a bit to include hardware acceleration for NFSv4.1. This is actually a pretty compelling addition, as it can help the overall NFSv4.1 performance of the datastore.

NFSv4.1 also fully supports IPv6. Your level of excitement is solely based on how many people you think use IPv6 right now.

Kerberos

Perhaps the most compelling NFS changs in vSphere 6.5 is how we secure our mounts.

In 6.0, Kerberos support was added, but you could only do DES. Blah.

Now, Kerberos support in vSphere 6.5 includes:

  • AES-128
  • AES-256
  • REMOVAL of DES encryption
  • Kerberos with integrity checking  (krb5i – prevents “man in the middle” attacks)

Now, while it’s pretty cool that they removed support for the insecure DES enctype, that *is* going to be a disruptive change for people using Kerberos. The machine account/principal will need to be destroyed and re-created, clients will need to re-mount, etc. But, it’s an improvement!

How vSphere 6.5 personally impacts me

The downside of these changes means that I have to adjust my Insight presentation a bit. If you’re going to Insight in Berlin, check out 60831-2: How Customers and Partners use NFS for Virtualization.

Still looking forward to pNFS in vSphere, though…

TECH::How pNFS could benefit cloud architecture

Yesterday, I was speaking with a customer who is a cloud provider. They were discussing how to use NFSv4 with clustered Data ONTAP for one of their customers. As we were talking, I brought up pNFS and its capabilities. They were genuinely excited about what pNFS could do for their particular use case.

What is pNFS?

pNFS is “parallel NFS,” which is a little bit of a misnomer, as it doesn’t do parallel reads and writes across different object stores (i.e., striping). Instead, it establishes a metadata path to the NFS server and then splits off the data path to its own dedicated path. The client works with the NFS server to determine which path is local to the physical location of the object store. Think of it asALUA for NFS.

In the case of pNFS on clustered Data ONTAP, NetApp currently supports file-level pNFS, so the object store would be a flexible volume on an aggregate of physical disks. pNFS is covered in detail in NetApp TR-4063.

pnfs

Great! So, what’s cool about pNFS?

There are a number of things that makes pNFS a cool new NFS feature, such as:

  • All the benefits and security of standard NFSv4.1
  • Metadata and data separation
  • Transparent-to-client data migration – data agility!

For more, check out the full post on DataCenterDude.com!

Harnessing the Power of pNFS in the Cloud with NetApp

TECH::How to set up Kerberos on vSphere 6.0 servers for datastores on NFS

For a more in-depth, updated version:

Kerberize your NFSv4.1 Datastores in ESXi 6.5 using NetApp ONTAP

In case you were living under a rock somewhere, VMWare released vSphere 6.0 in March. I covered some of my thoughts from a NFS perspective in vSphere 6.0 – NFS Thoughts.

In that blog, I covered some of the new features, including support for Kerberized NFS on vSphere 6.0. However, in my experience of setting up Kerberos for NFS clients, I learned that doing it can be a colossal pain in the ass. Luckily, vSphere 6.0 actually makes the process pretty easy.

TR-4073: Secure Unified Authentication will eventually contain information on how to do it, but I wanted to get the information out now and strike while the iron is hot!

What is Kerberos?

cerberus

I cover some scenarios regarding securing your NFS environment in “Feeling insecure about NFS?” One of those I mention is Kerberos, but I never really go into detail about what Kerberos actually is.

Kerberos is a ticket-based authentication process that eliminates the need to send passwords over the wire in text format. Instead, passwords are stored on a centralized server (known as a Key Distribution Center, or KDC) that issues tickets to grant tickets for access. This is done through varying levels of encryption, which is controlled via the client, server and keytabs. Right now, the best you can do is AES, which is the NIST standard. Clustered Data ONTAP 8.3 supports both AES-128 and AES-256, by the way. 🙂

However, vSphere 6.0 supports only DES, so…

Again,  TR-4073: Secure Unified Authentication covers this all in more detail than you’d probably want…

Kerberize… like a rockstar!

game-blouses

In one of my Insight sessions, I break down the Kerberos authentication process as a real-world scenario, such as buying a ticket to see your favorite band.

  • A person joins a fan club for first access to concert tickets
    • Ticket Granting Ticket (TGT) issued from Key Distribution Center (KDC)
  • A person buys the concert ticket to see their favorite band
    • TGT used to request Service Ticket (ST) from the KDC
  • They pick the ticket up at the box office
    • ST issued by KDC
  • They use the ticket to get into the concert arena
    • Authentication
  • The ticket specifies which seat they are allowed to sit in
    • Authorization; backstage pass specifies what special permissions they have

Why Kerberos?

One of the questions you may be asking, or have heard asked is, “why the heck do I want to Kerberize my NFS datastore mount? Doesn’t my export policy rule secure it enough?”

Well, how easy is it to change an IP address of an ESXi server? How easy is it to create a user? That’s really all you need to mount NFSv3. However, Kerberos requires a user name and password to get a ticket, interaction with a KDC, ticket exchange, etc.

So, it’s much more secure.

Awesome… how do I do it?

Glad you asked!

After you’ve set up your KDC and preferred NFS server to do Kerberos, you’d need to set the client up. In this case, the client is vSphere 6.0.

Step 1: Configure DNS

Kerberos needs DNS to work properly. This is tied to how service principal names (SPNs) are queried on the KDC. So, you need the following:

  • Forward and reverse lookup records on the DNS server for the ESXi server
  • Proper DNS configuration on the ESXi server

Example:

DNS-conf

Step 2: Configure NTP

Kerberos is very sensitive to time skew. There is a default of 5 minutes allowed between client/server/KDC. If the skew is outside of that, the Kerberos request will fail. This is for your security. 🙂

ntp

Step 3: Join ESXi to the Active Directory Domain

This essentially saves you the effort of messing with manual configuration of creating keytabs, SPNs, etc. Save yourself time and headaches.

Join-domain

Step 4: Specify a user principal name (UPN)

This user will be used by ESXi to kinit and grab a ticket granting ticket (TGT). Again, it’s entirely possible to do this manually and likely possible to leverage keytab authentication. But, again, save yourself the headache.

Credentials

Step 5: Create the NFS datastore for use with NFSv4.1 and Kerberos authentication

You *could* Kerberized NFSv3. But why? All that gets encrypted is the NFS stuff. NLM, NSM, portmap, mount, etc don’t get Kerberized. NFSv4.1 encapsulates all things related to the protocol, so encrypting NFSv4.1 encrypts it all.

New-datastore
Enter the server/datastore information:

Add-datastore-nfsv4.1

Be sure you don’t forget to enable Kerberos:

Kerberos-enable

After you’re done, test it out!

TECH::vSphere 6.0 – NFS thoughts

DISCLAIMER: I work for NetApp. However, I don’t speak for NetApp. These are my own views. 🙂

I’m a tad late to the party here, as there have already been numerous blogs about what’s new in vSphere 6.0, etc. I haven’t seen anything regarding what was missing from a NFS perspective, however. So I’m going to attempt to fill that gap.

What new NFS features were added?

Famously, vSphere 6 brings us NFSv4.1. NFSV4.1 is an enhancement of NFSV4.0, which brought the following features:

  • Pseudo/unified namespace
  • TCP only
  • Better security via domain ID string mapping, single firewall port and Kerberos integration
  • Better locking than NFSv3 via a lease-based model
  • Compound NFS calls (i.e., combining multiple NFS operations into a single packet)
  • Better standardization of the protocol, leveraging IETF
  • More granular ACLs (similar to Windows NTFS ACLs)
  • NFS referrals
  • NFS sessions
  • pNFS

I cover NFSv4.x in some detail in TR-4067 and TR-4073. I cover pNFS in TR-4063.

I wrote a blog post a while back on the Evolution of NAS, which pointed out how NFS and CIFS were going all Voltron on us and basically becoming similar enough to call them nearly identical.

vSphere 6.0 also brings the ability to Kerberize NFS mounts, as well as VVOL support. Fun fact: NetApp is currently the only storage vendor with support for VVOLs over NFS. 

Why do these features matter?

As Stephen Foskett correctly pointed out in his blog, adoption of NFSv4.x has been… slow. A lot of reasons for that, in addition to what he said.

  • Performance. NFSv3 is simply faster in most cases now. Though, that narrative is changing…
  • Disruption. NFSv3 had the illusion of being non-disruptive in failover events. NFSv4 is stateful, thus more susceptible to interruptions, but its locking makes it less susceptible to data loss/corruption in failover events (both network and storage).
  • Infrastructure. It’s a pain in the ass to add name services to an existing enterprise environment to ensure proper ID string mapping.
  • Disdain for change. No one wants to be the “early adopter” in a production environment.

However, more and more applications are recommending NFSv4.x. TIBCO is one. IBM MQueue is another. Additionally, there is a greater focus on security with recent data breaches and hacks, so storage administrators will need to start filling check boxes to be compliant with new security regulations. NFSv4.x features (Kerberos, domain ID, limited firewall ports to open) will likely be on that list. And now, vSphere offers NFSv4.1 with some limited features. What this means for the NFS protocol is that more people will start using it. And as more people start using it, the open-source-ness will start to kick in and the protocol will improve.

As for Kerberos, one of the questions you may be asking, or have heard ask is, “why the heck do I want to Kerberize my NFS datastore mount?” Doesn’t my export policy rule secure it enough?

Well, how easy is it to change an IP address of an ESXi server? How easy is it to create a user? That’s really all you need to mount NFSv3. However, Kerberos requires a user name and password, interaction with a KDC, ticket exchange, etc. So, it’s much more secure.

As for VVOLs, they could be a game changer in the world of software-defined storage.

Check out the following:

Virtual Volumes (VVOLs) On Horizon to Deliver Software Defined Storage for vSphere

The official VMware VVOL blog

vMiss also has a great post on VVOLs on her blog.

Also, NetApp’s ESX TME Peter Learmonth (@titaniumlegs on Twitter) has a video on it:

That’s great and all… but what’s missing?

While it’s awesome that VMware is attempting to keep the NFS stack up to date by adding NFSv4.1 and Kerberos, it just felt a little… incomplete.

For one Kerberos was added, but only with DES support. This is problematic on a few levels. For one, DES is old and laughably weak as far as Kerberos enctypes go. DES was cracked in less than a day… in 2008. If they were going to add Kerberos, why not AES, which is the NIST standard? Were they concerned about performance? AES has been known to be a bit of a hog. If that was a concern, though, why not implement the Intel AES CPU?

As for NFSv4.1… WHERE IS PNFS?? pNFS is an ideal protocol for what virtual machines do – open once, stream reads and writes. Not a ton of metadata. Mobile and agile with storage VMotion and volume moves in clustered Data ONTAP. No need to use up a ton of IP addresses (one per node, per datastore). Most storage operations via NFS would be simplified and virtually transparent with pNFS. Hopefully they add that one soon.

Ultimately, an improvement

I’m glad that VMware added some NFS improvements. It’s a step in the right direction. And they certainly beefed up the capabilities of vSphere 6 with added hardware support. Some of those numbers… monstrous! Hopefully they continue the dedication to NFS in future releases.

Wait, there’s more?!?

That’s right! In addition to the improvements of vSphere 6.0, there is also VMWare Horizon, which integrates with NetApp’s All-Flash FAS solutions. NetApp All-Flash FAS is provides the only all-flash NFS support on the market!

To learn more about it, see this video created by NetApp TME Chris Gebhardt.

You can also see the Shankay Iyer’s blog post here.

Introducing A New Release of VMWare Horizon!

For more info…

What’s New in the VMware vSphere 6.0 Platform

For a snarky rundown on NFSv4.1 and vSphere 6.0, check out Stephen Foskett’s blog.

For some more information on NFS-specific features, see Cormac Hogan’s post.