Kerberize your NFSv4.1 Datastores in ESXi 6.5 using NetApp ONTAP

How do I set up Kerberos with NFSv4.1 datastores in ESXi 6.5?

 


I have gotten this question enough now (and from important folks like Cormac Hogan) to write up an end-to-end guide on how to do this. I had a shorter version in TR-4073 and then a Linux-specific version in TR-4616, but admittedly, it wasn’t really enough to help people get the job done. I also covered ESXi Kerberos in a previous, less in-depth blog post called How to set up Kerberos on vSphere 6.0 servers for datastores on NFS.

Cormac also wrote up one of his own:

https://cormachogan.com/2017/10/12/getting-grips-nfsv4-1-kerberos/

I will point out that, in general, NFS Kerberos is a pain in the ass for people who don’t set it up on a regular basis and understand its inner workings, regardless of the vendor you’re interacting with. The reason is that there are multiple moving parts involved, and support for various portions of Kerberos (such as encryption types) varies. Additionally, some hosts automate things better than others.

We’re going to set it all up as if we have only created a basic SVM with data LIFs and some volumes. If you have an existing SVM configured with NFS and such, you can retrofit as needed. While this blog covers only NFS Kerberos using ONTAP as the NFS server, the steps for AD and ESXi would apply to other NFS servers as well, in most cases.

Here’s the lab setup:

  • ONTAP 9.3 (but this works for ONTAP 9.0 and onward)
    • SVM is SVM1
  • Windows 2012R2 (any Windows KDC that supports AES will work)/Active Directory
    • DNS, KDC
    • Domain is core-tme.netapp.com
  • ESXi 6.5
    • Hostname is CX400S1-GWNSG6B

ONTAP Configuration

While there are a dozen or so steps to do this, keep in mind that you generally only have to configure this once per SVM.

We’ll start with ONTAP, via the GUI. I’ll also include a CLI section. First, we’ll start in the SVM configuration section to minimize clicks. This is found under Storage -> SVMs. Then, click on the SVM you want to configure and then click on the NFS protocol.

SVM-protocol

1. Configure DNS

We need this because Kerberos uses DNS lookups to determine host names/IPs. This will be the Active Directory DNS information. This is found under Services -> DNS/DDNS.

dns

2. Enable NFSv4.1

NFSv4.1 is needed to allow NFSv4.1 mounts (obviously). You don’t need to enable NFSv4.0 to use NFSv4.1, and ESXi doesn’t support v4.0 anyway. But it is possible to use v3, v4.0, and v4.1 in the same SVM. This can be done under “Protocols -> NFS” on the left menu.

v41.png
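If you prefer the CLI for this step, you can enable and verify it in two commands. (The -fields name below is my assumption that it matches the option name; double-check on your ONTAP version.)

cluster::> nfs modify -vserver SVM1 -v4.1 enabled
cluster::> nfs show -vserver SVM1 -fields v4.1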

3. Create an export policy and rule for the NFS Kerberos datastore volume

Export policies in ONTAP are containers for rules. Rules are what define the access to an exported volume. With export policy rules, you can limit the NFS version allowed, the authentication type, root access, hosts, etc. For ESXi, we’re defining the ESXi host (or hosts) in the rule. We’re allowing NFSv4 only and Kerberos only, and we’re allowing the ESXi host to have root access. If you use NFSv3 with Kerberos for these datastores, be sure to adjust the policy and rules accordingly. This is done under the “Policies” menu section on the left.

export-rule.png

4. Verify that vsroot has an export policy and rule that allows read access to ESXi hosts

Vsroot is “/” in the namespace. As a result, for clients to mount NFS exports, they must have at least read access via vsroot’s export policy rule and at least traverse permissions (1 in mode bits) to navigate through the namespace. In most cases, vsroot uses “default” as the export policy. Verify whichever export policy is being used has the proper access.

If a policy doesn’t have a rule, create one. This is an example of minimum permissions needed for the ESXi host to traverse /.

vsroot-policy.png
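If the policy has no rule at all, something like the following gives the host read-only traversal of vsroot. (This is a sketch using my lab’s host name and the “default” policy; adjust the clientmatch and policy name for your environment.)

cluster::> export-policy rule create -vserver SVM1 -policyname default -clientmatch CX400S1-GWNSG6B.core-tme.netapp.com -rorule any -rwrule never -superuser none -protocol nfs -ruleindex 1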

5. Create the Kerberos realm

The Kerberos realm is akin to /etc/krb5.conf on Linux clients. It tells ONTAP where to look to attempt the bind/join to the KDC. After that, KDC servers are discovered by internal processes and won’t need the realm. The realm domain should be defined in ALL CAPS. This is done in System Manager using “Services -> Kerberos Realm” on the left.

realm.png

6. Enable Kerberos on your data LIF (or LIFs)

To use NFS Kerberos, you need to tell ONTAP which data LIFs will participate in the requests. Doing this specifies a service principal for NFS on that data LIF. The SVM will interact with the KDC defined in the Kerberos realm to create a new machine object in AD that can be used for Kerberos. The SPN is defined as nfs/hostname.domain.com and represents the name you want clients to use to access exports. This FQDN needs to exist in DNS as a forward and reverse record to ensure things work properly. If you enable Kerberos on multiple data LIFs with the same SPN, the machine account gets re-used. If you use different SPNs on LIFs, different accounts get created. You have a 15-character limit for the “friendly” display name in AD. If you want to change the name later, you can. That’s covered in TR-4616.

kerb-interface.png
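To confirm the LIF actually got Kerberized (and to see which SPN it’s using), you can check from the CLI:

cluster::> kerberos interface show -vserver SVM1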

7. Create local UNIX group and users

For Kerberos authentication, a krb-unix name mapping takes place, where the incoming SPN will attempt to map to a UNIX user that is either local on the SVM or in external name service servers. You always need the “nfs” user, as the nfs/fqdn SPN will map to “nfs” implicitly. The other user will depend on the user you specify in ESXi when you configure Kerberos. That UNIX user will use the same user name. In my example, I used “parisi,” which is a user in my AD domain. Without these local users, the krb-unix name mapping would fail and manifest as “permission denied” errors when mounting. The cluster would show errors calling out name mapping in “event log show.”

Alternatively, you can create name mapping rules. This is covered in TR-4616. UNIX users and groups can be created using the UNIX menu option under “Host Users and Groups” in the left menu.

The numeric GID and UID can be any unused numeric in your environment. I used 501 and 502.

First, create the primary group to assign to the users.

group.png

Then, create the NFS user with the Kerberos group as the primary group.

nfs-user.png

Finally, create the user you used in the Kerberos config in ESXi.

parisi-user.png

Failure to create the user would result in an error in ONTAP similar to the following. (In the example below, I used a user named ‘vsphere’ to try to authenticate):

[ 713] FAILURE: User 'vsphere' not found in UNIX authorization source LDAP.
 [ 713] Entry for user-name: vsphere not found in the current source: LDAP. Ignoring and trying next available source
 [ 715] Entry for user-name: vsphere not found in the current source: FILES. Entry for user-name: vsphere not found in any of the available sources
 [ 717] Unable to map SPN 'vsphere@CORE-TME.NETAPP.COM'
 [ 717] Unable to map Kerberos NFS user 'vsphere@CORE-TME.NETAPP.COM' to appropriate UNIX user

8. Create the volume to be used as the datastore

This is done from “Storage -> Volumes.” In ONTAP 9.3, the only consideration is that you must specify a “Protection” option, even if it’s “none.” Otherwise, it will throw an error.

vol create

vol-protect.png

Once the volume is created, it automatically gets exported to /volname.

9. Verify the volume security style is UNIX for the datastore volume

The volume security style impacts how a client will attempt to authenticate into the ONTAP cluster. If a volume is NTFS security style, then NFS clients will attempt to map to Windows users to figure out the access allowed on an object. System Manager doesn’t let you define the security style at creation yet and will default to the security style of the vsroot volume (which is / in the namespace). Ideally, vsroot would also be UNIX security style, but in some cases, NTFS is used. For VMware datastores, there is no reason to use NTFS security style.

From the volumes screen, click on the newly created volume and click the “Edit” button to verify UNIX security style is used.

sec-style.png

10. Change the export policy assigned to the volume to the ESX export policy you created

Navigate to “Storage -> Namespace” to modify the export policy used by the datastore.

change-policy.png

11. Configure NTP

This prevents the SVM from drifting outside of the 5-minute time skew that can break Kerberos authentication. This is done via the CLI; there’s no GUI support for this yet.

cluster::> ntp server create -server stme-infra02.core-tme.netapp.com -version auto

12. Set the NFSv4 ID domain

While we’re in the CLI, let’s set the ID domain. This ID domain is used for client-server interaction, where a user string will be passed for NFSv4.x operations. If the user string doesn’t match on each side, the NFS user gets squashed to “nobody” as a security mechanism. This would be the same domain string on both ESX and on the NFS server in ONTAP (case-sensitive). For example, “core-tme.netapp.com” would be the ID domain here and users from ESX would come in as user@core-tme.netapp.com. ONTAP would look for user@core-tme.netapp.com to exist.

In ONTAP, that command is:

cluster::> nfs modify -vserver SVM1 -v4-id-domain core-tme.netapp.com

13. Change datastore volume permissions

By default, volumes get created with the root user and group as the owner, and 755 access. In ESX, if you want to create VMs on a datastore, you’d need either root access or to change write permissions. When you use Kerberos, ESX will use the NFS credentials specified in the configuration as the user that writes to the datastore. Think of this as a “VM service account” more or less. So, your options are:

  • Change the owner to a different user than root
  • Use root as the user (which would need to exist as a principal in the KDC)
  • Change permissions

In my opinion, changing the owner is the best, most secure choice here. To do that:

cluster::> volume modify -vserver SVM1 -volume kerberos_datastore -user parisi
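If you’d rather change the permissions instead (the “write access for others” route I mention later), the same command covers it. A sketch; 0777 is wide open, so consider whether that’s acceptable in your environment:

cluster::> volume modify -vserver SVM1 -volume kerberos_datastore -unix-permissions 0777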

That’s all from ONTAP for the GUI. The CLI commands would be (all in admin priv):

cluster::> dns create -vserver SVM1 -domains core-tme.netapp.com -name-servers 10.193.67.181 -timeout 2 -attempts 1 -skip-config-validation true
cluster::> nfs modify -vserver SVM1 -v4.1 enabled
cluster::> export-policy create -vserver SVM1 -policyname ESX
cluster::> export-policy rule create -vserver SVM1 -policyname ESX -clientmatch CX400S1-GWNSG6B.core-tme.netapp.com -rorule krb5* -rwrule krb5* -allow-suid true -ruleindex 1 -protocol nfs4 -anon 65534 -superuser any
cluster::> vol show -vserver SVM1 -volume vsroot -fields policy
cluster::> export-policy rule show -vserver SVM1 -policy [policy from prev command] -instance
cluster::> export-policy rule modify or create (if changes are needed)
cluster::> kerberos realm create -vserver SVM1 -realm CORE-TME.NETAPP.COM -kdc-vendor Microsoft -kdc-ip 10.193.67.181 -kdc-port 88 -clock-skew 5 -adminserver-ip 10.193.67.181 -adminserver-port 749 -passwordserver-ip 10.193.67.181 -passwordserver-port 464 -adserver-name stme-infra02.core-tme.netapp.com -adserver-ip 10.193.67.181
cluster::> kerberos interface enable -vserver SVM1 -lif data -spn nfs/ontap9.core-tme.netapp.com
cluster::> unix-group create -vserver SVM1 -name kerberos -id 501
cluster::> unix-user create -vserver SVM1 -user nfs -id 501 -primary-gid 501
cluster::> unix-user create -vserver SVM1 -user parisi -id 502 -primary-gid 501
cluster::> volume create -vserver SVM1 -volume kerberos_datastore -aggregate aggr1_node1 -size 500GB -state online -policy kerberos -user 0 -group 0 -security-style unix -unix-permissions ---rwxr-xr-x -junction-path /kerberos_datastore 

ESXi Configuration

This is all driven through the vSphere GUI. This would need to be performed on each host that is being used for NFSv4.1 Kerberos.

1. Configure DNS

This is done under the “Hosts and Clusters -> Manage -> Networking -> TCP/IP config.”

dns-esx.png
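If you prefer the host CLI, the equivalent esxcli commands look like this. (The DNS server IP is my lab’s DC; substitute your own.)

[root@CX400S1-GWNSG6B:~] esxcli network ip dns server add -s 10.193.67.181
[root@CX400S1-GWNSG6B:~] esxcli network ip dns search add -d core-tme.netapp.com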

2. Configure NTP

This is found in “Hosts and Clusters -> Settings -> Time Configuration”

ntp
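This can also be scripted with PowerCLI if you’re configuring several hosts. A sketch, using my lab’s host and NTP server names:

PS C:\> Add-VMHostNtpServer -VMHost CX400S1-GWNSG6B -NtpServer stme-infra02.core-tme.netapp.com
PS C:\> Get-VMHostService -VMHost CX400S1-GWNSG6B | Where-Object {$_.Key -eq "ntpd"} | Start-VMHostService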

3. Join the ESXi host to the Active Directory domain

Doing this automatically creates the machine account in AD and will transfer the keytab files between the host and KDC. This also sets the SPNs on the machine account. The user specified in the credentials must have create object permissions in the Computers OU in AD (for example, a domain administrator).

This is found in “Hosts and Clusters -> Settings -> Authentication Services.”

join-domain.png
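The domain join can also be done with PowerCLI, which is handy for multiple hosts. A hedged sketch with my lab names; the account just needs rights to create computer objects:

PS C:\> Get-VMHostAuthentication -VMHost CX400S1-GWNSG6B | Set-VMHostAuthentication -JoinDomain -Domain core-tme.netapp.com -Username administrator -Password '********'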

4. Specify NFS Kerberos Credentials

This is the user that will authenticate with the KDC and ONTAP for the Kerberos key exchange. This user name will be the same as the UNIX user you used in ONTAP. If you use a different name, create a new UNIX user in ONTAP or create a name mapping rule. If the user password changes in AD, you must also change it in ESXi.

nfs-creds

With NFS Kerberos in ESX, the ID you specified in NFS Kerberos credentials will be the ID used to write. For example, I used “parisi” as the user. My SVM is using LDAP authentication with AD. That user exists in my LDAP environment as the following:

cluster::*> getxxbyyy getpwbyuid -node ontap9-tme-8040-01 -vserver SVM1 -userID 3629
  (vserver services name-service getxxbyyy getpwbyuid)
pw_name: parisi
pw_passwd: 
pw_uid: 3629
pw_gid: 512
pw_gecos: 
pw_dir: 
pw_shell: /bin/sh

As a result, the test VM I created got written as that user:

drwxr-xr-x   2 3629  512    4096 Oct 12 10:40 test

To even be able to write at all, I had to change the UNIX permissions on the datastore to allow write access to “others.” Alternatively, I could have changed the owner of the volume to the specified user. I mention those steps in the ONTAP section.

If you plan on changing the user for NFS creds, be sure to use “clear credentials,” which will restart the service and clear caches. Occasionally, you may need to restart the nfsgssd service from the CLI if something is stubbornly cached:

[root@CX400S1-03003-B3:/] /etc/init.d/nfsgssd restart
watchdog-nfsgssd: Terminating watchdog process with PID 33613
Waiting for process to terminate...
nfsgssd stopped
nfsgssd started

In rare cases, you may have to leave and re-join the domain, which will generate new keytabs. In one particularly stubborn case, I had to reboot the ESX server after I changed some credentials and the Kerberos principal name in ONTAP.

That’s the extent of the ESXi host configuration for now. We’ll come back to the host to mount the datastore once we make some changes in Active Directory.

Active Directory Configuration

Because there are variations in support for encryption types, as well as DNS records needed, there are some AD tasks that need to be performed to get Kerberos to work.

1. Configure the machine accounts

Set the machine account attributes for the ESXi host(s) and ONTAP NFS server to only allow AES encryption. Doing this avoids failures to mount via Kerberos that manifest as “permission denied” on the host. In a packet trace, you’d potentially be able to see the ESXi host trying to exchange keys with the KDC and getting “unsupported enctype” errors if this step is skipped.

The exact attribute to change is msDS-SupportedEncryptionTypes. Set that value to 24, which is AES only. For more info on encryption types in Windows, see this blog.

You can change this attribute using the “Advanced Features” view with the attribute editor. If that view isn’t available or can’t be used, you can also modify the attribute using PowerShell.

To modify in the GUI:

msds-enctype.png

To modify using PowerShell:

PS C:\> Set-ADComputer -Identity [NFSservername] -Replace @{'msDS-SupportedEncryptionTypes'=24}
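To verify the attribute took (this works for both the ESXi host and the ONTAP NFS server machine accounts):

PS C:\> Get-ADComputer -Identity [NFSservername] -Properties msDS-SupportedEncryptionTypes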

2. Create DNS records for the ESXi hosts and the ONTAP server

This would be A/AAAA records for forward lookup and PTR for reverse. Windows DNS lets you do both at the same time. Verify the DNS records with “nslookup” commands.

This can also be done via GUI or PowerShell.

From the GUI:

From PowerShell:

PS C:\Users\admin>Add-DnsServerResourceRecordA -Name ontap9 -ZoneName core-tme.netapp.com -IPv4Address 10.193.67.220 -CreatePtr

PS C:\Users\admin>Add-DnsServerResourceRecordA -Name cx400s1-gwnsg6b -ZoneName core-tme.netapp.com -IPv4Address 10.193.67.35 -CreatePtr
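Then the nslookup verification, against both the name and the IP (my lab’s values shown):

PS C:\Users\admin>nslookup ontap9.core-tme.netapp.com
PS C:\Users\admin>nslookup 10.193.67.220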

Mounting the NFS Datastore via Kerberos

Now, we’re ready to create the datastore in ESX using NFSv4.1 and Kerberos.

Simply go to “Add Datastore” and follow the prompts to select the necessary options.

1. Select “NFS” and then “NFS 4.1.”

VMware doesn’t recommend mixing v3 and v4.1 on datastores. If you have an existing datastore that you were mounting via v3, VMware recommends migrating the VMs using Storage vMotion.

new-ds1new-ds2

2. Specify the name and configuration

The datastore name can be anything you like. The “folder” has to be the junction-path/export path on the ONTAP cluster. In our example, we use /kerberos_datastore.

Server(s) would be the data LIF you enabled Kerberos on. ONTAP doesn’t support NFSv4.1 multi-pathing/trunking yet, so specifying multiple NFS servers won’t necessarily help here.

new-ds3.png

3. Check “Enable Kerberos-based authentication”

Kind of a no-brainer here, but still worth mentioning.

new-ds4.png

4. Select the hosts that need access.

If other hosts have not been configured for Kerberos, they won’t be available to select.

new-ds5.png

5. Review the configuration details and click “Finish.”

This should mount quickly and without issue. If you have an issue, review the “Troubleshooting” tips below.

new-ds6.png

This can also be done with a command from ESX CLI:

esxcli storage nfs41 add -H ontap9 -a SEC_KRB5 -s /kerberos_datastore -v kerberosDS
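And to verify the mount from the host afterward:

[root@CX400S1-GWNSG6B:~] esxcli storage nfs41 list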

Troubleshooting Tips

If you follow the steps above, this should all work fine.

mounted-ds.png

But sometimes I make mistakes. Sometimes YOU make mistakes. It happens. 🙂

Some steps I use to troubleshoot Kerberos mount issues (a few example commands follow the list)…

  • Review the vmkernel logs on the ESXi host
  • Review “event log show” from the cluster CLI
  • Ensure the ESX host name and SVM host name exist in DNS (nslookup)
  • Use packet traces from the DC to see what is failing during Kerberos authentication (filter on “kerberos” in wireshark)
  • Review the SVM config:
    • Ensure NFSv4.1 is enabled
    • Ensure the SVM has DNS configured
    • Ensure the Kerberos realm is all caps and is created on the SVM
    • Ensure the desired data LIF has Kerberos enabled (from System Manager or via “kerberos interface show” from the CLI)
    • Ensure the export policies and rules allow access to the ESX datastore volume for Kerberos, superuser and NFSv4. Ensure the vsroot volume allows at least read access for the ESX host.
    • Ensure the SVM has the appropriate UNIX users and group created (nfs user for the NFS SPN; UNIX user name that matches the NFS user principal defined in ESX) or the users exist in external name services
  • From the KDC/AD domain controller:
    • Ensure the machine accounts created use AES only to avoid any weird issues with encryption type support
    • Ensure the SPNs aren’t duplicated (setspn /q {service/fqdn})
    • Ensure the user defined in the NFS Kerberos config hasn’t had a password expire
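For the cluster-side and KDC-side checks above, these are the commands I reach for. (The setspn example assumes my lab’s SPN; substitute your own.)

cluster::> event log show
cluster::> kerberos interface show -vserver SVM1
C:\> setspn /q nfs/ontap9.core-tme.netapp.com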

 

Adventures in Upgrading ESXi

Here at NetApp, we have a variety of labs available to us to tinker with. I work with a few other TMEs in managing a few clustered Data ONTAP clusters, as well as an ESXi server farm. We have 6 ESXi servers that we just moved into a new lab location and are finally ready to be powered back up after a 4-5 month hiatus.

So, I figured, since the lab’s been down for so long anyway, why not upgrade the ESXi servers from 5.1 to 6.0 update 2 while we’re at it?

What could possibly go wrong on my first actual ESXi upgrade on servers that have been migrated from different IP addresses, some of which may still be lingering on the system and are unreachable?

Well, I’ll tell you.

First attempt at upgrading a server, all sorts of things were broken.

  • vCenter couldn’t connect
  • The web client couldn’t connect – error was “503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http16LocalServiceSpecE:0x1f06ff18] _serverNamespace = / _isRedirect = false _port = 8309)”
  • esxcli and vim-cmd commands failed with:
[root@esxi1:~] esxcli
Connect to localhost failed: Connection failure.

After spending a few hours poking around to try to fix the issue, I decided it was probably user error. I had used “install” instead of “update,” so that probably nuked the server when I rebooted, right?

So I tried again on a new server. This time, I read the manual and did the update the way that was supposedly correct. I even got an error found in the release notes and used VMware’s workaround:

~ # esxcli system maintenanceMode set --enable true
~ # esxcli system maintenanceMode get
Enabled
~ # esxcli software vib update -d /vmfs/volumes/vm_storage/ESX6/update-from-esxi6.0-6.0_update02.zip
 [DependencyError]
 VIB VMware_bootbank_esx-base_6.0.0-2.34.3620759 requires vsan >= 6.0.0-2.34, but the requirement cannot be satisfied within the ImageProfile.
 VIB VMware_bootbank_esx-base_6.0.0-2.34.3620759 requires vsan << 6.0.0-2.35, but the requirement cannot be satisfied within the ImageProfile.
 VIB VMware_bootbank_ehci-ehci-hcd_1.0-3vmw.600.2.34.3620759 requires xhci-xhci >= 1.0-3vmw.600.2.34, but the requirement cannot be satisfied within the ImageProfile.
 Please refer to the log file for more details.
~ # esxcli software profile update -d /vmfs/volumes/vm_storage/ESX6/update-from-esxi6.0-6.0_update02.zip -p ESXi-6.0.0-20160302001-standard
Update Result
 Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
 Reboot Required: true

After I rebooted:

[root@esxi1:~] esxcli
Connect to localhost failed: Connection failure.

Son of a…

I started Googling like a madman.


Found the ever-helpful William Lam’s blog on the web client issue. His recommendation was running a vim-cmd command. However…

[root@esxi2:~] vim-cmd hostsvc/advopt/update Config.HostAgent.plugins.solo.enableMob bool true
Failed to login: Invalid response code: 503 Service Unavailable

In the vpxa.log file, a ton of these:

verbose vpxa[FF8E8AC0] [Originator@6876 sub=vpxXml] [VpxXml] Error fetching /sdk/vimService?wsdl: 503 (Service Unavailable)
warning vpxa[FFCC0B70] [Originator@6876 sub=Default] Closing Response processing in unexpected state: 3
warning vpxa[FFCC0B70] [Originator@6876 sub=hostdcnx] [VpxaHalCnxHostagent] Could not resolve version for authenticating to host agent

 

The log suggested there was a connection failure on port 443, but telnet to that port worked fine. It took me a little bit of tinkering, but I finally figured out where that port number is controlled – /etc/vmware/vpxa/vpxa.cfg.

In that config file, I also noticed that my IP address was wrong – it was using the old IP addresses the hosts had. I changed the IP address and the port used to port 80. Once I did that, my error changed a bit. This time, it was an SSL error:

Error in sending request - SSL Exception

I spent a bit more time poking around and finally decided – time to blow it up. Way easier to re-install a lab box than to try to dig through all the configuration files.

If you find yourself in a similar bind, don’t waste your time – unless it’s production. Then open a case.

I think my issue ended up being a combination of:

  • Stale IP addresses
  • Stale iSCSI HBA settings
  • Stale configs
  • Upgrading to ESXi 6 without addressing the above first

If anyone has any suggestions for fixing this issue, by all means, post in the comments. 🙂

UPDATE:

Both ESXi boxes have been wiped and reinstalled with ESXi 6.0. All is working fine. Funny story, though… after one re-image, I connected via SSH and thought it broke again. Turns out I had a duplicate IP and was still connecting to the old server. Ooops.

TECH::How to set up Kerberos on vSphere 6.0 servers for datastores on NFS

For a more in-depth, updated version:

Kerberize your NFSv4.1 Datastores in ESXi 6.5 using NetApp ONTAP

In case you were living under a rock somewhere, VMware released vSphere 6.0 in March. I covered some of my thoughts from an NFS perspective in vSphere 6.0 – NFS Thoughts.

In that blog, I covered some of the new features, including support for Kerberized NFS on vSphere 6.0. However, in my experience of setting up Kerberos for NFS clients, I learned that doing it can be a colossal pain in the ass. Luckily, vSphere 6.0 actually makes the process pretty easy.

TR-4073: Secure Unified Authentication will eventually contain information on how to do it, but I wanted to get the information out now and strike while the iron is hot!

What is Kerberos?


I cover some scenarios regarding securing your NFS environment in “Feeling insecure about NFS?” One of those I mention is Kerberos, but I never really go into detail about what Kerberos actually is.

Kerberos is a ticket-based authentication process that eliminates the need to send passwords over the wire in text format. Instead, passwords are stored on a centralized server (known as a Key Distribution Center, or KDC) that issues ticket-granting tickets, which are in turn used to request service tickets for access. This is done through varying levels of encryption, which is controlled via the client, server and keytabs. Right now, the best you can do is AES, which is the NIST standard. Clustered Data ONTAP 8.3 supports both AES-128 and AES-256, by the way. 🙂

However, vSphere 6.0 supports only DES, so…

Again,  TR-4073: Secure Unified Authentication covers this all in more detail than you’d probably want…

Kerberize… like a rockstar!


In one of my Insight sessions, I break down the Kerberos authentication process as a real-world scenario, such as buying a ticket to see your favorite band.

  • A person joins a fan club for first access to concert tickets
    • Ticket Granting Ticket (TGT) issued from Key Distribution Center (KDC)
  • A person buys the concert ticket to see their favorite band
    • TGT used to request Service Ticket (ST) from the KDC
  • They pick the ticket up at the box office
    • ST issued by KDC
  • They use the ticket to get into the concert arena
    • Authentication
  • The ticket specifies which seat they are allowed to sit in
    • Authorization; backstage pass specifies what special permissions they have

Why Kerberos?

One of the questions you may be asking, or have heard asked is, “why the heck do I want to Kerberize my NFS datastore mount? Doesn’t my export policy rule secure it enough?”

Well, how easy is it to change an IP address of an ESXi server? How easy is it to create a user? That’s really all you need to mount NFSv3. However, Kerberos requires a user name and password to get a ticket, interaction with a KDC, ticket exchange, etc.

So, it’s much more secure.

Awesome… how do I do it?

Glad you asked!

After you’ve set up your KDC and preferred NFS server to do Kerberos, you’d need to set the client up. In this case, the client is vSphere 6.0.

Step 1: Configure DNS

Kerberos needs DNS to work properly. This is tied to how service principal names (SPNs) are queried on the KDC. So, you need the following:

  • Forward and reverse lookup records on the DNS server for the ESXi server
  • Proper DNS configuration on the ESXi server

Example:

DNS-conf

Step 2: Configure NTP

Kerberos is very sensitive to time skew. There is a default of 5 minutes allowed between client/server/KDC. If the skew is outside of that, the Kerberos request will fail. This is for your security. 🙂

ntp

Step 3: Join ESXi to the Active Directory Domain

This essentially saves you the effort of messing with manual configuration of creating keytabs, SPNs, etc. Save yourself time and headaches.

Join-domain

Step 4: Specify a user principal name (UPN)

This user will be used by ESXi to kinit and grab a ticket granting ticket (TGT). Again, it’s entirely possible to do this manually and likely possible to leverage keytab authentication. But, again, save yourself the headache.

Credentials

Step 5: Create the NFS datastore for use with NFSv4.1 and Kerberos authentication

You *could* Kerberize NFSv3. But why? All that gets encrypted is the NFS traffic itself. NLM, NSM, portmap, mount, etc., don’t get Kerberized. NFSv4.1 encapsulates all things related to the protocol, so encrypting NFSv4.1 encrypts it all.

New-datastore
Enter the server/datastore information:

Add-datastore-nfsv4.1

Be sure you don’t forget to enable Kerberos:

Kerberos-enable

After you’re done, test it out!

TECH::Uh, I didn’t put that VM there… #vExpert

Ever find yourself browsing vSphere and seeing a VM show up on a datastore you *know* you didn’t put it on? I’ve run into this issue a few times and never have seen a KB or blog post on it. Closest I’ve seen is this one:

In VMware vCenter Server 5.x a virtual machine with a snapshot displays datastores or port groups that are no longer in use

If you’ve ever run into this issue, you know how irritating and maddening it can be. Forehead-smashing even. One of my co-workers/co-lab admins ran into this a couple weeks ago, so I decided to blog it up.

For example, in my vSphere, I have a datastore mounted (via NFS on clustered Data ONTAP, of course!) that contains ISO images for my VMs to use for installs, upgrades, etc. But I don’t ever create VMs on it. In fact, I’ve got roles set to disallow creating VMs. However, when I browse to it…

vsphere-datastore

And when I look at the datastore where that VM is *supposed* to exist…

vsphere-datastore2

So what gives? Why is my VM, which I am certain only exists once, showing up in multiple places, including a datastore where I can’t/don’t create VMs?

The answer? @#$^! snapshots.

The datastore that is showing an extra VM is my ISO datastore. As I mentioned, it hosts my ISO images that I mount to VMs. In this case, my CentOS 6.5 VM has an ISO mounted.

snapshot-iso

When I took a snapshot of the VM in vSphere, that meant it took a snapshot of the VM with an ISO mounted. To do that, it had to include information about the ISO in the snapshot, which means it “added” the VM to the ISO datastore.

snapshot-manager

So how do I fix it?

Simple – delete the snapshot. If you want a snapshot of the VM that doesn’t do this, unmount the ISO before taking a snapshot. And, as a best practice, unmount your ISOs when you’re done with them. (Put your toys away where you found them!)

After I delete the snapshot, the VM still shows up in the ISO datastore, because I have not unmounted the ISO yet. Once I unmount the ISO, the VM disappears from the datastore…

vm-removed

TECH::vSphere 6.0 – NFS thoughts

DISCLAIMER: I work for NetApp. However, I don’t speak for NetApp. These are my own views. 🙂

I’m a tad late to the party here, as there have already been numerous blogs about what’s new in vSphere 6.0, etc. I haven’t seen anything regarding what was missing from an NFS perspective, however. So I’m going to attempt to fill that gap.

What new NFS features were added?

Famously, vSphere 6 brings us NFSv4.1. NFSv4.1 is an enhancement of NFSv4.0, and the NFSv4.x protocols bring the following features:

  • Pseudo/unified namespace
  • TCP only
  • Better security via domain ID string mapping, single firewall port and Kerberos integration
  • Better locking than NFSv3 via a lease-based model
  • Compound NFS calls (i.e., combining multiple NFS operations into a single packet)
  • Better standardization of the protocol, leveraging IETF
  • More granular ACLs (similar to Windows NTFS ACLs)
  • NFS referrals
  • NFS sessions
  • pNFS

I cover NFSv4.x in some detail in TR-4067 and TR-4073. I cover pNFS in TR-4063.

I wrote a blog post a while back on the Evolution of NAS, which pointed out how NFS and CIFS were going all Voltron on us and basically becoming similar enough to call them nearly identical.

vSphere 6.0 also brings the ability to Kerberize NFS mounts, as well as VVOL support. Fun fact: NetApp is currently the only storage vendor with support for VVOLs over NFS. 

Why do these features matter?

As Stephen Foskett correctly pointed out in his blog, adoption of NFSv4.x has been… slow. A lot of reasons for that, in addition to what he said.

  • Performance. NFSv3 is simply faster in most cases now. Though, that narrative is changing…
  • Disruption. NFSv3 had the illusion of being non-disruptive in failover events. NFSv4 is stateful, thus more susceptible to interruptions, but its locking makes it less susceptible to data loss/corruption in failover events (both network and storage).
  • Infrastructure. It’s a pain in the ass to add name services to an existing enterprise environment to ensure proper ID string mapping.
  • Disdain for change. No one wants to be the “early adopter” in a production environment.

However, more and more applications are recommending NFSv4.x. TIBCO is one. IBM MQ is another. Additionally, there is a greater focus on security with recent data breaches and hacks, so storage administrators will need to start filling check boxes to be compliant with new security regulations. NFSv4.x features (Kerberos, domain ID, limited firewall ports to open) will likely be on that list. And now, vSphere offers NFSv4.1 with some limited features. What this means for the NFS protocol is that more people will start using it. And as more people start using it, the open-source-ness will start to kick in and the protocol will improve.

As for Kerberos, one of the questions you may be asking, or have heard asked, is: “why the heck do I want to Kerberize my NFS datastore mount? Doesn’t my export policy rule secure it enough?”

Well, how easy is it to change an IP address of an ESXi server? How easy is it to create a user? That’s really all you need to mount NFSv3. However, Kerberos requires a user name and password, interaction with a KDC, ticket exchange, etc. So, it’s much more secure.

As for VVOLs, they could be a game changer in the world of software-defined storage.

Check out the following:

Virtual Volumes (VVOLs) On Horizon to Deliver Software Defined Storage for vSphere

The official VMware VVOL blog

vMiss also has a great post on VVOLs on her blog.

Also, NetApp’s ESX TME Peter Learmonth (@titaniumlegs on Twitter) has a video on it:

That’s great and all… but what’s missing?

While it’s awesome that VMware is attempting to keep the NFS stack up to date by adding NFSv4.1 and Kerberos, it just felt a little… incomplete.

For one, Kerberos was added, but only with DES support. This is problematic on a few levels. DES is old and laughably weak as far as Kerberos enctypes go. DES was cracked in less than a day… in 2008. If they were going to add Kerberos, why not AES, which is the NIST standard? Were they concerned about performance? AES has been known to be a bit of a hog. If that was a concern, though, why not leverage the Intel AES-NI CPU instructions?

As for NFSv4.1… WHERE IS PNFS?? pNFS is an ideal protocol for what virtual machines do – open once, stream reads and writes. Not a ton of metadata. Mobile and agile with Storage vMotion and volume moves in clustered Data ONTAP. No need to use up a ton of IP addresses (one per node, per datastore). Most storage operations via NFS would be simplified and virtually transparent with pNFS. Hopefully they add that one soon.

Ultimately, an improvement

I’m glad that VMware added some NFS improvements. It’s a step in the right direction. And they certainly beefed up the capabilities of vSphere 6 with added hardware support. Some of those numbers… monstrous! Hopefully they continue the dedication to NFS in future releases.

Wait, there’s more?!?

That’s right! In addition to the improvements of vSphere 6.0, there is also VMware Horizon, which integrates with NetApp’s All-Flash FAS solutions. NetApp All-Flash FAS provides the only all-flash NFS support on the market!

To learn more about it, see this video created by NetApp TME Chris Gebhardt.

You can also see Shankay Iyer’s blog post here.

Introducing A New Release of VMWare Horizon!

For more info…

What’s New in the VMware vSphere 6.0 Platform

For a snarky rundown on NFSv4.1 and vSphere 6.0, check out Stephen Foskett’s blog.

For some more information on NFS-specific features, see Cormac Hogan’s post.