Backing up/restoring ONTAP SMB shares with PowerShell

A while back, I posted an SMB share backup and restore PowerShell script written by one of our SMB developers. Later, Scott Harney added some scripts for NFS exports. You can find those here:

https://github.com/DatacenterDudes/cDOT-CIFS-share-backup-restore

That was back in the ONTAP 8.3.x timeframe. The scripts have worked pretty well for the most part, but we’re up to ONTAP 9.3 now, and I’ve occasionally gotten feedback that they throw errors.

While the idea of an open script repository is for other people to send updates and keep it a living, breathing, evolving entity, that’s not how this script has ended up. Instead, it’s gotten old and crusty and in need of an update. The inspiration to finally do something about it was a recent Reddit thread.

So, I’ve done that. You can find the updated versions of the script for ONTAP 9.x at the same place as before:

https://github.com/DatacenterDudes/cDOT-CIFS-share-backup-restore

However, other than for testing purposes, it turns out not much needed to change. I ran the original restore script without changing anything of note (just some comments) and it worked fine. The errors most people see come from the version of the NetApp PowerShell Toolkit, the version of PowerShell, or a syntax error introduced in the copy/paste. Make sure they’re all up to date, or you’ll run into errors. I used:

  • Windows 2012R2
  • ONTAP 9.4 (yes, I have access to early releases!)
  • PowerShell 4.0.1.1
  • Latest NetApp PowerShell toolkit (4.5.1 for me)

When should I use these scripts?

These were created to fill the gap that SVM-DR now fills. Before SVM-DR existed, there was no way to back up and restore CIFS configurations. Even with SVM-DR, these scripts offer some nice granular functionality to back up and restore specific configuration areas, and they can be modified to cover other things like CIFS options, SAN configuration, etc.

As for how to run them…

Backing up your shares

1) Download and install the latest PowerShell toolkit from https://mysupport.netapp.com/tools/info/ECMLP2310788I.html?productID=61926

2) Import the DataONTAP module with “Import-Module DataONTAP”

(be sure that the PowerShell window is closed and re-opened after you install the toolkit; otherwise, Windows won’t find the new module to import)
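
For reference, here’s a minimal sketch of loading the module and verifying connectivity to the cluster management LIF (the IP address is a placeholder; Connect-NcController prompts for login info via Get-Credential):

PS C:\> Import-Module DataONTAP
PS C:\> Connect-NcController 10.53.33.59 -Credential (Get-Credential)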

3) Back up the desired shares as per the usage comments in the script. (see below)

# Usage:
# Run as: .\backupSharesAcls.ps1 -server <mgmt_ip> -user <mgmt_user> -password <mgmt_user_password> -vserver <vserver name> -share <share name or * for all> -shareFile <xml file to store shares> -aclFile <xml file to store acls> -spit <none,less,more depending on info to print>
#
# Example
# 1. If you want to save only a single share on vserver vs2.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share test2 -shareFile C:\share.xml -aclFile C:\acl.xml -spit more 
#
# 2. If you want to save all the shares on vserver vs2.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share * -shareFile C:\share.xml -aclFile C:\acl.xml -spit less
#
# 3. If you want to save only shares that start with "test" and share1 on vserver vs2.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share "test* | share1" -shareFile C:\share.xml -aclFile C:\acl.xml -spit more
#
# 4. If you want to save shares and ACLs into .csv format for examination.
# Run as: .\backupSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -share * -shareFile C:\shares.csv -aclFile C:\acl.csv -csv true -spit more

If you use “-spit more” you’ll get verbose output showing each share and ACL as it’s processed.

4) Review the shares/ACLs via the XML files.

That’s it for backup. Pretty straightforward. However, our backups are only as good as our restores…

Restoring the shares using the script

I don’t recommend testing this script the first time on a production system. I’d suggest creating a test SVM, or even leveraging SVM-DR to replicate the SVM to a target location.

In my lab, however… who cares! Let’s blow it all away!

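For reference, wiping a share from the ONTAP CLI looks something like this (the vserver and share names follow the examples above):

cluster::> vserver cifs share delete -vserver vs2 -share-name test2
cluster::> vserver cifs share show -vserver vs2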

Now, run your restore.

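The restore script reads the XML files the backup created. As a hypothetical invocation mirroring the backup script’s parameters (the script name and switches here are illustrative; check the usage comments in the actual restore script in the repo):

.\restoreSharesAcls.ps1 -server 10.53.33.59 -user admin -password netapp1! -vserver vs2 -shareFile C:\share.xml -aclFile C:\acl.xml -spit more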

That’s it! Happy backing up/restoring!

Tips for running the script

  • Before running the script, copy and paste it into the PowerShell ISE to verify that the syntax is correct, then save the script to the local client. Syntax errors are a common cause of script failures.
  • Use the latest available NetApp PowerShell Toolkit and ensure the PowerShell version on your client matches what is in the release notes for the toolkit.
  • Test the script on a dummy SVM before running in production.
  • Ensure the DataONTAP module has been imported; if import fails after installing the toolkit, close the PowerShell window and re-open it.

Questions?

If you have any questions or comments, leave them here. Also, if you customize these at all, please do share with the community! Add them to the Github repository or create your own repo!

NFS Kerberos in ONTAP Primer

Fun fact!

Kerberos was named after Cerberus, the hound of Hades, which protected the gates of the underworld with its three heads of gnashing teeth.

Kerberos in IT security isn’t a whole lot different; it’s pretty effective at stopping intruders, and it’s a three-headed monster in its own right (client, server, and KDC).

In my day-to-day role as a Technical Marketing Engineer for NFS, one of the most challenging topics I get asked about is NFS mounts using Kerberos. This is especially true now, as IT organizations focus more and more on securing their data, and Kerberos is one way to do that. CIFS/SMB already does a nice job of this, and it’s pretty easily integrated without having to do a ton of work on the client or storage side.

With NFS Kerberos, however, there are a ton of moving parts and not a ton of expertise that spans those moving parts. Think for a moment what all is involved here when dealing with ONTAP:

  • DNS
  • KDC server (Key Distribution Center)
  • Client/principal
  • NFS server/principal
  • ONTAP
  • NFS
  • LDAP/name services

This blog post isn’t designed to walk you through all those moving parts; that’s what TR-4073 was written for. Instead, this blog is going to simply walk through the workflow of what happens during an NFS mount using Kerberos and where things can fail/common failure scenarios. This post will focus on Active Directory KDCs, since that’s what I see most and get the most questions on. Other UNIX-based KDCs are either not as widely used, or the admins running them are ninjas that never need any help. 🙂

Common terms

First, let’s cover a few common terms used in NFS Kerberos.

Storage Virtual Machine (SVM)

This is what clustered ONTAP uses to present NAS and SAN storage to clients. SVMs act as tenants within a cluster. Think of them as “virtualized storage blades.”

Key Distribution Center (KDC)

The Kerberos ticket headquarters. This stores all the passwords, objects, etc. for running Kerberos in an environment. In Active Directory, domain controllers are KDCs and replicate to other DCs in the environment, which makes Active Directory an ideal platform to run Kerberos on due to ease of use and familiarity. As a bonus, Active Directory is already primed with UNIX attributes for Identity Management with LDAP. (Note: Windows 2012 has UNIX attributes by default; prior to 2012, you had to manually extend the schema.)

Kerberos principals

Kerberos principals are objects within a KDC that can have tickets assigned. Users can own principals; machine accounts can own principals. However, simply creating a user or machine account doesn’t mean you have created a principal. In Active Directory, principals are stored in the object’s LDAP schema attributes. Generally speaking, it’s one of the following:

  • servicePrincipalName (SPN)
  • userPrincipalName (UPN)

These get set when adding computers to a domain (including joining Linux clients), as well as when creating new users (every user gets a UPN). Principals include up to three components.

  1. Primary – this defines the type of principal (usually a service, such as ldap, nfs, or host) and is followed by a “/”. Not all principals have a primary component; for example, most users are simply user@REALM.COM.
  2. Secondary – this defines the name of the principal (such as jimbob)
  3. Realm – this is the Kerberos realm, usually written in ALL CAPS, and is the name of the domain your principal was added to (such as CONTOSO.COM)

Keytabs

A keytab file allows a client or server participating in an NFS mount to generate AS (authentication service) ticket requests without an interactive login. Think of this as the principal “logging in” to the KDC, similar to what you’d do with a username and password. Keytab files can make their way to clients one of two ways.

  1. Manually creating and copying the keytab file to the client (old school)
  2. Using the domain join tool of your choice (realmd, net ads/samba, adcli, etc.) on the client to automatically negotiate the keytab and machine principals on the KDC (recommended)
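
For example, on a CentOS/RHEL box, a realmd join is a one-liner (the domain and admin account are placeholders):

# realm join --user=Administrator ntap.local

realmd creates the machine account and SPNs in AD and writes /etc/krb5.keytab on the client.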

Keytabs created using the domain join tools will contain multiple Kerberos principal entries. Generally, this includes service principal names (SPNs) for host/shortname@REALM.COM and host/fully.qualified.name@REALM.COM, plus a UPN for the machine account, such as MACHINE$@REALM.COM. The auto-generated keytab will also include one entry per principal for each supported encryption type (enctype). The following is an example from a CentOS 7 box’s keytab, joined to an AD domain using realm join:

# klist -kte
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp Principal
---- ------------------- ------------------------------------------------------
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/centos7.ntap.local@NTAP.LOCAL (arcfour-hmac)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 host/CENTOS7@NTAP.LOCAL (arcfour-hmac)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (des-cbc-crc)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (des-cbc-md5)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (aes128-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (aes256-cts-hmac-sha1-96)
 3 05/15/2017 18:01:39 CENTOS7$@NTAP.LOCAL (arcfour-hmac)

Encryption types (enctypes)

Encryption types (or enctypes) are the level of encryption used for the Kerberos conversation. The client and KDC will negotiate the level of enctype used. The client will tell the KDC “hey, I want to use this list of enctypes. Which do you support?” and the KDC will respond “I support these, in order of strongest to weakest. Try using the strongest first.” In the example above, this is the order of enctype strength, from strongest to weakest:

  • AES-256
  • AES-128
  • ARCFOUR-HMAC
  • DES-CBC-MD5
  • DES-CBC-CRC

The reason a keytab file includes weaker enctypes like DES or ARCFOUR is backwards compatibility; older KDCs (Windows 2003 DCs, for example) don’t support AES enctypes. In some cases, the enctypes themselves cause Kerberos issues due to lack of support: Windows 2008 and later don’t support DES unless you explicitly enable it, and ARCFOUR isn’t supported in clustered ONTAP for NFS Kerberos. In these cases, it’s good to modify the machine accounts to strictly define which enctypes to use for Kerberos.
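
For example, here’s a sketch of restricting a machine account to AES only (assuming the ActiveDirectory PowerShell module and the CENTOS7 machine account from earlier):

PS C:\> Set-ADComputer CENTOS7$ -KerberosEncryptionType AES256,AES128
PS C:\> Get-ADComputer CENTOS7$ -Properties KerberosEncryptionType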

What you need before you try mounting

This is a quick list of things that have to be in place before you can expect Kerberos with NFS to work properly. If I left something out, feel free to remind me in the comments. There’s so much info involved that I occasionally forget some things. 🙂

KDC and client – The KDC is a given – in this case, Active Directory. The client would need to have some things installed/configured before you try to join it, including a valid DNS server configuration, Kerberos utilities, etc. This varies depending on client and would be too involved to get into here. Again, TR-4073 would be a good place to start.

DNS entries for all clients and servers participating in the NFS Kerberos operation – this includes forward and reverse (PTR) records for the clients and servers. The DNS names *must* match the SPN names. If they don’t, then when you try to mount, the DNS lookup will find the name hostname1 and use that to look up the SPN host/hostname1. If the SPN was actually named nfs/hostname2, the Kerberos attempt will fail with “PRINCIPAL_UNKNOWN.” This is also true for Kerberos in CIFS/SMB environments. In ONTAP, a common mistake is naming the CIFS server or NFS Kerberos SPN after the SVM (such as SVM1) while the DNS names are something totally different (such as cifs.domain.com).

Valid Kerberos SPNs and UPNs – When you join a Linux client to a domain, the machine account and SPNs are automatically created. However, the UPN is not created. Having no UPN on a machine account can create issues with some Linux services that use Kerberos keytab files to authenticate. For example, RedHat’s LDAP service (SSSD) can fail to bind if using a Kerberos service principal in the configuration via the ldap_sasl_authid option. The error you’d see would be “PRINCIPAL_UNKNOWN” and would drive you batty because it would be using a principal you *know* exists in your environment. That’s because it’s trying to find the UPN, not the SPN. You can manage the SPN and UPN via the Active Directory attributes tab in the advanced features view. You can query whether SPNs exist via the setspn command (use /q to query by SPN name) in the CLI or PowerShell.

PS C:\> setspn /q host/centos7.ntap.local
Checking domain DC=NTAP,DC=local
CN=CENTOS7,CN=Computers,DC=NTAP,DC=local
 HOST/centos7.ntap.local
 HOST/CENTOS7

Existing SPN found!

You can view a user’s UPN and SPN with the following PowerShell command:

PS C:\> Get-ADUser student1 -Properties UserPrincipalName,ServicePrincipalName

DistinguishedName : CN=student1,CN=Users,DC=NTAP,DC=local
Enabled : True
GivenName : student1
Name : student1
ObjectClass : user
ObjectGUID : d5d5b526-bef8-46fa-967b-00ebc77e468d
SamAccountName : student1
SID : S-1-5-21-3552729481-4032800560-2279794651-1108
Surname :
UserPrincipalName : student1@NTAP.local

And a machine account’s with:

PS C:\> Get-ADComputer CENTOS7$ -Properties UserPrincipalName,ServicePrincipalName

DistinguishedName : CN=CENTOS7,CN=Computers,DC=NTAP,DC=local
DNSHostName : centos7.ntap.local
Enabled : True
Name : CENTOS7
ObjectClass : computer
ObjectGUID : 3a50009f-2b40-46ea-9014-3418b8d70bdb
SamAccountName : CENTOS7$
ServicePrincipalName : {HOST/centos7.ntap.local, HOST/CENTOS7}
SID : S-1-5-21-3552729481-4032800560-2279794651-1140
UserPrincipalName : HOST/centos7.ntap.local@NTAP.LOCAL

Network Time Protocol (NTP) – With Kerberos, there is a 5 minute default time skew window. If a client and server/KDC’s time is outside of that window, Kerberos requests will fail with “Access denied” and you’d see time skew errors in the cluster logs. This KB covers it nicely:

https://kb.netapp.com/support/s/article/ka11A0000001V1YQAU/Troubleshooting-Workflow-CIFS-Authentication-failures?language=en_US

A common issue I’ve seen with this is time zone differences or daylight savings issues. I’ve often seen the wall clock time look identical on server and client, but the time zones or month/date differ, causing the skew.

The NTP requirement is actually a “make sure your time is up to date and in sync on everything” requirement, but NTP makes that easier.
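
For example, pointing the cluster at an NTP server and spot-checking the date takes two commands (the NTP hostname is a placeholder):

cluster::> cluster time-service ntp server create -server time.ntap.local
cluster::> cluster date show

On a CentOS 7 client, chronyd handles time sync by default, so “chronyc tracking” shows the client’s sync state.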

Kerberos to UNIX name mappings – In ONTAP, we authenticate via name mappings not only for CIFS/SMB, but also for Kerberos. When a client attempts to send an authentication request to the cluster for an AS request or ST (service ticket) request, it has to map to a valid UNIX user. The UNIX user mapping will depend on what type of principal is coming in. If you don’t have a valid name mapping rule, you’d see something like this in the event log:

5/16/2017 10:24:23 ontap9-tme-8040-01
 ERROR secd.nfsAuth.problem: vserver (DEMO) General NFS authorization problem. Error: RPC accept GSS token procedure failed
 [ 8 ms] Acquired NFS service credential for logical interface 1034 (SPN='nfs/demo.ntap.local@NTAP.LOCAL').
 [ 11] GSS_S_COMPLETE: client = 'CENTOS7$@NTAP.LOCAL'
 [ 11] Trying to map SPN 'CENTOS7$@NTAP.LOCAL' to UNIX user 'CENTOS7$' using implicit mapping
 [ 12] Using a cached connection to oneway.ntap.local
**[ 14] FAILURE: User 'CENTOS7$' not found in UNIX authorization source LDAP.
 [ 15] Entry for user-name: CENTOS7$ not found in the current source: LDAP. Ignoring and trying next available source
 [ 15] Entry for user-name: CENTOS7$ not found in the current source: FILES. Entry for user-name: CENTOS7$ not found in any of the available sources
 [ 15] Unable to map SPN 'CENTOS7$@NTAP.LOCAL'
 [ 15] Unable to map Kerberos NFS user 'CENTOS7$@NTAP.LOCAL' to appropriate UNIX user

For service principals (SPNs) such as host/name or nfs/name, the mapping tries to default to the primary component, so you’d need a UNIX user named host or nfs on the local SVM or in a name service like LDAP. Otherwise, you can create static krb-unix name mappings in the SVM to map to whatever user you like; wildcards, regex, and so on are all supported. For example, this name mapping rule will map all SPNs coming in as {MACHINE}$@REALM.COM to root.
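
Creating that rule is a single command (a sketch; quote the pattern so the CLI doesn’t mangle the special characters):

cluster::> vserver name-mapping create -vserver DEMO -direction krb-unix -position 1 -pattern "(.+)\$@NTAP.LOCAL" -replacement root

Once created, it looks like this: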

cluster::*> vserver name-mapping show -vserver DEMO -direction krb-unix -position 1

Vserver: DEMO
 Direction: krb-unix
 Position: 1
 Pattern: (.+)\$@NTAP.LOCAL
 Replacement: root
IP Address with Subnet Mask: -
 Hostname: -

To test the mapping, use diag priv:

cluster::*> diag secd name-mapping show -node node1 -vserver DEMO -direction krb-unix -name CENTOS7$@NTAP.LOCAL

'CENTOS7$@NTAP.LOCAL' maps to 'root'

You can map the SPN to root, pcuser, etc. – as long as the UNIX user exists locally on the SVM or in the name service.

The workflow

Now that I’ve gotten some basics out of the way (and if you find that I’ve missed some, add to the comments), let’s look at how the workflow for an NFS mount using Kerberos would work, end to end. This is assuming we’ve configured everything correctly and are ready to mount, and that all the export policy rules allow the client to mount NFSv4 and Kerberos. If a mount fails, always check your export policy rules first.

Some common export policy issues include:

  • The export policy doesn’t have any rules configured
  • The vserver/SVM root volume doesn’t allow read access in the export policy rule for traversal of the / mount point in the namespace
  • The export policy has rules, but they are either misconfigured (clientmatch is wrong, read access disallowed, NFS protocol or auth method is disallowed) or they aren’t allowing the client to access the mount (Run export-policy rule show -instance)
  • The wrong/unexpected export policy has been applied to the volume (Run volume show -fields policy)

What’s unfortunate about trying to troubleshoot mounts with NFS Kerberos involved is that, regardless of the failures happening, the client will report:

mount.nfs: access denied by server while mounting

It’s a generic error and isn’t really helpful in diagnosing the issue.
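
If you need more detail from the client side, one option (a sketch, assuming a systemd-based client like CentOS 7) is to stop the GSS daemon and re-run it in the foreground with verbose logging while reproducing the mount:

# systemctl stop rpc-gssd
# rpc.gssd -f -vvv

The secd messages in the cluster’s event log (like the name mapping example shown earlier) are usually even more specific.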

In ONTAP, there is a command in admin privilege to check the export policy access for the client for troubleshooting purposes. Be sure to use it to rule out export issues.

cluster::> export-policy check-access -vserver DEMO -volume flexvol -client-ip 10.193.67.225 -authentication-method krb5 -protocol nfs4 -access-type read-write
                                         Policy    Policy     Rule
Path                          Policy     Owner     Owner Type Index  Access
----------------------------- ---------- --------- ---------- ------ ----------
/                             root       vsroot    volume     1      read
/flexvol                      default    flexvol   volume     1      read-write
2 entries were displayed.

The mount command is issued.

In my case, I use NFSv4.x, as that’s the security standard. Mounting without specifying a version will default to the highest NFS version allowed by the client and server, via a client-server negotiation. If NFSv4.x is disabled on the server, the client will fall back to NFSv3.

# mount -o sec=krb5 demo:/flexvol /mnt

Once the mount command gets issued and Kerberos is specified, a few (ok, a lot of) things happen in the background.

While this stuff happens, the mount command will appear to “hang” as the client, KDC and server suss out if you’re going to be allowed access.

  • DNS lookups are done for the client hostname and server hostname (or reverse lookup of the IP address) to help determine what names are going to be used. Additionally, SRV lookups are done for the LDAP service and Kerberos services in the domain. DNS lookups are happening constantly through this process.
  • The client uses its keytab file to send an authentication service request (AS-REQ) to the KDC, along with what enctypes it has available. The KDC then verifies if the requested principal actually exists in the KDC and if the enctypes are supported.
  • If the enctypes are not supported, if the principal doesn’t exist, or if there are DUPLICATE principals, the AS-REQ fails. If everything checks out, the KDC sends a successful reply (AS-REP).
  • Then the client will send a Ticket Granting Service request (TGS-REQ) to the KDC. This request is an attempt to look up the NFS service ticket named nfs/name. The name portion of the ticket is generated either via what was typed into the mount command (ie, demo) or via reverse lookup (if we typed in an IP address to mount). The TGS-REQ will be used later to allow us to obtain a service ticket (ST). The TGS will also negotiate supported enctypes for later. If the TGS-REQ between the KDC and client negotiates an enctype that ONTAP doesn’t support (for example, ARCFOUR), then the mount will fail later in process.
  • If the TGS-REQ succeeds, a TGS-REP is sent. If the KDC doesn’t support the requested enctypes from the client, we fail here. If the NFS principal doesn’t exist (remember, it has to be in DNS and match exactly), then we fail.
  • Once the TGS is acquired by the NFS client, it presents the ticket to the NFS server in ONTAP via a NFS NULL call. The ticket information includes the NFS service SPN and the enctype used. If the NFS SPN doesn’t match what’s in “kerberos interface show,” the mount fails. If the enctype presented by the client isn’t supported or is disallowed in “permitted enctypes” on the NFS server, the request fails. The client would show “access denied.”
  • The NFS service SPN sent by the client is presented to ONTAP. This is where the krb-unix mapping takes place. ONTAP will first see if a user named “nfs” exists in local files or name services (such as LDAP, where a bind to the LDAP server and lookup takes place). If the user doesn’t exist, it will then check to see if any krb-unix name mapping rules were set explicitly. If no rules exist and mapping fails, ONTAP logs an error on the cluster and the mount fails with “Access denied.” If the mapping works, the mount procedure moves on to the next step.
  • After the NFS service ticket is verified, the client will send SETCLIENTID calls and then the NFSv4.x mount compound call (PUTROOTFH | GETATTR). The client and server are also negotiating the name@domainID string to make sure they match on both sides as part of NFSv4.x security.
  • Then, the client will try to run a series of GETATTR calls to “/” in the path. If we didn’t allow “read” access in the policy rule for “/” (the vsroot volume), we fail. If the ACLs/mode bits on the vsroot volume don’t allow at least traverse permissions, we fail. In a packet trace, we can see that the vsroot volume has only traverse permissions:
    V4 Reply (Call In 268) ACCESS, [Access Denied: RD MD XT], [Allowed: LU DL]

    We can also see that from the cluster CLI (“Everyone” only has “Execute” permissions in this NTFS security style volume):

    cluster::> vserver security file-directory show -vserver DEMO -path / -expand-mask true
    
    Vserver: DEMO
     File Path: /
     File Inode Number: 64
     Security Style: ntfs
     Effective Style: ntfs
     DOS Attributes: 10
     DOS Attributes in Text: ----D---
    Expanded Dos Attributes: 0x10
     ...0 .... .... .... = Offline
     .... ..0. .... .... = Sparse
     .... .... 0... .... = Normal
     .... .... ..0. .... = Archive
     .... .... ...1 .... = Directory
     .... .... .... .0.. = System
     .... .... .... ..0. = Hidden
     .... .... .... ...0 = Read Only
     UNIX User Id: 0
     UNIX Group Id: 0
     UNIX Mode Bits: 777
     UNIX Mode Bits in Text: rwxrwxrwx
     ACLs: NTFS Security Descriptor
     Control:0x9504
    
    1... .... .... .... = Self Relative
     .0.. .... .... .... = RM Control Valid
     ..0. .... .... .... = SACL Protected
     ...1 .... .... .... = DACL Protected
     .... 0... .... .... = SACL Inherited
     .... .1.. .... .... = DACL Inherited
     .... ..0. .... .... = SACL Inherit Required
     .... ...1 .... .... = DACL Inherit Required
     .... .... ..0. .... = SACL Defaulted
     .... .... ...0 .... = SACL Present
     .... .... .... 0... = DACL Defaulted
     .... .... .... .1.. = DACL Present
     .... .... .... ..0. = Group Defaulted
     .... .... .... ...0 = Owner Defaulted
    
    Owner:BUILTIN\Administrators
     Group:BUILTIN\Administrators
     DACL - ACEs
     ALLOW-NTAP\Domain Admins-0x1f01ff-OI|CI
     0... .... .... .... .... .... .... .... = Generic Read
     .0.. .... .... .... .... .... .... .... = Generic Write
     ..0. .... .... .... .... .... .... .... = Generic Execute
     ...0 .... .... .... .... .... .... .... = Generic All
     .... ...0 .... .... .... .... .... .... = System Security
     .... .... ...1 .... .... .... .... .... = Synchronize
     .... .... .... 1... .... .... .... .... = Write Owner
     .... .... .... .1.. .... .... .... .... = Write DAC
     .... .... .... ..1. .... .... .... .... = Read Control
     .... .... .... ...1 .... .... .... .... = Delete
     .... .... .... .... .... ...1 .... .... = Write Attributes
     .... .... .... .... .... .... 1... .... = Read Attributes
     .... .... .... .... .... .... .1.. .... = Delete Child
     .... .... .... .... .... .... ..1. .... = Execute
     .... .... .... .... .... .... ...1 .... = Write EA
     .... .... .... .... .... .... .... 1... = Read EA
     .... .... .... .... .... .... .... .1.. = Append
     .... .... .... .... .... .... .... ..1. = Write
     .... .... .... .... .... .... .... ...1 = Read
    
    ALLOW-Everyone-0x100020-OI|CI
     0... .... .... .... .... .... .... .... = Generic Read
     .0.. .... .... .... .... .... .... .... = Generic Write
     ..0. .... .... .... .... .... .... .... = Generic Execute
     ...0 .... .... .... .... .... .... .... = Generic All
     .... ...0 .... .... .... .... .... .... = System Security
     .... .... ...1 .... .... .... .... .... = Synchronize
     .... .... .... 0... .... .... .... .... = Write Owner
     .... .... .... .0.. .... .... .... .... = Write DAC
     .... .... .... ..0. .... .... .... .... = Read Control
     .... .... .... ...0 .... .... .... .... = Delete
     .... .... .... .... .... ...0 .... .... = Write Attributes
     .... .... .... .... .... .... 0... .... = Read Attributes
     .... .... .... .... .... .... .0.. .... = Delete Child
     .... .... .... .... .... .... ..1. .... = Execute
     .... .... .... .... .... .... ...0 .... = Write EA
     .... .... .... .... .... .... .... 0... = Read EA
     .... .... .... .... .... .... .... .0.. = Append
     .... .... .... .... .... .... .... ..0. = Write
     .... .... .... .... .... .... .... ...0 = Read
  • If we have the appropriate permissions to traverse “/” then the NFS client attempts to find the file handle for the mount point via a LOOKUP call, using the file handle of vsroot in the path. It would look something like this:
    V4 Call (Reply In 271) LOOKUP DH: 0x92605bb8/flexvol
  • If the file handle exists, it gets returned to the client in the LOOKUP reply.
  • Then the client uses that file handle to run GETATTRs to see if it can access the mount:
    V4 Call (Reply In 275) GETATTR FH: 0x1f57355e

If all is clear, our mount succeeds!

But we’re not done… now the user that wants to access the mount has to go through another ticket process. In my case, I used a user named “student1.” This is because a lot of the Kerberos/NFSv4.x requests I get are generated by universities interested in setting up multiprotocol-ready home directories.

When a user like student1 wants to get into a Kerberized NFS mount, they can’t just cd into it. That would look like this:

# su student1
sh-4.2$ cd /mnt
sh: cd: /mnt: Not a directory

Oh look… another useless error! If I were to take that error literally, I would think “that mount doesn’t even exist!” But, it does:

sh-4.2$ mount | grep mnt
demo:/flexvol on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=krb5,clientaddr=10.193.67.225,local_lock=none,addr=10.193.67.219)

What that error actually means is that the user requesting access does not have a valid Kerberos ticket-granting ticket (TGT, the “login” obtained from the AS exchange), which is needed to request a service ticket for NFS (nfs/server-hostname). We can see that via the klist -e command.

sh-4.2$ klist -e
klist: Credentials cache keyring 'persistent:1301:1301' not found

Before you can get into a mount that is only allowing Kerberos access, you have to get a Kerberos ticket. On Linux, you can do that via the kinit command, which is akin to a Windows login.

sh-4.2$ kinit
Password for student1@NTAP.LOCAL:
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
05/16/2017 15:54:01 05/17/2017 01:54:01 krbtgt/NTAP.LOCAL@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

Now that I have my ticket, I can cd into the mount. When I cd into a Kerberized NFS mount, the client makes TGS requests to the KDC (seen in the trace in packet 101) for the NFS service ticket. If that process is successful, we get access:

sh-4.2$ cd /mnt
sh-4.2$ pwd
/mnt
sh-4.2$ ls
c0 c1 c2 c3 c4 c5 c6 c7 newfile2 newfile-nfs4
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
05/16/2017 15:55:32 05/17/2017 01:54:01 nfs/demo.ntap.local@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
05/16/2017 15:54:01 05/17/2017 01:54:01 krbtgt/NTAP.LOCAL@NTAP.LOCAL
 renew until 05/23/2017 15:53:58, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96

Now we’re done. (at least until our tickets expire…)


TECH::Storage Virtual Machine (SVM) DR in cDOT

With the release of clustered Data ONTAP 8.3.1 comes a whole new and exciting set of features, such as:

  • Improved inline compression
  • FlashEssentials flash optimizations
  • Online foreign LUN import

But the one I’ll cover here is Storage Virtual Machine DR, which is a key component of the enterprise storage story.

Let’s start off with some terminology definitions:

Clustered Data ONTAP

From TR-3982:

Clustered Data ONTAP is enterprise-capable, unified scale-out storage. It is the basis for virtualized
shared storage infrastructures. Clustered Data ONTAP is architected for nondisruptive operations,
storage and operational efficiency, and scalability over the lifetime of the system.

A Data ONTAP cluster typically consists of fabric-attached storage (FAS) controllers: computers
optimized to run the clustered Data ONTAP operating system. The controllers provide network ports that
clients and hosts use to access storage. These controllers are also connected to each other using a
dedicated, redundant 10-gigabit Ethernet interconnect. The interconnect allows the controllers to act as a
single cluster. Data is stored on shelves attached to the controllers. The drive bays in these shelves can
contain hard disks, flash media, or both.

Storage Virtual Machine (SVM)

From TR-3982:

A cluster provides hardware resources, but clients and hosts access storage in clustered Data ONTAP
through storage virtual machines (SVMs). SVMs exist natively inside clustered Data ONTAP. They define
the storage available to the clients and hosts. SVMs define authentication, network access to the storage
in the form of logical interfaces (LIFs), and the storage itself in the form of SAN LUNs or NAS volumes.
Clients and hosts are aware of SVMs, but they may be unaware of the underlying cluster. The cluster
provides the physical resources the SVMs need in order to serve data. The clients and hosts connect to
an SVM, rather than to a physical storage array.

Like compute virtual machines, SVMs decouple services from hardware. Unlike compute virtual
machines, a single SVM can use the network ports and storage of many controllers, enabling scale-out.
One controller’s physical network ports and physical storage also can be shared by many SVMs, enabling
multi-tenancy.

SnapMirror

NetApp® SnapMirror® technology provides fast, efficient data replication and disaster recovery (DR) for your critical data.

Use a single solution across all NetApp storage arrays and protocols. SnapMirror technology works with any application, in both virtual and traditional environments, and in multiple configurations, including hybrid cloud.

Tune SnapMirror technology to meet recovery-point objectives ranging from minutes to hours. Fail over to a specific point in time in the DR copy to recover at once from mirrored data corruption.

Disaster Recovery (DR)

This is pretty standard; it’s a set of policies and procedures put in place for enterprise IT organizations to recover from a catastrophic loss of service at a primary site. Ideally, the failover will be instantaneous and service will be restored quickly, with as little disruption as possible.

No one needs DR… until they do.

One of the most criminally ignored areas of IT is backup and DR, because it costs money and doesn’t immediately make you any money. The ROI is low, so it becomes a low priority when it should be one of the highest priorities.

Luckily, the cloud is making DR more of a reality (through things like DRaaS, offered by Cloud ONTAP), as cloud storage prices are dropping and allowing companies to start taking DR more seriously. And remember – your data is only as good as your last restore test.

What is SVM DR?

Storage Virtual Machines (SVMs) are essentially blades running Data ONTAP. They act as their own tenants in a cluster and could represent individual divisions, companies or test/prod environments.

However, even with multiple SVMs, you still end up with a single point of failure – the storage system itself. If a meteor hit your datacenter, your cluster would be toast and your clients would be dead in the water, unless you planned for disaster recovery accordingly.

Oops. Did we ever set up DR?

SVM DR allows disaster recovery capability at a granular SVM level, as opposed to having to replicate an entire cluster or filer. This is analogous to the vfiler DR functionality available in 7-Mode.

SVM DR does the following:

  • Leverages NetApp SnapMirror to replicate data to a secondary site.
  • Leverages the new Configuration Replication Service (CRS) application to replicate SVM configuration, including CIFS/SMB shares, network information, NFS exports, etc.
  • Allows two flavors of SVM DR – Identity Preserving and Identity Discarding.

Identity Preserving

This replicates the primary SVM’s configuration and allows us to change to that identity in a failover scenario. One use case for this would be DR on the same physical campus/site (two separate buildings).

In Identity Preserve mode, CRS replicates the SVM’s configuration along with the data: CIFS/SMB shares, NFS exports, network configuration, and so on. (TR-4015 details exactly what is and is not replicated.)

Identity Discarding

This allows us to use a different network configuration on a secondary SVM and bring it online as its own identity. A use case for this would be DR to a different geographical location in the world.

In Identity Discard mode, the network identity is not replicated; the destination SVM comes online with its own network configuration. (Again, TR-4015 details exactly what is and is not replicated.)

How it works

The flow of operation in SVM DR is essentially (a CLI sketch follows the list):

  • Create SVM DR relationship/schedule
  • Initialize the SnapMirror
  • Ensure updates are successful
  • Test DR
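
From the destination cluster’s CLI, that might look like the following. The names (vs1, vs1_dr) are placeholders, cluster and SVM peering are assumed to already be in place, and the Express Guides mentioned at the end cover the authoritative procedure:

destination::> vserver create -vserver vs1_dr -subtype dp-destination
destination::> snapmirror create -source-path vs1: -destination-path vs1_dr: -identity-preserve true -schedule hourly
destination::> snapmirror initialize -destination-path vs1_dr:
destination::> snapmirror show -destination-path vs1_dr: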

When we test (or do a real failover) to DR, the following happens (again, a CLI sketch follows the list):

  • SnapMirror break occurs; once broken, the destination allows R/W operations
  • SnapMirror goes from snapmirrored to broken-off
  • Depending on identity type, we either preserve or discard old identity
  • SVM DR destination goes from dp-destination to default
  • Once source site is back up, we can do a resync/flip-resync
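
In CLI terms, activating the DR site is roughly this (same placeholder names as the sketch above; in a planned failover, stop the source SVM first):

source::> vserver stop -vserver vs1
destination::> snapmirror break -destination-path vs1_dr:
destination::> vserver start -vserver vs1_dr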

When the flip resync occurs:

  • Data written to DR destination gets synced back to source to ensure we have current copy of data and config; this uses a new SVM DR relationship
  • After we’re synced up, the original SVM DR relationship is re-established
  • The flip resync SnapMirror gets broken off and removed
  • SVM DR destination changes from default to dp-destination
  • SnapMirror goes from broken-off to snapmirrored

Some things to keep in mind

While SVM DR makes heavy use of SnapMirror functionality, it is not a true SnapMirror in terms of how it is managed.

  • qtrees in the SVM root volume do *not* get replicated.
  • If you mount a qtree under SVM root and then mount a volume below that qtree, SVM DR will fail unless there is a qtree with the same name created in the destination SVM root volume.
  • All non-SVM root volumes (data volumes) are type DP.
  • You cannot manage SVM DR SnapMirrors independently. They must be managed via the SVM level as a single entity.
  • SVM DR snapshots are named with vserverdr….
  • If reverting from 8.3.1, all SVM DR relationships and snapshots must be deleted before revert.
  • Source and destination should be at 8.3.1 or later; source version should never be higher than destination.
  • Source and destination must have SnapMirror licenses.
  • Destination cluster should have at least one non-root aggregate with at least 10GB free space for configuration replication.
  • Destination cluster must have same licenses (ie, CIFS, NFS, FCP, etc.) as source to ensure full functionality as source upon failover.
  • If using NFS mounts, clients must remount the volumes on DR failover, as the FSIDs will change. NOTE: ONTAP 9 now supports FSID preservation on SVM DR!

For more information on SVM DR, be sure to check TR-4015 for updates as 8.3.1 goes to general availability (GA) and follow the SVM DR/Multi-tenancy TME Doug Moore on Twitter @mooredo21. Doug will also be presenting SVM DR sessions at NetApp Insight 2015 in Las Vegas and Berlin.

I’ll also be presenting some sessions at NetApp Insight 2015, so keep checking back at whyistheinternetbroken.com for updates!

If you’re interested in step by step guides of how to set up SVM DR, check out the Express Guides for your version of ONTAP!