Behind the Scenes: Episode 182 – NetApp on NetApp: FlexGroup Volumes and ActiveIQ

Welcome to Episode 182, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invite in the guys from Customer One, who operate the NetApp on NetApp program. NetApp on NetApp is a program where we leverage the latest NetApp technologies within our own organization. Eduardo Rivera (@mredrivera) and Faisal Salaam (https://www.linkedin.com/in/faisal-salam-754a13104/) join us as we discuss how NetApp is using FlexGroup volumes to power Active IQ.

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:


Windows NFS? WHO DOES THAT???


Believe it or not, Windows NFS is a thing. Microsoft has its own NFS server and client, which can issue RFC-compliant NFSv3 calls to a Windows Server running the NFS server role or to a third-party NFS server, such as NetApp ONTAP. It’s actually so popular that NetApp had to re-introduce it in clustered ONTAP (it wasn’t there until ONTAP 8.2.3/8.3.1).

While Windows currently provides an NFSv3 client, there is no NFSv4.1 client – yet. Microsoft does provide NFSv4.1 as a server option, though:

https://docs.microsoft.com/en-us/windows-server/storage/nfs/nfs-overview

I cover Windows NFS support in TR-4067 starting on page 116. I’m bringing this topic up because it has come up again recently and I wanted to create a quick and easy blog to follow, as well as call out how you can integrate AD LDAP to help with identity management.

There are a few things you have to do to get it working in ONTAP.

Specifically:

  • enable -v3-ms-dos-client option on the NFS server
  • enable -showmount on the NFS server – this prevents some weirdness with writing files
  • disable -enable-ejukebox and -v3-connection-drop

The command would look like this:

cluster::> set advanced
cluster::*> nfs server modify -vserver DEMO -v3-ms-dos-client enabled -v3-connection-drop disabled -enable-ejukebox false -showmount enabled
cluster::*> nfs server show -vserver DEMO -fields v3-ms-dos-client,v3-connection-drop,showmount,enable-ejukebox
vserver enable-ejukebox v3-connection-drop showmount v3-ms-dos-client
------- --------------- ------------------ --------- ----------------
DEMO false disabled enabled enabled

Once that’s done, you can mount via NFS inside Windows clients using the standard “mount” command, provided you’ve enabled the Services for UNIX functionality. There’s plenty of documentation out there for that.
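If the NFS client piece isn’t installed yet, on Windows Server you can usually add the Client for NFS feature with a single PowerShell command (the feature name below is an assumption on my part; verify with Get-WindowsFeature, and on desktop versions of Windows use “Turn Windows features on or off” instead):

PS C:\> Install-WindowsFeature -Name NFS-Client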

Just by doing the above, here’s an example of a working NFS mount in Windows:

C:\Users\Administrator>mount DEMO:/flexvol X:
X: is now successfully connected to DEMO:/flexvol

The command completed successfully.

Here’s the cluster’s view of that connection:

ontap9-tme-8040::*> network connections active show -node ontap9-tme-8040-0* -service nfs*,mount -remote-ip 10.193.67.236
              Vserver   Interface         Remote
      CID Ctx Name      Name:Local Port   Host:Port            Protocol/Service
--------- --- --------- ----------------- -------------------- ----------------
Node: ontap9-tme-8040-02
2968991376  4 DEMO      data:2049         oneway.ntap.local:931
                                                               TCP/nfs

When I write a file to the mount, there is something that can prove to be an issue, however. Users other than Administrator will write as UID/GID of 4294967294 (-2).

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /flexvol/student1-nfs.txt

                Vserver: DEMO
              File Path: /flexvol/student1-nfs.txt
      File Inode Number: 1606599
         Security Style: unix
        Effective Style: unix
         DOS Attributes: 20
DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
           UNIX User Id: 4294967294
          UNIX Group Id: 4294967294
         UNIX Mode Bits: 755
UNIX Mode Bits in Text: rwxr-xr-x
                   ACLs: -

That means users won’t show up properly/as desired in UNIX NFS mounts. For example, this is that same file from CentOS:

[root@centos7 /]# cd flexvol
[root@centos7 flexvol]# ls -la | grep student1-nfs
-rwxr-xr-x 1 4294967294 4294967294 0 Feb 5 09:18 student1-nfs.txt

So, how does one fix that?

Configuring Windows NFS clients to negotiate users properly

There are a few ways to have users leverage UID/GID other than -2.

One way is to “squash” every NFS user to the same UID/GID via the old Windows standby – the Windows registry. This is useful if only a single user will be using an NFS client.

This covers how to do that:

https://blogs.msdn.microsoft.com/saponsqlserver/2011/02/03/installation-configuration-of-windows-nfs-client-to-enable-windows-to-mount-a-unix-file-system/
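As a rough sketch of what that looks like (registry values per the article above; the UID/GID numbers are just examples from my lab, so adjust to taste), you’d add the AnonymousUid/AnonymousGid DWORDs and restart the NFS client:

C:\> reg add "HKLM\SOFTWARE\Microsoft\ClientForNFS\CurrentVersion\Default" /v AnonymousUid /t REG_DWORD /d 1100
C:\> reg add "HKLM\SOFTWARE\Microsoft\ClientForNFS\CurrentVersion\Default" /v AnonymousGid /t REG_DWORD /d 1101
C:\> nfsadmin client stop
C:\> nfsadmin client start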

Some of the third party NFS clients (such as Cygwin and Hummingbird/OpenText) will provide local passwd and group file functionality to allow you to leverage more users. In some cases, all this does is add more registry entries.

Another way is to chmod/chown the file after it’s written. But that’s not ideal.

The best way is to leverage an existing name service (such as NIS or LDAP) and have Windows clients query for the UID and GID. If you have one already, great! It’s super easy to set up the client. Just run the following command as an administrator in cmd. My NTAP.LOCAL domain already has an LDAP server set up:

C:\Users\administrator>nfsadmin mapping WIN7-CLIENT config adlookup=yes addomain=NTAP.LOCAL

The settings were successfully updated.

Once I did that, I wrote a new file and the UID/GID was properly represented:

ontap9-tme-8040::*> vserver security file-directory show -vserver DEMO -path /flexvol/prof1-nfs.txt

                Vserver: DEMO
              File Path: /flexvol/prof1-nfs.txt
      File Inode Number: 1606600
         Security Style: unix
        Effective Style: unix
         DOS Attributes: 20
DOS Attributes in Text: ---A----
Expanded Dos Attributes: -
           UNIX User Id: 1100
          UNIX Group Id: 1101
         UNIX Mode Bits: 755
UNIX Mode Bits in Text: rwxr-xr-x
                   ACLs: -

ontap9-tme-8040::*> getxxbyyy getpwbyname -node ontap9-tme-8040-01 -vserver DEMO -username prof1
  (vserver services name-service getxxbyyy getpwbyname)
pw_name: prof1
pw_passwd:
pw_uid: 1100
pw_gid: 1101
pw_gecos:
pw_dir:
pw_shell:

If you’re interested, a packet trace shows that the Windows client will communicate via encrypted LDAP to query the user’s UNIX attribute information:

windows-ldap

An added bonus of having Windows clients query LDAP for UNIX user names and groups for NFS on ONTAP is that if you’re using NTFS security style volumes, you won’t have issues connecting to those mounts.

What breaks when doing NTFS security style?

When a UNIX user attempts to access a volume with NTFS security style ACLs, ONTAP will attempt to map that user to a valid Windows user to make sure Windows ACLs can be calculated. (I cover this in Mixed perceptions with NetApp multiprotocol NAS access)

If a user comes in with the default Windows NFS ID of 4294967294 (which doesn’t translate to a UNIX user), this is what happens:

  • The UNIX user 4294967294 tries to access the mount.
  • ONTAP receives a UID of 4294967294 and attempts to map that to a Windows user
  • That Windows user does not exist, so access is denied. This can manifest as an error (such as when writing a file) or it could just show no files/folder.

windows-nfs-ntfs-noaccess.png

windows-nfs-ntfs-noaccess2

That particular folder does have data. It’s just that the user can’t see it:

windows-nfs-ntfs-data-list

In ONTAP, we’d see this error, confirming that the user doesn’t exist:

2/5/2019 14:31:26 ontap9-tme-8040-02
ERROR secd.nfsAuth.problem: vserver (DEMO) General NFS authorization problem. Error: Get user credentials procedure failed
[ 15 ms] Hostname found in Name Service Cache
[ 19] Hostname found in Name Service Cache
[ 23] Successfully connected to ip 10.193.67.236, port 389 using TCP
**[ 28] FAILURE: User ID '4294967294' not found in UNIX authorization source LDAP.
[ 28] Entry for user-id: 4294967294 not found in the current source: LDAP. Ignoring and trying next available source
[ 29] Entry for user-id: 4294967294 not found in the current source: FILES. Entry for user-id: 4294967294 not found in any of the available sources
[ 44] Unable to get the name for UNIX user with UID 4294967294

With LDAP involved, access to the NFS-mounted volume with NTFS security works much better, because ONTAP and the client agree that user 1100 is prof1.

windows-nfs-ntfs-data-list-ldap

So, uh… what if I don’t have LDAP or NIS?

Well, in a Windows domain, you ALWAYS have an LDAP server. Active Directory leverages LDAP schemas to store information, and any version of Windows Active Directory can be used to look up UNIX users and groups. In fact, the newer versions of Windows make this very easy. In older Windows versions, you had to manually extend the LDAP schema to provide UNIX attributes. Now, UNIX attributes like uidNumber, gidNumber and so on are in the schema by default. All you have to do is populate these values with information. You can even do it via PowerShell cmdlets!
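For example, here’s a hedged sketch of populating those attributes for the prof1 user and ProfGroup group from my lab (attribute names assume a Windows 2008 R2 or later schema; swap in your own users and IDs):

PS C:\> Set-ADUser prof1 -Replace @{uidNumber=1100; gidNumber=1101; unixHomeDirectory="/home/prof1"; loginShell="/bin/bash"}
PS C:\> Set-ADGroup ProfGroup -Replace @{gidNumber=1101}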

Once you have a working Active Directory LDAP environment, you can then configure ONTAP to communicate with LDAP for UNIX identities and you’re well on your way to having a scalable, functional multiprotocol NAS environment.

The one downside I’ve found with Windows NFS is that it doesn’t always play nicely when you want to use SMB on the same client. Windows gets a bit… confused. I haven’t dug into that a ton, but I’ve seen it enough to express caution. 🙂

Securing NFS mounts in a Docker container


Setting up Kerberized NFS on a client can be a bit challenging, especially if you’re trying to do it across multiple hosts. So, I decided I wanted to take on the challenge of creating an easy to deploy Docker container, using NetApp’s Trident plugin to make life even easier.

Why do I want Kerberos?


With Kerberos on NFS mounts, you can protect traffic with Kerberos authentication (krb5), integrity checking (krb5i) and end-to-end packet encryption (krb5p). I covered the benefits of using krb5p in a previous blog. I also covered how to use NFS with the new FlexGroup driver in Docker + NFS + FlexGroup volumes = Magic!

I also cover krb5p in Encrypt your NFS packets end to end with krb5p and ONTAP 9.2!

But this blog covers FlexVols only, since FlexGroup volumes can’t use NFSv4.x – yet.

Why do I want NFSv4.x?

NFSv3 is a great protocol, but it has some disadvantages when it comes to locking and security. For starters, v3 is stateless. NFSv4.x is stateful and manages locks much better, since it’s done on a lease basis and is integrated in the protocol itself, while v3 has ancillary services that manage locks.

Those ancillary services (like NLM, mountd, portmap) are also what makes NFSv3 less secure than NFSv4.x. More services = more ports to open on a firewall (your network guy hates you, btw). Additionally, standard in-flight encryption for NAS protocols, such as Kerberos, doesn’t cover the ancillary services – it only encrypts the NFS packets. NFSv4.x also has additional layers of security via NFSv4.x ACLs, as well as ID domain name mapping to ensure the client and server agree on who is who when accessing NFS mounts.
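If you want to see those extra NFSv3 services for yourself, querying the portmapper on the NFS server shows everything listening; NFSv4.x only needs TCP port 2049 (the IP below is a placeholder):

# rpcinfo -p 10.x.x.x | egrep 'portmapper|mountd|nlockmgr|status|nfs'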

The main downside of NFSv4.x is performance. It currently lags behind NFSv3 for a variety of reasons, but mostly because it has to do more in each packet than NFSv3 had to, and being a stateful protocol can be costly to performance. When you lump in encryption, it adds more overhead. Things are getting better, however, and soon, there won’t be any excuse for not using NFSv4.x as your standard NFS version.

What you need before you start

In this example, I’m going to configure Kerberos, NFSv4.1 and LDAP on a single container. This allows me to have all the moving parts I’d need to get it working out of the gate. I’m using CentOS7.x/RHEL7.x as the Docker host and container base, as well as Microsoft Active Directory 2012R2 for LDAP UNIX identities and Kerberos KDC functionality. You can use Linux-based LDAP and KDCs, but that’s outside the scope of what this blog is about.

Before you get started, you need the following.

  • Active Directory configured to use LDAP for UNIX identity mapping
  • A server/VM running the latest CentOS/RHEL version (our Docker host)
  • A NetApp ONTAP cluster with a SVM running NFS on it

Configuring the ONTAP SVM

Before you can get started with NFS Kerberos on the client, you’ll need to configure NFS Kerberos in ONTAP. Doing this essentially comes down to the following steps:

  • Create a Kerberos realm
  • Configure DNS
  • Allow AES encryption types on the NFS server
  • Create a Kerberos interface (this creates a machine object in AD that has your SPN for NFS server ticket functionality and adds the keytab to the cluster/SVM)
  • Create a local UNIX user named “nfs” (this maps to the nfs/service account when Kerberos mounts are attempted)
  • Create a generic name mapping rule for all machine accounts (when you join a container to the Kerberos realm, it creates a new machine account with the format of [imagehexname]$@REALM.COM. Having a generic name mapping rule will eliminate headaches trying to manage that)
  • Create an export policy and rule that allows Kerberos authentication/v4.x for NFS
  • (optional, but recommended) LDAP server configuration that matches the client (this makes life much easier with NFSv4.x)
  • Configure the NFSv4 ID domain option and enable NFSv4.0/4.1

These steps are all covered in pretty good detail in TR-4073 (the unabridged version) and TR-4616 (the more streamlined version), so I won’t cover them here. Instead, I’ll show you how my cluster is configured.

Kerberos realm

::*> kerberos realm show -vserver DEMO -instance
(vserver nfs kerberos realm show)

Vserver: DEMO
Kerberos Realm: NTAP.LOCAL
KDC Vendor: Microsoft
KDC IP Address: 10.x.x.x
KDC Port: 88
Clock Skew: 5
Active Directory Server Name: oneway.ntap.local
Active Directory Server IP Address: 10.x.x.x
Comment: -
Admin Server IP Address: 10.x.x.x
Admin Server Port: 749
Password Server IP Address: 10.x.x.x
Password Server Port: 464
Permitted Encryption Types: aes-128, aes-256

DNS

::*> dns show -vserver DEMO -instance

Vserver: DEMO
Domains: NTAP.LOCAL
Name Servers: 10.x.x.x
Timeout (secs): 5
Maximum Attempts: 1
Is TLD Query Enabled?: true
Require Source and Reply IPs to Match: true
Require Packet Queries to Match: true

AES encryption types allowed on the NFS server

::*> nfs server show -vserver DEMO -fields permitted-enc-types
vserver permitted-enc-types
------- -------------------
DEMO aes-128,aes-256

Kerberos interface

::*> kerberos interface show -vserver DEMO -instance
(vserver nfs kerberos interface show)

Vserver: DEMO
Logical Interface: data
IP Address: 10.x.x.x
Kerberos Enabled: enabled
Service Principal Name: nfs/demo.ntap.local@NTAP.LOCAL
Permitted Encryption Types: aes-128, aes-256
Machine Account Name: -

Vserver: DEMO
Logical Interface: data2
IP Address: 10.x.x.x
Kerberos Enabled: enabled
Service Principal Name: nfs/demo.ntap.local@NTAP.LOCAL
Permitted Encryption Types: aes-128, aes-256
Machine Account Name: -
2 entries were displayed.

UNIX user named NFS

::*> unix-user show -vserver DEMO -user nfs -instance

Vserver: DEMO
User Name: nfs
User ID: 500
Primary Group ID: 500
User's Full Name:

Generic name mapping rule for Kerberos SPNs

::*> vserver name-mapping show -vserver DEMO -direction krb-unix -instance

Vserver: DEMO
Direction: krb-unix
Position: 1
Pattern: (.+)\$@NTAP.LOCAL
Replacement: root
IP Address with Subnet Mask: -
Hostname: -

Export policy rule that allows Kerberos/NFSv4.x

::*> export-policy rule show -vserver DEMO -policyname kerberos -instance

Vserver: DEMO
Policy Name: kerberos
Rule Index: 1
Access Protocol: nfs4
List of Client Match Hostnames, IP Addresses, Netgroups, or Domains: 0/0
RO Access Rule: krb5, krb5i, krb5p
RW Access Rule: krb5, krb5i, krb5p
User ID To Which Anonymous Users Are Mapped: 65534
Superuser Security Types: any
Honor SetUID Bits in SETATTR: true
Allow Creation of Devices: true
NTFS Unix Security Options: fail
Vserver NTFS Unix Security Options: use_export_policy
Change Ownership Mode: restricted
Vserver Change Ownership Mode: use_export_policy
Policy ID: 42949672971

LDAP client config (optional, but recommended if you plan to use NFSv4.x)

::*> ldap client show -client-config DEMO -instance

Vserver: DEMO
Client Configuration Name: DEMO
LDAP Server List: -
(DEPRECATED)-LDAP Server List: -
Active Directory Domain: NTAP.LOCAL
Preferred Active Directory Servers: -
Bind Using the Vserver's CIFS Credentials: true
Schema Template: MS-AD-BIS
LDAP Server Port: 389
Query Timeout (sec): 3
Minimum Bind Authentication Level: sasl
Bind DN (User): mtuser
Base DN: DC=NTAP,DC=local
Base Search Scope: subtree
User DN: -
User Search Scope: subtree
Group DN: -
Group Search Scope: subtree
Netgroup DN: -
Netgroup Search Scope: subtree
Vserver Owns Configuration: true
Use start-tls Over LDAP Connections: false
Enable Netgroup-By-Host Lookup: false
Netgroup-By-Host DN: -
Netgroup-By-Host Scope: subtree
Client Session Security: none
LDAP Referral Chasing: false
Group Membership Filter: -

To test LDAP functionality on the cluster, use the following command in advanced privilege to look up a user. If you get a UID/GID, you’re good to go.

::*> getxxbyyy getpwbyname -node ontap9-tme-8040-01 -vserver DEMO -username prof1
(vserver services name-service getxxbyyy getpwbyname)
pw_name: prof1
pw_passwd:
pw_uid: 1100
pw_gid: 1101
pw_gecos:
pw_dir:
pw_shell:

Configure/enable NFSv4.x

::*> nfs server show -vserver DEMO -fields v4-id-domain,v4.1,v4.0
vserver v4.0 v4-id-domain v4.1
------- ------- ------------ -------
DEMO enabled NTAP.LOCAL enabled

Once the cluster SVM is set up, there shouldn’t be much else, if anything, that needs to be done for Kerberos on the cluster. However, in AD, you’ll want to allow only AES for the NFS server machine account with this simple PowerShell command:

PS C:\> Set-ADComputer NFS-KRB-NAME$ -KerberosEncryptionType AES256,AES128

Configuring the Docker host

To get NFSv4.x to work properly in a container, you’ll need to make a decision about your Docker host. What I found in my testing is that containers running NFSv4.x want to use the Docker host’s ID mappings/users when doing NFSv4.x functions, rather than the container’s. So while the container may be able to pull users from LDAP and write files as those users, you will see NFSv4.x owners and groups that the Docker *host* cannot resolve appear as “nobody.”

sh-4.2$ ls -la
total 8
drwxrwxrwx. 2 root root 4096 Aug 14 21:08 .
drwxr-xr-x. 18 root root 4096 Aug 15 15:06 ..
-rw-r--r--. 1 nobody nobody 0 Aug 13 21:38 newfile

So, if you want NFSv4.x to resolve names properly (and I suspect you do), then you need to do one of the following on the Docker host:

a) Add users and groups locally to the passwd/group files

b) Configure SSSD to query LDAP

Naturally, I like things to be consistent, so I chose option b.

Since we already have an LDAP server in AD, we can just install/configure sssd to use that. Here’s what you’d do…

Install necessary packages

I like using realm and sssd. It’s fun. It’s easy. This is what you need to do that.

yum -y install realmd sssd oddjob oddjob-mkhomedir adcli samba-common krb5-workstation ntp

Configure DNS (/etc/resolv.conf)

This needs to point to the name servers in your AD domain.
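Mine is about as simple as it gets (the values match my NTAP.LOCAL lab and are placeholders for your own environment):

# cat /etc/resolv.conf
search ntap.local
nameserver 10.x.x.x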

Create a generic machine account in AD for LDAP authentication

This will be how our LDAP clients bind. You can use it for more than one client if you like. This can be done via PowerShell.

PS C:\> import-module activedirectory
PS C:\> New-ADComputer -Name [computername] -SAMAccountName [computername] -DNSHostName [computername.dns.domain.com] -OtherAttributes @{'userAccountControl'= 2097152;'msDS-SupportedEncryptionTypes'=27}

Create a keytab file to copy to the Docker host to use for LDAP binds

This is done on the AD domain controller in a CLI window. Use ktpass and this syntax:

ktpass -princ primary/instance@REALM -mapuser [DOMAIN]\machine$ -crypto AES256-SHA1 +rndpass -ptype KRB5_NT_PRINCIPAL +Answer -out [file:\location]

Copy the keytab file to your Docker host

WinSCP is a good tool to do this. It should live as /etc/krb5.keytab
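Once it’s in place, a quick sanity check with klist confirms the principal and encryption types stored in the keytab:

# klist -kte /etc/krb5.keytab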

Configure /etc/krb5.conf

Here’s an example (my additions are the default_realm setting, the NTAP.LOCAL realm entry and the ntap.local domain_realm mappings):

# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

[libdefaults]
dns_lookup_realm = false
ticket_lifetime = 30d
renew_lifetime = 30d
forwardable = true
rdns = false
# default_realm = EXAMPLE.COM
default_ccache_name = KEYRING:persistent:%{uid}

default_realm = NTAP.LOCAL
[realms]
# EXAMPLE.COM = {
# kdc = kerberos.example.com
# admin_server = kerberos.example.com
# }

NTAP.LOCAL = {
}

[domain_realm]
# .example.com = EXAMPLE.COM
# example.com = EXAMPLE.COM
ntap.local = NTAP.LOCAL
.ntap.local = NTAP.LOCAL

Configure the sssd.conf file to point to LDAP

This is what mine looks like. Note the LDAP URI and SASL authid. I also set “use_fully_qualified_names” to false.

[domain/default]
cache_credentials = False
case_sensitive = False
enumerate = True

[sssd]
config_file_version = 2
services = nss, pam, autofs
domains = NTAP.local
debug_level = 7
[nss]
filter_users = root,ldap,named,avahi,haldaemon,dbus,radiusd,news,nscd
filter_groups = root
[pam]
[domain/DOMAIN]
auth_provider = krb5
chpass_provider = krb5
id_provider = ldap
ldap_search_base = dc=ntap,dc=local
ldap_schema = rfc2307bis
ldap_sasl_mech = GSSAPI
ldap_user_object_class = user
ldap_group_object_class = group
ldap_user_home_directory = unixHomeDirectory
ldap_user_principal = userPrincipalName
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
ldap_user_search_base = cn=Users,dc=ntap,dc=local
ldap_group_search_base = cn=Users,dc=ntap,dc=local
ldap_sasl_authid = root/krb-container.ntap.local@NTAP.LOCAL
krb5_server = ntap.local
krb5_realm = NTAP.LOCAL
krb5_kpasswd = ntap.local
use_fully_qualified_names = false

Run authconfig to enable SSSD, then start the service

authconfig --enablesssd --enablesssdauth --updateall

systemctl start sssd

Test LDAP functionality/name lookup

You can use “getent” or “id” to look names up.

# id prof1
uid=1100(prof1) gid=1101(ProfGroup) groups=1101(ProfGroup),1203(group3),1202(group2),1201(group1),1220(sharedgroup)

Configure /etc/idmapd.conf with the NFSv4.x domain

Just this single line needs to be added. It needs to match the NFSv4 ID domain configured on the ONTAP SVM.

Domain = [NTAP.LOCAL]

Creating your container

I used the centos/http:latest as the base image, and am running with systemd. I also am copying a few config files to the container to ensure it functions properly and then running a script afterwards.

Here’s the dockerfile I used to create a container that could do NFSv4.x, Kerberos and LDAP. You can also find it on GitHub here:

https://github.com/whyistheinternetbroken/centos-kerberos-nfsv4-sssd

FROM centos/httpd:latest
ENV container docker

# Copy the dbus.service file from systemd to location with Dockerfile
ADD dbus.service /usr/lib/systemd/system/dbus.service

VOLUME ["/sys/fs/cgroup"]
VOLUME ["/run"]

CMD ["/usr/lib/systemd/systemd"]

RUN yum -y install centos-release-scl-rh && \
yum -y install --setopt=tsflags=nodocs mod_ssl
RUN yum -y update; yum clean all
RUN yum -y install --setopt=tsflags=nodocs sssd sssd-dbus adcli krb5-workstation ntp realmd oddjob oddjob-mkhomedir samba-common samba-common-tools nfs-utils; yum clean all

## Systemd cleanup base image
RUN (cd /lib/systemd/system/sysinit.target.wants && for i in *; do [ $i == systemd-tmpfiles-setup.service ] || rm -vf $i; done) && \
rm -vf /lib/systemd/system/multi-user.target.wants/* && \
rm -vf /etc/systemd/system/*.wants/* && \
rm -vf /lib/systemd/system/local-fs.target.wants/* && \
rm -vf /lib/systemd/system/sockets.target.wants/*udev* && \
rm -vf /lib/systemd/system/sockets.target.wants/*initctl* && \
rm -vf /lib/systemd/system/basic.target.wants/* && \
rm -vf /lib/systemd/system/anaconda.target.wants/*

# Copy the local SSSD conf file
RUN mkdir -p /etc/sssd
COPY sssd.conf /etc/sssd/sssd.conf

# Copy the local krb files
COPY krb5.keytab /etc/krb5.keytab
COPY krb5.conf /etc/krb5.conf

# Copy the NFSv4 IDmap file
COPY idmapd.conf /etc/idmapd.conf

#Copy the DNS config
COPY resolv.conf /etc/resolv.conf

# Copy rc.local
COPY rc.local /etc/rc.d/rc.local

# start services
ADD configure-nfs.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/configure-nfs.sh
RUN chmod +x /etc/rc.d/rc.local

You’ll notice that I have several COPY commands in the file. Those are config files you’d need to modify to reflect your own environment. You’ll want to store these in the same folder as your dockerfile. I’ll break down each file here.
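For reference, here’s roughly what my build context folder looks like before kicking off the build (filenames assumed from the dockerfile above):

# ls /dockerfiles
configure-nfs.sh  dbus.service  dockerfile.kerb  idmapd.conf  krb5.conf  krb5.keytab  rc.local  resolv.conf  sssd.conf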

dbus.service

This file is the same file as the one on the Docker host. It allows the container to run with systemd. Simply copy it from /usr/lib/systemd/system/dbus.service into your dockerfile folder location.

sssd.conf

This is our LDAP file and specifies how LDAP does its queries. While NFSv4.x will use the Docker host’s users for NFSv4.x mapping, the container will still need to know who a user is to allow us to su, kinit, etc. For this, you can essentially use the same config file you used for your Docker host.

krb5.keytab and krb5.conf

The krb5.keytab file is used to authenticate/bind to LDAP only in this case. So, use the same keytab file you created earlier. Same for the krb5.conf file, unless your containers are going to leverage a different KDC/domain than the Docker host. In that case, it gets a little more complicated. Just copy the Docker host’s krb5.keytab and krb5.conf files from /etc.

idmapd.conf

Again, same file as the Docker host. This defines our idmap domain for NFSv4.x.

resolv.conf

DNS information; should match what’s on the Docker host.

rc.local

This file is useful for running our configuration script. We need the script to run because the container won’t let you start services before it’s running. When you try, you get this error (or something similar):

Failed to get D-Bus connection: No connection to service manager.

This is the line I added to my rc.local:

/usr/local/bin/configure-nfs.sh

That leads us to the script…

configure-nfs.sh

This script starts services. It also joins the container to the Kerberos realm. While I’m using AD KDCs, you can also use realm join to join Linux-based KDCs. Maybe one day I’ll set one up and write up a guide, but for now, read the Linux KDC docs. 🙂

For the realm join, I’m passing the password with the command. It’s in plaintext, so I’d recommend not using a domain admin here. Realm join uses administrator by default, but you can specify a different user with the -U option. So, you can either create a user that *only* has access to create/delete objects in a specific OU, or leave the password portion out and have users enter the password when the container starts.

I’d also highly recommend creating a new OU in AD to house all your container machine objects. Otherwise, you’ll see your default OU get flooded with these:

krb-containers

So, configure an OU or CN in AD and then point realm join to use that.
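Creating that OU is a one-liner in PowerShell (the OU name here matches the --computer-ou value used in my script below; adjust for your own domain):

PS C:\> New-ADOrganizationalUnit -Name Docker -Path "DC=NTAP,DC=local"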

docker-ou.png

Here’s my shell script:

#!/bin/sh
# Start dbus first so the systemctl calls work, then bring up the NFS Kerberos/ID mapping services and SSSD
systemctl start dbus
systemctl start rpcgssd
systemctl start rpcidmapd
systemctl restart sssd
# Join the AD Kerberos realm; use a low-privilege user and a dedicated OU for the container machine accounts
echo PASSWORD| realm join -U username --computer-ou OU=Docker NTAP.LOCAL

Realm join caveat

In my config, I’ve done something “clever.” When you join a realm on a Linux client, it will also configure SSSD to pull UNIX IDs from AD. It doesn’t use the uid field by default. Instead, it creates a UID based on the AD SID. Thus, user student1 might look like this from LDAP (as expected):

# id student1
uid=1301(student1) gid=1201(group1) groups=1201(group1),1220(sharedgroup),1203(group3)

But would look like this from SSSD’s algorithm:

# id student1@NTAP.LOCAL
uid=1587401108(student1@NTAP.local) gid=1587400513(domainusers@NTAP.local) groups=1587400513(domainusers@NTAP.local),1587401107(group3@NTAP.local),1587401105(group1@NTAP.local),1587401122(sharedgroup@NTAP.local)

ONTAP doesn’t really know how to query UIDs in the way SSSD does, so we’d need SSSD to be able to look up our UNIX users, but also be able to query AD users that may not have UNIX attributes populated. To control that, I set my sssd.conf file to do the following:

  • When a username is specified without a FQDN, SSSD looks it up in normal LDAP
  • When a username is specified with a FQDN, SSSD uses the algorithm

I controlled this with the SSSD option use_fully_qualified_names. I set it to false for my UNIX users. When realm join is run, it appends to the sssd.conf file and uses the default value of use_fully_qualified_names, which is “true.”

Here’s what realmd adds to the file:

[domain/NTAP.local]
ad_domain = NTAP.local
krb5_realm = NTAP.LOCAL
realmd_tags = manages-system joined-with-samba
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad

Build your container!

That’s pretty much it. Once you have your Docker host and ONTAP cluster configured, Kerberizing NFS in containers is a breeze. Simply build your Docker container using the dockerfile:

docker build -f /dockerfiles/dockerfile.kerb -t parisi/centos-krb-client .

And then run it in privileged mode. The following also shows us specifying a volume that has been created using NetApp Trident. (see below for my Trident config.json file)

docker run --rm -it --privileged -d -v kerberos:/kerberos parisi/centos-krb-client

And then you can exec the container and start using Kerberos!

# docker exec -ti 330e10f7db1d bash

# su student1
sh-4.2$ klist
klist: Credentials cache keyring 'persistent:1301:1301' not found
sh-4.2$ kinit
Password for student1@NTAP.LOCAL:
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
08/16/18 14:52:58 08/17/18 00:52:58 krbtgt/NTAP.LOCAL@NTAP.LOCAL
renew until 08/23/18 14:52:55, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
sh-4.2$ cd /kerberos
sh-4.2$ klist -e
Ticket cache: KEYRING:persistent:1301:1301
Default principal: student1@NTAP.LOCAL

Valid starting Expires Service principal
08/16/18 14:53:09 08/17/18 00:52:58 nfs/demo.ntap.local@NTAP.LOCAL
renew until 08/23/18 14:52:55, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
08/16/18 14:52:58 08/17/18 00:52:58 krbtgt/NTAP.LOCAL@NTAP.LOCAL
renew until 08/23/18 14:52:55, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
sh-4.2$ ls -la
total 8
drwxrwxrwx. 2 root root 4096 Aug 15 15:46 .
drwxr-xr-x. 18 root root 4096 Aug 15 19:02 ..
-rw-r--r--. 1 prof1 ProfGroup 0 Aug 15 15:41 newfile
-rw-r--r--. 1 student1 group1 0 Aug 14 21:08 newfile2
-rw-r--r--. 1 root daemon 0 Aug 15 15:41 newfile3
-rw-r--r--. 1 student1 group1 0 Aug 15 15:46 newfile4
-rw-r--r--. 1 prof1 ProfGroup 0 Aug 13 20:57 prof1file
-rw-r--r--. 1 student1 group1 0 Aug 13 20:58 student1
-rw-r--r--. 1 student2 group2 0 Aug 13 21:12 student2

You can also push the docker image up to your repository and pull it down on any Docker host you like, provided that Docker host is configured as we mentioned.
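Pushing it to a registry is standard Docker fare; here’s a hedged example (registry.example.com is a placeholder for whatever registry you use):

# docker tag parisi/centos-krb-client registry.example.com/parisi/centos-krb-client:latest
# docker push registry.example.com/parisi/centos-krb-client:latest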

BONUS ROUND: Trident config.json


Did you know you could control mount options and export policy rules with Trident?

Get Trident here!

Just use the config.json file to do that. In my file, every volume mounts as NFSv4.1 with Kerberos security and a Kerberos export policy.

{
"version": 1,
"storageDriverName": "ontap-nas",
"managementLIF": "10.x.x.x",
"dataLIF": "10.x.x.x",
"svm": "DEMO",
"username": "admin",
"password": "PASSWORD",
"aggregate": "aggr1_node1",
"exportPolicy": "kerberos",
"nfsMountOptions": "-o vers=4.1,sec=krb5",
"defaults": {
"exportPolicy": "kerberos"
}
}

Happy Kerberizing!

Docker + NFS + FlexGroup volumes = Magic!


A couple of years ago, I wrote up a blog on using NFS with Docker as I was tooling around with containers, in an attempt to wrap my head around them. Then, I never really touched them again and that blog got a bit… stale.

Why stale?

Well, in that blog, I had to create a bunch of kludgy hacks to get NFS to work with Docker, and honestly, it likely wasn’t even the best way to do it, given my lack of overall Docker knowledge. More recently, I wrote up a better approach for Kerberizing NFS mounts in Docker containers.

Luckily, realizing that I’m not the only one who wants to use Docker but may not know all the ins and outs, NetApp developers created a NetApp plugin to use with Docker that will do all the volume creation, removal, etc for you. Then, you can leverage the Docker volume options to mount via NFS. That plugin is named “Trident.”


Trident + NFS

Trident is an open source storage provisioner and orchestrator for the NetApp portfolio.

You can read more about it here:

https://netapp.io/2016/12/23/introducing-trident-dynamic-persistent-volume-provisioner-kubernetes/

You can also read about how we use it for AI/ML here:

https://www.theregister.co.uk/2018/08/03/netapp_a800_pure_airi_flashblade/

When you’re using the Trident plugin, you can create Docker-ready NFS exported volumes in ONTAP to provide storage to all of your containers just by specifying the -v option during your “docker run” commands.

For example, here’s a NFS exported volume created using the Trident plugin:

# docker volume create -d netapp --name=foo_justin
foo_justin
# docker volume ls
DRIVER VOLUME NAME
netapp:latest foo_justin

Here’s what shows up on the ONTAP system:

::*> vol show -vserver DEMO -volume netappdvp_foo_justin -fields policy
vserver volume               policy
------- -------------------- -------
DEMO    netappdvp_foo_justin default

Then, I can just start up the container using that volume:

# docker run --rm -it -v foo_justin:/foo alpine ash
/ # mount | grep justin
10.x.x.x:/netappdvp_foo_justin on /foo type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.193.67.237,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.x.x.x)

Having a centralized NFS storage volume for your containers has a vast number of use cases: it provides read and write access to the same location across a network, on a high-performing storage system with all sorts of data protection capabilities to ensure high availability and resiliency.

Customization of Volumes

With the Trident plugin, you have the ability to modify the config files to change attributes from the defaults, such as custom names, size, export policies and others. See the full list here:

http://netapp-trident.readthedocs.io/en/latest/docker/install/ndvp_ontap_config.html

Trident + NFS + FlexGroup Volumes

Starting in Trident 18.07, a new Trident NAS driver was added that supports creation of FlexGroup volumes with Docker.

To change the plugin, change the /etc/netappdvp/config.json file to use the FlexGroup driver.

{
"version": 1,
"storageDriverName": "ontap-nas-flexgroup",
"managementLIF": "10.x.x.x",
"dataLIF": "10.x.x.x.",
"svm": "DEMO",
"username": "admin",
"password": "********",
"aggregate": "aggr1_node1",
}

Then, create your FlexGroup volume. It’s that simple!

A word of advice, though. The FlexGroup driver defaults to a 1GB volume built from 8 member volumes spread across your aggregates, which means each member volume is only 128MB. That’s problematic for a couple of reasons:

  • FlexGroup volumes should have members that are no less than 100GB in size (as per TR-4571) – small members will affect performance due to member volumes doing more remote allocation than normal
  • Files that get written to the FlexGroup will fill up 128MB pretty fast, causing the FlexGroup to appear to be out of space.

You can fix this either by setting the config.json file to use larger sizes, or specifying the size up front in the Docker volume command. I’d recommend using the config file and overriding the defaults.

To set this in the config file, just specify “size” as a variable (a full list of options can be found here: https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/backends/ontap.html):

{
    "version": 1,
    "storageDriverName": "ontap-nas-flexgroup",
    "managementLIF": "10.0.0.1",
    "dataLIF": "10.0.0.2",
    "svm": "svm_nfs",
    "username": "vsadmin",
    "password": "secret",
    "defaults": {
      "size": "800G",
      "spaceReserve": "volume",
      "exportPolicy": "myk8scluster"
    }}

Since the volumes default to thin provisioned, you shouldn’t worry too much about storage space, unless you think your clients will fill up 800GB. If that’s the case, you can apply quotas to the volumes if needed to limit how much space can be used. (For FlexGroups, quota enforcement will be available in an upcoming release; FlexVols can currently use quota enforcement)

# docker volume create -d netapp --name=foo_justin_fg -o size=1t
foo_justin_fg

And this is what the volume looks like in ONTAP:

::*> vol show -vserver DEMO -volume netappdvp_foo_justin* -fields policy,is-flexgroup,aggr-list,size,space-guarantee 
vserver volume                  aggr-list               size policy  space-guarantee is-flexgroup
------- ----------------------- ----------------------- ---- ------- --------------- ------------
DEMO netappdvp_foo_justin_fg    aggr1_node1,aggr1_node2 1TB  default none            true

Since the FlexGroup is 1TB in size, the member volumes will be 128GB, which fulfills the 100GB minimum. Future releases will enforce this without you having to worry about it.

::*> vol show -vserver DEMO -volume netappdvp_foo_justin_fg_* -fields aggr-list,size -sort-by aggr-list
vserver volume                        aggr-list   size
------- ----------------------------- ----------- -----
DEMO    netappdvp_foo_justin_fg__0001 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0003 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0005 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0007 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0002 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0004 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0006 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0008 aggr1_node2 128GB
8 entries were displayed.

Practical uses for FlexGroups with containers

It’s cool that we *can* provision FlexGroup volumes with Trident for use with containers, but does that mean we should?

Well, consider this…

In an ONTAP cluster that uses FlexVol volumes for NFS storage presented to containers, I am going to be bound to a single node’s resources, as per the design of a FlexVol. This means that even though I bought a 4 node cluster, I can only use 1 node’s RAM, CPU, network, capacity, etc. If I have a use case where thousands of containers spin up at any given moment and attach themselves to a NFS volume, then I might see some performance bottlenecks due to the increased load. In most cases, that’s fine – but if you could get more out of your storage, wouldn’t you want to do that?

docker-flexvol

You could add layers of automation into the mix to add more FlexVols to the solution, but then you have new mount points/folders. And what if those containers all need to access the same data?

docker-flexvol2

With a FlexGroup volume that gets presented to those same Docker instances, the containers now can leverage all nodes in the cluster, use a single namespace and simplify the overall automation structure.

docker-flexgroup.png

The benefits become even more evident when those containers are constantly writing new files to the NFS mount, such as in an Artificial Intelligence/Machine Learning use case. FlexGroups were designed to handle massive amounts of file creations and can provide 2-6x the performance over a FlexVol in use cases where we’re constantly creating new files.

Stay tuned for some more information on how FlexGroups and Trident can bring even more capability to the table to AI/ML workloads. In the meantime, you can learn more about NetApp solutions for AI/ML here:

https://www.netapp.com/us/solutions/applications/ai-deep-learning.aspx

Behind the Scenes: Episode 148 – An Introduction to Cloud Volume Services

Welcome to Episode 148, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invited Eiki Hrafnsson (@EirikurH), Chief Architect, Data Fabric, and Ingo Fuchs (@IngoFuchs), Sr. Manager, Product Marketing, Cloud, to give us the lowdown on what Cloud Volume Services are and where people are expected to use them.

You can also check out Eiki’s Cloud Field Day presentation on Cloud Volume Services here:

http://techfieldday.com/appearance/netapp-presents-at-cloud-field-day-3/

Also, you can investigate Cloud Volume Services on your own here:

https://cloud.netapp.com/cloud-volumes

Or check out this demo with Ingo.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

 

How to find average file size and largest file size using XCP

If you use NetApp ONTAP to host NAS shares (CIFS or NFS) and have too many files and folders to count, then you know how challenging it can be to figure out file information in your environment in a quick, efficient and effective way.

This becomes doubly important when you are thinking of migrating NAS data from FlexVol volumes to FlexGroup volumes, because there is some work up front that needs to be done to ensure you size the capacity of the FlexGroup and its member volumes correctly. TR-4571 covers some of that in detail, but it basically says “know your average file size.” It currently doesn’t tell you *how* to do that (though it will eventually). This blog attempts to fill that gap.

XCP

I’ve written previously about XCP here:

Generally speaking, it’s been to tout the data migration capabilities of the tool. But, in this case, I want to highlight the “xcp scan” capability.

XCP scan allows you to use multiple, parallel threads to analyze an unstructured NAS share much more quickly than you could with basic tools like rsync, du, etc.

The NFS version of XCP also allows you to output this scan to a file (HTML, XML, etc) to generate a report about the scanned data. It even does the math for you and finds the largest (max) file size and average file size!

xcpfilesize

The command I ran to get this information was:

# xcp scan -md5 -stats -html SERVER:/volume/path > filename.html

That’s it! XCP will scan and write to a file. You can also get info about the top five file consumers (by number and capacity) by owner, as well as get some nifty graphs. (Pro tip: Managers love graphs!)

xcp-graphs

What if I only have SMB/CIFS data?

Currently, XCP for SMB doesn’t support output to HTML files. But that doesn’t mean you can’t have fun, too!

You can stand up a VM using CentOS or whatever your favorite Linux kernel is and use XCP for NFS to scan data – provided the client has the necessary access to do so and you can score an NFS license (even if it’s eval). XCP scans are read-only, so you shouldn’t have issues running them.

Just keep in mind the following:

Shares that have traditionally been SMB/CIFS-only are likely NTFS security style. This means that the user you are accessing the data as over NFS (for example, root) should map to a valid Windows user that has read access to the data. NFS clients that access NTFS security style volumes map to Windows users to figure out permissions. I cover that here:

Mixed perceptions with NetApp multiprotocol NAS access

You can check the volume security style in two ways:

  • CLI with the command
    ::> volume show -volume [volname] -fields security-style
  • OnCommand System Manager under the “Storage -> Qtrees” section (yea, yea… I know. Volumes != Qtrees)

ocsm-qtree

To check if the user you are attempting to access the volume via NFS with maps to a valid and expected Windows user, use this CLI command from diag privilege:

::> set diag
::*> diag secd name-mapping show -node node1 -vserver DEMO -direction unix-win -name prof1

'prof1' maps to 'NTAP\prof1'

To see what Windows groups this user would be a member of (and thus would get access to files and folders that have those groups assigned), use this diag privilege command:

::*> diag secd authentication show-creds -node ontap9-tme-8040-01 -vserver DEMO -unix-user-name prof1

UNIX UID: prof1 <> Windows User: NTAP\prof1 (Windows Domain User)

GID: ProfGroup
 Supplementary GIDs:
 ProfGroup
 group1
 group2
 group3
 sharedgroup

Primary Group SID: NTAP\DomainUsers (Windows Domain group)

Windows Membership:
 NTAP\group2 (Windows Domain group)
 NTAP\DomainUsers (Windows Domain group)
 NTAP\sharedgroup (Windows Domain group)
 NTAP\group1 (Windows Domain group)
 NTAP\group3 (Windows Domain group)
 NTAP\ProfGroup (Windows Domain group)
 Service asserted identity (Windows Well known group)
 BUILTIN\Users (Windows Alias)
 User is also a member of Everyone, Authenticated Users, and Network Users

Privileges (0x2080):
 SeChangeNotifyPrivilege

If you want to run XCP as root and want it to have administrator level access, you can create a name mapping. This is what I have in my SVM:

::> vserver name-mapping show -vserver DEMO -direction unix-win

Vserver: DEMO
Direction: unix-win
Position Hostname         IP Address/Mask
-------- ---------------- ----------------
1        -                -                Pattern: root
                                           Replacement: DEMO\\administrator

To create a name mapping for root to map to administrator:

::> vserver name-mapping create -vserver DEMO -direction unix-win -position 1 -pattern root -replacement DEMO\\administrator

Keep in mind that backup software often has this level of rights to files and folders, and the XCP scan is read-only, so there shouldn’t be any issue. If you are worried about making root an administrator, create a new Windows user for it to map to (for example, DOMAIN\xcp) and add it to the Backup Operators Windows Group.
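If you go that route, here’s a hedged PowerShell sketch for creating that user and adding it to Backup Operators (the account name and prompt-for-password approach are just examples):

PS C:\> New-ADUser -Name xcp -SamAccountName xcp -AccountPassword (Read-Host -AsSecureString "Password") -Enabled $true
PS C:\> Add-ADGroupMember -Identity "Backup Operators" -Members xcp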

In my lab, I ran a scan on a NTFS security style volume called “xcp_ntfs_src”:

::*> vserver security file-directory show -vserver DEMO -path /xcp_ntfs_src

Vserver: DEMO
 File Path: /xcp_ntfs_src
 File Inode Number: 64
 Security Style: ntfs
 Effective Style: ntfs
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 0
 UNIX Mode Bits: 777
 UNIX Mode Bits in Text: rwxrwxrwx
 ACLs: NTFS Security Descriptor
 Control:0x8014
 Owner:NTAP\prof1
 Group:BUILTIN\Administrators
 DACL - ACEs
 ALLOW-BUILTIN\Administrators-0x1f01ff-OI|CI
 ALLOW-DEMO\Administrator-0x1f01ff-OI|CI
 ALLOW-Everyone-0x100020-OI|CI
 ALLOW-NTAP\student1-0x120089-OI|CI
 ALLOW-NTAP\student2-0x120089-OI|CI

I used this command and nearly 600,000 objects were scanned in 25 seconds:

# xcp scan -md5 -stats -html 10.x.x.x:/xcp_ntfs_src > xcp-ntfs.html
XCP 1.3D1-8ae2672; (c) 2018 NetApp, Inc.; Licensed to Justin Parisi [NetApp Inc] until Tue Sep 4 13:23:07 2018

126,915 scanned, 85,900 summed, 43.8 MiB in (8.75 MiB/s), 14.5 MiB out (2.89 MiB/s), 5s
 260,140 scanned, 187,900 summed, 91.6 MiB in (9.50 MiB/s), 31.3 MiB out (3.34 MiB/s), 10s
 385,100 scanned, 303,900 summed, 140 MiB in (9.60 MiB/s), 49.9 MiB out (3.71 MiB/s), 15s
 516,070 scanned, 406,530 summed, 187 MiB in (9.45 MiB/s), 66.7 MiB out (3.36 MiB/s), 20s
Sending statistics...
 594,100 scanned, 495,000 summed, 220 MiB in (6.02 MiB/s), 80.5 MiB out (2.56 MiB/s), 25s
594,100 scanned, 495,000 summed, 220 MiB in (8.45 MiB/s), 80.5 MiB out (3.10 MiB/s), 25s.

This was the resulting report:

xcp-ntfs

Happy scanning!

Behind the Scenes: Episode 145 – AI, Machine Learning and ONTAP with Santosh Rao

Welcome to Episode 145, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, NetApp Senior Technical Director Santosh Rao (@santorao) joins us to talk about how NetApp and NVidia are partnering to enhance AI solutions with the DGX-1, ONTAP and FlexGroup volumes using NFS!

You can find more information in the following links:

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

New NetApp FlexPod Cisco Validated Designs available!


In case you weren’t aware, NetApp partners with other vendors, such as Cisco, to create bundled architectures that are co-supported and simple to deploy. These converged infrastructure products are known as “FlexPods” and they go through a rigorous testing and validation process by both NetApp and Cisco to ensure the highest probability of success in deployments.

There are two new Cisco Validated Designs (CVDs) available. I’ll basically be plagiarizing the announcement emails that went out here, because I like converging communications, too. 🙂

FlexPod Datacenter (AFF A300) with Citrix XenDesktop/XenApp for 6000 Seats (credit to Chris Gebhardt – @chrisgeb)

This CVD is a new milestone in user density on the A300 platform running ONTAP 9.3 at 6000 users.  In this CVD we were able to observe peak total values of 83,000 IOPS and a bandwidth of 1750 MB/s at a latency of less than 1ms.  This low latency translates directly into an excellent end-user experience on mixed use case scale testing scenario.

https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/cisco_ucs_xd715esxi65u1_flexpod.html

The solution provides customers with a blueprint for deploying a large-scale mixed use case of persistent, non-persistent and RDS Windows 10 virtual machines on FlexPod. Cisco & NetApp have stress-tested the limits of the infrastructure during many different scenarios.

Some of the key values of this solution include:

  • The NetApp AFF A300 array provides industry-leading storage solutions that efficiently handle the most demanding I/O bursts (for example, login storms), profile management, and user data management; deliver simple and flexible business continuance; and help reduce storage cost per desktop.
  • The NetApp AFF A300 array provides a simple-to-understand storage architecture for hosting all user data components (VMs, profiles, user data) on the same storage array.
  • NetApp ONTAP 9.3 software enables you to seamlessly add, upgrade or remove storage from the infrastructure to meet the needs of the virtual desktops.
  • NetApp Virtual Storage Console (VSC) for the VMware vSphere hypervisor has deep integrations with vSphere, providing easy-button automation for key storage tasks such as storage repository provisioning, storage resize and data deduplication, directly from vCenter.

More information on this and other solutions can be found here:

Cisco NetApp Website: http://www.cisconetapp.com/

All Flash FAS Datasheet: http://www.netapp.com/us/media/ds-3582.pdf

Additional FlexPod solutions with All Flash FAS, AltaVault, SAP HANA, Microsoft SharePoint, VMware vSphere, Cisco UCS Director and more, can be found:  http://www.netapp.com/us/solutions/flexpod/datacenter/validated-designs.aspx

FlexPod Datacenter with VMware 6.5 Update1 and Cisco ACI 3.1 Solution (credit to Dave Derry)

I am pleased to announce the release of an updated FlexPod Solution –  FlexPod Datacenter with VMware 6.5 Update1 and Cisco ACI 3.1.  This validated FlexPod design (CVD) refreshes FlexPod with All Flash FAS with ONTAP 9.3, Cisco ACI 3.1, and VMware vSphere 6.5 u1.

flexpod-aci.png

This solution continues the momentum of the huge success of NetApp All Flash FAS along with over 5000 Cisco Nexus 9k customers and over 200 production deployments of Cisco ACI in enterprises and service providers.

FlexPod with All Flash FAS and Cisco ACI is the optimal shared infrastructure to deploy a variety of workloads.  FlexPod provides a platform that is both flexible and scalable for multiple use cases and applications.   From virtual desktop infrastructure to SAP®, FlexPod can efficiently and effectively support business-critical applications running simultaneously from the same shared infrastructure. The flexibility and scalability of FlexPod also enable customers to create a right-sized infrastructure that can grow with and adapt to their evolving business requirements.

FlexPod is the leading converged infrastructure, with the lowest storage cost per TB of any converged infrastructure in the market according to IDC. Couple that with All Flash FAS, which increases application performance by 20X, and Cisco ACI, which provides up to 83% faster network provisioning times for applications compared to existing network provisioning tools.

Use Case Summary

This infrastructure solution validates the design, including redundancy, failure and recovery testing:

  • UCS B200-M4 and M5 compute servers, 6300-series fabric interconnects, running UCS Manager 3.2
  • Nexus N9k running in ACI Fabric Mode, 40G end-to-end
  • Cisco APIC managing the Nexus 9k fabric, running ACI 3.1
  • AFF A300 controllers with ONTAP 9.3, with storage connected to the ACI fabric (NFS)
  • OnCommand System Manager, OnCommand Unified Manager, OnCommand Workflow Automation, NetApp Virtual Storage Console for vSphere
  • NetApp SnapDrive and SnapManager
  • VMware vSphere 6.5 u1
  • Validation of FC direct-connect and iSCSI for SAN boot and data access from the fabric interconnects to storage
  • FC, iSCSI, and NFS Storage access
  • NetApp storage QoS to limit load to keep CPU utilization at a recommended level

The FlexPod Datacenter with VMware 6.5 Update1 and Cisco ACI 3.1 design guide is located here:

https://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/UCS_CVDs/flexpod_esxi65u1_n9k_aci_design.html

To review additional validated FlexPod solutions including validated infrastructure and workload solutions, see this link:   http://www.netapp.com/us/solutions/flexpod/datacenter/validated-designs.aspx

We also will have a podcast soon that covers the CVD for OpenStack in a FlexPod. Stay tuned for that!

Life hack: Change the UID of all files in NAS volume… without actually changing anything!


There are a ton of hidden gems in ONTAP that go unnoticed because they get added without a lot of fanfare. For example, the volume recovery queue got added in 8.3 and no one really knew what it was, what it did, or why the volumes they deleted didn’t seem to actually get deleted for 24 hours.

I keep my ears open for these features so I can promote them, and I ran across a pretty slick, simple gem while at the NetApp Converge (sales kickoff) conference, from an old colleague from my support days who now does SE work. (Shout out to Maarten Lippmann!)

But, features are only as good as their use cases.

Here’s the scenario…

Let’s say you have a Git code repository with millions of files, owned by a number of different people, that one of your developers wants to access and make changes to. They don’t have access to some of those files by way of permissions, but there are way too many to re-permission effectively and in a timely manner. Plus, if you change the access to these files, you might break the code repo horribly.

So, how do you:

  • Create a usable copy of the entire code repo in a reasonable amount of time without eating up a ton of space
  • Assign a new owner to all the files in the volume quickly and easily
  • Keep the original repo intact

It’s actually pretty easy in ONTAP; in fact, it’s a single command. All you need is a FlexClone license and you can make an instant copy of a volume with a new file owner, without impacting the source volume and without using up any new space. Additionally, if you want to keep those changes, you can split the clone into its own unique volume.

In the following example, I have an existing volume that has a ton of files and folders, all owned by root:

[root@XCP nfs4]# ls -la
total 8012
d------r-x. 102 root root 8192 Apr 11 11:41 .
drwxr-xr-x. 5 root root 4096 Apr 12 17:20 ..
----------. 1 root root 0 Apr 11 11:29 file
d---------. 1002 root root 77824 Apr 11 11:47 topdir_0
d---------. 1002 root root 77824 Apr 11 11:47 topdir_1
...
d---------. 1002 root root 77824 Apr 11 11:47 topdir_99

I want the new owner of the files in the cloned volume to be a user named “prof1” and the GID to be 1101.

cluster::*> getxxbyyy getpwbyname -node ontap9-tme-8040-01 -vserver DEMO -username prof1
 (vserver services name-service getxxbyyy getpwbyname)
pw_name: prof1
pw_passwd:
pw_uid: 1100
pw_gid: 1101
pw_gecos:
pw_dir:
pw_shell:

So, I do the following:

cluster::*> vol clone create -vserver DEMO -flexclone clone -type RW -parent-vserver DEMO -parent-volume flexvol -junction-active true -foreground true -junction-path /clone -uid 1100 -gid 1101
[Job 12606] Job succeeded: Successful

cluster::*> vol show -vserver DEMO -volume clone -fields clone-volume,clone-parent-name,clone-parent-vserver
vserver volume clone-volume clone-parent-vserver clone-parent-name
------- ------ ------------ -------------------- -----------------
DEMO clone true DEMO flexvol

That command took literally 10 seconds to complete. There are over 1.8 million objects in that volume.

cluster::*> df -i /vol/clone
Filesystem iused ifree %iused Mounted on Vserver
/vol/clone/ 1824430 4401487 29% /clone DEMO

Then, I check the owner of the files:

cluster::*> vserver security file-directory show -vserver DEMO /clone/nfs4

Vserver: DEMO
 File Path: /clone/nfs4
 File Inode Number: 96
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 1100
 UNIX Group Id: 1101
 UNIX Mode Bits: 5
 UNIX Mode Bits in Text: ------r-x
 ACLs: NFSV4 Security Descriptor
 Control:0x8014
 DACL - ACEs
 ALLOW-user-prof1-0x1601ff-FI|DI|IO
 ALLOW-user-student1-0x21-FI|DI|IO
 ALLOW-group-ProfGroup-0x1200a9-FI|DI|IO|IG
 ALLOW-EVERYONE@-0x1200a9

cluster::*> vserver security file-directory show -vserver DEMO /clone/nfs4/topdir_99

Vserver: DEMO
 File Path: /clone/nfs4/topdir_99
 File Inode Number: 3556
 Security Style: unix
 Effective Style: unix
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 1100
 UNIX Group Id: 1101
 UNIX Mode Bits: 0
 UNIX Mode Bits in Text: ---------
 ACLs: NFSV4 Security Descriptor
 Control:0x8004
 DACL - ACEs
 ALLOW-user-prof1-0x1601ff-FI|DI
 ALLOW-user-student1-0x21-FI|DI
 ALLOW-group-ProfGroup-0x1200a9-FI|DI|IG

And from the client:

[root@XCP nfs4]# pwd
/clone/nfs4

[root@XCP nfs4]# ls -la
total 8012
d------r-x. 102 1100 1101 8192 Apr 11 11:41 .
drwxr-xr-x. 5 1100 1101 4096 Apr 12 17:20 ..
----------. 1 1100 1101 0 Apr 11 11:29 file
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_0
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_1
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_10
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_11
d---------. 1002 1100 1101 77824 Apr 11 11:47 topdir_12

It shouldn’t be that easy, should it?

If I wanted to split the volume off into its own volume (such as when a dev makes changes and wants to keep them, but doesn’t want to change the source volume):

cluster::*> vol clone split
 estimate show start status stop
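For example, kicking off and checking a split would look something like this (volume names from my lab):

cluster::*> vol clone split start -vserver DEMO -flexclone clone
cluster::*> vol clone split show -vserver DEMO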

If I want to delete the clone after I’m done, I just run “volume destroy.”

Questions? Hit me up in the comments!

Behind the Scenes: Episode 134 – The Active IQ Story: Building a Data Pipeline for Machine Learning

Welcome to Episode 134, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, Active IQ Technical Director Shankar Pasupathy joins us to tell us how AutoSupport’s infrastructure and backend evolved into Active IQ’s multicloud data pipeline. Learn how NetApp is using big data analytics and machine learning on ONTAP to improve the overall customer experience.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here: