Why Is the Internet Broken: Greatest Hits

When I started this site back in October of 2014, it was mainly to drive traffic to my NetApp Insight sessions – and it worked.

(By the way… stay tuned for a blog on this year’s new Insight sessions by yours truly. Now with more lab!)

As I continued writing, my goal was to keep creating content – don’t be the guy who just shows up during conference season.

So far, so good.

But since I create so much content, it can be hard for new visitors to find, and the WordPress archives/table of contents is lacking. So, what I've done is create my own table of contents of the top 5 most-visited posts.

Top 5 Blogs (by number of visits)

TECH::Using NFS with Docker – Where does it fit in?

NetApp FlexGroup: An evolution of NAS

ONTAP 9.1 is now generally available (GA)!

TECH::Become a clustered Data ONTAP CLI Ninja

TECH::Data LIF best practices for NAS in cDOT 8.3

 

DataCenterDude

I also write for datacenterdude.com on occasion. To read those posts, go to this link:

My DataCenterDude stuff

How else do I find stuff?

You can also search on the site or click through the archives, if you choose. Or, subscribe to the RSS feed. If you have questions or want to see something changed or added to the site, follow me on Twitter @NFSDudeAbides or comment on one of the posts here!

You can also email me at whyistheinternetbroken@gmail.com.

Behind the Scenes: Episode 109 – ONTAP 9.3 Security Enhancements

Welcome to Episode 109, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

Note: If you’re looking for last week’s podcast (IBM Watson/Elio), it will be back up soon. It had to be reviewed before it could be officially published. It should be up as Episode 110 in a couple of days.

This week on the podcast, we cover the new security enhancements in ONTAP 9.3 with the security super squad, Juan Mojica (@Juan_M_Mojica, http://securitybrutesquad.blogspot.com) and Dan Tulledge (@Dan_Tulledge). Join us as we discuss Multifactor Authentication and NetApp Volume Encryption enhancements.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Kerberize your NFSv4.1 Datastores in ESXi 6.5 using NetApp ONTAP

How do I set up Kerberos with NFSv4.1 datastores in ESXi 6.5?

 

I have gotten this question enough now (and from important folks like Cormac Hogan) to write up an end-to-end guide on how to do this. I had a shorter version in TR-4073 and then a Linux-specific version in TR-4616, but admittedly, those weren’t really enough to help people get the job done. I also wrote up some less in-depth ESXi Kerberos steps in a previous blog post, How to set up Kerberos on vSphere 6.0 servers for datastores on NFS.

Cormac also wrote up one of his own:

https://cormachogan.com/2017/10/12/getting-grips-nfsv4-1-kerberos/

I will point out that, in general, NFS Kerberos is a pain in the ass for people who don’t set it up on a regular basis and understand its inner workings, regardless of the vendor you’re interacting with. The reason is that there are multiple moving parts involved, and support for various portions of Kerberos (such as encryption types) varies. Additionally, some hosts automate things better than others.

We’re going to set it all up as if we’ve only created a basic SVM with data LIFs and some volumes. If you have an existing SVM configured with NFS and such, you can retrofit as needed. While this blog covers only NFS Kerberos using ONTAP as the NFS server, the steps for AD and ESXi would apply to other NFS servers as well, in most cases.

Here’s the lab setup:

  • ONTAP 9.3 (but this works for ONTAP 9.0 and onward)
    • SVM is SVM1
  • Windows 2012R2 (any Windows KDC that supports AES will work)/Active Directory
    • DNS, KDC
    • Domain is core-tme.netapp.com
  • ESXi 6.5
    • Hostname is CX400S1-GWNSG6B

ONTAP Configuration

While there are quite a few steps here (13 in all), keep in mind that you generally only have to configure this once per SVM.

We’ll start with ONTAP, via the GUI. I’ll also include a CLI section. First, we’ll start in the SVM configuration section to minimize clicks. This is found under Storage -> SVMs. Then, click on the SVM you want to configure and then click on the NFS protocol.

1. Configure DNS

We need this because Kerberos uses DNS lookups to determine host names/IPs. This will be the Active Directory DNS information. This is found under Services -> DNS/DDNS.

2. Enable NFSv4.1

NFSv4.1 is needed to allow NFSv4.1 mounts (obviously). You don’t need to enable NFSv4.0 to use NFSv4.1 – ESXi doesn’t support v4.0 anyway – but it is possible to use v3, v4.0 and v4.1 in the same SVM. This can be done under “Protocols -> NFS” on the left menu.

3. Create an export policy and rule for the NFS Kerberos datastore volume

Export policies in ONTAP are containers for rules. Rules are what define the access to an exported volume. With export policy rules, you can limit the NFS version allowed, the authentication type, root access, hosts, etc. For ESXi, we’re defining the ESXi host (or hosts) in the rule. We’re allowing NFSv4 only and Kerberos only, and we’re allowing the ESXi host to have root access. If you use NFSv3 with Kerberos for these datastores, be sure to adjust the policy and rules accordingly. This is done under the “Policies” menu section on the left.

4. Verify that vsroot has an export policy and rule that allows read access to ESXi hosts

Vsroot is “/” in the namespace. As a result, for clients to mount NFS exports, they must have at least read access via vsroot’s export policy rule and at least traverse permissions (1 in mode bits) to navigate through the namespace. In most cases, vsroot uses “default” as the export policy. Verify whichever export policy is being used has the proper access.

If a policy doesn’t have a rule, create one. This is an example of minimum permissions needed for the ESXi host to traverse /.
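For reference, a minimal CLI version of that rule might look like the following (a sketch, assuming vsroot uses the “default” policy; adjust the policy name and clientmatch to your environment):

cluster::> export-policy rule create -vserver SVM1 -policyname default -clientmatch CX400S1-GWNSG6B.core-tme.netapp.com -rorule sys,krb5* -rwrule never -superuser none -protocol nfs -ruleindex 1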

5. Create the Kerberos realm

The Kerberos realm is akin to /etc/krb5.conf on Linux clients. It tells ONTAP where to look to attempt the bind/join to the KDC. After that, KDC servers are discovered by internal processes and won’t need the realm. The realm domain should be defined in ALL CAPS. This is done in System Manager using “Services -> Kerberos Realm” on the left.

6. Enable Kerberos on your data LIF (or LIFs)

To use NFS Kerberos, you need to tell ONTAP which data LIFs will participate in the requests. Doing this specifies a service principal for NFS on that data LIF. The SVM will interact with the KDC defined in the Kerberos realm to create a new machine object in AD that can be used for Kerberos. The SPN is defined as nfs/hostname.domain.com and represents the name you want clients to use for access. This FQDN needs to exist in DNS as a forward and reverse record to ensure things work properly. If you enable Kerberos on multiple data LIFs with the same SPN, the machine account gets re-used. If you use different SPNs on LIFs, different accounts get created. You have a 15-character limit for the “friendly” display name in AD. If you want to change the name later, you can; that’s covered in TR-4616.

7. Create local UNIX group and users

For Kerberos authentication, a krb-unix name mapping takes place, where the incoming SPN will attempt to map to a UNIX user that is either local on the SVM or in external name service servers. You always need the “nfs” user, as the nfs/fqdn SPN will map to “nfs” implicitly. The other user will depend on the user you specify in ESXi when you configure Kerberos. That UNIX user will use the same user name. In my example, I used “parisi,” which is a user in my AD domain. Without these local users, the krb-unix name mapping would fail and manifest as “permission denied” errors when mounting. The cluster would show errors calling out name mapping in “event log show.”

Alternatively, you can create name mapping rules. This is covered in TR-4616. UNIX users and groups can be created using the UNIX menu option under “Host Users and Groups” in the left menu.

The numeric GID and UID can be any unused numeric in your environment. I used 501 and 502.

First, create the primary group to assign to the users.

Then, create the NFS user with the Kerberos group as the primary group.

Finally, create the user you used in the Kerberos config in ESXi.

Failure to create the user would result in an error similar to the following in ONTAP (in the output below, I used a user named ‘vsphere’ to try to authenticate):

[ 713] FAILURE: User 'vsphere' not found in UNIX authorization source LDAP.
 [ 713] Entry for user-name: vsphere not found in the current source: LDAP. Ignoring and trying next available source
 [ 715] Entry for user-name: vsphere not found in the current source: FILES. Entry for user-name: vsphere not found in any of the available sources
 [ 717] Unable to map SPN 'vsphere@CORE-TME.NETAPP.COM'
 [ 717] Unable to map Kerberos NFS user 'vsphere@CORE-TME.NETAPP.COM' to appropriate UNIX user

8. Create the volume to be used as the datastore

This is done from “Storage -> Volumes.” In ONTAP 9.3, the only consideration is that you must specify a “Protection” option, even if it’s “none.” Otherwise, it will throw an error.

Once the volume is created, it automatically gets exported to /volname.
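You can verify the export path from the CLI; in this example, the junction path should come back as /kerberos_datastore:

cluster::> volume show -vserver SVM1 -volume kerberos_datastore -fields junction-path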

9. Verify the volume security style is UNIX for the datastore volume

The volume security style impacts how a client will attempt to authenticate into the ONTAP cluster. If a volume is NTFS security style, then NFS clients will attempt to map to Windows users to figure out the access allowed on an object. System Manager doesn’t let you define the security style at creation yet and will default to the security style of the vsroot volume (which is / in the namespace). Ideally, vsroot would also be UNIX security style, but in some cases, NTFS is used. For VMware datastores, there is no reason to use NTFS security style.

From the volumes screen, click on the newly created volume and click the “Edit” button to verify UNIX security style is used.
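If you prefer the CLI, you can check the security style (and fix it, if needed) like so:

cluster::> volume show -vserver SVM1 -volume kerberos_datastore -fields security-style
cluster::> volume modify -vserver SVM1 -volume kerberos_datastore -security-style unix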

10. Change the export policy assigned to the volume to the ESX export policy you created

Navigate to “Storage -> Namespace” to modify the export policy used by the datastore.
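The CLI equivalent, assuming the “ESX” policy created in step 3:

cluster::> volume modify -vserver SVM1 -volume kerberos_datastore -policy ESX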

11. Configure NTP

This prevents the SVM from drifting outside the 5-minute time skew that can break Kerberos authentication. This is done via the CLI; there’s no GUI support for this yet.

cluster::> ntp server create -server stme-infra02.core-tme.netapp.com -version auto

12. Set the NFSv4 ID domain

While we’re in the CLI, let’s set the ID domain. This ID domain is used for client-server interaction, where a user string will be passed for NFSv4.x operations. If the user string doesn’t match on each side, the NFS user gets squashed to “nobody” as a security mechanism. This would be the same domain string on both ESX and on the NFS server in ONTAP (case-sensitive). For example, “core-tme.netapp.com” would be the ID domain here and users from ESX would come in as user@core-tme.netapp.com. ONTAP would look for user@core-tme.netapp.com to exist.

In ONTAP, that command is:

cluster::> nfs modify -vserver SVM1 -v4-id-domain core-tme.netapp.com

13. Change datastore volume permissions

By default, volumes get created with the root user and group as the owner, and 755 access. In ESX, if you want to create VMs on a datastore, you’d need either root access or to change write permissions. When you use Kerberos, ESX will use the NFS credentials specified in the configuration as the user that writes to the datastore. Think of this as a “VM service account” more or less. So, your options are:

  • Change the owner to a different user than root
  • Use root as the user (which would need to exist as a principal in the KDC)
  • Change permissions

In my opinion, changing the owner is the best, most secure choice here. To do that:

cluster::> volume modify -vserver SVM1 -volume kerberos_datastore -user parisi

That’s all from ONTAP for the GUI. The CLI commands would be (all in admin priv):

cluster::> dns create -vserver SVM1 -domains core-tme.netapp.com -name-servers 10.193.67.181 -timeout 2 -attempts 1 -skip-config-validation true
cluster::> nfs modify -vserver SVM1 -v4.1 enabled
cluster::> export-policy create -vserver SVM1 -policyname ESX
cluster::> export-policy rule create -vserver SVM1 -policyname ESX -clientmatch CX400S1-GWNSG6B.core-tme.netapp.com -rorule krb5* -rwrule krb5* -allow-suid true -ruleindex 1 -protocol nfs4 -anon 65534 -superuser any
cluster::> vol show -vserver SVM1 -volume vsroot -fields policy
cluster::> export-policy rule show -vserver SVM1 -policy [policy from prev command] -instance
cluster::> export-policy rule modify or create (if changes are needed)
cluster::> kerberos realm create -vserver SVM1 -realm CORE-TME.NETAPP.COM -kdc-vendor Microsoft -kdc-ip 10.193.67.181 -kdc-port 88 -clock-skew 5 -adminserver-ip 10.193.67.181 -adminserver-port 749 -passwordserver-ip 10.193.67.181 -passwordserver-port 464 -adserver-name stme-infra02.core-tme.netapp.com -adserver-ip 10.193.67.181
cluster::> kerberos interface enable -vserver SVM1 -lif data -spn nfs/ontap9.core-tme.netapp.com
cluster::> unix-group create -vserver SVM1 -name kerberos -id 501
cluster::> unix-user create -vserver SVM1 -user nfs -id 501 -primary-gid 501
cluster::> unix-user create -vserver SVM1 -user parisi -id 502 -primary-gid 501
cluster::> volume create -vserver SVM1 -volume kerberos_datastore -aggregate aggr1_node1 -size 500GB -state online -policy ESX -user 0 -group 0 -security-style unix -unix-permissions ---rwxr-xr-x -junction-path /kerberos_datastore

ESXi Configuration

This is all driven through the vSphere GUI. This would need to be performed on each host that is being used for NFSv4.1 Kerberos.

1. Configure DNS

This is done under the “Hosts and Clusters -> Manage -> Networking -> TCP/IP config.”

2. Configure NTP

This is found in “Hosts and Clusters -> Settings -> Time Configuration.”

3. Join the ESXi host to the Active Directory domain

Doing this automatically creates the machine account in AD and transfers the keytab files between the host and KDC. It also sets the SPNs on the machine account. The user specified in the credentials must have permission to create objects in the Computers OU in AD (for example, a domain administrator).

This is found in “Hosts and Clusters -> Settings -> Authentication Services.”
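If you’d rather script the join, PowerCLI can do it as well. A rough sketch, assuming a vCenter at vcenter.core-tme.netapp.com (a placeholder) and a domain admin account:

PS C:\> Connect-VIServer -Server vcenter.core-tme.netapp.com
PS C:\> Get-VMHost CX400S1-GWNSG6B.core-tme.netapp.com | Get-VMHostAuthentication | Set-VMHostAuthentication -JoinDomain -Domain core-tme.netapp.com -Username administrator -Password '********'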

4. Specify NFS Kerberos Credentials

This is the user that will authenticate with the KDC and ONTAP for the Kerberos key exchange. This user name will be the same as the UNIX user you used in ONTAP. If you use a different name, create a new UNIX user in ONTAP or create a name mapping rule. If the user password changes in AD, you must also change it in ESXi.

With NFS Kerberos in ESX, the ID you specified in NFS Kerberos credentials will be the ID used to write. For example, I used “parisi” as the user. My SVM is using LDAP authentication with AD. That user exists in my LDAP environment as the following:

cluster::*> getxxbyyy getpwbyuid -node ontap9-tme-8040-01 -vserver SVM1 -userID 3629
  (vserver services name-service getxxbyyy getpwbyuid)
pw_name: parisi
pw_passwd: 
pw_uid: 3629
pw_gid: 512
pw_gecos: 
pw_dir: 
pw_shell: /bin/sh

As a result, the test VM I created got written as that user:

drwxr-xr-x   2 3629  512    4096 Oct 12 10:40 test

To even be able to write at all, I had to change the UNIX permissions on the datastore to allow write access to “others.” Alternatively, I could have changed the owner of the volume to the specified user. I mention those steps in the ONTAP section.

If you plan on changing the user for NFS creds, be sure to use “clear credentials,” which will restart the service and clear caches. Occasionally, you may need to restart the nfsgssd service from the CLI if something is stubbornly cached:

[root@CX400S1-03003-B3:/] /etc/init.d/nfsgssd restart
watchdog-nfsgssd: Terminating watchdog process with PID 33613
Waiting for process to terminate...
nfsgssd stopped
nfsgssd started

In rare cases, you may have to leave and re-join the domain, which will generate new keytabs. In one particularly stubborn case, I had to reboot the ESX server after I changed some credentials and the Kerberos principal name in ONTAP.

That’s the extent of the ESXi host configuration for now. We’ll come back to the host to mount the datastore once we make some changes in Active Directory.

Active Directory Configuration

Because there are variations in support for encryption types, as well as DNS records needed, there are some AD tasks that need to be performed to get Kerberos to work.

1. Configure the machine accounts

Set the machine account attributes for the ESXi host(s) and ONTAP NFS server to only allow AES encryption. Doing this avoids failures to mount via Kerberos that manifest as “permission denied” on the host. In a packet trace, you’d potentially be able to see the ESXi host trying to exchange keys with the KDC and getting “unsupported enctype” errors if this step is skipped.

The exact attribute to change is msDS-SupportedEncryptionTypes. Set that value to 24, which is AES only. For more info on encryption types in Windows, see this blog.

You can change this attribute using the “Advanced Features” view with the attribute editor. If that’s not available or can’t be used, you can also modify it using PowerShell.

To modify in the GUI:

To modify using PowerShell:

PS C:\> Set-ADComputer -Identity [NFSservername] -Replace @{'msDS-SupportedEncryptionTypes'=24}
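To confirm the change took effect, read the attribute back (24 decimal is 0x18 – AES128 plus AES256):

PS C:\> Get-ADComputer -Identity [NFSservername] -Properties msDS-SupportedEncryptionTypes | Select-Object Name,msDS-SupportedEncryptionTypes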

2. Create DNS records for the ESXi hosts and the ONTAP server

This would be A/AAAA records for forward lookups and PTR records for reverse. Windows DNS lets you do both at the same time. Verify the DNS records with “nslookup” commands.

This can also be done via the GUI (DNS Manager) or PowerShell.

From PowerShell:

PS C:\Users\admin>Add-DnsServerResourceRecordA -IPv4Address 10.193.67.220 -CreatePtr core-tme.netapp.com -Name ontap9

PS C:\Users\admin>Add-DnsServerResourceRecordA -IPv4Address 10.193.67.35 -CreatePtr core-tme.netapp.com -Name cx400s1-gwnsg6b
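Then verify forward and reverse resolution for both hosts before moving on:

PS C:\Users\admin> nslookup ontap9.core-tme.netapp.com
PS C:\Users\admin> nslookup 10.193.67.220
PS C:\Users\admin> nslookup cx400s1-gwnsg6b.core-tme.netapp.com
PS C:\Users\admin> nslookup 10.193.67.35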

Mounting the NFS Datastore via Kerberos

Now, we’re ready to create the datastore in ESX using NFSv4.1 and Kerberos.

Simply go to “Add Datastore” and follow the prompts to select the necessary options.

1. Select “NFS” and then “NFS 4.1.”

VMware doesn’t recommend mixing v3 and v4.1 on datastores. If you have an existing datastore that you were mounting via v3, VMware recommends migrating the VMs with Storage vMotion.

2. Specify the name and configuration

The datastore name can be anything you like. The “folder” has to be the junction-path/export path on the ONTAP cluster. In our example, we use /kerberos_datastore.

Server(s) would be the data LIF you enabled Kerberos on. ONTAP doesn’t support NFSv4.1 multi-pathing/trunking yet, so specifying multiple NFS servers won’t necessarily help here.

3. Check “Enable Kerberos-based authentication”

Kind of a no-brainer here, but still worth mentioning.

4. Select the hosts that need access.

If other hosts have not been configured for Kerberos, they won’t be available to select.

5. Review the configuration details and click “Finish.”

This should mount quickly and without issue. If you have an issue, review the “Troubleshooting” tips below.

This can also be done with a command from ESX CLI:

esxcli storage nfs41 add -H ontap9 -a SEC_KRB5 -s /kerberos_datastore -v kerberosDS
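To confirm the datastore mounted and is using Kerberos, you can list the NFSv4.1 mounts from the same CLI:

esxcli storage nfs41 list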

Troubleshooting Tips

If you follow the steps above, this should all work fine.

But sometimes I make mistakes. Sometimes YOU make mistakes. It happens. 🙂

Some steps I use to troubleshoot Kerberos mount issues (a few of the underlying commands are sketched after this list)…

  • Review the vmkernel logs on the ESXi host
  • Review “event log show” from the cluster CLI
  • Ensure the ESX host name and SVM host name exist in DNS (nslookup)
  • Use packet traces from the DC to see what is failing during Kerberos authentication (filter on “kerberos” in wireshark)
  • Review the SVM config:
    • Ensure NFSv4.1 is enabled
    • Ensure the SVM has DNS configured
    • Ensure the Kerberos realm is all caps and is created on the SVM
    • Ensure the desired data LIF has Kerberos enabled (from System Manager or via “kerberos interface show” from the CLI)
    • Ensure the export policies and rules allow access to the ESX datastore volume for Kerberos, superuser and NFSv4. Ensure the vsroot volume allows at least read access for the ESX host.
    • Ensure the SVM has the appropriate UNIX users and group created (nfs user for the NFS SPN; UNIX user name that matches the NFS user principal defined in ESX) or the users exist in external name services
  • From the KDC/AD domain controller:
    • Ensure the machine accounts created use AES only to avoid any weird issues with encryption type support
    • Ensure the SPNs aren’t duplicated (setspn /q {service/fqdn})
    • Ensure the user defined in the NFS Kerberos config hasn’t had a password expire
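For reference, here’s a rough sketch of the commands behind a few of those bullets (the SVM, policy and host names are from my lab; substitute your own):

cluster::> event log show
cluster::> kerberos interface show -vserver SVM1
cluster::> export-policy rule show -vserver SVM1 -policyname ESX -instance
C:\> nslookup ontap9.core-tme.netapp.com
C:\> setspn /q nfs/ontap9.core-tme.netapp.com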

 

Behind the Scenes: Episode 108 – #NetAppInsight 2017: Las Vegas Recap (with the #NetAppATeam)

Welcome to Episode 108, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we rounded up some of the NetApp A-Team members to discuss NetApp Insight 2017 in Las Vegas. Included in the podcast were:

You can listen to this week’s episode here:

Here are some shots of us recording:

For the blog referenced in the podcast:

The Importance of Perspective in Crisis

I also made a video of my side trip to Zion:

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

The Importance of Perspective in Crisis

Just to be crystal clear, this blog is in no way representative of my employer, NetApp. This is solely my own opinion.

I played football in high school. Our team was awful. We regularly went winless each season, and lost games handily. I recall one particular game where we ended up losing by a ton, but it wasn’t losing that makes me remember that game; it was seeing a teammate get laid out by a crackback block and fearing he might never walk again.

He eventually recovered with no issues, but I remember how one of the toughest guys I knew lay on the field, screaming in pain, and how after he was carted off and put in an ambulance, I didn’t have the motivation to play again that night. What was the point? It was just a game!

Years later, I found myself in the midst of one of the deadliest mass shootings in American history, as a gunman fired upon a crowd of innocent concert-goers from his perch on the 32nd floor of the Mandalay Bay in Las Vegas. I was there for a work conference and found myself with a familiar old feeling, where I lost inspiration to play because of the enormity of the situation. This is the story of my experiences that night, and how I was able to overcome the feeling of “what was the point?”

How it happened, from my perspective

I’ll start off by stating that, while seeing an active shooter event play out in real time is unnerving, I can’t even begin to claim it impacted me in any meaningful way. Why?

Perspective.

We were at a team dinner when the shooting began. I actually had been emailing a colleague who told me he heard gunshots. My initial reaction was, “well, this is Vegas. That happens sometimes.” But then he told me it was the sound of a fully automatic weapon. That doesn’t normally happen.

But I didn’t think much of it – it had to be far away from us, right?

Then, I started seeing cracks in the illusion I had set up for myself. A waitress at the restaurant hastily grabbed her purse with an alarmed look on her face and bolted for the door. We were instructed to leave the restaurant in a “calm and orderly manner.” I instantly knew what was happening – my colleague’s email, this reaction. Someone was shooting people very close to where we were.

At the time, not many other people had the information I had from my colleague. But I also had no idea where exactly this person was. He could have been in the lobby. There could have been multiple people. But since we were being escorted calmly and orderly, I figured there was no imminent threat.

We headed toward the basement entrance. Then, someone urgently called for us to move quickly toward the exit. The reality of the situation became more apparent. Then, as we were moving toward the exit to the back parking lot, we saw 7 or 8 police officers running in the opposite direction with rifles brandished. They were yelling at us.

RUN.

I’ve never been in an active shooter drill, much less a real one playing out. I’ve never been in the military. I’ve never fired a gun. The closest I’ve ever been to real, unadulterated danger would be first-person shooters. Games. “It’s just a game…”

This was not a game. But oddly, despite my lack of training, I was mostly calm. I was alert. The adrenaline had kicked in.

My head was on a swivel – stay low. Keep in the shadows. Watch your surroundings. Clear the corners. Don’t run out of doors without looking. Know where the exits are.

I was sprinting at this point, because I was in survival mode. I have lots to lose in this life – my wife, my son. I was not ready to lose any of it.

We made our way to the parking lot, where we saw a police helicopter circling the Mandalay Bay tower, about midway up.

I stopped and took video (from the shadows) of the helicopter. Within a few minutes, the helicopter shone its spotlight on us and, from the loudspeaker, issued a single command.

RUN.

The police on the ground instructed us to move south. They had set up a perimeter and were moving us outside of it. They were our buffer. We kept moving toward the “Welcome to Las Vegas” sign, feeling anything but.

We came across Maverick Helicopters. There was a couple there, waiting by the doors. The doors opened and we were allowed into the lobby for shelter. We sat inside, wondering what was happening. We still had little to no information, but at least we weren’t running anymore.

We sat for a bit. The good people at Maverick handed out food and water. Then, they instructed us to move into the back room, where there were no windows. We filed down a narrow hallway, into a room no larger than a hotel room. There were about 30-40 of us, so the room got warm and uncomfortable. But we were safe.

Perspective.

More food was handed out. More water. One of the staff joked about wanting cookies. Then they broke out champagne. Generally, that’s reserved for celebration. As police sirens whizzed by, people drank. Perhaps we were celebrating being alive, being safe.

One woman drank too fast and got dizzy. The staff got one of the first responders in to look at her. He decided she was fine, but her husband persisted. He kept reassuring them. I watched as he patiently told the man his wife would be fine, his knees and arms covered in someone else’s blood, with many more people still needing his help. I started to get angry – who were these people to imply they were more important? Why weren’t they trying to place themselves outside of their little world and think of others in more need? Then I thought of the staff at Maverick Helicopters, going above and beyond their job descriptions, sheltering us and feeding us. In this moment of crisis, there was infinitely more good than bad.

Perspective.

We were in the room for about an hour. People were streaming news reports and I was looking at social media to find out more. We were starting to get information, for better or worse. We saw the video of the shooting, with the surreal sounds of automatic rifle fire raining down on the concert. I realized that when we were running out the back way, we were just around the corner from this madman’s rifle sights. I was thankful we had police escorts to steer us away from danger.

I also started hearing unconfirmed reports of other shooters. Other incidents at other hotels. Oh my god. Was this a coordinated attack? How big was this? I started to think about the poor people who weren’t in the right place at the wrong time like we had been. I was still calm, because I was safe. I wasn’t thinking about the conference – I was thinking about survival. And my next steps.

We were allowed back into the lobby later. The police sirens were replaced by ambulance sirens. Things were calming down and we were now entering damage control. We were given the option of staying at Maverick Helicopters all night or moving on to a safe zone. After some waffling, I decided that the safest move was to get farther away from the strip, regardless of the inconvenience.

We got onto a bus with some other people. The radio had news reports. We found out that there was only one shooter. I was glad to be wrong. I was angry at the way fake news spread. But I was happy that the scale was small.

Perspective.

The driver (the cookie man) turned the radio down. A woman in cowboy boots and a hat asked him to turn it back up. He snapped at her.

“REALLY? REALLY?”

The volume stayed low.

We all process these things differently. Some of us want to be saturated in grief. Some want to pretend it’s not happening, like the cookie man. I just wanted to be safe.

Later, he turned the volume back up. The number of dead and injured rolled in. The woman in cowboy boots and a hat started to cry. I realized she had been in the middle of it all. I realized that I had not.

Perspective.

We ended up at the Thomas & Mack Center. When we got there, a detective announced over a bullhorn that we were all there as witnesses and would be asked to sign a statement. I wondered if we were in the right place. The air was chilly, as desert nights are.

We found out that we didn’t need to be witnesses to be let in. We went inside and saw a vast array of people inconvenienced by the shooting. Travelers who couldn’t check in. Evacuees like me. Concert goers with minor injuries from falling in the mass stampede out of the danger zone.

I watched as travelers who couldn’t check in to hotels looked surly and complained about their plight. Then I watched the girl with skinned knees get patched up and move on. We all process these things differently. Being cold, or uncomfortable, or stranded? Nowhere near the scale of inconvenience of people who had been shot at, trampled, hit or watched friends die.

Perspective.

I tried to get a little rest. It was 4AM. My phone battery was dead. I had no room to go back to. There was uncertainty about my responsibilities the next day. All I wanted to do was sleep. We were told that Mandalay Bay was closed indefinitely to us, due to the sheer number of casualties. A young man, possibly a former medic in the military, came in with a backpack of medicine and first aid to help people. College students were handing out blankets. Food and water donations started to pour in.

I decided to go to the airport, rent a car and find accommodations off the strip. I was tired, but I was taking up resources from people who needed them more. I decided to stay well off the strip, not out of fear, but out of consideration for people who were trying to sort things out. I wanted to help, but it wasn’t my time.

I picked up some supplies and made it to the new hotel. The reality of how much luck I had been granted started to sink in. I had easily gotten a car and hotel, and had the means to do so. The worst injury I witnessed was a champagne dizzy spell and some skinned knees. I was uninjured. And alive.

I had witnessed numerous acts of kindness and selflessness. I was inspired by how people almost always come together in situations like this. I had nothing to complain about.

Perspective.

I got about 2 hours of sleep. When I woke up, I decided it was time for me to do something, anything. The first day of our conference was canceled, naturally. The Mandalay Bay hotel was starting to let people back in.

I had a rental car and offered rides. I did some research to find out where I could go to donate blood. I passed out information about the blood drive locations. I didn’t get the chance to donate, as I was turned away after 3 hours in line, but I tried to make an impact by buying and handing out fruit and popsicles to the other people in line with me, while numerous other people did the same.

People showed up with pizza, sandwiches, tacos, donuts. Truckloads of water came in. Buses arrived with air conditioned seating. The Vegas hotels sent lunch boxes. The goodness of humanity was shining through, and it was inspiring. I thought back to the time I saw my friend get hurt in a meaningless football game and how crushed my spirit had been. But I also realized why something as relatively meaningless as a tech conference had to go on. We had to reclaim our normalcy. We had to continue living our lives, because doing otherwise would be an insult to the sacrifice of the people who responded to the shooting and the people that had much, much worse days than any of us had.

So, the next day, I woke up and started the day, newly refreshed and inspired to reclaim normalcy and understand that my struggles are but ripples in the greater human experience.

So while a tech conference isn’t life, it’s part of life. And it’s important to admit that.

Perspective.

Behind the Scenes: Episode 107 – NetApp Insight, GDPR & Data Fabric

Welcome to Episode 107, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we get ready for Insight with a couple of interviews with some of the folks that will be attending the conference from NetApp and answering questions at Insight Central. Join us as we chat with Professional Services Product Manager Justine Ma (@ma_justine, https://www.linkedin.com/in/justinema/) and Global Architect David Mancusi (David.Mancusi@netapp.com) about Insight, GDPR and the Data Fabric!

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

You can listen to this week’s episode here:

https://m.soundcloud.com/techontap_podcast/episode-107-netapp-insight-gpdr-data-fabric

NetApp FlexGroup: Crazy fast

This week, the SPEC SFS®2014_swbuild test results for NetApp FlexGroup volumes submitted for file services were approved and published.

TL;DR – NetApp was the cream of the crop.

You can find those results here:

http://spec.org/sfs2014/results/res2017q3/sfs2014-20170908-00021.html

The testing rig was as follows:

  • Four-node FAS8200 cluster (not AFF)
  • 72 4TB 7200 RPM 12Gb SAS drives (per HA pair)
  • NFSv3
  • 20 IBM servers/clients
  • 10GbE network (four connections per HA pair)

Consolidating multiple vendors’ SPEC SFS®2014_swbuild results, the FlexGroup did more IOPS (around 260K) at lower latency (sub-3ms) than the competition.

In addition, NetApp had the best Overall Response Time (ORT) of the competition.

NetApp also had the best MBps/throughput.

Full results here:

http://spec.org/sfs2014/results/sfs2014swbuild.html

For more information on the SPEC SFS®2014_swbuild test, see https://www.spec.org/sfs2014/.

Everything but the kitchen sink…

With a NetApp FlexGroup, the more clients and work you throw at it, the better it will perform. An example of this is seen in TR-4571, with a 2-node A700 doing Git workload testing. Note how increasing the number of jobs only encourages the FlexGroup.

FlexGroup Resources

If you’re interested in learning more, see the following resources:

You can also email us at flexgroups-info@netapp.com.

Tech ONTAP Podcast: Now powered by NetApp FlexGroup volumes!

If you’re not aware, I co-host the Tech ONTAP Podcast. I’m also the TME for NetApp FlexGroup volumes. Inexplicably, we weren’t actually storing our podcast files on NetApp storage – instead, we were using the local Mac SSD, which was problematic for three reasons:

  1. It was eventually going to fill up.
  2. If it failed, bye bye files.
  3. It was close to impossible to access unless we were local to the Mac, for a variety of reasons.

So, it finally dawned on me that I had an AFF8040 in my lab, barely being used for anything except testing and TR writing.

At first, I was going to use a FlexVol, out of habit. But then I realized that a FlexGroup volume would provide a great place to write a bunch of 1-400MB files while leveraging all of my cluster resources. The whole process – creating the FlexGroup, googling autofs on the Mac, and setting up the NFS mount and Audio Hijack – took me all of maybe 30 minutes (most of that googling and setting up autofs). Not bad!

The podcast setup

When we record the podcast, we use software called Audio Hijack. This allows us to pipe in sound from applications like WebEx and web browsers, as well as from the in-studio microphones, which all get converted to MP3. This is where the FlexGroup NFS mount comes in – we’ll be pointing Audio Hijack to the FlexGroup volume, where the MP3 files will stream in real time.

I also migrated all the existing data over to the FlexGroup for archival purposes. We do use OneDrive for podcast sharing and such, but I wanted an extra layer of centralized data access, and the NFS-mounted FlexGroup provides that. Setting it up to stream right from Audio Hijack removes an extra step for me when processing the files. But, before I could point the software at the NFS mount, I had to configure the Mac to automount the FlexGroup volume on boot.

Creating the FlexGroup volume

Normally, a FlexGroup volume is created with 8 member volumes per node on an AFF (as per best practice). However, my FlexGroup volume was only going to be around 5TB, which means 16 member volumes would be around 350-400GB each. That violates the other best practice of no less than 500GB per member (to avoid too much remote allocation). While my file sizes weren’t going to be huge, I wanted to avoid issues as the volume filled, so I met in the middle – 8 member volumes total, 4 per node. To do that, you have to go to the CLI; System Manager doesn’t do customization like that yet. In particular, you need the -aggr-list and -aggr-list-multiplier options with volume create.

ontap9-tme-8040::*> vol create -vserver DEMO -volume TechONTAP -size 5TB -aggr-list aggr1_node1,aggr1_node2 -aggr-list-multiplier 4
ontap9-tme-8040::*> vol show -vserver DEMO -volume TechONTAP* -sort-by size -fields size,node
vserver volume size node
------- --------------- ----- ------------------
DEMO TechONTAP__0001 640GB ontap9-tme-8040-01
DEMO TechONTAP__0002 640GB ontap9-tme-8040-02
DEMO TechONTAP__0003 640GB ontap9-tme-8040-01
DEMO TechONTAP__0004 640GB ontap9-tme-8040-02
DEMO TechONTAP__0005 640GB ontap9-tme-8040-01
DEMO TechONTAP__0006 640GB ontap9-tme-8040-02
DEMO TechONTAP__0007 640GB ontap9-tme-8040-01
DEMO TechONTAP__0008 640GB ontap9-tme-8040-02
DEMO TechONTAP 5TB -

Automounting NFS on boot with a Mac

When you mount NFS with a Mac, it doesn’t retain it after you reboot. To get the mount to come back up, you have to configure the autofs service on the Mac. This is different from Linux, where you can simply edit the fstab file. The process is covered very well in this blog post (just be sure to read all the way down to avoid the issue he mentions at the end):

https://coderwall.com/p/fuoa-g/automounting-nfs-share-in-os-x-into-volumes

Here’s my configuration, spread across /etc/autofs.conf, /etc/auto_master and /etc/auto_nfs. I disabled “nobrowse” to prevent issues in case Audio Hijack needed to be able to browse.

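A representative sketch of what goes in those files (the NFS server name “demo.core-tme.netapp.com” and the local mount path are placeholders; your data LIF and paths will differ):

# /etc/autofs.conf – I removed "nobrowse" from AUTOMOUNTD_MNTOPTS here
# /etc/auto_master – add a line pointing at a custom NFS map file:
/-          auto_nfs    -nosuid
# /etc/auto_nfs – the map itself: local path, mount options, NFS URL:
/Users/podcast/techontap    -fstype=nfs,rw,bg,hard,tcp,resvport,noowners,nolocks    nfs://demo.core-tme.netapp.com:/TechONTAP
# then reload the automounter maps:
sudo automount -cv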

After that was set up, I copied over the existing 50-ish GBs of data into the FlexGroup and cleaned up some space on the Mac.

ontap9-tme-8040::*> vol show -vserver DEMO -volume TechONTAP* -sort-by size -fields size,used
vserver volume size used
------- --------------- ----- -------
DEMO TechONTAP__0001 640GB 5.69GB
DEMO TechONTAP__0002 640GB 8.24GB
DEMO TechONTAP__0003 640GB 5.56GB
DEMO TechONTAP__0004 640GB 6.48GB
DEMO TechONTAP__0005 640GB 6.42GB
DEMO TechONTAP__0006 640GB 8.39GB
DEMO TechONTAP__0007 640GB 6.25GB
DEMO TechONTAP__0008 640GB 6.25GB
DEMO TechONTAP 5TB 53.29GB
9 entries were displayed.

Then, I configured Audio Hijack to pump the recordings to the FlexGroup volume.

Then, we recorded a couple of episodes without an issue!

As you can see from this output, the FlexGroup volume is relatively evenly allocated:

ontap9-tme-8040::*> node run * flexgroup show TechONTAP
2 entries were acted on.

Node: ontap9-tme-8040-01
FlexGroup 0x80F03817
* next snapshot cleanup due in 2886 msec
* next refresh message due in 886 msec (last to member 0x80F0381F)
* spinnp version negotiated as 4.6, capability 0x3
* Ref count is 8

Idx Member L Used Avail Urgc Targ Probabilities D-Ingest Alloc F-Ingest Alloc
--- -------- - --------------- ---------- ---- ---- --------------------- --------- ----- --------- -----
 1 2044 L 1485146 0% 159376256 0% 12% [100% 100% 79% 79%] 0+ 0 0 0+ 0 0
 2 2045 R 2153941 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 3 2046 L 1415120 0% 159339950 0% 12% [100% 100% 76% 76%] 0+ 0 0 0+ 0 0
 4 2047 R 1690392 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 5 2048 L 1675583 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 6 2049 R 2191360 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 7 2050 L 1630946 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0
 8 2051 R 1631429 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0

Node: ontap9-tme-8040-02
FlexGroup 0x80F03817
* next snapshot cleanup due in 3144 msec
* next refresh message due in 144 msec (last to member 0x80F03818)
* spinnp version negotiated as 4.6, capability 0x3
* Ref count is 8

Idx Member L Used Avail Urgc Targ Probabilities D-Ingest Alloc F-Ingest Alloc
--- -------- - --------------- ---------- ---- ---- --------------------- --------- ----- --------- -----
 1 2044 R 1485146 0% 159376256 0% 12% [100% 100% 79% 79%] 0+ 0 0 0+ 0 0
 2 2045 L 2153941 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 3 2046 R 1415120 0% 159339950 0% 12% [100% 100% 76% 76%] 0+ 0 0 0+ 0 0
 4 2047 L 1690392 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 5 2048 R 1675583 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 6 2049 L 2191360 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 7 2050 R 1630946 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0
 8 2051 L 1631429 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0

I plan on using this setup when I start writing the new FlexGroup data protection best practice guide, so stay tuned for that…

So, now, the Tech ONTAP podcast is happily drinking the NetApp FlexGroup champagne!

If you’re going to NetApp Insight, check out session 16594-2 on FlexGroup volumes.

For more information on NetApp FlexGroup volumes, see:

Behind the Scenes: Episode 106 – NetApp Insight 2017 Preview

Welcome to Episode 106, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we preview NetApp Insight 2017 with one of the event’s track leads, Jake Thorne, as well as the mastermind of Insight Central, Melissa Hara (@Melissa_NTAP)! Join us as we discuss what is where at the event, as well as some tips on how you can best navigate the sessions at NetApp Insight 2017. 

Also, be sure to check out my blog on Insight sessions you might want to check out!

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

You can listen to this week’s episode here:

Why are there so many P releases in ONTAP lately?

If you’ve been paying any attention, you’ll have noticed that ONTAP 9.1P8 just released last week. That’s insane, right? I mean, ONTAP 9.1 went GA less than a year ago! And ONTAP 8.2.4 only had 5 or 6 P releases ever! What’s going on???

It’s simple… ONTAP has a different software release cadence.

Starting with ONTAP 9, the release cadence model changed to accelerate the release of new ONTAP features. Now, instead of a major release (think 8.1, 8.2, 8.3, etc.) coming out every year and a half, we ship feature-rich major releases every 6 months. This means that NetApp can be more agile with their development cycles and more aggressive in releasing new features.

This also means, no more “maintenance releases.”

What’s a maintenance release?

A maintenance release was one of the “dot” releases you’d see in between major releases. Remember, it was usually 18 months between major releases, so while you were waiting for 8.2 to ship, NetApp was releasing 8.1.1, 8.1.2, 8.1.3, etc. These releases were generally devoid of new features, but instead included bug fixes. That was in addition to the “patch” releases, which were intended to fix major bugs faster than a maintenance release could.

So, instead of seeing 9.1.1, 9.1.2, and so on, you’re going to get P releases. That’s why you’re seeing an uptick in P releases for ONTAP 9.x in a shorter time frame. So, no worries! ONTAP 9.x is still one of the most stable families of releases we’ve seen for clustered ONTAP, regardless of the number of P releases.

General P release/upgrade guidance

If you’re trying to determine whether you should upgrade to a P release of ONTAP, here are some helpful tips:

  • P releases are fully production-ready and QA-tested
  • If you are trying to decide whether to upgrade to a P release, review the bug fix list on the P release download page to see if you’re exposed to any of the bugs and whether it’s worth your time to upgrade
  • Make use of the “upgrade recommendation” found in MyAutoSupport.
  • ONTAP provides the ability to perform non-disruptive upgrades, so updating to a P release should involve little to no downtime. This is especially true of ONTAP versions in the same major release family, as there are no version mismatches to worry about in the upgrade.
  • System Manager now provides automated upgrade utilities for a simpler upgrade process; a sketch of the equivalent CLI flow follows this list
  • Be sure to review the software version support policy for your release to make the most informed decision you can.
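As a rough example of that automated non-disruptive upgrade flow from the CLI (the web server URL is a placeholder for wherever you stage the image, and the version is just an example):

cluster::> cluster image package get -url http://webserver/91P8_image.tgz
cluster::> cluster image validate -version 9.1P8
cluster::> cluster image update -version 9.1P8
cluster::> cluster image show-update-progress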

Hopefully this clears up any questions you have about P releases. Ping me in the comments if you need clarifications!

The NetApp E-Series EF570 – Leaving the Competition in the Dust

A few weeks ago, we had some folks from the E-Series team on the Tech ONTAP Podcast to give us an overview. In that podcast, we refer to the E-Series as the drag racer of NetApp, for good reason – it’s fast!

We were a tad early to record, as we secretly knew the EF570 AFA was about to release some smokin’ SPC-1 and SPC-2 numbers. Well, now, they’re officially out!

Check out the press release here.

What are SPC-1 and SPC-2?

SPC stands for the “Storage Performance Council.” These are the folks that provide standard industry benchmarks for storage, with the hopes that the thirst for competition will drive vendors to create faster storage platforms. From their site’s charter:

The SPC serves as a catalyst for performance improvement in storage products.  In support of that goal, the SPC has developed a complete portfolio of industry-standard storage benchmarks. The comprehensive SPC benchmark portfolio utilizes I/O workloads that represent the “real world” storage performance behavior of both OLTP (online transaction processing) and sequential applications.

The SPC benchmark portfolio provides a rigorous, audited and reliable means to produce comparative storage performance, price-performance and energy use data, which is used to develop and evaluate storage products, which range from individual components to complex, distributed storage configurations.

NetApp submits results for several platforms, from ONTAP to E-Series, and generally does pretty well. In this case…

NetApp reached the top 10 in performance at #7 and set a new world record for SPC-1 v3 price/performance.

  • Tested Storage Product: NetApp EF570 All-Flash Array
  • SPC-1 IOPS: 500,022
  • SPC-1 Price-Performance™: $0.13/SPC-1 IOPS™
  • Total ASU Capacity: 9,006 GB
  • Data Protection Level: Protected 2 (Mirrored and full redundancy)
  • Total Price: $64,212.58

The full EF570 results can be found here:

SPC-1 results:

http://www.storageperformance.org/results/results_spc1_v3/spc1_v3_active#A31009

SPC-2 results:

http://www.storageperformance.org/results/benchmark_results_spc2_active/#B12003

Check out the official NetApp blog here:

https://blog.netapp.com/new-netapp-all-flash-and-hybrid-flash-systems-software-splunk-solutions/