Why Is the Internet Broken: Greatest Hits

When I started this site back in October of 2014, it was mainly to drive traffic to my NetApp Insight sessions – and it worked.

(By the way… stay tuned for a blog on this year’s new Insight sessions by yours truly. Now with more lab!)

As I continued writing, my goal was to keep creating content – don’t be the guy who just shows up during conference season.


So far, so good.

But since I create so much content, it can be hard for new visitors to this site to find things; the WordPress archives/table of contents is lacking. So, what I’ve done is create my own table of contents of the top 15 most-visited posts and the last 5-10 newest. I will keep it up as the main landing page. The list will change on occasion to keep up with changing stats.

Newest posts (excluding “Behind the Scenes” posts)

FlexGroups: An evolution of NAS

Setting up BIND to be as insecure as possible in Centos/RHEL7

ONTAP 9 RC1 is now available!

ONTAP 9 Feature: Volume rehosting

Migrating to ONTAP – Ludicrous speed!


Top 5 Blogs (by number of visits)

TECH::Using NFS with Docker – Where does it fit in?

TECH:: NetApp is kicking some flash!

TECH::Clustered Data ONTAP 8.3.1 is now in general availability (GA)!

TECH::Data LIF best practices for NAS in cDOT 8.3

TECH::Become a clustered Data ONTAP CLI Ninja

DataCenterDude

I also write for datacenterdude.com on occasion. To read those posts, go to this link:

My DataCenterDude stuff

How else do I find stuff?

You can also search on the site or click through the archives, if you choose. If you have questions or want to see something changed or added to the site, follow me on Twitter @NFSDudeAbides or comment on one of the posts here!

You can also email me at whyistheinternetbroken@gmail.com.

Behind the Scenes: Episode 53 – Developer Advocacy and Kubernetes

Welcome to Episode 53, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week, we welcome a very special guest – developer advocate rockstar Kelsey Hightower (@kelseyhightower) from Google/Kubernetes! Kelsey gives us the rundown on his views on advocacy vs. evangelism and where Kubernetes fits in with a changing IT landscape. I was able to convince Kelsey to join us simply by asking. Super accessible!

If that’s not enough, we also bring in NetApp SolidFire’s developer advocate, Josh Atwell (@josh_atwell). Josh echoes some of the same feelings as Kelsey, and gives us his Monty Python pitch for Puppet.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

You can listen here:

Behind the Scenes: Episode 52 – SolidFire’s Position in NetApp’s Portfolio

Welcome to Episode 52, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week, we welcomed NetApp/SolidFire’s Business Development Manager Keith Norbie (@keithnorbie) and Product Marketing Manager Kelly Boeckman (@kellyboeckman) to discuss SolidFire’s positioning in NetApp’s portfolio and to play a rousing game of “You might be SolidFire if…” (Spoiler: Keith lost.)

Keith also made use of the guest beard/neckbeard/mullet.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

You can listen here:

What’s the deal with remote I/O in ONTAP?


I’m sure most of you have seen Seinfeld, so be sure to read the title in your head as if Seinfeld is delivering it.

I opened with a comedian because this post is about a question I get asked – a lot – one that’s kind of a running joke by now.

The setup…

When Clustered Data ONTAP first came out, there was a pretty big kerfuffle (love that word) about the architecture of the OS. After all, wasn’t it just a bunch of 7-Mode systems stitched together with duct tape?

Actually, no.

It’s a complete rewrite of the ONTAP operating system, for one. The NAS stack from 7-Mode was gutted and rebuilt as a new architecture designed for clustering.

Then, in 8.1, the SAN concepts from 7-Mode were reworked for clustering.

So, while a clustered Data ONTAP cluster is, at the hardware level, a series of HA pairs stitched together with a 10Gb network, the operating system has been turned into essentially what I like to call a storage blade center. Your storage systems span clusters of up to 24 physical hardware nodes, effectively abstracting the hardware and allowing a single management plane for the entire subsystem.

Every node in a cluster is aware of every other node, as well as every other storage object. If a volume lives on node 1, then node 20 knows about it and where it lives via the concept of a replicated database (RDB).

Additionally, the cluster has a clustered networking stack, where an IP address or WWPN is presented via a logical interface (a LIF). While SAN LIFs have to stay put and leverage host-side pathing for data locality, NAS LIFs can migrate across any node and any port in the cluster.

However, volumes are still located on physical disks and owned by physical nodes, even though you can move them around via volume move or vol rehost. LIFs are still located on physical ports and nodes, even though you can move them around and load balance connections on them.
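To give you an idea of what that mobility looks like, here’s a hedged sketch of the relevant commands (vs0, vol1 and the node/aggregate/port names are placeholders; check your version’s man pages for exact syntax):

cluster::> volume move start -vserver vs0 -volume vol1 -destination-aggregate aggr1_node2
cluster::> volume rehost -vserver vs0 -volume vol1 -destination-vserver vs1
cluster::> network interface migrate -vserver vs0 -lif data_lif1 -destination-node node2 -destination-port e0c

This raises the question…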

What is the deal with remote I/O in ONTAP?

Since a volume can exist on only one node (well, unless you want to check out FlexGroups), and since data LIFs live on single or aggregated ports on a single node, you are bound to run into scenarios where data operations traverse the backend cluster network. Avoiding that means either taking on the headache of ensuring every client mounts a specific IP address for data locality, or leveraging one of the NAS data locality features, such as pNFS or node referrals on initial connection (available for NFSv4.x and CIFS/SMB). I cover some of the NFS-related data locality features in TR-4067, and CIFS autolocation is covered in TR-4191.
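For reference, here’s a rough sketch of how those locality features get enabled (vs0 is a placeholder SVM name, and option availability varies by ONTAP release, so treat these as examples rather than gospel):

cluster::> vserver nfs modify -vserver vs0 -v4.1-pnfs enabled
cluster::> vserver nfs modify -vserver vs0 -v4.0-referrals enabled
cluster::> vserver cifs options modify -vserver vs0 -is-referral-enabled true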

In SAN, we have ALUA to manage that locality (or optimized paths), but even adding an extra layer of protection in the form of protocol locality can’t avoid scenarios where interfaces go down or volumes move around after a TCP connection has been established.

That backend network? Why, it’s a dedicated 10Gb network with 2-4 dedicated ports per node. No other data is allowed on the network other than cluster operations. Data I/O traverses the network in a proprietary protocol known as SpinNP, which rides on TCP to guarantee the arrival of packets. And, with the advent of 40Gb Ethernet and other speedier methods of data transfer, I’d be shocked if we didn’t see that backend network improve over the next 5-10 years. The types of operations that traverse the cluster network include:

  • SpinNP for data and local SnapMirror traffic
  • ZAPI calls

That’s pretty much it. It’s a beefy, robust backend network that is *extremely* hard to saturate. You’re more likely to bottleneck somewhere else (like your client) before you overload a cluster network.
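If you want to see what your own cluster network looks like, you can check it from the cluster shell (a quick sketch; output columns and options vary by version):

cluster::> network port show -ipspace Cluster
cluster::> network interface show -role cluster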

So now that we’ve established that remote I/O will likely happen, let’s talk about if that matters…

The punchline


Remote I/O absolutely adds overhead to operations. There’s no technical way around saying it; suggesting there is no penalty would be dishonest. The size of the penalty, however, varies depending on protocol. This is especially true when you consider that NAS operations will leverage a fast path when you localize data.

But the question wasn’t “is there a penalty?” The question is “does it matter?”

I’ll answer with some anecdotal evidence – I spent 5 years in support, 3 of them working escalations for clustered Data ONTAP. I closed thousands of cases over that time period. In that time, I *never* fixed a performance issue by making sure a customer used a local data path. And believe me, it wasn’t for lack of effort. I *wanted* remote traffic to be the root cause, because that was the easy answer.

Sure, locality could help when dealing with really low latency applications, such as Oracle. But in those cases, you architect the solution with data locality in mind. In the vast majority of other scenarios, the “remote I/O” penalty is pretty much irrelevant and causes more hand-wringing than necessary.

The design of clustered Data ONTAP was intended to help storage administrators stop worrying about the layout of the data. Let’s start allowing it to do its job!

Behind the Scenes: Episode 51 – Guided Problem Solving and Live Chat Support

Welcome to Episode 51, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week, we welcome Ross Ackerman (@TheRossAckerman) to talk about some improvements to the NetApp Support site experience, and how customers can leverage support without having to open cases or pick up the phone.

Guided Problem Solving

The first thing we discuss is a feature called “Guided Problem Solving.” This feature is exactly what it sounds like – a guided problem solver. If you want more information, check out the white paper on Guided Problem Solving and Chat.

When you land on the NetApp support site, you’ll see a green box in the middle of the page:

[Screenshot: the Guided Problem Solving box on the support site landing page, showing the currently available products]

Right now, those are the only options. Expect more products to be available in this feature in the near future…

From there, click on the solution you need to work on. That will open a page with a subset of solutions:

[Screenshot: the page with a subset of solutions for the selected product]

Since I am the NFS dude, I picked NFS.

When you click on the desired subject, you get a new page. It starts off with the setup and configuration docs, mainly because those are among the first things people try to find.

However, there are also areas to find KBs, Tech Reports and community posts on the selected subject.

[Screenshot: the subject page with areas for KBs, tech reports and community posts]

Of course, if the provided information doesn’t help you, click “create a case.”

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

You can listen here:

Behind the Scenes: Episode 50 – Cisco Live Recap? Nah. FlexPod Infomercial!

Welcome to Episode 50, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

A few weeks ago, I thought about making Episode 50 some big event. Confetti, balloons, special celebrity guests. Then inertia set in, as well as the notion that none of us are very “showy” guys. So, instead, we welcomed Glenn back from Cisco Live/vacation to talk about Cisco Live.

What we got instead was a missive on the new FlexPod offering. We just wound Glenn up and let him go…

[Image: artist’s rendition of Glenn]

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss


You can listen here:

The Joy of Sec: Realmd

Recently, the esteemed Jonathan Frappier (@jfrappier) posted an article on setting up Kerberos for use with Ansible. My Kerberos senses started to tingle…


While Jonathan was referring to Ansible, it made me remember that this question comes up a lot when trying to use Kerberos with Linux clients.

Kerberos isn’t necessarily easy

When using Kerberos with Active Directory and Windows clients, it’s generally pretty straightforward, as the GUI does most of the work for you. When you add a Windows box to a domain, the SPN and machine account principal are auto-populated from the AD KDC.

The keytab file gets ported over to the client and, provided you have a valid Windows login, you can start using Kerberos without ever actually knowing you are using it. In fact, most people don’t realize they’re using it until it breaks.

Additionally, even if Kerberos isn’t working in Windows, there is the fallback option of NTLM authentication, so if you can’t get a ticket to access a share, you could always use the less secure auth method (unless you disabled it in the domain).

As a result, in 90% of cases, you never even have to think about Kerberos in a Windows-only environment, much less know how it works. I know this from experience as a Windows administrator in my earlier IT days. Once I started working for NetApp support, I realized how little I actually knew about how Windows authentication worked.

So, say what you will about Windows, but it is *way* simpler in most cases for daily tasks like authentication.

Linux isn’t necessarily hard

One of the main things I’ve learned about Linux as I transitioned from solely being a “Windows guy” into a hybrid-NAS guy is that Linux isn’t really that hard. It’s just… different.

And by “different,” I mean it in terms of management. The core operating systems of Windows and Linux are essentially identical in terms of functionality:

  • They both boot from a kernel and load configurations via config files
  • They both leverage file system partitions and services
  • They both can be run on hardware or software (virtualized)
  • They both require resources like memory and CPU

The main differences between the two, in my opinion, are the open source aspect and the way you manage them. Naturally, there are a ton of other differences and I’m not interested in debating the merits of the OS. My point is simply this: Linux is only hard if you aren’t familiar with it.

That said, some things in Linux can be very manual processes. Kerberos configuration, for example, used to be quite convoluted. On older Linux clients, you roughly had to do the following to get it to work:

  • Create a user or machine account in the KDC manually (the Kerberos principal)
  • Assign SPNs manually to the principal
  • Configure the desired enctypes on the principal manually
  • Create the keytab for the principal manually (using something like ktpass)
  • Copy the keytab to the Linux client
  • Install the keytab to the client manually (using something like ktutil)
  • Configure the client to use secure NFS and configure the KDC realm information manually
  • Start the GSSD service manually and configure it to start on boot
  • Configure DNS
  • Ensure the time skew is within 5 minutes/configure NTP
  • Configure LDAP on the NFS client manually

That’s all off the top of my head. I’m sure I’m missing something, mainly because that’s a LONG LIST. To give you a flavor, the keytab portion alone looked something like the sketch below.
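This is a hedged example only; the principal name, realm, account and crypto settings are placeholders that vary by environment. On the Windows KDC:

C:\> ktpass -princ nfs/client1.example.com@EXAMPLE.COM -mapuser EXAMPLE\client1 -pass * -crypto AES256-SHA1 -ptype KRB5_NT_PRINCIPAL -out client1.keytab

Then, after copying the keytab over to the Linux client, merge it in with ktutil:

# ktutil
ktutil:  rkt /tmp/client1.keytab
ktutil:  wkt /etc/krb5.keytab
ktutil:  quit

But Linux is getting better at automating these tasks. CentOS7/RHEL7 took a big leap in that regard by including realmd.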

If you’re looking for the easiest way to configure Kerberos…

Use realmd. It’s brilliant.

It automates most of the Kerberos client configuration tasks I listed above. Sure, you still have to install it and a few other tools (like SSSD, krb5-workstation, etc.) and configure the realm information, NTP and DNS settings, but after that, it’s as simple as running “realm join.”

This acts a lot like a Windows domain join in that it:

  • Creates a machine account for you
  • Creates the SPNs for you
  • Creates the keytab for you
  • Adds the keytab file to the client for you
  • Configures SSSD to use Windows AD for LDAP/Identity management for you

Super simple. I cover it in the next update of TR-4073 (update to that coming soon… stay tuned) as it pertains to NetApp storage systems, but there are plenty of how-to guides for just the client portion out there.
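For example, on CentOS7/RHEL7, the whole dance looks roughly like this (a sketch; the package list, domain name and admin account are placeholders for your environment):

# yum install realmd sssd adcli krb5-workstation oddjob oddjob-mkhomedir samba-common-tools
# realm discover example.com
# realm join example.com -U Administrator
# realm list

The “realm list” at the end just verifies the join and shows how logins will be formatted.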

Happy Kerberizing!

Behind the Scenes: Episode 49 – Data Governance and Operational Point Objectives

Welcome to Episode 49, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week, we dug up a podcast recording from the April-May time frame, where we spoke with the Storage Service Design team and Ken Socko about Data Governance and Operational Point Objectives. The concept covers not only protecting data, but also securing it and delivering a more efficient recovery plan.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

The official blog is here:

http://community.netapp.com/t5/Technology/Tech-ONTAP-Podcast-Episode-49-Data-Governance-amp-Operational-Point-Objectives/ba-p/121643

You can listen here:

Spreading the love: Load balancing NAS connections in ONTAP


I can be a little thick at times.

I’ll get asked a question a number of times, answer it, and then forget the most important action item – documenting the question and answer somewhere I can refer people to later, when I inevitably get asked again.

Some of the questions I get asked about fairly often as the NetApp NFS Technical Marketing Engineer involve DNS, which is only loosely associated with NFS. Go figure.

But, because I know enough about DNS to have written a blog post on it and a Technical Report on our Name Services Best Practices (and I actually respond to emails), I get asked.

These questions include:

  • What’s round robin DNS?
  • What other load balancing options are there?
  • What is on-box DNS in clustered Data ONTAP?
  • How do I ensure data access is local?
  • How do I set it up?
  • When would I use on-box DNS vs DNS round robin?

So, in this blog, I’ll try to answer most of those at a high level. For more detail, see the new TR-4523: DNS Load Balancing in ONTAP.

What’s round robin DNS?

Remember when you were in school and you played “duck duck goose”? If you didn’t, click the link on the term and read about it.

But essentially, the game is this: everyone sits in a circle, someone walks around tapping each person and saying “duck,” and when they want to initiate the chase, they yell “GOOSE!” and run around the circle to sit down before the person they tapped catches them.

That’s essentially round robin DNS.

You create multiple A/AAAA records, associate them with the same hostname, and away you go! The DNS server will deliver a different IP address for each request of the hostname, in ABCD/ABCD fashion. No real rhyme or reason, just first come, first served.
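In a BIND zone file, that looks something like this (a sketch with made-up names and addresses):

; one hostname, four A records – handed out in rotation
datalif.example.com.    IN    A    10.63.3.101
datalif.example.com.    IN    A    10.63.3.102
datalif.example.com.    IN    A    10.63.3.103
datalif.example.com.    IN    A    10.63.3.104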

What other DNS load balancing options are there?

There are third-party load balancing appliances, such as F5 BIG-IP (not an endorsement, just an example). But those cost money and require administration.

In ONTAP, however, there is a not-so-well-known feature for DNS load balancing called “on-box DNS load balancing” that is intended to incorporate intelligent load balancing for DNS requests into a cluster.

What is on-box DNS load balancing?

On-box DNS load balancing in ONTAP uses a patented algorithm to determine the best possible data LIFs on the best possible nodes to return to clients.

Basically, it looks a bit like this:

[Diagram: on-box DNS request flow between the client, DNS server and cluster]

  1. The client makes a DNS request to the DNS servers in its configuration.
  2. The DNS server notices that the request is for a specific zone and uses its zone forwarder to pass that request to the cluster data LIFs acting as name servers.
  3. The cluster leverages its DNS application process and a weight file to determine which of the IP addresses configured in that DNS zone should be returned. The algorithm factors in CPU utilization, throughput, etc. when making the determination.
  4. The data LIF IP address is passed back to the DNS server, then to the client.

Easy peasy.


How do I ensure data locality?

The short answer: With on-box DNS, you can’t. But does it matter?

In clustered Data ONTAP, if you have multiple nodes and multiple data LIFs, you might end up landing on a node’s data LIF that is not local to the volume being requested. That can incur a slight latency penalty as the request traverses the backend cluster network.

In the majority of cases, this penalty is negligible to clients and applications, but for latency-sensitive applications (especially in flash environments), it can hurt a little. NAS data access over local network connections uses a “fast path” that bypasses work the remote connections need to do. I cover this in a little more detail in TR-4067 and in TECH::Data LIF best practices for NAS in cDOT 8.3.

In cases where you absolutely *need* data access to be local to the node, you would need to mount those local data LIFs specifically. Create A/AAAA records with node names incorporated to help discern which LIFs are on which nodes, and mount accordingly (see the example below).
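For instance, if you created per-node records like node1-data.example.com, a client could mount a path it knows is local (the hostname, export and mount point here are made up for illustration):

# mount -t nfs node1-data.example.com:/vol1 /mnt/vol1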

But in most cases, it doesn’t hurt to have remote traffic – in my 5 years in support, I never fixed a performance issue by making data access local to the node.

How do I set it up?

It’s pretty straightforward. I cover it in detail in TR-4523: DNS Load Balancing in ONTAP. In that TR, I cover Active Directory and BIND environments.

For a simple summary:

  1. Configure data LIFs in your storage virtual machine to use -dns-zone [zone name]
  2. Select data LIFs in your storage virtual machine that will act as name servers and listen for DNS queries on port 53 with “-listen-for-dns-query true”. I’d recommend multiple LIFs to provide fault tolerance.
  3. Add a DNS forwarding zone (subdomain in BIND, delegation or conditional forwarder in AD) on the DNS server. Use the data LIFs acting as name servers in the configuration and use the zone specified in -dns-zone.
  4. Add PTR records for the LIFs as needed.

That’s about it.
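Put into CLI terms, the cluster side of steps 1 and 2 looks roughly like this (a sketch; vs0, the LIF names and the zone name are placeholders – TR-4523 has the real walkthrough):

cluster::> network interface modify -vserver vs0 -lif data_lif1 -dns-zone storage.example.com
cluster::> network interface modify -vserver vs0 -lif data_lif2 -dns-zone storage.example.com
cluster::> network interface modify -vserver vs0 -lif data_lif1 -listen-for-dns-query true
cluster::> network interface modify -vserver vs0 -lif data_lif2 -listen-for-dns-query true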

When to use on-box DNS vs Round Robin DNS?

This is one of the trickier questions I get, because it’s ultimately due to preference.

However, there are some guidelines…

  • If the cluster is 1 or 2 nodes in size, it probably makes sense from an administration perspective to simply use round robin DNS.
  • If the cluster is larger than 2 nodes or will eventually scale out to more than 2 nodes, it probably makes sense to get the forwarding zones set up and use on-box DNS.
  • If you require data locality or plan on using features such as NFS node referrals, SMB node referrals or pNFS, then the load balance choice doesn’t matter much – the locality features will override the DNS request.

Conclusion

So there you have it – the quick and dirty rundown of using DNS load balancing for NAS connections. I’m personally a big fan of on-box DNS as a feature because of the notion of intelligent calculation of “best available” IP addresses.

If you have any questions about the feature or the new TR-4523, please comment below.

NetApp stuff you should be using: NetAppDocs


Sometimes, there are NetApp tools out there that no one really knows about – including people who work at NetApp. And it’s unfortunate, as there are some pretty great tools out there.

One tool in particular – NetAppDocs.

What is it?

NetAppDocs is:

A PowerShell module that contains a set of functions to automate the creation of NetApp® site design documentation. NetAppDocs can generate Excel, Word and PDF document types. The data contained in the output documents can be sanitized for use in sites where the data may be sensitive.

The tool/guide was written by NetApp PSC Jason Cole and can be found here (requires a NetApp login):

http://mysupport.netapp.com/tools/download/ECMP12505953DT.html?productID=62107

What can I use it for?

The intent of the NetAppDocs tool is to automate documentation based on specific storage configurations. The idea is that, while documentation tries to fit all use cases, it’s not perfect and cannot adapt to varying configurations. By using this tool, we can generate a set of docs that cover specific configurations.

Another use case that came up recently on our DLs at NetApp was documenting the default options for ONTAP in an easy-to-find, easy-to-read format. The man pages hold most of this information, but it can be time-consuming to trawl through the pages and pages of docs out there. With this tool, once a cluster is installed, simply run it and get the default option settings right off the bat.

Additionally, the data collected can be useful for support cases where ASUP data isn’t being sent to NetApp for whatever reason.

This tool works with ONTAP running in 7-Mode or clustered Data ONTAP. You can even use it in secure sites easily and sanitize the data for external consumption!

How to use it

Because this is a PowerShell tool, you’d install it on a server running PowerShell. Refer to the tool’s documentation for the minimum PowerShell version to use. In the case of NetAppDocs 3.1, the following is recommended:

  1. Microsoft Windows® 32-bit/64-bit computer
  2. Microsoft Windows PowerShell 3.0 or higher
  3. Microsoft .Net Framework 4.0 or higher
  4. NetApp Data ONTAP PowerShell Toolkit (included in the zip file or install package)
  5. NetApp Data ONTAP 7.2.x, 7.3.x, 8.0.x (7-Mode), 8.1.x, 8.2.x and 8.3.x
  6. Internal NetApp connection and SSO login required for ASUP data collection

The installation is simple: just an .msi and some mouse clicks. This essentially installs the necessary PowerShell cmdlets and scripts.

Then, follow the instructions in the guide to allow PowerShell execution and import the module.

PS C:\> Import-Module NetAppDocs

To view the HTML documentation after the tools are installed:

PS C:\> Show-NtapDocsHelp

In those docs, there are usage examples, functions and other helpful information.

You can also get help via PowerShell:

PS C:\> Get-Command -Module NetAppDocs

If you have a NetApp login, go check it out today and let them know what you think of it at ng-NetAppDocs-support@netapp.com.

Behind the Scenes: Episode 48 – ONTAP 9 Manageability


Welcome to Episode 48, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This is the final episode of ONTAP 9 month on the podcast.


This week, we talk about manageability tools with Director of Technical Marketing, Joel Kaufman (@thejoelk). We were supposed to have Vidula Aiyer on to discuss headroom, but she had some technical difficulties with Skype. Perhaps we can have her on another time…

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

The official blog is here:

http://community.netapp.com/t5/Technology/Tech-ONTAP-Podcast-Episode-48-ONTAP-9-Manageability-Tools/ba-p/121322

The podcast episode is here: