Spreading the love: Load balancing NAS connections in ONTAP


I can be a little thick at times.

I’ll get asked a question a number of times, answer the question, and then forget the most important action item – document the question and answer somewhere to refer people to later, when I inevitably get asked the same question.

Some of the questions I get asked about fairly often as the NetApp NFS Technical Marketing Engineer involve DNS, which is only loosely associated with NFS. Go figure.

But, because I know enough about DNS to have written a blog post on it and a Technical Report on our Name Services Best Practices (and I actually respond to emails), I get asked.

These questions include:

  • What’s round robin DNS?
  • What other load balancing options are there?
  • What is on-box DNS in clustered Data ONTAP?
  • How do I ensure data access is local?
  • How do I set it up?
  • When would I use on-box DNS vs DNS round robin?

So, in this blog, I’ll try to answer most of those at a high level. For more detail, see the new TR-4523: DNS Load Balancing in ONTAP.

What’s round robin DNS?

Remember when you were in school and you played “duck duck goose”? If you didn’t, it’s worth looking up.

But essentially, the game is: everyone sits in a circle, someone walks around the circle and taps each person and says “duck” and then when they want to initiate the chase, they yell “GOOSE!” and run around the circle to sit before the person catches them.

That’s essentially round robin DNS.

You create multiple A/AAAA records, associate them with the same hostname, and away you go! The DNS server hands out a different IP address for each request for that hostname, rotating in ABCD/ABCD fashion. No real rhyme or reason, just first come, first served.
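To make the rotation concrete, here is a minimal Python sketch of the idea. It is a toy model, not how any particular DNS server is implemented: real servers usually return the whole record set and rotate its order, and this only models the first address a client would pick. The hostname and addresses are placeholders.

```python
from itertools import cycle

# Hypothetical A records for one hostname (192.0.2.0/24 is a documentation range).
A_RECORDS = ["192.0.2.11", "192.0.2.12", "192.0.2.13", "192.0.2.14"]


def round_robin_resolver(addresses):
    """Return a function that hands out addresses in strict rotation."""
    rotation = cycle(addresses)

    def resolve(hostname):
        # The hostname is ignored here; a real server would look up its record set.
        return next(rotation)

    return resolve


resolve = round_robin_resolver(A_RECORDS)
answers = [resolve("nas.example.com") for _ in range(8)]
print(answers)
```

Note that nothing in the rotation looks at how busy each address is, which is the gap the next sections are about.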

What other DNS load balancing options are there?

There are third-party load balancing appliances, such as F5 BIG-IP (not an endorsement, just an example). But those cost money and require administration.

In ONTAP, however, there is a not-so-well-known feature for DNS load balancing called “on-box DNS load balancing” that is intended to incorporate intelligent load balancing for DNS requests into a cluster.

What is on-box DNS load balancing?

On-box DNS load balancing in ONTAP uses a patented algorithm to determine the best possible data LIFs on the best possible nodes to return to clients.

Basically, it looks a bit like this:


The client will make a DNS request to the DNS servers in its configuration.

The DNS server will notice that the request is for a name in a specific zone and use its zone forwarder to pass that request to the cluster data LIFs acting as name servers.

The cluster will leverage its DNS application process and a weight file to determine which IP addresses out of the ones configured to be used in that DNS zone should be used.

The algorithm factors in CPU utilization, throughput and other load metrics when making the determination.

The data LIF IP address is passed back to the DNS server, then to the client.

Easy peasy.
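The actual algorithm is patented and not spelled out here, so the following Python sketch is only a toy illustration of what "weighting by load" can mean. Every name, field and the 50/50 weighting are assumptions made up for the example; this is not how ONTAP computes its weights.

```python
from dataclasses import dataclass


@dataclass
class LifStats:
    # All fields are hypothetical inputs chosen for illustration.
    address: str
    cpu_utilization: float   # 0.0 (idle) to 1.0 (saturated) for the owning node
    throughput_ratio: float  # 0.0 to 1.0, share of the port's capacity in use


def weight(stats: LifStats) -> float:
    """Score a LIF by remaining headroom; higher means less loaded."""
    cpu_headroom = 1.0 - stats.cpu_utilization
    net_headroom = 1.0 - stats.throughput_ratio
    # Equal weighting of the two factors is an assumption of this sketch.
    return 0.5 * cpu_headroom + 0.5 * net_headroom


def pick_lif(candidates: list[LifStats]) -> str:
    """Return the address of the least-loaded LIF in the zone."""
    return max(candidates, key=weight).address


zone_lifs = [
    LifStats("192.0.2.11", cpu_utilization=0.80, throughput_ratio=0.60),
    LifStats("192.0.2.12", cpu_utilization=0.20, throughput_ratio=0.30),
    LifStats("192.0.2.13", cpu_utilization=0.50, throughput_ratio=0.10),
]
print(pick_lif(zone_lifs))
```

The contrast with round robin is the point: the answer depends on current load rather than on whose turn it is.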


How do I ensure data locality?

The short answer: With on-box DNS, you can’t. But does it matter?

In clustered Data ONTAP, if you have multiple nodes and multiple data LIFs, you might end up landing on a node’s data LIF that is not local to the volume being requested. That can incur a slight latency penalty as the request traverses the backend cluster network.

In a majority of cases, this penalty is negligible to clients and applications, but with latency-sensitive applications (especially in flash environments), this penalty can hurt a little. Local NAS connections to data volumes use a “fast path” that bypasses work that remote connections have to do. I cover this in a little more detail in TR-4067 and in TECH::Data LIF best practices for NAS in cDOT 8.3.

In cases where you absolutely *need* data access to be local to the node, you would need to mount those local data LIFs specifically. Create A/AAAA records with node names incorporated to help discern which LIFs are on which nodes.
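As a rough illustration of that naming convention, this small Python sketch builds one zone-file style A record per node-local LIF. The prefix, domain, node names and addresses are all placeholders; adapt the format to whatever your DNS server expects.

```python
# Hypothetical node-to-LIF mapping; names and addresses are placeholders.
NODE_LIFS = {
    "node1": "192.0.2.11",
    "node2": "192.0.2.12",
}


def node_records(prefix, zone, node_lifs):
    """Build one zone-file style A record line per node-local data LIF."""
    return [
        f"{prefix}-{node}.{zone}. IN A {address}"
        for node, address in sorted(node_lifs.items())
    ]


for line in node_records("nas", "example.com", NODE_LIFS):
    print(line)
```

A client that needs local access to a volume on node1 would then mount through the node1 name rather than the load-balanced one.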

But in most cases, it doesn’t hurt to have remote traffic – in my 5 years in support, I never fixed a performance issue by making data access local to the node.

How do I set it up?

It’s pretty straightforward. I cover it in detail in TR-4523: DNS Load Balancing in ONTAP. In that TR, I cover Active Directory and BIND environments.

For a simple summary:

  1. Configure data LIFs in your storage virtual machine to use -dns-zone [zone name]
  2. Select data LIFs in your storage virtual machine that will act as name servers and listen for DNS queries on port 53 with “-listen-for-dns-query true”. I’d recommend multiple LIFs to provide fault tolerance.
  3. Add a DNS forwarding zone (subdomain in BIND, delegation or conditional forwarder in AD) on the DNS server. Use the data LIFs acting as name servers in the configuration and use the zone specified in -dns-zone.
  4. Add PTR records for the LIFs as needed.
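For steps 1 and 2, the options are set per LIF, so on a larger SVM it can help to generate the commands. The Python sketch below just builds command strings using the two options named above with `network interface modify`; the SVM, LIF and zone names are hypothetical, and you should check the exact syntax against the documentation for your ONTAP release before running anything.

```python
def lif_dns_commands(vserver, zone, lifs, listeners):
    """Build one 'network interface modify' command per data LIF.

    lifs: all data LIF names that should answer for the zone (step 1).
    listeners: the subset that should also act as name servers (step 2).
    """
    commands = []
    for lif in lifs:
        listen = "true" if lif in listeners else "false"
        commands.append(
            f"network interface modify -vserver {vserver} -lif {lif} "
            f"-dns-zone {zone} -listen-for-dns-query {listen}"
        )
    return commands


# Hypothetical SVM with four data LIFs, two of which listen for DNS queries.
for command in lif_dns_commands(
    vserver="svm1",
    zone="nas.example.com",
    lifs=["data1", "data2", "data3", "data4"],
    listeners={"data1", "data3"},
):
    print(command)
```

Picking listeners on different nodes is one way to get the fault tolerance mentioned in step 2.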

That’s about it.
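Once the zone is in place, a quick sanity check is to resolve the name a number of times from a client and see which addresses come back. This Python sketch does that with the standard `socket.gethostbyname` call; the hostname is a placeholder, and the demo at the bottom uses a stand-in resolver so it runs without a network.

```python
import socket
from collections import Counter
from itertools import cycle


def tally_answers(hostname, attempts=20, resolver=socket.gethostbyname):
    """Resolve hostname repeatedly and count how often each address comes back."""
    counts = Counter()
    for _ in range(attempts):
        counts[resolver(hostname)] += 1
    return counts


if __name__ == "__main__":
    # Offline demo with a stand-in resolver; omit resolver= to query real DNS.
    fake = cycle(["192.0.2.11", "192.0.2.12"])
    counts = tally_answers("nas.example.com", attempts=6, resolver=lambda _: next(fake))
    for address, hits in counts.most_common():
        print(f"{address}: {hits}")
```

Keep in mind that caching on the client or on an intermediate DNS server can make every lookup return the same address, so a lopsided tally does not by itself mean the load balancing is broken.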

When to use on-box DNS vs Round Robin DNS?

This is one of the trickier questions I get, because it ultimately comes down to preference.

However, there are some guidelines…

  • If the cluster is 1 or 2 nodes in size, it probably makes sense from an administration perspective to simply use round robin DNS.
  • If the cluster is larger than 2 nodes or will eventually scale out to more than 2 nodes, it probably makes sense to get the forwarding zones set up and use on-box DNS.
  • If you require data locality or plan on using features such as NFS node referrals, SMB node referrals or pNFS, then the load balancing choice doesn’t matter much – the locality features will override the DNS request.

Conclusion

So there you have it – the quick and dirty rundown of using DNS load balancing for NAS connections. I’m personally a big fan of on-box DNS as a feature because of the notion of intelligent calculation of “best available” IP addresses.

If you have any questions about the feature or the new TR-4523, please comment below.


3 thoughts on “Spreading the love: Load balancing NAS connections in ONTAP”

  1. Pingback: What’s the deal with remote I/O in ONTAP? | Why Is The Internet Broken?

  2. Interesting read. I need to solve an issue where we have no DNS and we have to cater for LIF failures across 4 nodes. NFSv3 only. Hosts file won’t cut it. Maybe playing around with mounting all 4 IPs and combining with symbolic links < link to 1 mount point. But can we use on-box DNS on its own for this?
