New/Updated NAS Technical Reports! – Spring 2020

With the COVID-19 quarantine, stay at home orders and new 1-year ONTAP release cadence, I’m finding I have a lot more spare time, which translates into time to update old, crusty technical reports!


Some of the old TRs hadn’t been updated for 3 years or so. Much of the information in those still applied, but overall, the TR either had to be retired or needed an update – if only to refresh the publish date and apply new templates.

So, first, let’s cover the grandfather TRs.

Updated TRs

TR-4073: Secure Unified Authentication

This TR was a monolith that I wrote when I first started as a TME back in 2015-ish. It covers LDAP, Kerberos and NFSv4.x for a unified security approach to NFS. The goal was to combine everything into a centralized document, but what ended up happening was I now had a TR that was 250+ pages long. Not only is that hard to read, but it’s also daunting enough to cause people not to want to read it at all. As a result, I made it a goal to break the TR up into more manageable chunks. Eventually, this TR will be deprecated in favor of newer TRs that are shorter and more specific.

TR-4616: NFS Kerberos in ONTAP

I created the NFS Kerberos TR in 2017 to focus only on Kerberos with NFS. To streamline the document, I narrowed the focus to only a set of configuration options (AD KDCs, RHEL clients, newest ONTAP version), removed extraneous details and moved examples/configuration steps to the end of the document. The end result – a 42-page document with the most important information taking up around 30 pages.

However, there hasn’t been an updated version since then. I’m currently in the process of updating that TR and was waiting on some other TRs to be completed before I finished this one. The new revision will include updated information and the page count will rise to around 60-70 pages.

TR-4067: NFS Best Practice Guide

This TR is another of the original documents I created and hasn’t been updated since 2017. It’s currently getting a major overhaul, including re-organizing the order to put the more crucial information at the start of the document and reducing the total page count by roughly 20 pages. Examples and advanced topics were moved to the back of the document, and the “meat” of the TR is going to be around 90 pages.

Major changes include:

  • New TR template
  • Performance testing for NFSv3 vs. NFSv4.x
  • New best practice recommendations
  • Security best practices
  • Multiprotocol NAS information
  • Removal of Infinite Volume section
  • NFS credential information

As part of the TR-4073 de-consolidation project, TR-4067 will cover the NFSv4.x aspects.

This TR is nearly done and is undergoing some peer review, so stay tuned!

TR-4523: DNS Load Balancing in ONTAP

This TR was created to cover the DNS load balancing approaches for NAS workloads with ONTAP. It’s pretty short – 35 pages or so – and covers on-box and off-box DNS load balancing.

It was updated in May 2020 and was basically a minor refresh.

New TR

TR-4835: How to Configure LDAP in ONTAP

The final part of the TR-4073 de-consolidation effort was creating an independent LDAP TR. Unlike the NFS Kerberos TR, I wanted this one to cover a wide array of configurations and use cases, so the total length ended up being 135 pages, but the “meat” of the document (the most pertinent information) only takes up around 87 pages.

Sections include, in order:

  • LDAP overview
  • Authentication in ONTAP
  • LDAP Components and Considerations
  • Configuration
  • Common Issues and Troubleshooting
  • Best Practices
  • Appendix/Command Examples

Feedback and comments are welcome!

Behind the Scenes: Episode 137: Name Services in ONTAP

Welcome to Episode 137, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we talk Name Services in ONTAP and the introduction of the new global name services cache in ONTAP 9.3 with NAS TME, Chris Hurley (@averageguyx)!

We’ll be taking next week off as we record and prepare for some big announcements coming soon!

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Cache Rules Everything Around Me: New Global Name Service Cache in ONTAP 9.3


In an ONTAP cluster made up of individual nodes with individual hardware resources, it’s useful if a storage administrator can manage the entire cluster as a monolithic entity, without having to worry about what lives where.

Prior to ONTAP 9.3, name service caches were node-centric, for the most part. This could sometimes create scenarios where a cache became stale on one node while it was recently populated on another node. Thus, a client might get different results depending on which physical node the network connection occurred on.

The following is pulled right out of the new name services best practices technical report (https://www.netapp.com/us/media/tr-4668.pdf), which acts as an update to TR-4379. I wrote some of this, but most of what’s written here is by the new NFS/Name Services TME, Chris Hurley (@averageguyx). This is basically a copy/paste, but I thought this was a cool enough feature to highlight on its own.

Global Name Services Cache in ONTAP 9.3

ONTAP 9.3 offers a new caching mechanism that moves name service caches out of memory and into a persistent cache that is replicated asynchronously between all nodes in the cluster. This provides more reliability and resilience in the event of failovers, as well as offering higher limits for name service entries due to being cached on disk rather than in node memory.

The name service cache is enabled by default. If legacy cache commands are attempted in ONTAP 9.3 with name service caching enabled, an error will occur, such as the following:

Error: show failed: As name service caching is enabled, "Netgroups" caches no longer exist. Use the command "vserver services name-service cache netgroups members show" (advanced privilege level) to view the corresponding name service cache entries.

The name service caches are controlled in a centralized location, under the name-service cache command set. This provides easier cache management, from configuring caches to clearing stale entries.

The global name service cache can be disabled for individual caches using vserver services name-service cache commands in advanced privilege, but it is not recommended to do so. For more detailed information, please see later sections in this document.

ONTAP also offers the additional benefit of using the caches while external name services are unavailable. If there is an entry in the cache, regardless of whether its TTL has expired, ONTAP will use that cache entry when the external name service servers cannot be reached, thereby providing continued access to data served by the SVM.

Hosts Cache

There are two individual host caches, forward-lookup and reverse-lookup, but the hosts cache settings are controlled as a whole. When a record is retrieved from DNS, the TTL of that record is used for the cache TTL; otherwise, the default TTL in the host cache settings is used (24 hours). The default for negative entries (host not found) is 60 seconds. Changing DNS settings does not affect the cache contents in any way.

  • The network ping command does not use the name services hosts cache when using a hostname.
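If you want to inspect or tune those TTLs, the hosts cache settings live under the same name-service cache command set as the caches themselves. A rough sketch follows – the command family is real in ONTAP 9.3, but the exact parameter names and value formats here are from memory, so verify them with ? on your release:

```
cluster::*> vserver services name-service cache hosts settings show -vserver SVM1

cluster::*> vserver services name-service cache hosts settings modify -vserver SVM1 -ttl 24h -negative-ttl 60s
```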

User and Group Cache

The user and group caches consist of three categories: passwd (user), group and group membership.

  • Cluster RBAC access does not use any of the caches

Passwd (User) Cache

The user cache consists of two caches, passwd and passwd-by-uid. The caches only store the name, uid and gid aspects of the user data to conserve space, since the other data, such as home directory and shell, is irrelevant for NAS access. When an entry is placed in the passwd cache, the corresponding entry is created in the passwd-by-uid cache. By the same token, when an entry is deleted from one cache, the corresponding entry is deleted from the other cache. If you have an environment with disjointed username-to-uid mappings, there is an option to disable this behavior.

Group Cache

Like the passwd cache, the group cache consists of two caches, group and group-by-gid. When an entry is placed in the group cache, the corresponding entry is created in the group-by-gid cache. By the same token, when an entry is deleted from one cache, the corresponding entry is deleted from the other cache. The full group membership is not cached to conserve space and is not necessary for NAS data access; therefore, only the group name and gid are cached. If you have an environment with disjointed group name-to-gid mappings, there is an option to disable this behavior.

Group Membership Cache

In file and NIS environments, there is no efficient way to gather the list of groups a particular user is a member of, so for these environments, ONTAP has a group membership cache. It consists of a single cache containing a list of the groups each user is a member of.

Netgroup Cache

Beginning in ONTAP 9.3, the various netgroup caches have been consolidated into two caches: netgroup.byhost and netgroup.byname. The netgroup.byhost cache is the first cache consulted for the netgroups a host is a part of. If this information is not available, the query reverts to gathering the full netgroup members and comparing that to the host. If the information is not in the cache, the same process is performed against the netgroup ns-switch sources. If a host requesting access via a netgroup is found via the netgroup membership lookup process, that ip-to-netgroup mapping is always added to the netgroup.byhost cache for faster future access. This also means the members cache needs a lower TTL, so that changes in netgroup membership are reflected in the ONTAP caches within the TTL timeframe.

Viewing cache entries

Each of the above name service caches can be viewed. This can be used to confirm whether the expected results are being returned by the name service servers. Each cache has its own individual options that you can use to filter the results of the cache to find what you are looking for. To view a cache, use the name-services cache <cache> <subcache> show command.

Caches are unique per vserver, so it is suggested to view caches on a per-vserver basis.  Below are some examples of the caches and the options.

ontap9-tme-8040::*> name-service cache hosts forward-lookup show ?
  (vserver services name-service cache hosts forward-lookup show)
  [ -instance | -fields <fieldname>, ... ]
  [ -vserver <vserver name> ]                                                   *Vserver
  [[-host] <text>]                                                              *Hostname
  [[-protocol] {Any|ICMP|TCP|UDP}]                                              *Protocol (default: *)
  [[-sock-type] {SOCK_ANY|SOCK_STREAM|SOCK_DGRAM|SOCK_RAW}]                     *Sock Type (default: *)
  [[-flags] {FLAG_NONE|AI_PASSIVE|AI_CANONNAME|AI_NUMERICHOST|AI_NUMERICSERV}]  *Flags (default: *)
  [[-family] {Any|Ipv4|Ipv6}]                                                   *Family (default: *)
  [ -canonname <text> ]                                                         *Canonical Name
  [ -ips <IP Address>, ... ]                                                    *IP Addresses
  [ -ip-protocol {Any|ICMP|TCP|UDP}, ... ]                                      *Protocol
  [ -ip-sock-type {SOCK_ANY|SOCK_STREAM|SOCK_DGRAM|SOCK_RAW}, ... ]             *Sock Type
  [ -ip-family {Any|Ipv4|Ipv6}, ... ]                                           *Family
  [ -ip-addr-length <integer>, ... ]                                            *Length
  [ -source {none|files|dns|nis|ldap|netgrp_byname} ]                           *Source of the Entry
  [ -create-time <"MM/DD/YYYY HH:MM:SS"> ]                                      *Create Time
  [ -ttl <integer> ]                                                            *DNS TTL

ontap9-tme-8040::*> name-service cache unix-user user-by-id show
  (vserver services name-service cache unix-user user-by-id show)
Vserver    UID         Name         GID            Source  Create Time
---------- ----------- ------------ -------------- ------- -----------
SVM1       0           root         1              files   1/25/2018 15:07:13
ch-svm-nfs1
           0           root         1              files   1/24/2018 21:59:47
2 entries were displayed.

If there are no entries in a particular cache, the following message will be shown:

ontap9-tme-8040::*> name-service cache netgroups members show
  (vserver services name-service cache netgroups members show)
This table is currently empty.

There you have it! New cache methodology in ONTAP 9.3. If you’re using NAS and name services in ONTAP, it’s highly recommended to go to ONTAP 9.3 to take advantage of this new feature.

Spreading the love: Load balancing NAS connections in ONTAP


I can be a little thick at times.

I’ll get asked a question a number of times, answer the question, and then forget the most important action item – document the question and answer somewhere to refer people to later, when I inevitably get asked the same question.

Some of the questions I get asked about fairly often as the NetApp NFS Technical Marketing Engineer involve DNS, which is only loosely associated with NFS. Go figure.

But, because I know enough about DNS to have written a blog post on it and a Technical Report on our Name Services Best Practices (and I actually respond to emails), I get asked.

These questions include:

  • What’s round robin DNS?
  • What other load balancing options are there?
  • What is on-box DNS in clustered Data ONTAP?
  • How do I ensure data access is local?
  • How do I set it up?
  • When would I use on-box DNS vs DNS round robin?

So, in this blog, I’ll try to answer most of those at a high level. For more detail, see the new TR-4523: DNS Load Balancing in ONTAP.

What’s round robin DNS?

Remember when you were in school and you played “duck duck goose“? If you didn’t, click the link on the term and read about it.

But essentially, the game is: everyone sits in a circle, someone walks around the circle and taps each person and says “duck” and then when they want to initiate the chase, they yell “GOOSE!” and run around the circle to sit before the person catches them.

That’s essentially round robin DNS.

You create multiple A/AAAA records, associate them with the same hostname and away you go! The DNS server will deliver a different IP address for each request of the hostname, in ABCD/ABCD fashion. No real rhyme or reason, just first come, first served.
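In BIND zone-file terms, round robin is nothing more than several A records sharing one name. A minimal sketch, with a made-up hostname and IPs for illustration:

```
; four data LIFs behind a single name - the server rotates the answer order per query
nas     IN  A   10.0.0.11
nas     IN  A   10.0.0.12
nas     IN  A   10.0.0.13
nas     IN  A   10.0.0.14
```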

What other DNS load balancing options are there?

There are third-party load-balancing appliances, such as F5 BIG-IP (not an endorsement, just an example). But those cost money and require administration.

In ONTAP, however, there is a not-so-well-known feature for DNS load balancing called “on-box DNS load balancing” that is intended to incorporate intelligent load balancing for DNS requests into a cluster.

What is on-box DNS load balancing?

On-box DNS load balancing in ONTAP uses a patented algorithm to determine the best possible data LIFs on the best possible nodes to return to clients.

Basically, it looks a bit like this:


The client will make a DNS request to the DNS servers in its configuration.

The DNS server will notice that the request is from a specific zone and use its zone forwarder to pass that request to the cluster data LIFs acting as name servers.

The cluster will leverage its DNS application process and a weight file to determine which IP addresses out of the ones configured to be used in that DNS zone should be used.

The algorithm factors in CPU utilization, throughput, etc when making the determination.

The data LIF IP address is passed back to the DNS server, then to the client.

Easy peasy.


How do I ensure data locality?

The short answer: With on-box DNS, you can’t. But does it matter?

In clustered Data ONTAP, if you have multiple nodes and multiple data LIFs, you might end up landing on a node’s data LIF that is not local to the volume being requested. That can incur a slight latency penalty as the request traverses the backend cluster network.

In a majority of cases, this penalty is negligible to clients and applications, but with latency-sensitive applications (especially in flash environments), this penalty can hurt a little. Using local network connections to data volumes for NAS uses a concept of “fast path” that bypasses things that the remote connections need to do. I cover this in a little more detail in TR-4067 and in TECH::Data LIF best practices for NAS in cDOT 8.3.

In cases where you absolutely *need* data access to be local to the node, you would need to mount those local data LIFs specifically. Create A/AAAA records with node names incorporated to help discern which LIFs are on which nodes.

But in most cases, it doesn’t hurt to have remote traffic – in my 5 years in support, I never fixed a performance issue by making data access local to the node.

How do I set it up?

It’s pretty straightforward. I cover it in detail in TR-4523: DNS Load Balancing in ONTAP. In that TR, I cover Active Directory and BIND environments.

For a simple summary:

  1. Configure data LIFs in your storage virtual machine to use -dns-zone [zone name]
  2. Select data LIFs in your storage virtual machine that will act as name servers and listen for DNS queries on port 53 with “-listen-for-dns-query true”. I’d recommend multiple LIFs to provide fault tolerance.
  3. Add a DNS forwarding zone (subdomain in BIND, delegation or conditional forwarder in AD) on the DNS server. Use the data LIFs acting as name servers in the configuration and use the zone specified in -dns-zone.
  4. Add PTR records for the LIFs as needed.
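The steps above can be sketched as follows. The ONTAP options (-dns-zone and -listen-for-dns-query) are real; the SVM name, LIF names, zone name and forwarder IPs are placeholders for your environment:

```
::> network interface modify -vserver SVM1 -lif data1 -dns-zone nas.parisi.com -listen-for-dns-query true
::> network interface modify -vserver SVM1 -lif data2 -dns-zone nas.parisi.com -listen-for-dns-query true

// then, on a BIND server, forward the zone to the LIFs acting as name servers (named.conf):
zone "nas.parisi.com" {
    type forward;
    forwarders { 10.0.0.11; 10.0.0.12; };
};
```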

That’s about it.

When to use on-box DNS vs Round Robin DNS?

This is one of the trickier questions I get, because it’s ultimately due to preference.

However, there are some guidelines…

  • If the cluster is 1 or 2 nodes in size, it probably makes sense from an administration perspective to simply use round robin DNS.
  • If the cluster is larger than 2 nodes or will eventually scale out to more than 2 nodes, it probably makes sense to get the forwarding zones set up and use on-box DNS.
  • If you require data locality or plan on using features such as NFS node referrals, SMB node referrals or pNFS, then the load balance choice doesn’t matter much – the locality features will override the DNS request.

Conclusion

So there you have it – the quick and dirty rundown of using DNS load balancing for NAS connections. I’m personally a big fan of on-box DNS as a feature because of the notion of intelligent calculation of “best available” IP addresses.

If you have any questions about the feature or the new TR-4523, please comment below.

Setting up BIND to be as insecure as possible in Centos/RHEL7

DNS, in general, should be locked down as much as possible. It’s too easy for attackers to abuse DNS in attacks like DDoS amplification unless you set up some security measures.

However, if you’re just trying to set up a simple BIND DNS server in a lab that’s not on a public network and is behind a ton of firewalls, just to test some basic functionality like I’ve been doing, you may want things to just *work* without having to set up all the extra security bells and whistles.

I’m writing this up to help people avoid the hours of head banging, Googling and debugging that always ends up in an Occam’s razor-like scenario: disable your firewall.


Before we start, I want to re-iterate something:

DO NOT CONFIGURE YOUR PRODUCTION DNS SERVERS LIKE THIS, INCLUDING DNS SERVERS YOU RUN AT YOUR HOUSE. IF YOU DO, YOU ARE ASKING FOR TROUBLE.

Now that that’s out of the way…

BIND configuration – named.conf Worst Practices

The general recommendation for securing DNS servers is to disable recursion, lock down the allowed queries, etc. Eff that. We’re going all out and allowing everything.

Here’s the named.conf file I used on my BIND server:

options {
    listen-on port 53 { any; };
    listen-on-v6 port 53 { any; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    allow-transfer { any; };
    allow-query-cache { any; };
    allow-query { any; };
    recursion yes;

    dnssec-enable no;
    dnssec-validation no;

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

Hackable as s**t. But it works, dammit.

For good measure, my zones:


zone "bind.parisi.com" IN {
    type master;
    file "bind.parisi.com.zone";
    allow-update { any; };
    allow-query { any; };
};

zone "xx.xx.xx.in-addr.arpa" IN {
    type master;
    file "xx.xx.xx.in-addr.arpa.zone";
    allow-update { any; };
    allow-query { any; };
};

Arrrgh. Firewalls!


If you’ve worked with Linux in the past 10 years, I’m sure you’ve run into the problem with Linux firewalls where you just end up turning them off. Historically, it’s been iptables and SELinux. When I was working on my environment, I was seeing the following in a packet trace when attempting remote nslookups:

ICMP 118 Destination unreachable (Host administratively prohibited)

Local worked fine. Pinging the IP worked fine. But dig?

# dig @xx.xx.xx.xx dns.bind.parisi.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @xx.xx.xx.xx dns.bind.parisi.com
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

Ping?

# ping dns.bind.parisi.com
ping: unknown host dns.bind.parisi.com

Everything I read said it was either a config or firewall issue. I had already disabled the usual suspects, SELinux and iptables. But no dice.

Finally, I remembered that Centos/RHEL7 is pretty different from previous versions. So I Googled “centos7 security features” and found my answer: THEY ADDED A NEW &*@$ FIREWALL.

Introducing your newest Linux security nemesis…

Firewalld.

Now, I fully understand the need for new security enhancements. And you should totally leave this alone in production environments. But, like the Windows Firewall, it’s the bane of a lab machine’s existence. So, I disabled it.

# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
 Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
 Active: active (running) since Thu 2016-06-23 14:57:47 EDT; 6h ago
 Main PID: 670 (firewalld)
 CGroup: /system.slice/firewalld.service
 └─670 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid

Jun 23 14:57:21 dns.bind.parisi.com systemd[1]: Starting firewalld - dynamic firewall daemon...
Jun 23 14:57:47 dns.bind.parisi.com systemd[1]: Started firewalld - dynamic firewall daemon.

# systemctl stop firewalld


# dig @xx.xx.xx.xx dns.bind.parisi.com

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @xx.xx.xx.xx dns.bind.parisi.com
; (1 server found)
;; global options: +cmd
;; Got answer:
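Stopping firewalld only lasts until the next reboot, so for a lab box you’ll want to disable it too. And if you’d rather keep the firewall up and just let DNS through, firewall-cmd ships with a predefined dns service for exactly that (these are stock RHEL7/CentOS7 commands):

```
# the lab-box option: turn firewalld off for good
systemctl stop firewalld
systemctl disable firewalld

# the less-reckless option: keep firewalld, allow port 53
firewall-cmd --permanent --add-service=dns
firewall-cmd --reload
```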

Now, on to fight with BIND some more. Stay tuned for news on TR updates featuring BIND configuration with on-box DNS in ONTAP!

TECH::TR-4379 Name Services Best Practices in clustered Data ONTAP updated for 8.3.1!

It’s time for new technical report updates!

Since clustered Data ONTAP 8.3.1 is now available, we are publishing our 8.3.1 updates to our docs.


TR-4379: Name Services Best Practices covers a wide range of considerations when using external name services like LDAP, DNS and NIS with your clustered Data ONTAP storage system. External name services are critical to NAS environments, as they help control identity management, Kerberos authentication, hostname resolution, netgroups and export policy rule access.

What’s new in TR-4379?

  • Dynamic DNS support information for 8.3.1
  • Clarification and updates on existing best practices
  • Improved information on name server best practices
  • Upgrade considerations

Where can I find it?

Technical reports can be found in a variety of ways. Google search works, as does looking in the NetApp library. I cover how to be better at NetApp documentation in a separate blog post.

To make it super easy, just follow this link:

TR-4379: Name Services Best Practices

TECH::There’s no place like 127.0.0.1. (But for everywhere else, use DNS.)

One of my favorite IT jokes is “there’s no place like 127.0.0.1.” You can get this slogan emblazoned on t-shirts, welcome mats, etc.

127.0.0.1 is, of course, localhost or the loopback address. Every device on a network has one. However, for addresses that need to be resolvable outside of the internal subsystem, we need MAC addresses, IP addresses and in most cases, routing and DNS. Think of it this way – 127.0.0.1 is your bedroom door. That doesn’t help people find your house when you invite them over, however.

Guess who’s coming to dinner?

When you have people over, you need to give them information to get them to your house. In today’s age, that’s as easy as telling someone a street number and name that they can plug into a GPS or Google maps. No more having to give step-by-step directions!

But even giving that much information can be too much, especially if that person comes over a lot (but has a terrible memory). So, in those cases, an address can be saved as a shortcut in a map app or GPS with an alias such as “Justin’s house.”

This is not unlike how MAC and IP addresses work. A MAC address is the physical pavement of the road. An IP is the street number and name. The aliased shortcut? That’s the hostname.

The hostname can be served locally via a flat file, or in a database like DNS, LDAP or even NIS. Then clients and servers can query the common database for the information and use that information to find their way around the IT village.

This may all seem rudimentary to you; that’s because it is. 🙂

But you would be surprised how often DNS/hostname resolution comes up in support cases, configuration issues, etc. The reason for that is two-fold.

1) People do not fully understand DNS/hostname resolution

2) People take DNS/hostname resolution for granted

What is DNS?

To cover #1, let’s talk about DNS and what it is/does.

DNS is short for Domain Name System. It’s a centralized database that contains hostnames, IP addresses, service records, aliases, zones… all sorts of things that allow enterprise IT environments to leverage it for day-to-day operations. By default, DNS is included in Active Directory domain deployments. It has to be – otherwise, AD would not function very well/at all. If you want to read more about that, see the following:

How DNS support for Active Directory works

Active Directory-Integrated DNS

Configure a DNS server for use with Active Directory

However, DNS isn’t just used for Active Directory and isn’t isolated to only Windows environments. DNS has been around for a long time and is critical in numerous widely used IT services, including:

  • NAS (NFS and SMB)
  • Kerberos
  • Microsoft Exchange
  • LDAP
  • Various other 3rd party applications

The above list is by no means complete, but gives a general idea of how integral DNS is to day to day IT shops.

What is so difficult about DNS?

DNS is not extremely complicated. However, there are general high-level concepts that get mistaken from time to time.

Servers

DNS servers themselves are concepts that can get lost on people. These contain the records, zones, etc. They also may replicate across the network to other DNS servers. They require specific functionality, such as being able to listen for DNS requests on port 53, caching requests, acting as authoritative servers (SOA) for DNS updates, etc.

Records

This is one thing that trips a lot of people up, mainly because there are many different types of records. Some of the main/common ones include:

  • A/AAAA records (for IPv4/IPv6 addresses)
  • CNAMEs (aliases)
  • MX (mail exchange)
  • NS (name server)
  • PTR records (pointer/reverse lookup)
  • SOA (start of authoritative zone)
  • SRV (service records such as LDAP, Kerberos KDC, etc)

Zones

Zones are used to direct requests from clients to their appropriate locations and/or forward them to other name servers. For example, dns.windows.com might be the name of the Active Directory domain, but you might also have DNS zones in other locations that exist on other name servers. If so, you could add a zone (such as bind.linux.com) and add NS records to forward requests on to the appropriate name servers running BIND. This allows for improved performance of lookups, as well as scalable DNS environments.
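In zone-file terms, that hand-off is just NS records in the parent zone pointing at the other name servers, plus glue A records for those servers. A generic sketch (the names and IP are illustrative, not from a real deployment):

```
; in the parent zone: delegate the subdomain to a BIND server
bind.linux.com.     IN  NS  ns1.bind.linux.com.
ns1.bind.linux.com. IN  A   192.168.1.53
```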

NetApp’s clustered Data ONTAP actually allows storage admins to configure individual data LIFs as name servers to act as DNS zones in a Storage Virtual Machine. This comes in handy for intelligent DNS load balancing in clusters and is covered in TR-4073: Secure Unified Authentication on page 27.

Whither DNS?

There is plenty more to DNS than the above. However, if you already know and understand DNS, you can see why it’s easy to overlook it and take it for granted. When configured properly, it just works. It’s not fancy. It’s generally robust and resilient. And with DDNS, you don’t even have to go in and add records to existing DNS servers. Clients do it for you. So when a problem *does* occur, it becomes a “forest for the trees” problem where DNS is one of the last places many admins look. This is a mistake – DNS should be one of the first things checked off the list as “not a problem” when troubleshooting, as it’s so important to so many things in IT.

Best Practices

Most DNS servers out there have documented best practices, and any best practice for a DNS server should come from a vendor. However, there are universal best practices that are pretty much no-brainers when it comes to managing DNS.

  • Use multiple DNS servers: This provides redundancy, eliminates single points of failures, allows load balancing, etc.
  • If using multiple DNS servers, ensure they are all in sync: Replicate all the zones and records on a regular interval. Check error logs to ensure that replication is occurring normally and without error.
  • Be thorough in hostname record creation: Don’t just add a forward lookup record. Add the PTR, too. And don’t create a CNAME unless you have an A/AAAA and PTR record to point it to.
  • Make sure your clients are configured to use the correct DNS servers and zones
  • Avoid using local hosts files if possible: Everyone forgets to update those things. And imagine having to update 1000s of files every time an IP address or hostname changes….
  • Ensure proper service records (SRV) are in place for services.
  • Review the vendor recommendation for enabling recursion. Some vendors want it disabled.
  • Know your DNS port number (53) by heart. This will save you troubleshooting headaches.
  • Learn to love packet traces for troubleshooting, as well as ping, nslookup and dig. Just be careful with ping. General rule of thumb is, if you can ping the IP but not the hostname, check DNS.
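That last rule of thumb is easy to wrap in a tiny helper. getent is standard on Linux and walks the same resolver path (nsswitch.conf hosts line) that most applications use; the hostnames passed in at the bottom are just examples:

```shell
#!/bin/sh
# Quick triage: "can ping the IP but not the hostname" usually means DNS.
check_dns() {
    host="$1"
    # getent hosts prints "ADDRESS NAME ..." on success, nothing on failure
    addr=$(getent hosts "$host" | awk '{print $1; exit}')
    if [ -n "$addr" ]; then
        echo "$host resolves to $addr"
    else
        echo "$host does not resolve - check /etc/resolv.conf and your DNS servers"
    fi
}

check_dns localhost
check_dns this-host-should-not-exist.invalid
```

The .invalid TLD is reserved (RFC 2606) and guaranteed never to resolve, which makes it a handy negative test case.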

There are tons of other best practices out there, including this Cisco doc, this Microsoft doc and this Wikia article. For Name Services Best Practices related to NetApp’s clustered Data ONTAP, see the new TR I wrote on the subject (TR-4379).