ONTAP clusters use a backend cluster network that allows multiple HA pairs to communicate, providing more scale for performance and capacity. You can nondisruptively add new nodes (and, as a result, capacity and compute) to a cluster, and data remains accessible regardless of where you connect in the cluster. You can scale up to 24 nodes for NAS-only clusters, and you can mix different HA pair types in the same cluster if you want to offer different service levels for storage (such as performance tiers, capacity tiers, and so on).
Network interfaces that serve data to clients live on physical ports on nodes, but they are floating/virtual IP addresses that can move to/from any node in the cluster. File systems for NAS are defined by Storage Virtual Machines (SVMs) and volumes, and the SVMs own the IP addresses you use to access data.
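For example, you can list an SVM's data interfaces and see which node and port each one currently lives on (the SVM name "SVM1" here is just a placeholder for your own SVM):
::*> network interface show -vserver SVM1 -fields home-node,curr-node,curr-port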
When using NAS (CIFS/SMB/NFS) for data access, you can connect to a data interface in the SVM that lives on any node in the cluster, regardless of where the data volume resides. Here's how that happens.

When you access a NAS volume on a data interface on the same node as the data volume, ONTAP can “cheat” a little and directly interact with that volume without having to do extra work.
If that data interface is on a different node than the one where the volume resides, the NAS packet gets packaged up in a proprietary protocol and shipped over the backend cluster network to the node where the volume lives. This volume/node relationship is stored in an internal database in ONTAP, so we always have a map to find volumes quickly. Once the NAS packet arrives on the destination node, it gets unpackaged, processed, and then the response to the client goes back out the way it came.
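If you want to see for yourself which node owns a given volume (and therefore which data interfaces are local to it), something along these lines works (again, the SVM name is a placeholder):
::*> volume show -vserver SVM1 -fields node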
Traversing the cluster network has a bit of a latency cost, however, as the packaging/unpackaging/traversal takes more time than a local request. This manifests as slightly lower performance for those workloads. The impact of that performance hit is negligible in most environments, but for latency-sensitive applications, there might be some noticeable performance degradation.
There are protocol features that help mitigate the remote I/O that can occur in a cluster, such as SMB node referrals and pNFS, but in scenarios where you can't use either of those (SMB node referrals didn't use Kerberos in earlier Windows versions; pNFS needs NFSv4.1 or later), you're likely going to have remote cluster traffic. As mentioned, in most cases this isn't an issue, but it can be useful to have an easy way to find out if an ONTAP cluster is doing remote/cluster traffic.
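If you do have NFSv4.1-capable clients and want to try pNFS, enabling it looks roughly like this (the SVM name is a placeholder; check the docs for your ONTAP version for the exact options):
::*> vserver nfs modify -vserver SVM1 -v4.1 enabled -v4.1-pnfs enabled
Then, on a Linux client, mount with NFSv4.1 so pNFS layouts can actually be used (volume and mount point names are examples):
# mount -t nfs -o vers=4.1 svm1-lif1:/volname /mnt/volname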
Cluster level – statistics show-periodic
To get a cluster-wide view of whether there is remote traffic on the cluster, you can use the advanced privilege command "statistics show-periodic." This command gives a wealth of information by default, such as:
- CPU average/busy
- Total ops/NFS ops/CIFS ops
- FlexCache ops
- Total data received/sent (Data and cluster network throughput)
- Data received/sent (Data throughput only)
- Cluster received/sent (Cluster throughput only)
- Cluster busy % (how busy the cluster network is)
- Disk reads/writes
- Packets sent/received
We also have options to limit the intervals, define SVMs/vservers, etc.
::*> statistics show-periodic ?
  [[-object] <text>]            *Object
  [ -instance <text> ]          *Instance
  [ -counter <text> ]           *Counter
  [ -preset <text> ]            *Preset
  [ -node <nodename> ]          *Node
  [ -vserver <vserver name> ]   *Vserver
  [ -interval <integer> ]       *Interval in Seconds (default: 2)
  [ -iterations <integer> ]     *Number of Iterations (default: 0)
  [ -summary {true|false} ]     *Print Summary (default: true)
  [ -filter <text> ]            *Filter Data
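For instance, to sample every 5 seconds for 12 iterations on a single node, you could run something like this (the node name is a placeholder):
::*> statistics show-periodic -interval 5 -iterations 12 -node cluster1-01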
But for backend cluster traffic, we only care about a few of those, so we can filter the command to show only the counters we want to view. In this case, I just want to look at the totals, the data sent/received, and the cluster busy %.
::*> statistics show-periodic -counter total-recv|total-sent|data-recv|data-sent|cluster-recv|cluster-sent|cluster-busy
When I do that, I get a cleaner, easier-to-read capture. This is what it looks like when we have remote traffic. (This is an NFSv4.1 workload without pNFS, using a mount wsize of 64K.)
cluster1: cluster.cluster: 5/11/2021 14:01:49
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
    157MB   4.85MB    148MB   3.46MB      0%   8.76MB   1.39MB
    241MB   70.2MB    197MB   4.68MB      1%   43.1MB   65.5MB
    269MB    111MB    191MB   4.41MB      4%   78.1MB    107MB
    329MB   92.5MB    196MB   4.52MB      4%    133MB   88.0MB
    357MB    117MB    246MB   5.68MB      2%    111MB    111MB
    217MB   27.1MB    197MB   4.55MB      1%   20.3MB   22.5MB
    287MB   30.4MB    258MB   5.91MB      1%   28.7MB   24.5MB
    205MB   28.1MB    176MB   4.03MB      1%   28.9MB   24.1MB
cluster1: cluster.cluster: 5/11/2021 14:01:57
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
Minimums:
    157MB   4.85MB    148MB   3.46MB      0%   8.76MB   1.39MB
Averages for 8 samples:
    258MB   60.3MB    201MB   4.66MB      1%   56.5MB   55.7MB
Maximums:
    357MB    117MB    258MB   5.91MB      4%    133MB    111MB
As we can see, there is an average of 55.7MB sent and 56.5MB received over the cluster network each second. That accounts for an average of 1% of the available cluster network bandwidth, which means we have plenty of headroom left over.
When we look at the latency for this workload (using "qos statistics latency show"), this is what we see:
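(In case it's handy, the invocation is along these lines; the iteration count here is arbitrary.)
::*> qos statistics latency show -iterations 10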
Policy Group            Latency
-------------------- ----------
-total-                364.00us
extreme-fixed          364.00us
-total-                619.00us
extreme-fixed          619.00us
-total-                490.00us
extreme-fixed          490.00us
-total-                409.00us
extreme-fixed          409.00us
-total-                422.00us
extreme-fixed          422.00us
-total-                474.00us
extreme-fixed          474.00us
-total-                412.00us
extreme-fixed          412.00us
-total-                372.00us
extreme-fixed          372.00us
-total-                475.00us
extreme-fixed          475.00us
-total-                436.00us
extreme-fixed          436.00us
-total-                474.00us
extreme-fixed          474.00us
This is what the cluster network looks like when I use pNFS for data locality:
cluster1: cluster.cluster: 5/11/2021 14:18:19
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
    208MB   6.24MB    206MB   4.76MB      0%   1.56MB   1.47MB
    214MB   5.37MB    213MB   4.85MB      0%    555KB    538KB
    214MB   6.27MB    213MB   4.80MB      0%   1.46MB   1.47MB
    219MB   5.95MB    219MB   5.40MB      0%    572KB    560KB
    318MB   8.91MB    317MB   7.44MB      0%   1.46MB   1.47MB
    203MB   5.16MB    203MB   4.62MB      0%    560KB    548KB
    205MB   6.09MB    204MB   4.64MB      0%   1.44MB   1.45MB
cluster1: cluster.cluster: 5/11/2021 14:18:26
    total    total     data     data cluster  cluster  cluster
     recv     sent     recv     sent    busy     recv     sent
 -------- -------- -------- -------- ------- -------- --------
Minimums:
    203MB   5.16MB    203MB   4.62MB      0%    555KB    538KB
Averages for 7 samples:
    226MB   6.28MB    225MB   5.22MB      0%   1.08MB   1.07MB
Maximums:
    318MB   8.91MB    317MB   7.44MB      0%   1.56MB   1.47MB
There is barely any cluster traffic other than normal cluster operations, and the "data" and "total" sent/received values are nearly identical.
And the latency was an average of about .1 ms lower (roughly 450us across the non-pNFS samples above versus roughly 330us with pNFS):
Policy Group            Latency
-------------------- ----------
-total-                323.00us
extreme-fixed          323.00us
-total-                323.00us
extreme-fixed          323.00us
-total-                325.00us
extreme-fixed          325.00us
-total-                336.00us
extreme-fixed          336.00us
-total-                325.00us
extreme-fixed          325.00us
-total-                328.00us
extreme-fixed          328.00us
-total-                334.00us
extreme-fixed          334.00us
-total-                341.00us
extreme-fixed          341.00us
-total-                336.00us
extreme-fixed          336.00us
-total-                330.00us
extreme-fixed          330.00us
Try it out and see for yourself! If you have questions or comments, enter them below.