What’s the deal with remote I/O in ONTAP?


I’m sure most of you have seen Seinfeld, so be sure to read the title in your head as if Seinfeld is delivering it.

I used a comedian as a starter because this post is about a question I get asked – a lot – one that has become something of a running joke by now.

The set up…

When Clustered Data ONTAP first came out, there was a pretty big kerfuffle (love that word) about the architecture of the OS. After all, wasn’t it just a bunch of 7-Mode systems stitched together with duct tape?

Actually, no.

It’s a complete re-write of the ONTAP operating system, for one. The NAS stack from 7-Mode was gutted and became a new architecture built for clustering.

Then, in 8.1, the SAN concepts in 7-Mode were re-done for clustering.

So, while a clustered Data ONTAP cluster is, at the hardware level, a series of HA pairs stitched together with a 10Gb Ethernet network, the operating system has been turned into essentially what I like to call a storage blade center. Your storage systems span clusters of up to 24 physical hardware nodes, effectively abstracting away the hardware and allowing a single management plane for the entire subsystem.

Every node in a cluster is aware of every other node, as well as every other storage object. If a volume lives on node 1, then node 20 knows about it and where it lives via the concept of a replicated database (RDB).

Additionally, the cluster has a clustered networking stack, where an IP address or WWPN is presented via a logical interface (a LIF). While SAN LIFs have to stay put and leverage host-side pathing for data locality, NAS LIFs have the ability to migrate across any node and any port in the cluster.

However, volumes are still located on physical disks and owned by physical nodes, even though you can move them around via volume move or vol rehost. LIFs are still located on physical ports and nodes, even though you can move them around and load balance connections on them. This raises the question…

What is the deal with remote I/O in ONTAP?

Since you can have multiple nodes in a cluster and a volume can only exist on one node (well, unless you want to check out FlexGroups), and since data LIFs live on single or aggregated ports on a single node, you are bound to run into scenarios where data operations traverse the backend cluster network. The alternatives are to take on the headache of ensuring every client mounts a specific IP address for data locality, or to leverage one of the NAS data locality features, such as pNFS or node referrals on initial connection (available for NFSv4.x and CIFS/SMB). I cover some of the NFS-related data locality features in TR-4067, and CIFS autolocation is covered in TR-4191.
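To make the pNFS option concrete, here's a rough sketch of what that looks like on each side. The SVM name, LIF address, export path, and mount point below are all made-up placeholders for illustration:

```shell
# On the ONTAP cluster: enable NFSv4.1 and pNFS for the SVM
# (the SVM name "svm1" is hypothetical)
vserver nfs modify -vserver svm1 -v4.1 enabled -v4.1-pnfs enabled

# On a Linux client: mount with NFSv4.1 so pNFS layouts can
# steer I/O to the node that actually owns the volume, regardless
# of which LIF the client originally mounted
mount -t nfs -o vers=4.1 10.0.0.10:/vol1 /mnt/vol1

# Verify the negotiated NFS version and mount options
nfsstat -m
```

The nice part of pNFS is that the redirection follows the volume: if the volume moves to another node, the layout updates and the client follows, with no remount needed.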

In SAN, we have ALUA to manage that locality (or optimized paths), but even that extra layer of protection can't avoid scenarios where interfaces go down or volumes move around after a TCP connection has been established.
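You can watch ALUA doing this work from the host side. A quick sketch on a Linux host running dm-multipath (device names and output will differ in your environment):

```shell
# Show the multipath topology; ONTAP advertises paths to the
# owning node as "active/optimized" and paths through other
# nodes as non-optimized, and the host prefers the optimized set
multipath -ll

# With the NetApp Host Utilities installed, sanlun shows the
# same per-path state (this assumes the Host Utilities package)
sanlun lun show -p
```

If a volume moves to another node, ALUA simply flips which paths report as optimized, and the host's multipath layer shifts I/O accordingly.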

That backend network? Why, it's a dedicated 10Gb Ethernet network with 2-4 dedicated ports per node. No other data is allowed on the network other than cluster operations. Data I/O traverses the network in a proprietary protocol known as SpinNP, which leverages TCP to guarantee the arrival of packets. And, with the advent of 40Gb Ethernet and other speedier methods of data transfer, I'd be shocked if we didn't see that backend network improve over the next 5-10 years. The types of operations that traverse the cluster network include:

  • SpinNP for data/local SnapMirror
  • ZAPI calls

That’s pretty much it. It’s a beefy, robust backend network that is *extremely* hard to saturate. You’re more likely to bottleneck somewhere else (like your client) before you overload a cluster network.

So now that we’ve established that remote I/O will likely happen, let’s talk about if that matters…

The punchline


Remote I/O absolutely adds overhead to operations; there's no way around saying it, and suggesting there is no penalty would be dishonest. The size of the penalty, however, varies depending on the protocol. This is especially true when you consider that NAS operations will leverage a fast path when you localize data.

But the question wasn’t “is there a penalty?” The question is “does it matter?”

I’ll answer with some anecdotal evidence – I spent 5 years in support, working escalations for clustered Data ONTAP for 3 of those years. I closed thousands of cases over that time period, and in all of them I *never* fixed a performance issue by making sure a customer used a local data path. And believe me, it wasn’t for lack of effort. I *wanted* remote traffic to be the root cause, because that was the easy answer.

Sure, locality can help when dealing with really latency-sensitive applications, such as Oracle. But in those cases, you architect the solution with data locality in mind. In the vast majority of other scenarios, the “remote I/O” penalty is pretty much irrelevant and causes more hand wringing than necessary.

The design of clustered Data ONTAP was intended to help storage administrators stop worrying about the layout of the data. Let’s start allowing it to do its job!
