TECH::Multiprotocol NAS, Locking and You

Bank_Vault_3D_Wallpaper-HD

One question I got today, and have gotten a few times in my years as a NAS guy, was “how does locking work when you are sharing files between CIFS/SMB and NFS?”

Essentially, “can you guarantee my data will be safe?”

Well, the first answer to that question is a question: What is a file lock?

File locks explained

Essentially, a file lock is exactly what it sounds like. We all know what locks are; we have them on our doors. They keep unwanted people and things out. A file lock is no different; it’s a way to prevent unwanted people and applications from accessing files while they are in use to prevent the “c” word – CORRUPTION!

ohfudge

File locks are always issued by the requesting client or application – a NAS will only honor or deny the lock. If the client or application does not issue a lock to the file, all bets are off. This is similar to our door locking analogy – a door will not lock unless you turn the key.

Similarly, a lock will only be safely broken or released when the application or client is done with it. If the client or application dies, the lock needs to be cleaned up. Depending on the NAS protocol/protocol version, this may or may not require manual intervention. For example, if NFSv3 locks are left over by an application crash (such as an Oracle database), then the locks must be manually cleared on the server before the application can be restarted to use the same files. This is because locking in NFSv3 is handled via the Network Lock Manager (NLM), which is an ancillary protocol to NFS and doesn’t always play well with others.

Conversely, NFSv4.x locking is integrated with the protocol and is lease-based. With leases, the locks will live for a pre-determined amount of time. If the client doesn’t renew the lock in that amount of time, the locks will expire. If the client or application crashes, the locks will release on their own after the lease period expires. If the NFS server restarts, the locks will remain intact until either a client reclaims them or the lease expires.

From RFC-5661:

Lease: A lease is an interval of time defined by the server for which the client is irrevocably granted locks. At the end of a lease period, locks may be revoked if the lease has not been 
extended. A lock must be revoked if a conflicting lock has been granted after the lease interval

A server grants a client a single lease for all state.

Simple enough. But what many people don’t know is that there are also different types of file locks.

File Locking in CIFS/SMB

Special thanks to NetApp CIFS/SMB TME Marc Waldrop (@CIFSorSMB on Twitter) for the CIFS/SMB file lock sanity check.

Locks in CIFS/SMB are done either at a share level or a file level. A share lock will dictate what level of access is allowed to the open file while the original opener of the file has the file opened. The share level lock, commonly known as share access mode, will dictate whether additional openers of the file can read or write to the file. The share access mode can lock the entire file from additional clients doing specific read or write operations. A file level lock is what most know as byte-range lock.

File locking in CIFS/SMB is done via oplocks and share locks. When a file is opened for editing, an oplock is applied and the share-level lock is modified to control what access a client or application has to the file. In some cases, classic byte-range locks are used when portions of a file need to be locked.

CIFS/SMB uses three types of “opportunistic locks,” or “oplocks.”

  1. Batch locks
    Created for batch files to help performance issues with files that are opened and closed many times.
  2. Exclusive locks
    Used when an application opens a file in “shared” mode; for example, Microsoft Office. Exclusive locks allow a client to assume they are the only ones using a file and will cache all changes locally. This is where that weird looking ~FILE.doc comes from in Word. If another application requests an exclusive lock on the file, the server will invalidate the original lock and the application will flush the cached changes to the file.
  3. Level 2 Oplocks
    These get issued when multiple clients want to access the same file. When an application gets issued a Level 2 Oplock, multiple clients can read/cache reads of a file. If any client attempts a write, that client gets issued an exclusive lock.

File Locking in NFS

Unlike CIFS/SMB, NFS doesn’t do share-level locking. It only does file locking. NFS locking comes in two flavors:

  1. Shared locks
    Shared locks can be used by multiple processes at the same time and can only be issued if there are no exclusive locks on a file. These are intended for read-only work, but can be used for writes (such as with a database). This is similar to the Level 2 Oplock in CIFS/SMB.
  2. Exclusive locks
    These operate the same as exclusive locks in CIFS/SMB – only one process can use the file when there is an exclusive lock. If any other processes have locked the file, an exclusive lock cannot be issued, unless that process was “forked.” However, unlike CIFS/SMB, there isn’t a notion of “opportunistic” locking, where a file will allow access without outside intervention.

One of the best analogies I’ve seen for this is a real-world example on stackoverflow:

I wrote this answer down because I thought this would be a fun (and fitting) analogy:

Think of a lockable object as a blackboard (lockable) in a class room containing a teacher (writer) and many students (readers).

While a teacher is writing something (exclusive lock) on the board:

1. Nobody can read it, because it’s still being written, and she’s blocking your view => If an object is exclusively
          locked, shared locks cannot be obtained.

2. Other teachers won’t come up and start writing either, or the board becomes unreadable, and confuses
students => If an object is exclusively locked, other exclusive locks cannot be obtained.

When the students are reading (shared locks) what is on the board:

1. They all can read what is on it, together => Multiple shared locks can co-exist.

2. The teacher waits for them to finish reading before she clears the board to write more => If one or more
          shared locks already exist, exclusive locks cannot be obtained.

The reason I prefer the CIFS/SMB notion of locking is that it feels a lot less messy in general and there is less overhead/management involved with the locking. This is especially true with NFSv3, where locking is not integrated into the protocol, but instead uses an ancillary process called NLM. (as mentioned above)

NFSv4.x and later realized this folly and have integrated locking into the protocol. Locking is better now in NFS, and applications are starting to adopt NFSv4.x as a standard because of the improved locking mechanisms.

Locking in multiprotocol environments

Multiprotocol in NAS simply means “ability to access the same datasets from multiple protocols.” Thus, CIFS/SMB and NFS can read and write to the same files.

This can be problematic for a few reasons:

  • CIFS/SMB and NFS permissions are not the same
    NFS, especially NFSv3, uses mode bit permissions (such as 777, 755, etc). NFSv4.x implements ACLs, which look and feel an awful lot like Windows NTFS ACLs. They’re so close, in fact, that it solves a lot of the old permissioning issues you would see in multiprotocol environments. But they don’t solve all of our problems.
  • CIFS/SMB and NFS user/group concepts are not the same
    Windows users and groups use a super long Security Identifier (SID) for unique identifiers, which is constructed by leveraging the domain SID and unique user/group Relative ID (RID). NFS users and groups use a numeric UID/GID and/or NFSv4 ID domain string. In order to get a user to leverage the correct security permission structure, a name mapping has to take place, depending on the originating request + the type of ACL on the file or folder. Unified name services, such as LDAP, are often useful in alleviating the pain of this scenario.
  • CIFS/SMB and NFS file locks are not the same
    As described above. Because of this, the underlying file system has to be able to negotiate the file locking. As it so happens, NetApp invented integrated locking in the 1990s.

So, how do NAS vendors (like NetApp) get this to work?

In ONTAP (and, by proxy, WAFL), the file system owns the locks and gets the final say as to who gets what access. When a CIFS/SMB client grants an exclusive lock to a file, a NFS client that tries to get a file lock to that same file would not be granted access until the lock has been released. The same goes for NFS clients that have an exclusive lock on a file – CIFS/SMB can’t do anything with that file until the locks is broken/released. These locks are only released when the original client releases them, either voluntarily or by lease expiration.

The general idea here is, protocols don’t matter – protect the files at all cost. If I buy a deadbolt lock, I should be able to use it on any door I choose and it should keep my house safe.

In ONTAP, you can check to see if a file has a lock via the command line or API calls.

In Data ONTAP operating in 7-Mode, the commands are:

7mode> lock status
lock status -f [file] [-p protocol] [-n]
lock status -h [host [-o owner]] [-f file] [-p protocol] [-n]
lock status -o [-f file] [-p protocol] [-n]
lock status -o owner [-f file] [-p protocol] [-n] (valid for CIFS only)
lock status -p protocol [-n]
lock status -n

In clustered Data ONTAP, the command is:

cluster::> vserver locks show ?
 [ -instance | -smb-attrs | -fields , ... ]
 { [ -vserver  ] Vserver
 [[-volume] ] Volume
 [[-lif] ] Logical Interface
 [[-path] ] Object Path
 | [ -lockid  ] } Lock UUID
 [ -protocol  ] Lock Protocol
 [ -type {byte-range|share-level|op-lock|delegation} ] Lock Type
 [ -node  ] Node Holding Lock State
 [ -lock-state  ] Lock State
 [ -bytelock-offset  ] Bytelock Starting Offset
 [ -bytelock-length  ] Number of Bytes Locked
 [ -bytelock-mandatory {true|false} ] Bytelock is Mandatory
 [ -bytelock-exclusive {true|false} ] Bytelock is Exclusive
 [ -bytelock-super {true|false} ] Bytelock is Superlock
 [ -bytelock-soft {true|false} ] Bytelock is Soft
 [ -oplock-level {exclusive|level2|batch|null|read-batch} ] Oplock Level
 [ -sharelock-mode  ] Shared Lock Access Mode
 [ -sharelock-soft {true|false} ] Shared Lock is Soft
 [ -delegation-type {read|write} ] Delegation Type
 [ -client-address  ] Client Address
 [ -smb-open-type {none|durable|persistent} ] SMB Open Type
 [ -smb-connect-state  ] SMB Connect State
 [ -smb-expiration-time  ] SMB Expiration Time (Secs)
 [ -smb-open-group-id  ] SMB Open Group ID

Testing file locking between protocols

This can be tricky. I’ve seen a lot of people try to test multiprotocol file locking in the following manner:

  • Open a file in a CIFS/SMB share with notepad in Windows.
  • Go to the NFS client. Open the file with vi.
  • Wonder why the heck writes are allowed on both. Bang head repeatedly on desk. Destroy a copier.

office-space-copier

The reason why writes are allowed on both clients in this scenario is because NEITHER APPLICATION HAS ISSUED A LOCK. File locks are the responsibility of the client and/or application, not the server. Vi and notepad don’t lock files!

Testing locks from NFS when CIFS/SMB owns the lock

The easiest way to lock a file in Windows? Microsoft Office.

When I open a MS Word file in a CIFS/SMB share on cDOT, I get some share locks and a batch oplock:

cluster::> vserver locks show -vserver NAS -volume unix 

Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/                    data1       cifs      share-level 10.228.225.120
               Sharelock Mode: read-deny_none
               Sharelock Mode: read-deny_none
                                                                     10.62.194.166
               Sharelock Mode: read-deny_none
         /unix/office.docx                               op-lock     10.62.194.166
               Oplock Level: batch
                                                         share-level 10.62.194.166
               Sharelock Mode: read_write-deny_write_delete

In some cases, I may see an exclusive oplock, especially during edits. I can generate an exclusive oplock if another CIFS/SMB client attempts to access the doc with something like WordPad and gets denied access:

Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/                    data1       cifs      share-level 10.228.225.120
               Sharelock Mode: read-deny_none
               Sharelock Mode: read-deny_none
                                                                     10.62.194.166
               Sharelock Mode: read-deny_none
         /unix/office.docx                               op-lock     10.62.194.166
               Oplock Level: exclusive
                                                         share-level 10.62.194.166
               Sharelock Mode: read_write-deny_write_delete

If I use another instance of MS Word on a separate client, I can get a level 2 oplock:

Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/                    data1       cifs      share-level 10.228.225.120
               Sharelock Mode: read-deny_none
               Sharelock Mode: read-deny_none
                                                                     10.62.194.166
               Sharelock Mode: read-deny_none
         /unix/office.docx                               op-lock     10.62.194.166
               Oplock Level: level2
                                                         share-level 10.62.194.166
               Sharelock Mode: read_write-deny_write_delete

With this Office file open, I want to test to see if my NFS client can access/write to the file. When I look at the share with “ls,” I can see that funny ~filename listed.

# ls | grep office
~$office.docx
office.docx

As I mentioned, vi is a terrible way to test locks. However, Linux clients have utilities to test file locking in NFS. In CentOS/RHEL, one utility to use is flock. With flock, I can run a command to lock a file. To be cute, I use vi. 😉

When I try to get an exclusive lock on that file, it hangs:

# flock --exclusive office.docx vi

Since I’m doing NFSv3, I can check to see if any NLM locks have been issued on my cDOT Storage Virtual Machine. I can see that I have been issued a byte-range lock, but since the command hung, I can’t do any damage to the file:

cluster::> vserver locks show -vserver NAS -volume unix -protocol nlm

Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/office.docx         data1       nlm       byte-range  10.228.225.140
                Bytelock Offset(Length): 0 (18446744073709551615)

When I try to get a shared lock to that file, it allows me in:

~ VIM - Vi IMproved 
~ 
~ version 7.2.411 
~ by Bram Moolenaar et al. 
~ Modified by <bugzilla@redhat.com> 
~ Vim is open source and freely distributable 
~ 
~ Become a registered Vim user! 
~ type :help register for information 
~ 
~ type :q to exit 
~ type :help or  for on-line help 
~ type :help version7 for version info

And my SVM grants a byte-range lock:

cluster::> vserver locks show -vserver NAS -volume unix -protocol nlm

Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/office.docx         data1       nlm       byte-range  10.228.225.140
                Bytelock Offset(Length): 0 (18446744073709551615)

But when I try to write, it fails.

E32: No file name

So, that’s good, right? I close the Word doc out and all the file locks are gone. Only share locks remain:

cluster::> vserver locks show -vserver NAS -volume unix
Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/                    data1       cifs      share-level 10.228.225.120
               Sharelock Mode: read-deny_none
               Sharelock Mode: read-deny_none
                                                                     10.62.194.166
               Sharelock Mode: read-deny_delete
3 entries were displayed.

Testing locks from CIFS/SMB when NFS owns the lock

From my NFS client, I lock one of the files in the share with an exclusive lock. In this case, I use “newfile,” because vi has no idea what to do with an Office doc.

# flock --exclusive newfile vi

And I can see the new byte-range lock on the SVM:

cluster::> vserver locks show -vserver NAS -volume unix
Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/                    data1       cifs      share-level 10.228.225.120
               Sharelock Mode: read-deny_none
               Sharelock Mode: read-deny_none
                                                                     10.62.194.166
               Sharelock Mode: read-deny_delete
unix     /unix/office.docx         data1       nlm       byte-range  10.228.225.140
                Bytelock Offset(Length): 0 (18446744073709551615)

The expected behavior here? The file will only allow reads. And I am not disappointed!

Screen Shot 2015-05-20 at 12.34.12 AM

What happens if I have a stale lock?

If my NFS client dies, for whatever reason, and that byte-range lock is still lingering, I can break it from the storage in advanced privilege (complete with scary/legit warning):

cluster::*> vserver locks break -vserver NAS -volume unix -lif data1 -path /unix/newfile

Warning: Breaking file locks can cause applications to become unsynchronized and may lead to data corruption.
Do you want to continue? {y|n}: y
1 entry was acted on.
cluster::> vserver locks show -vserver NAS -volume unix
Vserver: NAS
Volume   Object Path               LIF         Protocol  Lock Type   Client
-------- ------------------------- ----------- --------- ----------- ----------
unix     /unix/                    data1       cifs      share-level 10.228.225.120
               Sharelock Mode: read-deny_none
               Sharelock Mode: read-deny_none
                                                                     10.62.194.166
               Sharelock Mode: read-deny_delete
3 entries were displayed.

Once that happens, I can issue new locks from other clients, whether they are CIFS/SMB or NFS. Doesn’t matter – multiprotocol locking in ONTAP just works!

Where can I find out more?

For more information on NFS in clustered Data ONTAP, see TR-4067: NFS Best Practice and Implementation Guide.

For more information on assorted aspects of multiprotocol NAS access in clustered Data ONTAP, see TR-4073: Secure Unified Authentication.

Additionally, subscribe to Why is the Internet Broken and follow @NFSDudeAbides on Twitter for more NAS-related information!

3 thoughts on “TECH::Multiprotocol NAS, Locking and You

  1. Pingback: Partial Givebacks during Storage Failovers in NetApp's Clustered Data ONTAP - Datacenter Dude

Leave a comment