I’ve been the NFS TME at NetApp for 3 years now.
I also cover name services (LDAP, NIS, DNS, etc.) and occasionally answer the stray CIFS/SMB question. I look at NAS as a data utility, not unlike water or electricity in your home. You need it, you love it, but you don’t really think about it too much and it doesn’t really excite you.
However, once I heard that NetApp was creating a brand new distributed file system that could evolve how NAS works, I jumped at the opportunity to be a TME for it. So, now, I am the Technical Marketing Engineer for NFS, Name Services and NetApp FlexGroup (and sometimes CIFS/SMB). How’s that for a job title?
We covered NetApp FlexGroup in the NetApp Tech ONTAP Podcast the week of June 30, but I wanted to write up a blog post to expand upon the topic a little more.
Now that ONTAP 9.1 is available, it was time to update the blog here.
For the official Technical Report, check out TR-4557 – NetApp FlexGroup Technical Overview.
Also, stay tuned for videos and a best practice guide!
This is one I did at Insight.
Data is growing.
It’s no secret… we’re leaving (some may say, have left) the days behind where 100TB in a single volume is enough to accommodate a file system. Files are getting larger and datasets are growing. For instance, think about the sheer amount of data needed to keep something like a photo or video service running. Or a global GPS dataset. Or EDA environments. Or seismic data analysis for oil and gas exploration.
Environments like these require massive amounts of capacity, and billions of files in some cases. With scale-out NAS storage devices being the best way to approach these use cases, it’s important to be able to scale the existing architecture in a simple and efficient manner.
For a while, storage systems like ONTAP had a single construct to handle these workloads – the Flexible Volume (or, FlexVol).
FlexVols are great, but…
For most use cases, FlexVols are perfect. They are large enough (up to 100TB) and can handle enough files (up to 2 billion). For NAS workloads, they can do just about anything. Where you start to see issues with a FlexVol is when the number of metadata operations in a file system increases. The FlexVol serializes these operations and won’t use all possible CPU threads for them. Additionally, because a FlexVol is tied directly to a physical aggregate and node, your NAS operations are also tied to that single aggregate or node. If you have a 10-node cluster, each node with multiple aggregates, you might not be getting the most bang for your buck.
That’s where NetApp FlexGroup comes in.
FlexGroup has been designed to solve multiple issues in large-scale NAS workloads.
- Capacity – Scales up to 20 petabytes
- High file counts – Up to 400 billion files
- Performance – Parallelized operations in NAS workloads, across CPUs, nodes, aggregates and constituent FlexVols
- Simplicity of deployment – Simple to use GUI in System Manager; avoid having to use junction paths to get larger than 100TB capacity
- Load balancing – Use all of your storage resources for a dataset
- Resiliency – Fix metadata errors in real time without taking downtime
With FlexGroup, NAS workloads can now take advantage of every resource available in a cluster. Even if you are using a single node cluster, a FlexGroup can balance workloads across multiple FlexVol constituents and aggregates.
How does a FlexGroup work at a high level?
FlexGroup essentially takes the already awesome concept of a FlexVol and simply enhances it by stitching together multiple FlexVol member constituents into a single namespace that acts like a single FlexVol to clients and storage administrators.
Roughly speaking, a FlexGroup looks like a collection of member FlexVols spread across the nodes and aggregates of the cluster. To a NAS client, it looks like one big volume.
Files are written to individual FlexVol constituents across the FlexGroup; files are not striped. The amount of concurrency you see in a FlexGroup depends on the number of constituents used. Right now, the maximum number of constituents for a FlexGroup is 200. Since the max volume size is 100TB and the max file count per volume is 2 billion, that’s where we get our “20PB, 400 billion files” number. Keep in mind that those limits are simply the tested limits – theoretically, the limits could climb much higher. #math
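The #math works out in a few lines. This little sketch just multiplies the tested per-FlexVol limits from above; the constants are the ones stated in this post, nothing more:

```python
# Back-of-the-envelope math for the tested FlexGroup limits.
# Constants come straight from the post: 200 member constituents max,
# 100 TB and 2 billion files per FlexVol member.
MAX_MEMBERS = 200
TB_PER_MEMBER = 100
FILES_PER_MEMBER = 2_000_000_000

max_capacity_pb = MAX_MEMBERS * TB_PER_MEMBER / 1000      # 1000 TB = 1 PB
max_files_billions = MAX_MEMBERS * FILES_PER_MEMBER / 1e9

print(f"{max_capacity_pb:.0f} PB, {max_files_billions:.0f} billion files")
# -> 20 PB, 400 billion files
```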
When a client creates a file in a FlexGroup, ONTAP decides which member constituent is the best possible container for that write, based on factors such as capacity across members, throughput, last-accessed times and node busy-ness – basically, doing all the hard work for you. The idea is to keep the members as balanced as possible without hurting performance predictability.
Creates can arrive on any node in the cluster. Once the request arrives, if ONTAP chooses a member volume different from the one where the request landed, a hardlink is created (remote or local, depending on the request) and the create is passed on to the designated member volume.
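To make the placement idea concrete, here’s a purely hypothetical sketch of a capacity-balancing picker. ONTAP’s real heuristic is internal and weighs more signals (throughput, recency, node busy-ness); this only models the “favor the least-full member” intuition, and the `Member` class and `pick_member` function are my own illustrative names, not ONTAP APIs:

```python
# Hypothetical sketch of FlexGroup-style member selection for a new file.
# Only the capacity dimension is modeled here; the real system considers
# several signals at once.
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    used_tb: float
    size_tb: float

    @property
    def fullness(self) -> float:
        # Fraction of the member's capacity already consumed.
        return self.used_tb / self.size_tb

def pick_member(members: list[Member]) -> Member:
    # Favor the least-full member so capacity stays balanced.
    return min(members, key=lambda m: m.fullness)

members = [Member("member_0001", 40, 100),
           Member("member_0002", 25, 100),
           Member("member_0003", 60, 100)]
print(pick_member(members).name)  # -> member_0002
```

Each new create lands on whichever member the heuristic picks, which is how the workload ends up spread across nodes and aggregates without the admin planning any of it.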
Reads and writes after a file is created will operate much like they already do in ONTAP FlexVols now; the system will tell the client where the file location is and point that client to that particular member volume.
Why is this better?
When NAS operations can be allocated across multiple FlexVols, we don’t run into the issue of serialization in the system. Instead, we start spreading the workload across multiple file systems (FlexVols) joined together (the FlexGroup). And unlike Infinite Volumes, there is no concept of a single FlexVol to handle metadata operations – every member volume in a FlexGroup is eligible to process metadata operations.
That way, a client can access a persistent mount point that shows gobs of available space without having to traverse different file systems like you’d have to do with FlexVols.
It’s been tribal knowledge for a while now to create multiple FlexVols in large NAS environments to parallelize operations, but we still had the 100TB limit and the notion of file systems changing when you traversed volumes junctioned to other volumes. Plus, storage administrators faced a ton of work trying to figure out how best to lay out the data for the best performance results.
Now, with NetApp FlexGroup, all of that architecture is done for you without needing to spend weeks architecting the layout.
What kind of performance boost are we potentially seeing?
In preliminary testing of a FlexGroup against a single FlexVol, we’ve seen up to 6x the performance – and that was with simple spinning SAS disk. This was the setup used:
- Single FAS8080 node
- SAS drives
- 16 FlexVol member constituents
- 2 aggregates
- 8 members per aggregate
The workload used to test the FlexGroup was a software build using Git. In the graph below, we can see that operations such as checkout and clone show the biggest performance boosts, as they take far less time to complete on a FlexGroup than on a single FlexVol.
Adding more nodes and members can improve performance, and adding AFF into the mix can help latency. Here’s a similar test comparison with an AFF system. This test also used Git, but compiled gcc instead of the Linux source code to give us more files.
In this case, we see similar performance between a single FlexVol and FlexGroup. We do see slightly better performance with multiple FlexVols (junctioned), but doing that creates complexity and doesn’t offer a true single namespace of >100TB.
This section was added after the blog post was published, as per one of the blog comments. I simply forgot to mention it. 🙂
In the first release of NetApp FlexGroup, we’ll have access to snapshot functionality. Essentially, this works the same as regular snapshots in ONTAP – it’s done at the FlexVol level and will capture a point in time of the filesystem and lock blocks into place with pointers. I cover general snapshot technology in the blog post Snapshots and Polaroids: Neither Last Forever.
Because a FlexGroup is a collection of member FlexVols, we want to be sure snapshots are captured at the exact same time for filesystem consistency. As such, FlexGroup snapshots are coordinated by ONTAP to be taken at the same time. If a member FlexVol cannot take a snapshot for any reason, the FlexGroup snapshot fails and ONTAP cleans things up.
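The all-or-nothing behavior described above can be sketched as a tiny coordinator: snapshot every member, and if any member fails, clean up the partial snapshots. This is my own hypothetical model of the semantics – `FakeMember`, `take_snapshot` and `flexgroup_snapshot` are illustrative names, not ONTAP interfaces:

```python
# Hypothetical sketch of FlexGroup's all-or-nothing snapshot coordination:
# every member must snapshot together, and any failure rolls the whole
# FlexGroup snapshot back.
class FakeMember:
    def __init__(self, name, fail=False):
        self.name, self.fail, self.snaps = name, fail, []

    def take_snapshot(self, snap):
        if self.fail:
            raise RuntimeError(f"{self.name}: snapshot failed")
        self.snaps.append(snap)

    def delete_snapshot(self, snap):
        self.snaps.remove(snap)

def flexgroup_snapshot(members, snap_name):
    taken = []
    try:
        for m in members:
            m.take_snapshot(snap_name)
            taken.append(m)
    except RuntimeError:
        # One member failed, so undo the snapshots already taken.
        for m in taken:
            m.delete_snapshot(snap_name)
        return False
    return True

good = [FakeMember("m1"), FakeMember("m2")]
bad = [FakeMember("m1"), FakeMember("m2", fail=True)]
print(flexgroup_snapshot(good, "daily"))               # -> True
print(flexgroup_snapshot(bad, "daily"), bad[0].snaps)  # -> False []
```

The key property is the cleanup path: after a failed attempt, no member is left holding a snapshot the others don’t have, which is what keeps the filesystem view consistent.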
With the introduction of ONTAP 9.1 RC1, FlexGroup now supports SnapMirror for disaster recovery. This replicates up to 32 member volumes per FlexGroup (100 total per cluster) to a DR site. SnapMirror will take a snapshot of all member volumes at once and then do a concurrent transfer of the members to the DR site.
Automatic Incremental Resiliency
Also included in the FlexGroup feature is a new mechanism that seeks out metadata inconsistencies and fixes them, in real time, when a client requests access. No outages. No interruptions. The entire FlexGroup remains online while this happens, and clients don’t even notice when a repair takes place. In fact, no one would know if we didn’t log an EMS message in ONTAP to ensure a storage administrator knows we fixed something. Pretty underrated aspect of FlexGroup, if you ask me.
How do you get NetApp FlexGroup?
NetApp FlexGroup is available starting in ONTAP 9.1RC1. It can be used by anyone, but should only be used for the specific use cases covered in TR-4557. In ONTAP 9.1, FlexGroup supports:
- NFSv3 and SMB 2.0/2.1 (RC2 for SMB support)
- Thin Provisioning
- User and group quota reporting
- Storage efficiencies (inline deduplication, compression, compaction; post-process deduplication)
- OnCommand Performance Manager and System Manager support
- All-flash FAS (incidentally, the *only* all-flash array that currently supports this scale)
- Sharing SVMs with FlexVols
- Constituent volume moves
- 20 PB, 400 billion files
To get more information, please email email@example.com.
What ONTAP 9 features enhance NetApp FlexGroup?
While FlexGroup as a feature is awesome on its own, there are also a number of ONTAP 9 features added that make a FlexGroup even more attractive, in my opinion.
I cover ONTAP 9 in ONTAP 9 RC1 is now available! but the features I think benefit FlexGroup right out of the gate include:
- 15TB SSDs – now that flash is supported, these are a perfect fit for FlexGroup
- Per-aggregate CPs – never bottleneck a node on an over-used aggregate again
- RAID Triple Erasure Coding (RAID-TEC) – triple parity to add extra protection to your large data sets
Be sure to keep an eye out for more news and information regarding FlexGroup. If you have specific questions, I’ll answer them in the comments section (provided they’re not questions I’m not allowed to answer).🙂
If you’re going to NetApp Insight 2016 in Berlin, I’ll be doing a deep dive on FlexGroup – session 60411. It got pretty solid results at Insight Las Vegas, so come check it out in Berlin!
Also, check out my blog on XCP, which I think would be a pretty natural fit for migration off existing NAS systems onto FlexGroup.