Updated FlexGroup Technical Reports now available for ONTAP 9.6!

ONTAP 9.6 is now available, so that means the TRs need to get a refresh.


There are some new features in ONTAP 9.6 for FlexGroup volumes, including:

  • Elastic Sizing
  • MetroCluster support
  • SMB CA shares
  • FlexGroup rename/shrink

The TRs cover those features. I also updated some other areas that weren’t as clear as they could have been, and I added some new use cases.

Also, check out the newest FlexGroup episode of the Tech ONTAP Podcast:

TR Update List

Here’s the list of FlexGroup TRs that have been updated for ONTAP 9.6:

TR-4678: Data Protection and Backup – FlexGroup volumes

This covers backup and DR best practices/support for FlexGroup volumes.

TR-4557: FlexGroup Volume Technical Overview

This TR is a technical overview intended to explain how FlexGroup volumes work.

There’s also TR-4571-a, an abbreviated version of the best practice guide for easy consumption.

TR-4571: FlexGroup Best Practice Guide

This is the best practices TR and also offers:

  • More detailed information about high file count environments and directory structure
  • More information about maxdirsize limits
  • Information on effects of drive failures
  • Workarounds for lack of NFSv4.x ACL support
  • Member volume count considerations when dealing with small and large files
  • Considerations when deleting FlexGroup volumes (and the volume recovery queue)
  • Clarifications on requirements for available space in an aggregate
  • System Manager support updates

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!


Behind the Scenes: Episode 189 – ONTAP 9.6 Overview

Welcome to Episode 189, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we give you the lowdown on the latest ONTAP 9.6 release with ONTAP Systems Group Vice President Octavian Tanase (@octav), Senior Director of Product Management Jeff Baxter (@baxontap), and Technical Product Marketing Manager Skip Shapiro (skip.shapiro@netapp.com)! 

Join us as we talk about how ONTAP 9.6 brings more simplicity, productivity, customer use cases, data protection and security to your datacenter. 

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I was also recently asked how to subscribe to the podcast via RSS. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Sneak Peek! Elastic Sizing for FlexGroup Volumes in ONTAP 9.6

ONTAP 9.6 is coming soon, and I recently posted a sneak peek of REST API support. But REST APIs aren’t the only new feature in the release: FlexGroup volumes are getting some new enhancements as well.

These include:

  • Ability to rename a FlexGroup volume
  • Ability to shrink a FlexGroup volume
  • Support for MetroCluster with FlexGroup volumes
  • SMB CA share support

One of the bigger features (albeit one that flies more under the radar) is a way for ONTAP to help FlexGroup volumes avoid failed writes when a member volume runs out of space: elastic sizing!


Prior to ONTAP 9.6, storage administrators had to pay closer attention to member volume capacity, because if a member volume in a FlexGroup volume ran out of space, file writes to it would fail. Because files are not striped across member volumes, a single file that grows over time can fill up the member volume it lives in.


There are a few reasons a member volume in a FlexGroup might fill up.

  • A client tries to write a single file that is larger than the member volume’s available space. For example, a 10GB file is written to a member volume with just 9GB available.
  • A file is appended to over time and eventually fills up its member volume. For example, a database file that resides in a member volume.
  • Snapshot copies eat into the space available to the active file system.

FlexGroup volumes generally do a good job of allocating space across member volumes, but a workload anomaly can throw things off. (For example, your volume holds mostly 4K files, but then you zip a lot of them up into a single giant file.)

The usual remedy for this problem is to grow volumes or delete data. But admins usually don’t notice the issue until it’s too late and “out of space” errors have already occurred. That’s where elastic sizing comes in handy.

Elastic Sizing – An Airbag for your Data

One of our FlexGroup volume developers refers to elastic sizing as an “airbag” in that it’s not designed to stop you from getting into an accident, but it does help soften the landing when it happens.


In other words, it’s not going to prevent you from writing large files or from running out of space, but it is going to provide a way for those writes to complete.

Here’s how it works…

  1. When a file is written to ONTAP, the system has no idea how large that file will become. The client doesn’t know. The application usually doesn’t know. All that’s known is “hey, I want to write a file.”
  2. When a FlexGroup volume receives a request to write a new file, it places the file in the best available member volume based on a variety of factors, such as available capacity, inode count, time since last file creation and member volume performance (new in ONTAP 9.6).
  3. When a file is placed, since ONTAP doesn’t know how big a file will get, it also doesn’t know if the file is going to grow to a size that’s larger than the available space. So, the write is allowed as long as we have space to allow it.
  4. If/when the member volume runs out of space, right before ONTAP would send an out-of-space error to the client, it queries the other member volumes in the FlexGroup to see whether there’s any available space to borrow. If there is, ONTAP adds 1% of the full member volume’s total capacity (within a range of 10MB to 10GB) to that member volume, takes the same amount from another member volume in the same FlexGroup volume, and lets the file write continue.
  5. During the time ONTAP is looking for space to borrow, that file write is paused – this will appear to the client as a performance issue. But the overall goal isn’t to finish the write fast – it’s to allow the write to finish at all. In most cases, a member volume will be large enough to provide the 10GB increment (1% of 1TB is 10GB), which is often more than enough to allow a file creation to complete. In smaller member volumes, the performance impact could be greater, as the system will need to query to borrow space more often.
  6. The capacity borrowing will maintain the overall size of the FlexGroup – for example, if your FlexGroup is 40TB in size, it will remain 40TB.
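To make the increment arithmetic in steps 4 and 5 concrete, here’s a minimal Python sketch. It assumes the increment is 1% of the full member volume’s total capacity, clamped to the 10MB to 10GB range, and it uses decimal units so that 1% of 1TB comes out to exactly 10GB. The function names and the idea of an “overflow” (how far a write needs to get past the point where the member filled up) are hypothetical illustrations, not ONTAP internals.

```python
# Decimal units so that 1% of 1TB is exactly 10GB, matching the example above.
MB = 1000 ** 2
GB = 1000 ** 3
TB = 1000 ** 4

MIN_INCREMENT = 10 * MB
MAX_INCREMENT = 10 * GB


def borrow_increment(member_capacity: int) -> int:
    """Capacity added to a full member per borrow: 1% of its size, clamped to 10MB..10GB."""
    return max(MIN_INCREMENT, min(MAX_INCREMENT, member_capacity // 100))


def borrow_rounds(member_capacity: int, overflow: int) -> int:
    """Hypothetical: borrows needed for a write to get `overflow` bytes past the fill point."""
    increment = borrow_increment(member_capacity)
    return (overflow + increment - 1) // increment  # integer ceiling division


for capacity in (500 * MB, 100 * GB, 1 * TB, 4 * TB):
    print(f"{capacity / GB:g}GB member: "
          f"{borrow_increment(capacity) / MB:g}MB per borrow, "
          f"{borrow_rounds(capacity, 10 * GB)} borrow(s) to write 10GB past full")
```

This shows why smaller member volumes feel the pause more. A 500MB member only gets 10MB per borrow, so writing 10GB past the fill point takes 1,000 rounds of borrowing, versus a single round for a 1TB member.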


Once files are deleted or volumes are grown and space is available in that member volume again, ONTAP re-adjusts the member volumes back to their original sizes to keep space even across members.
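The borrowing in steps 4 through 6, plus the re-adjustment described above, can be modeled in a few lines of Python. This is a hypothetical model, not ONTAP’s implementation: it assumes the sibling with the most free space is the one that lends, and that a loan is handed back as soon as the borrower has that much free space again. Capacity is in abstract units (think GB).

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    original_size: int
    size: int
    used: int = 0

    @property
    def free(self) -> int:
        return self.size - self.used


class FlexGroupModel:
    """Hypothetical model: the FlexGroup total stays fixed while members lend to each other."""

    def __init__(self, names, member_size):
        self.members = {n: Member(n, member_size, member_size) for n in names}
        self.loans = []  # outstanding (borrower, lender, amount) tuples

    @property
    def total_size(self):
        return sum(m.size for m in self.members.values())

    def write(self, name, amount, increment):
        """Grow `name` one increment at a time, borrowing from siblings, until `amount` fits."""
        member = self.members[name]
        while member.free < amount:
            lenders = [m for m in self.members.values()
                       if m is not member and m.free >= increment]
            if not lenders:
                raise OSError("out of space: no member has capacity to lend")
            lender = max(lenders, key=lambda m: m.free)  # assumption: most free space lends
            lender.size -= increment
            member.size += increment
            self.loans.append((member.name, lender.name, increment))
        member.used += amount

    def delete(self, name, amount):
        self.members[name].used -= amount
        self.rebalance()

    def rebalance(self):
        """Hand back every loan the borrower can now afford, restoring original sizes."""
        outstanding = []
        for borrower, lender, amount in self.loans:
            b = self.members[borrower]
            if b.free >= amount:
                b.size -= amount
                self.members[lender].size += amount
            else:
                outstanding.append((borrower, lender, amount))
        self.loans = outstanding


fg = FlexGroupModel(["fg__0001", "fg__0002", "fg__0003", "fg__0004"], member_size=100)
fg.write("fg__0001", 130, increment=10)  # 30 units more than the member holds
print({m.name: m.size for m in fg.members.values()}, "total:", fg.total_size)
fg.delete("fg__0001", 130)
print({m.name: m.size for m in fg.members.values()}, "total:", fg.total_size)
```

After the write, fg__0001 has grown to 130 while each sibling has shrunk to 90, and the total stays at 400. After the delete, every member is back to 100. If no sibling has an increment to spare, write() raises OSError, which is where a client would still see an out-of-space error: the airbag softens the landing, but it can’t create capacity the FlexGroup doesn’t have.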

Ultimately, elastic sizing helps remove the admin overhead of managing space, as well as worrying so much about the initial sizing/deployment of a FlexGroup. You can spend less time thinking about how many member volumes you need, what size they should be, etc.

When you combine elastic sizing in ONTAP 9.6 with features like autogrow/shrink, ONTAP can pretty much manage your capacity in most cases and help avoid emergency space issues.

Elastic sizing = new FlexGroup use cases?

Traditionally, FlexGroup volumes have mainly been used for unstructured NAS data, high file count environments, small files and so on. I’ve cautioned people against putting larger files into FlexGroup volumes because, as described above, large files and files that grow can fill up a member volume.

But now, with elastic sizing to mitigate those issues, along with volume autogrow/shrink, the range of FlexGroup use cases expands and gets more interesting.

Why not put a workload with large files or files that grow on a FlexGroup volume now? In fact, SMB support for Continuously Available shares for Hyper-V and SQL Server is further evidence that FlexGroup volumes are becoming viable for a wider variety of workloads.

You can find the latest podcast for FlexGroup volumes here:

Behind the Scenes: Episode 188 – FlexGroup Volumes Update

Welcome to Episode 188, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we deliver a long overdue update to Episode 46 of the Tech ONTAP podcast, where we first covered FlexGroup volumes.

We bring back lead developer Richard Jernigan – as well as Technical Director Dan Tennant – to discuss what’s new, what’s changed and what’s coming down the line for FlexGroup volumes.

Finding the Podcast

You can find this week’s episode here:


New White Paper! Media and Entertainment Workloads using NetApp ONTAP! #NAB2019


Every year, the National Association of Broadcasters puts on a show to deliver the latest and greatest in media and entertainment content and technology solutions.

This year, I decided to piggyback on the show and put out a new white paper about how NetApp ONTAP works with media and entertainment workloads. This white paper includes:

  • DreamWorks Animation case study on NetApp ONTAP
  • Media/entertainment benchmark numbers on NetApp FlexGroup volumes
  • Why you’d want to use NetApp ONTAP

You can find the white paper here:

https://www.netapp.com/us/media/wp-7301.pdf

Leave your feedback in the comments!

Behind the Scenes: Episode 182 – NetApp on NetApp: FlexGroup Volumes and ActiveIQ

Welcome to Episode 182, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invite in the guys from Customer One, who run the NetApp on NetApp program, where we use the latest NetApp technologies within our own organization. Join Eduardo Rivera (@mredrivera) and Faisal Salaam (https://www.linkedin.com/in/faisal-salam-754a13104/) as we discuss how NetApp is using FlexGroup volumes to power Active IQ.

Finding the Podcast

You can find this week’s episode here:


New ONTAP Release = Updated Technical reports!

ONTAP 9.5 is finally available, which means technical reports are in the process of being updated. For me, that means FlexGroup volumes!


You can find the latest updates to the FlexGroup volume documentation here:

https://www.netapp.com/us/media/tr-4571.pdf

https://www.netapp.com/us/media/tr-4557.pdf

https://www.netapp.com/us/media/tr-4571-a.pdf

https://www.netapp.com/us/media/tr-4678.pdf

 

Docker + NFS + FlexGroup volumes = Magic!


A couple of years ago, I wrote a blog post on using NFS with Docker while I was tooling around with containers, trying to wrap my head around them. Then I never really touched them again, and that post got a bit… stale.

Why stale?

Well, in that post, I had to create a bunch of kludgy hacks to get NFS to work with Docker, and honestly, it likely wasn’t even the best way to do it, given my lack of Docker knowledge. More recently, I wrote up a way to Kerberize NFS mounts in Docker containers that’s a somewhat better effort.

Luckily, NetApp developers realized I’m not the only one who wants to use Docker without knowing all the ins and outs, so they created a NetApp plugin for Docker that handles volume creation, removal, etc. for you. Then you can use the Docker volume options to mount via NFS. That plugin is named “Trident.”


Trident + NFS

Trident is an open source storage provisioner and orchestrator for the NetApp portfolio.

You can read more about it here:

https://netapp.io/2016/12/23/introducing-trident-dynamic-persistent-volume-provisioner-kubernetes/

You can also read about how we use it for AI/ML here:

https://www.theregister.co.uk/2018/08/03/netapp_a800_pure_airi_flashblade/

When you’re using the Trident plugin, you can create Docker-ready NFS exported volumes in ONTAP to provide storage to all of your containers just by specifying the -v option during your “docker run” commands.

For example, here’s an NFS exported volume created using the Trident plugin:

# docker volume create -d netapp --name=foo_justin
foo_justin
# docker volume ls
DRIVER VOLUME NAME
netapp:latest foo_justin

Here’s what shows up on the ONTAP system:

::*> vol show -vserver DEMO -volume netappdvp_foo_justin -fields policy
vserver volume               policy
------- -------------------- -------
DEMO    netappdvp_foo_justin default

Then, I can just start up the container using that volume:

# docker run --rm -it -v foo_justin:/foo alpine ash
/ # mount | grep justin
10.x.x.x:/netappdvp_foo_justin on /foo type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.193.67.237,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.x.x.x)
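If you’d rather script this than type it, the same create, run and mount-check flow can be sketched with the Docker SDK for Python (the docker package). This assumes the Trident plugin is installed under the driver name “netapp”, as in the CLI example; create_and_check is a hypothetical helper name, and the check simply looks for an NFS mount at the target path in the container’s mount output.

```python
from typing import Optional


def create_and_check(client, name: str, mountpoint: str = "/foo",
                     size: Optional[str] = None) -> bool:
    """Create a Trident-backed volume, then confirm a container sees it as an NFS mount."""
    # Equivalent of: docker volume create -d netapp --name=<name> [-o size=<size>]
    driver_opts = {"size": size} if size else None
    client.volumes.create(name=name, driver="netapp", driver_opts=driver_opts)

    # Equivalent of: docker run --rm -v <name>:<mountpoint> alpine mount
    # With detach=False (the default), containers.run() returns the output as bytes.
    output = client.containers.run(
        "alpine", "mount",
        volumes={name: {"bind": mountpoint, "mode": "rw"}},
        remove=True,
    ).decode()

    for line in output.splitlines():
        # mount output format: "<source> on <target> type <fstype> (<options>)"
        fields = line.split()
        if len(fields) >= 5 and fields[2] == mountpoint and fields[4].startswith("nfs"):
            return True
    return False
```

In practice you’d pass docker.from_env() as the client. Passing size="1t" is the SDK equivalent of -o size=1t on the command line.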

A centralized NFS storage volume for your containers has a vast number of use cases. It provides read and write access to the same location across the network, on a high-performing storage system with data protection capabilities that ensure high availability and resiliency.

Customization of Volumes

With the Trident plugin, you can modify the config files to change attributes such as custom names, size and export policies from their defaults. See the full list here:

http://netapp-trident.readthedocs.io/en/latest/docker/install/ndvp_ontap_config.html

Trident + NFS + FlexGroup Volumes

Starting in Trident 18.07, a new Trident NAS driver was added that supports creation of FlexGroup volumes with Docker.

To switch to FlexGroup volumes, edit the /etc/netappdvp/config.json file to use the FlexGroup driver (ontap-nas-flexgroup):

{
    "version": 1,
    "storageDriverName": "ontap-nas-flexgroup",
    "managementLIF": "10.x.x.x",
    "dataLIF": "10.x.x.x",
    "svm": "DEMO",
    "username": "admin",
    "password": "********",
    "aggregate": "aggr1_node1"
}

Then, create your FlexGroup volume. It’s that simple!
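Hand-edited JSON is easy to break: a trailing comma after the last key is enough to make the file invalid. One way to avoid that is to generate config.json with Python’s json module, as in this minimal sketch. The key check is a hypothetical sanity check on the keys used in the example above, not Trident’s actual schema, and the 10.x.x.x addresses and masked password are placeholders.

```python
import json

# Hypothetical sanity check: the keys used in the example above, not Trident's schema.
EXPECTED_KEYS = ("version", "storageDriverName", "managementLIF",
                 "dataLIF", "svm", "username", "password")


def trident_config(**settings) -> str:
    """Serialize settings to JSON text; json.dumps never emits trailing commas."""
    missing = [key for key in EXPECTED_KEYS if key not in settings]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    return json.dumps(settings, indent=4)


config = trident_config(
    version=1,
    storageDriverName="ontap-nas-flexgroup",
    managementLIF="10.x.x.x",   # placeholder from the post
    dataLIF="10.x.x.x",         # placeholder from the post
    svm="DEMO",
    username="admin",
    password="********",        # masked in the post
    aggregate="aggr1_node1",
)
print(config)
# To install it (path from the post), write the text to /etc/netappdvp/config.json:
# with open("/etc/netappdvp/config.json", "w") as f:
#     f.write(config)
```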

A word of advice, though: the FlexGroup driver defaults to a 1GB volume size and creates 8 member volumes across your aggregates, which works out to 128MB member volumes. That’s problematic for a couple of reasons:

  • FlexGroup volumes should have members that are no less than 100GB in size (as per TR-4571) – small members will affect performance due to member volumes doing more remote allocation than normal
  • Files that get written to the FlexGroup will fill up 128MB pretty fast, causing the FlexGroup to appear to be out of space.
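The member-size arithmetic is simple division, sketched here in Python with binary units (1GB = 1024MB, which is what makes 1GB / 8 come out to exactly 128MB). The 100GB floor is the TR-4571 guidance cited above; member_size and min_flexgroup_size are hypothetical helper names, and the sketch assumes the 8-member default.

```python
# Binary units: 1GB = 1024MB, so a 1GB FlexGroup split 8 ways is exactly 128MB per member.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

MIN_MEMBER_SIZE = 100 * GB  # TR-4571 guidance cited in the post


def member_size(flexgroup_size: int, member_count: int = 8) -> int:
    """Capacity of each member when the FlexGroup is split evenly."""
    return flexgroup_size // member_count


def min_flexgroup_size(member_count: int = 8) -> int:
    """Smallest FlexGroup that keeps every member at or above the 100GB floor."""
    return MIN_MEMBER_SIZE * member_count


for size in (1 * GB, 800 * GB, 1 * TB):
    per_member = member_size(size)
    verdict = "ok" if per_member >= MIN_MEMBER_SIZE else "below the 100GB guidance"
    print(f"{size / GB:g}GB FlexGroup -> {per_member / GB:g}GB per member ({verdict})")
```

With 8 members, the smallest FlexGroup that meets the 100GB guidance is 800GB, which happens to match the “size” default used in the config example below.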

You can fix this either by setting larger sizes in the config.json file or by specifying the size up front in the docker volume create command. I’d recommend overriding the defaults in the config file.

To set this in the config file, just specify “size” under “defaults” (the full list of options can be found here: https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/backends/ontap.html):

{
    "version": 1,
    "storageDriverName": "ontap-nas-flexgroup",
    "managementLIF": "10.0.0.1",
    "dataLIF": "10.0.0.2",
    "svm": "svm_nfs",
    "username": "vsadmin",
    "password": "secret",
    "defaults": {
        "size": "800G",
        "spaceReserve": "none",
        "exportPolicy": "myk8scluster"
    }
}

Since volumes are thin provisioned by default, you shouldn’t need to worry much about storage space, unless you expect your clients to fill up 800GB. If they might, you can apply quotas to limit how much space the volumes can use. (FlexVol volumes support quota enforcement today; quota enforcement for FlexGroup volumes will be available in an upcoming release.)

Alternatively, specify the size up front when you create the volume:

# docker volume create -d netapp --name=foo_justin_fg -o size=1t
foo_justin_fg
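How the “defaults” block and a per-volume -o option interact can be sketched as a simple dictionary merge: options passed on the docker volume create command line take precedence over the config file defaults, which take precedence over the driver’s built-in values. The built-in values below are assumptions for illustration (apart from the 1GB size, which comes from this post), and effective_options is a hypothetical name.

```python
# Hypothetical built-in driver values; only the 1GB size comes from the post.
DRIVER_BUILTINS = {"size": "1G", "spaceReserve": "none", "exportPolicy": "default"}


def effective_options(config_defaults: dict, volume_opts: dict) -> dict:
    """Merge option layers; later layers win: built-ins < config defaults < -o options."""
    return {**DRIVER_BUILTINS, **config_defaults, **volume_opts}


# The "defaults" block from a config.json like the one above.
config_defaults = {"size": "800G", "spaceReserve": "none", "exportPolicy": "myk8scluster"}

print(effective_options(config_defaults, {}))              # size comes from config.json
print(effective_options(config_defaults, {"size": "1t"}))  # size comes from -o size=1t
```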

And this is what the volume looks like in ONTAP:

::*> vol show -vserver DEMO -volume netappdvp_foo_justin* -fields policy,is-flexgroup,aggr-list,size,space-guarantee 
vserver volume                  aggr-list               size policy  space-guarantee is-flexgroup
------- ----------------------- ----------------------- ---- ------- --------------- ------------
DEMO    netappdvp_foo_justin_fg aggr1_node1,aggr1_node2 1TB  default none            true

Since the FlexGroup is 1TB in size and has 8 members, each member volume is 128GB, which meets the 100GB minimum. Future releases will enforce this without you having to worry about it.

::*> vol show -vserver DEMO -volume netappdvp_foo_justin_fg_* -fields aggr-list,size -sort-by aggr-list
vserver volume                        aggr-list   size
------- ----------------------------- ----------- -----
DEMO    netappdvp_foo_justin_fg__0001 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0003 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0005 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0007 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0002 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0004 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0006 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0008 aggr1_node2 128GB
8 entries were displayed.
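The member naming and placement in that listing can be reproduced with a short Python sketch: member names are the FlexGroup name plus “__” and a zero-padded four-digit index, and consecutive members alternate across the aggregates in aggr-list. This mirrors the output above; it is not a guarantee of how ONTAP places members in every configuration, and member_layout is a hypothetical name.

```python
def member_layout(flexgroup, aggregates, member_count=8):
    """Return (member_name, aggregate) pairs, alternating members across aggregates."""
    return [
        # Members are numbered from 1 with a zero-padded, four-digit suffix.
        (f"{flexgroup}__{i:04d}", aggregates[(i - 1) % len(aggregates)])
        for i in range(1, member_count + 1)
    ]


layout = member_layout("netappdvp_foo_justin_fg", ["aggr1_node1", "aggr1_node2"])

# Sort by aggregate (a stable sort keeps member order) to match "-sort-by aggr-list".
for name, aggr in sorted(layout, key=lambda pair: pair[1]):
    print(f"{name} {aggr}")
```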

Practical uses for FlexGroups with containers

It’s cool that we *can* provision FlexGroup volumes with Trident for use with containers, but does that mean we should?

Well, consider this…

In an ONTAP cluster that uses FlexVol volumes for NFS storage presented to containers, each volume is bound to a single node’s resources, per the design of a FlexVol. This means that even though I bought a 4-node cluster, I can only use one node’s RAM, CPU, network, capacity, etc. If I have a use case where thousands of containers spin up at any given moment and attach themselves to an NFS volume, I might see performance bottlenecks due to the increased load. In most cases, that’s fine, but if you could get more out of your storage, wouldn’t you want to?


You could add layers of automation into the mix to add more FlexVols to the solution, but then you have new mount points/folders. And what if those containers all need to access the same data?


With a FlexGroup volume that gets presented to those same Docker instances, the containers now can leverage all nodes in the cluster, use a single namespace and simplify the overall automation structure.


The benefits become even more evident when those containers are constantly writing new files to the NFS mount, such as in an artificial intelligence/machine learning (AI/ML) use case. FlexGroup volumes were designed to handle massive numbers of file creations and can provide 2-6x the performance of a FlexVol in workloads that constantly create new files.

Stay tuned for more information on how FlexGroup volumes and Trident can bring even more capability to AI/ML workloads. In the meantime, you can learn more about NetApp solutions for AI/ML here:

https://www.netapp.com/us/solutions/applications/ai-deep-learning.aspx

Behind the Scenes: Episode 145 – AI, Machine Learning and ONTAP with Santosh Rao

Welcome to Episode 145, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, NetApp Senior Technical Director Santosh Rao (@santorao) joins us to talk about how NetApp and NVIDIA are partnering to enhance AI solutions with the DGX-1, ONTAP and FlexGroup volumes using NFS!

You can find more information in the following links:

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:


New and updated FlexGroup Technical Reports now available for ONTAP 9.4!

ONTAP 9.4 is now available, so that means the TRs need to get a refresh.


Here’s what I’ve done for FlexGroup in ONTAP 9.4…

New Tech Report!

First, I moved the data protection section of the best practices TR (TR-4571) into its own dedicated backup and data protection TR, which can be found here:

TR-4678: Data Protection and Backup – FlexGroup volumes

Why? Well, that section is going to grow larger and larger as we add more data protection and backup functionality, so it made sense to proactively create a new one.

Updated TRs!

TR-4557’s update mostly covers what’s new in ONTAP 9.4. That TR is a technical overview intended to explain how FlexGroup volumes work. The new feature payload for FlexGroup volumes in ONTAP 9.4 included:

  • QoS minimums and Adaptive QoS
  • FPolicy and file audit
  • SnapDiff support

TR-4571 is the best practices TR and got the brunt of the updates. In addition to details about the new features, I added:

  • More detailed information about high file count environments and directory structure
  • More information about maxdirsize limits
  • Information on effects of drive failures
  • Workarounds for lack of NFSv4.x ACL support
  • Member volume count considerations when dealing with small and large files
  • Considerations when deleting FlexGroup volumes (and the volume recovery queue)
  • Clarifications on requirements for available space in an aggregate
  • System Manager support updates

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!