Docker + NFS + FlexGroup volumes = Magic!


A couple of years ago, I wrote up a blog on using NFS with Docker as I was tooling around with containers, in an attempt to wrap my head around them. Then, I never really touched them again and that blog got a bit… stale.

Why stale?

Well, in that blog, I had to create a bunch of kludgy hacks to get NFS to work with Docker, and honestly, it likely wasn’t even the best way to do it, given my lack of overall Docker knowledge. More recently, I wrote up a way to Kerberize NFS mounts in Docker containers, which was a somewhat better effort.

Luckily, NetApp developers realized that I’m not the only one who wants to use Docker without knowing all of its ins and outs, so they created a plugin for Docker that handles all the volume creation, removal, and so on for you. Then, you can leverage the Docker volume options to mount via NFS. That plugin is named “Trident.”


Trident + NFS

Trident is an open source storage provisioner and orchestrator for the NetApp portfolio.

You can read more about it here:

https://netapp.io/2016/12/23/introducing-trident-dynamic-persistent-volume-provisioner-kubernetes/

You can also read about how we use it for AI/ML here:

https://www.theregister.co.uk/2018/08/03/netapp_a800_pure_airi_flashblade/

When you’re using the Trident plugin, you can create NFS-exported volumes in ONTAP with “docker volume create” and then attach them to any of your containers by specifying the -v option in your “docker run” commands.
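If you haven’t set up the plugin yet, it ships as a managed Docker plugin. Here’s a rough sketch of the install (the version tag and alias are examples from around this timeframe; check the Trident docs for the current syntax):

# docker plugin install netapp/trident-plugin:18.07 --alias netapp --grant-all-permissions

The alias is the driver name you reference later with -d, which is why the examples below just say “netapp.”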

For example, here’s an NFS-exported volume created using the Trident plugin:

# docker volume create -d netapp --name=foo_justin
foo_justin
# docker volume ls
DRIVER          VOLUME NAME
netapp:latest   foo_justin

Here’s what shows up on the ONTAP system:

::*> vol show -vserver DEMO -volume netappdvp_foo_justin -fields policy
vserver volume               policy
------- -------------------- -------
DEMO    netappdvp_foo_justin default

Then, I can just start up the container using that volume:

# docker run --rm -it -v foo_justin:/foo alpine ash
/ # mount | grep justin
10.x.x.x:/netappdvp_foo_justin on /foo type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.193.67.237,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=10.x.x.x)

A centralized NFS volume for your containers has a vast number of use cases: every container can read from and write to the same location across the network, backed by a high-performing storage system with data protection capabilities for high availability and resiliency.
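For instance, because it’s just NFS under the hood, multiple containers can mount the same volume at the same time. A quick illustration (the container names are made up for the example):

# docker run -d --name writer -v foo_justin:/data alpine sh -c 'while true; do date >> /data/log.txt; sleep 5; done'
# docker run --rm -it -v foo_justin:/data alpine tail -f /data/log.txt

The second container sees the first one’s writes in near real time, since both are hitting the same export.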

Customization of Volumes

With the Trident plugin, you have the ability to modify the config files to change attributes from the defaults, such as custom names, size, export policies and others. See the full list here:

http://netapp-trident.readthedocs.io/en/latest/docker/install/ndvp_ontap_config.html
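Most of those attributes can also be overridden per volume with -o flags at creation time, the same way size is passed later in this post. A sketch (the export policy name here is hypothetical, and option names should be checked against the list above):

# docker volume create -d netapp --name=foo_custom -o size=20g -o spaceReserve=none -o exportPolicy=docker_hosts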

Trident + NFS + FlexGroup Volumes

Starting in Trident 18.07, a new Trident NAS driver was added that supports creation of FlexGroup volumes with Docker.

To use it, change the /etc/netappdvp/config.json file to specify the FlexGroup driver.

{
    "version": 1,
    "storageDriverName": "ontap-nas-flexgroup",
    "managementLIF": "10.x.x.x",
    "dataLIF": "10.x.x.x",
    "svm": "DEMO",
    "username": "admin",
    "password": "********",
    "aggregate": "aggr1_node1"
}

Then, create your FlexGroup volume. It’s that simple!

A word of advice, though: the FlexGroup driver defaults to a 1GB volume and creates 8 member volumes across your aggregates, which results in 128MB member volumes. That’s problematic for a couple of reasons:

  • FlexGroup volumes should have members that are no less than 100GB in size (as per TR-4571) – small members will affect performance due to member volumes doing more remote allocation than normal
  • Files that get written to the FlexGroup will fill up 128MB pretty fast, causing the FlexGroup to appear to be out of space.

You can fix this either by setting the config.json file to use larger sizes, or specifying the size up front in the Docker volume command. I’d recommend using the config file and overriding the defaults.

To set this in the config file, just specify “size” as a variable (the full list of options can be found here: https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/backends/ontap.html):

{
    "version": 1,
    "storageDriverName": "ontap-nas-flexgroup",
    "managementLIF": "10.0.0.1",
    "dataLIF": "10.0.0.2",
    "svm": "svm_nfs",
    "username": "vsadmin",
    "password": "secret",
    "defaults": {
        "size": "800G",
        "spaceReserve": "volume",
        "exportPolicy": "myk8scluster"
    }
}

Since the volumes default to thin provisioned, you shouldn’t worry too much about storage space unless you think your clients will actually fill up 800GB. If that’s the case, you can apply quotas to the volumes to limit how much space can be used. (For FlexGroup volumes, quota enforcement will be available in an upcoming release; FlexVols can use quota enforcement today.)
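For a FlexVol-backed Docker volume, a tree quota covering the whole volume is one way to cap usage. A minimal sketch at the ONTAP CLI, using the volume from the earlier example (the 500GB limit is arbitrary; verify the syntax against your ONTAP version):

::*> volume quota policy rule create -vserver DEMO -policy-name default -volume netappdvp_foo_justin -type tree -target "" -disk-limit 500GB
::*> volume quota on -vserver DEMO -volume netappdvp_foo_justin

With that said, here’s a FlexGroup volume created with the size specified up front: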

# docker volume create -d netapp --name=foo_justin_fg -o size=1t
foo_justin_fg

And this is what the volume looks like in ONTAP:

::*> vol show -vserver DEMO -volume netappdvp_foo_justin* -fields policy,is-flexgroup,aggr-list,size,space-guarantee
vserver volume                  aggr-list               size policy  space-guarantee is-flexgroup
------- ----------------------- ----------------------- ---- ------- --------------- ------------
DEMO    netappdvp_foo_justin_fg aggr1_node1,aggr1_node2 1TB  default none            true

Since the FlexGroup is 1TB in size, the member volumes will be 128GB, which fulfills the 100GB minimum. Future releases will enforce this without you having to worry about it.

::*> vol show -vserver DEMO -volume netappdvp_foo_justin_fg_* -fields aggr-list,size -sort-by aggr-list
vserver volume                        aggr-list   size
------- ----------------------------- ----------- -----
DEMO    netappdvp_foo_justin_fg__0001 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0003 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0005 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0007 aggr1_node1 128GB
DEMO    netappdvp_foo_justin_fg__0002 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0004 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0006 aggr1_node2 128GB
DEMO    netappdvp_foo_justin_fg__0008 aggr1_node2 128GB
8 entries were displayed.

Practical uses for FlexGroups with containers

It’s cool that we *can* provision FlexGroup volumes with Trident for use with containers, but does that mean we should?

Well, consider this…

In an ONTAP cluster that uses FlexVol volumes for NFS storage presented to containers, I am going to be bound to a single node’s resources, as per the design of a FlexVol. This means that even though I bought a 4 node cluster, I can only use 1 node’s RAM, CPU, network, capacity, etc. If I have a use case where thousands of containers spin up at any given moment and attach themselves to a NFS volume, then I might see some performance bottlenecks due to the increased load. In most cases, that’s fine – but if you could get more out of your storage, wouldn’t you want to do that?

[Diagram: Docker containers attached to a single FlexVol]

You could add layers of automation into the mix to add more FlexVols to the solution, but then you have new mount points/folders. And what if those containers all need to access the same data?

[Diagram: Docker containers spread across multiple FlexVols]

With a FlexGroup volume presented to those same Docker instances, the containers can now leverage all nodes in the cluster, use a single namespace, and keep the overall automation structure simple.

[Diagram: Docker containers attached to a single FlexGroup volume]

The benefits become even more evident when those containers are constantly writing new files to the NFS mount, such as in an Artificial Intelligence/Machine Learning use case. FlexGroups were designed to handle massive amounts of file creations and can provide 2-6x the performance over a FlexVol in use cases where we’re constantly creating new files.

Stay tuned for some more information on how FlexGroups and Trident can bring even more capability to the table for AI/ML workloads. In the meantime, you can learn more about NetApp solutions for AI/ML here:

https://www.netapp.com/us/solutions/applications/ai-deep-learning.aspx


Behind the Scenes: Episode 145 – AI, Machine Learning and ONTAP with Santosh Rao

Welcome to Episode 145, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, NetApp Senior Technical Director Santosh Rao (@santorao) joins us to talk about how NetApp and NVIDIA are partnering to enhance AI solutions with the DGX-1, ONTAP and FlexGroup volumes using NFS!

You can find more information in the following links:

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

New and updated FlexGroup Technical Reports now available for ONTAP 9.4!

ONTAP 9.4 is now available, so that means the TRs need to get a refresh.


Here’s what I’ve done for FlexGroup in ONTAP 9.4…

New Tech Report!

First, I moved the data protection section of the best practices TR (TR-4571) into its own dedicated backup and data protection TR, which can be found here:

TR-4678: Data Protection and Backup – FlexGroup volumes

Why? Well, that section is going to grow larger and larger as we add more data protection and backup functionality, so it made sense to proactively create a new one.

Updated TRs!

TR-4557 got an update covering mostly what’s new in ONTAP 9.4. That TR is a technical overview, intended to give information on how FlexGroup volumes work. The new feature payload for FlexGroup volumes in ONTAP 9.4 included:

  • QoS minimums and Adaptive QoS
  • FPolicy and file audit
  • SnapDiff support

TR-4571 is the best practices TR and got the brunt of the updates. Aside from details about new features, I added:

  • More detailed information about high file count environments and directory structure
  • More information about maxdirsize limits
  • Information on effects of drive failures
  • Workarounds for lack of NFSv4.x ACL support
  • Member volume count considerations when dealing with small and large files
  • Considerations when deleting FlexGroup volumes (and the volume recovery queue)
  • Clarifications on requirements for available space in an aggregate
  • System Manager support updates

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!

Behind the Scenes: Episode 134 – The Active IQ Story: Building a Data Pipeline for Machine Learning

Welcome to Episode 134, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, Active IQ Technical Director Shankar Pasupathy joins us and tells us how AutoSupport’s infrastructure and backend evolved into Active IQ’s multicloud data pipeline. Learn how NetApp is using big data analytics and machine learning on ONTAP to improve the overall customer experience.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

This week’s episode is here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

FlexGroup Technical Reports Updated for ONTAP 9.3

[Diagram: FlexGroup volume]

The latest updates for NetApp FlexGroup volumes for ONTAP 9.3 are available in the following Technical Reports:

Check it out and comment if you have a question!

Also check out previous blogs on FlexGroup volumes:

NetApp FlexGroup: Crazy fast

Tech ONTAP Podcast: Now powered by NetApp FlexGroup volumes!

NetApp FlexGroup: An evolution of NAS

And the lightboard video:

ONTAP 9.3RC1 is now available!

ONTAP 9.3 was announced at NetApp Insight 2017 in Las Vegas and was covered at a high level by Jeff Baxter in the following blog:

Announcing NetApp ONTAP 9.3: The Next Step in Modernizing Your Data Management

I also did a brief video summary here:

We also did a podcast with ONTAP Chief Evangelist Jeff Baxter (@baxontap) and ONTAP SVP Octavian Tanase (@octav) here:

ONTAP releases are delivered every 6 months, with the odd-numbered releases landing around the time of Insight. Now, the first release candidate for 9.3 is available here:

http://mysupport.netapp.com/NOW/download/software/ontap/9.3RC1

For info on what a release candidate is, see:

http://mysupport.netapp.com/NOW/products/ontap_releasemodel/

Also, check out the documentation center:

docs.netapp.com/ontap-9/index.jsp

The general theme around ONTAP 9.3 is modernization of the data center. I cover this at Insight in session 30682-2, which is available as a recording from Las Vegas for those with a login. If you’re going to Insight in Berlin, feel free to add it to your schedule builder. Here’s a high level list of features, with more detail on some of them later in this blog.

Security enhancements

  • Multifactor authentication (MFA)
  • SnapLock enhancements (legal hold, event-based retention, volume append mode)

Simplicity innovations

  • MongoDB support added to application provisioning
  • Simplified data protection flows in System Manager
  • Guided cluster setup and expansion
  • Adaptive QoS

Performance and efficiency improvements

  • Up to 30% performance improvement for specific workloads via WAFL improvements, parallelization and flash optimizations
  • Automatic schedules for deduplication
  • Background inline aggregate deduplication (AFF only; automatic schedule only)

NetApp FlexGroup volume features

This is covered in more detail in What’s New for NetApp FlexGroup Volumes in ONTAP 9.3?

  • Qtrees
  • Antivirus
  • Volume autogrow
  • SnapVault/Unified SnapMirror
  • SMB Change/notify
  • QoS Maximums
  • Improved automated load balancing logic

Data Fabric additions

  • SolidFire to ONTAP SnapMirror
  • MetroCluster over IP

Now, let’s look at a few of the features in a bit more detail. If you have things you want covered more, leave a comment.

Multifactor Authentication (MFA)

Traditionally, to log in to an ONTAP system as an admin, all you needed was a username and password and you’d get root-level access to all storage virtual machines in a cluster. If you’re the benevolent storage admin, that’s great! If you’re a hostile actor, great!* (*unless you’re the benevolent storage admin… then, not so great)

ONTAP 9.3 introduces the ability to configure an external Identity Provider (IdP) server to interact with OnCommand System Manager and Unified Manager to require a key to be passed in addition to a username and password. Initial support for IdP will include Microsoft Active Directory Federation Services and Shibboleth.

[Diagram: Multifactor authentication flow with an external IdP]

For the command line, the multifactor portion is currently handled by way of SSH keys.
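As a sketch of what the SSH key portion looks like at the CLI (the user and key are placeholders; verify the flags against your release):

::*> security login modify -user-or-group-name admin -application ssh -authentication-method password -second-authentication-method publickey
::*> security login publickey create -username admin -index 1 -publickey "ssh-rsa AAAA<truncated> admin@host"

With that in place, an SSH login for that user requires both the password and the matching private key.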

SnapLock Enhancements

SnapLock is a NetApp ONTAP feature that provides data compliance for businesses that need to preserve data for regulatory reasons, such as HIPAA standards (SnapLock compliance) or for internal requirements, such as needing to preserve records (SnapLock enterprise).

ONTAP 9.3 provides a few enhancements to SnapLock, including one that currently isn’t available from any other storage vendor.

[Diagram: SnapLock legal hold]

Legal hold is useful in the event that a court has ordered specific documents to be preserved for an ongoing case or investigation. This can be applied to multiple files and remains in effect until you choose to remove it.

[Diagram: SnapLock event-based retention]

Event-based retention allows storage administrators to set protections on data based on defined events, such as an employee leaving the company (to avoid disgruntled deletions), or for insurance use cases (such as death of a policy holder).
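To make those concrete, here’s roughly what legal hold and event-based retention look like at the CLI. Treat this as a sketch: the litigation name, policy name, volume, and retention period are all invented, and the exact flags may vary by release.

::*> snaplock legal-hold begin -litigation-name case2017-001 -volume slc_vol1 -path /
::*> snaplock event-retention policy create -vserver DEMO -name employee-exit -retention-period "5years"
::*> snaplock event-retention apply -vserver DEMO -policy-name employee-exit -volume slc_vol1 -path /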

[Diagram: SnapLock volume append mode]

Volume append mode is the SnapLock feature I alluded to, the one no other vendor can currently match. Essentially, it’s for media workloads (audio and video): it write-protects the portions of files that have already been streamed while still allowing appends to those files after they’ve been protected. It’s kind of like having a CD-R on your storage system.

Performance improvements


Every release of ONTAP strives to improve performance in some way. ONTAP 9.3 introduces performance enhancements (mostly for SAN/block) via the following changes:

  • Read latency reductions via WAFL optimizations for All Flash FAS SAN (block) systems
  • Better parallelization for all workloads on mid-range and high-end systems (FAS and AFF) to deliver more throughput/IOPS at lower latencies
  • Parallelization of the iSCSI layer to allow iSCSI to use more cores (best results on 20 core or higher systems)

The following graphs show some examples of that performance improvement versus ONTAP 9.2.

[Graph: A700 FC performance, ONTAP 9.3 vs. 9.2]

[Graph: A700 iSCSI performance, ONTAP 9.3 vs. 9.2]

Adaptive Quality of Service (QoS)

Adaptive QoS is a way for storage administrators to allow ONTAP to manage the number of IOPS per TB of volume space without the need to intervene. You simply set a service level class and let ONTAP control the rest.

The graphic below shows how it works.

[Diagram: Adaptive QoS]
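At the CLI, this boils down to creating an adaptive policy group and attaching it to a volume. A quick sketch (the group name, volume name, and IOPS/TB figures are arbitrary examples):

::*> qos adaptive-policy-group create -policy-group aqos_gold -vserver DEMO -expected-iops 2048IOPS/TB -peak-iops 4096IOPS/TB
::*> volume modify -vserver DEMO -volume datavol1 -qos-adaptive-policy-group aqos_gold

As the volume grows or shrinks, ONTAP scales the IOPS limits along with it, so there’s no manual re-tuning.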

MetroCluster over IP

MetroCluster is a way for clusters to operate in a high-availability manner over long distances (hundreds of kilometers). Traditionally, MetroCluster has been done over Fibre Channel networks due to the low latency requirements needed to guarantee writes can be committed to both sites.

However, now that IP networks are getting more robust, ONTAP is able to support MetroCluster over IP, which provides the following benefits:

  • Reduced CapEx and OpEx (no more dedicated fiber channel networks, cards, bridges)
  • Simplicity of management (use existing IP networks)

[Diagram: MetroCluster over IP]

The ONTAP 9.3 release is going to be a limited release for this feature, with the following caveats:

  • A700, FAS9000 only
  • 100km limit
  • Dedicated ISL with extended VLAN currently required
  • 1 iWARP card per node

SolidFire to ONTAP SnapMirror

A few years back, the concept of a data fabric (where all of your data can be moved anywhere with the click of a button) was introduced.

That vision continued this year with the inclusion of SnapMirror from SolidFire (and NetApp HCI systems) to ONTAP.

[Diagram: SolidFire to ONTAP SnapMirror]

ONTAP 9.3 will allow storage administrators to implement a disaster recovery plan for their SolidFire systems.

This includes the following:

  • Baseline and incremental replication using NetApp SnapMirror from SolidFire to ONTAP
  • Failover storage to ONTAP for disaster recovery
  • Failback storage from ONTAP to SolidFire
    • Only for LUNs replicated from SolidFire
    • Replication from ONTAP to SolidFire only for failback

That covers a deeper look at some of the new ONTAP 9.3 features. Feel free to comment if you want to learn more about these features, or any not listed in the overview.

NetApp FlexGroup: Crazy fast

This week, the SPEC SFS®2014_swbuild test results for NetApp FlexGroup volumes submitted for file services were approved and published.

TL;DR – NetApp was the cream of the crop.

You can find those results here:

http://spec.org/sfs2014/results/res2017q3/sfs2014-20170908-00021.html

The testing rig was as follows:

  • Four node FAS8200 cluster (not AFF)
  • 72 4TB 7200 RPM 12Gb SAS drives (per HA pair)
  • NFSv3
  • 20 IBM servers/clients
  • 10GbE network (four connections per HA pair)

Below is a graph that consolidates the results of multiple vendor SPEC SFS®2014_swbuild results. Notice the FlexGroup did more IOPS (around 260k) at a lower latency (sub 3ms):

[Graph: SPEC SFS2014_swbuild results by vendor, IOPS vs. latency]

In addition, NetApp had the best Overall Response Time (ORT) of the competition:

[Graph: Overall Response Time comparison]

And had the best MBps/throughput:

[Graph: Throughput (MBps) comparison]

Full results here:

http://spec.org/sfs2014/results/sfs2014swbuild.html

For more information on the SPEC SFS®2014_swbuild test, see https://www.spec.org/sfs2014/.

Everything but the kitchen sink…

With a NetApp FlexGroup, the more clients and work you throw at it, the better it performs. An example of this is seen in TR-4571, with a 2-node A700 doing GIT workload testing. Note how increasing the job count only encourages the FlexGroup.

[Graph: Average IOPS, GIT workload]

[Graph: Maximum MBps, GIT workload]

FlexGroup Resources

If you’re interested in learning more, see the following resources:

You can also email us at flexgroups-info@netapp.com.

Tech ONTAP Podcast: Now powered by NetApp FlexGroup volumes!

If you’re not aware, I co-host the Tech ONTAP Podcast. I’m also the TME for NetApp FlexGroup volumes. Inexplicably, we weren’t actually storing our podcast files on NetApp storage – instead, we were using the local Mac SSD, which was problematic for three reasons:

  1. It was eventually going to fill up.
  2. If it failed, bye bye files.
  3. It was close to impossible to access unless we were local to the Mac, for a variety of reasons.

So, it finally dawned on me that I had an AFF8040 in my lab, barely being used for anything except testing and TR writing.

At first, I was going to use a FlexVol, out of habit. But then I realized that a FlexGroup volume would provide a great place to write a bunch of 1-400MB files while leveraging all of my cluster resources. The whole process (creating the FlexGroup, googling autofs on the Mac, and setting up the NFS mount and Audio Hijack) took me all of maybe 30 minutes, most of that spent googling and setting up autofs. Not bad!

The podcast setup

When we record the podcast, we use software called Audio Hijack. This allows us to pipe in sound from applications like WebEx and web browsers, as well as from the in-studio microphones, which all get converted to MP3. This is where the FlexGroup NFS mount comes in – we’ll be pointing Audio Hijack to the FlexGroup volume, where the MP3 files will stream in real time.

Additionally, I also migrated all the existing data over to the FlexGroup for archival purposes. We do use OneDrive to do podcast sharing and such, but I wanted an extra layer of centralized data access, and the NFS mounted FlexGroup provides that. Setting it up to stream right from Audio Hijack removes an extra step for me when processing the files. But, before I could point the software at the NFS mount, I had to configure the Mac to automount the FlexGroup volume on boot.

Creating the FlexGroup volume

Normally, a FlexGroup volume is created with 8 member volumes per node for an AFF (as per best practice). However, my FlexGroup volume was going to be around 5TB, which means 16 member volumes would be around 320GB each. That violates the other best practice of no less than 500GB per member, which avoids too much remote allocation. While my file sizes weren’t going to be huge, I wanted to avoid issues as the volume filled, so I met in the middle – 8 member volumes total, 4 per node. To do that, you have to go to the CLI; System Manager doesn’t do that level of customization yet. In particular, you need the -aggr-list and -aggr-list-multiplier options with volume create.

ontap9-tme-8040::*> vol create -vserver DEMO -volume TechONTAP -size 5TB -aggr-list aggr1_node1,aggr1_node2 -aggr-list-multiplier 4
ontap9-tme-8040::*> vol show -vserver DEMO -volume TechONTAP* -sort-by size -fields size,node
vserver volume          size  node
------- --------------- ----- ------------------
DEMO    TechONTAP__0001 640GB ontap9-tme-8040-01
DEMO    TechONTAP__0002 640GB ontap9-tme-8040-02
DEMO    TechONTAP__0003 640GB ontap9-tme-8040-01
DEMO    TechONTAP__0004 640GB ontap9-tme-8040-02
DEMO    TechONTAP__0005 640GB ontap9-tme-8040-01
DEMO    TechONTAP__0006 640GB ontap9-tme-8040-02
DEMO    TechONTAP__0007 640GB ontap9-tme-8040-01
DEMO    TechONTAP__0008 640GB ontap9-tme-8040-02
DEMO    TechONTAP       5TB   -

Automounting NFS on boot with a Mac

When you mount NFS with a Mac, it doesn’t retain it after you reboot. To get the mount to come back up, you have to configure the autofs service on the Mac. This is different from Linux, where you can simply edit the fstab file. The process is covered very well in this blog post (just be sure to read all the way down to avoid the issue he mentions at the end):

https://coderwall.com/p/fuoa-g/automounting-nfs-share-in-os-x-into-volumes

Here’s my configuration. I disabled “nobrowse” to prevent issues in case Audio Hijack needed to be able to browse the mount.

autofs.conf

[Screenshot: /etc/autofs.conf]

auto_master file

[Screenshot: /etc/auto_master]

auto_nfs

[Screenshot: /etc/auto_nfs]
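If the screenshots are hard to make out, here’s approximately what the relevant lines contain. This is a sketch that assumes the data LIF is 10.x.x.x, the export path is /TechONTAP, and the local mount point is /mnt/TechONTAP; note the absence of -nobrowse, per the above.

# /etc/auto_master - one added line that references the custom map
/-    auto_nfs

# /etc/auto_nfs - the map itself; resvport is typically required when mounting ONTAP from macOS
/mnt/TechONTAP    -fstype=nfs,rw,bg,hard,intr,tcp,resvport    nfs://10.x.x.x:/TechONTAP

Then reload the automounter with “sudo automount -cv” and the mount comes back on demand, even after reboots.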

After that was set up, I copied over the existing 50-ish GBs of data into the FlexGroup and cleaned up some space on the Mac.

ontap9-tme-8040::*> vol show -vserver DEMO -volume TechONTAP* -sort-by size -fields size,used
vserver volume          size  used
------- --------------- ----- -------
DEMO    TechONTAP__0001 640GB 5.69GB
DEMO    TechONTAP__0002 640GB 8.24GB
DEMO    TechONTAP__0003 640GB 5.56GB
DEMO    TechONTAP__0004 640GB 6.48GB
DEMO    TechONTAP__0005 640GB 6.42GB
DEMO    TechONTAP__0006 640GB 8.39GB
DEMO    TechONTAP__0007 640GB 6.25GB
DEMO    TechONTAP__0008 640GB 6.25GB
DEMO    TechONTAP       5TB   53.29GB
9 entries were displayed.

Then, I configured Audio Hijack to pump the recordings to the FlexGroup volume.

[Screenshot: Audio Hijack writing recordings to the FlexGroup NFS mount]

Then, we recorded a couple episodes, without an issue!

[Screenshot: recorded episodes stored on the FlexGroup volume]

As you can see from this output, the FlexGroup volume is relatively evenly allocated:

ontap9-tme-8040::*> node run * flexgroup show TechONTAP
2 entries were acted on.

Node: ontap9-tme-8040-01
FlexGroup 0x80F03817
* next snapshot cleanup due in 2886 msec
* next refresh message due in 886 msec (last to member 0x80F0381F)
* spinnp version negotiated as 4.6, capability 0x3
* Ref count is 8

Idx Member L Used Avail Urgc Targ Probabilities D-Ingest Alloc F-Ingest Alloc
--- -------- - --------------- ---------- ---- ---- --------------------- --------- ----- --------- -----
 1 2044 L 1485146 0% 159376256 0% 12% [100% 100% 79% 79%] 0+ 0 0 0+ 0 0
 2 2045 R 2153941 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 3 2046 L 1415120 0% 159339950 0% 12% [100% 100% 76% 76%] 0+ 0 0 0+ 0 0
 4 2047 R 1690392 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 5 2048 L 1675583 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 6 2049 R 2191360 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 7 2050 L 1630946 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0
 8 2051 R 1631429 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0

Node: ontap9-tme-8040-02
FlexGroup 0x80F03817
* next snapshot cleanup due in 3144 msec
* next refresh message due in 144 msec (last to member 0x80F03818)
* spinnp version negotiated as 4.6, capability 0x3
* Ref count is 8

Idx Member L Used Avail Urgc Targ Probabilities D-Ingest Alloc F-Ingest Alloc
--- -------- - --------------- ---------- ---- ---- --------------------- --------- ----- --------- -----
 1 2044 R 1485146 0% 159376256 0% 12% [100% 100% 79% 79%] 0+ 0 0 0+ 0 0
 2 2045 L 2153941 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 3 2046 R 1415120 0% 159339950 0% 12% [100% 100% 76% 76%] 0+ 0 0 0+ 0 0
 4 2047 L 1690392 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 5 2048 R 1675583 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 6 2049 L 2191360 1% 159376256 0% 12% [100% 100% 98% 98%] 0+ 0 0 0+ 0 0
 7 2050 R 1630946 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0
 8 2051 L 1631429 1% 159376256 0% 12% [100% 100% 87% 87%] 0+ 0 0 0+ 0 0

I plan on using this setup when I start writing the new FlexGroup data protection best practice guide, so stay tuned for that…

So, now, the Tech ONTAP podcast is happily drinking the NetApp FlexGroup champagne!

If you’re going to NetApp Insight, check out session 16594-2 on FlexGroup volumes.

For more information on NetApp FlexGroup volumes, see:

New! NetApp FlexGroup Lab on Demand


Interested in trying out NetApp FlexGroup volumes yourself? Well, we have a new Lab on Demand available that guides you through the setup and configuration of NetApp FlexGroup volumes, as well as managing and monitoring the feature.

If you have a NetApp login and are a partner or internal employee, you can check it out here:

https://labondemand.netapp.com/catalog (log in and then search for/click on the NetApp FlexGroup Volumes in ONTAP v1.0 lab)

If you’re a customer, ping your account team for access, or check out the Lab on Demand at Insight!

While you’re there, be sure to check out other Labs on Demand!

What is Lab on Demand?

Lab on Demand is a fully automated virtualized sandbox of a multitude of NetApp technologies. You can do pretty much anything in these labs, including:

  • Setting up SnapMirror and SnapVault relationships
  • Managing Multiprotocol NAS environments using LDAP
  • Configuring and using SnapCenter
  • Using Docker and Kubernetes with NetApp
  • Testing VMware SRM Backup and Recover

And much more!

Be sure to send lab feedback to flexgroups-info@netapp.com or post to the comments here!

XCP SMB/CIFS support available!

If you’re not familiar with what XCP is, I covered it in a previous blog post, Migrating to ONTAP – Ludicrous speed! as well as in the XCP podcast. Basically, it’s a super-fast way to scan and migrate data.

One of the downsides of the tool was that it only supported NFSv3 migrations, which also meant it couldn’t handle NTFS-style ACLs. Doing that would require an SMB/CIFS-capable version of XCP. Today, we get that with XCP SMB/CIFS 1.0:

https://mysupport.netapp.com/tools/download/ECMLP2357425DT.html?productID=62115&pcfContentID=ECMLP2357425

XCP for SMB/CIFS supports the following:

“show”      Displays information about the CIFS shares of a system
“scan”      Reads all files and directories found on a CIFS share and builds assessment reports
“copy”      Recursively copies everything from source to destination
“sync”      Performs multiple incremental syncs from source to target
“verify”    Verifies that the target state matches the source, including attributes and NTFS ACLs
“activate”  Activates the XCP license on Windows hosts
“help”      Displays detailed information about XCP commands and options

Right now, it’s CLI only, but be on the lookout for a GUI version.

“Installing” XCP on Windows

XCP in Windows is a simple executable file that runs via the cmd or a PowerShell window. One of the pre-requisites for the software includes Microsoft Visual C++ Redistributable for Visual Studio 2017. If you don’t install this, trying to run the program will result in an error that calls out a specific DLL that isn’t registered.

When I copied the file to my Windows host, I created a new directory called “C:\XCP.” You can put that directory anywhere. To run the utility in CMD, you can either navigate to the directory and run “xcp” or add the directory to your system PATH so you can run it from anywhere.

For example:

[Screenshot: Windows environment variables dialog]

[Screenshot: adding C:\XCP to the PATH variable]
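If you’d rather script that than click through the dialogs, appending to the machine PATH from an elevated command prompt also works. A sketch (beware that setx truncates values longer than 1024 characters, so use with care):

C:\> setx /M PATH "%PATH%;C:\XCP"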

Once that’s done, run XCP from any location:

[Screenshot: running XCP from cmd]

[Screenshot: running XCP from PowerShell]

Licensing XCP

XCP is a licensed feature. That doesn’t mean you have to pay for it; the license is only used for tracking purposes. But you do have to apply a license. In Windows, that’s pretty easy.

  1. Download a license from xcp.netapp.com
  2. Copy the license into the C:\NetApp\XCP folder
  3. Run “xcp activate”

[Screenshot: xcp activate output]

XCP show

The command “xcp show \\server” can give some useful information for an ONTAP SMB/CIFS server, such as:

  • Available shares
  • Capacity (used and available)
  • Current connections
  • Folder path
  • Share attributes and permissions

This output is a good way to get an overall look at what is available on a server.

[Screenshot: xcp show output]

XCP scan

XCP has a number of useful scanning features. These include:

PS C:\XCP> xcp help scan

usage: xcp scan [-h] [-v] [-parallel <n>] [-match <filter>] [-preserve-atime]
                [-depth <n>] [-stats] [-l] [-ownership] [-du]
                [-fmt <expression>]
                source

positional arguments:
  source

optional arguments:
  -h, --help         show this help message and exit
  -v                 increase debug verbosity
  -parallel <n>      number of concurrent processes (default: <cpu-count>)
  -match <filter>    only process files and directories that match the filter
                     (see `xcp help -match` for details)
  -preserve-atime    restore last accessed date on source
  -depth <n>         limit the search depth
  -stats             print tree statistics report
  -l                 detailed file listing output
  -ownership         retrieve ownership information
  -du                summarize space usage of each directory including
                     subdirectories
  -fmt <expression>  format file listing according to the python expression
                     (see `xcp help -fmt` for details)

I scanned my “shared” directory with the -stats option and it was able to scan over 60,000 files in 31 seconds and gave me the following stats:

== Maximum Values ==
    Size     Depth  Namelen  Dirsize
    2.02KiB  5      15       100

== Average Values ==
    Size  Depth  Namelen  Dirsize
    25.6  5      6        6

== Top File Extensions ==
    .py
    50003  1

== Number of files ==
    empty  <8KiB  8-64KiB  64KiB-1MiB  1-10MiB  10-100MiB  >100MiB
    3      50001

== Space used ==
    empty  <8KiB    8-64KiB  64KiB-1MiB  1-10MiB  10-100MiB  >100MiB
    0      1.22MiB  0        0           0        0          0

== Directory entries ==
    empty  1-10   10-100  100-1K  1K-10K  >10k
    2      10004  101

== Depth ==
    0-5    6-10  11-15  16-20  21-100  >100
    60111

== Modified ==
    >1 year  >1 month  1-31 days  1-24 hrs  <1 hour  <15 mins  future
    60111

== Created ==
    >1 year  >1 month  1-31 days  1-24 hrs  <1 hour  <15 mins  future
    60111

Total count: 60111
Directories: 10107
Regular files: 50004
Symbolic links:
Junctions:
Special files:
Total space for regular files: 1.22MiB
Total space for directories: 0
Total space used: 1.22MiB
60,111 scanned, 0 errors, 31s
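For reference, that run was just the default invocation (parallelism defaults to the CPU count, per the help output above), along the lines of:

PS C:\XCP> xcp scan -stats \\demo\shared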

When I increased the parallel threads to 8, it finished in 18 seconds:

PS C:\XCP> xcp scan -stats -parallel 8 \\demo\shared

Total count: 60111
Directories: 10107
Regular files: 50004
Symbolic links:
Junctions:
Special files:
Total space for regular files: 1.22MiB
Total space for directories: 0
Total space used: 1.22MiB
60,111 scanned, 0 errors, 18s

XCP copy

With xcp copy, I can copy SMB/CIFS data with or without ACLs at a much faster rate than simple robocopy. Keep in mind that with this version of XCP, it doesn’t have BACKUP OPERATOR rights, so you’d need to run the utility as an admin user on both source and destination.

In the following example, I used robocopy to copy the same dataset as XCP to a NetApp FlexGroup volume.

Robocopy to FlexGroup results (~20-30 minutes)

          Total    Copied   Skipped  Mismatch  FAILED  Extras
  Dirs :  10107    10106    1        0         0       0
  Files :  50004   50004    0        0         0       0
  Bytes :  1.21m   1.21m    0        0         0       0
  Times :  0:19:01 0:13:11  0:00:00  0:05:50

  Speed : 1615 Bytes/sec.
  Speed : 0.092 MegaBytes/min.

UPDATE: Someone asked if the above robocopy run was done with the /MT flag, which would be a more fair apples to apples comparison, since XCP does multithreading. It wasn’t. The syntax used was:

PS C:\XCP> robocopy /S /COPYALL source destination

So, I re-ran it using MT:8 and with an empty FlexGroup after restoring the base snapshot and converting the security style to NTFS to ensure the ACLs come over as well. The multithreading of robocopy cut the time to completion roughly in half.

Robocopy /MT to FlexGroup results (~8-9 minutes)

 PS C:\XCP> robocopy /S /COPYALL /MT:8 \\demo\shared \\demo\flexgroup\robocopyMT

-------------------------------------------------------------------------------
 ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Tue Aug 22 20:32:54 2017

Source : \\demo\shared\
 Dest : \\demo\flexgroup\robocopyMT\

Files : *.*

Options : *.* /S /COPYALL /MT:8 /R:1000000 /W:30
------------------------------------------------------------------------------
          Total    Copied   Skipped  Mismatch  FAILED  Extras
  Dirs :  10107    10106    1        0         0       0
  Files :  50004   50004    0        0         0       0
  Bytes :  1.21 m  1.21 m   0        0         0       0
  Times :  0:35:21 0:06:23  0:00:00  0:01:59

Ended : Tue Aug 22 20:41:18 2017

Then I re-ran the XCP copy to the FlexGroup after restoring the baseline snapshot and making sure the security style of the volume was NTFS. (It was UNIX before, which would have affected ACLs and overall speed.) The run still finished within 4 minutes, so we’re looking at roughly 2x the speed of multithreaded robocopy on a small 60k file-and-folder workload. In addition, the host I’m using is a Windows 7 client VM with a 1GB network connection and not a ton of power behind it. XCP works best with more robust hardware.

[Screenshot: Windows 7 client VM specs]

XCP to FlexGroup results – NTFS security style (~4 minutes!)

PS C:\XCP> xcp copy -parallel 8 \\demo\shared \\demo\flexgroup\XCP
1,436 scanned, 0 errors, 0 skipped, 0 copied, 0 (0/s), 5s
4,381 scanned, 0 errors, 0 skipped, 507 copied, 12.4KiB (2.48KiB/s), 10s
5,426 scanned, 0 errors, 0 skipped, 1,882 copied, 40.5KiB (5.64KiB/s), 15s
7,431 scanned, 0 errors, 0 skipped, 3,189 copied, 67.4KiB (5.37KiB/s), 20s
8,451 scanned, 0 errors, 0 skipped, 4,537 copied, 96.1KiB (5.75KiB/s), 25s
9,651 scanned, 0 errors, 0 skipped, 5,867 copied, 123KiB (5.31KiB/s), 30s
10,751 scanned, 0 errors, 0 skipped, 7,184 copied, 150KiB (5.58KiB/s), 35s
12,681 scanned, 0 errors, 0 skipped, 8,507 copied, 178KiB (5.44KiB/s), 40s
13,891 scanned, 0 errors, 0 skipped, 9,796 copied, 204KiB (5.26KiB/s), 45s
14,861 scanned, 0 errors, 0 skipped, 11,136 copied, 232KiB (5.70KiB/s), 50s
15,966 scanned, 0 errors, 0 skipped, 12,464 copied, 259KiB (5.43KiB/s), 55s
18,031 scanned, 0 errors, 0 skipped, 13,784 copied, 287KiB (5.52KiB/s), 1m0s
19,056 scanned, 0 errors, 0 skipped, 15,136 copied, 316KiB (5.80KiB/s), 1m5s
20,261 scanned, 0 errors, 0 skipped, 16,436 copied, 342KiB (5.21KiB/s), 1m10s
21,386 scanned, 0 errors, 0 skipped, 17,775 copied, 370KiB (5.65KiB/s), 1m15s
23,286 scanned, 0 errors, 0 skipped, 19,068 copied, 397KiB (5.36KiB/s), 1m20s
24,481 scanned, 0 errors, 0 skipped, 20,380 copied, 424KiB (5.44KiB/s), 1m25s
25,526 scanned, 0 errors, 0 skipped, 21,683 copied, 451KiB (5.35KiB/s), 1m30s
26,581 scanned, 0 errors, 0 skipped, 23,026 copied, 479KiB (5.62KiB/s), 1m35s
28,421 scanned, 0 errors, 0 skipped, 24,364 copied, 507KiB (5.63KiB/s), 1m40s
29,701 scanned, 0 errors, 0 skipped, 25,713 copied, 536KiB (5.70KiB/s), 1m45s
30,896 scanned, 0 errors, 0 skipped, 26,996 copied, 561KiB (5.15KiB/s), 1m50s
31,911 scanned, 0 errors, 0 skipped, 28,334 copied, 590KiB (5.63KiB/s), 1m55s
33,706 scanned, 0 errors, 0 skipped, 29,669 copied, 617KiB (5.52KiB/s), 2m0s
35,081 scanned, 0 errors, 0 skipped, 30,972 copied, 644KiB (5.44KiB/s), 2m5s
36,116 scanned, 0 errors, 0 skipped, 32,263 copied, 671KiB (5.30KiB/s), 2m10s
37,201 scanned, 0 errors, 0 skipped, 33,579 copied, 698KiB (5.48KiB/s), 2m15s
38,531 scanned, 0 errors, 0 skipped, 34,898 copied, 726KiB (5.65KiB/s), 2m20s
40,206 scanned, 0 errors, 0 skipped, 36,199 copied, 753KiB (5.36KiB/s), 2m25s
41,371 scanned, 0 errors, 0 skipped, 37,507 copied, 780KiB (5.39KiB/s), 2m30s
42,441 scanned, 0 errors, 0 skipped, 38,834 copied, 808KiB (5.63KiB/s), 2m35s
43,591 scanned, 0 errors, 0 skipped, 40,161 copied, 835KiB (5.47KiB/s), 2m40s
45,536 scanned, 0 errors, 0 skipped, 41,445 copied, 862KiB (5.31KiB/s), 2m45s
46,646 scanned, 0 errors, 0 skipped, 42,762 copied, 890KiB (5.56KiB/s), 2m50s
47,691 scanned, 0 errors, 0 skipped, 44,052 copied, 916KiB (5.30KiB/s), 2m55s
48,606 scanned, 0 errors, 0 skipped, 45,371 copied, 943KiB (5.45KiB/s), 3m0s
50,611 scanned, 0 errors, 0 skipped, 46,518 copied, 967KiB (4.84KiB/s), 3m5s
51,721 scanned, 0 errors, 0 skipped, 47,847 copied, 995KiB (5.54KiB/s), 3m10s
52,846 scanned, 0 errors, 0 skipped, 49,138 copied, 1022KiB (5.32KiB/s), 3m15s
53,876 scanned, 0 errors, 0 skipped, 50,448 copied, 1.02MiB (5.53KiB/s), 3m20s
55,871 scanned, 0 errors, 0 skipped, 51,757 copied, 1.05MiB (5.42KiB/s), 3m25s
57,011 scanned, 0 errors, 0 skipped, 53,080 copied, 1.08MiB (5.52KiB/s), 3m30s
58,101 scanned, 0 errors, 0 skipped, 54,384 copied, 1.10MiB (5.39KiB/s), 3m35s
59,156 scanned, 0 errors, 0 skipped, 55,714 copied, 1.13MiB (5.57KiB/s), 3m40s
60,111 scanned, 0 errors, 0 skipped, 57,049 copied, 1.16MiB (5.52KiB/s), 3m45s
60,111 scanned, 0 errors, 0 skipped, 58,483 copied, 1.19MiB (6.02KiB/s), 3m50s
60,111 scanned, 0 errors, 0 skipped, 59,907 copied, 1.22MiB (5.79KiB/s), 3m55s
60,111 scanned, 0 errors, 0 skipped, 60,110 copied, 1.22MiB (5.29KiB/s), 3m56s

XCP sync and verify

Sync and verify can be used during data migrations to ensure the source and target match up before cutting over. These use the same multi-processing capabilities as copy, so this should also be fast. Keep in mind that sync could also potentially be used to do incremental backups using XCP!
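Usage mirrors the copy example above. As a rough sketch, assuming the 1.0 syntax takes explicit source and target paths the way copy does:

PS C:\XCP> xcp sync -parallel 8 \\demo\shared \\demo\flexgroup\XCP
PS C:\XCP> xcp verify \\demo\shared \\demo\flexgroup\XCP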

[Screenshot: xcp verify output]