Behind the Scenes: Episode 219 – FlexVol to FlexGroup Conversion

Welcome to Episode 219, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invite NetApp FlexGroup Technical Director Dan Tennant and FlexGroup developer Jessica Peters to talk to us about the ins and outs of converting a FlexVol to a FlexGroup in place, with no copy and no outage!

I also cover the process in detail in this blog post:

FlexGroup Conversion: Moving from FlexVols to FlexGroups the Easy Way

Expect official documentation on it in the coming weeks.

For more information or questions about FlexGroup volumes, email us at flexgroups-info@netapp.com!

Podcast Transcriptions

We also are piloting a new transcription service, so if you want a written copy of the episode, check it out here (just set expectations accordingly):

Episode 219: FlexVol to FlexGroup Conversion Transcription

Just use the search field to look for words you want to read more about. (For example, search for “storage”)


Be sure to give us feedback on the transcription in the comments here or via podcast@netapp.com! If you have requests for other previous episode transcriptions, let me know!

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Behind the Scenes: Episode 217 – ONTAP 9.7

Welcome to Episode 217, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we talk about the latest release of ONTAP, as well as the new All-SAN array!

Featured in this week’s podcast:

  • NetApp SVP Octavian Tanase
  • NetApp Director Jeff Baxter
  • NetApp Product Marketing Manager Jon Jacob
  • NetApp Technical Product Marketing Manager Skip Shapiro
  • NetApp TMEs Dan Isaacs and Mike Peppers

Podcast Transcriptions

We also are piloting a new transcription service, so if you want a written copy of the episode, check it out here (just set expectations accordingly):

Episode 217: ONTAP 9.7 Transcription

Just use the search field to look for words you want to read more about. (For example, search for “storage”)


Be sure to give us feedback on the transcription in the comments here or via podcast@netapp.com! If you have requests for other previous episode transcriptions, let me know!

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

FlexGroup Conversion: Moving from FlexVols to FlexGroups the Easy Way

NetApp announced ONTAP 9.7 at Insight 2019 in Las Vegas. The release includes a number of new features, but its main focus is making storage management in ONTAP simpler.


One of the new features that will help make things easier is the new FlexGroup conversion feature, which allows in-place conversion of a FlexVol to a FlexGroup volume without the need to do a file copy.

Best of all, this conversion takes a matter of seconds without needing to remount clients!

I know it sounds too good to be true, but what would you rather do: spend days copying terabytes of data over the network, or run a single command that converts the volume in place without touching the data?

As you can imagine, a lot of people are pretty stoked about being able to convert volumes without copying data, so I wanted to write up something to point people to as the questions inevitably start rolling in. This blog will cover how it works and what caveats there are. The blog will be a bit long, but I wanted to cover all the bases. Look for this information to be included in TR-4571 soon, as well as a new FlexGroup conversion podcast in the coming weeks.

Why would I want to convert a volume to a FlexGroup?

FlexGroup volumes offer a few advantages over FlexVol volumes, such as:

  • Ability to expand beyond 100TB and 2 billion files in a single volume
  • Ability to scale out capacity or performance non-disruptively
  • Multi-threaded performance for high ingest workloads
  • Simplification of volume management and deployment

For example, perhaps you have a workload that is growing rapidly and you don’t want to have to migrate the data, but still want to provide more capacity. Or perhaps a workload’s performance just isn’t cutting it on a FlexVol, so you want to provide better performance handling with a FlexGroup. Converting can help here.

When would I not want to convert a FlexVol?

Converting a FlexVol to a FlexGroup might not always be the best option. If you require FlexVol features that aren’t yet available in FlexGroup volumes, then you should hold off. For example, SVM-DR and cascading SnapMirror relationships aren’t supported with FlexGroup volumes in ONTAP 9.7, so if you need those, you should stay with FlexVols.

Also, if you have a FlexVol that’s already very large (80-100TB) and already very full (80-90%), then you might want to copy the data rather than convert. The converted FlexGroup volume would start with a very large, very full member volume, which could create performance issues and doesn’t really resolve your capacity issues, particularly if that dataset contains files that grow over time.

For example, if you have a FlexVol that is 100TB in capacity and 90TB used, it would look like this:

[Figure: a single 100TB FlexVol with 90TB used]

If you were to convert this 90% full volume to a FlexGroup, then you’d have a 90% full member volume. Once you add new member volumes, they’d be 100TB each and 0% full, meaning they’d take on a majority of new workloads. The data would not rebalance and if the original files grew over time, you could still run out of space with nowhere to go (since 100TB is the maximum member volume size).

[Figure: the same volume converted and expanded into a 400TB FlexGroup; the original member is still 90% full, while the new 100TB members are empty]

Things that would block a conversion

ONTAP will block conversion of a FlexVol for the following reasons:

  • The ONTAP version isn’t 9.7 on all nodes
  • ONTAP upgrade issues preventing conversion
  • A FlexVol volume was transitioned from 7-Mode using 7MTT
  • Something is enabled on the volume that isn’t supported with FlexGroups yet (SAN LUNs, Windows NFS, SMB1, part of a fan-out/cascade snapmirror, SVM-DR, Snapshot naming/autodelete, vmalign set, SnapLock, space SLO, logical space enforcement/reporting, etc.)
  • FlexClones are present (The volume being converted can’t be a parent nor a clone)
  • The volume is a FlexCache origin volume
  • Snapshots with snap-ids greater than 255
  • Storage efficiencies are enabled (can be re-enabled after)
  • The volume is a source of a snapmirror and the destination has not been converted yet
  • The volume is part of an active (not quiesced) snapmirror
  • Quotas enabled (must be disabled first, then re-enabled after)
  • Volume names longer than 197 characters
  • Running ONTAP processes (mirrors, jobs, wafliron, NDMP backup, inode conversion in process, etc.)
  • SVM root volume
  • Volume is too full

You can check for upgrade issues with:

cluster::*> upgrade-revert show
cluster::*> system node image show-update-progress -node *

You can check for transitioned volumes with:

cluster::*> volume show -is-transitioned true
There are no entries matching your query.

You can check for snapshots with snap-ids >255 with:

cluster::*> volume snapshot show -vserver DEMO -volume testvol -logical-snap-id >255 -fields logical-snap-id
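
If any Snapshot copies show up there, they’d have to be deleted before converting. Based on the guidance in the error output shown later in this post, the cleanup would look something like this (the snapshot name is a placeholder):

cluster::*> volume snapshot delete -vserver DEMO -volume testvol -snapshot <snapshot_name>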

How it works

To convert a FlexVol volume to a FlexGroup volume in ONTAP 9.7, you run a single, simple command in advanced privilege:

cluster::*> volume conversion start ?
-vserver <vserver name> *Vserver Name
[-volume] <volume name> *Volume Name
[ -check-only [true] ] *Validate the Conversion Only
[ -foreground [true] ] *Foreground Process (default: true)

When you run this command, it will take a single FlexVol and convert it into a FlexGroup volume with one member. You can even run a validation of the conversion before you do the real thing!

The process is 1:1, so you can’t currently convert multiple FlexVols into a single FlexGroup. Once the conversion is done, you will have a single member FlexGroup volume, which you can then add more member volumes of the same size to increase capacity and performance.

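To tie it together, here’s a condensed sketch of the whole workflow using the commands covered in the rest of this post (the SVM, volume, and aggregate names are placeholders; double-check the syntax on your ONTAP 9.7 system):

cluster::*> set advanced
cluster::*> volume conversion start -vserver SVM1 -volume myvol -check-only true
cluster::*> volume conversion start -vserver SVM1 -volume myvol
cluster::*> volume expand -vserver SVM1 -volume myvol -aggr-list aggr1,aggr2 -aggr-list-multiplier 2

Remember that once you expand, there’s no going back to a FlexVol.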

Other considerations/caveats

While the actual conversion process is simple, there are some considerations to think about before converting. Most of these will go away in subsequent ONTAP releases as feature support is added, but it’s still prudent to call them out here.

Once the initial conversion is done, ONTAP will unmount the volume internally and remount it to get the new FlexGroup information into the appropriate places. Clients won’t have to remount/reconnect, but will see a disruption that lasts less than 1 minute while this takes place. Data doesn’t change at all – filehandles all stay the same.

  • FabricPool doesn’t need anything. It just works. No need to rehydrate data on-prem.
  • Snapshot copies will remain, and clients can still access data from them, but you won’t be able to restore the volume from them via SnapRestore commands. Those snapshots get marked as “pre-conversion.”
  • SnapMirror relationships will pick up where they left off without needing to rebaseline, provided the source and destination volumes have both been converted (the destination needs to be converted first). However, SnapMirror restores of the volume aren’t possible; you’d retrieve individual files from clients instead.
  • FlexClones will need to be deleted or split from the volume to be converted.
  • Storage efficiencies will need to be disabled during the conversion, but your space savings will be preserved after the conversion.
  • FlexCache instances with an origin volume being converted will need to be deleted.
  • Space guarantees can impact how large a FlexGroup volume can get if they’re set to volume guarantee. New member volumes will need to be the same size as the existing members, so you’d need adequate space to honor those.
  • Quotas are supported in FlexGroup volumes, but they work a bit differently than in FlexVol volumes. So, while the conversion is being done, quotas have to be disabled (quota off) and then re-enabled afterward (quota on); see the example just after this list.
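
For example, a hedged sketch of the quota and efficiency dance around a conversion (the volume and SVM names are placeholders) might look like this:

cluster::*> volume quota off -vserver SVM1 -volume myvol
cluster::*> volume efficiency off -vserver SVM1 -volume myvol
cluster::*> volume conversion start -vserver SVM1 -volume myvol
cluster::*> volume efficiency on -vserver SVM1 -volume myvol
cluster::*> volume quota on -vserver SVM1 -volume myvol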

Also, conversion to a FlexGroup volume is a one-way street after you expand it, so be sure you’re ready to make the jump. If anything goes wrong during the conversion process, there is a “rescue” method that support can help you use to get out of the pickle, so your data will be safe.

When you expand the FlexGroup to add new member volumes, they will be the same size as the converted member volume, so be sure there is adequate space available. Additionally, the existing data that resides in the original volume will remain in that member volume. Data does not re-distribute. Instead, the FlexGroup will favor newly added member volumes for new files.

Nervous about converting?

Well, ONTAP has features for that.

If you don’t feel comfortable about converting your production FlexVol to a FlexGroup right away, you have options.

First of all, remember that we have the ability to run a check on the convert command with -check-only true. That tells us what prerequisites we might be missing.

cluster::*> volume conversion start -vserver DEMO -volume flexvol -foreground true -check-only true

Error: command failed: Cannot convert volume "flexvol" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* The volume has Snapshot copies with IDs greater than 255. Use the (privilege: advanced) "volume snapshot show -vserver DEMO -volume flexvol -logical-snap-id >255 -fields logical-snap-id" command to list the Snapshot copies
with IDs greater than 255 then delete them using the "snapshot delete -vserver DEMO -volume flexvol" command.
* Quotas are enabled. Use the 'volume quota off -vserver DEMO -volume flexvol' command to disable quotas.
* Cannot convert because the source "flexvol" of a SnapMirror relationship is source to more than one SnapMirror relationship. Delete other Snapmirror relationships, and then try the conversion of the source "flexvol" volume.
* Only volumes with logical space reporting disabled can be converted. Use the 'volume modify -vserver DEMO -volume flexvol -is-space-reporting-logical false' command to disable logical space reporting.

Also, remember, ONTAP has the ability to create multiple storage virtual machines, which can be fenced off from network access. This can be used to test things, such as volume conversion. The only trick is getting a copy of that data over… but it’s really not that tricky.

Option 1: SnapMirror

You can create a SnapMirror of your “to be converted” volume to the same SVM or a new SVM. Then, break the mirror and delete the relationship. Now you have a sandbox copy of your volume, complete with snapshots, to test out conversion, expansion and performance.
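
A hedged sketch of that sandbox setup (the cluster, SVM, volume, and policy names are placeholders; adjust for your environment):

cluster::*> volume create -vserver TEST_SVM -volume myvol_sandbox -aggregate aggr1 -size 10TB -type DP
cluster::*> snapmirror create -source-path PROD_SVM:myvol -destination-path TEST_SVM:myvol_sandbox -type XDP -policy MirrorAllSnapshots
cluster::*> snapmirror initialize -destination-path TEST_SVM:myvol_sandbox
cluster::*> snapmirror break -destination-path TEST_SVM:myvol_sandbox
cluster::*> snapmirror delete -destination-path TEST_SVM:myvol_sandbox

You might also run a snapmirror release against the source afterward to clean up the relationship metadata on that side.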

Option 2: FlexClone/volume rehost

If you don’t have SnapMirror or want to try a method that is less taxing on your network, you can use a combination of FlexClone (an instant copy of your volume backed by a snapshot) and volume rehost (an instant move of the volume from one SVM to another). Keep in mind that FlexClones themselves can’t be rehosted, but you can split the clone and then rehost it (see the command sketch after the list below).

Essentially, the process is:

  1. FlexClone create
  2. FlexClone split
  3. Volume rehost to new SVM (or convert on the existing SVM)
  4. Profit!
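
In command form, a hedged sketch of those steps (names are placeholders) would be something like:

cluster::*> volume clone create -vserver PROD_SVM -flexclone myvol_clone -parent-volume myvol
cluster::*> volume clone split start -vserver PROD_SVM -flexclone myvol_clone
cluster::*> volume rehost -vserver PROD_SVM -volume myvol_clone -destination-vserver TEST_SVM
cluster::*> volume conversion start -vserver TEST_SVM -volume myvol_clone -check-only true

The clone split copies the shared blocks in the background, so wait for it to finish (volume clone split show) before rehosting.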

Sample conversion

Before I converted a volume, I added around 300,000 files to help determine how long the process might take with a lot of files present.
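
The exact file-creation script isn’t shown here, but a trivial shell loop like this one (purely illustrative) would generate roughly 300,000 small files across 100 top-level directories:

# for d in $(seq 0 99); do mkdir -p /lotsafiles/topdir_$d; for f in $(seq 1 3000); do touch /lotsafiles/topdir_$d/file_$f; done; done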

cluster::*> df -i lotsafiles
Filesystem iused ifree %iused Mounted on Vserver
/vol/lotsafiles/ 330197 20920929 1% /lotsafiles DEMO

cluster::*> volume show lots*
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
DEMO      lotsafiles   aggr1_node1  online     RW         10TB     7.33TB    0%

First, let’s try out the validation:

cluster::*> volume conversion start -vserver DEMO -volume lotsafiles -foreground true -check-only true

Error: command failed: Cannot convert volume "lotsafiles" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* SMB1 is enabled on Vserver "DEMO". Use the 'vserver cifs options modify -smb1-enabled false -vserver DEMO' command to disable SMB1.
* The volume contains LUNs. Use the "lun delete -vserver DEMO -volume lotsafiles -lun *" command to remove the LUNs, or use the "lun move start" command to relocate the LUNs to other
FlexVols.
* NFSv3 MS-DOS client support is enabled on Vserver "DEMO". Use the "vserver nfs modify -vserver DEMO -v3-ms-dos-client disabled" command to disable NFSv3 MS-DOS client support on the
Vserver. Note that disabling this support will disable access for all NFSv3 MS-DOS clients connected to Vserver "DEMO".

As you can see, we have some blockers, such as SMB1 and the LUN I created (to intentionally break conversion). So, I clear them using the recommended commands, run the check again, and see some of our caveats:

cluster::*> volume conversion start -vserver DEMO -volume lotsafiles -foreground true -check-only true
Conversion of volume "lotsafiles" in Vserver "DEMO" to a FlexGroup can proceed with the following warnings:
* After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
* Converting flexible volume "lotsafiles" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.

Now, let’s convert. But first, I’ll start a script that takes a while to complete, while also monitoring performance during the conversion using Active IQ Performance Manager.

The conversion of the volume took less than 1 minute, and the only disruption I saw was a slight drop in IOPS:

cluster::*> volume conversion start -vserver DEMO -volume lotsafiles -foreground true

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y
Warning: Converting flexible volume "lotsafiles" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot
         copies cannot be restored.
Do you want to continue? {y|n}: y
[Job 23671] Job succeeded: success
cluster::*> statistics show-periodic
cpu cpu total fcache total total data data data cluster cluster cluster disk disk pkts pkts
avg busy ops nfs-ops cifs-ops ops spin-ops recv sent busy recv sent busy recv sent read write recv sent
---- ---- -------- -------- -------- -------- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- -------- -------- --------
34% 44% 14978 14968 10 0 14978 14.7MB 15.4MB 0% 3.21MB 3.84MB 0% 11.5MB 11.6MB 4.43MB 1.50MB 49208 55026
40% 45% 14929 14929 0 0 14929 15.2MB 15.7MB 0% 3.21MB 3.84MB 0% 12.0MB 11.9MB 3.93MB 641KB 49983 55712
36% 44% 15020 15020 0 0 15019 14.8MB 15.4MB 0% 3.24MB 3.87MB 0% 11.5MB 11.5MB 3.91MB 23.9KB 49838 55806
30% 39% 15704 15694 10 0 15704 15.0MB 15.7MB 0% 3.29MB 3.95MB 0% 11.8MB 11.8MB 2.12MB 4.99MB 50936 57112
32% 43% 14352 14352 0 0 14352 14.7MB 15.3MB 0% 3.33MB 3.97MB 0% 11.3MB 11.3MB 4.19MB 27.3MB 49736 55707
37% 44% 14807 14797 10 0 14807 14.5MB 15.0MB 0% 3.09MB 3.68MB 0% 11.4MB 11.4MB 4.34MB 2.79MB 48352 53616
39% 43% 15075 15075 0 0 15076 14.9MB 15.6MB 0% 3.24MB 3.86MB 0% 11.7MB 11.7MB 3.48MB 696KB 50124 55971
32% 42% 14998 14998 0 0 14997 15.1MB 15.8MB 0% 3.23MB 3.87MB 0% 11.9MB 11.9MB 3.68MB 815KB 49606 55692
38% 43% 15038 15025 13 0 15036 14.7MB 15.2MB 0% 3.27MB 3.92MB 0% 11.4MB 11.3MB 3.46MB 15.8KB 50256 56150
43% 44% 15132 15132 0 0 15133 15.0MB 15.7MB 0% 3.22MB 3.87MB 0% 11.8MB 11.8MB 1.93MB 15.9KB 50030 55938
34% 42% 15828 15817 10 0 15827 15.8MB 16.5MB 0% 3.39MB 4.10MB 0% 12.4MB 12.3MB 4.02MB 21.6MB 52142 58771
28% 39% 11807 11807 0 0 11807 12.3MB 13.1MB 0% 2.55MB 3.07MB 0% 9.80MB 9.99MB 6.76MB 27.9MB 38752 43748
33% 42% 15108 15108 0 0 15107 15.1MB 15.5MB 0% 3.32MB 3.91MB 0% 11.7MB 11.6MB 3.50MB 1.17MB 50903 56143
32% 42% 16143 16133 10 0 16143 15.1MB 15.8MB 0% 3.28MB 3.95MB 0% 11.8MB 11.8MB 3.78MB 9.00MB 50922 57403
24% 34% 8843 8843 0 0 8861 14.2MB 14.9MB 0% 3.70MB 4.44MB 0% 10.5MB 10.5MB 8.46MB 10.7MB 46174 53157
27% 37% 10949 10949 0 0 11177 9.91MB 10.2MB 0% 2.45MB 2.84MB 0% 7.46MB 7.40MB 5.55MB 1.67MB 31764 35032
28% 38% 12580 12567 13 0 12579 13.3MB 13.8MB 0% 2.76MB 3.26MB 0% 10.5MB 10.6MB 3.92MB 19.9KB 44119 48488
30% 40% 14300 14300 0 0 14298 14.2MB 14.7MB 0% 3.09MB 3.68MB 0% 11.1MB 11.1MB 2.66MB 600KB 47282 52789
31% 41% 14514 14503 10 0 14514 14.3MB 14.9MB 0% 3.15MB 3.75MB 0% 11.2MB 11.2MB 3.65MB 728KB 48093 53532
31% 42% 14626 14626 0 0 14626 14.3MB 14.9MB 0% 3.16MB 3.77MB 0% 11.1MB 11.1MB 4.84MB 1.14MB 47936 53645
ontap9-tme-8040: cluster.cluster: 11/13/2019 22:44:39
cpu cpu total fcache total total data data data cluster cluster cluster disk disk pkts pkts
avg busy ops nfs-ops cifs-ops ops spin-ops recv sent busy recv sent busy recv sent read write recv sent
---- ---- -------- -------- -------- -------- -------- -------- -------- ---- -------- -------- ------- -------- -------- -------- -------- -------- --------
30% 39% 15356 15349 7 0 15370 15.3MB 15.8MB 0% 3.29MB 3.94MB 0% 12.0MB 11.8MB 3.18MB 6.90MB 50493 56425
32% 42% 14156 14146 10 0 14156 14.6MB 15.3MB 0% 3.09MB 3.68MB 0% 11.5MB 11.7MB 5.49MB 16.3MB 48159 53678

This is what the performance looked like from AIQ:

[Screenshot: Active IQ performance graph showing a brief dip in IOPS during the conversion]

And now, we have a single member FlexGroup volume:

cluster::*> volume show lots*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
DEMO lotsafiles - online RW 10TB 7.33TB 0%
DEMO lotsafiles__0001
aggr1_node1 online RW 10TB 7.33TB 0%
2 entries were displayed.

And our snapshots are still there, but are marked as “pre-conversion”:

cluster::> set diag
cluster::*> snapshot show -vserver DEMO -volume lotsafiles -fields is-convert-recovery,state
vserver volume snapshot state is-convert-recovery
------- ---------- -------- -------------- -------------------
DEMO lotsafiles base pre-conversion false
DEMO lotsafiles hourly.2019-11-13_1705
pre-conversion false
DEMO lotsafiles hourly.2019-11-13_1805
pre-conversion false
DEMO lotsafiles hourly.2019-11-13_1905
pre-conversion false
DEMO lotsafiles hourly.2019-11-13_2005
pre-conversion false
DEMO lotsafiles hourly.2019-11-13_2105
pre-conversion false
DEMO lotsafiles hourly.2019-11-13_2205
pre-conversion false
DEMO lotsafiles clone_clone.2019-11-13_223144.0
pre-conversion false
DEMO lotsafiles convert.2019-11-13_224411
pre-conversion true
9 entries were displayed.

Snap restores will fail:

cluster::*> snapshot restore -vserver DEMO -volume lotsafiles -snapshot convert.2019-11-13_224411

Error: command failed: Promoting a pre-conversion Snapshot copy is not supported.

But we can still grab files from the client:

[root@centos7 scripts]# cd /lotsafiles/.snapshot/convert.2019-11-13_224411/pre-convert/
[root@centos7 pre-convert]# ls
topdir_0 topdir_14 topdir_2 topdir_25 topdir_30 topdir_36 topdir_41 topdir_47 topdir_52 topdir_58 topdir_63 topdir_69 topdir_74 topdir_8 topdir_85 topdir_90 topdir_96
topdir_1 topdir_15 topdir_20 topdir_26 topdir_31 topdir_37 topdir_42 topdir_48 topdir_53 topdir_59 topdir_64 topdir_7 topdir_75 topdir_80 topdir_86 topdir_91 topdir_97
topdir_10 topdir_16 topdir_21 topdir_27 topdir_32 topdir_38 topdir_43 topdir_49 topdir_54 topdir_6 topdir_65 topdir_70 topdir_76 topdir_81 topdir_87 topdir_92 topdir_98
topdir_11 topdir_17 topdir_22 topdir_28 topdir_33 topdir_39 topdir_44 topdir_5 topdir_55 topdir_60 topdir_66 topdir_71 topdir_77 topdir_82 topdir_88 topdir_93 topdir_99
topdir_12 topdir_18 topdir_23 topdir_29 topdir_34 topdir_4 topdir_45 topdir_50 topdir_56 topdir_61 topdir_67 topdir_72 topdir_78 topdir_83 topdir_89 topdir_94
topdir_13 topdir_19 topdir_24 topdir_3 topdir_35 topdir_40 topdir_46 topdir_51 topdir_57 topdir_62 topdir_68 topdir_73 topdir_79 topdir_84 topdir_9 topdir_95

Now, I can add more member volumes using “volume expand”:

cluster::*> volume expand -vserver DEMO -volume lotsafiles -aggr-list aggr1_node1,aggr1_node2 -aggr-list-multiplier 2

Warning: The following number of constituents of size 10TB will be added to FlexGroup "lotsafiles": 4. Expanding the FlexGroup will cause the state of all Snapshot copies to be set to "partial".
Partial Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y

Warning: FlexGroup "lotsafiles" is a converted flexible volume. If this volume is expanded, it will no longer be able to be converted back to being a flexible volume.
Do you want to continue? {y|n}: y
[Job 23676] Job succeeded: Successful

But remember, the data doesn’t redistribute. The original member volume will keep the files in place:

cluster::*> df -i lots*
Filesystem iused ifree %iused Mounted on Vserver
/vol/lotsafiles/ 3630682 102624948 3% /lotsafiles DEMO
/vol/lotsafiles__0001/ 3630298 17620828 17% /lotsafiles DEMO
/vol/lotsafiles__0002/ 96 21251030 0% --- DEMO
/vol/lotsafiles__0003/ 96 21251030 0% --- DEMO
/vol/lotsafiles__0004/ 96 21251030 0% --- DEMO
/vol/lotsafiles__0005/ 96 21251030 0% --- DEMO
6 entries were displayed.

cluster::*> df -h lots*
Filesystem total used avail capacity Mounted on Vserver
/vol/lotsafiles/ 47TB 2735MB 14TB 0% /lotsafiles DEMO
/vol/lotsafiles/.snapshot
2560GB 49MB 2559GB 0% /lotsafiles/.snapshot DEMO
/vol/lotsafiles__0001/ 9728GB 2505MB 7505GB 0% /lotsafiles DEMO
/vol/lotsafiles__0001/.snapshot
512GB 49MB 511GB 0% /lotsafiles/.snapshot DEMO
/vol/lotsafiles__0002/ 9728GB 57MB 7505GB 0% --- DEMO
/vol/lotsafiles__0002/.snapshot
512GB 0B 512GB 0% --- DEMO
/vol/lotsafiles__0003/ 9728GB 57MB 7766GB 0% --- DEMO
/vol/lotsafiles__0003/.snapshot
512GB 0B 512GB 0% --- DEMO
/vol/lotsafiles__0004/ 9728GB 57MB 7505GB 0% --- DEMO
/vol/lotsafiles__0004/.snapshot
512GB 0B 512GB 0% --- DEMO
/vol/lotsafiles__0005/ 9728GB 57MB 7766GB 0% --- DEMO
/vol/lotsafiles__0005/.snapshot
512GB 0B 512GB 0% --- DEMO
12 entries were displayed.

Converting a FlexVol in a SnapMirror relationship

Now, let’s take a look at a volume that is in a SnapMirror.

cluster::*> snapmirror show -destination-path data_dst -fields state
source-path destination-path state
----------- ---------------- ------------
DEMO:data   DEMO:data_dst    Snapmirrored

If I try to convert the source, I get an error:

cluster::*> vol conversion start -vserver DEMO -volume data -check-only true

Error: command failed: Cannot convert volume "data" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
       * Cannot convert source volume "data" because destination volume "data_dst" of the SnapMirror relationship with "data" as the source is not converted.  First check if the source can be converted to a FlexGroup volume using "vol
       conversion start -volume data -convert-to flexgroup -check-only true". If the conversion of the source can proceed then first convert the destination and then convert the source.

So, I’d need to convert the destination first. To do that, I need to quiesce the snapmirror:

cluster::*> vol conversion start -vserver DEMO -volume data_dst -check-only true

Error: command failed: Cannot convert volume "data_dst" in Vserver "DEMO" to a FlexGroup. Correct the following issues and retry the command:
* The relationship was not quiesced. Quiesce SnapMirror relationship using "snapmirror quiesce -destination-path data_dst" and then try the conversion.

Here we go…

cluster::*> snapmirror quiesce -destination-path DEMO:data_dst
Operation succeeded: snapmirror quiesce for destination "DEMO:data_dst".

cluster::*> vol conversion start -vserver DEMO -volume data_dst -check-only true
Conversion of volume "data_dst" in Vserver "DEMO" to a FlexGroup can proceed with the following warnings:
* After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
* Converting flexible volume "data_dst" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.

When I convert the volume, it lets me know my next steps:

cluster::*> vol conversion start -vserver DEMO -volume data_dst

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y
Warning: Converting flexible volume "data_dst" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y
[Job 23710] Job succeeded: SnapMirror destination volume "data_dst" has been successfully converted to a FlexGroup volume. You must now convert the relationship's source volume, "DEMO:data", to a FlexGroup. Then, re-establish the SnapMirror relationship using the "snapmirror resync" command.

Now I convert the source volume…

cluster::*> vol conversion start -vserver DEMO -volume data

Warning: After the volume is converted to a FlexGroup, it will not be possible to change it back to a flexible volume.
Do you want to continue? {y|n}: y
Warning: Converting flexible volume "data" in Vserver "DEMO" to a FlexGroup will cause the state of all Snapshot copies from the volume to be set to "pre-conversion". Pre-conversion Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y
[Job 23712] Job succeeded: success

And resync the mirror:

cluster::*> snapmirror resync -destination-path DEMO:data_dst
Operation is queued: snapmirror resync to destination "DEMO:data_dst".

cluster::*> snapmirror show -destination-path DEMO:data_dst -fields state
source-path destination-path state
----------- ---------------- ------------
DEMO:data DEMO:data_dst Snapmirrored

While that’s fine and all, the most important part of a snapmirror is the restore. So let’s see if I can access files from the destination volume’s snapshot.

First, I mount the source and destination and compare ls output:

# mount -o nfsvers=3 DEMO:/data_dst /dst
# mount -o nfsvers=3 DEMO:/data /data
# ls -lah /data
total 14G
drwxrwxrwx 6 root root 4.0K Nov 14 11:57 .
dr-xr-xr-x. 54 root root 4.0K Nov 15 10:08 ..
drwxrwxrwx 2 root root 4.0K Sep 14 2018 cifslink
drwxr-xr-x 12 root root 4.0K Nov 16 2018 nas
-rwxrwxrwx 1 prof1 ProfGroup 0 Oct 3 14:32 newfile
drwxrwxrwx 5 root root 4.0K Nov 15 10:06 .snapshot
lrwxrwxrwx 1 root root 23 Sep 14 2018 symlink -> /shared/unix/linkedfile
drwxrwxrwx 2 root bin 4.0K Jan 31 2019 test
drwxrwxrwx 3 root root 4.0K Sep 14 2018 unix
-rwxrwxrwx 1 newuser1 ProfGroup 0 Jan 14 2019 userfile
-rwxrwxrwx 1 root root 6.7G Nov 14 11:58 Windows2.iso
-rwxrwxrwx 1 root root 6.7G Nov 14 11:37 Windows.iso
# ls -lah /dst
total 14G
drwxrwxrwx 6 root root 4.0K Nov 14 11:57 .
dr-xr-xr-x. 54 root root 4.0K Nov 15 10:08 ..
drwxrwxrwx 2 root root 4.0K Sep 14 2018 cifslink
dr-xr-xr-x 2 root root 0 Nov 15 2018 nas
-rwxrwxrwx 1 prof1 ProfGroup 0 Oct 3 14:32 newfile
drwxrwxrwx 4 root root 4.0K Nov 15 10:05 .snapshot
lrwxrwxrwx 1 root root 23 Sep 14 2018 symlink -> /shared/unix/linkedfile
drwxrwxrwx 2 root bin 4.0K Jan 31 2019 test
drwxrwxrwx 3 root root 4.0K Sep 14 2018 unix
-rwxrwxrwx 1 newuser1 ProfGroup 0 Jan 14 2019 userfile
-rwxrwxrwx 1 root root 6.7G Nov 14 11:58 Windows2.iso
-rwxrwxrwx 1 root root 6.7G Nov 14 11:37 Windows.iso

And if I ls to the snapshot in the destination volume…

# ls -lah /dst/.snapshot/snapmirror.7e3cc08e-d9b3-11e6-85e2-00a0986b1210_2163227795.2019-11-15_100555/
total 14G
drwxrwxrwx 6 root root 4.0K Nov 14 11:57 .
drwxrwxrwx 4 root root 4.0K Nov 15 10:05 ..
drwxrwxrwx 2 root root 4.0K Sep 14 2018 cifslink
dr-xr-xr-x 2 root root 0 Nov 15 2018 nas
-rwxrwxrwx 1 prof1 ProfGroup 0 Oct 3 14:32 newfile
lrwxrwxrwx 1 root root 23 Sep 14 2018 symlink -> /shared/unix/linkedfile
drwxrwxrwx 2 root bin 4.0K Jan 31 2019 test
drwxrwxrwx 3 root root 4.0K Sep 14 2018 unix
-rwxrwxrwx 1 newuser1 ProfGroup 0 Jan 14 2019 userfile
-rwxrwxrwx 1 root root 6.7G Nov 14 11:58 Windows2.iso
-rwxrwxrwx 1 root root 6.7G Nov 14 11:37 Windows.iso

Everything is there!

Now, I expand the FlexGroup source to give us more capacity:

cluster::*> volume expand -vserver DEMO -volume data -aggr-list aggr1_node1,aggr1_node2 -aggr-list-multiplier 2

Warning: The following number of constituents of size 30TB will be added to FlexGroup "data": 4. Expanding the FlexGroup will cause the state of all Snapshot copies to be set to "partial". Partial Snapshot copies cannot be restored.
Do you want to continue? {y|n}: y
[Job 23720] Job succeeded: Successful

If you notice, my source volume now has 5 member volumes. My destination volume… only has one:

cluster::*> vol show -vserver DEMO -volume data*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
DEMO data - online RW 150TB 14.89TB 0%
DEMO data__0001 aggr1_node2 online RW 30TB 7.57TB 0%
DEMO data__0002 aggr1_node1 online RW 30TB 7.32TB 0%
DEMO data__0003 aggr1_node2 online RW 30TB 7.57TB 0%
DEMO data__0004 aggr1_node1 online RW 30TB 7.32TB 0%
DEMO data__0005 aggr1_node2 online RW 30TB 7.57TB 0%
DEMO data_dst - online DP 30TB 7.32TB 0%
DEMO data_dst__0001
aggr1_node1 online DP 30TB 7.32TB 0%
8 entries were displayed.

No worries! Just update the mirror and ONTAP will fix it for you.

cluster::*> snapmirror update -destination-path DEMO:data_dst
Operation is queued: snapmirror update of destination "DEMO:data_dst".

The update will initially fail with the following:

Last Transfer Error: A SnapMirror transfer for the relationship with destination FlexGroup "DEMO:data_dst" was aborted because the source FlexGroup was expanded. A SnapMirror AutoExpand job with id "23727" was created to expand the destination FlexGroup and to trigger a SnapMirror transfer for the SnapMirror relationship. After the SnapMirror transfer is successful, the "healthy" field of the SnapMirror relationship will be set to "true". The job can be monitored using either the "job show -id 23727" or "job history show -id 23727" commands.

The job will expand the volume and then we can update again:

cluster::*> job show -id 23727
Owning
Job ID Name Vserver Node State
------ -------------------- ---------- -------------- ----------
23727 Snapmirror Expand cluster
node1
Success
Description: SnapMirror FG Expand data_dst


cluster::*> snapmirror show -destination-path DEMO:data_dst -fields state
source-path destination-path state
----------- ---------------- ------------
DEMO:data DEMO:data_dst Snapmirrored

Now both FlexGroup volumes have the same number of members:

cluster::*> vol show -vserver DEMO -volume data*
Vserver Volume Aggregate State Type Size Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
DEMO data - online RW 150TB 14.88TB 0%
DEMO data__0001 aggr1_node2 online RW 30TB 7.57TB 0%
DEMO data__0002 aggr1_node1 online RW 30TB 7.32TB 0%
DEMO data__0003 aggr1_node2 online RW 30TB 7.57TB 0%
DEMO data__0004 aggr1_node1 online RW 30TB 7.32TB 0%
DEMO data__0005 aggr1_node2 online RW 30TB 7.57TB 0%
DEMO data_dst - online DP 150TB 14.88TB 0%
DEMO data_dst__0001
aggr1_node1 online DP 30TB 7.32TB 0%
DEMO data_dst__0002
aggr1_node1 online DP 30TB 7.32TB 0%
DEMO data_dst__0003
aggr1_node2 online DP 30TB 7.57TB 0%
DEMO data_dst__0004
aggr1_node1 online DP 30TB 7.32TB 0%
DEMO data_dst__0005
aggr1_node2 online DP 30TB 7.57TB 0%
12 entries were displayed.

So, there you have it… a quick and easy way to move from FlexVol volumes to FlexGroups!

Behind the Scenes: Episode 209 – Designing an End-to-End Genomics Solution Using NetApp

Welcome to Episode 209, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we talk genomics and how to design an end-to-end solution using NetApp products across the portfolio with Cloud Solution Architect Florian Feldhaus (@flfeld).

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Using XCP to delete files en masse: A race against rm


XCP has traditionally been thought of as a way to rapidly migrate large amounts of data, or to scan data and generate reports. And those ideas still hold up today…

But what if I told you that you could use XCP to delete millions of files 5-6x faster than running rm on an NFS client?

Wait… why would I delete millions of files?

Normally, you wouldn’t. But in some workflows, such as scratch space, this is what happens. A bunch of small files get generated and then deleted once the work is done.

I ran a simple test in my lab where I had a FlexGroup volume with ~37 million files in it.

::*> vol show -vserver DEMO -volume flexgroup_16 -fields files-used
vserver volume files-used
------- ------------ ----------
DEMO flexgroup_16 37356098

I took a snapshot of that data so I could restore it later for XCP to delete, and then ran rm -rf on it from a client. It took 20 hours:

# time rm -rf /flexgroup/*

real 1213m4.652s
user 1m39.703s
sys 41m16.978s

Then I restored the snapshot and deleted the same ~37 million files using XCP. That took roughly 3.5 hours:

# time xcp diag -rmrf 10.193.67.219:/flexgroup_16
real 218m17.765s
user 149m16.132s
sys 40m47.427s

So, if you have a workflow that requires you to delete large amounts of data that normally takes you FOREVER, try XCP next time…

These are VMs with limited RAM and 1GB network connections, so I’d imagine with bigger, beefier servers, those times could come down a bit more. But in an apples-to-apples test, XCP wins again!

Updated FlexGroup Technical Reports now available for ONTAP 9.6!

ONTAP 9.6 is now available, so that means the TRs need to get a refresh.


There are some new features in ONTAP 9.6 for FlexGroup volumes, including:

  • Elastic Sizing
  • MetroCluster support
  • SMB CA shares
  • FlexGroup rename/shrink

The TRs cover those features, and there are some updates to other areas that might not have been as clear as they could have been. I also added some new use cases.

Also, check out the newest FlexGroup episode of the Tech ONTAP Podcast:

TR Update List

Here’s the list of FlexGroup TRs that have been updated for ONTAP 9.6:

TR-4678: Data Protection and Backup – FlexGroup volumes

This covers backup and DR best practices/support for FlexGroup volumes.

TR-4557: FlexGroup Volume Technical Overview

This TR is a technical overview, which is intended just to give information on how FlexGroups work.

TR-4571-a is an abbreviated best practice guide for easy consumption.

TR-4571: FlexGroup Best Practice Guide

This is the best practices TR and also offers:

  • More detailed information about high file count environments and directory structure
  • More information about maxdirsize limits
  • Information on effects of drive failures
  • Workarounds for lack of NFSv4.x ACL support
  • Member volume count considerations when dealing with small and large files
  • Considerations when deleting FlexGroup volumes (and the volume recovery queue)
  • Clarifications on requirements for available space in an aggregate
  • System Manager support updates

Most of these updates came from feedback and questions I received. If you have something you want to see added to the TRs, let me know!

Behind the Scenes: Episode 189 – ONTAP 9.6 Overview

Welcome to Episode 189, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we give you the lowdown on the latest ONTAP 9.6 release with ONTAP Systems Group Vice President Octavian Tanase (@octav), Senior Director of Product Management Jeff Baxter (@baxontap), and Technical Product Marketing Manager Skip Shapiro (skip.shapiro@netapp.com)! 

Join us as we talk about how ONTAP 9.6 brings more simplicity, productivity, customer use cases, data protection and security to your datacenter. 

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

Sneak Peek! Elastic Sizing for FlexGroup Volumes in ONTAP 9.6

ONTAP 9.6 is coming soon and I recently posted a sneak peek for REST API support. But REST APIs aren’t the only new feature coming with the release. FlexGroup volumes are getting some new enhancements as well.

These include:

  • Ability to rename a FlexGroup volume
  • Ability to shrink a FlexGroup volume
  • Support for MetroCluster with FlexGroup volumes
  • SMB CA share support

One of the bigger features (albeit more under the radar) is a way for ONTAP to help FlexGroup volumes avoid failed writes caused by a member volume running out of space: elastic sizing!


Prior to ONTAP 9.6, storage administrators had to be a bit more cognizant of member volume capacity, because if a member volume ran out of space in a FlexGroup volume, the file write would fail. Since files do not stripe across member volumes, a single file could grow over time to cause issues with space allocation.


There are a few reasons a member volume in a FlexGroup might fill up:

  • A client tries to write a single file that exceeds the available space of a member volume. For example, a 10GB file is written to a member volume with just 9GB available.
  • A file is appended to or written over time and eventually fills up a member volume, such as a database that resides in a member volume.
  • Snapshots eat into the active file system space available.

FlexGroup volumes do a generally good job of allocating space across member volumes, but a workload anomaly can throw things off (like if your volume is mostly a bunch of 4K files, but then you zip a lot of them up and create a single giant file).

Remediation of this problem is generally growing volumes or deleting data. But usually, admins won’t notice the issue until it’s too late and “out of space” errors have occurred. That’s where Elastic Sizing comes in handy.

Elastic Sizing – An Airbag for your Data

One of our FlexGroup volume developers refers to elastic sizing as an “airbag” in that it’s not designed to stop you from getting into an accident, but it does help soften the landing when it happens.


In other words, it’s not going to prevent you from writing large files or from running out of space, but it is going to provide a way for those writes to complete.

Here’s how it works…

  1. When a file is written to ONTAP, the system has no idea how large that file will become. The client doesn’t know. The application usually doesn’t know. All that’s known is “hey, I want to write a file.”
  2. When a FlexGroup volume receives a write request, it will get placed in the best available member based on a variety of factors – such as available capacity, inode count, time since last file creation, member volume performance (new in ONTAP 9.6), etc…
  3. When a file is placed, since ONTAP doesn’t know how big a file will get, it also doesn’t know if the file is going to grow to a size that’s larger than the available space. So, the write is allowed as long as we have space to allow it.
  4. If/when the member volume runs out of space, right before ONTAP sends an error to the client that we’ve run out of space, it will query the other member volumes in the FlexGroup to see if there’s any available space to borrow. If there is, ONTAP will add 1% of the volume’s total capacity (in a range of 10MB to 10GB) to the volume that is full (while taking the same amount from another member volume in the same FlexGroup volume) and then the file write will continue.
  5. During the time ONTAP is looking for space to borrow, that file write is paused – this will appear to the client as a performance issue. But the overall goal isn’t to finish the write fast – it’s to allow the write to finish at all. In most cases, a member volume will be large enough to provide the 10GB increment (1% of 1TB is 10GB), which is often more than enough to allow a file creation to complete. In smaller member volumes, the performance impact could be greater, as the system will need to query to borrow space more often.
  6. The capacity borrowing will maintain the overall size of the FlexGroup – for example, if your FlexGroup is 40TB in size, it will remain 40TB.
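
To put rough numbers on the borrowing increment (my own arithmetic based on the 1% / 10MB / 10GB rule above, not an official formula): the amount borrowed works out to roughly min(max(1% of member volume size, 10MB), 10GB). So a 500GB member would borrow about 5GB at a time, a 1TB member about 10GB, anything larger is capped at the 10GB ceiling, and very small members still get at least the 10MB floor.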


Once files are deleted or volumes are grown and space is available in that member volume again, ONTAP will readjust the member volumes back to their original sizes to keep space allocation even.

Ultimately, elastic sizing helps remove the admin overhead of managing space, as well as the worry about the initial sizing and deployment of a FlexGroup. You can spend less time thinking about how many member volumes you need, what size they should be, and so on.

When you combine elastic sizing in ONTAP 9.6 with features like autogrow/shrink, ONTAP can pretty much manage your capacity in most cases and help avoid emergency space issues.
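
If you want to lean on autogrow/shrink as well, a hedged example of enabling it on a FlexGroup (sizes and names are placeholders; check the volume autosize documentation for your release):

cluster::*> volume autosize -vserver SVM1 -volume myfg -mode grow_shrink -maximum-size 60TB -minimum-size 20TB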

Elastic sizing = new FlexGroup use cases?

Traditionally, FlexGroup volume use cases have mainly been unstructured NAS data, high file count environments, small files, and so on, and I’ve cautioned people against putting larger files into FlexGroup volumes because of the aforementioned issue of large files, or files that grow over time, potentially filling up a member volume.

But now, with elastic sizing to mitigate those issues, along with volume autogrow/shrink, the FlexGroup use cases get a bit more expanded and interesting.

Why not put a workload with large files, or files that grow, on a FlexGroup now? In fact, with SMB support for Continuously Available shares for Hyper-V and SQL Server, there is further proof that FlexGroup volumes are becoming more viable solutions for a variety of workloads.

You can find the latest podcast for FlexGroup volumes here:

Behind the Scenes: Episode 188 – FlexGroup Volumes Update

Welcome to Episode 188, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we deliver a long overdue update to Episode 46 of the Tech ONTAP podcast, where we first covered FlexGroup volumes.

We bring back lead developer Richard Jernigan – as well as Technical Director Dan Tennant – to discuss what’s new, what’s changed and what’s coming down the line for FlexGroup volumes.

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:

New White Paper! Media and Entertainment Workloads using NetApp ONTAP! #NAB2019


Every year, the National Association of Broadcasters puts on a show to deliver the latest and greatest in media and entertainment content and technology solutions.

This year, I decided to try to piggyback on the show and put out a new white paper about how NetApp ONTAP works with media and entertainment workloads. Included in this white paper:

  • DreamWorks Animation case study on NetApp ONTAP
  • Media/entertainment benchmark numbers on NetApp FlexGroup volumes
  • Why you’d want to use NetApp ONTAP

You can find the white paper here:

https://www.netapp.com/us/media/wp-7301.pdf

Leave your feedback in the comments!