Behind the Scenes: Episode 88 – Migrating to ONTAP, FlexGroup volumes

Welcome to Episode 88, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


This week on the podcast, we invited Hadrian Baron of NetApp’s migration team to talk about moving from 7-Mode and competitor storage over to clustered ONTAP, and about how much simpler and faster that move has become. We also discuss multiprotocol NAS challenges, as well as FlexGroup volumes and their benefits.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I was also recently asked how to subscribe to the podcast via RSS. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss


You can also now find us on YouTube. (The uploads are sporadic, and we don’t go back further than Episode 85.)

Migrating to ONTAP – Ludicrous speed!

As anyone familiar with NetApp knows, the era of clustered Data ONTAP (cDOT) is upon us. 7-Mode is going the way of the dodo, and we’re helping customers (both legacy and new) move to our scale-out storage solution.

There are a variety of ways people have been moving to cDOT:

(Also, stay tuned for more transition goodness coming very, very soon!)

What’s unstructured NAS data?

If you’re not familiar with the term, unstructured NAS data is, more or less, just NAS data. But it’s really messy NAS data.

It’s home directories, file shares, etc. The term refers to a dataset that has been growing and growing over time and becoming harder and harder to manage at a granular level due to the directory structure, the number of objects and the sheer number of ACLs.

It’s a sore point for NAS migrations because it’s difficult to move due to the dependencies. If you’re coming from 7-Mode, you can certainly migrate with the 7MTT via copy-based transition (CBT), which copies all those folders and ACLs as-is, but you potentially miss out on the opportunity to restructure that NAS data into a more manageable, logical layout.

When coming from a non-NetApp storage system, it gets trickier, because copying the data is the *only* option at that point. The complexity of unstructured NAS data is then compounded by the fact that, in some cases, it can take a very, very long time to migrate.

What tools are available to migrate unstructured NAS data?

The arrows in your quiver (so to speak) for migrating NAS data are your typical utilities, such as the tried and true Robocopy for CIFS/SMB data.

There is also the old standby of NDMP, which just about every storage vendor supports. This can migrate all NAS data types, as it’s file-system agnostic.

However, none of the available migration methods is without challenges. They are fairly slow. Some are single-threaded. All are network-dependent. And the challenges only become more apparent as the number of files grows. Remember, you’re not just copying files – you are copying information associated with those files. That adds to the overhead.

One of the favorite tools for migrating NAS data is rsync. Some people swear it’s the best backup tool ever. However, it faces the same challenges mentioned above: it’s slow, especially when dealing with large numbers of objects and wide/deep directory structures.

How has NetApp fixed that?

Thanks to some excellent work by one of NetApp’s architects, Peter Schay, we now have a utility that can help your migrations hit ludicrous speed – without needing rsync.

The tool? XCP.

https://xcp.netapp.com/

Also be on the lookout for some more ONTAP goodness in ONTAP 9 that helps improve performance and capacity with NAS data.

What is XCP?

XCP is a free data migration tool offered by NetApp that promises to accelerate NFSv3 migration for large unstructured NAS datasets, gather statistics about your files, sync, verify… pretty much anything you ever wanted out of a NAS migration tool. Its wheelhouse is high-file-count environments that use NFSv3, which also happens to be one of the more challenging scenarios for data migration.

Now, I can’t tell you something is really, really fast without giving you some empirical data. I won’t name names, because that’s not what I do, but in our test runs, migrating 165 million files on an unnamed NAS vendor’s system took 20 times longer than it did with XCP. We took an 8-10 day file copy down to twelve hours in our testing.

In another use case, a customer moved 4 BILLION inodes and a petabyte of data from a non-NetApp system to a cDOT system and it was 30x faster than rsync.

That’s INSANE.

However, if you’re migrating a few large files, you won’t see a huge gain in speed. Rsync would be similarly effective.

And data migration isn’t the only use case – XCP can also help with file listing.

Recall what I mentioned before…

Remember, you’re not just copying files – you are copying information associated with those files.

That “information” I mentioned? It’s called metadata. And it has long been the bane of NAS file systems’ existence. It’s all those messy bits of a file system: directory tree locations, filehandles, file permissions, owners. The very things that make file-based storage awesome (granularity and security) also make it not so awesome, because of the overhead they add. It’s a problem seen across vendors.

Case in point – that same not-to-be-named, non-NetApp storage vendor? It took 9 days to do a listing of the aforementioned 165 million files. NINE DAYS.

I’ve seen bathroom renovations take less time than that.

With XCP on a cDOT cluster?

That listing took 30 minutes.

That’s more than a 400x performance improvement with a free, easy-to-use tool. XCP takes the jobs traditionally done by slow utilities like du, ls, find and dd and does them much faster. It also does another thing – it makes those kinds of operations useful for storage performance benchmark tests.
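If you want to check those ratios yourself, the arithmetic is short. This uses only the figures quoted above, so nothing here is a new measurement: 9 days is 12,960 minutes, so the listing came out roughly 432 times faster, and cutting an 8-10 day copy to 12 hours is a 16x to 20x improvement, which lines up with the "20 times longer" figure.

```python
# Back-of-the-envelope check of the speedup figures quoted in this post.
# All inputs come straight from the text; nothing here is a new measurement.

MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24


def speedup(old: float, new: float) -> float:
    """Ratio of the old duration to the new one (both in the same units)."""
    return old / new


# Listing 165 million files: 9 days on the other vendor vs. 30 minutes with XCP.
listing = speedup(9 * MINUTES_PER_DAY, 30)

# Copying: an 8-10 day file copy vs. 12 hours with XCP.
copy_low = speedup(8 * HOURS_PER_DAY, 12)
copy_high = speedup(10 * HOURS_PER_DAY, 12)

print(f"listing speedup: {listing:.0f}x")
print(f"copy speedup:    {copy_low:.0f}x to {copy_high:.0f}x")
```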

I used to work in support – we’d get numerous calls about how “slow” our storage was because dd, du, ls or find were slow. We’d get a perfstat, see hardly any IOPS on the storage, disk utilization near idle and CPUs barely at 25%, and say “yeah, you’re using the wrong type of test.”

XCP is now another arrow in the quiver for performance testing.

What else can it do?

XCP can also do some pretty rich reporting of datasets. You can gather information like space utilization, extension types, number of files, directory entries, dates modified/created/accessed, even the top 5 space consumers… and all in manager-friendly graphs and charts.

For example:

[Image: sample XCP report]

Pretty cool, eh? And did I mention it’s FREE?
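To give a feel for the raw numbers behind a report like that, here's a minimal Python sketch that walks a directory tree and collects the same kinds of facts: file and directory counts, total bytes, extension types, the top 5 space consumers and the newest modification time. It is not XCP and doesn't produce XCP's output format; the function name is hypothetical, and it is deliberately single-threaded and simple.

```python
import heapq
import os
from collections import Counter
from pathlib import Path


def scan_report(root: str, top_n: int = 5) -> dict:
    """Summarize a directory tree: counts, bytes, extensions, largest files.

    Deliberately single-threaded; it shows what metadata a dataset report
    is built from, not how to gather it quickly.
    """
    files = dirs = total_bytes = 0
    extensions = Counter()
    top = []  # min-heap of (size, path), never larger than top_n
    newest_mtime = None

    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)  # subdirectories only; root itself isn't counted
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: report symlinks themselves, don't follow them
            files += 1
            total_bytes += st.st_size
            extensions[Path(name).suffix.lower() or "<none>"] += 1
            # Keep only the top_n largest files seen so far.
            if len(top) < top_n:
                heapq.heappush(top, (st.st_size, path))
            else:
                heapq.heappushpop(top, (st.st_size, path))
            if newest_mtime is None or st.st_mtime > newest_mtime:
                newest_mtime = st.st_mtime

    return {
        "files": files,
        "dirs": dirs,
        "total_bytes": total_bytes,
        "extensions": dict(extensions),
        "top_consumers": sorted(top, reverse=True),  # largest first
        "newest_mtime": newest_mtime,
    }
```

Usage is just `scan_report("/mnt/share")`, where `/mnt/share` is a placeholder for wherever your NFS export is mounted. Every one of those `lstat` calls is a metadata round trip on a network mount, which is exactly why this kind of single-threaded walk crawls on hundreds of millions of files.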

How does it work?

At a high level, XCP was built from the ground up: it takes the overall concept of rsync, reinvents it and multithreads it. Everything is done in parallel, using multiple connections and cores, so the bottleneck for your data transfer becomes your pipe rather than the tool. XCP will copy as much data over as many threads as your network (and CPUs) can handle. You can saturate as many 10GbE network links as your storage can handle.
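XCP's source isn't shown here, but the core idea (keep many directories in flight at once instead of walking the tree one directory at a time) can be sketched in a few lines of Python. This is a hypothetical illustration, not XCP's implementation: it counts files by handing each directory listing to a thread pool and feeding newly discovered subdirectories back into the pool. Threads help here because listing a directory on a network mount is mostly waiting on I/O.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def list_dir(path: str):
    """List one directory; return (subdirectory paths, number of non-directories)."""
    subdirs, nfiles = [], 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                nfiles += 1
    return subdirs, nfiles


def parallel_count(root: str, workers: int = 16) -> int:
    """Count files under root, listing up to `workers` directories concurrently."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(list_dir, root)}
        while pending:
            # Wake up as soon as any listing finishes, then queue its children.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                subdirs, nfiles = fut.result()  # re-raises errors such as PermissionError
                total += nfiles
                pending |= {pool.submit(list_dir, d) for d in subdirs}
    return total
```

On a local disk the difference from a plain `os.walk` is small; under the assumptions above, the payoff comes when each directory listing involves a network round trip, because many of those round trips can now overlap.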

The details relayed to me by the XCP team:

  • Parallelism galore – multitasking, multiprocessing, and multiple links
  • Built-in NFS client that does asynchronous queueing and streaming of all standard NFSv3 requests listed in RFC-1813
  • Typically 5-25 times faster than rsync!
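The asynchronous queueing in the second bullet is about latency: a client that waits for each reply before sending the next request spends most of its time idle. Here's a toy Python model of that effect using asyncio. `fake_rpc` is a made-up stand-in that just sleeps; it is not an NFS call and not how XCP's client is written.

```python
import asyncio


async def fake_rpc(request_id: int, latency: float) -> int:
    """Stand-in for one request/reply round trip; it just waits out the latency."""
    await asyncio.sleep(latency)
    return request_id


async def issue_requests(n: int, max_in_flight: int, latency: float = 0.01) -> list:
    """Issue n requests while keeping at most max_in_flight outstanding."""
    limit = asyncio.Semaphore(max_in_flight)

    async def one(i: int) -> int:
        async with limit:
            return await fake_rpc(i, latency)

    # gather() returns results in the order the requests were created.
    return await asyncio.gather(*(one(i) for i in range(n)))


if __name__ == "__main__":
    import time

    for depth in (1, 32):
        start = time.perf_counter()
        asyncio.run(issue_requests(64, depth))
        print(f"{depth:>2} in flight: {time.perf_counter() - start:.2f}s")
```

With one request in flight, 64 requests take roughly 64 × the simulated latency; with 32 in flight, the waits overlap and the batch finishes in a small fraction of that time.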

As our German friends say, it’s like the Autobahn – no speed limit (other than the limits of your own vehicle).

If you don’t believe me, try it for yourself. Contact your NetApp sales reps or partners and get a proof of concept going. Keep in mind that all this awesomeness is just in version 1.0 of this software. There are many plans to make this tool even better, including plans for supporting other protocols. Right now, in the lab, we’re looking at S3 (DataFabric, anyone?) and CIFS/SMB support for XCP!

XCP is a breakthrough in data migration, processing and reporting.

Introducing: Copy-Free Transition

Clustered Data ONTAP 8.3.2RC1 was announced last week and included many enhancements to ONTAP, including a feature called Copy-Free Transition.

A number of people knew about this feature prior to the cDOT release because they attended Insight 2015 and witnessed either a live demo of the feature or the session presented by Jay White (CR-2845-2: Using the 7-Mode Transition Tool).

We talked a bit about CFT in Episode 15 of the Tech ONTAP Podcast.

There’s also a video demo of Copy-Free Transition available here:

If you’re not familiar with Copy-Free Transition (CFT), then here’s a brief rundown…


What is Copy-Free Transition?

Prior to cDOT 8.3.2, transition to clustered Data ONTAP involved copying your data from 7-Mode to clustered Data ONTAP using one of the many tools available. Architecturally and structurally, clustered Data ONTAP is very different from 7-Mode, which precluded an in-place upgrade to clustered Data ONTAP.

Essentially, you would use one of the following migration options:

  • Use the 7-Mode Transition Tool (7MTT) which leverages SnapMirror to replicate data from 7-Mode to clustered Data ONTAP
  • An application-based migration option (such as Storage vMotion from VMware)
  • File copy options such as ndmpcopy, RoboCopy, rsync, etc.
  • Using Foreign LUN Import

As the above migration options are all methods that copy data, the general term used to describe them is Copy-Based Transition (CBT).

With CFT in 8.3.2 and later, the 7MTT can be used to migrate to clustered Data ONTAP by simply halting your 7-Mode systems, recabling your disk shelves to a cDOT system, then importing the data and configuration into the cluster.

Voilà! Transition simplified!

Why do we want to use CFT?

For starters, you’d use CFT because it allows you to move a large amount of data in a fraction of the time it would take you to copy it. This “big bang” type of transition does require a little extra planning to make sure the clustered Data ONTAP environment is functional post-CFT, but the 7MTT contains extensive pre-checks and assessment capabilities to assist you with your transition planning.

Our live demo at Insight involved a 2-node HA pair with 2 data aggregates and 4 volumes. These volumes served NFS, CIFS and iSCSI data. We were able to complete a live migration in less than 30 minutes, start to finish.

I wasn’t just wearing a Flash costume for giggles – I wanted to emphasize how fast CFT can be.


The guidance I’ve heard from engineering is 3-8 hours, but they’ve been *very* generous in the amount of time built in for cabling the shelves. The time to completion is also dictated by the overall number of objects in the system (i.e., the number of volumes, qtrees, quotas, exports, etc.) rather than the size of the dataset. That’s because the 7MTT has to build the configuration on the cDOT system, and that takes a number of ZAPI calls. Fundamentally, the message here is that you can do CFT, and roll back if necessary, within a single maintenance window. The biggest variable in the timing will be how long it takes to re-cable or move disk shelves and reconnect clients.

The actual conversion of the 7-Mode volumes is relatively quick.

Anecdotally, I heard about a customer that did an early preview of CFT with multiple terabytes of data. The cutover after the shelves were moved took 30 minutes. That is… impressive.

That timing is not guaranteed, however – it’s a good idea to plan the 3-8 hours into your window.

Aside from the time it takes to transition, CFT is also a bonus for people who don’t want to purchase or rent swing gear to move data (beyond the minimal equipment needed to bring the cDOT cluster up), or who simply want to keep the existing shelves they already have support on.

Rather than having to copy the data from 7-Mode to a swing system and then to a cDOT system, you can now simply use the existing gear you have.

The sweet spot for CFT is really unstructured NAS data, such as home directories. These datasets can potentially have thousands or millions of objects with corresponding ACLs. CFT allows for a massively simplified transition of this type of data.

 

What do I need for CFT?

This is a short list of what you currently need for CFT. Keep in mind that the product documentation for the cDOT release is the final word, so always check there.

Currently, you need:

  • 7-Mode 8.1.4P4 through 8.1.4P9 (source system)
  • Clustered Data ONTAP 8.3.2RC1 or later (destination)
  • 7MTT 2.2 or later
  • 64-bit aggregates
  • A minimally pre-configured* storage virtual machine on the destination cluster – one per vFiler/node
  • If using CIFS, a CIFS server on the destination
  • An HA pair with no data on it other than the cluster config/SVM placeholders
  • Functioning SP modules on the 7-Mode systems

*Minimally pre-configured here means you need a vsroot volume. If CIFS is involved, you need a data LIF, DNS configuration and a CIFS server pre-created in the same domain as the source CIFS server.

If you have a cluster with existing data on it, you can still use CFT, but you need a 4-node cluster with one of its HA pairs (2 nodes) evacuated of all data. Otherwise, 7MTT won’t allow the CFT to continue.
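The 7MTT checks versions for you, but if you're inventorying a lot of systems ahead of time, you may want to screen them yourself. Here's a minimal Python sketch for comparing ONTAP-style release strings such as 8.1.4P4 or 8.3.2RC1. The ordering rule it encodes (RC releases before GA, GA before P releases) is my assumption about the naming scheme, and the function names are hypothetical; the product documentation remains the final word on supported releases.

```python
import re
from typing import Optional, Tuple

# Matches strings such as "8.1.4P4", "8.3.2RC1", "8.3.2" or "8.3.2GA".
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:(RC|P)(\d+)|GA)?$")
# Assumed ordering within one release: RCs < GA < P (patch) releases.
_STAGE_RANK = {"RC": 0, None: 1, "P": 2}


def parse_version(text: str) -> Tuple[int, int, int, int, int]:
    """Turn a release string into a tuple that sorts in release order."""
    m = _VERSION.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, micro, stage, num = m.groups()
    return (int(major), int(minor), int(micro or 0), _STAGE_RANK[stage], int(num or 0))


def in_range(version: str, low: str, high: Optional[str] = None) -> bool:
    """True if low <= version, and version <= high when high is given."""
    v = parse_version(version)
    if v < parse_version(low):
        return False
    return high is None or v <= parse_version(high)
```

For example, `in_range("8.3.2", "8.3.2RC1")` is True (8.3.2 GA is later than RC1), while `in_range("8.3.1", "8.3.2RC1")` is False. Turning each string into a tuple lets Python's ordinary tuple comparison do the ordering.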

For platform support, please check the documentation, as the list of supported platforms is subject to change.

Also keep in mind that this is version 1.0 of the feature, so support will broaden as the feature matures.

What isn’t currently supported by CFT?

  • SnapMirror sources and destinations are supported, but SnapVault currently is not.
  • MetroCluster is currently not supported.
  • 32-bit aggregates are not supported, but can be upgraded to 64-bit prior to running CFT.
  • Systems containing traditional volumes (TradVols), but let’s be real – who uses those still? 🙂
  • Currently, clusters with existing datasets are not supported (must have an evacuated HA pair)

What happens during the CFT process?

In our demo, we had the following graphic:

[Image: CFT process flowchart]

In that graphic, we have gear images for automated processes and M for manual processes. The good thing about CFT is that it’s super easy because it’s mostly automated. The 7MTT handles most of it for you – even the halting of the 7-Mode systems.

Here’s a rundown of each part of that flowchart. For more details, check the product documentation and TR-4052 (not updated yet, but it should be in time for 8.3.2 GA).

Keep in mind that during the 7MTT run, each section will have a window that shows exactly what is happening at each phase.

Start CFT Migration

This covers starting the 7MTT and adding the 7-Mode HA pair and the cluster management LIF to the tool. It does not cover the initial up-front planning prior to the migration, so keep that in mind; all of that has to take place before this part.

During the “Start CFT” portion, you will also populate the data LIFs you want to migrate, the volumes and define the volume paths. You will also map the vFilers you are migrating to the SVM placeholders on the cluster.

Planning and Pre-checks

This portion of CFT is an automated task that runs a series of pre-canned checks against 7-Mode and cDOT to ensure the source and destination are ready. It checks compatibility and looks for anything 7-Mode is doing that isn’t currently supported in cDOT. If anything fails, the tool makes you correct it before you continue, so you can’t shoot yourself in the foot.
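The real 7MTT pre-checks are far more extensive than anything in this post, but the "refuse to continue until it's fixed" behavior is easy to picture. Here's a toy Python version that encodes only the requirements and limitations listed earlier in this post. `SourceInventory` and its fields are hypothetical; the 7MTT doesn't expose anything like it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SourceInventory:
    """Hypothetical summary of a 7-Mode HA pair; field names are made up for this sketch."""
    aggregates: Dict[str, str] = field(default_factory=dict)  # name -> "32-bit" or "64-bit"
    has_tradvols: bool = False
    snapvault_relationships: int = 0
    is_metrocluster: bool = False
    sp_healthy: bool = True


def blocking_issues(src: SourceInventory, destination_pair_empty: bool) -> List[str]:
    """Return every reason CFT can't proceed; an empty list means no blockers found."""
    issues = []
    for name, fmt in sorted(src.aggregates.items()):
        if fmt != "64-bit":
            issues.append(f"aggregate {name} is {fmt}; upgrade it to 64-bit first")
    if src.has_tradvols:
        issues.append("traditional volumes are present")
    if src.snapvault_relationships:
        issues.append(f"{src.snapvault_relationships} SnapVault relationship(s) present")
    if src.is_metrocluster:
        issues.append("MetroCluster is not supported")
    if not src.sp_healthy:
        issues.append("SP modules must be functioning so the 7-Mode nodes can be halted")
    if not destination_pair_empty:
        issues.append("destination HA pair must hold only cluster config/SVM placeholders")
    return issues
```

Returning every problem at once, rather than stopping at the first one, matches the practical goal: fix everything in one pass before you schedule the cutover.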

Apply SVM Configuration

This automated process takes the information gathered from 7-Mode and applies it to cDOT. This includes the data LIFs, which get created on the SVM and then placed into a “down” state to avoid IP conflicts.

Test SVM Configuration

Here, you would manually ensure that the SVM configuration has been applied correctly. Check the data LIFs, etc.

Verify Cutover Readiness

This is another pre-check that is essentially in place in case you did the pre-check a week ago and need to verify nothing has changed since then.

Disconnect clients

This is a manual process and the start of the “downtime” portion of CFT – we don’t want clients attached to the 7-Mode system during the export/halt phase.

Export & Halt 7-Mode Systems

This is an automated process that is done by the 7MTT. It leverages the SP interfaces on the 7-Mode systems to do a series of halts and reboots, as well as booting into maintenance mode to remove disk ownership. We’re almost there!

Cable Disk Shelves

Another manual process – you essentially move the cables from the 7-Mode system to the cDOT system. You might even have to physically move shelves or heads, depending on the datacenter layout.

Verify Cabling

This is an automated 7MTT task. It simply looks for the disks and ensures they can be seen. However, it’s a good idea to do some visual checks, as well as potentially make use of Config Advisor or the 7MTT Cabling Guide.
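Conceptually, the cabling check comes down to a set comparison: every disk that belonged to the 7-Mode system should now be visible to the destination nodes. A minimal sketch, assuming you already have both lists as disk identifiers such as serial numbers (collecting them isn't shown, and the function is hypothetical):

```python
from typing import Dict, Set, Union


def cabling_report(expected: Set[str], visible: Set[str]) -> Dict[str, Union[list, bool]]:
    """Compare disks recorded before the halt with disks the destination can see."""
    return {
        # Owned by the 7-Mode system but not seen by the cDOT nodes: check cabling.
        "missing": sorted(expected - visible),
        # Seen by the cDOT nodes but not part of the move (e.g. the cluster's own disks).
        "unexpected": sorted(visible - expected),
        # Passes only if every expected disk is visible.
        "ok": expected <= visible,
    }
```

Extra disks are reported but not treated as a failure, since the destination nodes can see their own disks too.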

Import Data & Configuration

This automated phase assigns the disks to the cDOT systems and imports the remaining configuration that could not be added previously (quotas and similar settings need volumes to attach to, and the volumes had to come over with the shelves). This is also where the actual conversion of the volumes from 7-Mode style to cDOT style takes place.

Pre-prod verification

This is where you need to check the cDOT cluster to ensure your transitioned data is in place and able to be accessed as expected.

Reconnect clients

This is the “all clear” signal to your clients to start using the cluster. Keep in mind that if you intend to roll back to 7-Mode at any point, data written to the cluster from here on could potentially be lost, as the rollback entails reverting to an aggregate-level snapshot.

Commit

This is the point of difficult return – once you do this, the aggregate-level snapshots you could use to roll back will be deleted. That means if you plan on going back to 7-Mode, you will have to use a copy-based method. Be sure to make your decision quickly!
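Since the rollback story changes at two points in that sequence, it can help to write the phases down as a lookup table when planning a maintenance window. This is a hypothetical planning aid sketched in Python, not part of the 7MTT; the phase names follow the headings in this post, and the rules come from the Reconnect clients and Commit descriptions.

```python
# CFT phases in the order described above (names follow this post's headings).
PHASES = [
    "Start CFT migration",
    "Planning and pre-checks",
    "Apply SVM configuration",
    "Test SVM configuration",
    "Verify cutover readiness",
    "Disconnect clients",
    "Export & halt 7-Mode systems",
    "Cable disk shelves",
    "Verify cabling",
    "Import data & configuration",
    "Pre-prod verification",
    "Reconnect clients",
    "Commit",
]


def rollback_outlook(completed: str) -> str:
    """Describe the rollback options once phase `completed` has finished.

    Raises ValueError for a phase name that isn't in PHASES.
    """
    i = PHASES.index(completed)
    if i >= PHASES.index("Commit"):
        return "aggregate-level snapshots deleted: only a copy-based move back to 7-Mode"
    if i >= PHASES.index("Reconnect clients"):
        return "rollback possible, but writes made on the cluster since reconnect may be lost"
    return "rollback possible"
```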

Rolling back to 7-Mode

If, for some strange reason, you have to roll back to 7-Mode, be sure you decide on it prior to committing CFT. In our demo, rollback was simple, but not automated by the 7MTT. To make the process easy and repeatable, I scripted it out using a simple shell script. It worked pretty well every time, provided people followed the directions. 🙂

But, it is possible, and if you don’t commit, it’s pretty fast.

If you have any questions about CFT that I didn’t cover here, feel free to comment.

Also, check out this excellent summary blog on transition by Dimitris Krekoukias (@dkrek):

http://recoverymonkey.org/2016/02/05/7-mode-to-clustered-ontap-transition/

NetApp Insight: Come to booth 303 for a special live demo!

NetApp Insight 2015 in Las Vegas is finally upon us and we’re past day 1 of the show. I’ve already delivered one session and did the very first live demo of the stuff we’ve been working so hard on the past few weeks.

I’d like to say we did it without a hitch, but it’s a live demo – Murphy’s law dictates that whatever can go wrong, will go wrong. And with live demos, this is especially true. 🙂

But, even with a couple of snafus, we were able to complete the demo and I think it went pretty well. People seemed genuinely excited about what we were showing and had plenty of good questions.

Am I being vague?

Why yes, yes I am. 🙂

You see, we can’t really say *what* exactly the demo is about right now on social media. But I can tell you there is a demo going on at Booth 303 at NetApp Insight 2015 in Las Vegas in Insight Central at 12:15PM and 2:15PM PST.

If you’re not familiar with the layout of the exhibit hall, just enter the doors, go through a couple of giant “N”s, pass the Lab on Demand area and bear left. We’re across from the NetApp Social Media Booth.

The demos run Tuesday through Thursday. I’ll be at the 2:15PM slots on Tuesday and Wednesday, and at both slots on Thursday.

So come on out and see a live demo! You might even get to see how we handle stuff breaking in real time. 🙂

Other stuff

Also, check out my sessions going on this week.

1884-2: Unlocking the Mysteries of Multiprotocol NAS 

This is a level 2 session where I will attempt to demystify multiprotocol NAS and discuss some best practices with regard to clustered Data ONTAP.

  • Tuesday, 10/13, 10:30AM PST (Jasmine A)
  • Wednesday, 10/14, 1PM PST (Breakers B)
  • Thursday, 10/15, 9AM PST (Jasmine C)

1881-3-TT: SecD Deep Dive

This is a level 3 session where I go pretty deep into how SecD works and how to use it to troubleshoot.

  • Wednesday, 10/14, 10:30AM PST (Palm D)