As many of those familiar with NetApp know, the era of clustered Data ONTAP (cDOT) is upon us. 7-Mode is going the way of the dodo, and we’re helping customers (both legacy and new) move to our scale-out storage solution.
There are a variety of ways people have been moving to cDOT:
(Also, stay tuned for more transition goodness coming very, very soon!)
What’s unstructured NAS data?
If you’re not familiar with the term, unstructured NAS data is, more or less, just NAS data. But it’s really messy NAS data.
It’s home directories, file shares, etc. It refers to a dataset that has been growing and growing over time and becoming harder and harder to manage at a granular level due to the directory structure, number of objects and the sheer amount of ACLs.
It’s a sore point for NAS migrations because it’s difficult to move due to the dependencies. If you’re coming from 7-Mode, you can certainly migrate using the 7MTT, which will copy all those folders and ACLs, but you potentially miss out on the opportunity to restructure that NAS data into a more manageable, logical format via copy-based transition (CBT).
When coming from a non-NetApp storage system, it gets trickier because copying the data is the *only* option at that point. Then the complexity of the unstructured NAS data is exacerbated by the fact that it will take a very, very long time to migrate in some cases.
What tools are available to migrate unstructured NAS data?
The arrows in your quiver (so to speak) for migrating NAS data are your typical utilities, such as the tried and true Robocopy for CIFS/SMB data.
There is also the old standby of NDMP, which just about every storage vendor supports. This can migrate all NAS data types, as it’s file-system agnostic.
However, none of the available migration methods is without challenges. They are fairly slow. Some are single-threaded. All are network-dependent. And the challenges only get more apparent as the number of files grows. Remember, you’re not just copying files – you are copying information associated with those files. That adds to the overhead.
One of the favorite migration tools for NAS data is rsync. Some people swear it’s the best backup tool ever. However, it faces the same challenges mentioned – it’s slow, especially when dealing with large numbers of objects and wide/deep directory structures.
How has NetApp fixed that?
Thanks to some excellent work by one of NetApp’s architects, Peter Schay, we now have a utility that can help your migrations hit ludicrous speed – without needing rsync.
The tool? XCP.
Also be on the lookout for some more ONTAP goodness in ONTAP 9 that helps improve performance and capacity with NAS data.
What is XCP?
XCP is a free data migration tool offered by NetApp that promises to accelerate NFSv3 migration for large unstructured NAS datasets, gather statistics about your files, sync, verify… pretty much anything you ever wanted out of a NAS migration tool. Its wheelhouse is high file count environments that use NFSv3, which also happens to be one of the more challenging scenarios for data migration.
Now, I can’t tell you something is really, really fast without giving you some empirical data. I won’t name names, because that’s not what I do, but our test runs showed an unspecified NAS vendor’s migration of 165 million files took 20 times longer than XCP. We took an 8-10 day file copy down to twelve hours in our testing.
In another use case, a customer moved 4 BILLION inodes and a petabyte of data from a non-NetApp system to a cDOT system and it was 30x faster than rsync.
However, if you’re migrating a few large files, you won’t see a huge gain in speed. Rsync would be similarly effective.
And data migration isn’t the only use case – XCP can also help with file listing.
Recall what I mentioned before…
Remember, you’re not just copying files – you are copying information associated with those files.
That “information” I mentioned? It’s called metadata. And it has long been the bane of existence for NAS file systems. It’s all those messy bits of file systems: the directory tree locations, filehandles, file permissions, owners. The same things that make file-based storage awesome because of the granularity and security also make it not so awesome because of the overhead. It’s a problem that is seen across vendors.
Case in point – that same not-to-be-named, non-NetApp storage vendor? It took 9 days to do a listing of the aforementioned 165 million files. NINE DAYS.
I’ve seen bathroom renovations take less time than that.
With XCP on a cDOT cluster?
That listing took 30 minutes.
That’s better than a 400x performance improvement (nine days is 12,960 minutes, against 30) with a free, easy-to-use tool. It takes traditionally slow utilities like du, ls, find and dd and makes them faster. It also does another thing – it makes them useful for storage performance benchmark tests.
I used to work in support – we’d get numerous calls about how “slow” our storage was because dd, du, ls or find were slow. We’d get a perfstat, see hardly any IOPS on the storage, disk utilization near idle, CPUs barely at 25% and say “yeah, you’re using the wrong type of test.”
XCP is now another arrow in the quiver for performance testing.
What else can it do?
XCP can also do some pretty rich reporting of datasets. You can gather information like space utilization, extension types, number of files, directory entries, dates modified/created/accessed, even the top 5 space consumers… and all in manager-friendly graphs and charts.
Pretty cool, eh? And did I mention it’s FREE?
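To make the reporting idea a bit more concrete, here is a minimal Python sketch of the kind of numbers such a report is built from: file and directory counts, space used, extension types and the biggest files. To be clear, this is not XCP and not how XCP does it. The `summarize_tree` helper is a hypothetical name of my own, it walks a locally mounted path with the standard library, and it uses "largest files" as a stand-in for "top space consumers."

```python
import heapq
import os
from collections import Counter


def summarize_tree(root, top_n=5):
    """Walk `root` and return basic dataset statistics.

    Hypothetical helper for illustration only; not part of XCP.
    """
    file_count = 0
    dir_count = 0
    total_bytes = 0
    by_extension = Counter()
    largest = []  # min-heap of (size, path), capped at top_n entries

    for dirpath, dirnames, filenames in os.walk(root):
        dir_count += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
            total_bytes += st.st_size
            ext = os.path.splitext(name)[1].lower() or "(none)"
            by_extension[ext] += 1
            # Keep only the top_n largest files seen so far.
            if len(largest) < top_n:
                heapq.heappush(largest, (st.st_size, path))
            else:
                heapq.heappushpop(largest, (st.st_size, path))

    return {
        "files": file_count,
        "directories": dir_count,
        "bytes": total_bytes,
        "extensions": by_extension.most_common(),
        "largest": sorted(largest, reverse=True),
    }
```

Notice that every file costs one metadata lookup, one after another. On a tree with hundreds of millions of files, that serial walk is exactly the part that hurts.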
How does it work?
XCP, at a high level, is built from the ground up and takes the overall concept of rsync, re-invents it and multi-threads it. Everything is done in parallel, using multiple connections and cores. This ensures the only bottleneck of your data transfer is your pipe. XCP will copy as much data over as many threads as your network (and CPUs) can handle. You can saturate as many 10GbE network links as your storage can handle.
The details relayed to me by the XCP team:
- Parallelism galore – multitasking, multiprocessing, and multiple links
- Built-in NFS client that does asynchronous queueing and streaming of all standard NFSv3 requests listed in RFC-1813
- Typically 5-25 times faster than rsync!
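If you want a feel for why parallelism matters so much for metadata-heavy work, here is a rough Python sketch of a tree scan that keeps many directory listings in flight at once. It is only an illustration of the general idea, not XCP's implementation: XCP uses its own built-in NFS client and asynchronous requests, while this sketch leans on the operating system's mounted file system and a thread pool. The names `scan_directory` and `parallel_scan` and the default of 16 workers are my own assumptions.

```python
import os
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def scan_directory(path):
    """List one directory; return (subdirectories, file_count, total_bytes)."""
    subdirs, files, size = [], 0, 0
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    else:
                        size += entry.stat(follow_symlinks=False).st_size
                        files += 1
                except OSError:
                    continue  # entry vanished or is unreadable; skip it
    except OSError:
        pass  # unreadable directory: report whatever was gathered
    return subdirs, files, size


def parallel_scan(root, workers=16):
    """Walk a tree with many directory listings in flight at once."""
    total_files = 0
    total_bytes = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(scan_directory, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                subdirs, files, size = future.result()
                total_files += files
                total_bytes += size
                # Each newly discovered directory becomes its own task.
                pending.update(pool.submit(scan_directory, d) for d in subdirs)
    return total_files, total_bytes
```

Threads are a reasonable fit here because the work is dominated by waiting on the file system rather than by CPU. A wide directory tree gives the pool plenty of independent listings to overlap, whereas a single-threaded walk pays each round trip one at a time.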
As our German friends say, it’s like the Autobahn – no speed limit (other than the limits of your own vehicle).
If you don’t believe me, try it for yourself. Contact your NetApp sales reps or partners and get a proof of concept going. Keep in mind that all this awesomeness is just in version 1.0 of this software. There are many plans to make this tool even better, including plans for supporting other protocols. Right now, in the lab, we’re looking at S3 (DataFabric, anyone?) and CIFS/SMB support for XCP!
XCP is a breakthrough in data migration, processing and reporting.