How to Map File and Folder Locations to NetApp ONTAP FlexGroup Member Volumes with XCP

The concept behind a NetApp FlexGroup volume is that ONTAP presents a single large namespace for NAS data, while ONTAP handles the balance and placement of files and folders to the underlying FlexVol member volumes, rather than a storage administrator needing to manage that.

I cover it in more detail in:

There’s also this USENIX presentation:

However, while not knowing/caring where files and folders live in your cluster is nice most of the time, there are occasions where you may need to figure out where a file or folder *actually* lives in the cluster – such as if a member volume has a large imbalance of capacity usage and you need to know what files need to be deleted/moved out of that volume. Previously, there’s been no real good way to do that, but thanks to the efforts of one of our global solutions architects (and one of the inventors of XCP), we now have a way and we don’t even need a treasure map.

Amazon.com: Plastic Treasure Map Party Accessory (1 count) (1/Pkg): Kitchen  & Dining

What is NetApp XCP?

If you’re unfamiliar with NetApp XCP, it’s NetApp’s FREE copy utility/data move that also can be used to do file analytics. There are other use cases, too:

Using XCP to delete files en masse: A race against rm

How to find average file size and largest file size using XCP

Because XCP can run in parallel from a client, it can perform tasks (such as find) much faster in high file count environments, so you’re not sitting around waiting for a command to finish for minutes/hours/days.

Since a FlexGroup is pretty much made for high file count environments, we’d want a way to quickly find files and their locations.

ONTAP NAS and File Handles

In How to identify a file or folder in ONTAP in NFS packet traces, I covered how to find inode information and a little bit about how ONTAP file handles are created/presented. The deep details aren’t super important here, but the general concept – that FlexGroup member volume information is stored in file handles that NFS can read – is.

Using that information and some parsing, there’s a Python script that can be used as an XCP plugin to translate file handles into member volume index numbers and present them in easy-to-read formats.

That Python script can be found here:

FlexGroup File Mapper

How to Use the “FlexGroup File Mapper” plugin with XCP

First of all, you’d need a client that has XCP installed. The version isn’t super important, but the latest release is generally the best release to use.

There are two methods we’ll use here to map files to member volumes.

  1. Scan All Files/Folders in a FlexGroup and Map Them All to Member Volumes
  2. Use a FlexGroup Member Volume Number and Find All Files in that Member Volume

To do this, I’ll use a FlexGroup that has ~2 million files.

::*> df -i FGNFS
Filesystem iused ifree %iused Mounted on Vserver
/vol/FGNFS/ 2001985 316764975 0% /FGNFS DEMO

Getting the XCP Host Ready

First, copy the FlexGroup File Mapper plugin to the XCP host. The file name isn’t important, but when you run the XCP command, you’ll either want to specify the plugin’s location or run the command from the folder the plugin lives in.

On my XCP host, I have the plugin named fgid.py in /testXCP:

# ls -la | grep fgid.py
-rw-r--r-- 1 502 admin 1645 Mar 25 17:34 fgid.py
# pwd
/testXCP

Scan All Files/Folders in a FlexGroup and Map Them All to Member Volumes

In this case, we’ll map all files and folders to their respective FlexGroup member volumes.

This is the command I use:

xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' 10.10.10.10:/exportname

You can also include -parallel (n) to control how many processes spin up to do this work and you can use > filename at the end to pipe the output to a file (recommended).

For example, scanning ~2 million files in this volume took just 37 seconds!

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' 10.10.10.10:/FGNFS > FGNFS.txt
402,061 scanned, 70.6 MiB in (14.1 MiB/s), 367 KiB out (73.3 KiB/s), 5s
751,933 scanned, 132 MiB in (12.3 MiB/s), 687 KiB out (63.9 KiB/s), 10s
1.10M scanned, 193 MiB in (12.2 MiB/s), 1007 KiB out (63.6 KiB/s), 15s
1.28M scanned, 225 MiB in (6.23 MiB/s), 1.14 MiB out (32.6 KiB/s), 20s
1.61M scanned, 283 MiB in (11.6 MiB/s), 1.44 MiB out (60.4 KiB/s), 25s
1.91M scanned, 335 MiB in (9.53 MiB/s), 1.70 MiB out (49.5 KiB/s), 31s
2.00M scanned, 351 MiB in (3.30 MiB/s), 1.79 MiB out (17.4 KiB/s), 36s
Sending statistics…

Xcp command : xcp diag -run fgid.py scan -fmt "{} {}".format(x, fgid(x)) 10.10.10.10:/FGNFS
Stats : 2.00M scanned
Speed : 351 MiB in (9.49 MiB/s), 1.79 MiB out (49.5 KiB/s)
Total Time : 37s.
STATUS : PASSED

The file created was 120MB, though… that’s a LOT of text to sort through.

-rw-r--r--. 1 root root 120M Apr 27 15:28 FGNFS.txt

So, there’s another way to do this, right? Correct!

If I know the folder I want to filter, or even a matching of file names, I can use -match in the command. In this case, I want to find all folders named dir_33.

This is the command:

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' -match "name=='dir_33'" 10.10.10.10:/FGNFS > dir_33_FGNFS.txt

This is the output of the file. Two folders – one in member volume 3, one in member volume 4:

# cat dir_33_FGNFS.txt
x.x.x.x:/FGNFS/files/client1/dir_33 3
x.x.x.x:/FGNFS/files/client2/dir_33 4

If I want to use pattern matching for file names (ie, I know I want all files with “moarfiles3” in the name), then I can do this using regex and/or wildcards. More examples can be found in the XCP user guides.

Here’s the command I used. It found 440,400 files with that pattern in 27s.

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' -match "fnm('moarfiles3*')" 10.10.10.10:/FGNFS > moarfiles3_FGNFS.txt

507,332 scanned, 28,097 matched, 89.0 MiB in (17.8 MiB/s), 465 KiB out (92.9 KiB/s), 5s
946,796 scanned, 132,128 matched, 166 MiB in (15.4 MiB/s), 866 KiB out (80.1 KiB/s), 10s
1.31M scanned, 209,340 matched, 230 MiB in (12.8 MiB/s), 1.17 MiB out (66.2 KiB/s), 15s
1.73M scanned, 297,647 matched, 304 MiB in (14.8 MiB/s), 1.55 MiB out (77.3 KiB/s), 20s
2.00M scanned, 376,195 matched, 351 MiB in (9.35 MiB/s), 1.79 MiB out (48.8 KiB/s), 25s
Sending statistics…

Filtered: 444400 matched, 1556004 did not match

Xcp command : xcp diag -run fgid.py scan -fmt "{} {}".format(x, fgid(x)) -match fnm('moarfiles3*') 10.10.10.10:/FGNFS
Stats : 2.00M scanned, 444,400 matched
Speed : 351 MiB in (12.6 MiB/s), 1.79 MiB out (65.7 KiB/s)
Total Time : 27s.

And this is a sample of some of those entries (the file is 27MB):

x.x.x.x:/FGNFS/files/client1/dir_45/moarfiles3158.txt 3
x.x.x.x:/FGNFS/files/client1/dir_45/moarfiles3159.txt 3

I can also look for files over a certain size. In this volume, the files are all 4K in size; but in my TechONTAP volume, I have varying file sizes. In this case, I want to find all .wav files greater than 100MB. This command didn’t seem to pipe to a file for me, but the output was only 16 files.

# xcp diag -run fgid.py scan -fmt '"{} {}".format(x, fgid(x))' -match "fnm('.wav') and size > 500*M" 10.10.10.11:/techontap > TechONTAP_ep.txt

10.10.10.11:/techontap/Episodes/Episode 20x - Genomics Architecture/ep20x-genomics-meat.wav 4
10.10.10.11:/techontap/archive/combine.band/Media/Audio Files/ep104-webex.output.wav 5
10.10.10.11:/techontap/archive/combine.band/Media/Audio Files/ep104-mics.output.wav 3
10.10.10.11:/techontap/archive/Episode 181 - Networking Deep Dive/ep181-networking-deep-dive-meat.output.wav 6
10.10.10.11:/techontap/archive/Episode 181 - Networking Deep Dive/ep181-networking-deep-dive-meat.wav 2

Filtered: 16 matched, 7687 did not match

xcp command : xcp diag -run fgid.py scan -fmt "{} {}".format(x, fgid(x)) -match fnm('.wav') and size > 100M 10.10.10.11:/techontap
Stats : 7,703 scanned, 16 matched
Speed : 1.81 MiB in (1.44 MiB/s), 129 KiB out (102 KiB/s)
Total Time : 1s.
STATUS : PASSED

But what if I know that a member volume is getting full and I want to see what files are in that member volume?

Use a FlexGroup Member Volume Number and Find All Files in that Member Volume

In the case where I know what member volume needs to be addressed, I can use XCP to search using the FlexGroup index number. The index number lines up with the member volume numbers, so if the index number is 6, then we know the member volume is 6.

In my 2 million file FG, I want to filter by member 6, so I use this command, which shows there are ~95019 files in member 6:

# xcp diag -run fgid.py scan -match 'fgid(x)==6' -parallel 10 -l 10.10.10.10:/FGNFS > member6.txt

 615,096 scanned, 19 matched, 108 MiB in (21.6 MiB/s), 563 KiB out (113 KiB/s), 5s
 1.03M scanned, 5,019 matched, 180 MiB in (14.5 MiB/s), 939 KiB out (75.0 KiB/s), 10s
 1.27M scanned, 8,651 matched, 222 MiB in (8.40 MiB/s), 1.13 MiB out (43.7 KiB/s), 15s
 1.76M scanned, 50,019 matched, 309 MiB in (17.3 MiB/s), 1.57 MiB out (89.9 KiB/s), 20s
 2.00M scanned, 62,793 matched, 351 MiB in (8.35 MiB/s), 1.79 MiB out (43.7 KiB/s), 25s

Filtered: 95019 matched, 1905385 did not match

Xcp command : xcp diag -run fgid.py scan -match fgid(x)==6 -parallel 10 -l 10.10.10.10:/FGNFS
Stats       : 2.00M scanned, 95,019 matched
Speed       : 351 MiB in (12.5 MiB/s), 1.79 MiB out (65.0 KiB/s)
Total Time  : 28s.
STATUS      : PASSED

When I check against the files-used for that member volume, it lines up pretty well:

::*> vol show -vserver DEMO -volume FGNFS__0006 -fields files-used
vserver volume      files-used
------- ----------- ----------
DEMO    FGNFS__0006 95120

And the output file shows not just the file names, but also the sizes!

rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1232.txt
rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1233.txt
rw-r--r-- --- root root 4KiB 4KiB 18h22m FGNFS/files/client2/dir_143/moarfiles1234.txt

And, if I choose, I can filter further with the sizes. Maybe I just want to see files in that member volume that are 4K or less (in this case, that’s all of them):

# xcp diag -run fgid.py scan -match 'fgid(x)==6 and size < 4*K' -parallel 10 -l 10.10.10.10:/FGNFS

In my “TechONTAP” volume, I look for 500MB files or greater in member 6:

# xcp diag -run fgid.py scan -match 'fgid(x)==6 and size > 500*M' -parallel 10 -l 10.10.10.11:/techontap

rw-r--r-- --- 501 games 596MiB 598MiB 3y219d techontap/Episodes/Episode 1/Epidose 1 GBF.band/Media/Audio Files/Tech ONTAP Podcast - Episode 1 - AFF with Dan Isaacs v3_1.aif
rw-r--r-- --- 501 games 885MiB 888MiB 3y219d techontap/archive/Prod - old MacBook/Insight 2016_Day2_TechOnTap_JParisi_ASullivan_GDekhayser.mp4
rw-r--r-- --- 501 games 787MiB 790MiB 1y220d techontap/archive/Episode 181 - Networking Deep Dive/ep181-networking-deep-dive-meat.output.wav

Filtered: 3 matched, 7700 did not match

Xcp command : xcp diag -run fgid.py scan -match fgid(x)==6 and size > 500*M -parallel 10 -l 10.10.10.11:/techontap
Stats : 7,703 scanned, 3 matched
Speed : 1.81 MiB in (1.53 MiB/s), 129 KiB out (109 KiB/s)
Total Time : 1s.
STATUS : PASSED

So, there you have it! A way to find files in a specific member volume inside of a FlexGroup! Let me know if you have any comments or questions below.

New XCP 1.6.1 Version is now Available!

flash

If you use XCP (or have used it), you might have run into an issue or two on occasion – generally related to memory issues or problems with XCP sync taking too long.

The new XCP 1.6.1 release aims to fix some of those issues.

Key enhancements

Sync Performance : Identified sync bottlenecks w.r.t change handling and fixed in 1.6.1 to Improve the sync performance. Now incremental sync takes less time than baseline.  This would help performing more frequent syncs or will need less sync/cutover window.

Memory Optimization : Memory allocation was a challenge in the previous versions and used to receive “Memory allocation” errors during the large sync jobs. XCP 1.6.1 handles the memory allocation elegantly to further assist in accomplishing the sync job.

Syslog integration : Configure Syslog option to send XCP events to the remote syslog server for monitoring the XCP jobs.

Complete Features List of NetApp XCP 1.6.1

  • Sync performance improvements (NFS/SMB)
  • Memory optimization (NFS Sync)
  • File Analytics (General Availability)
  • Event logging – XCP now supports logging events to xcp_event.log file
  • Syslog support (NFS/SMB)

The XCP TME Karthik Nagalingam also wrote up a blog on XCP:

https://blog.netapp.com/data-migration-xcp

File Analytics

In XCP 1.6, a new File Analytics Engine was added to XCP. This basically configures an Apache web server, a postgresql database and adds a web GUI to view file information using an XCP listener agent. Karthik also wrote a blog on that:

https://blog.netapp.com/xcp-data-migration-software

With this feature, you can add your NFS server/volume and XCP will run periodic scans on the datasets to provide information.

xcp_fsa_add_nfs

Once you add a server, you get a list of the available exports:

xcp_fsa_shares

Once you have the shares, you can click on one and run a scan:

xcp_fsa_home

Once that’s done, you can access the Stat View and File Distribution tabs:

The stat view includes:

  • Hot/cold data size and count
  • File capacity sorted by size
  • File distribution by size
  • Directory depth
  • Directory entries
  • Accessed time
  • Information on space occupied by users

Postgresql DB

In addition to the web GUI, you also can access the postgresql database. That way you can query the database if you choose.

When XCP configures the database, a new table called xcp_cmdb is added:

# sudo -u postgres psql
psql (9.2.24)
Type "help" for help.

postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
xcp_cmdb | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(4 rows)

You can access that database and query which tables are added:

# sudo -u postgres psql xcp_cmdb
psql (9.2.24)
Type "help" for help.

xcp_cmdb=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------------------+-------+---------
public | xcp_agent | table | central
public | xcp_login | table | central
public | xcp_nfs_file_server | table | central
public | xcp_nfs_jobs | table | central
public | xcp_nfs_stats | table | central
public | xcp_nfs_tree | table | central
public | xcp_smb_file_server | table | central
public | xcp_smb_jobs | table | central
public | xcp_smb_stats | table | central
public | xcp_smb_tree | table | central
(10 rows)

You can use standard Postgresql commands to query the tables.

xcp_cmdb=# SELECT file_server_ip FROM xcp_nfs_file_server;
file_server_ip
----------------
10.10.10.10
(1 row)


xcp_cmdb=# SELECT * FROM xcp_nfs_jobs;
id | export_path | agent_id | process_group_id | start_timestamp |
end_timestamp | type | status | statistics |
stdout | stderr | log | protocol | files_to_process
--------------------------------------+-------------------------+--------------------------------------+------------------+----------------------------+-----
-----------------------+------+----------+------------------------------------------------------------------------------------------------------------------+
--------+--------+-----+----------+------------------
745a0615-4177-4d83-b945-8e302efc89e9 | 10.10.10.10:techontap | bf83aced-81cd-4b7f-be5d-94e55df11484 | 6562 | 2020-07-27 06:26:55.980298 | 2020
-07-27 06:26:56.785463 | SCAN | FINISHED | {"received": 1699564, "scanned": 6928, "sent_throughput": 138247, "sent": 106400, "receive_throughput": 2208273} |
stdout | stderr | log | NFS | 9047
9a5fc78e-7260-47dd-8cb5-da1820dbcd3e | 10.10.10.10:/home | bf83aced-81cd-4b7f-be5d-94e55df11484 | 2039 | 2020-07-31 00:34:50.001866 | 2020
-07-31 00:34:50.390331 | SCAN | FINISHED | {"received": 9808, "scanned": 21, "sent_throughput": 5627, "sent": 2072, "receive_throughput": 26638} |
stdout | stderr | log | NFS | 839
2767d48c-be49-4232-9a49-9632878bd4c6 | 10.10.10.10:techontap | bf83aced-81cd-4b7f-be5d-94e55df11484 | 5832 | 2020-07-27 06:17:47.496321 | 2020
-07-27 06:17:48.290484 | SCAN | FINISHED | {"received": 1699564, "scanned": 6928, "sent_throughput": 83533, "sent": 106400, "receive_throughput": 1334304} |
stdout | stderr | log | NFS | 9114
(3 rows)

Installation/Upgrade Note

If you’re already running an XCP version prior to 1.6.1, you’ll need to re-run the configure.sh script. This creates the XCP service module on the client (NFS version) to provide better overall management.

For example, if I simply untar the XCP install (like you would do in prior versions to upgrade), then XCP commands work fine, but the new File Systems Analytics module doesn’t, as it needs the XCP service to run properly.

# tar -xvf NETAPP_XCP_1.6.1.tgz
./
./xcp/
./xcp/linux/
./xcp/linux/xcp
./xcp/windows/
./xcp/windows/xcp.exe
./xcp/xcp_gui/
./xcp/xcp_gui/xcp/
./xcp/configure.sh
./xcp/README

# xcp activate
XCP 1.6.1; (c) 2020 NetApp, Inc.; Licensed to Justin Parisi [NetApp Inc] until Thu Oct 22 12:31:11 2020

XCP already activated

Trying to start the listener will fail:

# ./linux/xcp --listen &
[1] 1525
[root@XCP xcp]# Traceback (most recent call last):
File "../xcp.py", line 957, in <module>
File "../xcp.py", line 902, in main
File "../rest/config.py", line 117, in <module>
File "../rest/config.py", line 112, in singleton
File "../rest/config.py", line 163, in __init__
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3215, in first
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3007, in __getitem__
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3317, in __iter__
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3339, in _execute_and_instances
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3354, in _get_bind_args
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 3332, in _connection_from_session
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1123, in connection
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1129, in _connection_for_bind
File "/local/build/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 430, in _connection_for_bind
File "/local/build/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2226, in _contextual_connect
File "/local/build/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2266, in _wrap_pool_connect
File "/local/build/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1536, in _handle_dbapi_exception_noconnection
File "/local/build/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 383, in raise_from_cause
File "/local/build/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 2262, in _wrap_pool_connect
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 354, in connect
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 751, in _checkout
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 483, in checkout
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 138, in _do_get
File "/local/build/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/impl.py", line 135, in _do_get
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 299, in _create_connection
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 428, in __init__
File "/local/build/lib/python2.7/site-packages/sqlalchemy/pool/base.py", line 630, in __connect
File "/local/build/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
File "/local/build/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 453, in connect
File "/local/build/lib/python2.7/site-packages/pgdb.py", line 1623, in connect
sqlalchemy.exc.InternalError: (pg.InternalError) could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5432?

(Background on this error at: http://sqlalche.me/e/2j85)

[1]+ Exit 1 ./linux/xcp --listen

When we look for the XCP service, it’s not there. That gets created with the script:

# service xcp status
Redirecting to /bin/systemctl status xcp.service
Unit xcp.service could not be found.
# systemctl status xcp
Unit xcp.service could not be found.
# sudo systemctl start xcp
Failed to start xcp.service: Unit not found.

Here’s an example of the configure.sh script to upgrade:

# ./configure.sh 
------------XCP CONFIGURATION SCRIPT -------------
# Checking if xcp service is running

Menu Help:
# Install/ Upgrade -
# Install - Installs PostgreSQL and httpd if not installed and does setup for XCP File Analytics.
# Upgrade - Upgrades takes a backup of xcp folder under /var/www/html/ to xcp_backup
# Repair - Restarts PostgreSQL database, http and XCP services incase if any of them are stopped.
# Cleanup - Uninstalls PostgreSQL and httpd.

1. Installation/Upgrade
2. Repair
3. Cleanup
0. Exit

Please select the option from the menu [0-3]: 1 
--------Setting up Postgres -----------
# Checking if postgres is installed
Redirecting to /bin/systemctl status postgresql.service
# Postgres service is already running 
--------------Setting up httpd -------------
# Checking if httpd is installed
httpd-2.4.6-93.el7.centos.x86_64
# HTTPD is already installed... 
--------------Setting up https -------------
mod_ssl-2.4.6-93.el7.centos.x86_64
# Https is already installed... 
--------------Configuring xcp --------------
# Copying xcp files.....
# Taking backup of xcp folder /var/www/html/xcp in /var/www/html/xcp_backup
NOTE:
If you are running for first time, select option 1.
If you are upgrading select option 0.
______________________________
Please choose the menu you want to start:
1. Configure client system
2. Reset admin password
3. Reset Postgres database password
4. Rebuild xcp.ini file
5. Delete database and reconfigure client system

0. Quit
-------------------------------
Enter digit between [0-5] >> 1

# Postgres service is running
# Database already exists and configured. 
---------------Starting xcp service -------------
# Creating xcp service...
Redirecting to /bin/systemctl status xcp.service
# XCP service started successfully
Created symlink from /etc/systemd/system/multi-user.target.wants/xcp.service to /etc/systemd/system/xcp.service.
root 1818 1 19 20:03 ? 00:00:00 /usr/bin/xcp --listen
postgres 1833 1587 0 20:03 ? 00:00:00 postgres: central xcp_cmdb 127.0.0.1(34018) idle 
---------------------------------------------
XCP File analytics is configured successfully
To start XCP service manually, use "sudo systemctl start xcp" command

Use following link and login as 'admin' user for GUI access
https://XCP/xcp 
---------------------------------------------

Once that’s done, XCP is ready to use!

Full NetApp Support

Since XCP 1.5, you can call in for official NetApp Support for XCP.

Support Site Self-Service Options

To review, order, and renew software subscriptions, click My Service Contracts & Warranties on the Service and Support page of the NetApp Support site.

Using XCP to delete files en masse: A race against rm

superman-flash-race-dc-comics-featured-image

XCP has traditionally been thought of as a way to rapidly migrate large amounts of data, or to scan data and generate reports. And those ideas still hold up today….

But what if i told you that you could use XCP to delete millions of files 5-6x faster than running rm on an NFS client?

Wait… why would I delete millions of files?

Normally, you wouldn’t. But in some workflows, such scratch space, this is what happens. A bunch of small files get generated and then deleted once the work is done.

I ran a simple test in my lab where I had a flexgroup volume with ~37 million files in it.

::*> vol show -vserver DEMO -volume flexgroup_16 -fields files-used
vserver volume files-used
------- ------------ ----------
DEMO flexgroup_16 37356098

I took a snapshot of that data so I could restore it later for XCP to delete and then ran rm -rf on it from a client. It took 20 hours:

# time rm -rf /flexgroup/*

real 1213m4.652s
user 1m39.703s
sys 41m16.978s

Then I restored the snapshot and deleted the same ~37 million files using XCP. That took roughly 3.5 hours:

# time xcp diag -rmrf 10.193.67.219:/flexgroup_16
real 218m17.765s
user 149m16.132s
sys 40m47.427s

So, if you have a workflow that requires you to delete large amounts of data that normally takes you FOREVER, try XCP next time…

These are VMs with limited RAM and 1GB network connections, so I’d imagine with bigger, beefier servers, those times could come down a bit more. But in an apples to apples test, XCP wins again!

How to find average file size and largest file size using XCP

If you use NetApp ONTAP to host NAS shares (CIFS or NFS) and have too many files and folders to count, then you know how challenging it can be to figure out file information in your environment in a quick, efficient and effective way.

This becomes doubly important when you are thinking of migrating NAS data from FlexVol volumes to FlexGroup volumes, because there is some work up front that needs to be done to ensure you size the capacity of the FlexGroup and its member volumes correctly. TR-4571 covers some of that in detail, but it basically says “know your average file size.” It currently doesn’t tell you *how* to do that (though it will eventually). This blog attempts to fill that gap.

XCP

I’ve written previously about XCP here:

Generally speaking, it’s been to tout the data migration capabilities of the tool. But, in this case, I want to highlight the “xcp scan” capability.

XCP scan allows you to use multiple, parallel threads to analyze an unstructured NAS share much more quickly than you could with basic tools like rsync, du, etc.

The NFS version of XCP also allows you to output this scan to a file (HTML, XML, etc) to generate a report about the scanned data. It even does the math for you and finds the largest (max) file size and average file size!

xcpfilesize

The command I ran to get this information was:

# xcp scan -md5 -stats -html SERVER:/volume/path > filename.html

That’s it! XCP will scan and write to a file. You can also get info about the top five file consumers (by number and capacity) by owner, as well as get some nifty graphs. (Pro tip: Managers love graphs!)

xcp-graphs

What if I only have SMB/CIFS data?

Currently, XCP for SMB doesn’t support output to HTML files. But that doesn’t mean you can’t have fun, too!

You can stand up a VM using CentOS or whatever your favorite Linux kernel is and use XCP for NFS to scan data – provided the client has the necessary access to do so and you can score an NFS license (even if it’s eval). XCP scans are read-only, so you shouldn’t have issues running them.

Just keep in mind the following:

NFS to shares that have traditionally been SMB/CIFS-only are likely NTFS security style. This means that the user you are accessing the data as (for example, root) should be able to map to a valid Windows user that has read access to the data. NFS clients that access NTFS security style volumes map to Windows users to figure out permissions. I cover that here:

Mixed perceptions with NetApp multiprotocol NAS access

You can check the volume security style in two ways:

  • CLI with the command
    ::> volume show -volume [volname] -fields security-style
  • OnCommand System Manager under the “Storage -> Qtrees” section (yea, yea… I know. Volumes != Qtrees)

ocsm-qtree

To check if the user you are attempting to access the volume via NFS with maps to a valid and expected Windows user, use this CLI command from diag privilege:

::> set diag
::*> diag secd name-mapping show -node node1 -vserver DEMO -direction unix-win -name prof1

'prof1' maps to 'NTAP\prof1'

To see what Windows groups this user would be a member of (and thus would get access to files and folders that have those groups assigned), use this diag privilege command:

::*> diag secd authentication show-creds -node ontap9-tme-8040-01 -vserver DEMO -unix-user-name prof1

UNIX UID: prof1 <> Windows User: NTAP\prof1 (Windows Domain User)

GID: ProfGroup
 Supplementary GIDs:
 ProfGroup
 group1
 group2
 group3
 sharedgroup

Primary Group SID: NTAP\DomainUsers (Windows Domain group)

Windows Membership:
 NTAP\group2 (Windows Domain group)
 NTAP\DomainUsers (Windows Domain group)
 NTAP\sharedgroup (Windows Domain group)
 NTAP\group1 (Windows Domain group)
 NTAP\group3 (Windows Domain group)
 NTAP\ProfGroup (Windows Domain group)
 Service asserted identity (Windows Well known group)
 BUILTIN\Users (Windows Alias)
 User is also a member of Everyone, Authenticated Users, and Network Users

Privileges (0x2080):
 SeChangeNotifyPrivilege

If you want to run XCP as root and want it to have administrator level access, you can create a name mapping. This is what I have in my SVM:

::> vserver name-mapping show -vserver DEMO -direction unix-win

Vserver: DEMO
Direction: unix-win
Position Hostname         IP Address/Mask
-------- ---------------- ----------------
1        -                -                Pattern: root
                                           Replacement: DEMO\\administrator

To create a name mapping for root to map to administrator:

::> vserver name-mapping create -vserver DEMO -direction unix-win -position 1 -pattern root -replacement DEMO\\administrator

Keep in mind that backup software often has this level of rights to files and folders, and the XCP scan is read-only, so there shouldn’t be any issue. If you are worried about making root an administrator, create a new Windows user for it to map to (for example, DOMAIN\xcp) and add it to the Backup Operators Windows Group.

In my lab, I ran a scan on a NTFS security style volume called “xcp_ntfs_src”:

::*> vserver security file-directory show -vserver DEMO -path /xcp_ntfs_src

Vserver: DEMO
 File Path: /xcp_ntfs_src
 File Inode Number: 64
 Security Style: ntfs
 Effective Style: ntfs
 DOS Attributes: 10
 DOS Attributes in Text: ----D---
Expanded Dos Attributes: -
 UNIX User Id: 0
 UNIX Group Id: 0
 UNIX Mode Bits: 777
 UNIX Mode Bits in Text: rwxrwxrwx
 ACLs: NTFS Security Descriptor
 Control:0x8014
 Owner:NTAP\prof1
 Group:BUILTIN\Administrators
 DACL - ACEs
 ALLOW-BUILTIN\Administrators-0x1f01ff-OI|CI
 ALLOW-DEMO\Administrator-0x1f01ff-OI|CI
 ALLOW-Everyone-0x100020-OI|CI
 ALLOW-NTAP\student1-0x120089-OI|CI
 ALLOW-NTAP\student2-0x120089-OI|CI

I used this command and nearly 600,000 objects were scanned in 25 seconds:

# xcp scan -md5 -stats -html 10.x.x.x:/xcp_ntfs_src > xcp-ntfs.html
XCP 1.3D1-8ae2672; (c) 2018 NetApp, Inc.; Licensed to Justin Parisi [NetApp Inc] until Tue Sep 4 13:23:07 2018

126,915 scanned, 85,900 summed, 43.8 MiB in (8.75 MiB/s), 14.5 MiB out (2.89 MiB/s), 5s
 260,140 scanned, 187,900 summed, 91.6 MiB in (9.50 MiB/s), 31.3 MiB out (3.34 MiB/s), 10s
 385,100 scanned, 303,900 summed, 140 MiB in (9.60 MiB/s), 49.9 MiB out (3.71 MiB/s), 15s
 516,070 scanned, 406,530 summed, 187 MiB in (9.45 MiB/s), 66.7 MiB out (3.36 MiB/s), 20s
Sending statistics...
 594,100 scanned, 495,000 summed, 220 MiB in (6.02 MiB/s), 80.5 MiB out (2.56 MiB/s), 25s
594,100 scanned, 495,000 summed, 220 MiB in (8.45 MiB/s), 80.5 MiB out (3.10 MiB/s), 25s.

This was the resulting report:

xcp-ntfs

Happy scanning!

XCP SMB/CIFS support available!

If you’re not familiar with what XCP is, I covered it in a previous blog post, Migrating to ONTAP – Ludicrous speed! as well as in the XCP podcast. Basically, it’s a super-fast way to scan and migrate data.

One of the downsides of the tool was the fact that it only supported NFSv3 migrations, which also meant it couldn’t handle NTFS style ACLs. Doing that would require a SMB/CIFS supported version of XCP. Today, we get that with XCP SMB/CIFS 1.0:

https://mysupport.netapp.com/tools/download/ECMLP2357425DT.html?productID=62115&pcfContentID=ECMLP2357425

XCP for SMB/CIFS supports the following:

“show” Displays information about the CIFS shares of a system
“scan”  Reads all files and directories found on a CIFS share and build assessment reports
“copy”  Recursively copies everything from source to destination
“sync”  Performs multiple incremental syncs from source to target
“verify”  Verifies that the target state matches the source, including attributes and NTFS ACLs
“activate”  Activates the XCP license on Windows hosts
“help”     Displays detailed information about XCP commands and options

 

Right now, it’s CLI only, but be on the lookout for a GUI version.

“Installing” XCP on Windows

XCP in Windows is a simple executable file that runs via the cmd or a PowerShell window. One of the pre-requisites for the software includes Microsoft Visual C++ Redistributable for Visual Studio 2017. If you don’t install this, trying to run the program will result in an error that calls out a specific DLL that isn’t registered.

When I copied the file to my Windows host, I created a new directory called “C:\XCP.” You can put that directory anywhere. To run the utility in CMD, you can either navigate to the directory and run “xcp” or add the directory to your system paths to run from anywhere.

For example:

env-windows-path

XCP-path

Once that’s done, run XCP from any location:

cifs-xcp

cifs-xcp-ps.png

Licensing XCP

XCP is a licensed feature. That doesn’t mean you have to pay for it; the license is only used for tracking purposes. But you do have to apply a license. In Windows, that’s pretty easy.

  1. Download a license from xcp.netapp.com
  2. Copy the license into the C:\NetApp\XCP folder
  3. Run “xcp activate”

xcp-license.png

XCP show

The command “xcp show \\server” can give some useful information for an ONTAP SMB/CIFS server, such as:

  • Available shares
  • Capacity (used and available)
  • Current connections
  • Folder path
  • Share attributes and permissions

This output is a good way to get an overall look at what is available on a server.

cifs-xcp-show.png

XCP scan

XCP has a number of useful scanning features. These include:

PS C:\XCP> xcp help scan

usage: xcp scan [-h] [-v] [-parallel <n>] [-match <filter>] [-preserve-atime]
 [-depth <n>] [-stats] [-l] [-ownership] [-du]
 [-fmt <expression>]
 source

positional arguments:
 source

optional arguments:
 -h, --help show this help message and exit
 -v increase debug verbosity
 -parallel <n> number of concurrent processes (default: <cpu-count>)
 -match <filter> only process files and directories that match the filter
 (see `xcp help -match` for details)
 -preserve-atime restore last accessed date on source
 -depth <n> limit the search depth
 -stats print tree statistics report
 -l detailed file listing output
 -ownership retrieve ownership information
 -du summarize space usage of each directory including
 subdirectories
 -fmt <expression> format file listing according to the python expression
 (see `xcp help -fmt` for details)

I scanned my “shared” directory with the -stats option and it was able to scan over 60,000 files in 31 seconds and gave me the following stats:

== Maximum Values ==
 Size Depth Namelen Dirsize
 2.02KiB 5 15 100

== Average Values ==
 Size Depth Namelen Dirsize
 25.6 5 6 6

== Top File Extensions ==
 .py
 50003 1

== Number of files ==
 empty <8KiB 8-64KiB 64KiB-1MiB 1-10MiB 10-100MiB >100MiB
 3 50001

== Space used ==
 empty <8KiB 8-64KiB 64KiB-1MiB 1-10MiB 10-100MiB >100MiB
 0 1.22MiB 0 0 0 0 0

== Directory entries ==
 empty 1-10 10-100 100-1K 1K-10K >10k
 2 10004 101

== Depth ==
 0-5 6-10 11-15 16-20 21-100 >100
 60111

== Modified ==
 >1 year >1 month 1-31 days 1-24 hrs <1 hour <15 mins future
 60111

== Created ==
 >1 year >1 month 1-31 days 1-24 hrs <1 hour <15 mins future
 60111

Total count: 60111
Directories: 10107
Regular files: 50004
Symbolic links:
Junctions:
Special files:
Total space for regular files: 1.22MiB
Total space for directories: 0
Total space used: 1.22MiB
60,111 scanned, 0 errors, 31s

When I increased the parallel threads to 8, it finished in 18 seconds:

PS C:\XCP> xcp scan -stats -parallel 8 \\demo\shared

Total count: 60111
Directories: 10107
Regular files: 50004
Symbolic links:
Junctions:
Special files:
Total space for regular files: 1.22MiB
Total space for directories: 0
Total space used: 1.22MiB
60,111 scanned, 0 errors, 18s

XCP copy

With xcp copy, I can copy SMB/CIFS data with or without ACLs at a much faster rate than simple robocopy. Keep in mind that with this version of XCP, it doesn’t have BACKUP OPERATOR rights, so you’d need to run the utility as an admin user on both source and destination.

In the following example, I used robocopy to copy the same dataset as XCP to a NetApp FlexGroup volume.

Robocopy to FlexGroup results (~20-30 minutes)

         Total Copied Skipped Mismatch FAILED Extras
 Dirs :  10107  10106       1        0      0      0
 Files : 50004  50004       0        0      0      0
 Bytes : 1.21m  1.21m       0        0      0      0
 Times : 0:19:01 0:13:11 0:00:00 0:05:50

Speed : 1615 Bytes/sec.
 Speed : 0.092 MegaBytes/min.

UPDATE: Someone asked if the above robocopy run was done with the /MT flag, which would be a more fair apples to apples comparison, since XCP does multithreading. It wasn’t. The syntax used was:

PS C:\XCP> robocopy /S /COPYALL source destination

So, I re-ran it using MT:8 and with an empty FlexGroup after restoring the base snapshot and converting the security style to NTFS to ensure the ACLs come over as well. The multithreading of robocopy cut the time to completion roughly in half.

Robocopy /MT to FlexGroup results (~8-9 minutes)

 PS C:\XCP> robocopy /S /COPYALL /MT:8 \\demo\shared \\demo\flexgroup\robocopyMT

-------------------------------------------------------------------------------
 ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : Tue Aug 22 20:32:54 2017

Source : \\demo\shared\
 Dest : \\demo\flexgroup\robocopyMT\

Files : *.*

Options : *.* /S /COPYALL /MT:8 /R:1000000 /W:30
------------------------------------------------------------------------------
Total Copied Skipped Mismatch FAILED Extras
 Dirs : 10107 10106 1 0 0 0
 Files : 50004 50004 0 0 0 0
 Bytes : 1.21 m 1.21 m 0 0 0 0
 Times : 0:35:21 0:06:23 0:00:00 0:01:59

Ended : Tue Aug 22 20:41:18 2017

Then I re-ran the XCP to FlexGroup by restoring the baseline snapshot and then making sure the security style of the volume was NTFS. (It was UNIX before, which would have affected ACLs and overall speed). But, the run still held within 4 minutes. So, we’re looking at 2x as fast as robocopy with a small 60k file and folder workload. In addition, the host I’m using is a Windows 7 client VM with a 1GB network connection and not a ton of power behind it. XCP works best with more robust hardware.

win7-info

XCP to FlexGroup results – NTFS security style (~4 minutes!)

PS C:\XCP> xcp copy -parallel 8 \\demo\shared \\demo\flexgroup\XCP
1,436 scanned, 0 errors, 0 skipped, 0 copied, 0 (0/s), 5s
4,381 scanned, 0 errors, 0 skipped, 507 copied, 12.4KiB (2.48KiB/s), 10s
5,426 scanned, 0 errors, 0 skipped, 1,882 copied, 40.5KiB (5.64KiB/s), 15s
7,431 scanned, 0 errors, 0 skipped, 3,189 copied, 67.4KiB (5.37KiB/s), 20s
8,451 scanned, 0 errors, 0 skipped, 4,537 copied, 96.1KiB (5.75KiB/s), 25s
9,651 scanned, 0 errors, 0 skipped, 5,867 copied, 123KiB (5.31KiB/s), 30s
10,751 scanned, 0 errors, 0 skipped, 7,184 copied, 150KiB (5.58KiB/s), 35s
12,681 scanned, 0 errors, 0 skipped, 8,507 copied, 178KiB (5.44KiB/s), 40s
13,891 scanned, 0 errors, 0 skipped, 9,796 copied, 204KiB (5.26KiB/s), 45s
14,861 scanned, 0 errors, 0 skipped, 11,136 copied, 232KiB (5.70KiB/s), 50s
15,966 scanned, 0 errors, 0 skipped, 12,464 copied, 259KiB (5.43KiB/s), 55s
18,031 scanned, 0 errors, 0 skipped, 13,784 copied, 287KiB (5.52KiB/s), 1m0s
19,056 scanned, 0 errors, 0 skipped, 15,136 copied, 316KiB (5.80KiB/s), 1m5s
20,261 scanned, 0 errors, 0 skipped, 16,436 copied, 342KiB (5.21KiB/s), 1m10s
21,386 scanned, 0 errors, 0 skipped, 17,775 copied, 370KiB (5.65KiB/s), 1m15s
23,286 scanned, 0 errors, 0 skipped, 19,068 copied, 397KiB (5.36KiB/s), 1m20s
24,481 scanned, 0 errors, 0 skipped, 20,380 copied, 424KiB (5.44KiB/s), 1m25s
25,526 scanned, 0 errors, 0 skipped, 21,683 copied, 451KiB (5.35KiB/s), 1m30s
26,581 scanned, 0 errors, 0 skipped, 23,026 copied, 479KiB (5.62KiB/s), 1m35s
28,421 scanned, 0 errors, 0 skipped, 24,364 copied, 507KiB (5.63KiB/s), 1m40s
29,701 scanned, 0 errors, 0 skipped, 25,713 copied, 536KiB (5.70KiB/s), 1m45s
30,896 scanned, 0 errors, 0 skipped, 26,996 copied, 561KiB (5.15KiB/s), 1m50s
31,911 scanned, 0 errors, 0 skipped, 28,334 copied, 590KiB (5.63KiB/s), 1m55s
33,706 scanned, 0 errors, 0 skipped, 29,669 copied, 617KiB (5.52KiB/s), 2m0s
35,081 scanned, 0 errors, 0 skipped, 30,972 copied, 644KiB (5.44KiB/s), 2m5s
36,116 scanned, 0 errors, 0 skipped, 32,263 copied, 671KiB (5.30KiB/s), 2m10s
37,201 scanned, 0 errors, 0 skipped, 33,579 copied, 698KiB (5.48KiB/s), 2m15s
38,531 scanned, 0 errors, 0 skipped, 34,898 copied, 726KiB (5.65KiB/s), 2m20s
40,206 scanned, 0 errors, 0 skipped, 36,199 copied, 753KiB (5.36KiB/s), 2m25s
41,371 scanned, 0 errors, 0 skipped, 37,507 copied, 780KiB (5.39KiB/s), 2m30s
42,441 scanned, 0 errors, 0 skipped, 38,834 copied, 808KiB (5.63KiB/s), 2m35s
43,591 scanned, 0 errors, 0 skipped, 40,161 copied, 835KiB (5.47KiB/s), 2m40s
45,536 scanned, 0 errors, 0 skipped, 41,445 copied, 862KiB (5.31KiB/s), 2m45s
46,646 scanned, 0 errors, 0 skipped, 42,762 copied, 890KiB (5.56KiB/s), 2m50s
47,691 scanned, 0 errors, 0 skipped, 44,052 copied, 916KiB (5.30KiB/s), 2m55s
48,606 scanned, 0 errors, 0 skipped, 45,371 copied, 943KiB (5.45KiB/s), 3m0s
50,611 scanned, 0 errors, 0 skipped, 46,518 copied, 967KiB (4.84KiB/s), 3m5s
51,721 scanned, 0 errors, 0 skipped, 47,847 copied, 995KiB (5.54KiB/s), 3m10s
52,846 scanned, 0 errors, 0 skipped, 49,138 copied, 1022KiB (5.32KiB/s), 3m15s
53,876 scanned, 0 errors, 0 skipped, 50,448 copied, 1.02MiB (5.53KiB/s), 3m20s
55,871 scanned, 0 errors, 0 skipped, 51,757 copied, 1.05MiB (5.42KiB/s), 3m25s
57,011 scanned, 0 errors, 0 skipped, 53,080 copied, 1.08MiB (5.52KiB/s), 3m30s
58,101 scanned, 0 errors, 0 skipped, 54,384 copied, 1.10MiB (5.39KiB/s), 3m35s
59,156 scanned, 0 errors, 0 skipped, 55,714 copied, 1.13MiB (5.57KiB/s), 3m40s
60,111 scanned, 0 errors, 0 skipped, 57,049 copied, 1.16MiB (5.52KiB/s), 3m45s
60,111 scanned, 0 errors, 0 skipped, 58,483 copied, 1.19MiB (6.02KiB/s), 3m50s
60,111 scanned, 0 errors, 0 skipped, 59,907 copied, 1.22MiB (5.79KiB/s), 3m55s
60,111 scanned, 0 errors, 0 skipped, 60,110 copied, 1.22MiB (5.29KiB/s), 3m56s

XCP sync and verify

Sync and verify can be used during data migrations to ensure the source and target match up before cutting over. These use the same multi-processing capabilities as copy, so this should also be fast. Keep in mind that sync could also potentially be used to do incremental backups using XCP!

xcp-verify.png

Behind the Scenes: Episode 100 – XCP

Welcome to the Episode 100, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

group-4-2016

This week is our 100th episode! In true TechONTAP podcast fashion, we didn’t celebrate it at all.

Instead, we stuck to the tech and brought in Bogdan Minciu and Joshey Lazer of the XCP team to discuss XCP and the upcoming release that supports CIFS/SMB.

Also, check out the podcast episode on migrations (where we chat about XCP) and this XCP blog.

Finding the Podcast

The podcast is all finished and up for listening. You can find it on iTunes or SoundCloud or by going to techontappodcast.com.

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

You can listen here:

 

Migrating to ONTAP – Ludicrous speed!

As many of those familiar with NetApp know, the era of clustered Data ONTAP (CDOT) is upon us. 7-Mode is going the way of the dodo, and we’re helping customers (both legacy and new) move to our scale-out storage solution.

There are a variety of ways people have been moving to cDOT:

(Also, stay tuned for more transition goodness coming very, very soon!)

What’s unstructured NAS data?

If you’re not familiar with the term, unstructured NAS data is, more or less, just NAS data. But it’s really messy NAS data.

It’s home directories, file shares, etc. It refers to a dataset that has been growing and growing over time and becoming harder and harder to manage at a granular level due to the directory structure, number of objects and the sheer amount of ACLs.

It’s a sore point for NAS migrations because it’s difficult to move due to the dependencies. If you’re coming from 7-Mode, you can certainly migrate using the 7MTT, which will copy all those folders and ACLs, but you potentially miss out on the opportunity to restructure that NAS data into a more manageable, logical format via copy-based transition (CBT).

When coming from a non-NetApp storage system, it gets trickier because copying the data is the *only* option at that point. Then the complexity of the unstructured NAS data is exacerbated by the fact that it will take a very, very long time to migrate in some cases.

What tools are available to migrate unstructured NAS data?

The arrows in your quiver (so to speak) for migrating NAS data are your typical utilities, such as the tried and true Robocopy for CIFS/SMB data.

There is also the old standby of NDMP, which just about every storage vendor supports. This can migrate all NAS data types, as it’s file-system agnostic.

However, each of the available methods to migrate are not without challenges. They are fairly slow. Some are single-threaded. All are network-dependent. And the challenges only get more apparent as the number of files grows. Remember, you’re not just copying files – you are copying information associated with those files. That adds to the overhead.

One of the favorite migration tools of NAS data is rsync. Some people swear it’s the best backup tool ever. However, it faces the same challenges mentioned – it’s slow, especially when dealing with large numbers of objects and wide/deep directory structures.

How has NetApp fixed that?

Thanks to some excellent work by one of NetApp’s architects, Peter Schay, we now have a utility that can help your migrations hit ludicrous speed – without needing rsync.

The tool? XCP.

https://xcp.netapp.com/

Also be on the lookout for some more ONTAP goodness in ONTAP 9 that helps improve performance and capacity with NAS data.

What is XCP?

XCP is a free data migration tool offered by NetApp that promises to accelerate NFSv3 migration for large unstructured NAS datasets, gather statistics about your files, sync, verify… pretty much anything you ever wanted out of a NAS migration tool. Its wheelhouse is high file count environments that use NFSv3, which also happens to be one of the more challenging scenarios for data migration.

Now, I can’t tell you something is really, really fast without giving you some empirical data. I won’t name names, because that’s not what I do, but our test runs showed an unspecified NAS vendor’s migration of 165 million files took 20 times longer than XCP. We took an 8-10 day file copy down to twelve hours in our testing.

In another use case, a customer moved 4 BILLION inodes and a petabyte of data from a non-NetApp system to a cDOT system and it was 30x faster than rsync.

That’s INSANE.

However, if you’re migrating a few large files, you won’t see a huge gain in speed. Rsync would be similarly effective.

And data migration isn’t the only use case – XCP can also help with file listing.

Recall what I mentioned before…

Remember, you’re not just copying files – you are copying information associated with those files.

That “information” I mentioned? It’s called metadata. And it has long been the bane of existence for NAS file systems. It’s all those messy bits of file systems – the directory tree locations, filehandles, file permissions, owners – all the things that make file based storage awesome because of the granularity and security also make it not so awesome because of the overhead. It’s a problem that is seen across vendors.

Case in point – that same not-to-be-named, non-NetApp storage vendor? It took 9 days to do a listing of the aforementioned 165 million files. NINE DAYS.

I’ve seen bathroom renovations take less time than that.

With XCP on a cDOT cluster?

That listing took 30 minutes.

That’s a 400x performance improvement with a free, easy to use tool. It takes traditionally slow utilities like du, ls, find and dd and makes them faster. It also does another thing – it makes them useful for storage performance benchmark tests.

I used to work in support – we’d get numerous calls about how “slow” our storage was because dd, du, ls or find were slow. We’d get a perfstat, see hardly any iops on the storage, disk utilization near idle, CPUs barely at 25% and say “yea, you’re using the wrong type of test.”

XCP is now another arrow in the quiver for performance testing.

What else can it do?

XCP can also do some pretty rich reporting of datasets. You can gather information like space utilization, extension types, number of files, directory entries, dates modified/created/accessed, even the top 5 space consumers… and all in manager-friendly graphs and charts.

For example:

Screen Shot 2015-11-04 at 10.22.29 PM

Pretty cool, eh? And did I mention it’s FREE?

How does it work?

XCP, at a high level, is built from the ground up and takes the overall concept of rsync, re-invents it and multi-threads it. Everything is done in parallel, using multiple connections and cores. This ensures the only bottleneck of your data transfer is your pipe. XCP will copy as much data over as many threads as your network (and CPUs) can handle. You can saturate as many 10GbE network links as your storage can handle.

The details relayed to me by the XCP team:

  • Parallelism galore – multitasking, multiprocessing, and multiple links
  • Built-in NFS client that does asynchronous queueing and streaming of all standard NFSv3 requests listed in RFC-1813
  • Typically 5-25 times faster than rsync!

As our German friends say, it’s like the Autobahn – no speed limit (other than the limits of your own vehicle).

If you don’t believe me, try it for yourself. Contact your NetApp sales reps or partners and get a proof of concept going. Keep in mind that all this awesomeness is just in version 1.0 of this software. There are many plans to make this tool even better, including plans for supporting other protocols. Right now, in the lab, we’re looking at S3 (DataFabric, anyone?) and CIFS/SMB support for XCP!

XCP a breakthrough in data migration, processing and reporting.