Using XCP to delete files en masse: A race against rm


XCP has traditionally been thought of as a way to rapidly migrate large amounts of data, or to scan data and generate reports. Those ideas still hold up today.

But what if I told you that you could use XCP to delete millions of files 5-6x faster than running rm on an NFS client?

Wait… why would I delete millions of files?

Normally, you wouldn’t. But in some workflows, such as scratch space, this is exactly what happens: a bunch of small files get generated and then deleted once the work is done.
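To get a feel for that kind of workload on a small scale, here is a minimal sketch that generates a pile of tiny files in a temporary directory and times how long it takes to remove them. The function names, the file count and the directory layout are all made up for illustration; this is not how the lab volume was populated, and it runs against local temp space rather than an NFS mount.

```python
import os
import shutil
import tempfile
import time


def make_scratch_files(root: str, count: int, per_dir: int = 1000) -> None:
    """Create `count` one-byte files under `root`, `per_dir` files per subdirectory."""
    for i in range(count):
        subdir = os.path.join(root, f"dir_{i // per_dir:05d}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"file_{i:08d}.tmp"), "wb") as handle:
            handle.write(b"x")


def count_files(root: str) -> int:
    """Count regular files below `root`."""
    return sum(len(files) for _, _, files in os.walk(root))


def timed_delete(root: str) -> float:
    """Remove the whole tree (the Python analogue of rm -rf) and return elapsed seconds."""
    start = time.perf_counter()
    shutil.rmtree(root)
    return time.perf_counter() - start


# Small demo: 2,000 files is an arbitrary number chosen to finish quickly.
scratch = tempfile.mkdtemp(prefix="scratch_demo_")
make_scratch_files(scratch, 2000)
created = count_files(scratch)
elapsed = timed_delete(scratch)
print(f"deleted {created} files in {elapsed:.3f}s")
```

Pointing something like this at a throwaway directory on an NFS mount (never at real data) gives a rough single-threaded baseline for your own environment.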

I ran a simple test in my lab, where I had a FlexGroup volume with ~37 million files in it.

::*> vol show -vserver DEMO -volume flexgroup_16 -fields files-used
vserver volume       files-used
------- ------------ ----------
DEMO    flexgroup_16 37356098

I took a snapshot of that data so I could restore it later for the XCP run, and then ran rm -rf against the volume from an NFS client. It took just over 20 hours:

# time rm -rf /flexgroup/*

real 1213m4.652s
user 1m39.703s
sys 41m16.978s

Then I restored the snapshot and deleted the same ~37 million files using XCP. That took a little over 3.5 hours (about 3 hours and 38 minutes):

# time xcp diag -rmrf 10.193.67.219:/flexgroup_16

real 218m17.765s
user 149m16.132s
sys 40m47.427s
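To sanity-check the "5-6x" claim, the arithmetic can be sketched in a few lines of Python using the two "real" values reported above. The helper name parse_real is invented for this example, and the files-per-second figures assume that the files-used count from vol show is a fair approximation of the number of files each run deleted.

```python
import re


def parse_real(value: str) -> float:
    """Convert a `time`-style duration such as '1213m4.652s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Values copied from the output above; FILES is the files-used count,
# used here as an approximation of the number of files deleted.
FILES = 37_356_098
rm_seconds = parse_real("1213m4.652s")
xcp_seconds = parse_real("218m17.765s")

speedup = rm_seconds / xcp_seconds
rm_rate = FILES / rm_seconds
xcp_rate = FILES / xcp_seconds

print(f"rm:  {rm_seconds / 3600:.2f} hours, {rm_rate:.0f} files/sec")
print(f"xcp: {xcp_seconds / 3600:.2f} hours, {xcp_rate:.0f} files/sec")
print(f"speedup: {speedup:.2f}x")
```

That works out to a speedup of roughly 5.6x in wall-clock time, which lines up with the 5-6x figure.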

So, if you have a workflow that requires you to delete large amounts of data that normally takes you FOREVER, try XCP next time…

These are VMs with limited RAM and 1GB network connections, so I’d imagine that with bigger, beefier servers, those times could come down a bit more. But in an apples-to-apples test, XCP wins again!

New Technical Report – Electronic Design Automation (EDA) Best Practices


Ever since FlexGroup volumes were introduced in ONTAP 9.1, I’ve mentioned that one of the sweet spots for FlexGroup use cases is the EDA space, thanks to its high ingest rates and large numbers of files.

As such, I’ve written up a new TR for EDA best practices that can be found here:

http://www.netapp.com/us/media/tr-4617.pdf

What is EDA?

EDA stands for “Electronic Design Automation.” Essentially, it refers to software tools for designing electronic systems such as integrated circuits and printed circuit boards. The tools work together in a design flow that chip designers use to design and analyze entire semiconductor chips. Since a modern semiconductor chip can have billions of components, EDA tools are essential for their design. Here’s a list of EDA companies for reference:

https://en.wikipedia.org/wiki/Electronic_design_automation

Feel free to send feedback to the DL in the doc, or post in the comments here.