Behind the Scenes Episode 381: What’s New in NetApp ONTAP 9.14.1

Welcome to Episode 381, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


Another fall brings another new ONTAP release here at NetApp, and this one is chock full of new features to improve your NetApp experience. Keith Aasen joins us to break it all down and explain what’s new.

Finding the Podcast

You can find this week’s episode here:

I’ve also resurrected the YouTube playlist. You can find this week’s episode here:

You can also find the Tech ONTAP Podcast on:

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Transcription

The following transcript was generated using Descript’s speech to text service and then further edited. As it is AI generated, YMMV.

Tech ONTAP Podcast Episode 381 – What’s New in ONTAP: Version 9.14.1
===

Justin Parisi: This week on the Tech ONTAP podcast, we talk to Keith Aasen about ONTAP 9.14.1.

Podcast Intro/outro: [Intro]

Justin Parisi: Hello and welcome to the Tech ONTAP podcast. My name is Justin Parisi. I’m here in the basement of my house and with me today I have a special guest to talk to us about the latest release of ONTAP, ONTAP 9.14.1. Keith Aasen is here. So Keith, what do you do at NetApp and how do we reach you?

Keith Aasen: From the space above the garage in my house, Keith Aasen here. I am a senior product manager in our enterprise storage BU, so that includes ONTAP as well as all the hardware.

Justin Parisi: All right. So like I said, we’re talking about ONTAP 9.14.1, and we just announced it recently at the Insight conference. And this is just to talk about the features themselves and to kind of go into more depth.

So, Keith, tell me a little bit about the release and tell me how we’re going to break it down.

Keith Aasen: Yeah, you bet. First off, Insight was great. Hopefully, many of your listeners made it there. If nothing else, it was super nice to see everybody in person again. The room was packed. We did a one hour deep dive on ONTAP 9.14.1 and had 173 attendees squeezed into a room that felt like the right size for 60. It was cozy, but it was good to see everybody wanting to hear what we’re doing in the ONTAP space. 9.14.1 is an interesting release. It has a little bit of something for everybody. Sometimes we have these releases, Justin, where it’s like, oh, wow, here are the one or two really big things you want to talk about, or it has a real clear theme or focus. ONTAP is so broad as it is, and it’s really unique in the industry. It has high performance. It has massive scalability. It’s targeted for high availability workloads and next generation workloads. It is super multifaceted.

And I think that’s getting reflected in releases like this, where there’s a ton of features in a bunch of different areas. But I can certainly share where we’re investing in general. And I did do this, I went through all of the almost 100 feature enhancements in this release, and I sorted them.

And the big three categories were data protection, with tons of enhancements around better ways of protecting data and new ways of replication. We’ll talk about some enhancements to SnapMirror, which is the most widely used data replication software out there. Then security, where we’re certainly not taking our foot off the pedal.

I think ONTAP has a really unique advantage for security right now, and we have to keep that by making sure we continue to innovate in that space. And finally, cost optimization, right? We always joke that the data is not getting any smaller and nobody’s budgets are getting bigger, and that’s never been more true than it is now. So being able to reduce the cost of storing data has never been more critical, and there are some really good enhancements in that space.

Justin Parisi: So let’s talk about the data protection. Let’s talk about what’s new in that space. And we’ll dive into a little bit more as we go.

Keith Aasen: Let’s start with one of those little things that at the surface seems pretty simple, but can be a big lifesaver. In System Manager, we added in this ability to do a DR rehearsal. And it is exactly what it sounds like, which is, if I’m replicating a volume of data from one location to the other, I can now in System Manager, just do a right click and initiate a DR rehearsal.

And what that does, without disrupting the replication, is build a FlexClone at the secondary site, with a mount point to connect that data out. And by doing that, it’s a great way to validate that if I needed to, I could recover my data, right? I can mount that test point and get my app developers or my app owners to connect to it and verify that, hey, if I had to recover, I could.

And probably the best part of it is, when you’re done, you right click and it all cleans up. These are all things that ONTAP admins already do today, hopefully regularly. But rather than having to do a mix of command line and GUI, now it’s all done in a really simple to use GUI right within System Manager.
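Under the hood, a DR rehearsal amounts to cloning the SnapMirror destination volume from a snapshot. As a rough illustration of the kind of REST call System Manager automates, here is a hedged Python sketch that builds a FlexClone creation payload. The volume, SVM, and snapshot names are placeholders, and the exact payload shape should be verified against the ONTAP REST API documentation before use.

```python
import json

# Hypothetical sketch of the request body behind a DR rehearsal: clone the
# SnapMirror destination volume from its most recent transferred snapshot
# without disrupting replication. All names here are illustrative.
def build_dr_rehearsal_clone(parent_volume, svm, snapshot, clone_name):
    """Build a POST body for /api/storage/volumes that creates a FlexClone."""
    return {
        "name": clone_name,
        "svm": {"name": svm},
        "clone": {
            "is_flexclone": True,
            "parent_volume": {"name": parent_volume},
            "parent_snapshot": {"name": snapshot},
        },
    }

body = build_dr_rehearsal_clone(
    parent_volume="vol_dr_dst",        # SnapMirror destination volume
    svm="svm_dr",                      # secondary-site SVM
    snapshot="snapmirror.latest",      # placeholder for the latest snapshot
    clone_name="vol_dr_dst_rehearsal",
)
print(json.dumps(body, indent=2))
```

The right-click cleanup step then effectively deletes that clone volume, leaving the replication relationship untouched.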

Justin Parisi: And another thing about the FlexClone is there’s no space taken up, right? It’s backed by a snapshot. It doesn’t take up space until you start to write to the FlexClone. So if you wanted to do a test where you write to the FlexClone, you certainly could, but then you blow it away and you’ve not used up any space.

Keith Aasen: Yeah, that’s a great one. We call it the DR rehearsal, but you’re absolutely right. You’re not consuming any space, and it’s read/writable, which means that, yeah, it might be for a DR rehearsal, but I’ve also seen that used for testing an application upgrade, testing an application change, or using it for test data. You wouldn’t want to do this for something that was being done regularly. Like, I see people automate doing this when they’re running software validation tests. But it’s great for these one off requests where it’s like, hey, I’ve got to upgrade this application on Friday. It’s Wednesday.

I maybe want to practice that, or I want to do a test run. Hey, I’ll create a clone off the secondary data and you can try the test, and when you’re done, destroy it ahead of the actual upgrade on Friday. So yeah, FlexClones are one of those things that are so powerful and maybe a bit underused by our clients, partially due to product knowledge. Making it easier for them to use is really what this feature is meant for.

Justin Parisi: I could see other use cases here. Like, let’s say you’re doing maybe a data analysis in a big data scenario, right? And you’ve got your production data, but you want to verify that data or do training on that data.

And you can split this off into a FlexClone or multiple FlexClones and create multiple mount points where you can train the data in parallel, and you can do all this through an automated way. It’s not just a rehearsal. I feel like there’s other things you can leverage here.

Keith Aasen: Absolutely. A lot of it is just validation, right? You periodically say, hey, I’m replicating it, but how do I know what I’m replicating is actually good? Like, if I was spontaneously asked to recover that data, how do I know that data would be in a known good state? Well, let’s test it.

Right. Let’s grab the most recent replication pass and mount it up and see what’s in there. And the best thing about this is it doesn’t stop the replication schedule. You’re not putting your recovery point at risk at all. That replication is still continuing behind the scenes.

And so you’re never jeopardizing your recovery point. If a disaster actually did happen, you can certainly tear down that FlexClone and mount the most recent replication and recover from there.

Justin Parisi: So I remember FlexClone used to have an individual license you had to purchase.

Is that bundled in now, or is it still something you have to buy separately to actually leverage this functionality?

Keith Aasen: Oh, bringing up licensing, something near and dear to my heart. Yeah, the nice thing is, it’s included in every bundle. That functionality is something we deemed so critical to ONTAP that, yeah, it’s available on every system now.

Justin Parisi: Good. It’s table stakes now. You have that, you have snapshots, you have all sorts of goodness that’s built into the bundle without trying to gatekeep it.

Keith Aasen: Yeah, exactly. And hopefully most people have acclimatized to the new bundles going forward. We standardized on really just two. So there’s a base bundle, and even that base has all protocols and snapshots and FlexClones, so it’s still pretty comprehensive. And then once we move to ONTAP One, that’s everything else, right? That’s all your replication, your asynchronous and synchronous replication, your advanced replication like S3 replication as well, SnapMirror to cloud. Anything that has a license key is included in ONTAP One.

So now it’s just those two levels, so there aren’t any extra little a la carte items.

Justin Parisi: So we have disaster recovery rehearsal. What’s our next feature that helps us with disaster recovery situations?

Keith Aasen: Well, why don’t we shake things up and dive into one that’s much more sophisticated, and take a look at this evolution of SnapMirror. So, we’ve had several flavors of SnapMirror, though there’s one license. You had SnapMirror Asynchronous, which is by far what most people use, which is replicating a volume of data on a given schedule. So it replicates from point A to point B, as frequently as every five minutes or as infrequently as you want. Infinite distance, infinite latency, anywhere to anywhere. On-prem to the cloud.

Wherever to wherever. And then we have synchronous replication and that obviously has a lot more requirements because we’re requiring that data to be replicated to the secondary site before we acknowledge it back and there’s some rules and restrictions around that, but that’s still very much a one way replication, right?

Point A to point B. And then we had this idea of SnapMirror Business Continuity. Its current incarnation is still replicating from point A to point B, but with a layer of app awareness tied in there. So we would make sure we present available paths from both sites to the host. As far as the host was concerned, it would see that same LUN of data, both on the site where the LUN was active as well as the site where the LUN was inactive. And in the event of some sort of an outage on the primary site, we would then make that LUN writable on the alternate site. So you had all these available paths, but really all the writes were happening in one location only.
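As a concrete illustration of the asynchronous flavor described above, here is a hedged Python sketch of a SnapMirror relationship spec with a five-minute schedule. The source/destination paths and policy name are invented placeholders shaped like an ONTAP REST payload, not a verified API contract.

```python
# Hypothetical sketch of configuring asynchronous replication from point A
# to point B on a five-minute cadence. Paths and policy name are
# placeholders for illustration only.
def async_mirror_spec(source, destination, schedule="5min"):
    """Build a relationship spec: replicate `source` to `destination`."""
    return {
        "source": {"path": source},            # e.g. "svm_prod:vol_data"
        "destination": {"path": destination},  # e.g. "svm_dr:vol_data_dst"
        "policy": {"name": "MirrorAllSnapshots"},
        "transfer_schedule": {"name": schedule},
    }

spec = async_mirror_spec("svm_prod:vol_data", "svm_dr:vol_data_dst")
print(spec["transfer_schedule"]["name"])
```

The schedule name is the knob Keith mentions: as low as every five minutes, or as infrequent as you want.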

So first off, we’re renaming SnapMirror Business Continuity to SnapMirror ActiveSync, and that is much more descriptive of what it actually does. In other words, it’s fully synchronous, but now it’s going to be active. And what we mean by active is having that LUN actually be writable at both locations. So if we ponder that a little bit, what the application server sees, if it’s connected to both sites, is active paths to that LUN, both to the primary location and to the secondary location. And they’re both active optimized, which means that app can write to either location, and the LUN is actually read/writable in either location.

Where it probably gets even more interesting is you may have a scenario where you have app servers on both sides and they’re only connected to one site or the other. And in that scenario, it’s still fully writable in both locations, and ONTAP will synchronously replicate in either direction, which is pretty mind blowing when you think about it, but has some really interesting use cases.

Justin Parisi: So what sort of use cases do you see customers using this for?

Keith Aasen: Well, the first big one is going to be VMware. VMware has Site Recovery Manager, which is really meant for that asynchronous, kind of active-passive model.

And although it works really well, you have to do some scripting and some automation. You also have to push the button, right? Somebody has to be there to declare a disaster or initiate failover. But VMware HA is entirely autonomous. And so, through VMware HA, if a host dies, VMware will recover the virtual machines that were running on that host on the surviving members of the cluster.

So what SnapMirror ActiveSync allows us to do is build a VMware HA cluster that spans sites. And what that means is if I lose vSphere hosts from one site, those VMs will automatically, autonomously pop up on the surviving site. So you get that sort of hands off recovery. Of course, that same stretched cluster also lets you use vMotion to move virtual machines while they’re running from one host to the other.

And so having SnapMirror ActiveSync actually gives us not just disaster recovery, but also disaster avoidance. So if you have a data center outage, maybe it’s power, cooling or a weather event, you could effectively evacuate your virtual machines out of one site to the other and then rebalance them seamlessly when that site comes back up.

So some really powerful flexibility specifically from a VMware standpoint.

Justin Parisi: Yeah, that sounds like a great option there for VMware. Is it able to utilize VMware cloud capabilities? Are you able to do this with a cloud instance or is this all strictly on-prem?

Keith Aasen: Strictly on-prem to begin with and that’s more of a limitation on the VMware side where you can’t put vSphere hosts that are on-prem and in the cloud together in an HA group.

Now, one of the other things we announced at Insight, sort of out of my technical realm, is the BlueXP disaster recovery adapter, which does allow that failover from on-prem to the cloud or cloud to on-prem. So we have that capability, but this one in particular is meant for on-prem to on-prem.

You still need to have the two sites be within seven milliseconds of each other, because every write is synchronously replicated to the other site. And of course, you need full control over those vSphere clusters, because you’re defining that HA cluster. So at least that use case is on-prem to on-prem.

But we do have maybe a more traditional disaster recovery tool to facilitate on-prem to the cloud and back again.

Justin Parisi: So I remember with SnapMirror Synchronous, if it falls outside of the realm of that latency, it’ll go to asynchronous. Is that what happens here, or does it do something else?

Keith Aasen: Well, SnapMirror Synchronous can drop back to that because the LUN is always read only on one site and read/write on the other. ActiveSync is a little bit different, because the LUN’s actually read/writable on both sites, and so the behavior has to adjust a little bit differently.

It still has the concept, though, of site bias. And what I mean by that is, if you do run into a scenario where replication becomes difficult and even communication between the sites becomes compromised, you can still bias that LUN to make sure it remains writable at one site or the other.

And that will ensure that the application remains up. It avoids communication issues, replication issues, or a split brain scenario where it can’t tell if the other site is still there or has failed. You can still apply a site bias to say, hey, if it all goes wonky, make sure you keep that LUN up on a given site.

But we have a mediator that also keeps track of what’s going on and helps ONTAP determine, hey, is this a replication issue or a communication issue, and what should I do? How should it behave? Generally it can sustain a fair amount of jitter, and that’s the other thing we identify: what happens if that latency spikes up a little bit? Because typically, the given blocks in that LUN are normally only being written from one site or the other. So yeah, it’s not quite as resilient as SnapMirror Synchronous, because it does have this concept of that LUN actually being writable at both locations, and there’s not really a concept of which side is primary. They’re both primary, right? There’s no primary/secondary; they’re both primary. So it’s a little bit different. It’s only available as a tech preview right now, but a lot more of that will be defined when it’s GA, and we’re targeting GA in 2024.
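The site bias and mediator behavior described above can be illustrated with a small decision function. To be clear, this is not NetApp’s actual algorithm; it is just a conceptual sketch of why a third-party mediator plus a preferred-site bias keeps exactly one writer alive and prevents split brain.

```python
# Conceptual model only: decide whether `site` should keep serving writes
# for an active/active LUN. `mediator_says_peer_alive` is None when the
# mediator itself is unreachable.
def lun_stays_writable(site, peer_reachable, mediator_says_peer_alive, biased_site):
    if peer_reachable:
        return True  # replication link is up: both sites stay active
    if mediator_says_peer_alive is None:
        # Peer AND mediator unreachable: fall back to the static site bias,
        # so exactly one site keeps the LUN up.
        return site == biased_site
    if not mediator_says_peer_alive:
        return True  # mediator confirms the peer failed: continue alone
    # Peer is alive but the inter-site link is cut: bias breaks the tie.
    return site == biased_site

# Healthy: both sites active.
assert lun_stays_writable("A", True, True, biased_site="A")
# Site A actually died: B continues even though it is not the biased site.
assert lun_stays_writable("B", False, False, biased_site="A")
# Link cut but both alive: only the biased site keeps writing.
assert not lun_stays_writable("B", False, True, biased_site="A")
```

Note how the mediator is what lets the non-biased site survive a real peer failure; without it, the static bias alone would have to decide.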

Justin Parisi: We’re dealing with LUNs. Do these have to be all SAN arrays or can they be AFFs?

Keith Aasen: Great question. Does not need to be all SAN arrays, but the two sides do need to be the same. In other words, I can do this between two AFFs or two ASAs, but I can’t cross the streams. I can’t go from an AFF to an ASA.

And again, that’s primarily because the pathing architecture we use is different on the two. Even though it’s the same LUN, I can’t path it differently on one side versus the other.

Justin Parisi: And I’m guessing no FAS support here.

Keith Aasen: No FAS support. We need the latency to be much lower than what FAS can do.

So yeah, just AFF and ASA.

Justin Parisi: And what about NAS? Is there a use case there for this or is it strictly SAN?

Keith Aasen: Strictly SAN today. There’s definitely a use case for NAS, but it’s strictly SAN right now. We really wanted to get to this stage on the SAN side, and so we decided, rather than trying to do two things at once, let’s get the SAN journey completed to this active-active state, and then we can go back and look at how we do this for the NAS world. So certainly NAS is on our wishlist of things to do. We just wanted to get the SAN journey completed before we start down the path of NAS.

Justin Parisi: So what else we got?

Keith Aasen: Ah, well, let’s shift gears a little bit and hop over into security. Now, security’s kind of a wide topic, because everything is security, right? We have to thread security through every activity.

But there’s one in particular that I’m a big fan of. I’m a huge security guy, and I’m not a massive scripting guy, but I’ve done a little PowerShell in my day, and I always found it a bit weird that when I create a script, the opening few lines of the script are where I would connect to the given cluster and authenticate.

And quite often that was done in plain text, both the username and password. And that always felt a little insecure. And it certainly is insecure. So ONTAP 9.14.1 is adding full support for OAuth 2.0. OAuth 2.0 is an open authorization standard that’s plugged into a lot of the major automation frameworks. For us, Ansible is a very common framework that’s used for automating ONTAP environments. And so what we’ve added in 9.14 is not only support for it, but right in System Manager, it’s super easy to create an authentication token.

So basically, you specify what permissions you want this token to have, how long the token should work for, when it needs to be renewed, et cetera, et cetera. Generate the token, and then that token can be used in your automation framework to authenticate against that particular system.

And as an ONTAP administrator, you still maintain full control. So if something changes, you can very easily go in and see the tokens you’ve handed out and change permissions, expire them, renew them, whatever it needs to be. So it’s a great way of maintaining control of your environment and making sure you’re never doing plain text usernames and passwords again.
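In practice, the automation script then carries only the token, never a username or password. Here is a minimal Python sketch using the standard OAuth 2.0 Bearer scheme against the ONTAP REST API; the cluster address and token value are placeholders, and the request is only built here, not sent.

```python
import urllib.request

# Placeholder access token issued from System Manager for your automation
# framework; in real scripts this would come from a secrets store, not a
# hard-coded string.
token = "eyJhbGciOi..."

req = urllib.request.Request(
    "https://cluster.example.com/api/storage/volumes",  # placeholder cluster
    headers={
        "Authorization": f"Bearer {token}",  # no credentials in the script
        "Accept": "application/json",
    },
)
# The request object is constructed but intentionally never sent here.
print(req.get_header("Authorization"))
```

The security win Keith describes is exactly this: the token is scoped and expirable by the admin, whereas a plain-text password in the first lines of a script is neither.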

Justin Parisi: What if I like doing plain text passwords?

Keith Aasen: And your security guys are probably not sending you a Christmas card this year.

Justin Parisi: Probably not. If they did, it would be like encrypted anyway. So… I can’t even read this!

Keith Aasen: Very angry Christmas cards. Stop doing that. Please stop doing that. Yeah. There’s a ton of other little security things in there. Like I said, you just have to keep at it in a bunch of different areas. The other one that people may have been waiting for: we added in support a few versions ago for Cisco Duo for multi-factor authentication. We added it into System Manager, but it wasn’t there for SSH. So for you command line heads out there, we didn’t have that, but that’s being added as well. So you can make sure that even your SSH connections are multi-factor authenticated, and you can use Cisco Duo for that. And we have a number of other authentication frameworks we do want to plug in for that. It’s a surprising amount of work supporting an authentication framework.

So we’re kind of working our way through the list, but we do want to expand that list pretty quickly.

Justin Parisi: So I know that with ONTAP, and when I was working with it a lot, you’d get questions every now and then like, security teams run their scanners, they detect a vulnerability, but it doesn’t apply to us because, reasons, right?

Maybe it’s proprietary or maybe it doesn’t fit into whatever vulnerability is there, but we still have to address those things. So did we do anything in this release to address that?

Keith Aasen: For sure. And you probably remember, like, the early days where a scanner would identify it as a Linux system or identify it as a BSD system and assume it was all the same. Well, no, it’s not that.

But yes, for example, ONTAP internally uses BIND9 as our internal DNS. We maintain patching to make sure it’s secure, but if you did a security scan on an ONTAP system looking specifically for BIND9 exposures, we would identify as an at-risk system.

Now, we’d patched it, but again, you never want to appear on the vulnerability scan, right? You have to do a bunch of hand waving and justification. So in 9.14.1 we’ve upgraded the BIND version to pass those security scans, and that will make it easier for us to upgrade it again if more vulnerabilities are found.

So just a good thing to do. On the surface, you don’t notice anything different, but rest assured we’re doing a lot of these sort of upgrades and enhancements that will just keep ONTAP that much more secure.

Justin Parisi: All right. Do we have anything else for security?

Keith Aasen: Nothing earth shattering, but why don’t we switch over to cost optimization? Because there are a couple of big ones in there that might be fun to talk about.

Justin Parisi: Yeah, we like to help people save money there.

Keith Aasen: Yeah. So how about free storage? That’s always a good one.

Justin Parisi: Whoa. Slow your roll there, man.

Keith Aasen: Well, so a couple of interesting things. I give our engineers all the credit in the world. They have this audacious goal that every version of ONTAP should have higher performance and better storage efficiency. It sort of was mind blowing when you go, hey, we’re going to add in new features and make it faster and more storage efficient, and they do it. Just upgrading to newer versions of ONTAP, you should always get more throughput or IO out of your system, and you should always get better storage efficiencies. And that’s going to continue. We see that going down the road, and the storage efficiencies come in different flavors.

In the early days of ONTAP, the old spinning disk days, there was a challenge. ONTAP optimized its write performance by using WAFL to do large data stripes to the disks, and that’s how we maintained this really fast write performance. But it only worked if we had large areas of disk to write to. And so we always said, oh, don’t fill up the system, your performance will suffer. And so we actually would hide away some capacity in what we refer to as a WAFL reserve to make sure that performance continued.

Well, fast forward a decade or two, and ONTAP engineering goes, hey, you know, we don’t actually need these giant chunks of space anymore, right? We can fill up a system pretty full, and we don’t need that much space to maintain that write performance. So in ONTAP 9.12.1, for any all flash system, when it was upgraded to 9.12 you immediately got 5 percent more usable space back. We reduced the WAFL reserve by 5 percent, and that yielded 5 percent more usable. Funny story. When I was at Insight, I had a guy come up and go, hey, I really appreciate Insight in person, it’s been a few years. By the way, 9.12.1 caused chaos, because we did an ONTAP upgrade on our production system and full chaos ensued, because suddenly it looked like we had lost 100 terabytes of data. We knew exactly how much data we had in that system, and all of a sudden it looked like there was a hundred terabytes less. Well, it turns out they didn’t lose any data. Instead, we just gave them 100 terabytes more usable space with that 5 percent reserve reduction, and all was well. So safety note: read the release notes and make sure you know how this is going to change before doing an upgrade.

I was like, Oh, I’m really sorry for scaring you like that, but Hey, 100 terabytes of free disk.
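The arithmetic behind that scare is simple: the freed reserve shows up directly as extra usable (free) space after the upgrade. The system size below is an assumption chosen so the numbers match the anecdote, not a figure from the episode.

```python
# Back-of-the-envelope math for the "missing" 100 TB story. Assumes an
# illustrative ~2 PB system; the 5 points of WAFL reserve returned in
# 9.12.1 (and again for FAS in 9.14.1) land straight in free space.
aggregate_tb = 2000            # ~2 PB of capacity (illustrative assumption)
reserve_points_returned = 5    # WAFL reserve shrunk by 5 percentage points

extra_usable_tb = aggregate_tb * reserve_points_returned // 100
print(extra_usable_tb)  # free space jumps by this many TB after upgrade
```

Which is exactly why a careful admin watching free-space counters saw a sudden, alarming 100 TB swing.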

Justin Parisi: So I guess they thought stuff got deleted or something.

Keith Aasen: Exactly. Yeah. They knew how much free space they had. Suddenly there was more. So I was like, Oh no, what went away? But no, they just got more usable.

Justin Parisi: That’s what they get for paying attention. Like they should just ignore it.

Keith Aasen: I have no idea how much free space I have. Yeah, that’s exactly it. Serves them right for being careful.

Justin Parisi: Responsible admins.

Keith Aasen: And I could wrap my brain around that on flash, right?

Because flash doesn’t need these big, long stripes in order to optimize read/write heads. But spinning disk does. And yet our engineers have still figured out a way to say, hey, we don’t actually need that much reserve even on spinning disk systems anymore. So in 9.14.1, if you upgrade your FAS system, you’ll immediately get 5 percent more usable there. And there are some big FAS systems out there, right? Multi petabyte FAS systems, so this could be a lot of additional free space for them. So pretty exciting there.

Justin Parisi: I imagine some of that has to do with the disk sizes themselves, right? Cause I mean they’re big enough where they can accommodate the stripes, so you don’t have to have so much percentage taken up.

Keith Aasen: Yeah, I would think so. You’re not thrashing around on little one or two terabyte drives anymore. You’ve got these much larger drives, and yeah, you’re sending a lot more data per transaction to them. It’s just sort of mind blowing. That’s something that gets missed a lot. People look at the storage efficiency ratios, but you have to look at everything from raw to effective, and ONTAP still absolutely shines when you look at raw to usable. And this only makes that better, by 5 percent, which is pretty powerful.

Justin Parisi: Yeah, absolutely. All right. How else are we saving money?

Keith Aasen: One that I know we’ve had a lot of people use, and I love when we have a feature that just works. When we talk with people and go, hey, are you using this? Yeah, I’m using this. How is it? It just does what it says. And that’s FabricPool. So if you remember back in 9.8, we introduced this idea of FabricPool, which is identifying cold data blocks, packaging those up, getting them off of your high performance flash and getting them onto what we refer to as a cloud tier.

Now, that cloud tier is really any object store. Obviously, we’d love it to be StorageGRID on-prem, or ONTAP to ONTAP, so AFF to FAS is also a really great option here. But it can certainly be an object store in the cloud. And obviously, ONTAP can be running in the cloud too, and then that’s moving between two different types of storage, right?

If ONTAP is running on direct attached flash as a CVO instance, it also means that we can package that up and send it to the blob or object store in the cloud there as well. But it’s something we’ve had for a number of releases. Super easy to manage, really effective for moving cold data off of high performance, ergo more expensive, media and putting it on something that’s priced right.

And since we introduced it, it’s tiered over an exabyte of cold data off of high performance media, which is a lot. That’s a lot of flash we’ve saved customers from having to buy.

Justin Parisi: Yeah, absolutely. And it also gives you a way to dabble in the cloud if you’re not really comfortable going directly to the cloud. And then when you’re in the cloud, you can have your performance tier being on that more expensive flash storage, right? Your NVMe disks. And then you can tier off to whatever lower cost thing you’ve got, whether it’s Azure Blob or AWS S3 or whatever. So, it’s a way to reduce those costs without having to do a lot of overhead and management.

Keith Aasen: Absolutely, you set a super simple policy and off you go. Because of that, we haven’t really done much with FabricPool in the last few releases, but in 9.14 we saw some areas we could enhance, so we’ve tweaked it a bit and added in some functionality. The first one: you can think of FabricPool as a form of thin provisioning or over provisioning the flash, and so people always worry, what if I start to run out of space on that flash? It doesn’t happen that often, but maybe something has changed where you’re suddenly sending a lot more data, or some process pulled some data back.

Normally FabricPool runs entirely in the background. It’s opportunistic, right? So if I’ve got spare CPU cycles, I’ll go look for some cold data and send that off because it’s not critical. It doesn’t have to happen in a really narrow window. But if I’m starting to run out of capacity, and I might have an out of space condition, then things are getting pretty critical.

And at that point, ONTAP will now prioritize this hunt for cold data. So if you’re starting to get really tight on space, we will increase the prioritization of FabricPool and try to move some cold data off to avoid that out of space condition, which would be a bad situation to be in.

So I love the fact that this just automatically, or autonomously, starts to happen.
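Conceptually, the behavior just described is a priority switch on the cold-data scan: background and opportunistic most of the time, urgent when free space gets tight. The sketch below is an illustration only; the 10 percent threshold is an invented value, not a documented ONTAP setting.

```python
# Conceptual model of the 9.14.1 FabricPool change: tiering normally runs
# opportunistically on spare CPU cycles, but is promoted to high priority
# as the performance tier approaches an out-of-space condition.
def tiering_priority(free_fraction, low_space_threshold=0.10):
    """Return how aggressively the cold-data scan should run."""
    if free_fraction <= low_space_threshold:
        return "urgent"         # push cold blocks to the cloud tier now
    return "opportunistic"      # background work on spare cycles only

# Plenty of room: stay in the background.
assert tiering_priority(0.40) == "opportunistic"
# Nearly full: escalate to avoid running out of space.
assert tiering_priority(0.05) == "urgent"
```

The point of the real feature is exactly this escalation happening autonomously, with no admin intervention.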

The other enhancement is, Hey, I love this idea of FabricPool. Today, my data is sitting on vendor X, Y, Z. And I know most of it’s cold. How do I get there without having to buy all that flash? Because normally you’d have to have enough capacity on the flash to land all of the data there if you do a migration and then let it cool.

So you can’t sit there and wait until you start to get that space back. That’s not optimal. So now we have this ability called cloud write, which is really meant for migrations. If I’m migrating data in, all of the data immediately goes to that cloud tier and bypasses the hot tier altogether.

And then once the migration is done, your normal programming resumes. All your hot data lands on the hot tier. And things get prioritized between the two. So it’s a great way to do a migration off of third party storage into a FabricPool architecture without having to invest.

You don’t have to buy that capacity or all that flash to begin with. You can start with the 20 percent you probably actually need.

Justin Parisi: So you have a third party system and you leverage the tiering to tier to an S3. And it just bypasses the hot tier. Is that right?

Keith Aasen: So say you had a petabyte of unstructured data. And you know that 80 percent of it’s cold. So if I was to architect the system for ONTAP, I know that I need 200 terabytes of flash, and then I can have 800 terabytes of cold data, right?

But if I want to migrate that to ONTAP, it doesn’t quite work that way, because I only have the 200 terabytes of flash. I need to land that petabyte of data somewhere and then wait for it to cool, so that the 200 terabytes of flash and 800 terabytes of object will work eventually.

But how do I get to that state? Well, I could put it into this cloud write mode. And as I’m migrating data onto ONTAP, all of that data goes directly into the cloud tier. So I drop the full petabyte into the capacity tier, leaving my 200 terabytes free on the flash.

And then normal operations begin and all new writes will land on flash and things will get promoted. And I can begin the normal ecosystem where 80 percent of my data is cold and 20 percent of it is hot. So it saves me from having to buy the full capacity of flash. And I can jump right to the good part, which is the end state of balancing between the two.
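The capacity math Keith walks through can be sketched as a quick back-of-the-envelope calculation. This is just an illustration of the sizing argument from the example above (1 PB of data, roughly 80 percent cold), not an official sizing tool:

```python
# Illustrative capacity math for the migration scenario described above.
# Numbers come from the example in the conversation: ~1 PB of
# unstructured data, of which ~80% is known to be cold.

TOTAL_TB = 1000          # ~1 PB of unstructured data to migrate
COLD_FRACTION = 0.8      # ~80% of it is cold

cold_tb = TOTAL_TB * COLD_FRACTION   # lands on the object (capacity) tier
hot_tb = TOTAL_TB - cold_tb          # what the flash (performance) tier must hold

# Without cloud write: the full data set must land on flash first,
# then cool off and tier down over time.
flash_needed_without = TOTAL_TB

# With cloud write: the migration bypasses flash entirely, so flash is
# sized only for the steady-state hot working set.
flash_needed_with = hot_tb

print(f"Flash required without cloud write: {flash_needed_without} TB")
print(f"Flash required with cloud write:    {flash_needed_with} TB")
print(f"Object tier at steady state:        {cold_tb} TB")
```

In this example, cloud write cuts the flash you have to buy up front from 1,000 TB down to the 200 TB you actually need at steady state.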

Justin Parisi: It also eliminates the need for swing gear, right? So if you’re trying to do a migration and you borrow gear and you have to give it back and all that good stuff, so you just leverage the cloud and it goes right to it. So I was thinking it was going from the third party system directly to the cloud object, but it’s actually like a proxy, like the ONTAP system’s a proxy, you copy it using normal means, like, say, an rsync or an XCP, and then it just goes right to cloud. It doesn’t even stop on the hot tier.

Keith Aasen: You got it. Exactly. It still goes into ONTAP. We just don’t ever land it on media. We just package it directly up and send it to the object tier immediately, leaving the metadata behind, right? Because you want that metadata there. But yeah…

Justin Parisi: all the inodes get populated. You get the file counts and all that stuff, but the data itself just gets automatically moved.

Keith Aasen: Yep, exactly. Exactly. Pretty slick, right?

Justin Parisi: It is pretty slick. Sounds like it might take a while though.

Keith Aasen: Right? Yeah. Yeah. It’s still going to take the time of the migration.

But not having to worry about swing gear’s a prime example. Also think about it in things like CVO. You want to migrate something from third party into CVO. You can use this there too, right? Where you have the 20 percent in the flash tier and then the rest of it’s in the object.

But you can have that same migration where CVO would send the data blocks directly to the object tier. And then CVO would have that automatic tiering. So it’s as useful in the cloud as it is on-prem.

Justin Parisi: Cool. So do we have any other FabricPool enhancements here that we can talk about?

Keith Aasen: Yeah, yeah, one more. Now, when we first designed FabricPool, we designed it with blocks on-prem and a cold tier sitting in the cloud in mind. And that certainly exists. But as we just described, quite often it’s all on-prem or all in the cloud. So on-prem ONTAP to StorageGRID is super common. ONTAP to ONTAP is also really popular from a tiering standpoint.

And then in the cloud, wherever you’re running CVO it’s super popular to tier inside that cloud. And when we were thinking about on-prem to the cloud, it’s super cheap to send data to the cloud, but you want to be really careful about egress. When you leave from the cloud to go back on-prem, you want to be really careful about how you do that.

So FabricPool is super unique. If you asked for a 4K block, we would just fetch you that 4K block back. And so that minimized your egress costs. But if I’m doing on-prem to on-prem, or I’m in the cloud doing cloud to cloud, there are no egress costs. So we can be a lot more opportunistic about fetching data back.

In other words, if I ask for a portion of a file, let’s pull that whole file back. There’s a pretty good chance that you’re going to ask for the rest of that file. If I have written a bunch of files in time proximity of each other and I ask for one of them back, maybe we should pull those other ones back as well, because there’s a good chance if you wrote them together, you might read them together. So we use a lot of those algorithms now to improve the performance, and it’s pretty drastic. We’re looking at a 500 percent improvement on single-file read performance, and as much as 85 percent on multi-file read performance. So pretty potent. You do need to pick that mode as you set up your FabricPool: if you’re staying on-prem, or you’re staying in the cloud tiering to the same cloud, you want to pick this optimized mode. And you can change it after the fact. If you’re already staying on-prem, this is a tweak you can make in 9.14 to just improve your read performance. The irony is I haven’t really had too many people complain about read performance, because usually that data, when it gets cold, it just stays cold. But this is good peace of mind that if you do need some of that cold data back, it just works that much better.
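The prefetch behavior Keith describes can be modeled as a simple heuristic: when one cold file is touched, also warm up the files written close to it in time. This is a toy sketch with hypothetical file names and a made-up time window, purely to illustrate the temporal-proximity idea, not ONTAP’s actual algorithm:

```python
# Toy model of opportunistic read-ahead: when a cold file is requested
# and egress cost is not a concern (on-prem-to-on-prem or
# cloud-to-cloud), fetch the whole file plus files written close
# together in time. Illustrative only; not ONTAP's real implementation.

from dataclasses import dataclass

@dataclass
class ColdFile:
    name: str
    written_at: float  # write timestamp, seconds

def files_to_prefetch(requested: ColdFile, cold_files: list[ColdFile],
                      window_s: float = 60.0) -> list[str]:
    """Return the requested file plus any files written within
    window_s seconds of it (temporal-proximity heuristic)."""
    return [f.name for f in cold_files
            if abs(f.written_at - requested.written_at) <= window_s]

files = [ColdFile("texture_a.png", 100.0),
         ColdFile("texture_b.png", 130.0),
         ColdFile("report_q3.pdf", 5000.0)]

# Reading part of texture_a also warms texture_b, written 30s later,
# but leaves the unrelated report cold.
print(files_to_prefetch(files[0], files))
```

The design intuition is exactly what Keith says: files written together tend to be read together, so when egress is free, it is cheaper to over-fetch than to pay the latency of a second cold read.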

Justin Parisi: Well, it also kind of opens it up to more people because I think some of the concern is the read performance doing this type of work. So if you want to be able to tier things off, but you don’t want it to be slow when it comes back, this might entice you to actually leverage it because now that read performance is not as big of a problem as it was.

And you might say, Oh, okay, this can work for certain use cases. If I do like a monthly report, I don’t have to wait so long for the data to come back. It’ll be smart enough to pull stuff back a lot faster.

Keith Aasen: That’s it entirely. And certain customers, like customers working in media, have these large texture files or these large images that have been rendered and get used in a movie or in a game, and then they go cold. And so you want to get that off of the high performance media because you don’t need it to be high performance anymore, but you want to keep it there. And now if somebody touches a part of one of those files, we’ll fetch the whole file back, and it will drastically reduce the time to rewarm it if they suddenly need that texture file or that bitmap or that image again.

So it’s going to be ideal. Those customers are already using FabricPool to a tremendous success and this only makes it that much better for them.

Justin Parisi: Yeah. And it’s basically treating those files more like objects cause that’s kind of how an object store works anyway. You’re not fetching parts of files. You’re getting the entire file.

Keith Aasen: Yeah, yeah, exactly. And it’s just making ONTAP more aware of its deployment. Oh, I’m on an on-prem only deployment, and therefore I’m going to handle data this way. Or I’m actually aware that I’m on-prem, but sending data to the cloud.

And so I know how to optimize my behavior to minimize customers’ costs in that scenario. So making ONTAP really much more self-aware of how it’s deployed, it’s pretty cool and pretty crazy when you think about it.

Justin Parisi: Can you customize it at a granular level? Like maybe I don’t want to bring an entire group of files over. Maybe I just want to bring over the single file. Or is that just something that’s baked into the entire thing?

Keith Aasen: Right now it’s just baked in. God, Justin, you want all the nerd knobs, don’t you?

Justin Parisi: I like nerd knobs, which is funny. Cause I’m over in the land of no nerd knobs. I’m over in the cloud group, right?

No nerd knobs allowed.

Keith Aasen: No, we’ve tried to keep this pretty simple, right? So try to just have it as a toggle switch. No nerd knobs, but with some toggle switches.

Justin Parisi: All right, cool. So that’s the FabricPool feature functionality. Any other cost optimization benefits that we see here?

Keith Aasen: Those are the big ones. Now, I see we’re getting tight on time, so I’ll throw out some teasers. We’ve got some big things planned around cost optimization. So definitely looking forward to hopefully getting invited back again in a couple of months’ time.

Maybe the early spring/May time frame would be a good time for us to talk about storage efficiencies, because we got some cool things planned in that space. So yeah, absolutely. Stay tuned there.

Justin Parisi: Well, if I want to find more information about the latest release of ONTAP, where would I do that?

Keith Aasen: I would say NetApp.TV is a great one for all kinds of details, and a lot of the different deep dive sessions get posted up on NetApp.TV and the YouTube channel. Join Discord. The guys have got me hanging out on Discord. There’s a NetApp channel on Discord, and if you have questions, I try to lurk there to help folks out.

Otherwise, you can always shoot me an email. It’s pretty easy. It’s KeithA@netapp.com. Always happy to hear from folks on that.

Justin Parisi: All right. Awesome. And your Insight session, is that available for viewing now?

Keith Aasen: Maybe by the time we post this, it might be. I know it was recorded, but I haven’t seen it posted just yet. I think because 9.14.1 is just being uploaded as we speak. So hopefully shortly after that’s posted, the Insight sessions will be up.

Justin Parisi: All right, cool. Well, if it comes out between now and the time we publish this, we’ll go ahead and add that to the blog.

Keith Aasen: That was fantastic. The power of doing things in the future.

Justin Parisi: That’s right. Time machine. Alright, well Keith, thanks again for joining us and talking to us all about the latest release of ONTAP, ONTAP 9.14.1.

Alright, that music tells me it’s time to go. If you’d like to get in touch with us, send us an email to podcast@netapp.com or send us a tweet @NetApp. As always, if you’d like to subscribe, find us on iTunes, Spotify, Google Play, iHeartRadio, SoundCloud, Stitcher, or via techontappodcast.com. If you liked the show today, leave us a review. On behalf of the entire Tech ONTAP podcast team, I’d like to thank Keith Aasen for joining us today.

As always, thanks for listening.

Podcast Intro/outro: [Outro]

 
