Welcome to Episode 354, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”
This week, Semion Mazor (semion@netapp.com) and Cecile Kellam (cecilek@netapp.com) join us to discuss the new addition to BlueXP Cloud Backup – FlexGroup volumes! Now you can back up, index, and restore your high file count environments with better speed and efficiency, and simplify restores with search functionality.
For more information:
- Cloud Backup and FlexGroup Integration: Petabyte Backup Solved
- Fighting Ransomware with NetApp BlueXP Backup and Recovery
Tech ONTAP Community
We also now have a presence on the NetApp Communities page. You can subscribe there to get emails when we have new episodes.
Finding the Podcast
You can find this week’s episode here:
I’ve also resurrected the YouTube playlist. You can find this week’s episode here:
You can also find the Tech ONTAP Podcast on:
I also recently got asked how to leverage RSS for the podcast. You can do that here:
http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss
Transcription
The following transcript was generated using Descript’s speech to text service and then further edited. As it is AI generated, YMMV.
Episode 354: FlexGroup Volumes and BlueXP Cloud Backup
Tech ONTAP Podcast Episode 354 – FlexGroup Volumes and BlueXP Cloud Backup
===
Justin Parisi: This week on the Tech ONTAP Podcast, we talk about FlexGroups and where they fit in to BlueXP Cloud Backup.
Podcast intro/outro: [Podcast intro]
Justin Parisi: Hello and welcome to the Tech ONTAP Podcast. My name is Justin Parisi. I’m here in the basement of my house and with me today on this very new year, we have Semion Mazor. So Semion, what do you do here at NetApp? How do we reach you?
Semion Mazor: Hi, Justin. Thank you for having me. I’m Semion Mazor. I’m a product marketing manager in the NetApp Innovation Center in Tel Aviv, and you can reach me at semion@netapp.com or on my LinkedIn.
Justin Parisi: All right, excellent. Also with us here today, we have Cecile Kellam. So Cecile, what do you do here at NetApp? How do we reach you?
Cecile Kellam: Hi, Justin. Yeah, thank you for having me. I am part of a small team here that covers a group of products we collectively refer to as the Data Services that are very use case based. And we kind of serve as a go-between between Semion’s team of amazing developers in Tel Aviv and our customers and the other sales teams here to make sure we continue to grow these products to be best in class enterprise grade solutions for our NetApp customers.
Justin Parisi: All right, so you may recognize Semion and Cecile from previous podcasts where we talked about BlueXP, and we’ll get to that. But first we wanna talk about the Data Services piece of this – the actual data piece of all these solutions. And really what we’re referring to here is big data, right?
Like giant data lakes or large repositories of data. So to start off with, Semion, what sort of industries do you see out there that have the most data problems or large data sets that they need to deal with?
Semion Mazor: Yeah. So we used to talk about big data, like something new. Today it’s all over.
Any company has lots of data and when we say "data," we mean big data. But specifically I think we can talk about EDA – electronic design automation companies, media and entertainment, oil & gas companies, companies that deal with machine learning and AI, manufacturing, and automotive.
Those are several examples of industries that create a huge amount of data that they need to maintain and also protect.
Justin Parisi: Yeah, we’re also looking at things like healthcare, like being able to store medical images and that sort of thing. And I guess that kind of ties into the AI/ML piece, because they’re starting to use that more and more today.
Those are very large data sets, but really what we’re talking about when we talk about big data is unstructured data, data that has a lot of sprawl. Like you don’t really have a lot of organization. There’s folders everywhere, there’s files everywhere. It’s very hard to manage.
We’re not talking about structured data, which is more like your databases and that sort of thing. What sort of challenges are there for these companies with these large data sets and all that sprawl? What are they needing to do to deal with that stuff?
Semion Mazor: Yeah, so I think that the challenge, as you said, is NAS data, which in many cases is a lot of files, large or small, and you need a very efficient solution to host that data and to be able to call it and get a response very quickly, very efficiently to support the operations of the company.
Cecile Kellam: In talking to customers now, it used to be that we talked about customers having maybe a couple dozen terabytes, and now we’re talking to people with not just petabytes, but projected to have zettabytes of data out there.
And being able to manage that data and move that data is becoming increasingly more complex and just a huge strain on people resources and IT resources.
Justin Parisi: You can really buy a JBOD and just throw a bunch of disks at it and call it a day. Right? But the problem is not capacity. The problem is data sprawl. It’s management of that data. It’s storage efficiencies, it’s replication, it’s moving things in and out of the cloud. So NetApp is really approaching those problems in big data and trying to come up with different solutions for each of those. One of the main challenges that people have is the ability to back up and recover that data.
With such a large data set, with such important data, you wanna be able to back it up effectively. So what does NetApp offer today that allows you to take these giant data sets and back them up efficiently and recover them efficiently?
Cecile Kellam: The problem that you’re talking about, Justin, is exactly what we designed Cloud Backup Service, now referred to as BlueXP Backup, to solve for our customers. These huge environments and data sets that we need to be able to back up within a certain window, you know, to hit a certain RPO or RTO that our customers are beholden to in order for them to have that peace of mind, that if worst case scenario happens, they’ve got that additional backup copy that they can get their organization back up and running within those timeframes that they have to adhere to.
Justin Parisi: So you just threw a couple of acronyms at me, RTO and RPO. So I know what they are, but for those of us who do not know what they are, can you kind of enlighten us here?
Cecile Kellam: Yeah, absolutely. Your recovery point objective and your recovery time objective are both measures of how quickly you need to have that data recovered.
So there’s gonna be certain workloads that are not as important, that might have a longer timeframe that allows you to use a less efficient backup methodology. But for your critical workloads, those primary workloads, your RPO and RTO are probably gonna need to be as close to zero amount of time as possible so you can have that backup copy back up into production so your business can get back up and running.
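To make the two acronyms concrete: RPO is usually defined as the maximum amount of data, measured in time, that a business can afford to lose, while RTO is the maximum time it can take to get back up and running. The Python sketch below checks both against a backup schedule. The function names and the example timestamps are hypothetical and are not part of any NetApp tool.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data written since the last good backup fits within the RPO."""
    return failure - last_backup <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """True if the restore completes within the RTO."""
    return restore_duration <= rto


# Hypothetical incident: last backup at 02:00, failure at 09:30 the same day.
last_backup = datetime(2023, 1, 10, 2, 0)
failure = datetime(2023, 1, 10, 9, 30)

# A less critical workload with a 24-hour RPO is fine with a nightly backup.
print(meets_rpo(last_backup, failure, rpo=timedelta(hours=24)))
# A critical workload with a 1-hour RPO is not: 7.5 hours of changes are lost.
print(meets_rpo(last_backup, failure, rpo=timedelta(hours=1)))
# A 3-hour restore fits inside a 4-hour RTO.
print(meets_rto(timedelta(hours=3), rto=timedelta(hours=4)))
```

The takeaway is that RPO drives how often you back up, while RTO drives how fast the restore path has to be.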
Justin Parisi: And really, when you’re talking about RTO and RPO, it isn’t like a monolith, right? Not everyone’s gonna have the same requirements, because every business is different. Every data set is different. Some are more important than others. So what does Cloud Backup do to help address those differences in needs?
Semion Mazor: Cloud Backup utilizes a lot of technology that NetApp has developed over a long time, and takes advantage of that to create a reliable, secure backup copy. It works at the block level, meaning it doesn’t look at the whole file, but at its blocks, and moves them very efficiently. It preserves all the storage efficiencies that NetApp provides. It is incremental forever, which means that it moves only the changed blocks of the files, and it doesn’t require any re-baseline. It’s also a direct backup, so there’s no media gateway that can be a single point of failure and can also affect the security and the encryption, so here, it’s all end-to-end encrypted. And all of that makes it very, very efficient and able to move large amounts of data in a reasonable timeframe, unlike other solutions.
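As a rough illustration of what "block level" and "only the changed blocks" mean, here is a minimal Python sketch. It compares two versions of a file block by block using hashes and returns only the blocks that differ. The 4 KiB block size, the hashing approach, and the function names are assumptions made for the example; this is not a description of ONTAP or SnapMirror internals.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block} for blocks of `new` that differ from `old`.

    Blocks that exist only in `new` count as changed. Truncation of the
    file is not handled in this sketch.
    """
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old, size)]
    delta = {}
    for idx, block in enumerate(split_blocks(new, size)):
        if idx >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[idx]:
            delta[idx] = block
    return delta


# Four blocks of zeros, then flip one byte that lives in the second block.
baseline = bytes(BLOCK_SIZE * 4)
modified = bytearray(baseline)
modified[5000] = 1

delta = changed_blocks(baseline, bytes(modified))
print(f"{len(delta)} of {len(split_blocks(baseline))} blocks need to be transferred")
```

A file-level backup would resend the whole file for that one-byte change; a block-level incremental only moves the single block that differs.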
Justin Parisi: So I guess my question was more, when you have some data sets that are more critical than others, do we have different levels that we can apply in Cloud Backup?
Do we have different RTO/RPO settings where we can say, okay, I want this to be kept up to the minute, but these can be kept seven days or two weeks, whatever.
Semion Mazor: Yeah, sure. Part of the solution is that you can apply the policy that you want and maintain the data according to the policy that you want to meet, and also be able to tier the data and move it.
The data that is less critical you can move to an archival tier and get much cheaper storage for it. This is a use case mainly for long-term retention, when you need to have a copy in use cases where regulations are relevant. So sure, you can manage the different types of data according to the needs of the company to support the business.
Justin Parisi: So you mentioned tiering and that that’s gonna be, I guess, FabricPool where you can do your storage efficiency tiering. What about presenting things as objects themselves? Like taking that NAS data and moving it somewhere where I can serve it out as an S3 object.
Semion Mazor: Yeah, so we also have tiering, of course, as part of BlueXP. What I was referring to is having the object storage and moving that data through the archival tiers of the cloud services, meaning part of the data. For example, let’s talk about AWS. So you can, from your ONTAP, change the data to object storage and store part of it on S3, part of it on Glacier, part of it on Deep Glacier.
And depending on how frequently you need the data, or on the policy that you define, you just store it on different tiers in the cloud provider.
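The idea of a policy that places backups on different tiers can be sketched in a few lines of Python. The age thresholds, the rule structure, and the function name below are hypothetical and chosen only for illustration; the tier names are AWS storage class names, and the actual policy options in BlueXP may look quite different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRule:
    min_age_days: int  # rule applies to backups at least this old
    tier: str


# Hypothetical policy: recent backups stay warm, older ones move to archive.
RULES = [
    TierRule(0, "S3 Standard"),
    TierRule(30, "S3 Glacier"),
    TierRule(180, "S3 Glacier Deep Archive"),
]


def pick_tier(age_days: int, rules: list[TierRule] = RULES) -> str:
    """Pick the rule with the highest age threshold that the backup has reached."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    eligible = [r for r in rules if age_days >= r.min_age_days]
    return max(eligible, key=lambda r: r.min_age_days).tier


for age in (1, 45, 400):
    print(age, "days ->", pick_tier(age))
```

The colder the tier, the cheaper the storage and the slower the retrieval, which is why this pattern suits long-term retention copies that are rarely read.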
Cecile Kellam: Yeah, to Semion’s point on the archival tiers, a lot of companies like oil, gas, healthcare, software development, the things that they’re building and storing are highly important to their organization.
And sometimes that backup copy might be a third or fourth copy, and they definitely need to have it, but they don’t need to have it sitting on-prem and taking up that valuable space when they could be more efficiently storing it in archival tier storage, which is just pennies on the dollar when you’re looking at that compared to traditional warm object storage, if you will.
Semion Mazor: Yeah. Think about a company like an insurance company or a company that deals with legal, and they must keep another copy of the data for seven years or in some cases, even 25 years. And in most cases, they will not touch the data, but they still have to have it and be able to get access to it.
And when you have a lot of such data, it can be very costly. So the ability to keep it in a secure way, but still pay as little as you can, is very important. And we have customers that did research on what the best options are and found that using BlueXP Backup gives the best TCO they can get for such a use case.
Justin Parisi: So, I mentioned big data and unstructured NAS and that sort of thing. And what this usually takes is lots and lots of storage, right? And with FlexVols, we can do things up to, I think now is 300 terabytes in 9.12.1. However, I mean there are data sets that expand well beyond that. And for those use cases, we need a bigger bucket.
So does Cloud Backup, does BlueXP support things like FlexGroup volumes?
Semion Mazor: Yeah, and this is a new announcement, a very important announcement, that Cloud Backup recently added support for FlexGroups. Until now, the ability was to replicate data from one ONTAP to another ONTAP. And now with Cloud Backup, the option is to take the data that’s stored on FlexGroups and change it to object storage and keep it as a second, third, fourth copy of the data in object storage in the cloud or on premises. I think this is a very important announcement, to keep this important information in object storage and basically also to align with the 3-2-1 backup strategy.
Cecile Kellam: To add on to that, over the past couple of years that I have been working on these products, the ability to support FlexGroup volumes has probably been the most widely requested enhancement that we have been working towards with our backup solution, because we wanna be able to provide that almost, I don’t wanna say instantaneous, but the fastest possible backup and recovery solution for our ONTAP customers that are dealing with massive unstructured data sets and need to be able to back them up. But currently, the only ways that they have to do that require just huge strains on a system or huge amounts of time.
Justin Parisi: So with the FlexGroup volumes themselves, are they targets, are they sources or are they both with this solution?
Semion Mazor: In this solution they are the source. It takes FlexGroups as a source storage on ONTAP and moves the data to object storage.
Justin Parisi: I guess the better question is what is the object storage? Is that any object storage? Does it include ONTAP or is it strictly like StorageGRID and AWS and that sort of thing?
Semion Mazor: So the target can be one of the major cloud providers, so it can be object storage on AWS, Azure, or GCP (Google Cloud Platform). Or it can be StorageGRID. Later, ONTAP S3 will also be supported.
Justin Parisi: It’s on the roadmap. Okay, cool. So ultimately a FlexGroup could also be a target because the S3 buckets in ONTAP are FlexGroup volumes. So there is that aspect of it. That’s great. I mean, it sounds like we’ve got a pretty good overall solution for these very large data sets.
I would like to ask, how is using BlueXP Cloud Backup unique in the data management capabilities? Is there an index that we can reference to? Is it easily searchable? How do we access that data for recovery?
Semion Mazor: It sounds like you have a pretty good familiarity with the product. So yeah, in terms of restore, besides the efficiency of the restore and the speed that it can provide, BlueXP Backup also provides a browsable and searchable catalog, which is very important, especially at large scale.
So if you need to restore a backup that was done recently, you probably can remember what is relevant, but when you have a very large amount of data and need to access a copy that was backed up some time ago, this option to search for that is very useful and can help a lot to find the relevant information you need to restore.
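A searchable catalog can be pictured as an index from file paths to the backups that contain them. The Python sketch below is a toy, in-memory version under that assumption; the class name, method names, and backup names are hypothetical, and it says nothing about how the BlueXP catalog is actually built.

```python
from collections import defaultdict


class BackupCatalog:
    """Toy index: file path -> list of backups that contain that file."""

    def __init__(self) -> None:
        self._index: dict[str, list[str]] = defaultdict(list)

    def record(self, backup_name: str, paths: list[str]) -> None:
        """Register every file path captured by one backup."""
        for path in paths:
            self._index[path].append(backup_name)

    def search(self, term: str) -> dict[str, list[str]]:
        """Case-insensitive substring search over the indexed paths."""
        term = term.lower()
        return {p: b for p, b in sorted(self._index.items()) if term in p.lower()}


catalog = BackupCatalog()
catalog.record("daily.2023-01-09", ["/proj/a/report.docx", "/proj/b/image.png"])
catalog.record("daily.2023-01-10", ["/proj/a/report.docx"])

# Find every backup that holds a file whose path mentions "report".
print(catalog.search("Report"))
```

With an index like this, a restore request starts from a file name rather than from browsing every backup, which matters more as the number of files and backups grows.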
Cecile Kellam: And BlueXP Backup also provides you with the same workflow regardless of whether we’re talking about backing up to AWS, GCP, Azure, or StorageGRID. There might be a little bit of difference in terminology between your different clouds or a couple different endpoints.
But all the same, it allows our customers to take advantage of learning one workflow for all of their backup operations.
Semion Mazor: One more thing to add is about the innovation that was implemented here. For each cloud provider, the development process used that provider’s native technology for the searchable catalog, for the index catalog. So for each of them, it’s cloud native technology implemented into BlueXP Backup.
Cecile Kellam: And the cloud native technologies and APIs is what really leaves that door open for us to continue to develop this in new ways that we might not have thought about yet, or have features and enhancements that our customers are always bringing to us as new challenges arrive.
And that’s part of the whole big story of having the BlueXP Backup, and by leveraging those native technologies anywhere that we can.
Justin Parisi: Yeah, I just realized that we didn’t actually tell anyone what FlexGroups are. So like we all know what they are. It’s like internal code name, right? So if you’re not familiar with a FlexGroup, it’s the large bucket solution or large volume solution with ONTAP. It actually takes FlexVols and stitches ’em all together into a single name space.
So it’s all transparent to the client. So you just basically have slash volume name or a CIFS share and you can access that data and we can grow up to 20 petabytes. It gives you up to 400 billion files. So really it’s just a way to be able to non-disruptively scale out your storage as needed, as opposed to having that hard limit that a FlexVol usually gives you.
Because even though we can go up to 300 terabytes now, you still have a limit, right? So with the FlexGroup, if you hit that limit, you can add more member volumes or constituent volumes and keep trucking along with your data sets.
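The "stitching together" idea can be modeled very simply: the group's capacity and file count are the sums of its member volumes, and growing the group means adding members. In the Python sketch below, the class names are hypothetical, and the per-member figures (200 members of 100 TB and 2 billion files each) are assumptions chosen so that the totals match the 20 petabytes and 400 billion files quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class MemberVolume:
    size_tb: int
    max_files: int


@dataclass
class FlexGroupModel:
    """Toy model: one namespace whose limits are the sum of its members."""

    members: list[MemberVolume] = field(default_factory=list)

    def add_members(self, count: int, size_tb: int, max_files: int) -> None:
        """Grow the group by adding `count` identical member volumes."""
        self.members.extend(MemberVolume(size_tb, max_files) for _ in range(count))

    @property
    def capacity_tb(self) -> int:
        return sum(m.size_tb for m in self.members)

    @property
    def max_files(self) -> int:
        return sum(m.max_files for m in self.members)


fg = FlexGroupModel()
fg.add_members(count=200, size_tb=100, max_files=2_000_000_000)  # assumed figures
print(fg.capacity_tb / 1000, "PB,", fg.max_files, "files")
```

Clients still see a single mount point or share; only the bookkeeping underneath is spread across members.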
Semion Mazor: Thank you, Justin, for making it more clear.
Justin Parisi: I was just doing some quick reading on it. That’s all I know about it. [laughs]
Semion Mazor: I don’t believe you. [laughs]
Justin Parisi: Anyway. So you know, the index piece is huge, like being able to search quickly for data that’s in a large dataset, because I don’t know if you’ve ever tried to look through your own music files or video files or image files on your home computer. It’s really hard to find stuff, especially when you don’t name things appropriately. And we all know that end users aren’t going to always follow the proper naming convention rules, or the applications might not name things properly.
So you mentioned Cloud Backup and FlexGroup volumes being a source. Can Cloud Backup back up other things as sources, do they have to be ONTAP? Can they be, you know, third party storage? Can they be cloud providers or is it just strictly for a NetApp ONTAP solution?
Semion Mazor: This capability is tailored and developed especially for ONTAP, and this is also what makes it so unique, because we have the best integration and use all the best that ONTAP can provide. Now we’re utilizing that also for backup. And in terms of the sources, the type of data and workload, it can be NAS data that we already discussed, but it also can be applications, databases, and VMs through integration with SnapCenter.
And it can also be Kubernetes pods that are hosted on ONTAP.
Justin Parisi: Okay. So I guess that would be through Astra.
Semion Mazor: Astra has its own solution, but it currently still does not support CVO, so everything that’s hosted on CVO is through BlueXP.
Justin Parisi: Okay. And does it also have SnapMirror integration? Can we kick off SnapMirror relationships through BlueXP?
Cecile Kellam: Of course, we can kick off SnapMirror relationships through BlueXP. That’s one of the bread and butter pieces of it, right? It’s a great little drag and drop setup within the BlueXP portfolio. And similar to what we spoke about with the backup, it’s gonna be the same workflow for setting up those SnapMirrors regardless of the source and target that you’re using within that canvas.
Justin Parisi: All right, so it sounds like BlueXP covers both unstructured and structured data then. We have the ability to take on all sorts of use cases. What about the SAN use cases? Is it only applications on NAS or can it also do things with SAN LUNs or NVMe namespaces? With NAS, it’s basically a data set that doesn’t have a lot of complexity there, right? So with a LUN, you’ve got an underlying file system that isn’t NetApp. There’s like an operating system attached to it. So a LUN is always doing Windows stuff or always doing Linux stuff. So to take a proper backup of that data, you either have to copy the files outta the LUN into somewhere else, or you know, snapshotting the entire LUN requires you to tell both ONTAP and the application that, Hey, we’re taking a snapshot. Get ready for that, so you’re not doing any operations, so I don’t cut something off in the middle of it and corrupt your backup.
Cecile Kellam: Oh, so our application-consistent backups via our SnapCenter integration. Is that what we’re referring to?
Justin Parisi: So there’s application consistency, and then there’s like the idea of like LUN consistency, right? Mm-hmm. The actual LUN itself has a different operating system that ONTAP has no idea about because it’s just doing its thing happily trucking along. And there may be some in-flight operating system stuff going on where if you’re taking a snapshot, it might chop something off in the middle of it. So you wanna avoid that. So yeah it’s both application consistency, but also file system consistency.
Semion Mazor: So currently, what we do to handle the applications is through the integration with SnapCenter.
Justin Parisi: Okay. So SnapCenter would handle all that. It’s not really a BlueXP thing, so it probably handles SAN and NAS data just fine. It’s different approaches to taking those backups.
Cecile Kellam: Different approaches on the underlying, but for the workflow that the customer’s using, that doesn’t matter regardless of whether you’re talking about the SAN or the NAS.
Justin Parisi: Yeah. Yeah. BlueXP is basically a driver. It’s like, Hey, yes let’s do this guys, let’s all work together. It’s like the superintendent or the supervisor or whatever.
Semion Mazor: I also want to add that it’s integration with SnapCenter, but the experience you have is through BlueXP Backup when you want to back it up to object storage.
The way to do that is through BlueXP. Same user experience as you back up any NAS data.
Justin Parisi: So, Cecile, you said the words 3-2-1 backup. I know I’ve heard that before. Can you go into a little more detail about what that entails and how Cloud Backup handles that.
Cecile Kellam: Yeah, absolutely. A 3-2-1 backup, or a 3-2-1 data protection strategy, is the US government’s gold standard of backup, what they have the expectation of, and to elaborate on what that means, the three refers to having three different copies of your data. The two refers to having those copies on two different types of media. And the one refers to having one of those offsite. So take the instance where you have your on-prem systems and you’re SnapMirroring from one data center to your other.
That’s your one and two copies, and then having a backup copy sent to an object storage outside of a ransomware loop is gonna be that third additional copy in that location that is of a different type of media, therefore satisfying the 3-2-1 data protection strategy that provides you with that peace of mind that in the event that a ransomware attack happens or meteors take out both of your data centers, you’re able to recover through your third copy that you have residing in object storage. For example, you could spin up a Cloud Volumes ONTAP on the fly if your data centers on-prem are gone and be able to restore and get back up and running.
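The 3-2-1 rule described here is easy to express as a check over an inventory of copies. The Python sketch below is a minimal version under stated assumptions: the `Copy` structure, the media labels, and the example inventory are all hypothetical and simply mirror the scenario in the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str      # e.g. "ONTAP volume", "object storage"
    offsite: bool


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


primary = Copy("data center A", "ONTAP volume", offsite=False)
mirror = Copy("data center B (SnapMirror)", "ONTAP volume", offsite=False)
cloud = Copy("cloud backup", "object storage", offsite=True)

print(satisfies_3_2_1([primary, mirror]))         # two copies, one media type
print(satisfies_3_2_1([primary, mirror, cloud]))  # third copy in object storage
```

In this inventory the object storage copy supplies three things at once: the third copy, the second media type, and the offsite location.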
Justin Parisi: So Semion, what sort of advantages do I get with the Cloud Backup solution for that particular use case where I’m trying to get the 3-2-1 backup approach?
Semion Mazor: So first, it allows you to get the different type of media, the two in the 3-2-1, because it changes the file format from the ONTAP native format to object storage.
Then to have the third copy of the data with BlueXP Backup, it’s very easy to do that. Especially if you are backing up to the cloud, you can easily create the third copy. And you don’t need to purchase hardware and maintain all that stuff. Just a very simple flow and you have the third copy of your data in different formats.
Justin Parisi: Okay, so when we’re talking about files and recoveries and backups and that sort of thing, we have to consider different approaches, whether it’s a full recovery of an entire volume or individual files or individual folder. So what sort of granularity does Cloud Backup give me?
Can I restore something very easily at the volume level and all the way down to the folder level and file level, or is that something that’s a little more complex?
Cecile Kellam: That’s a great question, Justin, because, you know a backup solution’s only as good as its ability to restore it, right? So we do have the ability to restore at a volume level all the way down to a file level.
Semion mentioned before that we have that kind of Google-like search and restore functionality, which when you’re talking about large data sets, is absolutely invaluable for you to be able to find the actual file that you need and not have to browse through the likely millions and millions that you might have after you’ve been using a backup solution for some time.
Justin Parisi: Okay. And with that index piece, the search and find functionality, that makes it even easier cuz maybe we don’t want to restore 10 petabytes. Maybe we wanna restore, you know, a single file that’s really hard to find. And being able to know that file name and search for it very easily is gonna be crucial to that particular use case.
Semion Mazor: I wanted to add to what you said. Being a good backup solution is also about succeeding in backing up all your data, and especially when we are talking about large data sets and in the context of FlexGroups, it is a real challenge. Most traditional backup solutions today just can’t meet the backup window that’s needed to back up petabytes of data.
This is exactly the place where the advanced architecture of BlueXP Backup has the advantage because it’s just very, very efficient. Like the performance is crazy. So you can succeed in meeting the backup window that you need, even when we’re talking about petabyte scale.
Justin Parisi: Yeah. And that’s that SnapMirror backing engine. Right? So that’s the ability to do the incrementals at a block level with just the changes as opposed to taking an entire file and having to replicate it each time.
Semion Mazor: Exactly.
Cecile Kellam: Exactly. Nobody backs up ONTAP like the makers of ONTAP.
Justin Parisi: And it’s interesting too cuz when you’re dealing with something like an object store, when you’re trying to do S3 calls, there’s no such thing as appending a file. It’s either PUT or GET right? Or DELETE.
So making a change to a file is basically putting a new file in there. But this isn’t doing that. It’s taking the existing data and just changing it from a SnapMirror perspective, so you’re not having to have those duplicate copies of those files that you would with normal S3 or REST API operations.
Cecile Kellam: Exactly. No duplicate copies. Making it as efficient as we can, because even when you’re talking about putting it in a cloud object storage, you know that data’s still residing somewhere. You’re still having to pay for it, and it’s still doing things like consuming energy and making us less efficient and less green as a whole.
So being able to do it to an object store in an incremental forever block level way makes it so that you’re putting the least amount of strain on your system and moving as little as possible as required to do the job.
Semion Mazor: Yeah, and the traditional backup solutions need to re-baseline at each interval of time, and BlueXP Backup and Recovery just doesn’t need that.
It’s incremental forever.
Justin Parisi: Right. Like, you don’t have to do a reset of everything. You can just keep going provided you have those snapshots intact, right? Like nobody deletes the snapshots from under you. But as long as you have a common snapshot there, it should be easy to keep things going incrementally.
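The role of the common snapshot can be sketched as follows: if the source and the destination still share a snapshot, the next transfer can be incremental from that point, and if they share none, a full baseline is needed again. This Python sketch is a simplified model; the function names and snapshot names are hypothetical and it does not reflect actual SnapMirror commands or behavior in detail.

```python
from typing import Optional


def latest_common_snapshot(source: list[str], destination: list[str]) -> Optional[str]:
    """Return the newest snapshot present on both sides, or None.

    `source` is assumed to be ordered from oldest to newest.
    """
    on_destination = set(destination)
    for snap in reversed(source):
        if snap in on_destination:
            return snap
    return None


def plan_transfer(source: list[str], destination: list[str]) -> str:
    """Decide between an incremental update and a fresh baseline."""
    common = latest_common_snapshot(source, destination)
    if common is None:
        return "baseline"
    return f"incremental from {common}"


source_snaps = ["snap.mon", "snap.tue", "snap.wed"]

print(plan_transfer(source_snaps, ["snap.mon", "snap.tue"]))  # shared history exists
print(plan_transfer(source_snaps, ["snap.old"]))              # common snapshots were deleted
```

This is why deleting snapshots out from under a replication relationship is costly: without a shared reference point, there is nothing to compute the changed blocks against.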
All right. So, the FlexGroup piece… how long has that been in BlueXP? Is that the last month or how long has it been?
Semion Mazor: This is something that was released a few weeks ago, and actually the reason why we wanted to have this podcast is basically to spread the word to customers that use FlexGroups, and to partners and accounts whose customers work with FlexGroups, so that they will know that now they have the option to have a reliable backup for their large scale data, and to have it in another format easily and efficiently.
Justin Parisi: Yeah, and I ask because we just had the holidays, so this kind of gets lost in the shuffle. When you release something in Thanksgiving or you know, American Thanksgiving, right. Or Christmas timeframe, people kind of forget about it. So it’s good to have a refresher in the new year to understand what is available out there that you might not have realized was there.
Cecile Kellam: Absolutely. And we’re just continuing to grow and evolve all these products. And this FlexGroup support is another example of that as we’ll be rolling out more feature functionality. You mentioned earlier deleting things, right?
So Data Lock is something that we’ll be rolling out with a future release that will be supported for FlexGroups, and that provides you with that true SnapLock/WORM, write once/read many, you know, inability to delete things, but in your cloud. Which is something that has also been highly requested.
So I would encourage people to keep up with the What’s new section within the NetApp cloud docs to find all the greatest and latest feature releases for the products that we’re talking about as it’s a great place to find what we’ve just rolled out as well as the documentation and links to support that.
Semion Mazor: It’s important to mention all the aspects related to ransomware protection, and specifically ransomware protection of the backup itself. So we also have great features there and, as Cecile mentioned, there is also interoperability with SnapLock on the source to create WORM data and lock the objects themselves in the destination.
And also on top of that, you can have an alert if somebody tries to touch the object itself. They cannot do that because the objects are locked, but the user will get a notification that somebody tried to do that. So there’s also a lot of good stuff in these aspects of ransomware protection and data lock and how we keep the data and the backup secure.
Justin Parisi: Yeah, and we’ll include a link in the blog for that as well, so you can access that easily. All right, Cecile, Semion, thanks so much for joining us today and talking to us all about the new Cloud Backup piece of BlueXP, including the FlexGroups option there.
So Semion again, if we wanted to reach you, how do we do that?
Semion Mazor: So you can email me at semion@netapp.com or reach me on LinkedIn, Semion Mazor, and I will be happy to discuss all things related to BlueXP protection, Cloud Backup, and any other topic that is related.
Justin Parisi: All right. And Cecile.
Cecile Kellam: Yeah, same thing. Please find me on LinkedIn, Cecile Kellam, or you can reach me at cecilek@netapp.com and I’d be happy to follow up with you and get you more detailed information wherever you might want.
Justin Parisi: All right. Excellent. Thanks so much for joining us and talking to us all about the new FlexGroup functionality within BlueXP Cloud Backup. All right, that music tells me it’s time to go. If you’d like to get in touch with us, send us an email to podcast@netapp.com or send us a tweet at NetApp. As always, if you’d like to subscribe, find us on iTunes, Spotify, Google Play, iHeartRadio, SoundCloud, Stitcher, or via techontappodcast.com. If you liked the show today, leave us a review. On behalf of the entire Tech ONTAP podcast team, I’d like to thank Semion Mazor and Cecile Kellam for joining us today. As always, thanks for listening.
Podcast intro/outro: [outro]