Welcome to Episode 341, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”
This week on the podcast, we feature two of our NetApp Astra Control customers as they kick the tires on the new SnapMirror support in Astra 22.08. Join me and Astra Product Manager Hrishi Keramane (firstname.lastname@example.org) as we hear from SAP’s Jacob Jiang (email@example.com) and Hyland’s Casey Shenberger (firstname.lastname@example.org) about the importance of simple, fast, application-aware DR that can be accomplished with a few clicks!
Tech ONTAP Community
We also now have a presence on the NetApp Communities page. You can subscribe there to get emails when we have new episodes.
Finding the Podcast
You can find this week’s episode here:
I’ve also resurrected the YouTube playlist. You can find this week’s episode here:
You can also find the Tech ONTAP Podcast on:
I also recently got asked how to leverage RSS for the podcast. You can do that here:
The following transcript was generated using Google Cloud’s speech-to-text service and then further edited. As it is AI-generated, YMMV.
Episode 341: NetApp Astra Control DR – Customer Perspectives (SAP Conversation)
Justin Parisi: I’m here in the basement of my house. And with me today, I have a couple of special guests to talk to us all about NetApp, as well as Astra, but most importantly SAP. So with us today we have Jacob Jiang. So Jacob, what do you do at SAP and how do we reach you?
Jacob Jiang: Yeah, you can reach me at my email address, email@example.com.
Justin: All right. And what do you do over at SAP?
Jacob: I’m the storage engineer in the SuccessFactors BU.
Justin: What exactly is SAP? Like, could you tell me a little bit about what they do? I know it’s a pretty big name out there, but there may still be people who aren’t really familiar with it.
Jacob: So actually, you know, our BU, SuccessFactors, is an HR SaaS provider for customers and end users. My job is to maintain those systems and find solutions for DR, backups, monitoring, that kind of stuff, and make sure everything runs smoothly.
Justin: So it sounds like SAP, the business unit you’re in, is more like HR as a service, right?
Jacob: Yeah, yeah. You know, my BU is. Yes, yes. My LOB is this one. So, for the other LOBs, I’m not sure, but actually, I’m in the SuccessFactors BU.
Justin: Yeah. It’s a very large company. So I wouldn’t expect you to know everything about SAP but…
Jacob: Yeah, yeah. There are lots of BUs inside SAP. We provide multiple different SaaS functions to different customers, like Ariba, like HANA, like BYD. SuccessFactors is just one of them, you know. Yeah, it’s a very large company.
Justin: Yeah, I know for a fact that SAP has been a long-time NetApp customer. Back in the days when I worked in support, I used to work with SAP directly. So, you know, long-time NetApp customers, and that’s great. Also with us today, we have Hrishi Keramane. So Hrishi is here at NetApp. What do you do? And how do we reach you?
Hrishi Keramane: Hi Justin, thanks for having me back on the show.
I’m one of the product managers in the Astra family of products, primarily focused on NetApp Astra Control. You can reach me via email at firstname.lastname@example.org.
Justin: All right. Excellent. So from the sounds of it, Jacob uses a lot of NetApp, a lot of ONTAP, probably mostly Cloud Volumes ONTAP, and on-prem products as well. So, Jacob, in your time at SAP, what made you decide to choose NetApp for these types of workloads?
Jacob: NetApp is, in practice, easy to use and more stable. Also, the support is excellent. So that’s why we chose NetApp as our standard storage solution at the moment.
Justin: So we do have a product suite called Astra; that’s why Hrishi is here. So before I talk to you about Astra at SAP, Hrishi, can you kind of give us an overview of the Astra product suite? Because it’s not just one thing, it’s multiple things.
Hrishi: Absolutely. So yeah, Astra as a portfolio is basically looking to bring application-aware data management to Kubernetes, right. Now, we started out with Astra Trident, which was originally called Trident and later got renamed, and it is a key cog of our portfolio. Trident is a CSI provisioner, one of the first. It has been out there for six years. Trident enables Kubernetes applications to consume storage seamlessly, irrespective of where and in what form factor their ONTAP and NetApp storage is running, right? CSI – the Container Storage Interface – is the beginning, and all of the products build on top of Trident. So, Astra Control – which is our application-aware data management suite – provides protection, backup, and recovery for data-rich Kubernetes applications and workloads, both in the public cloud and on-premises. So it enables data protection and disaster recovery, and it also brings in mobility, which is a key use case for customers who want to move their Kubernetes applications across clusters, both on-prem and to the cloud. So hybrid multi-cloud is real, right? And Astra Control leverages key NetApp technologies like Snapshot copies, data replication, and cloning, all of which customers like SAP and Jacob have come to use and trust over the years. That’s Astra Control in a nutshell. And Astra Control and Astra Trident together form the Astra portfolio.
Justin: I assume that Jacob is familiar with Astra. You’ve heard of it before, you probably looked into it a bit. Is that, is that accurate, Jacob?
Jacob: I’m familiar with Trident, but for Astra, I’m the newbie here.
Justin: Okay. That’s fine. That’s why you’re here, right? So, from what you heard Hrishi talk about. I mean, you’re familiar with Trident, you know what that’s all about. It’s pretty straightforward, right? We need a way for Kubernetes to talk to the storage, whether it’s NetApp or whatever. What about the Astra Control suite? Which of the things Hrishi mentioned drew your attention the most?
Jacob: Oh yes. So actually, you know, at SAP, for SuccessFactors, we are moving ourselves from the traditional solutions to Kubernetes now, and that’s why we are using Trident. You know, more and more functional modules are going to Kubernetes now. So what we are seeking here are backup and DR solutions. Because, you know, we are moving to Trident, right? Traditional backup and traditional DR no longer suit us. So that’s why we are seeking new solutions, and Astra seems very suitable for us.
Justin: Yeah, it sounds like, with the new Kubernetes environment you’ve got, you’re looking for something that’s both application-aware and able to integrate easily with Kubernetes. So, Hrishi, how does Astra fit in there? Like, what does it use to make those things easy?
Hrishi: Astra Control at its core is application-aware data management. So there are two parts to it – application awareness and application consistency. So, like Jacob mentioned, as these monolithic applications are getting containerized, standard or traditional ways of doing backup and DR don’t scale very well – especially with the different logical components that containers bring in. You need a solution that takes an application as a whole, including all those logical pieces and all those volumes together, and manages them at that granularity. And that’s what Astra Control provides: an application-aware, application-consistent solution to protect and move your data-rich Kubernetes applications, right. And what it does, like I said, is leverage a lot of NetApp’s core data management technology, which we have come to be known for over 20 years, and build on top of it using Trident. Trident is our gateway, right, and most customers are used to leveraging Trident for their applications. Now Astra Control provides this additional layer on top, which brings application data management to backup and DR, like Jacob was mentioning. It actually goes one step further by also allowing you mobility. Once we know how to do backup, right, and put it into an object store, we have the capability to move applications between clusters. So that’s the core value prop of Astra Control.
Justin: So I know in my experience dealing with Astra Trident, you know, setting up containers and CSI drivers involves a lot of manual labor. There are a lot of JSON files; you’ve got to modify config files. With the data protection piece integrated into Astra Control, are you able to do all that through a GUI and have it modify the JSON files on the back end for you? Or are you still dealing with JSON files manually?
Hrishi: It allows you flexibility there. In that sense, Astra Control lets you use a GUI to do everything you would otherwise do through JSON, YAML, or kubectl, right? But if you are a Kubernetes consumer, you can do a lot of it the way you’re used to. So you have both options: you can go back to your YAML and kubectl ways of deploying and managing, and we also provide APIs. It has a rich suite of REST APIs. It caters to each customer’s preferred way of management.
Justin: So Jacob, what’s your preference? Are you a kubectl/CLI guy, or are you strictly GUI? Do you like to use the GUI?
Jacob: GUI, that’s for sure. But actually, you know, we also want some kubectl commands and interfaces so that we can integrate our scripts, because maybe in the future we will do the monitoring ourselves. So if NetApp can expose some APIs or RESTful APIs for us to monitor it and integrate it with our current monitoring system, that will be great.
Justin: We’ve talked about how you manage the Kubernetes piece. So let’s talk about the data that you’re actually using with Kubernetes. Now I remember, like, back in the early days of containers, I don’t think people really considered data to be stateful, right? It was all ephemeral, didn’t really need it, you can get rid of it anytime. Now, you’re looking at more workloads needing those datasets to be protected. So, tell me about how you have these data sets laid out, are they all in a single volume and then everything points to that volume, are they across multiple volumes? And how are you handling all that with Kubernetes?
Jacob: So, we have PVs and different volumes, and we have different namespaces. Each namespace is for a different module, and different modules have their own set of PVs. They put the database on those PVs, which means that data has to persist. That’s why we need protection for those volumes.
Justin: And what are you using today to protect them? What have you traditionally been using for backup and recovery with this particular workload?
Jacob: Well, traditionally, we are using SnapMirror for data replication. But for backup, we are still using EMC NetWorker. But, you know, EMC NetWorker is not suitable for the Kubernetes area. It’s good at VM-based or filesystem backups, which means the database goes to the backup storage through DD Boost. That’s okay, but once we move to Kubernetes, that’s no longer true. It’s not suitable for us anymore.
Justin: And why is that? What’s the holdup with using NetWorker and Kubernetes? Like, what’s the deal with that?
Jacob: The limitation is that EMC NetWorker is not designed for Kubernetes. It goes way back; it was Legato, right? EMC acquired it maybe ten or twenty years ago. You know, when that happened, there was no Kubernetes, right? So yeah, it’s not suitable for that.
Justin: Are you putting all this on tape? Is that a tape backup?
Jacob: Yes, it’s a rotated backup, but actually, you know, EMC NetWorker cannot back up the Kubernetes objects. I mean things like ConfigMaps, pods, Secrets, right? It can’t back up those kinds of things.
Justin: So what you’re basically saying is, you know, the datasets are important, but it’s also very important to be able to quickly stand up a Kubernetes cluster that might have tanked on you, right? Not to say that’s ever happened to anybody, but being able to quickly spin one up and have Kubernetes ready to go, just by using your backup and recovery software. So, Hrishi, how does Astra solve that problem?
Hrishi: Astra manages, or backs up, the application as a whole unit. So, you’re not backing up a single PV or a volume. Once you manage an application as a unit, you can back it up to an object store periodically through scheduled or manual operations, and once you have those backups and you choose a recovery point, you get the entire application. Let me step back a little bit. When you take the backups, those backups are also application-consistent. We provide admins the flexibility to define how consistency is achieved through execution hooks, right? That’s where they can provide their own scripts. Once those scripts are provided to Astra, every time it takes a backup, it executes those hooks to give you a consistency point across the application: all the PVs, all the Kubernetes objects, right? And then it backs it up as one unit. So, when we restore it, you’re not scrambling to say, “these are my ten PVs, these pods are running in this namespace,” and so forth, and stitching them together. Astra knows how to do all of that for you. So it becomes a single-button operation.
Justin: And it still leverages SnapMirror, so we’re actually SnapMirroring volume data and… are we SnapMirroring the Kubernetes data as well?
Hrishi: That’s actually hot off the press, right? It just came out today. Since we started talking to customers like Jacob, from our first release, this has been, across the board, the most commonly requested feature, right? Like Jacob said, they are already using SnapMirror for their PVs, but they have to manually stitch them together on the DR site and rebuild the application-to-PV mapping. But with this release of Astra Control, we are integrating SnapMirror with applications, which basically means we are bringing in business continuity with application-level mirroring, where all your volumes are mirrored to your secondary cluster using SnapMirror. This provides a level of business continuity with really low RPO and RTO. In that sense, everything is available on your secondary cluster or secondary site and ready to go at the click of a button to restore your application, as opposed to a backup, which can take hours to restore.
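The execution-hook idea Hrishi describes can be sketched in Python. This is a minimal, hypothetical model, not Astra Control’s actual API: `App`, `Backup`, `consistent_backup`, and the hook callables are illustrative names. It shows the ordering that makes a backup application-consistent: run the user’s pre-snapshot hook, capture every PV and the Kubernetes objects inside the same quiesced window, and always run the post-snapshot hook afterward.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class App:
    """Hypothetical stand-in for a managed Kubernetes application."""
    name: str
    pvs: List[str]       # persistent volumes belonging to the app
    objects: List[str]   # e.g. ConfigMaps, Secrets, Deployments


@dataclass
class Backup:
    app: str
    snapshots: List[str]
    objects: List[str]


def consistent_backup(app: App,
                      pre_hook: Callable[[], None],
                      post_hook: Callable[[], None],
                      label: str) -> Backup:
    """Snapshot all PVs and capture all objects as one consistency point."""
    pre_hook()  # user-supplied script, e.g. flush and freeze the database
    try:
        # All PVs are snapshotted inside the same quiesced window.
        snapshots = [f"{pv}@{label}" for pv in app.pvs]
        # Kubernetes objects are captured at the same point in time.
        objects = list(app.objects)
    finally:
        post_hook()  # always unfreeze, even if a snapshot step raises
    return Backup(app=app.name, snapshots=snapshots, objects=objects)


if __name__ == "__main__":
    events: List[str] = []
    app = App("hr-module", pvs=["pv-db", "pv-logs"],
              objects=["configmap/settings", "secret/db-creds"])
    backup = consistent_backup(app,
                               pre_hook=lambda: events.append("freeze"),
                               post_hook=lambda: events.append("thaw"),
                               label="daily-01")
    print(events)            # ['freeze', 'thaw']
    print(backup.snapshots)  # ['pv-db@daily-01', 'pv-logs@daily-01']
```

The `try`/`finally` is the design point worth noting: whatever the real product does internally, a quiesce hook that is never undone would leave the application frozen, so the thaw step has to run on every path.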
Justin: And that secondary site doesn’t have to be an on-prem site. That can be a cloud instance, right?
Hrishi: Exactly. And I think that’s where Jacob’s use case is very relevant. We are actually doing a POC with Jacob right now, where we are using Kubernetes clusters running in the cloud with a CVO back end. So this is where hybrid multi-cloud comes in. You can SnapMirror across clouds: across CVO instances running in any hyperscaler, and also from on-prem to cloud, which unlocks the whole DR-to-cloud use case.
Justin: So Jacob, you know, I know that SnapMirror is good for disaster recovery and backup and that sort of thing. What about using it as a data mover? Do you use it to move things around to different clouds to allow your end users to be closer to the data?
Jacob: You mean the migration?
Justin: Yeah, data migration, or maybe, you know, let’s say you have a site in Singapore and you want to move it to a site in Germany, right?
Jacob: Yes. So we are using SnapMirror for data migration as well, that’s true. Yes.
Justin: With this Astra Control piece, is it only for backup, or can we use it in those data migration use cases? Can it simplify those migrations?
Hrishi: Yeah, absolutely. It not only simplifies it, right? It also reduces your maintenance window.
Jacob: Yeah, that’s true. I agree. So actually, you know, when we talk about DR solutions, the major concerns we have are RPO and RTO. RPO is about the difference in data, right? Here, we are talking about RTO, the recovery time. Without Astra, everything is done manually, which means it takes a longer time for us to do the restore. But with the automation in Astra, it’s just one click, and it will save time.
Justin: That’s not the only time saver there. There are also time savings with SnapMirror, because it’s faster overall to move data, and then the incremental transfers are faster in general because you’re not moving entire files, right?
Hrishi: Exactly. And you need downtime only for moving that last incremental copy in your cutover phase, right, so that can be minutes. So you can really plan your migration, like you are alluding to, right, with minutes of downtime across sites.
Jacob: Exactly. So without Astra, we have to hop on the target side to manually run the SnapMirror break – quiesce and break – and then go to the Kubernetes cluster and run the Trident import to import those PVs, and we have to manually bring up the pods – kubectl those pods and map the PVs – and then lastly the service. If we use Astra, we just hit a button and everything happens underneath.
Hrishi: We actually have a feature for that; we call it reverse replication. It moves your application over to the other site instantaneously, but also, if you want to replicate back, it keeps writing or syncing back to the original site continuously.
Jacob: In that case, without Astra, if it’s done by hand, it becomes a SnapMirror resync that you have to run yourself, and then you run tridentctl import for this or something like that.
Justin: So Hrishi, I had another question. We can replicate data with SnapMirror… are we replicating those Kubernetes configurations so that we can automate standing up the cluster, as it was, on the other site very quickly?
Hrishi: Great question. So yeah, we talked a little bit about application consistency; the same concept is extended to DR, right? Every time we do a SnapMirror update, we actually take an application-consistent snapshot. Remember the hooks and all that orchestration we do in Astra? We still do that. So, number one, we take snapshots that are application-consistent. But we also go ahead and, in the same time frame, capture the application objects. So you have not only volumes, which are replicated application-consistently, but also the app objects, which are many, right? With Kubernetes, you have logical objects. We take a copy of all of that and store it in Astra. So when you’re ready to fail over, you get all those objects ready and restored instantaneously, right? That’s what Jacob was alluding to: with a single click of a button, you can now have your app at that consistency point.
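For comparison, the manual failover Jacob walks through can be written out as a runbook. The Python sketch below only builds the list of commands; it does not run them. The ONTAP `snapmirror quiesce` and `snapmirror break` commands and `tridentctl import volume` are real, but the SVM, volume, backend, and YAML file names are hypothetical placeholders, and a real runbook would need actual PVC, Deployment, and Service manifests. It illustrates why the effort grows with the number of PVs: each PV adds three commands, run across two systems.

```python
from typing import List


def manual_failover_runbook(svm: str, volumes: List[str], backend: str,
                            trident_namespace: str = "trident") -> List[str]:
    """Return the ordered manual steps for failing over one app's PVs.

    All names passed in, and the YAML file names, are illustrative only.
    """
    steps: List[str] = []
    for vol in volumes:
        path = f"{svm}:{vol}"
        # ONTAP CLI on the destination cluster: stop scheduled transfers,
        # then break the relationship so the destination becomes writable.
        steps.append(f"snapmirror quiesce -destination-path {path}")
        steps.append(f"snapmirror break -destination-path {path}")
        # Import the now-writable volume into Kubernetes as a PVC.
        steps.append(
            f"tridentctl import volume {backend} {vol} "
            f"-f pvc-{vol}.yaml -n {trident_namespace}")
    # Finally bring the workload and its Service back up (hypothetical files).
    steps.append("kubectl apply -f app-deployment.yaml -f app-service.yaml")
    return steps


if __name__ == "__main__":
    for step in manual_failover_runbook("svm_dr", ["db_vol", "logs_vol"],
                                        backend="ontap-nas-dr"):
        print(step)
```

With two PVs that is already seven commands that must run in the right order, and reversing the direction later adds resyncs and re-imports on top, which is the sequence Astra Control’s one-click failover is meant to replace.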
Justin: So Jacob, let’s kind of just wrap this up by asking you, what is your wish list? Like you’ve heard about what we have now. So what sort of things are you looking for to be added in future releases?
Jacob: So on our wish list, if Astra could expose more things for monitoring, that would be great. Because, you know, what we want is something integrated with our Splunk, or if it can expose some APIs for us so that we can do some automation and monitor it.
Justin: Hrishi, I’m sure you’re taking these notes down or you’ve already gotten these notes.
Hrishi: Yes, we actually have features written up for these; these are great points and we have heard them across the board. We have OpenMetrics endpoints, which can be integrated into customer environments. We are enhancing them to keep pace with feature development, obviously, but we also have a rich suite of Cloud Insights monitoring, telemetry, and advanced insights integrated as well. It’s a start, but we need feedback from customers like Jacob on the direction we should take it.
Justin: All right. Excellent. Well, I won’t take any more of your time, Jacob. I appreciate you coming onto the podcast and talking to us about your Astra experiences as well as your NetApp experiences. Again, if we wanted to reach you, how do we do that?
Jacob: Yeah, you can email me.
Justin: All right. We’ll include that in the blog. And Hrishi?
Hrishi: You can reach me on email at email@example.com
Episode 341: NetApp Astra Control DR – Customer Perspectives (Hyland Conversation)
Justin Parisi: We have Casey Shenberger from Hyland here. So Casey, what do you do at Hyland? And how do we reach you?
Casey Shenberger: I am a cloud platform architect at Hyland, and I have been a primary storage administrator here for ten years now, so I manage all the storage for our hosted environments. To reach me, just email Casey.Shenberger@hyland.com.
Justin: And also with us today we have Hrishi Keramane. So what do you do at NetApp and how do I reach you?
Hrishi Keramane: Hi Justin, thanks for having me again on your show. This is Hrishi. I’m one of the product managers for the NetApp Astra portfolio. I’m primarily focused on Astra Control, its close integration with ONTAP, and bringing that goodness to Astra. You can reach me at Hrishi@netapp.com.
Justin: All right. So, before we start talking about NetApp and Astra, and all that good stuff, we want to talk about Hyland. So Casey, can you kind of give us an idea of what Hyland does? Like, what sort of things are you into?
Casey: Yeah, Hyland is an enterprise content management software company. We write OnBase, we have ShareBase by Hyland, and we also have several different ECM suites and software for the paperless office.
Justin: OK, can you kind of give me an idea, and I don’t know if you have public customer references or anything, of what sort of things your customers use Hyland for?
Casey: ShareBase is sort of like Dropbox, an enterprise file sync and share. OnBase is used by lots of companies from all different sectors. It started with banks and check images. So, when you wrote a check and wanted to go and see an image of that check, those were there, and your bank would use OnBase to let you access those images. We’ve now moved on to, you know, workflow and business continuity. We have lots of healthcare customers who use it for patient record stores, that kind of thing.
Justin: So it sounds like you deal with a lot of sensitive data that requires a lot of compliance regulation pieces, right?
Casey: Yeah, there’s a lot of sensitive data for sure, and all sorts of different regulatory compliance that we have to meet.
Justin: You are a NetApp customer. So I’d like to understand a little more about why you chose to use NetApp and what sort of NetApp products you use already.
Casey: Hyland is kind of split into two pieces. There’s the internal IT staff, and they are also a NetApp customer, as Hyland as a whole is. But I work in the hosting department, where we host the software that we write. We originally chose NetApp way back in, I think, 2005 or 2006; that’s when we first started using it. When I started here in ’07, we were already a NetApp customer. We chose NetApp for its reliability and, you know, block-level replication, even of files, because the software that we write and handle has hundreds of thousands or millions of tiny files, which causes lots of issues if we try to replicate at the file level. We did move away from NetApp for a little while. We were trying some other things to do it a little bit differently, and we came back. We use nearly everything in the portfolio: obviously ONTAP and SnapMirror. We have some SnapLock storage that we use. We are a StorageGRID customer as well.
We have a pretty decent StorageGRID implementation. We use that for Splunk logging as well as data storage. We are also looking right now to start using SnapCenter. We looked at it before, on and off, and put it to the side. We use Cloud Volumes ONTAP in AWS for similar replication reasons and performance. And obviously, we are Trident users. We started using Trident quite a long time ago… 2017, early 2018, maybe somewhere in there. So we were pretty early on with Trident.
Justin: So, tell me a little bit more about your Trident use case. Are you deploying this strictly with a Docker implementation and containers, or are you actually rolling out full-fledged Kubernetes in your environment, and that’s why you’ve got Trident in there?
Casey: We have full-fledged Kubernetes. We’ve got a few different workloads on containers that need persistent volumes but don’t need any kind of replication, things like Elasticsearch and some of what we’re using Redis for. But yeah, it’s full Kubernetes, and so we use Trident.
Justin: So you’re using Trident for the persistent volume stuff you mentioned. You have some workloads that don’t require replication. Do you have workloads that you’re starting to implement with Kubernetes that do require replication? Or is that just something that doesn’t apply to your environment?
Casey: We have several workloads that do require replication, which is where we’ve been working with Hrishi and Astra Control, because today that’s kind of difficult. And it’s difficult because, with the way our software is implemented and our use case, we can’t use SVM-DR. Basically, we have a DevOps team who handles Kubernetes, and we give them the details they need for Trident, but they don’t have any access to the back-end storage. So when they need replication, they give us a GUID, we have to set up a SnapMirror, and they have to re-import it. If we do a DR failover, we have to reverse the replication, and once the replication’s reversed and they re-import it on the production side, usually the GUID changes, and then we have to go off and redo all of that replication again. So it’s very difficult, and that’s where we started down this road.
Justin: What I’m hearing is you don’t like doing things manually. Is that correct?
Casey: Yeah, that would be correct.
Justin: Can’t imagine why. So Hrishi, you know, that sounds like a pretty important problem for them. What are we doing with Astra Control that helps that?
Hrishi: Well, it seems like they are putting all the bits and pieces together like a puzzle, right? For containers, it’s not just making the data replicate transparently, be available, and reverse like Casey mentioned, but also doing it with app granularity. Moving applications with all their constituents is where Astra Control brings in app-aware data management. As you may have heard, with the latest release of Astra Control that came out this week, we are bringing in replication-based DR functionality, which basically integrates SnapMirror, like Casey mentioned. They’re already using it. But now we’re bringing an Astra lens to SnapMirror-based DR. So what this does is still manage your application as a whole – a single unit. But under the covers, Astra goes ahead and protects all your volumes, establishes SnapMirror relationships to the destination, and provides one-button operations for everything that Casey mentioned. To begin with, failover; then reverse replication, when you want to do something like a load balance or move your application over while you continue to replicate in the reverse direction; or maybe even migrating it over, right. In a true disaster, you’d fail over, resync, and fail back. So all these orchestrations – or workflows – are basically a lot of manual steps today, and now they are done in a controlled way with Astra. On top of that, you also get an application consistency point, where we allow execution hooks to come into play and give you that consistent view across all your PVs or volumes, plus the same consistency in your Kubernetes objects. So it’s one-click operations with application-aware and application-consistent replication.
Justin: So Casey, I imagine you’ve already been kicking the tires on this. What are your first impressions with the new SnapMirror implementation in Astra Control?
Casey: Probably one of the biggest points that Hrishi brought up, and it’s going to be the biggest benefit to us, is that not only is it automated and much easier, but there’s the application consistency. Because of the way it works today, we don’t necessarily take the snapshots of all of the PVs for a single app at the exact same time. And we have to do some work to make sure that we’re in the right spot when we do come up in DR, so that’ll be a big win for us. We can fail over an application and everything will be consistent; we won’t have to do that work. Other than that, there’s the reduction of time. A simple application that has three PVs takes us about 45 minutes right now just to do a one-way failover, and then we have to do some prep, and then we want to fail it back to the production site. Let’s say we’re just doing a DR test: then we start that process over, right? That’s another 45 minutes to fail it back over, and then the prep work again. This will cut it down to a couple of clicks and then minutes on the back end for everything to get done. So that’ll be huge for us.
Justin: Yeah, I imagine your test consists of, you know, an initial replication and a couple of additional replications to get everything up to date. And then the cutover is where the actual test happens. Is that about right?
Casey: Yeah. So basically they will stop the application in Kubernetes, right? Once the application’s down, we’ll force a replication, and then we’ll break off the SnapMirrors, which brings them live on the other side. They’ll restart the application in the DR site, and then we have to re-replicate the data one volume at a time – each PV is a volume – and once that’s re-replicated, right, we’re ready for failback and they can start their application testing. I guess another thing I didn’t mention is that the workloads we’re talking about here are very time-sensitive. There may be 30,000 to 40,000 document changes in an eight-hour period. In any given minute there are a lot of these, and while the application is down, that means somebody’s making these changes on paper, right? They fall back to a paper process, and then when it comes back online, not only do they have to start doing it digitally again, but somebody has to enter the paper data… that work has to get done. So every minute that we can crush this down is a huge win.
Justin: And how long did you say that cut over was?
Casey: The last time we did one, it took about 45 minutes to get everything cut over, and there’s more to the app than just this Kubernetes piece. But all in total, the Kubernetes piece is probably a good 25 to 30 minutes of it, just based on prep, getting it failed over, making sure everything’s good, bringing the application back up, making sure the application is in a consistent state, all that stuff.
Justin: And this was pre-Astra Control, right?
Justin: And have you done any tests post-Astra Control with the new data protection piece? How long has that taken?
Casey: We have not yet done it with our application. We’re actually finalizing the install right now to do all of that testing, but we have done tests with Hrishi and his team online and in demos that are basically the exact same thing and they take a couple of minutes, right? We can do a 15-minute demo and fail it over multiple times.
Hrishi: And just to put it in context, all the steps that Casey walked through could be just three steps with Astra Control. The first step would be to stop their application. Then they click one button for reverse replication. Under the covers, it would do everything that Casey mentioned: take a last snapshot, update, cut over, break, bring it online, and set up replication in the reverse direction. That’s a lot of things for step two, right? And then step three: when they’re ready, they would fail back with another reverse replication, which does all of this again. The good part is that while all of this is going on, you’re always replicating in one direction. So there is not even a small window of time where you have lost replication when the app comes up. And, as Casey said, in our demos it took a couple of minutes. We’re curious to try that out and bring it up, so that’ll be interesting.
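The three-step flow Hrishi describes can be modeled as a role swap on a replication relationship. This is a hypothetical Python sketch, not Astra Control’s API: `Relationship` and `reverse_replication` are illustrative names, the log strings only paraphrase the sub-steps he lists, and step one (stopping the application) is left out. It shows that a failover followed by a failback, each done as a reverse replication, returns the relationship to its original direction.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relationship:
    """Hypothetical model of a one-directional replication relationship."""
    source: str
    destination: str


def reverse_replication(rel: Relationship, log: List[str]) -> Relationship:
    """Move the app to the destination and replicate back the other way."""
    log.append(f"final app-consistent snapshot on {rel.source}")
    log.append(f"update {rel.source} -> {rel.destination}")
    log.append(f"break and bring the app online on {rel.destination}")
    # The old destination becomes the new source.
    reversed_rel = Relationship(source=rel.destination,
                                destination=rel.source)
    log.append(f"resync {reversed_rel.source} -> {reversed_rel.destination}")
    return reversed_rel


if __name__ == "__main__":
    log: List[str] = []
    prod_to_dr = Relationship("prod", "dr")
    dr_to_prod = reverse_replication(prod_to_dr, log)  # step 2: fail over
    back = reverse_replication(dr_to_prod, log)        # step 3: fail back
    print(back == prod_to_dr)  # True
    print(len(log))            # 8
```

Making `Relationship` immutable keeps each reversal an explicit new relationship rather than an in-place edit, which mirrors the idea that there is always exactly one defined replication direction at any point in the sequence.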
Justin: Yeah, I would imagine that a few minutes is a lot less time to hold your breath than 45 minutes, so… So as far as the Astra Control piece goes, it’s not just about reduction of time. It’s also about repeatability of the steps and making sure that all the steps are done correctly every time. And when you have more manual things in your process, there’s more chance for things to break because maybe somebody fat fingers or something, or, you know, doesn’t run the steps in the right order. So Casey, I imagine that your DR plan today, before this new update, has a pretty long laundry list of things that you have to do. That’s going to get a lot shorter now.
Casey: Yeah, it is a laundry list. We have a lot of it automated with Ansible, but because of the way we do it, it still requires a storage engineer or storage administrator, somebody who understands the underlying storage, to be involved. That’s the other benefit of Astra: like I said earlier, our DevOps team is who manages Kubernetes, and with Astra Control, we’re going to be able to hand this over to them. So when the failover occurs, they won’t need a storage administrator to be involved. They will just go into Astra Control and perform the failover on their own. And if it’s a true disaster, you have lots more apps than just this Kubernetes thing, right? Storage administrators can focus on the other issues that are going on, or that really require their attention. So there’s that benefit as well.
Justin: So are you running a homegrown Kubernetes deployment? Or are you using an engine of some sort, or are you running it strictly in the cloud? How are you managing the Kubernetes aspect of this?
Casey: It’s all Rancher.
Hrishi: Yes, it’s Rancher. From what I’ve seen, they have two datacenters, both on-prem at this point, and they’re running Rancher, so we are setting up SnapMirror across two ONTAP instances. But Casey also mentioned some CVO in the future that you would consider. We also support SnapMirror for DR to the cloud, which might be a future use case for you?
Casey: Yeah, I believe that will be a future use case for us. It just so happens that today the particular workloads that we’re mostly talking about are in physical colo facilities.
Justin: Casey, I don’t know if you handled the setup of this or if Hrishi and his team did, but as far as integrating Astra Control with Rancher goes, what was the process like?
Casey: Hrishi and our DevOps team did all that. I was not very involved.
Justin: So Hrishi, walk me through that… tell me how that all goes. How simple is it to integrate something like Rancher into Astra Control?
Hrishi: Astra Control is a Kubernetes application, right? So you would deploy it like any other application onto Rancher. In this case, Chad, who we’ve been working with on the DevOps team at Hyland, has two Rancher clusters up. Both of them have their Trident configurations done and are talking to the ONTAP instances. At this point, we just need the ACC images uploaded to a local repository (in this case, Hyland is using an ACR repo), and then it’s a couple of steps. If you’re familiar with Kubernetes, you’d have a couple of YAMLs, the operator YAML and the admin YAML. You provide some configuration details and so forth, right? And apply the YAML. As simple as that. So where we are right now is that ACC is deployed, and we are just getting ready for the next steps: configuring SnapMirror and starting to test it out. That’s how simple it was to bring up an instance of ACC. Now, remember, ACC is a control plane that manages multiple Kubernetes clusters, so you only need one instance of it running across all your datacenters. Ideally, you would want it in a central place that has a different fault domain and isolation from your primary and secondary clusters. But you could also be running instances in both your primary and secondary. So we are exploring all these different combinations, and as we go through this with Casey and team, we will explore and demonstrate those capabilities.
Justin: Okay, so it’s as simple as pulling the images and creating a new pod. That’s it right?
Hrishi: Yeah. Apply the YAML to get your pods running.
Justin: All right. Cool. Now that YAML, is it a manual process to edit it in the text editor or do you have like a GUI that kind of helps you do that? How does that work?
Hrishi: So the YAML editing would be in any text editor of your choice, right? I mean, there’s not a lot of editing that you need to do. It’s very much up the alley of a Kubernetes admin. We just need a couple of options to be set to specify what kind of ingress they’re using and what the domain name is. And that’s pretty much it.
Justin: Casey, you know, you’ve talked about how your DevOps guys have been handling the Kubernetes deployment. So I would imagine that you’re familiar with Kubernetes, but you’re not like an admin. How straightforward has Astra Control been for you in this process?
Casey: Yeah, my involvement has been really straightforward. We have a good set of DevOps guys, so, you know, we’re pretty tight with them. My involvement has been: they need me to set up Trident, I did that. They need me to, you know, help them get some code or make sure that we have the right kind of access. So it’s been very simple for me, and I think it’s been pretty simple for that team as well. Like Hrishi said, it’s been pretty easy. Probably our biggest struggles were with some firewall rules and routing stuff that we didn’t have in our test environment. So nothing really Astra related.
Hrishi: Just to add to that: from the beginning, the focus with Astra has been on these different personas, like Casey alluded to. We want to make it consumable to app admins, so they don’t really have to worry about the storage side of things. Once Casey sets it up and does the pre-setup, like peering, they can manage it at an application level. Hopefully that helps in a bigger disaster: if you need to be involved, you can handle different apps at the same time. And if you’re going to do a DR test, the app admins can go do it themselves, right? That’s the value prop.
Justin: Yeah, I imagine in a disaster scenario, the last thing you want to be worrying about is how to recover everything, you just want to be able to do it.
Casey: Yeah, for sure. That’ll be a big thing for me. Like I said, we have a very small storage admin team, so when it comes to a disaster, we need to do a lot of replication, breaking, and failover. If it was a true site disaster, you know, we would need all of our storage guys doing that, and the fact that we can hand off these applications that are running on Kubernetes to DevOps, and they can just do their own failovers? That’s a big win, because it frees us completely of all of that stuff.
Justin: Yeah, and it frees you up to focus on other things that you need to be responsible for in the environment. So it really just moves the role to the proper teams.
Justin: So Hrishi, tell us a little bit more about why we chose to support the data protection piece for Astra. It sounds pretty self-explanatory, from what we talked about with Casey here. Are there any other reasons that you might have implemented this?
Hrishi: So that’s the primary reason, of course. Since Astra is about app data management, providing protection, backup and recovery, and DR, it was a natural fit for all of the NetApp customers who are using ONTAP. And many of them, like Casey said, are already using SnapMirror or some other way of replicating. In some cases, they’re using SVM-DR. So this was definitely the next step on our journey to integrate the goodness of ONTAP, and that’s what we bring today, making it really simple to consume. You have the disaster failover capability, but DR testing is also a very common use case, like Casey said, right? You want to be able to do it and make sure your DR readiness is tight. Those are the two things we alluded to a little bit, but there’s also DR to cloud, which is an important use case in the sense of being able to have your Kubernetes clusters run anywhere. That unlocks the hybrid multicloud use cases: you could be running CVO instances and be able to SnapMirror across and replicate your apps. An extension to that is migration, although it’s not the first use case you’d think of when SnapMirror or DR is talked about. Once you have set this up, you can have a migration story with a really small maintenance window, because all your data is protected, right? Your RTO is really small, like Casey said, a couple of minutes. So you could migrate your apps, even heavier apps that have a lot of data, with an outage window of just minutes. It can be seen as a migration tool as well.
Justin: Casey, with these types of workloads that you’re running, with healthcare images or financial images, I imagine there are multiple remote sites that have to access these things. Does that data migration story help you there? Does it localize things to make them faster? Or is it something that you aren’t really looking at yet?
Casey: Today, we don’t use that, because our applications are not designed for multi-site. However, we do have some applications that we’ve talked about making multi-site, where there’s no longer a primary and a disaster site. But today, that doesn’t exist. Using this tool to do that is definitely being looked at as, you know, okay, that’s one way to make the window very short. It’s a future possibility, but not something we’re actively pursuing.
Justin: So Hrishi, you’ve been working with the DevOps teams at Hyland quite a bit. What sort of feedback have they been giving you in terms of things they want to see in the product?
Hrishi: Where we want some feedback from them is on their processes. How do they see the apps configured? We’ll have a follow-up session to ask: are they using apps across namespaces, across different clusters, cluster-scoped resources? More of the Kubernetes aspect, and asking how we can take app-aware, app-consistent data management to the next level. That’s the feedback we will seek and work through, right? And we’ll also figure out, when these app teams are given access, what kind of RBAC they need, and what kinds of configurations and roles. So that’s a journey. We already have a good set of features in Astra Control that let you scope access and limit it to specific namespaces and so forth, but that’s primarily the feedback we would take to evolve Astra Control to meet all those use cases.
Justin: All right Casey, you know, it sounds like you’re well on your way to trying Astra Control and the data protection piece. Again, if we wanted to reach you for any sort of information about your experience, with Astra Control, how do we do that?
Casey: You can reach me via email at Casey.Shenberger@hyland.com.
Justin: All right, and Hrishi, how would we reach you?
Hrishi: That would be Hrishi@netapp.com.
Justin: All right. Excellent. Well Casey, thanks so much for joining us today as well as you Hrishi. And hope to speak to you again soon.