Welcome to Episode 353, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”
This week on the podcast, Jaap van Duijvenbode (@JvDuijvenbode, https://www.linkedin.com/in/jvduijvenbode/) joins us to talk all about Cloud Volumes Edge Cache and how you can use it for remote and edge locations for better localization of data sets instead of having to move terabytes of data around when you need it.
Tech ONTAP Community
We also now have a presence on the NetApp Communities page. You can subscribe there to get emails when we have new episodes.
Finding the Podcast
You can find this week’s episode here:
I’ve also resurrected the YouTube playlist. You can find this week’s episode here:
You can also find the Tech ONTAP Podcast on:
I also recently got asked how to leverage RSS for the podcast. You can do that here:
http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss
Transcription
The following transcript was generated using Descript’s speech to text service and then further edited. As it is AI generated, YMMV.
Tech ONTAP Podcast Episode 353 – NetApp Cloud Volumes Edge Cache
===
Justin Parisi: This week on the Tech ONTAP podcast, we talk about Cloud Volumes Edge Cache with Jaap van Duijvenbode.
Podcast intro/outro: [Podcast intro]
Justin Parisi: Hello and welcome to the Tech ONTAP podcast. My name is Justin Parisi. I’m here in the basement of my house and with me today I have a special guest to talk to us all about caching in the cloud, on-prem, wherever. Right. So Jaap or is it "yop" or "jop"?
Jaap van Duijvenbode: It’s "yap," it’s perfectly fine, thank you. Yap.
Justin Parisi: So, yeah, it looked Dutch and the J is a Y, right?
So, and then I’m gonna attempt the last name here – "van dooey-van-bode."
Jaap van Duijvenbode: That’s like 99% correct. That’s great. It’s "von dye-ven-bode". It’s a typical Dutch name.
Justin Parisi: "Dye-ven-bode." That’s right.
Jaap van Duijvenbode: Yeah, it dates back to the 1400s. So I’m not gonna blame you.
It’s great to be here on the show. My name is, like you said, Jaap van Duijvenbode. I’m a principal technologist here at NetApp and Cloud Evangelist, and what I focus on is helping customers to solve the challenges around distributed storage and help them in their journey to the cloud to be able to kind of migrate their file services off premises into hybrid or public cloud.
So that’s a bit about me.
Justin Parisi: All right. So, if we wanted to reach you, how do we do that?
Jaap van Duijvenbode: Well, I got a few social media channels. You can look me up on Twitter at @jvduijvenbode, or on LinkedIn. Find me there. I’m gonna kick off with some Discord and Twitch stuff later onwards, and you can find my stuff on YouTube as well. So I have a YouTube channel with all kinds of cool videos, walk-through demos, overviews, but also some technical in-depth stuff.
Justin Parisi: You mentioned migrating to the cloud or from the cloud or, whatever.
So why would someone wanna do that? What are some of the use cases that you’re seeing out there where people are really taking a serious look at moving in or out of the cloud?
Jaap van Duijvenbode: Yeah, I think a lot of companies have been challenged with managing distributed storage for decades.
When I started in IT I started in the Novell days. Some of you may still remember. That gradually evolved into Microsoft Windows NT and Windows servers and all of that data has kind of sprawled over the decades, right? So we see customers that have distributed environments with a lot of distributed locations, remote offices, branch offices, construction sites, manufacturing plants.
And that data has grown organically over time. And they see exponential data growth issues with regards to unstructured file level data. And for many, many years, organizations have been trying to manage those islands of data with local backup, local data management, local security, and it’s a pain in the neck because a lot of customers, they can’t really deal with the ramification of the exponential data growth that comes with it.
So they want to leverage the cloud to subsequently make the data a little bit more intelligent, right? So instead of worrying about blocks and clusters and storage and data management, they want to protect and secure their intellectual property and the information that that data represents.
So basically, our customers and the discussions that I’m having are typically around streamlining storage infrastructure and the cost associated with managing that infrastructure and subsequently being able to augment and leverage the cloud for that purpose, to put more structure around the unstructured data sets that they’re managing.
Justin Parisi: So another challenge that you’ve touched on a little bit, but when you have a very, very large data set, like multiple petabytes of data in a data lake or something, you don’t necessarily wanna move that entire thing. You only wanna move the stuff that you need. And I kind of equate it to like if you go on vacation, you don’t bring your bed, right?
No. You bring a suitcase, you bring some clothes, right? Exactly. You don’t always need to bring all your data with you. So what do we have out there that helps us take that concept of a data lake but shrink it down a bit so we don’t have to move so much data, it doesn’t cost us as much to migrate.
Jaap van Duijvenbode: Yeah, I think that’s a great topic and you put that really well. You don’t want to bring your bed and you don’t want to bring your garage and all. Truth be told, 80% of a customer’s data is unstructured and of that workload, we typically see that only 2% of that data is actively used on a day by day basis. So traditionally, customers would just throw disk shelves and storage at that problem of data growth and the ability to present that data locally in distributed locations or in data centers.
But the majority of that data shouldn’t even be there, right? So if you look at 98% of the data, it shouldn’t even be on primary tier one storage. It should be out there somewhere in a deep archive or should be in a cold object tier that could be leveraged for the purpose of long-term data retention.
And that’s what customers really love about the idea of the hybrid multi-cloud message that we deliver and these types of options for us to integrate different technologies to put more structure around that unstructured file data set, but also around the cost and the impact of cost on that exponential data growth we’re talking about.
So combining tiering capabilities in the cloud, combining lower cost object storage and leveraging primary SSD for performance tiers, and combining that into a single set of data, which becomes your primary file system in the cloud. That is something that our customers have articulated over the years to say, Hey, we would like to get to this perfect world scenario of a centralized single set of data in the public cloud or in our data center, and then have the ability to better manage it, better protect it, secure it, monitor it, all of the things that we need to do with that data, but not at the premium as if everything sits on our tier one storage platform, but really leverage the cloud for the purpose of scale, flexibility and also lower cost storage.
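To put rough numbers on the percentages quoted above, here is a minimal Python sketch. The 80% and 2% figures come from the conversation; the 1 PB estate size and the function name `estimate_tiers` are assumptions made up for illustration, not NetApp sizing guidance.

```python
def estimate_tiers(total_tb: float,
                   unstructured_fraction: float = 0.80,
                   active_fraction: float = 0.02) -> dict:
    """Split a data estate into an actively used (hot) set and a cold remainder.

    The default fractions are the rules of thumb quoted in the episode:
    about 80% of data is unstructured, and about 2% of that is used day to day.
    """
    unstructured_tb = total_tb * unstructured_fraction
    hot_tb = unstructured_tb * active_fraction
    cold_tb = unstructured_tb - hot_tb
    return {
        "unstructured_tb": unstructured_tb,
        "hot_tb": hot_tb,    # candidate for primary storage or an edge cache
        "cold_tb": cold_tb,  # candidate for a lower cost object or archive tier
    }


# Hypothetical example: a 1 PB (1,000 TB) estate.
tiers = estimate_tiers(1000.0)
for name, value in tiers.items():
    print(f"{name}: {value:.1f} TB")
```

Under these assumptions only a small slice of the estate needs to sit on fast storage, which is the argument for tiering the rest.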
Justin Parisi: So what does NetApp offer for that type of use case? What do we have that can automate that to make sure that you don’t have to move things on your own, everything is done programmatically through this option that we have.
Jaap van Duijvenbode: Exactly, and that’s what we want to talk about today. Cloud Volumes Edge Cache is the latest addition to the Cloud Volumes ONTAP portfolio. It combines our Cloud Volumes ONTAP storage platform in the cloud, our backup and recovery services that are now offered through BlueXP, and what used to be our Global File Cache technology. Global File Cache is still available today as a standalone product, but it’s now fully automated and integrated within Cloud Volumes Edge Cache, CVEC for short, so I’m gonna refer to it as CVEC.
And CVEC has been built to give customers the option not only to better manage their data and storage assets, but also to leverage caching technology to cache only what is actively used at these edge locations. And an edge location can be a distributed office, a remote office/branch office environment, but also regional data centers that you may have where you’re running your file services.
And the benefit of caching is basically you can present multi petabyte file sets, but only cache a number of terabytes at a specific edge location – only what is needed and what’s relevant to that specific location as users are consuming that data, either on demand as they’re browsing through their file shares and their folder structures, or as they’re opening large files: you know, traditional office documents, large CAD drawings, complex 3D models, and designer data.
They actually find the benefit of local performance with the consistency of a global file system and a global file set, and that’s what makes Cloud Volumes Edge Cache really unique in our portfolio, because this doesn’t only talk about the cloud. It doesn’t only focus on the principles of the hyperscaler and the integration of storage and data management, but it extends out to the edge where the users reside, where the users, the applications, and some of these services are interfacing with that data on a daily basis.
And to be able to give local type of performance in those locations with the consistency and the coherence of a single file set really adds a lot of benefits for customers, gives ’em the benefits of consolidation. They can lift and shift their traditional monolithic Windows file server storage away, move that away from their branch office environments, move that into the cloud, replace that with the cache, and drastically streamline and reduce the cost of ownership with regards to that infrastructure and the data management that goes with it.
And then as a result, which is something that is often spoken about in our EBCs and workshops, if you get to that idea of a centralized single set of data or a single source of truth, you now have the ability to collaborate in real time. So if you’re an architectural firm or a construction firm and you’re working on large design files, CAD files, 3D models, you can now collaborate across the globe on a single set of data with the full coherence and consistency of that data set. That means that through file locking, through local caching, through streaming of delta blocks over the wide area network, you can provide a local file system with the consistency of a global file system in the cloud, which gives the users the capabilities to work either in a "follow the sun" type of mechanism or in real time with multiple offices as if they’re all sitting in the same office. That makes Edge Caching really unique in terms of how it enhances your productivity capabilities, while it eliminates the challenges of the IT administrators and of those that have to manage all of those assets to move everything centrally, so you can perform all of your data management tasks, your compliance, your governance, all of that centrally in the public cloud.
For the last decade, we’ve been working with customers to help them in their journey towards consolidation. Think about an engineering firm that goes into an airport terminal. They put in this big laser scanning device, and it basically uses a software package that subsequently translates the actual point cloud laser scan data into a 3D model. You’re talking about multiple terabytes of data. Within a few minutes and hours, you can see a massive increase in terms of the data that is being generated offsite, and that has to be ingested somewhere. So think about the concept of massive data that gets created through drone images on construction sites or telemetry in motor racing. For example, with Porsche Motorsports, we’ve done a project to help them move multiple terabytes of data from the racetrack to the factory, process it in the cloud and make better decisions to start winning races. So all of that goodness comes together, especially as it evolves around large data and large file sets and specifically file shares, right?
How do you make file shares available on a global scale with local performance? And I think everyone on the session that is listening understands some of the ramifications of the SMB protocol, or some of us may still refer to it as CIFS, but that’s a highly chatty protocol.
So if you are using that protocol over the wide area network, you’re kind of dead in the water because every millisecond of latency has a compounding effect on performance, and it degrades the performance because every single SMB block needs to be acknowledged in sequence in order to get it across the wide area network.
So by leveraging caching and streaming, and optimizing that data movement and the data transfer over the wide area network, you’re able to not only maintain the principles and the construct of the SMB protocol where the users will interface with that data, but you’re eliminating the checks and the constraints of high latency and low bandwidth in order to provide that local experience with the consistency of a global file set.
And that’s why large file shares in engineering, construction, manufacturing, design, all of those different verticals, make a perfect fit for this technology, but also financial institutions that deal with large Excel files and need to collaborate over the wide area network between AsiaPac, EMEA and the Americas to make better decisions in investment trading portfolios, et cetera.
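The compounding effect of latency described above can be illustrated with a deliberately simplified Python model. It assumes the strictly sequential case from the conversation, where each block must be acknowledged before the next one is sent, and compares it with a pipelined stream that pays the round trip roughly once. Real SMB dialects allow several outstanding requests, so treat this as a worst-case sketch. The function names, file size, block size, round-trip time, and bandwidth are all illustrative assumptions.

```python
def sequential_transfer_seconds(file_bytes: int, block_bytes: int,
                                rtt_s: float, bandwidth_bps: float) -> float:
    """Each block waits one full round trip for its acknowledgement."""
    blocks = -(-file_bytes // block_bytes)  # ceiling division
    return blocks * rtt_s + file_bytes * 8 / bandwidth_bps


def streamed_transfer_seconds(file_bytes: int, rtt_s: float,
                              bandwidth_bps: float) -> float:
    """Blocks are pipelined, so the round-trip latency is paid about once."""
    return rtt_s + file_bytes * 8 / bandwidth_bps


# Hypothetical inputs: a 100 MiB file, 64 KiB blocks, 100 ms RTT, 100 Mbit/s link.
FILE_BYTES = 100 * 1024 * 1024
BLOCK_BYTES = 64 * 1024
RTT_S = 0.100
BANDWIDTH_BPS = 100_000_000

seq = sequential_transfer_seconds(FILE_BYTES, BLOCK_BYTES, RTT_S, BANDWIDTH_BPS)
stream = streamed_transfer_seconds(FILE_BYTES, RTT_S, BANDWIDTH_BPS)
print(f"sequential acknowledgements: {seq:.1f} s")
print(f"pipelined stream:            {stream:.1f} s")
```

With these inputs the file splits into 1,600 blocks, so the sequential model spends 160 seconds waiting on round trips alone, regardless of how much bandwidth is available.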
Justin Parisi: So let’s talk a little bit about how this actually works. I would imagine that the edge caches themselves are read/write. Is that accurate?
Jaap van Duijvenbode: Yeah, so basically the way that Cloud Volumes Edge Cache is presented is that it’s available in the marketplace.
So if you go to, let’s say, Microsoft Azure, and you wanna enable Cloud Volumes Edge Cache, what you get is a storage platform built on Cloud Volumes ONTAP, with the backup service embedded within the offering. Then subsequently, we’re extending the capability of that central storage platform through what we call our edge caching software. That edge caching software runs on Windows Server and provides what we call a virtual file share at the edge location, which means a user would be able to map a drive or kind of access their global namespace to be able to interface with that local version of the file system that is centrally stored and protected and managed in CVO, in the cloud.
So by leveraging Windows Server as the base platform for this software, as you deploy the software either on commodity hardware or on a hypervisor infrastructure, you now create, I’m not gonna use the word replica, but kind of a realtime view of your central dataset through a local SMB file share that is accessible to the users.
And the way that this technology works, it basically, like I said, integrates with a global namespace like DFS namespace, so it’s really built for the Microsoft enterprises where users are used to their yellow folders, their Windows Explorer view, their drive mappings to be able to access that data on their file server. But now their file server may be thousands of miles away or hundreds of milliseconds away, but they still maintain that local aspect of performance through caching, for which we use the NTFS file system on Windows Server to cache the active data sets either on demand or through what we call pre-population.
And the benefit of being a file-aware solution or file-based technology is that you augment and you leverage all of the principles of the SMB protocol, so all of the different versions and dialects, but also the NTFS file system, which includes obviously the data itself, the metadata, but also the extended attributes and the permission structures as you would know them from the Microsoft world.
So unlike other technologies in the marketplace that try to solve this problem around distributed storage, we don’t have to mimic CIFS or SMB. We’re natively sitting on the Windows platform to be able to deliver that experience to the users. We don’t have to decouple metadata or replicate metadata between different sites like other vendors do in order to make a solution as such work, which makes it highly scalable, but also flexible to deploy.
We have customers with maybe five sites and 20 terabytes of storage, and we have customers with hundreds of sites globally and multi petabyte environments where they’re actively collaborating around the globe 24x7x365 in order to get their productivity up to a hundred percent level.
So that’s obviously where the Cloud Volumes Edge Cache solution really thrives. It brings two worlds together. It brings the cloud world together with the Microsoft world, where customers are used to managing Windows server infrastructure. Instead of managing a file server, they are just managing a Windows server with a cache, and that subsequently correlates and coexists with additional roles and features that you might require in your environment.
So, think about software distribution, think about AD, DNS, DHCP or infrastructure roles and services that you may need. You may already have a Windows server for that today. You could refactor that and repurpose that as a file cache that is much more intelligent, requires way less management, doesn’t require any backup because all of your backups have now been centrally consolidated.
And also is very low cost in terms of infrastructure footprint as it runs on commodity hardware or your existing hypervisor infrastructure.
Justin Parisi: As I understand caching with this sort of workload, the hard part is not getting the data to the edge. The hard part is dealing with the locking in the SMB protocol.
How does CVEC handle that global file locking? How does a lock from site A get honored on site C halfway across the world?
Jaap van Duijvenbode: Yeah, that’s a great question and it’s a topic that comes up a lot in our customer conversations, where they want to introduce better ways of collaboration, because historically speaking, we’ve never been able to effectively collaborate on large traditional file systems in such a way, right?
So, if you ask Microsoft, they want you to put all of the data in SharePoint and use check-in/check-out and document management, document control. And you know, this sounds a bit utopian. I think a lot of customers want to get to that state of managing everything in a more structured manner, but we do know that 80% of the data is indeed unstructured and file services are here to stay.
And that’s what I’m seeing in my conversations with customers. The file services estate continues to grow. We just have to make more intelligent decisions around how do we present this to our users and how can our users get more value out of a file system? And some parts may end up in OneDrive, other parts end up in SharePoint or document management, document control types of environments, but the majority of our workflows and our workloads are associated with that unstructured data set. So yeah, to your point, file locking is critical because it maintains, and it guarantees data integrity for users that are actively working on their data. They don’t wanna see a productivity loss.
They don’t wanna have to deal with restoring snapshots, which is, by the way, something we do really well. But, you wanna make sure that you can maintain the level of productivity where there’s no data at risk, there’s no data integrity issues whatsoever. So our locking capabilities have been built on the fundamentals of a Windows server.
So basically what we do, if you look at the way that we interface with the file system, as a user interfaces with the edge, it’s an Edge VM instance or running on a Windows server and interfacing with that virtual file share, which is a UNC path that you would normally access through drive mapping or DFS namespace.
That edge is communicating with what we call a core instance, which is currently a VM instance, but soon will be migrated into more cloud native technologies and container-based technologies. But our core subsequently interfaces with the CVO instance. So the CVO instance is basically where you have your SVMs and your SMB endpoints, and that’s where you manage your data.
That’s where you manage your backups. That’s where you associate different types of snapshot and data retention policies with that central dataset. But CVO is basically where the file lock is taken out. So if a user from an Edge location opens a file, what happens is, the user navigates through their file share, obviously has authenticated, and double clicks the file, let’s say a simple file like a Word document.
The edge will communicate with the core, do a quick hash check to see if there’s already a cache copy in the edge, yes or no. If there’s not, we have to stream the delta differences over the wide area network to update the file and the cache. But we’ve taken out a central file lock on CVO on behalf of that user at the edge, and in that case, that user is guaranteed read/write access to that file if he is the first one to open up that file.
If someone else from site B or another distributed location tries to open up that same file like a millisecond thereafter, they’ll get a message that the file is in use by, let’s say, Jaap in Amsterdam, where I’m based. And you get the prompt the same way as a Microsoft server would actually offer you, say, "Hey, do you want a read-only copy of the data?
Do you want to get a notification when the file is available for read/write? Or do you want to merge the contents later and save a copy locally, elsewhere?" And if you click "I wanna get a notification," we get the benefits of the SMB notify. Then let’s say I make the change in Amsterdam, make some incremental updates to my document.
I save it, I close it at the end of the day, and the other user receives a notification. "This file is available for read/write. Do you want to open up the file?" It basically triggers a rehydration for only the delta blocks over the wide area network between the core and the edge, and subsequently the user will get access to the latest and greatest version of the file and would then have the read/write lock that sits on CVO. And that makes this really consistent because all of the locking is built on centralized locking on CVO.
That also helps if you have users that are accessing CVO directly. You may have an in-cloud process that basically does some sort of rendering or high performance compute that accesses the file shares directly on the backend. Even if you extend the data out to cache locations, all the locking semantics and principles are still maintained on a per-file basis in a native fashion, as you would normally recognize from a Windows server or from an ONTAP system to guarantee that level of integrity.
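The open, lock, and notify sequence described above can be sketched in a few lines of Python. This is a toy model built on assumptions, not CVEC’s or ONTAP’s actual implementation: the class `CentralLockTable`, its method names, and the example path are hypothetical, and the table simply stands in for the central location (CVO in the conversation) where the lock is held.

```python
from dataclasses import dataclass, field


@dataclass
class CentralLockTable:
    """Hypothetical stand-in for the central store that owns every file lock."""
    holders: dict = field(default_factory=dict)  # path -> (user, site)
    waiters: dict = field(default_factory=dict)  # path -> [(user, site), ...]

    def open_file(self, path: str, user: str, site: str, notify: bool = False) -> str:
        """First opener gets read/write; later openers get read-only access."""
        holder = self.holders.get(path)
        if holder is None:
            self.holders[path] = (user, site)
            return "read-write"
        if notify:
            # Remember who asked to be told when the file becomes available.
            self.waiters.setdefault(path, []).append((user, site))
        return f"read-only (in use by {holder[0]} in {holder[1]})"

    def close_file(self, path: str, user: str, site: str) -> list:
        """Release the lock and return the users who asked to be notified."""
        if self.holders.get(path) != (user, site):
            return []  # only the current holder can release the lock
        del self.holders[path]
        return self.waiters.pop(path, [])


locks = CentralLockTable()
doc = r"\\corp\projects\design.docx"  # hypothetical UNC path

print(locks.open_file(doc, "jaap", "Amsterdam"))
print(locks.open_file(doc, "colleague", "Site B", notify=True))
print(locks.close_file(doc, "jaap", "Amsterdam"))
print(locks.open_file(doc, "colleague", "Site B"))
```

Because the lock state lives in one place, no edge needs to know what any other edge has open; each one only asks the central table about the file it is trying to open.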
Justin Parisi: Integrity aside, how do you handle the performance problem? I know that that’s one of the main issues with this distributed file locking is that there’s gonna be some performance issues because you gotta replicate lock states and everybody’s gotta be aware of everything. How does CVEC do that?
Jaap van Duijvenbode: That’s a great question. I think that’s where we’re unique in the marketplace because if you look at other solutions out there, they have to actively replicate data or metadata associated with the locking state to tell these controllers that are running proprietary file systems and emulations of Samba at the edge, to say, "Hey, this file is unlocked by this user in that site." We don’t have to do that because all of the locking state is managed and maintained at the central location. So if I’m in Amsterdam and if I have a file open, I don’t need to know what files are open by the users in New York or elsewhere, right?
So I could have infinite scale at that point, and I’m not impacted by chattiness or the abundance of replication traffic that is required in all those other solutions in the marketplace. Replicating metadata for the state of the file system, replicating metadata, who has the latest and greatest snapshot, replicating ACL structures and NTFS permissions throughout my ecosystem of controllers, that is not required with Cloud Volumes Edge Cache, because each edge is unique. I could have a hundred terabyte file set and my edge in Amsterdam right here in my backyard could be a one terabyte cache, and your cache may be 10 terabytes depending on your active data set. I don’t need to know about your cache, and you don’t need to know about mine.
So basically what happens in the cache stays in the cache, but all of the consistency is guaranteed by that centralized locking model. And because we’re efficiently moving, using our streaming protocol over the wide area network between the edge and the core, it only takes us a couple of milliseconds to check whether or not the file lock is taken out and whether or not we can take out a file lock on behalf of that user at the edge, which makes this highly scalable and flexible and performant at the same time.
And then in terms of performance, obviously there’s a few secret ingredients, quote unquote, or secret sauce that enables the technology to move the data faster from a networking perspective – large windowing, streaming, compression. But we also leverage delta differencing capabilities.
So if you have a large file that is being served from the data center, or from the cloud, if you haven’t touched that file in, let’s say 30 days, and 20 percent of that file has been updated elsewhere, you only have to fetch that 20% delta increment in order to update the latest and greatest version of the file in the cache and serve it to you as an end user.
So there’s intelligence built into our protocol that helps to overcome some of these performance bottlenecks that you would typically see with other solutions or solutions that do write-through or write-around technologies. For example, in Microsoft you used to have BranchCache and you have Azure Files and Azure File Sync; they use those types of semantics in order to be able to replicate data. We don’t replicate. We really only cache, and we stream and compress over the wide area network to make it most efficient and highly performant as well.
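Delta differencing as described above can be illustrated with a small Python sketch that compares block hashes and transfers only the blocks that changed. The conversation does not describe the actual protocol, so fixed-size blocks, SHA-256 hashing, and the function names here are assumptions chosen for clarity. The tiny block size just keeps the example readable.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(cached: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> list:
    """Indices of blocks in `current` that differ from the cached copy."""
    old = block_hashes(cached, block_size)
    new = block_hashes(current, block_size)
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


def apply_delta(cached: bytes, current: bytes,
                block_size: int = BLOCK_SIZE) -> tuple:
    """Rebuild the current file from the cache plus only the changed blocks.

    Returns the rebuilt bytes and the number of blocks that had to be fetched.
    """
    delta = {i: current[i * block_size:(i + 1) * block_size]
             for i in changed_blocks(cached, current, block_size)}
    total_blocks = -(-len(current) // block_size)  # ceiling division
    rebuilt = bytearray()
    for i in range(total_blocks):
        if i in delta:
            rebuilt += delta[i]  # fetched over the WAN
        else:
            rebuilt += cached[i * block_size:(i + 1) * block_size]  # served locally
    return bytes(rebuilt), len(delta)


cached_copy = b"AAAABBBBCCCCDDDDEEEE"     # what the edge already holds
central_copy = b"AAAABBBBXXXXDDDDEEEE"    # one of five blocks changed centrally

rebuilt, fetched = apply_delta(cached_copy, central_copy)
print("changed block indices:", changed_blocks(cached_copy, central_copy))
print("blocks fetched:", fetched, "of", len(block_hashes(central_copy)))
print("rebuilt matches central copy:", rebuilt == central_copy)
```

In this example one block out of five differs, which mirrors the 20 percent case from the conversation: the other four blocks are served from the local cache.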
Justin Parisi: If I’m understanding this correctly it sounds like we’re taking this notion of performance as perception, right?
So our users are really gonna be what dictates if something’s fast or not. Yeah. So if they’re local to the data and it’s fast for them, that’s all we care about because the backend can take a bit of time to go back if we need it to, cuz it doesn’t really matter. I think the only place where that really matters is if you need two sites to have access to a file immediately. That might take a little bit of time for that to update. Am I understanding that correctly?
Jaap van Duijvenbode: Yeah, that’s correct. And we leverage write back caching capability to accomplish that. So if a user performs a large write on the local file system at the edge, we use the write back cache and we can use as much bandwidth as is available to us, or we can throttle the bandwidth between what we call the edge and core in order to move that data in a write back fashion.
To your point, if you are collaborating in real time on a large engineering drawing, let’s say we’re working on this 3D model of a skyscraper that’s about a terabyte of data. You are working on the doors and I’m working on the windows. I need to make sure that as much as possible, my metadata for the actual model itself is almost like synchronously replicated, and we’ve built in features in order to accomplish that, to provide for those types of collaborative workloads and workflows, that allow you to move the data not only in a write back fashion, but also prioritize that data as well to be written to the backend in real time.
One of the things that I see in the field is, as much as we can obfuscate the impact of performance from the wide area network and keep that performance local to the users where the users would interface with their files, either on demand or through warming up the cache using our pre-population capabilities, you’re kind of really taking the burden off the user and letting the back end file system handle the writes, but also the commitments to the central storage platform that is used.
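The write-back behaviour described above, with throttled bandwidth and prioritized writes, can be sketched as a small priority queue in Python. This is only a conceptual model under stated assumptions: the class `WriteBackQueue`, the per-tick byte budget, and the file names are hypothetical and say nothing about how the product schedules its traffic.

```python
import heapq
import itertools


class WriteBackQueue:
    """Toy write-back cache: acknowledge locally, flush to the backend later."""

    def __init__(self, bytes_per_tick: int):
        self.bytes_per_tick = bytes_per_tick  # throttle for edge-to-core traffic
        self._heap = []                       # (priority, order, path, remaining)
        self._order = itertools.count()       # keeps FIFO order within a priority

    def write(self, path: str, size_bytes: int, priority: int = 1) -> str:
        """Queue a write for the backend. A lower priority number flushes sooner."""
        heapq.heappush(self._heap, (priority, next(self._order), path, size_bytes))
        return "acknowledged locally"

    def flush_tick(self) -> list:
        """Send as much queued data as the bandwidth budget allows.

        Returns the paths whose writes were fully committed during this tick.
        """
        budget = self.bytes_per_tick
        completed = []
        while self._heap and budget > 0:
            priority, order, path, remaining = heapq.heappop(self._heap)
            sent = min(remaining, budget)
            budget -= sent
            if remaining > sent:
                # Not finished yet: requeue the rest with its original position.
                heapq.heappush(self._heap, (priority, order, path, remaining - sent))
            else:
                completed.append(path)
        return completed


queue = WriteBackQueue(bytes_per_tick=100)
queue.write("skyscraper.model", 250, priority=1)     # large bulk write
queue.write("skyscraper.metadata", 10, priority=0)   # small, prioritized write

for tick in range(1, 4):
    print(f"tick {tick}: committed {queue.flush_tick()}")
```

In this sketch the small prioritized write is committed in the first tick even though the bulk write was queued first, while the bulk data drains over later ticks within the throttle.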
Justin Parisi: Earlier you mentioned things like SharePoint and OneDrive and DFS. How does CVEC differ from those? Gimme just the overview, the summary of how it differs from each of those different ways of sharing data…
Jaap van Duijvenbode: I think the major difference is that we’re talking about a real-time global file system. So all of your file shares that are centrally provisioned on CVO, they’re being cached at the edge on demand or through pre-pop.
It’s always gonna be the latest and greatest version of the file that you’re gonna access as a user. With DFS or DFS-R, you’re dependent on replication. You have no locking or no consistency in order to ensure that you always have access to the latest and greatest version of the file. With SharePoint, obviously you gotta move your entire data estate that you’re managing into a structured working environment, which is then built on client server technology in order to allow your users to interface with that data.
So all of your locking capabilities go away. All of a sudden you have to train your users to check-in/check-out. You have to adhere to specific workflows in order to get access to that data. And the performance issues are still there, especially in a global environment. So you may have changed the form factor of the data.
You haven’t really addressed the issue of having a local cached copy that you need for performance and consistency to be able to actively access that data as a user. So if you look at the market in general, I think caching is still a very unique approach for this specific workload and it becomes more and more relevant as obviously distance still continues to exist. Bandwidth is growing, so that’s not an issue anymore in most of the scenarios, but the consistency and having this idea of a global file system that looks and feels like a local file server is still very much relevant.
Justin Parisi: What about something like Avere? Like how does CVEC differ from a product like that?
Jaap van Duijvenbode: Yeah, that’s a good question. And obviously we see there are numerous solutions that are built like Avere, which are currently designed to push a lot of data through the pipe. So using UDP based data replication to just move as much data as possible.
We see this use case most of the time in data center to data center types of replication or very specific workflows, i.e., in the media industry, where that just doesn’t bother them: there’s no issues with consistency, there’s no requirements for file locking. There’s no requirements for the semantics of a file system with authentication and authorization principals on the NTFS file system or permission structures.
Avere and solutions like that are great technologies just to massively move large amounts of data through a UDP based pipe, and that’s basically a completely different use case. Our Edge Caching software is designed for the movement of data, but with the purpose of creating that single set of data in the cloud and the ability to cache, but also maintain all of the fundamentals around authentication and authorization, ACLs, permissions, central file locking, all that goodness.
Justin Parisi: Okay, so I know that if you’re NetApp centric, there’s also that obvious one that we need to talk about is FlexCache, right? So ONTAP does have the concept of edge caching within ONTAP natively, and if you’re using CVO, you have access to that. So talk to me about the differences between something like a FlexCache versus something like CVEC, and why would I use one over the other in certain use cases?
Jaap van Duijvenbode: I think both solutions really complement each other really, really well. Historically, FlexCache’s primary focus has always been NFS. Over the last years they’ve been adding SMB capabilities to it as well. Our Edge cache software, or Global File Cache, has always been focused on SMB and it’s been really focused on the Microsoft world.
So if you love managing ONTAP, and if you love the ability to leverage caching, FlexCache is a great technology. If you are dealing with file shares that are not only sitting on ONTAP, but also in other storage platforms of choice, and you’re managing a Microsoft centric environment, our Global File Cache or Edge cache solution that is part of CVEC is a great technology that subsequently enables that capability. It’s built for Microsoft Administrators, it’s built on Microsoft technology. It’s built for the Microsoft ecosystem, and therefore it fits really nicely into that SMB specific workload. I do see a lot of customers that combine those solutions, right?
They want to have a specific workload on NFS at a specific distributed location. They may have other workloads or other file system requirements around highly collaborative environments and real-time file locking and all that capability that is native within the stack and has been for the last decade.
That’s where they wanna use Global File Cache or the edge cache software that comes with CVEC.
Justin Parisi: So I know that ONTAP also can do multi-protocol now. So you mentioned NFS, and of course that won’t work with Global File Cache or CVEC. So what about when you have data sets that are accessed by both NFS and SMB? Could I use CVEC for the SMB caching, even if there’s NFS stuff going on the backend and other places?
Jaap van Duijvenbode: Yes, you can. Obviously, the downside of an environment as such is that depending on the duality of the protocol stack on the backend, you may or may not benefit from file locking the way that you normally would in an SMB ecosystem as a whole. So yeah, there’s considerations for that.
We do have customers, for example, that leverage ANF in a dual protocol fashion and subsequently use the SMB portion of that volume to extend that out to the distributed locations using our GFC software. That’s fully supported, but you can’t really control what happens on the NFS endpoint.
Justin Parisi: So you’ve talked a lot about CVEC being a part of the Cloud Volumes ONTAP infrastructure. What if I’ve got a mixture? What if I’ve got some Cloud Volumes ONTAP, but I’ve also got a lot of on-prem stuff. And really the reality is that a lot of our data sets are still on-prem and we haven’t migrated them to cloud.
How does CVEC interact with that sort of use case?
Jaap van Duijvenbode: That’s a massive opportunity we see there. I think 90% of my conversations with customers are exactly around that hybrid motion, right? We have our on-premises estate of NetApp infrastructure. We’ve been managing that for many decades, and we love it.
We do wanna kind of expand that into the cloud, but we don’t want to give up on our on-premises capabilities. So yeah, Cloud Volumes Edge Cache is really designed to deliver that hybrid, multi-cloud approach. So think about bootstrapping your "Cloud Landing Zone," as I would call it.
You create a Cloud Landing Zone in the cloud. That’s a file system. Could be 10 terabytes, could be 30, could be a hundred. That’s your starting point. And subsequently what you do with that file system, you unlock the capability of Edge Caching. So you now, as a user can see that centralized file system, but you can also attach that end user to your on-premises data center because by virtue of deploying and implementing Cloud Volumes Edge Cache, you automatically unlock the capabilities to integrate what we call our core instance with your on-premises infrastructure as well.
So now we can extend the file shares that may sit on a FAS or an all flash array out to that same distributed location. So as a user, I can see the files that are sitting centrally on CVO and the files that are sitting centrally in my on-premises data center in the same construct of the namespace, which really provides you that hybrid capability for different reasons, right?
So data sovereignty or data residency, some of the data may not be in the cloud, but I do want to present it at my remote office, branch office locations. Maybe the ability to provide for that idea of a cloud journey, not a lift and shift, or say, Hey, I’m gonna migrate my data on Friday, and I’m gonna hope for the best on Monday morning.
Because we know the reality. If you’re talking multi terabyte, multi petabyte scale, it’s gonna be a bit of a journey. So the ability to combine on-premises and then use our SnapMirror replication technology to move volumes and create that idea of data portability between your on-premises data center and the public cloud now gives you the options to move different volumes or workloads from one place to the other without disruption to the users. The users can still access the data as if it’s local through the cache.
Justin Parisi: Okay. So, it sounds like we have a lot of opportunity here to use Global File Cache or CVEC in a lot of different areas. So I know you touched on CVEC being this encapsulating solution, including Global File Cache and CVO. What about Global File Cache itself? Like when would we use something like that if we aren’t able to use a CVEC?
Jaap van Duijvenbode: The idea around CVEC is that customers want to kind of burn down their hyperscaler monetary commit. So in Azure, you have what’s called a MACC. So basically the customer buys some upfront commitment with Azure. They wanna spend that commitment through third party or native services in Azure, and CVEC is exactly built to do that.
So CVEC is what is called a marketplace eligible solution. So you can consume the technology, the whole stack of services through a PayGo or private offer model. That’s a different way of consuming versus GFC, because GFC traditionally has been sold or positioned as a standalone solution that extends Amazon FSxN, Azure NetApp Files, your on-premises data center.
Everything that is not CVO in Azure. And soon we’ll have CVO in Google supported as well. But GFC is traditionally designed to support everything that is NetApp ONTAP, and bringing the data close to the users at the edge caches. And that technology is still available today as a standalone solution, which is offered as a subscription by NetApp.
So this is not, we’re not selling that through the marketplace or as a private offer whatsoever. We’re selling that as a subscription and the way that customers buy has changed. I do have to say most customers are really excited about buying a platform that includes a cloud file system that includes backup and data protection and recoverability options and caching and visibility as a single solution, either through PayGo or a private offer. That’s what the major difference is. But for those customers that wanna benefit from GFC, or Global File Cache, it’s still out there today and we continue to make updates to the technology. We recently introduced the capability of managing and monitoring your Global File Cache software through our Cloud Insights platform, and released some cool dashboards to get you more visibility and telemetry through the platform. It’s here to stay. So Global File Cache is standalone. If you’re looking for a full platform opportunity and solution Cloud Volumes Edge Cache is the way to go.
Justin Parisi: All right, Jaap. Sounds like we got a lot to think about with Global File Cache as well as CVEC. It sounds like we’ve got a really good use case spelled out here and that we understand where everything fits, so we know when to use what and when not to use it. So if you could, tell me where to find more information about Cloud Volumes Edge Cache.
Jaap van Duijvenbode: Thank you very much. And I think we have a massive opportunity in front of us. Most of our technology is now available, especially in the cloud portfolio through our BlueXP page, which is bluexp.netapp.com.
There’s different categories. You’ll find category storage, mobility, protection, analysis, and control. Under mobility, you’ll find edge caching, which basically gets you through both Global File Cache, and our Cloud Volumes Edge Cache offering. So if you wanna get started, go there. And otherwise you can check us out in the marketplace on Microsoft Azure and in the near future on another hyperscaler that is very popular nowadays with us that we’ll be announcing soon, so stay tuned for this. And thank you very much for listening.
Justin Parisi: Ooh. Little teaser there.
All right, that music tells me it’s time to go. If you’d like to get in touch with us, send us an email to podcast@netapp.com, or send us a tweet @NetApp. As always, if you’d like to subscribe, find us on iTunes, Spotify, Google Play, iHeartRadio, SoundCloud, Stitcher, or via TechONTAPpodcast.com. If you liked the show today, leave us a review.
On behalf of the entire Tech ONTAP podcast team, I’d like to thank Jaap van Duijvenbode for joining us today. As always, thanks for listening.
Podcast intro/outro: [podcast outro music]