Behind the Scenes Episode 357: Cloud Data Sense – Data Governance and Compliance

Welcome to Episode 357, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

When I was a kid, my room was often a mess. Clothes on the floor, toys everywhere. My mom would try to stay on me to keep it clean, but it was mostly a losing battle for her. What usually inspired me to clean up, however, was when I couldn’t find something. Then I knew it was time to take inventory and get a bit more organized (for the time being, at least).

Data management isn’t much different – especially with large unstructured data lakes. Not only are there potentially millions – or billions – of files, but much of that data is redundant, unnecessary or obsolete. The problem is, if you haven’t been keeping your data sets clean, you are now stuck with the arduous task of finding out what is in your data.

This week on the podcast, Cecile Kellam (cecilek@netapp.com) and Michael Landau (mlandau@netapp.com, Michael Landau on LinkedIn) join me to discuss how Cloud Data Sense can quickly and easily help you find out what is in your data, with the end goal of tidying up your messy data sets.

For more information on Cloud Data Sense:

https://www.netapp.com/pdf.html?item=/media/56033-NetApp-Cloud-Data-Sense.pdf&v=20217231643

Finding the Podcast

You can find this week’s episode here:

I’ve also resurrected the YouTube playlist. You can find this week’s episode here:

You can also find the Tech ONTAP Podcast on:

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Transcription

The following transcript was generated using Descript’s speech-to-text service and then further edited. As it is AI-generated, YMMV.

Episode 357: Cloud Data Sense – Data Governance and Compliance
===

Justin Parisi: This week on the Tech ONTAP Podcast, we talk about data governance and compliance with BlueXP Cloud Data Sense with Cecile Kellam and Michael Landau.

Podcast intro/outro: [Podcast intro]

Justin Parisi: Hello and welcome to the Tech ONTAP podcast. My name is Justin Parisi. I’m here in the basement of my house and with me today I have a couple of special guests to talk to us about data governance and compliance. So to do that we have Cecile Kellam. You may recognize Cecile, but Cecile, for people who don’t recognize you, what do you do here at NetApp and how do we reach you?

Cecile Kellam: Hey, Justin. Thank you for having me again. I’m a sales manager here who works between the product teams building the latest and greatest things we’re rolling out at NetApp and the different sales teams and partners, to ensure we have a good flow from product to sales, basically.

Justin Parisi: All right, that seems important.

How do we reach you?

Cecile Kellam: You can reach me either on LinkedIn at Cecile Kellam or via email at cecilek@netapp.com.

Justin Parisi: All right. Also with us today we have Michael Landau. So Michael, what do you do here at NetApp and how do we reach you?

Michael Landau: Well, thank you very much, Justin. I work on the product team for Data Sense, which is under our BlueXP classification offering, and I can be reached on Teams at mlandau@netapp.com, on LinkedIn, or any other way on social media.

Justin Parisi: All right, we’ll be sure to include those links in our blog that accompanies this podcast. So I heard BlueXP and I know what it is and some people know what it is, but maybe others don’t. So, Cecile, you are our BlueXP explainer.

So XP-lain it to us.

Cecile Kellam: Thank you, Justin. Yeah, BlueXP is our answer to the evolved cloud state that we’re seeing with our customers. We had often heard that people were going all in on the cloud, staying all on-prem, or going all in on AWS. But what we’ve actually seen over the past few years is that it’s not as clear-cut as that.

Sometimes you need to bring things back on-prem from the cloud, or you find that something would run much better in the cloud, or you’re in AWS and you need to get a footprint in Azure. Whatever that looks like, BlueXP is our answer to supporting a data estate that goes beyond NetApp on-prem. From the NetApp perspective, we serve block, file, and object-based storage with BlueXP, but we’re also able to work with all of your different clouds. So regardless of where that storage sits, BlueXP gives you management and control over that entire data estate.

Justin Parisi: So sounds like it’s just everything, right? Is that all it is?

Cecile Kellam: It is our answer to trying to be everything.

It’s built on a common services layer of APIs, so while it might not be everything yet, it leaves itself open to go wherever best serves our customers. And that includes the evolving cloud state, because we don’t know where it might go.

Justin Parisi: It has the goal of being everything is what I’m hearing.

Cecile Kellam: It does.

Justin Parisi: Alright. So an important part of the cloud, on-prem, and a lot of other areas is data governance and compliance, and this podcast is going to be about that sort of thing. So let’s talk about what data governance and compliance are. Can you give me a high-level description? Those are buzzy-sounding words that people may have heard but may not fully understand, so if you could do that for me…

Michael Landau: Absolutely. I’ll jump in with that. For years we’ve been creating significant volumes of data, and what we call data sprawl is the new order in our organizations. Data growth is happening on-prem and in the cloud, and the reality is that most of the people in our organizations who have to manage that data don’t know what’s in it. They don’t own it, and they didn’t create it. So Data Sense looks at it from a governance and compliance perspective and says, let’s use the latest technology (AI, machine learning, and natural language processing) to understand what’s in the data: classify and categorize it by business topic, and let you understand who has access to it, who shouldn’t have access to it, what’s in it, and what risks it contains. First you know what’s in your data; then you can plan accordingly and act on it. Governance, compliance, and risk differ from one organization to the next, so knowing what’s in your data is the foundation for a solution that helps you not only comply with your existing policies, but build new policies as data grows and expands.

The idea of governance and compliance through this type of solution is to let you know what’s in your data, see the conditions, and take action on them.

Justin Parisi: So why might seeing what’s in that data be important? Like what sort of rules and regulations apply to data that people maybe don’t think about all the time?

Michael Landau: Well, rules, I’d say, are your internal business policies and regulations. I like to call it the alphabet soup of the regulatory landscape. We have things like GDPR, the California Consumer Privacy Act (CCPA), freedom of information laws in certain parts of Canada and across Canada, and other privacy regulations. Then you have HIPAA compliance, PCI (Payment Card Industry) compliance, and privacy risk obligations. There’s an entire landscape of regulatory obligations, and of course that’s just for ordinary businesses. Then you go into the regulated industries, and there’s an entire second level of regulatory compliance with organizations like FINRA and the SEC and more.

But I think it’s also important to know that each of our businesses, each of us who has data populations and is trying to manage them, has business policies. What data is a business record? How long do we need to keep data? Are there different retention policies for, let’s say, payroll information, sales information, marketing information, product or manufacturing information, or the financial results of the organization? You need an understanding of what data needs to be kept, how long it needs to be kept, and where it needs to be kept.

Should it be encrypted? Does it have risk in it? Those are the foundational questions, and by knowing what’s in your data, you get to right-size it, right-locate it, and right-comply: encrypt it if it should be encrypted, delete it if it should be deleted, and, one of the growing needs, return it to the consumer or organization it belongs to if requested.
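
To make the retention questions above concrete, here is a minimal Python sketch that flags files whose category’s retention period has elapsed. The category names, the day counts in `RETENTION_DAYS`, and the `Record` type are hypothetical placeholders, not Data Sense settings; in practice the category would come from a classification step and the periods from your own records policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention schedule (days per business category).
# Real values come from your own legal and records-management policy.
RETENTION_DAYS = {
    "payroll": 7 * 365,
    "sales": 5 * 365,
    "marketing": 2 * 365,
}


@dataclass
class Record:
    path: str
    category: str  # e.g. assigned by a classification step
    created: date


def past_retention(records, today):
    """Return the records whose category's retention period has elapsed.

    Records in a category with no schedule are skipped, not flagged, so an
    unrecognized category never queues data for deletion by accident.
    """
    expired = []
    for rec in records:
        days = RETENTION_DAYS.get(rec.category)
        if days is not None and today - rec.created > timedelta(days=days):
            expired.append(rec)
    return expired
```

Skipping unknown categories is a deliberately conservative choice: anything unclassified stays put until someone decides what policy applies to it.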

Justin Parisi: As far as I know, GDPR and the California Consumer Privacy Act are pretty similar. And I guess if you’re a person who wants to see what data is out there about yourself, you can request a list of it, and then you can, you know, require the organization to delete that data.

Is that how that works?

Michael Landau: Absolutely, and that’s one small set of features within Data Sense and our governance solution. It lets you put in a user name or an email address and find all the information in the monitored data related to what we call a data subject.

It’s called a data subject access request, or subject access request. Organizations that have to comply with the various privacy regulations around North America and around the globe, especially GDPR and the CCPA and the laws that follow them, are obligated to locate any information they keep on an individual. On request, you’re required to report back to that individual on your rights to use and keep that information if there is a business reason, the retention policy, whether there have been third-party transfers, and the nature of what’s in that information related to personally identifiable or sensitive personal information. You may be required to anonymize it or obfuscate it.

You may be required to return it or delete it. With disparate file shares, on-prem, cloud, multiple solutions, and databases, it has become an arduous task for organizations to respond within the regulated time frame, which depends on the type of organization or individual making the request. If you have 30 days to respond, you first have to know what data you have and where it is. So organizations are bringing solutions like Data Sense into their environment so that they know what’s in their data and can comply in the event they get requests.
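
A subject access request starts with finding every file that mentions the data subject. As a rough illustration only (Data Sense’s own matching is AI/NLP-based, and this is not its implementation), the following Python sketch walks a directory tree and reports which files contain a given email address or name as literal text. `find_subject_files` is a hypothetical helper.

```python
import os
import re


def find_subject_files(root, identifiers):
    """Walk a directory tree and return {path: sorted identifiers found}.

    Matching is case-insensitive literal text, so it only sees plain-text
    content; real tools also parse office documents, PDFs, and databases.
    """
    patterns = [re.compile(re.escape(i), re.IGNORECASE) for i in identifiers]
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip rather than abort the search
            found = sorted(i for i, p in zip(identifiers, patterns) if p.search(text))
            if found:
                hits[path] = found
    return hits
```

Calling `find_subject_files("/mnt/share", ["jane.doe@example.com", "Jane Doe"])` on a mounted share would return the candidate files to review for the request.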

Justin Parisi: And I would imagine that, beyond having millions of files out there to sort through, moving that data in and out of the cloud complicates this further, since the cloud adds another layer of complexity for compliance. So can you describe the challenges we run into with compliance and governance when we try to move our data to the cloud?

Michael Landau: You’d be surprised (and I’ll start with the fun one) how many times we find data in the cloud that lost its ACLs or permissions and is open to the public. So it begins with thinking about what data belongs in the cloud in the first place. And if data’s already in the cloud, you need to be able to sweep across it and scan it at scale, with speed, to know if there’s personally identifiable information, payment card information such as credit card numbers, Social Security numbers, or sensitive personal information. That’s the standard set of items. But we’ve added a whole new series of capabilities around business identifiable information, so individuals, product names, and proprietary concepts can be added. If you’re keeping data in the cloud, right-sizing the risk, deciding what data should be in the cloud and what data shouldn’t, is a critical path for organizations today.
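
For a feel of what the “standard set” of detections involves, here is a deliberately simple, rule-based Python sketch that finds candidate payment card numbers (validated with the Luhn checksum that card numbers use) and US Social Security numbers in the `NNN-NN-NNNN` format. This is a swapped-in regex technique for illustration, not how Data Sense classifies data; the patterns are assumptions and will miss formats they do not describe.

```python
import re

# Illustrative patterns only: 13-16 digits with optional single spaces or
# hyphens between them, and SSNs written as NNN-NN-NNNN.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Double every second digit from the right; subtract 9 if the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return card-number candidates that pass Luhn, plus SSN-shaped strings."""
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    ssns = SSN_RE.findall(text)
    return {"card_numbers": cards, "ssns": ssns}
```

The Luhn filter is what keeps random 16-digit strings (order numbers, IDs) from being reported as card numbers; pattern rules like these still produce false positives, which is the gap ML and NLP-based classification aims to close.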

And you need to be able to look at your data and do more than the old-fashioned lift and shift (move from this server to that cloud), using artificial intelligence to understand business silos of data and tranches by age in your data population. Do you need to move 20-year-old data into a cloud and pay for it monthly, or can that data be tiered offline or to less expensive permanent storage, or is it needed at all? And then you need to make sure that the data you keep in the cloud doesn’t evolve to contain sensitive personal information, personal information, or business-confidential or critical information that should be on immutable storage, offline, or encrypted. So it’s data hygiene from the perspective of moving the right data into the cloud and making sure that the data in the cloud is appropriate for that type of storage.
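
Bucketing data into tranches by age is simple to sketch. The following Python example totals bytes per age bucket from `(size, mtime)` pairs, such as those available from `os.stat()`. The bucket boundaries are hypothetical, and it uses last-modified time as a stand-in for age.

```python
import time
from collections import Counter

# Hypothetical age buckets: (upper bound in years, label).
BUCKETS = [(1, "<1y"), (3, "1-3y"), (7, "3-7y"), (20, "7-20y")]
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def age_bucket(mtime: float, now: float) -> str:
    years = (now - mtime) / SECONDS_PER_YEAR
    for limit, label in BUCKETS:
        if years < limit:
            return label
    return "20y+"


def tranche_sizes(files, now=None):
    """files: iterable of (size_bytes, mtime_epoch). Returns total bytes per bucket."""
    now = time.time() if now is None else now
    totals = Counter()
    for size, mtime in files:
        totals[age_bucket(mtime, now)] += size
    return dict(totals)
```

Feeding it `(st.st_size, st.st_mtime)` for every file on a share gives a quick view of how much capacity, say, the 20-year-old tranche is consuming before anyone decides to migrate it.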

Justin Parisi: Let’s talk about the cost of all this, right? When you have data out there that’s unprotected, or in the wrong place, or accessible to people who shouldn’t have access, I would imagine there’s a business cost in terms of, you know, loss of trust and that sort of thing.

But there’s also regulatory fines. So how expensive is it to mess that up?

Michael Landau: Well, I think it’s far less expensive to clean it than it is to mess it up. In the GDPR landscape, fines can reach 4% of an organization’s global annual revenue. If you think about the revenue of some of our medium and large organizations and agencies, 4% is a pretty hefty tax to pay for a failure to govern data properly: a failure to respect individual privacy, to protect against breaches, and to make sure you simply don’t keep risky data where it’s most at risk or peril. So the expense isn’t related to the terabytes of data you have; it’s the risk you expose individuals’ data to in particular.

And the more individuals there are, the more likely you are to face a class action lawsuit over the harm done by exposing personal information or sensitive personal information. So you’ve got the regulatory slap, but you’ve also got the risk of civil action. And for our customers and friends whose businesses reach outside of the United States or North America, the privacy risks around the globe are even greater than the GDPR exposure, because in certain countries individuals own their work emails, whereas in the United States the corporation or agency owns them. That puts the fines and the risk far in excess of what you would normally see in a typical, unfortunate breach in North America.
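
For context on the 4% figure: under GDPR Article 83(5), the upper tier of administrative fines is capped at EUR 20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. It is a ceiling, and regulators set actual fines case by case. A one-line Python helper (hypothetical name) shows the arithmetic:

```python
def gdpr_upper_tier_cap_eur(annual_turnover_eur: float) -> float:
    """Maximum upper-tier GDPR fine (Art. 83(5)) for a given worldwide turnover.

    The cap is the higher of EUR 20 million or 4% of total worldwide annual
    turnover of the preceding financial year. It is a ceiling, not an estimate.
    """
    return max(20_000_000.0, annual_turnover_eur * 4 / 100)
```

For a company with EUR 100 million in turnover, 4% is only EUR 4 million, so the EUR 20 million figure sets the cap; at EUR 2 billion in turnover, the cap becomes EUR 80 million.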

Justin Parisi: So it’s one thing to have employees from your organization accessing data in the cloud. But if your business model depends on external users generating content, say a social media company, that puts you at even more risk, because now you’re potentially liable for whatever somebody puts on your platform when it violates one of those regulatory requirements. Right? So how does something like Cloud Data Sense help with that? I would imagine it’s able to do more than just scan your business.

It can scan everything, right?

Michael Landau: Well, absolutely. Data Sense scans almost everything, but it focuses on production data, whereas a lot of social media data is ephemeral. The content your organization keeps and maintains over its enterprise data lifecycle is where the risk has the most financial cost, because that’s your business-critical content and the content that has personally identifiable and critical information. But I think what you’re getting at, and what I’d like to bring up, is third-party information. While social media is a good way to look at it, think about all of our organizations that send data to third parties.

Let’s say an external invoicing or accounting firm, an audit firm, or an external manufacturer or subcontractor that does product or technology work for you. All of those organizations are now involved in your governance and compliance obligations, and you’re responsible for what they do with your data and for their data related to you as an organization.

So the data sovereignty of what’s within your organization expands to any organization you give or share access to. If that includes information on data subjects, personally identifiable or sensitive personal information, credit card numbers, Social Security numbers, or any other identifiable artifacts that may be necessary in the usual course of business (to take payments, for example), you are obligated and at risk for the conditions that exist in that data, whether it’s in the hands of a third party or in your own hands.

Justin Parisi: So we have BlueXP Cloud Data Sense, and that’s something we’d look at for this type of use case. How does it work? How does it make all that happen for customers without them having to find things on their own or do too much configuration up front?

Michael Landau: So Data Sense uses artificial intelligence and machine learning, and it connects to your data sources. It is two VMs: you have the BlueXP control plane we’ve been referring to, and then you have the Data Sense VM that connects to your data and reads what’s in it. The artificial intelligence literally cracks open the files and reads them. It understands what type of files they are and what risk they contain, uses a natural language processing filter to make sure the results are accurate, and applies security, compliance, and governance policies, or custom policies you create to meet your business needs. You can then visualize what’s going on in the data in dashboards and filter down to just the data you’re concerned about.

It allows you to act on that data: tag it, label it, copy, move, or delete it. You can get email alerts and report on your CCPA, HIPAA, PCI, or privacy obligations. So you install two VMs and connect to your data sources, be they databases, cloud or on-prem storage, other applications, or even your hyperscaler storage.

And then it starts to scan and understand what’s in your data, and without any effort, the dashboard populates with the nature of your data in all the categories we’ve been talking about: the age, the contents, whether it’s business or non-business data, who has permissions, and all the different types of information contained in your data. So you know the contents of your data without having to actually look into it at all.
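
The key architectural idea here, that the scanner reads files where they live and ships only metadata and findings to the dashboard, can be sketched in a few lines of Python. The detectors below are simple regular expressions chosen for illustration (the product uses AI/ML and NLP), and `FileSummary` and `scan_tree` are hypothetical names.

```python
import os
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately simple detectors standing in for real classification.
DETECTORS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class FileSummary:
    path: str
    size: int
    mtime: float
    findings: dict = field(default_factory=dict)  # detector name -> match count


def scan_tree(root):
    """Read each file locally and yield only metadata plus finding counts.

    File contents never leave this function, mirroring a scanner that sits
    near the data and sends artifacts, not data, to a central dashboard.
    """
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue
            counts = {k: len(rx.findall(text)) for k, rx in DETECTORS.items()}
            yield FileSummary(path, st.st_size, st.st_mtime,
                              {k: v for k, v in counts.items() if v})
```

Because only `FileSummary` objects cross the boundary, aggregating them (counts per category, bytes per owner, and so on) is enough to drive a dashboard without copying any file contents.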

Justin Parisi: And is this all done on-prem, in the cloud? Both? I mean, where is it installed and what can I manage with it?

Michael Landau: It can be installed either on-prem or in the cloud, or in a hybrid or multi-cloud fashion. Similarly, it can connect to data on-prem, in the cloud, hybrid, or multi-cloud. And while we are NetApp, it isn’t limited to NetApp storage.

It connects to just about any storage as a user. It’s agentless, and that storage can be on-prem or in the cloud. It can look at terabytes or scale up to many, many petabytes. It can look at multiple sources, like structured databases, unstructured data, and file, block, and object storage. It is scalable.

And if you think about the primary sources of data that we see: of course all of the NetApp solutions in our family of storage, Amazon FSx, Amazon S3 buckets, OneDrive, Oracle, a host of SQL servers, MongoDB, SAP HANA, any NFS or SMB shares, RDS databases, Azure Blob, SharePoint Online and on-prem, Google Cloud Storage, Google Drive, and much more, with even more coming.

Justin Parisi: So Cecile, you talk to a lot of customers about BlueXP in the cloud. When you kind of start telling them about this solution, what’s the reaction that you get?

Cecile Kellam: Well, as you can tell from the talk track, there is a lot to unpack when you ask a customer what’s in their data. Usually, if you’re on camera, you see the deer-in-the-headlights look, because they don’t know what’s in their data.

So that’s a great conversation opener that leads to everything from the governance side of the conversation, to compliance, to things like least privilege and ensuring that you’re able to show a zero trust architecture as part of a strong cybersecurity framework.

Justin Parisi: And Michael, what about you? I know that you probably have discussed this with customers as well.

What’s kind of their reaction when you start to tell ’em about what this product can do?

Michael Landau: Yeah, I’ll tell you, I just got off a call a few minutes ago and the message was, wow. I can see everything going on under the hood in my shares and in my different sources quickly and easily. Now I’ve gotta figure out who I can bring this valuable information to and make a difference in our data population.

We have a global consultancy that came to us and said, "Wait a minute, you can scan a petabyte and have that done in eight days?" And I said, "Whoa, wait, wait, wait. Your mileage will vary." But they said the solution is extremely fast, it is reliable, and it shows us everything we need to know to figure out what data we need to keep, what we need to get rid of, and where our risks are.

Oh my God, let’s expand this into other sources in our organization. So it’s a pivotal moment for a lot of organizations. There have always been single-point solutions that do just one source, or one type, or just cloud or just on-prem. The common message is: you connect to all of it, with one dashboard, and it helps us see what we didn’t know about our data.

And that’s big because most of the people managing the data didn’t create it, don’t own it, but are responsible for where it sits, that it’s secure, that it’s private, and that it has a good profile to prevent ransomware from being a risk. And of course, Data Sense is the brain behind our ransomware suite, but that’s for another podcast.

This is what the people I’m working with who use Data Sense are saying: it’s light, it’s easy, it shows me what’s in my environment, and it even updates itself if you want it to.

Cecile Kellam: That’s an important point Michael just made about it updating itself, because that’s also the power of the AI and of how we’ve architected things to be continually tweaked, to truly be the best-in-class enterprise solution we can offer our end customers.

The innovation this team puts behind it is just so exciting to see, and we always welcome customer feedback on the things they would find useful in their environment, so that we can weigh that as we continue to grow this product in the years to come.

Justin Parisi: So I think it’s kind of funny you said, oh, this scan is really fast.

It takes eight days. And I’m like, wow, that doesn’t sound fast. But I imagine the other things they’ve dealt with probably took like a month by comparison, right? It’s a lot of data.

Michael Landau: Well, Justin, I’m kind of glad you said that. Before joining NetApp, I came from a great organization that was proud of doing one terabyte a day with their solution.

And I know many peers, colleagues, and good friends who work at other places that do half a terabyte to two terabytes a day per server. To do 1.3 petabytes in anything less than a month is extraordinary, let alone eight days. The reality is that everybody’s mileage is going to vary, but data at scale today needs a solution that works faster than organizations are creating data.

Data Sense can do it. External parties, consultancies, and customers have validated that, including customers with hundreds of petabytes of storage who create many terabytes a day and are keeping up with what’s going on in their environments. That’s relevant when we think about what’s growing on-prem and in the cloud, from terabytes to petabytes, to exabytes, to zettabytes.

The solutions going forward need to scale with the customers. And thinking about a petabyte of data and how long it should take to scan it is very important, but I’ll say to you that Data Sense goes faster than the governance, compliance, security and privacy teams at the customers can go. It scans faster than they can complete the projects, so they are never waiting for the tool.

There’s not a time when you go get a cup of coffee while you’re doing a search or a scan. It’s keeping the teams busy, improving the results on what’s in their data, reducing risk and reducing costs.
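
Using the figures quoted in this segment, and decimal units (1 PB = 1,000 TB), a quick back-of-the-envelope calculation shows why eight days stood out. The helpers are hypothetical and assume throughput scales linearly with the number of scanners, which real storage back ends may not allow.

```python
def implied_rate_tb_per_day(data_tb: float, days: float) -> float:
    """Aggregate throughput implied by scanning data_tb in the given number of days."""
    return data_tb / days


def scan_days(data_tb: float, tb_per_day_per_scanner: float, scanners: int = 1) -> float:
    """Days needed at a per-scanner rate, assuming perfectly linear scale-out."""
    return data_tb / (tb_per_day_per_scanner * scanners)
```

With 1.3 PB entered as 1,300 TB, `implied_rate_tb_per_day(1300, 8)` is 162.5 TB/day in aggregate, while `scan_days(1300, 1.0)` shows that a single server at 1 TB/day would need 1,300 days, roughly three and a half years.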

Justin Parisi: You say a petabyte of data, but if it’s 10 files that are multiple terabytes each versus a billion files that are 4K each, I would imagine that makes a big difference in the time it takes to scan.

Michael Landau: Well, absolutely. And there are certain file types we wouldn’t want to scan. You know, if you’re watching Game of Thrones videos, we don’t need to scan those, and the whole idea is to find them and get them out of the system. They don’t belong there. To right-size the work for the environment, Data Sense lets you pick which scan you do depending on what the data is: a file-level scan or a content scan. That gives you the benefit of speed and lets you manage and tier the effort toward the data that needs to be worked on.
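
Choosing between a file-level scan and a content scan can be expressed as a simple per-file policy. This Python sketch is hypothetical: the extension lists and the `scan_mode` helper are assumptions for illustration, not how Data Sense decides. Media files still get inventoried at the file level, so they can be found and removed.

```python
from pathlib import PurePath

# Hypothetical policy: media and disk images get metadata-only treatment,
# document-like files get full content inspection.
SKIP_CONTENT = {".mp4", ".mkv", ".avi", ".mov", ".iso"}
CONTENT_TYPES = {".txt", ".csv", ".docx", ".xlsx", ".pdf", ".pptx"}


def scan_mode(path: str) -> str:
    ext = PurePath(path).suffix.lower()
    if ext in SKIP_CONTENT:
        return "metadata-only"  # still listed, so it can be found and cleaned up
    if ext in CONTENT_TYPES:
        return "content"
    return "metadata-only"      # unknown types default to the cheaper scan
```

Defaulting unknown types to the cheaper scan favors speed; a stricter policy could default to content inspection and accept the extra time.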

But Data Sense is first going to show you all the data you have, where it is, and all the risks contained in it. Then you can build an actionable workflow that helps orchestrate success, and you don’t need to use the GUI alone to do that. Data Sense has Python scripting and a full suite of APIs, so if you want to incorporate it as an engine in a larger workflow, such as an enterprise data lifecycle management solution, you never have to go into the GUI. You can simply use our API calls and get the information. Everything Data Sense learns, all the artifacts about the data, is available through those API calls and can be incorporated into an automated data lifecycle management and hygiene workflow. Not only do we have internal alerting, but we connect through our scripts and APIs to third-party solutions and workflows, which lets you make this a full enterprise solution. At the same time, you can go into the GUI, see the results highlighted in the dashboard, create an investigation, understand its results, send reports, alerts, or emails to those who need to hear about it, and then act on it, all in one place.
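
When results are pulled through an API into a larger lifecycle workflow, the downstream logic often reduces to mapping per-file findings to actions. The JSON shape below is hypothetical (field names are illustrative and are not the Data Sense API schema), and `plan_actions` is a made-up helper that shows the pattern in Python.

```python
import json

# Hypothetical per-file results, as they might look after being fetched from a
# classification API. Field names are illustrative only.
SAMPLE = """[
  {"path": "/hr/old_payroll.xlsx", "category": "HR", "pii": ["ssn"], "open_to_everyone": true},
  {"path": "/eng/design.docx", "category": "Engineering", "pii": [], "open_to_everyone": true},
  {"path": "/fin/cards.csv", "category": "Finance", "pii": ["credit_card"], "open_to_everyone": false}
]"""


def plan_actions(records):
    """Map each file record to follow-up actions for a workflow engine."""
    plan = []
    for rec in records:
        actions = []
        if rec["pii"] and rec["open_to_everyone"]:
            actions.append("restrict_permissions")
        if "credit_card" in rec["pii"]:
            actions.append("move_to_encrypted_volume")
        if actions:
            plan.append((rec["path"], actions))
    return plan
```

With the sample data, `plan_actions(json.loads(SAMPLE))` queues the openly shared HR file for a permission fix and the card-data file for relocation to encrypted storage; the engineering document has no findings and produces no action.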

Justin Parisi: Now I imagine this basically works by running a bunch of scans in parallel across normal protocols like SMB, NFS, and S3. In my experience, there’s always a performance trade-off. So what sort of performance tax might there be while these scans are running?

Or is it really not noticeable?

Michael Landau: Data Sense is designed to scale out to meet performance needs. Its biggest impact on a system is during its initial scans; we see between 0.5% and 5% load on production environments. But the key is that we work with you to architect the number of scanners in each zone or geo to right-size and right-speed the effort. If you have a petabyte of data, maybe you only need your BlueXP VM and your Data Sense VM, and maybe a second VM if you want it to go faster. Come to me with 10 petabytes of data, and perhaps we architect six or eight VMs reading that data locally, in the cloud or on-prem, wherever it sits. That puts more compute near the data and passes only the artifacts (the metadata and the learned artifacts about what’s in the data) over to the primary dashboard, to minimize the impact on the production environment.
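
The sizing conversation described above boils down to dividing the data volume by per-scanner throughput and the time you’re willing to wait. This hypothetical Python helper assumes linear scale-out and a per-scanner rate you would measure in your own environment; neither number is a Data Sense specification.

```python
import math


def scanners_needed(data_tb: float, tb_per_day_per_scanner: float, target_days: float) -> int:
    """Smallest scanner count that finishes data_tb within target_days.

    Assumes each scanner adds the same throughput (linear scale-out); shared
    storage controllers or networks can make real scaling sub-linear.
    """
    if tb_per_day_per_scanner <= 0 or target_days <= 0:
        raise ValueError("rate and target_days must be positive")
    return max(1, math.ceil(data_tb / (tb_per_day_per_scanner * target_days)))
```

For example, at an assumed 50 TB/day per scanner, finishing 10 PB (10,000 TB) in 30 days gives `scanners_needed(10_000, 50, 30)`, which is 7.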

Our customers who follow the architecture generally notice little impact on the environment, but we’ve also incorporated some features: slow scan, pause, and rescan, as well as granular selection by share or by bucket. You choose which sources to scan always-on (incremental forever) and which sources to scan as a one-time project, so you have granular control over not only what data Data Sense works on, but also the impact it has on the environment.
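
Always-on, incremental-forever scanning relies on only re-reading what changed. A minimal Python sketch of that idea compares each file’s modification time and size against the previous pass; `changed_since` is a hypothetical helper, and real products may track changes differently.

```python
import os


def changed_since(root, last_seen):
    """Return (changed_paths, new_state) for an incremental rescan.

    last_seen maps path -> (mtime_ns, size) from the previous pass. Only new
    or modified files are returned, so each pass touches just the delta.
    Deleted files simply drop out of new_state.
    """
    new_state, changed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            sig = (st.st_mtime_ns, st.st_size)
            new_state[path] = sig
            if last_seen.get(path) != sig:
                changed.append(path)
    return sorted(changed), new_state
```

The first pass, with an empty `last_seen`, returns every file, which matches the observation that the initial scan is the heaviest; later passes only hand changed files to the expensive content inspection.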

Justin Parisi: So what sort of cost is associated with doing this in the cloud? I imagine this might be pretty taxing on your cloud costs if you scan, you know, a petabyte of data. Is it any different from doing it manually, or is there an additional cost because there are so many extra threads going on?

Michael Landau: The Data Sense environment is designed not to exfiltrate data. It’s designed to read within the same zone, so it’s not moving the data out; it reads it in place, and the only things passed across are the metadata and the artifacts. But depending on whether the VMs reading the data are on-prem or in the cloud, there is compute involved, so your VMs have that compute cost associated with them. Many of our customers put compute on-prem and have collectors just doing the reading in the cloud to minimize the compute impact. But it is not a very expensive process, subject to your volume of data, your actual hyperscaler agreement, and keeping data within zones and read only locally.

Justin Parisi: Okay, so it does a good job of reducing that extra cost so you don’t get sticker shock after the fact, because, I mean, it’s one thing to pay for the license to do this, but if you get a cloud bill later on, you’re like, oh no, what happened?

Michael Landau: Absolutely. You know, our customers are coming to us to optimize costs, to minimize risk, and not to have headaches that surprise them 30 days later.

In fact, they want to see what data can be removed from their cloud footprint rather than increase their costs. And we will endeavor to encourage the architecture we recommend, to keep cost minimization as a strategy going forward.

Justin Parisi: Does the BlueXP Cloud Data Sense product have a way to do a TCO calculator to kind of figure out what you might pay before you actually use it?

Or is that something you just kind of do on the back of a napkin?

Michael Landau: You know, there are industry standards for expected volumes of duplicate, stale, and redundant, obsolete, or trivial data, let’s say six to 20% for some of these categories, and more. I’ve seen 60 and 80% duplication in some populations.

The challenge is that to do this with integrity, you need to actually install it and sample the customer’s data. Every organization’s different. They have different retention policies, and candidly, Data Sense is often used to help people design their retention policies and their risk and privacy goals internally, because they don’t know what’s in their data.

So the most effective total cost of ownership calculations happen after we get an initial volume deployed in a customer’s environment and get to show them what’s really in their data. Of course, I can tell you there is consistently double-digit duplication in many of the file shares I see in customer environments.
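
Double-digit duplication is straightforward to measure on a share. This Python sketch groups byte-identical files by first bucketing on size and then hashing the candidates with SHA-256; `find_duplicates` is a hypothetical helper, not part of Data Sense.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root, chunk_size=1 << 20):
    """Return groups (sorted path lists) of files with identical content."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # vanished or unreadable: skip
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError:
                continue
            by_hash[digest.hexdigest()].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Bucketing by size first means most files are never hashed at all, which matters at the file counts discussed in this episode. Summing the sizes of all but one file per group gives the reclaimable capacity.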

There is consistently stale data, and over the years I’ve seen statistics indicating that 40% of enterprise data is usually beyond its retention policy or hasn’t been accessed or edited in four, five, or seven years. So there are a lot of opportunities to show a total cost of ownership benefit, or the value proposition of Data Sense.
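
A rough stale-data estimate for a single tree can be computed from file timestamps. This hypothetical Python helper reports the fraction of bytes not modified in a given number of years. It uses modification time because access times are often unreliable (for example, on `noatime` mounts), so treat the result as an estimate.

```python
import os
import time


def stale_share(root, years=5, now=None):
    """Fraction of bytes under root not modified in `years` years.

    Based on mtime rather than atime, since access times are often not
    maintained; the result is an estimate, not an audit.
    """
    now = time.time() if now is None else now
    cutoff = now - years * 365.25 * 24 * 3600
    total = stale = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue
            total += st.st_size
            if st.st_mtime < cutoff:
                stale += st.st_size
    return stale / total if total else 0.0
```

Running it with `years=4`, `5`, and `7` gives the same kind of breakdown as the statistic quoted above, but for your own share.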

But candidly, it doesn't show well or sell well until you've got true samples from an individual customer's environment.
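Michael's point that stale data only becomes concrete once you sample a real environment can be illustrated with a rough first pass over a share. The Python sketch below is a hypothetical helper, not part of Data Sense or any NetApp tool. It assumes a locally mounted share and uses last-modified time as the staleness signal (access times are not tracked reliably on every file system), summing the bytes of files older than a chosen number of years.

```python
import time
from pathlib import Path

# Average year length in seconds, including leap years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def stale_bytes(root, years=5, now=None):
    """Return (stale, total) byte counts for regular files under root,
    where "stale" means last modified more than `years` years ago.
    Hypothetical helper for illustration only."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    stale = total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # skip directories, broken symlinks, etc.
        st = path.stat()
        total += st.st_size
        if st.st_mtime < cutoff:
            stale += st.st_size
    return stale, total
```

For example, `stale, total = stale_bytes("/mnt/share", years=5)` (the path is a placeholder) gives the fraction `stale / total` when `total` is nonzero. A real assessment would also weigh retention policy and data sensitivity, which is the kind of context Michael says only shows up in a customer's own environment.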

Justin Parisi: So as far as remediation goes for those duplicate files, I imagine deleting is one way. Using something like Cloud Tiering or FabricPool is another way.

What about actually migrating this data to a single volume where you can take advantage of ONTAP storage efficiencies like deduplication and compression? Does Cloud Data Sense do something like that already, or is that something that could help?

Michael Landau: Well, great question. Cloud Data Sense integrates with Cloud Sync for sure, and with other NetApp capabilities in terms of FlexClone and moving data elsewhere. I think it starts with using the AI engine to pick the right volume of data. Thinking about block-level efficiencies is extremely important, but when you start with file-level efficiencies, the block-level savings have even more impact. Combining file-level reduction and block-level savings is really a double hit on savings. So Data Sense has built-in tools that let you identify the duplicates, investigate them, and copy, move, or delete that data to other locations. Data Sense gives you the ability to physically or logically tier off data to the right locations, and to remove duplicates and redundant, obsolete, or trivial data before doing so. It can also take some actions on the data automatically, such as setting policies that alert you when more data meets criteria for data you don't want to keep, and then letting you act on it or automate actions going forward. The idea is that you can design workflows in which Data Sense automatically copies, moves, or deletes data. We recommend, by the way, copy first, then delete second.

You know, you can always move the data, but we also integrate fully with Cloud Sync, which gives you a lot more features and capabilities around moving the data, while our artificial intelligence tools help you select the data effectively and efficiently before you move it.
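The file-level duplicate identification Michael describes as the first half of the "double hit" can be illustrated with a common generic technique: group files by size, then confirm matches with a content hash. This Python sketch is an assumption for illustration, not a description of how the Data Sense AI engine finds duplicates, and `find_duplicates` and `reclaimable_bytes` are hypothetical names. In keeping with "copy first, then delete second," it only reports duplicates and never deletes anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (sorted lists of Paths) of files with identical bytes."""
    # Cheap first pass: files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Expensive second pass: hash only the size collisions.
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def reclaimable_bytes(groups):
    """Bytes freed if one copy per group were kept and the rest removed."""
    return sum(g[0].stat().st_size * (len(g) - 1) for g in groups)
```

Filtering by size before hashing means most files are never read in full, which matters on shares with millions of files. Any removal would then follow Michael's advice: copy first, verify, delete second.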

Justin Parisi: So once the data's been migrated, can Cloud Data Sense run checksums on the files to make sure that you moved them correctly?

That the file is indeed there, that it's not corrupted, that sort of thing?

Michael Landau: Data Sense will allow you to report on what's been moved and compare the reports before and after. Data Sense itself will provide those reports. However, I suggest using the Cloud Sync integration for more structured and formal migrations, which brings along the migration management that comes with Cloud Sync or with the other NetApp tools in our family.
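For a formal migration, Michael points to before-and-after reports and Cloud Sync. For a small ad hoc move, his "copy first, then delete second" advice can be paired with a checksum check, sketched generically below. This is plain Python using `hashlib` and `shutil`, not a Data Sense or Cloud Sync feature; `copy_verify_delete` is a hypothetical helper name, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_verify_delete(src, dst, delete_source=False):
    """Copy src to dst, verify the copy by SHA-256, and only then
    (optionally) delete the source. Hypothetical helper for illustration."""
    src, dst = Path(src), Path(dst)
    expected = sha256_of(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies file data plus timestamps and permission bits
    if sha256_of(dst) != expected:
        dst.unlink()  # discard the bad copy; the source is untouched
        raise OSError(f"checksum mismatch copying {src} -> {dst}")
    if delete_source:
        src.unlink()  # delete second, only after the copy has verified
    return dst
```

Keeping `delete_source=False` as the default means a first run only copies and verifies, so the source is never removed unless the caller explicitly asks for it after a successful check.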

Justin Parisi: Okay, so it sounds like it can do a lot. Is there anything it can't do? Is there anything people have been asking for that it just can't do today, or that's just not possible?

Michael Landau: Well, there are always new sources, new storage, and new cloud environments coming up that we don't support today but are adding, and will support soon.

Data Sense doesn't do high availability. It will soon. And the way I like to describe Data Sense is that it is not a data lifecycle management tool, but it helps you manage data throughout the lifecycle. It's not a legal hold solution, but it helps you do legal holds. It's not a migration tool, but it certainly helps you move data; its current limit is up to 15 million files at a time.

Data Sense shows you what's in your data: how old it is, who has permissions, who owns it, and how sensitive the data inside it is. It's not the archive or the manager. It's the Sherlock Holmes that tells you, this is what's in your data, so that you can make decisions and stage that data to temporary or new resting places, or leave it where it sits.

But know what's in your data.

Justin Parisi: Well, Michael, Cecile, thanks for joining us and talking to us all about data governance and compliance, as well as BlueXP Cloud Data Sense. So Michael, if we wanted to reach you, how do we do that?

Michael Landau: Well, you can find me at mlandau@netapp.com. You can find me on LinkedIn or our internal Teams channel, or send up a flare.

I look up.

Justin Parisi: That sounds dangerous. And Cecile, how about yourself?

Cecile Kellam: I know my limitations, so no flares for me, but you can find me on LinkedIn or connect with me at cecilek@netapp.com.

Justin Parisi: All right, excellent. Well, thanks so much for joining us today, Michael and Cecile, and hopefully we’ll talk again soon about what BlueXP Cloud Data Sense can do for you.

Podcast intro/outro: All right. That music tells me it's time to go. If you'd like to get in touch with us, send us an email at podcast@netapp.com or send us a tweet @NetApp. As always, if you'd like to subscribe, find us on iTunes, Spotify, Google Play, iHeartRadio, SoundCloud, Stitcher, or via techontappodcast.com. If you liked the show today, leave us a review. On behalf of the entire Tech ONTAP podcast team, I'd like to thank Cecile Kellam and Michael Landau for joining us today. As always, thanks for listening.
