Behind the Scenes Episode 366: Incident Response Automation with NetApp Spot Connect

Welcome to Episode 366, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”


Incident response in enterprise IT environments is a time-critical task, where any delay can mean potential loss of revenue.

When dealing with the cloud, incident response takes on a whole new set of challenges, as most of the infrastructure involved is outside of your administrators’ reach. At that point, you’re looking at mitigating the risk by addressing incidents at the application layer.

With Spot Connect, you can put more of that infrastructure power into your administrator’s hands and automate incident responses for faster, more effective resolution of issues, which frees up your staff to create better architectures.

In this episode, Prasen Shelar (LinkedIn) of Spot by NetApp joins us to talk about how Spot Connect can revolutionize your incident response workflows.

For more information:

https://spot.io/blog/spot-connect-building-blocks-for-cloudops/

A note from Prasen:

“As Spot Connect is now available for private preview, we are looking for select design partners to further explore additional use cases that can solve real-life CloudOps automation challenges. If you are interested, please reach out to your Spot contact person.”

Finding the Podcast

You can find this week’s episode here:

I’ve also resurrected the YouTube playlist. You can find this week’s episode here:

You can also find the Tech ONTAP Podcast on:

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Transcription

The following transcript was generated using Descript’s speech to text service and then further edited. As it is AI generated, YMMV.

Episode 366: Incident Response Automation with NetApp Spot Connect
===

Justin Parisi: This week on the Tech ONTAP podcast, we tackle the issue of incident response automation and how NetApp Spot Connect can help simplify the life of your SREs.

Intro/outro: [Intro]

Justin Parisi: Hello and welcome to the Tech ONTAP podcast. My name is Justin Parisi. I’m here in the basement of my house and with me today, I have a special guest to talk to us all about NetApp Spot Connect. To do that we have Prasen Shelar on the phone. So Prasen, what do you do here at NetApp and how do we reach you?

Prasen Shelar: Hi Justin. Great to be here. So my name is Prasen Shelar as you know. I came in with the acquisition of Filament. Filament is a product which basically was focused on SRE and DevOps.

We used to do incident response automation, so I was the director of product there. After the acquisition, the product name was changed to Spot Connect, and I work as the head of product here at Spot Connect.

Justin Parisi: All right, cool. So you came in as an acquisition through a company called Filament.

So let’s talk about Filament and let’s talk about Spot Connect and what it is.

Prasen Shelar: Back in the day, people like DevOps and SREs used to automate stuff using Excel sheets or scripting, right? And at that point, the co-founders, Pradeep and Sayan, basically came up with an idea of automating the entire incident response process with an AI sort of a tool. But moving from ad-hoc scripts to an AI automation is jumping like from third world to first world, right? So someone has to actually build these workflows when it comes to incident response. What they did is they built a low-code, no-code workflow builder, which essentially helps you create a workflow and connect the different tools that you need to respond to an incident that you usually get in all the APMs you have.

All the application performance monitoring tools which you have, like Datadog, Instana and New Relic. All the alerts you get, such as your EC2 is out, your CPU, memory sort of alerts. When you respond to them, you basically need a tool where you can build these workflows and connect different tools, and that is the magical piece they built back at Filament.

And so the product was mainly used to target customers who wanted to do incident response automation. So we provided not just the workflow builder, but also domain expertise on top of it. Build the workflows which were really useful for all these folks back then, and that’s how the company started.

We started getting a lot of traction in that space and I think that’s how Spot folks were really interested in us because the domain we picked was DevOps and SREs, but the workflow engine we built was quite flexible, and it can actually fit into any persona or any customer use case as such.

So I think the way we look at that acquisition was more around how we can take the workflow engine and now fit it into cost optimization or any other CloudOps use case.

Justin Parisi: So, this is basically a way to do incident responses. If you’re a help desk person or a CloudOps person where you don’t want to get inundated with a bunch of alerts and false alarms and that sort of thing. Or if it’s something really simple that you can knock out with an automation task, I guess that’s what Spot Connect’s intent is here?

Prasen Shelar: Yeah. I mean, that’s how we started initially and that’s how we targeted the SRE persona, who were only focused on incident response automation at that point. So let me give you an example. For instance, let’s say your EC2 is out and you have a service deployed on it, right? Let’s say your Jenkins is deployed on EC2 and your EC2 is out in that particular region. You get that alert from AWS Health. And now you have to respond to it. You have a runbook, but there’s a bunch of different steps you need to actually do manually, or even through scripts.

You need to first check if your service is out. You need to have some sort of API monitoring that you do with maybe a tool like Datadog. Then you actually confirm. Then you create a ticket in Confluence. Then you go and send a message to the on-call engineer. The on-call engineer will come on board, will start looking at the alert a little more in depth, and then will switch your service from that particular region into a different region.

This whole process needed automation, and that is what we automated using our workflow builder back at Filament. Again, we targeted that particular space, which was only DevOps and incident response automation.
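As a rough illustration of the runbook described here (check the service, open a ticket, page the on-call engineer, fail over to another region), the steps can be sketched in plain Python. This is only a sketch under stated assumptions: `run_ec2_outage_runbook`, `RunbookResult`, and the four callables passed in are hypothetical names, not Spot Connect or Filament APIs, and the monitoring, ticketing, and messaging tools are stubbed out so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunbookResult:
    """Records which runbook steps ran, in order (hypothetical type)."""
    steps: List[str] = field(default_factory=list)
    failed_over: bool = False


def run_ec2_outage_runbook(
    region: str,
    fallback_region: str,
    service_is_healthy: Callable[[str], bool],
    create_ticket: Callable[[str], str],
    notify_on_call: Callable[[str], None],
    switch_region: Callable[[str, str], None],
) -> RunbookResult:
    """Automate the manual steps an engineer would follow after an outage alert."""
    result = RunbookResult()

    # Step 1: confirm the alert by checking the service itself.
    result.steps.append("check_service")
    if service_is_healthy(region):
        return result  # False alarm, nothing else to do.

    # Step 2: open a ticket so the incident is tracked.
    result.steps.append("create_ticket")
    ticket_id = create_ticket(f"Service down in {region}")

    # Step 3: tell the on-call engineer what is happening.
    result.steps.append("notify_on_call")
    notify_on_call(f"[{ticket_id}] Service down in {region}, moving to {fallback_region}")

    # Step 4: move the service to a different region.
    result.steps.append("switch_region")
    switch_region(region, fallback_region)
    result.failed_over = True
    return result


if __name__ == "__main__":
    # All four integrations are stubs; a real setup would call actual tools.
    outcome = run_ec2_outage_runbook(
        region="us-east-1",
        fallback_region="us-west-2",
        service_is_healthy=lambda region: False,
        create_ticket=lambda summary: "TICKET-1",
        notify_on_call=print,
        switch_region=lambda old, new: None,
    )
    print(outcome.steps)
```

Passing the integrations in as callables keeps the runbook logic separate from any particular tool, which is roughly the separation a workflow builder gives you visually.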

But now take this workflow engine out and move it into a cost optimization use case. Now that’s when you come to Spot, right, where Spot in general does all CloudOps products. So say you have a use case that’s not completely geared towards incident response, but more on the cost management, cost optimization side, where you need to scale up a cluster or scale down a cluster.

These processes even now are semi-automated or maybe manual. That’s where our workflow engine, the way we have built it, which is low code, no code, which is just a drag and drop technology can be utilized in these particular domains as well, and which makes it super easy, saves a lot of time and it’s really easy to use.

Justin Parisi: Okay. And I would say that incident response automation is a major piece of cost optimization, because if you don’t act fast enough when an incident occurs, if the SRE has to get the alert, read the alert, and then figure out how to fix the problem or figure out if it’s even a problem, that is precious seconds or minutes or hours being taken away from production workloads. Whereas if you automate these processes, that takes care of it all right away, and you don’t lose all that valuable time.

Prasen Shelar: Totally. And then that’s how we started. We saw a lot of uptick in terms of customer traction. We were getting a lot of requests to build custom workflow around incident response, and then the categories grew after that, of course, because that’s how we started showing value of our product.

Justin Parisi: Yeah, absolutely. All right, so tell me a little bit more about how this integrates into Spot and is this something I can use today?

Prasen Shelar: There are different layers of products. CloudOps in itself is divided into compute, storage, like Elastigroup, Ocean, Eco, all these different tools which have different layers of services, right? So Spot Connect right now is basically acting as the glue between all of these products. You have a use case to scale up your clusters or scale down your clusters. With Elastigroup, you can create a workflow that does it, and you can also schedule it on a regular basis, so with a particular frequency, right?

And that’s us acting as the glue between all these products because we connect with all these APIs of all these products that are there within the Spot ecosystem. And then we also are vendor neutral, so we also connect with tools like Slack, e-mail, ServiceNow, JIRA. So you can create tickets, you can communicate with other on-call engineers easily, and then that adds the value, right?

So we grow in terms of use cases, and then the workflow builder we have helps with orchestration, automation, and collaboration altogether. We are currently at beta. We released our alpha version back in December. So the version is out and we can feature flag our product to customers who are willing to give it a try, and then also we’ll help them build the use cases that they need. So we’re in the phase of understanding what more use cases we can build within the product and add them as templates.

So we’re looking for design partners who would be ready to use the product. The product is complete. We’re just still working out some enhancements and building the templates we need at this point in time.

Justin Parisi: So you mentioned SREs and incident response automation as one of the use cases here.

Do you have any other examples of use cases that Spot Connect fits with?

Prasen Shelar: Yes, definitely. So it’s not just these two personas, right? The reason I’m mentioning these two personas is because we focused on the Spot ecosystem use cases at this point in time. Elastigroup is mainly a DevOps persona. Ocean is again a DevOps/SRE persona. Eco is more into FinOps, right?

Then we have security, which is SecOps. So all these different personas can be supported. We have workflows around security automation. So let’s say you have the misconfigurations that you see in security products right now, which mostly have to do with CSPM, right? You have some misconfigurations that basically act as alerts.

You can bring them in to Spot Connect and you can also do incident response for them. So you can quickly respond to an alert where your S3 bucket is public, your EC2 is misconfigured, or your IAM rules are misconfigured. So you can even respond to those sorts of use cases. We can even help you with use cases where, let’s say, you want to notify your on-call teams or engineers that some of the reserved instances you have may need more optimization. You can even get into that. We have Ocean use cases, which are more about rightsizing recommendations for your pods and clusters right now. So we can help you automate those sorts of things as well. We can drill down into specific products and specific personas, but we can also do alerting and ticketing on top of these different alerts that you get.

Justin Parisi: Right. So is this available anywhere Spot is available, meaning all clouds, or is it only available in certain cloud offerings?

Prasen Shelar: So right now we only support AWS and some part of GCP. So again, let me explain this a little bit, right? The way we orchestrate or automate, or the way you can create these workflows is with the integrations we support. So an integration for us is nothing but an abstraction over the API layer of any particular third party service.

For cloud services, we have a node which basically takes on all the services and operations, right? So that cloud provider is covered with that node. We can build a similar integration with GCP and Azure, which will cover those sets of use cases as well. So right now the AWS node is there. You can call any action in our workflow with AWS. We are working on the GCP and Azure side of the roadmap, but then we have third-party integrations right now which are not directly related to cloud providers, but are more to do with APMs, cost optimization, notifications, you know, GitHub, all those resource integrations as well.

So at this point in time, only AWS, but we’ll soon come up with the GCP and Azure side of things as well.
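To make the idea of an integration as "an abstraction over the API layer" a little more concrete, here is a minimal Python sketch. It is an illustration only and makes several assumptions: the `Integration` class, its `register`/`call` methods, and the `describe_instance` action are hypothetical names invented for this example, and the handler is a stub rather than a real cloud API call.

```python
from typing import Any, Callable, Dict, List


class Integration:
    """A thin, named wrapper over a third-party API (hypothetical design).

    Each action is a friendly name mapped to the function that would
    perform the underlying API call.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self._actions: Dict[str, Callable[..., Any]] = {}

    def register(self, action: str, handler: Callable[..., Any]) -> None:
        """Expose one API operation as a named workflow action."""
        self._actions[action] = handler

    def actions(self) -> List[str]:
        """List the actions a workflow author could pick from."""
        return sorted(self._actions)

    def call(self, action: str, **params: Any) -> Any:
        """Invoke an action by name with keyword parameters."""
        if action not in self._actions:
            raise KeyError(f"{self.name} has no action named {action!r}")
        return self._actions[action](**params)


if __name__ == "__main__":
    # A stubbed "cloud node": the handler fakes a response instead of
    # calling a real provider API.
    cloud = Integration("cloud")
    cloud.register(
        "describe_instance",
        lambda instance_id: {"id": instance_id, "state": "running"},
    )
    print(cloud.actions())
    print(cloud.call("describe_instance", instance_id="i-123"))
```

With this shape, supporting another provider or tool means registering another set of handlers behind the same `call` interface, which is the sense in which one node can cover a whole provider.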

Justin Parisi: Okay. Makes sense. So if I wanted to access it or use it, how would I do that? And are there any demos out there? Is there any documentation that I can read over to kind of learn more about it? What’s available today?

Prasen Shelar: Yeah. So we are available in the Spot console right now. In order for you to access our product, you need to get in touch with your Spot representative, and then they will enable Spot Connect for you in your particular organization.

And once you get that, you will have access to the entire portfolio of the product and the number of workflows that we already have there. You can start using it. So you can build as many workflows as you need. You can run them as many times as you need. You can connect with the different tools we have.

So that’s how to get to the product. We have full-blown documentation around how to create workflows and run workflows, and that’s available in the Spot documentation right now under the Spot Connect category. And yes, there are certain videos that we published and a couple of blogs we’ve written so far, which you can easily tap into.

And then for the documentation, right, as I mentioned, we have overview documents, which talk about how to create a workflow and run a workflow, but there’s also specific integration-level documentation. So let’s say you want to use PagerDuty. What sort of use cases we support with PagerDuty is also written in this document, and then the specific actions, the use of every action.

And then the inputs or parameters too, how you can configure those in your workflows. So there’s full-blown documentation available. There are of course demo videos for the product under the Spot console, in the documentation page, and a couple of blogs. So all this information will help you create different things and run things in Spot Connect.

Justin Parisi: Alright. So with anything that involves automation it needs to be easy, right? I don’t want it to be too hard. I don’t want it to be more trouble than it’s worth. So tell me about Spot Connect. How simple is it to set up a workflow and what does the process involve?

Prasen Shelar: Yes, I think this is the best one that I always love to answer.

Our product is really simple to use. I think one of the simplest that you may have seen so far in terms of visual interfaces, right, because it’s a drag and drop engine. The crux of our platform is the workflow builder. You just drag and drop things and create a logical flow of your choice. Now, how do you do this? So the first step is to understand what integrations you need, and as I said, an integration is nothing but an API call. You go to integrations and set that instance up from your environment.

Let’s say I’m a customer and I’m using PagerDuty for instance, right? I will connect my personal instance to it. And now that I’ve configured the integration, I will be able to use all the APIs or the actions that we support in the Spot Connect console. Now you go to the workflow builder. In the left navigation panel, from the action library that we support, you can just click on PagerDuty, drag and drop it onto the canvas, and then connect it with the different actions of your choice. So let’s say you wanted a PagerDuty alert trigger as the starting point to trigger your workflow.

It could come from any of your monitoring services, like your CPU alerts or your app latency or anything of that sort. And that’s a starting point. And then the next set of actions you need are maybe to do more enrichment using some sort of APM after that. Or it could be a Slack notification, or it could be an email notification, or it could be creating one more ticket after that or maybe closing the ticket.

So all these API actions can be easily connected to this one node that you added to create a logical workflow of your choice. You can also add a condition to it to make sure that you have some sort of approval, maybe taken from Slack, when you are asking an on-call engineer whether to proceed with the next set of actions or not, right?

And that’s how easy it is to just drag and drop things, create that, and then run it. And for running the workflow, it could either be done manually by clicking the Run Now button, or it could also be a workflow that gets triggered automatically on top of the alert system that you have configured using the webhook of PagerDuty.

So it gets triggered automatically, it can be triggered manually, or you can schedule it at a particular frequency. So for instance, say I want a particular monitoring workflow to run every 30 minutes or every 60 minutes. I’m monitoring my infrastructure to see if my EC2s are in place, if my S3 is in place, so I run the workflow at a particular frequency and I check all those things.

So those are the three triggers we support. Simple drag and drop flow. And then of course, in the end you can look at the execution of the workflow by clicking on each step that you had configured. And that’s it. You just integrate, orchestrate, and then execute and look at execution, and that’s the whole end to end workflow.
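The flow described in this answer (an ordered chain of actions, an optional approval condition, three ways to trigger a run, and a per-step execution view) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Spot Connect is implemented: `Workflow`, `Trigger`, `is_due`, and the three step functions are hypothetical names, and the APM lookup, Slack approval, and notification are all stubs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class Trigger(Enum):
    MANUAL = "manual"      # someone clicks a run button
    WEBHOOK = "webhook"    # an alerting tool posts an event
    SCHEDULE = "schedule"  # the workflow runs at a fixed frequency


@dataclass
class Workflow:
    name: str
    steps: List[Tuple[str, Step]] = field(default_factory=list)

    def add_step(self, label: str, step: Step) -> "Workflow":
        """Append a step and return self so steps can be chained."""
        self.steps.append((label, step))
        return self

    def run(self, trigger: Trigger, payload: Optional[Context] = None) -> List[Tuple[str, str]]:
        """Run steps in order and return a per-step execution log."""
        context: Context = {"trigger": trigger.value, **(payload or {})}
        log: List[Tuple[str, str]] = []
        for label, step in self.steps:
            context = step(context)
            if context.get("halt"):
                log.append((label, "halted"))
                break
            log.append((label, "ok"))
        return log


def is_due(minutes_since_last_run: int, every_minutes: int) -> bool:
    """Decide whether a scheduled workflow should run again."""
    return minutes_since_last_run >= every_minutes


def enrich(context: Context) -> Context:
    # Stand-in for pulling extra detail from a monitoring tool.
    return {**context, "cpu_percent": 97}


def approval_gate(context: Context) -> Context:
    # Stand-in for asking an on-call engineer to approve the next actions.
    return {**context, "halt": not context.get("approved", False)}


def notify(context: Context) -> Context:
    # Stand-in for a chat or email notification.
    return {**context, "notified": True}


workflow = (
    Workflow("cpu-alert")
    .add_step("enrich", enrich)
    .add_step("approval", approval_gate)
    .add_step("notify", notify)
)

if __name__ == "__main__":
    print(workflow.run(Trigger.WEBHOOK, {"approved": True}))
    print(workflow.run(Trigger.MANUAL))
    if is_due(minutes_since_last_run=30, every_minutes=30):
        print(workflow.run(Trigger.SCHEDULE, {"approved": True}))
```

The same `run` method serves all three triggers; only the way the run is started differs, and the returned log plays the role of the step-by-step execution view.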

Justin Parisi: Yeah, drag and drop is usually the best way to do things, right. Cuz you don’t want to have to worry about coding automation with REST APIs or worry about confusing drop down boxes and that sort of thing. And it’s just much faster to be able to just point and click, drag it around and be done.

Prasen Shelar: Yeah, and maintaining access is so time consuming, and creating these runbooks, you don’t have any idea of how they’re running, which environment they’re running in. You need to educate people on how to do that. You hire new people and they get confused. It’s a whole lot of mess. And as the organizations grow, the mess will grow as well.

So you sort of need to orchestrate it really well and create your own service, and Spot Connect is, I think, the best tool to do that, within the Spot ecosystem and outside of the Spot ecosystem.

Justin Parisi: So yeah. Can we talk a little bit more about outside the Spot ecosystem? Like what sort of use cases do you envision there?

Prasen Shelar: Yeah, that’s actually a really, really good question, right? So when it comes to Spot, we just dealt with the personas of DevOps, SREs, FinOps, right? But the workflow engine isn’t necessarily limited to these domain-specific things, right? Imagine you want to do infrastructure automation, or maybe you have some sort of onboarding workflow, like you onboard an employee, or maybe you have some customer success related workflows, right?

Where certain APIs need to be called and a certain logic needs to be followed. So I’m onboarding an employee and there are like 10 different things that I need. I do that maybe on a daily basis. I could just create these API layers. And again, an integration for us is super simple. So you can quickly add these integrations in the product, create a workflow with all the onboarding actions, and then run that.

Or you could have a customer success workflow. Maybe you were managing a bunch of customers and you now need alerts for any of these things. Let’s say your customers are going out of date, they need certain things, or maybe you have Slack connections that you need with these triggers, you can add them to your workflow easily.

Maybe it could be alerting mechanism based workflows for customers, right? So even these tickets or these Confluence-based workflows can be easily managed within the platform. So it’s not only the DevOps domain or maybe the FinOps domain, but it’s also things that just need like a workflow builder, like a drag and drop mechanism with integrations of your choice.

And it’s pretty easy to do that, like steady state management, for instance, right? So those are like some of the examples we have in the product. We can quickly build integrations for them and we can also have workflows built up in a day or two easily.

Justin Parisi: Okay, cool. So it sounds like there’s a lot of feature richness already there. Even though it’s a beta product.

Prasen Shelar: Oh yeah, yeah, yeah. We have around 35, 40 plus integrations, and a ton of actions, a ton of templates. We have around, I think, 78 templates that we support already, with eight different categories that include onboarding, cost optimization, API optimization, infrastructure automation, steady state management, Spot-based workflows, right.

And each template is basically a way of you learning of how to implement a use case. You can quickly duplicate that and create a workflow of your own with it. So you get to know how these things are structured right now as part of the workflow. And then just add your own integrations to it or your own instances to it and you’re good to go.

You don’t even need to spend a lot of time there. It’s a really easy way to integrate, and it is quick to replicate and then run your own workflows. And then we also support the concept of workspaces, which is basically just like RBAC. So you don’t want your production-based workflow to be touched by anyone else in your org, right? So you create a specific workspace where only you get access to these prod environment integrations and you run your workflows there. And that’s one more thing that we have in the product that makes it easier.
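The workspace idea mentioned above, where production integrations are reachable only from a dedicated workspace with limited membership, can be pictured with a small Python sketch. This is an assumption-laden illustration: the `Workspace` class, its fields, and the user and integration names are all hypothetical and say nothing about how Spot Connect actually stores or enforces permissions.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Workspace:
    """A named boundary around members and the integrations they may use."""
    name: str
    members: Set[str] = field(default_factory=set)
    integrations: Set[str] = field(default_factory=set)

    def can_run(self, user: str, integration: str) -> bool:
        """A user may use an integration only if both belong to this workspace."""
        return user in self.members and integration in self.integrations


if __name__ == "__main__":
    prod = Workspace("prod", members={"alice"}, integrations={"prod-cloud-account"})
    dev = Workspace("dev", members={"alice", "bob"}, integrations={"dev-cloud-account"})

    print(prod.can_run("alice", "prod-cloud-account"))
    print(prod.can_run("bob", "prod-cloud-account"))
    print(dev.can_run("bob", "dev-cloud-account"))
```

Keeping the production integration out of the shared workspace entirely, rather than checking a role on each call, is what makes the boundary easy to reason about.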

Justin Parisi: All right, excellent. So if I wanted to find more information about Spot, you mentioned some places you can go. What’s the website we want to look at?

Prasen Shelar: So one thing I mentioned earlier is that you just go through the Spot console if you’re already a Spot customer. And if you want access to Spot Connect, you just need to get in touch with your sales rep and they will enable Spot Connect for you and you can start using the product. If you need more information on it, you can go to our documentation portal, which is docs.spot.io, and on the left-hand side you’ll find a category called Spot Connect. Just click on it. There’ll be a demo video over there with all the relevant documentation on how to get in touch with people, how to create a workflow, how to run these workflows, and an overview of the product, with all the demos possible. There are also blogs that we published for Spot Connect right now, which give you a quick overview of what sort of use cases we support, what the integration plan would be, what the roadmap is, and all those other things. So I think the combination of these three things would be really good.

And again, just to make it more personal, you can always get in touch with the product team at Spot Connect, and we are more than happy to help each and every customer as we onboard them.

Justin Parisi: Well, Prasen, thanks for joining us today and talking to us all about how NetApp Spot Connect can help you automate your incident response tasks in an ever-growing cloud world.

All right, that music tells me it’s time to go. If you’d like to get in touch with us, send us an email to podcast@netapp.com or send us a tweet @NetApp. As always, if you’d like to subscribe, find us on iTunes, Spotify, GooglePlay, iHeartRadio, SoundCloud, Stitcher, or via techontapodcast.com. If you liked the show today, leave us a review.

On behalf of the entire Tech ONTAP podcast team, I’d like to thank Prasen Shelar for joining us today. As always, thanks for listening.

Intro/outro: [Outro]

 

 
