Behind the Scenes: Episode 157 – Performance Analysis Using OnCommand Unified Manager

Welcome to Episode 157, part of the continuing series called “Behind the Scenes of the NetApp Tech ONTAP Podcast.”

This week on the podcast, we welcome Mr. Performance himself, Tony Gaddis (gaddis@netapp.com), to give us a tutorial on easily finding performance issues using OnCommand Unified Manager, as well as some common “rules of thumb” for how much latency and node utilization is too much.

Also, check out Tony’s NetApp Insight 2018 session in Las Vegas and Barcelona:

1181-1 – ONTAP Storage Performance Design Considerations for Emerging Technologies

Podcast listener Mick Landry was kind enough to document, in the comments, the “rules of thumb” that I forgot to add to the blog. Here they are (with a small threshold-checking sketch after the list):

  1. Performance utilization on a node > 85% points to a latency issue on the node (broad latency across the volumes on that node).
  2. Performance capacity used on a node > 100% points to one or more volumes on the node that have latency because CPU resources are running out.
    • This is not an indicator of CPU headroom.
    • 100% is “optimal” – anything below that is wiggle room.
  3. Spinning disk
    • Watch aggregate performance utilization – not capacity utilization.
    • Above 50%, the latency impact of the disks will increase.
    • Once queueing starts, it can double or triple latency on slow platters.
    • This is the performance utilization of the disk drives themselves.
  4. Fragmented free space on spinning disk
    • Increases CP (consistency point) processing time.
    • Above 85% capacity utilization of the aggregate, this starts to become a problem.
    • Above 90%, heavy workloads will be impacted.
  5. Node utilization from an HA point of view
    • Keep the sum of the two nodes’ utilizations below 100% and a takeover will be okay.
    • This applies during “user hours” on “revenue generating systems.”
  6. Disk
    • Keep spinning disk utilization < 50%.
  7. Aggregate latency expectations
    • SATA latency < 12ms
    • SAS latency < 8ms
    • SSD latency < 2ms
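
If you want to turn these rules of thumb into something you can run against your own counters, here is a minimal Python sketch. It is not the Unified Manager API – the function names and input metrics are hypothetical and assume you have already pulled the utilization and latency numbers out of Unified Manager (or the ONTAP CLI) yourself – but it shows how the thresholds above line up as simple checks:

# Minimal sketch: encode the rules of thumb above as threshold checks
# against metrics you have already collected. All inputs are assumed
# percentages (0-100+) or milliseconds; nothing here queries a cluster.

AGG_LATENCY_LIMITS_MS = {"sata": 12.0, "sas": 8.0, "ssd": 2.0}

def check_node(perf_utilization_pct, perf_capacity_used_pct):
    """Flag a node whose headline counters exceed the rules of thumb."""
    warnings = []
    if perf_utilization_pct > 85:
        warnings.append("Node performance utilization > 85%: expect broad "
                        "latency across volumes on this node.")
    if perf_capacity_used_pct > 100:
        warnings.append("Performance capacity used > 100%: one or more volumes "
                        "likely see latency from exhausted CPU resources.")
    return warnings

def check_spinning_aggregate(perf_utilization_pct, capacity_used_pct,
                             latency_ms, media="sas"):
    """Flag an aggregate built on spinning disk (or SSD for the latency check)."""
    warnings = []
    if perf_utilization_pct > 50:
        warnings.append("Aggregate performance utilization > 50%: queueing can "
                        "double or triple latency on slow platters.")
    if capacity_used_pct > 90:
        warnings.append("Aggregate > 90% full: fragmented free space will "
                        "impact heavy workloads.")
    elif capacity_used_pct > 85:
        warnings.append("Aggregate > 85% full: fragmented free space lengthens "
                        "CP processing time.")
    limit = AGG_LATENCY_LIMITS_MS.get(media, 8.0)
    if latency_ms > limit:
        warnings.append("Aggregate latency %.1f ms exceeds the ~%.0f ms rule of "
                        "thumb for %s." % (latency_ms, limit, media.upper()))
    return warnings

def check_ha_pair(node_a_utilization_pct, node_b_utilization_pct):
    """Keep the sum of both nodes' utilizations under 100% so a takeover still fits."""
    if node_a_utilization_pct + node_b_utilization_pct >= 100:
        return ["HA pair: combined node utilization >= 100%; a takeover during "
                "user hours would overload the surviving node."]
    return []

if __name__ == "__main__":
    # Example values only; substitute the numbers you pulled from Unified Manager.
    findings = (check_node(88, 104)
                + check_spinning_aggregate(55, 91, 10.5, media="sata")
                + check_ha_pair(60, 55))
    for finding in findings:
        print("WARNING:", finding)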

Finding the Podcast

You can find this week’s episode here:

Also, if you don’t like using iTunes or SoundCloud, we just added the podcast to Stitcher.

http://www.stitcher.com/podcast/tech-ontap-podcast?refid=stpr

I also recently got asked how to leverage RSS for the podcast. You can do that here:

http://feeds.soundcloud.com/users/soundcloud:users:164421460/sounds.rss

Our YouTube channel (episodes uploaded sporadically) is here:
