Investigate VM performance with Performance Analysis
With complex networks consisting of cloud, hybrid IT, virtualization, storage area networks, and so on, multi-faceted IT issues can be difficult to pinpoint and diagnose. When an issue surfaces, for example a poorly performing application or server, locating the root cause can take significant time. The problem could be in storage, network connectivity, user access, or a mix of resources and configurations.
To investigate the issue, create troubleshooting projects with the Performance Analysis (PerfStack™) dashboard that visually correlate historical data from multiple SolarWinds products and entity types in a single view.
With Performance Analysis dashboards, you can do the following:
- Compare and analyze multiple metric types in a single view, including status, events, and statistics.
- Compare and analyze metrics for multiple entities in a single view, including nodes, interfaces, volumes, applications, and more.
- Correlate data from across the Orion Platform on a single shared timeline.
- Visualize hybrid data for on-premises, cloud, and everything in between.
- Share a troubleshooting project with your teams and experts to review historical data for an issue.
For VMAN, Performance Analysis dashboards are useful for application analysis and hybrid environments. For example, you can:
- Visually walk through historical data for VMs in your environment.
- Verify resource allocation issues in hybrid environments.
- Correlate data to troubleshoot network traffic sent and received by virtual servers (hosts, clusters, datastores, and VMs), on-premises servers, and cloud instances.
The following example shows you how to identify a root cause for a VM experiencing performance issues. In this scenario, a virtual host has a resource and performance issue severe enough that users experience slower responses and access. The issue triggered an alert, which notified your application owner, who escalated the issue to system and network administrators.
Create a new troubleshooting project to investigate the issue. In the project, compare metrics for the host and all related virtual environment systems to track trends and spikes in usage.
In the Orion Web Console, select My Dashboards > Home > Performance Analysis.
This opens the Performance Analysis (PerfStack) dashboard, where you build charts and graphs using metrics pulled from monitored applications and servers in the Metric Palette. Each chart can hold multiple metrics so you can directly correlate data.
In the New Analysis Project, click Add Entities.
To get started, you need to locate and add the VM in distress. In the search field, enter syd to bring up a list of virtual servers sharing that name. Expand and select Types or Status to filter the list if needed.
From the list, locate the virtual host that is encountering the issues and triggering alerts. Select the host and add it to the Metric Palette.
To see all nodes, applications, servers, and other entities associated with the selected host, click the related entities icon. All related entities display in the Metric Palette, giving you more metrics to check as possible causes of the issue.
Select the syd host node to view its metrics, then drag and drop metrics onto the dashboard. Drag metrics into the same chart to compare their values.
To start investigating, pull a series of metrics for the host and cluster, comparing metrics to find spikes or high usage. For this scenario, add these host metrics:
- Maximum Network Usage
- Maximum Network Transmit Rate
- Maximum Network Receive Rate
- Virtual Machines Running
For the cluster, add these metrics:
- Average CPU Load
- Average CPU Usage
The charts and graphs display data and alerts for the Last 12 hours. You can expand the date and time range to see additional historical metrics over the course of the alert.
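If you also review exported metric values outside the dashboard, the same "look for spikes" check can be approximated in a few lines. The following is a minimal sketch, not part of any SolarWinds API: it assumes a metric is available as (hour, value) pairs, and the function name, threshold, and sample data are hypothetical.

```python
from statistics import mean, pstdev


def find_spikes(samples, k=2.0):
    """Return (timestamp, value) pairs more than k standard deviations above the mean.

    `samples` is a list of (timestamp, value) pairs. This is an illustrative
    heuristic, not how the dashboard itself highlights data.
    """
    values = [value for _, value in samples]
    if len(values) < 2:
        return []
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a flat series has no spikes
    threshold = mean(values) + k * sigma
    return [(ts, value) for ts, value in samples if value > threshold]


# Hypothetical 12 hourly samples of host network usage (hour, Mbps).
host_network_usage = [
    (0, 40), (1, 42), (2, 41), (3, 39), (4, 43), (5, 40),
    (6, 41), (7, 180), (8, 42), (9, 40), (10, 38), (11, 41),
]

spikes = find_spikes(host_network_usage)
print(spikes)
```

A fixed multiple of the standard deviation is only one possible threshold. For noisy metrics, a percentile or a rolling baseline may be a better fit.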
Add usage metrics for VMs on the host to compare network usage and activity.
The data suggests a noisy neighbor: one of the virtual machines is consuming resources and experiencing high traffic, causing bottlenecks for the other VMs sharing the host. In other words, another server, service, or application is consuming more bandwidth, disk I/O, CPU, or other resources, which degrades this specific application.
This information gives your network and system administrators a direction for further investigation and for resolving latency issues. To resolve the issue, they can reallocate resources or move the high-consumption application to another location.
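The noisy-neighbor reasoning can be illustrated with a small calculation: rank the VMs on a host by their share of total usage and see whether one dominates. This is a minimal sketch under assumed inputs. The VM names, values, and the `rank_by_share` helper are hypothetical and are not taken from the product.

```python
def rank_by_share(usage_by_vm):
    """Return (vm_name, share_of_total) pairs, largest consumer first.

    `usage_by_vm` maps a VM name to its usage for one metric
    (for example, average network usage in Mbps over the review window).
    """
    total = sum(usage_by_vm.values())
    if total == 0:
        return []
    ranked = sorted(usage_by_vm.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked]


# Hypothetical per-VM network usage on the affected host, in Mbps.
vm_network_mbps = {"vm-a": 12.0, "vm-b": 150.0, "vm-c": 18.0, "vm-d": 20.0}

ranked = rank_by_share(vm_network_mbps)
top_vm, top_share = ranked[0]
print(f"{top_vm} uses {top_share:.0%} of host network traffic")
```

A single VM holding most of the host's traffic is a candidate noisy neighbor, but it is only a starting point. Confirm against CPU and disk I/O metrics before moving workloads.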
Click Save and give the project a name.
The project saves as a dashboard with the selected metrics in the set date and time range.
After you save the project, its URL becomes a shareable link. Copy the link to the saved dashboard and share it in tickets or emails with the system and network administrators and the product owner. They can open the link to review the gathered data and troubleshoot.
After reallocating resources and making network changes, reopen the dashboard to verify changes and new usage trends for polled metrics.
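To put a number on the before-and-after comparison, you could compute the percent change in a metric's average across the two periods. The sketch below assumes you have two lists of sampled values for the same metric. The sample values and the `percent_change` helper are hypothetical.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change in the mean of `after` relative to the mean of `before`."""
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline mean is zero; percent change is undefined")
    return (mean(after) - baseline) / baseline * 100.0


# Hypothetical host network usage samples (Mbps) before and after the changes.
usage_before = [80.0, 90.0, 100.0, 90.0]
usage_after = [40.0, 50.0, 45.0, 45.0]

change = percent_change(usage_before, usage_after)
print(f"Average usage changed by {change:.1f}%")
```

Compare periods of similar length and similar time of day so that normal daily patterns are not mistaken for the effect of the change.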