Investigate application performance with PerfStack
With complex networks consisting of cloud, hybrid IT, virtualization, storage area networks, and so on, multi-faceted IT issues can be difficult to pinpoint and diagnose. When an issue surfaces, for example a badly performing application or server, the investigation can take significant time to locate the core issue. The problem could be in storage, network connectivity, user access, or a mix of resources and configurations.
To investigate the issue, create troubleshooting projects with the Performance Analysis (PerfStack™) dashboard that visually correlate historical and real-time data from multiple SolarWinds products and entity types in a single view.
With Performance Analysis dashboards, you can do the following:
- Compare and analyze multiple metric types in a single view, including status, events, and statistics.
- Compare and analyze metrics for multiple entities in a single view, including nodes, interfaces, volumes, applications, and more.
- Correlate data from across the Orion Platform on a single, shared timeline.
- Visualize hybrid data for on-premises, cloud, and everything in between.
- Share a troubleshooting project with your teams and experts to review historical data for an issue.
For SAM, Performance Analysis dashboards support application analysis across on-premises and hybrid environments. For example, you can:
- Visually walk through historical data for applications in your environment.
- Verify resource allocation issues in hybrid environments for a specific application.
- Correlate application data from templates, component monitors, Hardware Health monitoring, and more into a single view.
To learn more about PerfStack, see the Orion Platform Administrator Guide.
The following example shows how to identify a root cause for a Windows Server 2012 application performance issue. In this scenario, the application performance has degraded to the point where users encounter slower responses and access. As you review the Windows Server 2003-2012 template dashboard, you find triggered alerts. These alerts notified your application owner, who escalated the issue to system and network administrators.
Rather than digging into the alerts and multiple Node Details pages to troubleshoot the issue, create a new troubleshooting project to investigate the issue.
Click My Dashboards > Home > Performance Analysis.
This opens the PerfStack dashboard, where you can build charts and graphs using metrics pulled from monitored applications and servers in the Metric Palette. Each chart can hold multiple metrics so you can correlate data directly.
In the New Analysis Project, click Add Entities.
Entities include all monitored and managed servers, applications, devices, services, and more.
Select Entities and click Add selected items.
To get started, locate and add the Windows 2012 application in distress. In the search field, enter Windows to get a list of all monitored nodes, component monitors, and more with Windows in the name or type. Expand and select Types or Status to filter the list.
From the list, locate the application monitor watching Windows Server 2003-2012 Services and Counters. Select it and add it to the dashboard Metric Palette.
Click the related entities icon to display nodes, applications, servers, and other entities related to the selected entity in the Metric Palette, so you can check whether metrics from those entities point to the cause of the issue.
Select the Windows 2012 node to view and select metrics to drag and drop onto the dashboard. You can drag them into the same chart to compare values between metrics.
To start investigating, pull a series of IOPS and throughput metrics for the server. For this scenario, add the following metrics to investigate latency and connectivity:
- Logical Disk Average: Disk Queuing
- Average IOPS Read
- Maximum IOPS Write
- Maximum IOPS Read
- Average IOPS Write
- IO Latency Write
- IO Latency Read
- IOPS Total
The charts and graphs show metric data and alerts for the last 12 hours.
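To make the idea of comparing metrics on one shared timeline concrete, the following is a minimal Python sketch, not PerfStack's implementation. It assumes each metric arrives as a list of (epoch_seconds, value) samples polled at different intervals, and averages them into common 10-minute buckets so values can be compared side by side. The function names and sample values are hypothetical.

```python
from collections import defaultdict


def bucket_average(samples, width=600):
    """Average (epoch_seconds, value) samples into buckets of `width` seconds."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, value in samples:
        start = ts - ts % width  # floor the timestamp to the bucket start
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count for start, (total, count) in sums.items()}


def align(series_by_name, width=600):
    """Return (bucket_start, {metric: average}) rows for buckets every metric covers."""
    bucketed = {name: bucket_average(s, width) for name, s in series_by_name.items()}
    shared = set.intersection(*(set(b) for b in bucketed.values()))
    return [
        (start, {name: bucketed[name][start] for name in bucketed})
        for start in sorted(shared)
    ]


# Hypothetical samples: latency polled every 5 minutes, IOPS every 10 minutes.
latency_ms = [(0, 4.0), (300, 6.0), (600, 20.0), (900, 30.0)]
iops_total = [(0, 100.0), (600, 900.0)]

rows = align({"IO Latency Write": latency_ms, "IOPS Total": iops_total})
for start, values in rows:
    print(start, values)
```

Averaging into fixed buckets is only one way to align series with different polling intervals; it is used here because it keeps the sketch short.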
Add more metrics from the virtual and storage layers to investigate and confirm any bandwidth spikes.
For example, add metrics for the IO latency from the virtual and storage layers to locate issues:
- IO Latency Write
- IO Latency Read
- IOPS Total
- Throughput Total
The data suggests that the issue is a noisy neighbor: another server, service, or application is consuming a disproportionate share of bandwidth, disk I/O, CPU, or other shared resources, which degrades performance for this specific application.
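The visual reasoning behind a noisy neighbor diagnosis can be approximated numerically. The following Python sketch is an illustration only, not a PerfStack feature: it assumes you have exported time-aligned samples of the affected application's IO latency and of the IOPS for each entity sharing the same storage, and it ranks those entities by how closely their activity tracks the latency. The entity names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_suspects(victim_latency, neighbor_iops):
    """Sort neighbors by how strongly their IOPS follow the victim's latency."""
    scores = {
        name: pearson(series, victim_latency)
        for name, series in neighbor_iops.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, time-aligned samples (one value per polling interval).
victim_latency_ms = [5, 6, 25, 30, 7, 5]
neighbor_iops = {
    "backup-vm": [100, 120, 900, 1000, 150, 110],
    "web-vm": [300, 310, 290, 305, 300, 295],
}

ranked = rank_suspects(victim_latency_ms, neighbor_iops)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

A high correlation only flags a candidate. Confirm it against the charts and the entity relationships before reallocating resources.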
This information gives your network and system administrators a direction for further investigation and resolving latency issues. To resolve, they can reallocate resources or move the high-consumption application to another location.
Click Save and give the project a name.
The project saves as a dashboard with the selected metrics in the set date and time range.
After you save the project, its URL becomes a shareable link. Copy the link to the saved dashboard and share it in tickets or emails sent to the system administrators, network administrators, and the product owner. They can open the link to review the gathered data and troubleshoot.
After reallocating resources and making network changes, reopen the dashboard to verify changes and new usage trends for polled metrics.