Monitor hardware health in SAM
You can use SAM to monitor the health of Cisco UCS, Nutanix, Dell, HP, HPE ProLiant, and IBM hardware, including details such as temperature, fan speed, power supply, CPU, memory, and disk space. SAM provides instant visibility of hardware status (for example, Up or Down). It can also calculate baseline data that you can use to configure hardware health thresholds.
This section focuses on Dell, HP, HPE ProLiant, and IBM devices. See the Orion Platform Administrator Guide to learn how to:
- Monitor Cisco UCS Devices. The first step is adding the parent UCS controller as an Orion Platform node.
- Monitor hardware health for Nutanix clusters. After adding Hyper-V or VMware nodes for monitoring, add the parent Nutanix cluster and provide Controller VM (CVM) credentials.
UCS and Nutanix hardware health monitoring does not currently support the Orion Remote Collector feature.
To get started monitoring hardware health for Dell, HP, HPE ProLiant, and IBM devices:
- Review the Monitor hardware health section of the Orion Platform products Administrator Guide.
- Review Hardware health monitoring requirements for SAM.
- Download, install, and configure agent software from third-party vendors so SAM can gather details that are not available natively from server operating systems.
- Run Discovery to detect third-party agent software on nodes, deploy agents that act as hardware health sensors, and automatically enable hardware health monitoring across multiple nodes. When Discovery enables hardware health monitoring for eligible devices, Asset Inventory data collection is also enabled to track each node's hardware and software daily.
Although Hardware Health and Asset Inventory can both be enabled automatically during Discovery, they can poll independently. For example, you can collect Asset Inventory from a node once a day, and collect hardware health every 10 minutes.
Note the following details about hardware health monitoring in SAM:
- In addition to using Discovery, you can enable monitoring in the Add Node Wizard or Node Details views.
- Certificate errors found during polling are ignored by default, but you can change that setting.
- For tips on monitoring HPE ProLiant Gen10 servers, see this THWACK post.
- If you encounter issues, see Troubleshoot hardware health monitoring in SAM.
Hardware health monitoring is a database-intensive feature. Heavy usage can impact database performance and increase the size of the Orion database. To improve performance, consider how often you need to poll statistics, and how long data is archived. See Update polling settings in the Orion Platform.
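The trade-off between polling frequency and retention can be roughed out with simple arithmetic. The following Python sketch is illustrative only: the node count, sensors per node, and retention period are hypothetical values, and the assumption that each sensor stores one sample per poll is a simplification that does not describe the actual Orion database schema.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class PollingPlan:
    """Hypothetical planning inputs; none of these names come from SAM."""
    nodes: int              # number of monitored nodes (assumed)
    sensors_per_node: int   # average hardware sensors per node (assumed)
    interval_minutes: int   # hardware health polling interval
    retention_days: int     # how long polled samples are kept (assumed)

    def polls_per_day(self) -> int:
        # Number of polling cycles that fit in one day.
        return MINUTES_PER_DAY // self.interval_minutes

    def samples_retained(self) -> int:
        # Assumes one stored sample per sensor per poll.
        return (self.nodes * self.sensors_per_node
                * self.polls_per_day() * self.retention_days)


# Polling every 10 minutes versus every 30 minutes, same retention.
frequent = PollingPlan(nodes=200, sensors_per_node=30,
                       interval_minutes=10, retention_days=7)
relaxed = PollingPlan(nodes=200, sensors_per_node=30,
                      interval_minutes=30, retention_days=7)

print(frequent.polls_per_day())     # 144
print(frequent.samples_retained())  # 6048000
print(relaxed.samples_retained())   # 2016000
```

Under these assumptions, moving from a 10-minute to a 30-minute interval cuts the retained sample count to one third. Actual database growth depends on how the Orion Platform stores and summarizes statistics.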
Related topics in the Orion Platform Administrator Guide include:
- Change the MIB used for polling hardware health statistics
- Edit hardware health thresholds
- Enable, disable, or adjust hardware health sensors
- Troubleshoot hardware issues in the Orion Platform
The Success Center article "Difference in hardware health by manufacturer and polling method for servers" may also be useful.
Orion Web Console widgets
To learn about widgets shared by several products, see Orion Platform online help. For example, the following Hardware Health widgets are documented in Orion Platform online help.