Oracle Coherence Monitor User Guide

Using the OC Monitor - Introduction
This section describes the OC Monitor main display and the Coherence cluster metrics it provides, as well as basic OC Monitor GUI navigation and functionality. Except where noted, the items described here apply to both the OCM Stand-alone version and the EM Solution Package version. The main difference between the two GUIs is that the OCM Solution Package version provides a subset of the OCM Stand-alone displays. For details, see Navigation Tree.

This section includes:

OCM Main Display: This section describes indicators in the main OC Monitor display.
OCM Navigation: This section describes the general operation and functionality of the OC Monitor user interface.

OCM Main Display
The OC Monitor main display, Cluster - Overview, enables you to quickly assess the configuration, activity and health of all of your Coherence clusters. Select a cluster from the Cluster drop-down menu. The following figure describes the cluster indicator areas.

The OCM main page reports on the following areas:

Coherence Cluster Configuration: Get the cluster name and the total count of members (JVMs) in each cluster. Also shown are the number of storage nodes, the number of client (non-storage) nodes, the total number of caches and the version of Coherence used in the cluster.
Memory: Get memory information, including heap size and used memory totals for all storage and client (non-storage) nodes. Also see the total percent memory usage for storage and client nodes. A blue-colored recent memory usage trend chart is displayed for storage nodes and another for client nodes.
Service Configuration & HA Status: Check the high-availability (HA) status for all Coherence protocol-related cache services used by applications in the cluster. The StatusHA column indicates whether primary and backup objects are distributed for surviving machine failure or storage node failure. The most secure status is MACHINE-SAFE which indicates that an entire host could fail and all data could be recovered. NODE-SAFE indicates that a storage node could fail and data could be recovered, but data could be lost with a host failure. ENDANGERED indicates that the loss of a single storage node could result in data loss in the cluster. Note that Coherence does not track whether enough free memory is available for surviving machine or storage node failure without data loss.

Also, see the number of caches in each cache service, the number of storage nodes participating in each cache service, the number of objects in each cache service, the senior member for the cluster and the senior members for each service. Click a cache service to view details in the Single Service - Summary display.
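The StatusHA values described above form an ordering from least safe to most safe. The following is a minimal Python sketch of that ordering, not part of the OC Monitor. The function name `weakest_services` and the sample service names are hypothetical, and the input is assumed to be a plain mapping of service name to StatusHA value that you have copied from the display.

```python
# Ranking of the StatusHA values described above, least safe first.
# This mapping is an illustration; it is not read from the OC Monitor.
HA_RANK = {"ENDANGERED": 0, "NODE-SAFE": 1, "MACHINE-SAFE": 2}


def weakest_services(status_by_service):
    """Return (status, sorted service names) for the least safe status present.

    status_by_service: dict of service name -> StatusHA string.
    Returns (None, []) when the mapping is empty.
    """
    if not status_by_service:
        return None, []
    worst = min(status_by_service.values(), key=lambda s: HA_RANK[s])
    names = sorted(n for n, s in status_by_service.items() if s == worst)
    return worst, names


# Hypothetical sample data: service names are made up for illustration.
sample = {
    "DistributedCache": "MACHINE-SAFE",
    "SessionCache": "NODE-SAFE",
    "OrdersCache": "NODE-SAFE",
}
status, names = weakest_services(sample)
print(status, names)
```

A result of ENDANGERED for any service would be the first thing to investigate, since the loss of a single storage node could then cause data loss.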

Most Gets: Check on your four busiest caches (in terms of the number of gets reported in the last measurement). Mouse-over the bar charts to see the most recent metric for each cache. Select the Cumulative box to display the total number of gets since the cluster started (or since statistics were last reset). Statistics from the most active cache are displayed in the upper right field.
Largest Cache: Check on your four largest caches (in terms of memory usage). The largest cache’s current size, in units, is displayed in the upper right field. Mouse-over the bar charts to see the most recent metric for each cache. Units are user defined in your cache configuration file. Typically units are set to either the number of objects or the number of bytes consumed by objects.
Node Uptimes: Assess cluster stability. View how long nodes in the cluster have been members of the cluster. The OCM looks at the start time of every node in the cluster, determines how long the node has been part of the cluster and categorizes the nodes into seconds, minutes, hours, days, weeks or months. Typically, if a node leaves the cluster as the result of a fault condition, the node subsequently rejoins, appears as a “younger” node, and is placed in the seconds, minutes or hours category. This metric can be an important indicator of cluster instability. If a node leaves the cluster and fails to rejoin, the node is subsequently shown in the Departed Nodes box.
Memory Utilization: Determine whether cluster memory usage has been increasing over the last hour and by how much. These bar charts are capacity indicators as well as garbage collection indicators. The memory usage is shown for the most recent data collection, and as an average for the last hour. The red bar chart is for the node in the cluster with the highest memory use, and the yellow bar shows memory use averaged across the whole cluster.
Communication Success Rate: Determine whether packet loss is occurring. The bar graph uses pairs of bars in which one bar represents the publisher success rate and the other represents the receiver success rate. The bar charts are the most important indicator for any issue affecting cluster health or performance. The chart shows the TCMP publisher and receiver success rates for the last 20 minutes. The success rate is typically 99% or greater in healthy clusters. Publisher/Receiver failures (the inverse of success) indicate that packets sent between nodes are not being acknowledged within the timeout period (which is typically 250 ms). This is typically the result of a node being unavailable due to garbage collection. However, there are many other possible causes as well (such as a network issue, a defective NIC, disk swapping, or a shortage of CPU on a single machine). Investigate further by clicking the bar chart to view details in the Cluster - Memory/Network Health display.
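As an illustration of the 99% guideline for the Communication Success Rate, the following Python sketch flags nodes whose publisher or receiver success rate falls below a threshold. It is an assumption-laden example rather than OC Monitor functionality: the function name `nodes_below_threshold`, the node names and the sample rates are hypothetical, and the rates are assumed to be fractions between 0 and 1 that you have obtained separately.

```python
# Threshold taken from the guideline above: healthy clusters are typically
# at 99% or greater. Adjust it to suit your own environment.
HEALTHY_THRESHOLD = 0.99


def nodes_below_threshold(samples, threshold=HEALTHY_THRESHOLD):
    """Return {node: [sides below threshold]} for nodes that need attention.

    samples: dict of node name -> (publisher_rate, receiver_rate),
             each rate a fraction in the range 0..1.
    """
    flagged = {}
    for node, (publisher_rate, receiver_rate) in samples.items():
        low = []
        if publisher_rate < threshold:
            low.append("publisher")
        if receiver_rate < threshold:
            low.append("receiver")
        if low:
            flagged[node] = low
    return flagged


# Hypothetical sample data for three nodes.
sample = {
    "node-1": (0.998, 0.997),
    "node-2": (0.950, 0.996),
    "node-3": (0.985, 0.900),
}
print(nodes_below_threshold(sample))
```

A flagged node is only a starting point. The cause still has to be found among the possibilities listed above, such as garbage collection pauses or a network issue.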

OCM Navigation
This section describes the general operation of the OC Monitor and the user interface. NOTE: Typically, it takes about 30 seconds after a server is started for it to appear in an OC Monitor display. By default, data is collected every 15 seconds, and the display is refreshed 15 seconds after that.

To access this information online, select the ? button in the top right corner of any OC Monitor display.

The following figure illustrates the OCM Stand-alone version.

Navigation Tree
The following figure illustrates the OCM Stand-alone version navigation tree and the OCM Solution Package navigation tree. The OCM Stand-alone navigation tree contains a series of drop-down menus that are organized by subject area. The OCM Solution Package navigation tree provides a subset of OCM Stand-alone displays.

OCM Stand-alone

OCM Solution Package

OCM Stand-alone Display Areas

Cluster Views: Use these displays to assess Coherence cluster-level performance and utilization.
Proxy Services: Use these displays to assess proxy service performance metrics.
Cache Services: Use these displays to assess cache service performance metrics across all nodes.
All Caches: Use these displays to assess performance and utilization of all caches in the cluster.
Single Cache: Use these displays to investigate performance, utilization and activity metrics of a single cache.
All Nodes: Use these displays to assess node-level performance and utilization in the cluster.
Single Node: Use these displays to investigate performance and utilization metrics of a single node.
OC Administration: Use these displays to manage your Oracle Coherence metrics, nodes and caches.
Alert Views: Use this display to view alert states and manage alerts for all nodes in the cluster.
Administration: Use these displays to perform OCM administrative tasks for nodes, caches, alerts and metrics.

EM Solution Package Displays

Cluster Summary: Use this display to quickly assess general cluster stability. This is the OCM main display.
Service Summary: Use this display to assess the performance of a service aggregated across all nodes.
Cache Summary: Use this display to perform a low level cache utilization analysis for a cache. Check the metrics for Size, Evictions and Misses to determine whether more capacity is needed.
Node Summary: Use this display to perform node utilization analysis.
All Services History: Use this display to assess capacity utilization, over time, by all services in a cluster.
All Caches History: Use this display to assess capacity utilization, over time, by all caches in a cluster.
All Nodes History: Use this display to assess capacity utilization, over time, by all nodes in a cluster.

Title Bar
Each display shares similar title bar functionality, described below.

Cluster Selector	Select a cluster from the drop-down menu.

Convenience Navigation	Convenience navigation buttons enable you to toggle between the most commonly accessed displays from the current display. These buttons are not available on all displays, and differ from one display to another.

Connection Indicator	Conn OK	Indicates that one or more servers are delivering data.
	No Data	Indicates that one or more servers are found but no engines are delivering data.
	No Conn	Indicates that no server is found.

Open New Window	Open one or more instances of the same display. Each window operates independently, allowing you to switch views, navigate to other displays in the Monitor, and compare performance data.

Help	Links to the online help page for the current display.
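The three Connection Indicator states in the table above can be read as a simple decision rule. The following Python sketch restates that rule for clarity. It is not the OC Monitor implementation; the function name `connection_indicator` and its two count parameters are hypothetical.

```python
def connection_indicator(servers_found, servers_delivering):
    """Map server counts to the indicator text described in the table above.

    servers_found:      number of servers the monitor can reach (assumed input)
    servers_delivering: number of those servers currently delivering data
    """
    if servers_found == 0:
        return "No Conn"   # no server is found
    if servers_delivering == 0:
        return "No Data"   # servers found, but none delivering data
    return "Conn OK"       # one or more servers delivering data


print(connection_indicator(0, 0))
print(connection_indicator(2, 0))
print(connection_indicator(2, 1))
```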

Open Multiple Windows

The following illustrates the usage of the Open New Window button. Typically, users have the Alert Detail Table display open in one window and another display, such as the All Caches - History Heatmap display, open in another window.

Tables and Sorting

The following illustrates how to sort table columns in alphanumerical order. In this example, the Alert Level column is sorted.

Mouse-over

The mouse-over functionality provides additional detailed data in a popup window for trend graphs and heatmaps.

The following illustrates mouse-over functionality in a trend graph object. In this example, when you mouse over the Out Messages trend graph, the outbound message rate is shown at 60-second intervals throughout the graph.

The following illustrates mouse-over functionality in a heatmap object. In this example, when you mouse over a host, details are shown such as message inbound and outbound rates, and the number of pending messages.

RTView contains components licensed under the Apache License Version 2.0.

JMS, JMX and Java are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. They are mentioned in this document for identification purposes only.

SL, SL-GMS, GMS, RTView, SL Corporation, and the SL logo are trademarks or registered trademarks of Sherrill-Lubinski Corporation in the United States and other countries. Copyright © 1998-2013 Sherrill-Lubinski Corporation. All Rights Reserved.