
Oracle Coherence Monitor User Guide
Using the OC Monitor
- Introduction
This section describes the OC Monitor main display and the Coherence cluster
metrics it provides, as well as basic OC Monitor GUI navigation and
functionality. Except where noted, the items described here apply to both the
OCM Stand-alone version and the EM Solution Package version. The main
difference between the two GUIs is that the OCM Solution Package version
provides a subset of the OCM Stand-alone displays. For details, see
Navigation Tree.
This section includes:
- OCM Main Display: This section describes indicators in the main OC Monitor
display.
- OCM Navigation: This section describes the general operation and
functionality of the OC Monitor user interface.
OCM Main Display
The OC Monitor main display, Cluster - Overview, enables you to quickly assess
the configuration, activity and health of all of your Coherence clusters.
Select a cluster from the Cluster drop-down menu. The following figure
describes the cluster indicator areas.

The OCM main page reports on the following areas:
- Coherence Cluster Configuration: View the cluster name and the total number
of members (JVMs) in the cluster, broken down into storage nodes and client
(non-storage) nodes. Also view the total number of caches and the version of
Coherence used in the cluster.
- Memory: View memory information, including heap size and used memory totals
for all storage and client (non-storage) nodes, as well as the total percent
memory usage for storage and client nodes. A blue trend chart of recent memory
usage is displayed for storage nodes and another for client nodes.
- Service Configuration & HA Status: Check the high-availability (HA) status
for all Coherence protocol-related cache services used by applications in the
cluster. The StatusHA column indicates whether primary and backup objects are
distributed so that data survives a machine failure or a storage node failure.
The safest status is MACHINE-SAFE, which indicates that an entire host
could fail and all data could be recovered. NODE-SAFE indicates that
a storage node could fail and data could be recovered, but data could be
lost with a host failure. ENDANGERED indicates that the loss of a
single storage node could result in data loss in the cluster. Note that
Coherence does not track whether enough free memory is available to
survive a machine or storage node failure without data loss.
Also view the number of caches in each cache service, the number of storage
nodes participating in each cache service, the number of objects in each cache
service, the senior member for the cluster and the senior members for each
service. Click a cache service to view details in the Single Service - Summary
display.
- Most Gets: Check
on your four busiest caches (in terms of the number of gets reported in the
last measurement). Mouse over the bar charts to see the most recent metric
for each cache. Select the Cumulative box to display the total number
of gets since the cluster started (or since statistics were last reset).
Statistics from the most active cache are displayed in the upper right
field.
- Largest Cache:
Check on your four largest caches (in terms of memory usage). The largest
cache’s current size, in units, is displayed in the upper right field.
Mouse over the bar charts to see the most recent metric for each cache.
Units are user defined in your cache configuration file. Typically units are
set to either the number of objects or the number of bytes consumed by
objects.
- Node Uptimes:
Assess cluster stability. View how long nodes in the cluster have been
members of the cluster. The OCM looks at the start time of every node in the
cluster, determines how long the node has been part of the cluster and
places each node in a seconds, minutes, hours, days, weeks or months category.
Typically, if a node leaves the cluster as the result of a fault condition,
the node subsequently rejoins, appears as a “younger” node, and is placed in
the seconds, minutes or hours category. This metric can be an important
indicator of cluster instability. If a node leaves the cluster and fails to
rejoin, the node is subsequently shown in the Departed Nodes box.
- Memory Utilization:
Determine whether cluster memory usage has been increasing over the last
hour and by how much. These bar charts are capacity indicators as well as
garbage collection indicators. The memory usage is shown for the most recent
data collection, and as an average for the last hour. The red bar chart is
for the node in the cluster with the highest memory use, and the yellow bar
shows memory use averaged across the whole cluster.
- Communication Success Rate: Determine whether packet loss is occurring. The
bar graph uses pairs of bars in which one bar represents the publisher success
rate and the other represents the receiver success rate.
These bar charts are the most important indicator of any issue affecting
cluster health or performance. The chart shows the TCMP publisher and receiver
success rates for the last 20 minutes. The success rate is typically 99% or
greater in healthy clusters. Publisher/receiver failures (the inverse of
success) indicate that packets sent between nodes are not being acknowledged
within the timeout period (which is typically 250 ms). This is typically the
result of a node being unavailable due to garbage collection. However, there
are many other possible causes (such as a network issue, a defective NIC, disk
swapping, or a shortage of CPU on a single machine).
Investigate further by clicking the bar chart to view details in the
Cluster - Memory/Network Health display.
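The StatusHA and Communication Success Rate indicators described above can also be read as simple rules. The following Python sketch is illustrative only and is not part of the OC Monitor: the function names, the service names and the sample data are hypothetical, it covers only the three StatusHA values named in this section, and it assumes success rates are expressed as fractions between 0.0 and 1.0. The 0.99 threshold reflects the "99% or greater" guideline above.

```python
from enum import IntEnum


class StatusHA(IntEnum):
    """HA states named in this section, ordered from least to most safe."""
    ENDANGERED = 0
    NODE_SAFE = 1
    MACHINE_SAFE = 2


def parse_status_ha(text):
    # The display shows names such as "MACHINE-SAFE"; map them to enum names.
    return StatusHA[text.strip().upper().replace("-", "_")]


def weakest_status(statuses):
    # A cluster is only as safe as its least-safe cache service.
    return min(parse_status_ha(s) for s in statuses)


def low_success_rates(rates, threshold=0.99):
    """Return the nodes whose publisher or receiver rate is below threshold.

    `rates` maps a node name to (publisher_rate, receiver_rate).
    """
    return sorted(
        node for node, (pub, rcv) in rates.items()
        if pub < threshold or rcv < threshold
    )


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    services = {"ServiceA": "MACHINE-SAFE", "ServiceB": "NODE-SAFE"}
    print(weakest_status(services.values()).name)   # NODE_SAFE
    sample = {"node1": (0.999, 0.998), "node2": (0.970, 0.995)}
    print(low_success_rates(sample))                # ['node2']
```

A node flagged by a check like this is a starting point for investigation, not a diagnosis, since the text lists several possible causes for a low success rate.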
OCM Navigation
This section describes the general operation
of the OC Monitor and the user interface.
NOTE: A server typically appears in an OC Monitor display about 30 seconds
after it is started. By default, data is collected every 15 seconds, and the
display is refreshed 15 seconds after that.
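The timing in the note can be worked through with a small calculation. This Python sketch rests on a simplified model that is an assumption, not documented OC Monitor behavior: collections are taken to run at exact multiples of the 15-second interval, and the display is taken to pick up each collection 15 seconds later. The function names are hypothetical.

```python
import math

COLLECT_INTERVAL_S = 15  # default collection interval stated in the note
REFRESH_DELAY_S = 15     # display refresh lag stated in the note


def first_visible_at(start_s, collect=COLLECT_INTERVAL_S, refresh=REFRESH_DELAY_S):
    """Time in seconds at which a server started at start_s first appears.

    Assumed model: the first collection at or after start_s captures the
    server, and the display shows that collection `refresh` seconds later.
    """
    next_collection = math.ceil(start_s / collect) * collect
    return next_collection + refresh


def display_delay(start_s):
    # Seconds between server start and first appearance in a display.
    return first_visible_at(start_s) - start_s


if __name__ == "__main__":
    for start in (0, 1, 15, 16):
        print(start, display_delay(start))
```

Under this model the delay falls between 15 and 30 seconds depending on where in the collection cycle the server starts, which is consistent with the "about 30 seconds" figure as an upper estimate.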
To access this information online, select the ? button in the top right corner
of any of the OC Monitor displays.
The following figure illustrates the OCM Stand-alone version.

Navigation Tree
The following figure illustrates the OCM Stand-alone version navigation tree
and the OCM Solution Package navigation tree. The OCM Stand-alone navigation
tree contains a series of drop-down menus that are organized by subject area.
The OCM Solution Package navigation tree provides a subset of OCM Stand-alone
displays.
OCM Stand-alone Display Areas
- Cluster Views: Use these displays to assess Coherence cluster-level
performance and utilization.
- Proxy Services: Use these displays to assess proxy service performance
metrics.
- Cache Services: Use these displays to assess cache service performance
metrics across all nodes.
- All Caches: Use these displays to assess performance and utilization of all
caches in the cluster.
- Single Cache: Use these displays to investigate performance, utilization and
activity metrics of a single cache.
- All Nodes: Use these displays to assess node-level performance and
utilization in the cluster.
- Single Node: Use these displays to investigate performance and utilization
metrics of a single node.
- OC Administration: Use these displays to manage your Oracle Coherence
metrics, nodes and caches.
- Alert Views: Use this display to view alert states and manage alerts for all
nodes in the cluster.
- Administration: Use these displays to perform OCM administrative tasks for
nodes, caches, alerts and metrics.

EM Solution Package Displays
- Cluster Summary: Use this display to quickly assess general cluster
stability. This is the OCM main display.
- Service Summary: Use this display to assess the performance of a service
aggregated across all nodes.
- Cache Summary: Use this display to perform a low level cache utilization
analysis for a cache. Check the metrics for Size, Evictions and Misses to
determine whether more capacity is needed.
- Node Summary: Use this display to perform node utilization analysis.
- All Services History: Use this display to assess capacity utilization, over
time, by all services in a cluster.
- All Caches History: Use this display to assess capacity utilization, over
time, by all caches in a cluster.
- All Nodes History: Use this display to assess capacity utilization, over
time, by all nodes in a cluster.
Title Bar
Each display shares similar title bar
functionality, described below.

- Cluster Selector: Select a cluster from the drop-down menu.
- Convenience Navigation: Convenience navigation buttons enable you to toggle
between the most commonly accessed displays from the current display. These
buttons are not available on all displays, and differ from one display to
another.
- Connection Indicator: Conn OK indicates that one or more servers are
delivering data. No Data indicates that one or more servers were found but no
engines are delivering data. No Conn indicates that no server was found.
- Open New Window: Open one or more instances of the same display. Each window
operates independently, allowing you to switch views, navigate to other
displays in the Monitor, and compare performance data.
- Help: Links to the online help page for the current display.
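The three Connection Indicator states follow a simple decision rule. The Python sketch below restates that rule for illustration; it is not OC Monitor code, and the function name and its two count parameters are hypothetical.

```python
def connection_indicator(servers_found, servers_delivering):
    """Map server counts to the indicator text described in this section.

    servers_found: number of servers the Monitor can see (assumed input).
    servers_delivering: how many of those are delivering data.
    """
    if servers_found < 0 or servers_delivering < 0:
        raise ValueError("counts must be non-negative")
    if servers_delivering > servers_found:
        raise ValueError("cannot deliver data from servers that were not found")
    if servers_found == 0:
        return "No Conn"   # no server is found
    if servers_delivering == 0:
        return "No Data"   # servers found, but none delivering data
    return "Conn OK"       # one or more servers delivering data


if __name__ == "__main__":
    print(connection_indicator(0, 0))  # No Conn
    print(connection_indicator(2, 0))  # No Data
    print(connection_indicator(2, 1))  # Conn OK
```

Note that Conn OK requires only one delivering server, so it does not by itself confirm that every server is reporting.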
Open Multiple Windows
The following illustrates the
usage of the Open New Window
button. Typically, users have the Alert Detail Table display open in one
window and another display, such as the All Caches - History Heatmap
display, open in another window.

Tables and Sorting
The following illustrates sorting table columns in alphanumerical order. In
this example, the Alert Level column is sorted.

Mouse-over
The mouse-over functionality provides additional
detailed data in a popup window for trend graphs and heatmaps.
The following illustrates mouse-over
functionality in a trend graph object. In this example, when you mouse over the
Out Messages trend graph, the message out rate is shown at 60-second intervals
throughout the graph.

The following illustrates mouse-over
functionality in a heatmap object. In this example, when you mouse over a host,
details are shown such as message inbound and outbound rates, and the number of
pending messages.

Treemap Algorithms v1.0 is used without modifications and licensed by MPL
Version 1.1. Copyright © 2001 University of Maryland, College Park, MD
Datejs is licensed under MIT. Copyright © Coolite Inc.
jQuery is licensed under MIT. Copyright © 2009 John Resig, http://jquery.com/
JCalendar 1.3.2 is licensed under LGPL. Copyright © Kai Toedter.
JMS, JMX and Java are trademarks or registered trademarks
of Sun Microsystems, Inc. in the United States and other countries. They are
mentioned in this document for identification purposes only.
SL, SL-GMS, GMS, RTView, SL Corporation, and
the SL logo are trademarks or registered trademarks of Sherrill-Lubinski
Corporation in the United States and other countries. Copyright © 1998-2013
Sherrill-Lubinski Corporation. All Rights Reserved.