Oracle Coherence Monitor User Guide
OCM Substitutions
OCM is a configurable solution for monitoring Coherence clusters. OCM comes with default values for the configuration options that determine Monitor behavior. Substitutions are the mechanism you use to change that behavior: at runtime, a substitution that you define replaces the preconfigured default with your own value.
This section explains how to use the following types of substitutions: database, alert, filter, cache, node expiration mode and cluster substitutions.
You configure an OCM substitution by defining a value for it in the rtview.properties file. The typical, and recommended, method is to use the sl.rtview.sub property. For a sample rtview.properties file, see Sample rtview.properties File.
Substitution Syntax
Substitutions are optional and require the following syntax:
sl.rtview.sub=<sub_name>:<sub_value>
For example:
sl.rtview.sub=$OCMCLUSTERSTATS_TABLE:OCM_CLUSTERSTATS
If a substitution value contains a single quote, the single quote must be escaped using a forward slash (/):
sl.rtview.sub=$filter:Plant=/'Dallas/'
If a substitution value contains a space, it must be enclosed in single quotes. Do not escape these single quotes:
sl.rtview.sub=$subname2:'sub value 2'
A substitution string cannot contain the following characters:
& / \ { } [ ] ( )
Database Substitutions
This section describes substitutions used to configure database connections and database tables. The table names and data connections specified in the substitutions must match the table names and data connections specified for your database configuration.
NOTE: The use of some persisted history value tables is optional. To prevent the use of such a table, keep the default substitution value of '' (two single quotes), which prevents OCM from reading and writing the given database table.
Substitution | Description |
$ALERTDEFS_DB |
Use this substitution to specify the SQL connection
to use to connect to the database containing alert threshold
tables. The
default is ALERTDEFS.
Example: |
$ALERTDEFS_TABLE |
Use this substitution to specify the database table
containing threshold values for scalar alerts. The default is ALERTDEFS.
Example: |
$OCMCACHESERVICESTATS_TABLE |
Use this substitution to specify the name of the
persisted history value table for OCM cache service statistics. The use of this
persisted history value table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMCACHESERVICETOTALS_TABLE |
Use this substitution to specify the name of the
database table containing persisted history
values for OCM cache service totals. The default is
OCM_CACHESERVICETOTALS.
Example: |
$OCMCACHESTATS_TABLE |
Use this substitution to specify the name of the
database table containing persisted history
values for OCM cache statistics. The use of this persisted history value table
is optional and, by default, not enabled (the default value is '').
Example: |
$OCMCACHETOTALS_TABLE |
Use this substitution to specify the name of the
database table containing persisted history
values for OCM cache totals. The default is
OCM_CACHETOTALS.
Example: |
$OCMCLUSTERSTATS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM cluster statistics. The default is OCM_CLUSTERSTATS.
Example: |
$OCMEXTENDCONNECTIONS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM extend connections. The use of this persisted history value
table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMINVOCATIONSERVICESTATS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM invocation service statistics. The use of this persisted
history value table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMINVOCATIONSERVICETOTALS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM invocation service totals. The default is OCM_INVOCATIONSERVICETOTALS.
Example: |
$OCMJMXMGMTDATA_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM JMX management data. The use of this persisted history value
table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMJMXSTATSTOTALS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM JMX statistic totals. The default is OCM_JMXSTATSTOTALS.
Example: |
$OCMJVMGCINFO_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM JVM garbage collection information. The use of this
persisted history value table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMJVMMEMORYPOOL_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM JVM memory pool data. The use of this persisted history
value table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMJVMOPERATINGSYSTEM2_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM JVM operating system data. The default is OCM_JVMOPERATINGSYSTEM2.
Example: |
$OCMNODESTATS_TABLE |
Use this substitution to specify the name of the
persisted history value table for OCM node statistics. The default is OCM_NODESTATS.
Example: |
$OCMNODETOTALS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM node totals. The default is OCM_NODETOTALS.
Example: |
$OCMPROXYSERVICESTATS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM proxy service statistics. The use of this persisted history
value table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMPROXYSERVICETOTALS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM proxy service totals. The use of this persisted history
value table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMSTORAGESTATS_TABLE |
Use this substitution to specify the name of the
persisted history
value table for OCM storage statistics. The use of this persisted history value
table is optional and, by default, not enabled (the default value is '').
Example: |
$OCMSTORAGETOTALS_TABLE |
Use this substitution to specify the name of the
persisted history value table for OCM storage totals. The default is
OCM_STORAGETOTALS.
Example: |
$RTVHISTORY_DB |
Use this substitution to specify the name of the SQL
connection to use for the database containing persisted history value tables
(the named SQL connection is also defined in the rtview.properties file). The
default is RTVHISTORY.
Example: |
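As a rough sketch of how the database substitutions in this table might look together in the rtview.properties file, the following fragment restates the documented defaults for the two SQL connections and for two of the history tables, and then enables one optional table. The table name OCM_CACHESTATS is a hypothetical value chosen only for illustration; use the name defined in your own database configuration.

```properties
# SQL connections (documented defaults)
sl.rtview.sub=$ALERTDEFS_DB:ALERTDEFS
sl.rtview.sub=$RTVHISTORY_DB:RTVHISTORY

# History tables that have a default name (documented defaults)
sl.rtview.sub=$OCMCLUSTERSTATS_TABLE:OCM_CLUSTERSTATS
sl.rtview.sub=$OCMNODESTATS_TABLE:OCM_NODESTATS

# Optional history table, not enabled by default.
# OCM_CACHESTATS is a hypothetical table name, not a documented value.
sl.rtview.sub=$OCMCACHESTATS_TABLE:OCM_CACHESTATS
```

Leaving an optional table at its default of '' keeps OCM from reading or writing it, so only define a name for tables that exist in your database.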
Alert Substitutions
This section describes substitutions that are used to configure the behavior of alerts. The substitutions are described in the following table.
Substitution | Description |
$alertActionScript |
Specifies the name of the script to execute for an alert command, without
the extension. This name is combined with the value of $scriptEnding to
form the complete name of the script.
Example: |
$AVERAGE_MEMORY_TIME_WINDOW |
Use this substitution to specify the average memory
time window (the time range over which available memory is averaged) for the
OcAvailableMemoryLowNodeSpike alert.
The default is 86400 seconds (24 hours).
Example: |
$BAD_COMMUNICATION_NODES_TIME_RANGE |
Use this substitution to specify the time range for the
OcBadCommunicationNodesInTimeRange alert.
The default is 300 seconds (5 minutes).
Example: |
$domainName |
Specifies a domain name to be used by the alert commands. Use this substitution
on any Data Server that generates alerts to identify the source of the alert.
Usually each Solution Package defines its own domain name.
Example: |
$NODES_DEPARTED_TIME_WINDOW |
Use this substitution to specify the time
window (the time range over which departed nodes are monitored) for the
OcDepartedNodesPercentage alert.
The default is 300 seconds (5 minutes).
Example: |
$scriptEnding |
Specifies the suffix of the script called for an alert command.
Typically, it is set to bat on Windows systems and sh on Linux.
The default is bat. Example: sl.rtview.cmd_line=-sub:$scriptEnding:bat |
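The alert substitutions in this table could be set with the sl.rtview.sub property along the following lines. This is a sketch only: the three time values are the documented defaults, while the script name my_alert_actions and the domain name MYOCM are hypothetical values invented for illustration.

```properties
# Time windows, in seconds (documented defaults)
sl.rtview.sub=$AVERAGE_MEMORY_TIME_WINDOW:86400
sl.rtview.sub=$BAD_COMMUNICATION_NODES_TIME_RANGE:300
sl.rtview.sub=$NODES_DEPARTED_TIME_WINDOW:300

# Alert command script: the name without its extension, plus the suffix.
# my_alert_actions is a hypothetical script name.
sl.rtview.sub=$alertActionScript:my_alert_actions
sl.rtview.sub=$scriptEnding:sh

# MYOCM is a hypothetical domain name identifying the source of alerts.
sl.rtview.sub=$domainName:MYOCM
```

With these settings the alert command script name is formed from my_alert_actions and the sh suffix, which suits a Linux system; on Windows the default suffix of bat would normally be kept.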
Filter Substitutions
This section describes substitutions that are used to filter the JMX queries that return data from the Coherence cluster. Reducing the amount of data returned can improve OCM performance in cases where returning all data is too much. Filter substitutions specify what data to return in a JMX query (rather than what data to exclude) and subsequently display. A filter substitution can return all relevant data (when the filter is *) or only the subset of data that matches the filter (for example, when the filter is service=DistributedCache,name=foo,*).
Data can also be filtered to include a specific value.
For details about JMX specifications, see http://docs.oracle.com/javase/6/docs/technotes/guides/jmx/JMX_1_4_specification.pdf.
Substitution | Description |
$cacheFilter |
Use this substitution to modify the basic OCM Cache query. The purpose of this substitution is to reduce the amount of Cache MBean data gathered from the cluster and subsequently displayed by the Monitor, thereby improving OCM performance. The default is * (asterisk), which returns all Cache MBean data. The following is the basic Cache query used by OCM; the value of the $cacheFilter substitution replaces $cacheFilter in the query: Coherence:type=Cache,$cacheFilter 0 * -1 *- Examples: |
$storageFilter |
Use this substitution to modify the basic OCM StorageManager query. The purpose of this substitution is to reduce the amount of StorageManager MBean data gathered from the cluster and subsequently displayed by the Monitor, thereby improving OCM performance. The default is * (asterisk), which returns all StorageManager MBean data. The following is the basic StorageManager query used by OCM; the value of the $storageFilter substitution replaces $storageFilter in the query: Coherence:type=StorageManager,$storageFilter 0 * -1 *- |
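To make the effect of the filter substitutions concrete, the following sketch reuses the filter value quoted earlier in this section. The service name DistributedCache and the cache name foo come from that example; whether they exist in your cluster is an assumption. The comments show the query prefix that results once the substitution value replaces the variable in the basic query.

```properties
# Default: return all Cache MBean data
# sl.rtview.sub=$cacheFilter:*

# Return only Cache MBeans for cache foo in the DistributedCache service
sl.rtview.sub=$cacheFilter:service=DistributedCache,name=foo,*
# Resulting query prefix: Coherence:type=Cache,service=DistributedCache,name=foo,*

# Return only StorageManager MBeans for the DistributedCache service
sl.rtview.sub=$storageFilter:service=DistributedCache,*
# Resulting query prefix: Coherence:type=StorageManager,service=DistributedCache,*
```

The trailing * keeps the filter a pattern, so MBeans that carry additional key properties still match.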
Cache Substitutions
This section describes
substitutions that are used
to configure cache behavior. For details about OCM caches in the cluster that persist data to the
database, see the index.html documentation located in the
cachedocs directory. This documentation describes settings for the cache such as
persisted columns, default table sizes and compaction rules.
Substitution | Description |
jvmCondenseRowsInterval |
Use this substitution to reduce the amount of
in-memory data stored in a JVM cache table via in-memory condensing of
historical data. Specifies the time interval used for JVM cache history
condensing. The default is 300 seconds (5 minutes). Raw values for this
interval are condensed into a single value representing the interval, on a
per-column basis. Specify a value using the
following format: |
jvmCondenseRowsRawDataTimeSpan |
Use this substitution to specify the time span of
raw JVM historical data held in-memory before in-memory condensing is applied.
The raw data is kept in the JVM cache history table and, if enabled, its
history_combo table. By default, this is enabled. The default is 1200
seconds (20 minutes).
Specify a value using the following
format: |
$cacheNameFormat |
Use this substitution to modify how cache names are shown in OCM displays. You set $cacheNameFormat to N*M, where N is the number of initial characters to display and M is the number of ending characters to display. The default is 4*24, which displays the initial 4 characters of the cache name and up to 24 ending characters; in names that have more than 28 characters, the characters in between are elided and replaced by "..". In the following example, the initial 4 characters of the cache name are displayed, up to 24 ending characters are displayed, and additional characters are elided and replaced by "..".
Example: |
$ocmCompactionRules |
Use this substitution to reduce the amount of data stored in the Historian table. Data compaction achieves this by aggregating stored data as the data ages. By default, data compaction is enabled, with settings suitable for most use cases. When data compaction is not enabled, data must be reduced manually by backing up or deleting archived data.
This substitution specifies how data points are aggregated and the time intervals at which aggregation occurs. The default is 1d - ;1w 5m ;1M 15m (see the detailed description, below). Compaction is specified using a semicolon-separated list in the following format:
$ocmCompactionRules:'NNu<waitperiod> - ;NNu<firstaggregationrule> ;NNu<secondaggregationrule>'
where NN is a number and u is a single character. Valid characters are as follows:
Using the ocmCompactionRules default settings, for example: sl.rtview.sub=$ocmCompactionRules:'1d - ;1w 5m ;1M 15m', no compaction occurs for data less than 24 hours old (a 1 day wait period, specified by the first rule: 1d -). During this time, data is stored at 3600 points per hour (one point every second).
When data is 1 day old, compaction begins at 5 minute intervals for the next week, specified by the second rule: 1w 5m. During this time, the data is aggregated at a compaction level of 12 points per hour (60 minutes divided by 5 minutes).
When the data is 8 days old (1 week + 1 day), compaction occurs at 15 minute intervals for the next month, specified by the third rule: 1M 15m. During this time, the data is aggregated at a compaction level of 4 points per hour (60 minutes divided by 15 minutes).
When the data is 38 days old (1 month + 1 week + 1 day), the data remains stored in the Historian table at the compaction level of 4 points per hour.
Data compaction increases the length of time between trend graph data points as the data ages. You can modify compaction settings by editing the ocmCompactionRules substitution in the rtview.properties file. For example, if you need to further reduce the amount of stored Historian data, you might increase the compaction level sooner (in the second rule) from 4 points per hour to 1 point per hour. The 4 points per hour compaction level is the maximum recommended, as trend graphs plot gaps when the level is above this. Conversely, if you need more data points to be visible in trend graphs, you might decrease the compaction level from 4 points per hour to 8 points per hour.
Example: |
$ocmCondenseRowsInterval |
Use this substitution to reduce the amount of in-memory data stored in a cache table via in-memory condensing of historical data. Specifies the time interval used for OCM cache history condensing. The default is 300 seconds (5 minutes). Raw values for this interval are condensed into a single value representing the interval, on a per-column basis. Specify a value using the format NNu, where NN is a number and u is a single character that identifies the time unit (for example, w for weeks (7 days)). If only a number is entered, it is assumed to be seconds; for example, 600 specifies a ten minute interval.
Example: |
$ocmCondenseRowsRawDataTimeSpan |
Use this substitution to specify the time span of
raw OCM historical data held in-memory before in-memory condensing is applied.
The raw data is kept in the OCM cache history table and, if enabled, its
history_combo table. By default, this is enabled. The default is 1200
seconds (20 minutes). Specify a value
using the following format: |
$ocmHistoryTimeSpan |
Use this substitution to specify, in seconds, the amount of history data to load at startup. This substitution can be used to limit the SQL result set. The default is 1296000 seconds (15 days).
Example: |
ocmMaxNumberOfHistoryRowsLarge |
Use this substitution to size in-memory storage of history data. This substitution is typically helpful in multi-cluster monitoring, where a cache is used to hold data from multiple clusters. The default is 300000. This substitution is one of three substitutions that are used for the same purpose but for different cache sizes. By default, caches that store history data are categorized by size (as small, medium or large) according to the expected maximum number of history rows they store. Determine the size category of a cache by referring to the number of rows specified for Max Number Of History Rows in the index.html documentation, located in the cachedocs directory. Cache size categories with default values are as follows: small (100000), medium (200000) and large (300000).
A higher number of rows typically shortens response times and makes more history data available, but consumes more memory. A lower number of rows typically lengthens response times, because history data that is not in memory is read from the SQL database.
Example: |
ocmMaxNumberOfHistoryRowsMedium |
Use this substitution to size in-memory storage of history data. This substitution is typically helpful in multi-cluster monitoring, where a cache is used to hold data from multiple clusters. The default is 200000. This substitution is one of three substitutions that are used for the same purpose but for different cache sizes. By default, caches that store history data are categorized by size (as small, medium or large) according to the expected maximum number of history rows they store. Determine the size category of a cache by referring to the number of rows specified for Max Number Of History Rows in the index.html documentation, located in the cachedocs directory. Cache size categories with default values are as follows: small (100000), medium (200000) and large (300000).
A higher number of rows typically shortens response times and makes more history data available, but consumes more memory. A lower number of rows typically lengthens response times, because history data that is not in memory is read from the SQL database.
Example: |
ocmMaxNumberOfHistoryRowsSmall |
Use this substitution to size in-memory storage of history data. This substitution is typically helpful in multi-cluster monitoring, where a cache is used to hold data from multiple clusters. The default is 100000. This substitution is one of three substitutions that are used for the same purpose but for different cache sizes. By default, caches that store history data are categorized by size (as small, medium or large) according to the expected maximum number of history rows they store. Determine the size category of a cache by referring to the number of rows specified for Max Number Of History Rows in the index.html documentation, located in the cachedocs directory. Cache size categories with default values are as follows: small (100000), medium (200000) and large (300000).
A higher number of rows typically shortens response times and makes more history data available, but consumes more memory. A lower number of rows typically lengthens response times, because history data that is not in memory is read from the SQL database.
Example: |
$ocmRowExpirationMode |
Use this substitution with the $ocmRowExpirationTime and $ocmRowExpirationTimeForDelete substitutions to configure the Node Expiration Mode. Use this substitution to make expired nodes visible and selectable in Monitor displays. The default is 3 (expired nodes are not marked or shown in displays).
When enabled (1), only active nodes are counted in the total number of nodes in the system. Expired nodes are included in displays that show all nodes, and are highlighted in red. Trend graphs stop updating expired nodes at the time of departure. When displays show selectable nodes (heatmaps, table rows, grids and drop-down lists), the total of selectable nodes is shown, which includes both active nodes and expired nodes (highlighted in red). Also, node drop-down lists include the suffix [X] for departed nodes. Single node displays have a red background for expired nodes.
When not enabled (3), only active nodes are included in the total number of nodes in the system and expired nodes are not shown in displays (they are not considered part of the system).
Use the $ocmRowExpirationTime substitution to specify the amount of time, in seconds, after which a node is considered expired.
Example: |
$ocmRowExpirationTime |
Use this substitution with the $ocmRowExpirationMode and $ocmRowExpirationTimeForDelete substitutions to configure the Node Expiration Mode. Specifies the amount of time, in seconds, after which a node is considered expired when data updates are not received from it. The default is 25 (seconds).
Best practice is to allow at least two JMX updates to detect an expired node. Fewer than two updates might give a false positive. If node data is missing from one sample, the second sample can confirm it, making a false positive unlikely. To ensure a minimum of two JMX updates, set $ocmRowExpirationTime to 2.5 x the current JMX MBean sampling interval. For example, if the JMX MBean sampling interval is 10 seconds, set the substitution to ocmRowExpirationTime:25.
Also note that if $ocmRowExpirationTime is set to 3 (or more) x the current JMX MBean sampling interval, it takes at least three (or more) updates after no data is received from a node before the node is marked expired. Therefore, a higher setting can increase the latency in detecting expired nodes. The JMX MBean sampling interval is specified by the collector.sl.rtvapm.ocmon.jmxsampleperiod property.
Example: |
$ocmRowExpirationTimeForDelete |
Use this substitution with the $ocmRowExpirationMode and $ocmRowExpirationTime substitutions to configure the Node Expiration Mode. Specifies the amount of time, in seconds, after which an expired node is no longer shown in displays. The default is 25 (seconds).
Example: |
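The following fragment sketches how several of the cache substitutions above might be set in the rtview.properties file. Apart from the ten minute condensing interval, every value is a default stated in this section; the arithmetic in the comments shows how each number of seconds is derived. Treat it as an illustration under those assumptions rather than a recommended configuration.

```properties
# Condense in-memory history into ten minute intervals.
# A number with no unit is read as seconds: 10 x 60 = 600.
sl.rtview.sub=$ocmCondenseRowsInterval:600

# Keep 20 minutes of raw data before condensing: 20 x 60 = 1200 (documented default)
sl.rtview.sub=$ocmCondenseRowsRawDataTimeSpan:1200

# Load 15 days of history at startup: 15 x 86400 = 1296000 (documented default)
sl.rtview.sub=$ocmHistoryTimeSpan:1296000

# Documented default compaction rules: wait 1 day, then 5 minute points
# for 1 week (12 per hour), then 15 minute points for 1 month (4 per hour).
# The value contains spaces, so it is enclosed in single quotes.
sl.rtview.sub=$ocmCompactionRules:'1d - ;1w 5m ;1M 15m'
```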
Node Expiration Mode Substitutions
When nodes expire, by default they are no longer selectable, nor are they shown,
in OCM displays. However, under certain circumstances it might be beneficial to
display them, and control how long expired nodes are shown in OCM displays.
There are three possible modes you can configure for expired nodes:
Mode 1: Expired nodes are not shown in displays (the default).
Mode 2: Expired nodes are shown and selectable in displays indefinitely. Expired nodes persist as expired nodes until they rejoin the cluster. If there is a large population of expired nodes, consider Mode 3.
Mode 3: Expired nodes are shown and selectable in displays for a user-specified time, after which they are removed from displays. This option enables you to limit the clutter of expired nodes while keeping them visible for the time window in which you want to investigate them.
Example Mode 1: Expired nodes not shown in displays (the default)
sl.rtview.sub=$ocmRowExpirationMode:3
sl.rtview.sub=$ocmRowExpirationTime:25
sl.rtview.sub=$ocmRowExpirationTimeForDelete:25
Where: $ocmRowExpirationTime is 2.5 times the jmxsampleperiod in seconds
$ocmRowExpirationTimeForDelete is the same value as $ocmRowExpirationTime (nodes are deleted as they expire and are thus not displayed)
Example Mode 2: Expired nodes shown and selectable in displays indefinitely
sl.rtview.sub=$ocmRowExpirationMode:1
sl.rtview.sub=$ocmRowExpirationTime:25
sl.rtview.sub=$ocmRowExpirationTimeForDelete:25
Where: $ocmRowExpirationMode is 1
$ocmRowExpirationTime is 2.5 times the jmxsampleperiod in seconds
$ocmRowExpirationTimeForDelete is ignored in this mode
Example Mode 3: Expired nodes shown and selectable in displays for a specified time
sl.rtview.sub=$ocmRowExpirationMode:3
sl.rtview.sub=$ocmRowExpirationTime:25
sl.rtview.sub=$ocmRowExpirationTimeForDelete:86400
Where: $ocmRowExpirationMode is 3
$ocmRowExpirationTime is 2.5 times the jmxsampleperiod in seconds
$ocmRowExpirationTimeForDelete is the amount of time, in seconds, that expired nodes are displayed. This value must be longer than $ocmRowExpirationTime. A value of 86400 displays expired nodes for 24 hours.
collector.sl.rtvapm.ocmon.jmxsampleperiod
It is helpful to understand the jmxsampleperiod property when configuring node expiration modes. The jmxsampleperiod property controls the rate at which JMX MBean attributes are polled. It can be used to balance the overhead of requesting the data against the latency of the results. To avoid overloading systems, request data at a rate no faster than it can be produced by the system being monitored. See the Metrics Administration display for the total time taken to obtain the JMX data.
The jmxsampleperiod property specifies the time interval, in milliseconds, for polling MBean attributes and operations executed in data attachments if no poll interval is specified in the data attachment. The default is 10000 (10 seconds). This property is specified in the rtview.properties file, located in the OC Monitor projects/mysample directory.
Because the Default Poll Interval is superseded by the General Update Period, the amount of time elapsed between MBean polls might be longer than the value entered. For example, if the General Update Period is 2000 milliseconds and the Default Poll Interval is 5000 milliseconds, MBean attributes and operations are polled every six seconds, because a poll occurs only on the first general update after the poll interval has elapsed (here, the third update, at 6000 milliseconds).
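Because the node expiration time is derived from the sample period, the two settings are usually changed together. The following sketch assumes a 20 second sample period, a value chosen only for illustration, and assumes the property is written in the usual name=value form of a properties file. Note the differing units: the sample period is in milliseconds, while the expiration substitutions are in seconds.

```properties
# Poll JMX MBeans every 20 seconds (value is in milliseconds)
collector.sl.rtvapm.ocmon.jmxsampleperiod=20000

# Expire a node after 2.5 sample periods: 2.5 x 20 = 50 seconds
sl.rtview.sub=$ocmRowExpirationTime:50

# Same value as the expiration time, so nodes are deleted as they expire
sl.rtview.sub=$ocmRowExpirationTimeForDelete:50
```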
Cluster Substitutions
This section describes substitutions that are
used to configure cluster behavior.
Substitution | Description |
$coherenceGlobalDomain |
Use this substitution to fetch data from "super size" clusters. Specifies the global domain name for JMX queries. The default is Coherence. Use the default value of Coherence to fetch data from Coherence MBeans. NOTE: This feature requires additional system management for the cluster that is not included with the OC Monitor. For information, contact SL Corporation at info@sl.com. Example: |
$coherenceLocalDomain |
Use this substitution to fetch data from "super size" clusters. Specifies the local domain name for JMX queries. The default is Coherence. Use the default value of Coherence to fetch data from Coherence MBeans. NOTE: This feature requires additional system management for the cluster that is not included with the OC Monitor. For information, contact SL Corporation at info@sl.com. Example: |
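For reference, the two cluster substitutions set to their documented defaults would look as follows in the rtview.properties file. This sketch only restates the defaults; values other than Coherence depend on the additional cluster management noted above and are not shown here.

```properties
# Documented defaults: fetch data from the standard Coherence MBean domain
sl.rtview.sub=$coherenceGlobalDomain:Coherence
sl.rtview.sub=$coherenceLocalDomain:Coherence
```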
RTView contains components licensed under the Apache
License Version 2.0. |
Treemap Algorithms v1.0 is used without
modifications and licensed by MPL Version 1.1. Copyright © 2001 University of
Maryland, College Park, MD |
Datejs is licensed under MIT. Copyright © Coolite Inc. |
jQuery is licensed under MIT. Copyright © 2009 John Resig, http://jquery.com/ |
JCalendar 1.3.2 is licensed under LGPL. Copyright © Kai Toedter. |
JMS, JMX and Java are trademarks or registered trademarks
of Sun Microsystems, Inc. in the United States and other countries. They are
mentioned in this document for identification purposes only. |
SL, SL-GMS, GMS, RTView, SL Corporation, and
the SL logo are trademarks or registered trademarks of Sherrill-Lubinski
Corporation in the United States and other countries. Copyright © 1998-2013
Sherrill-Lubinski Corporation. All Rights Reserved. |