Enterprise Manager 11g: Cluster alert metrics test cases

February 1st, 2012 | Posted in 11gR2, Blog, enterprise manager, RAC

1. Introduction:

The aim of this document is to list some troubleshooting procedures associated with monitoring a cluster database. In our seven-node RAC environment (11gR2), we configured SNMP traps to be sent from Enterprise Manager 11gR1 to the Zenoss system.

The challenge was to demonstrate to the client when the alerts fire and how they appear in Zenoss. The demonstrations included crashing the cluster to generate database or cluster alerts.
Some alert tests are straightforward, while others require a deeper understanding of how the monitoring system works. In this document, I show how I tested some of the cluster alerts.

2. Notification Rule creation:

To set up the notification rules, go to Preferences in the top corner of the database console. Click on “Rules”, then click on “Create” to create a new notification rule.

3. Identify the targets:

This step is needed to force a metric collection and upload later on. We need to determine:

1. The Target Name
2. The Target Type
3. The Collection Name

Run the commands below while logged in as the Oracle software owner. In our case, we need to identify the collection names for the target type cluster. From the scheduler output we identify this entry:

cluster:tstcluster:CRSAlert+CRSStatus

The entry has the form target_type:target_name:collection_names, with the collection names joined by “+”. From now on, we will use the collection names CRSAlert and CRSStatus for cluster metric collection.


-bash-3.2$ emctl config agent listtargets
Oracle Enterprise Manager 11g Database Control Release 11.2.0.2.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
[dbt01.example.com:3938, oracle_emd]
[dbt01.example.com, host]
[tstcluster, cluster]
[tstdb.example.com_tstdb1, oracle_database]
[tstdb.example.com, rac_database]
[+ASM1_dbt01.example.com, osm_instance]
[LISTENER_dbt01.example.com, oracle_listener]
[LISTENER_2_dbt01.example.com, oracle_listener]
[LISTENER_SCAN3_tstcluster, oracle_listener]

-bash-3.2$ emctl status agent scheduler
Oracle Enterprise Manager 11g Database Control Release 11.2.0.2.0
Copyright (c) 1996, 2010 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
Scheduler status at 2012-02-01 18:20:49
Running entries::
Ready entries::
Scheduled entries::
2012-02-01 18:20:53 : host:dbt01.example.com:Load
2012-02-01 18:20:53 : osm_instance:+ASM1_dbt01.example.com:diskgroup_space_usage
2012-02-01 18:20:56 : oracle_database:tstdb.example.com_tstdb1:health_check
2012-02-01 18:21:04 : oracle_emd:dbt01.example.com:3938:EMDUploadStats
2012-02-01 18:21:09 : oracle_database:tstdb.example.com_tstdb1:Response
2012-02-01 18:21:23 : host:dbt01.example.com:Network+ProgramResourceUtilization+CRSAlert+CRSStatus
2012-02-01 18:21:34 : Upload Files Recount
2012-02-01 18:21:38 : osm_instance:+ASM1_dbt01.example.com:ofs_collections+incident_meter
2012-02-01 18:21:40 : Ping Manager
2012-02-01 18:21:41 : rac_database:tstdb.example.com:streams_processes_count_item
2012-02-01 18:21:58 : rac_database:tstdb.example.com:activity_pending
2012-02-01 18:22:02 : oracle_listener:LISTENER_2_dbt01.example.com:Load+General Status
2012-02-01 18:22:11 : osm_instance:+ASM1_dbt01.example.com:adr_alert_log_rollup
2012-02-01 18:22:21 : cluster:tstcluster:CRSAlert+CRSStatus
2012-02-01 18:22:24 : rac_database:tstdb.example.com:dbjob_status+UserBlock+cardinality+service_performance+qos_psm
2012-02-01 18:22:43 : oracle_listener:LISTENER_SCAN3_tstcluster:Response
2012-02-01 18:22:54 : oracle_database:tstdb.example.com_tstdb1:haconfig2_collection+ha_rac_intrconn_traffic+sga_start+incident_meter
2012-02-01 18:23:08 : oracle_database:tstdb.example.com_tstdb1:sql_response
2012-02-01 18:23:20 : osm_instance:+ASM1_dbt01.example.com:performance_metrics
2012-02-01 18:24:32 : oracle_database:tstdb.example.com_tstdb1:DatabaseVaultRealmViolation_collection+DatabaseVaultCommandRuleViolation_collection+DatabaseVaultRealmConfigurationIssue_collection+DatabaseVaultCommandRuleConfigurationIssue_collection+DatabaseVaultPolicyChanges_collection
2012-02-01 18:24:33 : oracle_database:tstdb.example.com_tstdb1:adr_alert_log_rollup
2012-02-01 18:25:34 : rac_database:tstdb.example.com:streams_statistics
2012-02-01 18:25:38 : oracle_listener:LISTENER_2_dbt01.example.com:Response
2012-02-01 18:25:41 : osm_instance:+ASM1_dbt01.example.com:Response
2012-02-01 18:25:43 : oracle_listener:LISTENER_dbt01.example.com:Response
2012-02-01 18:26:23 : rac_database:tstdb.example.com:Recovery_Area+haconfig3_collection
2012-02-01 18:26:25 : oracle_listener:LISTENER_SCAN3_tstcluster:Load+General Status
2012-02-01 18:26:47 : osm_instance:+ASM1_dbt01.example.com:disk_status
2012-02-01 18:27:13 : oracle_database:tstdb.example.com_tstdb1:latest_hdm_findings_coll_item
2012-02-01 18:27:30 : oracle_database:tstdb.example.com_tstdb1:log_full
2012-02-01 18:28:06 : rac_database:tstdb.example.com:haconfig1_collection+segment_advisor_count+DatabaseVaultRealmViolation_collection+DatabaseVaultCommandRuleViolation_collection+DatabaseVaultRealmConfigurationIssue_collection
2012-02-01 18:28:21 : osm_instance:+ASM1_dbt01.example.com:diskgroup_failgroup_checks
2012-02-01 18:28:42 : oracle_database:tstdb.example.com_tstdb1:baseline_metadata
2012-02-01 18:30:57 : host:dbt01.example.com:Filesystems+DiskActivity+PagingActivity+CPUUsage+proc_zombie
2012-02-01 18:32:03 : oracle_database:tstdb.example.com_tstdb1:UserAudit
2012-02-01 18:32:11 : host:dbt01.example.com:LogFileMonitoring+FileMonitoring
2012-02-01 18:32:33 : Upload Manager
2012-02-01 18:33:25 : rac_database:tstdb.example.com:latest_db_hdm_findings_coll_item
2012-02-01 18:34:48 : oracle_emd:dbt01.example.com:3938:ProcessInfo
2012-02-01 18:35:15 : oracle_listener:LISTENER_dbt01.example.com:Load+General Status
2012-02-01 18:45:00 : oracle_database:tstdb.example.com_tstdb1:aq_monitoring_alerts
2012-02-01 18:45:39 : rac_database:tstdb.example.com:problemTbsp_10i_Dct+audit_failed_logins
2012-02-01 18:50:17 : rac_database:tstdb.example.com:aq_monitoring_alerts
2012-02-01 19:05:32 : Reap Connection Pools
2012-02-01 19:05:55 : osm_instance:+ASM1_dbt01.example.com:cluster_performance_metrics
2012-02-01 19:11:28 : rac_database:tstdb.example.com:DatabaseVaultCommandRuleConfigurationIssue_collection+DatabaseVaultPolicyChanges_collection+key_profiles_collection
2012-02-01 19:17:36 : osm_instance:+ASM1_dbt01.example.com:ofs_performance_metrics
2012-02-01 22:05:39 : oracle_database:tstdb.example.com_tstdb1:oracle_security
2012-02-01 22:05:43 : host:dbt01.example.com:Inventory
2012-02-01 22:05:48 : oracle_listener:LISTENER_2_dbt01.example.com:oracle_security
2012-02-01 22:05:49 : oracle_database:tstdb.example.com_tstdb1:cluster_resource_name
2012-02-01 22:05:51 : osm_instance:+ASM1_dbt01.example.com:cluster_resource_name
2012-02-01 22:05:53 : oracle_listener:LISTENER_dbt01.example.com:oracle_security
2012-02-01 22:05:58 : oracle_listener:LISTENER_2_dbt01.example.com:cluster_resource_name
2012-02-01 22:05:59 : oracle_database:tstdb.example.com_tstdb1:isHasManaged
2012-02-01 22:06:03 : oracle_listener:LISTENER_dbt01.example.com:cluster_resource_name
2012-02-01 22:06:03 : host:dbt01.example.com:oracle_security
2012-02-01 22:06:08 : oracle_listener:LISTENER_2_dbt01.example.com:isHasManaged
2012-02-01 22:06:13 : host:dbt01.example.com:host_storage
2012-02-01 22:06:13 : oracle_listener:LISTENER_dbt01.example.com:isHasManaged
2012-02-01 22:06:19 : oracle_database:tstdb.example.com_tstdb1:oracle_security_inst
2012-02-01 22:07:11 : cluster:tstcluster:ha_cls_intrconn+crs_event+resource_status
2012-02-01 22:07:37 : rac_database:tstdb.example.com:oracle_security
2012-02-01 22:07:47 : rac_database:tstdb.example.com:cluster_resource_name
2012-02-01 22:07:53 : oracle_listener:LISTENER_SCAN3_tstcluster:oracle_security
2012-02-01 22:07:57 : rac_database:tstdb.example.com:isHasManaged
2012-02-01 22:08:03 : oracle_listener:LISTENER_SCAN3_tstcluster:cluster_resource_name
2012-02-01 22:08:13 : oracle_listener:LISTENER_SCAN3_tstcluster:isHasManaged
2012-02-01 22:08:19 : rac_database:tstdb.example.com:tbspAllocation
2012-02-01 22:08:31 : rac_database:tstdb.example.com:oracle_storage
2012-02-01 22:09:28 : oracle_database:tstdb.example.com_tstdb1:oracle_dbconfig
2012-02-01 22:09:49 : rac_database:tstdb.example.com:problemSegTbsp+feature_usage_collection_item
2012-02-01 22:10:13 : rac_database:tstdb.example.com:invalid_objects_rollup
2012-02-01 22:11:40 : rac_database:tstdb.example.com:oracle_racconfig
2012-02-01 22:13:21 : rac_database:tstdb.example.com:audit_failed_logins_historical
2012-02-01 22:13:47 : osm_instance:+ASM1_dbt01.example.com:oracle_osm
2012-02-01 22:20:00 : cluster:tstcluster:mgmt_rac_services
2012-02-01 22:25:55 : oracle_database:tstdb.example.com_tstdb1:pwd_expiry
2012-02-01 22:27:14 : oracle_database:tstdb.example.com_tstdb1:ha_rac_intrconn+ha_rac_intrconn_type
2012-02-01 22:30:08 : oracle_emd:dbt01.example.com:3938:EMDIdentity+EMDUserLimits
2012-02-01 22:30:23 : osm_instance:+ASM1_dbt01.example.com:Disk_Path
2012-02-01 22:32:22 : host:dbt01.example.com:Swap_Area_Status+HostStorageSupport
2012-02-01 22:35:19 : rac_database:tstdb.example.com:StgPerf
---------------------------------------------------------------
Agent is Running and Ready
-bash-3.2$
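The scheduler listing is long, so it can help to filter it down to one target type. The following is only a minimal sketch: list_collections is a hypothetical helper name of my own (it is not part of emctl), and it assumes the scheduled lines keep the "date time : type:name:collections" layout shown above.

```shell
#!/bin/sh
# Sketch: print the scheduler entries that belong to one target type.
# Reads the output of `emctl status agent scheduler` on stdin.
# list_collections is a hypothetical helper name, not an emctl feature.
list_collections() {
    target_type="$1"
    # Scheduled lines look like: "<date> <time> : <type>:<name>:<collections>".
    # Keep the part after " : " when it starts with the requested type.
    awk -v t="$target_type" '
        {
            i = index($0, " : ")
            if (i == 0) next                  # banner and separator lines
            entry = substr($0, i + 3)
            if (index(entry, t ":") == 1) print entry
        }'
}

# Example (assumes emctl is on the PATH of the Oracle software owner):
#   emctl status agent scheduler | list_collections cluster
```

Run against the listing above, the output should include the cluster:tstcluster:CRSAlert+CRSStatus entry alongside any other collections scheduled for the cluster target.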

4. OCR Alert Log Error:

This metric belongs to the cluster target. It collects CRS-1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1010 and 1011 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Simplest case: perform the following steps to generate the alert:

1. Identify the CRS alert log (typically in $GRID_HOME/log//alert.log).
2. Add the following lines to the end of the file (the timestamp must be current):

2012-02-01 10:13:27.508
[cssd(16636)]CRS-1006: test OCR Alert log error.

3. On the DB Console agent host, use the following command to perform an immediate reevaluation of a metric collection:

$AGENT_HOME/bin/emctl control agent runCollection target_name:target_type collection_name

emctl control agent runCollection :cluster CRSAlert (in my case the cluster name is tstcluster):

emctl control agent runCollection tstcluster:cluster CRSAlert

Then use the following command to force an immediate upload of the current management data from the managed host to the Management Service, instead of waiting until the next scheduled upload:

emctl upload agent

4. Wait for 5 minutes and check for a new alert.
5. Open the EM console, click on the Cluster tab, and go to All Metrics (at the bottom of the page).
6. Click on the OCR Alert Log Error metric.
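Since the timestamp has to be current, steps 2 and 3 are easier to repeat if the entry is generated rather than typed. The sketch below is one way to do that under some assumptions: make_crs_entry is a hypothetical helper, the cssd PID 16636 is simply copied from the sample entry, the milliseconds are hard-coded to .000, and I assume the alert log uses the host's local time.

```shell
#!/bin/sh
# Sketch: build a two-line CRS alert log test entry with a current timestamp.
# make_crs_entry is a hypothetical helper; the cssd PID 16636 is copied from
# the sample entry and is only a placeholder.
make_crs_entry() {
    code="$1"      # CRS message number, for example 1006
    message="$2"   # free text that follows the code
    # Milliseconds are hard-coded to .000 to avoid non-portable date flags.
    date '+%Y-%m-%d %H:%M:%S.000'
    printf '[cssd(16636)]CRS-%s: %s\n' "$code" "$message"
}

# Example usage (ALERT_LOG is an assumed variable holding the CRS alert log path):
#   make_crs_entry 1006 'test OCR Alert log error.' >> "$ALERT_LOG"
#   emctl control agent runCollection tstcluster:cluster CRSAlert
#   emctl upload agent
```

Remember to remove the test lines from the alert log once the demonstration is over.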

5. Node Configuration Alert Log Error:

This metric belongs to the cluster target. It collects CRS-1607, 1802, 1803, 1804 and 1805 messages from the CRS alert log at the cluster level and issues alerts based on the error code.

Simplest case: perform the following steps to generate the alert:

1. Identify the CRS alert log (typically in $GRID_HOME/log//alert.log).
2. Add the following lines to the end of the file (the timestamp must be current):

2012-02-01 10:13:27.508
[cssd(16636)]CRS-1607: test Node configuration error.

3. On the DB Console agent host, use the following command to perform an immediate reevaluation of a metric collection:

$AGENT_HOME/bin/emctl control agent runCollection target_name:target_type collection_name

emctl control agent runCollection tstcluster:cluster CRSAlert

Then use the following command to force an immediate upload of the current management data from the managed host to the Management Service, instead of waiting until the next scheduled upload:

emctl upload agent

4. Wait for 5 minutes and check for a new alert.
5. Open the EM console, click on the Cluster tab, and go to All Metrics (at the bottom of the page).
6. Click on the Node Configuration Alert Log Error metric.
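The two alert log tests differ only in which CRS message number is written to the log, so it is worth checking a code against the lists in sections 4 and 5 before injecting it. This is a small sketch of that lookup; crs_metric_for_code is a hypothetical helper, and the code lists are just the ones quoted in this post, not something read from the agent.

```shell
#!/bin/sh
# Sketch: map a CRS message number to the cluster metric that should pick it up.
# crs_metric_for_code is a hypothetical helper; the lists mirror sections 4 and 5.
crs_metric_for_code() {
    case "$1" in
        1001|1002|1003|1004|1005|1006|1007|1008|1010|1011)
            echo 'OCR Alert Log Error' ;;
        1607|1802|1803|1804|1805)
            echo 'Node Configuration Alert Log Error' ;;
        *)
            # Not covered by either metric described in this post.
            echo 'unknown'
            return 1 ;;
    esac
}

# Example usage:
#   crs_metric_for_code 1006
#   crs_metric_for_code 1607
```

A code that returns "unknown" would not be expected to raise either of the two alerts described here.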

6. Node(s) with Clusterware Problem:

This metric belongs to the cluster target. It shows how many nodes have clusterware problems. It uses the cluster verify utility to check the cluster nodes:

cluvfy comp crs -n node1,node2,…
where node1,node2,… is the comma-separated node list for the cluster.
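To run the same check by hand, the node list has to be passed as one comma-separated argument. The sketch below builds that list from names supplied one per line; join_nodes is a hypothetical helper, and feeding it from olsnodes (which prints the cluster node names one per line) is my assumption about how you would obtain the names.

```shell
#!/bin/sh
# Sketch: turn node names (one per line on stdin) into a comma-separated list.
# join_nodes is a hypothetical helper name.
join_nodes() {
    # paste -s joins all input lines into one; -d, uses a comma as separator.
    paste -s -d, -
}

# Example usage (assumes olsnodes and cluvfy from the Grid home are on the PATH):
#   cluvfy comp crs -n "$(olsnodes | join_nodes)"
```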

Simplest case: perform the following steps to generate the alert:

1. Back up and edit $GRID_HOME/bin/cluvfy.
2. Directly after the first line of the file (#!/bin/sh), add the two lines shown below it so that the cluster verify utility prints an error and exits:

#!/bin/sh
echo ERROR
exit 1;

3. On the DB Console agent host, use the following command to perform an immediate reevaluation of a metric collection:

$AGENT_HOME/bin/emctl control agent runCollection target_name:target_type collection_name
emctl control agent runCollection tstcluster:cluster CRSStatus

4. Wait for 15 minutes and check for a new alert.
5. Open the EM console, click on the Cluster tab, and go to All Metrics (at the bottom of the page).
6. Click on the Node(s) with Clusterware Problem metric.
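Because this test edits a file in the Grid home, I find it safer to script the backup, the edit, and the restore together. The following is a sketch only: stub_script and restore_script are hypothetical helper names, the ".orig" backup suffix is my own choice, and the functions work on whatever path you pass them rather than assuming where cluvfy lives.

```shell
#!/bin/sh
# Sketch: make a script fail right after its first line, and restore it later.
# stub_script and restore_script are hypothetical helper names.
stub_script() {
    target="$1"
    # Keep a backup next to the original (".orig" is an arbitrary suffix).
    cp -p "$target" "$target.orig" || return 1
    # Keep line 1 (the interpreter line), inject the failure, keep the rest.
    {
        head -n 1 "$target.orig"
        printf 'echo ERROR\nexit 1\n'
        tail -n +2 "$target.orig"
    } > "$target"
}

restore_script() {
    target="$1"
    # Put the saved copy back, then drop the backup.
    cp -p "$target.orig" "$target" && rm -f "$target.orig"
}

# Example usage (paths and target names as used in this post):
#   stub_script "$GRID_HOME/bin/cluvfy"
#   emctl control agent runCollection tstcluster:cluster CRSStatus
#   ... wait for the alert, then:
#   restore_script "$GRID_HOME/bin/cluvfy"
```

Restoring the original cluvfy as soon as the alert has been demonstrated matters, since other tools may call it while the stub is in place.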

