Yet another alert log puzzle


A handy alert log is invaluable for troubleshooting database problems. A RAC database has multiple alert logs; I prefer to monitor them through a single table.

"A master failing to make an entry in the vessel's official
logbook as required by this part is liable to the Government for a
civil penalty of $200.

United States Code. Title 46 - Shipping. Subtitle II - Vessels and Seamen. Part G - Merchant Seamen Protection and Relief. Chapter 113 - Official Logbooks.

How meticulously do you keep your logbook, Captain?
Luckily for us, our databases are as much ships as they are first mates. They are intelligent enough to keep their own records. And those logs are as important as the vessel logs of the past, because the information carried by an early 21st century database could easily be worth more than the 1450 tons of tea carried by the Cutty Sark in 1870 en route from Shanghai to London.

As Oracle does all the mundane work, our role becomes more creative - to inspect and properly use the gathered information.

The alert log file is a DBA's best friend and can be a true lifesaver. It contains a chronological account of significant events, such as:

  • critical errors (internal, block corruption and deadlocks);
  • administrative actions (startup, shutdown, drop, alter, archivelog);
  • non-default initialization parameters;
  • shared server errors and messages;
  • errors during automatic refresh of materialized views.

If you ever happen to open a Service Request with Oracle Support, this is the first file they ask for.

And you can tell a true fan of the Oracle database by their attitude to this file. Such people enjoy watching an instance startup (like a sunrise) with

   
   tail -f alert.log

at the shell prompt.

The database keeps this file, together with other trace files produced by background processes, in the directory defined by the BACKGROUND_DUMP_DEST initialization parameter.

Considering the importance of the alert log file, it is no surprise that many DBAs have invented numerous ways to put this file to better use.

Some find it convenient to access the alert log file through the database. Let's say I wish to group and count all the flavors of ORA-600 that have occurred during the database's lifetime. Having the alert log in a relational table would be very handy in this case.
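For example, once the ALERT_LOG table described below exists, a rough breakdown might look like this sketch (the 40-character SUBSTR is an arbitrary way to keep the error code together with its first arguments):

SELECT   SUBSTR(Line, INSTR(Line, 'ORA-00600'), 40) AS Ora_600_Flavor,
         COUNT(*)                                   AS Occurrences
FROM     My_util.Alert_Log
WHERE    Line LIKE '%ORA-00600%'
GROUP BY SUBSTR(Line, INSTR(Line, 'ORA-00600'), 40)
ORDER BY Occurrences DESC;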

And there are probably a thousand ways to load it into a table or make it look like one: external tables, database jobs, recurring OS scripts, etc.
This problem was solved a long time ago and is not thrilling anymore - unless I work with 10g in a clustered environment.

A RAC database, like a building, has several sides. Its facade (the database side) looks like a seamless whole. Developers are the lucky ones who observe this harmonious facade most of the time.
DBAs, on the contrary, regularly deal with the OS side of the database. They know well that each cluster node, like a suburban house, has its own personality, a fence and a backyard. To see the alert log of a particular instance I have to get into that particular backyard - physically log on to that machine. If the alert log is accessible through the database, I have to connect to that particular instance.

This need arises because when we install a cluster database we usually put the software and the admin directory (with the underlying bdump directory) on local hard drives, while the data and control files go to shared storage (NetApp in our case). Thus the node alert logs land far apart - on different computers.
If, during installation, I had chosen the shared storage for the admin directory, the alert logs from all the instances would be in the same place. But that place would be on an external appliance, linked through the network. Although our networks never fail, I prefer to keep logs close to home - on local devices.

To see all the alert logs in the cluster I have to connect to each instance (which is inconvenient) or somehow gather the logs in a central location.

Is it possible to load files from individual nodes into a single table using a single database session?

The task just became enticing! To add more spice, let's try to do it using database instruments only - no bash scripts or cron jobs. For the sake of the example, my QA database cluster includes only two nodes: QA1 and QA2.

Requirements

As usual, I have a profound one-liner: "The alert log files from all the instances are loaded into a database table automatically (e.g. once a day) or in manual mode".

Expected user interface

To load the log from node QA2, the user runs the following anonymous block:

BEGIN
  Alert_Log_pkg.Load('QA2');
END;
/

To load the log from the current node, the user runs this code:

BEGIN
  Alert_Log_pkg.Load;
END;
/

To view the log from node QA2, the user runs a SELECT statement:

SELECT   *
FROM     Alert_Log
WHERE    Node = 'QA2'
ORDER BY Id;

Data model

The simplest possible ALERT_LOG table has three columns:

  • node - to separate entries loaded from different nodes;
  • line - to keep the lines of text from the log files;
  • id - to preserve the original order of lines.

Listing 1. The ALERT_LOG table.

CREATE TABLE My_util.Alert_Log (
  Node VARCHAR2(50 BYTE),
  Line VARCHAR2(4000 BYTE),
  Id   NUMBER(10));

create sequence my_util.seq_alert_log_id
/

create or replace trigger my_util.tr_alert_log_bi
before insert on my_util.alert_log for each row
begin
  -- populate ID from the sequence to preserve the original order of lines
  select seq_alert_log_id.nextval into :new.id from dual;
end;
/

Without a primary key this table would be very inconvenient to use: usually I want to see the tail of the log first, my production logs have millions of lines, and I don't want to deal with full table scans.

Listing 2. The primary key.

ALTER TABLE My_util.Alert_Log ADD(
                              CONSTRAINT pk_Alert_Log_Id PRIMARY KEY( Id ));
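
With the primary key in place, the most recent entries can typically be fetched through a descending scan of its index instead of a full table scan. A minimal sketch (the 100-row window is arbitrary):

SELECT *
FROM   (SELECT   Id, Node, Line
        FROM     My_util.Alert_Log
        ORDER BY Id DESC)
WHERE  ROWNUM <= 100;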

I am creating the ALERT_LOG table in a schema where I keep my database utilities (MY_UTIL). Having a separate schema for this kind of object is important: it lessens security risks and makes maintenance easier. This is why the supporting PL/SQL package (ALERT_LOG_PKG) will reside in the same schema and run with definer's rights.
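
The package itself is not listed in this article, so the following specification is only my assumption of what its interface looks like, matching the calls shown above:

CREATE OR REPLACE PACKAGE My_util.Alert_Log_pkg AUTHID DEFINER IS
  -- Loads the alert log of the given node into the ALERT_LOG table.
  -- When p_Node is omitted, the log of the current instance is loaded.
  PROCEDURE Load(p_Node IN VARCHAR2 DEFAULT NULL);
END Alert_Log_pkg;
/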

Directory object

To access files on an Oracle server we need a directory object:

create or replace directory bdump_dir as '/oracle/admin/QA/bdump'

The path should match the value of the BACKGROUND_DUMP_DEST initialization parameter:

SELECT *
FROM   v$Parameter
WHERE  NAME = 'background_dump_dest'

The most intriguing part of the design is that the directory object points to the same path on two different computers. How do we access a particular directory on a particular node?

I am not going to build up tension here - soon you will see that I have chosen the service affinity feature of the DBMS_SCHEDULER package.

The supporting package needs privileges to read and write in this directory:

grant read, write on directory bdump_dir to MY_UTIL

Is it safe to access an Oracle system directory through the database?
Yes, it is - at least that is what Oracle Support says.

Loading logs

To load an alert log into a table I need to perform several actions:

  • determine the name of the file;
  • verify that the file exists;
  • rename it;
  • read the file line by line and load it into the table.

The name of the alert file is instance specific. In my case these files are alert_QA1.log and alert_QA2.log. So the first thing the ALERT_LOG_PKG package does when it loads a file is determine the current instance name using DBMS_UTILITY.
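
For illustration, here is one way to derive the file name (a sketch; SYS_CONTEXT is used here as an alternative to the package's DBMS_UTILITY call, and the variable names are mine):

DECLARE
  v_Instance VARCHAR2(30);
  v_Log_Name VARCHAR2(100);
BEGIN
  v_Instance := SYS_CONTEXT('USERENV', 'INSTANCE_NAME');
  v_Log_Name := 'alert_' || v_Instance || '.log';   -- e.g. alert_QA1.log
  dbms_Output.Put_Line(v_Log_Name);
END;
/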

Why do I need to rename the file?
First, I prefer to work with a personal copy. Nothing should write into a file that is being uploaded. And Oracle has no problem with a missing alert file - it will create a new one when necessary.

Second, I don't want to accidentally upload the same file twice.
Thus, after several uploads I will have a moderately sized current alert log and a collection of backup files:

Listing 3. Sample listing of the bdump directory.

alert_QA1.log
alert_QA1_060718020312.backup 
alert_QA1_060718101950.backup 
alert_QA2_060718102159.backup

Why do I dare to rename the database log file? Because the Oracle Database Reference says: "This file grows slowly, but without limit, so you might want to delete it periodically. You can delete the file even when the database is running."

If I need a single file I can easily concatenate these backups. But why would I need that when I have all the data in the ALERT_LOG table?

Checking that the alert log exists, renaming it, and reading it are easily done by routines from the UTL_FILE package:

-- check if the file exists
utl_file.fgetattr(idir, ifile_name, result, file_length, block_size);

-- rename the file
utl_file.frename(vdir, vlog_name, vdir, vbackup_name, false);

-- open, read and close
f := utl_file.fopen(idir, ifile_name, 'r', c_max_line_size);

-- a loop is implied here
utl_file.get_line(f, vline, c_max_line_size);

utl_file.fclose(f);
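
Put together, the load step might look like the following sketch. The procedure name, variable names, exception handling and the timestamp format in the backup name are my assumptions; only the UTL_FILE calls themselves come from the snippets above:

CREATE OR REPLACE PROCEDURE My_util.Load_Local_Alert_Log IS
  c_Max_Line_Size CONSTANT PLS_INTEGER  := 4000;
  c_Dir           CONSTANT VARCHAR2(30) := 'BDUMP_DIR';
  v_Node          VARCHAR2(50);
  v_Log_Name      VARCHAR2(100);
  v_Backup_Name   VARCHAR2(100);
  v_Line          VARCHAR2(4000);
  v_Exists        BOOLEAN;
  v_Length        NUMBER;
  v_Block_Size    BINARY_INTEGER;
  f               utl_File.File_Type;
BEGIN
  v_Node        := SYS_CONTEXT('USERENV', 'INSTANCE_NAME');
  v_Log_Name    := 'alert_' || v_Node || '.log';
  v_Backup_Name := 'alert_' || v_Node || '_'
                   || TO_CHAR(SYSDATE, 'YYMMDDHH24MISS') || '.backup';

  -- check if the file exists
  utl_File.fGetAttr(c_Dir, v_Log_Name, v_Exists, v_Length, v_Block_Size);

  IF v_Exists THEN
    -- work with a personal copy: nothing writes into the renamed file
    utl_File.fRename(c_Dir, v_Log_Name, c_Dir, v_Backup_Name, FALSE);

    -- open, read line by line, load into the table
    f := utl_File.fOpen(c_Dir, v_Backup_Name, 'r', c_Max_Line_Size);
    BEGIN
      LOOP
        utl_File.Get_Line(f, v_Line, c_Max_Line_Size);
        INSERT INTO My_util.Alert_Log(Node, Line) VALUES (v_Node, v_Line);
      END LOOP;
    EXCEPTION
      WHEN No_Data_Found THEN NULL;   -- end of file reached
    END;
    utl_File.fClose(f);
    COMMIT;
  END IF;
END Load_Local_Alert_Log;
/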

Hypothetically, you might need to write messages into the alert log. For instance, to leave a comment or put a timestamp of the last upload operation. A well-known (and still undocumented) function could do that:

dbms_system.ksdwrt (destination IN BINARY_INTEGER, message IN VARCHAR2);

Where destination is:
1 - trace files;
2 - alert log;
3 - both the trace and the alert log files.
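
A hedged usage example (the message text is my own; the call requires EXECUTE on SYS.DBMS_SYSTEM, and the undocumented-package caveat below applies):

BEGIN
  -- destination 2: write into the alert log only
  sys.dbms_System.ksdwrt(2, 'ALERT_LOG_PKG: last upload at '
                            || TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS'));
END;
/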

The usual disclaimer follows: avoid undocumented functions, since they might change at any moment. And I prefer not to mix my notes with the notes made by the database background processes. Splitting the log into smaller pieces is less intrusive, because I preserve the original contents and leave my timestamps at a higher level of abstraction - the file name.

At this point I can run the ALERT_LOG_PKG.LOAD procedure and see new log entries in the ALERT_LOG table.
Now I am ready to automate the log upload process using Oracle jobs.

Automation

The Oracle Database Administrator's Guide earnestly recommends forgetting the DBMS_JOB package and switching to a new, superior one - DBMS_SCHEDULER.

I am ready to embrace the new package as long as it allows me to force jobs to run on a specific node, as DBMS_JOB used to do. Indeed, DBMS_SCHEDULER does that, but in a more flexible and indirect way.

There is a new layer of abstraction: instead of tying jobs to database instances, we tie them to database services. This feature is great for fault-tolerant load partitioning, and it works well in my case - with just a little extra coding.

The first step is to create database services for each node in the cluster.

BEGIN
  dbms_Service.Create_Service('node_qa1','node_qa1',dbms_Service.Goal_None);
  
  dbms_Service.Start_Service('node_qa1','QA1');
  
  dbms_Service.Create_Service('node_qa2','node_qa2',dbms_Service.Goal_None);
  
  dbms_Service.Start_Service('node_qa2','QA2');
END;

The node_qa1 service runs only on the QA1 instance, and node_qa2 only on QA2. To verify that the services are running on the proper nodes, I use the following views: v$services, v$active_services, gv$active_services.
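
For example, a quick check across the whole cluster:

SELECT   Inst_Id, Name
FROM     gv$Active_Services
WHERE    Name IN ('node_qa1', 'node_qa2')
ORDER BY Inst_Id;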

The second step is to create job classes with affinity to these new services.

BEGIN
  dbms_Scheduler.Create_Job_Class(Job_Class_Name => 'qa1',Service => 'node_qa1',
                                  Comments => 'jobs running on qa1');
  
  dbms_Scheduler.Create_Job_Class(Job_Class_Name => 'qa2',Service => 'node_qa2',
                                  Comments => 'jobs running on qa2');
END;

I can see the newly created job classes in the dba_scheduler_job_classes view.
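
A quick check of the class-to-service mapping:

SELECT Job_Class_Name, Service
FROM   dba_Scheduler_Job_Classes
WHERE  Job_Class_Name IN ('QA1', 'QA2');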

The MY_UTIL user needs some privileges to use these classes:

grant execute any class to my_util;

grant create job to my_util;

Step three - I am ready to upload logs from any node in the cluster using a simple call:

BEGIN
  My_util.Alert_Log_pkg.Load('QA2');
END;

Now the log entries from QA2 are in the alert_log table.

Internally this call creates a job which runs immediately on QA2.

BEGIN
  dbms_Scheduler.Create_Job(Job_Name   => 'my_util.alert_log_qa2',
                            Job_Class  => 'qa2',
                            Job_Type   => 'STORED_PROCEDURE',
                            Job_Action => 'my_util.alert_log_pkg.load',
                            Enabled    => True,
                            Comments   => 'Upload alert_log from qa2 using the node_qa2 service.');
END;

To schedule a job that runs on QA2 once a day - just say so:

BEGIN
  dbms_Scheduler.Create_Job(Job_Name        => 'my_util.alert_log_qa2',
                            Job_Class       => 'qa2',
                            Job_Type        => 'STORED_PROCEDURE',
                            Job_Action      => 'my_util.alert_log_pkg.load',
                            Start_Date      => SYSDATE,
                            Repeat_Interval => 'FREQ=DAILY;INTERVAL=1',
                            Enabled         => True,
                            Comments        => 'Upload alert_log from qa2 using the node_qa2 service.');
END;

Note the key role of the "start_date" and "repeat_interval" parameters: if both are null, the job starts immediately and runs only once.

To see the status of recurring jobs I use the dba_scheduler_jobs view:

SELECT *
FROM   dba_Scheduler_Jobs
WHERE  Job_Name LIKE 'ALERT_LOG_QA%'

To see whether the jobs succeeded or failed, I use the all_scheduler_job_log view:

SELECT   *
FROM     All_Scheduler_Job_Log
WHERE    Job_Name LIKE 'ALERT_LOG%'
ORDER BY Log_Id DESC

Finally, I let the scheduler do its job - collect the alert logs for me. If I need to stop a job, I will use the DROP_JOB routine.

BEGIN
  dbms_Scheduler.Drop_Job('my_util.alert_log_qa1',True);
  
  dbms_Scheduler.Drop_Job('my_util.alert_log_qa2',True);
END;

Conclusion


1. In a cluster environment alert logs are easier to analyze when they are loaded into a database table.
2. The described job scheduling technique can be recommended any time a 10g RAC node needs personal attention. Create an instance-specific job class once; use it as many times as you wish to run PL/SQL code on that node (see the sketch below).
3. The Oracle database is better than a tall ship - even if Admiralty law applied to it, there would be no missing logbook entries.
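
As a parting sketch, here is how the qa1 class could be reused for an unrelated task; the job name and the PL/SQL block are arbitrary:

BEGIN
  dbms_Scheduler.Create_Job(Job_Name   => 'my_util.stats_on_qa1',
                            Job_Class  => 'qa1',
                            Job_Type   => 'PLSQL_BLOCK',
                            Job_Action => 'BEGIN dbms_stats.gather_schema_stats(''MY_UTIL''); END;',
                            Enabled    => True,
                            Comments   => 'Any PL/SQL pinned to node QA1 via the node_qa1 service.');
END;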