For more exciting blogs on Oracle, please click the LINK

1. How to determine redo log size is not optimal

As a general rule of thumb, Oracle recommends that you size your online redo logs not to switch more then 5 times per hour during peak DML times. If you are ever having more frequent switches than that, then you might need bigger logs.

Long-running jobs are often much faster when the entire job fits into a single online redo log. For the online transaction processing (OLTP) type of environment, smaller online redo logs are usually better. By monitoring the date and time of the online redo logs at the operating system level (or querying V$LOG_HISTORY), you can determine whether to increase the size or number of online redo logs to reach an optimum switching interval.

Here is a query that shows you the time between log switches. It can be handy in determining if you have a problem:

select  b.recid,
        to_char(b.first_time, ’dd-mon-yy hh:mi:ss’) start_time, 
        to_char(a.first_time, ’dd-mon-yy hh:mi:ss’) end_time,
        round(((a.first_time-b.first_time)*25)*60,2) minutes
from    v$log_history a, v$log_history b
where   a.recid = b.recid + 1
order   by a.first_time asc
set pages 999 lines 400
col Day form a10
col h0 format 999
col h1 format 999
col h2 format 999
col h3 format 999
col h4 format 999
col h5 format 999
col h6 format 999
col h7 format 999
col h8 format 999
col h9 format 999
col h10 format 999
col h11 format 999
col h12 format 999
col h13 format 999
col h14 format 999
col h15 format 999
col h16 format 999
col h17 format 999
col h18 format 999
col h19 format 999
col h20 format 999
col h21 format 999
col h22 format 999
col h23 format 999
SELECT TRUNC (first_time) "Date", inst_id, TO_CHAR (first_time, 'Dy') "Day",
COUNT (1) "Total",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '00', 1, 0)) "h0",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '01', 1, 0)) "h1",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '02', 1, 0)) "h2",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '03', 1, 0)) "h3",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '04', 1, 0)) "h4",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '05', 1, 0)) "h5",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '06', 1, 0)) "h6",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '07', 1, 0)) "h7",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '08', 1, 0)) "h8",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '09', 1, 0)) "h9",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '10', 1, 0)) "h10",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '11', 1, 0)) "h11",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '12', 1, 0)) "h12",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '13', 1, 0)) "h13",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '14', 1, 0)) "h14",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '15', 1, 0)) "h15",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '16', 1, 0)) "h16",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '17', 1, 0)) "h17",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '18', 1, 0)) "h18",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '19', 1, 0)) "h19",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '20', 1, 0)) "h20",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '21', 1, 0)) "h21",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '22', 1, 0)) "h22",
SUM (DECODE (TO_CHAR (first_time, 'hh24'), '23', 1, 0)) "h23",
ROUND (COUNT (1) / 24, 2) "Avg"
FROM gv$log_history
WHERE thread# = inst_id
AND first_time > sysdate -7
GROUP BY TRUNC (first_time), inst_id, TO_CHAR (first_time, 'Dy')
col c1 format a10 heading "Month"
col c2 format a25 heading "Archive Date"
col c3 format 999 heading "Switches"
compute AVG of C on A
compute AVG of C on REPORT
break on A skip 1 on REPORT skip 1
to_char(trunc(first_time), 'Month') c1,
to_char(trunc(first_time), 'Day : DD-Mon-YYYY') c2,
count(*) c3
trunc(first_time) > last_day(sysdate-100) +1
group by
order by


2. When do u get the message that checkpoint not complete?

The “checkpoint not complete” messages are generated because the logs are switching so fast that the checkpoint associated with the log switch isn’t complete.

During that time, Oracle’s LGWR process has filled up the other redo log groups and is now waiting for the first checkpoint to successfully execute.

Remember, Oracle will stop processing until the checkpoint completes successfully!

Usually you see this “checkpoint not complete” error in your alert log:

Sat Oct 24 12:21:55 2021
Thread 1 cannot allocate new log, sequence 50559
Checkpoint not complete

Oracle suggests that you might consider setting archive_lag_target to zero to reduce “checkpoint not complete” messages:

alter system set archive_lag_target=0 scope=both;

Overall Oracle performance can be dramatically improved by increasing the log sizes so that logs switch at the recommended interval of 15 to 30 minutes.

3. You see that the database is in hung state, how will you make the db alive with a minimal downtime as an oracle l3 dba. Before bouncing the database, what will you do as a dba?

Step 1: Check the alert log to see if database has reported any errors indicating why the database is hanging.

Step 2: See if you can get AWR or ASH report or query some of the ASH views. There might be events such as hard parses at the top of the load profile section of the AWR report, indicating the slowness of the database.

Step 3: See if any large expdp/impdp operation running in parallel mode which is consuming all database resources.

Step 4: As a DBA you might know that a single poorly written adhoc query has potential to bring an entire database to a halt. See if you can identify one or more poorly written SQL that may be leading to the hung database.

Step 5: Check if there are any blocking locks and latch contention.

Step 6: Check the server memory usage and CPU usage. Make sure that the sessions are not stalling because of low sized PGA. (not required if u configured AMM)

Step 7: Check whether archive log destination is filled. If Archive destination is full, database will hang. Here in this case you can connect as sys and make room for archive log. You can also change the archive log destination temporarily and then clean up the original destination and then change the archive log destination pointing to original one.

Step 8: Check the FRA. A database hangs when it is unable to write flashback database logs to the recovery area. You can fix this problem by increasing the size of the FRA using the command,

SQL>alter system set db_recovery_file_dest_size=20GB; 


If you are still unable to resolve the reasons for the hung database and you are also unable to connect to the database as SYSDBA then proceed with using prelim option. The prelim option does not require a database connection. You can not use prelim option if you are already connected to the database.

[oracle@jack ~]$ sqlplus /nolog

SQL*Plus: Release Production on Sat Oct 24 11:54:02 2021

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

SQL> set _prelim on
SQL> connect / as sysdba
Prelim connection established
You can also use
[oracle@jack ~]$ sqlplus -prelim "/ as sysdba"

SQL*Plus: Release Production on Sat Oct 24 11:54:02 2021

Copyright (c) 1982, 2009, Oracle.  All rights reserved.

Now you can use oradebug hanganalyze command to analyze the hung database.
SQL> oradebug hanganalyze 3

in RAC env

SQL> oradebug setinst all
SQL> oradebug -g def hanganalyze 3

You can repeat the oradebug hanganalyze command mutiple times to generate dump files for varying process states.

When you call Oracle Support about this issue, they may ask you to attach logs for process state dump also called systemstate dump to analyze hung database.

systemstate dump report shows "what processes are doing and the resource currently they are holding"
Commands to get systemstate dump (NON RAC ENV)
SQL>oradebug setmypid

SQL>oradebug dump systemstate 266

SQL>oradebug setmypid  --- You can aslo specify the process ID other than yours oradebug setmypid 
SQL>oradebug ulimit
SQL>oradebug -g all dump systemstate 266

If you try to issue this command without setting the PID you get ORA-00074.


4. Reason for high buffer busy waits in production

The buffer busy wait event happens when a session tries to access a block in the buffer cache but it can’t because the buffer is busy, because another session is modifying the block and the contents of the block are in flux. To guarantee that the reader has a coherent image of the block with either all of the changes or none of the changes, the session modifying the block marks the block header with a flag letting other users know a change is taking place and to wait until the complete change is applied.

Buffer busy waits are common in an I/O-bound Oracle system, as evidenced by any system with read (sequential/scattered) waits in the top-five waits in the Oracle STATSPACK report, like this:

Top 5 Timed Events
                                                          % Total
 Event                         Waits        Time (s)     Ela Time
 --------------------------- ------------ ----------- -----------
 db file sequential read       2,598        7,146           50.57
 db file scattered read       25,519        3,246           25.14
 library cache load lock         673        1,363            8.62
 CPU time                      2,154          934            9.87
 log file parallel write      19,157          837            7.86


This buffer busy wait condition can happen for either of the following reasons:
1. The block is being read into the buffer by another session, so the waiting session must wait for the block read to complete.

2. Another session has the buffer block locked in a mode that is incompatible with the waiting session’s request.
Because buffer busy waits are due to contention between particular blocks, there’s nothing you can do until you know which blocks are in conflict and why the conflicts are occurring. Tuning therefore involves identifying and eliminating the cause of the block contention.

The main way to reduce buffer busy waits is to reduce the total I/O on the system. This can be done by tuning the SQL to access rows with fewer block reads (i.e., by adding indexes). Even if we have a huge db_cache_size, we may still see buffer busy waits, and increasing the buffer size won’t help.

The most common remedies for high buffer busy waits include database writer (DBWR) contention tuning, adding freelists to a table and index, implementing Automatic Segment Storage Management (ASSM, a.k.a bitmap freelists), and, of course, and adding a missing index to reduce buffer touches.

5. Wait event responsible for enqueue wait event contention (high water mark wait event)

The HW enqueue is used to manage the allocation of space beyond the high water mark of a segment. The high water mark of a segment is the boundary between used and unused space in that segment.

This lock is acquired when a segment’s high-water mark is moved, which typically is the case during heavy inserts.

The HW High Water enqueue (a.k.a. enq: HW – contention) occurs when competing processing are inserting into the same table and are trying to increase the high water mark of a table simultaneously.

1. The HW enqueue can sometimes be removed by adding freelists or moving the segment to ASSM.
2. One solution is to spread the target objects across more physical disk spindles.

When a session needs access to a resource, it requests a lock on that resource in a specific mode. Internally, lock and resource structures are used to control access to a resource. Enqueues, as the name suggests, have a First In/First Out queuing mechanism.

Segments have a High Water Mark (HWM) indicating that blocks below that HWM have been formatted. New tables or truncated tables (that is, truncated without a reuse storage clause), have the HWM value set to the segment header block — meaning, there are zero blocks below the HWM. As new rows are inserted or existing rows updated (increasing row length), more blocks are added to the free lists, and the HWM is bumped up to reflect these new blocks. HW enqueues are acquired in Exclusive mode before updating the HWM, and essentially, HW enqueues operate as a serializing mechanism for HWM updates.

To see total number of gets on HW enqueue, the following query can be used. (The ksqstreq column indicates total number of gets; ksqstwat shows the total number of waits.)

SQL> select ksqstreq, ksqstwat from x$ksqst where ksqsttyp='HW';
---------- ----------
    546953         50


From Oracle version 10g and above, x$ksqst is externalized as v$enqueue_statistics.

If the tablespace uniform size is too low, then the number of HW enqueue gets increases sharply.
High HW enqueue contention is also prevalent during upgrades of Oracle applications.

This can also be because of Bug 6376915. Please refer Doc ID 1476233.1 for more details

6. You got a high contention in the library cache. How will you resolve it?

Oracle library cache contention is easy to understand once you know how Oracle processes SQL statements. The library cache is used like the data buffers, a place to store ready-to-run SQL which has already been parsed, semantically checked, optimized and has a load-and-go execution plan in-place.

Library cache contention occurs when Oracle places a “latch” on a SQL statement in order to manage concurrency. There is a dedicated latch-related wait event for the more popular latches that often generate significant contention. For those events the name of the latch appears in the name of the wait event; such as latch: library cache or latch: cache buffers chains. This enables you to quickly figure out if a particular type of latch is responsible for most of the latch-related contention.

The following query can show whether Oracle has contention for common library cache activities:

   event = 'library cache pin'
   state = 'WAITING';


Library cache latch contention is typically caused by NOT using bind variables. It is due to excessive parsing of statements.
If you discover your applications do not use bind variables — you must have this corrected. You’ll never have a good hit ratio if everyone submits “unique” sql. Your shared pool will never be used right and you’ll be using excessive CPU (90% of the time it takes to process “insert into t values ( 1 )” is parsing. If you use “insert into t values ( 😡 )”, and bind the value of 1 — then the next person that runs that insert will benefit from your work and run that much faster.

7. Possible reasons for library cache mutex wait events?

Mutexes are objects that exist within the operating system to provide access to shared memory structures. It is used in a number of areas including the library cache.
They are similar to latches.
Waits for ‘library cache: mutex X’ are similar to a library cache waits in earlier versions. ‘library cache: mutex X’ may be caused by many issues (including application issues, lack of sharing resulting in high version counts etc.) but essentially something is holding the mutex for “too long” such that other session have to wait for the resource. If there is contention on the latches/mutexes that protect the library cache structures this means that there is stress on the parsing system. Parsing of SQL takes longer because it cannot get the resources they need. This delays other operations and generally slows the system.

What causes ‘library cache: mutex X’ wait?

1. Frequent Hard Parses – If the frequency of Hard Parsing is extremely high, then contention can occur on this pin.

2. High Version Counts – When Version counts become excessive, a long chain of versions needs to be examined and this can lead to contention on this event

3. Invalidations – An invalidation is a measure of the number of times a cached cursor is deleted from the cache because it is no longer valid. A cursor is invalidated because something has changed such that the copy of the cursor in memory is not valid any more. For example, regathering the statistics on an object or modifying a table definition is enough to invalidate a cursor for a query that is based on that object. When a cursor is invalidated, any sessions wanting to use the cursor need to wait for a valid version to be loaded. If there is excessive or unnecessary invalidation then significant waits for ‘library cache: mutex X’ can be seen.

4. Reloads – Reload is a count of the number of times a cursor that previously existed in the cache, was searched for, found to not be there (because it had aged out etc) and then had to be re-compiled and re-loaded in to the library cache. High reloads are a bad thing because they indicate that you are doing work that you would not have had to do if your cache was setup appropriately so as not to remove the cursor in the first place. If a cursor is being reloaded then it cannot be grabbed for work by a session and this can lead to waits for ‘library cache: mutex X’.

5. Known Bugs

6. A too-small value for shared_pool_size (memory_target).  

8. A developer comes to you and asks you to tune a SQL. You checked the query and the developer asks you whether joins or sub queries would be better. Also he asks whether union or union all is better. What would be your answer?

Subquery and Union needs to be used.
To know more, we need to know about each in detail.

Joins and subqueries are both used to combine data from different tables into a single result. They share many similarities and differences.
Subqueries can be used to return either a scalar (single) value or a row set; whereas, joins are used to return rows.
Hence, if Subquery is being used, it would result in a better execution plan than Join

The only difference between Union and Union All is that Union extracts the rows that are being specified in the query while Union All extracts all the rows including the duplicates (repeated values) from both the queries.
A UNION statement effectively does a SELECT DISTINCT on the results set. If you know that all the records returned are unique from your union, use UNION ALL instead, it gives faster results, or else go with Union  

9. How do you verify whether a particular db is of enterprise or standard edition

The output from the below queries would show whether the Database is of Enterprise Edition. If no Edition is being shown, then the Database is of Standard Edition :

select * from v$version;

To get the feature wise difference between Oracle Enterprise and Standard Edition, click the LINK

10. How do you check which tablespaces inside the database are specifically kept in Backup mode?

To check whether a datafile is part of a current online tablespace backup, query the V$BACKUP view.

The V$BACKUP view is most useful when the database is open. It is also useful immediately after an instance failure because it shows the backup status of the files at the time of the failure. Use this information to determine whether you have left any tablespaces in backup mode.

V$BACKUP is not useful if the control file currently in use is a restored backup or a new control file created after the media failure occurred. A restored or re-created control file does not contain the information the database needs to populate V$BACKUP accurately. Also, if you have restored a backup of a file, this file’s STATUS in V$BACKUP reflects the backup status of the older version of the file, not the most current version. Thus, this view can contain misleading data about restored files.

Use below query to get all the tablespaces which are ACTIVE mode (means they are under BEGIN BACKUP mode)

SELECT AS "TB_NAME", d.file# as "DF#", AS "DF_NAME", b.status


The following sample output shows that the example and users tablespaces currently have ACTIVE status

TB_NAME        DF#        DF_NAME                               STATUS
-------------  ---------- ------------------------------------  ------
EXAMPLE        7          /oracle/oradata/proddb/example01.dbf  ACTIVE
USERS          8          /oracle/oradata/proddb/users01.dbf    ACTIVE


11. For a particular sql Id you got a bad execution plan, what are the ways to fix it

Using Oracle baselines you can fix the sql plan for a SQLID
SQL plan management is a preventative mechanism that records and evaluates the execution plans of SQL statements over time. This mechanism can build a SQL plan baseline, which is a set of accepted plans for a SQL statement. The accepted plans have been proven to perform well.
The goal of SQL plan baselines is to preserve the performance of corresponding SQL statements, regardless of changes in the database.
SQL plan baselines cannot help in cases where an event has caused irreversible execution plan changes, such as dropping an index.

Below is a basic way to work with Execution Plan stability, using baselines.

1. Use the below queries to see the available execution plans and see which plan was running fine.

	Ex: select * from TABLE(dbms_xplan.display_awr('32qrgv3ndcadr'));


The ones with the Smallest Elapsed Times are the best execution Plans for the SQL.

a. If the HASH Plan is still in the Cursor Cache it can be created as a baseline and instructed to run every time that SQL ID is loaded to the Shared Pool.

b. If the HASH Plan is no longer in the Cursor Cache, then it is still possible to load the HASH Plan to a Sql Tuning Set and create a baseline from the STS and assign it the SQL ID as well.

HASH /SQL plan needed found in the Cursor Cache

Now you know which hash plan hash to be fixed. Now follow the below example. If the needed plan is found in the cursor cache then it is very simple to create a baseline and fixing the plan for the SQL query.

Ex: Determined the Hash Plan: 2720265748 is the best to run against the SQL ID: 32qrgv3ndcadr

1. Create the Baseline:

SQL> !cat create_baseline.sql
var ret number
exec :ret := dbms_spm.load_plans_from_cursor_cache(sql_id=>'&sql_id', plan_hash_value=>&plan_hash_value);
 SQL> @create_baseline
Enter value for sql_id: 32qrgv3ndcadr
Enter value for plan_hash_value: 2720265748


2. Verify the baseline got created or not

SQL> select sql_handle, plan_name, enabled, accepted, fixed from dba_sql_plan_baselines;
SQL_HANDLE                      PLAN_NAME                                              ENA    ACC     FIX
------------------------------             ------------------------------                                     ---         ---         ---
SQL_4cb85e16fe8c1f30           SQL_PLAN_4tw5b0fdjw7wn6gh6a777       YES     YES     NO


To see all the details, this will create a file with all the baseline info.:
spool baseline_plan.txt
select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle=>’SQL_4cb85e16fe8c1f30′, format=>’basic’));
verify the spool file to confirm the SQL ID and the HASH associated with it


var v_num number;
exec :v_num:=dbms_spm.ALTER_SQL_PLAN_BASELINE (sql_handle =>'SQL_4cb85e16fe8c1f30',plan_name => 'SQL_PLAN_4tw5b0fdjw7wn6gh6a777', attribute_name=> 'FIXED',  attribute_value  => 'YES');


· enabled (YES/NO) : If YES, the plan is available for the optimizer if it is also marked as accepted.
· fixed (YES/NO) : If YES, the SQL plan baseline will not evolve over time. Fixed plans are used in preference to non-fixed plans.
· autopurge (YES/NO) : If YES, the SQL plan baseline is purged automatically if it is not used for a period of time.
· plan_name : Used to amend the SQL plan name, up to a maximum of 30 character.
· description : Used to amend the SQL plan description, up to a maximum of 30 character.

12. There’s an unnamed file created in your standby db which is creating a lag. What’s the reason for this, and how will you resolve the issue?

There are many reasons for a file being created as UNNAMED or MISSING in the standby database, including insufficient disk space on standby site (or) improper parameter settings related to file management.

STANDBY_FILE_MANAGEMENT enables or disables automatic standby file management. When automatic standby file management is enabled, operating system file additions and deletions on the primary database are replicated on the standby database.

In a Data Guard configuration it is very common to these the ‘UNNAMED File Issue/Error’ on Standby Database when you add new datafile on Primary Database and there is no space available on the standby database server or improper parameter settings related to standby file management. This will result in the following Oracle error messages ORA-01111, ORA-01110, ORA-01157 and cause MRP process to crash resulting in Standby database ‘out of sync’ with primary.

Steps to resolve :

a. Check if your MRP is running. If its running, stop the MRP process

b. Identify the datafile name on primary database

Check the file id in standby

SQL>  select * from v$recover_file where error like ‘%UNNAMED%’;

       ———- ———————————————————————-
        14 /u01/app/oracle/product/


Identify the datafile name on primary database

SQL>  select file#,name from v$datafile where file#=14;

 —— ———————————————————————-
 14  /u02/oradata/datafile/infra/users04.dbf

c. Temporary change setting for STANDBY_FILE_MANAGEMENT to MANUAL

SQL> alter system set standby_file_management=MANUAL scope=both;

System altered.

d. Create a new datafile with same name as of Primary ( Name obtained above)

SQL> alter database create datafile '/u01/app/oracle/product/' as '/u02/oradata/datafile/infrastdy/users04.dbf';

Database altered.

SQL> select file#,name from v$datafile where file#=14;

———- ———————————————————————-
        14 /u02/oradata/datafile/infrastdy/users04.dbf



SQL> alter system set standby_file_management=AUTO scope=both;

System altered.


f. Start MRP process and monitor lag gap.


Database altered.


13. Where do we need to set the standby file management auto parameter?

The Standby File Management parameter needs can be set on the fly from Manual to Auto or vice versa. This is dynamic parameter if your database is running in SPFILE

14. What are the possible reasons for primary and standby db not in sync?

There are many reason for your Primary and Standby Database not in SYNC. Some of them are noted below :

a. Your Standby Archive logs destination is full and is unable to accommodate more archive logs

b. Password File is not same

c. Network glitches

15. If you need to register 100 archive logs at the same time, how will you do it?

From RMAN we can catalog the path where the archive logs are placed to register the archive logs

rman> catalog start with 'u01/app/oracle/arch';


16. What is the header status format of a disk before adding a disk in ASM


  • CANDIDATE – Disk is not part of a disk group and may be added to a disk group with the ALTER DISKGROUP statement
  • PROVISIONED – Disk is not part of a disk group and may be added to a disk group with the ALTER DISKGROUP statement. The PROVISIONED header status is different from the CANDIDATE header status in that PROVISIONED implies that an additional platform-specific action has been taken by an administrator to make the disk available for Automatic Storage Management.


17. How to check which node is the master node in Oracle RAC?

There are many ways yo find :

a. select MASTER_NODE from v$ges_resource;

bocrconfig -showbackup
The node that store OCR backups is the master node.

c. Information about who is the master can be found in the cssd logfile

[oracle @ jack]: /u1/app/../cssd >grep -i “master node” ocssd.log | tail -1
[CSSD]CLSS-3001: local node number 1, master node number 1

18. If gpnp profile is not present in node 1, how will db start ?

GPNP Profile stores information about where to find ASM parameter file, what disks to discover, which networks to use and so on. So it is quite important and it is required to start the cluster stack at the very beginning.

A best profile stored inside OLR which enables cluster node to start even when the “profile.xml” itself is missing. 

For more details please click the LINK

19. What is Dynamic reconfiguration and dynamic remastering?

Dynamic Reconfiguration: Dynamic Reconfiguration is the process of reconfiguring resources of failed nodes among surviving nodes. Dynamic Reconfiguration comes into play when an instance crashes unexpectedly. Below are some of the high level steps due to instance crash when node failure is detected by CSSD daemon.

  1. The enqueue part of GRD is reconfigured among surviving nodes by global enqueue service. This operation occurs relatively faster.
  2. The cache part of GRD is reconfigured and SMON reads the redo log of the failed instance to identify the data blocks that it needs to recover.  Since blocks are shipped over the interconnect, SMON may have to merge the redo logs of the failed instance to overcome holes in the redo.
  3. SMON issues GRD requests to obtain all the data blocks that need recovery. Redo threads of failed instances are merged and applied to the most current versions or past image of the blocks from other instance cache. If past images are not available, then redo logs will be applied after reading the blocks from disk. All blocks become accessible now.
  4. Undo blocks are applied to the database in back ground. You can tune this rollback recovery by setting parameter fast_start_rollback_recovery parameter appropriately.

Dynamic Remastering: Before starting discussion on dynamic remastering, you need to be familiar with Global Resource Directory.  GRD basically contains useful information such as data block address, mode (null, shared or exclusive), role(local or global), SCN, past image and current image. Each instance maintains the portion of GRD in its own SGA. In other words, any node which is master of particular object or a block maintains the GRD portion for those blocks in its SGA. GCS in conjunction with GES is actually responsible for updating GRD for a resource in instance’s SGA. The objective of dynamic remastering is to master buffer cache resource on the instance where it’s mostly accessed.

Mastering of block is decided based on the demand for a block, so basically when demand for particular object or block is high on one node, that instance will become master for that object or block. And this is where dynamic mastering comes into play. For Example, instance A is currently master of object (Table EMP) because there were huge number of read/write requests made by instance A on table EMP. After sometime, other instance B started making such requests very frequently for EMP such that demand for EMP table increased on instance B. In such situations, instance A has to either ship the blocks to instance B or forward the request to other instance which is holding most current copy of the requested blocks. To avoid unnecessary interactions and network traffic, GCS remasters the blocks to instance B.

20. How scan name determines which node is least loaded 

For clients connecting using Oracle SQL*Net, three IP addresses are received by the client by resolving the SCAN name through DNS. The client then goes through the list that it receives from the DNS and tries connecting through one of the IP addresses in the list. If the client receives an error, then it tries connecting to the other addresses before returning an error to the user or application. This is similar to how client connection failover works in earlier Database releases, when an address list is provided in the client connection string.

When a SCAN Listener receives a connection request, the SCAN Listener checks for the least loaded instance providing the requested service. It then re-directs the connection request to the local listener on the node where the least loaded instance is running. Subsequently, the client is given the address of the local listener. The local listener then finally creates the connection to the Database instance

21. Using DataPump, how to export in higher version(11g) and import into lower version (10g)?

You need to mention the 10g version in which you are planning to import in the expdp command as below :


expdp username/password directory=test_dir dumpfile=testver.dmp tables=testversion
 version=10.2 reuse_dumpfiles=yes



22. Do we require to create user while at import?

If you are taking a full expdp, then theres no need to create user. But if you are taking specific schemas export, then you need to create the user before importing in the target DB

23. My export job is failed, where do you get what is the reason?

You can get the reason in the below logs:

a. Export log

b. Database alert log

24. How to dismount ASM disk group from particular instance in RAC?

alter diskgroup <diskgroup_name> dismount;



25. What Happen if you miss “+” sign while adding datafile in ASM disk group?

If you miss the ‘+’ sign, there will be several scenarios :

a. db_create_file_dest is set and points to ASM (i.e: +DATA): The database will take ‘DATA’ as an alias and will create ASM file on the destination associated with the ‘DATA’ alias that will be located at the ASM +DATA root directory.

b. db_create_file_dest is set and points to filesystem: It should be quite obvious that you’ll have a new ‘DATA’ file in the location pointed at the parameter.

c. db_create_file_dest is not set (worst scenario): The database will default to $ORACLE_HOME/dbs and create a datafile there with the name ‘DATA’.

26. How do you check the RAW Disk associated with your ASM Disk?

$ oracleasm querydisk -d ASM1
Disk “ASM1” is a valid ASM disk on device [252, 2]

$ ls -l /dev/* | grep “252, *2” | awk ‘{ print $10 }’


27. Somebody has removed alert log file and we want to see yesterday logs how can we see?

You can pull alert log info from x$dbgalertext view at database level.
This view uses .xml files stored dump location to get alert log info.
As long as you have xml files, you can still get alert log info

SQL> desc x$dbgalertext

select message_text
from   v$diag_alert_ext 
where  message_level = 1;

select message_type, message_level, message_text
from   v$diag_alert_ext
where  message_type in (2, 3);