Wednesday, March 21, 2007

TAF - Transparent Application Fail Over on RAC 9i

One benefit of Oracle RAC is TAF - transparent aplication failover,
which could guarantee continuing its session and query (only select that can be handled, DML, DDL still in my dream, perhaps oracle 11g could have it) when
instance failure.

Here I will talk about simple network naming method by local naming (tnsnames)
to accomodate TAF functioning.

in this scenario I am using the environment:
1. Oracle 9.2.0.4 cluster manager as clusterware or CRS
2. Oracle 9.2.0.4 as oracle RAC database
3. Linux CentOS Enterprise 2.1
4. VMWare (we do testing on a limited resources)

The installation of the above environment is quite easy and comfortable
for almost DBA, who have the knowledge about the concept of cluster as well as oracle database itself.

Because TAF provides failover for query (select statement) and the tool which I have tested is SQL*Plus, another drivers also have TAF ready which mainly OCI base.

Real Application Clusters Failover, involve several works inside, there are:
• Detect failure by monitoring the heartbeat
• Reorganize instance membership
• Perform instance recovery
• Reestablish failed client connections

TAF on Oracle RAC Can be used with the "available" nodes or with primary/secondary instance configurations (application partitioning against RAC database)


TAF Failover Mode Options

• Must add failover options manually to TNS configuration files
• They are part of the CONNECT_DATA section of a connect descriptor
• Failover options include
– TYPE – SESSION: Failover to an alternate session only
– SELECT: Failover and continue with any ongoing query
– NONE: Prevent failover

– METHOD • BASIC: Establishes no contact with the failover
instance prior to failure
• PRECONNECT: Creates mirror connections on the
standby instance for the connections on the primary
instance

– BACKUP: Identify an alternate net service name
Specify a different net service name for backup connections.
• Use with the PRECONNECT method to identify the
service to initiate your failover preconnections

– RETRIES: Limit the number of times a reconnection will be attempted
Specify the number of times to attempt to connect after a failover.

– DELAY: Specify how long to wait between reconnection attempts
Specify the amount of time in seconds to wait between connect attempts.


herewith a snippet of initial parameter to comply with TAF:

*.cluster_database_instances=3
*.cluster_database=TRUE
*.db_domain=''
*.db_name='racdb'
racdb2.local_listener='LISTENER_RACDB2'
racdb1.local_listener='LISTENER_RACDB1'
racdb3.local_listener='LISTENER_RACDB3'
*.remote_listener='LISTENERS_RACDB'

those LISTENER_* are entries in the tnsnames.ora file which point to each instance as well as an "FAILOVER" ready in the remote_listener:

here tnsnames.ora file to accomodate the above initial parameter complying TAF:

LISTENER_RACDB3 =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.91)(PORT = 1521))

#see racdb3.local_listener in the initial parameter and 10.1.3.91 is
the public IP of the third Node

LISTENER_RACDB2 =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.81)(PORT = 1521))

#see racdb2.local_listener in the initial parameter and 10.1.3.81 is
the public IP of the second Node

LISTENER_RACDB1 =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.71)(PORT = 1521))

#see racdb1.local_listener in the initial parameter and 10.1.3.71 is
the public IP of the first Node

RACDB =
(DESCRIPTION=
(ADDRESS_LIST=
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.3.71)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.3.81)(PORT=1521))
(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.3.91)(PORT=1521))
)
(CONNECT_DATA=
(SERVICE_NAME=racdb)
(FAILOVER_MODE=
(TYPE=SELECT)
(METHOD=BASIC)
(RETRIES=500)
(DELAY=5)
)
)
)

#the racdb entry is aimed for client connection, see here we use FAILOVER option and service_name derive from service_name in initial parameter or it could derived from db_name.db_domain.
# this entry also can handle load balancing and client connection time failover as well.

LISTENERS_RACDB =
(DESCRIPTION =
(ADDRESS_LIST =
(LOAD_BALANCE=ON)
(FAILOVER=ON)
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.71)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.81)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.91)(PORT = 1521))
(LOAD_BALANCE = yes)
)
(CONNECT_DATA =
(SERVICE_NAME = racdb)
(FAILOVER_MODE=
(TYPE=SELECT)
(METHOD=BASIC)
(RETRIES=500)
(DELAY=5)
)
)
)

# this entry also seem like for the client connection, which has FAILOVER option to comply remote_listener in the initial parameter.



this simple listener.ora file (here for node1)

LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.3.71)(PORT = 1521))
)
)
)

SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(SID_NAME = PLSExtProc)
(ORACLE_HOME = /u01/app/oracle/product/9.2.0)
(PROGRAM = extproc)
)
(SID_DESC =
(ORACLE_HOME = /u01/app/oracle/product/9.2.0)
(SID_NAME = racdb1)
)
)


Now its time to investigate TAF

connect from any client by SQL*Plus :

sqlplus /nolog
SQL> conn system/oracle@racdb
SQL> SELECT machine, failover_type, failover_method, failed_over, COUNT(*)
FROM v$session
GROUP BY machine, failover_type, failover_method, failed_over;

the row should indicated failover_type, failover_method match with the entries in the tnsnames.ora file.

check which instance you are currently in:
SQL> select instance_name from v$instance;

now grab all rows in dba_objects:
SQL> select * from dba_objects;

while the above query running, on the node which instance your connection run in, try shutdown abort to simulate instance crash failure, by using sqlplus or srvctl.

sqlplus '/as sysdba'
SQL> shutdown abort;

it should the above query select * from dba_objects, hold for a while and continue.