DataGuard 切换流程梳理(一)-Broker switchover

近期手头有 一项重要工作便是重新梳理Dataguard切换流程与注意事项,以便团队在数据库应急时更加熟练从容的进行容灾切换,主要分为四部分:

1.Broker Switcher

2.Broker Failover

3.命令行 switcher over

4.命令行Failover

5.命令行Failover怎样保证数据不丢失

一.操作前准备

操作前首先要确认主备库的ip、tnsnames、主备库的DB_UNIQUE_NAME、主备库目前DG配置状态,日志同步状态,主备库连接Process数,应用SERVICE处理、SERVICE在监听上注册情况等。检查点比较多,极有可能存在应急操作前没有足够的时间去确保各项检查点的正确性,所以这里罗列出来是为了提醒用途,DG切换成功与否需要以上检查内容的保证。

1.1 确认主备库的tnsnames与ip

主库
IP:181.208
机器名:liangce-ora
DB_UNQIUE_NAME:ORALIANGCE

备库
181.209
机器名:liangcestd-ora
DB_UNIQUE_NAME:ORALIANGCEstd

登录主备库确认$ORACLE_HOME/network/admin目录确认tnsnames.ora文件中的内容,确认ip对应的连接串是否如主库、备库配置如预期,并且tns别名中的配置信息进行DB_UNIQUE_NAME确认

1.2 确认主备库的process配置

为什么要确认此项配置?因为会存在主库多节点RAC+DG单机的情况,也就是说是否存在备库的单机process连接数足够承载RAC多节点的应用连接?

1.3 应用SERVICE处理

我们目前应用SERVICE只注册在主库上,并且应用目前使用的连接串是有DG配置的,如果在切换后主库不处理应用SERVICE,会导致ORA-O1033错误,详情请见早起文章:《中秋节快乐–中秋节前DataGuard切换演练产生问题》

备库:
SQL> show parameter service;

NAME                      TYPE        VALUE
---------------------- ----------- ------------------------------
service_names             string      ORALIANGCE,dbaas

SQL> alter system set service_names='ORALIANGCE,dbaas,Albert';

System altered.

SQL> show parameter service;

NAME                       TYPE        VALUE
----------------------- ----------- ------------------------------
service_names               string      ORALIANGCE,dbaas,Albert



主库:
SQL> show parameter service; 

NAME                        TYPE        VALUE
------------------------- ----------- ------------------------------
service_names               string      ORALIANGCE,dbaas,Albert

SQL> alter system set service_names='ORALIANGCE,dbaas';

System altered.

SQL> show parameter service;

NAME                          TYPE        VALUE
--------------------------- ----------- ------------------------------
service_names                  string      ORALIANGCE,dbaas

1.4 主备库配置,同步传输确认

主库上执行:

select thread#,
       DB_UNIQUE_NAME,
       DEST_NAME,
       TARGET,
       DATABASE_MODE,
       error,
       RECOVERY_MODE,
       GAP_STATUS,
       APPLIED_SEQ#,
       current_scn,
       applied_scn,
       to_char((scn_to_timestamp(current_scn)),'yyyymmdd hh24:mi:ss') current_scn_time,
       to_char((scn_to_timestamp(applied_scn)),'yyyymmdd hh24:mi:ss') applied_scn_time,
       scn_to_timestamp(current_scn)- scn_to_timestamp(current_scn) as delay_time
  from (SELECT al.thread#,
               ads.dest_id,
               ads.DEST_NAME,
               ads.TYPE || ' ' || ad.TARGET TARGET,
               ADS.DATABASE_MODE,
               ads.STATUS,
               ads.error,
               ads.RECOVERY_MODE,
               ads.DB_UNIQUE_NAME,
               ads.DESTINATION,
               ads.GAP_STATUS,
               (SELECT sequence#
                  FROM v$log na
                 WHERE na.thread# = al.thread#
                   and status = 'CURRENT') Current_Seq#,
               Last_Archived,
               al.APPLIED_SEQ#,
               ad.applied_scn,
               current_scn
          FROM (SELECT dest_id,
                       thread#,
                       d.current_scn,
                       MAX(sequence#) Last_Archived,
                       max(CASE
                             WHEN APPLIED = 'YES' THEN
                              sequence#
                           end) APPLIED_SEQ#
                  FROM v$archived_log V, v$database d
                 WHERE V.resetlogs_change# = d.RESETLOGS_CHANGE#
                 group by dest_id, thread#, d.current_scn) al,
               v$archive_dest_status ads,
               v$archive_dest AD
         WHERE al.dest_id(+) = ads.dest_id 
           AND ads.STATUS != 'INACTIVE'
           AND AD.DEST_ID = ADS.DEST_ID and ad.APPLIED_SCN <> 0
         ORDER BY al.thread#, ads.dest_id)
在主库上执行即可,可以确认主备库的一些基本配置与Current SCN,Applied SCN,Delay_time等

主库上执行:

[ora11g@~]$ dgmgrl
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys
Password:
Connected.
DGMGRL> show configuration 

Configuration - DG_Config

  Protection Mode: MaxPerformance
  Databases:
    ORALIANGCE    - Primary database
    ORALIANGCEstd - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS


2、查看主备库数据传输是否异常
DGMGRL> 
DGMGRL> show database ORALIANGCEstd  

Database - ORALIANGCEstd

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds (computed 1 second ago)
  Apply Lag:       0 seconds (computed 1 second ago)
  Apply Rate:      13.00 KByte/s
  Real Time Query: ON
  Instance(s):
    ORALIANGCE

Database Status:
SUCCESS

备库中的Transport Lag,Apply Lag是否正常

1.5 broker switcher切换

[ora11g@]$ dgmgrl
DGMGRL for Linux: Version 11.2.0.4.0 - 64bit Production

Copyright (c) 2000, 2009, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys
Password:
Connected.
DGMGRL> switchover to ORALIANGCEstd
Performing switchover NOW, please wait...
Operation requires a connection to instance "ORALIANGCE" on database "ORALIANGCEstd"
Connecting to instance "ORALIANGCE"...
Connected.
New primary database "ORALIANGCEstd" is opening...
Operation requires startup of instance "ORALIANGCE" on database "ORALIANGCE"
Starting instance "ORALIANGCE"...
ORACLE instance started.
Database mounted.
Database opened.
Switchover succeeded, new primary is "ORALIANGCEstd"
DGMGRL>

Broker会帮助你封装主备库实例启停,DG角色切换等命令

1.6 切换后检查

1.如RAC-RAC间切换,需要检查RAC各个节点是否启动

2.主备库数据字典检查
新主库:
SQL> select name,db_unique_name,database_role,open_mode from v$database;

NAME      DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE
--------- ------------------------------ ---------------- --------------------
ORALIANGCE  ORALIANGCE                       PHYSICAL STANDBY READ ONLY WITH APPLY 

新备库:
SQL> select name,db_unique_name,database_role,open_mode from v$database;

SQL> set linesize 200 
SQL> select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from gv$managed_standby where process in ('LNS','RFS','MRP0') and  THREAD# <> 0;

PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
--------- ------------ -------- ---------- ---------- ---------- ------------- ------------
RFS       IDLE         LGWR              1       6864        469             0            0
MRP0      APPLYING_LOG N/A               1       6864        469             5            5

3.alert日志检查:需要仔细检查主备库的alert日志中有无其他ORA报错,如使用HugePage,需要确认HugePage是否正确。

4.应用SERVICE监听注册情况