前天晚上,同事发来微信求助,监控到一套2节点数据库RAC 数据库主备库同步状态异常,并且同事经过分析发现造成此问题的是因为Dataguard MRP0进程存在,但是停止在某一个seqence#上,应用的sequnce#与block不进行恢复。
随即登录备库数据库查看同步状态,发现MRP0进程停滞在sequnce:197788与block:186030上,实际上并没有进行“APPLYING_LOG”。
SQL> set linesize 200
select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from gv$managed_standby where process in ('LNS','RFS','MRP0') and THREAD# <> 0
SQL> 2 ;
PROCESS STATUS CLIENT_P THREAD# SEQUENCE# BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
--------- ------------ -------- ---------- ---------- ---------- ------------- ------------
RFS IDLE LGWR 1 199217 123144 0 0
RFS IDLE LGWR 2 142409 77956 0 0
MRP0 APPLYING_LOG N/A 1 197788 186030 9 9
此日志197788归档日志文件存在与备库,并且同事反馈尝试alter database register logfile、主库RMAN重新恢复此197788归档日志,均无效。
查询v$archived_log确认197788这个归档文件为20210109 15:00左右主库产生的,分别查看一下主库/备库的alert日志,看看那时数据库有无异常情况发生:
节点1,传输至备时出现异常:
Thread 1 advanced to log sequence 197788 (LGWR switch)
Current log# 5 seq# 197788 mem# 0: +DG_REDO/Travelskydba/onlinelog/group_5.287.1003922183
Current log# 5 seq# 197788 mem# 1: +DG_REDO/Travelskydba/onlinelog/group_5.286.1003922183
Sat Jan 09 15:32:25 2021
Archived Log entry 275332 added for thread 1 sequence 197787 ID 0x9c755e67 dest 1:
Sat Jan 09 15:32:25 2021
ARCj: Standby redo logfile selected for thread 1 sequence 197787 for destination LOG_ARCHIVE_DEST_2
Sat Jan 09 15:33:14 2021
LNS: Standby redo logfile selected for thread 1 sequence 197788 for destination LOG_ARCHIVE_DEST_2
Sat Jan 09 15:43:17 2021
Thread 1 advanced to log sequence 197789 (LGWR switch)
Current log# 1 seq# 197789 mem# 0: +DG_REDO/Travelskydba/onlinelog/group_1.295.1003922179
Current log# 1 seq# 197789 mem# 1: +DG_REDO/Travelskydba/onlinelog/group_1.294.1003922179
Sat Jan 09 15:43:17 2021
Archived Log entry 275335 added for thread 1 sequence 197788 ID 0x9c755e67 dest 1:
Sat Jan 09 15:47:06 2021
WARN: ARC1: Terminating pid 4047 hung on an I/O operation
Killing 1 processes with pids 4047 (Process by index) in order to remove hung processes. Requested by OS process 31580 on instance 1
ARC1: Error 16198 due to hung I/O operation to LOG_ARCHIVE_DEST_2
ARCo: Standby redo logfile selected for thread 1 sequence 197788 for destination LOG_ARCHIVE_DEST_2
节点2,传输至备库时出现异常:
Sat Jan 09 15:47:15 2021
Archived Log entry 275337 added for thread 2 sequence 141662 ID 0x9c755e67 dest 1:
Sat Jan 09 15:47:35 2021
WARN: ARC2: Terminating pid 27590 hung on an I/O operation
Killing 1 processes with pids 27590 (Process by index) in order to remove hung processes. Requested by OS process 27465 on instance 2
ARC2: Error 16198 due to hung I/O operation to LOG_ARCHIVE_DEST_2
Sat Jan 09 15:47:37 2021
NSA2 started with pid=29, OS id=2625
Sat Jan 09 15:47:42 2021
备库节点:
Archived Log entry 77427 added for thread 1 sequence 197786 ID 0x9c755e67 dest 1:
Media Recovery Log /oracle/archive_Travelskydba/Travelskydba/2_141661_853598219.arc
Media Recovery Waiting for thread 2 sequence 141662 (in transit)
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937
Media Recovery Waiting for thread 1 sequence 197788 (in transit)
Recovery of Online Redo Log: Thread 1 Group 7 Seq 197788 Reading mem 0
Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_7.282.1003923169
Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_7.280.1003923169
备库节点在1月9日 16:24:31时已经应用完成197788日志
Sat Jan 09 16:24:31 2021
Media Recovery Log /oracle/archive_Travelskydba/Travelskydba/1_197788_853598219.arc
197788这个日志已经应用完毕,并且197789日志也完好的存在与备库节点 ,那么很有可能目前备库MRP0进程不工作的原因是在等其他日志应用完毕,。备库应用日志需要连续,那么是否有可能其他节点或本节点的其他日志?
我们可以利用主库如下sql查询进行确认:
SQL> SELECT FIRST_CHANGE#,
THREAD#,
DEST_ID,
STATUS,
FAL,
APPLIED,
DELETED,
to_char(FIRST_TIME,'yyyymmdd hh24:mi:ss') FIRST_TIME,
to_char(COMPLETION_TIME,'yyyymmdd hh24:mi:ss') COMPLETION_TIME,
sequence#
FROM v$archived_log
WHERE to_char(COMPLETION_TIME,'yyyymmdd hh24:mi:ss') BETWEEN '20210109 15:00:17' AND '20210109 16:00:17' order by 1,5 2 3 4 5 6 7 8 9 10 11 12 ;
FIRST_CHANGE# THREAD# DEST_ID S FAL APPLIED DEL FIRST_TIME COMPLETION_TIME SEQUENCE#
---------------- ---------------- ---------------- - --- --------- --- ----------------- ----------------- ----------------
9555246783243 2 2 A NO YES NO 20210109 14:32:30 20210109 15:04:31 141659
9555246783243 2 1 D NO NO YES 20210109 14:32:30 20210109 15:04:32 141659
9555247407959 1 1 D NO NO YES 20210109 14:54:15 20210109 15:04:31 197783
9555247407959 1 2 A NO YES NO 20210109 14:54:15 20210109 15:04:31 197783
9555247700807 1 2 A NO YES NO 20210109 15:04:30 20210109 15:15:10 197784
9555247700807 1 1 D NO NO YES 20210109 15:04:30 20210109 15:15:10 197784
9555247701783 2 2 A NO YES NO 20210109 15:04:31 20210109 15:32:20 141660
9555247701783 2 1 D NO NO YES 20210109 15:04:31 20210109 15:32:21 141660
9555248017764 1 1 D NO NO YES 20210109 15:15:10 20210109 15:25:40 197785
9555248017764 1 2 A NO YES NO 20210109 15:15:10 20210109 15:25:40 197785
9555248313954 1 2 A NO YES NO 20210109 15:25:40 20210109 15:33:14 197786
9555248313954 1 1 D NO NO YES 20210109 15:25:40 20210109 15:32:19 197786
9555248508993 1 1 D NO NO YES 20210109 15:32:19 20210109 15:32:25 197787
9555248508993 1 2 A YES YES NO 20210109 15:32:19 20210109 15:32:26 197787
9555248509181 2 1 D NO NO YES 20210109 15:32:20 20210109 15:32:23 141661
9555248509181 2 2 A NO YES NO 20210109 15:32:20 20210109 15:32:23 141661
9555248510236 2 1 D NO NO YES 20210109 15:32:23 20210109 15:47:15 141662
9555248510774 1 1 D NO NO YES 20210109 15:32:25 20210109 15:43:17 197788
9555248831803 1 1 D NO NO YES 20210109 15:43:17 20210109 15:47:14 197789
9555248831803 1 2 A YES NO NO 20210109 15:43:17 20210109 15:52:30 197789
9555248943929 1 1 D NO NO YES 20210109 15:47:14 20210109 15:52:14 197790
9555248944032 2 1 D NO NO YES 20210109 15:47:15 20210109 15:47:42 141663
9555248944032 2 2 A YES NO NO 20210109 15:47:15 20210109 15:53:18 141663
9555248950448 2 1 D NO NO YES 20210109 15:47:42 20210109 15:53:18 141664
9555248950448 2 2 A YES NO NO 20210109 15:47:42 20210109 15:55:13 141664
25 rows selected.
上述查询确认后,每个sequence应当归档到dest_id=1(本地),dest_id=2(备库),但是发现thread 2的141662,未显示归档到DEST_ID=2,并且下一个日志便是197788
Media Recovery Log /oracle/archive_Travelskydba/Travelskydba/2_141661_853598219.arc
Media Recovery Waiting for thread 2 sequence 141662 (in transit)
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937
Media Recovery Waiting for thread 1 sequence 197788 (in transit)
Recovery of Online Redo Log: Thread 1 Group 7 Seq 197788 Reading mem 0
Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_7.282.1003923169
Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_7.280.1003923169
Sat Jan 09 15:49:02 2021
RFS[31]: Possible network disconnect with primary database
后续,备库一值在读取141662这个日志文件,但
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937
Mon Jan 18 20:42:42 2021
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 8 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937
此时,主库141662这个归档认为已经传输到备库并且已经删除,但备库实际并未接收到。Threa#1 的197788无法应用的根因为Thread# 2 归档141662并未应用。
让同事将141662归档从主库归档备份中恢复出来 ,拷贝至备库注册或等待备库(fal_client)从主库(fal_server)中获取。同事恢复此归档文件后,备库MRP0进程立即开始工作,进行正常的日志应用,同时主库的v$archived_log中的日志记录变成了复合预期的状态:
SQL> SELECT FIRST_CHANGE#,
THREAD#,
DEST_ID,
STATUS,
FAL,
APPLIED,
DELETED,
to_char(FIRST_TIME,'yyyymmdd hh24:mi:ss') FIRST_TIME,
to_char(COMPLETION_TIME,'yyyymmdd hh24:mi:ss') COMPLETION_TIME,
sequence#
FROM v$archived_log
WHERE sequence#=141662 2 3 4 5 6 7 8 9 10 11 12 ;
FIRST_CHANGE# THREAD# DEST_ID S FAL APPLIED DEL FIRST_TIME COMPLETION_TIME SEQUENCE#
---------------- ---------------- ---------------- - --- --------- --- ----------------- ----------------- ----------------
9555248510236 2 1 D NO NO YES 20210109 15:32:23 20210109 15:47:15 141662
9555248510236 2 1 A NO NO NO 20210109 15:32:23 20210118 21:39:53 141662
9555248510236 2 2 A YES YES NO 20210109 15:32:23 20210118 21:40:33 141662