MRP0进程Hung?

前天晚上,同事发来微信求助,监控到一套2节点数据库RAC 数据库主备库同步状态异常,并且同事经过分析发现造成此问题的是因为Dataguard MRP0进程存在,但是停止在某一个seqence#上,应用的sequnce#与block不进行恢复。

随即登录备库数据库查看同步状态,发现MRP0进程停滞在sequnce:197788与block:186030上,实际上并没有进行“APPLYING_LOG”。

SQL> set linesize 200
select process,status,client_process,thread#,sequence#,block#,active_agents,known_agents from gv$managed_standby where process in ('LNS','RFS','MRP0') and  THREAD# <> 0
SQL>   2  ;

PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#     BLOCK# ACTIVE_AGENTS KNOWN_AGENTS
--------- ------------ -------- ---------- ---------- ---------- ------------- ------------
RFS       IDLE         LGWR              1     199217     123144             0            0
RFS       IDLE         LGWR              2     142409      77956             0            0
MRP0      APPLYING_LOG N/A               1     197788     186030             9            9

此日志197788归档日志文件存在与备库,并且同事反馈尝试alter database register logfile、主库RMAN重新恢复此197788归档日志,均无效。

查询v$archived_log确认197788这个归档文件为20210109 15:00左右主库产生的,分别查看一下主库/备库的alert日志,看看那时数据库有无异常情况发生:

节点1,传输至备时出现异常:
Thread 1 advanced to log sequence 197788 (LGWR switch)
  Current log# 5 seq# 197788 mem# 0: +DG_REDO/Travelskydba/onlinelog/group_5.287.1003922183
  Current log# 5 seq# 197788 mem# 1: +DG_REDO/Travelskydba/onlinelog/group_5.286.1003922183
Sat Jan 09 15:32:25 2021
Archived Log entry 275332 added for thread 1 sequence 197787 ID 0x9c755e67 dest 1:
Sat Jan 09 15:32:25 2021
ARCj: Standby redo logfile selected for thread 1 sequence 197787 for destination LOG_ARCHIVE_DEST_2
Sat Jan 09 15:33:14 2021
LNS: Standby redo logfile selected for thread 1 sequence 197788 for destination LOG_ARCHIVE_DEST_2
Sat Jan 09 15:43:17 2021
Thread 1 advanced to log sequence 197789 (LGWR switch)
  Current log# 1 seq# 197789 mem# 0: +DG_REDO/Travelskydba/onlinelog/group_1.295.1003922179
  Current log# 1 seq# 197789 mem# 1: +DG_REDO/Travelskydba/onlinelog/group_1.294.1003922179
Sat Jan 09 15:43:17 2021
Archived Log entry 275335 added for thread 1 sequence 197788 ID 0x9c755e67 dest 1:
Sat Jan 09 15:47:06 2021
WARN: ARC1: Terminating pid 4047 hung on an I/O operation
Killing 1 processes with pids 4047 (Process by index) in order to remove hung processes. Requested by OS process 31580 on instance 1
ARC1: Error 16198 due to hung I/O operation to LOG_ARCHIVE_DEST_2
ARCo: Standby redo logfile selected for thread 1 sequence 197788 for destination LOG_ARCHIVE_DEST_2

节点2,传输至备库时出现异常:
Sat Jan 09 15:47:15 2021
Archived Log entry 275337 added for thread 2 sequence 141662 ID 0x9c755e67 dest 1:
Sat Jan 09 15:47:35 2021
WARN: ARC2: Terminating pid 27590 hung on an I/O operation
Killing 1 processes with pids 27590 (Process by index) in order to remove hung processes. Requested by OS process 27465 on instance 2
ARC2: Error 16198 due to hung I/O operation to LOG_ARCHIVE_DEST_2
Sat Jan 09 15:47:37 2021
NSA2 started with pid=29, OS id=2625
Sat Jan 09 15:47:42 2021


备库节点:
Archived Log entry 77427 added for thread 1 sequence 197786 ID 0x9c755e67 dest 1:
Media Recovery Log /oracle/archive_Travelskydba/Travelskydba/2_141661_853598219.arc
Media Recovery Waiting for thread 2 sequence 141662 (in transit)
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
  Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
  Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937
Media Recovery Waiting for thread 1 sequence 197788 (in transit)
Recovery of Online Redo Log: Thread 1 Group 7 Seq 197788 Reading mem 0
  Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_7.282.1003923169
  Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_7.280.1003923169

备库节点在1月9日 16:24:31时已经应用完成197788日志
Sat Jan 09 16:24:31 2021
Media Recovery Log /oracle/archive_Travelskydba/Travelskydba/1_197788_853598219.arc

197788这个日志已经应用完毕,并且197789日志也完好的存在与备库节点 ,那么很有可能目前备库MRP0进程不工作的原因是在等其他日志应用完毕,。备库应用日志需要连续,那么是否有可能其他节点或本节点的其他日志?

我们可以利用主库如下sql查询进行确认:

SQL> SELECT FIRST_CHANGE#,
       THREAD#,
       DEST_ID,
       STATUS,
       FAL,
       APPLIED,
       DELETED,
       to_char(FIRST_TIME,'yyyymmdd hh24:mi:ss') FIRST_TIME,
       to_char(COMPLETION_TIME,'yyyymmdd hh24:mi:ss') COMPLETION_TIME,
       sequence#
FROM v$archived_log
WHERE to_char(COMPLETION_TIME,'yyyymmdd hh24:mi:ss') BETWEEN '20210109 15:00:17' AND '20210109 16:00:17'  order by 1,5  2    3    4    5    6    7    8    9   10   11   12  ;

   FIRST_CHANGE#          THREAD#          DEST_ID S FAL APPLIED   DEL FIRST_TIME        COMPLETION_TIME          SEQUENCE#
---------------- ---------------- ---------------- - --- --------- --- ----------------- ----------------- ----------------
   9555246783243                2                2 A NO  YES       NO  20210109 14:32:30 20210109 15:04:31           141659
   9555246783243                2                1 D NO  NO        YES 20210109 14:32:30 20210109 15:04:32           141659
   9555247407959                1                1 D NO  NO        YES 20210109 14:54:15 20210109 15:04:31           197783
   9555247407959                1                2 A NO  YES       NO  20210109 14:54:15 20210109 15:04:31           197783
   9555247700807                1                2 A NO  YES       NO  20210109 15:04:30 20210109 15:15:10           197784
   9555247700807                1                1 D NO  NO        YES 20210109 15:04:30 20210109 15:15:10           197784
   9555247701783                2                2 A NO  YES       NO  20210109 15:04:31 20210109 15:32:20           141660
   9555247701783                2                1 D NO  NO        YES 20210109 15:04:31 20210109 15:32:21           141660
   9555248017764                1                1 D NO  NO        YES 20210109 15:15:10 20210109 15:25:40           197785
   9555248017764                1                2 A NO  YES       NO  20210109 15:15:10 20210109 15:25:40           197785
   9555248313954                1                2 A NO  YES       NO  20210109 15:25:40 20210109 15:33:14           197786
   9555248313954                1                1 D NO  NO        YES 20210109 15:25:40 20210109 15:32:19           197786
   9555248508993                1                1 D NO  NO        YES 20210109 15:32:19 20210109 15:32:25           197787
   9555248508993                1                2 A YES YES       NO  20210109 15:32:19 20210109 15:32:26           197787
   9555248509181                2                1 D NO  NO        YES 20210109 15:32:20 20210109 15:32:23           141661
   9555248509181                2                2 A NO  YES       NO  20210109 15:32:20 20210109 15:32:23           141661
   9555248510236                2                1 D NO  NO        YES 20210109 15:32:23 20210109 15:47:15           141662
   9555248510774                1                1 D NO  NO        YES 20210109 15:32:25 20210109 15:43:17           197788
   9555248831803                1                1 D NO  NO        YES 20210109 15:43:17 20210109 15:47:14           197789
   9555248831803                1                2 A YES NO        NO  20210109 15:43:17 20210109 15:52:30           197789
   9555248943929                1                1 D NO  NO        YES 20210109 15:47:14 20210109 15:52:14           197790
   9555248944032                2                1 D NO  NO        YES 20210109 15:47:15 20210109 15:47:42           141663
   9555248944032                2                2 A YES NO        NO  20210109 15:47:15 20210109 15:53:18           141663
   9555248950448                2                1 D NO  NO        YES 20210109 15:47:42 20210109 15:53:18           141664
   9555248950448                2                2 A YES NO        NO  20210109 15:47:42 20210109 15:55:13           141664

25 rows selected.

上述查询确认后,每个sequence应当归档到dest_id=1(本地),dest_id=2(备库),但是发现thread 2的141662,未显示归档到DEST_ID=2,并且下一个日志便是197788
Media Recovery Log /oracle/archive_Travelskydba/Travelskydba/2_141661_853598219.arc
Media Recovery Waiting for thread 2 sequence 141662 (in transit)
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
  Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
  Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937
Media Recovery Waiting for thread 1 sequence 197788 (in transit)
Recovery of Online Redo Log: Thread 1 Group 7 Seq 197788 Reading mem 0
  Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_7.282.1003923169
  Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_7.280.1003923169
Sat Jan 09 15:49:02 2021
RFS[31]: Possible network disconnect with primary database
后续,备库一值在读取141662这个日志文件,但

Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
  Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
  Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937

Mon Jan 18 20:42:42 2021
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 8 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Recovery of Online Redo Log: Thread 2 Group 11 Seq 141662 Reading mem 0
  Mem# 0: /oracle/redo_Travelskydba/Travelskydba/group_11.301.1003923935
  Mem# 1: /oracle/redo_Travelskydba/Travelskydba/group_11.304.1003923937

此时,主库141662这个归档认为已经传输到备库并且已经删除,但备库实际并未接收到。Threa#1 的197788无法应用的根因为Thread# 2 归档141662并未应用。

让同事将141662归档从主库归档备份中恢复出来 ,拷贝至备库注册或等待备库(fal_client)从主库(fal_server)中获取。同事恢复此归档文件后,备库MRP0进程立即开始工作,进行正常的日志应用,同时主库的v$archived_log中的日志记录变成了复合预期的状态:

SQL> SELECT FIRST_CHANGE#,
       THREAD#,
       DEST_ID,
       STATUS,
       FAL,
       APPLIED,
       DELETED,
       to_char(FIRST_TIME,'yyyymmdd hh24:mi:ss') FIRST_TIME,
       to_char(COMPLETION_TIME,'yyyymmdd hh24:mi:ss') COMPLETION_TIME,
       sequence#
FROM v$archived_log
WHERE sequence#=141662  2    3    4    5    6    7    8    9   10   11   12  ;

   FIRST_CHANGE#          THREAD#          DEST_ID S FAL APPLIED   DEL FIRST_TIME        COMPLETION_TIME          SEQUENCE#
---------------- ---------------- ---------------- - --- --------- --- ----------------- ----------------- ----------------
   9555248510236                2                1 D NO  NO        YES 20210109 15:32:23 20210109 15:47:15           141662
   9555248510236                2                1 A NO  NO        NO  20210109 15:32:23 20210118 21:39:53           141662
   9555248510236                2                2 A YES YES       NO  20210109 15:32:23 20210118 21:40:33           141662

利用sysmetric_history与dba_hist_event_histogram定位IO性能问题

近期发现,公司生产4套数据库从12月9日起,每周三22:00-22:30时间段均有应用事务DML缓慢的现象,IO等待事件明显增多:

IO等待类事件明显均有显著抬头的现象,例如log file sync等,但是我们如何定位是否真正的数据库IO真正出现问题?是因还是果?是否是io requeset变多还是IO真正出现问题?

问题时间段的AWR:

AWR中的Avg Wait极有可能被平均
另外一套库的显示的较为直观,log file sync平均等待高达17ms之多

换一个角度,我们从Oracle提供的数据库内部的v$metric_history与dba_hist_sysmetric_history动态视图中可以从“ Average Synchronous Single-Block Read Latency、I/O Megabytes per Second、I/O Requests per Second”等指标中辅助定位:

select to_char(end_time, 'yyyymmdd hh24:mi:ss') as time,
       max(case
             when metric_name = 'Average Synchronous Single-Block Read Latency' then
              round(value,2)
             else null
           end) as "Single Block Read Latency",
       max(case
             when metric_name = 'I/O Megabytes per Second' then
              round(value,2)
             else null
           end) as "IO Megabytes per Second",
       max(case
             when metric_name = 'I/O Requests per Second' then
              round(value,2)
             else null
           end) as "I/O Requests per Second"
  from dba_hist_sysmetric_history where instance_number=1 and to_char(end_time, 'yyyymmdd hh24:mi:ss') between '20210106 22:00:00' and '20210106 23:00:00'
 group by to_char(end_time, 'yyyymmdd hh24:mi:ss') order by 1

TIME              Single Block Read Latency IO Megabytes per Second I/O Requests per Second
----------------- ------------------------- ----------------------- -----------------------
20210106 21:00:44                      1.75                   23.69                 2060.17
20210106 21:01:44                       2.1                   13.32                 1474.73
20210106 21:02:43                      1.99                    18.6                 1716.56
20210106 21:03:43                      1.43                   11.54                 1089.31
20210106 21:04:43                      1.69                   20.77                 1567.28
20210106 21:05:43                      1.61                   17.92                 1806.92
20210106 21:06:44                      2.21                   27.95                 2466.91
20210106 21:07:44                      1.63                   11.29                 1052.18
20210106 21:08:44                      2.01                   17.23                 1729.85
20210106 21:09:43                      1.63                   12.53                  1309.2
20210106 21:10:43                      1.36                   15.53                 1180.81
20210106 21:11:43                      2.57                   12.97                 1399.38
20210106 21:12:43                      1.42                   16.71                 1323.95
20210106 21:13:44                      1.65                   10.37                 1367.22
20210106 21:14:44                      1.42                   20.49                    1493
20210106 21:15:44                      1.66                    8.21                 1186.28
20210106 21:16:43                      2.39                   23.23                 2012.56
20210106 21:17:43                       1.3                   11.34                 1219.04
20210106 21:18:43                      1.15                   15.06                 1219.79
20210106 21:19:43                      1.66                       7                 1057.58
20210106 21:20:44                      1.38                   19.74                 1402.93
20210106 21:21:44                      2.59                    7.66                 1086.08
20210106 21:22:44                      1.22                   16.51                 1307.65
20210106 21:23:43                      1.39                   12.68                 1391.17
20210106 21:24:43                      1.43                   15.38                 1255.99
20210106 21:25:43                      1.37                   13.82                 1489.76
20210106 21:26:44                       1.9                   24.64                 2456.92
20210106 21:27:44                      1.49                    7.35                 1232.54
20210106 21:28:44                      1.06                   50.22                  1287.7
20210106 21:29:44                      2.06                   31.16                 1516.66
20210106 21:30:43                      1.21                   67.67                 1750.49
20210106 21:31:43                      2.52                   67.19                  1398.3
20210106 21:32:43                      1.68                   38.94                 1831.17
20210106 21:33:44                      1.46                    9.81                 1366.68
20210106 21:34:44                      1.61                   17.64                 1447.77
20210106 21:35:44                      1.52                   15.25                 1860.36
20210106 21:36:44                      1.61                   26.07                 2350.26
20210106 21:37:43                      2.18                   11.01                 1161.19
20210106 21:38:43                      1.38                   16.08                 1438.71
20210106 21:39:44                      1.56                   10.44                 1005.47
20210106 21:40:44                      1.23                   15.76                 1202.61
20210106 21:41:44                      2.21                    8.56                 1374.73
20210106 21:42:44                      1.71                    19.9                 1298.54
20210106 21:43:44                      1.49                    9.08                 1196.41
20210106 21:44:43                      1.79                   18.07                 1715.96
20210106 21:45:43                      1.86                   11.46                 1055.36
20210106 21:46:44                      1.36                   22.71                  1960.6
20210106 21:47:44                      1.13                    27.1                 1589.31
20210106 21:48:44                      1.39                   18.62                 1142.78
20210106 21:49:44                      1.88                    6.48                  904.06
20210106 21:50:44                      1.22                   20.12                  1420.4
20210106 21:51:43                      1.79                    7.54                 1054.08
20210106 21:52:44                      1.32                   16.79                 1246.04
20210106 21:53:44                       1.5                   12.89                 1509.54
20210106 21:54:44                      1.59                   14.58                  961.99
20210106 21:55:44                      1.75                    8.41                 1114.03
20210106 21:56:44                      1.42                   29.02                 2388.62
20210106 21:57:44                       1.7                    6.22                  815.38
20210106 21:58:43                      1.36                   14.27                 1024.69
20210106 21:59:44                      1.89                   10.66                 1133.77
20210106 22:00:44                      1.02                   52.87                 3232.29
20210106 22:01:44                       .71                   94.45                 4846.43
20210106 22:02:44                      5.28                   83.54                 2627.49
20210106 22:03:44                      7.04                   69.19                 2067.08
20210106 22:04:44                      7.07                   74.68                 2382.61
20210106 22:05:43                      4.87                   79.73                 2654.56
20210106 22:06:44                      5.94                   90.77                 3937.21
20210106 22:07:44                      8.01                   31.11                 1074.18
20210106 22:08:44                      6.16                   26.27                 1639.99
20210106 22:09:44                      2.71                    31.5                 1797.32
20210106 22:10:44                      5.12                   37.47                 1533.45
20210106 22:11:44                      5.04                   62.63                 1520.07
20210106 22:12:44                      3.81                   35.91                 1402.06
20210106 22:13:44                      3.66                   18.25                 1626.73
20210106 22:14:44                      3.01                   61.43                  2179.6
20210106 22:15:44                      2.59                   75.41                 1789.91
20210106 22:16:44                      3.13                    91.9                 2745.88
20210106 22:17:44                      2.97                   86.56                  1798.5
20210106 22:18:45                      2.33                   84.64                 1505.69
20210106 22:19:44                      1.74                    72.1                 1378.42
20210106 22:20:44                      1.84                   91.02                 1957.69
20210106 22:21:44                      2.72                   68.14                 1447.02
20210106 22:22:44                      2.18                   80.76                 1649.19
20210106 22:23:44                      2.75                   75.58                 1676.43
20210106 22:24:44                       2.2                   36.45                 1502.78
20210106 22:25:45                      2.21                   21.77                 1429.71
20210106 22:26:44                      2.61                   40.56                 2667.25
20210106 22:27:44                      2.09                   19.95                 1068.11
20210106 22:28:44                      1.39                   57.66                 1419.02
20210106 22:29:44                      1.61                   44.66                 1701.81
20210106 22:30:44                       1.2                   74.61                 2328.65
20210106 22:31:45                      1.29                   83.01                 2020.88
20210106 22:32:45                       .76                   94.96                 2706.07
20210106 22:33:44                      1.02                   22.84                 2032.87
20210106 22:34:44                       .74                   58.67                 2297.62
20210106 22:35:44                      1.76                   81.66                 1727.17
20210106 22:36:44                       .58                    66.2                 4839.24
20210106 22:37:44                       .87                   29.04                 1617.19
20210106 22:38:45                      1.54                   53.16                 1539.99
20210106 22:39:45                      1.21                    51.9                  1420.5
20210106 22:40:44                         1                   49.76                 1646.52
20210106 22:41:44                      2.23                   74.05                 1761.78
20210106 22:42:44                      1.17                  146.94                 1505.77
20210106 22:43:44                      1.39                   75.94                 1620.12
20210106 22:44:45                      1.07                   73.43                 2045.32
20210106 22:45:45                       .44                   82.36                 2572.26
20210106 22:46:44                         2                   54.02                 1909.32
20210106 22:47:44                      1.61                   43.08                 1479.45
20210106 22:48:44                      1.29                   47.81                 1296.04
20210106 22:49:44                      1.39                    50.6                 1337.25
20210106 22:50:44                       1.3                   55.93                 1497.06
20210106 22:51:45                      1.85                   43.28                 1074.38
20210106 22:52:45                      1.06                   93.48                 1559.25
20210106 22:53:44                      1.17                   39.43                 2132.45
20210106 22:54:44                       .54                   50.37                 2259.39
20210106 22:55:44                      1.35                   47.12                 1347.34
20210106 22:56:44                      2.09                   57.55                 2647.21
20210106 22:57:45                      1.33                   46.67                  980.51
20210106 22:58:45                       .68                   74.52                 1480.33
20210106 22:59:45                      1.19                    51.5                 1871.55

可以从上述三个指标中观察,Single Block Read Latency高的时候,IO请求量并不匹配,并非因为IO请求量导致IO效率下降,延时变高。
延时指标高,IO量并不大
继续阅读