近期发现,公司生产4套数据库从12月9日起,每周三22:00-22:30时间段均有应用事务DML缓慢的现象,IO等待事件明显增多:

IO等待类事件明显均有显著抬头的现象,例如log file sync等,但是我们如何定位是否真正的数据库IO真正出现问题?是因还是果?是否是io requeset变多还是IO真正出现问题?
问题时间段的AWR:


换一个角度,我们从Oracle提供的数据库内部的v$metric_history与dba_hist_sysmetric_history动态视图中可以从“ Average Synchronous Single-Block Read Latency、I/O Megabytes per Second、I/O Requests per Second”等指标中辅助定位:
select to_char(end_time, 'yyyymmdd hh24:mi:ss') as time,
max(case
when metric_name = 'Average Synchronous Single-Block Read Latency' then
round(value,2)
else null
end) as "Single Block Read Latency",
max(case
when metric_name = 'I/O Megabytes per Second' then
round(value,2)
else null
end) as "IO Megabytes per Second",
max(case
when metric_name = 'I/O Requests per Second' then
round(value,2)
else null
end) as "I/O Requests per Second"
from dba_hist_sysmetric_history where instance_number=1 and to_char(end_time, 'yyyymmdd hh24:mi:ss') between '20210106 22:00:00' and '20210106 23:00:00'
group by to_char(end_time, 'yyyymmdd hh24:mi:ss') order by 1
TIME Single Block Read Latency IO Megabytes per Second I/O Requests per Second
----------------- ------------------------- ----------------------- -----------------------
20210106 21:00:44 1.75 23.69 2060.17
20210106 21:01:44 2.1 13.32 1474.73
20210106 21:02:43 1.99 18.6 1716.56
20210106 21:03:43 1.43 11.54 1089.31
20210106 21:04:43 1.69 20.77 1567.28
20210106 21:05:43 1.61 17.92 1806.92
20210106 21:06:44 2.21 27.95 2466.91
20210106 21:07:44 1.63 11.29 1052.18
20210106 21:08:44 2.01 17.23 1729.85
20210106 21:09:43 1.63 12.53 1309.2
20210106 21:10:43 1.36 15.53 1180.81
20210106 21:11:43 2.57 12.97 1399.38
20210106 21:12:43 1.42 16.71 1323.95
20210106 21:13:44 1.65 10.37 1367.22
20210106 21:14:44 1.42 20.49 1493
20210106 21:15:44 1.66 8.21 1186.28
20210106 21:16:43 2.39 23.23 2012.56
20210106 21:17:43 1.3 11.34 1219.04
20210106 21:18:43 1.15 15.06 1219.79
20210106 21:19:43 1.66 7 1057.58
20210106 21:20:44 1.38 19.74 1402.93
20210106 21:21:44 2.59 7.66 1086.08
20210106 21:22:44 1.22 16.51 1307.65
20210106 21:23:43 1.39 12.68 1391.17
20210106 21:24:43 1.43 15.38 1255.99
20210106 21:25:43 1.37 13.82 1489.76
20210106 21:26:44 1.9 24.64 2456.92
20210106 21:27:44 1.49 7.35 1232.54
20210106 21:28:44 1.06 50.22 1287.7
20210106 21:29:44 2.06 31.16 1516.66
20210106 21:30:43 1.21 67.67 1750.49
20210106 21:31:43 2.52 67.19 1398.3
20210106 21:32:43 1.68 38.94 1831.17
20210106 21:33:44 1.46 9.81 1366.68
20210106 21:34:44 1.61 17.64 1447.77
20210106 21:35:44 1.52 15.25 1860.36
20210106 21:36:44 1.61 26.07 2350.26
20210106 21:37:43 2.18 11.01 1161.19
20210106 21:38:43 1.38 16.08 1438.71
20210106 21:39:44 1.56 10.44 1005.47
20210106 21:40:44 1.23 15.76 1202.61
20210106 21:41:44 2.21 8.56 1374.73
20210106 21:42:44 1.71 19.9 1298.54
20210106 21:43:44 1.49 9.08 1196.41
20210106 21:44:43 1.79 18.07 1715.96
20210106 21:45:43 1.86 11.46 1055.36
20210106 21:46:44 1.36 22.71 1960.6
20210106 21:47:44 1.13 27.1 1589.31
20210106 21:48:44 1.39 18.62 1142.78
20210106 21:49:44 1.88 6.48 904.06
20210106 21:50:44 1.22 20.12 1420.4
20210106 21:51:43 1.79 7.54 1054.08
20210106 21:52:44 1.32 16.79 1246.04
20210106 21:53:44 1.5 12.89 1509.54
20210106 21:54:44 1.59 14.58 961.99
20210106 21:55:44 1.75 8.41 1114.03
20210106 21:56:44 1.42 29.02 2388.62
20210106 21:57:44 1.7 6.22 815.38
20210106 21:58:43 1.36 14.27 1024.69
20210106 21:59:44 1.89 10.66 1133.77
20210106 22:00:44 1.02 52.87 3232.29
20210106 22:01:44 .71 94.45 4846.43
20210106 22:02:44 5.28 83.54 2627.49
20210106 22:03:44 7.04 69.19 2067.08
20210106 22:04:44 7.07 74.68 2382.61
20210106 22:05:43 4.87 79.73 2654.56
20210106 22:06:44 5.94 90.77 3937.21
20210106 22:07:44 8.01 31.11 1074.18
20210106 22:08:44 6.16 26.27 1639.99
20210106 22:09:44 2.71 31.5 1797.32
20210106 22:10:44 5.12 37.47 1533.45
20210106 22:11:44 5.04 62.63 1520.07
20210106 22:12:44 3.81 35.91 1402.06
20210106 22:13:44 3.66 18.25 1626.73
20210106 22:14:44 3.01 61.43 2179.6
20210106 22:15:44 2.59 75.41 1789.91
20210106 22:16:44 3.13 91.9 2745.88
20210106 22:17:44 2.97 86.56 1798.5
20210106 22:18:45 2.33 84.64 1505.69
20210106 22:19:44 1.74 72.1 1378.42
20210106 22:20:44 1.84 91.02 1957.69
20210106 22:21:44 2.72 68.14 1447.02
20210106 22:22:44 2.18 80.76 1649.19
20210106 22:23:44 2.75 75.58 1676.43
20210106 22:24:44 2.2 36.45 1502.78
20210106 22:25:45 2.21 21.77 1429.71
20210106 22:26:44 2.61 40.56 2667.25
20210106 22:27:44 2.09 19.95 1068.11
20210106 22:28:44 1.39 57.66 1419.02
20210106 22:29:44 1.61 44.66 1701.81
20210106 22:30:44 1.2 74.61 2328.65
20210106 22:31:45 1.29 83.01 2020.88
20210106 22:32:45 .76 94.96 2706.07
20210106 22:33:44 1.02 22.84 2032.87
20210106 22:34:44 .74 58.67 2297.62
20210106 22:35:44 1.76 81.66 1727.17
20210106 22:36:44 .58 66.2 4839.24
20210106 22:37:44 .87 29.04 1617.19
20210106 22:38:45 1.54 53.16 1539.99
20210106 22:39:45 1.21 51.9 1420.5
20210106 22:40:44 1 49.76 1646.52
20210106 22:41:44 2.23 74.05 1761.78
20210106 22:42:44 1.17 146.94 1505.77
20210106 22:43:44 1.39 75.94 1620.12
20210106 22:44:45 1.07 73.43 2045.32
20210106 22:45:45 .44 82.36 2572.26
20210106 22:46:44 2 54.02 1909.32
20210106 22:47:44 1.61 43.08 1479.45
20210106 22:48:44 1.29 47.81 1296.04
20210106 22:49:44 1.39 50.6 1337.25
20210106 22:50:44 1.3 55.93 1497.06
20210106 22:51:45 1.85 43.28 1074.38
20210106 22:52:45 1.06 93.48 1559.25
20210106 22:53:44 1.17 39.43 2132.45
20210106 22:54:44 .54 50.37 2259.39
20210106 22:55:44 1.35 47.12 1347.34
20210106 22:56:44 2.09 57.55 2647.21
20210106 22:57:45 1.33 46.67 980.51
20210106 22:58:45 .68 74.52 1480.33
20210106 22:59:45 1.19 51.5 1871.55
可以从上述三个指标中观察,Single Block Read Latency高的时候,IO请求量并不匹配,并非因为IO请求量导致IO效率下降,延时变高。

从 dba_hist_event_histogram中可以进一步分析:可以看到log file parallel write等待事件落在8ms+的明显增多,再次证明此时的IO效率确实变差。
END_TIME EVENT_NAME WAIT_TIME_MILLI_1 WAIT_TIME_MILLI_2 WAIT_TIME_MILLI_4 WAIT_TIME_MILLI_8 WAIT_TIME_MILLI_16 WAIT_TIME_MILLI_32 WAIT_TIME_MILLI_64 WAIT_TIME_MILLI_128 WAIT_TIME_MILLI_256 WAIT_TIME_MILLI_512 WAIT_TIME_MILLI_1024 WAIT_TIME_MILLI_2048 WAIT_TIME_MILLI_4096
----------------- ---------------------------------------------------------------- ----------------- ----------------- ----------------- ----------------- ------------------ ------------------ ------------------ ------------------- ------------------- ------------------- -------------------- -------------------- --------------------
20210106 13:00:08 log file parallel write 1453847 73017 19362 4713 1126 200 48 13 9 0 0 0 0
20210106 14:00:27 log file parallel write 1382364 69863 17836 4631 1089 173 40 13 9 0 0 0 0
20210106 15:00:45 log file parallel write 1439064 88212 24520 6244 1306 178 34 9 10 0 0 0 0
20210106 16:00:02 log file parallel write 1572976 114068 30569 7419 1727 248 51 22 13 1 0 0 0
20210106 17:00:19 log file parallel write 1459730 79735 24228 5727 1366 230 41 15 9 0 0 0 0
20210106 18:00:36 log file parallel write 1628903 114694 42910 8061 1778 298 63 13 10 1 0 0 0
20210106 19:00:09 log file parallel write 1546947 77375 22573 5431 1308 211 44 12 7 0 0 0 0
20210106 20:00:10 log file parallel write 1576142 67645 16658 4506 1164 223 30 9 8 0 0 0 0
20210106 21:00:28 log file parallel write 1783540 88585 28536 6796 1592 259 60 10 10 0 0 0 0
20210106 22:00:10 log file parallel write 1284690 70283 24521 5326 1088 186 44 10 9 0 0 0 0
20210106 23:00:01 log file parallel write 828876 51267 23312 11203 6272 4512 2895 1651 698 183 17 0 0
20210107 00:00:18 log file parallel write 893169 51139 21390 8748 2500 427 59 4 5 0 0 0 0
20210107 01:00:11 log file parallel write 681905 44463 19066 8455 3241 800 199 25 2 0 0 0 0
20210107 02:00:31 log file parallel write 581899 30477 11871 4912 1360 181 31 5 4 0 0 0 0
20210107 03:00:09 log file parallel write 622364 23694 8323 3389 1098 285 86 12 6 0 0 0 0
20210107 04:00:11 log file parallel write 663978 24640 8175 3120 888 107 19 2 8 0 0 0 0
20210107 05:00:31 log file parallel write 595565 26306 11893 5084 1151 151 34 4 4 0 0 0 0
20210107 06:00:51 log file parallel write 943310 41237 21569 9587 2086 252 47 12 6 0 0 0 0
20210107 07:00:11 log file parallel write 1085230 33779 14295 5282 1266 230 47 3 8 0 0 0 0
20210107 08:00:31 log file parallel write 1329946 40328 12880 3500 800 165 34 9 8 0 0 0 0
20210107 09:00:51 log file parallel write 1619221 81287 29736 6585 1325 239 34 11 11 0 0 0 0
20210107 10:00:07 log file parallel write 1709450 88925 31447 7049 1627 312 67 17 8 0 0 0 0
20210107 11:00:24 log file parallel write 1539844 81010 27929 6687 1567 269 56 12 8 0 0 0 0
20210107 12:00:42 log file parallel write 1599507 83158 29299 6622 1556 229 46 13 2 2 0 0 0
20210107 13:00:13 log file parallel write 1559075 57723 22417 6122 1428 249 38 16 7 0 0 0 0
20210107 14:00:17 log file parallel write 1364735 46168 18084 4876 1317 274 46 14 5 0 0 0 0
20210107 15:00:34 log file parallel write 1487643 74023 23257 5939 1467 279 44 11 6 1 0 0 0
20210107 16:00:13 log file parallel write 1386919 94483 22675 5060 1282 294 60 18 7 0 0 0 0
20210107 17:00:08 log file parallel write 1430871 77444 22646 5076 1211 243 40 6 6 0 0 0 0
20210107 18:00:26 log file parallel write 1409409 103298 37591 6236 1427 193 39 13 7 1 0 0 0
20210107 19:00:14 log file parallel write 1473810 68445 17323 4645 1397 283 69 8 6 0 0 0 0
20210107 20:00:00 log file parallel write 1610689 69356 16569 4273 1176 217 41 13 6 0 0 0 0
20210107 21:00:18 log file parallel write 1838288 79924 19698 5138 1423 237 58 5 8 1 0 0 0
20210107 22:00:15 log file parallel write 1488849 62903 15115 4106 1093 206 40 15 6 0 0 0 0
20210107 23:00:34 log file parallel write 1333550 58605 17588 6038 1829 556 183 31 5 0 0 0 0
20210108 00:00:08 log file parallel write 1110931 46863 10661 2911 832 138 31 2 7 0 0 0 0
20210108 01:00:14 log file parallel write 1004687 39615 12080 4134 1215 183 32 9 3 0 0 0 0
20210108 02:00:34 log file parallel write 839122 38891 13360 4614 1568 271 45 8 9 0 0 0 0
20210108 03:00:53 log file parallel write 862505 31053 7420 2179 917 166 24 2 7 1 0 0 0
20210108 04:00:13 log file parallel write 848262 22221 5702 1834 762 128 24 3 3 0 0 0 0
20210108 05:00:32 log file parallel write 1055400 71281 28110 8952 1631 311 39 3 7 0 0 0 0
瞬时值可利用如下SQL定位,利用v$metric_history视图
select *
from (select to_char(end_time,'yyyymmdd hh24:mi:ss'), metric_name, value
from v$metric_history
where metric_name in ('Average Synchronous Single-Block Read Latency',
'I/O Requests per Second',
'I/O Megabytes per Second'))
pivot(sum(round(value, 2))
for metric_name in('Average Synchronous Single-Block Read Latency',
'I/O Requests per Second',
'I/O Megabytes per Second'))
order by 1
Oracle提供了诸多动态性能视图,内部结合不同的时间模型采集了非常全面的性能指标,对分析问题、定位故障提供了非常直观的数据支撑。大家有兴趣可以分析分析这些动态视图,11.2.0.4版本下v$metric_history提供了200+个指标,有些指标完全可以做成监控来使用,大家有兴趣可以仔细研究研究,Oracle初衷也是希望这些数据可以在Troubleshooting时能够帮助到大家。