ORA-15028: ASM file ‘+DG_ARC***’ not dropped; currently being accessed 案例

昨日,接到硬件团队同事电话,告知一个数据库出现内存耗尽,需要重启。查询了一下配置,确认此数据库database_role为 physical standby,随后硬件团队同事进行了服务器重启。

服务器重启过后,启动数据库,但是数据库登录依然缓慢,因为我们配置了broker,会自动启用实时应用:

alter database recover managed standby database using current logfile disconnect from session;

但是奇怪的是,数据库并没有应用日志,一直在处于waiting for log的状态,等待的日志文件已经在备库本地,无奈之下,手工注册日志后数据库进行了日志应用,但是很块MRP0进程继续处于 waiting for log的状态 ,再次注册日志却抛出了

ORA-00311: cannot read header from archiveed log
ORA-27072: File I/O error

看来这次故障并不是简单的内存耗尽,严重怀疑是存储链路层面出现了问题。向硬件团队反馈后,准备断掉主备库DataGuard关系,等存储问题恢复后进行重构。

将DataGuard断掉后,验证一下主库归档备份的状态,发现了更加奇怪的现象:

    RECID OPERATION                         STATUS                  START_TIME          END_TIME            OBJECT_TYPE
---------- --------------------------------- ----------------------- ------------------- ------------------- -------------	
     54558 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 23:30:37 2020-04-16 23:30:50 ARCHIVELOG
     54556 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 23:19:42 2020-04-16 23:19:57 ARCHIVELOG
     54554 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 23:15:10 2020-04-16 23:15:24 ARCHIVELOG
     54552 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 22:58:50 2020-04-16 22:59:12 ARCHIVELOG
     54550 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 21:58:51 2020-04-16 21:59:08 ARCHIVELOG
     54548 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 20:58:51 2020-04-16 20:59:04 ARCHIVELOG
     54546 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 19:58:51 2020-04-16 19:59:04 ARCHIVELOG


input archived log thread=1 sequence=54511 RECID=217898 STAMP=1037920783
input archived log thread=2 sequence=54482 RECID=217899 STAMP=1037920784
channel ch00: starting piece 1 at 16-APR-20
channel ch00: finished piece 1 at 16-APR-20
piece handle=arch_Travelskydba_std_27847_1_1037920788 tag=TAG20200416T231948 comment=API Version 2.0,MMS Version 5.0.0.0
channel ch00: backup set complete, elapsed time: 00:00:07
channel ch00: deleting archived log(s)
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54501.352.1037918509 RECID=217870 STAMP=1037918508
RMAN-08118: WARNING: could not delete the following archived redo log
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54501.352.1037918509 thread=1 sequence=54501
error from target database:
ORA-15028: ASM file '+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54501.352.1037918509' not dropped; currently being accessed

archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54502.355.1037918877 RECID=217873 STAMP=1037918877
RMAN-08118: WARNING: could not delete the following archived redo log
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54502.355.1037918877 thread=1 sequence=54502
error from target database:
ORA-15028: ASM file '+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54502.355.1037918877' not dropped; currently being accessed

archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_2_seq_54474.308.1037918875 RECID=217872 STAMP=1037918875
RMAN-08118: WARNING: could not delete the following archived redo log
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_2_seq_54474.308.1037918875 thread=2 sequence=54474
error from target database:
ORA-15028: ASM file '+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_2_seq_54474.308.1037918875' not dropped; currently being accessed

archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_2_seq_54481.388.1037920773 RECID=217897 STAMP=1037920772
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54510.318.1037920771 RECID=217896 STAMP=1037920770
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_1_seq_54511.332.1037920783 RECID=217898 STAMP=1037920783
archived log file name=+DG_ARC/Travelskydba_std/archivelog/2020_04_16/thread_2_seq_54482.321.1037920785 RECID=217899 STAMP=1037920784
Finished backup at 16-APR-20

released channel: ch00	

备份日志中出现了ORA-15028错误, error from target database:
ORA-15028: ASM file not dropped; currently being accessed , 可是主备库关系已经断开,备库理论上不需要主库的归档日志并且主库归档进程也是正常的。

再次登录备库,找到了异常的跟因:备库已经shutdown complete,但是操作系统中PMON与arch进程却没有终止(即使用操作系统kill -9也无果)。

随后将备库操作系统关机,再次执行主库的归档备份后, ORA-15028 错误不再出现,备份成功完成

     RECID OPERATION                         STATUS                  START_TIME          END_TIME            OBJECT_TYPE
---------- --------------------------------- ----------------------- ------------------- ------------------- -------------
     54560 BACKUP                            COMPLETED               2020-04-16 23:32:37 2020-04-16 23:32:50 ARCHIVELOG
     54558 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 23:30:37 2020-04-16 23:30:50 ARCHIVELOG
     54556 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 23:19:42 2020-04-16 23:19:57 ARCHIVELOG
     54554 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 23:15:10 2020-04-16 23:15:24 ARCHIVELOG
     54552 BACKUP                            COMPLETED WITH WARNINGS 2020-04-16 22:58:50 2020-04-16 22:59:12 ARCHIVELOG

随后在Oracle Support上搜寻了一下这个错误,文档 Doc ID 1466848.1 进行了非常详细的描述,对应本次的故障案例,备库ARCH进程夯死,传输接收时出现异常,从而主库没有及时的释放archive log的lock,RMAN备份archivelog后无法删除文件。