ORA-600/ORA-7445: Recovering from an Instance Crash That Prevented Startup

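A few days ago a developer emailed me for help with one database: it had crashed while running a batch DELETE and could no longer be started. After each startup, the PMON process would automatically stop the instance again. The alert log is shown below; ORA-00600 and ORA-07445 errors were raised when the database failed: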

Wed Aug 21 15:53:29 2019
 Dumping diagnostic data in directory=[cdmp_20190821155329], requested by (instance=1, osid=29995), summary=[incident=437112].
 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97ECFFC, kghalo()+570] [flags: 0x0, count: 1]
 Errors in file /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/trace/Travelskydba_ora_29995.trc (incident=437114):
 ORA-07445: exception encountered: core dump [kghalo()+570] [SIGSEGV] [ADDR:0x0] [PC:0x97ECFFC] [SI_KERNEL(general_protection)] []
 ORA-00600: internal error code, arguments: [17182], [0x7F617B5DAE58], [], [], [], [], [], [], [], [], [], []
 ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
 Incident details in: /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/incident/incdir_437114/Travelskydba_ora_29995_i437114.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Wed Aug 21 15:53:31 2019
 Sweep [inc][437114]: completed
 Sweep [inc][437113]: completed
 Sweep [inc][437112]: completed
 Sweep [inc2][437113]: completed
 Sweep [inc2][437112]: completed
 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97ECFFC, kghalo()+570] [flags: 0x0, count: 2]
 Errors in file /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/trace/Travelskydba_ora_29995.trc (incident=437115):
 ORA-07445: exception encountered: core dump [kghalo()+570] [SIGSEGV] [ADDR:0x0] [PC:0x97ECFFC] [SI_KERNEL(general_protection)] []
 ORA-07445: exception encountered: core dump [kghalo()+570] [SIGSEGV] [ADDR:0x0] [PC:0x97ECFFC] [SI_KERNEL(general_protection)] []
 ORA-00600: internal error code, arguments: [17182], [0x7F617B5DAE58], [], [], [], [], [], [], [], [], [], []
 ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
 Incident details in: /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/incident/incdir_437115/Travelskydba_ora_29995_i437115.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x87F3BD8, dbgexDumpErrDesc()+82] [flags: 0x0, count: 3]
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Wed Aug 21 15:53:57 2019
 Block recovery from logseq 8936, block 446894 to scn 843928251
 Recovery of Online Redo Log: Thread 1 Group 3 Seq 8936 Reading mem 0
 Mem# 0: /oracle/redo_Travelskydba/Travelskydba/redo31.log
 Mem# 1: /oracle/redo_Travelskydba/Travelskydba/redo32.log
 Block recovery completed at rba 8936.446897.16, scn 0.843928254
 Errors in file /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/trace/Travelskydba_pmon_51643.trc (incident=432032):
 ORA-00600: internal error code, arguments: [17182], [0x7FDB167BDBF8], [], [], [], [], [], [], [], [], [], []
 Incident details in: /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/incident/incdir_432032/Travelskydba_pmon_51643_i432032.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x97E6BEC, kghpmfal()+216] [flags: 0x0, count: 1]
 Errors in file /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/trace/Travelskydba_pmon_51643.trc (incident=432033):
 ORA-07445: exception encountered: core dump [kghpmfal()+216] [SIGSEGV] [ADDR:0x0] [PC:0x97E6BEC] [SI_KERNEL(general_protection)] []
 ORA-00600: internal error code, arguments: [17182], [0x7FDB167BDBF8], [], [], [], [], [], [], [], [], [], []
 Incident details in: /opt/app/ora11g/orabase/diag/rdbms/Travelskydba/Travelskydba/incident/incdir_432033/Travelskydba_pmon_51643_i432033.trc
 Use ADRCI or Support Workbench to package the incident.
 See Note 411.1 at My Oracle Support for error and packaging details.
 Wed Aug 21 15:53:59 2019
 Dumping diagnostic data in directory=[cdmp_20190821155359], requested by (instance=1, osid=51643 (PMON)), summary=[incident=432032].
 System state dump requested by (instance=1, osid=51706 (CJQ0)), summary=[abnormal instance termination].
 Wed Aug 21 15:54:03 2019
 CJQ0 (ospid: 51706): terminating the instance due to error 472

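The log shows that the database began reporting ORA-00600 and ORA-07445 at 15:53 on the 21st. At 15:54 the instance was terminated by the CJQ0 process with error 472 (ORA-00472: PMON process terminated with error).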
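When an alert log repeats the same incident many times, a quick tally of error code and first argument makes the pattern easier to see. The following is a minimal sketch, not part of the original diagnosis; the function name `summarize_ora_errors` and the shortened sample lines are illustrative assumptions. The pattern skips the message text between the error code and the first bracket, so it does not depend on the NLS language of the log.

```python
import re
from collections import Counter

# Capture the ORA code and the first bracketed argument on the line.
# Whatever message text sits between them is ignored.
ORA_LINE = re.compile(r"^\s*(ORA-\d{5}):[^\[]*\[([^\]]*)\]")


def summarize_ora_errors(lines):
    """Count (error code, first argument) pairs found in alert-log lines."""
    counts = Counter()
    for line in lines:
        match = ORA_LINE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Shortened sample lines modelled on the alert log (hypothetical input).
sample = [
    " ORA-07445: exception encountered: core dump [kghalo()+570] [SIGSEGV] [ADDR:0x0]",
    " ORA-00600: internal error code, arguments: [17182], [0x7F617B5DAE58], []",
    " ORA-00600: internal error code, arguments: [kdsgrp1], [], []",
    " ORA-00600: internal error code, arguments: [kdsgrp1], [], []",
    " Use ADRCI or Support Workbench to package the incident.",
]

summary = summarize_ora_errors(sample)
for (code, first_arg), n in sorted(summary.items()):
    print(code, first_arg, n)
```

In practice the lines would come from the alert log file itself, for example by iterating over an open file handle instead of the `sample` list.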


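Common Scenarios and Recovery Methods for Lost Online Redo Logs

Recovery experiments for lost or corrupted online redo logs

Scenarios covered:

Scenario 1: recovering a redo log in INACTIVE status

Scenario 2: recovering a redo log in ACTIVE status

Scenario 3: recovering a redo log in CURRENT status

Scenario 1: corruption and recovery of an INACTIVE redo log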

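Session 1: the database is currently in NOARCHIVELOG mode (No Archive Mode) and has three log groups. GROUP 1 is CURRENT, while groups 2 and 3 are both INACTIVE. We are about to damage the log file of GROUP# 2 or GROUP# 3 to simulate the failure.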

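[Screenshot: redo-1]

Session 2: use dd to overwrite the GROUP# 2 log file, whose status is INACTIVE:

[Screenshot: redo-2]

Session 1: restarting the database fails. The alert log shows that the GROUP 2 redo log file cannot be opened; database consistency cannot be guaranteed, so startup fails.

[Screenshot: redo-3]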
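For an INACTIVE group, the approach commonly described in Oracle documentation is to mount the database and clear the damaged group with `ALTER DATABASE CLEAR LOGFILE GROUP n`, because the changes in an INACTIVE group have already been checkpointed to the datafiles. The sketch below only illustrates that decision; the helper name `clear_logfile_statement` is hypothetical, and this is not necessarily the exact procedure used in the original experiment.

```python
def clear_logfile_statement(group: int, status: str) -> str:
    """Return the SQL commonly used to re-create a damaged INACTIVE redo group.

    Assumption: an INACTIVE group is no longer needed for crash recovery,
    so clearing it does not lose committed changes. ACTIVE and CURRENT
    groups need a different procedure and are rejected here.
    """
    if status.upper() != "INACTIVE":
        raise ValueError(f"group {group} is {status}; clearing it is not safe")
    return f"ALTER DATABASE CLEAR LOGFILE GROUP {group};"


# The status would normally come from: SELECT group#, status FROM v$log;
print(clear_logfile_statement(2, "INACTIVE"))
```

The guard is the point of the sketch: the same statement on an ACTIVE or CURRENT group would discard redo that is still required for recovery.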


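RMAN Recovery Experiment (2): Controlfile Recovery and Re-creating the Controlfile

Following RMAN Recovery Experiment (1), the SPFILE recovery experiment, I continue with recovery experiments for other important RMAN-related files. The target this time is the controlfile:

Scenario (1): RAC cluster architecture + RMAN AUTOBACKUP

Simulate loss of the controlfile: delete controlfile01 and controlfile02 stored in ASM

[Screenshot: CON1-1]

Starting the database raises a control file error; at this point the instance is in NOMOUNT state:

[Screenshot: CON1-2]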


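RMAN Recovery Experiment (1): A Bug Triggered by SPFILE Recovery

I have recently been reviewing RMAN recovery topics and running some RMAN recovery experiments. After restoring the SPFILE successfully many times on several databases, one more restore on a test database suddenly triggered an Oracle bug. Details follow:

Simulated scenario:

(1) The database is closed and the parameter file is lost:

[Screenshot: SPFILE-1] The database cannot be started: the parameter file is missing, so the instance cannot reach NOMOUNT.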

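One special aspect of this experiment is that backups exist but no Catalog is configured. We have backups, but the database cannot be mounted, so the backup metadata in the controlfile cannot be read and the SPFILE cannot be restored from backup in the usual way. RMAN's designers anticipated this problem, so RMAN has a configuration item for database backups: CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default. When the control file and SPFILE are lost, this feature can be used to restore these files, which are essential for starting the database. Note in particular that even with CONFIGURE CONTROLFILE AUTOBACKUP set to OFF, any RMAN backup that includes datafile 1 (the SYSTEM datafile) is automatically forced to run "including current control file in backup set" and "including current SPFILE in backup set" (both are stored inside the backup set, not as operating system files).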
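The `%F` substitution variable expands to a name of the form `c-IIIIIIIIII-YYYYMMDD-QQ`, where `IIIIIIIIII` is the DBID, `YYYYMMDD` is the backup date, and `QQ` is a hexadecimal sequence number. That is why an autobackup piece can be located without a controlfile or Catalog, provided the DBID is known. As a rough sketch of how such names could be decoded when searching a backup directory by hand (the helper names and the DBID `1234567890` are made-up assumptions, not values from the experiment):

```python
import re
from datetime import date
from typing import NamedTuple


class AutobackupName(NamedTuple):
    dbid: int
    day: date
    seq: int


# c-<DBID>-<YYYYMMDD>-<two hex digits>
AUTOBACKUP = re.compile(r"^c-(\d+)-(\d{8})-([0-9A-Fa-f]{2})$")


def parse_autobackup_name(name: str) -> AutobackupName:
    """Split a '%F'-style autobackup piece name into DBID, date and sequence."""
    match = AUTOBACKUP.match(name)
    if match is None:
        raise ValueError(f"not a %F-style autobackup name: {name!r}")
    dbid, ymd, seq = match.groups()
    day = date(int(ymd[:4]), int(ymd[4:6]), int(ymd[6:]))
    return AutobackupName(int(dbid), day, int(seq, 16))


def latest_autobackup(names, dbid: int) -> str:
    """Return the newest piece name for the given DBID (by date, then sequence)."""
    parsed = [(parse_autobackup_name(n), n) for n in names]
    candidates = [(p.day, p.seq, n) for p, n in parsed if p.dbid == dbid]
    if not candidates:
        raise LookupError(f"no autobackup found for DBID {dbid}")
    return max(candidates)[2]


# Hypothetical piece names for illustration only.
pieces = [
    "c-1234567890-20190820-03",
    "c-1234567890-20190821-00",
    "c-1234567890-20190821-0a",
]
newest = latest_autobackup(pieces, 1234567890)
print(newest)
```

RMAN performs this search itself when asked to restore from autobackup; the sketch is only meant to make the naming scheme concrete.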


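Notes on Repairing a Corrupt Block in DataGuard

Recently, application staff reported that read-only queries on the standby database (Read only with apply) raised the following error:

[Screenshot: corrupt block 1] The error message is clear: datafile 13 contains a corrupt block, and the block is marked NOLOGGING. In other words, data was loaded into this datafile on the primary with the NOLOGGING option, most likely because the operation was time-critical. Since FORCE LOGGING was not enabled on the primary, the standby ended up with corrupt blocks during recovery.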

 


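Reproducing a Production UNDO Failure with the BBED Tool

Not long ago, one of the company's Oracle 9i databases developed a dead transaction. SMON's rollback of the transaction fell into an infinite loop, which made redo log generation surge (the UNDO was rewritten and erased over and over) and kept CPU usage persistently high. The dead transaction could not be cleared and posed a serious risk to the production system: CPU idle was essentially 0, and the huge volume of redo made database recovery close to impossible.

The dead transaction looped during rollback because an UNDO block was logically corrupted. The fix was to drop the rollback segment used by the transaction, at the cost that the final state of the transaction (committed or rolled back) cannot be controlled. The application team confirmed that the final state of this transaction could be ignored, so we went with the approach described in this case.

Because test resources for Oracle 9i were limited, I used an 11g (11.2.0.4) test environment to reproduce the UNDO block logical corruption and the way it was handled.

I. Simulating the failure

(1) Start a new transaction in the database without committing or rolling back, then view the transaction information:

[Screenshot: undo_corruption_block -11]

[Screenshot: undo_corruption_block -21]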


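ASM Disk Group Failure Caused by UDEV Misconfiguration: Simulation and Recovery Test (3)

Scenario 3: an existing device is damaged by dd; ASM instance crash

I. Simulating the failure:

(1) Add an existing disk under the name DATA05; adding it to the ASM disk group fails with a duplicate-device error:

(2) Suspecting that the device behind the DATA05 alias duplicated an existing device, and not recognizing that DATA04 already held data, a dd was run that hit DATA04 (the command below writes through the DATA05 alias):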

dd if=/dev/zero of=/dev/HUA_LUN5500f_DATA05 bs=4k count=1 

(Use the above command with extreme caution on production systems)
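To make the effect of `bs=4k count=1` concrete without touching a real device, the sketch below applies the same overwrite to a scratch file: only the first 4096 bytes are zeroed and the rest of the data is left intact. The file name and helper `zero_first_block` are illustrative assumptions. The file is opened in `r+b` mode so it is not truncated, which mirrors how dd behaves on a block device.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # corresponds to bs=4k


def zero_first_block(path, block_size=BLOCK_SIZE, count=1):
    """Overwrite the first block_size * count bytes of path with zeros.

    Opening with 'r+b' keeps the remaining contents, as on a block device.
    """
    with open(path, "r+b") as f:
        f.write(b"\x00" * (block_size * count))


with tempfile.TemporaryDirectory() as tmp:
    scratch = os.path.join(tmp, "fake_disk.img")  # stand-in for a device
    with open(scratch, "wb") as f:
        f.write(b"H" * (4 * BLOCK_SIZE))  # pretend the disk is full of data
    zero_first_block(scratch)
    with open(scratch, "rb") as f:
        data = f.read()

print(len(data))
print(data[:BLOCK_SIZE] == bytes(BLOCK_SIZE))
print(data[BLOCK_SIZE:BLOCK_SIZE + 4])
```

A single 4 KB write is small, but on an ASM disk the first block is where the disk header normally lives, so the damage is far larger than the byte count suggests.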

[Screenshot: ASM3-1-1]

 
