ORA-00704: bootstrap process failure故障修复

近日,同事反馈一套测试测试数据库宕机,进行重启数据库操作时发现数据库无法open,抛出ORA-00704: bootstrap process failure错误,版本为11.2.0.4

SQL> startup;
ORACLE instance started.

Total System Global Area 1068937216 bytes
Fixed Size		    2260088 bytes
Variable Size		  671089544 bytes
Database Buffers	  390070272 bytes
Redo Buffers		    5517312 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 57338
Session ID: 96 Serial number: 3

错误一并抛出了ORA-00942: table or view does not exist错误,应该是系统表丢失导致的bootstrap失败,并且alert日志中没有任何其他日志输出。

将数据库启动到nomount与mount状态,利用10046 event跟踪bootstrap过程:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug tracefile_name
/opt/app/ora11g/albert/diag/rdbms/prod3/PROD3/trace/PROD3_ora_57585.trc
SQL> oradebug event 10046 trace name context forever,level 12;
Statement processed.
SQL> alter database mount;

Database altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-00942: table or view does not exist
Process ID: 57585
Session ID: 96 Serial number: 3

访问/opt/app/ora11g/albert/diag/rdbms/prod3/PROD3/trace/PROD3_ora_57585.trc发现访问access$表时抛出异常:

继续阅读

核心系统表关联顺序错误导致的低效SQL

同事发来一个核心业务库的TOP SQL,占据并浪费了大量的服务器CPU资源,此SQL执效率信息如下:

SELECT a.ID,
       a.SEG_FLIGHT_ID,
       a.SEG_DEP_STATION_CODE,
       a.SEG_ARR_STATION_CODE,
       a.SEG_CLASS_SEQUENCE,
       a.SEG_TRAFFIC_RESTRICTION_CODE,
       a.SEG_DEP_TERMINAL,
       a.SEG_ARR_TERMINAL,
       a.SEG_OC_TYPE,
       a.SEG_OC_MAP_NUMBER,
       a.SEG_OC_CLASS_MAPPING,
       a.SEG_DATA_SOURCE
FROM T_SEGMENT a,
  (SELECT id
   FROM
     (SELECT t.id
      FROM T_FLIGHT t,
           T_TASK ta
      WHERE 1=1
        AND ta.id IN ('11008',
                      '11003',
                      '11014',
                      '11012',
                      '14001',
                      '11001')
        AND t.FLT_AIRLINE_CODE = ta.FLT_AIRLINE_CODE
        AND t.FLT_FLIGHT_NO = ta.FLT_FLIGHT_NO
        AND t.FLT_FLIGHT_DATE = ta.FLT_FLIGHT_DATE
        AND t.FLT_OC_FLIGHT_ID IS NULL
        AND t.IS_VALID = 1
        AND nullif('AAAA',t.FLT_OFFICE) IS NOT NULL )
   LEFT JOIN t_dep_schema s ON id = s.flt_id
   WHERE s.flt_id IS NULL
     OR (s.flt_id IS NOT NULL
         AND (s.FLT_IN_TCARD_PROCESS_FLAG ='N'
              OR s.FLT_IN_TCARD_PROCESS_FLAG IS NULL))
     OR (s.flt_id IS NOT NULL
         AND (s.FLT_IN_INIT_PROCESS_FLAG IS NULL
              OR s.FLT_IN_INIT_PROCESS_FLAG ='N')) ) f
WHERE 1=1
  AND a.SEG_FLIGHT_ID = f.ID
  AND a.IS_VALID = 1

SQL执行计划如下:

从执行计划中看到,估算行数与实际行数差别巨大,实际行数只是899行,但是NESTED LOOP之后估算结果集为46K,并且TASK表与S表选择了HASH JOIN,在未收集个性化扩展统计信息的情况,针对这种复杂情况,估算不准的情况是比较常见的,通常需要利用HINT来调整表连接顺序来优化。此SQL的涉及表都不存在索引缺失的情况,只是行数估算错误导致表连接顺序有误,从而选择了低效的执行计划。

继续阅读