生产系统SQL低效索引案例

前不久,无意中看到生产中一则TOP SQL,消耗资源较多,感觉有很大优化空间

SQL_ID  1agm6bmnhh609, child number 24
--------------------------------------
select *
  from UMETRIP_EVENTLOG
where FLIGHTDATE = :1 
   and EVENTTYPE = 'GATE_CHG' 
   and PASSREASON = '-91000

TIME         EXECU_D      BUFFER        DISK      ET_D     ROWS_PRO                                        
------------ ------- ----------- ----------- --------- ------------                                        
2020052004         2      229527       75055       205         4201                                        
2020052101         2      288633       67290       166         4061                                        
2020052202         2      226528       74570       181         4224                                        
2020052203         2      205266       50359       134         4421                                        
2020052303         2      206985       67949       156         4638                                        
2020052306         1      152840       76679       196         1723                                        
2020052601         2      213294       47103       125         3441                                        
2020052603         2      209591        9441        36         3765                                        
2020052810         2      248329       30198        91         4765                                        
2020052813         2      271982       46553       123         4849

这条SQL执行次数并不多,却在AWR中进入了TOP 10 SQL,执行1-2次消耗了200000+的逻辑读,表记录数25042676行,NDV如下:

COLUMN_NAME                       NDV NUL  NUM_NULLS     DATA_TYPE      LAST_ANALYZED        HISTOGRAM
------------------------------ ---------- --- ---------- ------------ -------------------- ---------
CITYCODE                             2379 Y         6073 VARCHAR2     2020-05-27 22:09:30  HEIGHT BA
DESTA                                2365 Y          849 VARCHAR2     2020-05-27 22:09:30  HEIGHT BA
EVENTTYPE                              59 Y            0 VARCHAR2     2020-05-27 22:09:30  FREQUENCY
EXTRAINFO                         2189824 Y     14600645 VARCHAR2     2020-05-27 22:09:30  HEIGHT BA
FLIGHTDATE                           1891 Y          723 VARCHAR2     2020-05-27 22:09:30  HEIGHT BA
FLIGHTDYNFIELD                         45 Y         3651 VARCHAR2     2020-05-27 22:09:30  NONE
FLIGHTNO                            58972 Y            0 VARCHAR2     2020-05-27 22:09:30  HEIGHT BA
ID                               25042676 Y            0 NUMBER       2020-05-27 22:09:30  HEIGHT BA
INFOAFTER                          713856 Y          336 VARCHAR2     2020-05-27 22:09:30  HEIGHT BA
INFOBEFOR                          243808 Y     21175542 VARCHAR2     2020-05-27 22:09:30  NONE
INSERTTIME                         626176 Y            0 DATE         2020-05-27 22:09:30  HEIGHT BA
ISUSED                                  2 Y          165 NUMBER       2020-05-27 22:09:30  FREQUENCY
PASSREASON                             91 Y      3295111 NUMBER       2020-05-27 22:09:30  FREQUENCY


Plan hash value: 3303868666
-----------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                       |       |       | 22722 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID| UMETRIP_EVENTLOG      |    72 |  7776 | 22722   (1)| 00:04:33 |
|*  2 |   INDEX RANGE SCAN          | UMETRIP_EVENTLOG_IDX2 | 17964 |       |  5724   (1)| 00:01:09 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------


   1 - filter("PASSREASON"=(-91000))
   2 - access("EVENTTYPE"='GATE_CHG' AND "FLIGHTDATE"=:1)
       filter("FLIGHTDATE"=:1)

目前SQL执行计划走了UMETRIP_EVENTLOG_IDX2的索引,EVENTTYPE/ID/CITYCODE/FLIGHTDATE,起到过滤作用的只有EVENTTYPE,而FLIGHTDATE只能起到过滤作用。

继续阅读

生产数据库查询与插入hang案例

前不久,生产Oracle数据库上出现一个比较紧急的案例,前台业务查询与插入操作全部夯死,涉及这些SQL的业务全部中断。接到电话后立即查看系统状态

此时系统中出现了latch: row cache objects ,library cache lock这些本不该在生产系统中出现的事件:
SQL> select event,count(*) from v$session where wait_class <> 'Idle' group by event order by 2 desc;

EVENT                                                              COUNT(*)
---------------------------------------------------------------- ----------
library cache lock                                                        2
SQL*Net message to client                                                 2
log file sync                                                             1
latch: row cache objects                                                  1



这些等待的Blocking_session均为空
SQL> select username,sid,serial#,sql_id,blocking_session,event from  v$session where wait_class <> 'Idle';

USERNAME                              SID    SERIAL# SQL_ID        BLOCKING_SESSION EVENT
------------------------------ ---------- ---------- ------------- ---------------- ----------------------------------------------------------------
Albert                               1015       3251 fdn44h4am80mb                  gc cr multi block request
Albert                               1184      54311 6sx33q2nq3c67                  library cache lock
Albert                               2738       4293 3ndnsw62wz2f1                  latch: row cache objects
Albert                               2957      17039                                reliable message
Albert                               3414      13501 7jj34snwajctj                  library cache lock
SYS                                  4192      29879 dh8qnqqr8jqyt                  SQL*Net message to client

此时需要找到Library cache lock等事件的阻塞源头,尽快恢复中断业务,可以利用如下代码进行确认追踪

SQL> select /*+ ordered */
  2  w1.sid waiting_session,
  3  h1.sid holding_session,
  4  w.kgllktype lock_or_pin,
  5  w.kgllkhdl address,
  6  decode(h.kgllkmod,
  7          0,
  8          'None',
  9          1,
 10          'Null',
 11          2,
 12          'Share',
 13          3,
 14          'Exclusive',
 15          'Unknown') mode_held,
 16  decode(w.kgllkreq,
 17          0,
 18          'None',
 19          1,
 20          'Null',
 21          2,
 22          'Share',
 23          3,
 24          'Exclusive',
 25          'Unknown') mode_requested
 26    from dba_kgllock w, dba_kgllock h, v$session w1, v$session h1
 27  where (((h.kgllkmod != 0) and (h.kgllkmod != 1) and
 28         ((h.kgllkreq = 0) or (h.kgllkreq = 1))) and
 29         (((w.kgllkmod = 0) or (w.kgllkmod = 1)) and
 30         ((w.kgllkreq != 0) and (w.kgllkreq != 1))))
 31     and w.kgllktype = h.kgllktype
 32     and w.kgllkhdl = h.kgllkhdl
 33     and w.kgllkuse = w1.saddr
 34     and h.kgllkuse = h1.saddr;

WAITING_SESSION HOLDING_SESSION LOCK ADDRESS          MODE_HELD MODE_REQU
--------------- --------------- ---- ---------------- --------- ---------
           3414            2738 Lock 0000001FDF10B328 Share     Exclusive
           1184            2738 Lock 0000001FDF10B328 Share     Share

可以从上述结果中看到,2738会话持有Hanle address “0000001FDF10B328”的独占锁:Exclusive,所以应急杀掉2738会话,KILL SESSION会话后,故障恢复。

SQL> r
  1* select username,sid,serial#,sql_id,blocking_session,event from  v$session where wait_class <> 'Idle'

USERNAME                              SID    SERIAL# SQL_ID        BLOCKING_SESSION EVENT
------------------------------ ---------- ---------- ------------- ---------------- ----------------------------------------------------------------
Albert                                   1015       3251 35gvvuy7b2p2s                  db file scattered read
Albert                                   1184      54311 6sx33q2nq3c67                  SQL*Net more data from client
Albert                                   2957      17041 6sx33q2nq3c67                  SQL*Net more data from client
Albert                                 4192      29961 dh8qnqqr8jqyt                  SQL*Net message to client

事后根因分析,为何在业务高峰期间会有library cache lock长时间占有不释放的情况?

确认ASH时间点后,VTERM\GS02是受害者,基本大部分会话都是被2738这个会话阻塞
SQL> select --to_char(sample_time, 'yyyymmdd hh24:mi:ss') sample_time,
  2   event,
  3   sql_id,
  4   program,
  5   machine,
  6   sql_opname,
  7   BLOCKING_INST_ID || '_' || BLOCKING_SESSION,
  8   count(*)
  9    from v$active_session_history a
 10   where to_char(sample_time, 'yyyymmdd hh24:mi:ss') between
 11         '20200501 00:00:00' and '20200501 10:44:00'
 12     and event = 'library cache lock'
 13   group by --to_char(sample_time, 'yyyymmdd hh24:mi:ss'),
 14            event,
 15            sql_id,
 16            program,
 17            machine,
 18            BLOCKING_INST_ID || '_' || BLOCKING_SESSION,
 19            sql_opname
 20   order by 7
 21  ;

EVENT                                                            SQL_ID        PROGRAM                                          MACHINE                                                          SQL_OPNAME                                                       BLOCKING_INST_ID||'_'||BLOCKING_SESSION                                            COUNT(*)
---------------------------------------------------------------- ------------- ------------------------------------------------ ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------------------------------------------------------- ----------
library cache lock                                               0uujs8nqcm145 plsqldev.exe                                     VTERM\GS02                                                       SELECT                                                           _                                                                                         7  
library cache lock                                               49qykykfhwqtm plsqldev.exe                                     VTERM\GS02                                                       SELECT                                                           _                                                                                        22
library cache lock                                               973s831gy6jk4 plsqldev.exe                                     VTERM\GS02                                                       SELECT                                                           _                                                                                        38
library cache lock                                                             sqlldr@hostname (TNS V1-V3)                      hostname                                                                                                                          _                                                                                        58
library cache lock                                               g639trfsrp9m5 plsqldev.exe                                     VTERM\Travelskydba01                                             PL/SQL EXECUTE                                                   _                                                                                        62
library cache lock                                               3y3gt31q23uk1 plsqldev.exe                                     VTERM\GS02                                                       SELECT                                                           _                                                                                        65
library cache lock                                               6bwmj9n38kjp6 plsqldev.exe                                     VTERM\GS02                                                       SELECT                                                           _                                                                                       494
library cache lock                                               973s831gy6jk4 plsqldev.exe                                     VTERM\Travelskydba01                                             SELECT                                                           _                                                                                      1475
library cache lock                                               7jj34snwajctj detrreceipt@hostname2 (TNS V1-V3)                hostname2                                                        INSERT                                                           1_2738                                                                                 2419
library cache lock                                               6sx33q2nq3c67 sqlldr@hostname (TNS V1-V3)                      hostname                                                         INSERT                                                           _                                                                                      2419
library cache lock                                               7jj34snwajctj detrreceipt@hostname2 (TNS V1-V3)                hostname2                                                        INSERT                                                           _                                                                                      2515
library cache lock                                               6sx33q2nq3c67 sqlldr@hostname (TNS V1-V3)                      hostname                                                         INSERT                                                           1_2738                                                                                10760

12 rows selected

ASH中后续并没有记录2738这个会话再等latch: row cache objects,但是抓到了其他的进程信息,例如PROGRAM/MACHINE,猜测故障当时此会话已经不是ACTIVE

ASH中并没有记录2738这个会话再等latch: row cache objects
SQL> select --to_char(sample_time, 'yyyymmdd hh24:mi:ss') sample_time,
  2   event,
  3   sql_id,
  4   program,
  5   machine,
  6   sql_opname,
  7   BLOCKING_INST_ID || '_' || BLOCKING_SESSION,
  8   count(*)
  9    from ash_20200501 a
 10   where to_char(sample_time, 'yyyymmdd hh24:mi:ss') between
 11         '20200430 00:00:00' and '20200501 10:44:00'
 12     and session_id=2738 and program ='plsqldev.exe'
 13   group by --to_char(sample_time, 'yyyymmdd hh24:mi:ss'),
 14            event,
 15            sql_id,
 16            program,
 17            machine,
 18            BLOCKING_INST_ID || '_' || BLOCKING_SESSION,
 19            sql_opname
 20   order by 1
 21  ;

EVENT                                                            SQL_ID        PROGRAM                                          MACHINE                                                          SQL_OPNAME                                                       BLOCKING_INST_ID||'_'||BLOCKING_SESSION                                            COUNT(*)
---------------------------------------------------------------- ------------- ------------------------------------------------ ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------------------------------------------------------- ----------
                                                                 3ndnsw62wz2f1 plsqldev.exe                                     VTERM\Travelskydba01                                                    SELECT                                                           _                                                                            58210
                                                                 3ndnsw62wz2f1 plsqldev.exe                                     VTERM\Travelskydba01                                                                                                                     _                                                                             6

SQL> 

2738这个会话最早在4月30日18点20开始出现,一直持续到故障发生时(故障发生时间:20200501 10:00左右)

738这个会话最早在4月30日18点20开始出现,一直持续到故障发生时:
SQL> select rownum, a.*
  2    from (select to_char(sample_time, 'yyyymmdd hh24:mi:ss') sample_time,
  3                 event,
  4                 sql_id,
  5                 program,
  6                 machine,
  7                 sql_opname,
  8                 BLOCKING_INST_ID || '_' || BLOCKING_SESSION,
  9                 count(*)
 10            from ash_20200501 a
 11           where to_char(sample_time, 'yyyymmdd hh24:mi:ss') between
 12                 '20200430 00:00:00' and '20200501 10:44:00'
 13             and session_id = 2738
 14             and program = 'plsqldev.exe'
 15           group by to_char(sample_time, 'yyyymmdd hh24:mi:ss'),
 16                    event,
 17                    sql_id,
 18                    program,
 19                    machine,
 20                    BLOCKING_INST_ID || '_' || BLOCKING_SESSION,
 21                    sql_opname
 22           order by 1) a where rownum <2
 23  ;

    ROWNUM SAMPLE_TIME       EVENT                                                            SQL_ID        PROGRAM                                          MACHINE                                                          SQL_OPNAME                                                       BLOCKING_INST_ID||'_'||BLOCKING_SESSION                                            COUNT(*)
---------- ----------------- ---------------------------------------------------------------- ------------- ------------------------------------------------ ---------------------------------------------------------------- ---------------------------------------------------------------- -------------------------------------------------------------------------------- ----------
         1 20200430 18:20:47                                                                  3ndnsw62wz2f1 plsqldev.exe                                     VTERM\Travelskydba01                                                                                                                       _                                                                                         1
继续阅读