0. Summary
1. Symptoms
2. Analysis
   2.1 Checking the SGA parameters
   2.2 Checking the large pool size and automatic resizing
   2.3 Checking the parallel parameters
   2.4 Detailed alert log analysis
   2.5 Checking the shared pool size
3. Recommendations
1. Symptoms
#### alert log ####
Sat Feb 04 02:08:41 2017
Memory Notification: Library Cache Object loaded into SGA
Heap size 51201K exceeds notification threshold (51200K)
Details in trace file /app/oracle/diag/rdbms/noap/noap/trace/noap_j000_4031.trc
KGL object name :SZ1X.IN_JS_CDR_HW_AC_TI
Memory Notification: Library Cache Object loaded into SGA
Heap size 331660K exceeds notification threshold (51200K)
Details in trace file /app/oracle/diag/rdbms/noap/noap/trace/noap_j000_4031.trc
KGL object name :alter table MOD_JS_CDR_HW drop partition SYS_P1404186
Sat Feb 04 02:09:20 2017
TABLE SZ1X.MOD_CDR_HW: ADDED INTERVAL PARTITION SYS_P1404388 (47883) VALUES LESS THAN (TO_DATE(' 2017-02-04 03:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
Sat Feb 04 02:14:22 2017
Thread 1 advanced to log sequence 603901 (LGWR switch)
Current log# 1 seq# 603901 mem# 0: /app/oracle/oradata/noap/redo01.log
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p085_5172.trc (incident=109601):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_109601/noap_p085_5172_i109601.trc
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p034_5070.trc (incident=101439):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_101439/noap_p034_5070_i101439.trc
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p092_5186.trc (incident=109832):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_109832/noap_p092_5186_i109832.trc
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p051_5104.trc (incident=107372):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_107372/noap_p051_5104_i107372.trc
......
#### noap_p085_5172_i109601.trc ####
Dump continued from file: /app/oracle/diag/rdbms/noap/noap/trace/noap_p085_5172.trc
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
========= Dump for incident 109601 (ORA 4031) ========
*** 2017-02-04 02:32:34.673
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=385pbhfh4g7rn) -----
insert /*+append*/ into c_cdr_railway_huning
select /*+full(t) parallel(64)*/
RELEASE_CAUSE AS 呼叫释放原因,  -- alias: "call release cause"
ACCESS_TIME AS 接入时刻,  -- alias: "access time"
......
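The alert log contains ORA-04031 errors. The failed allocation is for the "PX msg pool" in the large pool, so the direct cause is a large pool shortage triggered by parallel execution.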
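With many PX slaves failing in the same second, it helps to tabulate the incidents instead of reading them one by one. The sketch below is a hypothetical helper (parse_ora4031 is not an Oracle tool). It assumes an English-language alert log (the original log was written under a Chinese NLS_LANG, so the message text there is localized) and the three-line timestamp / "Errors in file" / ORA-04031 layout shown above.

```python
import re
from datetime import datetime

# Hypothetical helper: groups ORA-04031 incidents from an English-language
# 11g alert log (timestamp line, "Errors in file" line, then the ORA-04031 line).
TS_FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Sat Feb 04 02:32:34 2017"
ERR_RE = re.compile(r"Errors in file (\S+) \(incident=(\d+)\):")
ORA_RE = re.compile(
    r"ORA-04031: unable to allocate (\d+) bytes of shared memory "
    r'\("([^"]*)","([^"]*)","([^"]*)","([^"]*)"\)'
)


def parse_ora4031(lines):
    """Return one dict per ORA-04031 incident found in the given alert log lines."""
    events, ts, err = [], None, None
    for raw in lines:
        line = raw.strip()
        try:
            ts = datetime.strptime(line, TS_FMT)  # remember the latest timestamp header
            continue
        except ValueError:
            pass
        m = ERR_RE.match(line)
        if m:
            err = m.groups()  # (trace file, incident number)
            continue
        m = ORA_RE.match(line)
        if m and err:
            events.append({
                "time": ts,
                "trace": err[0],
                "incident": int(err[1]),
                "bytes": int(m.group(1)),
                "pool": m.group(2),
                "alloc": m.group(5),  # e.g. "PX msg pool"
            })
            err = None
    return events


sample = [
    "Sat Feb 04 02:32:34 2017",
    "Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p085_5172.trc (incident=109601):",
    'ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")',
    "Sat Feb 04 02:32:34 2017",
    "Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p034_5070.trc (incident=101439):",
    'ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")',
]
for e in parse_ora4031(sample):
    print(e["time"], e["incident"], e["bytes"], e["pool"], e["alloc"])
```

Every request in this excerpt is the same 2048024 bytes (just under 2 MB) for "PX msg pool", which points at PX message buffers rather than at one oversized object.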
2. Analysis
2.1 Checking the SGA parameters
SQL> show parameter sga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 32G
sga_target big integer 32G
SQL> show parameter db_cache
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_cache_advice string ON
db_cache_size big integer 22G
SQL> show parameter shared_pool
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
shared_pool_reserved_size big integer 510027366
shared_pool_size big integer 8G
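The SGA is managed by ASMM (Automatic Shared Memory Management). sga_target is 32G, and the non-zero db_cache_size (22G) and shared_pool_size (8G) act as minimum sizes for those components.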
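A back-of-the-envelope sketch of how much memory ASMM can actually move around. It assumes the three parameters shown above are the only explicit minimums and ignores fixed SGA, the redo log buffer and other allocations that also live inside sga_target, so the result is an upper bound.

```python
# Values from "show parameter" above, in GB. Under ASMM (sga_target > 0), a
# non-zero component parameter is a floor that auto-tuning will not shrink below.
SGA_TARGET_GB = 32
FLOORS_GB = {"db_cache_size": 22, "shared_pool_size": 8, "large_pool_size": 0}


def tunable_headroom_gb(sga_target_gb, floors_gb):
    """Memory above the explicit floors that ASMM can move between components.

    Rough figure only: fixed SGA, the log buffer and other pools also have to
    fit into this remainder.
    """
    return sga_target_gb - sum(floors_gb.values())


print(tunable_headroom_gb(SGA_TARGET_GB, FLOORS_GB))  # 2
```

At most about 2G can float between the auto-tuned components, so the large pool has to compete for a small remainder whenever the buffer cache or shared pool grow above their floors.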
2.2 Checking the large pool size and automatic resizing
SQL> show parameter large
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
large_pool_size big integer 0
use_large_pages string TRUE
SQL> select t.*
       from (select name,
                    bytes / (1024 * 1024) "MB",
                    round(bytes / (select value
                                     from v$parameter t
                                    where t.name = 'shared_pool_size') * 100,
                          2) || '%' "USED%"
               from v$sgastat
              where pool = 'large pool'
              order by 2 desc) t
      where rownum < 20;
NAME MB USED%
-------------------------- ---------- -----------------------------------------
free memory 119.8125 1.46%
PX msg pool 7.8125 .1%
ASM map operations hashta .375 0%
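Note how the USED% column is computed: the query divides each large pool row by shared_pool_size, not by the large pool's own size. The short sketch below reproduces the column from the MB values (copied from the output above) to make that explicit.

```python
# The query above divides each large pool row by shared_pool_size (8G), so
# "USED%" is a share of the shared pool parameter, not of the large pool.
SHARED_POOL_MB = 8 * 1024

rows_mb = {"free memory": 119.8125, "PX msg pool": 7.8125, "ASM map operations hashta": 0.375}


def used_pct(mb, base_mb=SHARED_POOL_MB):
    """Replicates round(bytes / shared_pool_size * 100, 2) from the query."""
    return round(mb / base_mb * 100, 2)


for name, mb in rows_mb.items():
    print(f"{name:<27}{mb:>10} MB {used_pct(mb)}%")

# The three rows add up to the large pool's current total size.
print(sum(rows_mb.values()))  # 128.0
```

The results (1.46, 0.1 and 0) match the USED% column. The three rows also total exactly 128 MB, the size the large pool is repeatedly shrunk back to in the v$sga_resize_ops output below; about 120 MB of it was free when the query ran.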
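The SGA is under ASMM and the large pool has no minimum size (large_pool_size = 0); its current usage looks normal. However, because components are resized automatically, a resize operation can squeeze the memory available to the large pool.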
SQL> select start_time,
            component,
            oper_type,
            oper_mode,
            initial_size / 1024 / 1024 "INITIAL",
            final_size / 1024 / 1024 "FINAL",
            end_time
       from v$sga_resize_ops
      where component in ('large pool')
      order by start_time, component;
START_TIME COMPONENT OPER_TYPE OPER_MODE INITIAL FINAL END_TIME
------------------- ------------------------- ------------- --------- -------------------- -------------------- -------------------
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:03
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:03
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
30/01/2017 03:02:01 large pool GROW IMMEDIATE 192 256 30/01/2017 03:02:02
......
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:35:47 large pool SHRINK DEFERRED 384 128 04/02/2017 02:35:47
......
04/02/2017 03:01:56 large pool GROW IMMEDIATE 320 384 04/02/2017 03:01:57
04/02/2017 03:01:56 large pool GROW IMMEDIATE 320 384 04/02/2017 03:01:57
04/02/2017 03:01:56 large pool GROW IMMEDIATE 320 384 04/02/2017 03:01:57
04/02/2017 03:01:56 large pool GROW IMMEDIATE 320 384 04/02/2017 03:01:57
04/02/2017 03:01:56 large pool GROW IMMEDIATE 320 384 04/02/2017 03:01:57
04/02/2017 03:01:56 large pool GROW IMMEDIATE 320 384 04/02/2017 03:01:57
04/02/2017 03:04:24 large pool SHRINK DEFERRED 384 128 04/02/2017 03:04:24
The large pool is grown and shrunk quite frequently.
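To put a number on "quite frequently", the printed rows can be tallied per day and operation type. A minimal sketch, assuming the rows are pasted as text from the SQL*Plus output above (dates are DD/MM/YYYY). summarize_resize_ops is a hypothetical helper; it only counts what it is given, so feed it the full, unelided output for real figures.

```python
import re
from collections import Counter

# One v$sga_resize_ops row as printed above (START_TIME is DD/MM/YYYY HH24:MI:SS).
ROW_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4}) \d{2}:\d{2}:\d{2}\s+(.+?)\s+(GROW|SHRINK)\s+(\w+)\s+(\d+)\s+(\d+)"
)


def summarize_resize_ops(lines):
    """Count operations per (day, oper_type) and record the size range in MB."""
    counts, sizes = Counter(), []
    for line in lines:
        m = ROW_RE.match(line.strip())
        if not m:
            continue  # headers, dashes and "......" elisions
        day, _component, oper, _mode, initial, final = m.groups()
        counts[(day, oper)] += 1
        sizes += [int(initial), int(final)]
    size_range = (min(sizes), max(sizes)) if sizes else None
    return counts, size_range


sample = """\
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:35:47 large pool SHRINK DEFERRED 384 128 04/02/2017 02:35:47
......
04/02/2017 03:04:24 large pool SHRINK DEFERRED 384 128 04/02/2017 03:04:24
""".splitlines()

counts, size_range = summarize_resize_ops(sample)
for (day, oper), n in sorted(counts.items()):
    print(day, oper, n)
print("size range (MB):", size_range)
```

Each GROW in the output adds 64 MB (192 to 256, 320 to 384), presumably one SGA granule, while each deferred SHRINK drops the pool straight from 384 back to 128 MB.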
2.3 Checking the parallel parameters
The incident trace shows that the failing SQL runs with a high degree of parallelism (DOP 64). Check the parallel-related parameters:
SQL> show parameter cpu_count
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 16
SQL> show parameter parallel_max
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
parallel_max_servers integer 640
A DOP of 64 is high for a host whose cpu_count is only 16; we recommend lowering the degree of parallelism.
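To put DOP 64 in context, the sketch below does the arithmetic behind Oracle's defaults. It assumes parallel_threads_per_cpu = 2 (its usual, OS-dependent default; the value is not shown above), a single instance, and the 11.2 documented default formulas for the default DOP and for parallel_max_servers. Check the real parameter values before relying on the numbers.

```python
# Assumptions (not shown in the output above): parallel_threads_per_cpu = 2,
# a single instance, and the 11.2 default formulas from the Oracle Reference.
THREADS_PER_CPU = 2
CPU_COUNT = 16


def default_dop(cpu_count, threads_per_cpu=THREADS_PER_CPU, instances=1):
    return threads_per_cpu * cpu_count * instances


def default_parallel_max_servers(cpu_count, threads_per_cpu=THREADS_PER_CPU,
                                 concurrent_parallel_users=4):
    # concurrent_parallel_users is 4 when SGA_TARGET or MEMORY_TARGET is set
    return threads_per_cpu * cpu_count * concurrent_parallel_users * 5


def px_servers(dop, server_sets=2):
    # A producer/consumer plan (e.g. a scan feeding a load) can use two server sets
    return dop * server_sets


print(default_dop(CPU_COUNT))                   # 32
print(default_parallel_max_servers(CPU_COUNT))  # 640
print(px_servers(64))                           # 128
```

Under these assumptions the computed default for parallel_max_servers is exactly 640, which suggests the parameter was left at its default. A single insert ... select at DOP 64 can then claim up to 128 PX servers, 8 per CPU, and the PX msg pool that buffers messages between them grows with the DOP. Slave names up to p092 in the alert log are consistent with more than 64 servers being active.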
2.4 Detailed alert log analysis
Memory Notification: Library Cache Object loaded into SGA
Heap size 51201K exceeds notification threshold (51200K)
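This notification means that the heap of a library cache object loaded into the SGA has exceeded a size threshold, which is controlled by the hidden parameter _kgl_large_heap_warning_threshold. The notification was introduced in 10gR2. On its own it does not indicate a problem; what matters is whether ORA-04031 errors follow.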
Reference:
Memory Notification: Library Cache Object loaded into SGA / ORA-600 [KGL-heap-size-exceeded] (Doc ID 330239.1)
#### noap_j000_4031.trc ####
Memory Notification: Library Cache Object loaded into SGA
Heap size 73935K exceeds notification threshold (51200K)
LibraryHandle: Address=0x855ade650 Hash=70548654 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=alter table MOD_CDR_HW drop partition SYS_P1399962
FullHashValue=3aa1433897dd4d6fc458246c70548654 Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1884587604 OwnerIdn=83
Statistics: InvalidationCount=0 ExecutionCount=0 LoadCount=2 ActiveLocks=1 TotalLockCount=1 TotalPinCount=1
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 BucketInUse=0 HandleInUse=0 HandleReferenceCount=0
Concurrency: DependencyMutex=0x855ade700(0, 1, 0, 0) Mutex=0x855ade780(1011, 21, 0, 6)
Flags=RON/PIN/TIM/PN0/DBN/[10012841]
WaitersLists:
Lock=0x855ade6e0[0x855ade6e0,0x855ade6e0]
Pin=0x855ade6c0[0x855ade6c0,0x855ade6c0]
Timestamp: Current=02-04-2017 02:00:34
HandleReference: Address=0x855ade820 Handle=(nil) Flags=[00]
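The trace file records the statement that triggered the notification. The alert log prints the statement as the KGL object name, and there too it is a partition drop: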
KGL object name :alter table MOD_JS_CDR_HW drop partition SYS_P1404186
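Converting the notification values makes the scale clearer. The sketch below is a hypothetical helper that reads only the "Heap size NNNK exceeds notification threshold" lines quoted in this article; the threshold value comes from the log itself.

```python
import re

HEAP_RE = re.compile(r"Heap size (\d+)K exceeds notification threshold \((\d+)K\)")


def heap_notifications(lines):
    """Return (heap_kb, threshold_kb, heap_mb, times_over) per notification line."""
    out = []
    for line in lines:
        m = HEAP_RE.search(line)
        if m:
            heap_kb, thr_kb = map(int, m.groups())
            out.append((heap_kb, thr_kb, round(heap_kb / 1024, 1), round(heap_kb / thr_kb, 2)))
    return out


lines = [
    "Heap size 51201K exceeds notification threshold (51200K)",   # SZ1X.IN_JS_CDR_HW_AC_TI
    "Heap size 331660K exceeds notification threshold (51200K)",  # drop partition on MOD_JS_CDR_HW
    "Heap size 73935K exceeds notification threshold (51200K)",   # from noap_j000_4031.trc
]
print("threshold in bytes:", 51200 * 1024)  # 52428800, i.e. 50 MB
for row in heap_notifications(lines):
    print(row)
```

The 51200K threshold is 50 MB, and the drop-partition cursor's heap is about 324 MB, more than six times the threshold. A single DDL cursor of that size is the kind of shared pool consumer examined in section 2.5.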
Next, look at the large pool errors:
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p085_5172.trc (incident=109601):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_109601/noap_p085_5172_i109601.trc
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p034_5070.trc (incident=101439):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_101439/noap_p034_5070_i101439.trc
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p092_5186.trc (incident=109832):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_109832/noap_p092_5186_i109832.trc
Sat Feb 04 02:32:34 2017
Errors in file /app/oracle/diag/rdbms/noap/noap/trace/noap_p051_5104.trc (incident=107372):
ORA-04031: unable to allocate 2048024 bytes of shared memory ("large pool","unknown object","large pool","PX msg pool")
Incident details in: /app/oracle/diag/rdbms/noap/noap/incident/incdir_107372/noap_p051_5104_i107372.trc
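Comparing the error time (02:32:34) with the v$sga_resize_ops output above, the errors fall inside a large pool resize cycle: a burst of GROW operations from 320M to 384M starting at 02:32:32, followed by a deferred SHRINK back to 128M at 02:35:47: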
04/02/2017 02:32:32 large pool GROW IMMEDIATE 320 384 04/02/2017 02:32:33
04/02/2017 02:35:47 large pool SHRINK DEFERRED 384 128 04/02/2017 02:35:47
Reference:
Multiple ORA-4031 Errors Of Reducing Sizes For "PX msg pool" In The Large Pool (Doc ID 1515877.1)
This matches Bug 13072654 - ORA-4031 CANT ALLOC 14MB IN LARGE POOL, PX MSG POOL. A one-off patch for this bug exists for 11.2.0.2 and could be applied; alternatively, set a minimum size for the large pool, or switch the SGA to manual memory management.
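The timing match between the errors and the resize operations was done by eye. A minimal sketch to make it repeatable, assuming the resize rows are copied from v$sga_resize_ops and the alert log error time is rewritten in the same DD/MM/YYYY format. ops_near is a hypothetical helper, and the five-minute window is an arbitrary choice.

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"  # START_TIME / END_TIME format of the v$sga_resize_ops output

# (start, oper_type, initial_mb, final_mb, end) copied from the rows quoted above
RESIZE_OPS = [
    ("04/02/2017 02:32:32", "GROW", 320, 384, "04/02/2017 02:32:33"),
    ("04/02/2017 02:35:47", "SHRINK", 384, 128, "04/02/2017 02:35:47"),
]


def ops_near(error_time, ops, window=timedelta(minutes=5)):
    """Resize operations overlapping [error_time - window, error_time + window].

    Returns (oper_type, initial_mb, final_mb, seconds from error to op start).
    """
    t = datetime.strptime(error_time, FMT)
    hits = []
    for start, oper, initial, final, end in ops:
        s, e = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
        if s - window <= t <= e + window:
            hits.append((oper, initial, final, (s - t).total_seconds()))
    return hits


# The ORA-04031 errors were logged at Sat Feb 04 02:32:34 2017
for hit in ops_near("04/02/2017 02:32:34", RESIZE_OPS):
    print(hit)
```

The output places the errors 2 seconds after the 320 MB to 384 MB GROW started and 193 seconds before the deferred SHRINK to 128 MB. This shows timing only; it does not by itself prove which resize caused the failed allocation.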
2.5 Checking the shared pool size
Because the alert log also contains library cache object (LCO) notifications, check the current shared pool usage:
SQL> select t.*
       from (select name,
                    bytes / (1024 * 1024) "MB",
                    round(bytes / (select value
                                     from v$parameter t
                                    where t.name = 'shared_pool_size') * 100,
                          2) || '%' "USED%"
               from v$sgastat
              where pool = 'shared pool'
              order by 2 desc) t
      where rownum < 20;
NAME MB USED%
-------------------------- -------------------- -----------------------------------------
free memory 2971.667167663574219 36.28%
PRTMV 2858.977592468261719 34.9%
SQLA 1862.341529846191406 22.73%
PRTDS 614.0089035034179688 7.5%
KQR M PO 315.4219131469726563 3.85%
KGLH0 199.7758560180664063 2.44%
dbktb: trace buffer 81.90625 1%
FileOpenBlock 60.796417236328125 .74%
ASM extent pointer array 52.86400604248046875 .65%
db_block_hash_buckets 44.50390625 .54%
dbwriter coalesce buffer 32.03125 .39%
ASH buffers 32 .39%
KGLHD 29.06992340087890625 .35%
kglsim object batch 19.32781219482421875 .24%
private strands 17.5341796875 .21%
Checkpoint queue 15.6328125 .19%
event statistics per sess 15.33984375 .19%
write state object 14.6377716064453125 .18%
ksunfy : SSO free list 14.32470703125 .17%
PRTMV is an unfamiliar component, and here it occupies about 2.8G. For comparison, the same query on another database returns:
NAME MB USED%
-------------------------- ---------- -----------------------------------------
free memory 6021.38947 58.8%
SQLA 1472.31024 14.38%
KGLH0 1264.46631 12.35%
PRTMV 219.83268 2.15%
KGLHD 199.816628 1.95%
db_block_hash_buckets 178.003906 1.74%
dbktb: trace buffer 102.390625 1%
ASH buffers 96 .94%
dbwriter coalesce buffer 80.078125 .78%
FileOpenBlock 71.1162643 .69%
KGLDA 65.826004 .64%
Checkpoint queue 46.8984375 .46%
KKSSP 40.4567947 .4%
private strands 25.9765625 .25%
dirty object counts array 24 .23%
event statistics per sess 22.8779297 .22%
ksunfy : SSO free list 21.7646484 .21%
parameter table block 19.9453812 .19%
KGLS 19.5513763 .19%
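Putting the two listings side by side makes the outlier explicit. The MB values below are copied from the two outputs. The sketch assumes the other database ran the same query, so its USED% is relative to its own shared_pool_size; that value is not shown, so the sketch back-calculates it from the free memory row (an inference, about 10G).

```python
# MB values copied from the two v$sgastat listings above.
THIS_DB = {"PRTMV": 2858.977592468261719, "SQLA": 1862.341529846191406, "KGLH0": 199.7758560180664063}
OTHER_DB = {"PRTMV": 219.83268, "SQLA": 1472.31024, "KGLH0": 1264.46631}
THIS_SHARED_POOL_MB = 8 * 1024  # shared_pool_size = 8G on the problem database


def implied_base_mb(mb, pct):
    """Back-calculate the divisor of a USED% column (assumed to be shared_pool_size)."""
    return mb / (pct / 100)


# The other database's shared_pool_size is not shown; infer it from its free memory row.
other_shared_pool_mb = round(implied_base_mb(6021.38947, 58.8))  # 10240, i.e. about 10G

for name in THIS_DB:
    a, b = THIS_DB[name], OTHER_DB[name]
    print(f"{name:<6} this={a:8.1f} MB ({a / THIS_SHARED_POOL_MB:5.1%})  "
          f"other={b:8.1f} MB ({b / other_shared_pool_mb:5.1%})  ratio={a / b:5.2f}")
```

PRTMV comes out about 13 times larger in absolute terms and takes roughly 35% of shared_pool_size here versus about 2% on the other database, while SQLA differs by only about 1.3 times.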
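The comparison suggests this value is abnormal (about 2.8G here versus about 220M on the other database). A search of My Oracle Support (MOS) did turn up related bugs: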
Bug 19461270 - high PRTMV allocations in shared pool executing concurrent DML and DDLs on interval partitioned tables (Doc ID 19461270.8)
Description
Concurrent DDLs and DMLs happening on interval partitioned table that was created with deferred segment creation clause may do high PRTMV allocations.
Workaround
Do not run DDLs concurrently.
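This bug can be triggered on interval-partitioned tables, which fits the current symptoms well: around 02:00 to 02:10 the alert log shows an interval partition being added to SZ1X.MOD_CDR_HW and partitions being dropped from MOD_CDR_HW and MOD_JS_CDR_HW.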
Bug 17037130 - Excess shared pool "PRTMV" memory use / ORA-4031 with partitioned tables (Doc ID 17037130.8)
Description
This bug is only relevant when using Partitioned Tables
SQL on a partitioned table may cause excess shared pool usage and
ultimately fail with ORA-4031.
Rediscovery Notes:
ORA-4031 with child cursor(s) having dependency table entries
referencing obsolete (OBS) multi-versioned objects.
Workaround
Flushing the shared_pool and avoiding DDLs during high load time
can help to avoid this issue.
3. Recommendations
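Based on the analysis above, the large pool ORA-04031 errors are most likely related to the large pool being resized and shrunk. There is also a separate problem in the shared pool.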
For the large pool bug: this database runs 11.2.0.2 with no PSU applied. A one-off patch for the bug exists for 11.2.0.2; if the patch is not applied, the following workarounds can be considered:
- Set a minimum size for the large pool to prevent frequent shrinking. The database uses ASMM, and db_cache (22G) and shared_pool (8G) already have minimum values; we suggest setting large_pool_size to 200M:
alter system set large_pool_size=200M scope=spfile sid='*';
If parallel jobs are still affected frequently, apply the one-off patch or switch to manual memory management.
- The parallel jobs use a DOP of 64, which is high for a host with cpu_count = 16; reduce the degree of parallelism.
For the shared pool problem: the database runs base 11.2.0.2 without a PSU, and neither of the two bugs has a one-off patch for 11.2.0.2 on Linux. If an immediate upgrade to 11.2.0.3 or later is not possible, we recommend:
- According to its description, bug 19461270 involves not only interval partitioning but also deferred segment creation, a new feature in 11g. We recommend disabling this feature:
alter system set deferred_segment_creation=false scope=spfile sid='*';
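The other bug, 17037130, is unrelated to deferred segment creation according to its description. After applying the setting above, keep monitoring; the temporary workaround is to flush the shared pool or avoid running DDL during peak hours.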
To release the memory already held by PRTMV, flush the shared pool manually (alter system flush shared_pool) during an off-peak window.