Something for free (server-side variables on the JDBC client)

The two code snippets below give the same result, but the first one does not require a network round-trip or an additional SQL execution at all.

  • String sidKey = ((oracle.jdbc.driver.OracleConnection) connection).getServerSessionInfo().getProperty("AUTH_SESSION_ID");
  • ResultSet rsSid = connection.createStatement().executeQuery("select SYS_CONTEXT('USERENV', 'SID') from dual");
    int sid = 0;
    if (rsSid.next()) {
        sid = rsSid.getInt(1);
    }
  • A full list of the available properties as of Oracle Database 11.2.0.4 with the ojdbc6.jar JDBC driver:
"INSTANCE_NAME"
"AUTH_NLS_LXCSTMPFM"
"AUTH_VERSION_STATUS"
"AUTH_SC_SERVER_HOST"
"SESSION_NLS_LXCNCHAREXCP"
"SERVER_HOST"
"AUTH_NLS_LXCTTZNFM"
"AUTH_SVR_RESPONSE"
"AUTH_FAILOVER_ID"
"AUTH_SERIAL_NUM"
"AUTH_NLS_LXLAN"
"AUTH_SC_INSTANCE_START_TIME"
"AUTH_SC_INSTANCE_NAME"
"AUTH_DBNAME"
"AUTH_SC_DB_DOMAIN"
"SESSION_NLS_LXCCHARSET"
"AUTH_DB_ID"
"AUTH_INSTANCENAME"
"AUTH_VERSION_SQL"
"AUTH_NLS_LXCTERRITORY"
"SESSION_TIME_ZONE"
"AUTH_INSTANCE_NO"
"DATABASE_NAME"
"AUTH_SC_SERVICE_NAME"
"AUTH_SERVER_PID"
"AUTH_NLS_LXCSORT"
"AUTH_NLS_LXCISOCURR"
"AUTH_VERSION_NO"
"AUTH_NLS_LXCDATELANG"
"SERVICE_NAME"
"AUTH_NLS_LXCSTZNFM"
"AUTH_SC_DBUNIQUE_NAME"
"AUTH_NLS_LXCDATEFM"
"AUTH_NLS_LXCUNIONCUR"
"AUTH_VERSION_STRING"
"AUTH_SESSION_ID"
"SESSION_NLS_LXCNLSLENSEM"
"AUTH_NLS_LXCNUMERICS"
"AUTH_XACTION_TRAITS"
"AUTH_SC_INSTANCE_ID"
"AUTH_SC_SVC_FLAGS"
"AUTH_NLS_LXCTIMEFM"
"AUTH_DB_MOUNT_ID"
"AUTH_CAPABILITY_TABLE"
"AUTH_USER_ID"
"AUTH_NLS_LXCCURRENCY"
"AUTH_NLS_LXCCALENDAR"

Effectiveness of CU compression and _inmemory_imcu_source_maxbytes

It seems fairly obvious that the effectiveness of compression depends on the size of the unit that is considered for compression.
The amount of data read per in-memory load operation appears to be driven by the _inmemory_imcu_source_maxbytes hidden parameter,
so I decided to build a rather artificial example.

  1. Let's create a tablespace with a uniform extent size:

    CREATE TABLESPACE DATA_U01 DATAFILE '/home/oracle/app/oracle/oradata/cdb1/orcl/DATA_U01_01.dbf'
    SIZE 500M
    EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;

  2. Let's create a segment of about 1M:

    CREATE TABLE UNI2 TABLESPACE DATA_U01
    AS SELECT (1e11) + mod(rownum, 1*5e4) +1 id, (1e1) + mod(rownum, 1*5e4) id2
    FROM dual connect by level <=1*5e4;

    The statement above creates a table of about 1M with fully unique values.

  3. I am going to play with the size of the unique part while keeping the overall size quite big (40M).
  4. (figure: IM_compr_table)
  5. A few remarks:
    If the unique part is larger than the segment itself, you cannot expect effective compression.
    If the unique part is larger than a "single" load unit, you cannot expect effective compression either;
    this is a quite natural hypothesis, and I am going to prove it as well.
  6. My parameterized CTAS statement looks like:

    CREATE TABLE REMIDM.UNI2 TABLESPACE DATA_U01 AS select (1e11) + mod(rownum, &uq_size*5e4) +1 id, (1e1) + mod(rownum, &uq_size*5e4) id2 from dual connect by level <=40*5e4;

    So I took measurements for two "_inmemory_imcu_source_maxbytes" settings,
    i.e. 4M and 16M, loading the 40M segment with the unique part varied from 1M to 8M; you can see the results below:

  7. (figure: IM_compression)
  8. I measured the in-memory sizes as:

    select bytes/1024/1024 mbytes, inmemory_size/1024/1024 inmem from v$im_segments WHERE populate_status='COMPLETED';

  9. You can see that once the "unique" part reaches a value close to the "_inmemory_imcu_source_maxbytes" size, the compression ratio stops changing and settles at a "negative" value of roughly 1.2 (about 50M in memory versus 40M on disk). A sketch of one full measurement round is shown right after this list.
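
For completeness, here is a sketch of one measurement round. Setting the hidden parameter and marking the table INMEMORY are not shown above, so treat those steps as my assumptions and run them only on a test instance:

-- one measurement round (sketch; test instance only)
-- 1. set the load-unit size, e.g. 4M (hidden parameter; may require SCOPE=SPFILE and an instance restart)
ALTER SYSTEM SET "_inmemory_imcu_source_maxbytes" = 4194304 SCOPE = SPFILE;

-- 2. rebuild the 40M segment with the chosen unique part (&uq_size)
DROP TABLE REMIDM.UNI2 PURGE;
CREATE TABLE REMIDM.UNI2 TABLESPACE DATA_U01
AS SELECT (1e11) + MOD(ROWNUM, &uq_size*5e4) + 1 id,
          (1e1)  + MOD(ROWNUM, &uq_size*5e4)     id2
FROM   dual CONNECT BY LEVEL <= 40*5e4;

-- 3. mark the table INMEMORY and let it populate (step assumed, not shown in the post)
ALTER TABLE REMIDM.UNI2 INMEMORY PRIORITY CRITICAL;

-- 4. compare on-disk vs in-memory size once population completes
SELECT bytes/1024/1024               disk_mb,
       inmemory_size/1024/1024       inmem_mb,
       ROUND(bytes/inmemory_size, 2) comp_ratio
FROM   v$im_segments
WHERE  populate_status = 'COMPLETED';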

List of "hiddens" for the In-Memory option
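
The list below was presumably produced by a query against x$ksppi / x$ksppcv similar to the following sketch (run as SYS); the decode of ksppity is my assumption, which would also explain the raw value 6 shown for some parameters:

-- hidden parameters related to the In-Memory option (sketch, run as SYS)
SELECT x.ksppinm  name,
       y.ksppstvl value,
       y.ksppstdf deflt,
       DECODE(x.ksppity, 1, 'boolean', 2, 'string', 3, 'number',
                         4, 'file', TO_CHAR(x.ksppity)) type,
       x.ksppdesc description
FROM   x$ksppi x, x$ksppcv y
WHERE  x.indx = y.indx
AND   (x.ksppinm LIKE '\_inmemory%' ESCAPE '\'
       OR x.ksppinm LIKE '\_optimizer\_inmemory%' ESCAPE '\'
       OR x.ksppinm LIKE '\_parallel\_inmemory%' ESCAPE '\')
ORDER BY y.ksppstvl, x.ksppinm;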

NAME  VALUE  DEFLT  TYPE  DESCRIPTION
_inmemory_repopulate_threshold_blocks TRUE number In-memory repopulate threshold number of modified blocks
_inmemory_repopulate_threshold_rows TRUE number In-memory repopulate threshold number of modified rows
_inmemory_fs_blk_inv_blkcnt TRUE boolean in-memory faststart CU invalidation threshold(blocks)
_inmemory_repopulate_threshold_mintime 0 TRUE number In-memory repopulate minimum interval (millisec)
_inmemory_imcu_target_bytes 0 TRUE number IMCU target size in bytes
_inmemory_repopulate_threshold_scans 0 TRUE number In-memory repopulate threshold number of scans
_inmemory_min_ima_defersize 0 TRUE 6 Defer in-memory area allocation beyond this size
_inmemory_jscan 0 TRUE number inmemory jscan enable
_inmemory_query_check 0 TRUE number In-memory query checking
_inmemory_journal_check 0 TRUE number Depending on value does one of the DML verifications
_inmemory_prepopulate_fg 0 TRUE number Force prepopulate of in-memory segment in foreground
_inmemory_imcu_source_extents 0 TRUE number number of source extents per IMCU
_inmemory_txn_checksum 0 TRUE number checksum for SMUs and private journals
_optimizer_inmemory_quotient 0 TRUE number in-memory quotient (% of rows in in-memory format)
_inmemory_imcu_source_blocks 0 TRUE number number of source blocks per IMCU
_inmemory_analyzer_optimize_for 0 TRUE number inmemory analyzer optimize for
_inmemory_trickle_repopulate_threshold_dirty_ratio 0 TRUE number IMCO Trickle Repopulate threshold dirty ratio
_inmemory_strdlxid_timeout 0 TRUE number max time to determine straddling transactions
_inmemory_segment_populate_verify 0 TRUE number In-memory segment populate verification
_inmemory_dbg_scan 0 TRUE number In-memory scan debugging
_inmemory_trickle_repopulate_fg 0 TRUE number Trickle Repopulate in the Foreground
_inmemory_test_verification 0 TRUE number In-memory verification testing
_inmemory_log_level 1 TRUE number in-memory log level
_inmemory_enable_population_verify 1 TRUE number verify in-memory population
_parallel_inmemory_time_unit 1 TRUE number unit of work used to derive the degree of parallelism for in-memory tables (in seconds)
_inmemory_repopulate_threshold_blocks_percent 10 TRUE number In-memory repopulate threshold blocks invalid percentage
_optimizer_inmemory_pruning_ratio_rows 100 TRUE number in-memory pruning ratio for # rows (% of rows remaining after pruning)
_inmemory_private_journal_quota 100 TRUE number quota for transaction in-memory private journals
_inmemory_pct_inv_blocks_invalidate_imcu 100 TRUE number In-memory percentage invalid blocks for IMCU invalidation
_inmemory_repopulate_priority_scale_factor 100 TRUE number In-memory repopulate priority threshold scale factor
_inmemory_repopulate_invalidate_rate_percent 100 TRUE number In-memory repopulate invalidate rate percent
_inmemory_buffer_waittime 100 TRUE number wait interval for one SMU or IMCU to be freed
_inmemory_cu_timeout 100 TRUE number maximum wait time for one IMCU to be freed
_inmemory_exclto_timeout 1000 TRUE number maximum wait time to pin SMU for cleanout
_inmemory_cudrop_timeout 1000 TRUE number maximum wait time for IMCU to be freed during drop
_inmemory_rows_check_interrupt 1000 TRUE number Number of rows buffered before interrupt check
_inmemory_imcu_target_rows 1048576 TRUE number IMCU target number of rows
_inmemory_imcu_source_minbytes 1048576 TRUE number number of minimum source bytes per IMCU
_inmemory_force_fs_tbs_size 1073741824 TRUE 6 in-memory faststart force tablespace size
_inmemory_imco_cycle 120 TRUE number IMCO cycle in seconds (sleep period)
_inmemory_private_journal_numgran 128 TRUE number Number of granules per HT node
_inmemory_imcu_source_analyze_bytes 134217728 TRUE number number of source analyze bytes per IMCU
_inmemory_pin_hist_mode 16 TRUE number settings for IM pinned buffer history
_inmemory_fs_blk_inv_blk_percent 20 TRUE number in-memory faststart CU invalidation threshold(blocks)
_inmemory_repopulate_priority_threshold_row 20 TRUE number In-memory repopulate priority threshold row
_inmemory_private_journal_sharedpool_quota 20 TRUE number quota for transaction in-memory objects
_inmemory_num_hash_latches 256 TRUE number Maximum number of latches for IM buffers
_inmemory_max_populate_retry 3 TRUE number IM populate maximum number of retry
_inmemory_64k_percent 30 TRUE number percentage of in-memory area for 64k pools
_inmemory_distribute_ondemand_timeout 300 TRUE number On demand timeout for redistribute
_inmemory_trickle_repopulate_min_interval 300 TRUE number IMCO Trickle Repopulate Interval
_inmemory_distribute_timeout 300 TRUE number If true, enable auto distribute with 2safe
_inmemory_max_queued_tasks 304 TRUE number Maximum queued populating tasks on the auxiliary queue
_inmemory_repopulate_priority_threshold_block 40 TRUE number In-memory repopulate priority threshold block
_inmemory_repopulate_threshold_mintime_factor 5 TRUE number In-memory repopulate minimum interval (N*timetorepop)
_inmemory_repopulate_threshold_rows_percent 5 TRUE number In-memory repopulate threshold rows invalid percentage
_inmemory_scan_threshold_percent_noscan 50 TRUE number In-memory scan threshold maximum percent dirty no scan
_inmemory_pct_inv_rows_invalidate_imcu 50 TRUE number In-memory percentage invalid rows for IMCU invalidation
_inmemory_private_journal_numbkts 512 TRUE number Number of priv jrnl ht bkts
_inmemory_imcu_populate_minbytes 5242880 TRUE 6 minimum free space in IMA for populating IMCU
_inmemory_imcu_source_maxbytes 536870912 TRUE number IMCU maximum source size in bytes
_inmemory_pga_per_server 536870912 TRUE 6 minimum pga needed per inmemory populate server
_inmemory_servers_throttle_pgalim_percent 55 TRUE number In-memory populate servers throttling pga limit percentage
_inmemory_populate_wait_max 600 TRUE number maximum wait time in seconds for segment populate
_inmemory_small_segment_threshold 65536 TRUE 6 In-memory small segment threshold (must be larger for in-memory)
_inmemory_imcu_target_maxrows 8388608 TRUE number IMCU maximum target number of rows
_inmemory_default_flags 8459 TRUE number Default flags based on inmemory_clause_default
_parallel_inmemory_min_time_threshold AUTO TRUE string threshold above which a plan is a candidate for parallelization for in-memory tables (in seconds)
_inmemory_checksum FALSE TRUE boolean If true, checksums in-memory area to detect stray writes
_inmemory_enable_stat_alert FALSE TRUE boolean dump in-memory stats in alert log file
_inmemory_check_protect FALSE TRUE boolean If true, marks in-memory area read only to prevent stray writes
_inmemory_default_new FALSE TRUE boolean Force in-memory on new tables
_inmemory_scan_override FALSE TRUE boolean In-memory scan override
_inmemory_suppress_vsga_ima FALSE TRUE boolean Suppress inmemory area in v$sga
_inmemory_fs_raise_error FALSE TRUE boolean in-memory faststart raise error
_inmemory_populate_fg FALSE TRUE boolean populate in foreground
_inmemory_journal_row_logging FALSE TRUE boolean If true, log the entire row into the in-memory journal
_inmemory_autodist_2safe FALSE TRUE boolean If true, enable auto distribute with 2safe
_inmemory_populate_wait FALSE TRUE boolean wait for population to complete
_inmemory_fs_enable FALSE TRUE boolean in-memory faststart enable
_inmemory_lock_for_smucreate FALSE TRUE boolean take object lock during smu creation
_inmemory_force_non_engineered FALSE TRUE boolean force non-engineered systems in-memory behavior on RAC
_inmemory_validate_fetch FALSE TRUE boolean If true, validate single-row fetch between in-memory and disk
_inmemory_fs_verify FALSE TRUE boolean in-memory faststart verify
_inmemory_repopulate_disable FALSE TRUE boolean disable In-memory repopulate
_inmemory_query_fetch_by_rowid FALSE TRUE boolean In-memory fetch-by-rowid enabled
_inmemory_enable_sys FALSE TRUE boolean enable in-memory on system tablespace with sys user
_inmemory_incremental_repopulation FALSE TRUE boolean If true, incremental repopulation of IMCU will be attempted
_inmemory_check_prot_meta FALSE TRUE boolean If true, marks SMU area read only to prevent stray writes
_inmemory_fs_nodml FALSE TRUE boolean in-memory faststart assumes no dmls while populating
_inmemory_force_fs FALSE TRUE boolean in-memory faststart force
_inmemory_pruning ON TRUE string In-memory pruning
_inmemory_force_fs_tbs SYSAUX TRUE string in-memory faststart force tablespace
_optimizer_inmemory_table_expansion TRUE FALSE boolean optimizer in-memory awareness for table expansion
_optimizer_inmemory_bloom_filter TRUE FALSE boolean controls serial bloom filter for in-memory tables
_optimizer_inmemory_cluster_aware_dop TRUE FALSE boolean Affinitize DOP for inmemory objects
_optimizer_inmemory_gen_pushable_preds TRUE FALSE boolean optimizer generate pushable predicates for in-memory
_optimizer_inmemory_minmax_pruning TRUE FALSE boolean controls use of min/max pruning for costing in-memory tables
_inmemory_auto_distribute TRUE TRUE boolean If true, enable auto distribute
_inmemory_fs_enable_blk_lvl_inv TRUE TRUE boolean in-memory faststart enable block level invalidation
_inmemory_memprot TRUE TRUE boolean enable or disable memory protection for in-memory
_inmemory_imcu_align TRUE TRUE boolean Enforce 8M IMCU alignment
_inmemory_invalidate_cursors TRUE TRUE boolean In-memory populate enable cursor invalidations
_inmemory_prepopulate TRUE TRUE boolean Enable inmemory populate by IMCO
_optimizer_inmemory_access_path TRUE FALSE boolean optimizer access path costing for in-memory
_optimizer_inmemory_autodop TRUE FALSE boolean optimizer autoDOP costing for in-memory
_inmemory_query_scan TRUE TRUE boolean In-memory scan enabled
_inmemory_trickle_repopulate TRUE TRUE boolean Enable inmemory trickle repopulate

Oracle 12c In-memory and IM scan CUs pruned mechanics

Recently I learned about an interesting optimization technique used by the Oracle In-Memory database engine: CU (compression unit) pruning
(see http://blogs.oracle.com/In-Memory/entry/getting_started_with_oracle_database2).
It looks similar to the trick used by Exadata storage indexes (or zone maps) to skip (prune) big data ranges by storing the min and max values of some columns
(see https://richardfoote.wordpress.com/2014/10/30/12-1-0-2-introduction-to-zone-maps-part-ii-changes/).

I almost immediately came up with the idea that it can also suffer from "bad" data clustering, in the same way as described by Richard Foote in his marvellous blog.

So I decided to check it, and here is the result.

I created a table as follows:

CREATE TABLE IDXTEST
AS select (1e&pw) id, 1e4 id2, rpad('a',999,'a') filler
from dual where rownum <2;

ALTER TABLE IDXTEST INMEMORY PRIORITY CRITICAL;

and populated it in 2 ways:

1st (good clustering):

insert into IDXTEST select 1e&pw + trunc(rownum/1)+1 id, mod(rownum,20) id2, rpad('a',990,'b') || rownum filler
from dual connect by level <=2e5;
commit;

2nd (bad clustering):

insert into IDXTEST select 1e&pw + trunc(rownum/1)+1 id, mod(rownum,20) id2, rpad('a',990,'b') || rownum filler
from dual connect by level <=2e5 order by dbms_random.value;
commit;

I used the following query to do "in-memory" scans:
VARIABLE id2 number
EXEC :id2 :=(1e&pw.)+ 1.5e5

SELECT /*+ dynamic_sampling(t 0) FULL(t)*/ count(*) from IDXTEST t
WHERE id >:id2;

You need to run it twice:
the first time to populate the in-memory column store,
and the second time to do a "pure" in-memory scan.
(You need to wait 1-2 minutes before the second run so that population of the in-memory column store can finish; you can monitor the population process with
SELECT populate_status, bytes_not_populated, bytes, inmemory_size from v$im_segments)
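
After each run I checked the relevant session statistics; a minimal sketch of such a check (the statistic names are the ones quoted below):

-- session-level statistics, queried right after each execution
SELECT sn.name, ms.value
FROM   v$mystat ms
JOIN   v$statname sn ON sn.statistic# = ms.statistic#
WHERE  sn.name IN ('IM scan CUs pruned',
                   'IM scan segments minmax eligible');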

and here are the corresponding statistics:

                                    1st (good clustering)   2nd (bad clustering)
IM scan CUs pruned                                      2                      0
IM scan segments minmax eligible                        5                      5

The moral: how your data is physically ordered/clustered is still very important, even when you read it from memory.

PS
I had a small hope that the Oracle In-Memory engine could reorder data while populating the column store, but that seems too expensive an operation to be true.

One more subquery factoring oddity/feature

Recently Jonathan Lewis published a marvellous blog series about subquery factoring: https://jonathanlewis.wordpress.com/2015/07/27/subquery-factoring-10/
I have managed to observe one more oddity: parallel queries with subquery factoring tend to have a PARALLEL TO SERIAL step in the execution plan.

I created a table as:

CREATE TABLE PTEST TABLESPACE USERS
AS select rpad('z',20,'z') || (trunc(rownum/2)+1) id, rpad('a',999,'a') filler from dual connect by level <=1e4 order by dbms_random.value;

and the parallel query looks like:

 

ALTER SESSION FORCE PARALLEL QUERY PARALLEL 4;

with p as (SELECT * from PTEST where id>'a')
SELECT * FROM p p0, p p1
WHERE p0.id=p1.id;

 

The corresponding execution plan is:

Plan hash value: 2939472082 

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 230 (100)| | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 10876 | 5661K| 110 (0)| 00:00:02 | Q1,00 | P->S | QC (RAND) |
| 4 | LOAD AS SELECT | | | | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 10876 | 5661K| 110 (0)| 00:00:02 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS FULL | PTEST | 10876 | 5661K| 110 (0)| 00:00:02 | Q1,00 | PCWP | |
| 7 | PX COORDINATOR | | | | | | | | |
| 8 | PX SEND QC (RANDOM) | :TQ20002 | 31719 | 32M| 120 (1)| 00:00:02 | Q2,02 | P->S | QC (RAND) |
|* 9 | HASH JOIN BUFFERED | | 31719 | 32M| 120 (1)| 00:00:02 | Q2,02 | PCWP | |
| 10 | PX RECEIVE | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,02 | PCWP | |
| 11 | PX SEND HASH | :TQ20000 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | P->P | HASH |
| 12 | VIEW | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | PCWP | |
| 13 | PX BLOCK ITERATOR | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | PCWC | |
|* 14| TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | PCWP | |
| 15 | PX RECEIVE | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,02 | PCWP | |
| 16 | PX SEND HASH | :TQ20001 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | P->P | HASH |
| 17 | VIEW | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | PCWP | |
| 18 | PX BLOCK ITERATOR | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | PCWC | |
|* 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | PCWP | |
--------------------------------------------------------------------------------------------

You can see the P->S part in the middle of the execution plan: | 3 | PX SEND QC (RANDOM) | :TQ10000 | .... | P->S |

SQL Plan Monitoring Details (Plan Hash Value=2939472082)                                  
=======================================================================================   
| Id |          Operation          |            Name            | ....... |    Rows   |   
|    |                             |                            | ....... |  (Actual) |   
=======================================================================================   
|  0 | SELECT STATEMENT            |                            |         |     19998 |   
|  1 |   TEMP TABLE TRANSFORMATION |                            |         |     19998 |   
|  2 |    PX COORDINATOR           |                            |         |         4 |   
|  3 |     PX SEND QC (RANDOM)     | :TQ10000                   |   ..... |         4 |   
|  4 |      LOAD AS SELECT         |                            |         |         4 |   
|  5 |       PX BLOCK ITERATOR     |                            |   ..... |     10000 |   
|  6 |        TABLE ACCESS FULL    | PTEST                      |   ..... |     10000 |   
|  7 |    PX COORDINATOR           |                            |         |     19998 |   
|  8 |     PX SEND QC (RANDOM)     | :TQ20002                   |   ......|     19998 |   
|  9 |      HASH JOIN BUFFERED     |                            |   ..... |     19998 |   
| 10 |       PX RECEIVE            |                            |   ..... |     10000 |   
| 11 |        PX SEND HASH         | :TQ20000                   |   ..... |     10000 |   
| 12 |         VIEW                |                            |   ..... |     10000 |   
| 13 |          PX BLOCK ITERATOR  |                            |   ..... |     10000 |   
| 14 |           TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 |   ..... |     10000 |   
| 15 |       PX RECEIVE            |                            |   ..... |     10000 |   
| 16 |        PX SEND HASH         | :TQ20001                   |   ..... |     10000 |   
| 17 |         VIEW                |                            |   ..... |     10000 |   
| 18 |          PX BLOCK ITERATOR  |                            |   ..... |     10000 |   
| 19 |           TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 |   ..... |     10000 |   
=======================================================================================   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

It seems only 4 rows were handled by the query coordinator (QC).
Let's have a look at what exactly was read by the QC. I will again use dtracelio.d developed by Alexander Anokhin.

After monitoring the QC process I got the following (where obj: -40016384 dobj: 4246144 represents a temporary object in the SYS schema):

================================= Logical I/O Summary (grouped by object) ============= 
 object_id  data_object_id       lio        cr    cr (e)    cr (d)        cu    cu (d)  
---------- --------------- --------- --------- --------- --------- --------- ---------  
        60              60         1         1         0         0         0         0  
 -40016384              -2         1         1         0         0         0         0  
         7               7         1         1         1         0         0         0  
        46              46         1         1         1         0         0         0  
        36              36         2         2         1         0         0         0  
        37              37         6         6         4         0         0         0  
       426             426        10        10         5         0         0         0  
        78              78        15        15         0         0         0         0  
         0              -1        29         7         7         0        22         0  
     77939           77939       151       151         0         0         0         0  
 -40016384         4246144       233        12         0         0       221       127  
---------- --------------- --------- --------- --------- --------- --------- ---------  
     total                       450       207        19         0       243       127  
======================================================================================= 
                                                                                        
Legend                                                                                  
lio      : logical gets (cr + cu)                                                       
cr       : consistent gets                                                              
cr (e)   : consistent gets - examination                                                
cr (d)   : consistent gets direct                                                       
cu       : db block gets                                                                
cu (d)   : db block gets direct                                                         
ispnd (Y): buffer is pinned count                                                       
ispnd (N): buffer is not pinned count                                                   
pin rls  : pin releases                                                                 

It seems quite a lot of LIO was done by the QC; however, let's look at the details.
You will see that the QC is reading the same 4 blocks again and again (mostly in "current" mode):

kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40ca80 (1/51840) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40c880 (1/51328) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbgcur(0xFFFF80FFBFFF88B0,2,65618,0) [tsn: 2 rdba: 0xc000d0 (3/208) obj: 0 dobj: -1] where: 82 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,128,0) [tsn: 2 rdba: 0xc000d0 (3/208) obj: 0 dobj: -1] where: 128 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c880 (1/51328) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c880 (1/51328) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcblnb_dscn(0xFFFF80FFBDDEAA00,18237071088943107,6,0)
kcblnb_dscn(0xFFFF80FFBDDEAA00,18237075383910403,6,0)
.......................................................................................
kcbgcur(0xFFFF80FFBFFF9640,2,247,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 247 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9740,2,248,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 248 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9640,2,338,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 338 mode_held: 2
kcbgcur(0xFFFF80FFBFFF92C0,2,257,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 257 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9640,2,249,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 249 mode_held: 2
kcbgcur(0xFFFF80FFBFFF89E0,2,65615,0) [tsn: 2 rdba: 0xc000d0 (3/208) obj: 0 dobj: -1] where: 79 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9C10,2,250,0) [tsn: 3 rdba: 0x40c47f (1/50303) obj: -40016384 dobj: 4246144] where: 250 mode_held: 2
kcbgcur(0xFFFF80FFBFFF8D00,2,771,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 771 mode_held: 2
kcbgcur(0xFFFF80FFBFFF8E00,2,772,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 772 mode_held: 2

And the final piece of the puzzle: let's query the buffer cache headers for the blocks that were read by the QC, like this:

select class,
decode(bh.state,0,'free',1,'xcur',2,'scur',3,'cr',4,'read',5,'mrec',
6,'irec',7,'write',8,'pi',9,'memory',10,'mwrite',
11,'donated',12,'protected',13,'securefile',14,'siop',15,'recckpt'
) state,
decode(bh.class,1,'data block',2,'sort block',3,'save undo block',
4,'segment header',5,'save undo header',6,'free list',7,'extent map',
8,'1st level bmb',9,'2nd level bmb',10,'3rd level bmb',11,'bitmap block',
12,'bitmap index block',13,'file header block',14,'unused',
15,'system undo header',16,'system undo block',17,'undo header',
18,'undo block'
) block_class,
decode(bitand(bh.flag,1),0,'N','Y') dirty,
decode(bitand(bh.flag,16),0,'N','Y') temp,
decode(bitand(bh.flag,1536),0,'N','Y') ping,
decode(bitand(bh.flag,16384),0,'N','Y') stale,
decode(bitand(bh.flag,65536),0,'N','Y') direct,
obj from x$bh bh where
1=1 AND obj=4246144;

4	free	segment header	N	N	N	N	N	4246144
4	free	segment header	N	N	N	N	N	4246144
4	free	segment header	N	N	N	N	N	4246144
4	free	segment header	N	N	N	N	N	4246144

So that parallel-to-serial step in the execution plan does not seem to be a big issue,
but the QC is constantly reading, and handing back to the consumers, the temporary segment header.
Actually it is segment headers, plural: the number of headers is equal to your degree of parallelism.
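
If that P->S step ever did become a concern, one way to avoid the temp-table transformation altogether would be to inline the factored subquery. A sketch follows; the INLINE hint is undocumented, so treat this as an assumption to verify on your own version:

-- force the factored subquery to be inlined instead of materialized into a SYS_TEMP_ segment
with p as (SELECT /*+ INLINE */ * from PTEST where id > 'a')
SELECT * FROM p p0, p p1
WHERE p0.id = p1.id;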

Oracle index access path visualization

The DTraceLIO script by Alexander Anokhin has inspired me to do
some investigation into Oracle index access internals.

The main idea is to combine the index tree structure with the output of DTrace LIO.

It would also be nice to combine LIO and PIO in a single output,
so I reworked the script a bit for my purposes.

PIO (physical IO) was the trickiest part.

I managed to find the following function in the call stack of the Oracle process:

kd4obj(0xA8BCA000,0,8192,0)
kd4obj(0xA8E68000,0,8192,0)
kd4obj(0xA9D18000,0,8192,0)
kd4obj(0xA8430000,0,8192,0)
kd4obj(0xA86E2000,0,8192,0)
kd4obj(0xA8944000,0,8192,0)
kd4obj(0xA8BCC000,0,8192,0)
kd4obj(0xA8942000,0,8192,0)

It correlates very well with the scattered read entries in the trace file.
So after some tryouts and guesses I came up with the following structure for the first argument of the function:

typedef struct kdbafr /* full relative dba */
{
    ktsn  tsn_kdbafr; /* a tablespace number */
    krdba dba_kdbafr; /* a relative dba */
} kdbafr;

My addition was:

pid$1::kd4obj:entry
{
    blk1 = ((kdbafr *) copyin(arg0, sizeof(kdbafr)));
    tsn = blk1->tsn_kdbafr;
    rdba = blk1->dba_kdbafr;
    rdba_file = rdba >> 22; /* for smallfile tablespaces */
    rdba_block = rdba & 0x3FFFFF;
}

pid$1::kd4obj:entry
{
    printf("%s(0x%X,%d,%d,%d) [tsn: %d rdba: 0x%x (%d/%d) ] \n", probefunc, arg0, arg1, arg2, arg3, tsn, rdba, rdba_file, rdba_block);
}

and a typical result looks like:

kcbgtcr(0xFFFF80FFB6EF5360,0,865,0) [tsn: 7 rdba: 0x1805816 (6/22550) obj: 77708 dobj: 77708] where: 865 exam: 0
kd4obj(0xB1EDE000,0,8192,0) [tsn: 41478 rdba: 0x1805810 (6/22544) ]
kd4obj(0xB10F8000,0,8192,0) [tsn: 41478 rdba: 0x1805811 (6/22545) ]
kd4obj(0xB13B0000,0,8192,0) [tsn: 41478 rdba: 0x1805812 (6/22546) ]
kd4obj(0xB1612000,0,8192,0) [tsn: 41478 rdba: 0x1805813 (6/22547) ]
kd4obj(0xB1896000,0,8192,0) [tsn: 41478 rdba: 0x1805814 (6/22548) ]
kd4obj(0xB1B2E000,0,8192,0) [tsn: 41478 rdba: 0x1805815 (6/22549) ]
kd4obj(0xB1B2C000,0,8192,0) [tsn: 41478 rdba: 0x1805816 (6/22550) ]
kd4obj(0xB1EE0000,0,8192,0) [tsn: 41478 rdba: 0x1805817 (6/22551) ]

So I guessed the RDBA part correctly, but the first part is still unknown to me (it is definitely not a tablespace number 🙂).
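
The RDBA decoding itself can be cross-checked from SQL with dbms_utility; for example, for block 0x1805816 (decimal 25188374) seen in the trace above:

-- verify rdba >> 22 / rdba & 0x3FFFFF (smallfile tablespaces) against dbms_utility
SELECT dbms_utility.data_block_address_file(25188374)  file_no,
       dbms_utility.data_block_address_block(25188374) block_no
FROM   dual;
-- expected: FILE_NO = 6, BLOCK_NO = 22550, matching (6/22550) above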

I used
ALTER SESSION SET EVENTS 'immediate trace name TREEDUMP level &obj_id';
to dump the index tree structure, and implemented a simple perl script to match the dtracelio output with the tree structure.
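
As a lighter-weight cross-check of the overall tree shape, INDEX_STATS can be used as well; a sketch, with a hypothetical index name:

-- VALIDATE STRUCTURE populates the single-row INDEX_STATS view for this session
ANALYZE INDEX my_test_idx VALIDATE STRUCTURE;
SELECT height, br_blks, lf_blks, blocks FROM index_stats;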

Here are the first results:

Single-block access by a unique index is shown on the left, and non-unique index access on the right.

(figure: Unique_single)
access by unique index

(figure: NonUnique_single)
access by non-unique index

Bars above zero denote index block accesses (the height of a bar is the height of the corresponding index block in the index tree hierarchy); leaf index blocks are coloured red. Bars below zero denote table block accesses (-1 is LIO and -7 is PIO).

Looking into the "raw" data for the pictures above, you can see that for non-unique index access the leaf block has been accessed twice; this explains the "famous" difference in consistent reads between UNIQUE and NON UNIQUE indexes (although I would say it does not actually matter in any real-life scenario).

LIO_IDX, 0x1400083, 4, 77795, branch
LIO_IDX, 0x14010cc, 3, 77795, branch
LIO_IDX, 0x14000c8, 2, 77795, branch
LIO_IDX, 0x1400086, 0, 77795, leaf, 0.2
LIO_TBL, 0x1804ee5, -1, 77794, table, , -1
PIO_TBL, 0x1804ee5,-7, , table , , , , -7
LIO_IDX, 0x1400086, 0, 77795, leaf, 0.2

PS

I am going to continue publishing further results of this investigation, and here is one picture from the forthcoming results 🙂

PPS
I need to emphasize that kd4obj does not represent an OS-level physical IO, but rather the step of returning a single block to the buffer cache from the kd* level.

(figure: Direct_access)