One more subquery factoring oddity/feature

Recently Jonathan Lewis published a mavelous blog series about subquery factoring: https://jonathanlewis.wordpress.com/2015/07/27/subquery-factoring-10/
I’ve managed to observed one more : parallel queries with subquery factoring tends to have PARALLEL TO SERIAL step in the execution plan.

I’ve created table as

CREATE TABLE PTEST TABLESPACE USERS
AS select rpad(‘z’,20,’z’) || (trunc(rownum/2)+1) id, rpad(‘a’,999,’a’) filler from dual connect by level <=1e4 order by dbms_random.value;

and the parallel query looks like:

 

ALTER SESSION FORCE PARALLEL QUERY PARALLEL 4;

with p as (SELECT * from PTEST where id>’a’)
SELECT * FROM p p0, p p1
WHERE p0.id=p1.id;

 

corresponding execution plan is :

Plan hash value: 2939472082 

-------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 230 (100)| | | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 10876 | 5661K| 110 (0)| 00:00:02 | Q1,00 | P->S | QC (RAND) |
| 4 | LOAD AS SELECT | | | | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 10876 | 5661K| 110 (0)| 00:00:02 | Q1,00 | PCWC | |
|* 6 | TABLE ACCESS FULL | PTEST | 10876 | 5661K| 110 (0)| 00:00:02 | Q1,00 | PCWP | |
| 7 | PX COORDINATOR | | | | | | | | |
| 8 | PX SEND QC (RANDOM) | :TQ20002 | 31719 | 32M| 120 (1)| 00:00:02 | Q2,02 | P->S | QC (RAND) |
|* 9 | HASH JOIN BUFFERED | | 31719 | 32M| 120 (1)| 00:00:02 | Q2,02 | PCWP | |
| 10 | PX RECEIVE | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,02 | PCWP | |
| 11 | PX SEND HASH | :TQ20000 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | P->P | HASH |
| 12 | VIEW | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | PCWP | |
| 13 | PX BLOCK ITERATOR | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | PCWC | |
|* 14| TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,00 | PCWP | |
| 15 | PX RECEIVE | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,02 | PCWP | |
| 16 | PX SEND HASH | :TQ20001 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | P->P | HASH |
| 17 | VIEW | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | PCWP | |
| 18 | PX BLOCK ITERATOR | | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | PCWC | |
|* 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 | 10876 | 5661K| 60 (0)| 00:00:01 | Q2,01 | PCWP | |
--------------------------------------------------------------------------------------------

You can see P->S part in the middle of the execution plan:| 3 | PX SEND QC (RANDOM) | :TQ10000 |…. | P->S |

SQL Plan Monitoring Details (Plan Hash Value=2939472082)                                  
=======================================================================================   
| Id |          Operation          |            Name            | ....... |    Rows   |   
|    |                             |                            | ....... |  (Actual) |   
=======================================================================================   
|  0 | SELECT STATEMENT            |                            |         |     19998 |   
|  1 |   TEMP TABLE TRANSFORMATION |                            |         |     19998 |   
|  2 |    PX COORDINATOR           |                            |         |         4 |   
|  3 |     PX SEND QC (RANDOM)     | :TQ10000                   |   ..... |         4 |   
|  4 |      LOAD AS SELECT         |                            |         |         4 |   
|  5 |       PX BLOCK ITERATOR     |                            |   ..... |     10000 |   
|  6 |        TABLE ACCESS FULL    | PTEST                      |   ..... |     10000 |   
|  7 |    PX COORDINATOR           |                            |         |     19998 |   
|  8 |     PX SEND QC (RANDOM)     | :TQ20002                   |   ......|     19998 |   
|  9 |      HASH JOIN BUFFERED     |                            |   ..... |     19998 |   
| 10 |       PX RECEIVE            |                            |   ..... |     10000 |   
| 11 |        PX SEND HASH         | :TQ20000                   |   ..... |     10000 |   
| 12 |         VIEW                |                            |   ..... |     10000 |   
| 13 |          PX BLOCK ITERATOR  |                            |   ..... |     10000 |   
| 14 |           TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 |   ..... |     10000 |   
| 15 |       PX RECEIVE            |                            |   ..... |     10000 |   
| 16 |        PX SEND HASH         | :TQ20001                   |   ..... |     10000 |   
| 17 |         VIEW                |                            |   ..... |     10000 |   
| 18 |          PX BLOCK ITERATOR  |                            |   ..... |     10000 |   
| 19 |           TABLE ACCESS FULL | SYS_TEMP_0FD9D6600_29614D6 |   ..... |     10000 |   
=======================================================================================   
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   

Only 4 rows seems were treated by the query coordinator (QC).
Let’s have a look what exact rows were read by QC. I’ll use again dtraclio.d developed by Alexander Anokhin.

After monitoring QC process I got (where obj: -40016384 dobj: 4246144 – represents temporary object in SYS schema):

================================= Logical I/O Summary (grouped by object) ============= 
 object_id  data_object_id       lio        cr    cr (e)    cr (d)        cu    cu (d)  
---------- --------------- --------- --------- --------- --------- --------- ---------  
        60              60         1         1         0         0         0         0  
 -40016384              -2         1         1         0         0         0         0  
         7               7         1         1         1         0         0         0  
        46              46         1         1         1         0         0         0  
        36              36         2         2         1         0         0         0  
        37              37         6         6         4         0         0         0  
       426             426        10        10         5         0         0         0  
        78              78        15        15         0         0         0         0  
         0              -1        29         7         7         0        22         0  
     77939           77939       151       151         0         0         0         0  
 -40016384         4246144       233        12         0         0       221       127  
---------- --------------- --------- --------- --------- --------- --------- ---------  
     total                       450       207        19         0       243       127  
======================================================================================= 
                                                                                        
Legend                                                                                  
lio      : logical gets (cr + cu)                                                       
cr       : consistent gets                                                              
cr (e)   : consistent gets - examination                                                
cr (d)   : consistent gets direct                                                       
cu       : db block gets                                                                
cu (d)   : db block gets direct                                                         
ispnd (Y): buffer is pinned count                                                       
ispnd (N): buffer is not pinned count                                                   
pin rls  : pin releases                                                                 

It seems pretty much LIO has been done by QC, however let’s have a look on details.
You will see that QC is reading 4 blocks again and again (mostly in “current” mode):

kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40ca80 (1/51840) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbnew(0xFFFF80FFBFFF8260,1,273,4294967295) [tsn: 3 rdba: 0x40c880 (1/51328) obj: -40016384 dobj: 4246144] where: 273 mode_held:  blocks: 1
kcbgcur(0xFFFF80FFBFFF88B0,2,65618,0) [tsn: 2 rdba: 0xc000d0 (3/208) obj: 0 dobj: -1] where: 82 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,128,0) [tsn: 2 rdba: 0xc000d0 (3/208) obj: 0 dobj: -1] where: 128 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c880 (1/51328) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c880 (1/51328) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40c900 (1/51456) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9A60,2,33,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 33 mode_held: 2
kcblnb_dscn(0xFFFF80FFBDDEAA00,18237071088943107,6,0)
kcblnb_dscn(0xFFFF80FFBDDEAA00,18237075383910403,6,0)
.......................................................................................
kcbgcur(0xFFFF80FFBFFF9640,2,247,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 247 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9740,2,248,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 248 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9640,2,338,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 338 mode_held: 2
kcbgcur(0xFFFF80FFBFFF92C0,2,257,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 257 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9640,2,249,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 249 mode_held: 2
kcbgcur(0xFFFF80FFBFFF89E0,2,65615,0) [tsn: 2 rdba: 0xc000d0 (3/208) obj: 0 dobj: -1] where: 79 mode_held: 2
kcbgcur(0xFFFF80FFBFFF9C10,2,250,0) [tsn: 3 rdba: 0x40c47f (1/50303) obj: -40016384 dobj: 4246144] where: 250 mode_held: 2
kcbgcur(0xFFFF80FFBFFF8D00,2,771,0) [tsn: 3 rdba: 0x40ca00 (1/51712) obj: -40016384 dobj: 4246144] where: 771 mode_held: 2
kcbgcur(0xFFFF80FFBFFF8E00,2,772,0) [tsn: 3 rdba: 0x40c980 (1/51584) obj: -40016384 dobj: 4246144] where: 772 mode_held: 2

and the final part of the puzzle, let’s query buffer cache header for the block which where read by QS, like:

select class,
decode(bh.state,0,’free’,1,’xcur’,2,’scur’,3,’cr’,4,’read’,5,’mrec’,
6,’irec’,7,’write’,8,’pi’, 9,’memory’,10,’mwrite’,
11,’donated’, 12,’protected’,13,’securefile’, 14,’siop’,15,’recckpt’
) state,
decode(bh.class,1,’data block’,2,’sort block’,3,’save undo block’,
4,’segment header’,5,’save undo header’,6,’free list’,7,’extent map’,
8,’1st level bmb’,9,’2nd level bmb’,10,’3rd level bmb’, 11,’bitmap block’,
12,’bitmap index block’,13,’file header block’,14,’unused’,
15,’system undo header’,16,’system undo block’, 17,’undo header’,
18,’undo block’
) block_class,
decode(bitand(bh.flag,1),0, ‘N’, ‘Y’) dirty,
decode(bitand(bh.flag,16), 0, ‘N’, ‘Y’) temp,
decode(bitand(bh.flag,1536), 0, ‘N’, ‘Y’) ping,
decode(bitand(bh.flag,16384), 0, ‘N’, ‘Y’) stale,
decode(bitand(bh.flag,65536), 0, ‘N’, ‘Y’) direct,
obj from x$bh bh where
1=1 AND obj=4246144

4	free	segment header	N	N	N	N	N	4246144
4	free	segment header	N	N	N	N	N	4246144
4	free	segment header	N	N	N	N	N	4246144
4	free	segment header	N	N	N	N	N	4246144

So it seems not a big issue with that parallel to serial step in the execution plan,
but QS is constantly reading and giving back to consumers the temporary segment header.
Actually it’s segment headers, number of headers is equal to your degree of parallelism.

Advertisements

Oracle index access path visualization

DtraceLio script by Alexander Anokhin has inspired me to do
some investigations in Oracle index access internals.

The main idea is to combine index tree structure with output of Dtrace LIO.

It also would be nice to combine LIO and PIO in a single output.
So I reworked the script a bit for my purposes.

PIO (phisical IO) has been mostly tricky part,

I’ve managed to found the following function in the call stack for Oracle process:

kd4obj(0xA8BCA000,0,8192,0)
kd4obj(0xA8E68000,0,8192,0)
kd4obj(0xA9D18000,0,8192,0)
kd4obj(0xA8430000,0,8192,0)
kd4obj(0xA86E2000,0,8192,0)
kd4obj(0xA8944000,0,8192,0)
kd4obj(0xA8BCC000,0,8192,0)
kd4obj(0xA8942000,0,8192,0)

it correlates very well with scattered read entries in the trace file.
So after some tryouts and guesses I come up with the following structure for the first argument of the function.

typedef struct kdbafr /* full relative dba */
{
ktsn tsn_kdbafr; /* a tablespace number */
krdba dba_kdbafr; /* a relative dba */
} kdbafr;

My addition was:

pid$1::kd4obj:entry
{
blk1 = ((kdbafr *) copyin(arg0, sizeof(kdbafr)));
tsn = blk1->tsn_kdbafr;
rdba = blk1->dba_kdbafr;
rdba_file = rdba >> 22; /* for smallfile tablespaces */
rdba_block = rdba & 0x3FFFFF;
}

pid$1::kd4obj:entry
{
printf(“%s(0x%X,%d,%d,%d) [tsn: %d rdba: 0x%x (%d/%d) ] \n”,probefunc,arg0,arg1,arg2,arg3,tsn, rdba, rdba_file, rdba_block);
}

and typical result is look like :

kcbgtcr(0xFFFF80FFB6EF5360,0,865,0) [tsn: 7 rdba: 0x1805816 (6/22550) obj: 77708 dobj: 77708] where: 865 exam: 0
kd4obj(0xB1EDE000,0,8192,0) [tsn: 41478 rdba: 0x1805810 (6/22544) ]
kd4obj(0xB10F8000,0,8192,0) [tsn: 41478 rdba: 0x1805811 (6/22545) ]
kd4obj(0xB13B0000,0,8192,0) [tsn: 41478 rdba: 0x1805812 (6/22546) ]
kd4obj(0xB1612000,0,8192,0) [tsn: 41478 rdba: 0x1805813 (6/22547) ]
kd4obj(0xB1896000,0,8192,0) [tsn: 41478 rdba: 0x1805814 (6/22548) ]
kd4obj(0xB1B2E000,0,8192,0) [tsn: 41478 rdba: 0x1805815 (6/22549) ]
kd4obj(0xB1B2C000,0,8192,0) [tsn: 41478 rdba: 0x1805816 (6/22550) ]
kd4obj(0xB1EE0000,0,8192,0) [tsn: 41478 rdba: 0x1805817 (6/22551) ]

So I guessed correctly for RDBA part but the 1st part is still unknown for me(it is definetely not a tablespace number 🙂 ).

I used
ALTER SESSION SET EVENTS ‘immediate trace name TREEDUMP level &obj_id’;
to dump the index tree structure and implemented a simple perl script to match dtracelio output with the tree structure.

Here is first results:

Single block access by unique index on the left and non-unique index access on the right side.

Unique_single
access by unique index

NonUnique_single
access by none-unique index

Bars above zero denotes index blocks access (height is a height of the corresponding index block in the index tree hierarchy), leaf index blocks are colored in red, bars below zero denotes table blocks access (-1 is LIO and -7 is PIO).

Looking into “raw” data for the pictures above you can see that for none-unique index access the leaf block has been accessed twice, it explains the “famous” difference in consistent reads between UNIQUE and NON UNIQUE  indexes (I would say actually it doesn’t matter for any real life scenario though)

LIO_IDX, 0x1400083, 4, 77795, branch
LIO_IDX, 0x14010cc, 3, 77795, branch
LIO_IDX, 0x14000c8, 2, 77795, branch
LIO_IDX, 0x1400086, 0, 77795, leaf, 0.2
LIO_TBL, 0x1804ee5, -1, 77794, table, , -1
PIO_TBL, 0x1804ee5,-7, , table , , , , -7
LIO_IDX, 0x1400086, 0, 77795, leaf, 0.2

PS

I’m going to continue publishing the further results of the investigation and here 1 picture from forthcoming results 🙂

PPS
I need to emphasize that db4obj doesn’t represent an OS level physical IO but rather a step of returning a single block to buffer cache from kd* level.

Direct_access