Chunking tables 8: second solution

Here is the second complete solution to the problem of dividing a table into chunks of equal size. This is the solution whose first version was marked “correct” on OTN. It is the basis for what Bryn Llewellyn calls “Approx_Method_Sql_Ashton_2” in his white paper.

Overview

  1. extents_data: get all extents and data object ids for the table
  2. extents_with_sums:
    1. treat all the table blocks as if they were one contiguous set
    2. number all the blocks in ascending sequence
    3. calculate total number of blocks
  3. filtered_extents: filter out the extents that do not contain a chunk boundary
  4. expanded_extents: expand every row that crosses chunk boundaries so that there will be one row per chunk
  5. rowids:
    1. calculate each ROWID
    2. group by chunk.
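
The query relies on three bind variables: :owner, :table_name and :num_chunks. Here is a hypothetical SQL*Plus setup for trying it out; the schema, table and chunk count are placeholders, not part of the solution.

-- Hypothetical SQL*Plus setup: all three values are placeholders.
variable owner      varchar2(128)
variable table_name varchar2(128)
variable num_chunks number

begin
  :owner      := 'SCOTT';  -- made-up schema
  :table_name := 'EMP';    -- made-up table
  :num_chunks := 8;        -- number of chunks wanted
end;
/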

filtered_extents

with extents_data as (
  select /*+ qb_name(extents_data) */
    o.data_object_id, e.file_id, e.block_id, e.blocks
  from dba_extents e
  join all_objects o
  on e.owner = o.owner
    and e.segment_name = o.object_name
    and e.segment_type = o.object_type
    and decode(e.partition_name, o.subobject_name, 0, 1) = 0
  where e.segment_type like 'TABLE%'
    and e.owner = :owner
    and e.segment_name = :table_name
)
, extents_with_sums as (   
  select /*+ qb_name(extents_with_sums) */
    sum(blocks) over() as total_blocks, 
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) - blocks as first_extent_block,   
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) as next_extent_block, 
    e.* 
  from extents_data e 
) 
, filtered_extents as (  
  select /*+ qb_name(filtered_extents)*/ * from (
    select
      width_bucket(first_extent_block-1, 0, total_blocks, :num_chunks)
        as prev_chunk, 
      width_bucket(first_extent_block, 0, total_blocks, :num_chunks)
        as first_chunk, 
      width_bucket(next_extent_block-1, 0, total_blocks, :num_chunks)
        as last_chunk,  
      width_bucket(next_extent_block, 0, total_blocks, :num_chunks)
        as next_chunk, 
      e.* 
    from extents_with_sums e   
  )
  where prev_chunk < next_chunk
)

The first two named subqueries are identical to the first solution. The filtered_extents subquery differs only by calculating more columns, since I will need them all later:

  • PREV_CHUNK is the chunk to which the end of the previous extent belongs.
  • FIRST_CHUNK and LAST_CHUNK are the chunks containing the first and last block of the current extent.
  • NEXT_CHUNK is the chunk to which the start of the next extent belongs.
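
To see what WIDTH_BUCKET() is doing with these inputs, here is a standalone sketch with made-up numbers: 100 blocks numbered 0 through 99, split into 4 chunks.

-- Standalone illustration with made-up numbers, not part of the solution:
-- 100 blocks numbered 0 through 99, divided into 4 chunks of 25 blocks each.
select block#,
  width_bucket(block#, 0, 100, 4) as chunk
from (
  select column_value as block#
  from table(sys.odcinumberlist(0, 24, 25, 49, 50, 74, 75, 99))
);

Blocks 0 through 24 land in chunk 1, 25 through 49 in chunk 2, 50 through 74 in chunk 3 and 75 through 99 in chunk 4.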

expanded_extents

, expanded_extents as (
  select /*+ qb_name(expanded_extents) */
    first_chunk + level - 1 as chunk,
    prev_chunk, next_chunk, data_object_id, file_id,
    block_id, total_blocks, first_extent_block
  from filtered_extents
  connect by first_extent_block = prior first_extent_block
    and prior sys_guid() is not null
    and first_chunk + level - 1 <= last_chunk
)

This subquery uses the method explained in my previous post to expand each row that crosses chunk boundaries. The only difference among the expanded rows is the chunk number assigned to each.
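
For reference, here is the row-expansion technique by itself, applied to made-up data: a row covering chunks 2 through 4 becomes three rows, one per chunk.

-- Standalone sketch of the row-expansion technique on made-up data.
with t as (
  select 101 as extent_id, 2 as first_chunk, 4 as last_chunk from dual
  union all
  select 102, 5, 5 from dual
)
select extent_id, first_chunk + level - 1 as chunk
from t
connect by extent_id = prior extent_id
  and prior sys_guid() is not null
  and first_chunk + level - 1 <= last_chunk
order by extent_id, chunk;

Extent 101 produces chunks 2, 3 and 4; extent 102 produces only chunk 5.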

rowids

, rowids as (
  select /*+ qb_name(rowids) */ chunk,
    case when chunk > prev_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( (chunk-1) * total_blocks / :num_chunks - 1 ) + 1 - first_extent_block,
      0
    ) 
    end as start_rowid,
    case when chunk < next_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( chunk * total_blocks / :num_chunks - 1 )  - first_extent_block,
      32767
    ) 
    end as end_rowid
  from expanded_extents
)
select chunk, min(start_rowid) start_rowid, max(end_rowid) end_rowid
from rowids
group by chunk;

This subquery calculates the starting and ending rowids for each chunk. It is much more compact than the way I did this previously.

By comparing the current chunk number with prev_chunk and next_chunk, I can tell if I should calculate a starting rowid, an ending rowid or both. To calculate the block number,

  • I start from the block_id of the extent
  • I add the block number of the target chunk boundary
  • then I subtract the number I assigned previously to the first block of the extent.

At this point, each chunk may be present in one row or two. A simple GROUP BY gives me one row per chunk.
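
Here is the same arithmetic with made-up numbers: total_blocks = 1000 and :num_chunks = 4, so chunk 2 starts at overall block CEIL((2-1)*1000/4 - 1) + 1 = 250 (numbering from zero). Suppose that block lies in an extent whose BLOCK_ID is 1024 in file 4 and whose first block was numbered 224; the data object id below is also invented.

-- Worked example with invented values; only the arithmetic matters.
select dbms_rowid.rowid_create(
    1,                                               -- extended ROWID format
    78910,                                           -- data_object_id (invented)
    4,                                               -- file_id (invented)
    1024 + ceil( (2-1) * 1000 / 4 - 1 ) + 1 - 224,   -- = 1024 + 250 - 224 = 1050
    0                                                -- first row slot in the block
  ) as chunk_2_start_rowid
from dual;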

Evolution

My version of this solution has changed over time, especially while writing this series.

  • I now make sure my numbering scheme always starts with zero.
  • I have also incorporated WIDTH_BUCKET() and changed the way I calculate chunk boundaries so that the two are consistent.
  • Above all, I have simplified the code that decides when to calculate the rowids, so the KEEP (DENSE_RANK...) functions are no longer necessary. I apologize for the earlier, more complicated code.

By the way, here is the code all together if anyone wants to copy and paste.

with extents_data as (
  select /*+ qb_name(extents_data) */
    o.data_object_id, e.file_id, e.block_id, e.blocks
  from dba_extents e
  join all_objects o
  on e.owner = o.owner
    and e.segment_name = o.object_name
    and e.segment_type = o.object_type
    and decode(e.partition_name, o.subobject_name, 0, 1) = 0
  where e.segment_type like 'TABLE%'
    and e.owner = :owner
    and e.segment_name = :table_name
)
, extents_with_sums as (   
  select /*+ qb_name(extents_with_sums) */
    sum(blocks) over() as total_blocks, 
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) - blocks as first_extent_block,   
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) as next_extent_block, 
    e.* 
  from extents_data e 
) 
, filtered_extents as (  
  select /*+ qb_name(filtered_extents)*/ * from (
    select
      width_bucket(first_extent_block-1, 0, total_blocks, :num_chunks)
        as prev_chunk, 
      width_bucket(first_extent_block, 0, total_blocks, :num_chunks)
        as first_chunk, 
      width_bucket(next_extent_block-1, 0, total_blocks, :num_chunks)
        as last_chunk,  
      width_bucket(next_extent_block, 0, total_blocks, :num_chunks)
        as next_chunk, 
      e.* 
    from extents_with_sums e   
  )
  where prev_chunk < next_chunk
)
, expanded_extents as (
  select /*+ qb_name(expanded_extents) */
    first_chunk + level - 1 as chunk,
    prev_chunk, next_chunk, data_object_id, file_id,
    block_id, total_blocks, first_extent_block
  from filtered_extents
  connect by first_extent_block = prior first_extent_block
    and prior sys_guid() is not null
    and first_chunk + level - 1 <= last_chunk
)
, rowids as (
  select /*+ qb_name(rowids) */ chunk,
    case when chunk > prev_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( (chunk-1) * total_blocks / :num_chunks - 1 ) + 1 - first_extent_block,
      0
    ) 
    end as start_rowid,
    case when chunk < next_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( chunk * total_blocks / :num_chunks - 1 )  - first_extent_block,
      32767
    ) 
    end as end_rowid
  from expanded_extents
)
select chunk, min(start_rowid) start_rowid, max(end_rowid) end_rowid
from rowids
group by chunk;
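
The ROWID ranges are meant to be consumed one chunk at a time, for example by separate sessions. A hypothetical consumer, using a placeholder table name, looks like this:

-- Hypothetical consumer of one chunk; the table must be the one passed
-- as :owner / :table_name above.
select count(*)
from scott.emp
where rowid between :start_rowid and :end_rowid;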