# Chunking tables 8: second solution

Here is the second complete solution to the problem of dividing a table into chunks of equal size. This is the solution whose first version was marked “correct” on OTN. It is the basis for what Bryn Llewellyn calls “Approx_Method_Sql_Ashton_2” in his white paper.

#### Overview

1. extents_data: get all extents and data object ids for the table
2. extents_with_sums:
   1. treat all the table blocks as if they were one contiguous set
   2. number all the blocks in ascending sequence
   3. calculate the total number of blocks
3. filtered_extents: filter out the extents that do not contain a chunk boundary
4. expanded_extents: expand every row that crosses a chunk boundary so that there is one row per chunk
5. rowids:
   1. calculate each ROWID
   2. group by chunk

#### filtered_extents

```sql
with extents_data as (
  select /*+ qb_name(extents_data) */
    o.data_object_id, e.file_id, e.block_id, e.blocks
  from dba_extents e
  join all_objects o
  on e.owner = o.owner
    and e.segment_name = o.object_name
    and e.segment_type = o.object_type
    and decode(e.partition_name, o.subobject_name, 0, 1) = 0
  where e.segment_type like 'TABLE%'
    and e.owner = :owner
    and e.segment_name = :table_name
)
, extents_with_sums as (
  select /*+ qb_name(extents_with_sums) */
    sum(blocks) over() as total_blocks,
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) - blocks as first_extent_block,
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) as next_extent_block,
    e.*
  from extents_data e
)
, filtered_extents as (
  select /*+ qb_name(filtered_extents) */ * from (
    select
      width_bucket(first_extent_block-1, 0, total_blocks, :num_chunks)
        as prev_chunk,
      width_bucket(first_extent_block, 0, total_blocks, :num_chunks)
        as first_chunk,
      width_bucket(next_extent_block-1, 0, total_blocks, :num_chunks)
        as last_chunk,
      width_bucket(next_extent_block, 0, total_blocks, :num_chunks)
        as next_chunk,
      e.*
    from extents_with_sums e
  )
  where prev_chunk < next_chunk
)
```

The first two named subqueries are identical to those in the first solution. The filtered_extents subquery differs only by calculating more columns, since I will need them all later:

- `PREV_CHUNK` is the chunk to which the end of the previous extent belongs.
- `FIRST_CHUNK` and `LAST_CHUNK` are the chunks containing the first and last blocks of the current extent.
- `NEXT_CHUNK` is the chunk to which the start of the next extent belongs.
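The `WIDTH_BUCKET()` arithmetic is easy to check outside the database. Here is a minimal Python sketch, with invented values for `total_blocks`, `:num_chunks` and one extent's boundaries; the `width_bucket` function mimics Oracle's behavior for an ascending range:

```python
from math import floor

def width_bucket(v, low, high, n):
    # Oracle WIDTH_BUCKET over an ascending range: n equal-width
    # buckets numbered 1..n covering [low, high)
    return floor((v - low) * n / (high - low)) + 1

total_blocks = 100  # invented: total blocks over all extents
num_chunks = 7      # invented: the :num_chunks bind variable

# one extent covering overall blocks 40..59
first_extent_block, next_extent_block = 40, 60

prev_chunk  = width_bucket(first_extent_block - 1, 0, total_blocks, num_chunks)  # 3
first_chunk = width_bucket(first_extent_block,     0, total_blocks, num_chunks)  # 3
last_chunk  = width_bucket(next_extent_block - 1,  0, total_blocks, num_chunks)  # 5
next_chunk  = width_bucket(next_extent_block,      0, total_blocks, num_chunks)  # 5
```

Since `prev_chunk` (3) is less than `next_chunk` (5), this extent contains at least one chunk boundary and survives the filter.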

#### expanded_extents

```sql
, expanded_extents as (
  select /*+ qb_name(expanded_extents) */
    first_chunk + level - 1 as chunk,
    prev_chunk, next_chunk, data_object_id, file_id,
    block_id, total_blocks, first_extent_block
  from filtered_extents
  connect by first_extent_block = prior first_extent_block
    and prior sys_guid() is not null
    and first_chunk + level - 1 <= last_chunk
)
```

This subquery uses the method explained in my previous post to expand each row that crosses chunk boundaries. The only difference among the expanded rows is the chunk number assigned to each.
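In a procedural language, this expansion is just a nested loop: one output row per chunk number from `first_chunk` through `last_chunk`. A Python sketch with invented sample rows:

```python
# invented sample rows: (first_chunk, last_chunk, block_id)
filtered_extents = [(1, 3, 128), (3, 5, 640)]

expanded_extents = [
    (chunk, block_id)
    for first_chunk, last_chunk, block_id in filtered_extents
    for chunk in range(first_chunk, last_chunk + 1)
]
# chunk 3 gets two rows, one from each extent that touches it;
# every other chunk gets exactly one row
```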

#### rowids

```sql
, rowids as (
  select /*+ qb_name(rowids) */ chunk,
    case when chunk > prev_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + ceil( (chunk-1) * total_blocks / :num_chunks - 1 ) + 1
        - first_extent_block,
      0
    )
    end as start_rowid,
    case when chunk < next_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + ceil( chunk * total_blocks / :num_chunks - 1 )
        - first_extent_block,
      32767
    )
    end as end_rowid
  from expanded_extents
)
select chunk, min(start_rowid) start_rowid, max(end_rowid) end_rowid
from rowids
group by chunk;
```

This subquery calculates the starting and ending rowids for each chunk. It is much more compact than the way I did this previously.

By comparing the current chunk number with `prev_chunk` and `next_chunk`, I can tell whether I should calculate a starting rowid, an ending rowid, or both. To calculate the block number:

- I start from the `block_id` of the extent,
- I add the overall block number of the target chunk boundary,
- then I subtract the number I assigned previously to the first block of the extent.

At this point, each chunk may be present in one row or two. A simple `GROUP BY` gives me one row per chunk.
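The boundary arithmetic can be sanity-checked on its own. This Python sketch (with invented `total_blocks` and `num_chunks` values) applies the same `CEIL()` formulas to get each chunk's first and last overall block numbers, and confirms the chunks tile the block range with no gaps or overlaps:

```python
from math import ceil

total_blocks, num_chunks = 100, 7  # invented values

def chunk_bounds(chunk):
    # overall block numbers of the first and last block of a chunk,
    # using the same formulas as the start_rowid / end_rowid cases
    first = ceil((chunk - 1) * total_blocks / num_chunks - 1) + 1
    last = ceil(chunk * total_blocks / num_chunks - 1)
    return first, last

bounds = [chunk_bounds(c) for c in range(1, num_chunks + 1)]
assert bounds[0][0] == 0                     # numbering starts at zero
assert bounds[-1][1] == total_blocks - 1     # and ends at the last block
assert all(bounds[i + 1][0] == bounds[i][1] + 1
           for i in range(num_chunks - 1))   # no gaps, no overlaps
```

In the query itself, each overall boundary block is then converted to a file-relative block by adding `block_id` and subtracting `first_extent_block`.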

#### Evolution

My version of this solution has changed over time, especially while writing this series.

- I now make sure my numbering scheme always starts with zero.
- I have also incorporated `WIDTH_BUCKET()` and changed the chunk boundary calculation to be consistent with it.
- Above all, I have simplified the code that decides when to calculate the rowids, so that the `KEEP (DENSE_RANK ...)` functions are no longer necessary. I apologize for the earlier, more complicated code.

By the way, here is all the code together, for anyone who wants to copy and paste.

```sql
with extents_data as (
  select /*+ qb_name(extents_data) */
    o.data_object_id, e.file_id, e.block_id, e.blocks
  from dba_extents e
  join all_objects o
  on e.owner = o.owner
    and e.segment_name = o.object_name
    and e.segment_type = o.object_type
    and decode(e.partition_name, o.subobject_name, 0, 1) = 0
  where e.segment_type like 'TABLE%'
    and e.owner = :owner
    and e.segment_name = :table_name
)
, extents_with_sums as (
  select /*+ qb_name(extents_with_sums) */
    sum(blocks) over() as total_blocks,
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) - blocks as first_extent_block,
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) as next_extent_block,
    e.*
  from extents_data e
)
, filtered_extents as (
  select /*+ qb_name(filtered_extents) */ * from (
    select
      width_bucket(first_extent_block-1, 0, total_blocks, :num_chunks)
        as prev_chunk,
      width_bucket(first_extent_block, 0, total_blocks, :num_chunks)
        as first_chunk,
      width_bucket(next_extent_block-1, 0, total_blocks, :num_chunks)
        as last_chunk,
      width_bucket(next_extent_block, 0, total_blocks, :num_chunks)
        as next_chunk,
      e.*
    from extents_with_sums e
  )
  where prev_chunk < next_chunk
)
, expanded_extents as (
  select /*+ qb_name(expanded_extents) */
    first_chunk + level - 1 as chunk,
    prev_chunk, next_chunk, data_object_id, file_id,
    block_id, total_blocks, first_extent_block
  from filtered_extents
  connect by first_extent_block = prior first_extent_block
    and prior sys_guid() is not null
    and first_chunk + level - 1 <= last_chunk
)
, rowids as (
  select /*+ qb_name(rowids) */ chunk,
    case when chunk > prev_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + ceil( (chunk-1) * total_blocks / :num_chunks - 1 ) + 1
        - first_extent_block,
      0
    )
    end as start_rowid,
    case when chunk < next_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + ceil( chunk * total_blocks / :num_chunks - 1 )
        - first_extent_block,
      32767
    )
    end as end_rowid
  from expanded_extents
)
select chunk, min(start_rowid) start_rowid, max(end_rowid) end_rowid
from rowids
group by chunk;
```