Chunking tables 5: chunk boundaries

We still want to split a table into 12 chunks with equal numbers of blocks, so we can access each chunk using a rowid range. We have the extents we need and now we need to calculate the exact chunk boundaries.

Current list of posts: 

Histograms and WIDTH_BUCKET

I explained in my previous post that WIDTH_BUCKET is based on a histogram having identical intervals in a range.

  • The minimum range value is included in the range but the maximum range value is not.
  • The same thing is true for each interval: the minimum value is included in the interval, but not the maximum value.

We have already used WIDTH_BUCKET to answer the question: “what chunk is this block in?” Now we are asking the opposite question: “what blocks start and end this chunk?” We need to make absolutely sure the answers are coherent!

Getting the upper chunk boundary

Using our test table from the previous post, let’s compare what WIDTH_BUCKET tells us with a wrong answer and a right answer. Then I’ll explain in detail. Right now, I’m only interested in those blocks that end a chunk.

select this_block_chunk, block last_block,
  round(256 * (this_block_chunk / 12), 1) max_chunk_val,
  floor(256 * (this_block_chunk / 12)) floor_val,
  ceil (256 * (this_block_chunk / 12) - 1) right_val
from (
  select level-1 block,
  width_bucket(level-1,0,256,12) this_block_chunk,
  width_bucket(level,0,256,12) next_block_chunk
  from dual
  connect by level <= 256
)
where this_block_chunk < next_block_chunk;
THIS_BLOCK_CHUNK LAST_BLOCK MAX_CHUNK_VAL FLOOR_VAL RIGHT_VAL
1 21 21.3 21 21
2 42 42.7 42 42
3 63 64 64 63
4 85 85.3 85 85
5 106 106.7 106 106
6 127 128 128 127
7 149 149.3 149 149
8 170 170.7 170 170
9 191 192 192 191
10 213 213.3 213 213
11 234 234.7 234 234
12 255 256 256 255

 

For each block, I asked WIDTH_BUCKET what chunk it is in and what chunk the next block is in, and I kept only those blocks where the next block is in the next chunk. That means I have a list of the last blocks for each chunk.

In the third column, I calculate the maximum value for each chunk. Notice it is not always an integer value. How can we calculate the greatest block number that is less than that maximum value?

  • Using CEIL() or ROUND(), we will sometimes get a result that is greater than the maximum value.
  • Using FLOOR(), we will sometimes get a result that is equal to the maximum value – but we need to be less than, not equal to! You can see the problem in chunks 3, 6, 9 and 12.
  • The right answer is to subtract 1 from the maximum value and then use CEIL(). That way the result will always be less than the maximum value, but as close to it as possible.

Getting the lower chunk boundary

I’ll bet you thought of this yourself. The lower boundary of a chunk is one greater that the upper boundary of the previous chunk :-)

Coming next…

We have the extents we need, and we have a method for calculating all the chunk boundaries. In the next post, I’ll present the first of two complete solutions for chunking a table.

Current list of posts: 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s