Splitting Strings: PL/SQL

Marc Bleron and Todd Hershiser gave me some very valuable feedback through their comments on my recent “Splitting Strings” posts. The big news is: PL/SQL beats SQL!

fn:tokenize

XQuery provides some built-in functions prefixed by “fn”. fn:tokenize is equivalent to ora:tokenize except that it doesn’t break when the delimiter is absent from the input string: it just returns the string. Marc says fn:tokenize is supported, and it does work in 12.1.0.2, but the official documentation says “not supported”. I have asked Marc for more information.

This is not a big deal. With fn:tokenize, there would simply be a bit less typing.
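
For illustration, here is a sketch of what the fn:tokenize version might look like, assuming a bind variable :txt holding the delimited string (and assuming fn:tokenize really is usable in your release). The contains() workaround I use with ora:tokenize is no longer needed, since fn:tokenize simply returns the input when the delimiter is absent:

select subs from xmltable(
  'fn:tokenize($X, ",")'
  passing :txt as X
  columns subs varchar2(4000) path '.'
);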

Escaping the delimiter

Todd pointed out that tokenize splits a string based on a regular expression, and some common delimiters (like ‘|’) have special meaning in regular expressions. As a result, I now recommend simply escaping the delimiter with a backslash ‘\’.
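
For example, to split on the pipe ‘|’, which is a regex metacharacter, the pattern becomes “\|”. Here is a quick sketch reusing the xmltable form from my “Splitting Strings” posts:

var txt varchar2(20);
exec :txt := 'A|BB|CCC';

select subs from xmltable(
  'if (contains($X,"|")) then ora:tokenize($X,"\|") else $X'
  passing :txt as X
  columns subs varchar2(4000) path '.'
);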

PL/SQL

Todd provided a string splitting function in PL/SQL and claimed it is clearly faster than ora:tokenize. He is right!

I wrote a function similar to his and compared it to the “tokenize” solution. Here is the function:

create or replace function string_tokenize(
  p_string in varchar2,
  p_delim in varchar2
)
return sys.odcivarchar2list pipelined
as
  i_prev_pos integer := 1;
  i_pos integer;
  i_max_pos integer := length(p_string) + 1;
  i_delim_length integer := length(p_delim);
begin
  loop
    i_pos := instr(p_string, p_delim, i_prev_pos);
    if i_pos = 0 then
      i_pos := i_max_pos;
    end if;
    pipe row(substr(p_string, i_prev_pos, i_pos - i_prev_pos));
    exit when i_pos = i_max_pos;
    i_prev_pos := i_pos + i_delim_length;
  end loop;
  return;
end string_tokenize;
/
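
For reference, here is a quick sketch of calling the pipelined function from SQL; it returns one row per substring:

select column_value subs
from table(string_tokenize('A,BB,CCC,DDDD,EEEEE', ','));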

By the way, I tested this function with and without the PRAGMA UDF clause introduced in 12c. I found no difference in performance in this case. Here is my final test harness:

set serveroutput on
declare
  l_num number;
  l_timestamp timestamp;
  l_plsql_secs number;
  l_tokenize_secs number;
  l_num_substrings number := 10;
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
begin
  select count(*) into l_num from user_tables where table_name = 'T';
  if l_num > 0 then
    do('drop table t purge');
  end if;
  do('create table t(id number, str varchar2(4000)) cache');
  insert into t
  select level, to_char(level,'fm000000000')||',000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010'
  from dual
  connect by level <= 10000;
  commit;
  dbms_output.put_line('Substrings' || chr(9) || 'tokenize' || chr(9) || 'PL/SQL');
  for i in 1..10 loop
    select count(*) into l_num from t;
    
    l_timestamp := localtimestamp;
    select count(column_value) into l_num from (
      select id, column_value from t, table(string_tokenize(str, ','))
    );
    l_plsql_secs := extract(second from localtimestamp - l_timestamp);
    
    l_timestamp := localtimestamp;
    select count(subs) into l_num from (
      select id, subs from t, xmltable(
        'if (contains($X,",")) then ora:tokenize($X,"\,") else $X' 
        passing str as X columns subs varchar2(4000) path '.')
    );
    l_tokenize_secs := extract(second from localtimestamp - l_timestamp);
    dbms_output.put_line(l_num_substrings || chr(9) || l_tokenize_secs || chr(9) || l_plsql_secs);
    update t set str =
    str || ',000000001,000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010';
    l_num_substrings := l_num_substrings + 10;
    commit;
  end loop;
end;
/

Notice that I keep the same number of input rows here, whereas in my previous tests I kept the same number of output rows. My “tokenize” solution scales OK, but the PL/SQL function is much faster and scales even better.

In this case a combined SQL + PL/SQL solution beats the best pure SQL solution.

Splitting Strings: Proof!

In my previous post I used XMLTABLE and ora:tokenize to split a comma delimited string. Now I’ll apply that technique to multiple rows, and show that it’s faster than other methods.

Test data

In my tests, I configure the length of the substring, the number of substrings per row and the total number of rows I should get as output. Each input string is unique because it starts with the ID of the row: this way I avoid any caching that might reduce the number of function calls.

drop table t purge;

create table t cache as
with parms as (
  select 9 str_len, 5 num_subs, 100000 num_rows from dual
)
, str_row as (
  select listagg(n,',') within group(order by n) str
  from (
    select lpad(level+1,str_len,'0') n from parms
    connect by level <= num_subs-1
  )
)
select level id,
  lpad(level,str_len,'0') ||','||str str
from parms, str_row
connect by level <= num_rows/num_subs;

select * from t where id <= 11;
ID STR
1 000000001,000000002,000000003,000000004,000000005
2 000000002,000000002,000000003,000000004,000000005
3 000000003,000000002,000000003,000000004,000000005
4 000000004,000000002,000000003,000000004,000000005
5 000000005,000000002,000000003,000000004,000000005
6 000000006,000000002,000000003,000000004,000000005
7 000000007,000000002,000000003,000000004,000000005
8 000000008,000000002,000000003,000000004,000000005
9 000000009,000000002,000000003,000000004,000000005
10 000000010,000000002,000000003,000000004,000000005
11 000000011,000000002,000000003,000000004,000000005

 

Notice the CACHE keyword when I create the table. Before my tests, I access the entire table to make sure it is all in the buffer cache.
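
One simple way to warm the cache is a query that reads every row, something like this sketch:

select count(str) from t;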

The “substr+instr” technique

This is the technique from my “New, Improved IN Lists” post. All I need to do is apply it to multiple rows.

One way to do that is to use the 12c LATERAL() clause. If you are not yet in 12c, try
TABLE(CAST(MULTISET(...) AS SYS.ODCIVARCHAR2LIST))

select a.id, b.subs from t a,
lateral(
  select substr(
    str,
    pos + 1,
    lead(pos,1,4000) over(order by pos) - pos - 1
  ) subs
  from (
    select instr(str, ',', 1, level) pos
    from dual
    connect by
      level <= length(str) - nvl(length(replace(str, ',', '')), 0) + 1
  )
) b;
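
If you are not yet in 12c, here is an untested sketch of how the TABLE(CAST(MULTISET(...))) variant might look. Note that I wrap the string in delimiters instead of using LEAD(), because the correlated reference to A.STR has to stay in the immediate MULTISET subquery:

select a.id, b.column_value subs
from t a,
table(cast(multiset(
  select substr(
           ',' || a.str || ',',
           instr(',' || a.str || ',', ',', 1, level) + 1,
           instr(',' || a.str || ',', ',', 1, level + 1)
             - instr(',' || a.str || ',', ',', 1, level) - 1
         )
  from dual
  connect by
    level <= length(a.str) - nvl(length(replace(a.str, ',', '')), 0) + 1
) as sys.odcivarchar2list)) b;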

The “tokenize” technique

This one is easy to adapt to multiple rows:

select id, subs from t, xmltable(
  'if (contains($X,",")) then ora:tokenize($X,"\,") else $X'
  passing str as X
  columns subs varchar2(4000) path '.'
);

[Update 2016-08-02: in a comment, Todd Hershiser points out that the second parameter in ora:tokenize is a regular expression. In order to use a regex “metacharacter” like “|” as a delimiter, I need to escape it with a backslash. I decided to put the backslash in everywhere since it doesn’t do any harm.

On the other hand, if the delimiter is ‘&’ then this solution cannot be used.]

The “regexp_substr” technique

This technique is fairly popular, no doubt because it is concise. For multiple rows, I use the “CONNECT BY ... PRIOR SYS_GUID()” technique that I explained in Chunking tables 7: prior sys_guid().

select id, regexp_substr (str, '[^,]+', 1, level) subs
from t
connect by level <= length (regexp_replace (str, '[^,]+')) + 1
and id = prior id
and prior sys_guid() is not null;

Test Results

In my test results, the “substr+instr” solution is slightly better than “tokenize” when there are fewer than 8 substrings per string. As the number of substrings increases, the “tokenize” solution scales much better. As for the “regexp_substr” solution, it is on average 20 times slower than “tokenize”.

Splitting Strings: Surprise!

In my post New, Improved IN Lists!, I split one string into multiple rows. Now I want to split multiple input strings – but first, I’ve rediscovered an even faster technique!

In this post I’ll introduce the technique, and in the next post I’ll do some testing and comparing.

XMLTABLE with XMLTYPE

  • (Warning: the XML services in the Oracle database have evolved over the last several versions. Some of the solutions here may not work in older versions. Also, prior to version 12c a DBA could decide not to install “XML DB” at all! I have only tested in version 12.1.0.2.)

Quite a few folks have used the XMLTABLE function to solve this problem. One way is to change the string into an XML document: there is one root node that contains one child node per substring.

var txt varchar2(20);
exec :txt := 'A,BB,CCC,DDDD,EEEEE';

select '<r><c>'||replace(:txt, ',', '</c><c>')||'</c></r>' txt
from dual;

<r><c>A</c><c>BB</c><c>CCC</c><c>DDDD</c><c>EEEEE</c></r>

Using the XPath expression '/r/c/text()', XMLTABLE will go through the child nodes and produce one row per substring.

select subs from xmltable(
  '/r/c/text()'
  passing xmltype('<r><c>'||replace(:txt, ',', '</c><c>')||'</c></r>')
  columns subs varchar2(4000) path '.'
);
SUBS
A
BB
CCC
DDDD
EEEEE

 

The main drawback of this solution, aside from performance, is that the input string cannot be a full 4000 bytes long.

[Update 2016-08-01: Marc Bleron rightly commented that I could use a CLOB as an input string and have way more than 4000 bytes. The limitation I mention is only true when the input string is a VARCHAR2.]
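
Following Marc’s comment, here is a sketch of the CLOB variant, where :txt_clob is a hypothetical CLOB bind or column; note that each output substring is still limited to VARCHAR2(4000):

select subs from xmltable(
  '/r/c/text()'
  passing xmltype('<r><c>' || replace(:txt_clob, ',', '</c><c>') || '</c></r>')
  columns subs varchar2(4000) path '.'
);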

XMLTABLE with an XQUERY sequence

If I put double quotes (or single quotes) around all the substrings, then the result is an XQuery sequence. XMLTABLE will simply output that sequence, one row per item.

select '"'||replace(:txt, ',', '","')||'"' str 
from dual;

"A","BB","CCC","DDDD","EEEEE"

with data as (
  select '"'||replace(:txt, ',', '","')||'"' str 
  from dual
)
select xmlcast(column_value as varchar2(4000)) subs
from data, xmltable(str);
SUBS
A
BB
CCC
DDDD
EEEEE

Again, this solution breaks if the input string is too long.

Tokenize!

Why doesn’t Oracle just provide a function to split these confounded strings? It does!

Oracle XQuery function ora:tokenize lets you use a regular expression to split the input string … into a sequence of strings.

select subs from xmltable(
  'ora:tokenize($X, "\,")'
  passing :txt as X
  columns subs varchar2(4000) path '.'
);

How simple is that? Well, not as simple as I would like. The nasty Oracle developers have decided that this function should raise an exception if the delimiter is not present in the input string. In other words, ‘A,BB’ is OK but just ‘A’ will produce “ORA-19176: FORX0003: regular expression matches zero-length string”.

Marc Bleron, who has published the ora:tokenize solution, worked around this problem by concatenating an extra comma to the input string. I worked out this alternative that allows for a full 4000 byte VARCHAR2 input string:

select subs from xmltable(
    'if (contains($X,",")) then ora:tokenize($X,"\,") else $X'
  passing :txt as X
  columns subs varchar2(4000) path '.'
);

[Update 2016-08-02: in a comment on the next post, Todd Hershiser points out that the second parameter in ora:tokenize is a regular expression. In order to use a regex “metacharacter” like “|” as a delimiter, I need to escape it with a backslash. I decided to put the backslash in everywhere since it doesn’t do any harm.]

[Update 2016-08-05: There is a problem with this solution if the ampersand & is involved. You have to convert it to &amp; in the input string and the delimiter.]

What’s Next?

In the next post I’ll compare the performance of these solutions with the substr+instr approach, and with a popular regexp_substr approach.

Spreadsheet-like Totals and Subtotals

We very often make spreadsheets with subtotals for each row and for each column. Someone on the OTN forum asked how to produce data in this format. I answered using the cool CUBE extension to GROUP BY.

The question used SCOTT.EMP as input. The requirement was to sum salaries by DEPTNO and JOB and to display them as follows:

JOB          10      20      30   TOTAL
ANALYST             6000            6000
CLERK      1300     1900     950    4150
MANAGER    2450     2975    2850    8275
PRESIDENT  5000                     5000
SALESMAN                    5600    5600
Total      8750    10875    9400   29025

 

  • The cells in the body of the table contain subtotals by DEPTNO and JOB;
  • the right-hand column contains subtotals by JOB;
  • the bottom row contains subtotals by DEPTNO;
  • and the bottom-right cell contains the grand total.

Getting all the totals

The CUBE extension to GROUP BY is ideal for this kind of cross-tabular report: it will generate everything we need with one SELECT and one table scan.

select deptno, job, sum(sal) sal
from scott.emp
group by cube(deptno, job);
DEPTNO  JOB         SAL
                    29025
        CLERK        4150
        ANALYST      6000
        MANAGER      8275
        SALESMAN     5600
        PRESIDENT    5000
    10               8750
    10  CLERK        1300
    10  MANAGER      2450
    10  PRESIDENT    5000
    20              10875
    20  CLERK        1900
    20  ANALYST      6000
    20  MANAGER      2975
    30               9400
    30  CLERK         950
    30  MANAGER      2850
    30  SALESMAN     5600

 

Formatting the output

Some folks whose opinion I respect say that formatting reports should be done outside of SQL. I agree in principle, but that didn’t stop me from answering the question using the PIVOT clause. As always with this clause, you have to know in advance how many columns you want to end up with!

The tricky part of this particular pivoting operation is handling NULLs correctly. For one thing, the JOB subtotals need to be pivoted to the rightmost column, but they have no DEPTNO value to pivot to. For another thing, the input might have NULLs in the JOB or DEPTNO columns, so I need a reliable way to identify the output rows that have subtotals.

I use the GROUPING() function to identify the subtotals:

  • When GROUPING(DEPTNO) is equal to 1, the row contains a JOB subtotal (or the grand total) and I have to assign an arbitrary DEPTNO value so I can pivot.
  • When GROUPING(JOB) is equal to 1, the row contains a DEPTNO subtotal (or the grand total) so after pivoting I output ‘Total’ in the JOB column of the last row.
select case gr_job when 1 then 'Total' else job end job,
  "10", "20", "30", "Total"
from (
  select case grouping(deptno) when 1 then -1 else deptno end deptno,
    job, grouping(job) gr_job, sum(sal) sal
  from scott.emp
  group by cube(deptno, job)
) 
pivot(  
  sum(sal) for deptno in (10, 20, 30, -1 as "Total")  
)  
order by gr_job, job;
JOB          10      20      30   Total
ANALYST             6000            6000
CLERK      1300     1900     950    4150
MANAGER    2450     2975    2850    8275
PRESIDENT  5000                     5000
SALESMAN                    5600    5600
Total      8750    10875    9400   29025

 

New, Improved IN Lists!

In SQL, the “varying IN list” problem comes up constantly and there are many published solutions. My new favorite solution is inspired by the video Synthesizing rows inside Oracle by Connor McDonald (start at the 6 minute mark).

The idea is to extract a series of delimited values from a string, and to present those values one row at a time. For example, the string
A,BB,CCC,DDDD,EEEEE

would become:
A
BB
CCC
DDDD
EEEEE

I’m not going to bother you with the other solutions, except for two remarks:

  1. Tom Kyte’s classic solution with SUBSTR() and INSTR() is a bit complicated and doesn’t work with strings longer than 3998 bytes.
  2. The popular REGEXP_SUBSTR() solution is a bit complicated and the REGEXP* functions use rather a lot of CPU.

Connor’s idea is efficient, works with the longest strings and seems slightly less complicated to me. However, that’s not going to stop me from explaining my variant in excruciating detail!

Generating rows

Like Connor, I’ll start with a reminder of how easy and efficient it is to generate rows from DUAL using CONNECT BY:

select level from dual connect by level <= 5;
LEVEL
1
2
3
4
5

 

Extracting a value

So I have 5 rows, big deal… Now I need to put the right value in each row, which SUBSTR() can do. To extract the third value for example:

var txt varchar2(20);
exec :txt := 'A,BB,CCC,DDDD,EEEEE';

select :txt "Input",
  6 "Position",
  3 "Output length",
  substr(:txt, 6, 3) CCC
from dual;
Input                Position  Output length  CCC
A,BB,CCC,DDDD,EEEEE         6              3  CCC

 

Locating the delimiters

As you probably know, or can guess, the key to getting the right positions is to locate the delimiters. This is a job for INSTR():

select :txt "Input",
  ',' "Substring",
  1 "Position",
  2 "Occurence",
  instr( :txt, ',', 1, 2 ) "Pos after BB"
from dual;
Input                Substring  Position  Occurrence  Pos after BB
A,BB,CCC,DDDD,EEEEE  ,                 1           2             5

 

Now let’s locate all the commas:

select instr(:txt, ',', 1, level) pos
from dual
connect by level <= 5;
POS
2
5
9
14
0

 

Do you see how clever Connor is there? There are only 4 commas, but 5 rows are needed. That last row contains a zero because INSTR() did not find a fifth comma. I can pretend that zero is a “virtual comma” before the first value.

[Update 2016-06-23: changed the CONNECT BY to avoid repeated unnecessary calls to INSTR().]

Of course, I mustn’t hard-code that “5” at the end of the statement. Instead,

  • I’ll calculate the length of the input;
  • I’ll calculate the length of the input without commas;
  • by subtraction I get the number of commas,
  • then I add 1 to get the number of output rows.
select instr(:txt, ',', 1, level) pos
from dual
connect by
  level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1;
POS
2
5
9
14
0

 

Putting it all together

Now that I have all my commas, with one “virtual comma” at position zero, I just add one to the comma positions and I have the starting point of each value. The length is equal to the next position, minus this position, minus 1.

select substr(
  :txt,
  pos + 1,
  lead(pos) over(order by pos) - pos - 1
) subs
from (
  select instr(:txt, ',', 1, level) pos
  from dual
  connect by
    level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1
);
SUBS
A
BB
CCC
DDDD

 

What happened on that last line? The LEAD() function tried to access a row that doesn’t exist, so it returned NULL. Fortunately, we can make LEAD() return another value in that situation:

select substr(
  :txt,
  pos + 1,
  lead(pos, 1, 4000) over(order by pos) - pos - 1
) subs
from (
  select instr(:txt, ',', 1, level) pos
  from dual
  connect by
    level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1
);
SUBS
A
BB
CCC
DDDD
EEEEE
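
Before concluding, here is a sketch that ties this back to the post title: using the technique as a varying IN list. I use SCOTT.EMP purely as an example, with a hypothetical bind variable :joblist holding a comma-separated list of job names:

var joblist varchar2(100);
exec :joblist := 'CLERK,ANALYST';

select ename, job from scott.emp
where job in (
  select substr(
    :joblist,
    pos + 1,
    lead(pos, 1, 4000) over(order by pos) - pos - 1
  ) subs
  from (
    select instr(:joblist, ',', 1, level) pos
    from dual
    connect by
      level <= length(:joblist) - nvl(length(replace(:joblist, ',', '')), 0) + 1
  )
);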

 

Conclusion

This to me is an elegant and efficient solution that demonstrates at least 3 clever things we can do with Oracle SQL:

  1. Use CONNECT BY LEVEL to generate as many rows as we want;
  2. Use the zero, returned by INSTR() when it finds nothing, as a “virtual comma” before the first value;
  3. Use the “default” parameter of LEAD() to get a “virtual comma” after the last value.

Kim Berg Hansen on “Use Cases of Row Pattern Matching in Oracle 12c”

As I write this, I am listening to Kim Berg Hansen explain the MATCH_RECOGNIZE clause. He was kind enough to give me credit for some of the examples and mention this blog.

In addition to my blog posts on the subject, you may enjoy my presentation on SlideShare. Please download it to see the animations!

If you have questions arising from the presentation, please add a comment here.

Bravo to Kim for his remarkably interactive webinar!

Chunking tables 8: second solution

Here is the second complete solution to the problem of dividing a table into chunks of equal size. This is the solution whose first version was marked “correct” on OTN. It is the basis for what Bryn Llewellyn calls “Approx_Method_Sql_Ashton_2” in his white paper.


Overview

  1. extents_data: get all extents and data object ids for the table
  2. extents_with_sums:
    1. treat all the table blocks as if they were one contiguous set
    2. number all the blocks in ascending sequence
    3. calculate total number of blocks
  3. filtered_extents: filter out the extents that do not contain a chunk boundary
  4. expanded_extents: expand every row that crosses chunk boundaries so that there will be one row per chunk
  5. rowids:
    1. calculate each ROWID
    2. group by chunk.

filtered_extents

with extents_data as (
  select /*+ qb_name(extents_data) */
    o.data_object_id, e.file_id, e.block_id, e.blocks
  from dba_extents e
  join all_objects o
  on e.owner = o.owner
    and e.segment_name = o.object_name
    and e.segment_type = o.object_type
    and decode(e.partition_name, o.subobject_name, 0, 1) = 0
  where e.segment_type like 'TABLE%'
    and e.owner = :owner
    and e.segment_name = :table_name
)
, extents_with_sums as (   
  select /*+ qb_name(extents_with_sums) */
    sum(blocks) over() as total_blocks, 
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) - blocks as first_extent_block,   
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) as next_extent_block, 
    e.* 
  from extents_data e 
) 
, filtered_extents as (  
  select /*+ qb_name(filtered_extents)*/ * from (
    select
      width_bucket(first_extent_block-1, 0, total_blocks, :num_chunks)
        as prev_chunk, 
      width_bucket(first_extent_block, 0, total_blocks, :num_chunks)
        as first_chunk, 
      width_bucket(next_extent_block-1, 0, total_blocks, :num_chunks)
        as last_chunk,  
      width_bucket(next_extent_block, 0, total_blocks, :num_chunks)
        as next_chunk, 
      e.* 
    from extents_with_sums e   
  )
  where prev_chunk < next_chunk
)

The first two named subqueries are identical to the first solution. The filtered_extents subquery differs only by calculating more columns, since I will need them all later:

  • PREV_CHUNK is the chunk to which the end of the previous extent belongs.
  • FIRST_CHUNK and LAST_CHUNK are the chunks containing the first and last block of the current extent.
  • NEXT_CHUNK is the chunk to which the start of the next extent belongs.
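
As a quick reminder of how WIDTH_BUCKET() assigns block numbers to chunks, here is a small illustration that is not part of the solution: with 100 blocks numbered 0 through 99 and 4 chunks, blocks 0 to 24 land in chunk 1, 25 to 49 in chunk 2, and so on.

select n, width_bucket(n, 0, 100, 4) chunk
from (
  select column_value n
  from table(sys.odcinumberlist(0, 24, 25, 49, 50, 74, 75, 99))
);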

expanded_extents

, expanded_extents as (
  select /*+ qb_name(expanded_extents) */
    first_chunk + level - 1 as chunk,
    prev_chunk, next_chunk, data_object_id, file_id,
    block_id, total_blocks, first_extent_block
  from filtered_extents
  connect by first_extent_block = prior first_extent_block
    and prior sys_guid() is not null
    and first_chunk + level - 1 <= last_chunk
)

This subquery uses the method explained in my previous post to expand each row that crosses chunk boundaries. The only difference among the expanded rows is the chunk number assigned to each.
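
As a reminder of how that expansion works, here is a toy sketch with made-up numbers, not part of the solution: an extent spanning chunks 1 through 3 becomes three rows, one per chunk, while an extent contained entirely in chunk 5 stays as a single row.

with toy_extents as (
  select 101 first_extent_block, 1 first_chunk, 3 last_chunk from dual
  union all
  select 202, 5, 5 from dual
)
select first_extent_block, first_chunk + level - 1 as chunk
from toy_extents
connect by first_extent_block = prior first_extent_block
  and prior sys_guid() is not null
  and first_chunk + level - 1 <= last_chunk;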

rowids

, rowids as (
  select /*+ qb_name(rowids) */ chunk,
    case when chunk > prev_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( (chunk-1) * total_blocks / :num_chunks - 1 ) + 1 - first_extent_block,
      0
    ) 
    end as start_rowid,
    case when chunk < next_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( chunk * total_blocks / :num_chunks - 1 )  - first_extent_block,
      32767
    ) 
    end as end_rowid
  from expanded_extents
)
select chunk, min(start_rowid) start_rowid, max(end_rowid) end_rowid
from rowids
group by chunk;

This subquery calculates the starting and ending rowids for each chunk. It is much more compact than the way I did this previously.

By comparing the current chunk number with prev_chunk and next_chunk, I can tell if I should calculate a starting rowid, an ending rowid or both. To calculate the block number,

  • I start from the block_id of the extent
  • I add the block number of the target chunk boundary
  • then I subtract the number I assigned previously to the first block of the extent.

At this point, each chunk may be present in one row or two. A simple GROUP BY gives me one row per chunk.
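
To make the block arithmetic concrete, here is a hypothetical worked example: with 1000 total blocks and 4 chunks, chunk 2 starts at overall block CEIL((2-1)*1000/4 - 1) + 1 = 250 in my zero-based numbering. If that block falls in an extent whose first block I numbered 240 and whose BLOCK_ID is 128, then the block passed to DBMS_ROWID.ROWID_CREATE is 128 + 250 - 240 = 138:

select 128 + ceil((2-1) * 1000 / 4 - 1) + 1 - 240 as start_block
from dual;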

Evolution

My version of this solution has changed over time, especially while writing this series.

  • I now make sure my numbering scheme always starts with zero.
  • I have also incorporated WIDTH_BUCKET() and changed the way I calculate chunk boundaries to be coherent.
  • Above all, I have simplified the code that says when to calculate the rowids, so that the KEEP (DENSE_RANK...) functions are no longer necessary. I apologize for the earlier complicated code.

By the way, here is the code all together if anyone wants to copy and paste.

with extents_data as (
  select /*+ qb_name(extents_data) */
    o.data_object_id, e.file_id, e.block_id, e.blocks
  from dba_extents e
  join all_objects o
  on e.owner = o.owner
    and e.segment_name = o.object_name
    and e.segment_type = o.object_type
    and decode(e.partition_name, o.subobject_name, 0, 1) = 0
  where e.segment_type like 'TABLE%'
    and e.owner = :owner
    and e.segment_name = :table_name
)
, extents_with_sums as (   
  select /*+ qb_name(extents_with_sums) */
    sum(blocks) over() as total_blocks, 
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) - blocks as first_extent_block,   
    sum(blocks) over(
      order by data_object_id, file_id, block_id
    ) as next_extent_block, 
    e.* 
  from extents_data e 
) 
, filtered_extents as (  
  select /*+ qb_name(filtered_extents)*/ * from (
    select
      width_bucket(first_extent_block-1, 0, total_blocks, :num_chunks)
        as prev_chunk, 
      width_bucket(first_extent_block, 0, total_blocks, :num_chunks)
        as first_chunk, 
      width_bucket(next_extent_block-1, 0, total_blocks, :num_chunks)
        as last_chunk,  
      width_bucket(next_extent_block, 0, total_blocks, :num_chunks)
        as next_chunk, 
      e.* 
    from extents_with_sums e   
  )
  where prev_chunk < next_chunk
)
, expanded_extents as (
  select /*+ qb_name(expanded_extents) */
    first_chunk + level - 1 as chunk,
    prev_chunk, next_chunk, data_object_id, file_id,
    block_id, total_blocks, first_extent_block
  from filtered_extents
  connect by first_extent_block = prior first_extent_block
    and prior sys_guid() is not null
    and first_chunk + level - 1 <= last_chunk
)
, rowids as (
  select /*+ qb_name(rowids) */ chunk,
    case when chunk > prev_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( (chunk-1) * total_blocks / :num_chunks - 1 ) + 1 - first_extent_block,
      0
    ) 
    end as start_rowid,
    case when chunk < next_chunk then dbms_rowid.rowid_create(
      1, data_object_id, file_id,
      block_id + CEIL( chunk * total_blocks / :num_chunks - 1 )  - first_extent_block,
      32767
    ) 
    end as end_rowid
  from expanded_extents
)
select chunk, min(start_rowid) start_rowid, max(end_rowid) end_rowid
from rowids
group by chunk;