In my previous post I used
ora:tokenize to split a comma delimited string. Now I’ll apply that technique to multiple rows, and show that it’s faster than other methods.
In my tests, I configure the length of the substring, the number of substrings per row and the total number of rows I should get as output. Each input string is unique because it starts with the ID of the row: this way I avoid any caching that might reduce the number of function calls.
drop table t purge; create table t cache as with parms as ( select 9 str_len, 5 num_subs, 100000 num_rows from dual ) , str_row as ( select listagg(n,',') within group(order by n) str from ( select lpad(level+1,str_len,'0') n from parms connect by level <= num_subs-1 ) ) select level id, lpad(level,str_len,'0') ||','||str str from parms, str_row connect by level <= num_rows/num_subs; select * from t where id <= 11;
CACHE keyword when I create the table. Before my tests, I access the entire table to make sure it is all in the buffer cache.
The “substr+instr” technique
This is the technique from my “New, Improved IN Lists” post. All I need to do is apply it to multiple rows.
One way to do that is to use the 12c
LATERAL() clause. If you are not yet in 12c, try
TABLE(CAST(MULTISET(...) AS SYS.ODCIVARCHAR2LIST))
select a.id, b.subs from t a, lateral( select substr( str, pos + 1, lead(pos,1,4000) over(order by pos) - pos - 1 ) subs from ( select instr(str, ',', 1, level) pos from dual connect by level <= length(str) - nvl(length(replace(str, ',', '')), 0) + 1 ) ) b;
The “tokenize” technique
This one is easy to adapt to multiple rows:
select id, subs from t, xmltable( 'if (contains($X,",")) then ora:tokenize($X,"\,") else $X' passing str as X columns subs varchar2(4000) path '.' );
[Update 2016-08-02: in a comment, Todd Hershiser points out that the second parameter in ora:tokenize is a regex expression. In order to use a regex “metacharacter” like “|” as a delimiter, I need to escape it with a backslash. I decided to put the backslash in everywhere since it doesn’t do any harm.
On the other hand, if the delimiter is ‘&’ then this solution cannot be used.]
The “regexp_substr” technique
This technique is fairly popular, no doubt because it is concise. For multiple rows, I use the “
CONNECT BY ... PRIOR SYS_GUID()” technique that I explained in Chunking tables 7: prior sys_guid().
select id, regexp_substr (str, '[^,]+', 1, level) subs from t connect by level <= length (regexp_replace (str, '[^,]+')) + 1 and id = prior id and prior sys_guid() is not null;
As you can see here, the “substr+instr” solution is slightly better than “tokenize” when there are fewer than 8 substrings per string. As the number of substrings increases, the “tokenize” solution scales much better. As for the “regexp_substr” solution, it is on average 20 times slower than “tokenize”.