Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 8000 times faster :-)
In a recent ODC thread, the question was how to split out comma-separated values from two CLOBs, one containing “item ids” and the other “item keys”. The ids and the keys had to be matched by position: the first id with the first key, the second id with the second key, and so on. Then each id/key pair was to be inserted as a row into a target table.
With 50,000 ids of 7 characters each and 50,000 keys of 172 characters each, the current processing takes two hours.
First I’ll create a table with the two CLOBs.
--drop table t purge;
create table t as
select substr( xmlcast( xmlagg(
         xmlelement(e, ',' || to_char(level,'fm0000009'))
         order by level
       ) as clob ), 2 ) i_item_id,
       substr( xmlcast( xmlagg(
         xmlelement(e, ',' || rpad(to_char(level,'fm0000009'),172,'*'))
         order by level
       ) as clob ), 2 ) i_item_key
from dual
connect by level <= 50000;
Splitting CLOB strings
Instead of seeking a “perfect” custom-made solution, I just put together two functions I have already written about:
- The PIPE_CLOB function: a pipelined table function that reads a CLOB and outputs up to 4000 bytes at a time, respecting a delimiter.
- The STRING_TOKENIZE function: a pipelined table function that splits a VARCHAR2 string according to a delimiter.
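The real implementations are in the earlier posts; just to make the interfaces concrete, here are minimal hypothetical sketches of the two pipelined functions, assuming SYS.ODCIVARCHAR2LIST as the return type and (as in this use case) at least one delimiter per 4000-byte window:

```sql
-- Hypothetical sketch only, not the actual published code.
-- Splits a VARCHAR2 string on a delimiter, one token per row.
create or replace function string_tokenize(
  p_string in varchar2,
  p_delim  in varchar2
) return sys.odcivarchar2list pipelined as
  l_from pls_integer := 1;
  l_next pls_integer;
begin
  loop
    l_next := instr(p_string, p_delim, l_from);
    if l_next = 0 then
      pipe row(substr(p_string, l_from));  -- last token
      exit;
    end if;
    pipe row(substr(p_string, l_from, l_next - l_from));
    l_from := l_next + length(p_delim);
  end loop;
  return;
end;
/

-- Hypothetical sketch only: reads a CLOB and pipes out chunks of at most
-- p_limit characters, cutting each chunk at the last delimiter so no
-- token is split across chunks. Assumes every full-size window contains
-- at least one delimiter (true here, since tokens are at most 173 bytes).
create or replace function pipe_clob(
  p_clob  in clob,
  p_limit in integer,
  p_delim in varchar2
) return sys.odcivarchar2list pipelined as
  l_pos pls_integer := 1;
  l_len pls_integer := dbms_lob.getlength(p_clob);
  l_buf varchar2(4000);
  l_cut pls_integer;
begin
  while l_pos <= l_len loop
    l_buf := dbms_lob.substr(p_clob, p_limit, l_pos);
    if l_pos + p_limit <= l_len then
      l_cut := instr(l_buf, p_delim, -1);       -- last delimiter in the window
      l_buf := substr(l_buf, 1, l_cut - 1);     -- keep whole tokens only
      l_pos := l_pos + l_cut;                   -- resume after the delimiter
    else
      l_pos := l_pos + p_limit;                 -- final partial window
    end if;
    pipe row(l_buf);
  end loop;
  return;
end;
/
```

The point of the two-stage design is that the expensive CLOB access happens only once per 4000-byte chunk, while the cheap VARCHAR2 tokenizing does the fine-grained work.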
All I have to do is feed the output from the first function into the second, and voilà!
select b.column_value item_id
from t,
     table(pipe_clob(i_item_id, 4000, ',')) a,
     table(string_tokenize(a.column_value, ',')) b;
Combining the two CLOBs
Remember, the requirement was to combine the values from both CLOBs by position, in order to insert them into a target table. Since the functions return the values in order, I just number the values with ROWNUM and join on that number.
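In miniature, the pattern looks like this (a throwaway illustration using SYS.ODCIVARCHAR2LIST collections in place of the tokenized CLOBs):

```sql
-- Illustration only: number two ordered row sources with ROWNUM,
-- then pair them up by joining on that number.
with i as (select rownum rn, column_value v from table(sys.odcivarchar2list('A','B','C'))),
     k as (select rownum rn, column_value v from table(sys.odcivarchar2list('k1','k2','k3')))
select i.v item_id, k.v item_key
from i join k using(rn);
```

Each row source is numbered independently, so the join pairs the first value of one with the first value of the other, and so on.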
--drop table u purge;
create table u as
with item_ids as (
  select rownum rn, b.column_value item_id
  from t,
       table(pipe_clob(i_item_id, 4000, ',')) a,
       table(string_tokenize(a.column_value, ',')) b
)
, item_keys as (
  select rownum rn, b.column_value item_key
  from t,
       table(pipe_clob(i_item_key, 4000, ',')) a,
       table(string_tokenize(a.column_value, ',')) b
)
select item_id, item_key
from item_ids
join item_keys using(rn);

Table U created.
Elapsed: 00:00:00.879
That’s right, processing time went from two hours to less than one second.
Now, I suspect that the questioner’s code is doing more than just a simple insert, but until he or she provides more details about the requirement, I get to maintain my claim:
8000 times faster!
(which I hope applies only to my code, not to the speed at which time flies…)