My last post on splitting strings conceded that “a combined SQL + PL/SQL solution beats the best pure SQL solution.” Wrong! I didn’t try JSON_TABLE.
Pretend it’s an array
It’s not hard to reformat a string like
'a,b,c,d,e,"'
into a JSON array:
select replace(json_array('a,b,c,d,e,"'), ',', '","') jstr
from dual;

JSTR
--------------------------
["a","b","c","d","e","\""]
Notice that the double quote in the data itself is properly escaped by the call to JSON_ARRAY. Then I turn the string into a series of array elements by putting double quotes around each comma. Once all that is done, JSON_TABLE will split it like this:
select subs from json_table(
  replace(json_array('a,b,c,d,e,"'), ',', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);

SUBS
----
a
b
c
d
e
"
So the escaped data is automatically unescaped. Now for speed, here is my test harness:
[Update 2022-12-27: the code for the test harness only worked if you first ran the code from a previous post. It should now work all by itself. 2023-01-02: the test harness now correctly puts JSON_ARRAY around the input string.]
set serveroutput on

drop table t purge;
create table t(id number, str varchar2(4000)) cache nologging;

create or replace function string_tokenize(
  p_string in varchar2,
  p_delim in varchar2
) return sys.odcivarchar2list pipelined as
  i_prev_pos integer := 1;
  i_pos integer;
  i_max_pos integer := length(p_string) + 1;
  i_delim_length integer := length(p_delim);
begin
  loop
    i_pos := instr(p_string, p_delim, i_prev_pos);
    if i_pos = 0 then
      i_pos := i_max_pos;
    end if;
    pipe row(substr(p_string, i_prev_pos, i_pos - i_prev_pos));
    exit when i_pos = i_max_pos;
    i_prev_pos := i_pos + i_delim_length;
  end loop;
  return;
end string_tokenize;
/
declare
  l_num number;
  l_timestamp timestamp;
  l_plsql_secs number;
  l_json_secs number;
  l_num_substrings number := 10;
  -- helper for running dynamic SQL; not called in this version
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
begin
  -- 10,000 rows, each starting with 10 comma-separated substrings
  insert into t
  select level, to_char(level,'fm000000000')
    || ',000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010'
  from dual
  connect by level <= 10000;
  commit;
  dbms_output.put_line('Substrings' || chr(9) || 'JSON' || chr(9) || 'PL/SQL');
  for i in 1..10 loop
    -- scan the table once before timing
    select count(*) into l_num from t;
    -- time the pipelined PL/SQL function
    l_timestamp := localtimestamp;
    select count(column_value) into l_num
    from (
      select id, column_value
      from t, table(string_tokenize(str, ','))
    );
    l_plsql_secs := extract(second from localtimestamp - l_timestamp);
    -- time the JSON_TABLE solution
    l_timestamp := localtimestamp;
    select count(subs) into l_num
    from (
      select id, subs
      from t, json_table(
        replace(json_array(str), ',', '","'),
        '$[*]' columns( subs varchar2(99) path '$' )
      )
    );
    l_json_secs := extract(second from localtimestamp - l_timestamp);
    dbms_output.put_line(l_num_substrings || chr(9) || l_json_secs || chr(9) || l_plsql_secs);
    -- same number of rows, 10 more substrings per row each time around
    update t set str = str
      || ',000000001,000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010';
    l_num_substrings := l_num_substrings + 10;
    commit;
  end loop;
end;
/
Notice that I keep the same number of input rows here and increase the number of substrings per row.
PL/SQL does great, but JSON_TABLE appears to beat it.
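By the way, nothing here is specific to commas. The same REPLACE works for other delimiters, as long as the delimiter is not a character that JSON_ARRAY itself escapes (such as the double quote or the backslash). A minimal sketch with a semicolon-delimited string (my own variation, not part of the timed tests):

-- semicolon variant: json_array escapes the data, then each ';' becomes '","'
select subs from json_table(
  replace(json_array('a;b;c'), ';', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);

SUBS
----
a
b
c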
Comments

Hi Stew,
If you’re interested, I blogged about a similar use of JSON_TABLE a couple of years ago, for parsing CSV structures:
https://odieweblog.wordpress.com/2016/04/22/simple-csv-parsing-using-xmltable-or-json_table/
BTW, congrats on your ACE nomination!
Hi Marc,
Good to hear from you, and thanks for your kind words.
In case you didn’t notice, you are on my blogroll, so it stands to reason that I have already read the post you reference ;) I’ll be mentioning a very similar use case in my next post.
In my case I can call JSON_ARRAY on the string itself, which escapes the reserved characters. See for example the difference between the output of JSON_ARRAY, where the double quote is escaped, and a naive concatenation, where it is not.
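For instance, something like this shows the difference (the sample string here is illustrative):

-- JSON_ARRAY escapes the double quote, keeping the JSON valid
select replace(json_array('a,b,"'), ',', '","') from dual;
-- ["a","b","\""]

-- naive concatenation leaves the double quote unescaped: invalid JSON
select '["' || replace('a,b,"', ',', '","') || '"]' from dual;
-- ["a","b","""]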
Best regards, Stew