Splitting Strings: a New Champion!

My last post on splitting strings conceded that “a combined SQL + PL/SQL solution beats the best pure SQL solution.” Wrong! I didn’t try JSON_TABLE.

Pretend it’s an array

It’s not hard to reformat a string like

'a,b,c,d,e,"'

to be a JSON array:

select replace(json_array('a,b,c,d,e,"'), ',', '","') jstr
from dual;

JSTR
--------------------------
["a","b","c","d","e","\""]

Notice that the double quote in the data itself is properly escaped by the call to JSON_ARRAY. Then I turn the string into a series of array elements by replacing each comma with "," (a closing double quote, the comma, then an opening double quote). Once all that is done, JSON_TABLE will split it like this:

select subs from json_table(
  replace(json_array('a,b,c,d,e,"'), ',', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);

SUBS
----
a
b
c
d
e
"

So the escaped data is automatically unescaped. Now for speed, here is my test harness:

[Update 2022-12-27: the code for the test harness only worked if you first ran the code from a previous post. It should now work all by itself. 2023-01-02: the test harness now correctly puts JSON_ARRAY around the input string.]

set serveroutput on

drop table t purge;

create table t(id number, str varchar2(4000)) cache nologging;

create or replace function string_tokenize(
  p_string in varchar2,
  p_delim in varchar2
)
return sys.odcivarchar2list pipelined
as
  i_prev_pos integer := 1;
  i_pos integer;
  i_max_pos integer := length(p_string) + 1;
  i_delim_length integer := length(p_delim);
begin
  loop
    i_pos := instr(p_string, p_delim, i_prev_pos);
    if i_pos = 0 then
      i_pos := i_max_pos;
    end if;
    pipe row(substr(p_string, i_prev_pos, i_pos - i_prev_pos));
    exit when i_pos = i_max_pos;
    i_prev_pos := i_pos + i_delim_length;
  end loop;
  return;
end string_tokenize;
/

declare
  l_num number;
  l_timestamp timestamp;
  l_plsql_secs number;
  l_JSON_secs number;
  l_num_substrings number := 10;
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
begin
  insert into t
  select level, to_char(level,'fm000000000')||',000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010'
  from dual
  connect by level <= 10000;
  commit;
  dbms_output.put_line('Substrings' || chr(9) || 'JSON' || chr(9) || 'PL/SQL');
  for i in 1..10 loop
    select count(*) into l_num from t;
      
    l_timestamp := localtimestamp;
    select count(column_value) into l_num from (
      select id, column_value from t, table(string_tokenize(str, ','))
    );
    l_plsql_secs := extract(second from localtimestamp - l_timestamp);
      
    l_timestamp := localtimestamp;
    select count(subs) into l_num from (
      select id, subs from t, json_table(
        replace(json_array(str), ',', '","'), '$[*]' columns(
          subs varchar2(99) path '$'
        )
      )
    );
    l_JSON_secs := extract(second from localtimestamp - l_timestamp);
    dbms_output.put_line(l_num_substrings || chr(9) || l_JSON_secs || chr(9) || l_plsql_secs);
    update t set str =
    str || ',000000001,000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010';
    l_num_substrings := l_num_substrings + 10;
    commit;
  end loop;
end;
/

Notice that I keep the same number of input rows here and increase the number of substrings per row.

PL/SQL does great, but JSON_TABLE appears to beat it.
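
The same trick should extend to delimiters other than the comma: JSON_ARRAY escapes the data no matter what the delimiter is, so only the REPLACE call changes. Here is a minimal sketch, using a colon-delimited input of my own invention rather than the test data above:

```sql
-- Sketch: splitting a colon-delimited string with the same technique.
-- JSON_ARRAY escapes any reserved JSON characters in the data;
-- only the delimiter argument to REPLACE changes.
select subs from json_table(
  replace(json_array('a:b:c:"'), ':', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);
```

One caveat: the delimiter must not be a character that JSON_ARRAY itself escapes (such as the double quote or the backslash), since the REPLACE would then corrupt the escape sequences.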


3 thoughts on “Splitting Strings: a New Champion!”

    Hi Marc,

      Good to hear from you, and thanks for your kind words.

      If you didn’t notice, you are on my blogroll so it stands to reason that I have already read the post you reference ;) I’ll be mentioning a very similar use case in my next post.

      In my case I can call JSON_ARRAY for the string itself, which escapes the reserved characters. See for example the difference between

      with data as (
      select 'a,\' x from dual
      )
      select * from data, json_table(
        '[["' || replace(x, ',', '","') || '"]]',
        '$[*]' columns(
          a varchar2(128) path '$[0]',
          b varchar2(128) path '$[1]'
        )
      );

      and

      with data as (
      select 'a,\' x from dual
      )
      select * from data, json_table(
        '[' || replace(json_array(x), ',', '","') || ']' ,
        '$[*]' columns(
          a varchar2(128) path '$[0]',
          b varchar2(128) path '$[1]'
        )
      );

      Best regards, Stew

