Hierarchical JSON

At last, a use case where PL/SQL is faster than SQL: avoiding unnecessary ordering!

In the last few years, different solutions have been found to represent hierarchical data in JSON format.

I won’t go into the SQL solutions in detail. Basically, there are three stages:

  1. Get the data in hierarchical order and return the order, the level, and the data as a JSON object.
  2. Using the LEAD() and LAG() functions, determine whether the current JSON object is a child, a parent or a sibling and apply the appropriate JSON “glue”.
  3. Aggregate the result into a CLOB, preserving the original order.

Notice that stages 2 and 3 require the data to be in the order established by stage 1. In the SQL solutions, the execution plan shows that unnecessary sorts were done, sometimes requiring the use of TEMP space: yuck!
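
To make those stages concrete, here is a hedged sketch of one possible SQL solution (illustrative only: it mirrors the “glue” logic of the PL/SQL function shown further down, not the exact statements I benchmarked):

with stage1 as (
  -- stage 1: hierarchical order, level and data as a JSON object
  select rownum rn, level lvl, json_object(empno, ename) jso
  from emp
  start with empno = 7566
  connect by mgr = prior empno
)
-- stage 2: choose the "glue" by comparing the previous and next levels
select case
         when lag(lvl) over (order by rn) is null then null               -- first row
         when lvl - lag(lvl) over (order by rn) = 1 then ',"children":['  -- child
         when lvl > 1 then ','                                            -- sibling
       end
       || rtrim(jso, '}')
       || rpad('}', (lvl - coalesce(lead(lvl) over (order by rn), 1)) * 2 + 1, ']}')
       piece
from stage1
order by rn;

Stage 3 would then aggregate PIECE into a CLOB while preserving the RN order (with XMLAGG or JSON_ARRAYAGG, for example); the analytic functions and the final ordered aggregation are where the extra sorts show up.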

I finally realised that a PL/SQL function could read the output from stage 1 and, for each input row, add the JSON “glue” and append the result to the CLOB that the function returns. No extra sorting, and it runs about 40% faster than the best SQL solution. Let me explain a few details:

  • The main input is a CURSOR expression: this is not a literal, but a real live SELECT statement that can include bind variables. It returns only the level and a JSON object with the desired data, so it is quite generic. The ORDER SIBLINGS BY clause is optional.
  • The input parameter is a strongly typed REF CURSOR, which allows it to be fetched into a table of PL/SQL records.
  • The input is fetched using BULK COLLECT with LIMIT, which uses little memory. Special handling is needed, because I need to know the levels of the rows immediately preceding and following the row being processed.
  • If a JSON object is a parent, I add a name / value pair where the value is an array. The name is by default ‘children’ but it is configurable by an optional second parameter.
  • Concatenating text to a CLOB is a costly operation. To speed it up, I fill up a VARCHAR2 buffer, then concatenate the buffer to the CLOB.
  • Use PLSQL_OPTIMIZE_LEVEL=3 in order to “inline” the calls to APPEND_JSO. This shaves off about 0.2 seconds per million rows processed.
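
For example, the setting could be applied at the session level before compiling the package (a hedged illustration; an ALTER PACKAGE ... COMPILE with the same setting, or PRAGMA INLINE inside the body, would also do the job):

-- assumption: compile with optimization level 3 so the APPEND_JSO calls get inlined
alter session set plsql_optimize_level = 3;

Here is the package specification and body:
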
create or replace package hier_to_clob is
  type t_hier_rec is record(
    lvl pls_integer,
    jso varchar2(4000)
  );
  type tt_hier_rec is table of t_hier_rec;
  type t_hier_rc is ref cursor return t_hier_rec;
  function get(
    p_hier_rc in t_hier_rc,
    p_array_name varchar2 default 'children',
    p_limit integer default 500
  ) return clob;
end hier_to_clob;
/
create or replace package body hier_to_clob is
  function get(
    p_hier_rc in t_hier_rc,
    p_array_name varchar2 default 'children',
    p_limit integer default 500
  ) return clob is
    l_array_name varchar2(130) := ',"'||p_array_name||'":[';
    lt_hier_rec tt_hier_rec;
    l_hier_rec_prev t_hier_rec := t_hier_rec(0, null);
    l_lvl_prev2 pls_integer := 0;
    l_clob clob;
    l_buffer varchar2(32767) := null;
    l_buflen pls_integer := 0;
    do_prev boolean:= false;
    procedure append_jso(
      p_jso varchar2,
      p_lvl_prev pls_integer,
      p_lvl pls_integer,
      p_lvl_next pls_integer
    ) is
      l_jso varchar2(4000);
    begin
      l_jso :=
        case
          when p_lvl_prev = 0 then null                  -- very first row: no glue
          when p_lvl - p_lvl_prev = 1 then l_array_name  -- child: open the "children" array
          when p_lvl > 1 then ','                        -- sibling: separate from the previous object
        end ||
        rtrim(p_jso, '}') ||                             -- leave the object open for possible children
        rpad('}', (p_lvl - p_lvl_next) * 2 + 1, ']}');   -- close the object plus any finished arrays / objects

      -- flush the buffer to the CLOB only when the next piece would overflow it
      if l_buflen + lengthb(l_jso) > 32767 then
        l_clob := l_clob || l_buffer;
        l_buffer := l_jso;
        l_buflen := lengthb(l_buffer);
      else
        l_buffer := l_buffer || l_jso;
        l_buflen := l_buflen + lengthb(l_jso);
      end if;
    end append_jso;
  begin
    loop
      fetch p_hier_rc bulk collect into lt_hier_rec limit p_limit;
      if do_prev then
        append_jso(
          l_hier_rec_prev.jso,
          l_lvl_prev2,
          l_hier_rec_prev.lvl,
          case when lt_hier_rec.count > 0 then lt_hier_rec(1).lvl else 1 end
        );
        do_prev := false;
      end if;
      for i in 1..lt_hier_rec.count-1 loop
        append_jso(
          lt_hier_rec(i).jso,
          case
            when i = 1 then l_hier_rec_prev.lvl
            else lt_hier_rec(i-1).lvl
          end,
          lt_hier_rec(i).lvl,
          lt_hier_rec(i+1).lvl
        );
      end loop;
      if lt_hier_rec.count > 0 then
        l_lvl_prev2 :=
          case lt_hier_rec.count
            when 1 then l_hier_rec_prev.lvl
            else lt_hier_rec(lt_hier_rec.count-1).lvl
          end;
        l_hier_rec_prev := lt_hier_rec(lt_hier_rec.count);
        do_prev := true;
      end if;
      exit when p_hier_rc%notfound;
    end loop;
    if do_prev then
      append_jso(
        l_hier_rec_prev.jso,
        l_lvl_prev2,
        l_hier_rec_prev.lvl,
        1
      );
    end if;
    if l_buflen > 0 then
      l_clob := l_clob || l_buffer;
    end if;
    return l_clob;
  end get;
end hier_to_clob;
/
select hier_to_clob.get(
  cursor(
    select level, json_object(empno, ename)
    from emp
    start with empno = 7566
    connect by mgr = prior empno
  ),
  'grunts'
)
from dual;
/* pretty printing done outside of the function */
{
  "empno" : 7566,
  "ename" : "JONES",
  "grunts" :
  [
    {
      "empno" : 7788,
      "ename" : "SCOTT",
      "grunts" :
      [
        {
          "empno" : 7876,
          "ename" : "ADAMS"
        }
      ]
    },
    {
      "empno" : 7902,
      "ename" : "FORD",
      "grunts" :
      [
        {
          "empno" : 7369,
          "ename" : "SMITH"
        }
      ]
    }
  ]
}
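
The function returns compact JSON; to have the database pretty print it, something like this should work (a hedged example: JSON_QUERY ... RETURNING CLOB PRETTY should be available in 18c and later, and JSON_SERIALIZE is another option in 19c):

select json_query(
  hier_to_clob.get(
    cursor(
      select level, json_object(empno, ename)
      from emp
      start with empno = 7566
      connect by mgr = prior empno
    ),
    'grunts'
  ),
  '$' returning clob pretty
)
from dual;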

My Presentations at #UKOUG #techfest19

I just finished my second presentation at TechFest19 in sunny Brighton, England. The conference this year is great from every point of view: venue, people, content, conversations… Thanks to those who came to my talks for your benevolent attention and excellent questions/remarks!

Both my presentations are now on SlideShare. I beg you to download them so the animations work; if you don’t, you will see a jumble.

Best wishes to all for the end of the conference and the trip home.

Regards, Stew

Make your Data Dance: UNPIVOT, PIVOT and GROUP BY extensions

JSON in Oracle database 18c and 19c

Making Longer Lists

For very long lists, we need the return values to be CLOBs. Alas, LISTAGG can only return VARCHAR2 lists, but XML and JSON aggregate functions can return CLOBs!

As an exercise, I’ll start from ALL_OBJECTS: I want to make a list of all fully qualified object names for each object type. To start with, a simple subquery:

with data as (
  select object_type, owner ||'.'|| object_name obj
  from all_objects
)
select * from data;

OBJECT_TYPE  OBJ
INDEX	     SYS.I_FILE#_BLOCK#
INDEX	     SYS.I_OBJ3
INDEX	     SYS.I_TS1
INDEX	     SYS.I_CON1
...

XML

XML functions work on the XMLTYPE data type, which has practically no length limitations. To make my list, I have to go through three steps:

  1. Make an XML fragment out of each object.
    This step will escape any special characters such as '<' or '>'.
  2. Aggregate the XML fragments.
  3. Extract the text from the aggregated XML data.
    This step will unescape the special characters.

I have to handle the comma that separates the elements of the list myself.

with data as (
  select object_type, owner ||'.'|| object_name obj
  from all_objects
)
select object_type,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(O, ',' || obj)
      order by obj
    ) AS CLOB
  ), 2
) qualified_objects
from data
group by object_type
order by object_type;

On my system with about 78000 objects, this takes 3 seconds and a bit.

JSON

JSON support in the Oracle database is fairly new. For full support of CLOBs, we need version 18c or later (or a patched 12.2, but I don’t know the precise patch level).

The steps are a little different with JSON:

  1. I can aggregate the data directly into a JSON array.
    This escapes any special characters like double quotes, newlines, etc.
  2. To turn the multiple array items into one (which is my list), I replace "," (which separates the array items) with my separator.
  3. I then use JSON_VALUE to extract the list from the array.
    This unescapes any character that needs it.
with data as (
  select object_type, owner ||'.'|| object_name obj
  from all_objects
)
select object_type,
  json_value(
    replace(
      json_arrayagg(obj order by obj returning clob),
      '","',
      ','
    ),
    '$[0]' returning clob
  ) qualified_objects
from data
group by object_type
order by object_type;

Instead of 3+ seconds, this code takes a bit over 1 second, so three times faster.

As a reminder, JSON is great for going the other way: split the items of a list into a table. See Splitting Strings: a New Champion!

My #OUGIreland19 Presentation on JSON

Thanks to those who came to my presentation today at OUG Ireland 2019. As promised, here is the link to the slides.

I had a great time at the conference, meeting new folks and spending time (but never enough) with the less new. The talks I attended were uniformly excellent. Special mention to Matt Mulvaney who again made me marvel at APEX, even though he showed how to go beyond what it can currently do.

Have safe trips home, everybody!

Best regards,
Stew

Extract from CLOB with JSON objects

On AskTOM, Kim Berg Hansen recently used JSON_OBJECT to parse CLOB data in “name=value” format. I added a variant based on my work with CSV data.

Kim decided to use objects instead of arrays for an excellent and important reason:

  • a JSON object is an unordered collection of name/value pairs.
  • a JSON array is an ordered list of values.

CSV data is an ordered list of values, going from left to right, so a JSON array is the obvious choice. The AskTOM question concerned name/value pairs that were not necessarily in the same order and not necessarily all there! A JSON object was the natural choice.
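
A quick illustration of the difference (my own hedged example, not taken from the AskTOM thread):

-- a JSON object: name/value pairs, accessed by name, order not significant
-- a JSON array: values accessed by position
select json_object('b' value 2, 'a' value 1) obj,
       json_array('x', 'y', 'z') arr
from dual;

OBJ              ARR
{"b":2,"a":1}    ["x","y","z"]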

I’m really liking JSON more and more: not only is it simple, but it seems pretty easy to figure out how best to use it for different problems.

Instead of copying everything here, I’m going to be lazy for once and invite you to read my contribution here:

More on JSON_TABLE

You will see some of the same ideas:

  • Use PIPE_CLOB to cut the CLOB into VARCHAR sized bites;
  • Use JSON_ARRAY on the whole bite to escape characters if needed;
  • Use REPLACE to form an overall array, but this time of objects;
  • Then use JSON_TABLE to generate one row per object and one column per value.
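
To give the flavor without repeating the whole answer, here is a minimal hedged sketch on made-up data (the real data, the PIPE_CLOB step and the full explanation are in the linked post). Records are separated by newlines and each record holds name=value pairs separated by ';':

with input as (
  select 'ename=SMITH;job=CLERK' || chr(10) ||
         'job=ANALYST;ename=FORD' str
  from dual
)
, json_data as (
  select '[{'
    || ltrim(
         rtrim(
           replace(
             replace(
               replace(json_array(str), '=', '":"'),
               ';', '","'
             ),
             '\n', '"},{"'
           ),
           ']'
         ),
         '['
       )
    || '}]' jstr
  from input
)
select t.*
from json_data j, json_table(
  j.jstr, '$[*]'
  columns ename varchar2(30) path '$.ename'
        , job   varchar2(30) path '$.job'
) t;

ENAME  JOB
SMITH  CLERK
FORD   ANALYST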

Hope this helps…

Extract CSV from CLOB with JSON arrays

Marc Bleron blogged about CSV CLOBs and JSON_TABLE two years ago. Here’s my contribution to improve on a great idea.

“Simple” CSV

The Oracle Database does a fine job of parsing CSV data in flat files, using the External Tables facility. Unfortunately, this service is not available for CSV data in VARCHARs or CLOBs. Marc showed that JSON_TABLE (and XMLTABLE) can parse “simple” CSV if it is reformatted. What is “simple”?

CSV data consists of records and fields within records.

  • Records are delimited by NEWLINE (or some other string).
  • Fields are terminated by commas (or some other string).
  • If necessary, some or all fields can be enclosed by double quotes " (or some other string).

When Marc says “simple”, he means that fields are never enclosed. This is important, because enclosed fields may contain the double quote (provided it appears twice in a row) and / or the comma. With “simple” CSV, we know that all commas are true field terminators and we don’t have to replace doubled quotes ("") with single ones (").

“Simple” also means that there is no trimming of whitespace: you get everything between the commas.

Finally, Marc assumes there is no terminator after the last field of the record, even though Oracle allows it.

So, “simple” CSV has delimited records with terminated fields that are never enclosed. There is no trimming of whitespace and the last field in the record is not terminated.
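
To illustrate (my own example): the first record below is “simple”; the second is not, because its middle field is enclosed and contains both a comma and a doubled double quote.

7369,SMITH,CLERK
7499,"ALLEN, alias ""Big Al""",SALESMAN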

My contribution

  • First of all, I break the CLOB into VARCHAR2 bites using the pipelined table function PIPE_CLOB (as explained in my previous post).
  • Then I remove any field terminator that immediately precedes a record delimiter.
  • Then I use JSON_ARRAY over the entire VARCHAR2 in case some characters need to be escaped.
  • Then I do several REPLACES such that:
    • each record becomes a JSON array of string values, and
    • those arrays are included in one overall array.
  • Finally, I use JSON_TABLE to break the overall array into rows and the inner arrays into columns.

Note that everything before the COLUMNS clause in JSON_TABLE is generic, because the inner arrays can contain any number of elements.

To demonstrate, here is a CLOB containing data from the EMP table, with a trailing comma added:

7369,SMITH,CLERK,7902,1980-12-17T00:00:00,800,,20,
7499,ALLEN,SALESMAN,7698,1981-02-20T00:00:00,1600,300,30,
7521,WARD,SALESMAN,7698,1981-02-22T00:00:00,1250,500,30,
7566,JONES,MANAGER,7839,1981-04-02T00:00:00,2975,,20,
7654,MARTIN,SALESMAN,7698,1981-09-28T00:00:00,1250,1400,30,
7698,BLAKE,MANAGER,7839,1981-05-01T00:00:00,2850,,30,
7782,CLARK,MANAGER,7839,1981-06-09T00:00:00,2450,,10,
7839,KING,PRESIDENT,,1981-11-17T00:00:00,5000,,10,
7844,TURNER,SALESMAN,7698,1981-09-08T00:00:00,1500,0,30,
7876,ADAMS,CLERK,7788,1987-05-23T00:00:00,1100,,20,
7900,JAMES,CLERK,7698,1981-12-03T00:00:00,950,,30,
7902,FORD,ANALYST,7566,1981-12-03T00:00:00,3000,,20,
7934,MILLER,CLERK,7782,1982-01-23T00:00:00,1300,,10,

And the code:

with last_term_removed as (
  select replace(column_value, ','||chr(10), chr(10)) str
  from table(pipe_clob((select c from t), 3500))
)
, json_data as (
  select '[' ||
    replace (
      replace(json_array(str), ',', '","'),
      '\n',
      '"],["'
    )
    || ']' jstr  
  from last_term_removed
)
select sql_data.*
from json_data j, json_table(
   j.jstr, '$[*]'
   columns empno    number        path '$[0]'
         , ename    varchar2(128) path '$[1]'
         , job      varchar2(128) path '$[2]'
         , mgr      number        path '$[3]'
         , hiredate date          path '$[4]'
         , sal      number        path '$[5]'
         , comm     number        path '$[6]'
         , deptno   number        path '$[7]'
) sql_data;
EMPNO ENAME  JOB       MGR  HIREDATE    SAL COMM DEPTNO
 7369 SMITH  CLERK     7902 1980-12-17  800          20
 7499 ALLEN  SALESMAN  7698 1981-02-20 1600  300     30
 7521 WARD   SALESMAN  7698 1981-02-22 1250  500     30
 7566 JONES  MANAGER   7839 1981-04-02 2975          20
 7654 MARTIN SALESMAN  7698 1981-09-28 1250 1400     30
 7698 BLAKE  MANAGER   7839 1981-05-01 2850          30
 7782 CLARK  MANAGER   7839 1981-06-09 2450          10
 7839 KING   PRESIDENT      1981-11-17 5000          10
 7844 TURNER SALESMAN  7698 1981-09-08 1500    0     30
 7876 ADAMS  CLERK     7788 1987-05-23 1100          20
 7900 JAMES  CLERK     7698 1981-12-03  950          30
 7902 FORD   ANALYST   7566 1981-12-03 3000          20
 7934 MILLER CLERK     7782 1982-01-23 1300          10


Scalability: yes!

When Marc tested reading CLOBs directly, performance degraded badly as the CLOB increased in size:

   Rows  Seconds
  91336      2.2
 182672      4.3
 365344      9.4
 730688       22
1461376      840


In my tests with very similar data, the number of rows per second remains about the same:

  LINES   SECS  LINES_PER_SEC  AVG_LINES_PER_SEC  PCT_DIFF
  91336  1.374          66475              68517        -3
 182672    2.6          70258              68517       2.5
 365344   5.35          68289              68517       -.3
 730688     10          73069              68517       6.6
1461376     22          66426              68517      -3.1


Splitting Strings: a New Champion!

My last post on splitting strings conceded that “a combined SQL + PL/SQL solution beats the best pure SQL solution.” Wrong! I didn’t try JSON_TABLE.

Pretend it’s an array

It’s not hard to reformat a string like

'a,b,c,d,e,"'

to be a JSON array:

select replace(json_array('a,b,c,d,e,"'), ',', '","') jstr
from dual;

JSTR
--------------------------
["a","b","c","d","e","\""]

Notice that the double quote in the data itself is properly escaped by the call to JSON_ARRAY. Then I turn the string into a series of array elements by putting double quotes around each comma. Once all that is done, JSON_TABLE will split it like this:

select subs from json_table(
  replace(json_array('a,b,c,d,e,"'), ',', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);

SUBS
----
a
b
c
d
e
"

So the escaped data is automatically unescaped. Now for speed, here is my test harness:

[Update 2022-12-27: the code for the test harness only worked if you first ran the code from a previous post. It should now work all by itself. 2023-01-02: the test harness now correctly puts JSON_ARRAY around the input string.]

set serveroutput on

drop table t purge;

create table t(id number, str varchar2(4000)) cache nologging;

create or replace function string_tokenize(
  p_string in varchar2,
  p_delim in varchar2
)
return sys.odcivarchar2list pipelined
as
  i_prev_pos integer := 1;
  i_pos integer;
  i_max_pos integer := length(p_string) + 1;
  i_delim_length integer := length(p_delim);
begin
  loop
    i_pos := instr(p_string, p_delim, i_prev_pos);
    if i_pos = 0 then
      i_pos := i_max_pos;
    end if;
    pipe row(substr(p_string, i_prev_pos, i_pos - i_prev_pos));
    exit when i_pos = i_max_pos;
    i_prev_pos := i_pos + i_delim_length;
  end loop;
  return;
end string_tokenize;
/

declare
  l_num number;
  l_timestamp timestamp;
  l_plsql_secs number;
  l_JSON_secs number;
  l_num_substrings number := 10;
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
begin
  insert into t
  select level, to_char(level,'fm000000000')||',000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010'
  from dual
  connect by level <= 10000;
  commit;
  dbms_output.put_line('Substrings' || chr(9) || 'JSON' || chr(9) || 'PL/SQL');
  for i in 1..10 loop
    select count(*) into l_num from t;
      
    l_timestamp := localtimestamp;
    select count(column_value) into l_num from (
      select id, column_value from t, table(string_tokenize(str, ','))
    );
    l_plsql_secs := extract(second from localtimestamp - l_timestamp);
      
    l_timestamp := localtimestamp;
    select count(subs) into l_num from (
      select id, subs from t, json_table(
        replace(json_array(str), ',', '","'), '$[*]' columns(
          subs varchar2(99) path '$'
        )
      )
    );
    l_JSON_secs := extract(second from localtimestamp - l_timestamp);
    dbms_output.put_line(l_num_substrings || chr(9) || l_JSON_secs || chr(9) || l_plsql_secs);
    update t set str =
    str || ',000000001,000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010';
    l_num_substrings := l_num_substrings + 10;
    commit;
  end loop;
end;
/

Notice that I keep the same number of input rows here and increase the number of substrings per row.

PL/SQL does great, but JSON_TABLE appears to beat it.