Hierarchical JSON

At last, a use case where PL/SQL is faster than SQL : avoiding unnecessary ordering !

In the last few years, different solutions have been found to represent hierarchical data in JSON format :

I won’t go into the SQL solutions in detail. Basically there are three stages :

  1. Get the data in hierarchical order and return the order, the level, and the data as a JSON object.
  2. Using the LEAD() and LAG() functions, determine whether the current JSON object is a child, a parent or a sibling and apply the appropriate JSON “glue”.
  3. Aggregate the result into a CLOB, preserving the original order.

Notice that stages 2 and 3 require the data to be in the order established by stage 1. In the SQL solutions, the execution plan shows that unnecessary sorts were done, sometimes requiring the use of TEMP space : yuck !

I finally realised that a PL/SQL function could read the output from stage 1, and for each input row add the JSON “glue” and append the result to the CLOB that the function returns. No extra sorting, and it runs about 40% faster than the most optimal SQL solution. Let me explain a few details :

  • The main input is a CURSOR expression : this is not a literal, but a real live SELECT statement that can include bind variables. It returns only the level and a JSON object with the desired data, so it is quite generic. The ORDER SIBLINGS BY clause is optional.
  • The input parameter is a strongly typed REF CURSOR, which allows it to be fetched into a table of P/SQL records.
  • The input is fetched using BULK COLLECT with LIMIT, which uses little memory. Special handling is needed, because I need to know the levels of the rows immediately preceding and following the row being processed.
  • If a JSON object is a parent, I add a name / value pair where the value is an array. The name is by default ‘children’ but it is configurable by an optional second parameter.
  • Concatenating text to a CLOB is a costly operation. To speed it up, I fill up a VARCHAR2 buffer, then concatenate the buffer to the CLOB.
  • Use PLSQL_OPTIMIZE_LEVEL=3 in order to “inline” the calls to APPEND_JSO. This shaves off about .2 seconds per million rows processed.
create or replace package hier_to_clob is
  type t_hier_rec is record(
    lvl pls_integer,
    jso varchar2(4000)
  );
  type tt_hier_rec is table of t_hier_rec;
  TYPE t_hier_rc IS REF CURSOR RETURN t_hier_rec;
  function get(
    p_hier_rc in t_hier_rc,
    p_array_name varchar2 default 'children',
    p_limit integer default 500
  ) return clob;
end hier_to_clob;
/
create or replace package body hier_to_clob is
  function get(
    p_hier_rc in t_hier_rc,
    p_array_name varchar2 default 'children',
    p_limit integer default 500
  ) return clob is
    l_array_name varchar2(130) := ',"'||p_array_name||'":[';
    lt_hier_rec tt_hier_rec;
    l_hier_rec_prev t_hier_rec := t_hier_rec(0, null);
    l_lvl_prev2 pls_integer := 0;
    l_clob clob;
    l_buffer varchar2(32767) := null;
    l_buflen pls_integer := 0;
    do_prev boolean:= false;
    procedure append_jso(
      p_jso varchar2,
      p_lvl_prev pls_integer,
      p_lvl pls_integer,
      p_lvl_next pls_integer
    ) is
      l_jso varchar2(4000);
    begin
      l_jso :=
        case
          when p_lvl_prev = 0 then null
          when p_lvl - p_lvl_prev = 1 then l_array_name
          when p_lvl > 1 then ','
        end ||
        rtrim(p_jso, '}') ||
        rpad('}', (p_lvl - p_lvl_next) * 2 + 1, ']}');

      if l_buflen + lengthb(l_jso) > 32767 then
        l_clob := l_clob || l_buffer;
        l_buffer := l_jso;
        l_buflen := lengthb(l_buffer);
      else
        l_buffer := l_buffer || l_jso;
        l_buflen := l_buflen + lengthb(l_jso);
      end if;
    end append_jso;
  begin
    loop
      fetch p_hier_rc bulk collect into lt_hier_rec limit p_limit;
      if do_prev then
        append_jso(
          l_hier_rec_prev.jso,
          l_lvl_prev2,
          l_hier_rec_prev.lvl,
          case when lt_hier_rec.count > 0 then lt_hier_rec(1).lvl else 1 end
        );
        do_prev := false;
      end if;
      for i in 1..lt_hier_rec.count-1 loop
        append_jso(
          lt_hier_rec(i).jso,
          case
            when i = 1 then l_hier_rec_prev.lvl
            else lt_hier_rec(i-1).lvl
          end,
          lt_hier_rec(i).lvl,
          lt_hier_rec(i+1).lvl
        );
      end loop;
      if lt_hier_rec.count > 0 then
        l_lvl_prev2 :=
          case lt_hier_rec.count
            when 1 then l_hier_rec_prev.lvl
            else lt_hier_rec(lt_hier_rec.count-1).lvl
          end;
        l_hier_rec_prev := lt_hier_rec(lt_hier_rec.count);
        do_prev := true;
      end if;
      exit when p_hier_rc%notfound;
    end loop;
    if do_prev then
      append_jso(
        l_hier_rec_prev.jso,
        l_lvl_prev2,
        l_hier_rec_prev.lvl,
        1
      );
    end if;
    if l_buflen > 0 then
      l_clob := l_clob || l_buffer;
    end if;
    return l_clob;
  end get;
end hier_to_clob;
/
select hier_to_clob.get(
  cursor(
    select level, json_object(empno, ename)
    from emp
    start with empno = 7566
    connect by mgr = prior empno
  ),
  'grunts'
)
from dual;
/* pretty printing done outside of the function */
{
  "empno" : 7566,
  "ename" : "JONES",
  "grunts" :
  [
    {
      "empno" : 7788,
      "ename" : "SCOTT",
      "grunts" :
      [
        {
          "empno" : 7876,
          "ename" : "ADAMS"
        }
      ]
    },
    {
      "empno" : 7902,
      "ename" : "FORD",
      "grunts" :
      [
        {
          "empno" : 7369,
          "ename" : "SMITH"
        }
      ]
    }
  ]
}

2 thoughts on “Hierarchical JSON

  1. Pingback: Hierarchical JSON with Stew Ashton & ORDS REST APIs

  2. Hi Stew,

    Fast and very straightforward implementation. I don’t think we can do much better in plain PL/SQL.
    I read the latest oracle-tech discussion you’re referring to (trying to catch up a couple of years away from the Oracle World), and I saw mentions of parallelization. This is not an easy task, to say the least, but would be a great enhancement for very large datasets. I think a good approach would be to parallelize the scans of distinct (and ideally balanced) branches of the tree. But now we’re limited to what PL/SQL can do OOTB :
    – DBMS_PARALLEL_EXECUTE : would require at least a staging structure
    – Parallel pipelined function : would require a parallelizable input source, and therefore a materialized dataset at some point, as to my knowledge, hierarchical or recursive queries won’t run in parallel.
    All of those aspects adding too much overhead compared to the serial process, I think. I’ll keep that in a corner of my mind for a weekend project.

    Best regards.

Leave a comment