Read CLOBs fast with less memory


Reading a big CLOB is like trying to eat a burger all at once: it will take you forever if you don’t choke. Why not cut that CLOB into bite-size chunks? It’s faster, uses less memory – and it’s good table manners…

Marc Bleron has blogged about parsing CSV data in CLOBs. As the CLOB got bigger, the parsing time went way up. I decided to write a pipelined table function that would return the CLOB in VARCHAR2-sized bites, cutting at record boundaries.

  • By default, the maximum bite size is 4000 bytes. You can make it less if you need some room to use REPLACE.
  • Also by default, the record delimiter is the NEWLINE of your operating system, but you can change it.

Again, the intent is to return as many entire records as will fit in the “bite”. If the input has a record that is longer than the maximum bite size, then this function will raise an exception.

One of the pains of CLOBs is that LENGTH() and SUBSTR() deal with characters only: the LENGTHB() and SUBSTRB() functions deal with bytes, but they are limited to VARCHAR2. Fortunately, VARCHAR2 in PL/SQL can be much longer, so I read 4000 characters into a buffer and then I cut off any records that overflow the 4000-byte boundary.

UPDATE 2018-06-15: I renamed “rec_term” to “rec_delim” because external table definitions say “records delimited by”. I now raise an exception if a bite does not contain the record delimiter.

UPDATE 2018-06-16 and -17: bug fixes for edge cases.

UPDATE 2018-06-28: better error handling. The maximum record length (in bytes) was actually 4000 including the record delimiter. It is now 4000 excluding the record delimiter.

create or replace function pipe_clob (
  p_clob in clob,
  p_max_lengthb in integer default 4000,
  p_rec_delim in varchar2 default '
'
) return sys.odcivarchar2list pipelined authid current_user as
/*
Break CLOB into VARCHAR2 sized bites.
Reduce p_max_lengthb if you need to expand the VARCHAR2
in later processing.
Last record delimiter in each bite is not returned,
but if it is a newline and the output is spooled
the newline will come back in the spooled output.
Note: this cannot work if the CLOB contains more than
<p_max_lengthb> consecutive bytes without a record delimiter.
*/
  l_amount integer;
  l_offset integer;
  l_buffer varchar2(32767 BYTE);
  l_out varchar2(32767 BYTE);
  l_buff_lengthb integer;
  l_occurence integer;
  l_rec_delim_length integer := length(p_rec_delim);
begin
  if p_max_lengthb > 4000 then
    raise_application_error(-20001, 'Maximum record length (p_max_lengthb) cannot be greater than 4000.');
  elsif p_max_lengthb < 10 then
    raise_application_error(-20002, 'Maximum record length (p_max_lengthb) cannot be less than 10.');
  end if;
  if p_rec_delim is null then
    raise_application_error(-20003, 'Record delimiter (p_rec_delim) cannot be null.');
  end if;
  /* This version is limited to 4000 byte output, so I can afford to ask for 4001
      in case the record is exactly 4000 bytes long.
  */
  l_amount := p_max_lengthb + l_rec_delim_length;
  l_offset := 1;
  while l_amount = p_max_lengthb + l_rec_delim_length loop
    begin
      DBMS_LOB.READ (
       p_clob,
       l_amount,
       l_offset,
       l_buffer
      );
    exception when no_data_found then
      l_amount := 0;
    end;
    if l_amount = 0 then
      exit;
    elsif lengthb(l_buffer) <= p_max_lengthb then pipe row(rtrim(l_buffer, p_rec_delim)); exit; end if; l_buff_lengthb := p_max_lengthb + l_rec_delim_length; l_occurence := 0; while l_buff_lengthb > p_max_lengthb loop
      l_occurence := l_occurence + 1;
      l_buff_lengthb := instrb(l_buffer,p_rec_delim, -1, l_occurence) - 1;
    end loop;
    if l_buff_lengthb < 0 then
      if l_amount = p_max_lengthb + l_rec_delim_length then
        raise_application_error(
          -20004, 
          'Input clob at offset '||l_offset||' for lengthb '||p_max_lengthb||' has no record delimiter'
        );
      end if;
    end if;
    l_out := substrb(l_buffer, 1, l_buff_lengthb);
    pipe row(l_out);
    l_offset := l_offset + nvl(length(l_out),0) + l_rec_delim_length;
    l_amount := p_max_lengthb + l_rec_delim_length;
  end loop;
  return;
end;
/

In my laptop tests, read time increased linearly with the number of records. If you try it, let me know how it works out.

Advertisements

Generate JOINs among related tables

After finding the shortest “foreign key” path between two tables, I’ll try to generate a SELECT statement that joins all the related tables.

(Please don’t tell me that Oracle SQL Developer does this: I know it does, as long as you already know the path and select all the tables yourself! I find a path for you and make sure the column names are unique. Despite these differences, you will probably never need this code if you use SQL Developer.)

  • The final SELECT starts with the bottom of the hierarchy and works its way up.
  • Table aliases are T1, T2 and so on.
  • All columns of all tables are selected (based on ALL_TAB_COLUMNS, not ALL_TAB_COLS).
  • To make columns unique, they are prefixed with the table alias: T1_* and so on.
  • Table names and column names are only enclosed in double quotes when they contain non-alphanumeric characters. Exceptionally, I always enclose the join columns in double quotes just to save myself some trouble…
  • I use ANSI join syntax, and when I do the equality comparison I use parentheses like this:
    on (T1.COL1, T1.COL2) = ((T2.COL1, T2.COL2))
    It’s a bit easier to code. Don’t be surprised to see the parentheses even when there is only one column.

The code is more than 100 lines long. I’ll put my inadequate explanation first, then the code.

  • search_for: the owners and tables I want to find the path between.
  • pur: the constraints I need to start with.
  • relations: add to each constraint row the remote or parent table.
  • best_path: returns a string that contains all the data from all the rows in the path
  • best_path_tables: breaks down the string into the rows and columns of the path I found. I use REGEXP functions instead of JSON functions so this should work from version 11.2.
  • final_select: get all the columns for the final SELECT, accessing ALL_TAB_COLUMNS
  • final_from: get the FROM and JOIN clauses, accessing ALL_CONS_COLUMNS.

I’ll sign off before posting the code. I would say “hope this helps”, but I feel more like saying

Hope this works!

with search_for (owner, table_name) as (
  select 'OBE', 'OEHR_REGIONS' from dual union all
  select 'OBE', 'OEHR_ORDER_ITEMS' from dual
)
, pur as (
  select OWNER,   CONSTRAINT_NAME, TABLE_NAME,
       R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)
, relations as (
  select a.owner, a.table_name, a.constraint_name,
       a.r_owner, b.table_name r_table_name, a.r_constraint_name
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)
, best_path as (
  select path
  from (
    select row_number() over( order by
      case when (r_owner, r_table_name) 
            != ((connect_by_root(owner), connect_by_root(table_name)))
        then level end
      ) rn, 
      sys_connect_by_path(
      owner  ||'/'||table_name  ||'/'||  constraint_name||'/'||
      r_owner||'/'||r_table_name||'/'||r_constraint_name,'<') path
    from relations a
    where (r_owner, r_table_name) in (select * from search_for)
    start with (owner, table_name) in (select * from search_for)
    connect by nocycle (owner, table_name) = ((prior r_owner, prior r_table_name))
    and (owner, table_name) not in (select * from search_for)
  )
  where rn = 1
)
, best_path_tables as (
  select level lvl,
    'T'||level al,
    'T'||(level+1) r_al,
    regexp_substr(path, '<(.+?)/',            1, level,'i',1) owner,
    regexp_substr(path, '<(.+?/){1}(.+?)/',   1, level,'i',2) table_name,
    regexp_substr(path, '<(.+?/){2}(.+?)/',   1, level,'i',2) constraint_name,
    regexp_substr(path, '<(.+?/){3}(.+?)/',   1, level,'i',2) r_owner,
    regexp_substr(path, '<(.+?/){4}(.+?)/',   1, level,'i',2) r_table_name,
    regexp_substr(path, '<(.+?/){5}([^/<]+)', 1, level,'i',2) r_constraint_name
  from best_path
  connect by level <= regexp_count (path, '<')
)
, final_select as (
  select lvl, pref ||
  listagg(al||'.'||qu||column_name||qu||' '||qu||al||'_'||column_name||qu,',')
    within group(order by column_id) str
  from (
    select -99 lvl, 'select ' pref, al, column_name, column_id,
      case when not regexp_like (column_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end qu
    from best_path_tables join all_tab_columns using(owner, table_name)
    where lvl = 1
    union all
    select lvl-99, ',', r_al, column_name, column_id,
      case when not regexp_like (column_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end qu
    from best_path_tables b
    join all_tab_columns a on (b.r_owner, b.r_table_name) = ((a.owner, a.table_name))
  )
  group by lvl, pref
)
, final_from as (
  select 0 lvl, 'from '||oq||owner||oq||'.'||tq||table_name||tq||' '|| al str
  from (
    select a.*,
      case when not regexp_like (owner, '^[A-Z][A-Z0-9$#_]*$') then '"' end oq,
      case when not regexp_like (table_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end tq
      from best_path_tables a
      where lvl = 1
  ) a
  union all
  select lvl, 'join '||oq||r_owner||oq||'.'||tq||r_table_name||tq||' '|| r_al || ' on (' ||
  (
    select listagg(al||'."'||column_name||'"',',') within group(order by position)
    from all_cons_columns 
    where (owner, table_name, constraint_name) = ((a.owner, a.table_name, a.constraint_name))
  )
  || ') = ((' ||
  (
    select listagg(r_al||'."'||column_name||'"',',') within group(order by position)
    from all_cons_columns 
    where (owner, table_name, constraint_name) = ((a.r_owner, a.r_table_name, a.r_constraint_name))
  )
  || '))'
  from (
    select a.*,
      case when not regexp_like (r_owner, '^[A-Z][A-Z0-9$#_]*$') then '"' end oq,
      case when not regexp_like (r_table_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end tq
      from best_path_tables a
  ) a
)
select str from (
  select * from final_select
  union all
  select * from final_from
)
order by lvl;
select T1.ORDER_ID T1_ORDER_ID,T1.LINE_ITEM_ID T1_LINE_ITEM_ID,T1.PRODUCT_ID T1_PRODUCT_ID,T1.UNIT_PRICE T1_UNIT_PRICE,T1.QUANTITY T1_QUANTITY,T1.ORDER_ITEM_ID T1_ORDER_ITEM_ID
,T2.ORDER_ID T2_ORDER_ID,T2.ORDER_DATE T2_ORDER_DATE,T2.ORDER_MODE T2_ORDER_MODE,T2.CUSTOMER_ID T2_CUSTOMER_ID,T2.ORDER_STATUS T2_ORDER_STATUS,T2.ORDER_TOTAL T2_ORDER_TOTAL,T2.SALES_REP_ID T2_SALES_REP_ID,T2.PROMOTION_ID T2_PROMOTION_ID
,T3.EMPLOYEE_ID T3_EMPLOYEE_ID,T3.FIRST_NAME T3_FIRST_NAME,T3.LAST_NAME T3_LAST_NAME,T3.EMAIL T3_EMAIL,T3.PHONE_NUMBER T3_PHONE_NUMBER,T3.HIRE_DATE T3_HIRE_DATE,T3.JOB_ID T3_JOB_ID,T3.SALARY T3_SALARY,T3.COMMISSION_PCT T3_COMMISSION_PCT,T3.MANAGER_ID T3_MANAGER_ID,T3.DEPARTMENT_ID T3_DEPARTMENT_ID
,T4.DEPARTMENT_ID T4_DEPARTMENT_ID,T4.DEPARTMENT_NAME T4_DEPARTMENT_NAME,T4.MANAGER_ID T4_MANAGER_ID,T4.LOCATION_ID T4_LOCATION_ID
,T5.LOCATION_ID T5_LOCATION_ID,T5.STREET_ADDRESS T5_STREET_ADDRESS,T5.POSTAL_CODE T5_POSTAL_CODE,T5.CITY T5_CITY,T5.STATE_PROVINCE T5_STATE_PROVINCE,T5.COUNTRY_ID T5_COUNTRY_ID
,T6.COUNTRY_ID T6_COUNTRY_ID,T6.COUNTRY_NAME T6_COUNTRY_NAME,T6.REGION_ID T6_REGION_ID
,T7.REGION_ID T7_REGION_ID,T7.REGION_NAME T7_REGION_NAME
from OBE.OEHR_ORDER_ITEMS T1
join OBE.OEHR_ORDERS T2 on (T1."ORDER_ID") = ((T2."ORDER_ID"))
join OBE.OEHR_EMPLOYEES T3 on (T2."SALES_REP_ID") = ((T3."EMPLOYEE_ID"))
join OBE.OEHR_DEPARTMENTS T4 on (T3."DEPARTMENT_ID") = ((T4."DEPARTMENT_ID"))
join OBE.OEHR_LOCATIONS T5 on (T4."LOCATION_ID") = ((T5."LOCATION_ID"))
join OBE.OEHR_COUNTRIES T6 on (T5."COUNTRY_ID") = ((T6."COUNTRY_ID"))
join OBE.OEHR_REGIONS T7 on (T6."REGION_ID") = ((T7."REGION_ID"))

Split a string again and again

I recently needed to gather multiple columns from multiple rows into one string, then split them out again. JSON_TABLE did both splits at the same time!

The situation

I want to find the shortest “foreign key” path between two tables in the OBE schema: OEHR_ORDER_ITEMS and OEHR_DEPARTMENTS. I also want the intermediate tables and the constraints used to connect the tables. As a starting point for today, I’ll create a little table with the data for the tables in the OBE schema that start with ‘OEHR’:

drop table t purge;
create table t as
with search_for (owner, table_name) as (
  select 'OBE', 'OEHR_DEPARTMENTS' from dual union all
  select 'OBE', 'OEHR_ORDER_ITEMS' from dual
)
, pur as (
  select OWNER, CONSTRAINT_NAME, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner = 'OBE'
  and table_name like 'OEHR%'
)
, relations as (
  select a.table_name, a.constraint_name,
    b.table_name r_table_name, a.r_constraint_name
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)
select * from relations;

Now I’ll use a hierarchical query to find paths from one table to the other:

select rownum rn, level lvl, TABLE_NAME, CONSTRAINT_NAME, R_TABLE_NAME, R_CONSTRAINT_NAME
from t a
start with table_name = 'OEHR_ORDER_ITEMS'
connect by nocycle table_name = prior r_table_name
and table_name != 'OEHR_DEPARTMENTS';
RN LVL TABLE_NAME CONSTRAINT_NAME R_TABLE_NAME R_CONSTRAINT_NAME
1 1 OEHR_ORDER_ITEMS OEHR_ORDER_ITEMS_ORDER_ID_FK OEHR_ORDERS OEHR_ORDER_PK
2 2 OEHR_ORDERS OEHR_ORDERS_CUSTOMER_ID_FK OEHR_CUSTOMERS OEHR_CUSTOMERS_PK
3 3 OEHR_CUSTOMERS OEHR_CUST_ACCT_MGR_FK OEHR_EMPLOYEES OEHR_EMP_EMP_ID_PK
4 4 OEHR_EMPLOYEES OEHR_EMP_DEPT_FK OEHR_DEPARTMENTS OEHR_DEPT_ID_PK
5 4 OEHR_EMPLOYEES OEHR_EMP_JOB_FK OEHR_JOBS OEHR_JOB_ID_PK
6 2 OEHR_ORDERS OEHR_ORDERS_SALES_REP_FK OEHR_EMPLOYEES OEHR_EMP_EMP_ID_PK
7 3 OEHR_EMPLOYEES OEHR_EMP_DEPT_FK OEHR_DEPARTMENTS OEHR_DEPT_ID_PK
8 3 OEHR_EMPLOYEES OEHR_EMP_JOB_FK OEHR_JOBS OEHR_JOB_ID_PK
9 1 OEHR_ORDER_ITEMS OEHR_ORDER_ITEMS_PRODUCT_ID_FK OEHR_PRODUCT_INFORMATION OEHR_PRODUCT_INFO_PK

 

Just looking at the data myself, I can see that the shortest path will include rows 1, 6 and 7. Rows 1 through 4 will work too, but the path is longer.

Getting the data for the whole path

The problem is, how can I identify those three rows I want and just query them? The only way I see is to use SYS_CONNECT_BY_PATH (or the equivalent with a CTE) to string together all the data from all the rows I want – which means I have to split that data out again later.

To simplify what follows, I’m going to manually delete the rows that don’t matter:

delete from t where table_name not in ('OEHR_ORDER_ITEMS','OEHR_ORDERS','OEHR_EMPLOYEES');
delete from t where r_table_name not in ('OEHR_DEPARTMENTS','OEHR_ORDERS','OEHR_EMPLOYEES');
commit;

The rows that are left contain the data I want:

TABLE_NAME CONSTRAINT_NAME R_TABLE_NAME R_CONSTRAINT_NAME
OEHR_ORDER_ITEMS OEHR_ORDER_ITEMS_ORDER_ID_FK OEHR_ORDERS OEHR_ORDER_PK
OEHR_ORDERS OEHR_ORDERS_SALES_REP_FK OEHR_EMPLOYEES OEHR_EMP_EMP_ID_PK
OEHR_EMPLOYEES OEHR_EMP_DEPT_FK OEHR_DEPARTMENTS OEHR_DEPT_ID_PK

 

Now I’ll string that data together:

select sys_connect_by_path(
  table_name  ||'/'||  constraint_name||'/'||
  r_table_name||'/'||r_constraint_name
  ,'<'
) path
from t a
where r_table_name = 'OEHR_DEPARTMENTS'
start with table_name = 'OEHR_ORDER_ITEMS'
connect by table_name = prior r_table_name;

<OEHR_ORDER_ITEMS/OEHR_ORDER_ITEMS_ORDER_ID_FK/OEHR_ORDERS/OEHR_ORDER_PK
<OEHR_ORDERS/OEHR_ORDERS_SALES_REP_FK/OEHR_EMPLOYEES/OEHR_EMP_EMP_ID_PK
<OEHR_EMPLOYEES/OEHR_EMP_DEPT_FK/OEHR_DEPARTMENTS/OEHR_DEPT_ID_PK

I split the data into three lines for legibility. Note that I have to use different literals in the two parameters of SYS_CONNECT_BY_PATH, otherwise I get an exception.

Splitting out the data

Now we’re finally getting to the point. I can use JSON_ARRAY to string together the data for one row. Once I have all the data for all the rows, I can turn that into an array of arrays. Finally, JSON_TABLE will split everything out for me. Let’s start with JSON_ARRAY:

select sys_connect_by_path(
  json_array(table_name, constraint_name, r_table_name, r_constraint_name)
  , '
'
) path
from t a
where r_table_name = 'OEHR_DEPARTMENTS'
start with table_name = 'OEHR_ORDER_ITEMS'
connect by table_name = prior r_table_name;

["OEHR_ORDER_ITEMS","OEHR_ORDER_ITEMS_ORDER_ID_FK","OEHR_ORDERS","OEHR_ORDER_PK"]
["OEHR_ORDERS","OEHR_ORDERS_SALES_REP_FK","OEHR_EMPLOYEES","OEHR_EMP_EMP_ID_PK"]
["OEHR_EMPLOYEES","OEHR_EMP_DEPT_FK","OEHR_DEPARTMENTS","OEHR_DEPT_ID_PK"]

Now I’ll turn that data into an array of arrays:

with data as (
  select sys_connect_by_path(
    json_array(table_name, constraint_name, r_table_name, r_constraint_name)
    , '/'
  ) path
  from t a
  where r_table_name = 'OEHR_DEPARTMENTS'
  start with table_name = 'OEHR_ORDER_ITEMS'
  connect by table_name = prior r_table_name
)
select '['||substr(replace(path,'/',','),2) || ']' path
from data;
[
  [
    "OEHR_ORDER_ITEMS",
    "OEHR_ORDER_ITEMS_ORDER_ID_FK",
    "OEHR_ORDERS",
    "OEHR_ORDER_PK"
  ],
  [
    "OEHR_ORDERS",
    "OEHR_ORDERS_SALES_REP_FK",
    "OEHR_EMPLOYEES",
    "OEHR_EMP_EMP_ID_PK"
  ],
  [
    "OEHR_EMPLOYEES",
    "OEHR_EMP_DEPT_FK",
    "OEHR_DEPARTMENTS",
    "OEHR_DEPT_ID_PK"
  ]
]

Now JSON_TABLE will break down the top array into rows, just by putting ‘$[*]’ in the second parameter. The COLUMNS will do the rest, splitting the data in the inner arrays into different columns.

with data as (
  select sys_connect_by_path(
    json_array(table_name, constraint_name, r_table_name, r_constraint_name)
    , '/'
  ) path
  from t a
  where r_table_name = 'OEHR_DEPARTMENTS'
  start with table_name = 'OEHR_ORDER_ITEMS'
  connect by table_name = prior r_table_name
)
select a.* from data, json_table(
  '['||substr(replace(path,'/',','),2) || ']',
  '$[*]' columns(
    rn for ordinality,
    table_name varchar2(128) path '$[0]',
    constraint_name varchar2(128) path '$[1]',
    r_table_name varchar2(128) path '$[2]',
    r_constraint_name varchar2(128) path '$[3]'
  )
) a;
RN TABLE_NAME CONSTRAINT_NAME R_TABLE_NAME R_CONSTRAINT_NAME
1 OEHR_ORDER_ITEMS OEHR_ORDER_ITEMS_ORDER_ID_FK OEHR_ORDERS OEHR_ORDER_PK
2 OEHR_ORDERS OEHR_ORDERS_SALES_REP_FK OEHR_EMPLOYEES OEHR_EMP_EMP_ID_PK
3 OEHR_EMPLOYEES OEHR_EMP_DEPT_FK OEHR_DEPARTMENTS OEHR_DEPT_ID_PK

 

Whew!

P.S. If you think a CTE solution is cleaner:

with data( TABLE_NAME, R_TABLE_NAME, path ) as (
  select n.TABLE_NAME, n.R_TABLE_NAME,
    json_array(n.TABLE_NAME, n.CONSTRAINT_NAME, n.R_TABLE_NAME, n.R_CONSTRAINT_NAME)
  from t n
  where table_name = 'OEHR_ORDER_ITEMS'
  union all
  select n.TABLE_NAME, n.R_TABLE_NAME,
    o.path || ',' ||
    json_array(n.TABLE_NAME, n.CONSTRAINT_NAME, n.R_TABLE_NAME, n.R_CONSTRAINT_NAME)
  from t n
  join data o on (n.table_name = o.r_table_name)
    and n.table_name != 'OEHR_DEPARTMENTS'
)
select b.* from data a, json_table(
  '['|| path || ']',
  '$[*]' columns(
    rn for ordinality,
    table_name varchar2(128) path '$[0]',
    constraint_name varchar2(128) path '$[1]',
    r_table_name varchar2(128) path '$[2]',
    r_constraint_name varchar2(128) path '$[3]'
  )
) b
where a.r_table_name = 'OEHR_DEPARTMENTS';

Foreign keys between two tables

In a comment on my post about getting all the foreign key relationships, I was asked how to get the shortest path between two tables. Not too easy…

UPDATE: you can see how hard it is by the fact I have had to correct two bugs, one concerning direct or indirect self-references.

Using the ALL_CONSTRAINTS view, we can put together all the foreign key relationships between child tables and parent tables, and we can do hierarchical queries to go up and down branches of the hierarchy. If you give me two tables, though, how do I know whether to go up or down, and which table to start with?

The idea

I decided to start with both tables and go up. I expect children to outnumber parents, so going up should be more efficient. I start with both tables, knowing that at most one will be the descendant of the other. I then go up the hierarchy until I either get to the top or hit one of the tables. The “shortest path” is the one where the final LEVEL is smallest.

The code

I start with a little subquery that provides the target tables.

with search_for (owner, table_name) as (
  select 'OBE', 'OEHR_COUNTRIES' from dual union all
  select 'OBE', 'OEHR_ORDER_ITEMS' from dual
)

Then I get all the constraints that may be involved in foreign key relationships.

, pur as (
  select
    OWNER, CONSTRAINT_NAME, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
) and owner in (select username from dba_users where oracle_maintained = 'N')
)

I do a self join on the PUR subquery to add the parent table to each row.
UPDATE: added a line to RELATIONS to avoid self-references.

, relations as (
  select a.owner, a.table_name,
    a.r_owner, b.table_name r_table_name
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)

Now I start going up the hierarchy:

  • I “start with” both tables
  • I “connect by” going up the hierarchy, stopping before the other table can be a child
    This is important, because the other table could be at the top of the hierarchy and not be a child at all.
  • The WHERE clause is applied after the CONNECT BY: I only keep rows where I found the other table.
  • I now have one row for each path from one table to the other. I use SYS_CONNECT_BY_PATH to concatenate the original child and all the parents going up the hierarchy.
    Notice I have to use one literal (<) within the first parameter and another (/) as the second parameter. I wanted to use ‘<‘ both places, but Oracle won’t let me.
  • Finally, I choose the shortest path, translate ‘/’ to ‘<‘, get rid of the first ‘<‘ and display the result.

UPDATES:

  • To avoid indirect self-referencing, I make sure the last remote table is not the same and the original table!
  • I had the ‘<‘ outside of the CASE statement. Now it is inside so you do not get ‘<<‘ anymore.
, paths as (
  select row_number() over( order by
    case when (r_owner, r_table_name) != ((connect_by_root(owner), connect_by_root(table_name)))
      then level end
    ) rn, 
    sys_connect_by_path(
      case level when 1 then owner||'.'||table_name || '<' end ||
      r_owner||'.'||r_table_name,
    '/') path
  from relations r
  where (r_owner, r_table_name) in (select * from search_for)
  start with (owner, table_name) in (select * from search_for)
  connect by nocycle (owner, table_name) = ((prior r_owner, prior r_table_name))
  and (owner, table_name) not in (select * from search_for)
)
select substr(translate(path,'/','<'), 2) best_path
from paths where rn = 1;

BEST_PATH
------------------------------------------------------------------------------------------------------------------
OBE.OEHR_ORDER_ITEMS<OBE.OEHR_ORDERS<OBE.OEHR_EMPLOYEES<OBE.OEHR_DEPARTMENTS<OBE.OEHR_LOCATIONS<OBE.OEHR_COUNTRIES

As the man said, hope this helps…

Splitting Strings: a New Champion!

My last post on splitting strings conceded that “a combined SQL + PL/SQL solution beats the best pure SQL solution.” Wrong! I didn’t try JSON_TABLE.

Pretend it’s an array

It’s not hard to reformat a string like

'a,b,c,d,e,"'

to be a JSON array:

select replace(json_array('a,b,c,d,e,"'), ',', '","') jstr
from dual;

JSTR
--------------------------
["a","b","c","d","e","\""]

Notice that the double quote in the data itself is properly escaped by the call to JSON_ARRAY. Then I turn the string into a series of array elements by putting double quotes around each comma. Once all that is done, JSON_TABLE will split it like this:

select subs from json_table(
  replace(json_array('a,b,c,d,e,"'), ',', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);

SUBS
----
a
b
c
d
e
"

So the escaped data is automatically unescaped. Now for speed, here is my test harness:

set serveroutput on
declare
  l_num number;
  l_timestamp timestamp;
  l_plsql_secs number;
  l_JSON_secs number;
  l_num_substrings number := 10;
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
begin
  select count(*) into l_num from user_tables where table_name = 'T';
  if l_num > 0 then
    do('drop table t purge');
  end if;
  do('create table t(id number, str varchar2(4000)) cache');
  insert into t
  select level, to_char(level,'fm000000000')||',000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010'
  from dual
  connect by level <= 10000;
  commit;
  dbms_output.put_line('Substrings' || chr(9) || 'JSON' || chr(9) || 'PL/SQL');
  for i in 1..10 loop
    select count(*) into l_num from t;
     
    l_timestamp := localtimestamp;
    select count(column_value) into l_num from (
      select id, column_value from t, table(string_tokenize(str, ','))
    );
    l_plsql_secs := extract(second from localtimestamp - l_timestamp);
     
    l_timestamp := localtimestamp;
    select count(subs) into l_num from (
      select id, subs from t, JSON_table(
      '["' || replace(str, ',' , '","') || '"]', '$[*]' columns(
      subs varchar2(99) path '$'
      )
    )
    );
    l_JSON_secs := extract(second from localtimestamp - l_timestamp);
    dbms_output.put_line(l_num_substrings || chr(9) || l_JSON_secs || chr(9) || l_plsql_secs);
    update t set str =
    str || ',000000001,000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010';
    l_num_substrings := l_num_substrings + 10;
    commit;
  end loop;
end;
/

Notice that I keep the same number of input rows here and increase the number of substrings per row.

PL/SQL does great, but JSON_TABLE appears to beat it.

PIVOT Function with Totals

I have updated the ADVANCED_PIVOT function to support row and/or column totals. It’s neat, but you need to know what you’re doing.

No totals

Let’s start from an example that has no totals. If we didn’t want to pivot, we could do a simple GROUP BY:

SQL> select dname, job, sum(sal)
  2  from emp join dept using(deptno)
  3  group by dname, job
  4  order by 1,2;

DNAME          JOB         SUM(SAL)
-------------- --------- ----------
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
RESEARCH       ANALYST         3000
RESEARCH       CLERK           1900
RESEARCH       MANAGER         2975
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600

If we do want to pivot, the PIVOT clause will do the aggregation for us:

SQL> var rc refcursor

SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select dname, job, sal
  5  from emp join dept using(deptno)
  6  )@',
  7  p_aggfuncs => 'sum(sal)',
  8  p_pivotfor => 'job'
  9  );
 10  end;
 11  /

print :rc

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000           
RESEARCH             3000       1900       2975                      
SALES                            950       2850                  5600

Horizontal total

Now suppose we want an extra column that shows the total for each row. Is that a “row total” or a “column total”? To avoid confusion, I’m going to say “horizontal total” when we get an extra column, and “vertical total” when we get an extra row.

The first thing I realized is that, if we want any totals, we have to compute them beforehand: the PIVOT clause will only be used to create the pivot columns and move the data to the right places. Here is a query to do that:

SQL> select dname, 
  2    job,
  3    sum(sal) sal
  4  from emp join dept using(deptno)
  5  group by dname, rollup(job);

DNAME          JOB              SAL
-------------- --------- ----------
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
SALES                          9400
RESEARCH       CLERK           1900
RESEARCH       ANALYST         3000
RESEARCH       MANAGER         2975
RESEARCH                       7875
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
ACCOUNTING                     8750

Notice the use of ROLLUP(JOB) to get totals for all the jobs, but always at the DNAME level.

Notice also that the total rows have a JOB value of NULL. PIVOT will not allow us to pivot data based on a NULL column value. So we use the GROUPING() function to find out when there is a total of all jobs, and replace the NULL with a literal.

SQL> select dname, 
  2    decode(grouping(job),1,'(Total)',job) job,
  3    sum(sal) sal
  4  from emp join dept using(deptno)
  5  group by dname, rollup(job);

DNAME          JOB              SAL
-------------- --------- ----------
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
SALES          (Total)         9400
RESEARCH       CLERK           1900
RESEARCH       ANALYST         3000
RESEARCH       MANAGER         2975
RESEARCH       (Total)         7875
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
ACCOUNTING     (Total)         8750

Notice I always put the literal ‘(Total)’ before the value. If the value is not a string, Oracle will implicitly convert it. If we said decode(grouping(<column>),0,<column>,'(Total)') and <column> was a date or a number, we would get a conversion exception.

Now that we have the totals with a valid label, let’s plug that into my function:

SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select dname, 
  5    decode(grouping(job),1,'(Total)',job) job,
  6    sum(sal) sal
  7  from emp join dept using(deptno)
  8  group by dname, rollup(job)
  9  )@',
 10  p_aggfuncs => 'sum(sal)',
 11  p_pivotfor => 'job'
 12  );
 13  end;
 14  /

DNAME             (Total)    ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING           8750                  1300       2450       5000           
RESEARCH             7875       3000       1900       2975                      
SALES                9400                   950       2850                  5600

Oops! The Total column is in the wrong place. To fix that, I added an optional parameter p_total_label. Whenever we order by a column, I will make sure that the content of that parameter goes last. Let’s try it:

SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select dname, 
  5    decode(grouping(job),1,'(Total)',job) job,
  6    sum(sal) sal
  7  from emp join dept using(deptno)
  8  group by dname, rollup(job)
  9  )@',
 10  p_aggfuncs => 'sum(sal)',
 11  p_pivotfor => 'job',
 12  p_total_label => '(Total)'
 13  );
 14  end;
 15  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN    (Total)
-------------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000                  8750
RESEARCH             3000       1900       2975                             7875
SALES                            950       2850                  5600       9400

Warnings:

  1. Make sure you use the same literal value for p_total_label and within the p_source query! The function will never try to check for coherence, so it’s up to you.
  2. To use something like ‘(Total)’ instead of NULL, dates and numbers must be converted to strings. If that happens, the function can only order by the string value, not the original date or number value!

Vertical total

Let’s try to ROLLUP by DNAME rather than JOB, in order to get an extra row with the totals for each job.

SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select decode(grouping(dname),1,'(Total)',dname) dname, 
  5    job,
  6    sum(sal) sal
  7  from emp join dept using(deptno)
  8  group by rollup(dname), job
  9  )@',
 10  p_aggfuncs => 'sum(sal)',
 11  p_pivotfor => 'job',
 12  p_total_label => '(Total)'
 13  );
 14  end;
 15  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
(Total)              3000       4150       8275       5000       5600
ACCOUNTING                      1300       2450       5000           
RESEARCH             3000       1900       2975                      
SALES                            950       2850                  5600

We still have a problem with sorting the rows. The function has no idea that the DNAME column has a ‘(Total)’ row that needs to go last. To fix that, I added another optional parameter p_ordercol. The function will order the final query by the column listed here, making sure that the total line goes last.

SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select decode(grouping(dname),1,'(Total)',dname) dname, 
  5    job,
  6    sum(sal) sal
  7  from emp join dept using(deptno)
  8  group by rollup(dname), job
  9  )@',
 10  p_aggfuncs => 'sum(sal)',
 11  p_pivotfor => 'job',
 12  p_total_label => '(Total)',
 13  p_ordercol => 'dname'
 14  );
 15  end;
 16  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000           
RESEARCH             3000       1900       2975                      
SALES                            950       2850                  5600
(Total)              3000       4150       8275       5000       5600

Horizontal and vertical totals

To get both totals, we just need to change the source query. We could combine two ROLLUPs, but in this case the CUBE function is more concise:

SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select decode(grouping(dname),1,'(Total)',dname) dname, 
  5    decode(grouping(job),1,'(Total)',job) job,
  6    sum(sal) sal
  7  from emp join dept using(deptno)
  8  group by cube(dname,job)
  9  )@',
 10  p_aggfuncs => 'sum(sal)',
 11  p_pivotfor => 'job',
 12  p_total_label => '(Total)',
 13  p_ordercol => 'dname'
 14  );
 15  end;
 16  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN    (Total)
-------------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000                  8750
RESEARCH             3000       1900       2975                             7875
SALES                            950       2850                  5600       9400
(Total)              3000       4150       8275       5000       5600      26025

Two vertical totals

Now we’re getting fancy! Suppose we want totals by department, but also subtotals within each department by the year someone was hired?

  • row 10: get the hiring year
  • row 13: instead of ROLLUP on DNAME only, we ROLLUP on DNAME and HIREYEAR
  • row 5: HIREYEAR should say ‘(Total)’ only when we are on a subtotal line, not on the grand total.
  • row 18: the p_ordercol parameter can be a list of column names
SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select decode(grouping(dname),1,'(Total)',dname) dname,
  5    decode(grouping_id(dname,hireyear),1,'(Total)',hireyear) hireyear,
  6    job,
  7    sum(sal) sal
  8  from (
  9    select dname, job, sal,
 10      extract(year from hiredate) hireyear
 11    from dept join emp using(deptno)
 12  )
 13  group by rollup(dname, hireyear), job
 14  )@',
 15  p_aggfuncs => 'sum(sal)',
 16  p_pivotfor => 'job',
 17  p_total_label => '(Total)',
 18  p_ordercol => 'dname,hireyear'
 19  );
 20  end;
 21  /

DNAME          HIREYEAR     ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- --------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING     1981                                  2450       5000           
ACCOUNTING     1982                       1300                                 
ACCOUNTING     (Total)                    1300       2450       5000           
RESEARCH       1980                        800                                 
RESEARCH       1981            3000                  2975                      
RESEARCH       1987                       1100                                 
RESEARCH       (Total)         3000       1900       2975                      
SALES          1981                        950       2850                  5600
SALES          (Total)                     950       2850                  5600
(Total)                        3000       4150       8275       5000       5600

Two vertical totals, one horizontal

  • row 13: we ROLLUP on the vertical axis (DNAME, HIREYEAR) and on the horizontal (JOB)
  • row 6: JOB must never have a NULL value, else the PIVOT will fail.
SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select decode(grouping(dname),1,'(Total)',dname) dname,
  5    decode(grouping_id(dname,hireyear),1,'(Total)',hireyear) hireyear,
  6    decode(grouping(job),1,'(Total)',job) job,
  7    sum(sal) sal
  8  from (
  9    select dname, job, sal,
 10      extract(year from hiredate) hireyear
 11    from dept join emp using(deptno)
 12  )
 13  group by rollup(dname, hireyear), rollup(job)
 14  )@',
 15  p_aggfuncs => 'sum(sal)',
 16  p_pivotfor => 'job',
 17  p_total_label => '(Total)',
 18  p_ordercol => 'dname,hireyear'
 19  );
 20  end;
 21  /

DNAME          HIREYEAR     ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN    (Total)
-------------- --------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING     1981                                  2450       5000                  7450
ACCOUNTING     1982                       1300                                        1300
ACCOUNTING     (Total)                    1300       2450       5000                  8750
RESEARCH       1980                        800                                         800
RESEARCH       1981            3000                  2975                             5975
RESEARCH       1987                       1100                                        1100
RESEARCH       (Total)         3000       1900       2975                             7875
SALES          1981                        950       2850                  5600       9400
SALES          (Total)                     950       2850                  5600       9400
(Total)                        3000       4150       8275       5000       5600      26025

One vertical, two horizontal

  • row 5: HIREYEAR is now a source for a pivot column, so it must never be NULL.
  • row 16: the p_pivotfor parameter can be a list of column names
SQL> begin
  2  :rc := advanced_pivot(
  3  p_source => q'@(
  4  select decode(grouping(dname),1,'(Total)',dname) dname,
  5    decode(grouping(hireyear),1,'(Total)',hireyear) hireyear,
  6    decode(grouping(job),1,'(Total)',job) job,
  7    sum(sal) sal
  8  from (
  9    select dname, job, sal,
 10      extract(year from hiredate) hireyear
 11    from dept join emp using(deptno)
 12  )
 13  group by rollup(dname), rollup(hireyear, job)
 14  )@',
 15  p_aggfuncs => 'sum(sal)',
 16  p_pivotfor => 'hireyear,job',
 17  p_total_label => '(Total)',
 18  p_ordercol => 'dname'
 19  );
 20  end;
 21  /

DNAME          1980_CLERK 1980_(Total) 1981_ANALYST 1981_CLERK 1981_MANAGER 1981_PRESIDENT 1981_SALESMAN 1981_(Total) 1982_CLERK 1982_(Total) 1987_CLERK 1987_(Total) (Total)_(Total)
-------------- ---------- ------------ ------------ ---------- ------------ -------------- ------------- ------------ ---------- ------------ ---------- ------------ ---------------
ACCOUNTING                                                             2450           5000                       7450       1300         1300                                    8750
RESEARCH              800          800         3000                    2975                                      5975                               1100         1100            7875
SALES                                                      950         2850                         5600         9400                                                            9400
(Total)               800          800         3000        950         8275           5000          5600        22825       1300         1300       1100         1100           26025

Conclusion

Did you really read down to here? If so, congratulations! Here is what I wish you would take away from this post:

  • The ROLLUP and CUBE extensions to GROUP BY are very powerful means to get various levels of subtotals, totals and grand totals.
  • The PIVOT clause can manage vertical and / or horizontal totals very cleanly.
  • The source query should do one ROLLUP on the vertical columns and another on the horizontal columns, except when there is exactly one of each: then just use CUBE.
  • The ADVANCED_PIVOT function generates the appropriate PIVOT clause based on the data, and can handle multiple levels of vertical and / or horizontal totals.
      • The horizontal “p_pivotfor” columns must never be NULL.
      • The vertical “p_ordercol” columns can be NULL or labeled, your choice. If there are dates or numbers, NULL allows for sorting by the original values.

Improved PIVOT Function

My generic PIVOT function only allowed one column in the FOR clause. The modestly named “Advanced” function has no such limit.

Here are some test runs to put it through its paces. I’ll save the code for the end.

UPDATE 2018-05-31: I added support for horizontal and vertical totals. The most recent code is still at the bottom of this post, but the additional explanations are here: PIVOT Function with Totals

NOTE: I changed the order of the input parameters to match the order of the PIVOT statement itself: aggregation functions before the pivot_for_clause.

One aggregation function, one FOR column

SQL> set serveroutput off
SQL> var rc refcursor
SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))', 
  4      'AVG(SAL)', 
  5      'JOB'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000           
RESEARCH             3000        950       2975                      
SALES                            950       2850                  1400

Transposing the data and aliasing the aggregate

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))', 
  4      'AVG(SAL) AVG_SAL', 
  5      'DNAME'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


JOB       ACCOUNTING_AVG_SAL RESEARCH_AVG_SAL SALES_AVG_SAL
--------- ------------------ ---------------- -------------
ANALYST                                  3000              
CLERK                   1300              950           950
MANAGER                 2450             2975          2850
PRESIDENT               5000                               
SALESMAN                                               1400

One aggregate, two FOR columns

The pivot columns are ordered by DNAME, JOB from left to right.

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))', 
  4      'MAX(SAL) MAX_SAL', 
  5      'DNAME,JOB'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


ACCOUNTING_CLERK_MAX_SAL ACCOUNTING_MANAGER_MAX_SAL ACCOUNTING_PRESIDENT_MAX_SAL RESEARCH_ANALYST_MAX_SAL RESEARCH_CLERK_MAX_SAL RESEARCH_MANAGER_MAX_SAL SALES_CLERK_MAX_SAL SALES_MANAGER_MAX_SAL SALES_SALESMAN_MAX_SAL
------------------------ -------------------------- ---------------------------- ------------------------ ---------------------- ------------------------ ------------------- --------------------- ----------------------
                    1300                       2450                         5000                     3000                   1100                     2975                 950                  2850                   1600

Shorter pivot column names

The pivot column names can get long real fast. With this generic function, the way to shorten the names is to shorten the input data.

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select initcap(substr(dname,1,4)) dname, initcap(substr(job,1,5)) job, sal from DEPT join EMP using(DEPTNO))', 
  4      'MIN(SAL) "Min_Sal"', 
  5      'JOB,DNAME'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


Analy_Rese_Min_Sal Clerk_Acco_Min_Sal Clerk_Rese_Min_Sal Clerk_Sale_Min_Sal Manag_Acco_Min_Sal Manag_Rese_Min_Sal Manag_Sale_Min_Sal Presi_Acco_Min_Sal Sales_Sale_Min_Sal
------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------
              3000               1300                800                950               2450               2975               2850               5000               1250

Two aggregates, two FOR columns

I would rather not do this by hand…

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select substr(dname,1,2) dname, substr(job,1,2) job, sal from DEPT join EMP using(DEPTNO))', 
  4      'count(*),AVG(SAL) AVG', 
  5      'DNAME,JOB'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


     AC_CL  AC_CL_AVG      AC_MA  AC_MA_AVG      AC_PR  AC_PR_AVG      RE_AN  RE_AN_AVG      RE_CL  RE_CL_AVG      RE_MA  RE_MA_AVG      SA_CL  SA_CL_AVG      SA_MA  SA_MA_AVG      SA_SA  SA_SA_AVG
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         1       1300          1       2450          1       5000          1       3000          2        950          1       2975          1        950          1       2850          4       1400

What if the FOR column is a date?

I want to break down the number of hires in each department by month, so why not truncate HIREDATE by month, then just display the month and the year?

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, to_char(trunc(hiredate,''MM''),''MON-yyyy'') hiredate from DEPT join EMP using(DEPTNO))', 
  4      'count(*)', 
  5      'hiredate'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


DNAME            APR-1981   DEC-1980   DEC-1981   FEB-1981   JAN-1982   JUN-1981   MAY-1981   MAY-1987   NOV-1981   SEP-1981
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING              0          0          0          0          1          1          0          0          1          0
RESEARCH                1          1          1          0          0          0          0          1          0          0
SALES                   0          0          1          2          0          0          1          0          0          2

Oops! The input HIREDATE is now a string, so the pivot column names are ordered by the string values.

Ordering dates in date order

Here is a situation where implicit conversion is your friend. Just keep the truncated date in your input query and set NLS_DATE_FORMAT as you wish. I’ll order the columns by date and Oracle will convert the dates to strings in the format you specified.

SQL> alter session set nls_date_format='mon-yyyy';

Session altered.

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, trunc(hiredate,''MM'') hiredate from DEPT join EMP using(DEPTNO))', 
  4      'count(*)', 
  5      'hiredate'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


DNAME            dec-1980   feb-1981   apr-1981   may-1981   jun-1981   sep-1981   nov-1981   dec-1981   jan-1982   may-1987
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING              0          0          0          0          1          0          1          0          1          0
RESEARCH                1          0          1          0          0          0          0          1          0          1
SALES                   0          2          0          1          0          2          0          1          0          0

The Code

create or replace function advanced_pivot(
  p_source in varchar2,       -- table, view or query in parentheses
  p_aggfuncs in varchar2,     -- one or more aggregation functions with or without aliases
  p_pivotfor in varchar2,     -- one or more columns from the input separated by commas
  p_total_label in varchar2 default null, -- label to use for pivot totals (if CUBE, ROLLUP, etc.)
  p_ordercol in varchar2 default null     -- column(s) to order by with p_total_label last
) return sys_refcursor authid current_user is
/*
Calculates pivot_in_list using SQL 1, updates SQL 2 text and opens ref cursor.
- Pivot_in_list concatenates values from all PIVOT_FOR columns
  for example, if (DNAME,JOB) then
    ('ACCOUNTING','CLERK') as "ACCOUNTING_CLERK",
    ('ACCOUNTING','MANAGER') as "ACCOUNTING_MANAGER",
  and so on
*/
  l_sql sys.odcivarchar2list := sys.odcivarchar2list(
  
q'@select listagg('(''' || expr || ''') as "' || al || '"', ',
')
within group(order by #ORDERFOR#)
from (select distinct
#EXPR# expr,
#ALIAS# al,
#PIVOTFOR#
from #SOURCE#)@',

'select * from #SOURCE#
pivot(#AGGFUNCS# for (#PIVOTFOR#) in (
#PIVOTINLIST#
)) order by #ORDERCOL#1,2,3'
  );
  l_pivotfor varchar2(255);
  l_orderfor varchar2(255);
  l_refcur sys_refcursor;
  l_pivotinlist varchar2(32767);
  l_expr varchar2(4000);
  l_alias varchar2(4000);
  l_ordercol varchar2(255);
begin
  l_pivotfor := trim( ',' from replace(p_pivotfor,' ') );
  l_orderfor := 
    case when p_total_label is null 
      then l_pivotfor
      else 'nullif(' 
        || replace(l_pivotfor, ',', ','''||p_total_label||'''),nullif(') 
        || ','''||p_total_label||''')'
    end;
  l_sql(1) := replace(l_sql(1), '#ORDERFOR#', l_orderfor);
  
  l_expr := replace(l_pivotfor,',',q'%||''','''||%');
  l_sql(1) := replace(l_sql(1), '#EXPR#', l_expr);
  
  l_alias := replace(l_pivotfor,',',q'%||'_'||%');
  l_sql(1) := replace(l_sql(1), '#ALIAS#', l_alias);
  
  for i in 1..l_sql.count loop
    l_sql(i) := replace(l_sql(i), '#SOURCE#', p_source);
    l_sql(i) := replace(l_sql(i), '#PIVOTFOR#', l_pivotfor);
  end loop;
  
  dbms_output.put_line(l_sql(1));
  dbms_output.put_line('/');
  open l_refcur for l_sql(1);
  fetch l_refcur into l_pivotinlist;
  close l_refcur;
  
  l_sql(2) := replace(l_sql(2), '#AGGFUNCS#', p_aggfuncs);
  l_sql(2) := replace(l_sql(2), '#PIVOTINLIST#', l_pivotinlist);
  
  
  l_ordercol := trim( ',' from replace(p_ordercol,' ') );
  l_ordercol :=
    case when p_total_label is null or l_ordercol is null
      then l_ordercol
      else 'nullif(' 
        || replace(l_ordercol, ',', ','''||p_total_label||'''),nullif(') 
        || ','''||p_total_label||'''),'
    end;
  l_sql(2) := replace(l_sql(2), '#ORDERCOL#', l_ordercol);
  
  dbms_output.put_line(l_sql(2));
  dbms_output.put_line('/');
  open l_refcur for l_sql(2);
  return l_refcur;
  
end advanced_pivot;
/

I really enjoyed the challenge of writing a truly generic function. I hope you enjoy using it!