CSV from CLOB with field enclosures

After my post about extracting CSV without enclosures from a CLOB, here is my solution for CSV with enclosures. It wasn’t easy…

How Oracle parses CSV

We can parse CSV in files using SQL*Loader, but I prefer External Tables with the ORACLE_LOADER access driver. Suppose an External Table has these parameters:

records delimited by NEWLINE
SKIP 0
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' NOTRIM
(or FIELDS CSV WITHOUT EMBEDDED NOTRIM in 12.2)

From the documentation and my own testing, here is what I see:

  1. With these parameters, we cannot embed record delimiters within quotes.
  2. When the field has no enclosures, all characters between commas are output, whitespace or not.
  3. When there are enclosures:
    A. Whitespace is allowed and stripped before and after enclosed fields, even with NOTRIM.
    B. Field terminators can be embedded between the enclosures.
    C. Record delimiters cannot be embedded: they end the record.
    D. To be enclosed, field enclosures must be doubled: they are undoubled in the output.
    E. Whitespace characters are all the characters matched by the regular expression '\s': codes 9-13 and 32 (space), plus 17 others in AL32UTF8!
    F. A missing second enclosure raises an error.
    G. A missing field terminator between enclosed fields raises an error.
    H. A field terminator at the end of a record ends the field but does not start a new field.

I admit that 3.A. (whitespace OK before and after enclosures) was a surprise to me.
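To make the setup concrete, here is a minimal sketch of an External Table using these parameters. The directory object CSV_DIR, the file name test.csv and the two-column layout are illustrative assumptions, not part of my tests:

create table csv_ext (
  c1 varchar2(4000),
  c2 varchar2(4000)
)
organization external (
  type oracle_loader
  default directory csv_dir
  access parameters (
    records delimited by newline
    skip 0
    fields terminated by ',' optionally enclosed by '"' notrim
  )
  location ('test.csv')
)
reject limit unlimited;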

My objective

I want to emulate the parameters listed above but I also want decent performance. For performance reasons, I decided to simplify the rules I follow:

  • 3.A. : no whitespace allowed before or after enclosed fields
  • 3.E. : I don’t care whether a character is whitespace or not.
  • 3.G. : the error is now “second enclosure not followed by a field delimiter”
  • Instead of LOG and BAD files, I log errors and bad input with DBMS_OUTPUT.PUT_LINE
  • I ignore extra newlines in the input, but I don’t attempt to suppress output with all null fields.

The code

This code should probably be written in C, which allows direct access to each character in a string. However, anyone who can install a C program on the Database server can probably copy the CLOB to a file and use External Table functionality directly! To make my PL/SQL as fast as possible, I use a technique from my COBOL and C days: the GOTO. With this type of code, calling subprocedures would measurably increase CPU usage.

create or replace function parse_csv_json_array(
  p_str in varchar2,
  p_log integer default null
) return varchar2 authid current_user is
/*
Objective: minimal service for parsing CSV with enclosures, similar to:
RECORDS DELIMITED BY NEWLINE
SKIP 0
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' NOTRIM
(or FIELDS CSV WITHOUT EMBEDDED NOTRIM in 12.2)
> EXCEPT: I allow no whitespace between field enclosures and field terminators.

- record delimiter = NEWLINE, field terminator = "," and field enclosure = '"'
  all are hard coded for the moment.
- record delimiter cannot be embedded in enclosed field.
- without enclosures, everything between field terminators and / or record delimiters is output
- with enclosures:
  - no whitespace allowed before or after enclosed fields
  - field terminators can be embedded between the enclosures
  - to be enclosed, field enclosures must be doubled: they are undoubled in the output
  - no second enclosure > 'Ending field enclosure not found'
  - no terminator after enclosed field > error 'Ending field enclosure not followed by field terminator'
  - If there is a field delimiter at the end of a record, it ends the field but does not start a new field.
- Instead of LOG and BAD files, I log errors and bad input with DBMS_OUTPUT.PUT_LINE
- I ignore extra newlines in the input, but I do not try to suppress output with all null fields.
- The input contains multiple records, so create an array of arrays: one inner array per record
*/
  l_out varchar2(4000);
  i_str integer;
  max_i_str integer;
  max_i_rec integer;
  i_end integer;
  i_num_backslash integer;
begin
  if p_str = '[]' then
    return p_str;
  end if;
  i_str := 3;
  max_i_rec := 2 - 2; -- end of "preceding record", counting 2 for length of record delimiter
  max_i_str := length(p_str) - 2;
  l_out := '[["';

  <<start_record>>
  i_end := max_i_rec + 3; -- length of record delimiter + 1
  i_end := instr(p_str, '\n', i_end);
  if i_end = 0 or i_end is null then
    -- end of record and end of input
    max_i_rec := max_i_str;
  else
    max_i_rec := i_end - 1;
    -- found \n at beginning, skip
    if max_i_rec < i_str then
      i_str := i_str + 2; -- length of record delimiter
      goto start_record;
    end if;
    -- check for '\\' before n
    i_num_backslash := 0;
    while substr(p_str, i_end-i_num_backslash-1,1) = '\' loop
      i_num_backslash := i_num_backslash + 1;
    end loop;
    if mod(i_num_backslash,2) = 1 then
      -- false alert, there was '\n' in the input and json_array made it '\\n'
      goto start_record;
    end if;
  end if;

  <<start_field>>
  if substr(p_str, i_str, 2) = '\"' then
    -- enclosures, so must do one character at a time
    i_str := i_str + 2;
    goto during_enclosed;
  else
    -- no enclosures, search for end of field in record
    i_end := instr(substr(p_str, i_str, max_i_rec - i_str + 1)||',', ',');
    l_out := l_out || substr(p_str, i_str, i_end - 1);
    i_str := i_str + i_end;
    if i_str <= max_i_rec + 1 then
      -- end of field, not end of record
      l_out := l_out || '","';
      goto start_field;
    elsif max_i_rec < max_i_str then
      -- last field of record, not last record
      i_str := max_i_rec + 3;
      l_out := l_out || '"],["';
      goto start_record;
    else
      -- last field of record, last record of input
      l_out := l_out || '"]]';
      goto end_input;
    end if;
  end if;

  <<during_enclosed>>
  i_end := instr(p_str, '\"', i_str);
  if i_end = 0 or i_end > max_i_rec then
    dbms_output.put_line('Ending field enclosure not found, input follows:');
    dbms_output.put_line('<'||json_value(p_str, '$[0]')||'>');
    l_out := null;
    goto end_input;
  end if;
  l_out := l_out || substr(p_str, i_str, i_end - i_str);
  i_str := i_end + 2;
  if substr(p_str, i_str, 2) = '\"' then
    l_out := l_out || '\"';
    i_str := i_str + 2;
  elsif substr(p_str, i_str, 1) = ',' then
    l_out := l_out || '","';
    i_str := i_str + 1;
    goto start_field;
  elsif i_str > max_i_str then
    l_out := l_out || '"]]';
    goto end_input;
  elsif i_str > max_i_rec then
    l_out := l_out || '"],["';
    i_str := max_i_rec + 3;
    goto start_record;
  else
    dbms_output.put_line('Ending field enclosure not followed by field terminator, input follows:');
    dbms_output.put_line('<'||json_value(p_str, '$[0]')||'>');
    l_out := null;
    goto end_input;
  end if;
  goto during_enclosed;

  <<end_input>>
  return l_out;
end parse_csv_json_array;
/
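For the record, here is a hedged sketch of how I mean the function to be called, assuming a table T with a CLOB column C as in the tests below; the 3500-byte bite size and the column list are illustrative. Each bite is escaped with JSON_ARRAY, parsed, then split by JSON_TABLE:

with json_data as (
  select parse_csv_json_array(json_array(column_value)) jstr
  from table(pipe_clob((select c from t), 3500))
)
select sql_data.*
from json_data j, json_table(
  j.jstr, '$[*]'
  columns c1 varchar2(128) path '$[0]'
        , c2 varchar2(128) path '$[1]'
) sql_data;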

Test results

I took five columns from DBA_OBJECTS and added one VARCHAR2(100) column and one NUMBER(4) column. I tested with four sizes of CLOB: 100,000 records, 200,000 records, 400,000 records and 800,000 records. I compared input with no enclosures to input with all fields enclosed.

Compared to the “simple” CSV solution, this solution is about 50% to 70% slower, depending on how many fields are enclosed. However, the number of records extracted per second remains stable as volume increases.

Please let me know if this solution is of practical interest to anyone…


Extract from CLOB with JSON objects

On AskTOM, Kim Berg Hansen recently used JSON_OBJECT to parse CLOB data in “name=value” format. I added a variant based on my work with CSV data.

Kim decided to use objects instead of arrays for an excellent and important reason:

  • a JSON object is an unordered collection of name/value pairs.
  • a JSON array is an ordered list of values.

CSV data is an ordered list of values, going from left to right, so a JSON array is the obvious choice. The AskTOM question concerned name/value pairs that were not necessarily in the same order and not necessarily all there! A JSON object was the natural choice.
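To make the distinction concrete, here is the same record generated both ways, using the standard JSON generation functions:

select json_object('empno' value 7369, 'ename' value 'SMITH') obj,
       json_array(7369, 'SMITH') arr
from dual;

OBJ                                ARR
---------------------------------- --------------
{"empno":7369,"ename":"SMITH"}     [7369,"SMITH"]

Reorder the name/value pairs and the object still means the same thing; reorder the array elements and the record changes.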

I’m really liking JSON more and more: not only is it simple, but it seems pretty easy to figure out how best to use it for different problems.

Instead of copying everything here, I’m going to be lazy for once and invite you to read my contribution here:

More on JSON_TABLE

You will see some of the same ideas:

  • Use PIPE_CLOB to cut the CLOB into VARCHAR sized bites;
  • Use JSON_ARRAY on the whole bite to escape characters if needed;
  • Use REPLACE to form an overall array, but this time of objects;
  • Then use JSON_TABLE to generate one row per object and one column per value (see the sketch after this list).
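Here is a minimal sketch of that variant (my reconstruction, not the AskTOM code). It assumes a table T with a CLOB column C holding lines like empno=7369,ename=SMITH, and values that contain no embedded '=' or ',':

with json_data as (
  select '[{' || substr(jstr, 2, length(jstr) - 2) || '}]' jstr
  from (
    select replace(
             replace(
               replace(json_array(column_value), '=', '":"'),
               ',', '","'),
             '\n', '"},{"') jstr
    from table(pipe_clob((select c from t), 3500))
  )
)
select sql_data.*
from json_data j, json_table(
  j.jstr, '$[*]'
  columns empno number        path '$.empno'
        , ename varchar2(128) path '$.ename'
) sql_data;

Since a JSON object is an unordered collection, the pairs can arrive in any order, and a missing pair simply produces a NULL column.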

Hope this helps…

Extract CSV from CLOB with JSON arrays

Marc Bleron blogged about CSV CLOBs and JSON_TABLE two years ago. Here’s my contribution to improve on a great idea.

“Simple” CSV

The Oracle Database does a fine job of parsing CSV data in flat files, using the External Tables facility. Unfortunately, this service is not available for CSV data in VARCHARs or CLOBs. Marc showed that JSON_TABLE (and XMLTABLE) can parse “simple” CSV if it is reformatted. What is “simple”?

CSV data consists of records and fields within records.

  • Records are delimited by NEWLINE (or some other string).
  • Fields are terminated by commas (or some other string).
  • If necessary, some or all fields can be enclosed by double quotes " (or some other string).

When Marc says “simple”, he means that fields are never enclosed. This is important, because enclosed fields may contain the double quote (provided it is present twice in a row) and / or the comma. With “simple” CSV, we know that all commas are true field terminators and we don’t have to replace each "" with a single ".
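For example, this record contains three fields:

7369,"SMITH, JR","says ""hi"""

The first field is 7369, the second is SMITH, JR (the embedded comma is not a terminator) and the third is says "hi" (the doubled enclosures are undoubled).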

“Simple” also means that there is no trimming of whitespace: you get everything between the commas.

Finally, Marc assumes there is no terminator after the last field of the record, even though Oracle allows it.

So, “simple” CSV has delimited records with terminated fields that are never enclosed. There is no trimming of whitespace and the last field in the record is not terminated.

My contribution

  • First of all, I break the CLOB into VARCHAR2 bites using the pipelined table function PIPE_CLOB (as explained in my previous post).
  • Then I remove any field terminator that immediately precedes a record delimiter.
  • Then I use JSON_ARRAY over the entire VARCHAR2 in case some characters need to be escaped.
  • Then I do several REPLACES such that:
    • each record becomes a JSON array of string values, and
    • those arrays are included in one overall array.
  • Finally, I use JSON_TABLE to break the overall array into rows and the inner arrays into columns.

Note that everything before the COLUMNS clause in JSON_TABLE is generic, because the inner arrays can contain any number of elements.

To demonstrate, here is a CLOB containing data from the EMP table, with a trailing comma added:

7369,SMITH,CLERK,7902,1980-12-17T00:00:00,800,,20,
7499,ALLEN,SALESMAN,7698,1981-02-20T00:00:00,1600,300,30,
7521,WARD,SALESMAN,7698,1981-02-22T00:00:00,1250,500,30,
7566,JONES,MANAGER,7839,1981-04-02T00:00:00,2975,,20,
7654,MARTIN,SALESMAN,7698,1981-09-28T00:00:00,1250,1400,30,
7698,BLAKE,MANAGER,7839,1981-05-01T00:00:00,2850,,30,
7782,CLARK,MANAGER,7839,1981-06-09T00:00:00,2450,,10,
7839,KING,PRESIDENT,,1981-11-17T00:00:00,5000,,10,
7844,TURNER,SALESMAN,7698,1981-09-08T00:00:00,1500,0,30,
7876,ADAMS,CLERK,7788,1987-05-23T00:00:00,1100,,20,
7900,JAMES,CLERK,7698,1981-12-03T00:00:00,950,,30,
7902,FORD,ANALYST,7566,1981-12-03T00:00:00,3000,,20,
7934,MILLER,CLERK,7782,1982-01-23T00:00:00,1300,,10,

And the code:

with last_term_removed as (
  select replace(column_value, ','||chr(10), chr(10)) str
  from table(pipe_clob((select c from t), 3500))
)
, json_data as (
  select '[' ||
    replace (
      replace(json_array(str), ',', '","'),
      '\n',
      '"],["'
    )
    || ']' jstr  
  from last_term_removed
)
select sql_data.*
from json_data j, json_table(
   j.jstr, '$[*]'
   columns empno    number        path '$[0]'
         , ename    varchar2(128) path '$[1]'
         , job      varchar2(128) path '$[2]'
         , mgr      number        path '$[3]'
         , hiredate date          path '$[4]'
         , sal      number        path '$[5]'
         , comm     number        path '$[6]'
         , deptno   number        path '$[7]'
) sql_data;
EMPNO ENAME  JOB        MGR HIREDATE    SAL COMM DEPTNO
----- ------ --------- ---- ---------- ---- ---- ------
 7369 SMITH  CLERK     7902 1980-12-17  800          20
 7499 ALLEN  SALESMAN  7698 1981-02-20 1600  300     30
 7521 WARD   SALESMAN  7698 1981-02-22 1250  500     30
 7566 JONES  MANAGER   7839 1981-04-02 2975          20
 7654 MARTIN SALESMAN  7698 1981-09-28 1250 1400     30
 7698 BLAKE  MANAGER   7839 1981-05-01 2850          30
 7782 CLARK  MANAGER   7839 1981-06-09 2450          10
 7839 KING   PRESIDENT      1981-11-17 5000          10
 7844 TURNER SALESMAN  7698 1981-09-08 1500    0     30
 7876 ADAMS  CLERK     7788 1987-05-23 1100          20
 7900 JAMES  CLERK     7698 1981-12-03  950          30
 7902 FORD   ANALYST   7566 1981-12-03 3000          20
 7934 MILLER CLERK     7782 1982-01-23 1300          10

Scalability: yes!

When Marc tested reading CLOBs directly, performance went bad as the CLOB increased in size:

   Rows  Seconds
-------  -------
  91336      2.2
 182672      4.3
 365344      9.4
 730688       22
1461376      840

In my tests with very similar data, the number of rows per second remains about the same:

  LINES   SECS  LINES_PER_SEC  AVG_LINES_PER_SEC  PCT_DIFF
-------  -----  -------------  -----------------  --------
  91336  1.374          66475              68517        -3
 182672    2.6          70258              68517       2.5
 365344   5.35          68289              68517       -.3
 730688     10          73069              68517       6.6
1461376     22          66426              68517      -3.1

Read CLOBs fast with less memory


Reading a big CLOB is like trying to eat a burger all at once: it will take you forever if you don’t choke. Why not cut that CLOB into bite-size chunks? It’s faster, uses less memory – and it’s good table manners…

Marc Bleron has blogged about parsing CSV data in CLOBs. As the CLOB got bigger, the parsing time went way up. I decided to write a pipelined table function that would return the CLOB in VARCHAR2-sized bites, cutting at record boundaries.

  • By default, the maximum bite size is 4000 bytes. You can make it less if you need some room to use REPLACE.
  • Also by default, the record delimiter is the NEWLINE of your operating system, but you can change it.

Again, the intent is to return as many entire records as will fit in the “bite”. If the input has a record that is longer than the maximum bite size, then this function will raise an exception.

One of the pains of CLOBs is that LENGTH() and SUBSTR() deal with characters only: the LENGTHB() and SUBSTRB() functions deal with bytes, but they are limited to VARCHAR2. Fortunately, VARCHAR2 in PL/SQL can be much longer, so I read 4000 characters into a buffer and then I cut off any records that overflow the 4000-byte boundary.
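A quick illustration of the character/byte difference, assuming an AL32UTF8 database character set:

select length('aéç') num_chars, lengthb('aéç') num_bytes from dual;

 NUM_CHARS  NUM_BYTES
---------- ----------
         3          5

LENGTH sees three characters, but é and ç each occupy two bytes, so a cut at a byte boundary can land in the middle of a character.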

UPDATE 2018-06-15: I renamed “rec_term” to “rec_delim” because external table definitions say “records delimited by”. I now raise an exception if a bite does not contain the record delimiter.

UPDATE 2018-06-16 and -17: bug fixes for edge cases.

UPDATE 2018-06-28: better error handling. The maximum record length (in bytes) was actually 4000 including the record delimiter. It is now 4000 excluding the record delimiter.

create or replace function pipe_clob (
  p_clob in clob,
  p_max_lengthb in integer default 4000,
  p_rec_delim in varchar2 default '
'
) return sys.odcivarchar2list pipelined authid current_user as
/*
Break CLOB into VARCHAR2 sized bites.
Reduce p_max_lengthb if you need to expand the VARCHAR2
in later processing.
Last record delimiter in each bite is not returned,
but if it is a newline and the output is spooled
the newline will come back in the spooled output.
Note: this cannot work if the CLOB contains more than
<p_max_lengthb> consecutive bytes without a record delimiter.
*/
  l_amount integer;
  l_offset integer;
  l_buffer varchar2(32767 BYTE);
  l_out varchar2(32767 BYTE);
  l_buff_lengthb integer;
  l_occurence integer;
  l_rec_delim_length integer := length(p_rec_delim);
begin
  if p_max_lengthb > 4000 then
    raise_application_error(-20001, 'Maximum record length (p_max_lengthb) cannot be greater than 4000.');
  elsif p_max_lengthb < 10 then
    raise_application_error(-20002, 'Maximum record length (p_max_lengthb) cannot be less than 10.');
  end if;
  if p_rec_delim is null then
    raise_application_error(-20003, 'Record delimiter (p_rec_delim) cannot be null.');
  end if;
  /* This version is limited to 4000 byte output, so I can afford to ask for 4001
      in case the record is exactly 4000 bytes long.
  */
  l_amount := p_max_lengthb + l_rec_delim_length;
  l_offset := 1;
  while l_amount = p_max_lengthb + l_rec_delim_length loop
    begin
      DBMS_LOB.READ (
       p_clob,
       l_amount,
       l_offset,
       l_buffer
      );
    exception when no_data_found then
      l_amount := 0;
    end;
    if l_amount = 0 then
      exit;
    elsif lengthb(l_buffer) <= p_max_lengthb then
      pipe row(rtrim(l_buffer, p_rec_delim));
      exit;
    end if;
    l_buff_lengthb := p_max_lengthb + l_rec_delim_length;
    l_occurence := 0;
    while l_buff_lengthb > p_max_lengthb loop
      l_occurence := l_occurence + 1;
      l_buff_lengthb := instrb(l_buffer,p_rec_delim, -1, l_occurence) - 1;
    end loop;
    if l_buff_lengthb < 0 then
      if l_amount = p_max_lengthb + l_rec_delim_length then
        raise_application_error(
          -20004, 
          'Input clob at offset '||l_offset||' for lengthb '||p_max_lengthb||' has no record delimiter'
        );
      end if;
    end if;
    l_out := substrb(l_buffer, 1, l_buff_lengthb);
    pipe row(l_out);
    l_offset := l_offset + nvl(length(l_out),0) + l_rec_delim_length;
    l_amount := p_max_lengthb + l_rec_delim_length;
  end loop;
  return;
end;
/
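A hedged usage sketch, assuming a table T with a CLOB column C as in the CSV posts above; the 3500-byte bite size leaves room for later REPLACEs:

select rownum bite_number, lengthb(column_value) bite_bytes
from table(pipe_clob((select c from t), 3500));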

In my laptop tests, read time increased linearly with the number of records. If you try it, let me know how it works out.

Foreign keys between two tables

In a comment on my post about getting all the foreign key relationships, I was asked how to get the shortest path between two tables. Not too easy…

UPDATE: you can see how hard it is by the fact I have had to correct two bugs, one concerning direct or indirect self-references.

Using the ALL_CONSTRAINTS view, we can put together all the foreign key relationships between child tables and parent tables, and we can do hierarchical queries to go up and down branches of the hierarchy. If you give me two tables, though, how do I know whether to go up or down, and which table to start with?

The idea

I decided to start with both tables and go up. I expect children to outnumber parents, so going up should be more efficient. I start with both tables, knowing that at most one will be the descendant of the other. I then go up the hierarchy until I either get to the top or hit one of the tables. The “shortest path” is the one where the final LEVEL is smallest.

The code

I start with a little subquery that provides the target tables.

with search_for (owner, table_name) as (
  select 'OBE', 'OEHR_COUNTRIES' from dual union all
  select 'OBE', 'OEHR_ORDER_ITEMS' from dual
)

Then I get all the constraints that may be involved in foreign key relationships.

, pur as (
  select
    OWNER, CONSTRAINT_NAME, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)

I do a self join on the PUR subquery to add the parent table to each row.
UPDATE: added a line to RELATIONS to avoid self-references.

, relations as (
  select a.owner, a.table_name,
    a.r_owner, b.table_name r_table_name
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)

Now I start going up the hierarchy:

  • I “start with” both tables
  • I “connect by” going up the hierarchy, stopping before the other table can be a child
    This is important, because the other table could be at the top of the hierarchy and not be a child at all.
  • The WHERE clause is applied after the CONNECT BY: I only keep rows where I found the other table.
  • I now have one row for each path from one table to the other. I use SYS_CONNECT_BY_PATH to concatenate the original child and all the parents going up the hierarchy.
    Notice I have to use one literal (<) within the first parameter and another (/) as the second parameter. I wanted to use ‘<‘ in both places, but Oracle won’t let me (see the example after this list).
  • Finally, I choose the shortest path, translate ‘/’ to ‘<‘, get rid of the first ‘<‘ and display the result.
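To see why ‘<‘ cannot appear in both places, note that Oracle checks the path value at runtime. This illustrative query:

select sys_connect_by_path(dummy || '<', '<') path
from dual
connect by level <= 2;

raises ORA-30004 because the column value contains the separator, which is exactly what would happen with ‘<‘ in both places here.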

UPDATES:

  • To avoid indirect self-referencing, I make sure the last remote table is not the same as the original table!
  • I had the ‘<‘ outside of the CASE statement. Now it is inside so you do not get ‘<<‘ anymore.
, paths as (
  select row_number() over( order by
    case when (r_owner, r_table_name) != ((connect_by_root(owner), connect_by_root(table_name)))
      then level end
    ) rn, 
    sys_connect_by_path(
      case level when 1 then owner||'.'||table_name || '<' end ||
      r_owner||'.'||r_table_name,
    '/') path
  from relations r
  where (r_owner, r_table_name) in (select * from search_for)
  start with (owner, table_name) in (select * from search_for)
  connect by nocycle (owner, table_name) = ((prior r_owner, prior r_table_name))
  and (owner, table_name) not in (select * from search_for)
)
select substr(translate(path,'/','<'), 2) best_path
from paths where rn = 1;

BEST_PATH
------------------------------------------------------------------------------------------------------------------
OBE.OEHR_ORDER_ITEMS<OBE.OEHR_ORDERS<OBE.OEHR_EMPLOYEES<OBE.OEHR_DEPARTMENTS<OBE.OEHR_LOCATIONS<OBE.OEHR_COUNTRIES

As the man said, hope this helps…

Splitting Strings: a New Champion!

My last post on splitting strings conceded that “a combined SQL + PL/SQL solution beats the best pure SQL solution.” Wrong! I didn’t try JSON_TABLE.

Pretend it’s an array

It’s not hard to reformat a string like

'a,b,c,d,e,"'

to be a JSON array:

select replace(json_array('a,b,c,d,e,"'), ',', '","') jstr
from dual;

JSTR
--------------------------
["a","b","c","d","e","\""]

Notice that the double quote in the data itself is properly escaped by the call to JSON_ARRAY. Then I turn the string into a series of array elements by putting double quotes around each comma. Once all that is done, JSON_TABLE will split it like this:

select subs from json_table(
  replace(json_array('a,b,c,d,e,"'), ',', '","'),
  '$[*]' columns (
    subs varchar2(4000) path '$'
  )
);

SUBS
----
a
b
c
d
e
"

So the escaped data is automatically unescaped. Now for speed, here is my test harness:

set serveroutput on
declare
  l_num number;
  l_timestamp timestamp;
  l_plsql_secs number;
  l_JSON_secs number;
  l_num_substrings number := 10;
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
begin
  select count(*) into l_num from user_tables where table_name = 'T';
  if l_num > 0 then
    do('drop table t purge');
  end if;
  do('create table t(id number, str varchar2(4000)) cache');
  insert into t
  select level, to_char(level,'fm000000000')||',000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010'
  from dual
  connect by level <= 10000;
  commit;
  dbms_output.put_line('Substrings' || chr(9) || 'JSON' || chr(9) || 'PL/SQL');
  for i in 1..10 loop
    select count(*) into l_num from t;
     
    l_timestamp := localtimestamp;
    select count(column_value) into l_num from (
      select id, column_value from t, table(string_tokenize(str, ','))
    );
    l_plsql_secs := extract(second from localtimestamp - l_timestamp);
     
    l_timestamp := localtimestamp;
    select count(subs) into l_num from (
      select id, subs from t, JSON_table(
      '["' || replace(str, ',' , '","') || '"]', '$[*]' columns(
      subs varchar2(99) path '$'
      )
    )
    );
    l_JSON_secs := extract(second from localtimestamp - l_timestamp);
    dbms_output.put_line(l_num_substrings || chr(9) || l_JSON_secs || chr(9) || l_plsql_secs);
    update t set str =
    str || ',000000001,000000002,000000003,000000004,000000005,000000006,000000007,000000008,000000009,000000010';
    l_num_substrings := l_num_substrings + 10;
    commit;
  end loop;
end;
/

Notice that I keep the same number of input rows here and increase the number of substrings per row.

PL/SQL does great, but JSON_TABLE appears to beat it.
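For completeness, the PL/SQL contender calls STRING_TOKENIZE from my earlier post on splitting strings. Here is a minimal sketch of such a pipelined function (my reconstruction for this post, not necessarily the original):

create or replace function string_tokenize(
  p_string in varchar2,
  p_delim in varchar2
) return sys.odcivarchar2list pipelined is
  l_pos integer := 1;  -- start of the current substring
  l_next integer;      -- position of the next delimiter
begin
  loop
    l_next := instr(p_string, p_delim, l_pos);
    if l_next = 0 then
      -- no more delimiters: pipe the last substring and stop
      pipe row(substr(p_string, l_pos));
      exit;
    end if;
    pipe row(substr(p_string, l_pos, l_next - l_pos));
    l_pos := l_next + length(p_delim);
  end loop;
  return;
end string_tokenize;
/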

Improved PIVOT Function

My generic PIVOT function only allowed one column in the FOR clause. The modestly named “Advanced” function has no such limit.

Here are some test runs to put it through its paces. I’ll save the code for the end.

UPDATE 2018-05-31: I added support for horizontal and vertical totals. The most recent code is still at the bottom of this post, but the additional explanations are here: PIVOT Function with Totals

NOTE: I changed the order of the input parameters to match the order of the PIVOT statement itself: aggregation functions before the pivot_for_clause.
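As a hedged sketch of the totals support (the full walkthrough is in the linked post): the source query does the CUBE and labels the grouped-out values itself, and P_TOTAL_LABEL tells the function which label to sort last. The 'Total' label and this particular source query are my illustrative assumptions:

begin
  :rc := advanced_pivot(
    '(select nvl(dname, ''Total'') dname, nvl(job, ''Total'') job, sum(sal) sal
      from DEPT join EMP using(DEPTNO)
      group by cube(dname, job))',
    'SUM(SAL)',
    'JOB',
    p_total_label => 'Total',
    p_ordercol => 'DNAME'
  );
end;
/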

One aggregation function, one FOR column

SQL> set serveroutput off
SQL> var rc refcursor
SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))', 
  4      'AVG(SAL)', 
  5      'JOB'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000           
RESEARCH             3000        950       2975                      
SALES                            950       2850                  1400

Transposing the data and aliasing the aggregate

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))', 
  4      'AVG(SAL) AVG_SAL', 
  5      'DNAME'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


JOB       ACCOUNTING_AVG_SAL RESEARCH_AVG_SAL SALES_AVG_SAL
--------- ------------------ ---------------- -------------
ANALYST                                  3000              
CLERK                   1300              950           950
MANAGER                 2450             2975          2850
PRESIDENT               5000                               
SALESMAN                                               1400

One aggregate, two FOR columns

The pivot columns are ordered by DNAME, JOB from left to right.

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))', 
  4      'MAX(SAL) MAX_SAL', 
  5      'DNAME,JOB'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


ACCOUNTING_CLERK_MAX_SAL ACCOUNTING_MANAGER_MAX_SAL ACCOUNTING_PRESIDENT_MAX_SAL RESEARCH_ANALYST_MAX_SAL RESEARCH_CLERK_MAX_SAL RESEARCH_MANAGER_MAX_SAL SALES_CLERK_MAX_SAL SALES_MANAGER_MAX_SAL SALES_SALESMAN_MAX_SAL
------------------------ -------------------------- ---------------------------- ------------------------ ---------------------- ------------------------ ------------------- --------------------- ----------------------
                    1300                       2450                         5000                     3000                   1100                     2975                 950                  2850                   1600

Shorter pivot column names

The pivot column names can get long real fast. With this generic function, the way to shorten the names is to shorten the input data.

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select initcap(substr(dname,1,4)) dname, initcap(substr(job,1,5)) job, sal from DEPT join EMP using(DEPTNO))', 
  4      'MIN(SAL) "Min_Sal"', 
  5      'JOB,DNAME'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


Analy_Rese_Min_Sal Clerk_Acco_Min_Sal Clerk_Rese_Min_Sal Clerk_Sale_Min_Sal Manag_Acco_Min_Sal Manag_Rese_Min_Sal Manag_Sale_Min_Sal Presi_Acco_Min_Sal Sales_Sale_Min_Sal
------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------
              3000               1300                800                950               2450               2975               2850               5000               1250

Two aggregates, two FOR columns

I would rather not do this by hand…

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select substr(dname,1,2) dname, substr(job,1,2) job, sal from DEPT join EMP using(DEPTNO))', 
  4      'count(*),AVG(SAL) AVG', 
  5      'DNAME,JOB'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


     AC_CL  AC_CL_AVG      AC_MA  AC_MA_AVG      AC_PR  AC_PR_AVG      RE_AN  RE_AN_AVG      RE_CL  RE_CL_AVG      RE_MA  RE_MA_AVG      SA_CL  SA_CL_AVG      SA_MA  SA_MA_AVG      SA_SA  SA_SA_AVG
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         1       1300          1       2450          1       5000          1       3000          2        950          1       2975          1        950          1       2850          4       1400

What if the FOR column is a date?

I want to break down the number of hires in each department by month, so why not truncate HIREDATE by month, then just display the month and the year?

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, to_char(trunc(hiredate,''MM''),''MON-yyyy'') hiredate from DEPT join EMP using(DEPTNO))', 
  4      'count(*)', 
  5      'hiredate'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


DNAME            APR-1981   DEC-1980   DEC-1981   FEB-1981   JAN-1982   JUN-1981   MAY-1981   MAY-1987   NOV-1981   SEP-1981
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING              0          0          0          0          1          1          0          0          1          0
RESEARCH                1          1          1          0          0          0          0          1          0          0
SALES                   0          0          1          2          0          0          1          0          0          2

Oops! The input HIREDATE is now a string, so the pivot column names are ordered by the string values.

Ordering dates in date order

Here is a situation where implicit conversion is your friend. Just keep the truncated date in your input query and set NLS_DATE_FORMAT as you wish. I’ll order the columns by date and Oracle will convert the dates to strings in the format you specified.

SQL> alter session set nls_date_format='mon-yyyy';

Session altered.

SQL> begin
  2    :rc := advanced_pivot(
  3      '(select dname, trunc(hiredate,''MM'') hiredate from DEPT join EMP using(DEPTNO))', 
  4      'count(*)', 
  5      'hiredate'
  6    );
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> print :rc


DNAME            dec-1980   feb-1981   apr-1981   may-1981   jun-1981   sep-1981   nov-1981   dec-1981   jan-1982   may-1987
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING              0          0          0          0          1          0          1          0          1          0
RESEARCH                1          0          1          0          0          0          0          1          0          1
SALES                   0          2          0          1          0          2          0          1          0          0

The Code

create or replace function advanced_pivot(
  p_source in varchar2,       -- table, view or query in parentheses
  p_aggfuncs in varchar2,     -- one or more aggregation functions with or without aliases
  p_pivotfor in varchar2,     -- one or more columns from the input separated by commas
  p_total_label in varchar2 default null, -- label to use for pivot totals (if CUBE, ROLLUP, etc.)
  p_ordercol in varchar2 default null     -- column(s) to order by with p_total_label last
) return sys_refcursor authid current_user is
/*
Calculates pivot_in_list using SQL 1, updates SQL 2 text and opens ref cursor.
- Pivot_in_list concatenates values from all PIVOT_FOR columns
  for example, if (DNAME,JOB) then
    ('ACCOUNTING','CLERK') as "ACCOUNTING_CLERK",
    ('ACCOUNTING','MANAGER') as "ACCOUNTING_MANAGER",
  and so on
*/
  l_sql sys.odcivarchar2list := sys.odcivarchar2list(
  
q'@select listagg('(''' || expr || ''') as "' || al || '"', ',
')
within group(order by #ORDERFOR#)
from (select distinct
#EXPR# expr,
#ALIAS# al,
#PIVOTFOR#
from #SOURCE#)@',

'select * from #SOURCE#
pivot(#AGGFUNCS# for (#PIVOTFOR#) in (
#PIVOTINLIST#
)) order by #ORDERCOL#1,2,3'
  );
  l_pivotfor varchar2(255);
  l_orderfor varchar2(255);
  l_refcur sys_refcursor;
  l_pivotinlist varchar2(32767);
  l_expr varchar2(4000);
  l_alias varchar2(4000);
  l_ordercol varchar2(255);
begin
  l_pivotfor := trim( ',' from replace(p_pivotfor,' ') );
  l_orderfor := 
    case when p_total_label is null 
      then l_pivotfor
      else 'nullif(' 
        || replace(l_pivotfor, ',', ','''||p_total_label||'''),nullif(') 
        || ','''||p_total_label||''')'
    end;
  l_sql(1) := replace(l_sql(1), '#ORDERFOR#', l_orderfor);
  
  l_expr := replace(l_pivotfor,',',q'%||''','''||%');
  l_sql(1) := replace(l_sql(1), '#EXPR#', l_expr);
  
  l_alias := replace(l_pivotfor,',',q'%||'_'||%');
  l_sql(1) := replace(l_sql(1), '#ALIAS#', l_alias);
  
  for i in 1..l_sql.count loop
    l_sql(i) := replace(l_sql(i), '#SOURCE#', p_source);
    l_sql(i) := replace(l_sql(i), '#PIVOTFOR#', l_pivotfor);
  end loop;
  
  dbms_output.put_line(l_sql(1));
  dbms_output.put_line('/');
  open l_refcur for l_sql(1);
  fetch l_refcur into l_pivotinlist;
  close l_refcur;
  
  l_sql(2) := replace(l_sql(2), '#AGGFUNCS#', p_aggfuncs);
  l_sql(2) := replace(l_sql(2), '#PIVOTINLIST#', l_pivotinlist);
  
  
  l_ordercol := trim( ',' from replace(p_ordercol,' ') );
  l_ordercol :=
    case when p_total_label is null or l_ordercol is null
      then l_ordercol
      else 'nullif(' 
        || replace(l_ordercol, ',', ','''||p_total_label||'''),nullif(') 
        || ','''||p_total_label||'''),'
    end;
  l_sql(2) := replace(l_sql(2), '#ORDERCOL#', l_ordercol);
  
  dbms_output.put_line(l_sql(2));
  dbms_output.put_line('/');
  open l_refcur for l_sql(2);
  return l_refcur;
  
end advanced_pivot;
/

I really enjoyed the challenge of writing a truly generic function. I hope you enjoy using it!