Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables instead. Once the light bulb went on, I saw that the comma needed to be a record delimiter, not a field separator. Easy.

create table item_ids_EXT(  
  "RN" NUMBER,  
  "ITEM_ID" VARCHAR2(16)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_ID" char(255)  
    )  
  )  
  location ('i_item_id.txt')  
)  
REJECT LIMIT 10;

create table item_keys_EXT(  
  "RN" NUMBER,  
  "ITEM_KEY" VARCHAR2(255)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_key.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_key.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_KEY" char(255)  
    )  
  )  
  location ('i_item_key.txt')  
)  
REJECT LIMIT 10;  

set timing on

create table u as  
select item_id, item_key  
from item_ids_ext  
join item_keys_ext using(rn);

Table U created.

Elapsed: 00:00:00.787

select floor(2 * 60 * 60 / 0.787) times_faster from dual;

TIMES_FASTER
------------
        9148

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 8000 times faster :-)

In a recent ODC thread, the question was how to split out comma separated values from two CLOBs, one containing “item ids” and the other “item keys”. The ids and the keys had to be matched by position: the first id with the first key, the second id with the second key, and so on. Then each id/key pair was to be inserted as a row into a target table.

With 50,000 ids of 7 characters and 50,000 keys of 172 characters, the current processing takes 2 hours.

Test data

First I’ll create a table with the two CLOBs.

--drop table t purge;
create table t as
select 
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || to_char(level,'fm0000009'))
      order by level
    ) AS CLOB
  ), 2
) i_item_id,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || rpad(to_char(level,'fm0000009'),172,'*'))
      order by level
    ) AS CLOB
  ), 2
) i_item_key
from dual
connect by level <= 50000;

Splitting CLOB strings

Instead of seeking a “perfect” custom-made solution, I just put together two functions I have already written about:

  1. The PIPE_CLOB function: a pipelined table function that reads a CLOB and outputs up to 4000 bytes at a time, respecting a delimiter.
  2. The STRING_TOKENIZE function: a pipelined table function that splits a VARCHAR2 string according to a delimiter.
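To give an idea of the shape of the second function, here is a simplified stand-in, not the exact code from the earlier post: a pipelined table function that pipes one row per delimited token. (The collection type name VARCHAR2_TT is my assumption here.)

```sql
-- Simplified stand-in for STRING_TOKENIZE: one output row per token.
-- The real function from the earlier post handles more edge cases.
create or replace type varchar2_tt as table of varchar2(4000);
/
create or replace function string_tokenize(
  p_string in varchar2,
  p_delim  in varchar2
) return varchar2_tt pipelined is
  l_from integer := 1;  -- start of the current token
  l_pos  integer;       -- position of the next delimiter
begin
  loop
    l_pos := instr(p_string, p_delim, l_from);
    if l_pos = 0 then
      pipe row (substr(p_string, l_from));  -- last token
      exit;
    end if;
    pipe row (substr(p_string, l_from, l_pos - l_from));
    l_from := l_pos + length(p_delim);
  end loop;
  return;
end string_tokenize;
/
```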

All I have to do is feed the output from the first function into the second, and voilà!

select b.column_value item_id
from t,
table(pipe_clob( i_item_id, 4000, ',')) a,
table(string_tokenize(a.column_value,',')) b;

Combining the two CLOBs

Remember, the requirement was to combine the values from both CLOBs by position, in order to insert them into a target table. Since the functions return the values in order, I just use ROWNUM to number the values and use that number to join them.

--drop table u purge;
create table u as
with item_ids as (
  select rownum rn,
    b.column_value item_id
    from t,
  table(pipe_clob( i_item_id, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
, item_keys as (
  select rownum rn,
    b.column_value item_key
    from t,
  table(pipe_clob( i_item_key, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
select item_id, item_key 
from item_ids join item_keys using(rn);

Table U created.

Elapsed: 00:00:00.879

That’s right, processing time went from two hours to less than one second.

Now, I suspect that the questioner’s code is doing more than just a simple insert, but until he or she provides more details about the requirement, I get to maintain my claim:

8000 times faster!

(which I hope applies only to my code, not to the speed at which time flies…)

Using SQL Translation Framework to access package variables

My last post had code to rewrite SELECT statements that tried to access public package variables. Here I’ll integrate the code into the SQL Translation Framework. Use this only in development environments.

Why only in Dev? Overhead and complication

You normally use the SQL Translation Framework for database migration. The idea is to translate SQL statements from non-Oracle syntax to Oracle syntax, test the translated statements and ensure that the translations are used instead of the original. You don’t want to do all that work every time a statement is submitted to the database, and you don’t want to do it for every statement. In fact, Oracle drivers can now allow applications to say which statements are to be translated and which are to be left alone.

My solution is necessarily generic:

  • The translation service is enabled at the session level.
  • At that point, every statement execution is preceded by a “parse call” and the execution of my TRANSLATE_SQL procedure.

Those “parse calls” will result in “soft parses” since the statement already resides in the library cache. That is still unacceptable overhead in any production-type environment.

I would add that on-the-fly translation in production would cause havoc if it ever stopped working, and would complicate debugging considerably.

Don’t do it.

In fact, I highly recommend just using the TRANSLATE_SQL procedure as presented in my previous post. You can then see, edit, test and deploy the result. You might want to use on-the-fly translation if you are writing a SELECT statement that accesses package variables and you want to make and test changes faster, but that’s it.

Anyway, for what it’s worth here is the code:

create or replace package translate_pkg_vars authid current_user as

  g_trans_profile_name constant varchar2(30) := 'PACKAGE_VARS';
  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_trans_profile_name); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_trans_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
      attribute_value  =>  $$PLSQL_UNIT); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_trans_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_RAISE_TRANSLATION_ERROR, 
      attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_trans_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRACE_TRANSLATION, 
      attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_trans_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          raise_application_error(
            -20002,
            'The identifier at position '||g_last_error_position
            ||' is invalid even within PL/SQL.'
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_trans_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_pkg_vars;
/

And a quick test:

exec translate_pkg_vars.create_profile;
exec translate_pkg_vars.enable_trans;

select dbms_lob.lobmaxsize, dbms_lob.call from dual;

WF2_DBMS_LOB_LOBMAXSIZE WF1_DBMS_LOB_CALL
----------------------- -----------------
             1.8447E+19                12
          
exec translate_pkg_vars.disable_trans;
exec translate_pkg_vars.drop_profile;

Make public package variables available to SQL

I recently posted an experimental approach to accessing package constants from SQL. Here is a more robust approach that includes all variables.

The problem

Within PL/SQL, SQL statements can be either static or dynamic. Static SQL can access any PL/SQL variables that are within scope, including variables defined in package specifications. PL/SQL rewrites the statement, putting a bind variable where the PL/SQL variable was and binding the value before execution.

Dynamic SQL, whether submitted from PL/SQL or any other language, cannot directly access package variables, even though they are part of the published specification. This frustrates developers, since access to package functions has no such restriction. Getter functions are a workaround, but they are no fun to write and maintain.
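A quick illustration of the asymmetry:

```sql
-- Static SQL inside PL/SQL can reference the package constant:
-- PL/SQL turns it into a bind variable behind the scenes.
declare
  l_max integer;
begin
  select dbms_lob.lobmaxsize into l_max from dual;  -- works
end;
/
-- The same text submitted as dynamic SQL (here, directly from the
-- SQL*Plus prompt) fails with
-- ORA-06553: PLS-221: 'LOBMAXSIZE' is not a procedure or is undefined
select dbms_lob.lobmaxsize from dual;
```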

Possible solutions

  • Table functions: whether pipelined or not, table functions can submit static SQL but be accessed by dynamic SQL. This approach is not very flexible, but when it fits the bill it can be used in production today.
  • Literals: for constants only, rewrite the SQL and replace package constants by their values, as found in DBA_SOURCE. My previous post tried this.
  • Bind variables: this would mimic what PL/SQL does for static SQL. To be a full solution, the values would have to be looked up and bound before executing the rewritten statement.
  • Getter functions.
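To make the last workaround concrete, a getter is just a one-line wrapper (trivial illustration; the function name is mine):

```sql
-- Dynamic SQL cannot read dbms_lob.lobmaxsize directly,
-- but it can call a function that returns it.
create or replace function get_lobmaxsize return integer is
begin
  return dbms_lob.lobmaxsize;
end;
/
select get_lobmaxsize from dual;  -- works in any SQL context
```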

Today’s choice

Starting with database version 12.1, we can declare PL/SQL functions in the WITH clause of a SELECT statement, so that is where I’m going to put my getter functions.

Take for example the statement:

select dbms_lob.call c from dual

Add a WITH clause containing a function, then call the function instead of accessing the package constant:

WITH function WF1_dbms_lob_call return dbms_lob.call%type deterministic is 
begin return dbms_lob.call; end;
select WF1_dbms_lob_call c from dual

  • By declaring the function to be deterministic, it is called once per execution and not once per row – at least in my tests using 18c. There appears to be no need to use scalar subquery caching.
  • Unlike the “literal” solution, any package variable that is valid in static SQL will be valid in the rewritten SQL.
  • There is no need to access the values of the variables in order to bind them.
  • The %type attribute ensures that the function returns a value of the same data type as the package variable.

Finding the package variables

Before making and using a getter function, I have to know that my SQL is trying to access a package variable! Instead of trying to reinvent parsing, I use the DBMS_SQL.PARSE procedure.

Let’s assume a SELECT statement is perfectly valid, except for a reference to a package variable. When I call DBMS_SQL.PARSE, I will get one of two errors:

  • “ORA-06553: PLS-221: ‘<variable>’ is not a procedure or is undefined”
  • ORA-00904: “<package>”.”<record/object>”.”<member>”: invalid identifier

I can also access the “last error position”, which gives me the location of the package variable within the SQL text. I now have enough information to extract the package variable text from the SELECT, generate a getter function and call that function instead of referencing the variable directly. After that, I go through the same parse / replace operations until the statement parses without error.

Please note that if there truly is an invalid identifier in the statement, the generated getter function will cause the next parse to fail with a different error message, which is enough to raise an appropriate exception.

Known limitations

  • Requires database version 12.1 or later. Tested in 18c.
  • This version is restricted to SELECT statements. There may or may not be a preexisting WITH clause.
  • The SQL statement, including the generated getter functions, cannot exceed 32767 bytes in length. When I try to parse anything longer, DBMS_SQL gives me the wrong last error position.
  • At this time, I do not handle quoted identifiers.
  • In theory, comments are allowed between members and periods:
    select dbms_lob  /*WEIRD*/ . call from dual;
    I can handle the spaces, but not the comments.
  • If you want to access a boolean in SQL, this utility can’t help you.

No SQL Translation Framework yet

The code here just rewrites the SELECT; it doesn’t use the SQL Translation Framework to execute it directly. Maybe next time?

create or replace package translate_pkg_vars authid current_user as
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as

  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          raise_application_error(
            -20002,
            'The identifier at position '||g_last_error_position
            ||' is invalid even within PL/SQL.'
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
end translate_pkg_vars;
/
declare
  sql_text  CLOB := 'select dbms_lob.call c from dual';
  translated_text CLOB;
begin
  translate_pkg_vars.translate_sql(sql_text, translated_text);
  dbms_output.put_line('output:
'||translated_text);
end;
/
output:
WITH
function WF1_dbms_lob_call return dbms_lob.call%type deterministic is begin return dbms_lob.call; end;
select WF1_dbms_lob_call c from dual

Make public package constants available to SQL

This is an enhancement request on ODC and a recent Twitter topic. Here is a homemade solution using the SQL Translation Framework.

UPDATE 2018-12-31: this solution became obsolete pretty quickly! Please see this post for a much better solution.

When executing static SQL inside PL/SQL, the developer can directly reference package variables because PL/SQL will automatically treat them as bind variables and bind the correct values. If the developer wants to test the SQL statement outside of PL/SQL, he/she discovers that the package variable is interpreted as a package function, which of course does not exist.

The package I wrote can create a translation profile, enable it, disable it and drop it. It also contains the TRANSLATE_SQL procedure that does the work. It is meant strictly for development environments as a transparent workaround.

The procedure parses the statement, and if it finds something that can be a reference to a CONSTANT variable defined in a package, it substitutes the value found in DBA_SOURCE.

create or replace package translate_constants as
  g_profile_name constant varchar2(30) := 'CONSTANTS';
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_constants;
/
create or replace package body translate_constants as
  type t_parm_val is record (
    parm varchar2(512),
    val varchar2(4000)
  );
  type tt_parm_val is table of t_parm_val;
  lt_parm_val tt_parm_val := null;
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_profile_name); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
      attribute_value  =>  $$PLSQL_UNIT); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    l_first_word varchar2(256);
    l_cursor_id integer;
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    l_errm varchar2(4000);
    l_constant varchar2(4000);
    l_identifier varchar2(4000);
    l_test_sql varchar2(4000);
    l_text varchar2(4000);
  begin
    select lower(regexp_substr(sql_text,'\w+')) into l_first_word from dual;
    if l_first_word not in ('select','with','insert','update','delete','merge') then
      translated_text := null;
      return;
    end if;
    l_errm := 'start';
    translated_text := sql_text;
    while l_errm is not null loop
      l_errm := null;
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_output.put_line(translated_text);
        dbms_sql.parse(l_cursor_id,translated_text,1);
      exception when not_a_procedure then
        l_errm := sqlerrm;
        if instr(l_errm, 'PLS-221:') = 0 then
          raise;
        end if;
        dbms_output.put_line(l_errm);
        select regexp_substr(l_errm, q'§'(.+)'§',1,1,null,1) into l_constant from dual;
        dbms_output.put_line(l_constant);
        select regexp_substr(
          translated_text, 
          '([A-Z][A-Z0-9#_$]*[.])?[A-Z][A-Z0-9#_$]*[.]'||l_constant
          ,1,1,'i'
        )
        into l_identifier from dual;
        dbms_output.put_line(l_identifier);        
        dbms_utility.expand_sql_text(
          'select ' || l_identifier || ' from dual',
          l_test_sql
        );
        dbms_output.put_line(l_test_sql);
        dbms_output.put_line(regexp_substr(l_test_sql,'"(.*?)"',1,1,null,1));
        dbms_output.put_line(regexp_substr(l_test_sql,'"(.*?)"',1,2,null,1));
        select regexp_substr(text, ':=\s*?(\S.*?)\s*;',1,1,null,1)
        into l_text from dba_source
        where owner = regexp_substr(l_test_sql,'"(.*?)"',1,1,null,1)
          and name = regexp_substr(l_test_sql,'"(.*?)"',1,2,null,1) 
          and type = 'PACKAGE' 
          and regexp_like(text, l_constant||'\s+constant\s+','i');
        
        dbms_output.put_line('>'||l_text);
        translated_text := regexp_replace(translated_text,'([^.])'||l_identifier,'\1'||l_text,1,0,'m');
      end;
      dbms_sql.close_cursor(l_cursor_id);
    end loop;
    if l_cursor_id is not null then
      dbms_sql.close_cursor(l_cursor_id);
    end if;
  exception when others then
    if l_cursor_id is not null then
      dbms_sql.close_cursor(l_cursor_id);
    end if;
    raise;

  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_constants;
/

And a small test:

exec translate_constants.create_profile;
exec translate_constants.enable_trans;

select dbms_lob.lobmaxsize maxsize,
sys.dbms_lob.lobmaxsize samething from dual;

   MAXSIZE  SAMETHING
---------- ----------
1.8447E+19 1.8447E+19
          
exec translate_constants.disable_trans;
exec translate_constants.drop_profile;

Remember, this is only to help developers. It is not something to ever deploy in a test or production environment.

SQL Templates and multiple REPLACE

I frequently use what I call “SQL templates”, where I replace multiple placeholders by calculated values. Here are two methods that don’t require a stored procedure.

What do I mean by “template”?

SQL statements have to be parsed before they can be executed. The standard way to minimise parsing is to use bind variables. If we use a bind variable instead of a literal, the same statement can be parsed once and executed many times with different values assigned to the variable.

We cannot put the names of tables or columns into bind variables: these are data objects that have to be known at parse time.

What I mean by “template” is a SQL statement with a specific logic, but without the names of the data objects. I get the names I need from input parameters and queries, then I put them in the statement where the placeholders were. If I do this twice with different parameters and the resulting text is different, the database will of course parse each result separately before execution.

So: don’t confuse this technique with bind variables.

Please keep in mind that I never use this technique in production. I generate the SQL, read it, test it and then deploy it. This technique is meant to make SQL development more productive, not replace the developer.

If you ever want to generate SQL in production, work very hard to avoid SQL injection bugs. For example, use DBMS_ASSERT or data dictionary lookups to make sure input parameters are really data objects.
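For example, DBMS_ASSERT can vet a name before it is spliced into generated SQL. A minimal sketch (the variable name and input value are mine):

```sql
-- SIMPLE_SQL_NAME raises ORA-44003 if the input is not a legal
-- simple SQL identifier, blocking injection via the table name.
declare
  t_name varchar2(128) := 'EMP';  -- hypothetical caller input
  l_sql  varchar2(4000);
begin
  l_sql := 'select count(*) from '
        || sys.dbms_assert.simple_sql_name(t_name);
  dbms_output.put_line(l_sql);
end;
/
```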

Examples

select * from emp where ename = 'KING';

This statement is not a template and it doesn’t use bind variables.

select * from emp where ename = :ename;

This statement is not a template, but it uses a bind variable.

select /*+ qb_name(COMPARE) */
  #COL_LIST#,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  #COL_LIST#
    , -1 NEW_CNT
  from #OLD_TABLE# O
  union all
  select /*+ qb_name(new) */
  #COL_LIST#
    , 1 NEW_CNT
  from #NEW_TABLE# N
)
group by
  #COL_LIST#
having sum(NEW_CNT) != 0
order by 1, OP;

This is a “template” as I use the word here. It shows the logic for comparing two tables using the GROUP BY method. To get a usable SQL statement, I need to replace all the placeholders (enclosed in #) by real values.

Model clause technique

Here is one way to do multiple replacements using pure SQL:

with template(sql_text) as (select cast(
q'§<template>§'
as varchar2(4000)) from dual
)
, parms(v_from, v_to) as (
  <parms>
)
select sql_text from template
model
REFERENCE P ON (select rownum p_dim, v_from, v_to from parms)
  DIMENSION BY (p_dim) measures (v_from, v_to) 
main M
dimension by (1 m_dim)
measures(sql_text)
rules iterate(99) until P.v_from[iteration_number+1] is null (
  sql_text[1] = replace(sql_text[1],P.v_from[iteration_number+1], P.v_to[iteration_number+1])
);

  1. Put the template text in the TEMPLATE subquery.
  2. In the PARMS subquery, create one row per replacement: V_FROM is the placeholder and V_TO is the value.
  3. The MODEL clause will iterate through the reference model P and do each replacement in turn.

Here is an example that generates a comparison. It’s a silly example because I am comparing EMP to itself, but it’s just for illustration!

with template(sql_text) as (select cast(
q'§select /*+ qb_name(COMPARE) */
  #COL_LIST#,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  #COL_LIST#
    , -1 NEW_CNT
  from #OLD_TABLE# O
  union all
  select /*+ qb_name(new) */
  #COL_LIST#
    , 1 NEW_CNT
  from #NEW_TABLE# N
)
group by
  #COL_LIST#
having sum(NEW_CNT) != 0
order by 1, OP;§'
as varchar2(4000)) from dual
)
, parms(v_from, v_to) as (
  select '#NEW_TABLE#', 'EMP' from dual
  union all
  select '#OLD_TABLE#', 'EMP' from dual
  union all
  select '#COL_LIST#',
    listagg(column_name, ',')  within group(order by internal_column_id)
  from user_tab_cols
  where table_name = 'EMP'
  and virtual_column = 'NO'
)
select sql_text from template
model
REFERENCE P ON (select rownum p_dim, v_from, v_to from parms)
  DIMENSION BY (p_dim) measures (v_from, v_to) 
main M
dimension by (1 m_dim)
measures(sql_text)
rules iterate(99) until P.v_from[iteration_number+1] is null (
  sql_text[1] = replace(sql_text[1],P.v_from[iteration_number+1], P.v_to[iteration_number+1])
);
select /*+ qb_name(COMPARE) */
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
    , -1 NEW_CNT
  from EMP O
  union all
  select /*+ qb_name(new) */
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
    , 1 NEW_CNT
  from EMP N
)
group by
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
having sum(NEW_CNT) != 0
order by 1, OP;

Using new stuff: JSON

The problem with the MODEL technique is that the output cannot be a CLOB. To get a CLOB, suppose we use a bit of PL/SQL in a WITH clause? Now we have to pass the placeholder / value pairs as parameters. I decided that a JSON object would be a reasonable implementation. Warning: this looks nice but requires version 12.2 or later.

with  function multi_replace(
    p_template in varchar2,
    p_parms in varchar2
  ) return clob is
    l_out_text clob := p_template;
    l_json JSON_OBJECT_T := JSON_OBJECT_T.parse(p_parms);
    l_keys JSON_KEY_LIST;
  begin
    l_keys := l_json.get_keys;
    for i in 1..l_keys.count loop
      l_out_text := replace(l_out_text, l_keys(i), l_json.get_string(l_keys(i)));
    end loop;
    return l_out_text;
  end multi_replace;
input(template, parms) as (select
q'§select /*+ qb_name(COMPARE) */
  #COL_LIST#,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  #COL_LIST#
    , -1 NEW_CNT
  from #OLD_TABLE# O
  union all
  select /*+ qb_name(new) */
  #COL_LIST#
    , 1 NEW_CNT
  from #NEW_TABLE# N
)
group by
  #COL_LIST#
having sum(NEW_CNT) != 0
order by 1, OP;§',
  json_object(
    '#NEW_TABLE#' is 'EMP',
    '#OLD_TABLE#' is 'EMP',
    '#COL_LIST#'  is (
      select listagg(column_name, ',')  within group(order by internal_column_id)
      from user_tab_cols
      where table_name = 'EMP' and virtual_column = 'NO'
    )
  )
  from dual
)
select multi_replace(template, parms) sql_text
from input;
  • The PL/SQL function parses the input JSON object, gets all the keys (the placeholders), then uses them to get the values one by one and apply them to the template.
  • The JSON_OBJECT function in SQL turns each placeholder / replacement pair into a JSON name:value pair.

I like this approach because it shows each placeholder and its replacement value side by side. It’s just easier for me to read (and write!).
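The shape of MULTI_REPLACE translates almost line for line into other languages. Here is a hedged Python sketch of the same idea (names are mine, not from the post): the placeholder/value pairs travel together as one JSON object, and the function loops over its keys.

```python
import json

# Same idea as the PL/SQL MULTI_REPLACE: parse the JSON object,
# get the keys (the placeholders), and replace each with its value.
def multi_replace(template: str, parms_json: str) -> str:
    parms = json.loads(parms_json)
    out = template
    for key in parms:
        out = out.replace(key, parms[key])
    return out

parms = json.dumps({"#NEW_TABLE#": "EMP", "#OLD_TABLE#": "EMP"})
result = multi_replace("from #OLD_TABLE# O, #NEW_TABLE# N", parms)
print(result)  # from EMP O, EMP N
```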

CSV from CLOB with field enclosures

After my post about extracting CSV without enclosures from a CLOB, here is my solution for CSV with enclosures. It wasn’t easy…

How Oracle parses CSV

We can parse CSV in files using SQL*Loader, but I prefer External Tables with the ORACLE_LOADER access driver. Suppose an External Table has these parameters:

records delimited by NEWLINE
SKIP 0
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' NOTRIM
(or FIELDS CSV WITHOUT EMBEDDED NOTRIM in 12.2)

From the documentation and testing, what I see is this:

  1. With these parameters, we cannot embed record delimiters within quotes.
  2. When the field has no enclosures, all characters between commas are output, whitespace or not.
  3. When there are enclosures:
    a. Whitespace is allowed and stripped before and after enclosed fields, even with NOTRIM.
    b. Field terminators can be embedded between the enclosures.
    c. Record delimiters cannot be embedded: they end the record.
    d. To be enclosed, field enclosures must be doubled: they are undoubled in the output.
    e. Whitespace characters are all characters matched by the REGEXP class ‘\s’:
       ASCII 9-13 and 32 (space) – and 17 others if the character set is AL32UTF8!
    f. No second enclosure > error.
    g. No field terminator between enclosed fields > error.
    h. If there is a field terminator at the end of a record, it ends the field but does not start a new field.

I admit that 3.A. (whitespace OK before and after enclosures) was a surprise to me.
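For comparison, Python's csv module follows similar conventions for rules 3.B and 3.D (embedded terminators and doubled enclosures), though it does not reproduce Oracle's whitespace handling from 3.A, so this is only a partial illustration:

```python
import csv
import io

# One record with: an embedded field terminator, a doubled (escaped)
# enclosure, and an unenclosed field.
record = '"a,b","say ""hi""",plain\n'
row = next(csv.reader(io.StringIO(record)))
print(row)  # ['a,b', 'say "hi"', 'plain']
```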

My objective

I want to emulate the parameters listed above but I also want decent performance. For performance reasons, I decided to simplify the rules I follow:

  • 3.A. : no whitespace allowed before or after enclosed fields
  • 3.E. : I don’t care whether a character is whitespace or not.
  • 3.G. : the error is now “second enclosure not followed by a field delimiter”
  • Instead of LOG and BAD files, I log errors and bad input with DBMS_OUTPUT.PUT_LINE
  • I ignore extra newlines in the input, but I don’t attempt to suppress output with all null fields.

The code

This code should probably be written in C, which allows direct access to each character in a string. However, anyone who can install a C program on the database server can probably copy the CLOB to a file and use External Table functionality directly! To make my PL/SQL as fast as possible, I use a technique from my COBOL and C days: the GOTO. With this type of code, calling subprocedures would measurably increase CPU usage.

create or replace function parse_csv_json_array(
  p_str in varchar2,
  p_log integer default null
) return varchar2 authid current_user is
/*
Objective: minimal service for parsing CSV with enclosures, similar to:
RECORDS DELIMITED BY NEWLINE
SKIP 0
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' NOTRIM
(or FIELDS CSV WITHOUT EMBEDDED NOTRIM in 12.2)
> EXCEPT: I allow no whitespace between field enclosures and field terminators.

- record delimiter = NEWLINE, field terminator = "," and field enclosure = '"'
  all are hard coded for the moment.
- record delimiter cannot be embedded in enclosed field.
- without enclosures, everything between field terminators and / or record delimiters is output
- with enclosures:
  - no whitespace allowed before or after enclosed fields
  - field terminators can be embedded between the enclosures
  - to be enclosed, field enclosures must be doubled: they are undoubled in the output
  - no second enclosure > 'Ending field enclosure not found'
  - no terminator after enclosed field > error 'Ending field enclosure not followed by field terminator'
  - If there is a field delimiter at the end of a record, it ends the field but does not start a new field.
- Instead of LOG and BAD files, I log errors and bad input with DBMS_OUTPUT.PUT_LINE
- I ignore extra newlines in the input, but I do not try to suppress output with all null fields.
- The input contains multiple records, so create an array of arrays: one inner array per record
*/
  l_out varchar2(4000);
  i_str integer;
  max_i_str integer;
  max_i_rec integer;
  i_end integer;
  i_num_backslash integer;
begin
  if p_str = '[]' then
    return p_str;
  end if;
  i_str := 3;
  max_i_rec := 2 - 2; -- end of "preceding record", counting 2 for length of record delimiter
  max_i_str := length(p_str) - 2;
  l_out := '[["';

  <<start_record>>
  i_end := max_i_rec + 3; -- length of record delimiter + 1
  i_end := instr(p_str, '\n', i_end);
  if i_end = 0 or i_end is null then
    -- end of record and end of input
    max_i_rec := max_i_str;
  else
    max_i_rec := i_end - 1;
    -- found \n at beginning, skip
    if max_i_rec < i_str then
      i_str := i_str + 2; -- length of record delimiter
      goto start_record;
    end if;
    -- check for '\\' before n
    i_num_backslash := 0;
    while substr(p_str, i_end-i_num_backslash-1,1) = '\' loop
      i_num_backslash := i_num_backslash + 1;
    end loop;
    if mod(i_num_backslash,2) = 1 then
      -- false alert, there was '\n' in the input and json_array made it '\\n'
      goto start_record;
    end if;
  end if;

  <<start_field>>
  if substr(p_str, i_str, 2) = '\"' then
    -- enclosures, so must do one character at a time
    i_str := i_str + 2;
    goto during_enclosed;
  else
    -- no enclosures, search for end of field in record
    i_end := instr(substr(p_str, i_str, max_i_rec - i_str + 1)||',', ',');
    l_out := l_out || substr(p_str, i_str, i_end - 1);
    i_str := i_str + i_end;
    if i_str <= max_i_rec + 1 then
      -- end of field, not end of record
      l_out := l_out || '","';
      goto start_field;
    elsif max_i_rec < max_i_str then
      -- last field of record, not last record
      i_str := max_i_rec + 3;
      l_out := l_out || '"],["';
      goto start_record;
    else
      -- last field of record, last record of input
      l_out := l_out || '"]]';
      goto end_input;
    end if;
  end if;

  <<during_enclosed>>
  i_end := instr(p_str, '\"', i_str);
  if i_end = 0 or i_end > max_i_rec then
    dbms_output.put_line('Ending field enclosure not found, input follows:');
    dbms_output.put_line('<'||json_value(p_str, '$[0]')||'>');
    l_out := null;
    goto end_input;
  end if;
  l_out := l_out || substr(p_str, i_str, i_end - i_str);
  i_str := i_end + 2;
  if substr(p_str, i_str, 2) = '\"' then
    l_out := l_out || '\"';
    i_str := i_str + 2;
  elsif substr(p_str, i_str, 1) = ',' then
      l_out := l_out || '","';
      i_str := i_str + 1;
      goto start_field;
  elsif i_str > max_i_str then
    l_out := l_out || '"]]';
    goto end_input;
  elsif i_str > max_i_rec then
    l_out := l_out || '"],["';
    i_str := max_i_rec + 3;
    goto start_record;
  else
    dbms_output.put_line('Ending field enclosure not followed by field terminator, input follows:');
    dbms_output.put_line('<'||json_value(p_str, '$[0]')||'>');
    l_out := null;
    goto end_input;
  end if;
  goto during_enclosed;

  <<end_input>>
  return l_out;
end parse_csv_json_array;
/

Test results

I took five columns from DBA_OBJECTS and added one VARCHAR2(100) column and one NUMBER(4) column. I tested with four sizes of CLOB: 100,000 records, 200,000 records, 400,000 records and 800,000 records. I compared input with no enclosures and input with all fields enclosed.

Compared to the “simple” CSV solution, this solution is about 50% to 70% slower, depending on how many fields are enclosed. However, the number of records extracted per second remains stable as volume increases.

Please let me know if this solution is of practical interest to anyone…