Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables. When the light bulb went on, I saw that the comma needed to be a record delimiter, not a field separator. Easy.

create table item_ids_EXT(  
  "RN" NUMBER,  
  "ITEM_ID" VARCHAR2(16)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_ID" char(255)  
    )  
  )  
  location ('i_item_id.txt')  
)  
REJECT LIMIT 10;

create table item_keys_EXT(  
  "RN" NUMBER,  
  "ITEM_KEY" VARCHAR2(255)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_KEY" char(255)  
    )  
  )  
  location ('i_item_key.txt')  
)  
REJECT LIMIT 10;  

set timing on

create table u as  
select item_id, item_key  
from item_ids_ext  
join item_keys_ext using(rn);

Table U created.

Elapsed: 00:00:00.787

select floor(2 * 60 * 60 / 0.787) times_faster from dual;

TIMES_FASTER
------------
        9148
Advertisements

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 8000 times faster :-)

In a recent ODC thread, the question was how to split out comma separated values from two CLOBs, one containing “item ids” and the other “item keys”. The ids and the keys had to be matched by position: the first id with the first key, the second id with the second key, and so on. Then each id/key pair was to be inserted as a row into a target table.

With 50,000 ids of 7 characters and 50,000 keys of 172 characters, the current processing takes 2 hours.

Test data

First I’ll create a table with the two CLOBs.

--drop table t purge;
create table t as
select 
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || to_char(level,'fm0000009'))
      order by level
    ) AS CLOB
  ), 2
) i_item_id,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || rpad(to_char(level,'fm0000009'),172,'*'))
      order by level
    ) AS CLOB
  ), 2
) i_item_key
from dual
connect by level <= 50000;

Splitting CLOB strings

Instead of seeking a “perfect” custom-made solution, I just put together two functions I have already written about:

  1. The PIPE_CLOB function: a pipelined table function that reads a CLOB and outputs up to 4000 bytes at a time, respecting a delimiter.
  2. The STRING_TOKENIZE function: a pipelined table function that splits a VARCHAR2 string according to a delimiter.

All I have to do is feed the output from the first function into the second, and voilà!

select b.column_value item_id
from t,
table(pipe_clob( i_item_id, 4000, ',')) a,
table(string_tokenize(a.column_value,',')) b;

Combining the two CLOBs

Remember, the requirement was to combine the values from both CLOBs by position, in order to insert them into a target table. Since the functions return the values in order, I just use ROWNUM to number the values and use that number to join them.

--drop table u purge;
create table u as
with item_ids as (
  select rownum rn,
    b.column_value item_id
    from t,
  table(pipe_clob( i_item_id, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
, item_keys as (
  select rownum rn,
    b.column_value item_key
    from t,
  table(pipe_clob( i_item_key, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
select item_id, item_key 
from item_ids join item_keys using(rn);

Table U created.

Elapsed: 00:00:00.879

That’s right, processing time went from two hours to less than one second.

Now, I suspect that the questioner’s code is doing more than just a simple insert, but until he or she provides more details about the requirement, I get to maintain my claim:

8000 times faster!

(which I hope applies only to my code, not to the speed at which time flies…)

Using SQL Translation Framework to access package variables

My last post had code to rewrite SELECT statements that tried to access public package variables. Here I’ll integrate the code into the SQL Translation Framework. Use this only in development environments.

Why only in Dev? Overhead and complication

You normally use the SQL Translation Framework for database migration. The idea is to translate SQL statements from non-Oracle syntax to Oracle syntax, test the translated statements and ensure that the translations are used instead of the original. You don’t want to do all that work every time a statement is submitted to the database, and you don’t want to do it for every statement. In fact, Oracle drivers can now allow applications to say which statements are to be translated and which are to be left alone.

My solution is necessarily generic:

  • The translation service is enabled at the session level.
  • At that point, every statement execution is preceded by a “parse call” and the execution of my TRANSLATE_SQL procedure.

Those “parse calls” will result in “soft parses” since the statement already resides in the library cache. That is still unacceptable overhead in any production-type environment.

I would add that on-the-fly translation in production would cause havoc if it ever stopped working, and would complicate debugging considerably.

Don’t do it.

In fact, I highly recommend just using the TRANSLATE_SQL procedure as presented in my previous post. You can then see, edit, test and deploy the result. You might want to use on-the-fly translation if you are writing a SELECT statement that accesses package variables and you want to make and test changes faster, but that’s it.

Anyway, for what it’s worth here is the code:

create or replace package translate_pkg_vars authid current_user as

  g_trans_profile_name constant varchar2(30) := 'PACKAGE_VARS';
  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_trans_profile_name); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_trans_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
    attribute_value  =>  $$PLSQL_UNIT); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_trans_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_RAISE_TRANSLATION_ERROR, 
    attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_trans_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRACE_TRANSLATION, 
    attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_trans_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        dbms_output.put_line('Input must be a SELECT statement.');
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          raise_application_error(
            -20002,
            'Identifier '||l_identifier||' is invalid even within PL/SQL.'
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_trans_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_pkg_vars;
/

And a quick test:

exec translate_pkg_vars.create_profile;
exec translate_pkg_vars.enable_trans;

select dbms_lob.lobmaxsize, dbms_lob.call from dual;

WF2_DBMS_LOB_LOBMAXSIZE WF1_DBMS_LOB_CALL
----------------------- -----------------
             1.8447E+19                12
          
exec translate_pkg_vars.disable_trans;
exec translate_pkg_vars.drop_profile;

Make public package variables available to SQL

I recently posted an experimental approach to accessing package constants from SQL. Here is a more robust approach that includes all variables.

The problem

Within PL/SQL, SQL statements can be either static or dynamic. Static SQL can access any PL/SQL variables that are within scope, including variables defined in package specifications. PL/SQL rewrites the statement, putting a bind variable where the PL/SQL variable was and binding the value before execution.

Dynamic SQL, whether submitted from PL/SQL or any other language, cannot directly access package variables, even though they are part of the published specification. This frustrates developers, since access to functions has no such restriction. Getter functions are a workaround, but they are no fun.

Possible solutions

  • Table functions: whether pipelined or not, table functions can submit static SQL but be accessed by dynamic SQL. This approach is not very flexible, but when it fits the bill it can be used in production today.
  • Literals: for constants only, rewrite the SQL and replace package constants by their values, as found in DBA_SOURCE. My previous post tried this.
  • Bind variables: this would mimic what PL/SQL does for static SQL. To be a full solution, the values would have to be looked up and bound before executing the rewritten statement.
  • Getter functions.

Today’s choice

Starting with database version 12.1, we can declare PL/SQL functions in the WITH clause of a SELECT statement, so that is where I’m going to put my getter functions.

Take for example the statement:

select dbms_lob.call c from dual

Add a WITH clause containing a function, then call the function instead of accessing the package constant:

WITH function WF1_dbms_lob_call return dbms_lob.call%type deterministic is 
begin return dbms_lob.call; end;
select WF1_dbms_lob_call c from dual
  • By declaring the function to be deterministic, it is called once per execution and not once per row – at least in my tests using 18c. There appears to be no need to use scalar subquery caching.
  • Unlike the “literal” solution, any package variable that is valid in static SQL will be valid in the rewritten SQL.
  • There is no need to access the values of the variables in order to bind them.
  • The %type attribute ensures that the function returns a value of the same data type as the package variable.

Finding the package variables

Before making and using a getter function, I have to know that my SQL is trying to access a package variable! Instead of trying to reinvent parsing, I use the DBMS_SQL.PARSE procedure.

Let’s assume a SELECT statement is perfectly valid, except for a reference to a package variable. When I call DBMS_SQL.PARSE, I will get one of two errors:

  • “ORA-06553: PLS-221: ‘<variable>’ is not a procedure or is undefined”
  • ORA-00904: “<package>”.”<record/object>”.”<member>”: invalid identifier

I can also access the “last error position”, which gives me the location of the package variable within the SQL text. I now have enough information to extract the package variable text from the SELECT, generate a getter function and call that function instead of referencing the variable directly. After that, I go through the same parse / replace operations until the statement parses without error.

Please note that if there truly is an invalid identifier in the statement, the generated getter function will cause the next parse to fail with a different error message, which is enough to raise an appropriate exception.

Known limitations

  • Requires database version 12.1 or later. Tested in 18c.
  • This version is restricted to SELECT statements. There may or may not be a preexisting WITH clause.
  • The SQL statement, including the generated getter functions, cannot exceed 32767 bytes in length. When I try to parse anything longer, DBMS_SQL gives me the wrong last error position.
  • At this time, I do not handle quoted identifiers.
  • In theory, comments are allowed between members and periods:
    select dbms_lob  /*WEIRD*/ . call from dual;
    I can handle the spaces, but not the comments.
  • If you want to access a boolean in SQL, this utility can’t help you.

No SQL Translation Framework yet

The code here just rewrites the SELECT; it doesn’t use the SQL Translation Framework to execute it directly. Maybe next time?

create or replace package translate_pkg_vars authid current_user as
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as

  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          raise_application_error(
            -20002,
            'Identifier '||l_identifier||' is invalid even within PL/SQL.'
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
end translate_pkg_vars;
/
declare
  sql_text  CLOB := 'select dbms_lob.call c from dual';
  translated_text CLOB;
begin
  translate_pkg_vars.translate_sql(sql_text, translated_text);
  dbms_output.put_line('output:
'||translated_text);
end;
/
output:
WITH
function WF1_dbms_lob_call return dbms_lob.call%type deterministic is begin return dbms_lob.call; end;
select WF1_dbms_lob_call c from dual

Make public package constants available to SQL

This is an enhancement request on ODC and a recent Twitter topic. Here is a homemade solution using the SQL Translation Framework.

UPDATE 2018-12-31: this solution became obsolete pretty quickly! Please see this post for a much better solution.

When executing static SQL inside PL/SQL, the developer can directly reference package variables because PL/SQL will automatically treat them as bind variables and bind the correct values. If the developer wants to test the SQL statement outside of PL/SQL, he/she discovers that the package variable is interpreted as a package function, which of course does not exist.

The package I wrote can create a translation profile, enable it, disable it and drop it. It also contains the TRANSLATE_SQL procedure that does the work. It is meant strictly for development environments as a transparent workaround.

The procedure parses the statement, and if it finds something that can be a reference to a CONSTANT variable defined in a package, it substitutes the value found in DBA_SOURCE.

create or replace package translate_constants as
  g_profile_name constant varchar2(30) := 'CONSTANTS';
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_constants;
/
create or replace package body translate_constants as
  type t_parm_val is record (
    parm varchar2(512),
    val varchar2(4000)
  );
  type tt_parm_val is table of t_parm_val;
  lt_parm_val tt_parm_val := null;
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_profile_name); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
    attribute_value  =>  $$PLSQL_UNIT); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    l_first_word varchar2(256);
    l_cursor_id integer;
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    l_errm varchar2(4000);
    l_constant varchar2(4000);
    l_identifier varchar2(4000);
    l_test_sql varchar2(4000);
    l_text varchar2(4000);
  begin
    select lower(regexp_substr(sql_text,'\w+')) into l_first_word from dual;
    if l_first_word not in ('select','with','insert','update','delete','merge') then
      translated_text := null;
      return;
    end if;
    l_errm := 'start';
    translated_text := sql_text;
    while l_errm is not null loop
      l_errm := null;
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_output.put_line(translated_text);
        dbms_sql.parse(l_cursor_id,translated_text,1);
      exception when not_a_procedure then
        l_errm := sqlerrm;
        if instr(l_errm, 'PLS-221:') = 0 then
          raise;
        end if;
        dbms_output.put_line(l_errm);
        select regexp_substr(l_errm, q'§'(.+)'§',1,1,null,1) into l_constant from dual;
        dbms_output.put_line(l_constant);
        select regexp_substr(
          translated_text, 
          '([A-Z][A-Z0-9#_$]*[.])?[A-Z][A-Z0-9#_$]*[.]'||l_constant
          ,1,1,'i'
        )
        into l_identifier from dual;
        dbms_output.put_line(l_identifier);        
        dbms_utility.expand_sql_text(
          'select ' || l_identifier || ' from dual',
          l_test_sql
        );
        dbms_output.put_line(l_test_sql);
        dbms_output.put_line(regexp_substr(l_test_sql,'"(.*?)"',1,1,null,1));
        dbms_output.put_line(regexp_substr(l_test_sql,'"(.*?)"',1,2,null,1));
        select regexp_substr(text, ':=\s*?(\S.*?)\s*;',1,1,null,1)
        into l_text from dba_source
        where owner = regexp_substr(l_test_sql,'"(.*?)"',1,1,null,1)
          and name = regexp_substr(l_test_sql,'"(.*?)"',1,2,null,1) 
          and type = 'PACKAGE' 
          and regexp_like(text, l_constant||'\s+constant\s+','i');
        
        dbms_output.put_line('>'||l_text);
        translated_text := regexp_replace(translated_text,'([^.])'||l_identifier,'\1'||l_text,1,0,'m');
      end;
      dbms_sql.close_cursor(l_cursor_id);
    end loop;
    if l_cursor_id is not null then
      dbms_sql.close_cursor(l_cursor_id);
    end if;
  exception when others then
    if l_cursor_id is not null then
      dbms_sql.close_cursor(l_cursor_id);
    end if;
    raise;

  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_constants;
/

And a small test:

exec translate_constants.create_profile;
exec translate_constants.enable_trans;

select dbms_lob.lobmaxsize maxsize,
sys.dbms_lob.lobmaxsize samething from dual;

   MAXSIZE  SAMETHING
---------- ----------
1.8447E+19 1.8447E+19
          
exec translate_constants.disable_trans;
exec translate_constants.drop_profile;

Remember, this is only to help developers. It is not something to ever deploy in a test or production environment.

SQL Templates and multiple REPLACE

I frequently use what I call “SQL templates”, where I replace multiple placeholders by calculated values. Here are two methods that don’t require a stored procedure.

What do I mean by “template”?

SQL statements have to be parsed before they can be executed. The standard way to minimise parsing is to use bind variables. If we use a bind variable instead of a literal, the same statement can be parsed once and executed many times with different values assigned to the variable.

We cannot put the names of tables or columns into bind variables: these are data objects that have to be known at parse time.

What I mean by “template” is a SQL statement with a specific logic, but without the names of the data objects. I get the names I need from input parameters and queries, then I put them in the statement where the placeholders were. If I do this twice with different parameters and the resulting text is different, the optimizer will of course parse each result separately before execution.

So: don’t confuse this technique with bind variables.

Please keep in mind that I never use this technique in production. I generate the SQL, read it, test it and then deploy it. This technique is meant to make SQL development more productive, not replace the developer.

If you ever want to generate SQL in production, work very hard to avoid SQL injection bugs. For example, use DBMS_ASSERT or data dictionary lookups to make sure input parameters are really data objects.

Examples

select * from emp where ename = 'KING';

This statement is not a template and it doesn’t use bind variables.

select * from emp where ename = :ename;

This statement is not a template, but it uses a bind variable.

select /*+ qb_name(COMPARE) */
  #COL_LIST#,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  #COL_LIST#
    , -1 NEW_CNT
  from #OLD_TABLE# O
  union all
  select /*+ qb_name(new) */
  #COL_LIST#
    , 1 NEW_CNT
  from #NEW_TABLE# N
)
group by
  #COL_LIST#
having sum(NEW_CNT) != 0
order by 1, OP;

This is a “template” as I use the word here. It shows the logic for comparing two tables using the GROUP BY method. To get a usable SQL statement, I need to replace all the placeholders (enclosed in #) by real values.

Model clause technique

Here is one way to do multiple replacements using pure SQL:

with template(sql_text) as (select cast(
q'§<template>§'
as varchar2(4000)) from dual
)
, parms(v_from, v_to) as (
  <parms>
)
select sql_text from template
model
REFERENCE P ON (select rownum p_dim, v_from, v_to from parms)
  DIMENSION BY (p_dim) measures (v_from, v_to) 
main M
dimension by (1 m_dim)
measures(sql_text)
rules iterate(99) until P.v_from[iteration_number+1] is null (
  sql_text[1] = replace(sql_text[1],P.v_from[iteration_number+1], P.v_to[iteration_number+1])
);
  1. Put the template text in the TEMPLATE subquery.
  2. In the PARMS subquery, create one row per replacement: V_FROM is the placeholder and V_TO is the value.
  3. The MODEL clause will iterate through the reference model P and do each replacement in turn.

Here is an example that generates a comparison. It’s a silly example because I am comparing EMP to itself, but it’s just for illustration!

with template(sql_text) as (select cast(
q'§select /*+ qb_name(COMPARE) */
  #COL_LIST#,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  #COL_LIST#
    , -1 NEW_CNT
  from #OLD_TABLE# O
  union all
  select /*+ qb_name(new) */
  #COL_LIST#
    , 1 NEW_CNT
  from #NEW_TABLE# N
)
group by
  #COL_LIST#
having sum(NEW_CNT) != 0
order by 1, OP;§'
as varchar2(4000)) from dual
)
, parms(v_from, v_to) as (
  select '#NEW_TABLE#', 'EMP' from dual
  union all
  select '#OLD_TABLE#', 'EMP' from dual
  union all
  select '#COL_LIST#',
    listagg(column_name, ',')  within group(order by internal_column_id)
  from user_tab_cols
  where table_name = 'EMP'
  and virtual_column = 'NO'
)
select sql_text from template
model
REFERENCE P ON (select rownum p_dim, v_from, v_to from parms)
  DIMENSION BY (p_dim) measures (v_from, v_to) 
main M
dimension by (1 m_dim)
measures(sql_text)
rules iterate(99) until P.v_from[iteration_number+1] is null (
  sql_text[1] = replace(sql_text[1],P.v_from[iteration_number+1], P.v_to[iteration_number+1])
);
select /*+ qb_name(COMPARE) */
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
    , -1 NEW_CNT
  from EMP O
  union all
  select /*+ qb_name(new) */
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
    , 1 NEW_CNT
  from EMP N
)
group by
  EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO
having sum(NEW_CNT) != 0
order by 1, OP;

Using new stuff: JSON

The problem with the MODEL technique is that the output cannot be a CLOB. To get a CLOB, suppose we use a bit of PL/SQL in a WITH clause? Now we have to pass the placeholder / value pairs as parameters. I decided that a JSON object would be a reasonable implementation. Warning: this looks nice but requires version 12.2 or later.

with  function multi_replace(
    p_template in varchar2,
    p_parms in varchar2
  ) return clob is
    l_out_text clob := p_template;
    l_json JSON_OBJECT_T := JSON_OBJECT_T.parse(p_parms);
    l_keys JSON_KEY_LIST;
  begin
    l_keys := l_json.get_keys;
    for i in 1..l_keys.count loop
      l_out_text := replace(l_out_text, l_keys(i), l_json.get_string(l_keys(i)));
    end loop;
    return l_out_text;
  end multi_replace;
input(template, parms) as (select
q'§select /*+ qb_name(COMPARE) */
  #COL_LIST#,
  decode(sign(sum(NEW_CNT)), 1, 'I', 'D') OP,
  abs(sum(NEW_CNT)) CNT
FROM (
  select /*+ qb_name(old) */
  #COL_LIST#
    , -1 NEW_CNT
  from #OLD_TABLE# O
  union all
  select /*+ qb_name(new) */
  #COL_LIST#
    , 1 NEW_CNT
  from #NEW_TABLE# N
)
group by
  #COL_LIST#
having sum(NEW_CNT) != 0
order by 1, OP;§',
  json_object(
    '#NEW_TABLE#' is 'EMP',
    '#OLD_TABLE#' is 'EMP',
    '#COL_LIST#'  is (
      select listagg(column_name, ',')  within group(order by internal_column_id)
      from user_tab_cols
      where table_name = 'EMP' and virtual_column = 'NO'
    )
  )
  from dual
)
select multi_replace(template, parms) sql_text
from input;
  • The PL/SQL function parses the input JSON object, gets all the keys (the placeholders), then uses them to get the values one by one and apply them to the template.
  • The JSON_OBJECT function in SQL turns each placeholder / replacement pair into a JSON name:value pair.

I like this approach because it shows each placeholder and its replacement value side by side. It’s just easier for me to read (and write!).

Ranges with NULLs 07: Swiss Army Knife

I came up with this name a few years ago for a problem that I couldn’t solve then: analyse a series of ranges and say whether and how they overlap. It turns out the solution is not that hard.

Current list of posts: 

The Problem

I want to order a set of ranges by “from” and “to”, then determine the relations between each range and the ranges that follow it. The only limit is with the “precedes” relation: I know that the first range may precede any number of following ranges, so I only want relations where one range immediately precedes the next, with no intervening range.

MATCH_RECOGNIZE to the rescue

The MATCH_RECOGNIZE clause (introduced in 12c) is a good fit here, for several reasons:

  • I can PARTITION BY different test cases and ORDER BY the “from” and “to” values.
  • I can say AFTER MATCH SKIP TO NEXT ROW to use each and every range as a starting point.
  • I can use the pipe character | for alternation. Alternation matches a single regular expression from a list of several possible regular expressions. Alternatives are preferred in the order they are specified.

Reminders (skip if you are reading the whole series)

  • Ranges have exclusive end points, which allows ranges that “meet” without gaps or overlaps.
  • Three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object and a “from” and “to” range.
  • I propose solutions that require database version 12.1 or later.

No NULLs

This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.
The code:
  • line 4: I partition by the test case and order by “from” and “to”.
  • line 5: the classifier() function returns the name of the condition that was met, which is also the name of the relation I am looking for.
  • line 7: I check each row in turn.
  • line 8: A is the first row (which I do not return), and the following rows belong to one of the other conditions.
  • I check the relations “backwards” so that some checks are implicit:
    • equals: between the first row and the current row, “from” and “to” are the same.
    • starts: I don’t explicitly check that the first “to” is less than the current “to”. At this point they cannot be equal and the ORDER BY guarantees that the first “to” is not greater than the current “to”.
    • after the “starts” condition, the first “from” must be less than the current “to”, so I don’t have to check it explicitly.
    • I don’t care about “precedes” if there are intervening rows.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as a.f = f and a.t = t,
    starts as a.f = f, -- if true, a.t must be < t because of order by
    -- from now on a.f must be < f because of order by
    contains as a.t > t,
    finished_by as a.t = t,
    overlaps as a.t > f and a.t < t,
    meets as a.t = f,
    -- at this point a.t must be < f
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    0 1 2 PRECEDES 3 4
2m    0 1 2 MEETS 2 3
3o    0 1 3 OVERLAPS 2 4
4f    0 1 3 FINISHED_BY 2 3
5c    0 1 4 CONTAINS 2 3
6s    0 1 2 STARTS 1 3
7e    0 1 2 EQUALS 1 2

 

“To” NULL

delete from t;
Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
  • lines 9 and 12: with equality comparisons, I use DECODE() to treat NULL values as equal.
  • lines 11 and 13: with inequality comparisons, I check explicitly for NULL when necessary.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as a.f = f and decode(a.t,t,0,1) = 0,
    starts as a.f = f,
    contains as (a.t > t or a.t is null),
    finished_by as decode(a.t,t,0,1) = 0,
    overlaps as a.t > f and (a.t < t or t is null),
    meets as a.t = f,
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    1 1 2 PRECEDES 3
2m    1 1 2 MEETS 2
3o    1 1 3 OVERLAPS 2
4f    1 1 FINISHED_BY 2
5c    1 1 CONTAINS 2 3
6s    1 1 2 STARTS 1
7e    1 1 EQUALS 1

 

With “From” and “To” NULLs

delete from t;
Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f nulls first, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as decode(a.f,f,0,1) = 0 and decode(a.t,t,0,1) = 0,
    starts as decode(a.f,f,0,1) = 0,
    contains as a.t > t or a.t is null,
    finished_by as decode(a.t,t,0,1) = 0,
    overlaps as a.t > f and (a.t < t or t is null),
    meets as a.t = f,
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    2 2 PRECEDES 3
2m    2 2 MEETS 2
3o    2 3 OVERLAPS 2
4f    2 FINISHED_BY 2
5c    2 CONTAINS 2 3
6s    2 2 STARTS
7e    2 EQUALS