SQL table macros 12: compare tables or queries

The GROUP BY method for comparing table data just turned 18. To celebrate, let’s make it a macro.

My second-ever blog post described a method, popularised by Tom Kyte, for comparing the contents of tables or queries. As part of my COMP_SYNC package, the COMPARE_SQL function returns a SELECT statement that uses that method. With a macro, instead of returning the statement I can just execute it and compare tables, views, and/or named subqueries.

[UPDATE 2022-02-25 : there is now an optional parameter to indicate a primary or unique key.]

  • I call the two data sources “rowsets”: one is “old”, the other is “new”.
  • Optional excluded columns are excluded from the comparison and the output.
  • Optional key columns must identify all rows uniquely.
    • With key columns, output column “Z#_OP” indicates the type of change:
      • I: Insert into old (present in new, not old)
      • D: Delete from old (present in old, not new)
      • U: Updated data (as present in new)
      • O: Old data (as present in old)
    • Without key columns, the output contains only ‘D’ and ‘I’ rows, and column “Z#_CNT” shows the number of rows to be deleted or inserted.
  • The output contains the values used for the comparison, which may differ from the actual values of the data if the data types do not allow direct comparison (LOBs for example).
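Stripped of the dynamic SQL, the counting logic behind the macro is easy to illustrate. Here is a rough Python sketch of both variants (for illustration only, not the macro itself): rows are counted -1 in "old" and +1 in "new", identical rows cancel out, and with a key a matched-but-different pair becomes U (new) plus O (old).

```python
from collections import Counter

def compare_rowsets(old_rows, new_rows):
    """No key: count each distinct row -1 in old, +1 in new;
    identical rows cancel out (the HAVING sum != 0 filter)."""
    counts = Counter(new_rows)
    counts.subtract(Counter(old_rows))
    return {row: ('I' if n > 0 else 'D', abs(n))
            for row, n in counts.items() if n != 0}

def compare_with_key(old_rows, new_rows, key):
    """With a unique key: a matched-but-different pair becomes U (new) + O (old)."""
    old_by = {key(r): r for r in old_rows}
    new_by = {key(r): r for r in new_rows}
    out = []
    for k in sorted(old_by.keys() | new_by.keys()):
        o, n = old_by.get(k), new_by.get(k)
        if o == n:
            continue                 # identical: not in the output
        elif o is None:
            out.append(('I', n))     # present in new, not old
        elif n is None:
            out.append(('D', o))     # present in old, not new
        else:
            out.append(('U', n))     # updated data (as present in new)
            out.append(('O', o))     # old data (as present in old)
    return out
```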

The code makes heavy use of my SQM_UTIL package to configure the SELECT template. Depending on the data type of each column, an expression is applied to allow comparisons. For example, LOB column content is replaced by a hash and user-defined types are replaced by JSON.

create or replace function compare_rowsets(
  p_old_table in dbms_tf.table_t,
  p_new_table in dbms_tf.table_t,
  p_key_cols in dbms_tf.columns_t default null,
  p_exclude_cols in dbms_tf.columns_t default null
) return clob sql_macro is
/*
Compares tables, views or named subqueries; one is "old", one is "new".
Optional excluded columns are excluded from the comparison and the output.

The output contains the values used for the comparison, which may differ
from the actual values of the data if the data types do not allow
direct comparison (LOBs for example).

Column "Z#_OP" indicates the type of change:
- I: Insert into old (present in new, not old)
- D: Delete from old (present in old, not new)
- U: Updated data (as present in new)
- O: Old data (as present in old)

If present, key column(s) must identify all rows uniquely.

Without key columns, the output contains only 'D' and 'I' rows,
and column "Z#_CNT" shows the number of rows to be deleted or inserted.
*/
  l_col_column_names_old long;
  l_col_comparables_old long;
  l_col_comparables_new long;
  l_col_keys long;
  l_sql clob;
begin
  sqm_util.col_column_names(p_old_table, l_col_column_names_old, p_exclude_cols);
  sqm_util.col_comparables(p_old_table, l_col_comparables_old, p_exclude_cols);
  sqm_util.col_comparables(p_new_table, l_col_comparables_new, p_exclude_cols);
  
  if p_key_cols is null then
  
    l_sql :=
'select /*+ qb_name(COMPARE) */
  decode(sign(sum(Z#_NEW_CNT)), 1, ''I'', ''D'') Z#_OP,
  abs(sum(Z#_NEW_CNT)) Z#_CNT,
  '|| l_col_column_names_old ||'
FROM (
  select /*+ qb_name(old) */
  '|| l_col_comparables_old ||'
    , -1 Z#_NEW_CNT
  from p_old_table O
  union all
  select /*+ qb_name(new) */
  '|| l_col_comparables_new ||'
    , 1 Z#_NEW_CNT
  from p_new_table N
)
group by
  '|| l_col_column_names_old ||'
having sum(Z#_NEW_CNT) != 0';

  else
    sqm_util.list_columns(p_key_cols, l_col_keys);
    l_sql :=

'select /*+ qb_name(COMPARE) */
  case count(*) over(partition by
    '|| l_col_keys ||'
  ) - Z#_NEW_CNT
    when 0 then ''I''
    when 1 then ''U''
    when 2 then ''D''
    when 3 then ''O''
  end Z#_OP,
  '|| l_col_column_names_old ||'
FROM (
  select
    '|| l_col_column_names_old ||',
    sum(Z#_NEW_CNT) Z#_NEW_CNT
  FROM (
    select /*+ qb_name(old) */
    '|| l_col_comparables_old ||',
    -1 Z#_NEW_CNT
    from p_old_table O
    union all
    select /*+ qb_name(new) */
    '|| l_col_comparables_new ||',
    1 Z#_NEW_CNT
    from p_new_table N
  )
  group by
    '|| l_col_column_names_old ||'
  having sum(Z#_NEW_CNT) != 0
)';

  end if;
  --dbms_output.put_line(l_sql);
  return l_sql;
end compare_rowsets;
/
drop table emp2 purge;
 
create table emp2 as select * from emp;
 
update emp2 set ename = ename||' KONG' where rownum = 1;
 
insert into emp2 
select EMPNO+1000, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO
from emp2 where rownum <= 4 and job != 'PRESIDENT';
 
select * from compare_rowsets(
  emp, emp2, 
  p_exclude_cols => columns(comm)
)
order by empno, z#_op;
Z#_OP Z#_CNT EMPNO ENAME JOB MGR HIREDATE SAL DEPTNO
D 1 7839 KING PRESIDENT   1981-11-17 00:00:00 5000 10
I 1 7839 KING KONG PRESIDENT   1981-11-17 00:00:00 5000 10
I 1 8566 JONES MANAGER 7839 1981-04-02 00:00:00 2975 20
I 1 8698 BLAKE MANAGER 7839 1981-05-01 00:00:00 2850 30
I 1 8782 CLARK MANAGER 7839 1981-06-09 00:00:00 2450 10
I 1 8788 SCOTT ANALYST 7566 1987-04-19 00:00:00 3000 20
 

Hierarchical JSON

At last, a use case where PL/SQL is faster than SQL: avoiding unnecessary ordering!

In the last few years, different solutions have been found to represent hierarchical data in JSON format.

I won’t go into the SQL solutions in detail. Basically, there are three stages:

  1. Get the data in hierarchical order and return the order, the level, and the data as a JSON object.
  2. Using the LEAD() and LAG() functions, determine whether the current JSON object is a child, a parent or a sibling and apply the appropriate JSON “glue”.
  3. Aggregate the result into a CLOB, preserving the original order.

Notice that stages 2 and 3 require the data to be in the order established by stage 1. In the SQL solutions, the execution plan shows that unnecessary sorts were done, sometimes requiring the use of TEMP space: yuck!

I finally realised that a PL/SQL function could read the output from stage 1 and, for each input row, add the JSON “glue” and append the result to the CLOB that the function returns. No extra sorting, and it runs about 40% faster than the fastest SQL solution. Let me explain a few details:

  • The main input is a CURSOR expression: this is not a literal, but a real live SELECT statement that can include bind variables. It returns only the level and a JSON object with the desired data, so it is quite generic. The ORDER SIBLINGS BY clause is optional.
  • The input parameter is a strongly typed REF CURSOR, which allows it to be fetched into a table of PL/SQL records.
  • The input is fetched using BULK COLLECT with LIMIT, which uses little memory. Special handling is needed, because I need to know the levels of the rows immediately preceding and following the row being processed.
  • If a JSON object is a parent, I add a name / value pair where the value is an array. The name is by default ‘children’ but it is configurable by an optional second parameter.
  • Concatenating text to a CLOB is a costly operation. To speed it up, I fill up a VARCHAR2 buffer, then concatenate the buffer to the CLOB.
  • Use PLSQL_OPTIMIZE_LEVEL=3 in order to “inline” the calls to APPEND_JSO. This shaves off about 0.2 seconds per million rows processed.
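For readers who want the “glue” logic without the PL/SQL plumbing, here is a rough Python sketch of what append_jso does (an illustration under the same assumptions: rows arrive in hierarchical order with a single root at level 1):

```python
def hier_to_json(rows, array_name='children'):
    """rows: list of (level, json_object_text) in hierarchical order,
    single root at level 1. Mirrors append_jso's glue logic."""
    out = []
    for i, (lvl, jso) in enumerate(rows):
        prev = rows[i - 1][0] if i > 0 else 0              # level of previous row
        nxt = rows[i + 1][0] if i < len(rows) - 1 else 1   # level of next row
        if prev == 0:
            glue = ''                                      # first row: no glue
        elif lvl - prev == 1:
            glue = ',"{}":['.format(array_name)            # child: open an array
        elif lvl > 1:
            glue = ','                                     # sibling or new branch
        else:
            glue = ''
        closers = max(0, (lvl - nxt) * 2 + 1)              # '}' then ']}' per closed level
        out.append(glue + jso.rstrip('}') + ('}' + ']}' * lvl)[:closers])
    return ''.join(out)
```

For example, a root with two children comes out as one valid JSON object with a nested array.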
create or replace package hier_to_clob is
  type t_hier_rec is record(
    lvl pls_integer,
    jso varchar2(4000)
  );
  type tt_hier_rec is table of t_hier_rec;
  TYPE t_hier_rc IS REF CURSOR RETURN t_hier_rec;
  function get(
    p_hier_rc in t_hier_rc,
    p_array_name varchar2 default 'children',
    p_limit integer default 500
  ) return clob;
end hier_to_clob;
/
create or replace package body hier_to_clob is
  function get(
    p_hier_rc in t_hier_rc,
    p_array_name varchar2 default 'children',
    p_limit integer default 500
  ) return clob is
    l_array_name varchar2(130) := ',"'||p_array_name||'":[';
    lt_hier_rec tt_hier_rec;
    l_hier_rec_prev t_hier_rec := t_hier_rec(0, null);
    l_lvl_prev2 pls_integer := 0;
    l_clob clob;
    l_buffer varchar2(32767) := null;
    l_buflen pls_integer := 0;
    do_prev boolean:= false;
    procedure append_jso(
      p_jso varchar2,
      p_lvl_prev pls_integer,
      p_lvl pls_integer,
      p_lvl_next pls_integer
    ) is
      l_jso varchar2(4000);
    begin
      l_jso :=
        case
          when p_lvl_prev = 0 then null
          when p_lvl - p_lvl_prev = 1 then l_array_name
          when p_lvl > 1 then ','
        end ||
        rtrim(p_jso, '}') ||
        rpad('}', (p_lvl - p_lvl_next) * 2 + 1, ']}');

      if l_buflen + lengthb(l_jso) > 32767 then
        l_clob := l_clob || l_buffer;
        l_buffer := l_jso;
        l_buflen := lengthb(l_buffer);
      else
        l_buffer := l_buffer || l_jso;
        l_buflen := l_buflen + lengthb(l_jso);
      end if;
    end append_jso;
  begin
    loop
      fetch p_hier_rc bulk collect into lt_hier_rec limit p_limit;
      if do_prev then
        append_jso(
          l_hier_rec_prev.jso,
          l_lvl_prev2,
          l_hier_rec_prev.lvl,
          case when lt_hier_rec.count > 0 then lt_hier_rec(1).lvl else 1 end
        );
        do_prev := false;
      end if;
      for i in 1..lt_hier_rec.count-1 loop
        append_jso(
          lt_hier_rec(i).jso,
          case
            when i = 1 then l_hier_rec_prev.lvl
            else lt_hier_rec(i-1).lvl
          end,
          lt_hier_rec(i).lvl,
          lt_hier_rec(i+1).lvl
        );
      end loop;
      if lt_hier_rec.count > 0 then
        l_lvl_prev2 :=
          case lt_hier_rec.count
            when 1 then l_hier_rec_prev.lvl
            else lt_hier_rec(lt_hier_rec.count-1).lvl
          end;
        l_hier_rec_prev := lt_hier_rec(lt_hier_rec.count);
        do_prev := true;
      end if;
      exit when p_hier_rc%notfound;
    end loop;
    if do_prev then
      append_jso(
        l_hier_rec_prev.jso,
        l_lvl_prev2,
        l_hier_rec_prev.lvl,
        1
      );
    end if;
    if l_buflen > 0 then
      l_clob := l_clob || l_buffer;
    end if;
    return l_clob;
  end get;
end hier_to_clob;
/
select hier_to_clob.get(
  cursor(
    select level, json_object(empno, ename)
    from emp
    start with empno = 7566
    connect by mgr = prior empno
  ),
  'grunts'
)
from dual;
/* pretty printing done outside of the function */
{
  "empno" : 7566,
  "ename" : "JONES",
  "grunts" :
  [
    {
      "empno" : 7788,
      "ename" : "SCOTT",
      "grunts" :
      [
        {
          "empno" : 7876,
          "ename" : "ADAMS"
        }
      ]
    },
    {
      "empno" : 7902,
      "ename" : "FORD",
      "grunts" :
      [
        {
          "empno" : 7369,
          "ename" : "SMITH"
        }
      ]
    }
  ]
}

Predict_Clustering_Factor

Jonathan Lewis recently wrote about estimating the clustering factor of an index, taking into account the intended value of the TABLE_CACHED_BLOCKS parameter of DBMS_STATS.SET_TABLE_PREFS. He included a function I wrote called predict_clustering_factor. Here is a corrected and improved version. Any questions or remarks about the code are welcome here.

Please refer to blog posts by Jonathan and Richard Foote for explanations of the clustering factor and the TABLE_CACHED_BLOCKS parameter.

UPDATE 2019-10-28: thanks to a comment by Viacheslav Andzhich on Jonathan’s post, I found that PLSQL_CODE_TYPE=’NATIVE’ makes the function run in two-thirds the time. He also mentioned the possibility of overflow on PLS_INTEGER, so I now use INTEGER only. Thanks Viacheslav!

Description

The function takes as input:
p_table_name: the table name
p_column_list: a comma-separated list of the column names to be indexed
p_max_table_cached_blocks: the MAXIMUM value of TABLE_CACHED_BLOCKS to be tested
(255 by default).

The function then queries the ROWIDs, sorted in index order:
– Rows are omitted if all index columns are NULL.
– In case of ties, rows are sorted by ROWID.
– Each ROWID has its “row number” portion set to 0, so it effectively becomes a block id.

The function collects the most recent p_max_table_cached_blocks block ids.
When there is no more room, it replaces the Least Recently Used block id with the current one.

lt_hits_per_RU is a table of intermediate results. The number of entries equals p_max_table_cached_blocks. When a new row has a block id among the most recent, I call that a “hit”. Every hit adds one to an entry in the result table: the index of the entry is based on how “recently used” the block id was. For example, if the current row and the previous row are in the same block, the index is one.

After all the rows are processed, the result table is transformed into a table of clustering factors: the clustering factor is the total number of rows minus all the hits up to and including the current entry.

In the output, the number of rows is equal to p_max_table_cached_blocks.
The ROWNUM provides the TABLE_CACHED_BLOCKS parameter value.
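The LRU bookkeeping is easier to see in a short sketch than in the full function. Here is a hedged Python version of the algorithm just described (collections.OrderedDict plays the role of the lt_blocks table; this is an illustration, not the PL/SQL code):

```python
from collections import OrderedDict

def predict_clustering_factor(block_ids, max_cached=255):
    """block_ids: ROWIDs truncated to block ids, already in index order."""
    cache = OrderedDict()            # tracked blocks, most recently hit last
    hits = [0] * max_cached          # hits[r-1]: hits on the r-th most recent block
    for b in block_ids:
        if b in cache:
            rank = list(reversed(cache)).index(b) + 1  # 1 = most recently used
            hits[rank - 1] += 1
            cache.move_to_end(b)     # this block is now the most recent
        else:
            if len(cache) >= max_cached:
                cache.popitem(last=False)  # evict the least recently used
            cache[b] = None
    # clustering factor = total rows minus cumulative hits
    result, running = [], 0
    for h in hits:
        running += h
        result.append(len(block_ids) - running)
    return result
```

With block ids A,A,B,B the predicted clustering factor is 2 for any cache size; with A,B,A,B it is 4 for TABLE_CACHED_BLOCKS = 1 but drops to 2 for TABLE_CACHED_BLOCKS = 2.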

Sample of usage:

select rownum table_cached_blocks,
column_value clustering_factor
from table(predict_clustering_factor('T1','v1,rand'))

The code

ALTER SESSION SET PLSQL_CODE_TYPE='NATIVE';

create or replace function predict_clustering_factor(
/*
Function to predict the clustering factor of an index,
taking into account the intended value of
the TABLE_CACHED_BLOCKS parameter of DBMS_STATS.SET_TABLE_PREFS.

The function takes as input:
- p_table_name: the table name
- p_column_list: a comma-separated list of the column names to be indexed
- p_max_table_cached_blocks: the MAXIMUM value of TABLE_CACHED_BLOCKS to be tested
  (255 by default).

The function then queries the ROWIDs, sorted in index order:
- Rows are omitted if all index columns are NULL.
- In case of ties, rows are sorted by ROWID.
- Each ROWID has its "row number" portion set to 0, so it effectively becomes a block id.

The function collects the most recent p_max_table_cached_blocks block ids.
When there is no more room, it replaces the Least Recently Used block id with the current one.

The function returns a table of clustering factors.
The number of rows is equal to p_max_table_cached_blocks.
The ROWNUM of the table is the TABLE_CACHED_BLOCKS parameter value.

Sample of usage:
  select rownum table_cached_blocks,
    column_value clustering_factor
  from table(predict_clustering_factor('T1','v1,rand'))
*/
  p_table_name in varchar2,
  p_column_list in varchar2,
  p_max_table_cached_blocks in number default 255
) return sys.odcinumberlist authid current_user is

  sql_text varchar2(4000);
  rc sys_refcursor;
  type tt_rids is table of rowid;
  lt_rids tt_rids;

  type t_block is record(
    block_id rowid,
    most_recent_hit integer
  );
  type tt_blocks is table of t_block;
  lt_blocks tt_blocks := new tt_blocks();
  l_block_id rowid;
  l_block_id_prev rowid;

  l_rn integer := 0;
  b_block_found boolean;
  l_LRU integer;
  i_LRU integer := 0;

  lt_hits_per_RU sys.odcinumberlist := new sys.odcinumberlist();
  i_hits_per_RU integer;

  function truncated_rid(p_rid in rowid) return rowid is
    l_rowid_type number;
    l_object_number NUMBER;
    l_relative_fno NUMBER;
    l_block_number NUMBER;
    l_row_number NUMBER;
  begin
    DBMS_ROWID.ROWID_INFO (
      p_rid,
      l_rowid_type,
      l_object_number,
      l_relative_fno,
      l_block_number,
      l_row_number
    );
    return DBMS_ROWID.ROWID_CREATE (
      l_rowid_type,
      l_object_number,
      l_relative_fno,
      l_block_number,
      0
    );
  end truncated_rid;

  function hits_per_RU(p_most_recent_hit in integer) return integer is
    i_hits_per_RU integer := 1;
  begin
    for i in 1..lt_blocks.count loop
      if lt_blocks(i).most_recent_hit > p_most_recent_hit then
        i_hits_per_RU := i_hits_per_RU + 1;
      end if;
    end loop;
    return i_hits_per_RU;
  end hits_per_RU;

begin
  -- Check for valid TABLE_CACHED_BLOCKS value
  if p_max_table_cached_blocks != trunc(p_max_table_cached_blocks)
  or p_max_table_cached_blocks not between 1 and 255 then
    raise_application_error(
      -20001,
      'input parameter p_max_table_cached_blocks must be an integer between 1 and 255'
    );
  end if;

  -- Initialize hits_per_RU table
  lt_hits_per_RU.extend(p_max_table_cached_blocks);
  for i in 1..lt_hits_per_RU.count loop
    lt_hits_per_RU(i) := 0;
  end loop;

  -- Execute query that mimics index
  sql_text :=
    'select rowid from '||p_table_name
    ||' where '||replace(p_column_list, ',', ' is not null or ')||' is not null'
    ||' order by '||p_column_list||', rowid';
  dbms_output.put_line('Query text: '||sql_text);
  open rc for sql_text;

  loop
    fetch rc bulk collect into lt_rids limit 10000;

    for irid in 1..lt_rids.count loop

      l_rn := l_rn + 1;
      l_block_id := truncated_rid(lt_rids(irid));

      -- Optimized treatment of first row
      if l_rn = 1 then
        lt_blocks.extend;
        lt_blocks(1).block_id := l_block_id;
        lt_blocks(1).most_recent_hit := l_rn;
        l_block_id_prev := l_block_id;
        continue;
      end if;

      -- Optimized treatment of consecutive rows in same block
      if l_block_id = l_block_id_prev then
        lt_hits_per_RU(1) := lt_hits_per_RU(1) + 1;
        continue;
      end if;

      l_block_id_prev := l_block_id;
      l_LRU := l_rn;
      b_block_found := false;

      for i in 1..lt_blocks.count loop

        -- if the new block_id is never found,
        -- i_LRU will point to the Least Recently Used block
        if l_LRU > lt_blocks(i).most_recent_hit then
          l_LRU := lt_blocks(i).most_recent_hit;
          i_LRU := i;
        end if;

        -- if the new block_id is found,
        -- then how many blocks ago was it found?
        if lt_blocks(i).block_id = l_block_id then
          b_block_found := true;
          -- how recently used is the block?
          i_hits_per_RU := hits_per_RU(lt_blocks(i).most_recent_hit);
          -- update hit summary
          lt_hits_per_RU(i_hits_per_RU) := lt_hits_per_RU(i_hits_per_RU) + 1;
          -- the block_id was just hit, so update most_recent_hit value
          lt_blocks(i).most_recent_hit := l_rn;
          exit;
        end if;

      end loop;

      -- If new block_id, add to lt_blocks if there is room,
      -- otherwise overwrite Least Recently Used entry
      if not b_block_found then
        if lt_blocks.count < p_max_table_cached_blocks then
          lt_blocks.extend;
          lt_blocks(lt_blocks.count).block_id := l_block_id;
          lt_blocks(lt_blocks.count).most_recent_hit := l_rn;
        else
          lt_blocks(i_LRU).block_id := l_block_id;
          lt_blocks(i_LRU).most_recent_hit := l_rn;
        end if;
      end if;

    end loop;

    exit when rc%notfound;
  end loop;

  close rc;

-- Prepare output
  -- calculate cumulative sum of hits
  for i in 2..lt_hits_per_RU.count loop
    lt_hits_per_RU(i) := lt_hits_per_RU(i) + lt_hits_per_RU(i-1);
  end loop;
  -- subtract cumulative hits from total number of rows to get
  -- clustering factor. ROWNUM provides the TABLE_CACHED_BLOCKS value.
  for i in 1..lt_hits_per_RU.count loop
    lt_hits_per_RU(i) := l_rn - lt_hits_per_RU(i);
  end loop;

  dbms_output.put_line('Total number of rows in index = '|| l_rn);
  return lt_hits_per_RU;

exception when others then
  if rc%isopen then
    close rc;
  end if;
  raise;

end predict_clustering_factor;
/

Elegant MODEL clause

Frank Kulash recently asked a challenging question on OTN, which Chris227 answered using the MODEL clause. To help readers fully understand his excellent answer, I’m going to go into detail. (To concentrate on the modeling concepts, I’ve changed a few details in the problem and the solution.)

Teams Drafting Players

There are five teams that take turns choosing among eight players.

TEAM_NAME PLAYER_NAME
Atletico Mane
Burnley Neymar
Crystal Ronaldo
Dortmund Silva
Everton Weah
Xavi
Yorke
Zidane

 

The teams choose in alphabetical order. Once a player is picked, he is of course unavailable to be picked by another team. In fact, a team does not pick directly: it gives a rating to each player that it wants, and the available player with the highest rating is automatically assigned to the team. To simplify the problem, each team picks only once.

Here are the ratings, ordered by TEAM_NAME, RATING DESC, with the picks in bold.

TEAM_NAME PLAYER_NAME RATING
Atletico Mane 90
Atletico Neymar 80
Atletico Ronaldo 60
Burnley Silva 91
Burnley Weah 71
Burnley Zidane 51
Crystal Silva 82
Crystal Weah 72
Crystal Mane 62
Dortmund Silva 93
Dortmund Weah 83
Dortmund Mane 73
Everton Weah 94
Everton Silva 84
Everton Yorke 74
  • Atletico and Burnley get their top-rated players (Mane and Silva).
  • Crystal gets its second-rated player (Weah) because its first-rated (Silva) is taken.
  • Dortmund has rated only three players and they are all taken, so Dortmund gets no one.
  • Everton gets its third-rated player (Yorke) because the first two are taken.

For input, here is Frank’s table with the ratings for each team. Notice that (team_name, player_name) is unique, and (team_name, rating) is unique.

CREATE TABLE    choice (team_name, player_name, rating)    AS
SELECT  'Atletico', 'Mane',    90  FROM dual  UNION ALL
SELECT  'Atletico', 'Neymar',  80  FROM dual  UNION ALL      -- combination of
SELECT  'Atletico', 'Ronaldo', 60  FROM dual  UNION ALL      -- (team_name, player_name)
SELECT  'Burnley',  'Silva',   91  FROM dual  UNION ALL      -- is unique; so is
SELECT  'Burnley',  'Weah',    71  FROM dual  UNION ALL      -- (team_name, rating)
SELECT  'Burnley',  'Zidane',  51  FROM dual  UNION ALL
SELECT  'Crystal',  'Silva',   82  FROM dual  UNION ALL
SELECT  'Crystal',  'Weah',    72  FROM dual  UNION ALL
SELECT  'Crystal',  'Mane',    62  FROM dual  UNION ALL
SELECT  'Dortmund', 'Silva',   93  FROM dual  UNION ALL
SELECT  'Dortmund', 'Weah',    83  FROM dual  UNION ALL
SELECT  'Dortmund', 'Mane',    73  FROM dual  UNION ALL
SELECT  'Everton',  'Weah',    94  FROM dual  UNION ALL
SELECT  'Everton',  'Silva',   84  FROM dual  UNION ALL
SELECT  'Everton',  'Yorke',   74  FROM dual;

Starting to MODEL

select * from choice
model
dimension by ( team_name, player_name)
measures ( rating, 0 picked )
rules (
)
order by team_name, rating desc;

The MODEL clause turns the input data into a multi-dimensional array.

  • The DIMENSION clause lists the array indexes: here they are strings, not integers. The dimension columns identify each row uniquely, and their values cannot be changed.
  • The MEASURES clause lists columns whose values can be changed.
    A measure column in an individual row is called a cell: for example, RATING['Atletico', 'Mane']
  • The RULES clause lists rules that say what changes should be made. We’ll see rules in a bit.

Here we see what the data looks like before any rules are applied.

TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 0
Atletico Neymar 80 0
Atletico Ronaldo 60 0
Burnley Silva 91 0
Burnley Weah 71 0
Burnley Zidane 51 0
Crystal Silva 82 0
Crystal Weah 72 0
Crystal Mane 62 0
Dortmund Silva 93 0
Dortmund Weah 83 0
Dortmund Mane 73 0
Everton Weah 94 0
Everton Silva 84 0
Everton Yorke 74 0

 

What is a RULE?

A rule assigns a value to a cell. For example

PICKED['Atletico', 'Mane'] = 1

The left-hand side is the cell, and the right-hand side is the value assigned to that cell.

The left-hand side can refer to more than one cell:

  • PICKED['Atletico', ANY] = 1 will assign 1 to the PICKED cells for team ‘Atletico’, no matter who the player is;
  • PICKED[ANY, 'Mane'] = 1 will assign 1 to the PICKED cells for player ‘Mane’, no matter what the team is;
  • PICKED[ANY, ANY] = 1 will assign 1 to each and every PICKED cell.

The right-hand side contains a value, but that value can come from a cell. To use the same dimension values on the left-hand side and the right-hand side, we can use CV().

  • PICKED[ANY, ANY] = RATING[CV(),CV()] will assign to each PICKED cell the value RATING from the same row.

Each rule applies to only one cell at a time. If the left-hand side refers to several cells, the right-hand side is evaluated and assigned separately to each left-hand cell. The result of the evaluation must be one value, not a range of values.

Aggregations in Rules

On the right-hand side of a rule, we can aggregate over a range of cells to get one value.

  • PICKED[ANY, ANY] = max(RATING)[CV(),ANY] will assign to each PICKED cell the maximum value of RATING for that team. Notice the parentheses end before the dimension columns.

Ordering in Rules

Since each rule is applied to only one left-side cell at a time, the order of execution can be important. Look what happens with this rule:

picked[any,any] order by team_name, rating desc = max(picked)[any,any] + 1

TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 1
Atletico Neymar 80 2
Atletico Ronaldo 60 3
Burnley Silva 91 4
Burnley Weah 71 5
Burnley Zidane 51 6
Crystal Silva 82 7
Crystal Weah 72 8
Crystal Mane 62 9
Dortmund Silva 93 10
Dortmund Weah 83 11
Dortmund Mane 73 12
Everton Weah 94 13
Everton Silva 84 14
Everton Yorke 74 15

 

Do you see what happened here? For every left-side cell, the right-side expression is evaluated using the previously assigned values! That is why the order is so important.
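This sequential behaviour can be mimicked in a few lines of Python (an illustration of the evaluation order, nothing Oracle-specific):

```python
# Mirror of: picked[any,any] order by team_name, rating desc = max(picked)[any,any] + 1
picked = [0] * 15                    # PICKED starts at 0 for every row
for i in range(len(picked)):         # rows are already in rule order
    picked[i] = max(picked) + 1      # the right-hand side sees values assigned so far
# picked ends up as 1, 2, 3, ... 15, just like the output above
```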

The Solution to our problem

If we order the rows by team_name, rating desc then we can say for each row:

  • If the current row’s team has not picked
  • and the current row’s player has not been picked,
  • then pick that player for that team.
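Procedurally, that amounts to the following (a Python sketch of the picking logic, not of the MODEL clause itself):

```python
def draft(choices):
    """Walk rows ordered by team_name, rating desc;
    a team picks the first still-available player it rated."""
    team_done, player_taken = set(), set()
    picks = {}
    for team, player, rating in sorted(choices, key=lambda c: (c[0], -c[2])):
        if team not in team_done and player not in player_taken:
            team_done.add(team)
            player_taken.add(player)
            picks[team] = player
    return picks

choices = [
    ('Atletico', 'Mane', 90), ('Atletico', 'Neymar', 80), ('Atletico', 'Ronaldo', 60),
    ('Burnley', 'Silva', 91), ('Burnley', 'Weah', 71), ('Burnley', 'Zidane', 51),
    ('Crystal', 'Silva', 82), ('Crystal', 'Weah', 72), ('Crystal', 'Mane', 62),
    ('Dortmund', 'Silva', 93), ('Dortmund', 'Weah', 83), ('Dortmund', 'Mane', 73),
    ('Everton', 'Weah', 94), ('Everton', 'Silva', 84), ('Everton', 'Yorke', 74),
]
# Atletico gets Mane, Burnley Silva, Crystal Weah, Everton Yorke; Dortmund gets no one
```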
select * from choice
model
dimension by ( team_name, player_name)
measures ( rating, 0 picked )
rules (
  picked[any,any] order by team_name, rating desc = 
    case
      when max(picked)[any,cv()] = 0 and max(picked)[cv(),any] = 0
      then 1
      else 0
    end    
);
TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 1
Atletico Neymar 80 0
Atletico Ronaldo 60 0
Burnley Silva 91 1
Burnley Weah 71 0
Burnley Zidane 51 0
Crystal Silva 82 0
Crystal Weah 72 1
Crystal Mane 62 0
Dortmund Silva 93 0
Dortmund Weah 83 0
Dortmund Mane 73 0
Everton Weah 94 0
Everton Silva 84 0
Everton Yorke 74 1

 

Multiple Rounds

To let the teams pick more than one player, we can just add the ITERATE keyword. The ITERATION_NUMBER function starts with 0.

select * from choice
model
dimension by ( team_name, player_name)
measures ( rating, 0 picked )
rules iterate (3) (
  picked[any,any] order by team_name, rating desc = 
    case
      when max(picked)[any,cv()] = 0 and max(picked)[cv(),any] <= iteration_number
      then iteration_number + 1
      else picked[cv(),cv()]
    end    
);
TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 1
Atletico Neymar 80 2
Atletico Ronaldo 60 3
Burnley Silva 91 1
Burnley Weah 71 0
Burnley Zidane 51 2
Crystal Silva 82 0
Crystal Weah 72 1
Crystal Mane 62 0
Dortmund Silva 93 0
Dortmund Weah 83 0
Dortmund Mane 73 0
Everton Weah 94 0
Everton Silva 84 0
Everton Yorke 74 1

 

Did you make it this far?

If so, thanks for your patience! I thought this was a great example of the power of MODEL, because it uses ordering, aggregation and previously changed values in its evaluations – all in one pretty concise rule.

Hope this helps…

Missing cents when dividing an amount

Alex Nuijten (@alexnuijten) wrote a blog post on this subject a few years ago. Since his blog doesn’t accept comments at the moment, here is a variant I think is interesting.

The problem

As an example, say we have $100 that we want to divide into 42 parts.

  • ROUND(100/42,2) gives us $2.38
  • Multiplying $2.38 by 42 gives us $99.96
  • So to get a total of 100, we need to add one cent to four lines.
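The arithmetic can be checked in a few lines (using Python's Decimal to avoid binary floating-point noise; an illustration only):

```python
from decimal import Decimal

per_line = round(Decimal(100) / 42, 2)   # 2.38
total = per_line * 42                    # 99.96
missing = Decimal(100) - total           # 0.04: four lines need an extra cent
```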

Adding cents to the first four lines

Here is Alex’s solution:

with amounts
as
( select level id
      ,100 amount
      ,count(*) over () entries
      ,round (100 / count(*) over (), 2) rounded
    from dual
    connect by level <= 42
)
,indicators as 
(
select id
      ,amount
      ,entries
      ,rounded
      ,case 
       when rownum <= abs ((amount - (entries * rounded)) * 100)
       then 'x'
       end as indicator
      ,sign (amount - (entries * rounded)) as pos_neg
 from amounts
)
select id
      ,rounded +
       (nvl2 (indicator, 0.01, 0) * pos_neg) final_amount
  from indicators
;
ID FINAL_AMOUNT
1 2.39
2 2.39
3 2.39
4 2.39
5 2.38
6 2.38
7 2.38
8 2.38
9 2.38

 

Adding cents using a running total

Personally, I would prefer to spread the extra cents more evenly across the lines. To do that, I simply round the current running total, then subtract the previous line’s rounded running total. This turns out to be less code.

with data(id, amount, entries) as (
  select level, 100, count(*) over()
  from dual connect by level <= 42
)
select id, amount, entries,
  round(id*amount/entries, 2) - round((id-1)*amount/entries, 2) split
from data;

The lines with an extra cent are 6, 16, 27 and 37.
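Here is a hedged Python sketch of the same running-total method (ROUND_HALF_UP approximates Oracle's ROUND for positive amounts; the SQL above is the real solution):

```python
from decimal import Decimal, ROUND_HALF_UP

def split_amount(amount, entries):
    """Round the running total, subtract the previous rounded running total."""
    amt = Decimal(str(amount))
    def r(x):
        # ROUND_HALF_UP matches Oracle ROUND for positive amounts
        return x.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
    prev = Decimal('0')
    parts = []
    for i in range(1, entries + 1):
        cur = r(amt * i / entries)   # rounded running total after line i
        parts.append(cur - prev)
        prev = cur
    return parts

parts = split_amount(100, 42)
# the parts always sum to the original amount,
# and the extra cents land on lines 6, 16, 27 and 37
```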

Hope this helps. Regards, Stew

UPDATE 2019-10-12: in the comments, mathguy proposed an alternative saying “no analytic functions and no subqueries needed”. In fact, if the amount and the number of entries are given as inputs, no analytic function is needed, and the WITH clause is simply a matter of style. Also, paulzipblog mentioned that the amount could be negative. My solution works with negative amounts and, given the right inputs, requires neither analytic functions nor subqueries.

with inputs(amount, entries) as (select -100, 42 from dual)
select level as id, amount, entries,
  round(level*amount/entries, 2) - round((level-1)*amount/entries, 2) split
from inputs
connect by level <= entries;

Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables. When the light bulb went on, I saw that the comma needed to be a record delimiter, not a field separator. Easy.

create table item_ids_EXT(  
  "RN" NUMBER,  
  "ITEM_ID" VARCHAR2(16)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_ID" char(255)  
    )  
  )  
  location ('i_item_id.txt')  
)  
REJECT LIMIT 10;

create table item_keys_EXT(  
  "RN" NUMBER,  
  "ITEM_KEY" VARCHAR2(255)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_key.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_key.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_KEY" char(255)  
    )  
  )  
  location ('i_item_key.txt')  
)  
REJECT LIMIT 10;  

set timing on

create table u as  
select item_id, item_key  
from item_ids_ext  
join item_keys_ext using(rn);

Table U created.

Elapsed: 00:00:00.787

select floor(2 * 60 * 60 / 0.787) times_faster from dual;

TIMES_FASTER
------------
        9148

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 8000 times faster :-)

In a recent ODC thread, the question was how to split out comma separated values from two CLOBs, one containing “item ids” and the other “item keys”. The ids and the keys had to be matched by position: the first id with the first key, the second id with the second key, and so on. Then each id/key pair was to be inserted as a row into a target table.

With 50,000 ids of 7 characters and 50,000 keys of 172 characters, the current processing takes 2 hours.

Test data

First I’ll create a table with the two CLOBs.

--drop table t purge;
create table t as
select 
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || to_char(level,'fm0000009'))
      order by level
    ) AS CLOB
  ), 2
) i_item_id,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || rpad(to_char(level,'fm0000009'),172,'*'))
      order by level
    ) AS CLOB
  ), 2
) i_item_key
from dual
connect by level <= 50000;

Splitting CLOB strings

Instead of seeking a “perfect” custom-made solution, I just put together two functions I have already written about:

  1. The PIPE_CLOB function: a pipelined table function that reads a CLOB and outputs up to 4000 bytes at a time, respecting a delimiter.
  2. The STRING_TOKENIZE function: a pipelined table function that splits a VARCHAR2 string according to a delimiter.

All I have to do is feed the output from the first function into the second, and voilà!

select b.column_value item_id
from t,
table(pipe_clob( i_item_id, 4000, ',')) a,
table(string_tokenize(a.column_value,',')) b;

Combining the two CLOBs

Remember, the requirement was to combine the values from both CLOBs by position, in order to insert them into a target table. Since the functions return the values in order, I just number each value with ROWNUM and join on that number.

--drop table u purge;
create table u as
with item_ids as (
  select rownum rn,
    b.column_value item_id
    from t,
  table(pipe_clob( i_item_id, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
, item_keys as (
  select rownum rn,
    b.column_value item_key
    from t,
  table(pipe_clob( i_item_key, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
select item_id, item_key 
from item_ids join item_keys using(rn);

Table U created.

Elapsed: 00:00:00.879

That’s right, processing time went from two hours to less than one second.

Now, I suspect that the questioner’s code is doing more than just a simple insert, but until he or she provides more details about the requirement, I get to maintain my claim:

8000 times faster!

(which I hope applies only to my code, not to the speed at which time flies…)

Using SQL Translation Framework to access package variables

My last post had code to rewrite SELECT statements that tried to access public package variables. Here I’ll integrate the code into the SQL Translation Framework. Use this only in development environments.

Why only in Dev? Overhead and complication

You normally use the SQL Translation Framework for database migration. The idea is to translate SQL statements from non-Oracle syntax to Oracle syntax, test the translated statements and ensure the translations are used instead of the originals. You don't want to repeat that work every time a statement is submitted, nor apply it to every statement; in fact, Oracle drivers can now allow applications to say which statements are to be translated and which are to be left alone.

My solution is necessarily generic:

  • The translation service is enabled at the session level.
  • At that point, every statement execution is preceded by a “parse call” and the execution of my TRANSLATE_SQL procedure.

Those “parse calls” will result in “soft parses” since the statement already resides in the library cache. That is still unacceptable overhead in any production-type environment.

I would add that on-the-fly translation in production would cause havoc if it ever stopped working, and would complicate debugging considerably.

Don’t do it.

In fact, I highly recommend just using the TRANSLATE_SQL procedure as presented in my previous post. You can then see, edit, test and deploy the result. You might want to use on-the-fly translation if you are writing a SELECT statement that accesses package variables and you want to make and test changes faster, but that’s it.
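Calling the procedure directly is trivial; here is a minimal sketch, assuming the TRANSLATE_PKG_VARS package from my previous post is installed:

create or replace procedure demo_translate as
  l_translated clob;
begin
  -- Rewrite the statement once; then see, edit, test and deploy the result.
  translate_pkg_vars.translate_sql(
    sql_text        => 'select dbms_lob.call c from dual',
    translated_text => l_translated
  );
  dbms_output.put_line(l_translated);
end;
/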

Anyway, for what it’s worth here is the code:

create or replace package translate_pkg_vars authid current_user as

  g_trans_profile_name constant varchar2(30) := 'PACKAGE_VARS';
  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_trans_profile_name); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_trans_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
      attribute_value  =>  $$PLSQL_UNIT); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_trans_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_RAISE_TRANSLATION_ERROR, 
      attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_trans_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRACE_TRANSLATION, 
      attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_trans_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        dbms_output.put_line('Input must be a SELECT statement.');
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          -- l_identifier has not been extracted yet at this point, so report the error text
          raise_application_error(
            -20002,
            'Statement is invalid even within PL/SQL: '||l_errm
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_trans_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_pkg_vars;
/

And a quick test:

exec translate_pkg_vars.create_profile;
exec translate_pkg_vars.enable_trans;

select dbms_lob.lobmaxsize, dbms_lob.call from dual;

WF2_DBMS_LOB_LOBMAXSIZE WF1_DBMS_LOB_CALL
----------------------- -----------------
             1.8447E+19                12
          
exec translate_pkg_vars.disable_trans;
exec translate_pkg_vars.drop_profile;

Make public package variables available to SQL

I recently posted an experimental approach to accessing package constants from SQL. Here is a more robust approach that includes all variables.

The problem

Within PL/SQL, SQL statements can be either static or dynamic. Static SQL can access any PL/SQL variables that are within scope, including variables defined in package specifications. PL/SQL rewrites the statement, putting a bind variable where the PL/SQL variable was and binding the value before execution.

Dynamic SQL, whether submitted from PL/SQL or any other language, cannot directly access package variables, even though they are part of the published specification. This frustrates developers, since access to package functions has no such restriction. Getter functions are a workaround, but they are no fun.
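To see the difference concretely, here is a minimal sketch using a public package constant (DBMS_LOB.LOBMAXSIZE, but any would do):

declare
  l_val number;
begin
  -- Static SQL: PL/SQL rewrites the reference as a bind variable, so this works.
  select dbms_lob.lobmaxsize into l_val from dual;
  -- Dynamic SQL: the SQL engine cannot see the variable and raises an error
  -- (typically ORA-06553: PLS-221, or ORA-00904 depending on the reference).
  execute immediate 'select dbms_lob.lobmaxsize from dual' into l_val;
end;
/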

Possible solutions

  • Table functions: whether pipelined or not, table functions can execute static SQL internally yet be queried from dynamic SQL. This approach is not very flexible, but when it fits the bill it can be used in production today.
  • Literals: for constants only, rewrite the SQL and replace package constants by their values, as found in DBA_SOURCE. My previous post tried this.
  • Bind variables: this would mimic what PL/SQL does for static SQL. To be a full solution, the values would have to be looked up and bound before executing the rewritten statement.
  • Getter functions.

Today’s choice

Starting with database version 12.1, we can declare PL/SQL functions in the WITH clause of a SELECT statement, so that is where I’m going to put my getter functions.

Take for example the statement:

select dbms_lob.call c from dual

Add a WITH clause containing a function, then call the function instead of accessing the package constant:

WITH function WF1_dbms_lob_call return dbms_lob.call%type deterministic is 
begin return dbms_lob.call; end;
select WF1_dbms_lob_call c from dual

  • By declaring the function to be deterministic, it is called once per execution and not once per row – at least in my tests using 18c. There appears to be no need to use scalar subquery caching.
  • Unlike the “literal” solution, any package variable that is valid in static SQL will be valid in the rewritten SQL.
  • There is no need to access the values of the variables in order to bind them.
  • The %type attribute ensures that the function returns a value of the same data type as the package variable.

Finding the package variables

Before making and using a getter function, I have to know that my SQL is trying to access a package variable! Instead of trying to reinvent parsing, I use the DBMS_SQL.PARSE procedure.

Let’s assume a SELECT statement is perfectly valid, except for a reference to a package variable. When I call DBMS_SQL.PARSE, I will get one of two errors:

  • ORA-06553: PLS-221: '<variable>' is not a procedure or is undefined
  • ORA-00904: "<package>"."<record/object>"."<member>": invalid identifier

I can also access the “last error position”, which gives me the location of the package variable within the SQL text. I now have enough information to extract the package variable text from the SELECT, generate a getter function and call that function instead of referencing the variable directly. After that, I go through the same parse / replace operations until the statement parses without error.
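The detection step can be sketched on its own, using nothing beyond DBMS_SQL:

declare
  l_cursor integer := dbms_sql.open_cursor();
begin
  begin
    dbms_sql.parse(l_cursor, 'select dbms_lob.call from dual', dbms_sql.native);
  exception when others then
    -- LAST_ERROR_POSITION returns the 0-based byte offset of the
    -- offending identifier within the statement text
    dbms_output.put_line('error at byte '||dbms_sql.last_error_position
                         ||': '||sqlerrm);
  end;
  dbms_sql.close_cursor(l_cursor);
end;
/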

Please note that if there truly is an invalid identifier in the statement, the generated getter function will cause the next parse to fail with a different error message, which is enough to raise an appropriate exception.

Known limitations

  • Requires database version 12.1 or later. Tested in 18c.
  • This version is restricted to SELECT statements. There may or may not be a preexisting WITH clause.
  • The SQL statement, including the generated getter functions, cannot exceed 32767 bytes in length. When I try to parse anything longer, DBMS_SQL gives me the wrong last error position.
  • At this time, I do not handle quoted identifiers.
  • In theory, comments are allowed between members and periods:
    select dbms_lob  /*WEIRD*/ . call from dual;
    I can handle the spaces, but not the comments.
  • If you want to access a boolean in SQL, this utility can’t help you.

No SQL Translation Framework yet

The code here just rewrites the SELECT; it doesn’t use the SQL Translation Framework to execute it directly. Maybe next time?

create or replace package translate_pkg_vars authid current_user as
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as

  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          -- l_identifier has not been extracted yet at this point, so report the error text
          raise_application_error(
            -20002,
            'Statement is invalid even within PL/SQL: '||l_errm
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
end translate_pkg_vars;
/
declare
  sql_text  CLOB := 'select dbms_lob.call c from dual';
  translated_text CLOB;
begin
  translate_pkg_vars.translate_sql(sql_text, translated_text);
  dbms_output.put_line('output:
'||translated_text);
end;
/
output:
WITH
function WF1_dbms_lob_call return dbms_lob.call%type deterministic is begin return dbms_lob.call; end;
select WF1_dbms_lob_call c from dual

Make public package constants available to SQL

This is an enhancement request on ODC and a recent Twitter topic. Here is a homemade solution using the SQL Translation Framework.

UPDATE 2018-12-31: this solution became obsolete pretty quickly! Please see this post for a much better solution.

When executing static SQL inside PL/SQL, the developer can directly reference package variables because PL/SQL will automatically treat them as bind variables and bind the correct values. Developers who then test the SQL statement outside of PL/SQL discover that the package variable is interpreted as a package function, which of course does not exist.

The package I wrote can create a translation profile, enable it, disable it and drop it. It also contains the TRANSLATE_SQL procedure that does the work. It is meant strictly for development environments as a transparent workaround.

The procedure parses the statement; whenever it finds something that could be a reference to a constant defined in a package specification, it substitutes the literal value found in DBA_SOURCE.

create or replace package translate_constants as
  g_profile_name constant varchar2(30) := 'CONSTANTS';
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_constants;
/
create or replace package body translate_constants as
  type t_parm_val is record (
    parm varchar2(512),
    val varchar2(4000)
  );
  type tt_parm_val is table of t_parm_val;
  lt_parm_val tt_parm_val := null;
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_profile_name); 
    DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
      profile_name     =>  g_profile_name, 
      attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
      attribute_value  =>  $$PLSQL_UNIT); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    l_first_word varchar2(256);
    l_cursor_id integer;
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    l_errm varchar2(4000);
    l_constant varchar2(4000);
    l_identifier varchar2(4000);
    l_test_sql varchar2(4000);
    l_text varchar2(4000);
  begin
    select lower(regexp_substr(sql_text,'\w+')) into l_first_word from dual;
    if l_first_word not in ('select','with','insert','update','delete','merge') then
      translated_text := null;
      return;
    end if;
    l_errm := 'start';
    translated_text := sql_text;
    while l_errm is not null loop
      l_errm := null;
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_output.put_line(translated_text);
        dbms_sql.parse(l_cursor_id,translated_text,1);
      exception when not_a_procedure then
        l_errm := sqlerrm;
        if instr(l_errm, 'PLS-221:') = 0 then
          raise;
        end if;
        dbms_output.put_line(l_errm);
        select regexp_substr(l_errm, q'§'(.+)'§',1,1,null,1) into l_constant from dual;
        dbms_output.put_line(l_constant);
        select regexp_substr(
          translated_text, 
          '([A-Z][A-Z0-9#_$]*[.])?[A-Z][A-Z0-9#_$]*[.]'||l_constant
          ,1,1,'i'
        )
        into l_identifier from dual;
        dbms_output.put_line(l_identifier);        
        dbms_utility.expand_sql_text(
          'select ' || l_identifier || ' from dual',
          l_test_sql
        );
        dbms_output.put_line(l_test_sql);
        dbms_output.put_line(regexp_substr(l_test_sql,'"(.*?)"',1,1,null,1));
        dbms_output.put_line(regexp_substr(l_test_sql,'"(.*?)"',1,2,null,1));
        select regexp_substr(text, ':=\s*?(\S.*?)\s*;',1,1,null,1)
        into l_text from dba_source
        where owner = regexp_substr(l_test_sql,'"(.*?)"',1,1,null,1)
          and name = regexp_substr(l_test_sql,'"(.*?)"',1,2,null,1) 
          and type = 'PACKAGE' 
          and regexp_like(text, l_constant||'\s+constant\s+','i');
        
        dbms_output.put_line('>'||l_text);
        translated_text := regexp_replace(translated_text,'([^.])'||l_identifier,'\1'||l_text,1,0,'m');
      end;
      dbms_sql.close_cursor(l_cursor_id);
    end loop;
    if l_cursor_id is not null then
      dbms_sql.close_cursor(l_cursor_id);
    end if;
  exception when others then
    if l_cursor_id is not null then
      dbms_sql.close_cursor(l_cursor_id);
    end if;
    raise;

  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_constants;
/

And a small test:

exec translate_constants.create_profile;
exec translate_constants.enable_trans;

select dbms_lob.lobmaxsize maxsize,
sys.dbms_lob.lobmaxsize samething from dual;

   MAXSIZE  SAMETHING
---------- ----------
1.8447E+19 1.8447E+19
          
exec translate_constants.disable_trans;
exec translate_constants.drop_profile;

Remember, this is only to help developers. It is not something to ever deploy in a test or production environment.