Elegant MODEL clause

Frank Kulash recently asked a challenging question on OTN, which Chris227 answered using the MODEL clause. To help  fully understand his excellent answer, I’m going to go into detail. (To concentrate on the modeling concepts, I’ve changed a few details in the problem and the solution.)

Teams Drafting Players

There are five teams that take turns choosing among eight players.

TEAM_NAME PLAYER_NAME
Atletico Mane
Burnley Neymar
Crystal Ronaldo
Dortmund Silva
Everton Weah
Xavi
Yorke
Zidane

 

The teams choose in alphabetical order. Once a player is picked, he is of course unavailable to be picked by another team. In fact, a team does not pick directly: it gives a rating to each player that it wants, and the available player with the highest rating is automatically assigned to the team. To simplify the problem, each team picks only once.

Here are the ratings, ordered by TEAM_NAME, RATING DESC, with the picks in bold.

TEAM_NAME PLAYER_NAME RATING
Atletico Mane 90
Atletico Neymar 80
Atletico Ronaldo 60
Burnley Silva 91
Burnley Weah 71
Burnley Zidane 51
Crystal Silva 82
Crystal Weah 72
Crystal Mane 62
Dortmund Silva 93
Dortmund Weah 83
Dortmund Mane 73
Everton Weah 94
Everton Silva 84
Everton Yorke 74
  • Atletico and Burney get their top rated players (Mane and Silva).
  • Crystal gets its second rated (Weah) because its first rated (Silva) is taken.
  • Dortmund has only rated three players and they are all taken, so Dortmund gets no one.
  • Everton gets its third rated player (Yorke) because the first two are taken.

For input, here is Frank’s table with the ratings for each team. Notice that (team_name, player_name) is unique, and (team_name, rating) is unique.

CREATE TABLE    choice (team_name, player_name, rating)    AS
SELECT  'Atletico', 'Mane',    90  FROM dual  UNION ALL
SELECT  'Atletico', 'Neymar',  80  FROM dual  UNION ALL      -- combination of
SELECT  'Atletico', 'Ronaldo', 60  FROM dual  UNION ALL      -- (team_name, player_name)
SELECT  'Burnley',  'Silva',   91  FROM dual  UNION ALL      -- is unique; so is
SELECT  'Burnley',  'Weah',    71  FROM dual  UNION ALL      -- (team_name, rating)
SELECT  'Burnley',  'Zidane',  51  FROM dual  UNION ALL
SELECT  'Crystal',  'Silva',   82  FROM dual  UNION ALL
SELECT  'Crystal',  'Weah',    72  FROM dual  UNION ALL
SELECT  'Crystal',  'Mane',    62  FROM dual  UNION ALL
SELECT  'Dortmund', 'Silva',   93  FROM dual  UNION ALL
SELECT  'Dortmund', 'Weah',    83  FROM dual  UNION ALL
SELECT  'Dortmund', 'Mane',    73  FROM dual  UNION ALL
SELECT  'Everton',  'Weah',    94  FROM dual  UNION ALL
SELECT  'Everton',  'Silva',   84  FROM dual  UNION ALL
SELECT  'Everton',  'Yorke',   74  FROM dual;

Starting to MODEL

select * from choice
model
dimension by ( team_name, player_name)
measures ( rating, 0 picked )
rules (
)
order by team_name, rating desc;

The MODEL clause turns the input data into a multi-dimensional array.

  • The DIMENSION clause lists the array indexes: here they are strings, not integers. The dimension columns identify each row uniquely, and their values cannot be changed.
  • The MEASURES clause lists columns whose values can be changed.
    A measure column in an individual row is called a cell: for example, RATING['Atletico', 'Mane']
  • The RULES clause lists rules that say what changes should be made. We’ll see rules in a bit.

Here we see what the data looks like before any rules are applied.

TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 0
Atletico Neymar 80 0
Atletico Ronaldo 60 0
Burnley Silva 91 0
Burnley Weah 71 0
Burnley Zidane 51 0
Crystal Silva 82 0
Crystal Weah 72 0
Crystal Mane 62 0
Dortmund Silva 93 0
Dortmund Weah 83 0
Dortmund Mane 73 0
Everton Weah 94 0
Everton Silva 84 0
Everton Yorke 74 0

 

What is a RULE?

A rule assigns a value to a cell. For example

PICKED['Atletico', 'Mane'] = 1

The left-hand side is the cell, and the right-hand side is the value assigned to that cell.

The left-hand side can refer to more than one cell:

  • PICKED['Atletico', ANY] = 1 will assign 1 to the PICKED cells for team ‘Atletico’, no matter who the player is;
  • PICKED[ANY, 'Mane'] = 1 will assign 1 to the PICKED cells for player ‘Mane’, no matter what the team is;
  • PICKED[ANY, ANY] = 1 will assign 1 to each and every PICKED cell.

The right-hand side contains a value, but that value can come from a cell. To use the same dimension values on the left-hand side and the right-hand side, we can use CV().

  • PICKED[ANY, ANY] = RATING[CV(),CV()] will assign to each PICKED cell the value RATING from the same row.

Each rule applies to only one cell at a time. If the left-hand side refers to several cells, the right-hand side is evaluated and assigned separately to each left-hand cell. The result of the evaluation must be one value, not a range of values.

Aggregations in Rules

On the right-hand side of a rule, we can aggregate over a range of cells to get one value.

  • PICKED[ANY, ANY] = max(RATING)[CV(),ANY] will assign to each PICKED cell the maximum value of RATING for that team. Notice the parentheses end before the dimension columns.

Ordering in Rules

Since each rule is applied to only one left-side cell at a time, the order of execution can be important. Look what happens with this rule:

picked[any,any] order by team_name, rating desc = max(picked)[any,any] + 1

TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 1
Atletico Neymar 80 2
Atletico Ronaldo 60 3
Burnley Silva 91 4
Burnley Weah 71 5
Burnley Zidane 51 6
Crystal Silva 82 7
Crystal Weah 72 8
Crystal Mane 62 9
Dortmund Silva 93 10
Dortmund Weah 83 11
Dortmund Mane 73 12
Everton Weah 94 13
Everton Silva 84 14
Everton Yorke 74 15

 

Do you see what happened here? For every left-side cell, the right-side expression is evaluated using the previously assigned values! That is why the order is so important.

The Solution to our problem

If we order the rows by team_name, rating desc then we can say for each row:

  • If the current row’s team has not picked
  • and the current row’s player has not been picked,
  • then pick that player for that team.
select * from choice
model
dimension by ( team_name, player_name)
measures ( rating, 0 picked )
rules (
  picked[any,any] order by team_name, rating desc = 
    case
      when max(picked)[any,cv()] = 0 and max(picked)[cv(),any] = 0
      then 1
      else 0
    end    
);
TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 1
Atletico Neymar 80 0
Atletico Ronaldo 60 0
Burnley Silva 91 1
Burnley Weah 71 0
Burnley Zidane 51 0
Crystal Silva 82 0
Crystal Weah 72 1
Crystal Mane 62 0
Dortmund Silva 93 0
Dortmund Weah 83 0
Dortmund Mane 73 0
Everton Weah 94 0
Everton Silva 84 0
Everton Yorke 74 1

 

Multiple Rounds

To let the teams pick more than one player, we can just add the ITERATE keyword. The ITERATION_NUMBER function starts with 0.

select * from choice
model
dimension by ( team_name, player_name)
measures ( rating, 0 picked )
rules iterate (3) (
  picked[any,any] order by team_name, rating desc = 
    case
      when max(picked)[any,cv()] = 0 and max(picked)[cv(),any] <= iteration_number
      then iteration_number + 1
      else picked[cv(),cv()]
    end    
);
TEAM_NAME PLAYER_NAME RATING PICKED
Atletico Mane 90 1
Atletico Neymar 80 2
Atletico Ronaldo 60 3
Burnley Silva 91 1
Burnley Weah 71 0
Burnley Zidane 51 2
Crystal Silva 82 0
Crystal Weah 72 1
Crystal Mane 62 0
Dortmund Silva 93 0
Dortmund Weah 83 0
Dortmund Mane 73 0
Everton Weah 94 0
Everton Silva 84 0
Everton Yorke 74 1

 

Did you make it this far?

If so, thanks for your patience! I thought this was a great example of the power of MODEL , because it uses ordering, aggregation and previously changed values in its evaluations – all in one pretty concise rule.

Hope this helps…

Advertisements

Missing cents when dividing an amount

Alex Nuijten (@alexnuijten) wrote a blog post on this subject a few years ago. Since his blog doesn’t accept comments at the moment, here is a variant I think is interesting.

The problem

As an example, say we have $100 that we want to divide into 42 parts.

  • ROUND(100/42,2) gives us $2.38
  • Multiplying $2.38 by 42 gives us $99.96
  • So to get a total of 100, we need to add one cent to four lines.

Adding cents to the first four lines

Here is Alex’s solution:

with amounts
as
( select level id
      ,100 amount
      ,count(*) over () entries
      ,round (100 / count(*) over (), 2) rounded
    from dual
    connect by level <= 42
)
,indicators as 
(
select id
      ,amount
      ,entries
      ,rounded
      ,case 
       when rownum <= abs ((amount - (entries * rounded)) * 100)
       then 'x'
       end as indicator
      ,sign (amount - (entries * rounded)) as pos_neg
 from amounts
)
select id
      ,rounded +
       (nvl2 (indicator, 0.01, 0) * pos_neg) final_amount
  from indicators
;
ID FINAL_AMOUNT
1 2.39
2 2.39
3 2.39
4 2.39
5 2.38
6 2.38
7 2.38
8 2.38
9 2.38

 

Adding cents using a running total

Personally, I would prefer to spread the extra cents more evenly across the lines. To do that, I simply round the current running total, then subtract the previous lines rounded running total. This turns out to be less code.

with data(id, amount, entries) as (
  select level, 100, count(*) over()
  from dual connect by level <= 42
)
select id, amount, entries,
  round(id*amount/entries, 2) - round((id-1)*amount/entries, 2) split
from data;

The lines with an extra cent are 6, 16, 27 and 37.

Hope this helps. Regards, Stew

My #OUGIreland19 Presentation on JSON

Thanks to those who came to my presentation today at OUG Ireland 2019. As promised, here is the link to the slides.

I had a great time at the conference, meeting new folks and spending time (but never enough) with the less new. The talks I attended were uniformly excellent. Special mention to Matt Mulvaney who again made me marvel at APEX, even though he showed how to go beyond what it can currently do.

Have safe trips home, everybody!

Best regards,
Stew

My #OUGN19 Presentations

As promised, here are the two presentations I gave on the OUGN 2019 Boat conference. I had a great time with old friends and new, and managed to have discussions about politics without anyone getting upset! Hope to see some of you again at OUG Ireland!

Please download the presentations to see the animations.

https://www.slideshare.net/stewashton/json-in-18c-and-19c

https://www.slideshare.net/stewashton/make-your-data-dance

Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables. When the light bulb went on, I saw that the comma needed to be a record delimiter, not a field separator. Easy.

create table item_ids_EXT(  
  "RN" NUMBER,  
  "ITEM_ID" VARCHAR2(16)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_ID" char(255)  
    )  
  )  
  location ('i_item_id.txt')  
)  
REJECT LIMIT 10;

create table item_keys_EXT(  
  "RN" NUMBER,  
  "ITEM_KEY" VARCHAR2(255)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_KEY" char(255)  
    )  
  )  
  location ('i_item_key.txt')  
)  
REJECT LIMIT 10;  

set timing on

create table u as  
select item_id, item_key  
from item_ids_ext  
join item_keys_ext using(rn);

Table U created.

Elapsed: 00:00:00.787

select floor(2 * 60 * 60 / 0.787) times_faster from dual;

TIMES_FASTER
------------
        9148

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 8000 times faster :-)

In a recent ODC thread, the question was how to split out comma separated values from two CLOBs, one containing “item ids” and the other “item keys”. The ids and the keys had to be matched by position: the first id with the first key, the second id with the second key, and so on. Then each id/key pair was to be inserted as a row into a target table.

With 50,000 ids of 7 characters and 50,000 keys of 172 characters, the current processing takes 2 hours.

Test data

First I’ll create a table with the two CLOBs.

--drop table t purge;
create table t as
select 
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || to_char(level,'fm0000009'))
      order by level
    ) AS CLOB
  ), 2
) i_item_id,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || rpad(to_char(level,'fm0000009'),172,'*'))
      order by level
    ) AS CLOB
  ), 2
) i_item_key
from dual
connect by level <= 50000;

Splitting CLOB strings

Instead of seeking a “perfect” custom-made solution, I just put together two functions I have already written about:

  1. The PIPE_CLOB function: a pipelined table function that reads a CLOB and outputs up to 4000 bytes at a time, respecting a delimiter.
  2. The STRING_TOKENIZE function: a pipelined table function that splits a VARCHAR2 string according to a delimiter.

All I have to do is feed the output from the first function into the second, and voilà!

select b.column_value item_id
from t,
table(pipe_clob( i_item_id, 4000, ',')) a,
table(string_tokenize(a.column_value,',')) b;

Combining the two CLOBs

Remember, the requirement was to combine the values from both CLOBs by position, in order to insert them into a target table. Since the functions return the values in order, I just use ROWNUM to number the values and use that number to join them.

--drop table u purge;
create table u as
with item_ids as (
  select rownum rn,
    b.column_value item_id
    from t,
  table(pipe_clob( i_item_id, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
, item_keys as (
  select rownum rn,
    b.column_value item_key
    from t,
  table(pipe_clob( i_item_key, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
select item_id, item_key 
from item_ids join item_keys using(rn);

Table U created.

Elapsed: 00:00:00.879

That’s right, processing time went from two hours to less than one second.

Now, I suspect that the questioner’s code is doing more than just a simple insert, but until he or she provides more details about the requirement, I get to maintain my claim:

8000 times faster!

(which I hope applies only to my code, not to the speed at which time flies…)

Using SQL Translation Framework to access package variables

My last post had code to rewrite SELECT statements that tried to access public package variables. Here I’ll integrate the code into the SQL Translation Framework. Use this only in development environments.

Why only in Dev? Overhead and complication

You normally use the SQL Translation Framework for database migration. The idea is to translate SQL statements from non-Oracle syntax to Oracle syntax, test the translated statements and ensure that the translations are used instead of the original. You don’t want to do all that work every time a statement is submitted to the database, and you don’t want to do it for every statement. In fact, Oracle drivers can now allow applications to say which statements are to be translated and which are to be left alone.

My solution is necessarily generic:

  • The translation service is enabled at the session level.
  • At that point, every statement execution is preceded by a “parse call” and the execution of my TRANSLATE_SQL procedure.

Those “parse calls” will result in “soft parses” since the statement already resides in the library cache. That is still unacceptable overhead in any production-type environment.

I would add that on-the-fly translation in production would cause havoc if it ever stopped working, and would complicate debugging considerably.

Don’t do it.

In fact, I highly recommend just using the TRANSLATE_SQL procedure as presented in my previous post. You can then see, edit, test and deploy the result. You might want to use on-the-fly translation if you are writing a SELECT statement that accesses package variables and you want to make and test changes faster, but that’s it.

Anyway, for what it’s worth here is the code:

create or replace package translate_pkg_vars authid current_user as

  g_trans_profile_name constant varchar2(30) := 'PACKAGE_VARS';
  -- Standard prefix of functions in the generated WITH clause
  wf constant varchar2(8) := 'WF';
  
  procedure create_profile;
  procedure enable_trans;
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  );
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  );
  procedure disable_trans;
  procedure drop_profile;
end translate_pkg_vars;
/
create or replace package body translate_pkg_vars as
  
  procedure do(p_sql in varchar2) is
  begin
    dbms_output.put_line(p_sql);
    execute immediate p_sql;
  end do;
  
  procedure create_profile is
    name_in_use exception;
    pragma exception_init(name_in_use, -955);
  begin
    dbms_sql_translator.create_profile(g_trans_profile_name); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_trans_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRANSLATOR, 
    attribute_value  =>  $$PLSQL_UNIT); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_trans_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_RAISE_TRANSLATION_ERROR, 
    attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
  DBMS_SQL_TRANSLATOR.SET_ATTRIBUTE(
    profile_name     =>  g_trans_profile_name, 
    attribute_name   =>  DBMS_SQL_TRANSLATOR.ATTR_TRACE_TRANSLATION, 
    attribute_value  =>  DBMS_SQL_TRANSLATOR.ATTR_VALUE_FALSE); 
  exception when name_in_use then null;
  end create_profile;
  
  procedure enable_trans is
  begin
    do('alter session set sql_translation_profile = '|| g_trans_profile_name);
    do('alter session set events = ''10601 trace name context forever, level 32''');
  end enable_trans;
  
  procedure translate_sql(
    sql_text in CLOB, 
    translated_text out nocopy CLOB
  ) is
    not_a_procedure exception;
    pragma exception_init(not_a_procedure, -6553);
    invalid_identifier exception;
    pragma exception_init(invalid_identifier, -904);
    
    g_with_clause varchar2(32767);
    g_select_sql varchar2(32767);
    l_cursor_id integer;
    g_last_error_position integer;
    g_f_num integer := 0;
    
    procedure check_input is
      l_first_word varchar2(256);
      clob_too_long exception;
      pragma exception_init(clob_too_long, -6502);
      l_clob_too_long_msg constant varchar2(999) :=
        'Cannot parse statement longer than 32767 bytes: '
        || 'DBMS_SQL might return erroneous error position.';
    begin
      -- initialize with clause and select sql. Make sure input is 32k max.
      select lower(regexp_substr(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)',1,1,'im',4)) into l_first_word from dual;
      if l_first_word = 'select' then
        g_with_clause := null;
        g_select_sql := sql_text;
      elsif l_first_word = 'with' then
        g_with_clause := 'WITH
';
        g_select_sql := regexp_replace(sql_text,'(\s+|(/\*[^/*]*\*/)|(--.*
))*(\S+)','\1',1,1,'im');
      else
        dbms_output.put_line('Input must be a SELECT statement.');
        raise_application_error(-20000, 'Input must be a SELECT statement.');
      end if;
    exception when clob_too_long then
      raise_application_error(-20001, l_clob_too_long_msg);
    end check_input;
    
    procedure add_function is
      l_errm varchar2(4000);
      l_member varchar2(128);
      l_identifier varchar2(4000);
      l_func_name varchar2(30 BYTE);
    begin
      -- change last error position from byte 0-offset to character 1-offset
      -- find last member of identifier
      -- find entire identifier
      -- generate function name
      -- add function definition to with clause
      -- replace identifier by function call.
      g_last_error_position :=
        length(
          substrb(g_with_clause||g_select_sql,1,g_last_error_position)
        ) 
        - nvl(lengthb(g_with_clause), 0)
        + 1;
      l_errm := sqlerrm;
      if sqlcode = -6553 then
        if instr(l_errm, 'PLS-221') = 0 then
          raise_application_error(
            -20002,
            'Identifier '||l_identifier||' is invalid even within PL/SQL.'
          );
        else
          select regexp_substr(l_errm, '''(.+)''',1,1,null,1) into l_member
          from dual;
        end if;
      else
        select regexp_substr(l_errm, '"([^"]+)"[^"]*$',1,1,null,1) into l_member
        from dual;
      end if;
      select regexp_substr(
        g_select_sql, 
        '([[:alpha:]][[:alpha:]0-9_#$]*\s*[.]\s*)+'||l_member,
        g_last_error_position,1,'im'
      )
      into l_identifier from dual;
      g_f_num := g_f_num + 1;
      l_func_name := substrb(
        WF||g_f_num||'_'||regexp_replace(l_identifier,'\s|[.]','_'),
        1,30
      );
      g_with_clause := nvl(g_with_clause, 'WITH
')
        || 'function '||l_func_name||' return '
        ||l_identifier||'%type deterministic is begin return '
        ||l_identifier||'; end;
';
      g_select_sql := regexp_replace(
        g_select_sql, l_identifier, l_func_name, g_last_error_position, 1
      );

    end add_function;
    
  begin
    check_input;
    loop
      begin
        l_cursor_id := dbms_sql.open_cursor();
        dbms_sql.parse(l_cursor_id,g_with_clause||g_select_sql,1);
        dbms_sql.close_cursor(l_cursor_id);
        -- If all is well, we are done
        exit;
      exception when not_a_procedure or invalid_identifier then
        g_last_error_position := dbms_sql.last_error_position;
        dbms_sql.close_cursor(l_cursor_id);
        add_function;
      when others then
        if l_cursor_id is not null then
          dbms_sql.close_cursor(l_cursor_id);
        end if;
        raise;
      end;
    end loop;
    
    translated_text := g_with_clause||g_select_sql;
    
  end translate_sql;
  
  procedure translate_error(
    error_code in BINARY_INTEGER,
    translated_code out BINARY_INTEGER,
    translated_sqlstate out nocopy VARCHAR2
  ) is
  begin
    translated_sqlstate := null;
  end translate_error;
  
  procedure disable_trans is
  begin
    do('alter session set sql_translation_profile = null');
    do('alter session set events = ''10601 trace name context off''');
  end disable_trans;
  
  procedure drop_profile is
    profile_not_found exception;
    pragma exception_init(profile_not_found, -24252);
  begin
    dbms_sql_translator.drop_profile(g_trans_profile_name);
  exception when profile_not_found then null;
  end drop_profile;
  
end translate_pkg_vars;
/

And a quick test:

exec translate_pkg_vars.create_profile;
exec translate_pkg_vars.enable_trans;

select dbms_lob.lobmaxsize, dbms_lob.call from dual;

WF2_DBMS_LOB_LOBMAXSIZE WF1_DBMS_LOB_CALL
----------------------- -----------------
             1.8447E+19                12
          
exec translate_pkg_vars.disable_trans;
exec translate_pkg_vars.drop_profile;