SQL table macros 9: just pivot!

The PIVOT clause does some aggregation, then pivots the result – but it is not dynamic.

Let’s make a deal: you do the aggregation and the JUST_PIVOT table macro will pivot the result, dynamically!

[UPDATE 2022-01-14: the solution I propose here works only in version 19c. For a more complicated solution that works in 19c and 21c, see SQL table macros 11: JUST_PIVOT for 21c (and 19c)]

Doing it the old-fashioned way

Let’s say we want the total salary for each job type in each department. This is a simple aggregation:

select dname, job, sum(sal) sal
from dept
join emp using(deptno)
group by dname, job
order by 1,2
DNAME      JOB        SAL
ACCOUNTING CLERK      1300
ACCOUNTING MANAGER    2450
ACCOUNTING PRESIDENT  5000
RESEARCH   ANALYST    6000
RESEARCH   CLERK      1900
RESEARCH   MANAGER    2975
SALES      CLERK       950
SALES      MANAGER    2850
SALES      SALESMAN   5600
 

Now that we know all possible JOB values, we can aggregate and pivot the result directly.

select * from (
  select dname, job, sal
  from dept
  join emp using(deptno)
)
pivot(sum(SAL) SAL for (JOB) in (
  ('ANALYST') as ANALYST,
  ('CLERK') as CLERK,
  ('MANAGER') as MANAGER,
  ('PRESIDENT') as PRESIDENT,
  ('SALESMAN') as SALESMAN
));
DNAME      ANALYST_SAL CLERK_SAL MANAGER_SAL PRESIDENT_SAL SALESMAN_SAL
ACCOUNTING             1300      2450        5000
RESEARCH   6000        1900      2975
SALES                  950       2850                      5600
 
  • I call SAL an “aggregation column” because it is used in an aggregation function.
  • JOB is a “pivot column” because its values are used to create the new pivoted columns.
  • DNAME is a “group by column” because PIVOT groups by it implicitly.

Using the JUST_PIVOT macro

JUST_PIVOT asks you for four things:

  1. p_INPUT_QUERY: the input query must be a literal wrapped in a sys.odcivarchar2list() collection. This allows the content to be accessed by the macro; macros do not have access to the content of VARCHAR2 literals.

    Every column in the input query must also appear in one of these column lists:
  2. p_GROUP_COLS: columns that are GROUPed BY explicitly in the input query and implicitly in the PIVOT
  3. p_PIVOT_COLS: columns that are GROUPed BY explicitly in the input query and pivoted in the PIVOT
  4. p_AGG_COLS: columns that are aggregated explicitly in the input query and “re-aggregated” using MAX() in the PIVOT
select * from just_pivot(
  p_input_query => sys.odcivarchar2list(
q'§
select dname, job, sum(sal) sal
from dept
join emp using(deptno)
group by dname, job
§'
  ),
  p_group_cols => columns(dname),
  p_pivot_cols => columns(job),
  p_agg_cols => columns(sal)
);
DNAME      ANALYST_SAL CLERK_SAL MANAGER_SAL PRESIDENT_SAL SALESMAN_SAL
ACCOUNTING             1300      2450        5000
RESEARCH   6000        1900      2975
SALES                  950       2850                      5600
 

Advantages of the macro

  • The macro itself executes the input query once in order to build a complete “pivot-in-list”, so we don’t have to.
  • The final query is a flashback query, so it is parsed at every execution and the macro generates an up-to-date final query each time.
  • If we generate total values, they are automatically put in the right places in the output!
    Here is an example, with one line of code changed:
select * from just_pivot(
  p_input_query => sys.odcivarchar2list(
q'§
select dname, job, sum(sal) sal
from dept
join emp using(deptno)
group by cube(dname, job)
§'
  ),
  p_group_cols => columns(dname),
  p_pivot_cols => columns(job),
  p_agg_cols => columns(sal)
);
DNAME      ANALYST_SAL CLERK_SAL MANAGER_SAL PRESIDENT_SAL SALESMAN_SAL Total_SAL
ACCOUNTING             1300      2450        5000                       8750
RESEARCH   6000        1900      2975                                   10875
SALES                  950       2850                      5600         9400
Total      6000        4150      8275        5000          5600         29025
 

My next post will show how the macro works with multiple group by, pivot, and/or aggregation columns.

The code

/* Just Pivot MACRO
This macro takes as input a query that does an explicit GROUP BY.
The macro first uses the input query to get the list of pivot columns,
then wraps the input query in a final query that does a PIVOT using MAX().
The final query has a flashback clause in order to make the cursor unshareable:
  - there will be a hard parse phase at every execution,
  - the macro will execute and the list of pivot values will be up to date.

Input:
1) p_INPUT_QUERY: the input query must be a literal
   wrapped in a sys.odcivarchar2list() collection.
   This allows the content to be accessed by the macro;
   macros do not have access to the content of VARCHAR2 literals.

Every column in the input query must also appear in one of these column lists:
2) p_GROUP_COLS: columns that are GROUPed BY explicitly in the input query
   and implicitly in the PIVOT (*)
3) p_PIVOT_COLS: columns that are GROUPed BY explicitly in the input query
   and pivoted in the PIVOT (*)
4) p_AGG_COLS: column that are aggregated explicitly in the input query
   and "re-aggregated" using MAX() in the PIVOT
5) p_total_label: label to use for pivot totals (if CUBE, ROLLUP, etc.)
   null values are assumed to be totals and will be replaced with some non-null label
   such as 'Total', which is the default, and will be ordered at the end

Processing:
- within the macro, execute the input query to get the list of pivot columns
- generate the final SQL statement based on the input query and the intermediate result,
  adding an AS OF clause to ensure the cursor will not be reused.
*/
create or replace function just_pivot(
  p_INPUT_QUERY in sys.odcivarchar2list,
  p_GROUP_COLS in dbms_tf.columns_t,
  p_PIVOT_COLS in dbms_tf.columns_t,
  p_AGG_COLS in dbms_tf.columns_t,
  p_total_label in dbms_tf.columns_t default null   
) return clob sql_macro is
/*
- PIVOT_IN_LIST concatenates values from all p_PIVOT_COLS columns
  for example, if (DNAME,JOB) then
    ('ACCOUNTING','CLERK') as "ACCOUNTING_CLERK",
    ('ACCOUNTING','MANAGER') as "ACCOUNTING_MANAGER",
  and so on
*/
  l_pivot_in_list_sql long :=  
q'@select listagg('(' || EXPR || ') as "' || AL || '"', ',
') within group(order by #PIVOT_COLS#)
from (
  select distinct
  #EXPR# EXPR,
  #ALIAS# AL,
  #PIVOT_COLS#
  from (#INPUT_QUERY#)
)@';
  l_PIVOT_COLS varchar2(4000);
  l_EXPR varchar2(4000);
  l_ALIAS varchar2(4000);

  l_final_sql long :=
'with pivoted as (
  select * from (
    select #NVL_COLS#,
    #AGG_COLS#
    from (#INPUT_QUERY#)
  )
  pivot(#AGG_MAX_COLS# for (#PIVOT_COLS#) in (
    #PIVOT_IN_LIST#
  ))
)
select * from pivoted
as of scn dbms_flashback.get_system_change_number
order by #ORDER_COLS#';
  l_NVL_COLS varchar2(4000);
  l_AGG_COLS varchar2(4000);
  l_AGG_MAX_COLS varchar2(4000);
  l_PIVOT_IN_LIST varchar2(4000);
  l_ORDER_COLS varchar2(4000);
  
  l_total_label varchar2(32) := 'Total';

begin
  -- set value of l_total_label, which is needed right away
  if p_total_label is not null then
    sqm_util.list_columns(p_total_label, l_total_label, '%s', null, true);
  end if;
  
  -- set values to be plugged into l_pivot_in_list_sql
  sqm_util.list_columns(p_PIVOT_COLS, l_PIVOT_COLS);
  sqm_util.list_columns(
    p_PIVOT_COLS, l_EXPR,
    '''''''''||nvl(%s||null,'''||l_total_label||''')||''''''''', 
    q'§||','||§',
    true
  );
  sqm_util.list_columns(p_PIVOT_COLS, l_ALIAS,
    'nvl(%s||null,'''||l_total_label||''')',
    q'§||'_'||§',
    true
  );
  
  -- plug values into l_pivot_in_list_sql
  l_pivot_in_list_sql := replace(replace(replace(replace(
    l_pivot_in_list_sql,
    '#PIVOT_COLS#', l_PIVOT_COLS),
    '#EXPR#', l_EXPR),
    '#ALIAS#', l_ALIAS),
    '#INPUT_QUERY#', p_INPUT_QUERY(1)
  );
  
  -- execute l_pivot_in_list_sql
  dbms_output.put_line(l_pivot_in_list_sql);
  execute immediate l_pivot_in_list_sql into l_PIVOT_IN_LIST;
  dbms_output.put_line(l_PIVOT_IN_LIST);
  
  -- set values to be plugged into l_final_sql
  sqm_util.list_columns(
    p_GROUP_COLS multiset union p_PIVOT_COLS,
    l_NVL_COLS,
    'nvl(%s||null, '''||l_total_label||''') %s'
  );

  sqm_util.list_columns(p_AGG_COLS, l_AGG_MAX_COLS, 'max(%s) %s');
  sqm_util.list_columns(p_AGG_COLS, l_AGG_COLS);
  sqm_util.list_columns(p_GROUP_COLS, l_ORDER_COLS,
    'nullif(%s, '''||l_total_label||''')'
  );
  
  -- plug values into l_final_sql
  l_final_sql := replace(replace(replace(replace(replace(replace(replace(
    l_final_sql,
    '#NVL_COLS#', l_NVL_COLS),
    '#AGG_COLS#', l_AGG_COLS),
    '#INPUT_QUERY#', p_INPUT_QUERY(1)),
    '#AGG_MAX_COLS#', l_AGG_MAX_COLS),
    '#PIVOT_COLS#', l_PIVOT_COLS),
    '#ORDER_COLS#', l_ORDER_COLS),
    '#PIVOT_IN_LIST#', l_PIVOT_IN_LIST);
  
  dbms_output.put_line(l_final_sql);
  return l_final_sql;
  
end just_pivot;
/

APEX – Scratching the itch #JoelKallmanDay

My biggest regret upon retirement in 2015 was not having had the time to study APEX and promote it in my organisation.

Fortunately, I had a daughter in medical school in Paris. She was in the Commission that managed the process of assigning internships to students every trimester. I used the “Hungarian Algorithm” to optimise the students’ preferences and achieve the greatest average satisfaction. With the help of Richard Martens, I then developed an APEX application to accept the preferences and to enable an administrator to run the process directly without my intervention.

Students on average got their fifth choice out of about 150 possibilities!

Since then I have developed a new version of this application after the merger of two medical schools, a similar application for the medical school in Nancy, and two COVID-related applications.

By this time, I should have enough experience to temper my enthusiasm for APEX: I should be starting the downward slope in the Hype Cycle from “inflated expectations” to the “trough of disillusionment”. No way! APEX was always there for me, either meeting my need out of the box or allowing me to tinker with a bit of javascript. Not only that, but slightly more difficult tasks kept getting easier with each version as new functions kept piling up.

Sometimes the feature I needed was hidden away where I could not find it, but online searches generally led me to some member of the APEX community who had already solved my problem. Often enough, that feature was simply there in the first spot I looked.

I know I have barely scratched the surface of all that APEX has to offer, but I was able to scratch my itch,
and boy did it feel good!

With my thanks to the entire team and especially Mr. Kallman,

Stew Ashton

LISTAGG() as a Cumulative Function

LISTAGG() can be used as an analytic function, but cannot be cumulative. What does that mean, and is there a workaround?

I’ll demonstrate with the boring old DEPT and EMP tables, just because they have so few rows, and I’ll keep only the department and employee names:

create or replace view denames as
select dname, ename from dept join emp using(deptno);

select * from denames
order by dname, ename;
DNAME ENAME
ACCOUNTING CLARK
ACCOUNTING KING
ACCOUNTING MILLER
RESEARCH ADAMS
RESEARCH FORD
RESEARCH JONES
RESEARCH SCOTT
RESEARCH SMITH
SALES ALLEN
SALES BLAKE
SALES JAMES
SALES MARTIN
SALES TURNER
SALES WARD

 

Reminder: aggregate versus analytic

“Aggregate functions return a single result row based on groups of rows, rather than on single rows.” (19c doc)

select dname,
  listagg(ename, ',') within group(order by ename) enames
from denames
group by dname
order by dname;
DNAME ENAMES
ACCOUNTING CLARK,KING,MILLER
RESEARCH ADAMS,FORD,JONES,SCOTT,SMITH
SALES ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

 

By contrast, “analytic functions…differ from aggregate functions in that they return multiple rows for each group…All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed.” (19c). In other words, the output rows are already determined and any analytic function simply adds a column to each existing row. The “analytic clause”, starting with the keyword OVER(), says what rows provide data for the function.

select de.*,
  listagg(ename, ',') within group(order by ename)
    over(partition by dname)
  enames
from denames de;
DNAME ENAME ENAMES
ACCOUNTING CLARK CLARK,KING,MILLER
ACCOUNTING KING CLARK,KING,MILLER
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH ADAMS ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH FORD ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH JONES ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH SCOTT ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH SMITH ADAMS,FORD,JONES,SCOTT,SMITH
SALES ALLEN ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES BLAKE ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES JAMES ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES MARTIN ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES TURNER ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES WARD ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

 

LISTAGG() Limitations

The “analytic clause” has three parts:

  1. PARTITION BY: like GROUP BY, separates the data into chunks based on one or more values;
  2. ORDER BY: within the “partition”, orders the rows and sets up the next part;
  3. Windowing clause: says what rows within the ordered partition are concerned by the function.
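
For contrast, here is a quick sketch (using the same DEPT and EMP tables) of an analytic function that does accept all three parts: a cumulative SUM() of salaries per department.

```sql
select dname, ename, sal,
  sum(sal) over(
    partition by dname         -- part 1: chunk by department
    order by ename             -- part 2: order within the partition
    rows unbounded preceding   -- part 3: windowing clause, explicit here
  ) cumulative_sal
from dept join emp using(deptno);
```

With ORDER BY present and no windowing clause, the default window is RANGE UNBOUNDED PRECEDING, so the last part could be omitted here.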

Some analytic functions allow only the first or second parts, and LISTAGG() is one of them. Let’s try to use it as a cumulative function:

select de.*,
  listagg(ename, ',') within group(order by ename)
    over(partition by dname order by ename)
  enames
from denames de;

SQL Error: ORA-30487: ORDER BY not allowed here
30487. 00000 -  "ORDER BY not allowed here"
*Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY

The error message is a bit strange, but the conclusion is: it doesn’t work.

The workaround

The idea is simple: take what LISTAGG() gives us and do a SUBSTR() based on the row number:

select dname, ename,
  substr(enames, 1, instr(enames, ',', 1, rn) - 1) enames
from (
  select de.*,
    listagg(ename, ',') within group(order by ename)
      over(partition by dname)
    || ','
    enames,
    row_number() over(partition by dname order by ename) rn
  from denames de
);
DNAME ENAME ENAMES
ACCOUNTING CLARK CLARK
ACCOUNTING KING CLARK,KING
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH ADAMS ADAMS
RESEARCH FORD ADAMS,FORD
RESEARCH JONES ADAMS,FORD,JONES
RESEARCH SCOTT ADAMS,FORD,JONES,SCOTT
RESEARCH SMITH ADAMS,FORD,JONES,SCOTT,SMITH
SALES ALLEN ALLEN
SALES BLAKE ALLEN,BLAKE
SALES JAMES ALLEN,BLAKE,JAMES
SALES MARTIN ALLEN,BLAKE,JAMES,MARTIN
SALES TURNER ALLEN,BLAKE,JAMES,MARTIN,TURNER
SALES WARD ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

 

Caveats

This solution seems fine, as long as the source values are NOT NULL. If there is a NULL value, the “cumulative value” becomes NULL. Look at the last row of the output.

update emp set ename = null where ename = 'ALLEN';

(repeat the query)
DNAME ENAME ENAMES
ACCOUNTING CLARK CLARK
ACCOUNTING KING CLARK,KING
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH ADAMS ADAMS
RESEARCH FORD ADAMS,FORD
RESEARCH JONES ADAMS,FORD,JONES
RESEARCH SCOTT ADAMS,FORD,JONES,SCOTT
RESEARCH SMITH ADAMS,FORD,JONES,SCOTT,SMITH
SALES BLAKE BLAKE
SALES JAMES BLAKE,JAMES
SALES MARTIN BLAKE,JAMES,MARTIN
SALES TURNER BLAKE,JAMES,MARTIN,TURNER
SALES WARD BLAKE,JAMES,MARTIN,TURNER,WARD
SALES
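
One possible NULL-tolerant variant (a sketch I am adding here, not part of the original workaround): replace ROW_NUMBER() with a running COUNT of the non-null values, so the running count never overshoots the number of items actually in the list.

```sql
select dname, ename,
  case when rn > 0
    then substr(enames, 1, instr(enames, ',', 1, rn) - 1)
  end enames
from (
  select de.*,
    listagg(ename, ',') within group(order by ename)
      over(partition by dname)
    || ','
    enames,
    -- COUNT(ename) skips NULL values, unlike ROW_NUMBER()
    count(ename) over(partition by dname order by ename) rn
  from denames de
);
```

Rows whose ENAME is NULL (ordered last by default) then get the list accumulated so far instead of NULL.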

 

Also, as of version 19c, LISTAGG() allows the DISTINCT keyword. If you want to use DISTINCT, then instead of ROW_NUMBER() use DENSE_RANK()!

update emp set ename = 'MARTIN' where ename is null;

select dname, ename,
  substr(enames, 1, instr(enames, ',', 1, rn) - 1) enames
from (
  select de.*,
    listagg(DISTINCT ename, ',') within group(order by ename)
      over(partition by dname)
    || ','
    enames,
    DENSE_RANK() over(partition by dname order by ename) rn
  from denames de
);
DNAME ENAME ENAMES
ACCOUNTING CLARK CLARK
ACCOUNTING KING CLARK,KING
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH ADAMS ADAMS
RESEARCH FORD ADAMS,FORD
RESEARCH JONES ADAMS,FORD,JONES
RESEARCH SCOTT ADAMS,FORD,JONES,SCOTT
RESEARCH SMITH ADAMS,FORD,JONES,SCOTT,SMITH
SALES BLAKE BLAKE
SALES JAMES BLAKE,JAMES
SALES MARTIN BLAKE,JAMES,MARTIN
SALES MARTIN BLAKE,JAMES,MARTIN
SALES TURNER BLAKE,JAMES,MARTIN,TURNER
SALES WARD BLAKE,JAMES,MARTIN,TURNER,WARD

 
With these precautions, why not use LISTAGG() as a cumulative function if you want? Hope this helps…

Interval expressions

I just learned about these critters thanks to an ODC forum question: how to calculate the difference between two dates in calendar years and months, days, hours, minutes and seconds.

What are they, and why are they?

Take any two datetime values (DATE, TIMESTAMP, TIMESTAMP WITH (LOCAL) TIME ZONE) and subtract one from the other: what do you get?

  • With two dates, you get the number of days.
  • With two timestamps, you get an INTERVAL DAY TO SECOND.
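
A quick illustration:

```sql
-- DATE minus DATE is a NUMBER of days:
select date '2022-01-02' - date '2022-01-01' diff from dual;  -- 1

-- TIMESTAMP minus TIMESTAMP is an INTERVAL DAY TO SECOND
-- (here, 1 day 12 hours):
select timestamp '2022-01-02 12:00:00'
     - timestamp '2022-01-01 00:00:00' diff
from dual;
```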

If you always want the interval, just put the two values in parentheses and add DAY TO SECOND. You can add YEAR TO MONTH instead to get an INTERVAL YEAR TO MONTH.

All the following combinations are OK:

select (sysdate - sysdate) day to second from dual;
select (sysdate - sysdate) year to month from dual;
select (sysdate - systimestamp) day to second from dual;
select (sysdate - systimestamp) year to month from dual;
select (systimestamp - sysdate) day to second from dual;
select (systimestamp - sysdate) year to month from dual;
select (systimestamp - systimestamp) day to second from dual;
select (systimestamp - systimestamp) year to month from dual;

Interval expressions are a handy way to get intervals (especially year-month intervals) from all datetime datatypes with the same code.

INTERVAL YEAR TO MONTH

INTERVAL YEAR TO MONTH is based on calendar years and months, so the “same” interval may cover a different number of days depending on the context. For example, adding INTERVAL '1' MONTH  to January 1 will add 31 days, but adding it to February 1, 2019 will add 28 days. In fact, sometimes it won’t add any days at all! Tell Oracle to add 1 month to January 31 and it will just say February 31 doesn’t exist. (Actually the error message is not quite that specific…).
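
To see this month-end behavior for yourself (a small sketch; note that ADD_MONTHS() behaves differently from interval arithmetic):

```sql
-- Interval arithmetic checks that the target day exists:
select date '2022-01-01' + interval '1' month from dual;  -- February 1

-- This one raises ORA-01839: date not valid for month specified
-- select date '2022-01-31' + interval '1' month from dual;

-- ADD_MONTHS() instead snaps to the last day of the target month:
select add_months(date '2022-01-31', 1) from dual;        -- February 28
```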

The other surprise is rounding: since both year and month portions are integers, the month portion is rounded according to a rule I don’t fully understand:

  • (DATE '2016-02-29' - DATE '2016-02-15') year to month is 0 months, whereas
  • (DATE '2016-03-01' - DATE '2016-02-16') year to month is 1 month.
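
Both bullets can be verified directly:

```sql
select (DATE '2016-02-29' - DATE '2016-02-15') year to month from dual; -- 0 months
select (DATE '2016-03-01' - DATE '2016-02-16') year to month from dual; -- 1 month
```

Both ranges span 14 days; only the month boundary differs.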

Keeping all this in mind, we can still use intervals and interval expressions to calculate the difference between two datetime values in calendar years and months, days, hours, minutes and seconds.

The answer, step by step

To split things up into smaller portions that are easier to understand, I’ll start with a view called QUESTION. We can change this view whenever we want different test data. Here I’ll test all possible date-only ranges (all time portions are 00:00:00) in the years 2015 through 2017.

create or replace view question as  
with dates(dte) as (  
  select date '2014-12-31' + level  
  from dual  
  connect by level <= 365+366+365  
)  
, ranges(date_from, date_to) as (  
  select a.dte, b.dte  
  from dates a, dates b  
  where a.dte < b.dte  
)  
select * from ranges;

My first draft of an answer is straightforward:

  • The year-month interval goes from the first day of the “from” month to the first day of the “to” month.
  • The day-second interval =
    • the day-second interval from date_from to date_to
    • minus the day-second interval from the first day of the “from” month to the first day of the “to” month.
create or replace view answer as  
select date_from, date_to,  
  (trunc(date_to,'mm') - trunc(date_from,'mm')) year(5) to month as iym,  
  (date_to - date_from) day(7) to second -  
    (trunc(date_to,'mm') - trunc(date_from,'mm')) day(7) to second  
  as ids  
from question;

This solution lets us do “round trips”: if we have date_from, iym and ids, we can calculate date_to, and if we have date_to, iym and ids we can calculate date_from.

select a.*,  
  case when ids < interval '0' second  
    then date_from + ids + iym  
    else date_from + iym + ids  
  end new_to,  
  case when ids < interval '0' second  
    then date_to - iym - ids  
    else date_to - ids - iym  
  end new_from  
from answer a;

The only problem is that IDS, which is the day-second interval, is negative almost half the time. We can use that fact to calculate properly, but users will find it strange.

In order to satisfy the users, here is a second draft:

  • When IDS is negative
    • subtract 1 from IYM
    • add to IDS the number of days in the “from” month

If we want to avoid ambiguous results, we still have to say when the original IDS was negative.
To simplify “round trips”, I decided to add an IDS_ADJUSTMENT column with the number of days in the “from” month.

create or replace view adjusted_answer as  
with details as (  
  select date_from, date_to,  
    (trunc(date_to,'mm') - trunc(date_from,'mm')) year(5) to month  
    as iym,  
    (date_to - date_from) day(7) to second -  
      (trunc(date_to,'mm') - trunc(date_from,'mm')) day(7) to second  
    as ids,  
    ((trunc(date_from,'mm') + interval '1' month) - trunc(date_from,'mm')) day to second  
    as from_month_days  
  from question  
)  
select date_from, date_to,  
  case when ids < interval '0' second  
    then iym - interval '1' month -- subtract 1 month  
    else iym  
  end iym,  
  case when ids < interval '0' second  
    then ids + from_month_days --add number of days in "from" month  
    else ids  
  end ids,  
  case when ids < interval '0' second  
    then from_month_days  
    else interval '0' second  
  end ids_adjustment    
from details;

This gives plausible results to the users. The same DATE_FROM, IYM and IDS combination will sometimes occur for two different DATE_TO values, but the IDS_ADJUSTMENT column will allow us to do “round trips”.

select a.*,  
  case when ids_adjustment > interval '0' second  
    then date_from  
      + (ids - ids_adjustment)  
      + (iym + interval '1' month)  
    else date_from + iym + ids  
  end new_to,  
  case when ids_adjustment > interval '0' second  
    then date_to  
      - (iym + interval '1' month)  
      - (ids - ids_adjustment)  
    else date_to - ids - iym  
  end new_from      
from adjusted_answer a;

Again, almost half the ranges will have a positive IDS_ADJUSTMENT value. We can eliminate that when the “from” month and the month preceding date_to have the same number of days. (See the BEFORE_TO_MONTH_DAYS expression and the added condition on IDS_ADJUSTMENT in the code below.)

create or replace view final_answer as  
with details as (  
  select date_from, date_to,  
    (trunc(date_to,'mm') - trunc(date_from,'mm')) year(5) to month  
    as iym,  
    (date_to - date_from) day(7) to second -  
      (trunc(date_to,'mm') - trunc(date_from,'mm')) day(7) to second  
    as ids,  
    ((trunc(date_from,'mm') + interval '1' month) - trunc(date_from,'mm')) day to second  
    as from_month_days,  
    (trunc(date_to,'mm') - (trunc(date_to,'mm') - interval '1' month)) day to second  
    as before_to_month_days  
  from question  
)  
select date_from, date_to,  
  case when ids < interval '0' second  
    then iym - interval '1' month -- subtract 1 month  
    else iym  
  end iym,  
  case when ids < interval '0' second  
    then ids + from_month_days --add number of days in "from" month  
    else ids  
  end ids,  
  case when ids < interval '0' second and from_month_days != before_to_month_days  
    then from_month_days  
    else interval '0' second  
  end ids_adjustment    
from details;

Now IDS_ADJUSTMENT will only be positive 25% of the time (with this test data!). We can do “round trips” the same way as with the “adjusted answer”.

As a bonus, thanks to “interval expressions” we can change the data types in the QUESTION view and the solution will still work with no changes to the code.
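
For example, here is a hypothetical QUESTION view built on timestamps (an untested sketch; the column names are kept so the other views compile unchanged):

```sql
-- Hedged sketch: hourly TIMESTAMP test data instead of daily DATEs.
-- Interval expressions work the same way on all datetime datatypes,
-- so the ANSWER views should not need any code changes.
create or replace view question as
with stamps(ts) as (
  select timestamp '2015-01-01 00:00:00'
       + numtodsinterval(level - 1, 'HOUR')
  from dual
  connect by level <= 48
)
, ranges(date_from, date_to) as (
  select a.ts, b.ts
  from stamps a, stamps b
  where a.ts < b.ts
)
select * from ranges;
```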

Conclusion

We can calculate a sensible difference (including years and months) between two dates, but unless we provide additional information we cannot calculate one date using the other date and the difference.

#DOAG2018: my presentations

I had a great time at DOAG 2018, both during the conference and in the evenings. Thanks to everyone who participated in my two sessions: it was a pleasure to share a bit of knowledge with you!

Here are links to my two presentations on Slideshare. Please download the files so you can see the animations.

Best regards,
Stew

https://www.slideshare.net/stewashton/json-in-18c-json-as-it-was-meant-to-be

https://www.slideshare.net/stewashton/make-your-data-dance-pivot-and-group-by-in-oracle-sql

Generate JOINs among related tables

After finding the shortest “foreign key” path between two tables, I’ll try to generate a SELECT statement that joins all the related tables.

(Please don’t tell me that Oracle SQL Developer does this: I know it does, as long as you already know the path and select all the tables yourself! I find a path for you and make sure the column names are unique. Despite these differences, you will probably never need this code if you use SQL Developer.)

  • The final SELECT starts with the bottom of the hierarchy and works its way up.
  • Table aliases are T1, T2 and so on.
  • All columns of all tables are selected (based on ALL_TAB_COLUMNS, not ALL_TAB_COLS).
  • To make columns unique, they are prefixed with the table alias: T1_* and so on.
  • Table names and column names are only enclosed in double quotes when they contain non-alphanumeric characters. As an exception, I always enclose the join columns in double quotes, just to save myself some trouble…
  • I use ANSI join syntax, and when I do the equality comparison I use parentheses like this:
    on (T1.COL1, T1.COL2) = ((T2.COL1, T2.COL2))
    It’s a bit easier to code. Don’t be surprised to see the parentheses even when there is only one column.

The code is more than 100 lines long. I’ll put my inadequate explanation first, then the code.

  • search_for: the owners and tables I want to find the path between.
  • pur: the constraints I need to start with.
  • relations: add to each constraint row the remote or parent table.
  • best_path: returns a string that contains all the data from all the rows in the path
  • best_path_tables: breaks down the string into the rows and columns of the path I found. I use REGEXP functions instead of JSON functions so this should work from version 11.2.
  • final_select: get all the columns for the final SELECT, accessing ALL_TAB_COLUMNS
  • final_from: get the FROM and JOIN clauses, accessing ALL_CONS_COLUMNS.

I’ll sign off before posting the code. I would say “hope this helps”, but I feel more like saying

Hope this works!

with search_for (owner, table_name) as (
  select 'OBE', 'OEHR_REGIONS' from dual union all
  select 'OBE', 'OEHR_ORDER_ITEMS' from dual
)
, pur as (
  select OWNER,   CONSTRAINT_NAME, TABLE_NAME,
       R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)
, relations as (
  select a.owner, a.table_name, a.constraint_name,
       a.r_owner, b.table_name r_table_name, a.r_constraint_name
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)
, best_path as (
  select path
  from (
    select row_number() over( order by
      case when (r_owner, r_table_name) 
            != ((connect_by_root(owner), connect_by_root(table_name)))
        then level end
      ) rn, 
      sys_connect_by_path(
      owner  ||'/'||table_name  ||'/'||  constraint_name||'/'||
      r_owner||'/'||r_table_name||'/'||r_constraint_name,'<') path
    from relations a
    where (r_owner, r_table_name) in (select * from search_for)
    start with (owner, table_name) in (select * from search_for)
    connect by nocycle (owner, table_name) = ((prior r_owner, prior r_table_name))
    and (owner, table_name) not in (select * from search_for)
  )
  where rn = 1
)
, best_path_tables as (
  select level lvl,
    'T'||level al,
    'T'||(level+1) r_al,
    regexp_substr(path, '<(.+?)/',            1, level,'i',1) owner,
    regexp_substr(path, '<(.+?/){1}(.+?)/',   1, level,'i',2) table_name,
    regexp_substr(path, '<(.+?/){2}(.+?)/',   1, level,'i',2) constraint_name,
    regexp_substr(path, '<(.+?/){3}(.+?)/',   1, level,'i',2) r_owner,
    regexp_substr(path, '<(.+?/){4}(.+?)/',   1, level,'i',2) r_table_name,
    regexp_substr(path, '<(.+?/){5}([^/<]+)', 1, level,'i',2) r_constraint_name
  from best_path
  connect by level <= regexp_count (path, '<')
)
, final_select as (
  select lvl, pref ||
  listagg(al||'.'||qu||column_name||qu||' '||qu||al||'_'||column_name||qu,',')
    within group(order by column_id) str
  from (
    select -99 lvl, 'select ' pref, al, column_name, column_id,
      case when not regexp_like (column_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end qu
    from best_path_tables join all_tab_columns using(owner, table_name)
    where lvl = 1
    union all
    select lvl-99, ',', r_al, column_name, column_id,
      case when not regexp_like (column_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end qu
    from best_path_tables b
    join all_tab_columns a on (b.r_owner, b.r_table_name) = ((a.owner, a.table_name))
  )
  group by lvl, pref
)
, final_from as (
  select 0 lvl, 'from '||oq||owner||oq||'.'||tq||table_name||tq||' '|| al str
  from (
    select a.*,
      case when not regexp_like (owner, '^[A-Z][A-Z0-9$#_]*$') then '"' end oq,
      case when not regexp_like (table_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end tq
      from best_path_tables a
      where lvl = 1
  ) a
  union all
  select lvl, 'join '||oq||r_owner||oq||'.'||tq||r_table_name||tq||' '|| r_al || ' on (' ||
  (
    select listagg(al||'."'||column_name||'"',',') within group(order by position)
    from all_cons_columns 
    where (owner, table_name, constraint_name) = ((a.owner, a.table_name, a.constraint_name))
  )
  || ') = ((' ||
  (
    select listagg(r_al||'."'||column_name||'"',',') within group(order by position)
    from all_cons_columns 
    where (owner, table_name, constraint_name) = ((a.r_owner, a.r_table_name, a.r_constraint_name))
  )
  || '))'
  from (
    select a.*,
      case when not regexp_like (r_owner, '^[A-Z][A-Z0-9$#_]*$') then '"' end oq,
      case when not regexp_like (r_table_name, '^[A-Z][A-Z0-9$#_]*$') then '"' end tq
      from best_path_tables a
  ) a
)
select str from (
  select * from final_select
  union all
  select * from final_from
)
order by lvl;
select T1.ORDER_ID T1_ORDER_ID,T1.LINE_ITEM_ID T1_LINE_ITEM_ID,T1.PRODUCT_ID T1_PRODUCT_ID,T1.UNIT_PRICE T1_UNIT_PRICE,T1.QUANTITY T1_QUANTITY,T1.ORDER_ITEM_ID T1_ORDER_ITEM_ID
,T2.ORDER_ID T2_ORDER_ID,T2.ORDER_DATE T2_ORDER_DATE,T2.ORDER_MODE T2_ORDER_MODE,T2.CUSTOMER_ID T2_CUSTOMER_ID,T2.ORDER_STATUS T2_ORDER_STATUS,T2.ORDER_TOTAL T2_ORDER_TOTAL,T2.SALES_REP_ID T2_SALES_REP_ID,T2.PROMOTION_ID T2_PROMOTION_ID
,T3.EMPLOYEE_ID T3_EMPLOYEE_ID,T3.FIRST_NAME T3_FIRST_NAME,T3.LAST_NAME T3_LAST_NAME,T3.EMAIL T3_EMAIL,T3.PHONE_NUMBER T3_PHONE_NUMBER,T3.HIRE_DATE T3_HIRE_DATE,T3.JOB_ID T3_JOB_ID,T3.SALARY T3_SALARY,T3.COMMISSION_PCT T3_COMMISSION_PCT,T3.MANAGER_ID T3_MANAGER_ID,T3.DEPARTMENT_ID T3_DEPARTMENT_ID
,T4.DEPARTMENT_ID T4_DEPARTMENT_ID,T4.DEPARTMENT_NAME T4_DEPARTMENT_NAME,T4.MANAGER_ID T4_MANAGER_ID,T4.LOCATION_ID T4_LOCATION_ID
,T5.LOCATION_ID T5_LOCATION_ID,T5.STREET_ADDRESS T5_STREET_ADDRESS,T5.POSTAL_CODE T5_POSTAL_CODE,T5.CITY T5_CITY,T5.STATE_PROVINCE T5_STATE_PROVINCE,T5.COUNTRY_ID T5_COUNTRY_ID
,T6.COUNTRY_ID T6_COUNTRY_ID,T6.COUNTRY_NAME T6_COUNTRY_NAME,T6.REGION_ID T6_REGION_ID
,T7.REGION_ID T7_REGION_ID,T7.REGION_NAME T7_REGION_NAME
from OBE.OEHR_ORDER_ITEMS T1
join OBE.OEHR_ORDERS T2 on (T1."ORDER_ID") = ((T2."ORDER_ID"))
join OBE.OEHR_EMPLOYEES T3 on (T2."SALES_REP_ID") = ((T3."EMPLOYEE_ID"))
join OBE.OEHR_DEPARTMENTS T4 on (T3."DEPARTMENT_ID") = ((T4."DEPARTMENT_ID"))
join OBE.OEHR_LOCATIONS T5 on (T4."LOCATION_ID") = ((T5."LOCATION_ID"))
join OBE.OEHR_COUNTRIES T6 on (T5."COUNTRY_ID") = ((T6."COUNTRY_ID"))
join OBE.OEHR_REGIONS T7 on (T6."REGION_ID") = ((T7."REGION_ID"))

Get all the Foreign Key Relationships

An old asktom question was recently highlighted on Twitter: All Parent – Child tables in the database. I contributed a new solution, but here is one I think is more complete.

The Requirement: foreign key relationships

The questioner wanted to show the hierarchy of “parent-child” relationships, starting from the top ancestor and going down to the last descendant. He asked for the entire database.

By “parent-child” we mean foreign key relationships: the “parent” has a primary key constraint or a unique constraint, and the “child” has a foreign key constraint that references the parent key.

I assume I should omit any table that is neither parent nor child. I also add an IS_CYCLE column: when it equals 1, CONNECT_BY_ISCYCLE detected a cycle that CONNECT BY NOCYCLE pruned at that row. Finally, I show whether the referenced key is Primary or Unique.

Building the solution

Since we are working on the whole database, DBA_CONSTRAINTS is the appropriate source. For just my schema, I can use USER_CONSTRAINTS instead. For readability, I’ll restrict the output to the HR schema.

The basic information we need includes the owner, constraint name, constraint type and table name, together with the referenced constraint and its owner. I include only constraints of type ‘P’ (Primary), ‘U’ (Unique), and ‘R’ (foreign key Reference). In recent versions (12.1 and later), I can restrict my query to owners that are not “Oracle maintained”, thanks to the ORACLE_MAINTAINED column of DBA_USERS.

with pur as (
  select OWNER, CONSTRAINT_NAME, CONSTRAINT_TYPE, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)
select * from pur;
OWNER CONSTRAINT_NAME CONSTRAINT_TYPE TABLE_NAME R_OWNER R_CONSTRAINT_NAME
HR COUNTRY_C_ID_PK P COUNTRIES
HR COUNTR_REG_FK R COUNTRIES HR REG_ID_PK
HR DEPT_ID_PK P DEPARTMENTS
HR DEPT_LOC_FK R DEPARTMENTS HR LOC_ID_PK
HR DEPT_MGR_FK R DEPARTMENTS HR EMP_EMP_ID_PK
HR EMP_EMP_ID_PK P EMPLOYEES
HR EMP_DEPT_FK R EMPLOYEES HR DEPT_ID_PK
HR EMP_JOB_FK R EMPLOYEES HR JOB_ID_PK
HR EMP_MANAGER_FK R EMPLOYEES HR EMP_EMP_ID_PK
HR EMP_EMAIL_UK U EMPLOYEES
HR JOB_ID_PK P JOBS
HR JHIST_EMP_ID_ST_DATE_PK P JOB_HISTORY
HR JHIST_DEPT_FK R JOB_HISTORY HR DEPT_ID_PK
HR JHIST_EMP_FK R JOB_HISTORY HR EMP_EMP_ID_PK
HR JHIST_JOB_FK R JOB_HISTORY HR JOB_ID_PK
HR LOC_ID_PK P LOCATIONS
HR LOC_C_ID_FK R LOCATIONS HR COUNTRY_C_ID_PK
HR REG_ID_PK P REGIONS

 

Now we need to add the table that each referenced constraint belongs to. I took the easy path and did a self join.

with pur as (
  select OWNER, CONSTRAINT_NAME, CONSTRAINT_TYPE, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)
, relations as (
  select a.owner, a.table_name, a.r_owner,
    b.table_name r_table_name,
    b.constraint_type r_constraint_type
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  --and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)
select * from relations;
OWNER TABLE_NAME R_OWNER R_TABLE_NAME R_CONSTRAINT_TYPE
HR COUNTRIES HR REGIONS P
HR DEPARTMENTS HR EMPLOYEES P
HR DEPARTMENTS HR LOCATIONS P
HR EMPLOYEES HR DEPARTMENTS P
HR EMPLOYEES HR EMPLOYEES P
HR EMPLOYEES HR JOBS P
HR JOB_HISTORY HR DEPARTMENTS P
HR JOB_HISTORY HR EMPLOYEES P
HR JOB_HISTORY HR JOBS P
HR LOCATIONS HR COUNTRIES P

 

Note that EMPLOYEES references itself.

UPDATE: I added a commented line at the end of RELATIONS. If you want to eliminate self-references, uncomment that line.

Now we need to identify the starting points for the hierarchical query. These are tables that are referenced, but do not reference any other table – though they can reference themselves. We’ll add those to the RELATIONS result set:

with pur as (
  select OWNER, CONSTRAINT_NAME, CONSTRAINT_TYPE, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)
, relations as (
  select a.owner, a.table_name, a.r_owner,
    b.table_name r_table_name,
    b.constraint_type r_constraint_type
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  -- and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)
, with_parents as (
  select * from relations
  union
  select r_owner, r_table_name, null, null, null
  from relations 
  where (r_owner, r_table_name) not in (
    select owner, table_name
    from relations
    where (owner, table_name) != ((r_owner, r_table_name))
  )
)
select * from with_parents;
OWNER TABLE_NAME R_OWNER R_TABLE_NAME R_CONSTRAINT_TYPE
HR COUNTRIES HR REGIONS P
HR DEPARTMENTS HR EMPLOYEES P
HR DEPARTMENTS HR LOCATIONS P
HR EMPLOYEES HR DEPARTMENTS P
HR EMPLOYEES HR EMPLOYEES P
HR EMPLOYEES HR JOBS P
HR JOBS
HR JOB_HISTORY HR DEPARTMENTS P
HR JOB_HISTORY HR EMPLOYEES P
HR JOB_HISTORY HR JOBS P
HR LOCATIONS HR COUNTRIES P
HR REGIONS
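
The WITH_PARENTS logic can be sketched in plain Python, just as an illustration: the tuples below restate the RELATIONS output above, and a "root" is a table that is referenced but appears as a child only in self-references.

```python
relations = [
    # (owner, table, r_owner, r_table) rows, matching the RELATIONS output
    ("HR", "COUNTRIES", "HR", "REGIONS"),
    ("HR", "DEPARTMENTS", "HR", "EMPLOYEES"),
    ("HR", "DEPARTMENTS", "HR", "LOCATIONS"),
    ("HR", "EMPLOYEES", "HR", "DEPARTMENTS"),
    ("HR", "EMPLOYEES", "HR", "EMPLOYEES"),  # self-reference
    ("HR", "EMPLOYEES", "HR", "JOBS"),
    ("HR", "JOB_HISTORY", "HR", "DEPARTMENTS"),
    ("HR", "JOB_HISTORY", "HR", "EMPLOYEES"),
    ("HR", "JOB_HISTORY", "HR", "JOBS"),
    ("HR", "LOCATIONS", "HR", "COUNTRIES"),
]

# Tables that are referenced by some foreign key
referenced = {(ro, rt) for (o, t, ro, rt) in relations}
# Tables that reference another table (self-references do not count),
# the same filter as the NOT IN subquery in WITH_PARENTS
children = {(o, t) for (o, t, ro, rt) in relations if (o, t) != (ro, rt)}
# Roots: referenced, but referencing nothing except possibly themselves
roots = sorted(referenced - children)
print(roots)  # [('HR', 'JOBS'), ('HR', 'REGIONS')]
```

JOBS and REGIONS come out as roots, which is exactly where the UNION adds the two extra rows with null R_OWNER.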

 

Finally, we can start from the rows where R_OWNER is null and go down the hierarchy:

with pur as (
  select OWNER, CONSTRAINT_NAME, CONSTRAINT_TYPE, TABLE_NAME,
    R_OWNER, R_CONSTRAINT_NAME
  from dba_constraints
  where constraint_type in('P','U','R')
  and owner in (select username from dba_users where oracle_maintained = 'N')
)
, relations as (
  select a.owner, a.table_name, a.r_owner,
    b.table_name r_table_name,
    b.constraint_type r_constraint_type
  from pur a join pur b
  on (a.R_OWNER, a.R_CONSTRAINT_NAME) = ((b.OWNER, b.CONSTRAINT_NAME))
  -- and (a.OWNER, a.table_name) != ((b.OWNER, b.table_name))
)
, with_parents as (
  select * from relations
  union
  select r_owner, r_table_name, null, null, null
  from relations 
  where (r_owner, r_table_name) not in (
    select owner, table_name
    from relations
    where (owner, table_name) != ((r_owner, r_table_name))
  )
)
select level LVL, OWNER, TABLE_NAME, R_OWNER, R_TABLE_NAME, R_CONSTRAINT_TYPE,
  connect_by_iscycle is_cycle
from with_parents
start with r_owner is null
connect by nocycle (r_owner, r_table_name) = ((prior owner, prior table_name))
order siblings by owner, table_name;
LVL OWNER TABLE_NAME R_OWNER R_TABLE_NAME R_CONSTRAINT_TYPE IS_CYCLE
1 HR JOBS 0
2 HR EMPLOYEES HR JOBS P 1
3 HR DEPARTMENTS HR EMPLOYEES P 1
4 HR JOB_HISTORY HR DEPARTMENTS P 0
3 HR JOB_HISTORY HR EMPLOYEES P 0
2 HR JOB_HISTORY HR JOBS P 0
1 HR REGIONS 0
2 HR COUNTRIES HR REGIONS P 0
3 HR LOCATIONS HR COUNTRIES P 0
4 HR DEPARTMENTS HR LOCATIONS P 0
5 HR EMPLOYEES HR DEPARTMENTS P 1
6 HR JOB_HISTORY HR EMPLOYEES P 0
5 HR JOB_HISTORY HR DEPARTMENTS P 0
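
For readers who want to see what CONNECT BY NOCYCLE and CONNECT_BY_ISCYCLE are doing here, the walk down the hierarchy can be sketched as a depth-first search in Python. This is purely an illustration (table names only, and a hypothetical hierarchy() function), not part of the SQL solution.

```python
edges = [  # (child, parent) table pairs, restating the HR relations above
    ("COUNTRIES", "REGIONS"), ("DEPARTMENTS", "EMPLOYEES"),
    ("DEPARTMENTS", "LOCATIONS"), ("EMPLOYEES", "DEPARTMENTS"),
    ("EMPLOYEES", "EMPLOYEES"), ("EMPLOYEES", "JOBS"),
    ("JOB_HISTORY", "DEPARTMENTS"), ("JOB_HISTORY", "EMPLOYEES"),
    ("JOB_HISTORY", "JOBS"), ("LOCATIONS", "COUNTRIES"),
]

def hierarchy(root):
    """Rows of (LVL, TABLE_NAME, IS_CYCLE), like the hierarchical query."""
    rows = []
    def visit(node, path, level):
        # sorted() plays the role of ORDER SIBLINGS BY
        kids = sorted(c for c, p in edges if p == node)
        # CONNECT_BY_ISCYCLE = 1 on the row whose children NOCYCLE pruned
        is_cycle = 1 if any(k == node or k in path for k in kids) else 0
        rows.append((level, node, is_cycle))
        for k in kids:
            if k != node and k not in path:  # NOCYCLE: skip ancestors
                visit(k, path | {node}, level + 1)
    visit(root, frozenset(), 1)
    return rows
```

Starting from JOBS reproduces the first seven rows of the output above: EMPLOYEES is flagged because of its self-reference, and DEPARTMENTS is flagged because its child EMPLOYEES is already an ancestor.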

 

Hope this helps…

More on Comparing Columns across Rows

My previous post got several replies about other ways to compare columns across rows. Here’s my take on all of them.

Test data (again)

create table t as
with data as (
  select level n from dual connect by level <= 3
)
, tests as (
  select row_number() over (order by a.n, b.n) test_id,
  nullif(a.n, 3) a, 
  nullif(b.n, 3) b,
  case a.n when b.n then 'Same' else 'Different' end status
  from data a, data b
)
select test_id, tbl, n, status
from tests
unpivot include nulls (n for tbl in(A,B));

select * from t order by 1,2;
TEST_ID TBL N STATUS
1 A 1 Same
1 B 1 Same
2 A 1 Different
2 B 2 Different
3 A 1 Different
3 B Different
4 A 2 Different
4 B 1 Different
5 A 2 Same
5 B 2 Same
6 A 2 Different
6 B Different
7 A Different
7 B 1 Different
8 A Different
8 B 2 Different
9 A Same
9 B Same
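
Just to make the NULLIF trick explicit, here is an illustrative Python sketch that generates the same 18 rows: every pair (a, b) over {1, 2, 3}, with 3 playing the role of NULL, and STATUS computed on the raw values before the mapping.

```python
# Every pair (a, b) over {1, 2, 3}; NULLIF(n, 3) maps 3 to NULL,
# but STATUS compares the raw values, so (3, 3) counts as 'Same'.
rows = []
pairs = [(a, b) for a in (1, 2, 3) for b in (1, 2, 3)]
for test_id, (a, b) in enumerate(pairs, start=1):
    status = "Same" if a == b else "Different"
    for tbl, n in (("A", a), ("B", b)):
        rows.append((test_id, tbl, None if n == 3 else n, status))
```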

 

My solution (again)

It looks like the most concise test for a difference is COUNT(N) BETWEEN 1 and COUNT(DISTINCT N).

select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n) over(partition by test_id)
  between 1 and count(distinct n) over(partition by test_id)
  then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Same
9 B Same 0 0 Same
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same
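
To see why the BETWEEN test works, here is the case analysis as a Python sketch (an illustration of the logic, not the SQL): both COUNT(N) and COUNT(DISTINCT N) ignore nulls.

```python
def status(values):
    """Apply COUNT(N) BETWEEN 1 AND COUNT(DISTINCT N) to one TEST_ID group."""
    non_null = [v for v in values if v is not None]  # COUNT() skips nulls
    cnt, cnt_distinct = len(non_null), len(set(non_null))
    # cnt = 0: both null            -> Same
    # cnt = 1: exactly one null     -> Different
    # cnt = 2: Different only when the two non-null values are distinct
    return "Different" if 1 <= cnt <= cnt_distinct else "Same"
```

For example, status([1, 1]) and status([None, None]) return 'Same', while status([1, 2]) and status([1, None]) return 'Different' - matching the NEW_STATUS column above, including test case 9.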

 

Using the DUMP() function (Andrej Pashchenko)

Since the DUMP() function returns a non-null result in every case, we can simply test

COUNT( DISTINCT DUMP(n) ) OVER (PARTITION BY TEST_ID)

If the result is 1, N is the same, otherwise it is different.

This solution is more concise, but in my tests it ran 10 times slower. The DUMP() function chews up some CPU.

Using SYS_OP_MAP_NONNULL() (Tony Hasler)

As with DUMP(), we can test

COUNT( DISTINCT SYS_OP_MAP_NONNULL(n) ) OVER (PARTITION BY TEST_ID)

Again a concise solution. Surprisingly, it runs twice as slowly as my solution – and despite one mention in the documentation, I doubt that this function is supported.
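
Both DUMP() and SYS_OP_MAP_NONNULL() work the same way: they map NULL to a non-null value, so COUNT(DISTINCT ...) finally "sees" the nulls. A Python sketch, with a hypothetical sentinel object standing in for the mapped null:

```python
NULL_SENTINEL = object()  # stands in for DUMP(NULL) / SYS_OP_MAP_NONNULL(NULL)

def status(values):
    """One distinct mapped value means 'Same', as with the SQL expressions."""
    mapped = {NULL_SENTINEL if v is None else v for v in values}
    return "Same" if len(mapped) == 1 else "Different"
```

Now status([None, None]) correctly returns 'Same', because both nulls map to the same non-null value, while status([1, None]) returns 'Different'.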

Simplifying the comparison

Zahar Hilkevich suggested simply comparing the two counts:

select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n)          over(partition by test_id)
        = count(distinct n) over(partition by test_id)
  then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Different
9 B Same 0 0 Different
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same

 

Notice that in test case 9, when both values are null, Zahar’s test says they are “different”. This is not a problem in the context of displaying different values, since the output values will be null no matter what.

Conclusion

Thanks to those who mentioned other more concise solutions. Despite my taste for concision, I think the solution I proposed is correct and efficient. The choice is yours!

My #ilOUG18 presentations

I just finished presenting at the Israeli OUG conference. The conference was a great experience, from the Sunday visit to Jerusalem to the two days of presentations. Thanks to the organizers for inviting me and taking great care of everyone!

Slideshare does not allow me to upload a new version of my slides, so here are links to the versions I presented during 2017. I made no important changes for Israel. If you were at the conference and want the very latest version, please check the ILOUG website.

Safe trips home to everyone!

Please download to see the animations; they will help point you to the right elements as you move through the slides.

Enjoy and best regards, Stew

My #ukoug_tech17 presentation on Advanced Row Pattern Matching

I just finished presenting on this topic, had a great time and got some memorable praise. Thanks again to all who attended for your benevolent attention – and your correct answers to my questions!

As promised, the absolutely freshest copy of the slides is now on the Conference website and on slideshare:

For those interested, here is the prequel that is not “advanced”:

Please download to see the animations; they will help point you to the right elements as you move through the slides.

Enjoy and best regards, Stew