Multi-Table Inserts with Sequences

Multi-table inserts are a drag when sequences are involved, because we don’t have enough control over when the sequence is incremented. I’ve seen some dangerous workarounds. Here is a workaround that I think is safe.

[Update  2016-02-12: paulzip caught me assuming the output would be ordered without an ORDER BY. I made the correction.]

DEPT and EMP from XML

Here is a snippet from a typical XML document with DEPT and EMP information.

...
<DEPTS>
    <DEPT>
        <DNAME>ACCOUNTING</DNAME>
        <LOC>NEW YORK</LOC>
        <EMPS>
            <EMP>
                <ENAME>CLARK</ENAME>
                <JOB>MANAGER</JOB>
                <HIREDATE>1981-06-09</HIREDATE>
                <SAL>2450</SAL>
            </EMP>
            <EMP>
                <ENAME>MILLER</ENAME>
                <JOB>CLERK</JOB>
                <HIREDATE>1982-01-23</HIREDATE>
                <SAL>1300</SAL>
            </EMP>
...
        </EMPS>
    </DEPT>
...
</DEPTS>

The fact that CLARK works in the ACCOUNTING department is indicated by the placement of CLARK inside the path of ACCOUNTING. (To simplify the problem, I leave out MGR.)

The following view will generate the XML content from the SCOTT schema, leaving out DEPTNO, EMPNO and MGR.

create or replace view dept_emp_xml as
with emps as (
  select deptno,
    xmlagg(
      xmlelement(
        "EMP",
        xmlforest(ENAME, JOB, HIREDATE, SAL, COMM)
      )
    ) emps
  from scott.emp
  group by deptno
)
select xmlelement(
  "DEPTS",
  xmlagg(
    xmlelement(
      "DEPT",
      xmlforest(DNAME, LOC, EMPS)
    )
  )
) depts
from scott.dept
left join emps using(deptno);

From XML to Relational

Let’s say I want to load this data to relational tables. I can use the XMLTABLE() function to convert the data to a result set:

create or replace view dept_emp_rel as
select deptno, DNAME, LOC, RN, 0 EMPNO, ENAME, JOB, HIREDATE, SAL, COMM
from dept_emp_xml a,
XMLTABLE(
  '$X/DEPTS/DEPT' PASSING DEPTS AS X
  COLUMNS
    deptno for ordinality,
    DNAME VARCHAR2(63) path 'DNAME',
    LOC VARCHAR2(63) path 'LOC',
    EMPS xmltype path 'EMPS'
) (+) b,
xmltable(
  '/EMPS/EMP' passing b.EMPS
  columns
    rn for ordinality,
    ENAME VARCHAR2(63) path 'ENAME',
    JOB VARCHAR2(63) path 'JOB',
    HIREDATE date path 'HIREDATE',
    SAL number path 'SAL',
    COMM number path 'COMM'
) (+) c;
DEPTNO DNAME LOC RN EMPNO ENAME JOB HIREDATE SAL COMM
1 ACCOUNTING NEW YORK 1 0 CLARK MANAGER 1981-06-09 2450
1 ACCOUNTING NEW YORK 2 0 MILLER CLERK 1982-01-23 1300
1 ACCOUNTING NEW YORK 3 0 KING PRESIDENT 1981-11-17 5000
2 RESEARCH DALLAS 1 0 SMITH CLERK 1980-12-17 800
2 RESEARCH DALLAS 2 0 FORD ANALYST 1981-12-03 3000
2 RESEARCH DALLAS 3 0 ADAMS CLERK 1987-05-23 1100
2 RESEARCH DALLAS 4 0 SCOTT ANALYST 1987-04-19 3000
2 RESEARCH DALLAS 5 0 JONES MANAGER 1981-04-02 2975
3 SALES CHICAGO 1 0 ALLEN SALESMAN 1981-02-20 1600 300
3 SALES CHICAGO 2 0 JAMES CLERK 1981-12-03 950
3 SALES CHICAGO 3 0 TURNER SALESMAN 1981-09-08 1500 0
3 SALES CHICAGO 4 0 BLAKE MANAGER 1981-05-01 2850
3 SALES CHICAGO 5 0 MARTIN SALESMAN 1981-09-28 1250 1400
3 SALES CHICAGO 6 0 WARD SALESMAN 1981-02-22 1250 500
4 OPERATIONS BOSTON 0

Loading the Result Set

Let’s imagine for a moment that the result set contains valid DEPTNO and EMPNO values. If that were true, we could use a multi-table insert to load the DEPT and EMP records. The important thing is to have the correct DEPTNO value in every row, so that the foreign key in EMP is correct.

drop table dept purge;
create table dept as
select * from scott.dept where 1=0;

drop table emp purge;
create table emp as
select EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO
from scott.emp
where 1=0;

insert all
when rn = 1 or rn is null then
  into dept(deptno, dname, loc)
     values(deptno, dname, loc)
when rn is not null then
  into emp(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
    values(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
select * from dept_emp_rel order by deptno, rn;

select * from dept
left join emp using(deptno)
order by deptno;
DEPTNO DNAME LOC EMPNO ENAME JOB HIREDATE SAL COMM
1 ACCOUNTING NEW YORK 0 CLARK MANAGER 1981-06-09 2450
1 ACCOUNTING NEW YORK 0 KING PRESIDENT 1981-11-17 5000
1 ACCOUNTING NEW YORK 0 MILLER CLERK 1982-01-23 1300
2 RESEARCH DALLAS 0 SCOTT ANALYST 1987-04-19 3000
2 RESEARCH DALLAS 0 ADAMS CLERK 1987-05-23 1100
2 RESEARCH DALLAS 0 FORD ANALYST 1981-12-03 3000
2 RESEARCH DALLAS 0 SMITH CLERK 1980-12-17 800
2 RESEARCH DALLAS 0 JONES MANAGER 1981-04-02 2975
3 SALES CHICAGO 0 ALLEN SALESMAN 1981-02-20 1600 300
3 SALES CHICAGO 0 JAMES CLERK 1981-12-03 950
3 SALES CHICAGO 0 TURNER SALESMAN 1981-09-08 1500 0
3 SALES CHICAGO 0 BLAKE MANAGER 1981-05-01 2850
3 SALES CHICAGO 0 MARTIN SALESMAN 1981-09-28 1250 1400
3 SALES CHICAGO 0 WARD SALESMAN 1981-02-22 1250 500
4 OPERATIONS BOSTON

Handling the Sequence!

The problem with sequences in my INSERT ALL statement is that any sequence will be incremented for every row returned from the SELECT. This is the problem people try to work around. Often, the idea is to increment the sequence inside a function. Unfortunately, there is never a guarantee that the function will be called exactly when we want.

My suggestion is to use a pipelined table function, which gives us complete control over when the sequence is incremented. The function takes the result of the XMLTABLE() query and manages the sequences as follows:

  1. When the incoming DEPTNO value changes, increment the DEPT sequence.
  2. When RN is not null, increment the EMP sequence.

The output of the function is identical to the input, except that DEPTNO and EMPNO are updated from the sequences.

drop sequence dept_seq;
create sequence dept_seq;

drop sequence emp_seq;
create sequence emp_seq;

create or replace package dept_emp as
  type tt_depts is table of dept_emp_rel%rowtype;
  function add_seq return tt_depts pipelined;
end dept_emp;
/

create or replace package body dept_emp as
  function add_seq return tt_depts pipelined is
    l_prev_deptno dept_emp_rel.deptno%type := 0;
    l_seqno integer;
  begin
    for rec in (select * from dept_emp_rel order by deptno, rn) loop
      if rec.deptno > l_prev_deptno then
        l_prev_deptno := rec.deptno;
        l_seqno := dept_seq.nextval;
      end if;
      rec.deptno := l_seqno;
      if rec.rn is null then
        rec.empno := null;
      else
        rec.empno := emp_seq.nextval;
      end if;
      pipe row(rec);
    end loop;
  end add_seq;
end dept_emp;
/
select * from table(dept_emp.add_seq);
DEPTNO DNAME LOC RN EMPNO ENAME JOB HIREDATE SAL COMM
1 ACCOUNTING NEW YORK 1 1 CLARK MANAGER 1981-06-09 2450
1 ACCOUNTING NEW YORK 2 2 MILLER CLERK 1982-01-23 1300
1 ACCOUNTING NEW YORK 3 3 KING PRESIDENT 1981-11-17 5000
2 RESEARCH DALLAS 1 4 SMITH CLERK 1980-12-17 800
2 RESEARCH DALLAS 2 5 FORD ANALYST 1981-12-03 3000
2 RESEARCH DALLAS 3 6 ADAMS CLERK 1987-05-23 1100
2 RESEARCH DALLAS 4 7 SCOTT ANALYST 1987-04-19 3000
2 RESEARCH DALLAS 5 8 JONES MANAGER 1981-04-02 2975
3 SALES CHICAGO 1 9 ALLEN SALESMAN 1981-02-20 1600 300
3 SALES CHICAGO 2 10 JAMES CLERK 1981-12-03 950
3 SALES CHICAGO 3 11 TURNER SALESMAN 1981-09-08 1500 0
3 SALES CHICAGO 4 12 BLAKE MANAGER 1981-05-01 2850
3 SALES CHICAGO 5 13 MARTIN SALESMAN 1981-09-28 1250 1400
3 SALES CHICAGO 6 14 WARD SALESMAN 1981-02-22 1250 500
4 OPERATIONS BOSTON

And now my INSERT ALL statement works like a charm:

insert all
when rn = 1 or rn is null then
  into dept(deptno, dname, loc)
     values(deptno, dname, loc)
when rn is not null then
  into emp(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
    values(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
select * from table(dept_emp.add_seq);

Conclusion

When foreign keys reference primary keys that are themselves generated from sequences, loading the parent and child tables together is a great idea – if we can increment those sequences correctly. A simple pipelined function gives us the control we need with few lines of code and little overhead.

Disappearing Data

Did you know that Oracle DML can silently change data you want to put into the database? A recent twitter discussion started by Martin Widlake (@MDWidlake) prompted me to study more closely what happens when.

Number, precision and scale

When we define a number, we can specify its “precision” (the maximum number of significant decimal digits) and its “scale” (the number of digits from the decimal point to the least significant digit). Here are some examples:

  • NUMBER(2,0): an integer in the range of -99 through 99.
  • NUMBER(2,1): a number from -9.9 through 9.9, always rounded to tenths.
  • NUMBER(2,-1): a number from -990 through 990, always rounded to tens.

DML complains about precision

Suppose I have a table column defined as NUMBER(2,0) and I try to put the value 100 in there. There is no room for that value, so Oracle will raise an exception:

ORA-01438: value larger than specified precision allowed for this column

DML complains about scale: NO, it rounds

Now suppose I try to put the value 99.4 into my NUMBER(2,0) column. Oracle will not raise an exception: it will silently round the value to 99 and insert that! The input data is changed with no warning.

The Oracle documentation is clear on the subject:

If a value exceeds the precision, then Oracle returns an error. If a value exceeds the scale, then Oracle rounds it.

What about other data types?

There are two other data types that are subject to rounding: TIMESTAMP (with or without time zone) and INTERVAL DAY TO SECOND. We can specify the “precision” of the fractional seconds. For example:

  • TIMESTAMP(1) only allows one digit after the decimal point for seconds
  • INTERVAL DAY TO SECOND(3) allows up to 3 digits after the decimal point.

Again, if we try to put in a more precise value, Oracle will silently round to make the data fit.

An apparent exception: TIMESTAMP to DATE

Going from TIMESTAMP to DATE is not rounding, it is a datatype conversion. When converting to a DATE, any fractional seconds are discarded. The effect is truncation rather than rounding.

If you want rounding to occur, then CAST the timestamp to TIMESTAMP(0) first.

select systimestamp right_now,
cast(systimestamp as date) truncated,
cast(cast(systimestamp as timestamp(0)) as date) rounded
from dual;
RIGHT_NOW TRUNCATED ROUNDED
2017-01-06 13:24:19.606274 +01:00 2017-01-06 13:24:19 2017-01-06 13:24:20

 

What does this have to do with Twitter?

In the recent twitter conversation that Martin started, there were two suggestions:

  1. Create a new datatype (called DAY, perhaps) that would be a valid date without a time component
  2. Allow for “scale” in the DATE datatype: allow it to contain only the year, or only year+month, or only year+month+day, etc.

Supposing Oracle managed these data types as it does the current types, then in case 1. converting from DATE to DAY would cause truncation of the time component, whereas in case 2. silent rounding would occur.

I think “rounding” to the nearest day, month or year would be strange, don’t you?

#DOAG2016: Bug !

I did it: in my proudest moment at DOAG2016, at the very end of my second presentation, I presented a “brilliant” SQL query with a bug in it. After agonizing for awhile, I finally saw that the correction was simple and didn’t change the timing.

If you were there, the bug was in slide 21 where I joined two tables with overlapping ranges. For some reason I thought I should order the processing by the end of the range (which sometimes works) instead of by the start of the range (which always works – so far).

I have uploaded the corrected presentation to the DOAG website and to slideshare. I will be blogging about the content, so stay tuned.

Here is the link to the presentation on slideshare:

Apologies for the bug…

#DOAG2016: Ranges, Ranges Everywhere!

Today I did my second presentation at DOAG2016. It was at 9:00 so I got to sleep in ;)

The room was huge but there were enough people that I didn’t feel too lonely.

The room and the technical help were top notch, and again there were questions at just the right time to remind me of things I might have left out!

[Update 2016-11-20: there was a bug on slide 21 ! Instead of “order by end_n” it should be “order by start_n”. I have updated the slide on the DOAG website and on slideshare. My test cases were not thorough; my apologies…]

As promised, I put the presentation on Slideshare. Here is the link:

I will be blogging about some of the content later on, so stay tuned…

As always, please download the slides and play them in Powerpoint so you can see the animations.

Thanks to DOAG for the invite!

#DOAG2016: Advanced Row Pattern Matching

DOAG2016 started today at 8:30, and I did too. There were so many great presentations at the same time as mine, I was surprised and pleased to get a nice audience.

The room and the technical help were top notch, and the questions came at just the right time to remind me of things I might have left out!

As promised, I put the presentation on Slideshare. Here is the link:

http://www.slideshare.net/StewAshton1/advanced-row-pattern-matching

I will be blogging about some of the content later on, so stay tuned…

If you want to start with an “unadvanced” presentation, go here first:

http://www.slideshare.net/StewAshton1/row-pattern-matching-in-oracle-database-12c

As always, please download the slides and play them in Powerpoint so you can see the animations.

Thanks to DOAG for the invite!