More on Optimistic Locking with ORA_ROWSCN

Thanks to comments by Tony Hasler and pingbacks from Jeff Kemp, here’s more detail on how optimistic locking works with SCNs, especially with respect to “restarts” during update.

(I assume you have already read Avoiding Lost Updates with ORA_ROWSCN and Optimistic Locking: One SCN to rule them all before reading this post.)

Jeff’s post lists several methods for dealing with lost updates.

  • One of them, the “column compare” method, is different from the others. Suppose application A updates the DNAME column in DEPT, and application B updates the LOC column. If A checks only DNAME for a conflicting change, and B checks only LOC, then both applications can modify the same row independently.
  • The other methods, based on hashes, timestamps or version numbers, stop two sessions from updating the same row even if the columns involved are different.

So how does the “SCN” method behave? Well, it depends on whether there is a “restart”.

Again, suppose A in one session wants to update DEPT.DNAME and B in another session wants to update DEPT.LOC.
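To make the scenarios concrete, here is roughly what each session's UPDATE might look like, using the COALESCE trick from the first post (a sketch; the :scn bind holds the SCN from the earlier query):

-- A's update: checks the SCN, and names DNAME so that a blocked
-- update will restart and re-evaluate ORA_ROWSCN
update dept set dname = :new_dname
where deptno = 10
and ora_rowscn <= :scn
and coalesce('@',dname) is not null;

-- B's update: the same pattern, but naming LOC
update dept set loc = :new_loc
where deptno = 10
and ora_rowscn <= :scn
and coalesce('@',loc) is not null;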

Scenario 1

  1. A: query DEPT where DEPTNO=10 and get current SCN
  2. B: query DEPT where DEPTNO=10 and get current SCN
  3. B: update LOC
  4. B: commit
  5. A: update DNAME, referring to the DNAME column and checking the SCN
    Result: since the ORA_ROWSCN has changed, the update is refused.

Scenario 2

  1. A: query DEPT where DEPTNO=10 and get current SCN
  2. B: query DEPT where DEPTNO=10 and get current SCN
  3. B: update LOC
  4. A: update DNAME, referring to the DNAME column and checking the SCN, which appears OK.
    The update will be blocked waiting for B to commit.
  5. B: COMMIT
  6. A: the update is unblocked and Oracle checks to see if the DNAME column has been updated. Since it has not, there is no restart and the ORA_ROWSCN pseudocolumn is not re-evaluated.
    Result: the update is allowed!

So, the behavior of the SCN method is inconsistent: when the conflicting update is still uncommitted, so that a restart could occur (Scenario 2), applications can update different columns in the same row; when the conflicting update has already committed (Scenario 1), they cannot.

Scenario 3

This scenario is just to emphasize that there is no “restart” if B rolls back.

  1. A: query DEPT where DEPTNO=10 and get current SCN
  2. B: query DEPT where DEPTNO=10 and get current SCN
  3. B: update LOC and DNAME
  4. A: update DNAME, referring to the DNAME column and checking the SCN, which appears OK.
    The update will be blocked waiting for B to commit.
  5. B: ROLLBACK
  6. A: the update is unblocked and Oracle checks to see if the DNAME column has been changed. Since it has not (because of the rollback), there is no restart.
    Result: the update is allowed.

Dealing with “false positives”

If you want to allow independent updates to different columns in the same row, the SCN method will usually not work. It only works in a restart scenario, since you can decide which columns will trigger a restart. Even then, you know a column has been updated, but you don't know whether its value is actually different: someone may have updated the column to the same value.

If that is truly a problem for you, our famous SCN can come to the rescue. When a row is not updated, do a flashback query (“AS OF SCN <your SCN>”) on that row. You can then do a “column compare” to see whether the changes concern you or not.
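Here is a sketch of that double check, assuming :scn holds the SCN from the original query (DECODE is handy here because it treats two NULLs as a match):

select decode(cur.dname, old.dname, 'same', 'changed') dname_status,
       decode(cur.loc,   old.loc,   'same', 'changed') loc_status
from dept cur,
     (select dname, loc from dept as of scn :scn where deptno = 10) old
where cur.deptno = 10;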

By the way, if you use the SCN method on a table without ROWDEPENDENCIES you risk getting “false positives” even when the row of interest has not been updated. Again, you can double-check with a flashback query and a “column compare”.

Conclusion

The SCN method will warn of a “lost update”:

  • If the table does not have ROWDEPENDENCIES: when any row in the block has been updated, deleted or inserted.
  • If the table has ROWDEPENDENCIES and no restart is involved: when any column in the same row has been updated.
  • If the table has ROWDEPENDENCIES and a restart is involved: when the columns listed in the WHERE clause of the update have been updated.

To make sure the warning is not a “false positive”, use the SCN in a flashback query, compare the columns of interest and then decide whether to redo the update.

With that extra code, the SCN method can be used with confidence, even on tables without ROWDEPENDENCIES.

If your table has ROWDEPENDENCIES and you accept that two sessions cannot update the same row at the same time no matter what, then you don’t need that extra code.


Optimistic Locking: One SCN to rule them all

Previously I showed how to avoid lost updates with ORA_ROWSCN. Now let’s create an API that avoids lost updates with just one SCN.

What kind of API?

A transaction consists of one or more changes to data that should happen together: either all should happen or none. When the transaction commits, all the changes happen; when the transaction rolls back, none of them happen. This is the “atomic” part of ACID, the set of properties of database transactions.

A “statement”, be it SQL or a call to the PL/SQL engine, is also atomic:

  • When a “client” calls the database and the call returns with an exception, then any changes made during the call are automatically rolled back. All the data is exactly what it was before the call was made.
  • Of course, when everything goes well the changes are still there.
  • In both cases, the client must either COMMIT or ROLLBACK whatever changes remain that are within the scope of the transaction.

This “statement level atomicity” is guaranteed as long as no COMMITs or ROLLBACKs are executed during the call, and as long as the exception is allowed to return to the client.

So there are two kinds of atomicity we can leverage: at the statement level and the transaction level.

Either way, when a client calls an “atomic API”, it must provide all the input necessary to make all the changes that form a logical unit. If the client has to make more than one call, then the API is obviously not “atomic”.

How does optimistic locking fit in?

In order to do optimistic locking, there is a two-step process:

  1. The application gets data from the “old” rows.
    Since the rows must be checked later for changes, row-based version numbers or checksums may accompany the data.
  2. The application sends the “new” data back, together with whatever “old” data is needed to check the rows for changes, and the checking is executed within the UPDATE statement.

I suggest that for each API that updates the database, we provide a companion API that supplies the “old” data in the first place.

Getting the “old” data – and one SCN

Ideally, the “old” data should be read consistent: all the data should have existed at the same point in time in the database. Oracle guarantees this consistency within one SQL query, but it does not normally guarantee it across queries. To get read consistency across queries, use a “read-only transaction” or flashback queries.
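As a hedged sketch, assuming EXECUTE on DBMS_FLASHBACK has been granted: get the SCN first, then run every query “as of” that SCN, so all the data comes from the same point in time.

declare
    l_scn   number;
    l_dname dept.dname%type;
    l_loc   dept.loc%type;
begin
    l_scn := dbms_flashback.get_system_change_number;
    -- this query (and any further ones) is consistent as of the same SCN
    select dname, loc into l_dname, l_loc
    from dept as of scn l_scn
    where deptno = 10;
    -- ...further queries "as of scn l_scn" on other tables...
end;
/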

However we do it, if we get read consistent data we can also get the SCN used by our query or queries. If any of the rows change afterwards, the ORA_ROWSCN of those rows must be greater than that SCN. We can use the SCN like a universal version number to check for any changes to any of the rows we queried.

Example: getting the data

I’m going to demonstrate using the 12cR2 SQL/JSON functions, because they allow me to get all the data I need for one logical unit in a single SQL statement, even if the rows involved are not related and have different columns.

In my stupid example, I assume a user wants to manually update the departments in the DEPT table: the DNAME and LOC columns should resemble the DEPARTMENT_NAME and CITY columns from the DEPARTMENTS and LOCATIONS tables in the HR schema.

with json_departments as(
    select json_arrayagg(
        json_object(
            'DEPARTMENT_ID' value DEPARTMENT_ID,
            'DEPARTMENT_NAME' value DEPARTMENT_NAME,
            'CITY' value CITY,
            'STATE_PROVINCE' value STATE_PROVINCE,
            'COUNTRY_ID' value COUNTRY_ID
        )
        order by DEPARTMENT_ID
    ) departments
    from hr.departments join hr.locations using(location_id)
    where DEPARTMENT_ID <= 40
)
, json_depts as (
    select json_arrayagg(
        json_object(
            'DEPTNO' value DEPTNO,
            'DNAME' value DNAME,
            'LOC' value LOC
        )
        order by deptno
    ) depts
    from dept
)
select json_object(
    'SCN' value timestamp_to_scn(systimestamp),
    'DEPARTMENTS' value departments,
    'DEPTS' value depts
) json_output
from json_departments, json_depts;
{
  "SCN": 14080746,
  "DEPARTMENTS": [
    {
      "DEPARTMENT_ID": 10,
      "DEPARTMENT_NAME": "Administration",
      "CITY": "Seattle",
      "STATE_PROVINCE": "Washington",
      "COUNTRY_ID": "US"
    },
    {
      "DEPARTMENT_ID": 20,
      "DEPARTMENT_NAME": "Marketing",
      "CITY": "Toronto",
      "STATE_PROVINCE": "Ontario",
      "COUNTRY_ID": "CA"
    },
    {
      "DEPARTMENT_ID": 30,
      "DEPARTMENT_NAME": "Purchasing",
      "CITY": "Seattle",
      "STATE_PROVINCE": "Washington",
      "COUNTRY_ID": "US"
    },
    {
      "DEPARTMENT_ID": 40,
      "DEPARTMENT_NAME": "Human Resources",
      "CITY": "London",
      "STATE_PROVINCE": null,
      "COUNTRY_ID": "UK"
    }
  ],
  "DEPTS": [
    {
      "DEPTNO": 10,
      "DNAME": "ACCOUNTING",
      "LOC": "NEW YORK"
    },
    {
      "DEPTNO": 20,
      "DNAME": "RESEARCH",
      "LOC": "DALLAS"
    },
    {
      "DEPTNO": 30,
      "DNAME": "SALES",
      "LOC": "CHICAGO"
    },
    {
      "DEPTNO": 40,
      "DNAME": "OPERATIONS",
      "LOC": "BOSTON"
    }
  ]
}

I used timestamp_to_scn(systimestamp) instead of accessing V$DATABASE or calling DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER because the latter two require that privileges be granted. Be aware that the SCN obtained from a timestamp is only approximate (the documented precision is about three seconds), so for real applications I suggest granting the necessary privileges and using one of the precise methods. Notice that the data from the two tables are not correlated.
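For reference, here is what the precise alternative looks like once the privilege is granted (a sketch; run the GRANT as a suitably privileged user):

-- as a privileged user:
grant execute on sys.dbms_flashback to scott;

-- then, instead of timestamp_to_scn(systimestamp):
select dbms_flashback.get_system_change_number scn from dual;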

Example: the update API

Now that the user has decided on the changes, the application calls the API with more JSON:

{"Action":"Update","SCN":14080746,"DEPTS":[
            {"DEPTNO":10,"DNAME":"Admin","LOC":"Seattle"},
            {"DEPTNO":20,"DNAME":"Marketing","LOC":"Toronto"},
            {"DEPTNO":30,"DNAME":"Purchasing","LOC":"Seattle"},
            {"DEPTNO":40,"DNAME":"HR","LOC":"London"}
            ]}

Notice that the only extra data required for optimistic locking is that one SCN.

Now here is the API that converts the JSON to relational data and updates DEPT:

create or replace procedure dept_xapi(p_json_input varchar2) as
    -- this cursor is never opened: it only serves to define the record type
    cursor c_input is select 0 SCN, deptno, dname, loc from dept;
    type tt_dept is table of c_input%rowtype;
    lt_dept tt_dept;
begin
    select SCN, deptno, dname, loc bulk collect into lt_dept
    from json_table(p_json_input, '$'
        columns(
            action path '$.Action',
            scn number path '$.SCN',
            nested path '$.DEPTS[*]'
            columns(
                deptno number path '$.DEPTNO',
                dname path '$.DNAME',
                loc path '$.LOC'
            )
        )
    )
    where action = 'Update';
    forall i in 1..lt_dept.count
        update dept set dname = lt_dept(i).dname, loc = lt_dept(i).loc
            where deptno = lt_dept(i).deptno
            and ora_rowscn <= lt_dept(i).scn
            -- naming DNAME and LOC here makes a blocked update restart and
            -- re-evaluate ORA_ROWSCN (see "Avoiding Lost Updates with ORA_ROWSCN")
            and coalesce('@',dname||loc) is not null;
    for i in 1..lt_dept.count loop
        if sql%bulk_rowcount(i) = 0 then
            raise_application_error(
                -20001,
                'DEPT row with DEPTNO = ' || lt_dept(i).deptno||' already changed by another user. No updates have been made.');
        end if;
    end loop;
end dept_xapi;
/

…and to call the API:

begin
    dept_xapi('{"Action":"Update","SCN":14080746,"DEPTS":[
            {"DEPTNO":10,"DNAME":"Admin","LOC":"Seattle"},
            {"DEPTNO":20,"DNAME":"Marketing","LOC":"Toronto"},
            {"DEPTNO":30,"DNAME":"Purchasing","LOC":"Seattle"},
            {"DEPTNO":40,"DNAME":"HR","LOC":"London"}
            ]}');
end;
/

Conclusion

The API as I have coded it is atomic: it either does all the updates or does none of them – as long as it is called directly from a client. If it is called from other PL/SQL code that does explicit COMMITs or ROLLBACKs, or that does not let the exception go back to the client, then the atomicity no longer holds.

In order to fit into the “ThickDB” paradigm as promoted by Toon Koppelaars and Bryn Llewellyn, this code would not be called directly from the client. Rather, it would be called by another API that would explicitly COMMIT or ROLLBACK before returning to the client.
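A minimal sketch of such a wrapper (the name DEPT_XAPI_TX is mine):

create or replace procedure dept_xapi_tx(p_json_input varchar2) as
begin
    dept_xapi(p_json_input); -- all the updates, or an exception
    commit;
exception
    when others then
        rollback;
        raise;
end dept_xapi_tx;
/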

Whichever path you choose, the client must receive read-consistent data, and provide all the data for the related updates in one call. To avoid lost updates, just add one SCN.

Avoiding Lost Updates with ORA_ROWSCN

Applications today use “optimistic locking” to avoid lost updates. Tom Kyte thought of using ORA_ROWSCN for optimistic locking, then decided it didn’t work (see ORA_ROWSCN for optimistic locking). I now think it does!

If you don’t understand what an SCN is, and how it helps with read consistency, this post may go over your head. Sorry I can’t take the time to try explaining all that here…

Test Data and Use Case

Oracle generally keeps one ORA_ROWSCN per block, since that is enough (with the UNDO) to ensure read consistency. However, we can use ROWDEPENDENCIES when creating the table to make room for one SCN per row.

I’m going to create the venerable EMP table with ROWDEPENDENCIES:

CREATE TABLE EMP (
    EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,
    ENAME VARCHAR2(10),
    JOB VARCHAR2(9),
    MGR NUMBER(4),
    HIREDATE DATE,
    SAL NUMBER(7,2),
    COMM NUMBER(7,2),
    DEPTNO NUMBER(2)
) rowdependencies;

INSERT INTO EMP VALUES
(7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO EMP VALUES
(7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO EMP VALUES
(7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO EMP VALUES
(7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO EMP VALUES
(7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO EMP VALUES
(7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO EMP VALUES
(7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO EMP VALUES
(7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO EMP VALUES
(7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO EMP VALUES
(7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO EMP VALUES
(7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO EMP VALUES
(7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO EMP VALUES
(7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO EMP VALUES
(7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
commit;

select distinct ora_rowscn from emp;
ORA_ROWSCN
  13848205


In honor of Oracle’s 40-year birthday, SMITH is going to submit a batch job to give everyone a 10% raise, including himself. At the same time, MILLER is doing a reorganization: he assigns SMITH to department 30, using an OLTP application that updates both department number and salary.

Lost Updates

Here is what happens when an application neglects “optimistic locking”:

-- A1 (BATCH session): SMITH submits a batch job for 10% raises

update emp set sal = sal * 1.1;

-- A2 (MILLER's session): MILLER queries SMITH and sees committed data

select empno, sal, deptno from emp
where ename = 'SMITH';

EMPNO  SAL  DEPTNO
 7369  800      20

-- A3 (MILLER): MILLER reassigns SMITH; the update waits for the batch job to commit

update emp set sal = 800, deptno = 30
where empno = 7369;

-- A4 (BATCH): the batch job commits

commit;

-- MILLER's application now sees "1 rows updated."

-- A5 (MILLER): MILLER commits

commit;

-- A6: SMITH discovers he was the only one who didn't get a raise!

select empno, sal, deptno from emp
where ename = 'SMITH';

EMPNO  SAL  DEPTNO
 7369  800      30


You see what happened here? Oracle told MILLER that SMITH’s salary was 800, because that was the latest committed value. The batch update had not committed, so the new salary was invisible to any session but the batch session itself. Since the OLTP application updated the salary as well, it replaced the new salary with the old one.

For details on lost updates, see the documentation on Data Concurrency and Consistency.

Optimistic Locking using column values

To avoid lost updates, the application has to check during its update that someone else has not changed the data in the meantime.

-- B1 (BATCH): SMITH submits a batch job for 10% raises

update emp set sal = sal * 1.1;

-- B2 (MILLER): MILLER queries SMITH and sees committed data

select empno, sal, deptno from emp
where ename = 'SMITH';

EMPNO  SAL  DEPTNO
 7369  800      20

-- B3 (MILLER): MILLER reassigns SMITH; the update waits for the batch job to commit

update emp set sal = 800, deptno = 30
where empno = 7369
and sal = 800 and deptno = 20;

-- B4 (BATCH): the batch job commits

commit;

-- MILLER's application sees "0 rows updated." and explains to MILLER
-- that someone else already changed the data.

-- B5: SMITH got his raise!

select empno, sal, deptno from emp
where ename = 'SMITH';

EMPNO  SAL  DEPTNO
 7369  880      20


You may be wondering why Step B3 didn’t find the row it wanted to update. That is because of “restartability”:

  • Oracle did a “consistent get” to get the latest committed values, where SAL was 800
  • but then it did a “current get” to apply MILLER’s change to the current version of the data. This is when it blocked, waiting for the batch job to commit and release the lock on that row.
  • When the “current get” finally happened, Oracle noticed that the SAL value had changed, so it restarted the update: it did another consistent get, at which time it found no rows that matched the WHERE clause – so it did not do the update.
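To actually see a restart, a row-level trigger makes it visible, as Tom demonstrates in the posts below: during step B3 the trigger fires once for the initial attempt and again after the restart (a sketch; remember to drop the trigger afterwards).

create or replace trigger emp_update_demo
before update of sal on emp
for each row
begin
    dbms_output.put_line('firing: old sal = ' || :old.sal);
end;
/
-- with SET SERVEROUTPUT ON, the blocked update at B3 prints two lines:
-- one for the first attempt, one after the restart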

For a detailed explanation of restartability, see these posts by Tom Kyte:

http://tkyte.blogspot.fr/2005/08/something-different-part-i-of-iii.html

http://tkyte.blogspot.fr/2005/08/part-ii-seeing-restart.html

http://tkyte.blogspot.fr/2005/09/part-iii-why-is-restart-important-to.html

Why not use ORA_ROWSCN?

Tom discovered that putting ORA_ROWSCN in the WHERE clause did not trigger a restart. In our scenario:

-- C1 (BATCH): SMITH submits a batch job for 10% raises

update emp set sal = sal * 1.1;

-- C2 (MILLER): MILLER queries SMITH and sees committed data

select empno, sal, deptno, ora_rowscn from emp
where ename = 'SMITH';

EMPNO  SAL  DEPTNO  ORA_ROWSCN
 7369  800      30    13857633

-- C3 (MILLER): MILLER reassigns SMITH; the update waits for the batch job to commit

update emp set sal = 800, deptno = 30
where empno = 7369 and ora_rowscn = 13857633;

-- C4 (BATCH): the batch job commits

commit;

-- MILLER's application sees "1 rows updated."

No reason to go any further: the update did not see that the batch job had changed the ORA_ROWSCN.

Making ORA_ROWSCN work!

Here is what I stumbled on: in the WHERE clause, referring to ORA_ROWSCN will not trigger a restart, but referring to SAL, which changed, will trigger a restart – and once the restart is triggered, then the new “consistent get” will see the new value of ORA_ROWSCN. We don’t need the old value of SAL, we just need to mention it. For example:

-- D1 (BATCH): SMITH submits a batch job for 10% raises

update emp set sal = sal * 1.1;

-- D2 (MILLER): MILLER queries SMITH and sees committed data

select empno, sal, deptno, ora_rowscn from emp
where ename = 'SMITH';

EMPNO  SAL  DEPTNO  ORA_ROWSCN
 7369  800      30    13865770

-- D3 (MILLER): MILLER reassigns SMITH; the update waits for the batch job to commit

update emp set sal = 800, deptno = 30
where empno = 7369
and ora_rowscn = 13865770
and coalesce('*',sal||deptno) is not null;

-- D4 (BATCH): the batch job commits

commit;

-- MILLER's application sees "0 rows updated." and explains to MILLER
-- that someone else already changed the data.

I decided to use COALESCE to refer to the column values because it uses short-circuit evaluation. Once it determines that '*' is not null, it doesn't bother evaluating the concatenation of the columns. Also, concatenation implicitly converts every datatype to VARCHAR2, so I can list strings, dates, timestamps, numbers and intervals without worry.
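A quick way to see the short-circuit for yourself (a sketch): if COALESCE evaluated its second argument, the division by zero would raise ORA-01476.

select coalesce('*', to_char(1/0)) result from dual;
-- returns '*': the second argument is never evaluated, so no error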

Conclusion

ORA_ROWSCN can indeed be used as a “version number” that can be checked to avoid lost updates. The UPDATE statement just has to list the appropriate columns in the WHERE clause, but it doesn’t have to check them against values from a prior SELECT statement. The only real value that needs to be checked is ORA_ROWSCN.

There are some limitations and caveats here:

  1. If your table has not been created with ROWDEPENDENCIES, ORA_ROWSCN will change every time any row in the block has changed. The table has to be “recreated” with ROWDEPENDENCIES for this to work (see the sketch after this list).
  2. You cannot use this method with Index Organized Tables! If you create such a table with ROWDEPENDENCIES, then SELECT ORA_ROWSCN… will return NULL.
  3. You might wonder whether a “delayed block cleanout” will cause the ORA_ROWSCN to change without the data having changed. I don’t see how that could be a problem, since the SELECT done to get the data (and the ORA_ROWSCN) will do the cleanout as part of its processing.
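As for item 1, ROWDEPENDENCIES cannot be added with ALTER TABLE; a rebuild is required. Here is a minimal sketch (the table name and CTAS approach are mine; you would also recreate indexes, constraints, grants and triggers):

create table emp_rd rowdependencies as select * from emp;
-- recreate indexes, constraints, grants and triggers on emp_rd, then:
drop table emp purge;
rename emp_rd to emp;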

Comparing Expression Lists (not tuples)

This post is a long-winded answer to a question Bryn Llewellyn asked me on Twitter about comparing expression lists.

Sabine Heimsath recently tweeted a regret about CASE and “tuples”, and gave an example of what she meant.

This is an example of a “simple” case statement (or expression?) using expression lists, not tuples. Tuples are unordered sets of attribute values, and here order matters. As Sabine discovered, this syntax is not supported.

[Update 2017-05-09: a case statement ends with END CASE and has statements after each THEN keyword. A case expression has expressions after THEN but does not have the CASE keyword at the end. Sabine appears to have mixed the two syntaxes.]

Oren Nakdimon then gave an example of a “searched” case expression that handles expression lists:

Sabine thought this meant that IN works and “=” does not. I replied that “=” works, but only in SQL not in PL/SQL, and provided you put an extra pair of parentheses around the expression list on the right hand side.
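To make this concrete, here is a sketch using DEPT (the doubled parentheses on the right-hand side are the point):

-- works in SQL, but not in PL/SQL
select case
         when (dname, loc) = (('ACCOUNTING', 'NEW YORK')) then 'match'
         else 'no match'
       end result
from dept
where deptno = 10;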

This led to a question from Bryn Llewellyn:

Since I need more than 140 characters, here is my attempt to answer Bryn.

His example is a simple comparison condition. If you look at the syntax diagram for “Simple Comparison Condition” in the documentation, Bryn is using the lower form, with parentheses around the left-hand expressions. Therefore, the expression_list on the right must also be surrounded by parentheses.

Additionally, since there are multiple expressions to the left of the operator, the expression_list must take its lower, fully parenthesized form.

So there must be an additional set of parentheses.

All of the above are from the SQL reference. The PL/SQL language reference has its own documentation on comparisons, and I have found no explicit reference to expression lists.

Multi-Table Inserts with Sequences

Multi-table inserts are a drag when sequences are involved, because we don’t have enough control over when the sequence is incremented. I’ve seen some dangerous workarounds. Here is a workaround that I think is safe.

[Update 2016-02-12: paulzip caught me assuming the output would be ordered without an ORDER BY. I made the correction.]

DEPT and EMP from XML

Here is a snippet from a typical XML document with DEPT and EMP information.

...
<DEPTS>
    <DEPT>
        <DNAME>ACCOUNTING</DNAME>
        <LOC>NEW YORK</LOC>
        <EMPS>
            <EMP>
                <ENAME>CLARK</ENAME>
                <JOB>MANAGER</JOB>
                <HIREDATE>1981-06-09</HIREDATE>
                <SAL>2450</SAL>
            </EMP>
            <EMP>
                <ENAME>MILLER</ENAME>
                <JOB>CLERK</JOB>
                <HIREDATE>1982-01-23</HIREDATE>
                <SAL>1300</SAL>
            </EMP>
...
        </EMPS>
    </DEPT>
...
</DEPTS>

The fact that CLARK works in the ACCOUNTING department is indicated by the placement of CLARK inside the path of ACCOUNTING. (To simplify the problem, I leave out MGR.)

The following view will generate the XML content from the SCOTT schema, leaving out DEPTNO, EMPNO and MGR.

create or replace view dept_emp_xml as
with emps as (
  select deptno,
    xmlagg(
      xmlelement(
        "EMP",
        xmlforest(ENAME, JOB, HIREDATE, SAL, COMM)
      )
    ) emps
  from scott.emp
  group by deptno
)
select xmlelement(
  "DEPTS",
  xmlagg(
    xmlelement(
      "DEPT",
      xmlforest(DNAME, LOC, EMPS)
    )
  )
) depts
from scott.dept
left join emps using(deptno);

From XML to Relational

Let’s say I want to load this data to relational tables. I can use the XMLTABLE() function to convert the data to a result set:

create or replace view dept_emp_rel as
select deptno, DNAME, LOC, RN, 0 EMPNO, ENAME, JOB, HIREDATE, SAL, COMM
from dept_emp_xml a,
XMLTABLE(
  '$X/DEPTS/DEPT' PASSING DEPTS AS X
  COLUMNS
    deptno for ordinality,
    DNAME VARCHAR2(63) path 'DNAME',
    LOC VARCHAR2(63) path 'LOC',
    EMPS xmltype path 'EMPS'
) (+) b,
xmltable(
  '/EMPS/EMP' passing b.EMPS
  columns
    rn for ordinality,
    ENAME VARCHAR2(63) path 'ENAME',
    JOB VARCHAR2(63) path 'JOB',
    HIREDATE date path 'HIREDATE',
    SAL number path 'SAL',
    COMM number path 'COMM'
) (+) c;
DEPTNO  DNAME       LOC       RN  EMPNO  ENAME   JOB        HIREDATE     SAL  COMM
     1  ACCOUNTING  NEW YORK   1      0  CLARK   MANAGER    1981-06-09  2450
     1  ACCOUNTING  NEW YORK   2      0  MILLER  CLERK      1982-01-23  1300
     1  ACCOUNTING  NEW YORK   3      0  KING    PRESIDENT  1981-11-17  5000
     2  RESEARCH    DALLAS     1      0  SMITH   CLERK      1980-12-17   800
     2  RESEARCH    DALLAS     2      0  FORD    ANALYST    1981-12-03  3000
     2  RESEARCH    DALLAS     3      0  ADAMS   CLERK      1987-05-23  1100
     2  RESEARCH    DALLAS     4      0  SCOTT   ANALYST    1987-04-19  3000
     2  RESEARCH    DALLAS     5      0  JONES   MANAGER    1981-04-02  2975
     3  SALES       CHICAGO    1      0  ALLEN   SALESMAN   1981-02-20  1600   300
     3  SALES       CHICAGO    2      0  JAMES   CLERK      1981-12-03   950
     3  SALES       CHICAGO    3      0  TURNER  SALESMAN   1981-09-08  1500     0
     3  SALES       CHICAGO    4      0  BLAKE   MANAGER    1981-05-01  2850
     3  SALES       CHICAGO    5      0  MARTIN  SALESMAN   1981-09-28  1250  1400
     3  SALES       CHICAGO    6      0  WARD    SALESMAN   1981-02-22  1250   500
     4  OPERATIONS  BOSTON             0

Loading the Result Set

Let’s imagine for a moment that the result set contains valid DEPTNO and EMPNO values. If that were true, we could use a multi-table insert to load the DEPT and EMP records. The important thing is to have the correct DEPTNO value in every row, so that the foreign key in EMP is correct.

drop table dept purge;
create table dept as
select * from scott.dept where 1=0;

drop table emp purge;
create table emp as
select EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO
from scott.emp
where 1=0;

insert all
when rn = 1 or rn is null then
  into dept(deptno, dname, loc)
     values(deptno, dname, loc)
when rn is not null then
  into emp(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
    values(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
select * from dept_emp_rel order by deptno, rn;

select * from dept
left join emp using(deptno)
order by deptno;
DEPTNO  DNAME       LOC       EMPNO  ENAME   JOB        HIREDATE     SAL  COMM
     1  ACCOUNTING  NEW YORK      0  CLARK   MANAGER    1981-06-09  2450
     1  ACCOUNTING  NEW YORK      0  KING    PRESIDENT  1981-11-17  5000
     1  ACCOUNTING  NEW YORK      0  MILLER  CLERK      1982-01-23  1300
     2  RESEARCH    DALLAS        0  SCOTT   ANALYST    1987-04-19  3000
     2  RESEARCH    DALLAS        0  ADAMS   CLERK      1987-05-23  1100
     2  RESEARCH    DALLAS        0  FORD    ANALYST    1981-12-03  3000
     2  RESEARCH    DALLAS        0  SMITH   CLERK      1980-12-17   800
     2  RESEARCH    DALLAS        0  JONES   MANAGER    1981-04-02  2975
     3  SALES       CHICAGO       0  ALLEN   SALESMAN   1981-02-20  1600   300
     3  SALES       CHICAGO       0  JAMES   CLERK      1981-12-03   950
     3  SALES       CHICAGO       0  TURNER  SALESMAN   1981-09-08  1500     0
     3  SALES       CHICAGO       0  BLAKE   MANAGER    1981-05-01  2850
     3  SALES       CHICAGO       0  MARTIN  SALESMAN   1981-09-28  1250  1400
     3  SALES       CHICAGO       0  WARD    SALESMAN   1981-02-22  1250   500
     4  OPERATIONS  BOSTON

Handling the Sequence!

The problem with sequences in my INSERT ALL statement is that any sequence will be incremented for every row returned from the SELECT. This is the problem people try to work around. Often, the idea is to increment the sequence inside a function. Unfortunately, there is never a guarantee that the function will be called exactly when we want.

My suggestion is to use a pipelined table function, which gives us complete control over when the sequence is incremented. The function takes the result of the XMLTABLE() query and manages the sequences as follows:

  1. When the incoming DEPTNO value changes, increment the DEPT sequence.
  2. When RN is not null, increment the EMP sequence.

The output of the function is identical to the input, except that DEPTNO and EMPNO are updated from the sequences.

drop sequence dept_seq;
create sequence dept_seq;

drop sequence emp_seq;
create sequence emp_seq;

create or replace package dept_emp as
  type tt_depts is table of dept_emp_rel%rowtype;
  function add_seq return tt_depts pipelined;
end dept_emp;
/

create or replace package body dept_emp as
  function add_seq return tt_depts pipelined is
    l_prev_deptno dept_emp_rel.deptno%type := 0;
    l_seqno integer;
  begin
    for rec in (select * from dept_emp_rel order by deptno, rn) loop
      if rec.deptno > l_prev_deptno then
        l_prev_deptno := rec.deptno;
        l_seqno := dept_seq.nextval;
      end if;
      rec.deptno := l_seqno;
      if rec.rn is null then
        rec.empno := null;
      else
        rec.empno := emp_seq.nextval;
      end if;
      pipe row(rec);
    end loop;
    return;
  end add_seq;
end dept_emp;
/
select * from table(dept_emp.add_seq);
DEPTNO  DNAME       LOC       RN  EMPNO  ENAME   JOB        HIREDATE     SAL  COMM
     1  ACCOUNTING  NEW YORK   1      1  CLARK   MANAGER    1981-06-09  2450
     1  ACCOUNTING  NEW YORK   2      2  MILLER  CLERK      1982-01-23  1300
     1  ACCOUNTING  NEW YORK   3      3  KING    PRESIDENT  1981-11-17  5000
     2  RESEARCH    DALLAS     1      4  SMITH   CLERK      1980-12-17   800
     2  RESEARCH    DALLAS     2      5  FORD    ANALYST    1981-12-03  3000
     2  RESEARCH    DALLAS     3      6  ADAMS   CLERK      1987-05-23  1100
     2  RESEARCH    DALLAS     4      7  SCOTT   ANALYST    1987-04-19  3000
     2  RESEARCH    DALLAS     5      8  JONES   MANAGER    1981-04-02  2975
     3  SALES       CHICAGO    1      9  ALLEN   SALESMAN   1981-02-20  1600   300
     3  SALES       CHICAGO    2     10  JAMES   CLERK      1981-12-03   950
     3  SALES       CHICAGO    3     11  TURNER  SALESMAN   1981-09-08  1500     0
     3  SALES       CHICAGO    4     12  BLAKE   MANAGER    1981-05-01  2850
     3  SALES       CHICAGO    5     13  MARTIN  SALESMAN   1981-09-28  1250  1400
     3  SALES       CHICAGO    6     14  WARD    SALESMAN   1981-02-22  1250   500
     4  OPERATIONS  BOSTON

And now my INSERT ALL statement works like a charm:

insert all
when rn = 1 or rn is null then
  into dept(deptno, dname, loc)
     values(deptno, dname, loc)
when rn is not null then
  into emp(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
    values(EMPNO, ENAME, JOB, HIREDATE, SAL, COMM, DEPTNO)
select * from table(dept_emp.add_seq);

Conclusion

When foreign keys reference primary keys that are themselves generated from sequences, loading the parent and child tables together is a great idea – if we can increment those sequences correctly. A simple pipelined function gives us the control we need with few lines of code and little overhead.

Disappearing Data

Did you know that Oracle DML can silently change data you want to put into the database? A recent twitter discussion started by Martin Widlake (@MDWidlake) prompted me to study more closely what happens when.

Number, precision and scale

When we define a number, we can specify its “precision” (the maximum number of significant decimal digits) and its “scale” (the number of digits from the decimal point to the least significant digit). Here are some examples:

  • NUMBER(2,0): an integer in the range of -99 through 99.
  • NUMBER(2,1): a number from -9.9 through 9.9, always rounded to tenths.
  • NUMBER(2,-1): a number from -990 through 990, always rounded to tens.

DML complains about precision

Suppose I have a table column defined as NUMBER(2,0) and I try to put the value 100 in there. There is no room for that value, so Oracle will raise an exception:

ORA-01438: value larger than specified precision allowed for this column

DML complains about scale: NO, it rounds

Now suppose I try to put the value 99.4 into my NUMBER(2,0) column. Oracle will not raise an exception: it will silently round the value to 99 and insert that! The input data is changed with no warning.

The Oracle documentation is clear on the subject:

If a value exceeds the precision, then Oracle returns an error. If a value exceeds the scale, then Oracle rounds it.
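A quick demonstration with a throwaway table (a sketch):

create table t_num (n number(2,0));

insert into t_num values (100);   -- fails: ORA-01438, precision exceeded
insert into t_num values (99.4);  -- succeeds: scale exceeded, value rounded

select n from t_num;  -- returns 99, not 99.4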

What about other data types?

There are two other data types that are subject to rounding: TIMESTAMP (with or without time zone) and INTERVAL DAY TO SECOND. We can specify the “precision” of the fractional seconds. For example:

  • TIMESTAMP(1) allows only one digit after the decimal point for the seconds.
  • INTERVAL DAY TO SECOND(3) allows up to 3 digits after the decimal point.

Again, if we try to put in a more precise value, Oracle will silently round to make the data fit.
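For example, casting to a less precise TIMESTAMP rounds the fractional seconds (a sketch; the values are illustrative):

select cast(timestamp '2017-01-06 12:00:00.95' as timestamp(1)) ts1
from dual;
-- the .95 rounds up at scale 1, giving 2017-01-06 12:00:01.0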

An apparent exception: TIMESTAMP to DATE

Going from TIMESTAMP to DATE is not rounding, it is a datatype conversion. When converting to a DATE, any fractional seconds are discarded. The effect is truncation rather than rounding.

If you want rounding to occur, then CAST the timestamp to TIMESTAMP(0) first.

select systimestamp right_now,
cast(systimestamp as date) truncated,
cast(cast(systimestamp as timestamp(0)) as date) rounded
from dual;
RIGHT_NOW                          TRUNCATED            ROUNDED
2017-01-06 13:24:19.606274 +01:00  2017-01-06 13:24:19  2017-01-06 13:24:20


What does this have to do with Twitter?

In the recent twitter conversation that Martin started, there were two suggestions:

  1. Create a new datatype (called DAY, perhaps) that would be a valid date without a time component
  2. Allow for “scale” in the DATE datatype: allow it to contain only the year, or only year+month, or only year+month+day, etc.

Supposing Oracle managed these data types as it does the current types: in case 1, converting from DATE to DAY would truncate the time component, whereas in case 2, silent rounding would occur.

I think “rounding” to the nearest day, month or year would be strange, don’t you?

#DOAG2016: Bug !

I did it: in my proudest moment at DOAG2016, at the very end of my second presentation, I presented a “brilliant” SQL query with a bug in it. After agonizing for a while, I finally saw that the correction was simple and didn’t change the timing.

If you were there, the bug was in slide 21 where I joined two tables with overlapping ranges. For some reason I thought I should order the processing by the end of the range (which sometimes works) instead of by the start of the range (which always works – so far).

I have uploaded the corrected presentation to the DOAG website and to slideshare. I will be blogging about the content, so stay tuned.

Here is the link to the presentation on slideshare:

Apologies for the bug…