Statement-Level Atomicity

[UPDATE 2014/11/17: Horrible copy/paste error! My PL/SQL code below left out the all-important RAISE; command in the exception handler. Many thanks to Dom Brooks for giving me a heads up.]

So important yet often ignored or misunderstood! No, not me, but “statement level atomicity”.

Whenever you call the Oracle database to change data, the result will be all or nothing: Oracle will either do everything you asked it to, or nothing at all. This is true for SQL and PL/SQL.

What the documentation says

From the Concepts guide:

If a SQL statement causes an error during execution, then it is not successful and so all effects of the statement are rolled back. This operation is a statement-level rollback. This operation has the following characteristics:

  • A SQL statement that does not succeed causes the loss only of work it would have performed itself. The unsuccessful statement does not cause the loss of any work that preceded it in the current transaction…
  • The effect of the rollback is as if the statement had never been run.

There is no specific mention of PL/SQL here. I asked the author, Tom Kyte, about PL/SQL “statements” and “unhandled exceptions”. He replied:

any “statement” is an atomic statement.
every “statement” is
plsql is just a statement, so is update, they are all just statements.
all statements in Oracle are atomic…

I kept searching the Oracle documentation and finally found this in the TimesTen PL/SQL Developer’s Guide:

TimesTen PL/SQL differs from Oracle Database PL/SQL in a scenario where an application executes PL/SQL in the middle of a transaction, and an unhandled exception occurs during execution of the PL/SQL. Oracle Database rolls back to the beginning of the anonymous block. TimesTen does not roll back.

So there it is in black and white: it doesn’t matter whether you call the SQL engine or the PL/SQL engine, your “statement” either makes all the changes you asked for or none at all.

What Steven Feuerstein says

This blog was inspired by Steven’s recent tweet:

This may seem to contradict the Oracle documentation, but in fact it doesn’t. Steven is talking about what happens inside the PL/SQL code, whereas the documentation refers to what happens after the PL/SQL code has finished.

What my tests show

Here is a simple test case that shows how Steven is right.


set serveroutput on

drop table t purge;
create table t(n number);

declare
  procedure show_cnt(p_label in varchar2) is
    l_cnt number;
  begin
    select count(*) into l_cnt from t;
    dbms_output.put_line(p_label || ', count(*) = ' || l_cnt);
  end show_cnt;
begin
  show_cnt('At beginning');
  insert into t values(1);
  show_cnt('After good insert');
  insert into t values('a');
exception when invalid_number then
  show_cnt('After bad insert');
  -- the following line reraises the exception
  raise;
end;
/
Error report -
ORA-01722: invalid number
ORA-06512: at line 16
01722. 00000 -  "invalid number"
*Cause:    The specified number was invalid.
*Action:   Specify a valid number.
At beginning, count(*) = 0
After good insert, count(*) = 1
After bad insert, count(*) = 1

select count(*) from t;

  COUNT(*)
----------
         0

select count(*) trans_cnt
from v$transaction;

 TRANS_CNT
----------
         0


Notice how the first insert did NOT get rolled back within the PL/SQL block, but it DID get rolled back after the block ended!
Notice also that, since the PL/SQL block was the first statement in a transaction, there is no transaction anymore. The situation is exactly as if the statement never ran in the first place.


  • When a “client” calls the Oracle database, it doesn’t matter whether it calls the SQL engine or the PL/SQL engine, the entire call is a “statement” that will always be atomic.
  • When PL/SQL (inside the database) calls the SQL engine, the SQL statement is atomic. If the execution goes wrong, only that SQL statement is rolled back.
  • PL/SQL execution “goes wrong” if and only if an exception is passed back to the calling program. Without the RAISE in my exception handler, the exception would never go back to the caller and Oracle would not roll back the statement!
  • If the PL/SQL code does a COMMIT or ROLLBACK itself, statement-level atomicity will not work as intended.
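The mechanics behind these rules can be modeled outside the database. Here is a toy Python sketch (not Oracle code; `FakeSession` and its savepoint mechanics are invented purely for illustration): every "statement" gets an implicit savepoint, and the savepoint is rolled back only when an exception escapes to the caller, just like the RAISE; in my exception handler.

```python
class FakeSession:
    """Toy model of statement-level atomicity: every 'statement' gets an
    implicit savepoint; if it raises, only its own work is undone."""
    def __init__(self):
        self.rows = []

    def execute(self, statement):
        savepoint = len(self.rows)          # implicit savepoint before the call
        try:
            statement(self)
        except Exception:
            del self.rows[savepoint:]       # statement-level rollback
            raise                           # like RAISE; pass it to the caller

def plsql_block(session):
    session.rows.append(1)                  # the good insert
    raise ValueError("ORA-01722: invalid number")  # the bad insert

session = FakeSession()
try:
    session.execute(plsql_block)
except ValueError:
    pass
print(session.rows)   # [] : the good insert was rolled back too
```

If `plsql_block` swallowed the exception instead of reraising it, `execute` would see a successful statement and keep the first append, which is exactly the behavior my test case shows.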

OpenWorld Presentation on Row Pattern Matching

You can download my presentation from slideshare.

Please download instead of viewing: the slideshare viewer doesn’t show the animations I worked so hard on.

I have already blogged about some of the topics in my presentation. Sometime soon I will continue with in-depth looks at other issues, such as “catastrophic backtracking”.

To those many who missed it, I will be making the same presentation at UKOUG Tech 14 in December.

Oracle 12c Row Pattern Matching: Beat the Best Pre-12c Solutions at OpenWorld!

Catchy title, don’t you think? My session has been moved to Monday 4 P.M., in direct conflict with Tom Kyte – and Keith Laker, who asked me to present in the first place.

Avoid the lines: come see the MATCH_RECOGNIZE clause push great pre-12c solutions into retirement. As a bonus, be the first person on your block able to prevent “catastrophic backtracking”.

Click here to see my session in the OpenWorld catalog

Rob van Wijk and ad hoc Grouping

In March 2014, I wrote about two methods for ad hoc grouping: “Grouping Sequences” and “Start of Group”. I just found out that Rob van Wijk wrote on the same subjects two months earlier. We even used the same example!

I mention Rob’s article for two reasons:

  • If my explanation wasn’t clear enough, you can try Rob’s.
  • When I use “Start of Group”, Rob uses a similar method he calls “max-on-case-row-number”. These techniques are completely equivalent. Execution plans and performance are identical.

The “Start of Group” method can take many forms. Don’t think you need to know every one! Just choose the form you understand and like best, and use it all the time.
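Whatever form you pick, the underlying idea is the same. Here is a hedged Python sketch of that idea (toy data, not Rob's SQL or mine): flag each row that starts a new group, then a running total of the flags numbers the groups.

```python
# "Start of Group": flag a row when it cannot belong to the previous row's
# group, then a running sum of the flags gives the group number.
def group_numbers(values):
    groups = []
    grp = 0
    prev = object()          # sentinel: the first row always starts a group
    for v in values:
        if v != prev:        # start-of-group condition (here: value changed)
            grp += 1
        groups.append(grp)
        prev = v
    return groups

print(group_numbers([1, 1, 2, 2, 1]))  # [1, 1, 2, 2, 3]
```

In SQL the flag is typically a CASE comparing the current row to LAG() of the previous row, and the running total is SUM() OVER (ORDER BY …); the start-of-group condition itself changes from problem to problem.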

Don’t forget, when you get 12c you can replace both methods by the MATCH_RECOGNIZE clause :)

SQL and date ranges: don’t make NULL mean something

We often use NULL values in date ranges, usually in the end date to signify the indefinite future. I have decided that this is a bad idea; here are my reasons.

[Update 2014-07-04] There have been some great replies to this post. I am going to address some of their points.

1) NULL should not mean anything

According to the SQL standard, NULL means “unknown”. That is why you can’t really compare a NULL value to anything, you can only say if it IS NULL or not.

If you use NULL in date ranges, it doesn’t just mean something, it means two different things! In the start date it means “from the beginning of time”, and in the end date it means “until the end of time”. This is not only contrary to the “official” meaning of NULL, it is confusing.

[Update] Kevan Gelling points out that there may really be “unknown” values, in which case NULL should be allowed. I agree, as long as NULL is reserved for that use.

Jeffrey Kemp says: ‘In most cases an End Date of NULL means “we don’t know (yet) what the end date will be – or even if it will ever end”.’ Jeffrey, you should add “we don’t even know if it has already ended”! Your phrase taken alone implies that the end date is in the future, not in the past. This makes NULL meaningful.

2) We can use real date limits instead of NULL

One argument for using NULL is that it means the earliest possible start date, or the latest possible end date. We don’t need NULL for that! In Oracle, the earliest possible date is 4712-01-01 B.C., or DATE '-4712-01-01'. The latest possible date is DATE '9999-12-31' (you can add the time element 23:59:59 if you want to be a purist.)

To enforce this, I suggest declaring the start and end date columns as NOT NULL with default values. In Database 12c, I would use the DEFAULT ON NULL clause: this clause puts the default value in the column even if you explicitly try to put a NULL there.

If you want the output to show NULL instead of these default values, you do have to use something like NULLIF() in your SELECT clause.

[Update] Again, Kevan Gelling argues that NULL may be necessary to indicate a value that is not known. In that case my suggestion is no good, but you still have to use something other than NULL to indicate “beginning of time” and “end of time”.

Several posters bring up the fact that artificial extreme dates will “skew” the data and make it hard for the optimizer to choose the best plan. They are right, I should mention this. However, the data is skewed whether I use extreme dates or NULL. Supposing I do use NULL to mean “the end of time”, many queries will have to use END_DATE IS NULL in the WHERE clause, so the skew is there no matter what.

3) Oracle doesn’t index NULLs

When you query data with date ranges, you often have to check “greater than” one value and “less than” another value. This may require separate indexes on start date and end date. If you have NULLs in your date ranges, those indexes will not be used since Oracle doesn’t create index entries when all the indexed columns are NULL.

If you use real values and NOT NULL, your indexes will always work.

[Update] Some readers apparently didn’t see the phrase that I have now underlined. The header was probably misleading: Oracle will indeed index a row if any of the indexed columns is not NULL.

Some objected that indexes on dates alone are rarely used: date ranges are almost always applied to some object, so the index will include the object and one or both dates. In that case, the rows with NULL dates will be indexed. I agree. In that situation, NULLs don’t cause a problem, and “skew” won’t either, as long as the object comes before the date in the index column list.

4) Queries on NULLable date ranges are hard

I have blogged about various date range problems: finding gaps, finding overlaps and merging contiguous ranges. Almost always, my solutions worked fine without NULLs and broke when I introduced NULLs. Handling NULLs required either more complex queries or substituting real values for NULL. Why go to all that work when we can just use the real values to begin with?
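The extra complexity shows up even outside SQL. Here is a hedged Python sketch (the helper names and sentinel values are my own, standing in for DATE '-4712-01-01' and DATE '9999-12-31'): with real limit values the overlap test is one comparison; with "NULL" bounds every value needs a special case first.

```python
from datetime import date

MIN_DATE = date(1, 1, 1)        # stand-in for DATE '-4712-01-01'
MAX_DATE = date(9999, 12, 31)   # stand-in for DATE '9999-12-31'

def overlaps_with_sentinels(s1, e1, s2, e2):
    # closed ranges with real limit values: one simple comparison
    return s1 <= e2 and s2 <= e1

def overlaps_with_nulls(s1, e1, s2, e2):
    # None means "beginning/end of time": every bound needs a special case
    s1 = MIN_DATE if s1 is None else s1
    s2 = MIN_DATE if s2 is None else s2
    e1 = MAX_DATE if e1 is None else e1
    e2 = MAX_DATE if e2 is None else e2
    return s1 <= e2 and s2 <= e1

# open-ended range, sentinel style: no special cases needed
print(overlaps_with_sentinels(date(2014, 1, 1), date(2014, 6, 30),
                              date(2014, 6, 1), MAX_DATE))        # True
# same question with None: same answer, more code
print(overlaps_with_nulls(date(2014, 1, 1), date(2014, 6, 30),
                          date(2014, 6, 1), None))                # True
```

The SQL equivalent of the second function is a WHERE clause full of NVL() or COALESCE() calls, which is precisely the extra work I want to avoid.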

[Update] Even if we need NULL values sometimes to mean “unknown value”, we still don’t use IS NULL in our queries, so indexes should work when needed.

5) Unique constraints don’t work with NULL

Most queries on date ranges, including mine, assume that start dates are unique. In some cases, it may also be useful to have unique end dates. The only way to make sure they are unique is to add a unique constraint. Unfortunately, uniqueness is only enforced for NOT NULL values.

Most of the time, your constraint will be on two columns: some object and a date. In this case the unique constraint will work as long as the object column is NOT NULL. However, anytime you need unique dates throughout a table you must define them as NOT NULL and use default values.

[Update] As mentioned under paragraph 3), this point is only valid for unique constraints on the date alone. This is probably a rare case. Also, as Kevan reminded me, you could always use a function-based index on (DATE_COL, ‘X’) to make sure every row was indexed.

[Update] Conclusion: NULL should mean “unknown” only

As readers have pointed out, my arguments 3) and 5) are weak, since they only apply to indexes or constraints on a date column alone, and those are rarely needed.

I’ll stand by arguments 1), 2) and 4), but thanks to Kevan I’ll add one thing: just because I don’t use NULL to mean “beginning or end of time”, that doesn’t mean I might not need it to mean “unknown value”. In that case, default values may not be the way to go.

Finally, please read the replies: good stuff, and thanks to all!

SQL to find World Cup matches with comebacks

Lucas Jellema is writing a great series on SQL applied to the World Cup of football / soccer. In the article about finding matches with comebacks, he challenged readers to “find ‘dramatic comebacks’ where a team was two goals behind at some stage and yet managed to win the game.” Here is my reply, since replying directly on his blog was too challenging for me!

Here is a copy of Lucas’ DDL and test data. I added three rows to his data, one with a draw, one with a scoreless draw and one with a “dramatic comeback”.

drop table match_results;
create table match_results
   (    "GROUP1" VARCHAR2(1 BYTE), 
    "HOME" NUMBER(1,0), 
    "AWAY" NUMBER(1,0), 
    "HOME_GOALS" NUMBER(2,0), 
    "AWAY_GOALS" NUMBER(2,0), 
    "SDM_ID" NUMBER(2,0), 
    "ID" NUMBER(2,0) 
   );
-- INSERT statements omitted here; see Lucas' post for the full test data

The SCORING_PROCESS column shows who scored in what order: for example '100' means one team scored the first goal, then the other team scored 2 goals.

Who won?

Conceptually, the first thing I do is determine who won the game, by counting the number of '1's and the number of '0's. When the '1's are ahead, I say the winner is 1; when there are more '0's, I say the winner is -1. For draws, the “winner” is 0; I then remove draws in my WHERE clause.

select id, scoring_process, 
  sign(length(replace(scoring_process||'01','0','')) 
     - length(replace(scoring_process||'01','1',''))) winner 
from match_results
where sign(length(replace(scoring_process||'01','0','')) 
         - length(replace(scoring_process||'01','1',''))) 
  != 0;
1 1000 -1
2 0 -1
17 0 -1
18 1111 1
3 011111 1
4 0010 -1
19 11 1
20 10011 1
23 00011111 1
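The same counting logic, sketched in Python (a toy reimplementation for illustration, not part of the SQL solution): compare how many '1's and '0's the string contains and take the sign of the difference.

```python
def winner(scoring_process):
    """1 if team '1' scored more goals, -1 if team '0' did, 0 for a draw."""
    ones = scoring_process.count('1')
    zeros = scoring_process.count('0')
    return (ones > zeros) - (ones < zeros)   # sign of the difference

print(winner('1000'))      # -1
print(winner('00011111'))  # 1
print(winner('10'))        # 0: a draw, filtered out in the SQL
```

The `||'01'` trick in the SQL plays the role of the zero default here: appending one '0' and one '1' keeps REPLACE() from ever returning an empty string, whose LENGTH() would be NULL in Oracle.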

How far behind did the winner get?

For each row, I go through the SCORING_PROCESS column, turning '1' into the number 1 and '0' into the number -1. This makes it easy to use the SUM() analytic function to calculate who led by how much after each goal. Here is an example using a hard-coded value.

select sum(
  decode(substr('00011111', level, 1), '1', 1, -1)
) over (order by level) running_lead
from dual 
connect by level <= length('00011111');
In this case, I know the winner was team 1, so I can see that they came back from a 3-goal deficit. Putting it all together, here is a solution that shows all the comebacks and how far behind the winning team got.

select id, scoring_process, 
max(-winner*column_value) max_behind 
from ( 
  select id, scoring_process, 
    sign(length(replace(scoring_process||'01','0','')) 
       - length(replace(scoring_process||'01','1',''))) winner 
  from match_results
  where sign(length(replace(scoring_process||'01','0','')) 
           - length(replace(scoring_process||'01','1',''))) 
    != 0
) mr, 
table(cast(multiset( 
  select sum( 
    decode(substr(mr.scoring_process, level, 1), '1', 1, -1) 
  ) over (order by level) 
  from dual 
  connect by level <= length(mr.scoring_process)
) as sys.odcinumberlist)) l
group by id, scoring_process
having max(-winner*column_value) > 0;
23 00011111 3
3 011111 1
1 1000 1
20 10011 1
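In Python the whole computation is a short function (again a toy reimplementation for illustration): walk the goals as +1/-1, track the running lead, and report the winner's worst deficit.

```python
def max_behind(scoring_process):
    """How far behind the eventual winner got; None for draws."""
    steps = [1 if c == '1' else -1 for c in scoring_process]
    w = (sum(steps) > 0) - (sum(steps) < 0)   # winner: 1, -1, or 0
    if w == 0:
        return None                            # a draw: no winner
    lead, worst = 0, 0
    for s in steps:
        lead += s
        worst = max(worst, -w * lead)          # deficit from the winner's side
    return worst

print(max_behind('00011111'))  # 3: the "dramatic comeback"
print(max_behind('1000'))      # 1
```

The `-w * lead` expression is the same trick as `-winner*column_value` in the SQL: it flips the running lead so that "behind" is always a positive number, whichever team won.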

Bin Fitting problems with SQL

“Bin Fitting” or “Bin Packing” means putting the greatest quantity in the smallest number of “bins” or containers. There are some good solutions with the MODEL clause, but the most concise and efficient solutions use the new 12c MATCH_RECOGNIZE clause.

There are two categories of “bin fitting” problems:

  1. The number of bins is fixed and the capacity can vary.
    > We want to equalize the quantities we put into each bin.
  2. The bins have a fixed capacity and the number of bins can vary.
    > We want to minimize the number of bins.

I’ll show an example of each.

Fill a fixed number of bins as equally as possible

Wikipedia calls this the “Multiprocessor scheduling problem”. Brendan Furey, author of a brilliant MODEL solution, calls it “balanced number partitioning”. Let’s take a simple, reproducible example:

create table items as
SELECT LEVEL item_name, level item_value 
from dual
connect by level <= 10;

The total value of all the items is 55. We want to split the items into 3 “bins” or “buckets” or groups, so that the sums of the values are as equal as possible. Here the best we can do is put 19 in one bin and 18 in each of the other bins.

There are several algorithms that solve this problem more or less accurately. The one I’m going to show is fairly simple:

  • First, order the items by value in descending order
  • Then, assign each item to whatever bin has the smallest sum so far.
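The algorithm itself is language-independent; here is a short Python sketch of those two steps (my own illustration, not Brendan's code or the SQL below):

```python
def fill_bins(values, n_bins):
    """Greedy balanced partitioning: biggest items first,
    each into the currently lightest bin."""
    bins = [[] for _ in range(n_bins)]
    for v in sorted(values, reverse=True):
        lightest = min(bins, key=sum)   # bin with the smallest total so far
        lightest.append(v)
    return bins

bins = fill_bins(range(1, 11), 3)
print(sorted(sum(b) for b in bins))  # [18, 18, 19]
```

With the items 1 through 10 this reaches the best possible split: 19 in one bin and 18 in each of the others, matching the SQL results below.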

Here is Brendan’s MODEL implementation:

SELECT item_name, bin, item_value,
Max (bin_value) OVER (PARTITION BY bin) bin_value
FROM (
  SELECT * FROM items
  MODEL
  DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
  MEASURES (
    item_name,
    Row_Number() OVER (ORDER BY item_value DESC) bin,
    item_value,
    item_value bin_value,
    Row_Number() OVER (ORDER BY item_value DESC) rn_m,
    0 min_bin,
    Count(*) OVER () - :N_BINS - 1 n_iters
  )
  RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
    min_bin[1] =
      Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= :N_BINS],
    bin[ITERATION_NUMBER + :N_BINS + 1] = min_bin[1],
    bin_value[min_bin[1]] =
      bin_value[CV()] + Nvl (item_value[ITERATION_NUMBER + :N_BINS + 1], 0)
  )
)
ORDER BY item_value DESC;
10 1 10 19
9 2 9 18
8 3 8 18
7 3 7 18
6 2 6 18
5 1 5 19
4 1 4 19
3 2 3 18
2 3 2 18
1 3 1 18

Please refer to Brendan’s post for a full explanation, but notice the number of “bins” is actually a bind variable!

Unfortunately, this is not a very scalable solution: with 10,000 records, this query runs in about 30 seconds. My implementation with the 12c MATCH_RECOGNIZE clause has to hard-code the number of buckets, but it does 10,000 records in 0.2 seconds!

SELECT * from items
MATCH_RECOGNIZE (
  ORDER BY item_value desc
  MEASURES to_number(substr(classifier(),4)) bin#,
    sum(bin1.item_value) bin1,
    sum(bin2.item_value) bin2,
    sum(bin3.item_value) bin3
  ALL ROWS PER MATCH
  PATTERN ((bin1|bin2|bin3)*)
  DEFINE bin1 AS count(bin1.*) = 1
      OR sum(bin1.item_value)-bin1.item_value
        <= least(sum(bin2.item_value), sum(bin3.item_value)),
    bin2 AS count(bin2.*) = 1
      OR sum(bin2.item_value)-bin2.item_value
        <= sum(bin3.item_value)
);
10 10 1 10
9 9 2 10 9
8 8 3 10 9 8
7 7 3 10 9 15
6 6 2 10 15 15
5 5 1 15 15 15
4 4 1 19 15 15
3 3 2 19 18 15
2 2 3 19 18 17
1 1 3 19 18 18

I’ve already mentioned the basic syntax of this clause in previous posts, but look at the PATTERN line. The vertical pipe means “OR” – I want a line that matches either “bin1”, “bin2” or “bin3”, and the first match wins.

The DEFINE clause takes some getting used to. The way it works is, you assume that the current row matches the first condition and you test the condition; if the condition is true then there really is a match and you move on to the next row. Otherwise you test the second condition, and so on until the last. In this case, I don’t have to define BIN3 at all: it’s always true if none of the previous matches work.

The BIN1 condition starts with count(bin1.*) = 1: this can only be true if no previous row has been assigned to BIN1. The other part of the condition compares the sums of BIN1, BIN2 and BIN3. If BIN1 has the smallest amount, it gets the current row. Now here’s the tricky part: if I just say sum(bin1.item_value), then that sum will include the row I am testing. What I really want is the sum so far, not including the row I am testing, so I have to subtract that value.

The BIN2 condition is simpler, because I already know that BIN1 contains more than BIN2.

Bins have a fixed capacity: use as few bins as possible

My first post on the MATCH_RECOGNIZE clause showed how to calculate running totals that start over before the total exceeds a certain value. This is a simple version of the “fixed capacity” problem.

I’m going to complicate things a bit by letting each bin have an elastic capacity. The normal capacity is 50, but in some cases I can go over the limit of 50, as long as I never go over 55.

  • If the current bin already has 50 or more, it is full and cannot receive a new item.
  • If the current bin has less than 50, I can add a new item as long as the new total is 55 or less.
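Those two rules translate directly into a small Python sketch (a toy model of the logic, not the SQL solution): start a new bin whenever the current one is full under either rule.

```python
NORMAL_CAP, HARD_CAP = 50, 55

def pack(items):
    """Fixed-capacity packing with an elastic limit:
    a bin at 50 or more is full; below 50 it can stretch up to 55."""
    bins = [[]]
    for item in items:
        total = sum(bins[-1])
        if total >= NORMAL_CAP or total + item > HARD_CAP:
            bins.append([])                 # current bin is full: open a new one
        bins[-1].append(item)
    return bins

print(pack([49, 2, 2, 2, 2]))  # [[49, 2], [2, 2, 2]]: bin 1 stretches to 51
```

Note the order of the checks mirrors the DEFINE condition below: the running total before the current item decides whether the bin is already full, and the total including the item decides whether it would burst the hard limit.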

I admit that I was hoping to solve this problem by using more advanced syntax in the PATTERN clause, such as “reluctant quantifiers”. I tried 4 or 5 different solutions and they all had bugs. The only solution that seems to work uses a simple PATTERN and a slightly more complex DEFINE. Anyway, here’s the code to create a test table with different boundary conditions.

drop table t cascade constraints purge;
create table t(cat_id number, item_id number, num_units number, incr number);

insert into t
select rownum cat_id, rownum item_id, num_units, incr from (
  select level incr from dual
  connect by level <= 6
) a
, lateral(
  select 51 - level num_units from dual
  connect by level <= a.incr+1
);

insert into t
select cat_id, rownum+100 item_id, incr num_units, incr from t
, lateral(
  select null n from dual
  connect by level <= (56 - t.num_units)/t.incr+1
);
insert into t values(100, 200, 41, 1);

select cat_id, item_id, num_units,
incr "Increment"
from t
where num_units > 40
order by cat_id, item_id;
1 1 50 1
2 2 49 1
3 3 50 2
4 4 49 2
5 5 48 2
6 6 50 3
7 7 49 3
8 8 48 3
9 9 47 3
10 10 50 4
11 11 49 4
12 12 48 4
13 13 47 4
14 14 46 4
15 15 50 5
16 16 49 5
17 17 48 5
18 18 47 5
19 19 46 5
20 20 45 5
21 21 50 6
22 22 49 6
23 23 48 6
24 24 47 6
25 25 46 6
26 26 45 6
27 27 44 6
100 200 41 1

Each category is a different test case. The first item in each category has a value between 44 and 50. The following items all have the same values, indicated in the “Increment” column. I add a last case at the end where I don’t reach the limit of 50.

Notice the use of the LATERAL clause, which is now documented in 12c. It’s a lot simpler than the old TABLE(CAST(MULTISET syntax. Now here is the solution; I only show the first bin in each category, plus the first item in the second bin. I’ve got nothing more to say, so don’t scroll to the bottom if you don’t feel like it :)

select cat_id, bin#, sum_units, num_units, item_id
from t
match_recognize(
  partition by cat_id
  order by item_id
  measures final sum(num_units) sum_units,
    match_number() bin#, count(*) cnt
  all rows per match
  pattern(A+)
  define A as sum(num_units) - num_units < 50
    and sum(num_units) <= 55
)
where bin# = 1 or cnt = 1;
1 1 50 50 1
1 2 7 1 101
2 1 50 49 2
2 1 50 1 108
2 2 7 1 109
3 1 50 50 3
3 2 8 2 116
4 1 51 49 4
4 1 51 2 120
4 2 6 2 121
5 1 50 48 5
5 1 50 2 124
5 2 8 2 125
6 1 50 50 6
6 2 9 3 129
7 1 52 49 7
7 1 52 3 132
7 2 6 3 133
8 1 51 48 8
8 1 51 3 135
8 2 6 3 136
9 1 50 47 9
9 1 50 3 138
9 2 9 3 139
10 1 50 50 10
10 2 8 4 142
11 1 53 49 11
11 1 53 4 144
11 2 4 4 145
12 1 52 48 12
12 1 52 4 146
12 2 8 4 147
13 1 51 47 13
13 1 51 4 149
13 2 8 4 150
14 1 50 46 14
14 1 50 4 152
14 2 8 4 153
15 1 50 50 15
15 2 10 5 155
16 1 54 49 16
16 1 54 5 157
16 2 5 5 158
17 1 53 48 17
17 1 53 5 159
17 2 5 5 160
18 1 52 47 18
18 1 52 5 161
18 2 5 5 162
19 1 51 46 19
19 1 51 5 163
19 2 10 5 164
20 1 50 45 20
20 1 50 5 166
20 2 10 5 167
21 1 50 50 21
21 2 12 6 169
22 1 55 49 22
22 1 55 6 171
22 2 6 6 172
23 1 54 48 23
23 1 54 6 173
23 2 6 6 174
24 1 53 47 24
24 1 53 6 175
24 2 6 6 176
25 1 52 46 25
25 1 52 6 177
25 2 6 6 178
26 1 51 45 26
26 1 51 6 179
26 2 6 6 180
27 1 50 44 27
27 1 50 6 181
27 2 12 6 182
100 1 41 41 200