#DOAG2016: Ranges, Ranges Everywhere!

Today I did my second presentation at DOAG2016. It was at 9:00 so I got to sleep in ;)

The room was huge but there were enough people that I didn’t feel too lonely.

The room and the technical help were top notch, and again there were questions at just the right time to remind me of things I might have left out!

[Update 2016-11-20: there was a bug on slide 21 ! Instead of “order by end_n” it should be “order by start_n”. I have updated the slide on the DOAG website and on slideshare. My test cases were not thorough; my apologies…]

As promised, I put the presentation on Slideshare. Here is the link:

I will be blogging about some of the content later on, so stay tuned…

As always, please download the slides and play them in Powerpoint so you can see the animations.

Thanks to DOAG for the invite!

#DOAG2016: Advanced Row Pattern Matching

DOAG2016 started today at 8:30, and I did too. There were so many great presentations at the same time as mine, I was surprised and pleased to get a nice audience.

The room and the technical help were top notch, and the questions came at just the right time to remind me of things I might have left out!

As promised, I put the presentation on Slideshare. Here is the link:

http://www.slideshare.net/StewAshton1/advanced-row-pattern-matching

I will be blogging about some of the content later on, so stay tuned…

If you want to start with an “unadvanced” presentation, go here first:

http://www.slideshare.net/StewAshton1/row-pattern-matching-in-oracle-database-12c

As always, please download the slides and play them in Powerpoint so you can see the animations.

Thanks to DOAG for the invite!

Kim Berg Hansen on “Use Cases of Row Pattern Matching in Oracle 12c”

As I write this, I am listening to Kim Berg Hansen explain the MATCH_RECOGNIZE clause. He was kind enough to give me credit for some of the examples and mention this blog.

In addition to my blog posts on the subject, you may enjoy my presentation on SlideShare. Please download it to see the animations!

If you have questions arising from the presentation, please add a comment here.

Bravo to Kim for his remarkably interactive webinar!

Joining Temporal Tables 3: Gaps

This is my third post about temporal tables that don’t allow overlaps. I’m going to add support for temporal gaps, which will also allow objects to stop existing at some point.

Test Data

I’m going to add a NOT NULL numeric field called START_GAP, which can be 0 or 1. When START_GAP = 0, the row contains data that is valid starting at FROM_DATE. When START_GAP = 1, then FROM_DATE is actually the end date of the previous row.

The data can now contain all the relations between time intervals that I listed previously.

alter session set nls_date_format='dd';
drop table a purge;

create table a (
  case_id varchar2(32),
  from_date date,
  start_gap number(1,0) not null check(start_gap in (0,1)),
  a_string varchar2(32),
  primary key(case_id, from_date)
);
Insert into A values ('01-precedes',to_date('2016-01-01','yyyy-mm-dd'),0,'A01-precedes');
Insert into A values ('01-precedes',to_date('2016-01-02','yyyy-mm-dd'),1,'A01-precedes');
Insert into A values ('02-meets',to_date('2016-01-01','yyyy-mm-dd'),0,'A02-meets');
Insert into A values ('02-meets',to_date('2016-01-02','yyyy-mm-dd'),1,'A02-meets');
Insert into A values ('03-overlaps',to_date('2016-01-01','yyyy-mm-dd'),0,'A03-overlaps');
Insert into A values ('03-overlaps',to_date('2016-01-03','yyyy-mm-dd'),1,'A03-overlaps');
Insert into A values ('04-finished by',to_date('2016-01-01','yyyy-mm-dd'),0,'A04-finished by');
Insert into A values ('04-finished by',to_date('2016-01-03','yyyy-mm-dd'),1,'A04-finished by');
Insert into A values ('05-contains',to_date('2016-01-01','yyyy-mm-dd'),0,'A05-contains');
Insert into A values ('05-contains',to_date('2016-01-04','yyyy-mm-dd'),1,'A05-contains');
Insert into A values ('06-starts',to_date('2016-01-01','yyyy-mm-dd'),0,'A06-starts');
Insert into A values ('06-starts',to_date('2016-01-02','yyyy-mm-dd'),1,'A06-starts');
Insert into A values ('07-equals',to_date('2016-01-01','yyyy-mm-dd'),0,'A07-equals');
Insert into A values ('07-equals',to_date('2016-01-02','yyyy-mm-dd'),1,'A07-equals');

drop table b purge;

create table b (
  case_id varchar2(32),
  from_date date,
  start_gap number(1,0) not null check(start_gap in (0,1)),
  b_string varchar2(32),
  primary key(case_id, from_date)
);

Insert into B values ('01-precedes',to_date('2016-01-03','yyyy-mm-dd'),0,'B01-precedes');
Insert into B values ('01-precedes',to_date('2016-01-04','yyyy-mm-dd'),1,'B01-precedes');
Insert into B values ('02-meets',to_date('2016-01-02','yyyy-mm-dd'),0,'B02-meets');
Insert into B values ('02-meets',to_date('2016-01-03','yyyy-mm-dd'),1,'B02-meets');
Insert into B values ('03-overlaps',to_date('2016-01-02','yyyy-mm-dd'),0,'B03-overlaps');
Insert into B values ('03-overlaps',to_date('2016-01-04','yyyy-mm-dd'),1,'B03-overlaps');
Insert into B values ('04-finished by',to_date('2016-01-02','yyyy-mm-dd'),0,'B04-finished by');
Insert into B values ('04-finished by',to_date('2016-01-03','yyyy-mm-dd'),1,'B04-finished by');
Insert into B values ('05-contains',to_date('2016-01-02','yyyy-mm-dd'),0,'B05-contains');
Insert into B values ('05-contains',to_date('2016-01-03','yyyy-mm-dd'),1,'B05-contains');
Insert into B values ('06-starts',to_date('2016-01-01','yyyy-mm-dd'),0,'B06-starts');
Insert into B values ('06-starts',to_date('2016-01-03','yyyy-mm-dd'),1,'B06-starts');
Insert into B values ('07-equals',to_date('2016-01-01','yyyy-mm-dd'),0,'B07-equals');
Insert into B values ('07-equals',to_date('2016-01-02','yyyy-mm-dd'),1,'B07-equals');
commit;

select * from a
union all
select * from b
order by 1,2,4;
CASE_ID FROM_DATE START_GAP A_STRING or B_STRING
01-precedes 01 0 A01-precedes
01-precedes 02 1 A01-precedes
01-precedes 03 0 B01-precedes
01-precedes 04 1 B01-precedes
02-meets 01 0 A02-meets
02-meets 02 1 A02-meets
02-meets 02 0 B02-meets
02-meets 03 1 B02-meets
03-overlaps 01 0 A03-overlaps
03-overlaps 02 0 B03-overlaps
03-overlaps 03 1 A03-overlaps
03-overlaps 04 1 B03-overlaps
04-finished by 01 0 A04-finished by
04-finished by 02 0 B04-finished by
04-finished by 03 1 A04-finished by
04-finished by 03 1 B04-finished by
05-contains 01 0 A05-contains
05-contains 02 0 B05-contains
05-contains 03 1 B05-contains
05-contains 04 1 A05-contains
06-starts 01 0 A06-starts
06-starts 01 0 B06-starts
06-starts 02 1 A06-starts
06-starts 03 1 B06-starts
07-equals 01 0 A07-equals
07-equals 01 0 B07-equals
07-equals 02 1 A07-equals
07-equals 02 1 B07-equals

 

What’s the idea?

As before, we need to carry down values and eliminate certain rows. How do we manage that for the new rows where START_GAP = 1?

  1. For table A, if the most recent START_GAP is 1 then A_STRING should be NULL. Same for B.
  2. If the most recent START_GAP is 1 for both tables, then that row should be eliminated.
  3. If the first row is from A, then B.START_GAP will be NULL. In that case B.START_GAP should be made = 1, since the B object did not exist at that point.
  4. As before, TO_DATE must be NULL or greater than FROM_DATE, otherwise the row is eliminated.

The MATCH_RECOGNIZE solution

Note that I don’t create a ROWTYPE column anymore: since START_GAP is defined as NOT NULL, I can use A.START_GAP to determine whether the current row comes from table A.

select case_id, from_date, to_date, a_str a_string, b_str b_string
from (
  select case_id, from_date, start_gap agap, a_string,
    null bgap, null b_string
  from a
  union all
  select case_id, from_date, null, null, start_gap bgap, b_string
  from b
)
match_recognize(
  partition by case_id order by from_date
  measures next(from_date) to_date,
    nvl(a.agap,1) + nvl(b.bgap,1) abgap,
    case a.agap when 0 then a.a_string end a_str,
    case b.bgap when 0 then b.b_string end b_str
  all rows per match
  pattern( (a|b)+ )
  define a as agap is not null
)
where (to_date > from_date or to_date is null)
  and abgap < 2;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
01-precedes 01 02 A01-precedes
01-precedes 03 04 B01-precedes
02-meets 01 02 A02-meets
02-meets 02 03 B02-meets
03-overlaps 01 02 A03-overlaps
03-overlaps 02 03 A03-overlaps B03-overlaps
03-overlaps 03 04 B03-overlaps
04-finished by 01 02 A04-finished by
04-finished by 02 03 A04-finished by B04-finished by
05-contains 01 02 A05-contains
05-contains 02 03 A05-contains B05-contains
05-contains 03 04 A05-contains
06-starts 01 02 A06-starts B06-starts
06-starts 02 03 B06-starts
07-equals 01 02 A07-equals B07-equals

 

The Analytic solution

This one took me some time to figure out. I broke it down into 3 logical steps using WITH clauses.

  • ALL_ROWS just does the UNION ALL, while transforming NULL values to CHR(0)
  • ANALYTIC_ROWS applies all the analytic functions
  • The main SELECT transforms CHR(0) to NULL, makes a value NULL if the most recent START_GAP = 1, and eliminates excess rows.
with all_rows as (
  select case_id, from_date, start_gap agap, nvl(a_string, chr(0)) a_string,
    null bgap, null b_string
  from a
  union all
  select case_id, from_date, null, null,
    start_gap bgap, nvl(b_string, chr(0)) b_string
  from b
)
, analyzed_rows as (
  select case_id, from_date,
    lead(from_date) over(partition by case_id order by from_date) to_date,
    last_value(agap) ignore nulls
      over(partition by case_id order by from_date) agap,
    last_value(bgap) ignore nulls
      over(partition by case_id order by from_date) bgap,
    last_value(a_string) ignore nulls
      over(partition by case_id order by from_date) a_string,
    last_value(b_string) ignore nulls
      over(partition by case_id order by from_date) b_string
  from all_rows
)
select case_id, from_date, to_date,
  case when agap = 0 then nullif(a_string, chr(0)) end a_string,
  case when bgap = 0 then nullif(b_string, chr(0)) end b_string
from analyzed_rows
where (to_date > from_date or to_date is null)
  and nvl(agap,1) + nvl(bgap,1) < 2;

There’s more?

Yes indeed, I’m not through yet.

At this point I can define temporal tables that allow gaps but not overlaps. The attributes can be NULL but FROM_DATE can’t. The TO_DATE is not stored in the same row, but it can be displayed that way. I can also “join” two such tables in an efficient manner.

I only have one problem: your tables probably aren’t built that way. You have FROM_DATE and TO_DATE in each row, and if you say you don’t have overlapping ranges then you want me to believe you.

In my next post I’ll try to “join” tables that store FROM_DATE and TO_DATE, assuming there are no overlaps.

Summarize Data by Range

At the UKOUG Tech15 conference, Jonathan Lewis presented a question asked on OTN and the answer published on his blog. The problem is how to count the number of rows in one table that fall into the ranges stored in another table.

The straightforward solution in SQL is to join the two tables using a BETWEEN condition, but that involves comparing each row in one table to every row in the other table. With the questioner’s data, this solution takes seven hours.

Jonathan’s solution requires more than one SQL statement, but it can answer the question in a few minutes. Please refer to his blog for details about the problem statement and his solution.

If you have database version 12c or later, the MATCH_RECOGNIZE clause can help solve this problem directly. I’ll present two variants:

  • the first assumes that the ranges cannot overlap,
  • and the second handles any ranges, including overlaps.

Ranges without overlaps

Here are some test data using Jonathan’s table and column names:

define d_sqrt_rows = 10

create table msisdns as
with data as (
  select null from dual
  connect by level <= &d_sqrt_rows
)
select rownum msisdn
from data, data
where rownum < power(&d_sqrt_rows,2);

create table number_ranges as
select level + &d_sqrt_rows * (level - 1) from_number,
&d_sqrt_rows * level to_number
from dual
connect by level <= &d_sqrt_rows;

select * from number_ranges;
FROM_NUMBER TO_NUMBER
1 10
12 20
23 30
34 40
45 50
56 60
67 70
78 80
89 90
100 100

 

I start by combining the ranges and the individual rows in one dataset:

select from_number, to_number from number_ranges
union all
select msisdn, null from msisdns

Using the MATCH_RECOGNIZE clause, I can:

  • Order the data by from_number
  • then process the data in that order:
    • Assign the label “A” to the first range
    • Assign the label “B” to all the following individual rows that fall within the range
    • Return one row with the range from “A” and the count of “B” rows
    • Then go on to the next range, skipping any individual rows that fall after the first range and before the second.
select * from (
  select from_number, to_number from number_ranges
  union all
  select msisdn, null from msisdns
)
match_recognize(
  order by from_number, to_number 
  measures a.from_number from_number,
           a.to_number to_number,
           count(b.*) range_count
  pattern(a b*)
  define a as to_number is not null,
         b as from_number <= a.to_number
);
FROM_NUMBER TO_NUMBER RANGE_COUNT
1 10 10
12 20 9
23 30 8
34 40 7
45 50 6
56 60 5
67 70 4
78 80 3
89 90 2
100 100 0

 

Yes, the last RANGE_COUNT should be 0 because I only loaded the first 99 individual rows.

Ranges with overlaps

To solve this, I need to use MATCH_RECOGNIZE twice! This is fun for me, though it may not be for you…

First, I’ll update the number_range table to create overlaps:

update number_ranges set to_number = to_number+10;

select * from number_ranges;
FROM_NUMBER TO_NUMBER
1 20
12 30
23 40
34 50
45 60
56 70
67 80
78 90
89 100
100 110

 

Overlapping ranges require us to count some individual rows more than once, if they show up in more than one range. One way to handle this is to break down the ranges into what I call “base ranges” every time there is an intersection. For example, for ranges 1-4 and 3-6:

  • the “base ranges” are 1-2, 3-4 and 5-6
  • and the rows in range 3-4 must be counted both in range 1-4 and in range 3-6.

I identify the “base ranges” by putting both “from” and “to” numbers in the first column so I can sort them together:

select from_number, 0 rowtype from number_ranges
union all
select msisdn, 1 from msisdns
union all
select to_number, 2 rowtype from number_ranges

Once I order by from_number and rowtype, all the individual rows will be sandwiched between “from” or “to” numbers having a rowtype other than 1. So, I just return one row per “sandwich”. Notice that after each match I start from the last row of the previous match (“after match skip to last a”): the bottom slice of one sandwich becomes the top slice of the next sandwich.

with ranges_plus_data as (
  select from_number num, 0 rowtype from number_ranges
  union all
  select msisdn, 1 from msisdns
  union all
  select to_number, 2 rowtype from number_ranges
)
select * from ranges_plus_data
match_recognize(
  order by num, rowtype 
  measures first(num) from_number,
           last(num) to_number,
           count(b.*) range_count
  after match skip to last a
  pattern(a b+ a)
  define a as rowtype !=1,
         b as rowtype = 1
);
FROM_NUMBER TO_NUMBER RANGE_COUNT
1 12 11
12 20 9
20 23 2
23 30 8
30 34 3
34 40 7
40 45 4
45 50 6
50 56 5
56 60 5
60 67 6
67 70 4
70 78 7
78 80 3
80 89 8
89 90 2
90 100 9

 

Now I need to match these base ranges with each original range.

  • I combine original ranges and base ranges, then sort them.
  • Then I start from each original range and sum all the range_counts of the included base ranges.
  • To make sure I don’t skip an original range, I start looking for the next match one row after the beginning of the previous match (“after match skip to next row”).
with ranges_plus_data as (
  select from_number num, 0 rowtype from number_ranges
  union all
  select msisdn, 1 from msisdns
  union all
  select to_number, 2 rowtype from number_ranges
)
, base_range_sums as (
  select * from ranges_plus_data
  match_recognize(
    order by num, rowtype 
    measures first(num) from_number,
             last(num) to_number,
             count(b.*) range_count
    after match skip to last a
    pattern(a b+ a)
    define a as rowtype !=1,
           b as rowtype = 1
  )
)
select from_number, to_number, range_count
from (
  select * from base_range_sums
  union all
  select from_number, to_number, 0 from number_ranges
)
match_recognize(
  order by from_number, range_count, to_number
  measures first(from_number) from_number,
           first(to_number) to_number,
           sum(range_count) range_count
  after match skip to next row
  pattern(a (a|b)*)
  define a as range_count = 0,
         b as to_number <= first(to_number)
);
FROM_NUMBER TO_NUMBER RANGE_COUNT
1 20 20
12 30 19
23 40 18
34 50 17
45 60 16
56 70 15
67 80 14
78 90 13
89 100 11
100 110 0

With 40 million rows and 1000 ranges, Jonathan’s solution ran in less than two minutes. This solution runs in a little over a minute; it is not necessarily “better” but it is a reasonable alternative for a shop with 12c and a SQL developer who is familiar with MATCH_RECOGNIZE.