Joining Temporal Tables 4: ranges

If you have temporal tables with date ranges, and you are sure those date ranges cannot overlap, then you can join them using the technique in my previous post – with a few important changes!

Here’s a sample table with date ranges:

create table a_with_ranges (
  case_id varchar2(32),
  from_date date,
  to_date date not null,
  a_string varchar2(32)
);

Insert into a_with_ranges values
  ('01-precedes', date '2016-01-01', date '2016-01-02','First value');
Insert into a_with_ranges values
  ('01-precedes', date '2016-01-03', date '2016-01-04','Second value');

select * from a_with_ranges order by 1,2,3;
CASE_ID FROM_DATE TO_DATE A_STRING
01-precedes 2016-01-01 2016-01-02 First value
01-precedes 2016-01-03 2016-01-04 Second value

 

For my previous technique to work, we need data that looks like this:

CASE_ID FROM_DATE START_GAP A_STRING
01-precedes 2016-01-01 0 First value
01-precedes 2016-01-02 1
01-precedes 2016-01-03 0 Second value
01-precedes 2016-01-04 1

 

Basically, every line in the table becomes two lines: the first line has the FROM_DATE value; the second has the TO_DATE value and it “starts a gap”.

We can easily get this result with a UNION ALL:

select case_id, from_date, 0 start_gap, a_string
from a_with_ranges
union all
select case_id, to_date, 1, null
from a_with_ranges
order by 1,2,3;

I don’t like this code, because it scans the table twice.

If we have at least version 11 of the Oracle database, we can scan the table just once and use UNPIVOT to create the extra line:

select case_id, from_date, start_gap, a_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    a_string, null a_string_b
  from a_with_ranges
)
unpivot(
  (from_date, start_gap, a_string)
  for col in (
    (from_date, start_gap, a_string),
    (to_date, start_gap_b, a_string_b)
  )
)
order by 1,2;

You can integrate this with my “join” techniques by making one complicated SELECT, or you can create a VIEW based on the above code and just use my previous technique as is.

Oops! Not quite…

You may remember that in my previous post, my tables did not always have two rows per “virtual range”. The same row could be the start of one range and the end of the preceding range. I could not have two rows with the same CASE_ID and FROM_DATE because of the primary key.

Here, when I double the rows, I may wind up with unnecessary rows where START_GAP = 1 , in which case there will be more than one row with the same CASE_ID and FROM_DATE. Here is an example:

delete from a_with_ranges;

Insert into a_with_ranges values
  ('02-meets', date '2016-01-01', date '2016-01-03','First value');
Insert into a_with_ranges values
  ('02-meets', date '2016-01-03', date '2016-01-04','Second value');

select case_id, from_date, start_gap, a_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    a_string, null a_string_b
  from a_with_ranges
)
unpivot(
  (from_date, start_gap, a_string)
  for col in (
    (from_date, start_gap, a_string),
    (to_date, start_gap_b, a_string_b)
  )
)
order by 1,2,3;
CASE_ID FROM_DATE START_GAP A_STRING
02-meets 2016-01-01 0 First value
02-meets 2016-01-03 0 Second value
02-meets 2016-01-03 1
02-meets 2016-01-04 1

 

Man, that is all messed up! Not only is row 3 unnecessary, but it’s telling us that the previous row stopped at the same time it started!

Fortunately, we don’t have to do any extra work to get rid of that row. All we need to do is adjust the order of the rows in the previous techniques: ORDER BY FROM_DATE, START_GAP DESCENDING. Now that extra row will end a range and the next row will start a range, whether the FROM_DATEs are the same or not.

The adjusted solutions

create or replace view va as
select case_id, from_date, start_gap, a_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    a_string, null a_string_b
  from a_with_ranges
)
unpivot(
  (from_date, start_gap, a_string)
  for col in (
    (from_date, start_gap, a_string),
    (to_date, start_gap_b, a_string_b)
  )
);

drop table b_with_ranges purge;

create table b_with_ranges as
select case_id,
min(from_date) from_date,
max(to_date) to_date,
'B value' b_string
from a_with_ranges
group by case_id;

create or replace view vb as
select case_id, from_date, start_gap, b_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    b_string, null b_string_b
  from b_with_ranges
)
unpivot(
  (from_date, start_gap, b_string)
  for col in (
    (from_date, start_gap, b_string),
    (to_date, start_gap_b, b_string_b)
  )
);

select * from vb;
CASE_ID FROM_DATE START_GAP B_STRING
02-meets 01 0 B value
02-meets 04 1
select case_id, from_date, to_date, a_str a_string, b_str b_string
from (
  select case_id, from_date, start_gap, start_gap agap, a_string,
    null bgap, null b_string
  from va
  union all
  select case_id, from_date, start_gap, null, null, start_gap bgap, b_string
  from vb
)
match_recognize(
  partition by case_id order by from_date, start_gap desc
  measures next(from_date) to_date,
    nvl(a.agap,1) + nvl(b.bgap,1) abgap,
    case a.agap when 0 then a.a_string end a_str,
    case b.bgap when 0 then b.b_string end b_str
  all rows per match
  pattern( (a|b)+ )
  define a as agap is not null
)
where (to_date > from_date or to_date is null)
  and abgap < 2; 
 with all_rows as (   select case_id, from_date, start_gap,     start_gap agap, nvl(a_string, chr(0)) a_string,     null bgap, null b_string   from va   union all   select case_id, from_date, start_gap,     null, null,     start_gap bgap, nvl(b_string, chr(0)) b_string   from vb ) , analyzed_rows as (   select case_id, from_date,     lead(from_date) over(partition by case_id order by from_date, start_gap desc) to_date,     last_value(agap) ignore nulls       over(partition by case_id order by from_date, start_gap desc) agap,     last_value(bgap) ignore nulls       over(partition by case_id order by from_date, start_gap desc) bgap,     last_value(a_string) ignore nulls       over(partition by case_id order by from_date, start_gap desc) a_string,     last_value(b_string) ignore nulls       over(partition by case_id order by from_date, start_gap desc) b_string   from all_rows ) select case_id, from_date, to_date,   case when agap = 0 then nullif(a_string, chr(0)) end a_string,   case when bgap = 0 then nullif(b_string, chr(0)) end b_string from analyzed_rows where (to_date > from_date or to_date is null)
  and nvl(agap,1) + nvl(bgap,1) < 2;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
02-meets 01 03 First value B value
02-meets 03 04 Second value B value

 

Conclusion

This series of posts has focused on temporal tables that should not allow overlaps. The easiest way to avoid overlaps is for each row to store a date, not a range. A flag says whether the date is the start of a range or the start of a gap. When there is no gap, the end of a range is simply the date in the following row.

Whether you use this design or store date ranges, the above solutions will allow you to “join” two temporal tables – as long as there really are no overlaps.

Advertisement

Joining Temporal Tables 3: Gaps

This is my third post about temporal tables that don’t allow overlaps. I’m going to add support for temporal gaps, which will also allow objects to stop existing at some point.

Test Data

I’m going to add a NOT NULL numeric field called START_GAP, which can be 0 or 1. When START_GAP = 0, the row contains data that is valid starting at FROM_DATE. When START_GAP = 1, then FROM_DATE is actually the end date of the previous row.

The data can now contain all the relations between time intervals that I listed previously.

alter session set nls_date_format='dd';
drop table a purge;

create table a (
  case_id varchar2(32),
  from_date date,
  start_gap number(1,0) not null check(start_gap in (0,1)),
  a_string varchar2(32),
  primary key(case_id, from_date)
);
Insert into A values ('01-precedes',to_date('2016-01-01','yyyy-mm-dd'),0,'A01-precedes');
Insert into A values ('01-precedes',to_date('2016-01-02','yyyy-mm-dd'),1,'A01-precedes');
Insert into A values ('02-meets',to_date('2016-01-01','yyyy-mm-dd'),0,'A02-meets');
Insert into A values ('02-meets',to_date('2016-01-02','yyyy-mm-dd'),1,'A02-meets');
Insert into A values ('03-overlaps',to_date('2016-01-01','yyyy-mm-dd'),0,'A03-overlaps');
Insert into A values ('03-overlaps',to_date('2016-01-03','yyyy-mm-dd'),1,'A03-overlaps');
Insert into A values ('04-finished by',to_date('2016-01-01','yyyy-mm-dd'),0,'A04-finished by');
Insert into A values ('04-finished by',to_date('2016-01-03','yyyy-mm-dd'),1,'A04-finished by');
Insert into A values ('05-contains',to_date('2016-01-01','yyyy-mm-dd'),0,'A05-contains');
Insert into A values ('05-contains',to_date('2016-01-04','yyyy-mm-dd'),1,'A05-contains');
Insert into A values ('06-starts',to_date('2016-01-01','yyyy-mm-dd'),0,'A06-starts');
Insert into A values ('06-starts',to_date('2016-01-02','yyyy-mm-dd'),1,'A06-starts');
Insert into A values ('07-equals',to_date('2016-01-01','yyyy-mm-dd'),0,'A07-equals');
Insert into A values ('07-equals',to_date('2016-01-02','yyyy-mm-dd'),1,'A07-equals');

drop table b purge;

create table b (
  case_id varchar2(32),
  from_date date,
  start_gap number(1,0) not null check(start_gap in (0,1)),
  b_string varchar2(32),
  primary key(case_id, from_date)
);

Insert into B values ('01-precedes',to_date('2016-01-03','yyyy-mm-dd'),0,'B01-precedes');
Insert into B values ('01-precedes',to_date('2016-01-04','yyyy-mm-dd'),1,'B01-precedes');
Insert into B values ('02-meets',to_date('2016-01-02','yyyy-mm-dd'),0,'B02-meets');
Insert into B values ('02-meets',to_date('2016-01-03','yyyy-mm-dd'),1,'B02-meets');
Insert into B values ('03-overlaps',to_date('2016-01-02','yyyy-mm-dd'),0,'B03-overlaps');
Insert into B values ('03-overlaps',to_date('2016-01-04','yyyy-mm-dd'),1,'B03-overlaps');
Insert into B values ('04-finished by',to_date('2016-01-02','yyyy-mm-dd'),0,'B04-finished by');
Insert into B values ('04-finished by',to_date('2016-01-03','yyyy-mm-dd'),1,'B04-finished by');
Insert into B values ('05-contains',to_date('2016-01-02','yyyy-mm-dd'),0,'B05-contains');
Insert into B values ('05-contains',to_date('2016-01-03','yyyy-mm-dd'),1,'B05-contains');
Insert into B values ('06-starts',to_date('2016-01-01','yyyy-mm-dd'),0,'B06-starts');
Insert into B values ('06-starts',to_date('2016-01-03','yyyy-mm-dd'),1,'B06-starts');
Insert into B values ('07-equals',to_date('2016-01-01','yyyy-mm-dd'),0,'B07-equals');
Insert into B values ('07-equals',to_date('2016-01-02','yyyy-mm-dd'),1,'B07-equals');
commit;

select * from a
union all
select * from b
order by 1,2,4;
CASE_ID FROM_DATE START_GAP A_STRING or B_STRING
01-precedes 01 0 A01-precedes
01-precedes 02 1 A01-precedes
01-precedes 03 0 B01-precedes
01-precedes 04 1 B01-precedes
02-meets 01 0 A02-meets
02-meets 02 1 A02-meets
02-meets 02 0 B02-meets
02-meets 03 1 B02-meets
03-overlaps 01 0 A03-overlaps
03-overlaps 02 0 B03-overlaps
03-overlaps 03 1 A03-overlaps
03-overlaps 04 1 B03-overlaps
04-finished by 01 0 A04-finished by
04-finished by 02 0 B04-finished by
04-finished by 03 1 A04-finished by
04-finished by 03 1 B04-finished by
05-contains 01 0 A05-contains
05-contains 02 0 B05-contains
05-contains 03 1 B05-contains
05-contains 04 1 A05-contains
06-starts 01 0 A06-starts
06-starts 01 0 B06-starts
06-starts 02 1 A06-starts
06-starts 03 1 B06-starts
07-equals 01 0 A07-equals
07-equals 01 0 B07-equals
07-equals 02 1 A07-equals
07-equals 02 1 B07-equals

 

What’s the idea?

As before, we need to carry down values and eliminate certain rows. How do we manage that for the new rows where START_GAP = 1?

  1. For table A, if the most recent START_GAP is 1 then A_STRING should be NULL. Same for B.
  2. If the most recent START_GAP is 1 for both tables, then that row should be eliminated.
  3. If the first row is from A, then B.START_GAP will be NULL. In that case B.START_GAP should be made = 1, since the B object did not exist at that point.
  4. As before, TO_DATE must be NULL or greater than FROM_DATE, otherwise the row is eliminated.

The MATCH_RECOGNIZE solution

Note that I don’t create a ROWTYPE column anymore: since START_GAP is defined as NOT NULL, I can use A.START_GAP to determine whether the current row comes from table A.

select case_id, from_date, to_date, a_str a_string, b_str b_string
from (
  select case_id, from_date, start_gap agap, a_string,
    null bgap, null b_string
  from a
  union all
  select case_id, from_date, null, null, start_gap bgap, b_string
  from b
)
match_recognize(
  partition by case_id order by from_date
  measures next(from_date) to_date,
    nvl(a.agap,1) + nvl(b.bgap,1) abgap,
    case a.agap when 0 then a.a_string end a_str,
    case b.bgap when 0 then b.b_string end b_str
  all rows per match
  pattern( (a|b)+ )
  define a as agap is not null
)
where (to_date > from_date or to_date is null)
  and abgap < 2;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
01-precedes 01 02 A01-precedes
01-precedes 03 04 B01-precedes
02-meets 01 02 A02-meets
02-meets 02 03 B02-meets
03-overlaps 01 02 A03-overlaps
03-overlaps 02 03 A03-overlaps B03-overlaps
03-overlaps 03 04 B03-overlaps
04-finished by 01 02 A04-finished by
04-finished by 02 03 A04-finished by B04-finished by
05-contains 01 02 A05-contains
05-contains 02 03 A05-contains B05-contains
05-contains 03 04 A05-contains
06-starts 01 02 A06-starts B06-starts
06-starts 02 03 B06-starts
07-equals 01 02 A07-equals B07-equals

 

The Analytic solution

This one took me some time to figure out. I broke it down into 3 logical steps using WITH clauses.

  • ALL_ROWS just does the UNION ALL, while transforming NULL values to CHR(0)
  • ANALYTIC_ROWS applies all the analytic functions
  • The main SELECT transforms CHR(0) to NULL, makes a value NULL if the most recent START_GAP = 1, and eliminates excess rows.
with all_rows as (
  select case_id, from_date, start_gap agap, nvl(a_string, chr(0)) a_string,
    null bgap, null b_string
  from a
  union all
  select case_id, from_date, null, null,
    start_gap bgap, nvl(b_string, chr(0)) b_string
  from b
)
, analyzed_rows as (
  select case_id, from_date,
    lead(from_date) over(partition by case_id order by from_date) to_date,
    last_value(agap) ignore nulls
      over(partition by case_id order by from_date) agap,
    last_value(bgap) ignore nulls
      over(partition by case_id order by from_date) bgap,
    last_value(a_string) ignore nulls
      over(partition by case_id order by from_date) a_string,
    last_value(b_string) ignore nulls
      over(partition by case_id order by from_date) b_string
  from all_rows
)
select case_id, from_date, to_date,
  case when agap = 0 then nullif(a_string, chr(0)) end a_string,
  case when bgap = 0 then nullif(b_string, chr(0)) end b_string
from analyzed_rows
where (to_date > from_date or to_date is null)
  and nvl(agap,1) + nvl(bgap,1) < 2;

There’s more?

Yes indeed, I’m not through yet.

At this point I can define temporal tables that allow gaps but not overlaps. The attributes can be NULL but FROM_DATE can’t. The TO_DATE is not stored in the same row, but it can be displayed that way. I can also “join” two such tables in an efficient manner.

I only have one problem: your tables probably aren’t built that way. You have FROM_DATE and TO_DATE in each row, and if you say you don’t have overlapping ranges then you want me to believe you.

In my next post I’ll try to “join” tables that store FROM_DATE and TO_DATE, assuming there are no overlaps.

Joining Temporal Tables 2: handling nulls

In my previous post, I showed a way to implement and join temporal tables when there are no gaps, overlaps or null values. Let’s see what happens when I allow null values.

[Update 2016-01-16: for performance reasons, now using UNION ALL instead of FULL JOIN.]

(This series of posts is inspired by a discussion with Richard Smith that started at UKOUG Tech 15. Thanks Richard!)

Test data

I’m going to recreate the same tables as in my previous post, except that A_STRING and B_STRING are null in the second row of each table.

drop table a purge;

create table a (
  case_id varchar2(32),
  from_date date,
  a_string varchar2(32) NULL,
  primary key(case_id, from_date)
);

insert into a
select 'Meets only, nulls' case_id,
  trunc(sysdate, 'MM') - 1 + level*level from_date,
  case when mod(level,2) = 1 then 'Row ' || level end a_string
from dual
connect by level <= 3;

select case_id, from_date,
lead(from_date) over (partition by case_id order by from_date) to_date,
a_string
from a;
CASE_ID FROM_DATE TO_DATE A_STRING
Meets only, nulls 2016-01-01 2016-01-04 Row 1
Meets only, nulls 2016-01-04 2016-01-09
Meets only, nulls 2016-01-09 Row 3
drop table b purge;

create table b (
  case_id varchar2(32),
  from_date date,
  b_string varchar2(32) NULL,
  primary key(case_id, from_date)
);

insert into b
select 'Meets only, nulls' case_id,
  trunc(sysdate, 'MM') - 1 + level*2 from_date,
  case when mod(level,2) = 1 then 'Row ' || level end b_string
from dual
connect by level <= 3;

select case_id, from_date,
lead(from_date) over (partition by case_id order by from_date) to_date,
b_string
from b;
CASE_ID FROM_DATE TO_DATE B_STRING
Meets only, nulls 2016-01-02 2016-01-04 Row 1
Meets only, nulls 2016-01-04 2016-01-06
Meets only, nulls 2016-01-06 Row 3

 

Joining with MATCH_RECOGNIZE

This solution doesn’t care about A_STRING or B_STRING being null, so it works just as well now as it did before.

select case_id, from_date, to_date, astr a_string, bstr b_string
from (
  select 'A' rowtype, case_id, from_date, a_string, null b_string from a
  union all
  select 'B', case_id, from_date, null, b_string from b
)
match_recognize(
  partition by case_id order by from_date
  measures next(from_date) to_date, a.a_string astr, b.b_string bstr
  all rows per match
  pattern ( (a|b)+ )
  define a as rowtype = 'A'
)
where to_date > from_date or to_date is null;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
Meets only, nulls 2016-01-01 2016-01-02 Row 1
Meets only, nulls 2016-01-02 2016-01-04 Row 1 Row 1
Meets only, nulls 2016-01-04 2016-01-06
Meets only, nulls 2016-01-06 2016-01-09 Row 3
Meets only, nulls 2016-01-09 Row 3 Row 3

 

Joining with Analytics

My analytic solution used the IGNORE NULLS option – and that’s exactly what I can’t do now!

Previously, a NULL in A_STRING just meant this was a B row. Now it can also mean an A row where A_STRING really is NULL.

Since NULL can mean two different things, how do I distinguish the two meanings? The only way I know is to keep NULL when it’s a B row and use something else to stand for a NULL value in an A row.

  • Before the analytic function, use NVL() to substitute some “impossible” value for NULL
    (see lines 15 and 18)
  • During the analytic function, use IGNORE NULLs as before, since NULL now has only one meaning
  • After the analytic function, use NULLIF() to turn the “impossible” value back to NULL
    (see lines 4 through 13).
select * from (
  select case_id, from_date,
    lead(from_date) over(partition by case_id order by from_date) to_date,
    nullif(
      last_value(a_string) ignore nulls
        over(partition by case_id order by from_date),
      chr(0)
    ) a_string,
    nullif(
      last_value(b_string) ignore nulls
        over(partition by case_id order by from_date),
      chr(0)
    ) b_string
  from (
    select case_id, from_date, nvl(a_string, chr(0)) a_string, null b_string
    from a
    union all
    select case_id, from_date, null, nvl(b_string, chr(0))
    from b
  )
)
where to_date > from_date or to_date is null
order by 1,2;

What’s next?

So far, so good: we can define temporal tables that allow neither gaps nor overlaps, and we can join two such tables using MATCH_RECOGNIZE – or analytic functions if the database version is less than 12c. The data can include null values when necessary.

What we cannot do is handle gaps, or even say that some object “no longer exists”. The latest row is always valid until the end of time.

In the next post, I’ll try to add support for gaps while preventing overlaps.

Joining Temporal Tables 1: virtual date ranges

In my presentation at UKOUG Tech 15, I explained that temporal tables with date ranges are very hard to manage. One way to simplify things is not to store date ranges at all! In this post, I’ll explain how to join two temporal tables when there are no gaps, no overlaps and no nulls in any column.

[Update 2016-01-14: I forgot to mention that this series of posts was inspired by a discussion with Richard Smith that started at UKOUG Tech 15. Thanks Richard!]

[Update 2016-01-15: replaced FULL JOIN USING() with UNION ALL for performance reasons, see below.]

Reminder: Allen’s time intervals

Dr. Allen’s Time Interval Algebra refers to “time intervals”. The SQL:2011 standard refers to “time periods”. I talk about “date ranges” simply because it is a more popular search term – forgive me! Anyway, for my purposes they all mean the same thing.

Here is a summary of the possible relations between two time intervals:

Time Interval Relationships

Although only the day of the month is shown, the boundaries are actually points in time, at exactly 12 A.M. each day. Notice that the start time is included in the interval, but the end time is not. This allows two intervals to “meet” at 12 A.M. without overlapping.

Temporal tables without gaps or overlaps

One of the hardest things about temporal tables is establishing constraints. Suppose you have a table with date ranges, FROM_DATE and TO_DATE:

  • You can say that FROM_DATE is NOT NULL
  • You can say that TO_DATE must be greater than FROM_DATE
  • But: how do you say that overlapping date ranges are not allowed? How do you say that gaps are not allowed?
    These are examples of “cross row” constraints that are very hard to implement successfully.

The simplest way to implement such constraints is to store FROM_DATE only, and calculate TO_DATE! That is what I mean by “virtual date ranges”.

Let’s take a very simple example to illustrate. Note that none of the columns can be NULL, that’s important.

drop table a purge;

create table a (
  case_id varchar2(32),
  from_date date,
  a_string varchar2(32) not null,
  primary key(case_id, from_date)
);

insert into a
select 'Meets only, no nulls' case_id,
  trunc(sysdate, 'MM') - 1 + level*level from_date,
  'Row ' || level a_string
from dual
connect by level <= 3;

select case_id, from_date,
lead(from_date) over (partition by case_id order by from_date) to_date,
a_string
from a;
CASE_ID FROM_DATE TO_DATE A_STRING
Meets only, no nulls 2016-01-01 2016-01-04 Row 1
Meets only, no nulls 2016-01-04 2016-01-09 Row 2
Meets only, no nulls 2016-01-09 Row 3

 

When two date ranges “meet”, the TO_DATE is always equal to the following FROM_DATE. So why store the TO_DATE? Just use the LEAD() function to get the TO_DATE from the following row. It’s so simple.

Here is another table with the same constraints as the first:

drop table b purge;

create table b (
  case_id varchar2(32),
  from_date date,
  b_string varchar2(32) not null,
  primary key(case_id, from_date)
);

insert into b
select 'Meets only, no nulls' case_id,
  trunc(sysdate, 'MM') - 1 + level*2 from_date,
  'Row ' || level b_string
from dual
connect by level <= 3;

select case_id, from_date,
lead(from_date) over (partition by case_id order by from_date) to_date,
b_string
from b;
CASE_ID FROM_DATE TO_DATE B_STRING
Meets only, no nulls 2016-01-02 2016-01-04 Row 1
Meets only, no nulls 2016-01-04 2016-01-06 Row 2
Meets only, no nulls 2016-01-06 Row 3

 

Joining gapless temporal tables with Analytics

[Update 2016-01-15: here is where I switch to UNION ALL.]

Before doing any analytic functions, I have to put the tables together somehow. I see two ways:

  • FULL JOIN on CASE_ID and FROM_DATE
  • UNION ALL

After testing, it seems that UNION ALL uses about half the CPU of the FULL JOIN.

select case_id, from_date, a_string, null b_string from a
union all
select case_id, from_date, null, b_string from b
order by 1,2;
CASE_ID FROM_DATE A_STRING B_STRING
Meets only, no nulls 2016-01-01 Row 1
Meets only, no nulls 2016-01-02 Row 1
Meets only, no nulls 2016-01-04 Row 2
Meets only, no nulls 2016-01-04 Row 2
Meets only, no nulls 2016-01-06 Row 3
Meets only, no nulls 2016-01-09 Row 3

 

This is not far from the answer we want. We need to “carry down” A_STRING and B_STRING values to the null columns in the rows below. Fortunately there is an analytic function that does just this: LAST_VALUE() with the IGNORE NULLS option. Once we do that, we just need to get rid of an extra row.

select * from (
  select case_id, from_date,
    lead(from_date) over(partition by case_id order by from_date) to_date,
    last_value(a_string)
      ignore nulls
      over(partition by case_id order by from_date)
    a_string,
    last_value(b_string)
      ignore nulls
      over(partition by case_id order by from_date)
    b_string
  from (
    select case_id, from_date, a_string, null b_string from a
    union all
    select case_id, from_date, null, b_string from b
  )
)
where to_date > from_date or to_date is null
order by 1,2;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
Meets only, no nulls 2016-01-01 2016-01-02 Row 1
Meets only, no nulls 2016-01-02 2016-01-04 Row 1 Row 1
Meets only, no nulls 2016-01-04 2016-01-06 Row 2 Row 2
Meets only, no nulls 2016-01-06 2016-01-09 Row 2 Row 3
Meets only, no nulls 2016-01-09 Row 3 Row 3

 

Joining gapless temporal tables with MATCH_RECOGNIZE

The MATCH_RECOGNIZE clause lets us assign labels to the A and B rows so we can put columns from two input rows into one output row.

select case_id, from_date, to_date, astr a_string, bstr b_string
from (
  select 'A' rowtype, case_id, from_date, a_string, null b_string from a
  union all
  select 'B', case_id, from_date, null, b_string from b
)
match_recognize(
  partition by case_id order by from_date
  measures next(from_date) to_date, a.a_string astr, b.b_string bstr
  all rows per match
  pattern ( (a|b)+ )
  define a as rowtype = 'A'
)
where to_date > from_date or to_date is null;
  • In the MEASURES clause, A.<column> always refers to the most recent A row and B.<column> always refers to the most recent B row.
  • The PATTERN looks for exactly one match, with as many consecutive rows as possible, and where every row is either A or B.
  • Since B is not defined explicitly, it defaults to “always true”. It’s like an invisible “else”.
  • Since I say ALL ROWS PER MATCH, all the input columns show up in the output of MATCH_RECOGNIZE. That is why I have to rename A_STRING as ASTR: otherwise I would get an “ambiguous column” exception.

The output is exactly the same as from the analytic solution.

Limitations

There are two big limitations to these solutions:

  1. The values A_STRING and B_STRING cannot be NULL, or else the IGNORE NULLS option will not work right;
  2. There is no way to say that a CASE_ID does not exist anymore! It always exists from the latest FROM_DATE until the end of time.

In the following posts, I’m going to overcome these limitations, but be warned: the solutions are going to get more complex.