Joining Temporal Tables 2: handling nulls

In my previous post, I showed a way to implement and join temporal tables when there are no gaps, overlaps or null values. Let’s see what happens when I allow null values.

[Update 2016-01-16: for performance reasons, now using UNION ALL instead of FULL JOIN.]

(This series of posts is inspired by a discussion with Richard Smith that started at UKOUG Tech 15. Thanks Richard!)

Test data

I’m going to recreate the same tables as in my previous post, except that A_STRING and B_STRING are null in the second row of each table.

drop table a purge;

create table a (
  case_id varchar2(32),
  from_date date,
  a_string varchar2(32) NULL,
  primary key(case_id, from_date)
);

insert into a
select 'Meets only, nulls' case_id,
  trunc(sysdate, 'MM') - 1 + level*level from_date,
  case when mod(level,2) = 1 then 'Row ' || level end a_string
from dual
connect by level <= 3;

select case_id, from_date,
lead(from_date) over (partition by case_id order by from_date) to_date,
a_string
from a;
CASE_ID FROM_DATE TO_DATE A_STRING
Meets only, nulls 2016-01-01 2016-01-04 Row 1
Meets only, nulls 2016-01-04 2016-01-09
Meets only, nulls 2016-01-09 Row 3
drop table b purge;

create table b (
  case_id varchar2(32),
  from_date date,
  b_string varchar2(32) NULL,
  primary key(case_id, from_date)
);

insert into b
select 'Meets only, nulls' case_id,
  trunc(sysdate, 'MM') - 1 + level*2 from_date,
  case when mod(level,2) = 1 then 'Row ' || level end b_string
from dual
connect by level <= 3;

select case_id, from_date,
lead(from_date) over (partition by case_id order by from_date) to_date,
b_string
from b;
CASE_ID FROM_DATE TO_DATE B_STRING
Meets only, nulls 2016-01-02 2016-01-04 Row 1
Meets only, nulls 2016-01-04 2016-01-06
Meets only, nulls 2016-01-06 Row 3

 

Joining with MATCH_RECOGNIZE

This solution doesn’t care about A_STRING or B_STRING being null, so it works just as well now as it did before.

select case_id, from_date, to_date, astr a_string, bstr b_string
from (
  select 'A' rowtype, case_id, from_date, a_string, null b_string from a
  union all
  select 'B', case_id, from_date, null, b_string from b
)
match_recognize(
  partition by case_id order by from_date
  measures next(from_date) to_date, a.a_string astr, b.b_string bstr
  all rows per match
  pattern ( (a|b)+ )
  define a as rowtype = 'A'
)
where to_date > from_date or to_date is null;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
Meets only, nulls 2016-01-01 2016-01-02 Row 1
Meets only, nulls 2016-01-02 2016-01-04 Row 1 Row 1
Meets only, nulls 2016-01-04 2016-01-06
Meets only, nulls 2016-01-06 2016-01-09 Row 3
Meets only, nulls 2016-01-09 Row 3 Row 3

 

Joining with Analytics

My analytic solution used the IGNORE NULLS option – and that’s exactly what I can’t do now!

Previously, a NULL in A_STRING just meant this was a B row. Now it can also mean an A row where A_STRING really is NULL.

Since NULL can mean two different things, how do I distinguish the two meanings? The only way I know is to keep NULL when it’s a B row and use something else to stand for a NULL value in an A row.

  • Before the analytic function, use NVL() to substitute some “impossible” value for NULL
    (see lines 15 and 18)
  • During the analytic function, use IGNORE NULLs as before, since NULL now has only one meaning
  • After the analytic function, use NULLIF() to turn the “impossible” value back to NULL
    (see lines 4 through 13).
select * from (
  select case_id, from_date,
    lead(from_date) over(partition by case_id order by from_date) to_date,
    nullif(
      last_value(a_string) ignore nulls
        over(partition by case_id order by from_date),
      chr(0)
    ) a_string,
    nullif(
      last_value(b_string) ignore nulls
        over(partition by case_id order by from_date),
      chr(0)
    ) b_string
  from (
    select case_id, from_date, nvl(a_string, chr(0)) a_string, null b_string
    from a
    union all
    select case_id, from_date, null, nvl(b_string, chr(0))
    from b
  )
)
where to_date > from_date or to_date is null
order by 1,2;

What’s next?

So far, so good: we can define temporal tables that allow neither gaps nor overlaps, and we can join two such tables using MATCH_RECOGNIZE – or analytic functions if the database version is less than 12c. The data can include null values when necessary.

What we cannot do is handle gaps, or even say that some object “no longer exists”. The latest row is always valid until the end of time.

In the next post, I’ll try to add support for gaps while preventing overlaps.

Leave a comment