Joining Temporal Tables 4: ranges

If you have temporal tables with date ranges, and you are sure those date ranges cannot overlap, then you can join them using the technique in my previous post – with a few important changes!

Here’s a sample table with date ranges:

create table a_with_ranges (
  case_id varchar2(32),
  from_date date,
  to_date date not null,
  a_string varchar2(32)
);

Insert into a_with_ranges values
  ('01-precedes', date '2016-01-01', date '2016-01-02','First value');
Insert into a_with_ranges values
  ('01-precedes', date '2016-01-03', date '2016-01-04','Second value');

select * from a_with_ranges order by 1,2,3;
CASE_ID FROM_DATE TO_DATE A_STRING
01-precedes 2016-01-01 2016-01-02 First value
01-precedes 2016-01-03 2016-01-04 Second value

 

For my previous technique to work, we need data that looks like this:

CASE_ID FROM_DATE START_GAP A_STRING
01-precedes 2016-01-01 0 First value
01-precedes 2016-01-02 1
01-precedes 2016-01-03 0 Second value
01-precedes 2016-01-04 1

 

Basically, every line in the table becomes two lines: the first line has the FROM_DATE value; the second has the TO_DATE value and it “starts a gap”.

We can easily get this result with a UNION ALL:

select case_id, from_date, 0 start_gap, a_string
from a_with_ranges
union all
select case_id, to_date, 1, null
from a_with_ranges
order by 1,2,3;

I don’t like this code, because it scans the table twice.

If we have at least version 11 of the Oracle database, we can scan the table just once and use UNPIVOT to create the extra line:

select case_id, from_date, start_gap, a_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    a_string, null a_string_b
  from a_with_ranges
)
unpivot(
  (from_date, start_gap, a_string)
  for col in (
    (from_date, start_gap, a_string),
    (to_date, start_gap_b, a_string_b)
  )
)
order by 1,2;

You can integrate this with my “join” techniques by making one complicated SELECT, or you can create a VIEW based on the above code and just use my previous technique as is.

Oops! Not quite…

You may remember that in my previous post, my tables did not always have two rows per “virtual range”. The same row could be the start of one range and the end of the preceding range. I could not have two rows with the same CASE_ID and FROM_DATE because of the primary key.

Here, when I double the rows, I may wind up with unnecessary rows where START_GAP = 1 , in which case there will be more than one row with the same CASE_ID and FROM_DATE. Here is an example:

delete from a_with_ranges;

Insert into a_with_ranges values
  ('02-meets', date '2016-01-01', date '2016-01-03','First value');
Insert into a_with_ranges values
  ('02-meets', date '2016-01-03', date '2016-01-04','Second value');

select case_id, from_date, start_gap, a_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    a_string, null a_string_b
  from a_with_ranges
)
unpivot(
  (from_date, start_gap, a_string)
  for col in (
    (from_date, start_gap, a_string),
    (to_date, start_gap_b, a_string_b)
  )
)
order by 1,2,3;
CASE_ID FROM_DATE START_GAP A_STRING
02-meets 2016-01-01 0 First value
02-meets 2016-01-03 0 Second value
02-meets 2016-01-03 1
02-meets 2016-01-04 1

 

Man, that is all messed up! Not only is row 3 unnecessary, but it’s telling us that the previous row stopped at the same time it started!

Fortunately, we don’t have to do any extra work to get rid of that row. All we need to do is adjust the order of the rows in the previous techniques: ORDER BY FROM_DATE, START_GAP DESCENDING. Now that extra row will end a range and the next row will start a range, whether the FROM_DATEs are the same or not.

The adjusted solutions

create or replace view va as
select case_id, from_date, start_gap, a_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    a_string, null a_string_b
  from a_with_ranges
)
unpivot(
  (from_date, start_gap, a_string)
  for col in (
    (from_date, start_gap, a_string),
    (to_date, start_gap_b, a_string_b)
  )
);

drop table b_with_ranges purge;

create table b_with_ranges as
select case_id,
min(from_date) from_date,
max(to_date) to_date,
'B value' b_string
from a_with_ranges
group by case_id;

create or replace view vb as
select case_id, from_date, start_gap, b_string
from (
  select case_id, from_date, to_date,
    0 start_gap, 1 start_gap_b,
    b_string, null b_string_b
  from b_with_ranges
)
unpivot(
  (from_date, start_gap, b_string)
  for col in (
    (from_date, start_gap, b_string),
    (to_date, start_gap_b, b_string_b)
  )
);

select * from vb;
CASE_ID FROM_DATE START_GAP B_STRING
02-meets 01 0 B value
02-meets 04 1
select case_id, from_date, to_date, a_str a_string, b_str b_string
from (
  select case_id, from_date, start_gap, start_gap agap, a_string,
    null bgap, null b_string
  from va
  union all
  select case_id, from_date, start_gap, null, null, start_gap bgap, b_string
  from vb
)
match_recognize(
  partition by case_id order by from_date, start_gap desc
  measures next(from_date) to_date,
    nvl(a.agap,1) + nvl(b.bgap,1) abgap,
    case a.agap when 0 then a.a_string end a_str,
    case b.bgap when 0 then b.b_string end b_str
  all rows per match
  pattern( (a|b)+ )
  define a as agap is not null
)
where (to_date > from_date or to_date is null)
  and abgap < 2; 
 with all_rows as (   select case_id, from_date, start_gap,     start_gap agap, nvl(a_string, chr(0)) a_string,     null bgap, null b_string   from va   union all   select case_id, from_date, start_gap,     null, null,     start_gap bgap, nvl(b_string, chr(0)) b_string   from vb ) , analyzed_rows as (   select case_id, from_date,     lead(from_date) over(partition by case_id order by from_date, start_gap desc) to_date,     last_value(agap) ignore nulls       over(partition by case_id order by from_date, start_gap desc) agap,     last_value(bgap) ignore nulls       over(partition by case_id order by from_date, start_gap desc) bgap,     last_value(a_string) ignore nulls       over(partition by case_id order by from_date, start_gap desc) a_string,     last_value(b_string) ignore nulls       over(partition by case_id order by from_date, start_gap desc) b_string   from all_rows ) select case_id, from_date, to_date,   case when agap = 0 then nullif(a_string, chr(0)) end a_string,   case when bgap = 0 then nullif(b_string, chr(0)) end b_string from analyzed_rows where (to_date > from_date or to_date is null)
  and nvl(agap,1) + nvl(bgap,1) < 2;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
02-meets 01 03 First value B value
02-meets 03 04 Second value B value

 

Conclusion

This series of posts has focused on temporal tables that should not allow overlaps. The easiest way to avoid overlaps is for each row to store a date, not a range. A flag says whether the date is the start of a range or the start of a gap. When there is no gap, the end of a range is simply the date in the following row.

Whether you use this design or store date ranges, the above solutions will allow you to “join” two temporal tables – as long as there really are no overlaps.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s