If you have temporal tables with date ranges, and you are sure those date ranges cannot overlap, then you can join them using the technique in my previous post – with a few important changes!
Here’s a sample table with date ranges:
create table a_with_ranges ( case_id varchar2(32), from_date date, to_date date not null, a_string varchar2(32) ); Insert into a_with_ranges values ('01-precedes', date '2016-01-01', date '2016-01-02','First value'); Insert into a_with_ranges values ('01-precedes', date '2016-01-03', date '2016-01-04','Second value'); select * from a_with_ranges order by 1,2,3;
CASE_ID | FROM_DATE | TO_DATE | A_STRING |
---|---|---|---|
01-precedes | 2016-01-01 | 2016-01-02 | First value |
01-precedes | 2016-01-03 | 2016-01-04 | Second value |
For my previous technique to work, we need data that looks like this:
CASE_ID | FROM_DATE | START_GAP | A_STRING |
---|---|---|---|
01-precedes | 2016-01-01 | 0 | First value |
01-precedes | 2016-01-02 | 1 | |
01-precedes | 2016-01-03 | 0 | Second value |
01-precedes | 2016-01-04 | 1 |
Basically, every line in the table becomes two lines: the first line has the FROM_DATE value; the second has the TO_DATE value and it “starts a gap”.
We can easily get this result with a UNION ALL:
select case_id, from_date, 0 start_gap, a_string from a_with_ranges union all select case_id, to_date, 1, null from a_with_ranges order by 1,2,3;
I don’t like this code, because it scans the table twice.
If we have at least version 11 of the Oracle database, we can scan the table just once and use UNPIVOT to create the extra line:
select case_id, from_date, start_gap, a_string from ( select case_id, from_date, to_date, 0 start_gap, 1 start_gap_b, a_string, null a_string_b from a_with_ranges ) unpivot( (from_date, start_gap, a_string) for col in ( (from_date, start_gap, a_string), (to_date, start_gap_b, a_string_b) ) ) order by 1,2;
You can integrate this with my “join” techniques by making one complicated SELECT, or you can create a VIEW based on the above code and just use my previous technique as is.
Oops! Not quite…
You may remember that in my previous post, my tables did not always have two rows per “virtual range”. The same row could be the start of one range and the end of the preceding range. I could not have two rows with the same CASE_ID and FROM_DATE because of the primary key.
Here, when I double the rows, I may wind up with unnecessary rows where START_GAP = 1 , in which case there will be more than one row with the same CASE_ID and FROM_DATE. Here is an example:
delete from a_with_ranges; Insert into a_with_ranges values ('02-meets', date '2016-01-01', date '2016-01-03','First value'); Insert into a_with_ranges values ('02-meets', date '2016-01-03', date '2016-01-04','Second value'); select case_id, from_date, start_gap, a_string from ( select case_id, from_date, to_date, 0 start_gap, 1 start_gap_b, a_string, null a_string_b from a_with_ranges ) unpivot( (from_date, start_gap, a_string) for col in ( (from_date, start_gap, a_string), (to_date, start_gap_b, a_string_b) ) ) order by 1,2,3;
CASE_ID | FROM_DATE | START_GAP | A_STRING |
---|---|---|---|
02-meets | 2016-01-01 | 0 | First value |
02-meets | 2016-01-03 | 0 | Second value |
02-meets | 2016-01-03 | 1 | |
02-meets | 2016-01-04 | 1 |
Man, that is all messed up! Not only is row 3 unnecessary, but it’s telling us that the previous row stopped at the same time it started!
Fortunately, we don’t have to do any extra work to get rid of that row. All we need to do is adjust the order of the rows in the previous techniques: ORDER BY FROM_DATE, START_GAP DESCENDING. Now that extra row will end a range and the next row will start a range, whether the FROM_DATEs are the same or not.
The adjusted solutions
create or replace view va as select case_id, from_date, start_gap, a_string from ( select case_id, from_date, to_date, 0 start_gap, 1 start_gap_b, a_string, null a_string_b from a_with_ranges ) unpivot( (from_date, start_gap, a_string) for col in ( (from_date, start_gap, a_string), (to_date, start_gap_b, a_string_b) ) ); drop table b_with_ranges purge; create table b_with_ranges as select case_id, min(from_date) from_date, max(to_date) to_date, 'B value' b_string from a_with_ranges group by case_id; create or replace view vb as select case_id, from_date, start_gap, b_string from ( select case_id, from_date, to_date, 0 start_gap, 1 start_gap_b, b_string, null b_string_b from b_with_ranges ) unpivot( (from_date, start_gap, b_string) for col in ( (from_date, start_gap, b_string), (to_date, start_gap_b, b_string_b) ) ); select * from vb;
CASE_ID | FROM_DATE | START_GAP | B_STRING |
---|---|---|---|
02-meets | 01 | 0 | B value |
02-meets | 04 | 1 |
select case_id, from_date, to_date, a_str a_string, b_str b_string from ( select case_id, from_date, start_gap, start_gap agap, a_string, null bgap, null b_string from va union all select case_id, from_date, start_gap, null, null, start_gap bgap, b_string from vb ) match_recognize( partition by case_id order by from_date, start_gap desc measures next(from_date) to_date, nvl(a.agap,1) + nvl(b.bgap,1) abgap, case a.agap when 0 then a.a_string end a_str, case b.bgap when 0 then b.b_string end b_str all rows per match pattern( (a|b)+ ) define a as agap is not null ) where (to_date > from_date or to_date is null) and abgap < 2;
with all_rows as ( select case_id, from_date, start_gap, start_gap agap, nvl(a_string, chr(0)) a_string, null bgap, null b_string from va union all select case_id, from_date, start_gap, null, null, start_gap bgap, nvl(b_string, chr(0)) b_string from vb ) , analyzed_rows as ( select case_id, from_date, lead(from_date) over(partition by case_id order by from_date, start_gap desc) to_date, last_value(agap) ignore nulls over(partition by case_id order by from_date, start_gap desc) agap, last_value(bgap) ignore nulls over(partition by case_id order by from_date, start_gap desc) bgap, last_value(a_string) ignore nulls over(partition by case_id order by from_date, start_gap desc) a_string, last_value(b_string) ignore nulls over(partition by case_id order by from_date, start_gap desc) b_string from all_rows ) select case_id, from_date, to_date, case when agap = 0 then nullif(a_string, chr(0)) end a_string, case when bgap = 0 then nullif(b_string, chr(0)) end b_string from analyzed_rows where (to_date > from_date or to_date is null) and nvl(agap,1) + nvl(bgap,1) < 2;
CASE_ID | FROM_DATE | TO_DATE | A_STRING | B_STRING |
---|---|---|---|---|
02-meets | 01 | 03 | First value | B value |
02-meets | 03 | 04 | Second value | B value |
Conclusion
This series of posts has focused on temporal tables that should not allow overlaps. The easiest way to avoid overlaps is for each row to store a date, not a range. A flag says whether the date is the start of a range or the start of a gap. When there is no gap, the end of a range is simply the date in the following row.
Whether you use this design or store date ranges, the above solutions will allow you to “join” two temporal tables – as long as there really are no overlaps.