#DOAG2016: Ranges, Ranges Everywhere!

Today I did my second presentation at DOAG2016. It was at 9:00 so I got to sleep in ;)

The room was huge but there were enough people that I didn’t feel too lonely.

The room and the technical help were top notch, and again there were questions at just the right time to remind me of things I might have left out!

[Update 2016-11-20: there was a bug on slide 21 ! Instead of “order by end_n” it should be “order by start_n”. I have updated the slide on the DOAG website and on slideshare. My test cases were not thorough; my apologies…]

As promised, I put the presentation on Slideshare. Here is the link:

I will be blogging about some of the content later on, so stay tuned…

As always, please download the slides and play them in Powerpoint so you can see the animations.

Thanks to DOAG for the invite!

Joining Temporal Tables 3: Gaps

This is my third post about temporal tables that don’t allow overlaps. I’m going to add support for temporal gaps, which will also allow objects to stop existing at some point.

Test Data

I’m going to add a NOT NULL numeric field called START_GAP, which can be 0 or 1. When START_GAP = 0, the row contains data that is valid starting at FROM_DATE. When START_GAP = 1, then FROM_DATE is actually the end date of the previous row.

The data can now contain all the relations between time intervals that I listed previously.

alter session set nls_date_format='dd';
drop table a purge;

create table a (
  case_id varchar2(32),
  from_date date,
  start_gap number(1,0) not null check(start_gap in (0,1)),
  a_string varchar2(32),
  primary key(case_id, from_date)
);
Insert into A values ('01-precedes',to_date('2016-01-01','yyyy-mm-dd'),0,'A01-precedes');
Insert into A values ('01-precedes',to_date('2016-01-02','yyyy-mm-dd'),1,'A01-precedes');
Insert into A values ('02-meets',to_date('2016-01-01','yyyy-mm-dd'),0,'A02-meets');
Insert into A values ('02-meets',to_date('2016-01-02','yyyy-mm-dd'),1,'A02-meets');
Insert into A values ('03-overlaps',to_date('2016-01-01','yyyy-mm-dd'),0,'A03-overlaps');
Insert into A values ('03-overlaps',to_date('2016-01-03','yyyy-mm-dd'),1,'A03-overlaps');
Insert into A values ('04-finished by',to_date('2016-01-01','yyyy-mm-dd'),0,'A04-finished by');
Insert into A values ('04-finished by',to_date('2016-01-03','yyyy-mm-dd'),1,'A04-finished by');
Insert into A values ('05-contains',to_date('2016-01-01','yyyy-mm-dd'),0,'A05-contains');
Insert into A values ('05-contains',to_date('2016-01-04','yyyy-mm-dd'),1,'A05-contains');
Insert into A values ('06-starts',to_date('2016-01-01','yyyy-mm-dd'),0,'A06-starts');
Insert into A values ('06-starts',to_date('2016-01-02','yyyy-mm-dd'),1,'A06-starts');
Insert into A values ('07-equals',to_date('2016-01-01','yyyy-mm-dd'),0,'A07-equals');
Insert into A values ('07-equals',to_date('2016-01-02','yyyy-mm-dd'),1,'A07-equals');

drop table b purge;

create table b (
  case_id varchar2(32),
  from_date date,
  start_gap number(1,0) not null check(start_gap in (0,1)),
  b_string varchar2(32),
  primary key(case_id, from_date)
);

Insert into B values ('01-precedes',to_date('2016-01-03','yyyy-mm-dd'),0,'B01-precedes');
Insert into B values ('01-precedes',to_date('2016-01-04','yyyy-mm-dd'),1,'B01-precedes');
Insert into B values ('02-meets',to_date('2016-01-02','yyyy-mm-dd'),0,'B02-meets');
Insert into B values ('02-meets',to_date('2016-01-03','yyyy-mm-dd'),1,'B02-meets');
Insert into B values ('03-overlaps',to_date('2016-01-02','yyyy-mm-dd'),0,'B03-overlaps');
Insert into B values ('03-overlaps',to_date('2016-01-04','yyyy-mm-dd'),1,'B03-overlaps');
Insert into B values ('04-finished by',to_date('2016-01-02','yyyy-mm-dd'),0,'B04-finished by');
Insert into B values ('04-finished by',to_date('2016-01-03','yyyy-mm-dd'),1,'B04-finished by');
Insert into B values ('05-contains',to_date('2016-01-02','yyyy-mm-dd'),0,'B05-contains');
Insert into B values ('05-contains',to_date('2016-01-03','yyyy-mm-dd'),1,'B05-contains');
Insert into B values ('06-starts',to_date('2016-01-01','yyyy-mm-dd'),0,'B06-starts');
Insert into B values ('06-starts',to_date('2016-01-03','yyyy-mm-dd'),1,'B06-starts');
Insert into B values ('07-equals',to_date('2016-01-01','yyyy-mm-dd'),0,'B07-equals');
Insert into B values ('07-equals',to_date('2016-01-02','yyyy-mm-dd'),1,'B07-equals');
commit;

select * from a
union all
select * from b
order by 1,2,4;
CASE_ID FROM_DATE START_GAP A_STRING or B_STRING
01-precedes 01 0 A01-precedes
01-precedes 02 1 A01-precedes
01-precedes 03 0 B01-precedes
01-precedes 04 1 B01-precedes
02-meets 01 0 A02-meets
02-meets 02 1 A02-meets
02-meets 02 0 B02-meets
02-meets 03 1 B02-meets
03-overlaps 01 0 A03-overlaps
03-overlaps 02 0 B03-overlaps
03-overlaps 03 1 A03-overlaps
03-overlaps 04 1 B03-overlaps
04-finished by 01 0 A04-finished by
04-finished by 02 0 B04-finished by
04-finished by 03 1 A04-finished by
04-finished by 03 1 B04-finished by
05-contains 01 0 A05-contains
05-contains 02 0 B05-contains
05-contains 03 1 B05-contains
05-contains 04 1 A05-contains
06-starts 01 0 A06-starts
06-starts 01 0 B06-starts
06-starts 02 1 A06-starts
06-starts 03 1 B06-starts
07-equals 01 0 A07-equals
07-equals 01 0 B07-equals
07-equals 02 1 A07-equals
07-equals 02 1 B07-equals

 

What’s the idea?

As before, we need to carry down values and eliminate certain rows. How do we manage that for the new rows where START_GAP = 1?

  1. For table A, if the most recent START_GAP is 1 then A_STRING should be NULL. Same for B.
  2. If the most recent START_GAP is 1 for both tables, then that row should be eliminated.
  3. If the first row is from A, then B.START_GAP will be NULL. In that case B.START_GAP should be made = 1, since the B object did not exist at that point.
  4. As before, TO_DATE must be NULL or greater than FROM_DATE, otherwise the row is eliminated.

The MATCH_RECOGNIZE solution

Note that I don’t create a ROWTYPE column anymore: since START_GAP is defined as NOT NULL, I can use A.START_GAP to determine whether the current row comes from table A.

select case_id, from_date, to_date, a_str a_string, b_str b_string
from (
  select case_id, from_date, start_gap agap, a_string,
    null bgap, null b_string
  from a
  union all
  select case_id, from_date, null, null, start_gap bgap, b_string
  from b
)
match_recognize(
  partition by case_id order by from_date
  measures next(from_date) to_date,
    nvl(a.agap,1) + nvl(b.bgap,1) abgap,
    case a.agap when 0 then a.a_string end a_str,
    case b.bgap when 0 then b.b_string end b_str
  all rows per match
  pattern( (a|b)+ )
  define a as agap is not null
)
where (to_date > from_date or to_date is null)
  and abgap < 2;
CASE_ID FROM_DATE TO_DATE A_STRING B_STRING
01-precedes 01 02 A01-precedes
01-precedes 03 04 B01-precedes
02-meets 01 02 A02-meets
02-meets 02 03 B02-meets
03-overlaps 01 02 A03-overlaps
03-overlaps 02 03 A03-overlaps B03-overlaps
03-overlaps 03 04 B03-overlaps
04-finished by 01 02 A04-finished by
04-finished by 02 03 A04-finished by B04-finished by
05-contains 01 02 A05-contains
05-contains 02 03 A05-contains B05-contains
05-contains 03 04 A05-contains
06-starts 01 02 A06-starts B06-starts
06-starts 02 03 B06-starts
07-equals 01 02 A07-equals B07-equals

 

The Analytic solution

This one took me some time to figure out. I broke it down into 3 logical steps using WITH clauses.

  • ALL_ROWS just does the UNION ALL, while transforming NULL values to CHR(0)
  • ANALYTIC_ROWS applies all the analytic functions
  • The main SELECT transforms CHR(0) to NULL, makes a value NULL if the most recent START_GAP = 1, and eliminates excess rows.
with all_rows as (
  select case_id, from_date, start_gap agap, nvl(a_string, chr(0)) a_string,
    null bgap, null b_string
  from a
  union all
  select case_id, from_date, null, null,
    start_gap bgap, nvl(b_string, chr(0)) b_string
  from b
)
, analyzed_rows as (
  select case_id, from_date,
    lead(from_date) over(partition by case_id order by from_date) to_date,
    last_value(agap) ignore nulls
      over(partition by case_id order by from_date) agap,
    last_value(bgap) ignore nulls
      over(partition by case_id order by from_date) bgap,
    last_value(a_string) ignore nulls
      over(partition by case_id order by from_date) a_string,
    last_value(b_string) ignore nulls
      over(partition by case_id order by from_date) b_string
  from all_rows
)
select case_id, from_date, to_date,
  case when agap = 0 then nullif(a_string, chr(0)) end a_string,
  case when bgap = 0 then nullif(b_string, chr(0)) end b_string
from analyzed_rows
where (to_date > from_date or to_date is null)
  and nvl(agap,1) + nvl(bgap,1) < 2;

There’s more?

Yes indeed, I’m not through yet.

At this point I can define temporal tables that allow gaps but not overlaps. The attributes can be NULL but FROM_DATE can’t. The TO_DATE is not stored in the same row, but it can be displayed that way. I can also “join” two such tables in an efficient manner.

I only have one problem: your tables probably aren’t built that way. You have FROM_DATE and TO_DATE in each row, and if you say you don’t have overlapping ranges then you want me to believe you.

In my next post I’ll try to “join” tables that store FROM_DATE and TO_DATE, assuming there are no overlaps.

MATCH_RECOGNIZE: gaps in date ranges

To find gaps between date ranges, there is an elegant solution using analytic functions. We can follow the same logic with MATCH_RECOGNIZE, using a technique that avoids the problems with MIN() and MAX() comparisons I just wrote about.

Here are some simple test data

create table t ( start_date date, end_date date );
Insert into T values (DATE '2007-01-01', DATE '2007-01-15');
Insert into T values (DATE '2007-01-03', DATE '2007-01-10');
Insert into T values (DATE '2007-01-12', DATE '2007-01-25');
Insert into T values (DATE '2007-01-20', DATE '2007-02-01');
Insert into T values (DATE '2007-02-05', DATE '2007-02-10');
Insert into T values (DATE '2007-02-05', DATE '2007-02-28');
Insert into T values (DATE '2007-02-10', DATE '2007-02-15');
Insert into T values (DATE '2007-03-01', DATE '2007-03-02');
Insert into T values (DATE '2007-03-03', DATE '2007-03-16');

Please note that I use exclusive end dates: the second to last range ends when March 2d starts, so there should be a gap from March 2d to March 3d.

Using analytics, I compare the maximum end date so far to the next start date:

SELECT * FROM (
  SELECT MAX(end_date) OVER (ORDER BY start_date) start_gap,
  LEAD(start_date) OVER (ORDER BY start_date) end_gap
  FROM T
)
WHERE start_gap < end_gap;
START_GAP END_GAP
2007-02-01 2007-02-05
2007-02-28 2007-03-01
2007-03-02 2007-03-03

That is so simple – once you think of it.

Looking for a MATCH_RECOGNIZE solution, I tried to compare the current start date to the maximum prior end date. I ran into the problem I wrote about previously: when you use MAX() or other aggregates in a comparison, the current row is included.

I finally realized that if I compared the maximum end date to the next start date, the problem disappeared.

select start_gap, end_gap from t
match_recognize(
  order by start_date
  measures max(end_date) start_gap, next(start_date) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(end_date) < next(start_date)
);

For those less familiar with the syntax

  • Every row that has a gap with the following row gets assigned to “A”, and all the other rows get assigned to “B”.
  • Normally, all rows per match will output every row, but {-b-} excludes the “B” rows from the output.

This code is logically equivalent to the analytic solution above. It also demonstrates how to avoid the knotty problem of how to exclude the current row from aggregates:

  1. Instead of comparing the aggregate to the current row, just back up a row.
  2. Now the aggregate contains exactly the rows you want.
  3. So compare that aggregate to the value of the row you care about, which is the “next” row.

Merging Overlapping Date Ranges with MATCH_RECOGNIZE

I forgot to add a MATCH_RECOGNIZE solution to my last post on merging overlapping date ranges! That should take me just a few minutes, right? Wrong: it’s not that easy and here’s why.

For test data, please refer to the previous post.

To Merge or not to Merge?

The idea is to merge date ranges if they meet or overlap either partially or completely – in other words, we keep merging as long as there is not a gap. How do we know whether or not a gap exists?

  • We sort by start date, then end date.
  • The current row should be merged if its start date is less than or equal to
    the most recent end date, up to but not including the current row.
  • Again, the secret is to compare the most recent end date, not just the end date on the previous line.

The Analytics solution

With analytics, the “windowing clause” lets us do exactly what we want: when we say rows between unbounded preceding and 1 preceding then Oracle will apply the max() function to all the preceding rows, up to but not including the current row. We simply compare the current start date with that maximum preceding end date and we know whether or not to merge.

The MATCH_RECOGNIZE difficulty

MATCH_RECOGNIZE, available starting in version 12c, is pretty powerful and does things analytics can’t do, but it doesn’t really have a windowing clause. When doing comparisons, this can be a problem.

Let’s try a simple test:

select * from t 
match_recognize( 
  partition by test_case 
  order by start_date, end_date 
  measures match_number() match, max(end_date) max_date 
  all rows per match
  pattern(a+) 
  define a as start_date <= max(a.end_date)
)
where rownum <= 2; 

Here I’m comparing the start_date with the maximum date already assigned to A – no I’m not! That is not the way the DEFINE clause works. It assumes the current row belongs to A, then does the test based on that assumption. As a result, the end date from the current row is included in the computation. Look at the result:

TEST_CASE START_DATE END_DATE MATCH MAX_DATE
01:precedes 2015-06-01 2015-06-02 1 2015-06-02
01:precedes 2015-06-03 2015-06-04 1 2015-06-04

 

The “match number” is the same, which means the second line will be merged with the first – and all because the start date is being compared to the wrong end date.

The Workaround

MATCH_RECOGNIZE does give you access to other rows than the current row, using PREV, NEXT, FIRST and LAST. In this case, I could say LAST(end_date,1) and get the end date from the preceding matched line – or I could just say PREV(end_date) for the same result.

So how can I be sure that the preceding end date is the most recent? MATCH_RECOGNIZE doesn’t let you use analytic functions around LAST or PREV. I’ll just have to change the ORDER BY and sort by end date first. When I do, it finally works:

select * from t 
match_recognize( 
  partition by test_case 
  order by end_date, start_date 
  measures min(start_date) start_date, max(end_date) end_date 
  pattern(a b*) 
  define b as start_date <= last(end_date,1)
); 
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

 
By the way, in my previous post I could have sorted by end date too and just used LAG(end_date) instead of the windowing clause.

Merging Overlapping Date Ranges

A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.

Define your terms

As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.

Day of month: 1 2 3 4
1: precedes 1 2
3 4
2: meets 1 2
2 3
3: overlaps 1 3
2 4
4: finished by 1 3
2 3
5: contains 1 4
2 3
6: starts 1 2
1 3
7: equals 1 2
1 2
8: started by 1 3
1 2
9: during 2 3
1 4
10: finishes 2 3
1 3
11: overlapped by 2 4
1 3
12: met by 2 3
1 2
13: preceded by 3 4
1 2

 

As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.

To test this, I’ll create a little table with 13 test cases:

drop table t purge;
create table t (
  test_case varchar2(32) not null,
  start_date date not null,
  end_date date not null
);
Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD'));
commit;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-02
02:meets 2015-06-02 2015-06-03
03:overlaps 2015-06-01 2015-06-03
03:overlaps 2015-06-02 2015-06-04
04:finished by 2015-06-01 2015-06-03
04:finished by 2015-06-02 2015-06-03
05:contains 2015-06-01 2015-06-04
05:contains 2015-06-02 2015-06-03
06:starts 2015-06-01 2015-06-02
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
08:started by 2015-06-01 2015-06-02
09:during 2015-06-02 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-02 2015-06-03
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-02 2015-06-04
11:overlapped by 2015-06-01 2015-06-03
12:met by 2015-06-02 2015-06-03
12:met by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04
13:preceded by 2015-06-01 2015-06-02

The solution

I wrote about the “start of group” method some time ago.

  • The first step is to assign 1 to each row that starts a group and 0 to every other row.
    This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date.
  • The next step is to assign a group to each row using a running sum of the 1s and 0s.
  • Finally, I do the group by.
with grp_starts as (
  select test_case, start_date, end_date,
  case
    when start_date > max(end_date) over(
      partition by test_case order by start_date, end_date
      rows between unbounded preceding and 1 preceding
    )
    then 1 else 0
  end grp_start
  from t
)
, grps as (
  select test_case, start_date, end_date,
  sum(grp_start) over(
    partition by test_case order by start_date, end_date
  ) grp
  from grp_starts
)
select test_case,
min(start_date) start_date,
max(end_date) end_date
from grps
group by test_case, grp
order by 1, 2;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

Conclusion

Once again, the “start of group” method demonstrates its power, combined with a “window clause” in the analytic function that lets us identify the starting row of our group. Thanks again to Karthick_App for noticing that my first solution was incomplete.

And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE ‘9999-12-31’ for the NULL at the beginning of the query, then translate it back to NULL at the end.

Join Tables on Date Ranges

A recent question on the OTN SQL forum asked how best to join two tables related by ID and date range, in order to insert one row per date into a data warehouse. One solution was to expand the data from each table, creating one row per date, then join on date. I think it’s more efficient to join on date range, then expand. Here’s how.

Input Tables and Requirement

I am going to simplify a bit from the OTN setup.

The first table contains an ID, a start date and an end date. (I recommend to always use exclusive end dates.) The final output should have one row for every date within the date range.

create table table_1 (
  id number primary key,
  start_date date not null check (start_date = trunc(start_date)),
  end_date date not null check (end_date = trunc(end_date)),
  check (end_date > start_date)
);
TABLE_1 Output
ID START_DATE END_DATE EACH_DATE
1 2015-03-01 2015-03-05 2015-03-01
2015-03-02
2015-03-03
2015-03-04

 

The second table contains the ID and a “review date” that assigns a “group ID”. In the output, each row should contain the data from the most recent review – or null if there was no prior review.

create table table_2 (
  id number,
  review_date date not null check (review_date = trunc(review_date)),
  group_id varchar2(20) not null,
  primary key (id, review_date)
);
TABLE_1 Output TABLE_2
ID START_DATE END_DATE EACH_DATE REVIEW_DATE GROUP_ID ID REVIEW_DATE GROUP_ID
1 2015-03-01 2015-03-05 2015-03-01 (null) (null)
2015-03-02 2015-03-02 First 1 2015-03-02 First
2015-03-03 2015-03-02 First
2015-03-04 2015-03-04 Second 1 2015-03-04 Second

 

Note: the most recent review is used even if that review took place before the start date in table 1.

The algorithm: join, then expand

If you look at the pretty table just above, you might think that I need to generate the green rows, then join each table to them. That means joining three sources. When I tell you that each ID can have a date range of several months, then you see that the extra work and memory required are not negligible.

So let’s try to join the two tables directly, based on intersecting date ranges:

  1. TABLE_1 already has a start date and an end date.
  2. TABLE_2 has a review date, which starts a range.
    1. I use the LEAD() analytic function to get the next review date, which is the end of the range.
    2. To identify the first row for each ID, I use the LAG() analytic function to get the previous review date (which will be NULL on the first row). We’ll see why.
  3. I join the tables:
    when the review date is earlier than the end date,
    and the next review date is either null or later than the start date.
    It’s a LEFT JOIN since I produce rows whether the ID is in TABLE_2 or not.
  4. Now I have to create a new date range that represents just the intersection.
    1. The beginning is the later of start date and review date – except if the previous review date is null.
      In that case this is the first row for an ID, so the beginning is the start date.
    2. The end is the earlier of end date and next review date.
  5. Finally I expand the rows
    1. When expanding the first row, I may find a start date that is earlier than the review date (see 4A above).
      For days earlier than the review date, I output a row with a null review date and a null group id.

Test data

Allen’s Interval Algebra defines 13 possible relations between date ranges.

Day of month: 1 2 3 4
1: precedes 1 2
3 4
2: meets 1 2
2 3
3: overlaps 1 3
2 4
4: finished by 1 3
2 3
5: contains 1 4
2 3
6: starts 1 2
1 3
7: equals 1 2
1 2
8: started by 1 3
1 2
9: during 2 3
1 4
10: finishes 2 3
1 3
11: overlapped by 2 4
1 3
12: met by 2 3
1 2
13: preceded by 3 4
1 2

In my test data, ID 0 has no review date. IDs 1 through 13 have one review date, and IDs 14 through 26 have two. The review dates correspond to the start and end dates for the green rows in the table.

alter session set nls_date_format='DD';

insert into table_1 select
0, to_date(1),to_date(2) from dual union all select
1, to_date(1),to_date(2) from dual union all select
2, to_date(1),to_date(2) from dual union all select
3, to_date(1),to_date(3) from dual union all select
4, to_date(1),to_date(3) from dual union all select
5, to_date(1),to_date(4) from dual union all select
6, to_date(1),to_date(2) from dual union all select
7, to_date(1),to_date(2) from dual union all select
8, to_date(1),to_date(3) from dual union all select
9, to_date(2),to_date(3) from dual union all select
10, to_date(2),to_date(3) from dual union all select
11, to_date(2),to_date(4) from dual union all select
12, to_date(2),to_date(3) from dual union all select
13, to_date(3),to_date(4) from dual;

insert into table_1
select id+13, start_date, end_date
from table_1
where id > 0;

insert into table_2 select
1, to_date(3),'precedes' from dual union all select
2, to_date(2),'meets' from dual union all select
3, to_date(2),'overlaps' from dual union all select
4, to_date(2),'finished by' from dual union all select
5, to_date(2),'contains' from dual union all select
6, to_date(1),'starts' from dual union all select
7, to_date(1),'equals' from dual union all select
8, to_date(1),'started by' from dual union all select
9, to_date(1),'during' from dual union all select
10, to_date(1),'finishes' from dual union all select
11, to_date(1),'overlapped by' from dual union all select
12, to_date(1),'met by' from dual union all select
13, to_date(1),'preceded by' from dual;

insert into table_2
select id+13, review_date, group_id from table_2;

insert into table_2
select id,
review_date +
case when group_id = 'during' then 3
  when group_id in ('overlaps','starts','finishes','overlapped by') then 2
  else 1
end,
'after ' || group_id
from table_2
where id > 13;
commit;

alter session set nls_date_format='yyyy-mm-dd';

The solution (finally!)

select id, column_value each_date, start_date, end_date, 
  case when column_value >= review_date then review_date end review_date, 
  case when column_value >= review_date then group_id end group_id
from (
  select a.id, start_date, end_date, review_date, group_id,
  case when prev_rd is null
    then start_date
    else greatest(start_date, review_date)
  end range_start,
  case when end_rd is null
    then end_date
    else least(end_date, end_rd)
  end range_end
  from table_1 a
  left join (
    select review_date, group_id, id,
    lead(review_date) over (partition by id order by review_date) end_rd,
    lag(review_date) over (partition by id order by review_date) prev_rd
    from table_2
  ) b
  on a.id = b.id
    and start_date < coalesce(end_rd, end_date)
    and end_date > review_date
) 
,
table(cast(multiset(
  select range_start-1+level from dual
  connect by range_start-1+level < range_end
) as sys.odcidatelist))
order by id, each_date;
  • Lines 16-19: get previous and next review dates from table_2
  • Lines 21-23: left join on ID, and end date greater than review date, and start date less than next review date. If the next review date is null, we are on the last review date so we go all the way to the end date.
  • Lines 6-13: calculate the date range for the joined row.
  • Lines 2-3: leave the review date and group id columns NULL if EACH_DATE is earlier than the first review date.
ID EACH_DATE START_DATE END_DATE REVIEW_DATE GROUP_ID
0 2015-03-01 2015-03-01 2015-03-02
1 2015-03-01 2015-03-01 2015-03-02
2 2015-03-01 2015-03-01 2015-03-02
3 2015-03-01 2015-03-01 2015-03-03
3 2015-03-02 2015-03-01 2015-03-03 2015-03-02 overlaps
4 2015-03-01 2015-03-01 2015-03-03
4 2015-03-02 2015-03-01 2015-03-03 2015-03-02 finished by
5 2015-03-01 2015-03-01 2015-03-04
5 2015-03-02 2015-03-01 2015-03-04 2015-03-02 contains
5 2015-03-03 2015-03-01 2015-03-04 2015-03-02 contains
6 2015-03-01 2015-03-01 2015-03-02 2015-03-01 starts
7 2015-03-01 2015-03-01 2015-03-02 2015-03-01 equals
8 2015-03-01 2015-03-01 2015-03-03 2015-03-01 started by
8 2015-03-02 2015-03-01 2015-03-03 2015-03-01 started by
9 2015-03-02 2015-03-02 2015-03-03 2015-03-01 during
10 2015-03-02 2015-03-02 2015-03-03 2015-03-01 finishes
11 2015-03-02 2015-03-02 2015-03-04 2015-03-01 overlapped by
11 2015-03-03 2015-03-02 2015-03-04 2015-03-01 overlapped by
12 2015-03-02 2015-03-02 2015-03-03 2015-03-01 met by
13 2015-03-03 2015-03-03 2015-03-04 2015-03-01 preceded by
14 2015-03-01 2015-03-01 2015-03-02
15 2015-03-01 2015-03-01 2015-03-02
16 2015-03-01 2015-03-01 2015-03-03
16 2015-03-02 2015-03-01 2015-03-03 2015-03-02 overlaps
17 2015-03-01 2015-03-01 2015-03-03
17 2015-03-02 2015-03-01 2015-03-03 2015-03-02 finished by
18 2015-03-01 2015-03-01 2015-03-04
18 2015-03-02 2015-03-01 2015-03-04 2015-03-02 contains
18 2015-03-03 2015-03-01 2015-03-04 2015-03-03 after contains
19 2015-03-01 2015-03-01 2015-03-02 2015-03-01 starts
20 2015-03-01 2015-03-01 2015-03-02 2015-03-01 equals
21 2015-03-01 2015-03-01 2015-03-03 2015-03-01 started by
21 2015-03-02 2015-03-01 2015-03-03 2015-03-02 after started by
22 2015-03-02 2015-03-02 2015-03-03 2015-03-01 during
23 2015-03-02 2015-03-02 2015-03-03 2015-03-01 finishes
24 2015-03-02 2015-03-02 2015-03-04 2015-03-01 overlapped by
24 2015-03-03 2015-03-02 2015-03-04 2015-03-03 after overlapped by
25 2015-03-02 2015-03-02 2015-03-03 2015-03-02 after met by
26 2015-03-03 2015-03-03 2015-03-04 2015-03-02 after preceded by