Merging Overlapping Date Ranges

A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.

Define your terms

As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.

Day of month: 1 2 3 4
1: precedes 1 2
3 4
2: meets 1 2
2 3
3: overlaps 1 3
2 4
4: finished by 1 3
2 3
5: contains 1 4
2 3
6: starts 1 2
1 3
7: equals 1 2
1 2
8: started by 1 3
1 2
9: during 2 3
1 4
10: finishes 2 3
1 3
11: overlapped by 2 4
1 3
12: met by 2 3
1 2
13: preceded by 3 4
1 2

 

As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.

To test this, I’ll create a little table with 13 test cases:

drop table t purge;
create table t (
  test_case varchar2(32) not null,
  start_date date not null,
  end_date date not null
);
Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD'));
commit;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-02
02:meets 2015-06-02 2015-06-03
03:overlaps 2015-06-01 2015-06-03
03:overlaps 2015-06-02 2015-06-04
04:finished by 2015-06-01 2015-06-03
04:finished by 2015-06-02 2015-06-03
05:contains 2015-06-01 2015-06-04
05:contains 2015-06-02 2015-06-03
06:starts 2015-06-01 2015-06-02
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
08:started by 2015-06-01 2015-06-02
09:during 2015-06-02 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-02 2015-06-03
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-02 2015-06-04
11:overlapped by 2015-06-01 2015-06-03
12:met by 2015-06-02 2015-06-03
12:met by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04
13:preceded by 2015-06-01 2015-06-02

The solution

I wrote about the “start of group” method some time ago.

  • The first step is to assign 1 to each row that starts a group and 0 to every other row.
    This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date.
  • The next step is to assign a group to each row using a running sum of the 1s and 0s.
  • Finally, I do the group by.
with grp_starts as (
  select test_case, start_date, end_date,
  case
    when start_date > max(end_date) over(
      partition by test_case order by start_date, end_date
      rows between unbounded preceding and 1 preceding
    )
    then 1 else 0
  end grp_start
  from t
)
, grps as (
  select test_case, start_date, end_date,
  sum(grp_start) over(
    partition by test_case order by start_date, end_date
  ) grp
  from grp_starts
)
select test_case,
min(start_date) start_date,
max(end_date) end_date
from grps
group by test_case, grp
order by 1, 2;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

Conclusion

Once again, the “start of group” method demonstrates its power, combined with a “window clause” in the analytic function that lets us identify the starting row of our group. Thanks again to Karthick_App for noticing that my first solution was incomplete.

And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE ‘9999-12-31’ for the NULL at the beginning of the query, then translate it back to NULL at the end.

Advertisements

12 thoughts on “Merging Overlapping Date Ranges

  1. Just another solution with “model” clause: it simply adds the beginning and ending of group for every row. And we can merge groups(or simply add distinct) later if it is needed:

    select 
       test_case
      ,start_date
      ,end_date
      ,aa as group_start
      ,bb as group_end
    from t
    model
         partition by (test_case)
         dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
         measures (start_date, end_date, start_date aa,end_date bb)
         rules sequential order(
           bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
          ,aa[any] = case when aa[cv()] < bb[cv()-1] then aa[cv()-1] else aa[cv()] end
          ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
         )
    order by 1,2
    
    • Hi Sayan,

      Thanks for your contribution. You don’t get the same result as I do for test case “2:meet” and test case “12:met by”. You are probably thinking inclusive end dates, whereas I recommend and use exclusive end dates.

      When I have the choice, I prefer the “Start of Group” method over the MODEL clause because it’s somewhat more efficient and more widely understood. Still, your solution is valid and illustrates my favorite use of the MODEL clause, with “dimension by (row_number()…”

      • Hi Stew,

        It’s quite easy to change to exclusive end dates: we have to change just one condition: aa[cv()] < bb[cv()-1] to aa[cv()] <= bb[cv()-1]
        select
        test_case
        ,start_date
        ,end_date
        ,aa as group_start
        ,bb as group_end
        from t
        model
        partition by (test_case)
        dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
        measures (start_date, end_date, start_date aa,end_date bb)
        rules sequential order(
        bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
        ,aa[any] = case when aa[cv()] <= bb[cv()-1] then aa[cv()-1] else aa[cv()] end
        ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
        )
        order by 1,2

        By the way, we also discussed another similar problem with discrete dates(discretization unit = 1 day), we also needed to merge ranges
        01.jan.2001 – 02.jan.2002 and 03.jan.2002 – 05.jan.2002 into one 01.jan.2001 – 05.jan.2002:

        http://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=810924&msg=9904299

  2. Hi Stew,
    This question is about a year late but hope you can still assist.

    How would you revise your query if you only have the following data points: member_id, start_date, end_date with the assumption that there are no null values in either date fields and that any of the 13 cases can occur for any member?

    Given the example:
    Member ID Start Date End Date
    A 3/3/2015 3/10/2015
    A 2/25/2015 3/10/2015
    A 2/24/2015 3/10/2015
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    A 10/29/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/6/2015
    C 12/11/2014 2/10/2015

    The desired results are:
    Member ID Start Date End Date
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/10/2015

    Truly appreciate the help on this.

    — whiners

  3. Hi Stew,
    Would you be able to revise your query to transform this:
    Member ID Start Date End Date
    A 3/3/2015 3/10/2015
    A 2/25/2015 3/10/2015
    A 2/24/2015 3/10/2015
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    A 10/29/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/6/2015
    C 12/11/2014 2/10/2015

    into this:
    Member ID Start Date End Date
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/10/2015

    Thanks so much!

    – whiners

    • Hi Whiners,

      Thanks for reading and commenting. I took my code, did a global change from TEST_CASE to MEMBER_ID and it worked.

      What did you try and why didn’t it work?

      If you still have problems, post details: CREATE TABLE and INSERT statements, your attempt at a solution and the exact result of your test.

      Best regards, Stew

      • Hey Stew,

        I apologize for the double post and delayed feedback.
        Your query worked perfectly for the dataset and requirement that I have.

        Truly appreciated your expert insight in this endeavor.

  4. Hi Stew,

    I have similar requirement. When there is NO GAP between the end date of first record and start date of second, then the records should roll-up into one record. When there is a gap then it should print as it is..

    Similar to your test case 1 but the output should be rolled up..

    id start date end date
    12310 1/1/2015 12/31/2015
    12310 3/1/2016 12/31/2199
    12345 1/1/2015 3/31/2015
    12345 4/1/2015 12/31/2199
    12347 1/1/2015 4/30/2015
    12347 5/1/2015 12/31/2015
    12347 1/1/2016 12/31/2199

    expected result

    12310 1/1/2015 12/31/2015
    12310 3/1/2016 12/31/2199
    12345 1/1/2015 12/31/2199
    12347 1/1/2015 12/31/2199

    Thanks in advance.

    • Hi Sam,

      A few questions for you:
      – What do you want to happen when there are overlaps?
      – What version of the Oracle Database do you have?
      – Where are the CREATE TABLE and INSERT statements that would let us share test data?
      – What have you tried and what were the results?

      Best regards, Stew

      • Hi Stew,

        There can’t be overlaps in my case. Oracle 12c version.

        create table tmp (
        cid VARCHAR2(50),
        start_date date,
        end_date date
        );
        insert into tmp
        values(‘12345′,’01/01/2015′,’03/31/2015’);
        insert into tmp
        values(‘12345′,’04/01/2015′,’12/31/2199’);
        insert into tmp
        values(‘12347′,’01/01/2015′,’04/30/2015’);
        insert into tmp
        values(‘12347′,’05/01/2015′,’12/31/2015’);
        insert into tmp
        values(‘12347′,’01/01/2016′,’12/31/2199’);
        insert into tmp
        values(‘12310′,’01/01/2015′,’12/31/2015’);
        insert into tmp
        values(‘12310′,’03/01/2016′,’12/31/2199’);
        commit;

        I think I found what I want.. this query seems to work for my needs…

        select cid, min(start_date) df, max(end_date) dt
        from (select t2.*, rwn – sum(gap) over(order by rwn) as grp
        from (select t1.*,
        row_number() over(order by cid, start_date) rwn,
        case
        when lag(cid)
        over(partition by cid order by start_date) = cid and
        lag(end_date)
        over(partition by cid order by start_date) + 1 =
        start_date then
        1
        else
        0
        end gap
        from tmp t1) t2)
        group by cid, grp
        order by cid, min(start_date)

        Thanks for your support !

  5. Sam,

    That looks like it would work. Given your version, you could do the same thing a bit more efficiently and with less code as:

    select * from tmp
    match_recognize(
        partition by cid order by start_date
        measures a.start_date start_date, end_date end_date
        pattern(a b*)
        define b as start_date = prev(end_date) + 1
    );
    

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s