Merging Overlapping Date Ranges

A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.

Define your terms

As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.

Day of month: 1 2 3 4
1: precedes 1 2
3 4
2: meets 1 2
2 3
3: overlaps 1 3
2 4
4: finished by 1 3
2 3
5: contains 1 4
2 3
6: starts 1 2
1 3
7: equals 1 2
1 2
8: started by 1 3
1 2
9: during 2 3
1 4
10: finishes 2 3
1 3
11: overlapped by 2 4
1 3
12: met by 2 3
1 2
13: preceded by 3 4
1 2

 

As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.

To test this, I’ll create a little table with 13 test cases:

drop table t purge;
create table t (
  test_case varchar2(32) not null,
  start_date date not null,
  end_date date not null
);
Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD'));
commit;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-02
02:meets 2015-06-02 2015-06-03
03:overlaps 2015-06-01 2015-06-03
03:overlaps 2015-06-02 2015-06-04
04:finished by 2015-06-01 2015-06-03
04:finished by 2015-06-02 2015-06-03
05:contains 2015-06-01 2015-06-04
05:contains 2015-06-02 2015-06-03
06:starts 2015-06-01 2015-06-02
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
08:started by 2015-06-01 2015-06-02
09:during 2015-06-02 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-02 2015-06-03
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-02 2015-06-04
11:overlapped by 2015-06-01 2015-06-03
12:met by 2015-06-02 2015-06-03
12:met by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04
13:preceded by 2015-06-01 2015-06-02

The solution

I wrote about the “start of group” method some time ago.

  • The first step is to assign 1 to each row that starts a group and 0 to every other row.
    This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date.
  • The next step is to assign a group to each row using a running sum of the 1s and 0s.
  • Finally, I do the group by.
with grp_starts as (
  select test_case, start_date, end_date,
  case
    when start_date > max(end_date) over(
      partition by test_case order by start_date, end_date
      rows between unbounded preceding and 1 preceding
    )
    then 1 else 0
  end grp_start
  from t
)
, grps as (
  select test_case, start_date, end_date,
  sum(grp_start) over(
    partition by test_case order by start_date, end_date
  ) grp
  from grp_starts
)
select test_case,
min(start_date) start_date,
max(end_date) end_date
from grps
group by test_case, grp
order by 1, 2;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

Conclusion

Once again, the “start of group” method demonstrates its power, combined with a “window clause” in the analytic function that lets us identify the starting row of our group. Thanks again to Karthick_App for noticing that my first solution was incomplete.

And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE ‘9999-12-31’ for the NULL at the beginning of the query, then translate it back to NULL at the end.

Advertisements

7 thoughts on “Merging Overlapping Date Ranges

  1. Just another solution with “model” clause: it simply adds the beginning and ending of group for every row. And we can merge groups(or simply add distinct) later if it is needed:

    select 
       test_case
      ,start_date
      ,end_date
      ,aa as group_start
      ,bb as group_end
    from t
    model
         partition by (test_case)
         dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
         measures (start_date, end_date, start_date aa,end_date bb)
         rules sequential order(
           bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
          ,aa[any] = case when aa[cv()] < bb[cv()-1] then aa[cv()-1] else aa[cv()] end
          ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
         )
    order by 1,2
    
    • Hi Sayan,

      Thanks for your contribution. You don’t get the same result as I do for test case “2:meet” and test case “12:met by”. You are probably thinking inclusive end dates, whereas I recommend and use exclusive end dates.

      When I have the choice, I prefer the “Start of Group” method over the MODEL clause because it’s somewhat more efficient and more widely understood. Still, your solution is valid and illustrates my favorite use of the MODEL clause, with “dimension by (row_number()…”

      • Hi Stew,

        It’s quite easy to change to exclusive end dates: we have to change just one condition: aa[cv()] < bb[cv()-1] to aa[cv()] <= bb[cv()-1]
        select
        test_case
        ,start_date
        ,end_date
        ,aa as group_start
        ,bb as group_end
        from t
        model
        partition by (test_case)
        dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
        measures (start_date, end_date, start_date aa,end_date bb)
        rules sequential order(
        bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
        ,aa[any] = case when aa[cv()] <= bb[cv()-1] then aa[cv()-1] else aa[cv()] end
        ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
        )
        order by 1,2

        By the way, we also discussed another similar problem with discrete dates(discretization unit = 1 day), we also needed to merge ranges
        01.jan.2001 – 02.jan.2002 and 03.jan.2002 – 05.jan.2002 into one 01.jan.2001 – 05.jan.2002:

        http://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=810924&msg=9904299

  2. Hi Stew,
    This question is about a year late but hope you can still assist.

    How would you revise your query if you only have the following data points: member_id, start_date, end_date with the assumption that there are no null values in either date fields and that any of the 13 cases can occur for any member?

    Given the example:
    Member ID Start Date End Date
    A 3/3/2015 3/10/2015
    A 2/25/2015 3/10/2015
    A 2/24/2015 3/10/2015
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    A 10/29/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/6/2015
    C 12/11/2014 2/10/2015

    The desired results are:
    Member ID Start Date End Date
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/10/2015

    Truly appreciate the help on this.

    — whiners

  3. Hi Stew,
    Would you be able to revise your query to transform this:
    Member ID Start Date End Date
    A 3/3/2015 3/10/2015
    A 2/25/2015 3/10/2015
    A 2/24/2015 3/10/2015
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    A 10/29/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/6/2015
    C 12/11/2014 2/10/2015

    into this:
    Member ID Start Date End Date
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/10/2015

    Thanks so much!

    – whiners

    • Hi Whiners,

      Thanks for reading and commenting. I took my code, did a global change from TEST_CASE to MEMBER_ID and it worked.

      What did you try and why didn’t it work?

      If you still have problems, post details: CREATE TABLE and INSERT statements, your attempt at a solution and the exact result of your test.

      Best regards, Stew

      • Hey Stew,

        I apologize for the double post and delayed feedback.
        Your query worked perfectly for the dataset and requirement that I have.

        Truly appreciated your expert insight in this endeavor.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s