Merging Overlapping Date Ranges

A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.

Define your terms

As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.

Day of month: 1 2 3 4
1: precedes 1 2
3 4
2: meets 1 2
2 3
3: overlaps 1 3
2 4
4: finished by 1 3
2 3
5: contains 1 4
2 3
6: starts 1 2
1 3
7: equals 1 2
1 2
8: started by 1 3
1 2
9: during 2 3
1 4
10: finishes 2 3
1 3
11: overlapped by 2 4
1 3
12: met by 2 3
1 2
13: preceded by 3 4
1 2

 

As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.

To test this, I’ll create a little table with 13 test cases:

drop table t purge;
create table t (
  test_case varchar2(32) not null,
  start_date date not null,
  end_date date not null
);
Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD'));
commit;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-02
02:meets 2015-06-02 2015-06-03
03:overlaps 2015-06-01 2015-06-03
03:overlaps 2015-06-02 2015-06-04
04:finished by 2015-06-01 2015-06-03
04:finished by 2015-06-02 2015-06-03
05:contains 2015-06-01 2015-06-04
05:contains 2015-06-02 2015-06-03
06:starts 2015-06-01 2015-06-02
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
08:started by 2015-06-01 2015-06-02
09:during 2015-06-02 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-02 2015-06-03
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-02 2015-06-04
11:overlapped by 2015-06-01 2015-06-03
12:met by 2015-06-02 2015-06-03
12:met by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04
13:preceded by 2015-06-01 2015-06-02

The solution

I wrote about the “start of group” method some time ago.

  • The first step is to assign 1 to each row that starts a group and 0 to every other row.
    This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date.
  • The next step is to assign a group to each row using a running sum of the 1s and 0s.
  • Finally, I do the group by.
with grp_starts as (
  select test_case, start_date, end_date,
  case
    when start_date > max(end_date) over(
      partition by test_case order by start_date, end_date
      rows between unbounded preceding and 1 preceding
    )
    then 1 else 0
  end grp_start
  from t
)
, grps as (
  select test_case, start_date, end_date,
  sum(grp_start) over(
    partition by test_case order by start_date, end_date
  ) grp
  from grp_starts
)
select test_case,
min(start_date) start_date,
max(end_date) end_date
from grps
group by test_case, grp
order by 1, 2;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

Conclusion

Once again, the “start of group” method demonstrates its power, combined with a “window clause” in the analytic function that lets us identify the starting row of our group. Thanks again to Karthick_App for noticing that my first solution was incomplete.

And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE ‘9999-12-31’ for the NULL at the beginning of the query, then translate it back to NULL at the end.

28 thoughts on “Merging Overlapping Date Ranges

  1. Just another solution with “model” clause: it simply adds the beginning and ending of group for every row. And we can merge groups(or simply add distinct) later if it is needed:

    select 
       test_case
      ,start_date
      ,end_date
      ,aa as group_start
      ,bb as group_end
    from t
    model
         partition by (test_case)
         dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
         measures (start_date, end_date, start_date aa,end_date bb)
         rules sequential order(
           bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
          ,aa[any] = case when aa[cv()] < bb[cv()-1] then aa[cv()-1] else aa[cv()] end
          ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
         )
    order by 1,2
    
    • Hi Sayan,

      Thanks for your contribution. You don’t get the same result as I do for test case “2:meet” and test case “12:met by”. You are probably thinking inclusive end dates, whereas I recommend and use exclusive end dates.

      When I have the choice, I prefer the “Start of Group” method over the MODEL clause because it’s somewhat more efficient and more widely understood. Still, your solution is valid and illustrates my favorite use of the MODEL clause, with “dimension by (row_number()…”

      • Hi Stew,

        It’s quite easy to change to exclusive end dates: we have to change just one condition: aa[cv()] < bb[cv()-1] to aa[cv()] <= bb[cv()-1]
        select
        test_case
        ,start_date
        ,end_date
        ,aa as group_start
        ,bb as group_end
        from t
        model
        partition by (test_case)
        dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
        measures (start_date, end_date, start_date aa,end_date bb)
        rules sequential order(
        bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
        ,aa[any] = case when aa[cv()] <= bb[cv()-1] then aa[cv()-1] else aa[cv()] end
        ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
        )
        order by 1,2

        By the way, we also discussed another similar problem with discrete dates(discretization unit = 1 day), we also needed to merge ranges
        01.jan.2001 – 02.jan.2002 and 03.jan.2002 – 05.jan.2002 into one 01.jan.2001 – 05.jan.2002:

        http://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=810924&msg=9904299

  2. Hi Stew,
    This question is about a year late but hope you can still assist.

    How would you revise your query if you only have the following data points: member_id, start_date, end_date with the assumption that there are no null values in either date fields and that any of the 13 cases can occur for any member?

    Given the example:
    Member ID Start Date End Date
    A 3/3/2015 3/10/2015
    A 2/25/2015 3/10/2015
    A 2/24/2015 3/10/2015
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    A 10/29/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/6/2015
    C 12/11/2014 2/10/2015

    The desired results are:
    Member ID Start Date End Date
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/10/2015

    Truly appreciate the help on this.

    — whiners

  3. Hi Stew,
    Would you be able to revise your query to transform this:
    Member ID Start Date End Date
    A 3/3/2015 3/10/2015
    A 2/25/2015 3/10/2015
    A 2/24/2015 3/10/2015
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    A 10/29/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/6/2015
    C 12/11/2014 2/10/2015

    into this:
    Member ID Start Date End Date
    A 2/23/2015 3/10/2015
    A 10/28/2014 11/24/2014
    B 2/23/2015 3/29/2015
    B 5/5/2015 6/5/2015
    C 11/27/2014 2/10/2015

    Thanks so much!

    – whiners

    • Hi Whiners,

      Thanks for reading and commenting. I took my code, did a global change from TEST_CASE to MEMBER_ID and it worked.

      What did you try and why didn’t it work?

      If you still have problems, post details: CREATE TABLE and INSERT statements, your attempt at a solution and the exact result of your test.

      Best regards, Stew

      • Hey Stew,

        I apologize for the double post and delayed feedback.
        Your query worked perfectly for the dataset and requirement that I have.

        Truly appreciated your expert insight in this endeavor.

  4. Hi Stew,

    I have similar requirement. When there is NO GAP between the end date of first record and start date of second, then the records should roll-up into one record. When there is a gap then it should print as it is..

    Similar to your test case 1 but the output should be rolled up..

    id start date end date
    12310 1/1/2015 12/31/2015
    12310 3/1/2016 12/31/2199
    12345 1/1/2015 3/31/2015
    12345 4/1/2015 12/31/2199
    12347 1/1/2015 4/30/2015
    12347 5/1/2015 12/31/2015
    12347 1/1/2016 12/31/2199

    expected result

    12310 1/1/2015 12/31/2015
    12310 3/1/2016 12/31/2199
    12345 1/1/2015 12/31/2199
    12347 1/1/2015 12/31/2199

    Thanks in advance.

    • Hi Sam,

      A few questions for you:
      – What do you want to happen when there are overlaps?
      – What version of the Oracle Database do you have?
      – Where are the CREATE TABLE and INSERT statements that would let us share test data?
      – What have you tried and what were the results?

      Best regards, Stew

      • Hi Stew,

        There can’t be overlaps in my case. Oracle 12c version.

        create table tmp (
        cid VARCHAR2(50),
        start_date date,
        end_date date
        );
        insert into tmp
        values(‘12345′,’01/01/2015′,’03/31/2015’);
        insert into tmp
        values(‘12345′,’04/01/2015′,’12/31/2199’);
        insert into tmp
        values(‘12347′,’01/01/2015′,’04/30/2015’);
        insert into tmp
        values(‘12347′,’05/01/2015′,’12/31/2015’);
        insert into tmp
        values(‘12347′,’01/01/2016′,’12/31/2199’);
        insert into tmp
        values(‘12310′,’01/01/2015′,’12/31/2015’);
        insert into tmp
        values(‘12310′,’03/01/2016′,’12/31/2199’);
        commit;

        I think I found what I want.. this query seems to work for my needs…

        select cid, min(start_date) df, max(end_date) dt
        from (select t2.*, rwn – sum(gap) over(order by rwn) as grp
        from (select t1.*,
        row_number() over(order by cid, start_date) rwn,
        case
        when lag(cid)
        over(partition by cid order by start_date) = cid and
        lag(end_date)
        over(partition by cid order by start_date) + 1 =
        start_date then
        1
        else
        0
        end gap
        from tmp t1) t2)
        group by cid, grp
        order by cid, min(start_date)

        Thanks for your support !

  5. Sam,

    That looks like it would work. Given your version, you could do the same thing a bit more efficiently and with less code as:

    select * from tmp
    match_recognize(
        partition by cid order by start_date
        measures a.start_date start_date, end_date end_date
        pattern(a b*)
        define b as start_date = prev(end_date) + 1
    );
    
    • I think this query works only for special provided day pattern. This match_recognize should be more generic:

      select * from tmp
      match_recognize(
          partition by cid order by start_date
          measures a.start_date start_date, end_date end_date
          pattern(a b*)
          define b as start_date <= a.end_date
      );
      
      • HI Wernfied, and thanks for your contribution.

        Sam said “there can’t be overlaps”. Knowing how hard it is to enforce that rule, I asked myself: “what do I want to do if there is an overlap, even though there shouldn’t be?”

        My solution keeps the overlap in the output where it can be seen.

        Your solution accepts the overlap and hides it.

        Either solution is “correct”, depending on the desired result.

        Best regards,
        Stew

  6. Hi Stew, looks I’ve found the perfect forum to post my question. I am using Hive SQL and a novice at that may I add.
    I have a dataset full off overlapping dates, here is one such example in the create table / insert statement:

    create table test (
    account VARCHAR2(50),
    start_date date,
    end_date date
    );
    insert into test
    values(‘1111222333′,’05/01/2016′,’15/02/2016’);
    insert into tmp
    values(‘1111222333′,’29/01/2016′,’04/04/2016’);
    insert into test
    values(‘1111222333′,’20/03/2016′,’13/05/2016’);
    insert into test
    values(‘1111222333′,’26/04/2016′,’06/06/2016’);
    insert into test
    values(‘1111222333′,’05/05/2016′,’06/06/2016’);
    insert into test
    values(‘1111222333′,’13/09/2016′,’10/10/2016’);
    insert into test
    values(‘1111222333′,’14/10/2016′,’15/12/2016’);
    insert into test
    values(‘1111222333′,’09/08/2017′,’25/08/2017’);
    insert into test
    values(‘1111222333′,’25/10/2017′,’10/11/2017’);
    insert into test
    values(‘1111222333′,’02/11/2017′,’05/01/2018’);

    the output I require

    Account Start Date End_Date
    —————————————————
    1111222333 05/01/2016 06/06/2016
    1111222333 13/09/2016 10/10/2016
    1111222333 14/10/2016 15/12/2016
    1111222333 09/08/2017 25/08/2017
    1111222333 25/10/2017 05/01/2018

    I tried to adapt the code to work with HSQL but some of the functions wouldn’t work so unable to test

    Any help on this would be greatly appreciated.

    Thanks

    • Hello Ash,

      I don’t know what Hive SQL or HSQL are. I searched on Google and found Hive Query Language (HQL) and HQLDB, which are very different things. To my knowledge neither of these support the code I blogged about, either MATCH_RECOGNIZE or analytic functions. I suggest you use precise terms to search for a forum that supports the data store you are interested in. I blog about the Oracle database.

      Best regards, Stew

      • No problem Stew, thanks for looking into it. The information in this post has definitely helped me tho, used some of the logic and managed to work out a solution :)

  7. Hi Stew, thanks for your great article.

    One question: I have two tables, one with absences and one with working times. I want to merge these tables and force the absences to show. In the working times-table. There will be overlapping- and met by-data in the working times-tables and the absences overlap too.

    Do you have an idea, how solve this?

    Thanks

    • Hi Stefan,

      I am sure this can be solved if enough details are given about the environment, the input and the desired output.

      – What version of the Oracle database do you have?
      – Within the absences table, are there ranges that meet or overlap?
      – If so, can they be merged into one row before comparing to the other table?
      – Within the working times table, are there ranges that meet or overlap?
      – If so, can they be merged into one row before comparing to the other table?
      – Can the starting time be NULL?
      – Can the ending time be NULL?
      – May I assume exclusive end times?

      Best regards,
      Stew

      • thanks for your quick response.

        I´m using Oracle 12c.
        In the absence table ranges do not overlap, but perhaps meet. There is no problem to merge the dates into one row.

        In the working times table ranges do not overlap either, but can contain a range where an absence exists, meets an absence, equals an absence… and so on. They can also be merged into one row.

        There are no NULL values in both tables. The working time tables ends always at 31.12.9999.

        Best regards

  8. For example:

    working times:
    from to scope (percent)
    01.10.2014 30.06.2016 100
    01.05.2018 31.01.2020 50
    01.02.2020 31.12.9999 75

    absence times:
    from to Reason
    01.07.2016 30.04.2018 A
    01.05.2018 31.01.2019 B
    01.06.2019 30.04.2020 C

    The desired output is a list, like

    from to scope/reason
    01.10.2014 30.06.2016 100
    01.07.2016 30.04.2018 A
    01.05.2018 31.01.2019 B
    01.02.2019 31.05.2019 50
    01.06.2019 30.04.2020 C
    01.05.2020 31.12.9999 75

    Thanks for your help!!

    • Assuming:

      • Dates at midnight
      • Inclusive end dates (which make things more complicated)
      • If reason and scope overlap, reason wins
      • If two reasons or two scopes overlap, the minimum wins
      create table working(from_date, through_date, scope) as
      select to_date('01.10.2014','dd-mm-yyyy'), to_date('30.06.2016','dd-mm-yyyy'), 100 from dual union all
      select to_date('01.05.2018','dd-mm-yyyy'), to_date('31.01.2020','dd-mm-yyyy'), 50 from dual union all
      select to_date('01.02.2020','dd-mm-yyyy'), to_date('31.12.9999','dd-mm-yyyy'), 75 from dual;
      
      create table absences(from_date, through_date, reason) as
      select to_date('01.07.2016','dd-mm-yyyy'), to_date('30.04.2018','dd-mm-yyyy'), 'A' from dual union all
      select to_date('01.05.2018','dd-mm-yyyy'), to_date('31.01.2019','dd-mm-yyyy'), 'B' from dual union all
      select to_date('01.06.2019','dd-mm-yyyy'), to_date('30.04.2020','dd-mm-yyyy'), 'C' from dual;
      
      with data(from_date, to_date, reason, scope) as (
        select from_date, nullif(through_date, date '9999-12-31')+1, reason, cast(null as number)
        from absences
        union all
        select from_date, nullif(through_date, date '9999-12-31')+1, null, scope
        from working
      )
      , unpivoted as (
        select * from data
        unpivot include nulls (
          (from_to, to_date) 
          for seg_start in ((from_date, to_date) as -1, (to_date, to_date) as 1)
        )
      )
      , segmented as (
        select SEG_FROM, SEG_TO, FIRST_REASON, FIRST_SCOPE
        from unpivoted
        match_recognize(
          order by from_to, seg_start desc, to_date
          measures prev(from_to) seg_from, from_to seg_to,
            a.reason first_reason, a.scope first_scope
          all rows per match
          after match skip to next row
          pattern({-a-} b+)
          define a as seg_start = -1,
            b as from_to <= a.to_date or a.to_date is null
        )
        where seg_from < seg_to or seg_to is null
      )
      , grouped as (
        select seg_from, reason, seg_to,
          case when reason is null then scope end scope
        from (
          select seg_from, seg_to, min(first_reason) reason, min(first_scope) scope
          from segmented
          group by seg_from, seg_to
        )
      )
      select to_char(from_date,'dd.mm.yyyy') from_date,
        to_char(through_date,'dd.mm.yyyy') through_date,
        case when reason is null then scope||null else reason end scope_reason
      from grouped
      match_recognize(
        partition by reason, scope order by seg_from, seg_to
        measures first(seg_from) from_date,
          case when last(seg_to) is null then date '999-12-31' else last(seg_to)-1 end through_date
        pattern(a b*)
        define b as seg_from = prev(seg_to)
      ) m
      order by m.from_date, m.through_date;
  9. WOW! Can´t believe it. You are great!

    I don´t quite understand the magic. Can you explain the steps shortly?

    Thank you!!

Leave a reply to whiners Cancel reply