Merging Overlapping Date Ranges

Posted on 2015/06/08 by stewashton

A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.

Define your terms

As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.

Day of month:	1	2	3	4
1: precedes	1	2
1: precedes			3	4
–
2: meets	1	2
2: meets		2	3
–
3: overlaps	1		3
3: overlaps		2		4
–
4: finished by	1		3
4: finished by		2	3
–
5: contains	1			4
5: contains		2	3
–
6: starts	1	2
6: starts	1		3
–
7: equals	1	2
7: equals	1	2
–
8: started by	1		3
8: started by	1	2
–
9: during		2	3
9: during	1			4
–
10: finishes		2	3
10: finishes	1		3
–
11: overlapped by		2		4
11: overlapped by	1		3
–
12: met by		2	3
12: met by	1	2
–
13: preceded by			3	4
13: preceded by	1	2

As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.

To test this, I’ll create a little table with 13 test cases:

drop table t purge;
create table t (
  test_case varchar2(32) not null,
  start_date date not null,
  end_date date not null
);
Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD'));
commit;

TEST_CASE	START_DATE	END_DATE
01:precedes	2015-06-01	2015-06-02
01:precedes	2015-06-03	2015-06-04
02:meets	2015-06-01	2015-06-02
02:meets	2015-06-02	2015-06-03
03:overlaps	2015-06-01	2015-06-03
03:overlaps	2015-06-02	2015-06-04
04:finished by	2015-06-01	2015-06-03
04:finished by	2015-06-02	2015-06-03
05:contains	2015-06-01	2015-06-04
05:contains	2015-06-02	2015-06-03
06:starts	2015-06-01	2015-06-02
06:starts	2015-06-01	2015-06-03
07:equals	2015-06-01	2015-06-02
07:equals	2015-06-01	2015-06-02
08:started by	2015-06-01	2015-06-03
08:started by	2015-06-01	2015-06-02
09:during	2015-06-02	2015-06-03
09:during	2015-06-01	2015-06-04
10:finishes	2015-06-02	2015-06-03
10:finishes	2015-06-01	2015-06-03
11:overlapped by	2015-06-02	2015-06-04
11:overlapped by	2015-06-01	2015-06-03
12:met by	2015-06-02	2015-06-03
12:met by	2015-06-01	2015-06-02
13:preceded by	2015-06-03	2015-06-04
13:preceded by	2015-06-01	2015-06-02

The solution

I wrote about the “start of group” method some time ago.

The first step is to assign 1 to each row that starts a group and 0 to every other row.
This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date.
The next step is to assign a group to each row using a running sum of the 1s and 0s.
Finally, I do the group by.

with grp_starts as (
  select test_case, start_date, end_date,
  case
    when start_date > max(end_date) over(
      partition by test_case order by start_date, end_date
      rows between unbounded preceding and 1 preceding
    )
    then 1 else 0
  end grp_start
  from t
)
, grps as (
  select test_case, start_date, end_date,
  sum(grp_start) over(
    partition by test_case order by start_date, end_date
  ) grp
  from grp_starts
)
select test_case,
min(start_date) start_date,
max(end_date) end_date
from grps
group by test_case, grp
order by 1, 2;

TEST_CASE	START_DATE	END_DATE
01:precedes	2015-06-01	2015-06-02
01:precedes	2015-06-03	2015-06-04
02:meets	2015-06-01	2015-06-03
03:overlaps	2015-06-01	2015-06-04
04:finished by	2015-06-01	2015-06-03
05:contains	2015-06-01	2015-06-04
06:starts	2015-06-01	2015-06-03
07:equals	2015-06-01	2015-06-02
08:started by	2015-06-01	2015-06-03
09:during	2015-06-01	2015-06-04
10:finishes	2015-06-01	2015-06-03
11:overlapped by	2015-06-01	2015-06-04
12:met by	2015-06-01	2015-06-03
13:preceded by	2015-06-01	2015-06-02
13:preceded by	2015-06-03	2015-06-04

Conclusion

Once again, the “start of group” method demonstrates its power, combined with a “window clause” in the analytic function that lets us identify the starting row of our group. Thanks again to Karthick_App for noticing that my first solution was incomplete.

And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE ‘9999-12-31’ for the NULL at the beginning of the query, then translate it back to NULL at the end.

28 thoughts on “Merging Overlapping Date Ranges”

Sayan Malakshinov says:

2015/06/10 at 01:10
Just another solution with “model” clause: it simply adds the beginning and ending of group for every row. And we can merge groups(or simply add distinct) later if it is needed:
```
select 
   test_case
  ,start_date
  ,end_date
  ,aa as group_start
  ,bb as group_end
from t
model
     partition by (test_case)
     dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
     measures (start_date, end_date, start_date aa,end_date bb)
     rules sequential order(
       bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
      ,aa[any] = case when aa[cv()] < bb[cv()-1] then aa[cv()-1] else aa[cv()] end
      ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
     )
order by 1,2
```
Reply
- stewashton says:
  
  2015/06/10 at 16:38
  
  Hi Sayan,
  
  Thanks for your contribution. You don’t get the same result as I do for test case “2:meet” and test case “12:met by”. You are probably thinking inclusive end dates, whereas I recommend and use exclusive end dates.
  
  When I have the choice, I prefer the “Start of Group” method over the MODEL clause because it’s somewhat more efficient and more widely understood. Still, your solution is valid and illustrates my favorite use of the MODEL clause, with “dimension by (row_number()…”
  
  Reply
  - Sayan Malakshinov says:
    
    2015/06/12 at 22:38
    
    Hi Stew,
    
    It’s quite easy to change to exclusive end dates: we have to change just one condition: aa[cv()] < bb[cv()-1] to aa[cv()] <= bb[cv()-1]
    select
    test_case
    ,start_date
    ,end_date
    ,aa as group_start
    ,bb as group_end
    from t
    model
    partition by (test_case)
    dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
    measures (start_date, end_date, start_date aa,end_date bb)
    rules sequential order(
    bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
    ,aa[any] = case when aa[cv()] <= bb[cv()-1] then aa[cv()-1] else aa[cv()] end
    ,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
    )
    order by 1,2
    
    By the way, we also discussed another similar problem with discrete dates(discretization unit = 1 day), we also needed to merge ranges
    01.jan.2001 – 02.jan.2002 and 03.jan.2002 – 05.jan.2002 into one 01.jan.2001 – 05.jan.2002:
    
    http://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=810924&msg=9904299
whiners says:

2016/07/15 at 14:27

Hi Stew,
This question is about a year late but hope you can still assist.

How would you revise your query if you only have the following data points: member_id, start_date, end_date with the assumption that there are no null values in either date fields and that any of the 13 cases can occur for any member?

Given the example:
Member ID Start Date End Date
A 3/3/2015 3/10/2015
A 2/25/2015 3/10/2015
A 2/24/2015 3/10/2015
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
A 10/29/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/6/2015
C 12/11/2014 2/10/2015

The desired results are:
Member ID Start Date End Date
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/10/2015

Truly appreciate the help on this.

— whiners

Reply
whiners says:

2016/07/15 at 14:34

Hi Stew,
Would you be able to revise your query to transform this:
Member ID Start Date End Date
A 3/3/2015 3/10/2015
A 2/25/2015 3/10/2015
A 2/24/2015 3/10/2015
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
A 10/29/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/6/2015
C 12/11/2014 2/10/2015

into this:
Member ID Start Date End Date
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/10/2015

Thanks so much!

– whiners

Reply
- stewashton says:
  
  2016/07/17 at 11:58
  
  Hi Whiners,
  
  Thanks for reading and commenting. I took my code, did a global change from TEST_CASE to MEMBER_ID and it worked.
  
  What did you try and why didn’t it work?
  
  If you still have problems, post details: CREATE TABLE and INSERT statements, your attempt at a solution and the exact result of your test.
  
  Best regards, Stew
  
  Reply
  - Whiners says:
    
    2016/07/22 at 20:32
    
    Hey Stew,
    
    I apologize for the double post and delayed feedback.
    Your query worked perfectly for the dataset and requirement that I have.
    
    Truly appreciated your expert insight in this endeavor.
Sam says:

2017/07/14 at 03:39

Hi Stew,

I have similar requirement. When there is NO GAP between the end date of first record and start date of second, then the records should roll-up into one record. When there is a gap then it should print as it is..

Similar to your test case 1 but the output should be rolled up..

id start date end date
12310 1/1/2015 12/31/2015
12310 3/1/2016 12/31/2199
12345 1/1/2015 3/31/2015
12345 4/1/2015 12/31/2199
12347 1/1/2015 4/30/2015
12347 5/1/2015 12/31/2015
12347 1/1/2016 12/31/2199

expected result

12310 1/1/2015 12/31/2015
12310 3/1/2016 12/31/2199
12345 1/1/2015 12/31/2199
12347 1/1/2015 12/31/2199

Thanks in advance.

Reply
- stewashton says:
  
  2017/07/14 at 09:47
  
  Hi Sam,
  
  A few questions for you:
  – What do you want to happen when there are overlaps?
  – What version of the Oracle Database do you have?
  – Where are the CREATE TABLE and INSERT statements that would let us share test data?
  – What have you tried and what were the results?
  
  Best regards, Stew
  
  Reply
  - Sam says:
    
    2017/07/14 at 14:37
    
    Hi Stew,
    
    There can’t be overlaps in my case. Oracle 12c version.
    
    create table tmp (
    cid VARCHAR2(50),
    start_date date,
    end_date date
    );
    insert into tmp
    values(‘12345′,’01/01/2015′,’03/31/2015’);
    insert into tmp
    values(‘12345′,’04/01/2015′,’12/31/2199’);
    insert into tmp
    values(‘12347′,’01/01/2015′,’04/30/2015’);
    insert into tmp
    values(‘12347′,’05/01/2015′,’12/31/2015’);
    insert into tmp
    values(‘12347′,’01/01/2016′,’12/31/2199’);
    insert into tmp
    values(‘12310′,’01/01/2015′,’12/31/2015’);
    insert into tmp
    values(‘12310′,’03/01/2016′,’12/31/2199’);
    commit;
    
    I think I found what I want.. this query seems to work for my needs…
    
    select cid, min(start_date) df, max(end_date) dt
    from (select t2.*, rwn – sum(gap) over(order by rwn) as grp
    from (select t1.*,
    row_number() over(order by cid, start_date) rwn,
    case
    when lag(cid)
    over(partition by cid order by start_date) = cid and
    lag(end_date)
    over(partition by cid order by start_date) + 1 =
    start_date then
    1
    else
    0
    end gap
    from tmp t1) t2)
    group by cid, grp
    order by cid, min(start_date)
    
    Thanks for your support !
stewashton says:

2017/07/14 at 16:22
Sam,

That looks like it would work. Given your version, you could do the same thing a bit more efficiently and with less code as:
```
select * from tmp
match_recognize(
    partition by cid order by start_date
    measures a.start_date start_date, end_date end_date
    pattern(a b*)
    define b as start_date = prev(end_date) + 1
);
```
Reply
- Sam says:
  
  2017/07/14 at 20:33
  
  Thanks Stew !! Good to know a new way of doing this .. Looks clean..
  
  Reply
- Wernfied says:
  
  2018/06/20 at 19:39
  I think this query works only for special provided day pattern. This match_recognize should be more generic:
```
select * from tmp
match_recognize(
    partition by cid order by start_date
    measures a.start_date start_date, end_date end_date
    pattern(a b*)
    define b as start_date <= a.end_date
);
```
  Reply
  - stewashton says:
    
    2018/06/24 at 07:45
    
    HI Wernfied, and thanks for your contribution.
    
    Sam said “there can’t be overlaps”. Knowing how hard it is to enforce that rule, I asked myself: “what do I want to do if there is an overlap, even though there shouldn’t be?”
    
    My solution keeps the overlap in the output where it can be seen.
    
    Your solution accepts the overlap and hides it.
    
    Either solution is “correct”, depending on the desired result.
    
    Best regards,
    Stew
Ryu says:

2017/11/22 at 16:12

Thanks. Nice and clean.

Reply
Ash says:

2018/04/30 at 11:08

Hi Stew, looks I’ve found the perfect forum to post my question. I am using Hive SQL and a novice at that may I add.
I have a dataset full off overlapping dates, here is one such example in the create table / insert statement:

create table test (
account VARCHAR2(50),
start_date date,
end_date date
);
insert into test
values(‘1111222333′,’05/01/2016′,’15/02/2016’);
insert into tmp
values(‘1111222333′,’29/01/2016′,’04/04/2016’);
insert into test
values(‘1111222333′,’20/03/2016′,’13/05/2016’);
insert into test
values(‘1111222333′,’26/04/2016′,’06/06/2016’);
insert into test
values(‘1111222333′,’05/05/2016′,’06/06/2016’);
insert into test
values(‘1111222333′,’13/09/2016′,’10/10/2016’);
insert into test
values(‘1111222333′,’14/10/2016′,’15/12/2016’);
insert into test
values(‘1111222333′,’09/08/2017′,’25/08/2017’);
insert into test
values(‘1111222333′,’25/10/2017′,’10/11/2017’);
insert into test
values(‘1111222333′,’02/11/2017′,’05/01/2018’);

the output I require

Account Start Date End_Date
—————————————————
1111222333 05/01/2016 06/06/2016
1111222333 13/09/2016 10/10/2016
1111222333 14/10/2016 15/12/2016
1111222333 09/08/2017 25/08/2017
1111222333 25/10/2017 05/01/2018

I tried to adapt the code to work with HSQL but some of the functions wouldn’t work so unable to test

Any help on this would be greatly appreciated.

Thanks

Reply
- stewashton says:
  
  2018/04/30 at 16:46
  
  Hello Ash,
  
  I don’t know what Hive SQL or HSQL are. I searched on Google and found Hive Query Language (HQL) and HQLDB, which are very different things. To my knowledge neither of these support the code I blogged about, either MATCH_RECOGNIZE or analytic functions. I suggest you use precise terms to search for a forum that supports the data store you are interested in. I blog about the Oracle database.
  
  Best regards, Stew
  
  Reply
  - Ash says:
    
    2018/05/01 at 13:04
    
    No problem Stew, thanks for looking into it. The information in this post has definitely helped me tho, used some of the logic and managed to work out a solution :)
Stefan says:

2019/06/17 at 11:32

Hi Stew, thanks for your great article.

One question: I have two tables, one with absences and one with working times. I want to merge these tables and force the absences to show. In the working times-table. There will be overlapping- and met by-data in the working times-tables and the absences overlap too.

Do you have an idea, how solve this?

Thanks

Reply
- stewashton says:
  
  2019/06/17 at 14:32
  
  Hi Stefan,
  
  I am sure this can be solved if enough details are given about the environment, the input and the desired output.
  
  – What version of the Oracle database do you have?
  – Within the absences table, are there ranges that meet or overlap?
  – If so, can they be merged into one row before comparing to the other table?
  – Within the working times table, are there ranges that meet or overlap?
  – If so, can they be merged into one row before comparing to the other table?
  – Can the starting time be NULL?
  – Can the ending time be NULL?
  – May I assume exclusive end times?
  
  Best regards,
  Stew
  
  Reply
  - Stefan says:
    
    2019/06/17 at 15:22
    
    thanks for your quick response.
    
    I´m using Oracle 12c.
    In the absence table ranges do not overlap, but perhaps meet. There is no problem to merge the dates into one row.
    
    In the working times table ranges do not overlap either, but can contain a range where an absence exists, meets an absence, equals an absence… and so on. They can also be merged into one row.
    
    There are no NULL values in both tables. The working time tables ends always at 31.12.9999.
    
    Best regards
Stefan says:

2019/06/17 at 15:52

For example:

working times:
from to scope (percent)
01.10.2014 30.06.2016 100
01.05.2018 31.01.2020 50
01.02.2020 31.12.9999 75

absence times:
from to Reason
01.07.2016 30.04.2018 A
01.05.2018 31.01.2019 B
01.06.2019 30.04.2020 C

The desired output is a list, like

from to scope/reason
01.10.2014 30.06.2016 100
01.07.2016 30.04.2018 A
01.05.2018 31.01.2019 B
01.02.2019 31.05.2019 50
01.06.2019 30.04.2020 C
01.05.2020 31.12.9999 75

Thanks for your help!!

Reply
- stewashton says:
  
  2019/06/18 at 11:28
  
  Stefan, I cannot work on this until next week. If you don’t want to wait, I suggest the OTN forum at https://community.oracle.com/community/groundbreakers/database/developer-tools/sql_and_pl_sql
  
  Reply

Hi Stew, did you find the time to look at my problem?

Greetings

Stefan

Assuming:

Dates at midnight
Inclusive end dates (which make things more complicated)
If reason and scope overlap, reason wins
If two reasons or two scopes overlap, the minimum wins

create table working(from_date, through_date, scope) as
select to_date('01.10.2014','dd-mm-yyyy'), to_date('30.06.2016','dd-mm-yyyy'), 100 from dual union all
select to_date('01.05.2018','dd-mm-yyyy'), to_date('31.01.2020','dd-mm-yyyy'), 50 from dual union all
select to_date('01.02.2020','dd-mm-yyyy'), to_date('31.12.9999','dd-mm-yyyy'), 75 from dual;

create table absences(from_date, through_date, reason) as
select to_date('01.07.2016','dd-mm-yyyy'), to_date('30.04.2018','dd-mm-yyyy'), 'A' from dual union all
select to_date('01.05.2018','dd-mm-yyyy'), to_date('31.01.2019','dd-mm-yyyy'), 'B' from dual union all
select to_date('01.06.2019','dd-mm-yyyy'), to_date('30.04.2020','dd-mm-yyyy'), 'C' from dual;

with data(from_date, to_date, reason, scope) as (
  select from_date, nullif(through_date, date '9999-12-31')+1, reason, cast(null as number)
  from absences
  union all
  select from_date, nullif(through_date, date '9999-12-31')+1, null, scope
  from working
)
, unpivoted as (
  select * from data
  unpivot include nulls (
    (from_to, to_date) 
    for seg_start in ((from_date, to_date) as -1, (to_date, to_date) as 1)
  )
)
, segmented as (
  select SEG_FROM, SEG_TO, FIRST_REASON, FIRST_SCOPE
  from unpivoted
  match_recognize(
    order by from_to, seg_start desc, to_date
    measures prev(from_to) seg_from, from_to seg_to,
      a.reason first_reason, a.scope first_scope
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as from_to <= a.to_date or a.to_date is null
  )
  where seg_from < seg_to or seg_to is null
)
, grouped as (
  select seg_from, reason, seg_to,
    case when reason is null then scope end scope
  from (
    select seg_from, seg_to, min(first_reason) reason, min(first_scope) scope
    from segmented
    group by seg_from, seg_to
  )
)
select to_char(from_date,'dd.mm.yyyy') from_date,
  to_char(through_date,'dd.mm.yyyy') through_date,
  case when reason is null then scope||null else reason end scope_reason
from grouped
match_recognize(
  partition by reason, scope order by seg_from, seg_to
  measures first(seg_from) from_date,
    case when last(seg_to) is null then date '999-12-31' else last(seg_to)-1 end through_date
  pattern(a b*)
  define b as seg_from = prev(seg_to)
) m
order by m.from_date, m.through_date;

Stefan says:

2019/07/02 at 10:33

WOW! Can´t believe it. You are great!

I don´t quite understand the magic. Can you explain the steps shortly?

Thank you!!

Reply
- stewashton says:
  
  2019/07/02 at 11:20
  
  HI,
  
  This is a variant of the problem I blogged about here: https://stewashton.wordpress.com/2018/12/03/ranges-with-nulls-06-overlaps-conflicts/
  
  For a full understanding, you may want to read the earlier “Ranges with nulls” posts first.
  
  Regards,
  Stew
  
  Reply
jaknowles7 says:

2020/07/25 at 00:25

This was extremely helpful. Thank you stewashton!

Reply

An Oracle Programmer

SQL and PL/SQL techniques and solutions

Merging Overlapping Date Ranges

Define your terms

The solution

Conclusion

28 thoughts on “Merging Overlapping Date Ranges”

Leave a reply to whiners Cancel reply

An Oracle Programmer

SQL and PL/SQL techniques and solutions

Define your terms

The solution

Conclusion

Share this:

Related

28 thoughts on “Merging Overlapping Date Ranges”

Leave a reply to whiners Cancel reply