A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.
Define your terms
As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.
Day of month: | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1: precedes | 1 | 2 | ||
3 | 4 | |||
– | ||||
2: meets | 1 | 2 | ||
2 | 3 | |||
– | ||||
3: overlaps | 1 | 3 | ||
2 | 4 | |||
– | ||||
4: finished by | 1 | 3 | ||
2 | 3 | |||
– | ||||
5: contains | 1 | 4 | ||
2 | 3 | |||
– | ||||
6: starts | 1 | 2 | ||
1 | 3 | |||
– | ||||
7: equals | 1 | 2 | ||
1 | 2 | |||
– | ||||
8: started by | 1 | 3 | ||
1 | 2 | |||
– | ||||
9: during | 2 | 3 | ||
1 | 4 | |||
– | ||||
10: finishes | 2 | 3 | ||
1 | 3 | |||
– | ||||
11: overlapped by | 2 | 4 | ||
1 | 3 | |||
– | ||||
12: met by | 2 | 3 | ||
1 | 2 | |||
– | ||||
13: preceded by | 3 | 4 | ||
1 | 2 |
As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.
To test this, I’ll create a little table with 13 test cases:
drop table t purge; create table t ( test_case varchar2(32) not null, start_date date not null, end_date date not null ); Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD')); Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD')); Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD')); Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD')); Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD')); Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD')); Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD')); Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD')); Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD')); Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD')); Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD')); Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD')); Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD')); Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD')); Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD')); Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD')); Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD')); Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD')); Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD')); Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD')); Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD')); Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD')); Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD')); Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD')); Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD')); Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD')); commit;
TEST_CASE | START_DATE | END_DATE |
---|---|---|
01:precedes | 2015-06-01 | 2015-06-02 |
01:precedes | 2015-06-03 | 2015-06-04 |
02:meets | 2015-06-01 | 2015-06-02 |
02:meets | 2015-06-02 | 2015-06-03 |
03:overlaps | 2015-06-01 | 2015-06-03 |
03:overlaps | 2015-06-02 | 2015-06-04 |
04:finished by | 2015-06-01 | 2015-06-03 |
04:finished by | 2015-06-02 | 2015-06-03 |
05:contains | 2015-06-01 | 2015-06-04 |
05:contains | 2015-06-02 | 2015-06-03 |
06:starts | 2015-06-01 | 2015-06-02 |
06:starts | 2015-06-01 | 2015-06-03 |
07:equals | 2015-06-01 | 2015-06-02 |
07:equals | 2015-06-01 | 2015-06-02 |
08:started by | 2015-06-01 | 2015-06-03 |
08:started by | 2015-06-01 | 2015-06-02 |
09:during | 2015-06-02 | 2015-06-03 |
09:during | 2015-06-01 | 2015-06-04 |
10:finishes | 2015-06-02 | 2015-06-03 |
10:finishes | 2015-06-01 | 2015-06-03 |
11:overlapped by | 2015-06-02 | 2015-06-04 |
11:overlapped by | 2015-06-01 | 2015-06-03 |
12:met by | 2015-06-02 | 2015-06-03 |
12:met by | 2015-06-01 | 2015-06-02 |
13:preceded by | 2015-06-03 | 2015-06-04 |
13:preceded by | 2015-06-01 | 2015-06-02 |
The solution
I wrote about the “start of group” method some time ago.
- The first step is to assign 1 to each row that starts a group and 0 to every other row.
This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date. - The next step is to assign a group to each row using a running sum of the 1s and 0s.
- Finally, I do the group by.
with grp_starts as ( select test_case, start_date, end_date, case when start_date > max(end_date) over( partition by test_case order by start_date, end_date rows between unbounded preceding and 1 preceding ) then 1 else 0 end grp_start from t ) , grps as ( select test_case, start_date, end_date, sum(grp_start) over( partition by test_case order by start_date, end_date ) grp from grp_starts ) select test_case, min(start_date) start_date, max(end_date) end_date from grps group by test_case, grp order by 1, 2;
TEST_CASE | START_DATE | END_DATE |
---|---|---|
01:precedes | 2015-06-01 | 2015-06-02 |
01:precedes | 2015-06-03 | 2015-06-04 |
02:meets | 2015-06-01 | 2015-06-03 |
03:overlaps | 2015-06-01 | 2015-06-04 |
04:finished by | 2015-06-01 | 2015-06-03 |
05:contains | 2015-06-01 | 2015-06-04 |
06:starts | 2015-06-01 | 2015-06-03 |
07:equals | 2015-06-01 | 2015-06-02 |
08:started by | 2015-06-01 | 2015-06-03 |
09:during | 2015-06-01 | 2015-06-04 |
10:finishes | 2015-06-01 | 2015-06-03 |
11:overlapped by | 2015-06-01 | 2015-06-04 |
12:met by | 2015-06-01 | 2015-06-03 |
13:preceded by | 2015-06-01 | 2015-06-02 |
13:preceded by | 2015-06-03 | 2015-06-04 |
Conclusion
Once again, the “start of group” method demonstrates its power, combined with a “window clause” in the analytic function that lets us identify the starting row of our group. Thanks again to Karthick_App for noticing that my first solution was incomplete.
And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE ‘9999-12-31’ for the NULL at the beginning of the query, then translate it back to NULL at the end.
Just another solution with “model” clause: it simply adds the beginning and ending of group for every row. And we can merge groups(or simply add distinct) later if it is needed:
Hi Sayan,
Thanks for your contribution. You don’t get the same result as I do for test case “2:meet” and test case “12:met by”. You are probably thinking inclusive end dates, whereas I recommend and use exclusive end dates.
When I have the choice, I prefer the “Start of Group” method over the MODEL clause because it’s somewhat more efficient and more widely understood. Still, your solution is valid and illustrates my favorite use of the MODEL clause, with “dimension by (row_number()…”
Hi Stew,
It’s quite easy to change to exclusive end dates: we have to change just one condition: aa[cv()] < bb[cv()-1] to aa[cv()] <= bb[cv()-1]
select
test_case
,start_date
,end_date
,aa as group_start
,bb as group_end
from t
model
partition by (test_case)
dimension by (row_number()over(partition by test_case order by start_date,end_date desc) rn)
measures (start_date, end_date, start_date aa,end_date bb)
rules sequential order(
bb[any] = case when bb[cv()] < bb[cv()-1] then bb[cv()-1] else bb[cv()] end
,aa[any] = case when aa[cv()] <= bb[cv()-1] then aa[cv()-1] else aa[cv()] end
,bb[any] order by rn desc = case when aa[cv()] = aa[cv()+1] then bb[cv()+1] else bb[cv()] end
)
order by 1,2
By the way, we also discussed another similar problem with discrete dates(discretization unit = 1 day), we also needed to merge ranges
01.jan.2001 – 02.jan.2002 and 03.jan.2002 – 05.jan.2002 into one 01.jan.2001 – 05.jan.2002:
http://www.sql.ru/forum/actualutils.aspx?action=gotomsg&tid=810924&msg=9904299
Hi Stew,
This question is about a year late but hope you can still assist.
How would you revise your query if you only have the following data points: member_id, start_date, end_date with the assumption that there are no null values in either date fields and that any of the 13 cases can occur for any member?
Given the example:
Member ID Start Date End Date
A 3/3/2015 3/10/2015
A 2/25/2015 3/10/2015
A 2/24/2015 3/10/2015
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
A 10/29/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/6/2015
C 12/11/2014 2/10/2015
The desired results are:
Member ID Start Date End Date
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/10/2015
Truly appreciate the help on this.
— whiners
Hi Stew,
Would you be able to revise your query to transform this:
Member ID Start Date End Date
A 3/3/2015 3/10/2015
A 2/25/2015 3/10/2015
A 2/24/2015 3/10/2015
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
A 10/29/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/6/2015
C 12/11/2014 2/10/2015
into this:
Member ID Start Date End Date
A 2/23/2015 3/10/2015
A 10/28/2014 11/24/2014
B 2/23/2015 3/29/2015
B 5/5/2015 6/5/2015
C 11/27/2014 2/10/2015
Thanks so much!
– whiners
Hi Whiners,
Thanks for reading and commenting. I took my code, did a global change from TEST_CASE to MEMBER_ID and it worked.
What did you try and why didn’t it work?
If you still have problems, post details: CREATE TABLE and INSERT statements, your attempt at a solution and the exact result of your test.
Best regards, Stew
Hey Stew,
I apologize for the double post and delayed feedback.
Your query worked perfectly for the dataset and requirement that I have.
Truly appreciated your expert insight in this endeavor.
Hi Stew,
I have similar requirement. When there is NO GAP between the end date of first record and start date of second, then the records should roll-up into one record. When there is a gap then it should print as it is..
Similar to your test case 1 but the output should be rolled up..
id start date end date
12310 1/1/2015 12/31/2015
12310 3/1/2016 12/31/2199
12345 1/1/2015 3/31/2015
12345 4/1/2015 12/31/2199
12347 1/1/2015 4/30/2015
12347 5/1/2015 12/31/2015
12347 1/1/2016 12/31/2199
expected result
12310 1/1/2015 12/31/2015
12310 3/1/2016 12/31/2199
12345 1/1/2015 12/31/2199
12347 1/1/2015 12/31/2199
Thanks in advance.
Hi Sam,
A few questions for you:
– What do you want to happen when there are overlaps?
– What version of the Oracle Database do you have?
– Where are the CREATE TABLE and INSERT statements that would let us share test data?
– What have you tried and what were the results?
Best regards, Stew
Hi Stew,
There can’t be overlaps in my case. Oracle 12c version.
create table tmp (
cid VARCHAR2(50),
start_date date,
end_date date
);
insert into tmp
values(‘12345′,’01/01/2015′,’03/31/2015’);
insert into tmp
values(‘12345′,’04/01/2015′,’12/31/2199’);
insert into tmp
values(‘12347′,’01/01/2015′,’04/30/2015’);
insert into tmp
values(‘12347′,’05/01/2015′,’12/31/2015’);
insert into tmp
values(‘12347′,’01/01/2016′,’12/31/2199’);
insert into tmp
values(‘12310′,’01/01/2015′,’12/31/2015’);
insert into tmp
values(‘12310′,’03/01/2016′,’12/31/2199’);
commit;
I think I found what I want.. this query seems to work for my needs…
select cid, min(start_date) df, max(end_date) dt
from (select t2.*, rwn – sum(gap) over(order by rwn) as grp
from (select t1.*,
row_number() over(order by cid, start_date) rwn,
case
when lag(cid)
over(partition by cid order by start_date) = cid and
lag(end_date)
over(partition by cid order by start_date) + 1 =
start_date then
1
else
0
end gap
from tmp t1) t2)
group by cid, grp
order by cid, min(start_date)
Thanks for your support !
Sam,
That looks like it would work. Given your version, you could do the same thing a bit more efficiently and with less code as:
Thanks Stew !! Good to know a new way of doing this .. Looks clean..
I think this query works only for special provided day pattern. This match_recognize should be more generic:
HI Wernfied, and thanks for your contribution.
Sam said “there can’t be overlaps”. Knowing how hard it is to enforce that rule, I asked myself: “what do I want to do if there is an overlap, even though there shouldn’t be?”
My solution keeps the overlap in the output where it can be seen.
Your solution accepts the overlap and hides it.
Either solution is “correct”, depending on the desired result.
Best regards,
Stew
Thanks. Nice and clean.
Hi Stew, looks I’ve found the perfect forum to post my question. I am using Hive SQL and a novice at that may I add.
I have a dataset full off overlapping dates, here is one such example in the create table / insert statement:
create table test (
account VARCHAR2(50),
start_date date,
end_date date
);
insert into test
values(‘1111222333′,’05/01/2016′,’15/02/2016’);
insert into tmp
values(‘1111222333′,’29/01/2016′,’04/04/2016’);
insert into test
values(‘1111222333′,’20/03/2016′,’13/05/2016’);
insert into test
values(‘1111222333′,’26/04/2016′,’06/06/2016’);
insert into test
values(‘1111222333′,’05/05/2016′,’06/06/2016’);
insert into test
values(‘1111222333′,’13/09/2016′,’10/10/2016’);
insert into test
values(‘1111222333′,’14/10/2016′,’15/12/2016’);
insert into test
values(‘1111222333′,’09/08/2017′,’25/08/2017’);
insert into test
values(‘1111222333′,’25/10/2017′,’10/11/2017’);
insert into test
values(‘1111222333′,’02/11/2017′,’05/01/2018’);
the output I require
Account Start Date End_Date
—————————————————
1111222333 05/01/2016 06/06/2016
1111222333 13/09/2016 10/10/2016
1111222333 14/10/2016 15/12/2016
1111222333 09/08/2017 25/08/2017
1111222333 25/10/2017 05/01/2018
I tried to adapt the code to work with HSQL but some of the functions wouldn’t work so unable to test
Any help on this would be greatly appreciated.
Thanks
Hello Ash,
I don’t know what Hive SQL or HSQL are. I searched on Google and found Hive Query Language (HQL) and HQLDB, which are very different things. To my knowledge neither of these support the code I blogged about, either MATCH_RECOGNIZE or analytic functions. I suggest you use precise terms to search for a forum that supports the data store you are interested in. I blog about the Oracle database.
Best regards, Stew
No problem Stew, thanks for looking into it. The information in this post has definitely helped me tho, used some of the logic and managed to work out a solution :)
Hi Stew, thanks for your great article.
One question: I have two tables, one with absences and one with working times. I want to merge these tables and force the absences to show. In the working times-table. There will be overlapping- and met by-data in the working times-tables and the absences overlap too.
Do you have an idea, how solve this?
Thanks
Hi Stefan,
I am sure this can be solved if enough details are given about the environment, the input and the desired output.
– What version of the Oracle database do you have?
– Within the absences table, are there ranges that meet or overlap?
– If so, can they be merged into one row before comparing to the other table?
– Within the working times table, are there ranges that meet or overlap?
– If so, can they be merged into one row before comparing to the other table?
– Can the starting time be NULL?
– Can the ending time be NULL?
– May I assume exclusive end times?
Best regards,
Stew
thanks for your quick response.
I´m using Oracle 12c.
In the absence table ranges do not overlap, but perhaps meet. There is no problem to merge the dates into one row.
In the working times table ranges do not overlap either, but can contain a range where an absence exists, meets an absence, equals an absence… and so on. They can also be merged into one row.
There are no NULL values in both tables. The working time tables ends always at 31.12.9999.
Best regards
For example:
working times:
from to scope (percent)
01.10.2014 30.06.2016 100
01.05.2018 31.01.2020 50
01.02.2020 31.12.9999 75
absence times:
from to Reason
01.07.2016 30.04.2018 A
01.05.2018 31.01.2019 B
01.06.2019 30.04.2020 C
The desired output is a list, like
from to scope/reason
01.10.2014 30.06.2016 100
01.07.2016 30.04.2018 A
01.05.2018 31.01.2019 B
01.02.2019 31.05.2019 50
01.06.2019 30.04.2020 C
01.05.2020 31.12.9999 75
Thanks for your help!!
Stefan, I cannot work on this until next week. If you don’t want to wait, I suggest the OTN forum at https://community.oracle.com/community/groundbreakers/database/developer-tools/sql_and_pl_sql
Hi Stew, did you find the time to look at my problem?
Greetings
Stefan
Assuming:
WOW! Can´t believe it. You are great!
I don´t quite understand the magic. Can you explain the steps shortly?
Thank you!!
HI,
This is a variant of the problem I blogged about here: https://stewashton.wordpress.com/2018/12/03/ranges-with-nulls-06-overlaps-conflicts/
For a full understanding, you may want to read the earlier “Ranges with nulls” posts first.
Regards,
Stew
This was extremely helpful. Thank you stewashton!