Last time I wrote about finding gaps in date ranges: this post is about merging date ranges that “meet”. This is a frequent question; the answer applies to any ranges, not just dates. As a reminder, I consider ranges to “meet” when the end value of one record is equal to the start value of the following record.
[Update 2014-07-04] Warning: none of these solutions work when the test data contains NULL
values. I have added a new section to talk about NULL
s.
[Update 2015-06-08] The variant for “overlapping” date ranges only works for date ranges that partially overlap. For a solution that merges any date ranges that meet or overlap in any way, see Merging Overlapping Date Ranges
Some test data, using date ranges:
drop table t purge; create table t ( id int, start_date date, end_date date ); Insert into T values (1, DATE '2014-01-01', DATE '2014-01-03'); Insert into T values (2, DATE '2014-01-03', DATE '2014-01-05'); Insert into T values (3, DATE '2014-01-05', DATE '2014-01-07'); Insert into T values (4, DATE '2014-01-08', DATE '2014-02-01'); Insert into T values (5, DATE '2014-02-01', DATE '2014-02-10'); Insert into T values (6, DATE '2014-02-05', DATE '2014-02-28'); Insert into T values (7, DATE '2014-02-10', DATE '2014-02-15');
ID | START_DATE | END_DATE |
---|---|---|
1 | 2014-01-01 00:00:00 | 2014-01-03 00:00:00 |
2 | 2014-01-03 00:00:00 | 2014-01-05 00:00:00 |
3 | 2014-01-05 00:00:00 | 2014-01-07 00:00:00 |
4 | 2014-01-08 00:00:00 | 2014-02-01 00:00:00 |
5 | 2014-02-01 00:00:00 | 2014-02-10 00:00:00 |
6 | 2014-02-05 00:00:00 | 2014-02-28 00:00:00 |
7 | 2014-02-10 00:00:00 | 2014-02-15 00:00:00 |
I expect columns 1 through 3 to be merged, as well as columns 4 and 5. The requirement says to merge contiguous ranges, not overlapping ranges, so columns 6 and 7 should stay as they are.
The “start of group” method
I wrote about the “start of group” method a week ago.
- The first step is to assign 1 to each row that starts a group and 0 to every other row. Here the test is simple: a group starts when the current start date is not equal to the previous end date.
- The next step is to assign a group to each row using a running sum of the 1s and 0s.
- Finally, I do the group by.
with grp_starts as ( select start_date, end_date, case when start_date = lag(end_date) over(order by start_date, end_date) then 0 else 1 end grp_start from t ) , grps as ( select start_date, end_date, sum(grp_start) over(order by start_date, end_date) grp from grp_starts ) select min(start_date) start_date, max(end_date) end_date from grps group by grp order by 1, 2;
START_DATE | END_DATE |
---|---|
2014-01-01 00:00:00 | 2014-01-07 00:00:00 |
2014-01-08 00:00:00 | 2014-02-10 00:00:00 |
2014-02-05 00:00:00 | 2014-02-28 00:00:00 |
2014-02-10 00:00:00 | 2014-02-15 00:00:00 |
Using 12c row pattern matching
Here the “pattern” is: any one row (A), followed by zero or more rows (B*) whose start date is equal to the previous end date. By default:
- the next match starts with the row just after the prior match;
- only one row per match is returned;
- since A is not defined, the implicit condition is TRUE: the row always matches.
select * from t match_recognize( order by start_date, end_date measures first(start_date) start_date, last(end_date) end_date pattern(A B*) define B as start_date = prev(end_date) );
Merging overlapping rows
Suppose the requirement is to merge rows that overlap as well as rows that “meet”. With the “start of group” method, I change my first test from “=
” to “<=
” and I’m done. With MATCH_RECOGNIZE
, I change DEFINE B
from “=
” to “<=
” and I return max(end_date)
instead of last(end_date)
.
select * from t match_recognize( order by start_date, end_date measures first(start_date) start_date, max(end_date) end_date pattern(A B*) define B as start_date <= prev(end_date) );
START_DATE | END_DATE |
---|---|
2014-01-01 00:00:00 | 2014-01-07 00:00:00 |
2014-01-08 00:00:00 | 2014-02-28 00:00:00 |
[Update 2014-07-02: handling NULL values]
You may want to allow NULL
values for START_DATE
(for the indefinite past) or END_DATE
(for the indefinite future). If you do, all these solutions break because functions like MAX()
leave NULL
s out. The workarounds are complicated and (I suspect) less efficient.
I think the simplest and surest way to fix this is to substitute real dates for the null values. The earliest possible date in Oracle SQL is 4712-01-01 B.C. and the latest is 9999-12-31. I use COALESCE
to change nulls to these dates, and NULLIF
to change these dates back to nulls for display. Here are simple test code and adjusted solutions. Every statement will return one line with START_DATE
as NULL
and END_DATE
as NULL
.
drop table t cascade constraints purge; create table t ( id int, start_date date, end_date date ); Insert into T values (1, null, DATE '2014-01-03'); Insert into T values (2, DATE '2014-01-03', DATE '2014-01-05'); Insert into T values (3, DATE '2014-01-05', null); commit; with data as ( select id, coalesce(start_date, date '-4712-01-01') start_date, coalesce(end_date, date '9999-12-31') end_date from t ) , grp_starts as ( select start_date, end_date, case when start_date = lag(end_date) over(order by start_date, end_date) then 0 else 1 end grp_start from data ) , grps as ( select start_date, end_date, sum(grp_start) over(order by start_date, end_date) grp from grp_starts ) select nullif(min(start_date), date '-4712-01-01') start_date, nullif(max(end_date), date '9999-12-31') end_date from grps group by grp order by 1 nulls first, 2; select nullif(start_date, date '-4712-01-01') start_date, nullif(end_date, date '9999-12-31') end_date from ( select id, coalesce(start_date, date '-4712-01-01') start_date, coalesce(end_date, date '9999-12-31') end_date from t ) match_recognize( order by start_date, end_date measures first(start_date) start_date, last(end_date) end_date pattern(A B*) define B as start_date = prev(end_date) ); with data as ( select id, coalesce(start_date, date '-4712-01-01') start_date, coalesce(end_date, date '9999-12-31') end_date from t ) , grp_starts as ( select start_date, end_date, case when start_date <= lag(end_date) over(order by start_date, end_date) then 0 else 1 end grp_start from data ) , grps as ( select start_date, end_date, sum(grp_start) over(order by start_date, end_date) grp from grp_starts ) select nullif(min(start_date), date '-4712-01-01') start_date, nullif(max(end_date), date '9999-12-31') end_date from grps group by grp order by 1 nulls first, 2; select nullif(start_date, date '-4712-01-01') start_date, nullif(end_date, date '9999-12-31') end_date from ( select id, coalesce(start_date, date '-4712-01-01') start_date, coalesce(end_date, date '9999-12-31') end_date from t ) match_recognize( order by start_date, end_date measures first(start_date) start_date, max(end_date) end_date pattern(A B*) define B as start_date <= prev(end_date) );
[Update 2014-03-18] Using the Tabibitosan method
Update 2: Warning! Frank’s method doesn’t work when there are both gaps and overlaps in the date ranges. I don’t recommend it.
Just today on the OTN forum, Frank Kulash taught me how to apply this method, which he calls “Fixed Difference”, to date ranges. I think his solution is more efficient, since it does fewer window sorts, but it only works with number and date ranges, not timestamps — and it doesn’t work when there are both gaps and overlaps in the date ranges.
First, here is the subquery that calculates the groups:
select start_date, end_date, end_date - sum(end_date - start_date) over(order by start_date, end_date) as grp from t
START_DATE | END_DATE | GRP |
---|---|---|
2014-01-01 | 2014-01-03 | 2014-01-01 |
2014-01-03 | 2014-01-05 | 2014-01-01 |
2014-01-05 | 2014-01-07 | 2014-01-01 |
2014-01-08 | 2014-02-01 | 2014-01-02 |
2014-02-01 | 2014-02-10 | 2014-01-02 |
2014-02-05 | 2014-02-28 | 2013-12-28 |
2014-02-10 | 2014-02-15 | 2013-12-10 |
The first “grp” is equal to start_date, since we add end_date and subtract it. In the second row, we are using the second end_date instead of the first, which is like adding the difference between the two end_dates. At the same time, we are subtracting the difference between the second start_date and the second end_date. So, if the second start_date is the same as the first end_date, the result will be the same: the “grp” will be the first start_date of the contiguous range.
After that, we just need to “group by”.
with grps as ( select start_date, end_date, end_date - sum(end_date - start_date) over(order by start_date, end_date) as grp from t ) select min(start_date) start_date, max(end_date) end_date from grps group by grp order by 1, 2;
Unfortunately, when there are both gaps and overlaps, they can cancel each other out and a row may be considered part of a contiguous series when in fact it is not. For an example, see the forum thread.
Thank you SO much! I have been toiling for days trying to crack this nut! Works perfectly!
There is another solution, more simple, more elegant (in my opinion) – you don’t need 12c for it.
Your ‘Gaps-in-date-ranges-when-are-you-free’ can be done twice after another – a gap analysis on a gap analysis, so to speak – producing the same result.
If your timeslices are:
RESOURCE DATE_CNT FROM TILL
XY 2014/06/01 3 5
XY 2014/06/01 2 4
XY 2014/06/01 2 7
XY 2014/06/01 10 15
Your ‘Gaps’ query results in:
RESOURCE DATE_CNT FROM TILL
XY 2014/06/01 7 10
Whereas maybe you would like to see the start and end also.
If – using a view – you add a start and an end ‘sentinel’:
RESOURCE DATE_CNT FROM TILL
XY 2014/06/01 0 0,000001
XY 2014/06/01 3 5
XY 2014/06/01 2 4
XY 2014/06/01 2 7
XY 2014/06/01 10 15
XY 2014/06/01 23,99998 23,99999
your result is much better:
RESOURCE DATE_CNT FROM TILL
XY 2014/06/01 0,000001 3
XY 2014/06/01 7 10
XY 2014/06/01 15 23,99998
And if you use ‘Gaps’ on this result again, you get your result:
RESOURCE DATE_CNT FROM TILL
XY 2014/06/01 3 7
XY 2014/06/01 10 15
Adding a little bit of rounding or extra filtering, you can filter out the ‘sentinels’.
Hi Jan,
The approach you describe does not merge contiguous date ranges, it merges date ranges that are contiguous or overlap in any way.
As I mentioned in the third paragraph, the “overlaps” solution in this post doesn’t do that: it only merges partially overlapping rows.
I did a better job of dealing with overlaps here: https://stewashton.wordpress.com/2015/06/08/merging-overlapping-date-ranges/
I did think about doing the “gap” thing twice, but the “start of group” method seemed more straightforward. I’m sure your approach would work too.
Best regards, Stew
I want to merge contiguous date to (from- to) in one row group by hotel and no of guest please if you can help me
Some test data
HOTEL_ID NO_OF_GUEST DATE_LA
1 80 02/12/2015
1 160 03/12/2015
1 160 04/12/2015
1 160 05/12/2015
1 160 06/12/2015
1 80 09/12/2015
1 80 10/12/2015
1 80 11/12/2015
1 80 12/12/2015
1 80 13/12/2015
2 60 09/12/2015
2 60 10/12/2015
2 60 11/12/2015
2 100 12/12/2015
2 100 13/12/2015
create table test (HOTEL_ID NUMBER, NO_OF_GUEST NUMBER, DATE_LA date);
Insert into test values (1, 80, TO_DATE(‘02/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 160, TO_DATE(‘03/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 160, TO_DATE(‘04/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 160, TO_DATE(‘05/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 160, TO_DATE(‘06/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 80, TO_DATE(‘09/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 80, TO_DATE(‘10/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 80, TO_DATE(‘11/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 80, TO_DATE(‘12/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 60, TO_DATE(‘09/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 60, TO_DATE(‘10/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 60, TO_DATE(‘11/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 100, TO_DATE(‘12/12/2015’,’dd/mm/yyyy’));
Insert into test values (1, 100, TO_DATE(‘13/12/2015’,’dd/mm/yyyy’));
The result
I want to Merge contiguous date
HOTEL_ID NO_OF_GUEST FROM_DATE TO_DATE
1 80 02/12/2015 02/12/2015
1 160 03/12/2015 06/12/2015
1 80 01/12/2015 12/12/2015
2 60 01/12/2015 03/12/2015
2 100 04/12/2015 05/12/2015
My SQL
select hotel_id,no_of_guest,date_la,
case
when date_la – lag(date_la) over(partition by hotel_id,no_of_guest order by DATE_LA) = 1
then 1 else 0
end grp_start
from test
Hello Zuhair,
Sorry I took so long to reply. For quicker replies, try https://community.oracle.com/community/database/developer-tools/sql_and_pl_sql/content
Your INSERT statements do not match your desired output.
I think you could use the Tabibitosan method, since you are dealing with consecutive dates not date ranges.
Hi, Stew
In preparation for row pattern matching HOL for Kscope17, I just noticed something about this example that I hadn’t noticed when I used it before.
In your data, row 7 starts where row 5 ends. So shouldn’t the merge tack row 7 on to the end of row 4-5? Making the output three date ranges, 1-2-3, 4-5-7, and 6 by itself? (Instead of four ranges.)
Fortunately it’s easy to get that result – do ORDER BY END_DATE, START_DATE instead of ORDER BY START_DATE, END_DATE ;-)
But can you think of cases where we’d then get a *different* “wrong” result by sorting END_DATE first? Or will it be a “solid” solution?
Thanks again for making these examples I can steal ;-)
Cheerio
/Kim
Hi Kim,
Thanks for bringing that up – good catch!
Your issue is with the requirement, not with the solution.
I wanted to merge only contiguous rows, which to me means rows that meet but do not overlap any intermediate row. Any time there is an overlap of any kind, I don’t want to merge.
You mention row 7, which is contiguous with row 5 but overlaps with row 6. I didn’t want to merge 7 with 5.
You may think it is “better” to go ahead and merge 5 and 7, leaving 6 alone. I can understand that. If you do, ordering by END_DATE, START_DATE seems like a solid solution from my testing.
Have a great time at Kscope (the one conference in the US I would really like to attend).
Best regards, Stew
Pingback: Lin.ear th.inking: Mandelbrot Set in SQL using SVG with RLE
Mayby this would help if you have data with partitions (multiple groups with different id) and more data than only only dates.
“t.Variables” represents multiple variables.
DROP TABLE t PURGE;
CREATE TABLE t
(
id INT,
variables VARCHAR2 (99),
dstart DATE,
dend DATE
);
INSERT INTO t
VALUES (0,
‘Z’,
DATE ‘2014-01-01’,
DATE ‘2014-01-03’);
INSERT INTO t
VALUES (0,
‘Z’,
DATE ‘2014-01-04’,
DATE ‘2014-01-05’);
INSERT INTO t
VALUES (0,
‘Z’,
DATE ‘2014-01-06’,
DATE ‘2014-01-07’);
INSERT INTO t
VALUES (1,
‘X’,
DATE ‘2014-01-01’,
DATE ‘2014-01-03’);
INSERT INTO t
VALUES (1,
‘X’,
DATE ‘2014-01-04’,
DATE ‘2014-01-05’);
INSERT INTO t
VALUES (1,
‘X’,
DATE ‘2014-01-06’,
DATE ‘2014-01-07’);
INSERT INTO t
VALUES (1,
‘Y’,
DATE ‘2014-01-08’,
DATE ‘2014-02-01’);
INSERT INTO t
VALUES (1,
‘X’,
DATE ‘2014-02-01’,
DATE ‘2014-02-10’);
INSERT INTO t
VALUES (1,
‘Y’,
DATE ‘2014-02-05’,
DATE ‘2014-02-28’);
INSERT INTO t
VALUES (1,
‘Y’,
DATE ‘2014-02-10’,
DATE ‘2014-02-15’);
SELECT * FROM t;
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY id, variables
ORDER BY dstart
MEASURES FIRST (dstart) AS ds, LAST (dend) AS de
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (dstart successors *)
DEFINE successors AS successors.dstart =
prev (successors.dend) + 1);– Closed intervals, where day is smallest interval length, without overlap
— DEFINE successors AS successors.dstart <=
— prev (successors.dend) + 1);– Closed intervals, where day is smallest interval length, with overlap
— DEFINE successors AS successors.dstart = prev (successors.dend)); –Clopen intervals [dstart, dend[, without overlap
— DEFINE successors AS successors.dstart <= prev (successors.dend)); –Clopen intervals [dstart, dend[, with overlap
MATCH_RECOGNIZE (
PARTITION BY key
ORDER BY start_dat, end_dat
MEASURES
first(start_dat) AS from_dat, max(end_dat) AS to_dat
PATTERN(merged* strt)
DEFINE
merged AS max(start_dat) >= next(end_dat)
The main trick is to reverse the order of definition in the PATTERN,
the second one is to use max() in the define to be sure to take into account all start_dat of intervals so far
This definition works even if you have an interval that covers all others: it will be the only output while solutions with(strt merged*) will output more rows after the “covering all” one.
HI Pascal,
Your solution appears to be exactly what I wrote about more recently: https://stewashton.wordpress.com/2018/11/12/ranges-with-nulls-04-pack-merge/
However, your solution “merges” ranges that overlap, not just ranges that meet. In this post, I was trying to “merge” only contiguous rows, meaning rows that meet without overlapping.
Best regards,
Stew