Merging contiguous date ranges

Last time I wrote about finding gaps in date ranges: this post is about merging date ranges that “meet”. This is a frequent question; the answer applies to any ranges, not just dates. As a reminder, I consider ranges to “meet” when the end value of one record is equal to the start value of the following record.

[Update 2014-07-04] Warning: none of these solutions work when the test data contains NULL values. I have added a new section to talk about NULLs.

[Update 2015-06-08] The variant for “overlapping” date ranges only works for date ranges that partially overlap. For a solution that merges any date ranges that meet or overlap in any way, see Merging Overlapping Date Ranges

Some test data, using date ranges:

drop table t purge;
create table t ( id int, start_date date, end_date date );
Insert into T values (1, DATE '2014-01-01', DATE '2014-01-03');
Insert into T values (2, DATE '2014-01-03', DATE '2014-01-05');
Insert into T values (3, DATE '2014-01-05', DATE '2014-01-07');
Insert into T values (4, DATE '2014-01-08', DATE '2014-02-01');
Insert into T values (5, DATE '2014-02-01', DATE '2014-02-10');
Insert into T values (6, DATE '2014-02-05', DATE '2014-02-28');
Insert into T values (7, DATE '2014-02-10', DATE '2014-02-15');
ID START_DATE END_DATE
1 2014-01-01 00:00:00 2014-01-03 00:00:00
2 2014-01-03 00:00:00 2014-01-05 00:00:00
3 2014-01-05 00:00:00 2014-01-07 00:00:00
4 2014-01-08 00:00:00 2014-02-01 00:00:00
5 2014-02-01 00:00:00 2014-02-10 00:00:00
6 2014-02-05 00:00:00 2014-02-28 00:00:00
7 2014-02-10 00:00:00 2014-02-15 00:00:00

I expect columns 1 through 3 to be merged, as well as columns 4 and 5. The requirement says to merge contiguous ranges, not overlapping ranges, so columns 6 and 7 should stay as they are.

The “start of group” method

I wrote about the “start of group” method a week ago.

  • The first step is to assign 1 to each row that starts a group and 0 to every other row. Here the test is simple: a group starts when the current start date is not equal to the previous end date.
  • The next step is to assign a group to each row using a running sum of the 1s and 0s.
  • Finally, I do the group by.
with grp_starts as (
  select start_date, end_date,
  case
    when start_date = lag(end_date) over(order by start_date, end_date)
    then 0 else 1
  end grp_start
  from t
)
, grps as (
  select start_date, end_date,
  sum(grp_start) over(order by start_date, end_date) grp
  from grp_starts
)
select min(start_date) start_date,
max(end_date) end_date
from grps
group by grp
order by 1, 2;
START_DATE END_DATE
2014-01-01 00:00:00 2014-01-07 00:00:00
2014-01-08 00:00:00 2014-02-10 00:00:00
2014-02-05 00:00:00 2014-02-28 00:00:00
2014-02-10 00:00:00 2014-02-15 00:00:00

Using 12c row pattern matching

Here the “pattern” is: any one row (A), followed by zero or more rows (B*) whose start date is equal to the previous end date. By default:

  • the next match starts with the row just after the prior match;
  • only one row per match is returned;
  • since A is not defined, the implicit condition is TRUE: the row always matches.
select * from t
match_recognize(
  order by start_date, end_date
  measures first(start_date) start_date, last(end_date) end_date
  pattern(A B*)
  define B as start_date = prev(end_date)
);

Merging overlapping rows

Suppose the requirement is to merge rows that overlap as well as rows that “meet”. With the “start of group” method, I change my first test from “=” to “<=” and I’m done. With MATCH_RECOGNIZE, I change DEFINE B from “=” to “<=” and I return max(end_date) instead of last(end_date).

select * from t
match_recognize(
  order by start_date, end_date
  measures first(start_date) start_date, max(end_date) end_date
  pattern(A B*)
  define B as start_date <= prev(end_date)
);
START_DATE END_DATE
2014-01-01 00:00:00 2014-01-07 00:00:00
2014-01-08 00:00:00 2014-02-28 00:00:00

[Update 2014-07-02: handling NULL values]

You may want to allow NULL values for START_DATE (for the indefinite past) or END_DATE (for the indefinite future). If you do, all these solutions break because functions like MAX() leave NULLs out. The workarounds are complicated and (I suspect) less efficient.

I think the simplest and surest way to fix this is to substitute real dates for the null values. The earliest possible date in Oracle SQL is 4712-01-01 B.C. and the latest is 9999-12-31. I use COALESCE to change nulls to these dates, and NULLIF to change these dates back to nulls for display. Here are simple test code and adjusted solutions. Every statement will return one line with START_DATE as NULL and END_DATE as NULL.

drop table t cascade constraints purge;
create table t ( id int, start_date date, end_date date );
Insert into T values (1, null, DATE '2014-01-03');
Insert into T values (2, DATE '2014-01-03', DATE '2014-01-05');
Insert into T values (3, DATE '2014-01-05', null);
commit;

with data as (
  select id,
  coalesce(start_date, date '-4712-01-01') start_date,
  coalesce(end_date, date '9999-12-31') end_date
  from t
)
, grp_starts as (
  select start_date, end_date,
  case
    when start_date = lag(end_date) over(order by start_date, end_date)
    then 0 else 1
  end grp_start
  from data
)
, grps as (
  select start_date, end_date,
  sum(grp_start) over(order by start_date, end_date) grp
  from grp_starts
)
select nullif(min(start_date), date '-4712-01-01') start_date,
nullif(max(end_date), date '9999-12-31') end_date
from grps
group by grp
order by 1 nulls first, 2;

select nullif(start_date, date '-4712-01-01') start_date,
nullif(end_date, date '9999-12-31') end_date
from (
  select id,
  coalesce(start_date, date '-4712-01-01') start_date,
  coalesce(end_date, date '9999-12-31') end_date
  from t
)
match_recognize(
  order by start_date, end_date
  measures first(start_date) start_date, last(end_date) end_date
  pattern(A B*)
  define B as start_date = prev(end_date)
);

with data as (
  select id,
  coalesce(start_date, date '-4712-01-01') start_date,
  coalesce(end_date, date '9999-12-31') end_date
  from t
)
, grp_starts as (
  select start_date, end_date,
  case
    when start_date <= lag(end_date) over(order by start_date, end_date)
    then 0 else 1
  end grp_start
  from data
)
, grps as (
  select start_date, end_date,
  sum(grp_start) over(order by start_date, end_date) grp
  from grp_starts
)
select nullif(min(start_date), date '-4712-01-01') start_date,
nullif(max(end_date), date '9999-12-31') end_date
from grps
group by grp
order by 1 nulls first, 2;

select nullif(start_date, date '-4712-01-01') start_date,
nullif(end_date, date '9999-12-31') end_date
from (
  select id,
  coalesce(start_date, date '-4712-01-01') start_date,
  coalesce(end_date, date '9999-12-31') end_date
  from t
)
match_recognize(
  order by start_date, end_date
  measures first(start_date) start_date, max(end_date) end_date
  pattern(A B*)
  define B as start_date <= prev(end_date)
);

[Update 2014-03-18] Using the Tabibitosan method

Update 2: Warning! Frank’s method doesn’t work when there are both gaps and overlaps in the date ranges. I don’t recommend it.

Just today on the OTN forum, Frank Kulash taught me how to apply this method, which he calls “Fixed Difference”, to date ranges. I think his solution is more efficient, since it does fewer window sorts, but it only works with number and date ranges, not timestamps — and it doesn’t work when there are both gaps and overlaps in the date ranges.

First, here is the subquery that calculates the groups:

select start_date, end_date,
end_date -
  sum(end_date - start_date) over(order by start_date, end_date)
as grp
from t
START_DATE END_DATE GRP
2014-01-01 2014-01-03 2014-01-01
2014-01-03 2014-01-05 2014-01-01
2014-01-05 2014-01-07 2014-01-01
2014-01-08 2014-02-01 2014-01-02
2014-02-01 2014-02-10 2014-01-02
2014-02-05 2014-02-28 2013-12-28
2014-02-10 2014-02-15 2013-12-10

The first “grp” is equal to start_date, since we add end_date and subtract it. In the second row, we are using the second end_date instead of the first, which is like adding the difference between the two end_dates. At the same time, we are subtracting the difference between the second start_date and the second end_date. So, if the second start_date is the same as the first end_date, the result will be the same: the “grp” will be the first start_date of the contiguous range.

After that, we just need to “group by”.

with grps as (
  select start_date, end_date,
  end_date -
    sum(end_date - start_date) over(order by start_date, end_date)
  as grp
  from t
)
select min(start_date) start_date,
max(end_date) end_date
from grps
group by grp
order by 1, 2;

Unfortunately, when there are both gaps and overlaps, they can cancel each other out and a row may be considered part of a contiguous series when in fact it is not. For an example, see the forum thread.

11 thoughts on “Merging contiguous date ranges

  1. Thank you SO much! I have been toiling for days trying to crack this nut! Works perfectly!

  2. There is another solution, more simple, more elegant (in my opinion) – you don’t need 12c for it.

    Your ‘Gaps-in-date-ranges-when-are-you-free’ can be done twice after another – a gap analysis on a gap analysis, so to speak – producing the same result.

    If your timeslices are:
    RESOURCE DATE_CNT FROM TILL
    XY 2014/06/01 3 5
    XY 2014/06/01 2 4
    XY 2014/06/01 2 7
    XY 2014/06/01 10 15

    Your ‘Gaps’ query results in:
    RESOURCE DATE_CNT FROM TILL
    XY 2014/06/01 7 10

    Whereas maybe you would like to see the start and end also.

    If – using a view – you add a start and an end ‘sentinel’:
    RESOURCE DATE_CNT FROM TILL
    XY 2014/06/01 0 0,000001
    XY 2014/06/01 3 5
    XY 2014/06/01 2 4
    XY 2014/06/01 2 7
    XY 2014/06/01 10 15
    XY 2014/06/01 23,99998 23,99999

    your result is much better:
    RESOURCE DATE_CNT FROM TILL
    XY 2014/06/01 0,000001 3
    XY 2014/06/01 7 10
    XY 2014/06/01 15 23,99998

    And if you use ‘Gaps’ on this result again, you get your result:
    RESOURCE DATE_CNT FROM TILL
    XY 2014/06/01 3 7
    XY 2014/06/01 10 15

    Adding a little bit of rounding or extra filtering, you can filter out the ‘sentinels’.

    • Hi Jan,

      The approach you describe does not merge contiguous date ranges, it merges date ranges that are contiguous or overlap in any way.

      As I mentioned in the third paragraph, the “overlaps” solution in this post doesn’t do that: it only merges partially overlapping rows.

      I did a better job of dealing with overlaps here: https://stewashton.wordpress.com/2015/06/08/merging-overlapping-date-ranges/

      I did think about doing the “gap” thing twice, but the “start of group” method seemed more straightforward. I’m sure your approach would work too.

      Best regards, Stew

  3. I want to merge contiguous date to (from- to) in one row group by hotel and no of guest please if you can help me
    Some test data
    HOTEL_ID NO_OF_GUEST DATE_LA
    1 80 02/12/2015
    1 160 03/12/2015
    1 160 04/12/2015
    1 160 05/12/2015
    1 160 06/12/2015
    1 80 09/12/2015
    1 80 10/12/2015
    1 80 11/12/2015
    1 80 12/12/2015
    1 80 13/12/2015
    2 60 09/12/2015
    2 60 10/12/2015
    2 60 11/12/2015
    2 100 12/12/2015
    2 100 13/12/2015

    create table test (HOTEL_ID NUMBER, NO_OF_GUEST NUMBER, DATE_LA date);

    Insert into test values (1, 80, TO_DATE(‘02/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 160, TO_DATE(‘03/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 160, TO_DATE(‘04/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 160, TO_DATE(‘05/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 160, TO_DATE(‘06/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 80, TO_DATE(‘09/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 80, TO_DATE(‘10/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 80, TO_DATE(‘11/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 80, TO_DATE(‘12/12/2015’,’dd/mm/yyyy’));

    Insert into test values (1, 60, TO_DATE(‘09/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 60, TO_DATE(‘10/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 60, TO_DATE(‘11/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 100, TO_DATE(‘12/12/2015’,’dd/mm/yyyy’));
    Insert into test values (1, 100, TO_DATE(‘13/12/2015’,’dd/mm/yyyy’));

    The result
    I want to Merge contiguous date
    HOTEL_ID NO_OF_GUEST FROM_DATE TO_DATE
    1 80 02/12/2015 02/12/2015
    1 160 03/12/2015 06/12/2015
    1 80 01/12/2015 12/12/2015
    2 60 01/12/2015 03/12/2015
    2 100 04/12/2015 05/12/2015

    My SQL
    select hotel_id,no_of_guest,date_la,
    case
    when date_la – lag(date_la) over(partition by hotel_id,no_of_guest order by DATE_LA) = 1
    then 1 else 0
    end grp_start
    from test

    • Hello Zuhair,

      Sorry I took so long to reply. For quicker replies, try https://community.oracle.com/community/database/developer-tools/sql_and_pl_sql/content

      Your INSERT statements do not match your desired output.

      I think you could use the Tabibitosan method, since you are dealing with consecutive dates not date ranges.

      select hotel_id, no_of_guest, min(date_la) from_date, max(date_la) to_date
      from (
        select t.*,
          date_la
          - row_number() over(partition by hotel_id, no_of_guest order by date_la)
          grp_id
        from test t
      )
      group by hotel_id, no_of_guest, grp_id
      order by 1,3,2;
  4. Hi, Stew

    In preparation for row pattern matching HOL for Kscope17, I just noticed something about this example that I hadn’t noticed when I used it before.

    In your data, row 7 starts where row 5 ends. So shouldn’t the merge tack row 7 on to the end of row 4-5? Making the output three date ranges, 1-2-3, 4-5-7, and 6 by itself? (Instead of four ranges.)

    Fortunately it’s easy to get that result – do ORDER BY END_DATE, START_DATE instead of ORDER BY START_DATE, END_DATE ;-)

    But can you think of cases where we’d then get a *different* “wrong” result by sorting END_DATE first? Or will it be a “solid” solution?

    Thanks again for making these examples I can steal ;-)

    Cheerio
    /Kim

    • Hi Kim,

      Thanks for bringing that up – good catch!

      Your issue is with the requirement, not with the solution.

      I wanted to merge only contiguous rows, which to me means rows that meet but do not overlap any intermediate row. Any time there is an overlap of any kind, I don’t want to merge.

      You mention row 7, which is contiguous with row 5 but overlaps with row 6. I didn’t want to merge 7 with 5.

      You may think it is “better” to go ahead and merge 5 and 7, leaving 6 alone. I can understand that. If you do, ordering by END_DATE, START_DATE seems like a solid solution from my testing.

      Have a great time at Kscope (the one conference in the US I would really like to attend).

      Best regards, Stew

  5. Pingback: Lin.ear th.inking: Mandelbrot Set in SQL using SVG with RLE

  6. Mayby this would help if you have data with partitions (multiple groups with different id) and more data than only only dates.
    “t.Variables” represents multiple variables.

    DROP TABLE t PURGE;

    CREATE TABLE t
    (
    id INT,
    variables VARCHAR2 (99),
    dstart DATE,
    dend DATE
    );

    INSERT INTO t
    VALUES (0,
    ‘Z’,
    DATE ‘2014-01-01’,
    DATE ‘2014-01-03’);

    INSERT INTO t
    VALUES (0,
    ‘Z’,
    DATE ‘2014-01-04’,
    DATE ‘2014-01-05’);

    INSERT INTO t
    VALUES (0,
    ‘Z’,
    DATE ‘2014-01-06’,
    DATE ‘2014-01-07’);

    INSERT INTO t
    VALUES (1,
    ‘X’,
    DATE ‘2014-01-01’,
    DATE ‘2014-01-03’);

    INSERT INTO t
    VALUES (1,
    ‘X’,
    DATE ‘2014-01-04’,
    DATE ‘2014-01-05’);

    INSERT INTO t
    VALUES (1,
    ‘X’,
    DATE ‘2014-01-06’,
    DATE ‘2014-01-07’);

    INSERT INTO t
    VALUES (1,
    ‘Y’,
    DATE ‘2014-01-08’,
    DATE ‘2014-02-01’);

    INSERT INTO t
    VALUES (1,
    ‘X’,
    DATE ‘2014-02-01’,
    DATE ‘2014-02-10’);

    INSERT INTO t
    VALUES (1,
    ‘Y’,
    DATE ‘2014-02-05’,
    DATE ‘2014-02-28’);

    INSERT INTO t
    VALUES (1,
    ‘Y’,
    DATE ‘2014-02-10’,
    DATE ‘2014-02-15’);

    SELECT * FROM t;

    SELECT *
    FROM t
    MATCH_RECOGNIZE (
    PARTITION BY id, variables
    ORDER BY dstart
    MEASURES FIRST (dstart) AS ds, LAST (dend) AS de
    ONE ROW PER MATCH
    AFTER MATCH SKIP PAST LAST ROW
    PATTERN (dstart successors *)
    DEFINE successors AS successors.dstart =
    prev (successors.dend) + 1);– Closed intervals, where day is smallest interval length, without overlap
    — DEFINE successors AS successors.dstart <=
    — prev (successors.dend) + 1);– Closed intervals, where day is smallest interval length, with overlap
    — DEFINE successors AS successors.dstart = prev (successors.dend)); –Clopen intervals [dstart, dend[, without overlap
    — DEFINE successors AS successors.dstart <= prev (successors.dend)); –Clopen intervals [dstart, dend[, with overlap

  7. MATCH_RECOGNIZE (
    PARTITION BY key
    ORDER BY start_dat, end_dat
    MEASURES
    first(start_dat) AS from_dat, max(end_dat) AS to_dat
    PATTERN(merged* strt)
    DEFINE
    merged AS max(start_dat) >= next(end_dat)

    The main trick is to reverse the order of definition in the PATTERN,
    the second one is to use max() in the define to be sure to take into account all start_dat of intervals so far

    This definition works even if you have an interval that covers all others: it will be the only output while solutions with(strt merged*) will output more rows after the “covering all” one.

Leave a comment