SQL for date ranges, gaps and overlaps

[Update 2014-07-04] Since this is my most popular post, I’m adding an index to all my posts about date ranges:

There are lots of questions and posts on the Web about data with date ranges. It’s harder than it seems to design tables with start and end dates, to avoid overlaps or even to write SQL to find gaps or overlaps. I’m writing this post to get some basic ideas straight and to make myself some rules.

Comparing date ranges

I noticed a detailed analysis of date ranges and their relations here > Allen’s Interval Algebra. I’m going to say “date range” where the article says “interval”, since for Oracle an INTERVAL is just a duration without a defined starting point. (Update: Galo Baldo noticed a mistake in the text I quote here. I underlined the part he corrected.)

“This table shows all the possible relations that two date ranges can have. Each one is defined graphically by a diagram relating two ranges a and b, with time running → from left to right. For example, the first diagram shows that “a precedes b” means that a ends before b begins, with a gap separating them; the second shows that “a meets b” means that a ends when b begins.”

precedes meets overlaps finished by contains starts equals started by during finishes overlapped by met by preceded by
p m o fi di s e si d f oi mi pi
gap meet ————– overlap ————– meet gap

I love a thorough analysis like this! For our purposes, I added the third row that simplifies to three situations: “gap”, “meet” and “overlap”.

Designing tables with date ranges: not so easy

Let’s create a very simple table with one row that covers March 11th and another that covers March 12th:

create table Date_Ranges(
  start_time timestamp,
  end_time timestamp
);
insert into Date_Ranges
select TIMESTAMP '2014-03-11 00:00:00',
TIMESTAMP '2014-03-12 00:00:00'
from dual
union all
select TIMESTAMP '2014-03-12 00:00:00',
TIMESTAMP '2014-03-13 00:00:00'
from dual;
commit;
START_TIME END_TIME
2014-03-11 00:00:00.000000 2014-03-12 00:00:00.000000
2014-03-12 00:00:00.000000 2014-03-13 00:00:00.000000

These rows should “meet” because the START_TIME of one = the END_TIME of another. Now let’s see what happens at midnight:

select * from Date_Ranges
where TIMESTAMP '2014-03-12 00:00:00' between start_time and end_time;
START_TIME END_TIME
2014-03-11 00:00:00.000000 2014-03-12 00:00:00.000000
2014-03-12 00:00:00.000000 2014-03-13 00:00:00.000000

We got both rows back. It appears that these two rows “overlap” at midnight! Maybe we should make END_TIME end just before midnight:

Update Date_Ranges
set end_time = end_time - numtodsinterval('1', 'second');
--
select start_time, end_time,
lead(start_time) over(order by start_time) - end_time gap
from Date_Ranges
START_TIME END_TIME GAP
2014-03-11 00:00:00.000000 2014-03-11 23:59:59.000000 +00 00:00:01.
2014-03-12 00:00:00.000000 2014-03-12 23:59:59.000000

Now we have a “gap” between the rows!

It looks like the only way to make rows “meet” is to define START_TIME as inclusive and END_TIME as exclusive: We go back to our original table data but we change our SQL.

rollback;
select * from Date_Ranges
where start_time <= TIMESTAMP '2014-03-12 00:00:00'
                and TIMESTAMP '2014-03-12 00:00:00' < end_time;
START_TIME END_TIME
2014-03-12 00:00:00.000000 2014-03-13 00:00:00.000000
  • Rule 1: to make date ranges “meet”, make the START_TIME of the later row equal to the END_TIME of the earlier row.
  • Rule 2: make END_TIMEs exclusive by using
    “>= START_TIME” and “< END_TIME” instead of BETWEEN.

What do NULLs mean?

You think this is a trick question, right? Because NULLs aren’t supposed to “mean” anything. This is true, but very often an END_TIME with a null value means “until the end of time”. Less often, a null START_TIME means “starting at the beginning of time”. If you decide to use NULL in this way, remember that Oracle does not create index entries with entirely NULL values – and that many useful functions ignore NULLs.

  • Rule 3: if necessary, allow NULL values for START_TIME (for the indefinite past) and / or END_TIME (for the indefinite future).

Oracle 12c agrees

Oracle 12c introduced support for “Temporal Validity”. As it turns out, the SQL that Oracle generates behind the scenes corresponds exactly to my 3 “rules”:

alter table Date_Ranges modify period for valid_time(start_time, end_time);

select * from Date_Ranges
AS OF PERIOD FOR valid_time TIMESTAMP '2014-03-12 00:00:00';

select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |       |       |     3 (100)|          |
|*  1 |  TABLE ACCESS FULL| DATE_RANGES |     1 |    26 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((("T"."START_TIME" IS NULL OR "T"."START_TIME"<=TIMESTAMP'
              2014-03-12 00:00:00.000000000') AND ("T"."END_TIME" IS NULL OR
              "T"."END_TIME">TIMESTAMP' 2014-03-12 00:00:00.000000000')))
About these ads

2 thoughts on “SQL for date ranges, gaps and overlaps

  1. Nice post Stew.

    Shouldn’t the last sentence on the paragraph before the diagram be: the second shows that “a meets b” means that a ends when b begins?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s