Calculating “free time” in Calendars is a very common task. Did you know it can be done by a SQL statement less than 100 bytes long? This post pays homage to an incredibly neat and concise use of analytic functions.
The Question
“Free time” in a Calendar means the unused time slots between events; in other words it is the set of gaps between the date ranges of those events. Back in 2007, a reader asked Tom Kyte for a SQL Query to find gaps in date ranges.Here is his input data:
create table t ( a int, b date, c date ); Insert into T (A,B,C) values (1, DATE '2007-01-01', DATE '2007-01-15'); Insert into T (A,B,C) values (2, DATE '2007-01-03', DATE '2007-01-10'); Insert into T (A,B,C) values (3, DATE '2007-01-12', DATE '2007-01-25'); Insert into T (A,B,C) values (4, DATE '2007-01-20', DATE '2007-02-01'); Insert into T (A,B,C) values (5, DATE '2007-02-05', DATE '2007-02-10'); Insert into T (A,B,C) values (6, DATE '2007-02-05', DATE '2007-02-28'); Insert into T (A,B,C) values (7, DATE '2007-02-10', DATE '2007-02-15'); Insert into T (A,B,C) values (8, DATE '2007-02-18', DATE '2007-02-23'); Insert into T (A,B,C) values (9, DATE '2007-02-22', DATE '2007-03-16');
And his expected output:
GAP_START | GAP_END |
---|---|
2007-02-02 | 2007-02-04 |
The Answer
The first thing to notice is that the reader is asking for the wrong answer – or let’s say a different answer than I would expect. The only gap that exists here is between the record ending 2007-02-01 and the record starting 2007-02-05. As I explained in a previous post on date ranges, the end point is really 2007-02-01 00:00:00; I take it to be exclusive, meaning the gap should start at that same moment. In the same way, the gap should end at 2007-02-05 00:00:00, the moment when the next event starts. I would expect this output:
GAP_START | GAP_END |
---|---|
2007-02-01 00:00:00 | 2007-02-05 00:00:00 |
The person who asked the question probably thinks in terms of dates without times. For him, a range ending on 2007-02-01 includes any time on that date, so his gap would start the next day. I strongly recommend against this approach, because it only works for “pure” dates and not for date/time values, and it is not compatible with the “Temporal Validity” support introduced by Oracle version 12c.
The Algorithm
Before revealing the code, let me explain how I think the author, Antony Boucher, came up with his solution.
Let’s suppose there are no overlaps between records. In that case,
- The end date of the current record is the start date of the “gap”;
- The start date of the next record is the end date of the “gap”;
- If the “gap” has the same start and end dates, we can just eliminate that record.
SELECT * FROM (SELECT C D, LEAD(B) OVER (ORDER BY B) E FROM T) WHERE D < E
So what do we do about the overlaps? Instead of using the end date of the current record, use the latest end date up to and including the current record.
SELECT * FROM (SELECT MAX(C) OVER (ORDER BY B) D, LEAD(B) OVER (ORDER BY B) E FROM T) WHERE D < E
D | E |
---|---|
2007-02-01 00:00:00 | 2007-02-05 00:00:00 |
And that is Antony’s solution. Its length: 99 bytes!
[Update 2014-06-29] What about NULL?
In my first post on date ranges, I wrote that we could “allow NULL values for START_TIME (for the indefinite past) and / or END_TIME (for the indefinite future).” Antony’s solution doesn’t work if we allow NULL values, because the MAX()
function ignores NULLs. For example:
update t set c = null where a = 4; SELECT * FROM ( SELECT MAX(C) OVER (ORDER BY B) D, LEAD(B) OVER (ORDER BY B) E FROM T ) WHERE D < E;
D | E |
---|---|
2007-01-25 00:00:00 | 2007-02-05 00:00:00 |
One way to fix this is to substitute real dates for the null values:
with data as( select a, coalesce(b, date '-4172-01-01') b, coalesce(c, date '9999-12-31') c from t ) SELECT * FROM ( SELECT MAX(C) OVER (ORDER BY B) D, LEAD(B) OVER (ORDER BY B) E from data ) WHERE D < E;
This correctly returns no rows, since there are no gaps anymore.
Thanks for your solution. It helped me quite a lot.
Thank you for coming back to say so, Arvind.
Pingback: How do I find gaps between ranges of numbers?