When I wrote “Gaps in Date Ranges: when are you free?“, I handled NULLs using “magic” values. This time I’ll try not to cheat.
[UPDATE 2018-11-08: I have never seen a situation where “from” could be NULL and “to” was NOT NULL, so I no longer test that. ]
Current list of posts:
- Ranges with NULLs 01: starting over
- Ranges with NULLs 02: test cases
- Ranges with NULLs 03: Gaps
- Ranges with NULLs 04: Pack, Merge
- Ranges with NULLs 05: Segments
- Ranges with NULLs 06: Overlaps with Conflicting Data
- Ranges with NULLs 07: Swiss Army Knife
Test data with no NULLs allowed
This isn’t a full test harness, just the minimum to illustrate the solution.
create table t( obj_id varchar2(9), f int, t int, af int, at int, bf int, bt int ); Insert into T values ('1p 0',1,2,1,2,3,4); Insert into T values ('1p 0',3,4,1,2,3,4); Insert into T values ('2m 0',1,2,1,2,2,3); Insert into T values ('2m 0',2,3,1,2,2,3); Insert into T values ('3o 0',1,3,1,3,2,4); Insert into T values ('3o 0',2,4,1,3,2,4); Insert into T values ('4f 0',1,3,1,3,2,3); Insert into T values ('4f 0',2,3,1,3,2,3); Insert into T values ('5c 0',1,4,1,4,2,3); Insert into T values ('5c 0',2,3,1,4,2,3); Insert into T values ('6s 0',1,2,1,2,1,3); Insert into T values ('6s 0',1,3,1,2,1,3); Insert into T values ('7e 0',1,2,1,2,1,2); Insert into T values ('7e 0',1,2,1,2,1,2);
- There are two rows = two ranges for each test case.
OBJ_ID
indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.F
is the “from” value.T
is the “to” value (exclusive).- So we can check our results more easily, each row contains the values for both ranges,
AF
andAT
for the first range,BF
andBT
for the second range.
Analytic solution
select * from ( select OBJ_ID, max(t) over(partition by obj_id order by f) start_gap, lead(f) over(partition by obj_id order by f) end_gap, AF, AT, BF, BT from t ) where start_gap < end_gap order by obj_id, start_gap;
OBJ_ID | START_GAP | END_GAP | AF | AT | BF | BT |
---|---|---|---|---|---|---|
1p 0 | 2 | 3 | 1 | 2 | 3 | 4 |
The start of the gap is the biggest T
so far, and the end of the gap is the next F
– as long as MAX(T)
is less than the next F
.
Match_recognize solution
select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT from t match_recognize( partition by obj_id order by F, T measures max(T) start_gap, next(F) end_gap all rows per match pattern((A|{-B-})+) define A as max(T) < next(F) ) order by obj_id, start_gap;
When the “to” value can be null
The new test data is exactly the same as before, except that the highest “to” value in each test case has been replaced with NULL. The OBJ_ID
of the test case ends in 1, not 0, to signal this.
Insert into T values ('1p 1',1,2,1,2,3,null); Insert into T values ('1p 1',3,null,1,2,3,null); Insert into T values ('2m 1',1,2,1,2,2,null); Insert into T values ('2m 1',2,null,1,2,2,null); Insert into T values ('3o 1',1,3,1,3,2,null); Insert into T values ('3o 1',2,null,1,3,2,null); Insert into T values ('4f 1',1,null,1,null,2,null); Insert into T values ('4f 1',2,null,1,null,2,null); Insert into T values ('5c 1',1,null,1,null,2,3); Insert into T values ('5c 1',2,3,1,null,2,3); Insert into T values ('6s 1',1,2,1,2,1,null); Insert into T values ('6s 1',1,null,1,2,1,null); Insert into T values ('7e 1',1,null,1,null,1,null); Insert into T values ('7e 1',1,null,1,null,1,null);
The problem with “to” being null is that MAX(T) ignores NULL values. I have to know if there has been a NULL T along the way, because that means there can be no more gaps. I use SUM(DECODE(…)) in the analytic solution, and with MATCH_RECOGNIZE I compare COUNT(*) to COUNT(T): if they are not the same then there was a null value along the way.
select * from ( select OBJ_ID, case sum(decode(t,null,1,0)) over(partition by obj_id order by f) when 0 then max(t) over(partition by obj_id order by f) end start_gap, lead(f) over(partition by obj_id order by f) end_gap, AF, AT, BF, BT from t ) where start_gap < end_gap; select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT from t match_recognize( partition by obj_id order by F, T measures max(T) start_gap, next(F) end_gap all rows per match pattern((A|{-B-})+) define A as max(T) < next(F) and count(t) = count(*) ) order by obj_id, start_gap;
OBJ_ID | START_GAP | END_GAP | AF | AT | BF | BT |
---|---|---|---|---|---|---|
1p 0 | 2 | 3 | 1 | 2 | 3 | 4 |
1p 1 | 2 | 3 | 1 | 2 | 3 |
When “from” and “to” can both be null
In the new test data, OBJ_ID ends with 2 when the lowest F and the highest T are replaced by NULL.
The only additional change is to order by F NULLS FIRST.
Insert into T values ('1p 2',3,null,null,2,3,null); Insert into T values ('1p 2',null,2,null,2,3,null); Insert into T values ('2m 2',2,null,null,2,2,null); Insert into T values ('2m 2',null,2,null,2,2,null); Insert into T values ('3o 2',2,null,null,3,2,null); Insert into T values ('3o 2',null,3,null,3,2,null); Insert into T values ('4f 2',2,null,null,null,2,null); Insert into T values ('4f 2',null,null,null,null,2,null); Insert into T values ('5c 2',2,3,null,null,2,3); Insert into T values ('5c 2',null,null,null,null,2,3); Insert into T values ('6s 2',null,2,null,2,null,null); Insert into T values ('6s 2',null,null,null,2,null,null); Insert into T values ('7e 2',null,null,null,null,null,null); Insert into T values ('7e 2',null,null,null,null,null,null); select * from ( select OBJ_ID, case sum(decode(t,null,1,0)) over(partition by obj_id order by f nulls first) when 0 then max(t) over(partition by obj_id order by f nulls first) end start_gap, lead(f) over(partition by obj_id order by f nulls first) end_gap, AF, AT, BF, BT from t ) where start_gap < end_gap; select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT from t match_recognize( partition by obj_id order by F nulls first, T measures max(T) start_gap, next(F) end_gap all rows per match pattern((A|{-B-})+) define A as max(T) < next(F) and count(t) = count(*) ) order by obj_id, start_gap;
OBJ_ID | START_GAP | END_GAP | AF | AT | BF | BT |
---|---|---|---|---|---|---|
1p 0 | 2 | 3 | 1 | 2 | 3 | 4 |
1p 1 | 2 | 3 | 1 | 2 | 3 | |
1p 2 | 2 | 3 | 2 | 3 |
Next up: merging or “packing” ranges.