Ranges with NULLs 03: Gaps

When I wrote “Gaps in Date Ranges: when are you free?“, I handled NULLs using “magic” values. This time I’ll try not to cheat.

[UPDATE 2018-11-08: I have never seen a situation where “from” could be NULL and “to” was NOT NULL, so I no longer test that. ]

Current list of posts: 

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution.

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Analytic solution

select * from (
  select OBJ_ID,
    max(t) over(partition by obj_id order by f) start_gap,
    lead(f) over(partition by obj_id order by f) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4

 

The start of the gap is the biggest T so far, and the end of the gap is the next F – as long as MAX(T) is less than the next F.

Match_recognize solution

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F)
)
order by obj_id, start_gap;

 

When the “to” value can be null

The new test data is exactly the same as before, except that the highest “to” value in each test case has been replaced with NULL. The OBJ_ID of the test case ends in 1, not 0, to signal this.

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);

The problem with “to” being null is that MAX(T) ignores NULL values. I have to know if there has been a NULL T along the way, because that means there can be no more gaps. I use SUM(DECODE(…)) in the analytic solution, and with MATCH_RECOGNIZE I compare COUNT(*) to COUNT(T): if they are not the same then there was a null value along the way.

select * from (
  select OBJ_ID,
    case sum(decode(t,null,1,0)) over(partition by obj_id order by f)
      when 0
        then max(t) over(partition by obj_id order by f)
    end start_gap,
    lead(f) over(partition by obj_id order by f) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap;

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F) and count(t) = count(*)
)
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4
1p 1 2 3 1 2 3

 

When “from” and “to” can both be null

In the new test data, OBJ_ID ends with 2 when the lowest F and the highest T are replaced by NULL.

The only additional change is to order by F NULLS FIRST.

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);

select * from (
  select OBJ_ID,
    case sum(decode(t,null,1,0)) over(partition by obj_id order by f nulls first)
      when 0
        then max(t) over(partition by obj_id order by f nulls first)
    end start_gap,
    lead(f) over(partition by obj_id order by f nulls first) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap;

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F nulls first, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F) and count(t) = count(*)
)
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4
1p 1 2 3 1 2 3
1p 2 2 3 2 3

 
Next up: merging or “packing” ranges.

Leave a comment