Ranges with NULLs 07: Swiss Army Knife

I came up with this name a few years ago for a problem that I couldn’t solve then: analyse a series of ranges and say whether and how they overlap. It turns out the solution is not that hard.

Current list of posts: 

The Problem

I want to order a set of ranges by “from” and “to”, then determine the relations between each range and the ranges that follow it. The only limit is with the “precedes” relation: I know that the first range may precede any number of following ranges, so I only want relations where one range immediately precedes the next, with no intervening range.

MATCH_RECOGNIZE to the rescue

The MATCH_RECOGNIZE clause (introduced in 12c) is a good fit here, for several reasons:

  • I can PARTITION BY different test cases and ORDER BY the “from” and “to” values.
  • I can say AFTER MATCH SKIP TO NEXT ROW to use each and every range as a starting point.
  • I can use the pipe character | for alternation. Alternation matches a single regular expression from a list of several possible regular expressions. Alternatives are preferred in the order they are specified.

Reminders (skip if you are reading the whole series)

  • Ranges have exclusive end points, which allows ranges that “meet” without gaps or overlaps.
  • Three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object and a “from” and “to” range.
  • I propose solutions that require database version 12.1 or later.

No NULLs

This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.
The code:
  • line 4: I partition by the test case and order by “from” and “to”.
  • line 5: the classifier() function returns the name of the condition that was met, which is also the name of the relation I am looking for.
  • line 7: I check each row in turn.
  • line 8: A is the first row (which I do not return), and the following rows belong to one of the other conditions.
  • I check the relations “backwards” so that some checks are implicit:
    • equals: between the first row and the current row, “from” and “to” are the same.
    • starts: I don’t explicitly check that the first “to” is less than the current “to”. At this point they cannot be equal and the ORDER BY guarantees that the first “to” is not greater than the current “to”.
    • after the “starts” condition, the first “from” must be less than the current “to”, so I don’t have to check it explicitly.
    • I don’t care about “precedes” if there are intervening rows.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as a.f = f and a.t = t,
    starts as a.f = f, -- if true, a.t must be < t because of order by
    -- from now on a.f must be < f because of order by
    contains as a.t > t,
    finished_by as a.t = t,
    overlaps as a.t > f and a.t < t,
    meets as a.t = f,
    -- at this point a.t must be < f
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    0 1 2 PRECEDES 3 4
2m    0 1 2 MEETS 2 3
3o    0 1 3 OVERLAPS 2 4
4f    0 1 3 FINISHED_BY 2 3
5c    0 1 4 CONTAINS 2 3
6s    0 1 2 STARTS 1 3
7e    0 1 2 EQUALS 1 2

 

“To” NULL

delete from t;
Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
  • lines 9 and 12: with equality comparisons, I use DECODE() to treat NULL values as equal.
  • lines 11 and 13: with inequality comparisons, I check explicitly for NULL when necessary.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as a.f = f and decode(a.t,t,0,1) = 0,
    starts as a.f = f,
    contains as (a.t > t or a.t is null),
    finished_by as decode(a.t,t,0,1) = 0,
    overlaps as a.t > f and (a.t < t or t is null),
    meets as a.t = f,
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    1 1 2 PRECEDES 3
2m    1 1 2 MEETS 2
3o    1 1 3 OVERLAPS 2
4f    1 1 FINISHED_BY 2
5c    1 1 CONTAINS 2 3
6s    1 1 2 STARTS 1
7e    1 1 EQUALS 1

 

With “From” and “To” NULLs

delete from t;
Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f nulls first, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as decode(a.f,f,0,1) = 0 and decode(a.t,t,0,1) = 0,
    starts as decode(a.f,f,0,1) = 0,
    contains as a.t > t or a.t is null,
    finished_by as decode(a.t,t,0,1) = 0,
    overlaps as a.t > f and (a.t < t or t is null),
    meets as a.t = f,
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    2 2 PRECEDES 3
2m    2 2 MEETS 2
3o    2 3 OVERLAPS 2
4f    2 FINISHED_BY 2
5c    2 CONTAINS 2 3
6s    2 2 STARTS
7e    2 EQUALS

 

Advertisements

Ranges with NULLs 06: Overlaps with Conflicting Data

In 2014 I attacked the problem of “Overlapping ranges with priorities”. This time I’ll deal with NULLs and propose an improved solution.

Current list of posts: 

Requirement

Assume overlapping ranges with one attribute. I call the overlapping parts “segments”.

  • Segments: break each range down into segments, including the range’s attribute.
  • Priority: when two or more segments with the same “from” and “to” have different values, the minimum value has priority.
  • Merge: once all the segments have proper values assigned, merge contiguous segments having the same values.

Reminders

  • In my world, ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution.

CREATE TABLE T (
OBJ_ID    VARCHAR2 ( 20 BYTE ),
F         NUMBER,
T         NUMBER,
ATTR_ID   NUMBER,
AB_ATTR   VARCHAR2 ( 90 BYTE )
);
Insert into T values ('3o    01','1','3','1','1:1=1');
Insert into T values ('3o    01','2','4','1','1:1=1');
Insert into T values ('3o    02','1','3','1','2:1<2');
Insert into T values ('3o    02','2','4','2','2:1<2');
Insert into T values ('3o    03','1','3','2','3:2>1');
Insert into T values ('3o    03','2','4','1','3:2>1');
Insert into T values ('3o    04','1','3',null,'4:N-1');
Insert into T values ('3o    04','2','4','1','4:N-1');
Insert into T values ('3o    05','1','3','1','5:1-N');
Insert into T values ('3o    05','2','4',null,'5:1-N');
Insert into T values ('3o    06','1','3',null,'6:N-N');
Insert into T values ('3o    06','2','4',null,'6:N-N');
Insert into T values ('5c    01','1','4','1','1:1=1');
Insert into T values ('5c    01','2','3','1','1:1=1');
Insert into T values ('5c    02','1','4','1','2:1<2');
Insert into T values ('5c    02','2','3','2','2:1<2');
Insert into T values ('5c    03','1','4','2','3:2>1');
Insert into T values ('5c    03','2','3','1','3:2>1');
Insert into T values ('5c    04','1','4',null,'4:N-1');
Insert into T values ('5c    04','2','3','1','4:N-1');
Insert into T values ('5c    05','1','4','1','5:1-N');
Insert into T values ('5c    05','2','3',null,'5:1-N');
Insert into T values ('5c    06','1','4',null,'6:N-N');
Insert into T values ('5c    06','2','3',null,'6:N-N');
OBJ_ID F T ATTR_ID AB_ATTR
3o    01 1 3 1 1:1=1
3o    01 2 4 1 1:1=1
3o    02 1 3 1 2:1<2
3o    02 2 4 2 2:1<2
3o    03 1 3 2 3:2>1
3o    03 2 4 1 3:2>1
3o    04 1 3 4:N-1
3o    04 2 4 1 4:N-1
3o    05 1 3 1 5:1-N
3o    05 2 4 5:1-N
3o    06 1 3 6:N-N
3o    06 2 4 6:N-N
5c    01 1 4 1 1:1=1
5c    01 2 3 1 1:1=1
5c    02 1 4 1 2:1<2
5c    02 2 3 2 2:1<2
5c    03 1 4 2 3:2>1
5c    03 2 3 1 3:2>1
5c    04 1 4 4:N-1
5c    04 2 3 1 4:N-1
5c    05 1 4 1 5:1-N
5c    05 2 3 5:1-N
5c    06 1 4 6:N-N
5c    06 2 3 6:N-N

 

  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “overlaps, contains”. The 0 in position 6 means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the relationship between the ATTR_ID values. There are 6 such relationships, identified by the last digit of OBJ_ID.

Non-null ranges

  • First I UNPIVOT F and T into one column called FT  that contains all the boundary points, and I create a column called SEG_START that is -1 when the source is F and 1 with the source is T. I also retain the original value of T in every new row.
  • Then I use MATCH_RECOGNIZE to identify, each time SEG_START = -1, all the boundary points up to and including the original T. I output each segment with the original ATTR_ID value.
  • Then I use GROUP BY to reduce duplicate segments to one, retaining the minimum ATTR_ID
  • Finally, I use MATCH_RECOGNIZE again to merge the contiguous segments with the same ATTR_ID value.
with unpivoted as (
  select * from t
  unpivot ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1))
)
, segmented as (
  select obj_id, ab_attr, seg_f, seg_t, range_attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as ft <= first(t)
  )
  where seg_f < seg_t
)
, grouped as (
  select OBJ_ID, AB_ATTR, SEG_F, SEG_T, min(RANGE_ATTR_ID) attr_id
  from segmented
  group by OBJ_ID, AB_ATTR, SEG_F, SEG_T
)
select * from grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f, seg_t
  measures first(seg_f) f, last(seg_t) t
  pattern(a b*)
  define b as seg_f = prev(seg_t)
)
order by obj_id, f, t;
OBJ_ID AB_ATTR ATTR_ID F T
3o    01 1:1=1 1 1 4
3o    02 2:1<2 1 1 3
3o    02 2:1<2 2 3 4
3o    03 3:2>1 2 1 2
3o    03 3:2>1 1 2 4
3o    04 4:N-1 1 2
3o    04 4:N-1 1 2 4
3o    05 5:1-N 1 1 3
3o    05 5:1-N 3 4
3o    06 6:N-N 1 4
5c    01 1:1=1 1 1 4
5c    02 2:1<2 1 1 4
5c    03 3:2>1 2 1 2
5c    03 3:2>1 1 2 3
5c    03 3:2>1 2 3 4
5c    04 4:N-1 1 2
5c    04 4:N-1 1 2 3
5c    04 4:N-1 3 4
5c    05 5:1-N 1 1 4
5c    06 6:N-N 1 4

 

Segments with “To” NULL

Insert into T values ('3o    11','1','3','1','1:1=1');
Insert into T values ('3o    11','2',null,'1','1:1=1');
Insert into T values ('3o    12','1','3','1','2:1<2');
Insert into T values ('3o    12','2',null,'2','2:11');
Insert into T values ('3o    13','2',null,'1','3:2>1');
Insert into T values ('3o    14','1','3',null,'4:N-1');
Insert into T values ('3o    14','2',null,'1','4:N-1');
Insert into T values ('3o    15','1','3','1','5:1-N');
Insert into T values ('3o    15','2',null,null,'5:1-N');
Insert into T values ('3o    16','1','3',null,'6:N-N');
Insert into T values ('3o    16','2',null,null,'6:N-N');
Insert into T values ('5c    11','1',null,'1','1:1=1');
Insert into T values ('5c    11','2','3','1','1:1=1');
Insert into T values ('5c    12','1',null,'1','2:1<2');
Insert into T values ('5c    12','2','3','2','2:11');
Insert into T values ('5c    13','2','3','1','3:2>1');
Insert into T values ('5c    14','1',null,null,'4:N-1');
Insert into T values ('5c    14','2','3','1','4:N-1');
Insert into T values ('5c    15','1',null,'1','5:1-N');
Insert into T values ('5c    15','2','3',null,'5:1-N');
Insert into T values ('5c    16','1',null,null,'6:N-N');
Insert into T values ('5c    16','2','3',null,'6:N-N');
  • line 3: add the INCLUDE NULLS option to include T when it is NULL.
  • lines 15 and 17: include segments that end with NULL.
with unpivoted as (
  select * from t
  unpivot INCLUDE NULLS ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1))
)
, segmented as (
  select obj_id, ab_attr, seg_f, seg_t, range_attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as ft <= first(t) OR FIRST(T) IS NULL
  )
  where seg_f < seg_t OR SEG_T IS NULL
)
, grouped as (
  select OBJ_ID, AB_ATTR, SEG_F, SEG_T, min(RANGE_ATTR_ID) attr_id
  from segmented
  group by OBJ_ID, AB_ATTR, SEG_F, SEG_T
)
select * from grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f, seg_t
  measures first(seg_f) f, last(seg_t) t
  pattern(a b*)
  define b as seg_f = prev(seg_t)
)
order by obj_id, f, t;

Just so your mouse won’t get too tired from scrolling, I will spare you the output. There are 20 additional rows for the new test cases.

With “From” and “To” NULLs

Insert into T values ('3o    21',null,'3','1','1:1=1');
Insert into T values ('3o    21','2',null,'1','1:1=1');
Insert into T values ('3o    22',null,'3','1','2:1<2');
Insert into T values ('3o    22','2',null,'2','2:11');
Insert into T values ('3o    23','2',null,'1','3:2>1');
Insert into T values ('3o    24',null,'3',null,'4:N-1');
Insert into T values ('3o    24','2',null,'1','4:N-1');
Insert into T values ('3o    25',null,'3','1','5:1-N');
Insert into T values ('3o    25','2',null,null,'5:1-N');
Insert into T values ('3o    26',null,'3',null,'6:N-N');
Insert into T values ('3o    26','2',null,null,'6:N-N');
Insert into T values ('5c    21',null,null,'1','1:1=1');
Insert into T values ('5c    21','2','3','1','1:1=1');
Insert into T values ('5c    22',null,null,'1','2:1<2');
Insert into T values ('5c    22','2','3','2','2:11');
Insert into T values ('5c    23','2','3','1','3:2>1');
Insert into T values ('5c    24',null,null,null,'4:N-1');
Insert into T values ('5c    24','2','3','1','4:N-1');
Insert into T values ('5c    25',null,null,'1','5:1-N');
Insert into T values ('5c    25','2','3',null,'5:1-N');
Insert into T values ('5c    26',null,null,null,'6:N-N');
Insert into T values ('5c    26','2','3',null,'6:N-N');
  • lines 1 through 9: as explained in my previous post, I create a new column called NULL_ORDER so that NULL “from” values come first.
  • lines 10 through 25: I combine segmenting and grouping in one SELECT just to show that it can be done: the MATCH_RECOGNIZE clause is executed before the GROUP BY. The NULL_ORDER column is used in ordering and comparisons.
  • line 28: order by seg_f NULLS FIRST and it all works!
with unpivoted as (
  select u.*,
    case when seg_start = -1 and ft is null then -1
         when seg_start = 1  and ft is null then  1
         else 0
    end null_order
  from t
  unpivot include nulls ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1)) u
)
, segmented_and_grouped as (
  select obj_id, ab_attr, seg_f, seg_t, min(range_attr_id) attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by null_order, ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id, 
      prev(null_order) null_order_f, null_order null_order_t
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as null_order = -1 or ft <= first(t) or first(t) is null
  )
  where null_order_f < null_order_t or seg_f < seg_t
  group by obj_id, ab_attr, seg_f, seg_t
)
select * from segmented_and_grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f nulls first
  measures first(seg_f) f, last(seg_t) t
  pattern(a+)
  define a as count(*) = 1 or seg_f = prev(seg_t)
)
order by obj_id, f nulls first, t;

Next up: Swiss Army knife.

Ranges with NULLs 05: Segments

Take two overlapping ranges: “1 to 3” and “2 to 4”. We can divide them into three smaller ranges: “1 to 2”, “2 to 3” and “3 to 4”. I’ll call these smaller ranges segments. We sometimes need these segments to solve advanced range problems.

Current list of posts: 

Requirement

We know that ranges “1 to 3” and “2 to 4” overlap, and the overlapping part is “2 to 3”. I call that part a segment; a segment has the following properties:

  • Its starting point is a boundary point (start or end) of an original range;
  • Its ending point is the next boundary point, either of that range or some overlapping range;
  • It is entirely covered by at least one original range (so there is no segment corresponding to a “gap”);

The requirement here is to find all the segments belonging to a group of ranges. In this example we should find “1 to 2”, “2 to 3” and “3 to 4”. As a bonus, I’ll determine for each segment how many original ranges it belongs to: here “2 to 3” belongs to 2 ranges and the others belong to only one.

Reminders (skip if you are reading the whole series)

  • In my world, ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object and a “from” and “to” range.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Segments in non-null ranges

  • First I UNPIVOT F and T into one column (called F for convenience) that contains all the boundary points, and I create a column called SEG_START that is 1 when the source is F and -1 with the source is T.
  • Then I use the analytic LEAD function to pair each boundary point with the next boundary point, while also doing a running sum of SEG_START.
  • When the running sum is 0, I have a “gap” so I filter out that row. I also filter out rows where F and T are equal.
with unpivoted as (
  select * from t
  unpivot(f for seg_start in(f as 1, t as -1))
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(f) over(partition by obj_id order by f) t,
  sum(seg_start) over(partition by obj_id order by f) num_segs
  from unpivoted u
)
where num_segs > 0 and f < t;
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
7e 0 1 2 2 1 2 1 2

 

Segments with “To” NULL

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
commit;
  • line 3: add the INCLUDE NULLS option to include T when it is NULL.
  • line 12: include segments that end with NULL.
with unpivoted as (
  select * from t
  unpivot include nulls (f for seg_start in(f as 1, t as -1))
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(f) over(partition by obj_id order by f) t,
  sum(seg_start) over(partition by obj_id order by f) num_segs
  from unpivoted u
)
where num_segs > 0 and (f < t or t is null);
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
1p 1 1 2 1 1 2 3
1p 1 3 1 1 2 3
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
2m 1 1 2 1 1 2 2
2m 1 2 1 1 2 2
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
3o 1 1 2 1 1 3 2
3o 1 2 3 2 1 3 2
3o 1 3 1 1 3 2
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
4f 1 1 2 1 1 2
4f 1 2 2 1 2
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
5c 1 1 2 1 1 2 3
5c 1 2 3 2 1 2 3
5c 1 3 1 1 2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
6s 1 1 2 2 1 2 1
6s 1 2 1 1 2 1
7e 0 1 2 2 1 2 1 2
7e 1 1 2 1 1

 

With “From” and “To” NULLs

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
commit;

This is why I don’t like to allow NULL in “from” columns. After we UNPIVOT, we have to sort the null “from” values first and the null “to” values last! As a workaround, I create a column called NULL_ORDER that is -1 when F is NULL, 1 when T is NULL and 0 otherwise. I then test or order by this column when necessary.

with unpivoted as (
  select u.*,
    case when seg_start =  1 and f is null then -1
         when seg_start = -1 and f is null then  1
         else 0
    end null_order
  from t
  unpivot include nulls (f for seg_start in(f as 1, t as -1)) u
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(null_order) over(partition by obj_id order by null_order, f) t_null_order,
  lead(f) over(partition by obj_id order by null_order, f) t,
  sum(seg_start) over(partition by obj_id order by null_order, f) num_segs
  from unpivoted u
)
where num_segs > 0 and (null_order < t_null_order or f < t);
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
1p 1 1 2 1 1 2 3  
1p 1 3   1 1 2 3  
1p 2   2 1   2 3  
1p 2 3   1   2 3  
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
2m 1 1 2 1 1 2 2  
2m 1 2   1 1 2 2  
2m 2   2 1   2 2  
2m 2 2   1   2 2  
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
3o 1 1 2 1 1 3 2  
3o 1 2 3 2 1 3 2  
3o 1 3   1 1 3 2  
3o 2   2 1   3 2  
3o 2 2 3 2   3 2  
3o 2 3   1   3 2  
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
4f 1 1 2 1 1   2  
4f 1 2   2 1   2  
4f 2   2 1     2  
4f 2 2   2     2  
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
5c 1 1 2 1 1   2 3
5c 1 2 3 2 1   2 3
5c 1 3   1 1   2 3
5c 2   2 1     2 3
5c 2 2 3 2     2 3
5c 2 3   1     2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
6s 1 1 2 2 1 2 1  
6s 1 2   1 1 2 1  
6s 2   2 2   2    
6s 2 2   1   2    
7e 0 1 2 2 1 2 1 2
7e 1 1   2 1   1  
7e 2     2        

 
Next up: “joining” each segment to the original ranges.

Ranges with NULLs 04: Pack, Merge

In this post, I’m going to treat two closely related but distinct requirements:

  1. “Pack”: in a SELECT, return contiguous or overlapping ranges as one range.
  2. “Merge”: modify a table to reduce contiguous or overlapping ranges to one range.

These operations should take into account other data associated with each range: ranges are packed or merged only when the other data is the same.

Current list of posts: 

What is the meaning of this?

This heading sounds like something a teacher would say when he discovers students breaking the rules. Actually, I’m not trying to break rules, I’m trying to make some – or at least tell you what rules I am following.

  • In my world, the rule is that ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object, a “from” and “to” range –
    and optional “attributes” that are valid for the object only within the range.
  • In this post, I’ll use “Pack” to mean a SELECT that returns contiguous or overlapping ranges as one big range (any “attributes” have to be the same).
  • In this post, I’ll use “Merge” to mean a MERGE statement that changes a table so that contiguous or overlapping ranges become one big range.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Pack non-null ranges

  • I partition by the object ID and also by any attributes I want to include. PARTITION BY, like GROUP BY, does a good job of comparing values when NULL might be involved: two NULLs are considered “the same”, and a NULL is considered different from a non-NULL value.
    For brevity, here the attributes for a given object are always the same.
  • In a DEFINE, the current row is assumed to belong to the condition being tested, so I cannot compare the current start date to the maximum prior end date. I can however compare the maximum end date to the next start date.
select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize( 
  partition by obj_id, AF, AT, BF, BT
  order by F, T
  measures first(F) F, max(T) T 
  pattern(a* b)
  define a as max(T) >= next(F)
);
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
2m 0 1 3 1 2 2 3
3o 0 1 4 1 3 2 4
4f 0 1 3 1 3 2 3
5c 0 1 4 1 4 2 3
6s 0 1 3 1 2 1 3
7e 0 1 2 1 2 1 2

 

Pack ranges with “To” NULL

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
commit;

In the MEASURES and DEFINE lines, I find out whether there has been a NULL “to” by comparing COUNT(*) to COUNT(T).

select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id, AF, AT, BF, BT
  order by F, T
  measures first(F) F, decode(count(*), count(T), max(T)) T
  pattern(a* b)
  define a as count(*) > count(T) or max(T) >= next(F)
)
order by 1, 2 nulls first, 3;
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
1p 1 1 2 1 2 3
1p 1 3 1 2 3
2m 0 1 3 1 2 2 3
2m 1 1 1 2 2
3o 0 1 4 1 3 2 4
3o 1 1 1 3 2
4f 0 1 3 1 3 2 3
4f 1 1 1 2
5c 0 1 4 1 4 2 3
5c 1 1 1 2 3
6s 0 1 3 1 2 1 3
6s 1 1 1 2 1
7e 0 1 2 1 2 1 2
7e 1 1 1 1

 

Pack with “From” and “To” NULLs

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
commit;

We just need to order by F nulls first, and in the DEFINE say nvl(next(F),T).

select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize( 
  partition by obj_id, AF, AT, BF, BT
  order by F nulls first, T
  measures first(F) F, decode(count(*), count(T), max(T)) T 
  pattern(a* b)
  define a as count(*) > count(T) or max(T) >= nvl(next(F),T)
)
order by 1,2 nulls first, 3;
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
1p 1 1 2 1 2 3
1p 1 3 1 2 3
1p 2 2 2 3
1p 2 3 2 3
2m 0 1 3 1 2 2 3
2m 1 1 1 2 2
2m 2 2 2
3o 0 1 4 1 3 2 4
3o 1 1 1 3 2
3o 2 3 2
4f 0 1 3 1 3 2 3
4f 1 1 1 2
4f 2 2
5c 0 1 4 1 4 2 3
5c 1 1 1 2 3
5c 2 2 3
6s 0 1 3 1 2 1 3
6s 1 1 1 2 1
6s 2 2
7e 0 1 2 1 2 1 2
7e 1 1 1 1
7e 2

 

Merge with or without NULLs

Here I’ll use the same data.

  • I use the ROWID to identify each row and to allow direct access to the target rows in the MERGE step.
  • To take attributes into account, simply add them to the PARTITION BY list.
  • Since I use ALL ROWS PER MATCH, I must use the FINAL keyword to include the entire match in the aggregation functions.
  • I adjust the PATTERN (A+, not A*) to bypass rows that are already packed ranges.
  • I return all the rows to be packed, updating the first row to the new T and deleting the others.
merge into t o
using (
  select * from (
    select obj_id, f, t, rowid rid from t a
  )
  match_recognize( 
    partition by obj_id
    order by F nulls first, T
    measures count(*) cnt,
      decode(final count(*), final count(T), final max(T)) new_T 
    all rows per match
    pattern(a+ b)
    define a as count(*) > count(T) or max(T) >= nvl(next(F),T)
  )
) n
on (o.rowid = n.rid)
when matched then update set o.t = n.new_t
  delete where n.cnt > 1;

36 rows merged.

Next up: breaking down overlapping ranges into component segments.

Ranges with NULLs 03: Gaps

When I wrote “Gaps in Date Ranges: when are you free?“, I handled NULLs using “magic” values. This time I’ll try not to cheat.

[UPDATE 2018-11-08: I have never seen a situation where “from” could be NULL and “to” was NOT NULL, so I no longer test that. ]

Current list of posts: 

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution.

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Analytic solution

select * from (
  select OBJ_ID,
    max(t) over(partition by obj_id order by f) start_gap,
    lead(f) over(partition by obj_id order by f) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4

 

The start of the gap is the biggest T so far, and the end of the gap is the next F – as long as MAX(T) is less than the next F.

Match_recognize solution

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F)
)
order by obj_id, start_gap;

 

When the “to” value can be null

The new test data is exactly the same as before, except that the highest “to” value in each test case has been replaced with NULL. The OBJ_ID of the test case ends in 1, not 0, to signal this.

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);

The problem with “to” being null is that MAX(T) ignores NULL values. I have to know if there has been a NULL T along the way, because that means there can be no more gaps. I use SUM(DECODE(…)) in the analytic solution, and with MATCH_RECOGNIZE I compare COUNT(*) to COUNT(T): if they are not the same then there was a null value along the way.

select * from (
  select OBJ_ID,
    case sum(decode(t,null,1,0)) over(partition by obj_id order by f)
      when 0
        then max(t) over(partition by obj_id order by f)
    end start_gap,
    lead(f) over(partition by obj_id order by f) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap;

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F) and count(t) = count(*)
)
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4
1p 1 2 3 1 2 3

 

When “from” and “to” can both be null

In the new test data, OBJ_ID ends with 2 when the lowest F and the highest T are replaced by NULL.

The only additional change is to order by F NULLS FIRST.

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);

select * from (
  select OBJ_ID,
    case sum(decode(t,null,1,0)) over(partition by obj_id order by f nulls first)
      when 0
        then max(t) over(partition by obj_id order by f nulls first)
    end start_gap,
    lead(f) over(partition by obj_id order by f nulls first) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap;

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F nulls first, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F) and count(t) = count(*)
)
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4
1p 1 2 3 1 2 3
1p 2 2 3 2 3

 
Next up: merging or “packing” ranges.

Ranges with NULLs 02: test cases

When I write SQL there is a bit of trial and error (laughter), so I need as complete a set of test cases as possible. Here’s what I came up with.

Current list of posts: 

[UPDATE 2018-11-28: added some test cases and recalculated their number.]

0: input with no rows

Would you believe that one of my solutions doesn’t work when the input has no rows? Well…let’s put it this way, I never said what “working” should mean with no input rows. Now that I’ve added that test case, I have to say what output I expect.

1: one range per test case

This is another situation I never bothered to think about – until now.

2: two ranges per test case

With two ranges to compare, it only takes 7 test cases to cover all the relations defined in Dr. Allen’s Time Interval Algebra.
Time Interval Relationships

The image mentions “day of month” but the logic would be identical with numbers instead of dates. The boundaries are starting and ending points: the starting point is included in the range, but the ending point is not. When A’s ending point is equal to B’s starting point, the ranges “meet” with no gap and no overlap.

3: three ranges per test case

In all the time I have looked at range-related problems on forums, every bug could be illustrated with at most three ranges. I decided to generate all the possible combinations of relations: range A to range B, A to C and B to C. It turns out there are 75 valid combinations, so 75 test cases at 3 rows per test case.

3: three combinations of NULL and NOT NULL

The whole point in this series is to deal with NULL starting and ending points, so I will test three variations:

  • No nulls
  • Start can be null, end not null
  • Start not null, end can be null
  • Start and end can both be null.

[UPDATE 2018-11-8: I now omit the variant “Start can be null, end not null” since I have never seen it in real life.]

7 / 79: comparing attributes

When ranges intersect in any way, there may be a requirement to compare additional attributes. For example, we may want to merge two rows together if they intersect and all the attributes are the same. Limiting myself to one attribute only, I’ve come up with six combinations:

  1. x=y: the attributes are identical
  2. x<y: the "earlier" range attribute is less than the "later" range attribute
  3. x>y: the earlier attribute is greater than the later one
  4. N-y: the earlier attribute is null
  5. x-N: the later attribute is null
  6. N-N: both attributes are null

For completeness, I’ll add case “0-n/a”: when there is a gap, the attributes are not compared.

When there are three ranges, there are up to 79 different combinations of comparison.

Summary: 5128 test cases

Changing the number of ranges from 0 through 3, and adding attributes when there are two or three relations without gaps, I get a total of 5128 test cases in 6 different tables:

Table name                  RANGES0 RANGES1 RANGES2 RANGES2_ATTRS RANGES3 RANGES3_ATTR
Relations                   0       0       7       7             75      75
Relations with or w/o nulls 0       0       21      21            225     225
Combinations of attributes  0       0       0       7             0       79
Test cases                  1       3       21      111           225     4767
rows                        0       3       42      222           675     14301

Forgive me for not publishing the actual test data, and the convoluted code I used to generate it. I may show bits as needed in later posts.

Next up: finding gaps in ranges.

Ranges with NULLs 01: starting over

I have written a lot about ranges, mostly based on dates, and I have tried my best to avoid NULLs in “from” and “to” columns. I give up: NULLs are a fact of life and must be dealt with. This means revisiting all my previous work!

Current list of posts: 

What do I mean by “range”?

Let’s start with the SQL:2011 definition of PERIOD. A PERIOD identifies two datetime columns. The end datetime is greater than the start datetime. In the Oracle implementation, both start and end can be null.

A datetime value falls within the period if:

  • it is greater than or equal to the start datetime
    OR the start datetime is NULL
  • and it is less than the end datetime
    OR the end datetime is NULL.

A period that starts with NULL and ends with NULL will include any datetime value, including NULL!

My notion of “range” is basically an extension of this definition to more data types:

  • A range identifies a pair of values of the same data type.
  • The data type can be any scalar capable of equal, greater-than and less-than comparisons,
    though datetime types and numbers are by far the most common.
  • I tend to use “from” and “to” because “start” and “end” make me think of datetime ranges.
  • “From” is less than “to”
  • “From” and / or “to” can be NULL
  • A value falls within the range if:
    • it is greater than or equal to “from”
      OR “from” is NULL
    • and it is less than “to”
      OR “to” is NULL
  • In table definitions, I strongly recommend constraints
    • “from” < "to". This constraint should always be there.
    • “from” NOT NULL (unless a use case necessitates NULL).

Closed-Open ranges

When defining ranges, we always want to allow the possibility for two ranges to “meet”, meaning there is no gap between them yet they do not overlap.

The “closed-open” model allows this, because the “from” value belongs to the range but the “to” value does not.

If we have a range 1 to 2 and another range from 2 to 3:

  • The value 2 is not in the first range, but it is in the second.
  • There is no value from 1 to 3 that is in both ranges, so there is no overlap.
  • There is no value from 1 to 3 that is in neither range, so there is no gap.
  • In other words, every value from 1 to 3 is in exactly one of the two ranges.

This model will work will all types of range, and it makes some queries easier to write. It’s all good.

Objections to NULL in ranges

I once wrote SQL and date ranges: don’t make NULL mean something, where I gave five arguments against NULL. Here’s what I think as of today.

  1. “NULL should not mean anything”.
    I still agree with that, but what can you do?
  2. “We can use real values instead of NULL”.
    That may be the case for “from”, but not for “to” because there is no valid “to” value that is strictly greater than every valid “to” value!
  3. “Oracle doesn’t index NULLs”.
    Oracle does not index a row if all of the columns to be indexed are NULL. When this is a problem, it can be worked around.
  4. “Queries on NULLable ranges are hard”.
    We’ll see if I am up to the task…
  5. “Unique constraints don’t work with NULL”.
    Again, when this is a problem there are workarounds.

Next up: test data! If the queries are hard, I must try to test them thoroughly.