Ranges with NULLs 06: Overlaps with Conflicting Data

In 2014 I attacked the problem of “Overlapping ranges with priorities”. This time I’ll deal with NULLs and propose an improved solution.

Current list of posts: 

Requirement

Assume overlapping ranges with one attribute. I call the overlapping parts “segments”.

  • Segments: break each range down into segments, including the range’s attribute.
  • Priority: when two or more segments with the same “from” and “to” have different values, the minimum value has priority.
  • Merge: once all the segments have proper values assigned, merge contiguous segments having the same values.

Reminders

  • In my world, ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution.

CREATE TABLE T (
OBJ_ID    VARCHAR2 ( 20 BYTE ),
F         NUMBER,
T         NUMBER,
ATTR_ID   NUMBER,
AB_ATTR   VARCHAR2 ( 90 BYTE )
);
Insert into T values ('3o    01','1','3','1','1:1=1');
Insert into T values ('3o    01','2','4','1','1:1=1');
Insert into T values ('3o    02','1','3','1','2:1<2');
Insert into T values ('3o    02','2','4','2','2:1<2');
Insert into T values ('3o    03','1','3','2','3:2>1');
Insert into T values ('3o    03','2','4','1','3:2>1');
Insert into T values ('3o    04','1','3',null,'4:N-1');
Insert into T values ('3o    04','2','4','1','4:N-1');
Insert into T values ('3o    05','1','3','1','5:1-N');
Insert into T values ('3o    05','2','4',null,'5:1-N');
Insert into T values ('3o    06','1','3',null,'6:N-N');
Insert into T values ('3o    06','2','4',null,'6:N-N');
Insert into T values ('5c    01','1','4','1','1:1=1');
Insert into T values ('5c    01','2','3','1','1:1=1');
Insert into T values ('5c    02','1','4','1','2:1<2');
Insert into T values ('5c    02','2','3','2','2:1<2');
Insert into T values ('5c    03','1','4','2','3:2>1');
Insert into T values ('5c    03','2','3','1','3:2>1');
Insert into T values ('5c    04','1','4',null,'4:N-1');
Insert into T values ('5c    04','2','3','1','4:N-1');
Insert into T values ('5c    05','1','4','1','5:1-N');
Insert into T values ('5c    05','2','3',null,'5:1-N');
Insert into T values ('5c    06','1','4',null,'6:N-N');
Insert into T values ('5c    06','2','3',null,'6:N-N');
OBJ_ID F T ATTR_ID AB_ATTR
3o    01 1 3 1 1:1=1
3o    01 2 4 1 1:1=1
3o    02 1 3 1 2:1<2
3o    02 2 4 2 2:1<2
3o    03 1 3 2 3:2>1
3o    03 2 4 1 3:2>1
3o    04 1 3 4:N-1
3o    04 2 4 1 4:N-1
3o    05 1 3 1 5:1-N
3o    05 2 4 5:1-N
3o    06 1 3 6:N-N
3o    06 2 4 6:N-N
5c    01 1 4 1 1:1=1
5c    01 2 3 1 1:1=1
5c    02 1 4 1 2:1<2
5c    02 2 3 2 2:1<2
5c    03 1 4 2 3:2>1
5c    03 2 3 1 3:2>1
5c    04 1 4 4:N-1
5c    04 2 3 1 4:N-1
5c    05 1 4 1 5:1-N
5c    05 2 3 5:1-N
5c    06 1 4 6:N-N
5c    06 2 3 6:N-N

 

  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “overlaps, contains”. The 0 in position 6 means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the relationship between the ATTR_ID values. There are 6 such relationships, identified by the last digit of OBJ_ID.

Non-null ranges

  • First I UNPIVOT F and T into one column called FT  that contains all the boundary points, and I create a column called SEG_START that is -1 when the source is F and 1 with the source is T. I also retain the original value of T in every new row.
  • Then I use MATCH_RECOGNIZE to identify, each time SEG_START = -1, all the boundary points up to and including the original T. I output each segment with the original ATTR_ID value.
  • Then I use GROUP BY to reduce duplicate segments to one, retaining the minimum ATTR_ID
  • Finally, I use MATCH_RECOGNIZE again to merge the contiguous segments with the same ATTR_ID value.
with unpivoted as (
  select * from t
  unpivot ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1))
)
, segmented as (
  select obj_id, ab_attr, seg_f, seg_t, range_attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as ft <= first(t)
  )
  where seg_f < seg_t
)
, grouped as (
  select OBJ_ID, AB_ATTR, SEG_F, SEG_T, min(RANGE_ATTR_ID) attr_id
  from segmented
  group by OBJ_ID, AB_ATTR, SEG_F, SEG_T
)
select * from grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f, seg_t
  measures first(seg_f) f, last(seg_t) t
  pattern(a b*)
  define b as seg_f = prev(seg_t)
)
order by obj_id, f, t;
OBJ_ID AB_ATTR ATTR_ID F T
3o    01 1:1=1 1 1 4
3o    02 2:1<2 1 1 3
3o    02 2:1<2 2 3 4
3o    03 3:2>1 2 1 2
3o    03 3:2>1 1 2 4
3o    04 4:N-1 1 2
3o    04 4:N-1 1 2 4
3o    05 5:1-N 1 1 3
3o    05 5:1-N 3 4
3o    06 6:N-N 1 4
5c    01 1:1=1 1 1 4
5c    02 2:1<2 1 1 4
5c    03 3:2>1 2 1 2
5c    03 3:2>1 1 2 3
5c    03 3:2>1 2 3 4
5c    04 4:N-1 1 2
5c    04 4:N-1 1 2 3
5c    04 4:N-1 3 4
5c    05 5:1-N 1 1 4
5c    06 6:N-N 1 4

 

Segments with “To” NULL

Insert into T values ('3o    11','1','3','1','1:1=1');
Insert into T values ('3o    11','2',null,'1','1:1=1');
Insert into T values ('3o    12','1','3','1','2:1<2');
Insert into T values ('3o    12','2',null,'2','2:11');
Insert into T values ('3o    13','2',null,'1','3:2>1');
Insert into T values ('3o    14','1','3',null,'4:N-1');
Insert into T values ('3o    14','2',null,'1','4:N-1');
Insert into T values ('3o    15','1','3','1','5:1-N');
Insert into T values ('3o    15','2',null,null,'5:1-N');
Insert into T values ('3o    16','1','3',null,'6:N-N');
Insert into T values ('3o    16','2',null,null,'6:N-N');
Insert into T values ('5c    11','1',null,'1','1:1=1');
Insert into T values ('5c    11','2','3','1','1:1=1');
Insert into T values ('5c    12','1',null,'1','2:1<2');
Insert into T values ('5c    12','2','3','2','2:11');
Insert into T values ('5c    13','2','3','1','3:2>1');
Insert into T values ('5c    14','1',null,null,'4:N-1');
Insert into T values ('5c    14','2','3','1','4:N-1');
Insert into T values ('5c    15','1',null,'1','5:1-N');
Insert into T values ('5c    15','2','3',null,'5:1-N');
Insert into T values ('5c    16','1',null,null,'6:N-N');
Insert into T values ('5c    16','2','3',null,'6:N-N');
  • line 3: add the INCLUDE NULLS option to include T when it is NULL.
  • lines 15 and 17: include segments that end with NULL.
with unpivoted as (
  select * from t
  unpivot INCLUDE NULLS ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1))
)
, segmented as (
  select obj_id, ab_attr, seg_f, seg_t, range_attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as ft <= first(t) OR FIRST(T) IS NULL
  )
  where seg_f < seg_t OR SEG_T IS NULL
)
, grouped as (
  select OBJ_ID, AB_ATTR, SEG_F, SEG_T, min(RANGE_ATTR_ID) attr_id
  from segmented
  group by OBJ_ID, AB_ATTR, SEG_F, SEG_T
)
select * from grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f, seg_t
  measures first(seg_f) f, last(seg_t) t
  pattern(a b*)
  define b as seg_f = prev(seg_t)
)
order by obj_id, f, t;

Just so your mouse won’t get too tired from scrolling, I will spare you the output. There are 20 additional rows for the new test cases.

With “From” and “To” NULLs

Insert into T values ('3o    21',null,'3','1','1:1=1');
Insert into T values ('3o    21','2',null,'1','1:1=1');
Insert into T values ('3o    22',null,'3','1','2:1<2');
Insert into T values ('3o    22','2',null,'2','2:11');
Insert into T values ('3o    23','2',null,'1','3:2>1');
Insert into T values ('3o    24',null,'3',null,'4:N-1');
Insert into T values ('3o    24','2',null,'1','4:N-1');
Insert into T values ('3o    25',null,'3','1','5:1-N');
Insert into T values ('3o    25','2',null,null,'5:1-N');
Insert into T values ('3o    26',null,'3',null,'6:N-N');
Insert into T values ('3o    26','2',null,null,'6:N-N');
Insert into T values ('5c    21',null,null,'1','1:1=1');
Insert into T values ('5c    21','2','3','1','1:1=1');
Insert into T values ('5c    22',null,null,'1','2:1<2');
Insert into T values ('5c    22','2','3','2','2:11');
Insert into T values ('5c    23','2','3','1','3:2>1');
Insert into T values ('5c    24',null,null,null,'4:N-1');
Insert into T values ('5c    24','2','3','1','4:N-1');
Insert into T values ('5c    25',null,null,'1','5:1-N');
Insert into T values ('5c    25','2','3',null,'5:1-N');
Insert into T values ('5c    26',null,null,null,'6:N-N');
Insert into T values ('5c    26','2','3',null,'6:N-N');
  • lines 1 through 9: as explained in my previous post, I create a new column called NULL_ORDER so that NULL “from” values come first.
  • lines 10 through 25: I combine segmenting and grouping in one SELECT just to show that it can be done: the MATCH_RECOGNIZE clause is executed before the GROUP BY. The NULL_ORDER column is used in ordering and comparisons.
  • line 28: order by seg_f NULLS FIRST and it all works!
with unpivoted as (
  select u.*,
    case when seg_start = -1 and ft is null then -1
         when seg_start = 1  and ft is null then  1
         else 0
    end null_order
  from t
  unpivot include nulls ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1)) u
)
, segmented_and_grouped as (
  select obj_id, ab_attr, seg_f, seg_t, min(range_attr_id) attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by null_order, ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id, 
      prev(null_order) null_order_f, null_order null_order_t
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as null_order = -1 or ft <= first(t) or first(t) is null
  )
  where null_order_f < null_order_t or seg_f < seg_t
  group by obj_id, ab_attr, seg_f, seg_t
)
select * from segmented_and_grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f nulls first
  measures first(seg_f) f, last(seg_t) t
  pattern(a+)
  define a as count(*) = 1 or seg_f = prev(seg_t)
)
order by obj_id, f nulls first, t;

Next up: Swiss Army knife.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s