Ranges with NULLs 05: Segments

Take two overlapping ranges: “1 to 3” and “2 to 4”. We can divide them into three smaller ranges: “1 to 2”, “2 to 3” and “3 to 4”. I’ll call these smaller ranges segments. We sometimes need these segments to solve advanced range problems.

Current list of posts: 

Requirement

We know that ranges “1 to 3” and “2 to 4” overlap, and the overlapping part is “2 to 3”. I call that part a segment; a segment has the following properties:

  • Its starting point is a boundary point (start or end) of an original range;
  • Its ending point is the next boundary point, either of that range or some overlapping range;
  • It is entirely covered by at least one original range (so there is no segment corresponding to a “gap”);

The requirement here is to find all the segments belonging to a group of ranges. In this example we should find “1 to 2”, “2 to 3” and “3 to 4”. As a bonus, I’ll determine for each segment how many original ranges it belongs to: here “2 to 3” belongs to 2 ranges and the others belong to only one.

Reminders (skip if you are reading the whole series)

  • In my world, ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object and a “from” and “to” range.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Segments in non-null ranges

  • First I UNPIVOT F and T into one column (called F for convenience) that contains all the boundary points, and I create a column called SEG_START that is 1 when the source is F and -1 with the source is T.
  • Then I use the analytic LEAD function to pair each boundary point with the next boundary point, while also doing a running sum of SEG_START.
  • When the running sum is 0, I have a “gap” so I filter out that row. I also filter out rows where F and T are equal.
with unpivoted as (
  select * from t
  unpivot(f for seg_start in(f as 1, t as -1))
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(f) over(partition by obj_id order by f) t,
  sum(seg_start) over(partition by obj_id order by f) num_segs
  from unpivoted u
)
where num_segs > 0 and f < t;
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
7e 0 1 2 2 1 2 1 2

 

Segments with “To” NULL

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
commit;
  • line 3: add the INCLUDE NULLS option to include T when it is NULL.
  • line 12: include segments that end with NULL.
with unpivoted as (
  select * from t
  unpivot include nulls (f for seg_start in(f as 1, t as -1))
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(f) over(partition by obj_id order by f) t,
  sum(seg_start) over(partition by obj_id order by f) num_segs
  from unpivoted u
)
where num_segs > 0 and (f < t or t is null);
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
1p 1 1 2 1 1 2 3
1p 1 3 1 1 2 3
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
2m 1 1 2 1 1 2 2
2m 1 2 1 1 2 2
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
3o 1 1 2 1 1 3 2
3o 1 2 3 2 1 3 2
3o 1 3 1 1 3 2
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
4f 1 1 2 1 1 2
4f 1 2 2 1 2
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
5c 1 1 2 1 1 2 3
5c 1 2 3 2 1 2 3
5c 1 3 1 1 2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
6s 1 1 2 2 1 2 1
6s 1 2 1 1 2 1
7e 0 1 2 2 1 2 1 2
7e 1 1 2 1 1

 

With “From” and “To” NULLs

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
commit;

This is why I don’t like to allow NULL in “from” columns. After we UNPIVOT, we have to sort the null “from” values first and the null “to” values last! As a workaround, I create a column called NULL_ORDER that is -1 when F is NULL, 1 when T is NULL and 0 otherwise. I then test or order by this column when necessary.

with unpivoted as (
  select u.*,
    case when seg_start =  1 and f is null then -1
         when seg_start = -1 and f is null then  1
         else 0
    end null_order
  from t
  unpivot include nulls (f for seg_start in(f as 1, t as -1)) u
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(null_order) over(partition by obj_id order by null_order, f) t_null_order,
  lead(f) over(partition by obj_id order by null_order, f) t,
  sum(seg_start) over(partition by obj_id order by null_order, f) num_segs
  from unpivoted u
)
where num_segs > 0 and (null_order < t_null_order or f < t);
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
1p 1 1 2 1 1 2 3  
1p 1 3   1 1 2 3  
1p 2   2 1   2 3  
1p 2 3   1   2 3  
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
2m 1 1 2 1 1 2 2  
2m 1 2   1 1 2 2  
2m 2   2 1   2 2  
2m 2 2   1   2 2  
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
3o 1 1 2 1 1 3 2  
3o 1 2 3 2 1 3 2  
3o 1 3   1 1 3 2  
3o 2   2 1   3 2  
3o 2 2 3 2   3 2  
3o 2 3   1   3 2  
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
4f 1 1 2 1 1   2  
4f 1 2   2 1   2  
4f 2   2 1     2  
4f 2 2   2     2  
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
5c 1 1 2 1 1   2 3
5c 1 2 3 2 1   2 3
5c 1 3   1 1   2 3
5c 2   2 1     2 3
5c 2 2 3 2     2 3
5c 2 3   1     2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
6s 1 1 2 2 1 2 1  
6s 1 2   1 1 2 1  
6s 2   2 2   2    
6s 2 2   1   2    
7e 0 1 2 2 1 2 1 2
7e 1 1   2 1   1  
7e 2     2        

 
Next up: “joining” each segment to the original ranges.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s