Ranges with NULLs 04: Pack, Merge

In this post, I’m going to treat two closely related but distinct requirements:

  1. “Pack”: in a SELECT, return contiguous or overlapping ranges as one range.
  2. “Merge”: modify a table to reduce contiguous or overlapping ranges to one range.

These operations should take into account other data associated with each range: ranges are packed or merged only when the other data is the same.

Current list of posts: 

What is the meaning of this?

This heading sounds like something a teacher would say when he discovers students breaking the rules. Actually, I’m not trying to break rules, I’m trying to make some – or at least tell you what rules I am following.

  • In my world, the rule is that ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object, a “from” and “to” range –
    and optional “attributes” that are valid for the object only within the range.
  • In this post, I’ll use “Pack” to mean a SELECT that returns contiguous or overlapping ranges as one big range (any “attributes” have to be the same).
  • In this post, I’ll use “Merge” to mean a MERGE statement that changes a table so that contiguous or overlapping ranges become one big range.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Pack non-null ranges

  • I partition by the object ID and also by any attributes I want to include. PARTITION BY, like GROUP BY, does a good job of comparing values when NULL might be involved: two NULLs are considered “the same”, and a NULL is considered different from a non-NULL value.
    For brevity, here the attributes for a given object are always the same.
  • In a DEFINE, the current row is assumed to belong to the condition being tested, so I cannot compare the current start date to the maximum prior end date. I can however compare the maximum end date to the next start date.
select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize( 
  partition by obj_id, AF, AT, BF, BT
  order by F, T
  measures first(F) F, max(T) T 
  pattern(a* b)
  define a as max(T) >= next(F)
);
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
2m 0 1 3 1 2 2 3
3o 0 1 4 1 3 2 4
4f 0 1 3 1 3 2 3
5c 0 1 4 1 4 2 3
6s 0 1 3 1 2 1 3
7e 0 1 2 1 2 1 2

 

Pack ranges with “To” NULL

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
commit;

In the MEASURES and DEFINE lines, I find out whether there has been a NULL “to” by comparing COUNT(*) to COUNT(T).

select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id, AF, AT, BF, BT
  order by F, T
  measures first(F) F, decode(count(*), count(T), max(T)) T
  pattern(a* b)
  define a as count(*) > count(T) or max(T) >= next(F)
)
order by 1, 2 nulls first, 3;
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
1p 1 1 2 1 2 3
1p 1 3 1 2 3
2m 0 1 3 1 2 2 3
2m 1 1 1 2 2
3o 0 1 4 1 3 2 4
3o 1 1 1 3 2
4f 0 1 3 1 3 2 3
4f 1 1 1 2
5c 0 1 4 1 4 2 3
5c 1 1 1 2 3
6s 0 1 3 1 2 1 3
6s 1 1 1 2 1
7e 0 1 2 1 2 1 2
7e 1 1 1 1

 

Pack with “From” and “To” NULLs

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
commit;

We just need to order by F nulls first, and in the DEFINE say nvl(next(F),T).

select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize( 
  partition by obj_id, AF, AT, BF, BT
  order by F nulls first, T
  measures first(F) F, decode(count(*), count(T), max(T)) T 
  pattern(a* b)
  define a as count(*) > count(T) or max(T) >= nvl(next(F),T)
)
order by 1,2 nulls first, 3;
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
1p 1 1 2 1 2 3
1p 1 3 1 2 3
1p 2 2 2 3
1p 2 3 2 3
2m 0 1 3 1 2 2 3
2m 1 1 1 2 2
2m 2 2 2
3o 0 1 4 1 3 2 4
3o 1 1 1 3 2
3o 2 3 2
4f 0 1 3 1 3 2 3
4f 1 1 1 2
4f 2 2
5c 0 1 4 1 4 2 3
5c 1 1 1 2 3
5c 2 2 3
6s 0 1 3 1 2 1 3
6s 1 1 1 2 1
6s 2 2
7e 0 1 2 1 2 1 2
7e 1 1 1 1
7e 2

 

Merge with or without NULLs

Here I’ll use the same data.

  • I use the ROWID to identify each row and to allow direct access to the target rows in the MERGE step.
  • To take attributes into account, simply add them to the PARTITION BY list.
  • Since I use ALL ROWS PER MATCH, I must use the FINAL keyword to include the entire match in the aggregation functions.
  • I adjust the PATTERN (A+, not A*) to bypass rows that are already packed ranges.
  • I return all the rows to be packed, updating the first row to the new T and deleting the others.
merge into t o
using (
  select * from (
    select obj_id, f, t, rowid rid from t a
  )
  match_recognize( 
    partition by obj_id
    order by F nulls first, T
    measures count(*) cnt,
      decode(final count(*), final count(T), final max(T)) new_T 
    all rows per match
    pattern(a+ b)
    define a as count(*) > count(T) or max(T) >= nvl(next(F),T)
  )
) n
on (o.rowid = n.rid)
when matched then update set o.t = n.new_t
  delete where n.cnt > 1;

36 rows merged.

Next up: breaking down overlapping ranges into component segments.

Leave a comment