In this post, I’m going to treat two closely related but distinct requirements:
- “Pack”: in a
SELECT
, return contiguous or overlapping ranges as one range. - “Merge”: modify a table to reduce contiguous or overlapping ranges to one range.
These operations should take into account other data associated with each range: ranges are packed or merged only when the other data is the same.
Current list of posts:
- Ranges with NULLs 01: starting over
- Ranges with NULLs 02: test cases
- Ranges with NULLs 03: Gaps
- Ranges with NULLs 04: Pack, Merge
- Ranges with NULLs 05: Segments
- Ranges with NULLs 06: Overlaps with Conflicting Data
- Ranges with NULLs 07: Swiss Army Knife
What is the meaning of this?
This heading sounds like something a teacher would say when he discovers students breaking the rules. Actually, I’m not trying to break rules, I’m trying to make some – or at least tell you what rules I am following.
- In my world, the rule is that ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
- I classify ranges in three categories:
- Neither “To” nor “From” may be
NULL
- “To” may be
NULL
, but not “From” - Both “From” and “To” may be
NULL
.
- Neither “To” nor “From” may be
- My data consists of an object, a “from” and “to” range –
and optional “attributes” that are valid for the object only within the range. - In this post, I’ll use “Pack” to mean a
SELECT
that returns contiguous or overlapping ranges as one big range (any “attributes” have to be the same). - In this post, I’ll use “Merge” to mean a
MERGE
statement that changes a table so that contiguous or overlapping ranges become one big range. - I propose solutions that require database version 12.1 or later.
Test data with no NULLs allowed
This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps
create table t( obj_id varchar2(9), f int, t int, af int, at int, bf int, bt int ); Insert into T values ('1p 0',1,2,1,2,3,4); Insert into T values ('1p 0',3,4,1,2,3,4); Insert into T values ('2m 0',1,2,1,2,2,3); Insert into T values ('2m 0',2,3,1,2,2,3); Insert into T values ('3o 0',1,3,1,3,2,4); Insert into T values ('3o 0',2,4,1,3,2,4); Insert into T values ('4f 0',1,3,1,3,2,3); Insert into T values ('4f 0',2,3,1,3,2,3); Insert into T values ('5c 0',1,4,1,4,2,3); Insert into T values ('5c 0',2,3,1,4,2,3); Insert into T values ('6s 0',1,2,1,2,1,3); Insert into T values ('6s 0',1,3,1,2,1,3); Insert into T values ('7e 0',1,2,1,2,1,2); Insert into T values ('7e 0',1,2,1,2,1,2);
- There are two rows = two ranges for each test case.
OBJ_ID
indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.F
is the “from” value.T
is the “to” value (exclusive).- So we can check our results more easily, each row contains the values for both ranges,
AF
andAT
for the first range,BF
andBT
for the second range.
Pack non-null ranges
- I partition by the object ID and also by any attributes I want to include.
PARTITION BY
, likeGROUP BY
, does a good job of comparing values whenNULL
might be involved: twoNULL
s are considered “the same”, and aNULL
is considered different from a non-NULL
value.
For brevity, here the attributes for a given object are always the same. - In a
DEFINE
, the current row is assumed to belong to the condition being tested, so I cannot compare the current start date to the maximum prior end date. I can however compare the maximum end date to the next start date.
select obj_id, F, T, AF, AT, BF, BT from t match_recognize( partition by obj_id, AF, AT, BF, BT order by F, T measures first(F) F, max(T) T pattern(a* b) define a as max(T) >= next(F) );
OBJ_ID | F | T | AF | AT | BF | BT |
---|---|---|---|---|---|---|
1p 0 | 1 | 2 | 1 | 2 | 3 | 4 |
1p 0 | 3 | 4 | 1 | 2 | 3 | 4 |
2m 0 | 1 | 3 | 1 | 2 | 2 | 3 |
3o 0 | 1 | 4 | 1 | 3 | 2 | 4 |
4f 0 | 1 | 3 | 1 | 3 | 2 | 3 |
5c 0 | 1 | 4 | 1 | 4 | 2 | 3 |
6s 0 | 1 | 3 | 1 | 2 | 1 | 3 |
7e 0 | 1 | 2 | 1 | 2 | 1 | 2 |
Pack ranges with “To” NULL
Insert into T values ('1p 1',1,2,1,2,3,null); Insert into T values ('1p 1',3,null,1,2,3,null); Insert into T values ('2m 1',1,2,1,2,2,null); Insert into T values ('2m 1',2,null,1,2,2,null); Insert into T values ('3o 1',1,3,1,3,2,null); Insert into T values ('3o 1',2,null,1,3,2,null); Insert into T values ('4f 1',1,null,1,null,2,null); Insert into T values ('4f 1',2,null,1,null,2,null); Insert into T values ('5c 1',1,null,1,null,2,3); Insert into T values ('5c 1',2,3,1,null,2,3); Insert into T values ('6s 1',1,2,1,2,1,null); Insert into T values ('6s 1',1,null,1,2,1,null); Insert into T values ('7e 1',1,null,1,null,1,null); Insert into T values ('7e 1',1,null,1,null,1,null); commit;
In the MEASURES
and DEFINE
lines, I find out whether there has been a NULL
“to” by comparing COUNT(*)
to COUNT(T)
.
select obj_id, F, T, AF, AT, BF, BT from t match_recognize( partition by obj_id, AF, AT, BF, BT order by F, T measures first(F) F, decode(count(*), count(T), max(T)) T pattern(a* b) define a as count(*) > count(T) or max(T) >= next(F) ) order by 1, 2 nulls first, 3;
OBJ_ID | F | T | AF | AT | BF | BT |
---|---|---|---|---|---|---|
1p 0 | 1 | 2 | 1 | 2 | 3 | 4 |
1p 0 | 3 | 4 | 1 | 2 | 3 | 4 |
1p 1 | 1 | 2 | 1 | 2 | 3 | |
1p 1 | 3 | 1 | 2 | 3 | ||
2m 0 | 1 | 3 | 1 | 2 | 2 | 3 |
2m 1 | 1 | 1 | 2 | 2 | ||
3o 0 | 1 | 4 | 1 | 3 | 2 | 4 |
3o 1 | 1 | 1 | 3 | 2 | ||
4f 0 | 1 | 3 | 1 | 3 | 2 | 3 |
4f 1 | 1 | 1 | 2 | |||
5c 0 | 1 | 4 | 1 | 4 | 2 | 3 |
5c 1 | 1 | 1 | 2 | 3 | ||
6s 0 | 1 | 3 | 1 | 2 | 1 | 3 |
6s 1 | 1 | 1 | 2 | 1 | ||
7e 0 | 1 | 2 | 1 | 2 | 1 | 2 |
7e 1 | 1 | 1 | 1 |
Pack with “From” and “To” NULLs
Insert into T values ('1p 2',3,null,null,2,3,null); Insert into T values ('1p 2',null,2,null,2,3,null); Insert into T values ('2m 2',2,null,null,2,2,null); Insert into T values ('2m 2',null,2,null,2,2,null); Insert into T values ('3o 2',2,null,null,3,2,null); Insert into T values ('3o 2',null,3,null,3,2,null); Insert into T values ('4f 2',2,null,null,null,2,null); Insert into T values ('4f 2',null,null,null,null,2,null); Insert into T values ('5c 2',2,3,null,null,2,3); Insert into T values ('5c 2',null,null,null,null,2,3); Insert into T values ('6s 2',null,2,null,2,null,null); Insert into T values ('6s 2',null,null,null,2,null,null); Insert into T values ('7e 2',null,null,null,null,null,null); Insert into T values ('7e 2',null,null,null,null,null,null); commit;
We just need to order by F nulls first
, and in the DEFINE say nvl(next(F),T)
.
select obj_id, F, T, AF, AT, BF, BT from t match_recognize( partition by obj_id, AF, AT, BF, BT order by F nulls first, T measures first(F) F, decode(count(*), count(T), max(T)) T pattern(a* b) define a as count(*) > count(T) or max(T) >= nvl(next(F),T) ) order by 1,2 nulls first, 3;
OBJ_ID | F | T | AF | AT | BF | BT |
---|---|---|---|---|---|---|
1p 0 | 1 | 2 | 1 | 2 | 3 | 4 |
1p 0 | 3 | 4 | 1 | 2 | 3 | 4 |
1p 1 | 1 | 2 | 1 | 2 | 3 | |
1p 1 | 3 | 1 | 2 | 3 | ||
1p 2 | 2 | 2 | 3 | |||
1p 2 | 3 | 2 | 3 | |||
2m 0 | 1 | 3 | 1 | 2 | 2 | 3 |
2m 1 | 1 | 1 | 2 | 2 | ||
2m 2 | 2 | 2 | ||||
3o 0 | 1 | 4 | 1 | 3 | 2 | 4 |
3o 1 | 1 | 1 | 3 | 2 | ||
3o 2 | 3 | 2 | ||||
4f 0 | 1 | 3 | 1 | 3 | 2 | 3 |
4f 1 | 1 | 1 | 2 | |||
4f 2 | 2 | |||||
5c 0 | 1 | 4 | 1 | 4 | 2 | 3 |
5c 1 | 1 | 1 | 2 | 3 | ||
5c 2 | 2 | 3 | ||||
6s 0 | 1 | 3 | 1 | 2 | 1 | 3 |
6s 1 | 1 | 1 | 2 | 1 | ||
6s 2 | 2 | |||||
7e 0 | 1 | 2 | 1 | 2 | 1 | 2 |
7e 1 | 1 | 1 | 1 | |||
7e 2 |
Merge with or without NULLs
Here I’ll use the same data.
- I use the
ROWID
to identify each row and to allow direct access to the target rows in theMERGE
step. - To take attributes into account, simply add them to the
PARTITION BY
list. - Since I use
ALL ROWS PER MATCH
, I must use theFINAL
keyword to include the entire match in the aggregation functions. - I adjust the
PATTERN
(A+, not A*) to bypass rows that are already packed ranges. - I return all the rows to be packed, updating the first row to the new T and deleting the others.
merge into t o using ( select * from ( select obj_id, f, t, rowid rid from t a ) match_recognize( partition by obj_id order by F nulls first, T measures count(*) cnt, decode(final count(*), final count(T), final max(T)) new_T all rows per match pattern(a+ b) define a as count(*) > count(T) or max(T) >= nvl(next(F),T) ) ) n on (o.rowid = n.rid) when matched then update set o.t = n.new_t delete where n.cnt > 1; 36 rows merged.
Next up: breaking down overlapping ranges into component segments.