I came up with this name a few years ago for a problem that I couldn’t solve then: analyse a series of ranges and say whether and how they overlap. It turns out the solution is not that hard.
Current list of posts:
- Ranges with NULLs 01: starting over
- Ranges with NULLs 02: test cases
- Ranges with NULLs 03: Gaps
- Ranges with NULLs 04: Pack, Merge
- Ranges with NULLs 05: Segments
- Ranges with NULLs 06: Overlaps with Conflicting Data
- Ranges with NULLs 07: Swiss Army Knife
The Problem
I want to order a set of ranges by “from” and “to”, then determine the relations between each range and the ranges that follow it. The only limit is with the “precedes” relation: I know that the first range may precede any number of following ranges, so I only want relations where one range immediately precedes the next, with no intervening range.
MATCH_RECOGNIZE to the rescue
The MATCH_RECOGNIZE clause (introduced in 12c) is a good fit here, for several reasons:
- I can PARTITION BY different test cases and ORDER BY the “from” and “to” values.
- I can say AFTER MATCH SKIP TO NEXT ROW to use each and every range as a starting point.
- I can use the pipe character | for alternation. Alternation matches a single regular expression from a list of several possible regular expressions. Alternatives are preferred in the order they are specified.
Reminders (skip if you are reading the whole series)
- Ranges have exclusive end points, which allows ranges that “meet” without gaps or overlaps.
- Three categories:
- Neither “To” nor “From” may be
NULL
- “To” may be
NULL
, but not “From” - Both “From” and “To” may be
NULL
.
- Neither “To” nor “From” may be
- My data consists of an object and a “from” and “to” range.
- I propose solutions that require database version 12.1 or later.
No NULLs
This is the same data as in my previous post Ranges with NULLs 03: Gaps
create table t( obj_id varchar2(9), f int, t int, af int, at int, bf int, bt int ); Insert into T values ('1p 0',1,2,1,2,3,4); Insert into T values ('1p 0',3,4,1,2,3,4); Insert into T values ('2m 0',1,2,1,2,2,3); Insert into T values ('2m 0',2,3,1,2,2,3); Insert into T values ('3o 0',1,3,1,3,2,4); Insert into T values ('3o 0',2,4,1,3,2,4); Insert into T values ('4f 0',1,3,1,3,2,3); Insert into T values ('4f 0',2,3,1,3,2,3); Insert into T values ('5c 0',1,4,1,4,2,3); Insert into T values ('5c 0',2,3,1,4,2,3); Insert into T values ('6s 0',1,2,1,2,1,3); Insert into T values ('6s 0',1,3,1,2,1,3); Insert into T values ('7e 0',1,2,1,2,1,2); Insert into T values ('7e 0',1,2,1,2,1,2);
- There are two rows = two ranges for each test case.
OBJ_ID
indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.F
is the “from” value.T
is the “to” value (exclusive).- So we can check our results more easily, each row contains the values for both ranges,
AF
andAT
for the first range,BF
andBT
for the second range.
The code:
- line 4: I partition by the test case and order by “from” and “to”.
- line 5: the classifier() function returns the name of the condition that was met, which is also the name of the relation I am looking for.
- line 7: I check each row in turn.
- line 8: A is the first row (which I do not return), and the following rows belong to one of the other conditions.
- I check the relations “backwards” so that some checks are implicit:
- equals: between the first row and the current row, “from” and “to” are the same.
- starts: I don’t explicitly check that the first “to” is less than the current “to”. At this point they cannot be equal and the ORDER BY guarantees that the first “to” is not greater than the current “to”.
- after the “starts” condition, the first “from” must be less than the current “to”, so I don’t have to check it explicitly.
- I don’t care about “precedes” if there are intervening rows.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T from t match_recognize( partition by obj_id order by f, t measures a.f prior_f, a.t prior_t, classifier() relation all rows per match after match skip to next row pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ ) define equals as a.f = f and a.t = t, starts as a.f = f, -- if true, a.t must be < t because of order by -- from now on a.f must be < f because of order by contains as a.t > t, finished_by as a.t = t, overlaps as a.t > f and a.t < t, meets as a.t = f, -- at this point a.t must be < f precedes as count(*) = 2 );
OBJ_ID | PRIOR_F | PRIOR_T | RELATION | F | T |
---|---|---|---|---|---|
1p 0 | 1 | 2 | PRECEDES | 3 | 4 |
2m 0 | 1 | 2 | MEETS | 2 | 3 |
3o 0 | 1 | 3 | OVERLAPS | 2 | 4 |
4f 0 | 1 | 3 | FINISHED_BY | 2 | 3 |
5c 0 | 1 | 4 | CONTAINS | 2 | 3 |
6s 0 | 1 | 2 | STARTS | 1 | 3 |
7e 0 | 1 | 2 | EQUALS | 1 | 2 |
“To” NULL
delete from t; Insert into T values ('1p 1',1,2,1,2,3,null); Insert into T values ('1p 1',3,null,1,2,3,null); Insert into T values ('2m 1',1,2,1,2,2,null); Insert into T values ('2m 1',2,null,1,2,2,null); Insert into T values ('3o 1',1,3,1,3,2,null); Insert into T values ('3o 1',2,null,1,3,2,null); Insert into T values ('4f 1',1,null,1,null,2,null); Insert into T values ('4f 1',2,null,1,null,2,null); Insert into T values ('5c 1',1,null,1,null,2,3); Insert into T values ('5c 1',2,3,1,null,2,3); Insert into T values ('6s 1',1,2,1,2,1,null); Insert into T values ('6s 1',1,null,1,2,1,null); Insert into T values ('7e 1',1,null,1,null,1,null); Insert into T values ('7e 1',1,null,1,null,1,null);
- lines 9 and 12: with equality comparisons, I use DECODE() to treat NULL values as equal.
- lines 11 and 13: with inequality comparisons, I check explicitly for NULL when necessary.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T from t match_recognize( partition by obj_id order by f, t measures a.f prior_f, a.t prior_t, classifier() relation all rows per match after match skip to next row pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ ) define equals as a.f = f and decode(a.t,t,0,1) = 0, starts as a.f = f, contains as (a.t > t or a.t is null), finished_by as decode(a.t,t,0,1) = 0, overlaps as a.t > f and (a.t < t or t is null), meets as a.t = f, precedes as count(*) = 2 );
OBJ_ID | PRIOR_F | PRIOR_T | RELATION | F | T |
---|---|---|---|---|---|
1p 1 | 1 | 2 | PRECEDES | 3 | |
2m 1 | 1 | 2 | MEETS | 2 | |
3o 1 | 1 | 3 | OVERLAPS | 2 | |
4f 1 | 1 | FINISHED_BY | 2 | ||
5c 1 | 1 | CONTAINS | 2 | 3 | |
6s 1 | 1 | 2 | STARTS | 1 | |
7e 1 | 1 | EQUALS | 1 |
With “From” and “To” NULLs
delete from t; Insert into T values ('1p 2',3,null,null,2,3,null); Insert into T values ('1p 2',null,2,null,2,3,null); Insert into T values ('2m 2',2,null,null,2,2,null); Insert into T values ('2m 2',null,2,null,2,2,null); Insert into T values ('3o 2',2,null,null,3,2,null); Insert into T values ('3o 2',null,3,null,3,2,null); Insert into T values ('4f 2',2,null,null,null,2,null); Insert into T values ('4f 2',null,null,null,null,2,null); Insert into T values ('5c 2',2,3,null,null,2,3); Insert into T values ('5c 2',null,null,null,null,2,3); Insert into T values ('6s 2',null,2,null,2,null,null); Insert into T values ('6s 2',null,null,null,2,null,null); Insert into T values ('7e 2',null,null,null,null,null,null); Insert into T values ('7e 2',null,null,null,null,null,null);
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T from t match_recognize( partition by obj_id order by f nulls first, t measures a.f prior_f, a.t prior_t, classifier() relation all rows per match after match skip to next row pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ ) define equals as decode(a.f,f,0,1) = 0 and decode(a.t,t,0,1) = 0, starts as decode(a.f,f,0,1) = 0, contains as a.t > t or a.t is null, finished_by as decode(a.t,t,0,1) = 0, overlaps as a.t > f and (a.t < t or t is null), meets as a.t = f, precedes as count(*) = 2 );
OBJ_ID | PRIOR_F | PRIOR_T | RELATION | F | T |
---|---|---|---|---|---|
1p 2 | 2 | PRECEDES | 3 | ||
2m 2 | 2 | MEETS | 2 | ||
3o 2 | 3 | OVERLAPS | 2 | ||
4f 2 | FINISHED_BY | 2 | |||
5c 2 | CONTAINS | 2 | 3 | ||
6s 2 | 2 | STARTS | |||
7e 2 | EQUALS |