Splitting strings before they’re CLOBs!

After I tokenized two CLOBs in response to a forum question, Andy Sayer mentioned that they were probably files first. So I smacked my forehead and used external tables. When the light bulb went on, I saw that the comma needed to be a record delimiter, not a field separator. Easy.

create table item_ids_EXT(  
  "RN" NUMBER,  
  "ITEM_ID" VARCHAR2(16)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_ID" char(255)  
    )  
  )  
  location ('i_item_id.txt')  
)  
REJECT LIMIT 10;

create table item_keys_EXT(  
  "RN" NUMBER,  
  "ITEM_KEY" VARCHAR2(255)  
)  
ORGANIZATION EXTERNAL(  
  TYPE ORACLE_LOADER  
  DEFAULT DIRECTORY EXT_DATA_DIR  
  ACCESS PARAMETERS(  
    records delimited by ','   
    BADFILE EXT_DATA_DIR:'i_item_id.txt.bad'  
    LOGFILE EXT_DATA_DIR:'i_item_id.txt.log'  
    FIELDS TERMINATED BY "§" NOTRIM  
    MISSING FIELD VALUES ARE NULL  
    REJECT ROWS WITH ALL NULL FIELDS  
    (  
      "RN" recnum,  
      "ITEM_KEY" char(255)  
    )  
  )  
  location ('i_item_key.txt')  
)  
REJECT LIMIT 10;  

set timing on

create table u as  
select item_id, item_key  
from item_ids_ext  
join item_keys_ext using(rn);

Table U created.

Elapsed: 00:00:00.787

select floor(2 * 60 * 60 / 0.787) times_faster from dual;

TIMES_FASTER
------------
        9148
Advertisements

Splitting strings when they are CLOBs

Like many of you, I have enjoyed using SQL to make data processing go hundreds of times faster. My new record is: 8000 times faster :-)

In a recent ODC thread, the question was how to split out comma separated values from two CLOBs, one containing “item ids” and the other “item keys”. The ids and the keys had to be matched by position: the first id with the first key, the second id with the second key, and so on. Then each id/key pair was to be inserted as a row into a target table.

With 50,000 ids of 7 characters and 50,000 keys of 172 characters, the current processing takes 2 hours.

Test data

First I’ll create a table with the two CLOBs.

--drop table t purge;
create table t as
select 
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || to_char(level,'fm0000009'))
      order by level
    ) AS CLOB
  ), 2
) i_item_id,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || rpad(to_char(level,'fm0000009'),172,'*'))
      order by level
    ) AS CLOB
  ), 2
) i_item_key
from dual
connect by level <= 50000;

Splitting CLOB strings

Instead of seeking a “perfect” custom-made solution, I just put together two functions I have already written about:

  1. The PIPE_CLOB function: a pipelined table function that reads a CLOB and outputs up to 4000 bytes at a time, respecting a delimiter.
  2. The STRING_TOKENIZE function: a pipelined table function that splits a VARCHAR2 string according to a delimiter.

All I have to do is feed the output from the first function into the second, and voilà!

select b.column_value item_id
from t,
table(pipe_clob( i_item_id, 4000, ',')) a,
table(string_tokenize(a.column_value,',')) b;

Combining the two CLOBs

Remember, the requirement was to combine the values from both CLOBs by position, in order to insert them into a target table. Since the functions return the values in order, I just use ROWNUM to number the values and use that number to join them.

--drop table u purge;
create table u as
with item_ids as (
  select rownum rn,
    b.column_value item_id
    from t,
  table(pipe_clob( i_item_id, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
, item_keys as (
  select rownum rn,
    b.column_value item_key
    from t,
  table(pipe_clob( i_item_key, 4000, ',')) a,
  table(string_tokenize(a.column_value,',')) b
)
select item_id, item_key 
from item_ids join item_keys using(rn);

Table U created.

Elapsed: 00:00:00.879

That’s right, processing time went from two hours to less than one second.

Now, I suspect that the questioner’s code is doing more than just a simple insert, but until he or she provides more details about the requirement, I get to maintain my claim:

8000 times faster!

(which I hope applies only to my code, not to the speed at which time flies…)

Ranges with NULLs 07: Swiss Army Knife

I came up with this name a few years ago for a problem that I couldn’t solve then: analyse a series of ranges and say whether and how they overlap. It turns out the solution is not that hard.

Current list of posts: 

The Problem

I want to order a set of ranges by “from” and “to”, then determine the relations between each range and the ranges that follow it. The only limit is with the “precedes” relation: I know that the first range may precede any number of following ranges, so I only want relations where one range immediately precedes the next, with no intervening range.

MATCH_RECOGNIZE to the rescue

The MATCH_RECOGNIZE clause (introduced in 12c) is a good fit here, for several reasons:

  • I can PARTITION BY different test cases and ORDER BY the “from” and “to” values.
  • I can say AFTER MATCH SKIP TO NEXT ROW to use each and every range as a starting point.
  • I can use the pipe character | for alternation. Alternation matches a single regular expression from a list of several possible regular expressions. Alternatives are preferred in the order they are specified.

Reminders (skip if you are reading the whole series)

  • Ranges have exclusive end points, which allows ranges that “meet” without gaps or overlaps.
  • Three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object and a “from” and “to” range.
  • I propose solutions that require database version 12.1 or later.

No NULLs

This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.
The code:
  • line 4: I partition by the test case and order by “from” and “to”.
  • line 5: the classifier() function returns the name of the condition that was met, which is also the name of the relation I am looking for.
  • line 7: I check each row in turn.
  • line 8: A is the first row (which I do not return), and the following rows belong to one of the other conditions.
  • I check the relations “backwards” so that some checks are implicit:
    • equals: between the first row and the current row, “from” and “to” are the same.
    • starts: I don’t explicitly check that the first “to” is less than the current “to”. At this point they cannot be equal and the ORDER BY guarantees that the first “to” is not greater than the current “to”.
    • after the “starts” condition, the first “from” must be less than the current “to”, so I don’t have to check it explicitly.
    • I don’t care about “precedes” if there are intervening rows.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as a.f = f and a.t = t,
    starts as a.f = f, -- if true, a.t must be < t because of order by
    -- from now on a.f must be < f because of order by
    contains as a.t > t,
    finished_by as a.t = t,
    overlaps as a.t > f and a.t < t,
    meets as a.t = f,
    -- at this point a.t must be < f
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    0 1 2 PRECEDES 3 4
2m    0 1 2 MEETS 2 3
3o    0 1 3 OVERLAPS 2 4
4f    0 1 3 FINISHED_BY 2 3
5c    0 1 4 CONTAINS 2 3
6s    0 1 2 STARTS 1 3
7e    0 1 2 EQUALS 1 2

 

“To” NULL

delete from t;
Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
  • lines 9 and 12: with equality comparisons, I use DECODE() to treat NULL values as equal.
  • lines 11 and 13: with inequality comparisons, I check explicitly for NULL when necessary.
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as a.f = f and decode(a.t,t,0,1) = 0,
    starts as a.f = f,
    contains as (a.t > t or a.t is null),
    finished_by as decode(a.t,t,0,1) = 0,
    overlaps as a.t > f and (a.t < t or t is null),
    meets as a.t = f,
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    1 1 2 PRECEDES 3
2m    1 1 2 MEETS 2
3o    1 1 3 OVERLAPS 2
4f    1 1 FINISHED_BY 2
5c    1 1 CONTAINS 2 3
6s    1 1 2 STARTS 1
7e    1 1 EQUALS 1

 

With “From” and “To” NULLs

delete from t;
Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
select OBJ_ID, PRIOR_F, PRIOR_T, relation, F, T
from t
match_recognize(
  partition by obj_id order by f nulls first, t
  measures a.f prior_f, a.t prior_t, classifier() relation
  all rows per match
  after match skip to next row
  pattern( {-a-} (equals|starts|finished_by|contains|overlaps|meets|precedes)+ )
  define equals as decode(a.f,f,0,1) = 0 and decode(a.t,t,0,1) = 0,
    starts as decode(a.f,f,0,1) = 0,
    contains as a.t > t or a.t is null,
    finished_by as decode(a.t,t,0,1) = 0,
    overlaps as a.t > f and (a.t < t or t is null),
    meets as a.t = f,
    precedes as count(*) = 2    
);
OBJ_ID PRIOR_F PRIOR_T RELATION F T
1p    2 2 PRECEDES 3
2m    2 2 MEETS 2
3o    2 3 OVERLAPS 2
4f    2 FINISHED_BY 2
5c    2 CONTAINS 2 3
6s    2 2 STARTS
7e    2 EQUALS

 

Ranges with NULLs 06: Overlaps with Conflicting Data

In 2014 I attacked the problem of “Overlapping ranges with priorities”. This time I’ll deal with NULLs and propose an improved solution.

Current list of posts: 

Requirement

Assume overlapping ranges with one attribute. I call the overlapping parts “segments”.

  • Segments: break each range down into segments, including the range’s attribute.
  • Priority: when two or more segments with the same “from” and “to” have different values, the minimum value has priority.
  • Merge: once all the segments have proper values assigned, merge contiguous segments having the same values.

Reminders

  • In my world, ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution.

CREATE TABLE T (
OBJ_ID    VARCHAR2 ( 20 BYTE ),
F         NUMBER,
T         NUMBER,
ATTR_ID   NUMBER,
AB_ATTR   VARCHAR2 ( 90 BYTE )
);
Insert into T values ('3o    01','1','3','1','1:1=1');
Insert into T values ('3o    01','2','4','1','1:1=1');
Insert into T values ('3o    02','1','3','1','2:1<2');
Insert into T values ('3o    02','2','4','2','2:1<2');
Insert into T values ('3o    03','1','3','2','3:2>1');
Insert into T values ('3o    03','2','4','1','3:2>1');
Insert into T values ('3o    04','1','3',null,'4:N-1');
Insert into T values ('3o    04','2','4','1','4:N-1');
Insert into T values ('3o    05','1','3','1','5:1-N');
Insert into T values ('3o    05','2','4',null,'5:1-N');
Insert into T values ('3o    06','1','3',null,'6:N-N');
Insert into T values ('3o    06','2','4',null,'6:N-N');
Insert into T values ('5c    01','1','4','1','1:1=1');
Insert into T values ('5c    01','2','3','1','1:1=1');
Insert into T values ('5c    02','1','4','1','2:1<2');
Insert into T values ('5c    02','2','3','2','2:1<2');
Insert into T values ('5c    03','1','4','2','3:2>1');
Insert into T values ('5c    03','2','3','1','3:2>1');
Insert into T values ('5c    04','1','4',null,'4:N-1');
Insert into T values ('5c    04','2','3','1','4:N-1');
Insert into T values ('5c    05','1','4','1','5:1-N');
Insert into T values ('5c    05','2','3',null,'5:1-N');
Insert into T values ('5c    06','1','4',null,'6:N-N');
Insert into T values ('5c    06','2','3',null,'6:N-N');
OBJ_ID F T ATTR_ID AB_ATTR
3o    01 1 3 1 1:1=1
3o    01 2 4 1 1:1=1
3o    02 1 3 1 2:1<2
3o    02 2 4 2 2:1<2
3o    03 1 3 2 3:2>1
3o    03 2 4 1 3:2>1
3o    04 1 3 4:N-1
3o    04 2 4 1 4:N-1
3o    05 1 3 1 5:1-N
3o    05 2 4 5:1-N
3o    06 1 3 6:N-N
3o    06 2 4 6:N-N
5c    01 1 4 1 1:1=1
5c    01 2 3 1 1:1=1
5c    02 1 4 1 2:1<2
5c    02 2 3 2 2:1<2
5c    03 1 4 2 3:2>1
5c    03 2 3 1 3:2>1
5c    04 1 4 4:N-1
5c    04 2 3 1 4:N-1
5c    05 1 4 1 5:1-N
5c    05 2 3 5:1-N
5c    06 1 4 6:N-N
5c    06 2 3 6:N-N

 

  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “overlaps, contains”. The 0 in position 6 means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the relationship between the ATTR_ID values. There are 6 such relationships, identified by the last digit of OBJ_ID.

Non-null ranges

  • First I UNPIVOT F and T into one column called FT  that contains all the boundary points, and I create a column called SEG_START that is -1 when the source is F and 1 with the source is T. I also retain the original value of T in every new row.
  • Then I use MATCH_RECOGNIZE to identify, each time SEG_START = -1, all the boundary points up to and including the original T. I output each segment with the original ATTR_ID value.
  • Then I use GROUP BY to reduce duplicate segments to one, retaining the minimum ATTR_ID
  • Finally, I use MATCH_RECOGNIZE again to merge the contiguous segments with the same ATTR_ID value.
with unpivoted as (
  select * from t
  unpivot ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1))
)
, segmented as (
  select obj_id, ab_attr, seg_f, seg_t, range_attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as ft <= first(t)
  )
  where seg_f < seg_t
)
, grouped as (
  select OBJ_ID, AB_ATTR, SEG_F, SEG_T, min(RANGE_ATTR_ID) attr_id
  from segmented
  group by OBJ_ID, AB_ATTR, SEG_F, SEG_T
)
select * from grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f, seg_t
  measures first(seg_f) f, last(seg_t) t
  pattern(a b*)
  define b as seg_f = prev(seg_t)
)
order by obj_id, f, t;
OBJ_ID AB_ATTR ATTR_ID F T
3o    01 1:1=1 1 1 4
3o    02 2:1<2 1 1 3
3o    02 2:1<2 2 3 4
3o    03 3:2>1 2 1 2
3o    03 3:2>1 1 2 4
3o    04 4:N-1 1 2
3o    04 4:N-1 1 2 4
3o    05 5:1-N 1 1 3
3o    05 5:1-N 3 4
3o    06 6:N-N 1 4
5c    01 1:1=1 1 1 4
5c    02 2:1<2 1 1 4
5c    03 3:2>1 2 1 2
5c    03 3:2>1 1 2 3
5c    03 3:2>1 2 3 4
5c    04 4:N-1 1 2
5c    04 4:N-1 1 2 3
5c    04 4:N-1 3 4
5c    05 5:1-N 1 1 4
5c    06 6:N-N 1 4

 

Segments with “To” NULL

Insert into T values ('3o    11','1','3','1','1:1=1');
Insert into T values ('3o    11','2',null,'1','1:1=1');
Insert into T values ('3o    12','1','3','1','2:1<2');
Insert into T values ('3o    12','2',null,'2','2:11');
Insert into T values ('3o    13','2',null,'1','3:2>1');
Insert into T values ('3o    14','1','3',null,'4:N-1');
Insert into T values ('3o    14','2',null,'1','4:N-1');
Insert into T values ('3o    15','1','3','1','5:1-N');
Insert into T values ('3o    15','2',null,null,'5:1-N');
Insert into T values ('3o    16','1','3',null,'6:N-N');
Insert into T values ('3o    16','2',null,null,'6:N-N');
Insert into T values ('5c    11','1',null,'1','1:1=1');
Insert into T values ('5c    11','2','3','1','1:1=1');
Insert into T values ('5c    12','1',null,'1','2:1<2');
Insert into T values ('5c    12','2','3','2','2:11');
Insert into T values ('5c    13','2','3','1','3:2>1');
Insert into T values ('5c    14','1',null,null,'4:N-1');
Insert into T values ('5c    14','2','3','1','4:N-1');
Insert into T values ('5c    15','1',null,'1','5:1-N');
Insert into T values ('5c    15','2','3',null,'5:1-N');
Insert into T values ('5c    16','1',null,null,'6:N-N');
Insert into T values ('5c    16','2','3',null,'6:N-N');
  • line 3: add the INCLUDE NULLS option to include T when it is NULL.
  • lines 15 and 17: include segments that end with NULL.
with unpivoted as (
  select * from t
  unpivot INCLUDE NULLS ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1))
)
, segmented as (
  select obj_id, ab_attr, seg_f, seg_t, range_attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as ft <= first(t) OR FIRST(T) IS NULL
  )
  where seg_f < seg_t OR SEG_T IS NULL
)
, grouped as (
  select OBJ_ID, AB_ATTR, SEG_F, SEG_T, min(RANGE_ATTR_ID) attr_id
  from segmented
  group by OBJ_ID, AB_ATTR, SEG_F, SEG_T
)
select * from grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f, seg_t
  measures first(seg_f) f, last(seg_t) t
  pattern(a b*)
  define b as seg_f = prev(seg_t)
)
order by obj_id, f, t;

Just so your mouse won’t get too tired from scrolling, I will spare you the output. There are 20 additional rows for the new test cases.

With “From” and “To” NULLs

Insert into T values ('3o    21',null,'3','1','1:1=1');
Insert into T values ('3o    21','2',null,'1','1:1=1');
Insert into T values ('3o    22',null,'3','1','2:1<2');
Insert into T values ('3o    22','2',null,'2','2:11');
Insert into T values ('3o    23','2',null,'1','3:2>1');
Insert into T values ('3o    24',null,'3',null,'4:N-1');
Insert into T values ('3o    24','2',null,'1','4:N-1');
Insert into T values ('3o    25',null,'3','1','5:1-N');
Insert into T values ('3o    25','2',null,null,'5:1-N');
Insert into T values ('3o    26',null,'3',null,'6:N-N');
Insert into T values ('3o    26','2',null,null,'6:N-N');
Insert into T values ('5c    21',null,null,'1','1:1=1');
Insert into T values ('5c    21','2','3','1','1:1=1');
Insert into T values ('5c    22',null,null,'1','2:1<2');
Insert into T values ('5c    22','2','3','2','2:11');
Insert into T values ('5c    23','2','3','1','3:2>1');
Insert into T values ('5c    24',null,null,null,'4:N-1');
Insert into T values ('5c    24','2','3','1','4:N-1');
Insert into T values ('5c    25',null,null,'1','5:1-N');
Insert into T values ('5c    25','2','3',null,'5:1-N');
Insert into T values ('5c    26',null,null,null,'6:N-N');
Insert into T values ('5c    26','2','3',null,'6:N-N');
  • lines 1 through 9: as explained in my previous post, I create a new column called NULL_ORDER so that NULL “from” values come first.
  • lines 10 through 25: I combine segmenting and grouping in one SELECT just to show that it can be done: the MATCH_RECOGNIZE clause is executed before the GROUP BY. The NULL_ORDER column is used in ordering and comparisons.
  • line 28: order by seg_f NULLS FIRST and it all works!
with unpivoted as (
  select u.*,
    case when seg_start = -1 and ft is null then -1
         when seg_start = 1  and ft is null then  1
         else 0
    end null_order
  from t
  unpivot include nulls ( (ft, t) for seg_start in ((f,t) as -1, (t,t) as 1)) u
)
, segmented_and_grouped as (
  select obj_id, ab_attr, seg_f, seg_t, min(range_attr_id) attr_id
  from unpivoted
  match_recognize(
    partition by obj_id order by null_order, ft, seg_start, t
    measures prev(ft) seg_f, ft seg_t, first(attr_id) range_attr_id, 
      prev(null_order) null_order_f, null_order null_order_t
    all rows per match
    after match skip to next row
    pattern({-a-} b+)
    define a as seg_start = -1,
      b as null_order = -1 or ft <= first(t) or first(t) is null
  )
  where null_order_f < null_order_t or seg_f < seg_t
  group by obj_id, ab_attr, seg_f, seg_t
)
select * from segmented_and_grouped
match_recognize(
  partition by obj_id, ab_attr, attr_id order by seg_f nulls first
  measures first(seg_f) f, last(seg_t) t
  pattern(a+)
  define a as count(*) = 1 or seg_f = prev(seg_t)
)
order by obj_id, f nulls first, t;

Next up: Swiss Army knife.

Ranges with NULLs 05: Segments

Take two overlapping ranges: “1 to 3” and “2 to 4”. We can divide them into three smaller ranges: “1 to 2”, “2 to 3” and “3 to 4”. I’ll call these smaller ranges segments. We sometimes need these segments to solve advanced range problems.

Current list of posts: 

Requirement

We know that ranges “1 to 3” and “2 to 4” overlap, and the overlapping part is “2 to 3”. I call that part a segment; a segment has the following properties:

  • Its starting point is a boundary point (start or end) of an original range;
  • Its ending point is the next boundary point, either of that range or some overlapping range;
  • It is entirely covered by at least one original range (so there is no segment corresponding to a “gap”);

The requirement here is to find all the segments belonging to a group of ranges. In this example we should find “1 to 2”, “2 to 3” and “3 to 4”. As a bonus, I’ll determine for each segment how many original ranges it belongs to: here “2 to 3” belongs to 2 ranges and the others belong to only one.

Reminders (skip if you are reading the whole series)

  • In my world, ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object and a “from” and “to” range.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Segments in non-null ranges

  • First I UNPIVOT F and T into one column (called F for convenience) that contains all the boundary points, and I create a column called SEG_START that is 1 when the source is F and -1 with the source is T.
  • Then I use the analytic LEAD function to pair each boundary point with the next boundary point, while also doing a running sum of SEG_START.
  • When the running sum is 0, I have a “gap” so I filter out that row. I also filter out rows where F and T are equal.
with unpivoted as (
  select * from t
  unpivot(f for seg_start in(f as 1, t as -1))
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(f) over(partition by obj_id order by f) t,
  sum(seg_start) over(partition by obj_id order by f) num_segs
  from unpivoted u
)
where num_segs > 0 and f < t;
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
7e 0 1 2 2 1 2 1 2

 

Segments with “To” NULL

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
commit;
  • line 3: add the INCLUDE NULLS option to include T when it is NULL.
  • line 12: include segments that end with NULL.
with unpivoted as (
  select * from t
  unpivot include nulls (f for seg_start in(f as 1, t as -1))
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(f) over(partition by obj_id order by f) t,
  sum(seg_start) over(partition by obj_id order by f) num_segs
  from unpivoted u
)
where num_segs > 0 and (f < t or t is null);
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
1p 1 1 2 1 1 2 3
1p 1 3 1 1 2 3
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
2m 1 1 2 1 1 2 2
2m 1 2 1 1 2 2
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
3o 1 1 2 1 1 3 2
3o 1 2 3 2 1 3 2
3o 1 3 1 1 3 2
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
4f 1 1 2 1 1 2
4f 1 2 2 1 2
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
5c 1 1 2 1 1 2 3
5c 1 2 3 2 1 2 3
5c 1 3 1 1 2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
6s 1 1 2 2 1 2 1
6s 1 2 1 1 2 1
7e 0 1 2 2 1 2 1 2
7e 1 1 2 1 1

 

With “From” and “To” NULLs

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
commit;

This is why I don’t like to allow NULL in “from” columns. After we UNPIVOT, we have to sort the null “from” values first and the null “to” values last! As a workaround, I create a column called NULL_ORDER that is -1 when F is NULL, 1 when T is NULL and 0 otherwise. I then test or order by this column when necessary.

with unpivoted as (
  select u.*,
    case when seg_start =  1 and f is null then -1
         when seg_start = -1 and f is null then  1
         else 0
    end null_order
  from t
  unpivot include nulls (f for seg_start in(f as 1, t as -1)) u
)
select OBJ_ID, F, T, NUM_SEGS, AF, AT, BF, BT
from (
  select u.*,
  lead(null_order) over(partition by obj_id order by null_order, f) t_null_order,
  lead(f) over(partition by obj_id order by null_order, f) t,
  sum(seg_start) over(partition by obj_id order by null_order, f) num_segs
  from unpivoted u
)
where num_segs > 0 and (null_order < t_null_order or f < t);
OBJ_ID F T NUM_SEGS AF AT BF BT
1p 0 1 2 1 1 2 3 4
1p 0 3 4 1 1 2 3 4
1p 1 1 2 1 1 2 3  
1p 1 3   1 1 2 3  
1p 2   2 1   2 3  
1p 2 3   1   2 3  
2m 0 1 2 1 1 2 2 3
2m 0 2 3 1 1 2 2 3
2m 1 1 2 1 1 2 2  
2m 1 2   1 1 2 2  
2m 2   2 1   2 2  
2m 2 2   1   2 2  
3o 0 1 2 1 1 3 2 4
3o 0 2 3 2 1 3 2 4
3o 0 3 4 1 1 3 2 4
3o 1 1 2 1 1 3 2  
3o 1 2 3 2 1 3 2  
3o 1 3   1 1 3 2  
3o 2   2 1   3 2  
3o 2 2 3 2   3 2  
3o 2 3   1   3 2  
4f 0 1 2 1 1 3 2 3
4f 0 2 3 2 1 3 2 3
4f 1 1 2 1 1   2  
4f 1 2   2 1   2  
4f 2   2 1     2  
4f 2 2   2     2  
5c 0 1 2 1 1 4 2 3
5c 0 2 3 2 1 4 2 3
5c 0 3 4 1 1 4 2 3
5c 1 1 2 1 1   2 3
5c 1 2 3 2 1   2 3
5c 1 3   1 1   2 3
5c 2   2 1     2 3
5c 2 2 3 2     2 3
5c 2 3   1     2 3
6s 0 1 2 2 1 2 1 3
6s 0 2 3 1 1 2 1 3
6s 1 1 2 2 1 2 1  
6s 1 2   1 1 2 1  
6s 2   2 2   2    
6s 2 2   1   2    
7e 0 1 2 2 1 2 1 2
7e 1 1   2 1   1  
7e 2     2        

 
Next up: “joining” each segment to the original ranges.

Ranges with NULLs 04: Pack, Merge

In this post, I’m going to treat two closely related but distinct requirements:

  1. “Pack”: in a SELECT, return contiguous or overlapping ranges as one range.
  2. “Merge”: modify a table to reduce contiguous or overlapping ranges to one range.

These operations should take into account other data associated with each range: ranges are packed or merged only when the other data is the same.

Current list of posts: 

What is the meaning of this?

This heading sounds like something a teacher would say when he discovers students breaking the rules. Actually, I’m not trying to break rules, I’m trying to make some – or at least tell you what rules I am following.

  • In my world, the rule is that ranges have exclusive end points: this is called the “closed-open” type. It allows ranges that “meet” without gaps or overlaps.
  • I classify ranges in three categories:
    1. Neither “To” nor “From” may be NULL
    2. “To” may be NULL, but not “From”
    3. Both “From” and “To” may be NULL.
  • My data consists of an object, a “from” and “to” range –
    and optional “attributes” that are valid for the object only within the range.
  • In this post, I’ll use “Pack” to mean a SELECT that returns contiguous or overlapping ranges as one big range (any “attributes” have to be the same).
  • In this post, I’ll use “Merge” to mean a MERGE statement that changes a table so that contiguous or overlapping ranges become one big range.
  • I propose solutions that require database version 12.1 or later.

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution. This is the same data as in my previous post Ranges with NULLs 03: Gaps

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Pack non-null ranges

  • I partition by the object ID and also by any attributes I want to include. PARTITION BY, like GROUP BY, does a good job of comparing values when NULL might be involved: two NULLs are considered “the same”, and a NULL is considered different from a non-NULL value.
    For brevity, here the attributes for a given object are always the same.
  • In a DEFINE, the current row is assumed to belong to the condition being tested, so I cannot compare the current start date to the maximum prior end date. I can however compare the maximum end date to the next start date.
select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize( 
  partition by obj_id, AF, AT, BF, BT
  order by F, T
  measures first(F) F, max(T) T 
  pattern(a* b)
  define a as max(T) >= next(F)
);
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
2m 0 1 3 1 2 2 3
3o 0 1 4 1 3 2 4
4f 0 1 3 1 3 2 3
5c 0 1 4 1 4 2 3
6s 0 1 3 1 2 1 3
7e 0 1 2 1 2 1 2

 

Pack ranges with “To” NULL

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
commit;

In the MEASURES and DEFINE lines, I find out whether there has been a NULL “to” by comparing COUNT(*) to COUNT(T).

select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id, AF, AT, BF, BT
  order by F, T
  measures first(F) F, decode(count(*), count(T), max(T)) T
  pattern(a* b)
  define a as count(*) > count(T) or max(T) >= next(F)
)
order by 1, 2 nulls first, 3;
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
1p 1 1 2 1 2 3
1p 1 3 1 2 3
2m 0 1 3 1 2 2 3
2m 1 1 1 2 2
3o 0 1 4 1 3 2 4
3o 1 1 1 3 2
4f 0 1 3 1 3 2 3
4f 1 1 1 2
5c 0 1 4 1 4 2 3
5c 1 1 1 2 3
6s 0 1 3 1 2 1 3
6s 1 1 1 2 1
7e 0 1 2 1 2 1 2
7e 1 1 1 1

 

Pack with “From” and “To” NULLs

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
commit;

We just need to order by F nulls first, and in the DEFINE say nvl(next(F),T).

select obj_id, F, T, AF, AT, BF, BT
from t
match_recognize( 
  partition by obj_id, AF, AT, BF, BT
  order by F nulls first, T
  measures first(F) F, decode(count(*), count(T), max(T)) T 
  pattern(a* b)
  define a as count(*) > count(T) or max(T) >= nvl(next(F),T)
)
order by 1,2 nulls first, 3;
OBJ_ID F T AF AT BF BT
1p 0 1 2 1 2 3 4
1p 0 3 4 1 2 3 4
1p 1 1 2 1 2 3
1p 1 3 1 2 3
1p 2 2 2 3
1p 2 3 2 3
2m 0 1 3 1 2 2 3
2m 1 1 1 2 2
2m 2 2 2
3o 0 1 4 1 3 2 4
3o 1 1 1 3 2
3o 2 3 2
4f 0 1 3 1 3 2 3
4f 1 1 1 2
4f 2 2
5c 0 1 4 1 4 2 3
5c 1 1 1 2 3
5c 2 2 3
6s 0 1 3 1 2 1 3
6s 1 1 1 2 1
6s 2 2
7e 0 1 2 1 2 1 2
7e 1 1 1 1
7e 2

 

Merge with or without NULLs

Here I’ll use the same data.

  • I use the ROWID to identify each row and to allow direct access to the target rows in the MERGE step.
  • To take attributes into account, simply add them to the PARTITION BY list.
  • Since I use ALL ROWS PER MATCH, I must use the FINAL keyword to include the entire match in the aggregation functions.
  • I adjust the PATTERN (A+, not A*) to bypass rows that are already packed ranges.
  • I return all the rows to be packed, updating the first row to the new T and deleting the others.
merge into t o
using (
  select * from (
    select obj_id, f, t, rowid rid from t a
  )
  match_recognize( 
    partition by obj_id
    order by F nulls first, T
    measures count(*) cnt,
      decode(final count(*), final count(T), final max(T)) new_T 
    all rows per match
    pattern(a+ b)
    define a as count(*) > count(T) or max(T) >= nvl(next(F),T)
  )
) n
on (o.rowid = n.rid)
when matched then update set o.t = n.new_t
  delete where n.cnt > 1;

36 rows merged.

Next up: breaking down overlapping ranges into component segments.

Ranges with NULLs 03: Gaps

When I wrote “Gaps in Date Ranges: when are you free?“, I handled NULLs using “magic” values. This time I’ll try not to cheat.

[UPDATE 2018-11-08: I have never seen a situation where “from” could be NULL and “to” was NOT NULL, so I no longer test that. ]

Current list of posts: 

Test data with no NULLs allowed

This isn’t a full test harness, just the minimum to illustrate the solution.

create table t(
  obj_id varchar2(9), 
  f int, t int, af int, at int, bf int, bt int
);

Insert into T values ('1p    0',1,2,1,2,3,4);
Insert into T values ('1p    0',3,4,1,2,3,4);
Insert into T values ('2m    0',1,2,1,2,2,3);
Insert into T values ('2m    0',2,3,1,2,2,3);
Insert into T values ('3o    0',1,3,1,3,2,4);
Insert into T values ('3o    0',2,4,1,3,2,4);
Insert into T values ('4f    0',1,3,1,3,2,3);
Insert into T values ('4f    0',2,3,1,3,2,3);
Insert into T values ('5c    0',1,4,1,4,2,3);
Insert into T values ('5c    0',2,3,1,4,2,3);
Insert into T values ('6s    0',1,2,1,2,1,3);
Insert into T values ('6s    0',1,3,1,2,1,3);
Insert into T values ('7e    0',1,2,1,2,1,2);
Insert into T values ('7e    0',1,2,1,2,1,2);
  1. There are two rows = two ranges for each test case.
  2. OBJ_ID indicates the relation between the ranges: “precedes, meets, overlaps, finished by, contains, starts, equals”. The 0 at the end means there are no nulls.
  3. F is the “from” value.
  4. T is the “to” value (exclusive).
  5. So we can check our results more easily, each row contains the values for both ranges,
    1. AF and AT for the first range,
    2. BF and BT for the second range.

Analytic solution

select * from (
  select OBJ_ID,
    max(t) over(partition by obj_id order by f) start_gap,
    lead(f) over(partition by obj_id order by f) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4

 

The start of the gap is the biggest T so far, and the end of the gap is the next F – as long as MAX(T) is less than the next F.

Match_recognize solution

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F)
)
order by obj_id, start_gap;

 

When the “to” value can be null

The new test data is exactly the same as before, except that the highest “to” value in each test case has been replaced with NULL. The OBJ_ID of the test case ends in 1, not 0, to signal this.

Insert into T values ('1p    1',1,2,1,2,3,null);
Insert into T values ('1p    1',3,null,1,2,3,null);
Insert into T values ('2m    1',1,2,1,2,2,null);
Insert into T values ('2m    1',2,null,1,2,2,null);
Insert into T values ('3o    1',1,3,1,3,2,null);
Insert into T values ('3o    1',2,null,1,3,2,null);
Insert into T values ('4f    1',1,null,1,null,2,null);
Insert into T values ('4f    1',2,null,1,null,2,null);
Insert into T values ('5c    1',1,null,1,null,2,3);
Insert into T values ('5c    1',2,3,1,null,2,3);
Insert into T values ('6s    1',1,2,1,2,1,null);
Insert into T values ('6s    1',1,null,1,2,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);
Insert into T values ('7e    1',1,null,1,null,1,null);

The problem with “to” being null is that MAX(T) ignores NULL values. I have to know if there has been a NULL T along the way, because that means there can be no more gaps. I use SUM(DECODE(…)) in the analytic solution, and with MATCH_RECOGNIZE I compare COUNT(*) to COUNT(T): if they are not the same then there was a null value along the way.

select * from (
  select OBJ_ID,
    case sum(decode(t,null,1,0)) over(partition by obj_id order by f)
      when 0
        then max(t) over(partition by obj_id order by f)
    end start_gap,
    lead(f) over(partition by obj_id order by f) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap;

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F) and count(t) = count(*)
)
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4
1p 1 2 3 1 2 3

 

When “from” and “to” can both be null

In the new test data, OBJ_ID ends with 2 when the lowest F and the highest T are replaced by NULL.

The only additional change is to order by F NULLS FIRST.

Insert into T values ('1p    2',3,null,null,2,3,null);
Insert into T values ('1p    2',null,2,null,2,3,null);
Insert into T values ('2m    2',2,null,null,2,2,null);
Insert into T values ('2m    2',null,2,null,2,2,null);
Insert into T values ('3o    2',2,null,null,3,2,null);
Insert into T values ('3o    2',null,3,null,3,2,null);
Insert into T values ('4f    2',2,null,null,null,2,null);
Insert into T values ('4f    2',null,null,null,null,2,null);
Insert into T values ('5c    2',2,3,null,null,2,3);
Insert into T values ('5c    2',null,null,null,null,2,3);
Insert into T values ('6s    2',null,2,null,2,null,null);
Insert into T values ('6s    2',null,null,null,2,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);
Insert into T values ('7e    2',null,null,null,null,null,null);

select * from (
  select OBJ_ID,
    case sum(decode(t,null,1,0)) over(partition by obj_id order by f nulls first)
      when 0
        then max(t) over(partition by obj_id order by f nulls first)
    end start_gap,
    lead(f) over(partition by obj_id order by f nulls first) end_gap,
    AF, AT, BF, BT
  from t
)
where start_gap < end_gap;

select OBJ_ID, START_GAP, END_GAP, AF, AT, BF, BT
from t
match_recognize(
  partition by obj_id
  order by F nulls first, T
  measures max(T) start_gap, next(F) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(T) < next(F) and count(t) = count(*)
)
order by obj_id, start_gap;
OBJ_ID START_GAP END_GAP AF AT BF BT
1p 0 2 3 1 2 3 4
1p 1 2 3 1 2 3
1p 2 2 3 2 3

 
Next up: merging or “packing” ranges.