Splitting a Table into Rowid Ranges of Equal Size

In the OTN forum, Jonathan Lewis recently asked for an efficient SQL solution to split a table into 12 ROWID ranges containing the same (±1) number of blocks. I’m posting here a slightly cleaned-up version of my answer so I can change it if necessary after the question gets archived.

If you stop at the ANSWER clause, you will see the number of blocks in each bucket.

If you stop at the ROWIDS clause, you will get the ROWID ranges which are the point of the exercise.

The final SELECT is for testing purposes: it compares the number of rows in the table to the sum of the rows in each bucket. Obviously the two numbers had better be equal or else there is a bug. If you find a bug, let me know – gently ;)

define d_owner = 'SYS'  
define d_table_name = 'SOURCE$'
define d_num_buckets = 12
column obj# new_value d_object_id  
select obj# from sys.obj$  
where name = '&d_table_name'  
and owner# = (select user# from sys.user$ where name = '&d_owner');  

with extents as (   
  select block_id, file_id,
  sum(blocks) over() total_blocks,  
  sum(blocks) over() / &d_num_buckets blocks_per_bucket,
  sum(blocks) over(order by file_id, block_id) - blocks cumul_prev_blocks,   
  sum(blocks) over(order by file_id, block_id) cumul_current_blocks
  from dba_extents  
  where (owner, segment_name) = (('&d_owner', '&d_table_name'))
)
, extents_with_buckets as (   
  select 
  width_bucket(cumul_prev_blocks, .5, total_blocks + .5, &d_num_buckets) prev_bucket,
  width_bucket(cumul_prev_blocks+1, .5, total_blocks + .5, &d_num_buckets) first_bucket,
  width_bucket(cumul_current_blocks, .5, total_blocks + .5, &d_num_buckets) last_bucket,   
  e.* from extents e   
)
, selected_extents as (
  select * from extents_with_buckets
  where cumul_current_blocks = round(last_bucket*blocks_per_bucket)
  or prev_bucket < last_bucket
)
, expanded_extents as (   
  select first_bucket + level - 1 bucket,
  case level when 1 then cumul_prev_blocks
    else round((first_bucket + level - 2) * blocks_per_bucket) 
  end start_blocks,   
  case first_bucket + level - 1 when last_bucket then cumul_current_blocks - 1
    else round((first_bucket + level - 1) * blocks_per_bucket) - 1
  end end_blocks,
  e.*
  from selected_extents e
  connect by cumul_prev_blocks = prior cumul_prev_blocks   
  and first_bucket + level -1 <= last_bucket   
  and prior sys_guid() is not null
)
, answer as ( 
  select bucket,
  min(file_id) first_file_id,  
  min(block_id + start_blocks - cumul_prev_blocks)   
    keep (dense_rank first order by cumul_prev_blocks) first_block_id,
  max(file_id) last_file_id,  
  max(block_id + end_blocks - cumul_prev_blocks)   
    keep (dense_rank last order by cumul_prev_blocks) last_block_id,  
  max(end_blocks) + 1 - min(start_blocks) blocks   
  from expanded_extents   
  group by bucket 
)
, rowids as (
  select
  dbms_rowid.rowid_create(
    1, &d_object_id, first_file_id, first_block_id, 0
  ) rowid_start,   
  dbms_rowid.rowid_create(
    1, &d_object_id, last_file_id, last_block_id, 9999
  ) rowid_end   
  from answer
  order by bucket
)
select
'select count(*) cnt from &d_owner..&d_table_name union all select sum(cnt) from (' txt from dual
union all
select 'select count(*) cnt from &d_owner..&d_table_name where rowid between chartorowid('''
|| rowid_start || ''') and chartorowid(''' || rowid_end || ''')'
|| case when lead(rowid_start) over(order by rowid_start) is null then ');'
  else ' union all'
end test_sql
from rowids;
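
To see the boundary arithmetic in isolation, here is a minimal Python sketch (the function name and the sample extent sizes are my own illustration, not part of the query): it cuts the cumulative block count at round(k * total / n), which is the same split the WIDTH_BUCKET and ROUND calls above produce.

```python
def split_blocks(extent_blocks, num_buckets):
    # extent_blocks: sizes (in blocks) of the segment's extents, in order.
    # Bucket k (1-based) ends at cumulative block round(k * total / n).
    # Note: Python's round() halves-to-even, unlike Oracle ROUND, but no
    # exact .5 boundary occurs in this example.
    total = sum(extent_blocks)
    boundaries = [round(k * total / num_buckets)
                  for k in range(1, num_buckets + 1)]
    # bucket sizes are the differences between consecutive boundaries
    return [boundaries[0]] + [boundaries[k] - boundaries[k - 1]
                              for k in range(1, num_buckets)]

sizes = split_blocks([8, 8, 8, 128, 128], 12)
assert len(sizes) == 12
assert sum(sizes) == 280          # every block lands in exactly one bucket
assert max(sizes) - min(sizes) <= 1   # buckets are equal +-1 block
```

The SQL does the same thing, except that it must also map each cumulative block position back to a file and block ID through the extent map.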

MATCH_RECOGNIZE: gaps in date ranges

To find gaps between date ranges, there is an elegant solution using analytic functions. We can follow the same logic with MATCH_RECOGNIZE, using a technique that avoids the problems with MIN() and MAX() comparisons I just wrote about.

Here is some simple test data:

create table t ( start_date date, end_date date );
Insert into T values (DATE '2007-01-01', DATE '2007-01-15');
Insert into T values (DATE '2007-01-03', DATE '2007-01-10');
Insert into T values (DATE '2007-01-12', DATE '2007-01-25');
Insert into T values (DATE '2007-01-20', DATE '2007-02-01');
Insert into T values (DATE '2007-02-05', DATE '2007-02-10');
Insert into T values (DATE '2007-02-05', DATE '2007-02-28');
Insert into T values (DATE '2007-02-10', DATE '2007-02-15');
Insert into T values (DATE '2007-03-01', DATE '2007-03-02');
Insert into T values (DATE '2007-03-03', DATE '2007-03-16');

Please note that I use exclusive end dates: the second-to-last range ends when March 2nd starts, so there should be a gap from March 2nd to March 3rd.

Using analytics, I compare the maximum end date so far to the next start date:

SELECT * FROM (
  SELECT MAX(end_date) OVER (ORDER BY start_date) start_gap,
  LEAD(start_date) OVER (ORDER BY start_date) end_gap
  FROM T
)
WHERE start_gap < end_gap;
START_GAP END_GAP
2007-02-01 2007-02-05
2007-02-28 2007-03-01
2007-03-02 2007-03-03

That is so simple – once you think of it.

Looking for a MATCH_RECOGNIZE solution, I tried to compare the current start date to the maximum prior end date. I ran into the problem I wrote about previously: when you use MAX() or other aggregates in a comparison, the current row is included.

I finally realized that if I compared the maximum end date to the next start date, the problem disappeared.

select start_gap, end_gap from t
match_recognize(
  order by start_date
  measures max(end_date) start_gap, next(start_date) end_gap
  all rows per match
  pattern((A|{-B-})+)
  define A as max(end_date) < next(start_date)
);

For those less familiar with the syntax:

  • Every row that has a gap with the following row gets assigned to “A”, and all the other rows get assigned to “B”.
  • Normally, all rows per match will output every row, but {-B-} excludes the “B” rows from the output.

This code is logically equivalent to the analytic solution above. It also demonstrates how to avoid the knotty problem of how to exclude the current row from aggregates:

  1. Instead of comparing the aggregate to the current row, just back up a row.
  2. Now the aggregate contains exactly the rows you want.
  3. So compare that aggregate to the value of the row you care about, which is the “next” row.
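
The steps above can be sketched procedurally. This is a hedged Python rendition of the same logic (a hypothetical helper, not part of the SQL; ISO date strings are used so they compare correctly as text):

```python
def find_gaps(ranges):
    # ranges: (start, end) pairs with exclusive end dates, as ISO strings
    gaps = []
    max_end = None
    for start, end in sorted(ranges):
        # gap: the next range starts after the maximum end date so far
        if max_end is not None and start > max_end:
            gaps.append((max_end, start))
        max_end = end if max_end is None else max(max_end, end)
    return gaps

gaps = find_gaps([
    ("2007-01-01", "2007-01-15"), ("2007-01-03", "2007-01-10"),
    ("2007-01-12", "2007-01-25"), ("2007-01-20", "2007-02-01"),
    ("2007-02-05", "2007-02-10"), ("2007-02-05", "2007-02-28"),
    ("2007-02-10", "2007-02-15"), ("2007-03-01", "2007-03-02"),
    ("2007-03-03", "2007-03-16"),
])
assert gaps == [("2007-02-01", "2007-02-05"),
                ("2007-02-28", "2007-03-01"),
                ("2007-03-02", "2007-03-03")]
```

Running it on the article’s test data reproduces the three gaps shown in the analytic output above.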

MATCH_RECOGNIZE: matching based on aggregates

The heart of row pattern matching is finding which row matches what part of the pattern. Within the 12c MATCH_RECOGNIZE clause, DEFINE lists the conditions a row may meet; it doesn’t always work the way you expect, especially if you use aggregates in the condition.

How DEFINE works

The PATTERN clause lists the conditions that have to be met by a series of rows in order to match the pattern. The DEFINE clause defines each condition. For example:

PATTERN (A B)
DEFINE
  A as A.SOME_COLUMN = 'First row',
  B as A.SOME_COLUMN = 'First row' and B.SOME_COLUMN = 'Second row'

Do you see something strange here?

  • In the “A” condition, A.SOME_COLUMN refers to the row being evaluated.
  • In the “B” condition, A.SOME_COLUMN refers to a row that has already been accepted as meeting the “A” condition. B.SOME_COLUMN refers to the row being evaluated.

DEFINE starts by assuming the current row meets the condition. Then it evaluates the condition. If TRUE, the current row gets assigned. If FALSE, the current row does not get assigned to that condition.

Conditions with Aggregates

When you use an aggregate in DEFINE, you have to know when the aggregate will include the current row being evaluated. Here is an example:

PATTERN (A B)
DEFINE
  A as count(B.*) = 0,
  B as count(B.*) = 0

When the first row is evaluated, there are no B rows yet, and the current row is assumed to be part of “A”, so the “A” condition will be met. When the “B” condition is evaluated, the current row is assumed to be part of “B”, so count(B.*) will always be greater than 0 and the “B” condition will never be met.

Excluding the current row

If you use an aggregate that includes the current row’s value, and you want that row excluded, you have to do it yourself:

  • For COUNT(), subtract 1
  • For SUM(), subtract the value of the current row.
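
A toy Python illustration of the same idea (the values are my own): compute the running aggregates including the current value, then back the current value out.

```python
values = [5, 3, 8]
running = []
total = count = 0
for v in values:
    total += v
    count += 1
    # aggregates excluding the current row: subtract it back out,
    # like count(*) - 1 and sum(x) - x in a DEFINE condition
    running.append((count - 1, total - v))
assert running == [(0, 0), (1, 5), (2, 8)]
```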

What about MIN() or MAX()?

I hope I haven’t lost you already, because the hard part is coming up. It’s also the interesting part, so take a deep breath and read on.

You can’t just “exclude” the current row from a MIN() or MAX() function. You have to keep track of which row has the minimum or maximum value, and then refer to that row. Here’s an example.

Let’s say I have stock market summaries each day: for each stock, I get the date, the closing price and the quantity traded.

create table ticker (
  stock_id integer,
  trade_date date,
  closing_price number not null,
  qty integer not null,
  primary key(stock_id, trade_date)
);
insert into ticker select 
1, sysdate, 20, 100 from dual union all select
1, sysdate+1, 30, 90 from dual union all select
1, sysdate+2, 15, 200 from dual union all select
1, sysdate+3, 14, 80 from dual union all select
1, sysdate+4, 20, 200 from dual;
STOCK_ID TRADE_DATE CLOSING_PRICE QTY
1 2015-07-02 07:23:37 20 100
1 2015-07-03 07:23:37 30 90
1 2015-07-04 07:23:37 15 200
1 2015-07-05 07:23:37 14 80
1 2015-07-06 07:23:37 20 200

 
I want to know when the stock hits a new low in price, or quantity, or both. To do this, I need to compare the current row’s price and quantity with the minimums up to but not including the current row.

I do this by defining four conditions:

  • PQ: a row containing a new low price and a new low quantity;
  • P: row with a new low price but not a new low quantity;
  • Q: row with a new low quantity but not a new low price;
  • OK: row without any new lows.

I also define two SUBSETs:

  • LOWP includes PQ and P rows: the most recent LOWP row has the minimum price so far.
  • LOWQ includes PQ and Q rows: the most recent LOWQ row has the minimum quantity so far.

Now, how do I access the row just before the current one? If I compare price to LAST(LOWP.price), I will be comparing the current price to itself.

To get the prior row, I just say LAST(LOWP.price,1). This backs up exactly one row.

select * from ticker
match_recognize(
  partition by stock_id order by trade_date
  measures classifier() new_low
  all rows per match
  pattern(pq (pq|p|q|ok)*)
  subset lowp = (pq,p), lowq = (pq,q)
  define
    pq as count(*) = 1 or (
      closing_price < last(lowp.closing_price,1) and qty < last(lowq.qty,1)
    ),
    p as closing_price < last(lowp.closing_price,1),
    q as qty < last(lowq.qty,1)
);
STOCK_ID TRADE_DATE NEW_LOW CLOSING_PRICE QTY
1 2015-07-02 07:23:37 PQ 20 100
1 2015-07-03 07:23:37 Q 30 90
1 2015-07-04 07:23:37 P 15 200
1 2015-07-05 07:23:37 PQ 14 80
1 2015-07-06 07:23:37 OK 20 200
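
The same classification can be sketched outside MATCH_RECOGNIZE. Here is a hedged Python rendition of the logic (a helper of my own, not the pattern-matching machinery): keep the running minimum price and quantity seen *before* the current row, then label each row.

```python
def classify_lows(rows):
    # rows: (closing_price, qty) pairs in trade-date order
    labels, min_price, min_qty = [], None, None
    for price, qty in rows:
        if min_price is None:
            labels.append("PQ")            # first row sets both lows
        else:
            low_p, low_q = price < min_price, qty < min_qty
            labels.append("PQ" if low_p and low_q
                          else "P" if low_p
                          else "Q" if low_q else "OK")
        min_price = price if min_price is None else min(min_price, price)
        min_qty = qty if min_qty is None else min(min_qty, qty)
    return labels

assert classify_lows([(20, 100), (30, 90), (15, 200), (14, 80), (20, 200)]) \
    == ["PQ", "Q", "P", "PQ", "OK"]
```

The SUBSET trick in the SQL plays the role of the two running minimums here: the most recent LOWP row carries the minimum price, and the most recent LOWQ row the minimum quantity.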

 

Merging Overlapping Date Ranges with MATCH_RECOGNIZE

I forgot to add a MATCH_RECOGNIZE solution to my last post on merging overlapping date ranges! That should take me just a few minutes, right? Wrong: it’s not that easy and here’s why.

For test data, please refer to the previous post.

To Merge or not to Merge?

The idea is to merge date ranges if they meet or overlap either partially or completely – in other words, we keep merging as long as there is not a gap. How do we know whether or not a gap exists?

  • We sort by start date, then end date.
  • The current row should be merged if its start date is less than or equal to
    the most recent end date, up to but not including the current row.
  • Again, the secret is to compare the most recent end date, not just the end date on the previous line.

The Analytics solution

With analytics, the “windowing clause” lets us do exactly what we want: when we say rows between unbounded preceding and 1 preceding then Oracle will apply the max() function to all the preceding rows, up to but not including the current row. We simply compare the current start date with that maximum preceding end date and we know whether or not to merge.

The MATCH_RECOGNIZE difficulty

MATCH_RECOGNIZE, available starting in version 12c, is pretty powerful and does things analytics can’t do, but it doesn’t really have a windowing clause. The lack of one can be a problem when doing comparisons.

Let’s try a simple test:

select * from t 
match_recognize( 
  partition by test_case 
  order by start_date, end_date 
  measures match_number() match, max(end_date) max_date 
  all rows per match
  pattern(a+) 
  define a as start_date <= max(a.end_date)
)
where rownum <= 2; 

Here I’m comparing the start_date with the maximum date already assigned to A – no I’m not! That is not the way the DEFINE clause works. It assumes the current row belongs to A, then does the test based on that assumption. As a result, the end date from the current row is included in the computation. Look at the result:

TEST_CASE START_DATE END_DATE MATCH MAX_DATE
01:precedes 2015-06-01 2015-06-02 1 2015-06-02
01:precedes 2015-06-03 2015-06-04 1 2015-06-04

 

The “match number” is the same, which means the second line will be merged with the first – and all because the start date is being compared to the wrong end date.

The Workaround

MATCH_RECOGNIZE does give you access to rows other than the current row, using PREV, NEXT, FIRST and LAST. In this case, I could say LAST(end_date,1) and get the end date from the preceding matched line – or I could just say PREV(end_date) for the same result.

So how can I be sure that the preceding end date is the most recent? MATCH_RECOGNIZE doesn’t let you use functions around LAST or PREV. I’ll just have to change the ORDER BY and sort by end date first. When I do, it finally works:

select * from t 
match_recognize( 
  partition by test_case 
  order by end_date, start_date 
  measures min(start_date) start_date, max(end_date) end_date 
  pattern(a b*) 
  define b as start_date <= last(end_date,1)
); 
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

 
By the way, in my previous post I could have sorted by end date too and just used LAG(end_date) instead of the windowing clause.

Merging Overlapping Date Ranges

A recent question on the OTN forum asked about merging date ranges. Thanks to Karthick_App I realized that my previous blog on this subject was incomplete. Here is a solution that will merge any date ranges that meet, or that “overlap” in any way.

Define your terms

As a reminder, Allen’s Interval Algebra defines 13 possible relations between date ranges.

In the diagram below, each range is shown as start day-end day of the month (days 1 through 4):

Relation             Range A   Range B
1: precedes          1-2       3-4
2: meets             1-2       2-3
3: overlaps          1-3       2-4
4: finished by       1-3       2-3
5: contains          1-4       2-3
6: starts            1-2       1-3
7: equals            1-2       1-2
8: started by        1-3       1-2
9: during            2-3       1-4
10: finishes         2-3       1-3
11: overlapped by    2-4       1-3
12: met by           2-3       1-2
13: preceded by      3-4       1-2

 

As you can see, Allen uses “overlap” only when there is a partial overlap. The OTN questioner really wanted to merge any date ranges that did not have a gap, in other words every case except “precedes” and “preceded by”.

To test this, I’ll create a little table with 13 test cases:

drop table t purge;
create table t (
  test_case varchar2(32) not null,
  start_date date not null,
  end_date date not null
);
Insert into t values ('01:precedes',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('01:precedes',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('02:meets',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('02:meets',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('03:overlaps',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('04:finished by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('04:finished by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('05:contains',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('05:contains',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('06:starts',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('07:equals',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('08:started by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('09:during',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('09:during',to_date('01','DD'),to_date('04','DD'));
Insert into t values ('10:finishes',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('10:finishes',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('11:overlapped by',to_date('02','DD'),to_date('04','DD'));
Insert into t values ('11:overlapped by',to_date('01','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('02','DD'),to_date('03','DD'));
Insert into t values ('12:met by',to_date('01','DD'),to_date('02','DD'));
Insert into t values ('13:preceded by',to_date('03','DD'),to_date('04','DD'));
Insert into t values ('13:preceded by',to_date('01','DD'),to_date('02','DD'));
commit;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-02
02:meets 2015-06-02 2015-06-03
03:overlaps 2015-06-01 2015-06-03
03:overlaps 2015-06-02 2015-06-04
04:finished by 2015-06-01 2015-06-03
04:finished by 2015-06-02 2015-06-03
05:contains 2015-06-01 2015-06-04
05:contains 2015-06-02 2015-06-03
06:starts 2015-06-01 2015-06-02
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
08:started by 2015-06-01 2015-06-02
09:during 2015-06-02 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-02 2015-06-03
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-02 2015-06-04
11:overlapped by 2015-06-01 2015-06-03
12:met by 2015-06-02 2015-06-03
12:met by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04
13:preceded by 2015-06-01 2015-06-02

The solution

I wrote about the “start of group” method some time ago.

  • The first step is to assign 1 to each row that starts a group and 0 to every other row.
    This is the tricky part: I partition by test_case and order by start_date and end_date, and start a group only when the current start_date is greater than any preceding end_date.
  • The next step is to assign a group to each row using a running sum of the 1s and 0s.
  • Finally, I do the GROUP BY, taking MIN(start_date) and MAX(end_date) per group.

with grp_starts as (
  select test_case, start_date, end_date,
  case
    when start_date > max(end_date) over(
      partition by test_case order by start_date, end_date
      rows between unbounded preceding and 1 preceding
    )
    then 1 else 0
  end grp_start
  from t
)
, grps as (
  select test_case, start_date, end_date,
  sum(grp_start) over(
    partition by test_case order by start_date, end_date
  ) grp
  from grp_starts
)
select test_case,
min(start_date) start_date,
max(end_date) end_date
from grps
group by test_case, grp
order by 1, 2;
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04
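
The same merge can be sketched in Python. This is a condensed rendition of the start-of-group idea (a helper of my own): sort by start then end date, and start a new group whenever the current start date is greater than the running maximum end date of the current group.

```python
def merge_ranges(ranges):
    # ranges: (start, end) pairs with exclusive end dates
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # meets or overlaps: extend the current group's max end date
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])   # gap: start a new group
    return [tuple(r) for r in merged]

# the four basic shapes from the test cases (days as strings '01'..'04')
assert merge_ranges([("01", "02"), ("03", "04")]) == [("01", "02"), ("03", "04")]
assert merge_ranges([("01", "02"), ("02", "03")]) == [("01", "03")]
assert merge_ranges([("01", "03"), ("02", "04")]) == [("01", "04")]
assert merge_ranges([("02", "03"), ("01", "04")]) == [("01", "04")]
```

In the SQL, merged[-1][1] corresponds to the MAX(end_date) window running over the preceding rows of the group.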

Conclusion

Once again, the “start of group” method demonstrates its power, combined with a windowing clause in the analytic function that lets us identify the starting row of each group. Thanks again to Karthick_App for noticing that my first solution was incomplete.

And don’t forget: this doesn’t work if the end date can be NULL. If you allow that, substitute DATE '9999-12-31' for the NULL at the beginning of the query, then translate it back to NULL at the end.

Join Tables on Date Ranges

A recent question on the OTN SQL forum asked how best to join two tables related by ID and date range, in order to insert one row per date into a data warehouse. One solution was to expand the data from each table, creating one row per date, then join on date. I think it’s more efficient to join on date range, then expand. Here’s how.

Input Tables and Requirement

I am going to simplify a bit from the OTN setup.

The first table contains an ID, a start date and an end date. (I recommend always using exclusive end dates.) The final output should have one row for every date within the date range.

create table table_1 (
  id number primary key,
  start_date date not null check (start_date = trunc(start_date)),
  end_date date not null check (end_date = trunc(end_date)),
  check (end_date > start_date)
);
TABLE_1                            Output
ID  START_DATE  END_DATE           EACH_DATE
1   2015-03-01  2015-03-05         2015-03-01
                                   2015-03-02
                                   2015-03-03
                                   2015-03-04

 

The second table contains the ID and a “review date” that assigns a “group ID”. In the output, each row should contain the data from the most recent review – or null if there was no prior review.

create table table_2 (
  id number,
  review_date date not null check (review_date = trunc(review_date)),
  group_id varchar2(20) not null,
  primary key (id, review_date)
);
TABLE_1                            TABLE_2
ID  START_DATE  END_DATE           ID  REVIEW_DATE  GROUP_ID
1   2015-03-01  2015-03-05         1   2015-03-02   First
                                   1   2015-03-04   Second

Output
EACH_DATE   REVIEW_DATE  GROUP_ID
2015-03-01  (null)       (null)
2015-03-02  2015-03-02   First
2015-03-03  2015-03-02   First
2015-03-04  2015-03-04   Second

 

Note: the most recent review is used even if that review took place before the start date in table 1.

The algorithm: join, then expand

If you look at the pretty table just above, you might think that I need to generate the green rows, then join each table to them. That means joining three sources. Since each ID can have a date range spanning several months, the extra work and memory required are not negligible.

So let’s try to join the two tables directly, based on intersecting date ranges:

  1. TABLE_1 already has a start date and an end date.
  2. TABLE_2 has a review date, which starts a range.
    1. I use the LEAD() analytic function to get the next review date, which is the end of the range.
    2. To identify the first row for each ID, I use the LAG() analytic function to get the previous review date (which will be NULL on the first row). We’ll see why.
  3. I join the tables:
    when the review date is earlier than the end date,
    and the next review date is either null or later than the start date.
    It’s a LEFT JOIN since I produce rows whether the ID is in TABLE_2 or not.
  4. Now I have to create a new date range that represents just the intersection.
    1. The beginning is the later of start date and review date – except if the previous review date is null.
      In that case this is the first row for an ID, so the beginning is the start date.
    2. The end is the earlier of end date and next review date.
  5. Finally I expand the rows
    1. When expanding the first row, I may find a start date that is earlier than the review date (see step 4.1 above).
      For days earlier than the review date, I output a row with a null review date and a null group id.
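
The steps above can be sketched as a hedged Python helper (names of my own choosing; dates are modeled as integer day numbers for brevity):

```python
def join_then_expand(t1_row, reviews):
    # t1_row: (id, start, end); reviews: (review_date, group_id) sorted by date
    rid, start, end = t1_row
    joined = []
    for i, (rd, gid) in enumerate(reviews):
        next_rd = reviews[i + 1][0] if i + 1 < len(reviews) else None
        # join condition: review before the end date, and next review
        # (if any) after the start date
        if rd < end and (next_rd is None or next_rd > start):
            # the first review's range goes back to start_date,
            # even when the review began later (step 4.1)
            range_start = start if i == 0 else max(start, rd)
            range_end = end if next_rd is None else min(end, next_rd)
            joined.append((rd, gid, range_start, range_end))
    if not joined:                        # LEFT JOIN: keep IDs with no review
        joined = [(None, None, start, end)]
    # expand: one row per day; null out the review before its own date
    return [(rid, day, rd, gid) if rd is not None and day >= rd
            else (rid, day, None, None)
            for rd, gid, rs, rend in joined
            for day in range(rs, rend)]

# ID 3 ("overlaps"): the day before the review gets a null review
assert join_then_expand((3, 1, 3), [(2, "overlaps")]) == [
    (3, 1, None, None), (3, 2, 2, "overlaps")]
# ID 0: no review at all
assert join_then_expand((0, 1, 2), []) == [(0, 1, None, None)]
```

This mirrors the shape of the SQL below: the loop over reviews plays the part of the LEFT JOIN on date ranges, and the final list comprehension plays the part of the row-expanding TABLE(...CONNECT BY...) step.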

Test data

Allen’s Interval Algebra defines 13 possible relations between date ranges.

In the diagram below, each range is shown as start day-end day of the month (days 1 through 4):

Relation             Range A   Range B
1: precedes          1-2       3-4
2: meets             1-2       2-3
3: overlaps          1-3       2-4
4: finished by       1-3       2-3
5: contains          1-4       2-3
6: starts            1-2       1-3
7: equals            1-2       1-2
8: started by        1-3       1-2
9: during            2-3       1-4
10: finishes         2-3       1-3
11: overlapped by    2-4       1-3
12: met by           2-3       1-2
13: preceded by      3-4       1-2

In my test data, ID 0 has no review date. IDs 1 through 13 have one review date, and IDs 14 through 26 have two. The review dates correspond to the start and end dates for the green rows in the table.

alter session set nls_date_format='DD';

insert into table_1 select
0, to_date(1),to_date(2) from dual union all select
1, to_date(1),to_date(2) from dual union all select
2, to_date(1),to_date(2) from dual union all select
3, to_date(1),to_date(3) from dual union all select
4, to_date(1),to_date(3) from dual union all select
5, to_date(1),to_date(4) from dual union all select
6, to_date(1),to_date(2) from dual union all select
7, to_date(1),to_date(2) from dual union all select
8, to_date(1),to_date(3) from dual union all select
9, to_date(2),to_date(3) from dual union all select
10, to_date(2),to_date(3) from dual union all select
11, to_date(2),to_date(4) from dual union all select
12, to_date(2),to_date(3) from dual union all select
13, to_date(3),to_date(4) from dual;

insert into table_1
select id+13, start_date, end_date
from table_1
where id > 0;

insert into table_2 select
1, to_date(3),'precedes' from dual union all select
2, to_date(2),'meets' from dual union all select
3, to_date(2),'overlaps' from dual union all select
4, to_date(2),'finished by' from dual union all select
5, to_date(2),'contains' from dual union all select
6, to_date(1),'starts' from dual union all select
7, to_date(1),'equals' from dual union all select
8, to_date(1),'started by' from dual union all select
9, to_date(1),'during' from dual union all select
10, to_date(1),'finishes' from dual union all select
11, to_date(1),'overlapped by' from dual union all select
12, to_date(1),'met by' from dual union all select
13, to_date(1),'preceded by' from dual;

insert into table_2
select id+13, review_date, group_id from table_2;

insert into table_2
select id,
review_date +
case when group_id = 'during' then 3
  when group_id in ('overlaps','starts','finishes','overlapped by') then 2
  else 1
end,
'after ' || group_id
from table_2
where id > 13;
commit;

alter session set nls_date_format='yyyy-mm-dd';

The solution (finally!)

select id, column_value each_date, start_date, end_date, 
  case when column_value >= review_date then review_date end review_date, 
  case when column_value >= review_date then group_id end group_id
from (
  select a.id, start_date, end_date, review_date, group_id,
  case when prev_rd is null
    then start_date
    else greatest(start_date, review_date)
  end range_start,
  case when end_rd is null
    then end_date
    else least(end_date, end_rd)
  end range_end
  from table_1 a
  left join (
    select review_date, group_id, id,
    lead(review_date) over (partition by id order by review_date) end_rd,
    lag(review_date) over (partition by id order by review_date) prev_rd
    from table_2
  ) b
  on a.id = b.id
    and start_date < coalesce(end_rd, end_date)
    and end_date > review_date
) 
,
table(cast(multiset(
  select range_start-1+level from dual
  connect by range_start-1+level < range_end
) as sys.odcidatelist))
order by id, each_date;
  • The inline view on TABLE_2 uses LEAD() and LAG() to get the next and previous review dates.
  • The LEFT JOIN matches on ID, with the end date greater than the review date and the start date less than the next review date. If the next review date is null, we are on the last review, so we go all the way to the end date.
  • The middle query calculates the date range for each joined row.
  • The outer query leaves the review date and group id columns NULL if EACH_DATE is earlier than the first review date.
ID EACH_DATE START_DATE END_DATE REVIEW_DATE GROUP_ID
0 2015-03-01 2015-03-01 2015-03-02
1 2015-03-01 2015-03-01 2015-03-02
2 2015-03-01 2015-03-01 2015-03-02
3 2015-03-01 2015-03-01 2015-03-03
3 2015-03-02 2015-03-01 2015-03-03 2015-03-02 overlaps
4 2015-03-01 2015-03-01 2015-03-03
4 2015-03-02 2015-03-01 2015-03-03 2015-03-02 finished by
5 2015-03-01 2015-03-01 2015-03-04
5 2015-03-02 2015-03-01 2015-03-04 2015-03-02 contains
5 2015-03-03 2015-03-01 2015-03-04 2015-03-02 contains
6 2015-03-01 2015-03-01 2015-03-02 2015-03-01 starts
7 2015-03-01 2015-03-01 2015-03-02 2015-03-01 equals
8 2015-03-01 2015-03-01 2015-03-03 2015-03-01 started by
8 2015-03-02 2015-03-01 2015-03-03 2015-03-01 started by
9 2015-03-02 2015-03-02 2015-03-03 2015-03-01 during
10 2015-03-02 2015-03-02 2015-03-03 2015-03-01 finishes
11 2015-03-02 2015-03-02 2015-03-04 2015-03-01 overlapped by
11 2015-03-03 2015-03-02 2015-03-04 2015-03-01 overlapped by
12 2015-03-02 2015-03-02 2015-03-03 2015-03-01 met by
13 2015-03-03 2015-03-03 2015-03-04 2015-03-01 preceded by
14 2015-03-01 2015-03-01 2015-03-02
15 2015-03-01 2015-03-01 2015-03-02
16 2015-03-01 2015-03-01 2015-03-03
16 2015-03-02 2015-03-01 2015-03-03 2015-03-02 overlaps
17 2015-03-01 2015-03-01 2015-03-03
17 2015-03-02 2015-03-01 2015-03-03 2015-03-02 finished by
18 2015-03-01 2015-03-01 2015-03-04
18 2015-03-02 2015-03-01 2015-03-04 2015-03-02 contains
18 2015-03-03 2015-03-01 2015-03-04 2015-03-03 after contains
19 2015-03-01 2015-03-01 2015-03-02 2015-03-01 starts
20 2015-03-01 2015-03-01 2015-03-02 2015-03-01 equals
21 2015-03-01 2015-03-01 2015-03-03 2015-03-01 started by
21 2015-03-02 2015-03-01 2015-03-03 2015-03-02 after started by
22 2015-03-02 2015-03-02 2015-03-03 2015-03-01 during
23 2015-03-02 2015-03-02 2015-03-03 2015-03-01 finishes
24 2015-03-02 2015-03-02 2015-03-04 2015-03-01 overlapped by
24 2015-03-03 2015-03-02 2015-03-04 2015-03-03 after overlapped by
25 2015-03-02 2015-03-02 2015-03-03 2015-03-02 after met by
26 2015-03-03 2015-03-03 2015-03-04 2015-03-02 after preceded by

 

1 Uncool Thing about the COMPARE_SYNC Package

It only works in version 12 :( That’s right, I didn’t test in previous versions and there was a bug. Please refer to the COMPARE_SYNC post for the new, improved version. Here is what I changed:

  1. The name is now COMPARE_SYNC. I don’t know what got into me to put those four meaningless letters in front.
  2. The column lists are formatted so that no line is longer than 80 characters.
  3. The column lists are now CLOBs, so there is no artificial limit on the number of columns.
  4. When generating the column lists, I query ALL_TAB_COLS differently for version 12 and previous versions. In version 12, I query the new USER_GENERATED column, and in previous versions I query HIDDEN_COLUMN instead.