Merging Overlapping Date Ranges with MATCH_RECOGNIZE

I forgot to add a MATCH_RECOGNIZE solution to my last post on merging overlapping date ranges! That should take me just a few minutes, right? Wrong: it’s not that easy and here’s why.

For test data, please refer to the previous post.

To Merge or not to Merge?

The idea is to merge date ranges if they meet or overlap either partially or completely – in other words, we keep merging as long as there is not a gap. How do we know whether or not a gap exists?

  • We sort by start date, then end date.
  • The current row should be merged if its start date is less than or equal to
    the most recent end date, up to but not including the current row.
  • Again, the secret is to compare the most recent end date, not just the end date on the previous line.

The Analytics solution

With analytics, the “windowing clause” lets us do exactly what we want: when we say rows between unbounded preceding and 1 preceding then Oracle will apply the max() function to all the preceding rows, up to but not including the current row. We simply compare the current start date with that maximum preceding end date and we know whether or not to merge.

The MATCH_RECOGNIZE difficulty

MATCH_RECOGNIZE, available starting in version 12c, is pretty powerful and does things analytics can’t do, but it doesn’t really have a windowing clause. When doing comparisons, this can be a problem.

Let’s try a simple test:

select * from t 
match_recognize( 
  partition by test_case 
  order by start_date, end_date 
  measures match_number() match, max(end_date) max_date 
  all rows per match
  pattern(a+) 
  define a as start_date <= max(a.end_date)
)
where rownum <= 2; 

Here I’m comparing the start_date with the maximum date already assigned to A – no I’m not! That is not the way the DEFINE clause works. It assumes the current row belongs to A, then does the test based on that assumption. As a result, the end date from the current row is included in the computation. Look at the result:

TEST_CASE START_DATE END_DATE MATCH MAX_DATE
01:precedes 2015-06-01 2015-06-02 1 2015-06-02
01:precedes 2015-06-03 2015-06-04 1 2015-06-04

 

The “match number” is the same, which means the second line will be merged with the first – and all because the start date is being compared to the wrong end date.

The Workaround

MATCH_RECOGNIZE does give you access to other rows than the current row, using PREV, NEXT, FIRST and LAST. In this case, I could say LAST(end_date,1) and get the end date from the preceding matched line – or I could just say PREV(end_date) for the same result.

So how can I be sure that the preceding end date is the most recent? MATCH_RECOGNIZE doesn’t let you use analytic functions around LAST or PREV. I’ll just have to change the ORDER BY and sort by end date first. When I do, it finally works:

select * from t 
match_recognize( 
  partition by test_case 
  order by end_date, start_date 
  measures min(start_date) start_date, max(end_date) end_date 
  pattern(a b*) 
  define b as start_date <= last(end_date,1)
); 
TEST_CASE START_DATE END_DATE
01:precedes 2015-06-01 2015-06-02
01:precedes 2015-06-03 2015-06-04
02:meets 2015-06-01 2015-06-03
03:overlaps 2015-06-01 2015-06-04
04:finished by 2015-06-01 2015-06-03
05:contains 2015-06-01 2015-06-04
06:starts 2015-06-01 2015-06-03
07:equals 2015-06-01 2015-06-02
08:started by 2015-06-01 2015-06-03
09:during 2015-06-01 2015-06-04
10:finishes 2015-06-01 2015-06-03
11:overlapped by 2015-06-01 2015-06-04
12:met by 2015-06-01 2015-06-03
13:preceded by 2015-06-01 2015-06-02
13:preceded by 2015-06-03 2015-06-04

 
By the way, in my previous post I could have sorted by end date too and just used LAG(end_date) instead of the windowing clause.

Advertisements

2 thoughts on “Merging Overlapping Date Ranges with MATCH_RECOGNIZE

  1. Pingback: Log Buffer #427: A Carnival of the Vanities for DBAs | InsideMySQL

  2. Pingback: Log Buffer #427: A Carnival of the Vanities for DBAs | MySQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s