Overlapping ranges with priority

A few years ago, Alberto Dell’Era blogged about product prices with overlapping date ranges; “on an overlapping range, the strongest priority (lower number) wins.” His analysis, explanation and solution are all excellent. I’m revisiting the subject to suggest a few improvements (and mention MATCH_RECOGNIZE yet again).

To explain more easily, Alberto uses letters (a,b,c,d,e) instead of dates, and in the code he changes the letters to numbers using the ascii() function. (The logic is the same, whether the ranges are dates or something else.) He uses the term sku as a product identifier. “Prices of skus (i.e. goods) have validity ranges and can overlap; on an overlapping range, the strongest priority (lower number) wins.” Sample input:

b______(0,$200)_______d
b__________c______(1,$300)_______e

Notice that d and e are not colored, because they are not included in the range. Since priority 0 is stronger than 1, the expected output is:

b_______($200)________d
b__________c__________d__($300)__e

The Algorithm

Step Expected Output
1) Get distinct start and end points b,c,d,e
2) Create non-overlapping base ranges (Alberto has a not null end point, but in general nulls may be useful.) b__________c
b__________c__________d
b__________c__________d__________e
b__________c__________d__________e_(null)
3) Assign price/priority to base ranges b_(0,$200)_c
b__________c_(0,$200)_d
b__________c_(1,$300)_d
b__________c__________d_(1,$300)_e
4) For each base range, keep highest priority b__($200)__c
b__________c__($200)__d
b__________c__________d__($300)__e
5) Merge consecutive ranges with same price b_______($200)________d
b__________c__________d__($300)__e

1) Get distinct start and end points

Alberto uses UNION to get the start points, then the end points.

with instants as (
  select sku, a as i from ranges
  union
  select sku, b as i from ranges
),

This requires two scans of the index. I suggest using UNPIVOT, which does the work with one scan. I’ll show you that next.

2) Create non-overlapping base ranges

Alberto does this with the analytic LEAD function:

base_ranges as (
  select * from (
    select sku,
    i as ba,
    lead(i) over (partition by sku order by i) as bb
    from instants
  )
  where bb is not null
),

I do the same thing, but thanks to UNPIVOT I can do steps 1) and 2) with one SELECT:

with base_ranges as (
  select distinct sku, i ba,
  lead(i) over(partition by sku order by i, col) bb,
  dense_rank() over(partition by sku order by i) seq
  from ranges
  unpivot(i for col in(a,b))
),

Notice I add the column SEQ, which assigns consecutive integers to each date range. SEQ will allow me to use the Tabibitosan method in step 5) !

3) Assign price/priority to base ranges

We both do this by joining the base ranges to the original rows. Here’s Alberto’s code:

factored_ranges as (
  select i.sku, bi.ba, bi.bb, i.a, i.b, i.prio, i.price
  from ranges i, base_ranges bi
  where i.sku = bi.sku
  and (i.a <= bi.ba and bi.ba < i.b)  
),

4) For each base range, keep highest priority

Alberto uses the DENSE_RANK analytic function on each base range, giving a rank of 1 to the highest priority, then keeps only the rows where the rank is 1.

strongest_factored_ranges as (
  select sku, ba, bb, prio, price
  from (
    select sku, ba, bb, prio, price,
    dense_rank () over (partition by sku, ba, bb order by prio) as rnk
    from factored_ranges
  )
  where rnk = 1
),

I decided to use the KEEP (DENSE_RANK FIRST...) aggregate function. I can then combine the aggregation with another analytic function in the same SELECT. I’ll show this in a minute.

5) Merge consecutive ranges with same price

Alberto uses an equivalent to the “start of group” method to merge the prices.

-- STEP will be zero if a range can be joined to the previous one, since:
-- a) they are consecutive (no gap between them)
-- b) they have the same price
-- see http://www.oracle.com/technetwork/issue-archive/o24asktom-095715.html
ranges_with_step as (
  select sku, ba, bb, prio, price,
  decode (
    price,
    lag(price) over (partition by sku order by ba),
    ba - lag(bb) over (partition by sku order by ba),
    1000
  ) step
  from strongest_factored_ranges
), 
-- the integral of step over the ranges
-- joinable ranges will hence have the same value for "interval" since step is zero
ranges_with_step_integral as (
  select sku, ba, bb, prio, price, step,
  sum(step) over (
    partition by sku order by ba
    rows between unbounded preceding and current row
  ) as integral
  from ranges_with_step
), 
-- this joins the joinable ranges
ranges_joined as (
  select * from (
    select sku, ba, bb, prio, price, step, integral,
    min(ba) over (partition by sku, integral) as a,
    max(bb) over (partition by sku, integral) as b
    from ranges_with_step_integral
  )
  where step > 0 
)
select sku, chr(a) "START", chr(b) "END", price
from ranges_joined;

I’ve waited until now to show my code, because I do steps 3), 4) and part of 5) in the same SELECT.

strongest_factored_ranges as (
  select bi.sku,
  bi.seq -
    row_number() over(partition by bi.sku order by min(i.prio), bi.seq) grp,
  bi.ba a, bi.bb b, 
  min(i.price) keep (dense_rank first order by i.prio) price
  from ranges i, base_ranges bi
  where i.sku = bi.sku
  and (i.a <= bi.ba
  and (bi.bb <= i.b or bi.bb is null and i.b is null))
  group by bi.sku, bi.seq, bi.ba, bi.bb
)
select sku, chr(min(a)) "START", chr(max(b)) "END", price
from strongest_factored_ranges
group by sku, grp, price;
  • I changed the WHERE clause to allow for NULL end points, just to show how.
  • The KEEP (DENSE_RANK FIRST...) aggregate function gives me the price of the highest priority for each base range.
  • Once the aggregation is done, I can use an analytic function, and even refer to the result of the aggregation within the analytic. Here is where I use the Tabibitosan method to assign the same grp to consecutive ranges with the same price.

Using 12c MATCH_RECOGNIZE

With 12c row pattern matching, I don’t need the Tabibitosan method so I can simplify the preceding subqueries and use fewer analytic functions. Here is the entire solution:

-- steps 1) and 2)
with base_ranges as (
  select distinct sku, i ba,
  lead(i) over(partition by sku order by i, col) bb
  from ranges
  unpivot(i for col in(a,b))
)
-- steps 3) and 4)
, strongest_factored_ranges as (
  select bi.sku,
  bi.ba a, bi.bb b, 
  min(i.price) keep (dense_rank first order by i.prio) price
  from ranges i, base_ranges bi
  where i.sku = bi.sku
  and (i.a <= bi.ba
  and (bi.bb <= i.b or bi.bb is null and i.b is null))
  group by bi.sku, bi.ba, bi.bb
)
-- step 5)
select * from strongest_factored_ranges
match_recognize(
  partition by sku order by a, b
  measures chr(first(a)) "START", chr(last(b)) "END",
    first(price) price
  pattern(A B*)
  define B as (a, price) = ((prev(b), a.price))
);

Performance

You may be a bit disappointed in my performance findings: I certainly am. The fact is, using UNPIVOT instead of UNION in step 1) is clearly better, because it does one index scan instead of two. Other than that, Alberto’s solution and my pre-12c “improvements” have equivalent performance and use about the same resources. On the other hand, MATCH_RECOGNIZE is a little slower because it uses more CPU. In conclusion, in version 11.2 I would keep Alberto’s solution (except for that first UNION) because it is easier to maintain, and I would use MATCH_RECOGNIZE in version 12c for the same reason (unless the bit of extra CPU is a problem).

About these ads

2 thoughts on “Overlapping ranges with priority

  1. Stew,

    very, very interesting – I will for sure try out the unpivot improvement in our scenario/hardware!

    Congratulations for your blog – a very interesting “catalogue of solutions” explained excellently, and I do know how much time it takes to be clear and concise :)

    Ciao
    Alberto Dell’Era

    • Alberto, thanks very much for your kind words. I would be interested to know if UNPIVOT makes a difference in your environment. By the way, your English is as excellent as your blog’s content…

      Ciao (a great word I don’t dare use outside of Italy!),
      Stew Ashton

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s