# Overlapping ranges with priority

A few years ago, Alberto Dell’Era blogged about product prices with overlapping date ranges; “on an overlapping range, the strongest priority (lower number) wins.” His analysis, explanation and solution are all excellent. I’m revisiting the subject to suggest a few improvements (and mention `MATCH_RECOGNIZE` yet again).

To explain more easily, Alberto uses letters (a,b,c,d,e) instead of dates, and in the code he changes the letters to numbers using the `ascii()` function. (The logic is the same, whether the ranges are dates or something else.) He uses the term `sku` as a product identifier. “Prices of skus (i.e. goods) have validity ranges and can overlap; on an overlapping range, the strongest priority (lower number) wins.” Sample input:

b______(0,\$200)_______d
b__________c______(1,\$300)_______e

Notice that d and e are not colored, because they are not included in the range. Since priority 0 is stronger than 1, the expected output is:

b_______(\$200)________d
b__________c__________d__(\$300)__e

#### The Algorithm

Step Expected Output
1) Get distinct start and end points b,c,d,e
2) Create non-overlapping base ranges (Alberto has a `not null` end point, but in general nulls may be useful.) b__________c
b__________c__________d
b__________c__________d__________e
b__________c__________d__________e_(null)
3) Assign price/priority to base ranges b_(0,\$200)_c
b__________c_(0,\$200)_d
b__________c_(1,\$300)_d
b__________c__________d_(1,\$300)_e
4) For each base range, keep highest priority b__(\$200)__c
b__________c__(\$200)__d
b__________c__________d__(\$300)__e
5) Merge consecutive ranges with same price b_______(\$200)________d
b__________c__________d__(\$300)__e

#### 1) Get distinct start and end points

Alberto uses UNION to get the start points, then the end points.

``````with instants as (
select sku, a as i from ranges
union
select sku, b as i from ranges
),``````

This requires two scans of the index. I suggest using `UNPIVOT`, which does the work with one scan. I’ll show you that next.

#### 2) Create non-overlapping base ranges

Alberto does this with the analytic `LEAD` function:

``````base_ranges as (
select * from (
select sku,
i as ba,
lead(i) over (partition by sku order by i) as bb
from instants
)
where bb is not null
),``````

I do the same thing, but thanks to `UNPIVOT` I can do steps 1) and 2) with one `SELECT`:

``````with base_ranges as (
select distinct sku, i ba,
lead(i) over(partition by sku order by i, col) bb,
dense_rank() over(partition by sku order by i) seq
from ranges
unpivot(i for col in(a,b))
),``````

Notice I add the column SEQ, which assigns consecutive integers to each date range. SEQ will allow me to use the Tabibitosan method in step 5) !

#### 3) Assign price/priority to base ranges

We both do this by joining the base ranges to the original rows. Here’s Alberto’s code:

``````factored_ranges as (
select i.sku, bi.ba, bi.bb, i.a, i.b, i.prio, i.price
from ranges i, base_ranges bi
where i.sku = bi.sku
and (i.a <= bi.ba and bi.ba < i.b)
),``````

#### 4) For each base range, keep highest priority

Alberto uses the `DENSE_RANK` analytic function on each base range, giving a rank of 1 to the highest priority, then keeps only the rows where the rank is 1.

``````strongest_factored_ranges as (
select sku, ba, bb, prio, price
from (
select sku, ba, bb, prio, price,
dense_rank () over (partition by sku, ba, bb order by prio) as rnk
from factored_ranges
)
where rnk = 1
),``````

I decided to use the `KEEP (DENSE_RANK FIRST...)` aggregate function. I can then combine the aggregation with another analytic function in the same `SELECT`. I’ll show this in a minute.

#### 5) Merge consecutive ranges with same price

Alberto uses an equivalent to the “start of group” method to merge the prices.

``````-- STEP will be zero if a range can be joined to the previous one, since:
-- a) they are consecutive (no gap between them)
-- b) they have the same price
ranges_with_step as (
select sku, ba, bb, prio, price,
decode (
price,
lag(price) over (partition by sku order by ba),
ba - lag(bb) over (partition by sku order by ba),
1000
) step
from strongest_factored_ranges
),
-- the integral of step over the ranges
-- joinable ranges will hence have the same value for "interval" since step is zero
ranges_with_step_integral as (
select sku, ba, bb, prio, price, step,
sum(step) over (
partition by sku order by ba
rows between unbounded preceding and current row
) as integral
from ranges_with_step
),
-- this joins the joinable ranges
ranges_joined as (
select * from (
select sku, ba, bb, prio, price, step, integral,
min(ba) over (partition by sku, integral) as a,
max(bb) over (partition by sku, integral) as b
from ranges_with_step_integral
)
where step > 0
)
select sku, chr(a) "START", chr(b) "END", price
from ranges_joined;``````

I’ve waited until now to show my code, because I do steps 3), 4) and part of 5) in the same `SELECT`.

``````strongest_factored_ranges as (
select bi.sku,
bi.seq -
row_number() over(partition by bi.sku order by min(i.prio), bi.seq) grp,
bi.ba a, bi.bb b,
min(i.price) keep (dense_rank first order by i.prio) price
from ranges i, base_ranges bi
where i.sku = bi.sku
and (i.a <= bi.ba
and (bi.bb <= i.b or bi.bb is null and i.b is null))
group by bi.sku, bi.seq, bi.ba, bi.bb
)
select sku, chr(min(a)) "START", chr(max(b)) "END", price
from strongest_factored_ranges
group by sku, grp, price;``````
• I changed the WHERE clause to allow for NULL end points, just to show how.
• The `KEEP (DENSE_RANK FIRST...)` aggregate function gives me the price of the highest priority for each base range.
• Once the aggregation is done, I can use an analytic function, and even refer to the result of the aggregation within the analytic. Here is where I use the Tabibitosan method to assign the same grp to consecutive ranges with the same price.

#### Using 12c MATCH_RECOGNIZE

With 12c row pattern matching, I don’t need the Tabibitosan method so I can simplify the preceding subqueries and use fewer analytic functions. Here is the entire solution:

``````-- steps 1) and 2)
with base_ranges as (
select distinct sku, i ba,
lead(i) over(partition by sku order by i, col) bb
from ranges
unpivot(i for col in(a,b))
)
-- steps 3) and 4)
, strongest_factored_ranges as (
select bi.sku,
bi.ba a, bi.bb b,
min(i.price) keep (dense_rank first order by i.prio) price
from ranges i, base_ranges bi
where i.sku = bi.sku
and (i.a <= bi.ba
and (bi.bb <= i.b or bi.bb is null and i.b is null))
group by bi.sku, bi.ba, bi.bb
)
-- step 5)
select * from strongest_factored_ranges
match_recognize(
partition by sku order by a, b
measures chr(first(a)) "START", chr(last(b)) "END",
first(price) price
pattern(A B*)
define B as (a, price) = ((prev(b), a.price))
);``````

#### Performance

You may be a bit disappointed in my performance findings: I certainly am. The fact is, using `UNPIVOT` instead of `UNION` in step 1) is clearly better, because it does one index scan instead of two. Other than that, Alberto’s solution and my pre-12c “improvements” have equivalent performance and use about the same resources. On the other hand, `MATCH_RECOGNIZE` is a little slower because it uses more CPU. In conclusion, in version 11.2 I would keep Alberto’s solution (except for that first UNION) because it is easier to maintain, and I would use MATCH_RECOGNIZE in version 12c for the same reason (unless the bit of extra CPU is a problem).

## 2 thoughts on “Overlapping ranges with priority”

1. Stew,

very, very interesting – I will for sure try out the unpivot improvement in our scenario/hardware!

Congratulations for your blog – a very interesting “catalogue of solutions” explained excellently, and I do know how much time it takes to be clear and concise :)

Ciao
Alberto Dell’Era

• Alberto, thanks very much for your kind words. I would be interested to know if UNPIVOT makes a difference in your environment. By the way, your English is as excellent as your blog’s content…

Ciao (a great word I don’t dare use outside of Italy!),
Stew Ashton