At the UKOUG Tech15 conference, Jonathan Lewis presented a question asked on OTN and the answer published on his blog. The problem is how to count the number of rows in one table that fall into the ranges stored in another table.

The straightforward solution in SQL is to join the two tables using a `BETWEEN`

condition, but that involves comparing each row in one table to every row in the other table. With the questioner’s data, this solution takes seven hours.

Jonathan’s solution requires more than one SQL statement, but it can answer the question in a few minutes. Please refer to his blog for details about the problem statement and his solution.

If you have database version 12c or later, the `MATCH_RECOGNIZE`

clause can help solve this problem directly. I’ll present two variants:

- the first assumes that the ranges cannot overlap,
- and the second handles any ranges, including overlaps.

## Ranges without overlaps

Here are some test data using Jonathan’s table and column names:

define d_sqrt_rows = 10 create table msisdns as with data as ( select null from dual connect by level <= &d_sqrt_rows ) select rownum msisdn from data, data where rownum < power(&d_sqrt_rows,2); create table number_ranges as select level + &d_sqrt_rows * (level - 1) from_number, &d_sqrt_rows * level to_number from dual connect by level <= &d_sqrt_rows; select * from number_ranges;

FROM_NUMBER | TO_NUMBER |
---|---|

1 | 10 |

12 | 20 |

23 | 30 |

34 | 40 |

45 | 50 |

56 | 60 |

67 | 70 |

78 | 80 |

89 | 90 |

100 | 100 |

I start by combining the ranges and the individual rows in one dataset:

select from_number, to_number from number_ranges union all select msisdn, null from msisdns

Using the `MATCH_RECOGNIZE`

clause, I can:

- Order the data by from_number
- then process the data in that order:
- Assign the label “A” to the first range
- Assign the label “B” to all the following individual rows that fall within the range
- Return one row with the range from “A” and the count of “B” rows
- Then go on to the next range, skipping any individual rows that fall after the first range and before the second.

select * from ( select from_number, to_number from number_ranges union all select msisdn, null from msisdns ) match_recognize( order by from_number, to_number measures a.from_number from_number, a.to_number to_number, count(b.*) range_count pattern(a b*) define a as to_number is not null, b as from_number <= a.to_number );

FROM_NUMBER | TO_NUMBER | RANGE_COUNT |
---|---|---|

1 | 10 | 10 |

12 | 20 | 9 |

23 | 30 | 8 |

34 | 40 | 7 |

45 | 50 | 6 |

56 | 60 | 5 |

67 | 70 | 4 |

78 | 80 | 3 |

89 | 90 | 2 |

100 | 100 | 0 |

Yes, the last RANGE_COUNT should be 0 because I only loaded the first 99 individual rows.

## Ranges with overlaps

To solve this, I need to use `MATCH_RECOGNIZE`

twice! This is fun for me, though it may not be for you…

First, I’ll update the number_range table to create overlaps:

update number_ranges set to_number = to_number+10; select * from number_ranges;

FROM_NUMBER | TO_NUMBER |
---|---|

1 | 20 |

12 | 30 |

23 | 40 |

34 | 50 |

45 | 60 |

56 | 70 |

67 | 80 |

78 | 90 |

89 | 100 |

100 | 110 |

Overlapping ranges require us to count some individual rows more than once, if they show up in more than one range. One way to handle this is to break down the ranges into what I call “base ranges” every time there is an intersection. For example, for ranges 1-4 and 3-6:

- the “base ranges” are 1-2, 3-4 and 5-6
- and the rows in range 3-4 must be counted both in range 1-4 and in range 3-6.

I identify the “base ranges” by putting both “from” and “to” numbers in the first column so I can sort them together:

select from_number, 0 rowtype from number_ranges union all select msisdn, 1 from msisdns union all select to_number, 2 rowtype from number_ranges

Once I order by from_number and rowtype, all the individual rows will be sandwiched between “from” or “to” numbers having a rowtype other than 1. So, I just return one row per “sandwich”. Notice that after each match I start from the last row of the previous match (“after match skip to last a”): the bottom slice of one sandwich becomes the top slice of the next sandwich.

with ranges_plus_data as ( select from_number num, 0 rowtype from number_ranges union all select msisdn, 1 from msisdns union all select to_number, 2 rowtype from number_ranges ) select * from ranges_plus_data match_recognize( order by num, rowtype measures first(num) from_number, last(num) to_number, count(b.*) range_count after match skip to last a pattern(a b+ a) define a as rowtype !=1, b as rowtype = 1 );

FROM_NUMBER | TO_NUMBER | RANGE_COUNT |
---|---|---|

1 | 12 | 11 |

12 | 20 | 9 |

20 | 23 | 2 |

23 | 30 | 8 |

30 | 34 | 3 |

34 | 40 | 7 |

40 | 45 | 4 |

45 | 50 | 6 |

50 | 56 | 5 |

56 | 60 | 5 |

60 | 67 | 6 |

67 | 70 | 4 |

70 | 78 | 7 |

78 | 80 | 3 |

80 | 89 | 8 |

89 | 90 | 2 |

90 | 100 | 9 |

Now I need to match these base ranges with each original range.

- I combine original ranges and base ranges, then sort them.
- Then I start from each original range and sum all the range_counts of the included base ranges.
- To make sure I don’t skip an original range, I start looking for the next match one row after the beginning of the previous match (“after match skip to next row”).

with ranges_plus_data as ( select from_number num, 0 rowtype from number_ranges union all select msisdn, 1 from msisdns union all select to_number, 2 rowtype from number_ranges ) , base_range_sums as ( select * from ranges_plus_data match_recognize( order by num, rowtype measures first(num) from_number, last(num) to_number, count(b.*) range_count after match skip to last a pattern(a b+ a) define a as rowtype !=1, b as rowtype = 1 ) ) select from_number, to_number, range_count from ( select * from base_range_sums union all select from_number, to_number, 0 from number_ranges ) match_recognize( order by from_number, range_count, to_number measures first(from_number) from_number, first(to_number) to_number, sum(range_count) range_count after match skip to next row pattern(a (a|b)*) define a as range_count = 0, b as to_number <= first(to_number) );

FROM_NUMBER | TO_NUMBER | RANGE_COUNT |
---|---|---|

1 | 20 | 20 |

12 | 30 | 19 |

23 | 40 | 18 |

34 | 50 | 17 |

45 | 60 | 16 |

56 | 70 | 15 |

67 | 80 | 14 |

78 | 90 | 13 |

89 | 100 | 11 |

100 | 110 | 0 |

With 40 million rows and 1000 ranges, Jonathan’s solution ran in less than two minutes. This solution runs in a little over a minute; it is not necessarily “better” but it is a reasonable alternative for a shop with 12c and a SQL developer who is familiar with `MATCH_RECOGNIZE`

.

Hi Stew.

As always, it’s fun reading your elegant solutions.

Regarding the “Ranges with Overlaps” problem – I think that your “Ranges without Overlaps” solution can work there too with 2 small additions:

select * from (

select from_number, to_number from number_ranges

union all

select msisdn, null from msisdns

)

match_recognize(

order by from_number, to_number

measures a.from_number from_number,

a.to_number to_number,

count(b.*) – COUNT(B.TO_NUMBER) range_count

AFTER MATCH SKIP TO NEXT ROW

pattern(a b*)

define a as to_number is not null,

b as from_number <= a.to_number

);

Thanks,

Oren.

Hello Oren,

Thanks for the compliment and your comment.

I agree that your adjustment produces the correct result. I was worried that doing everything in one step would be slower, because of counting the individual rows several times.

If you consult Jonathan’s post, you will see his test data with one million rows. Your adjusted solution takes 3 minutes and mine takes 1.2 seconds.

I really should have tested that alternative and explained why I chose two steps instead of one.

Best regards, Stew

Pingback: Counting | Oracle Scratchpad

Stew,

This would make a lovely addition to my presentation.

With your permission I’d like to include your solution in my slide deck for future presentations of “Just Don’t Do It”, showing the non-overlapping case, and linking here for the overlapping case.

Regards

Jonathan Lewis

Hello Jonathan,

That is quite a compliment. I would of course be honored and pleased to contribute further to your presentation – having already provided my “Chunking Table” code as an example of what just not to do ;)

Pingback: Top 50 Oracle SQL Blogs for 2016 - Complete IT Professional