The Tabibitosan method by Aketi Jyuuzou is a very clever and efficient way to group rows with consecutive values. When it solves the problem, it can’t be beat — unless you have 12c. This method is worth explaining in its own right, so I’ll do my best; then I’ll make it easier with the `MATCH_RECOGNIZE`

clause.

#### A bit of math

Let’s say there are two consecutive integers, A and B, where A = B – 1.

There are two other non-consecutive integers, M and N, so M != N – 1

Now subtract 1 from A and 2 from B. The answer should be the same, since

B – 2 = (B – 1) – 1 = A – 1

(Remember, (B – 1) is equal to A since they are consecutive.)

On the other hand, if we subtract 1 from M and 2 from N, the answer will be different.

Generally, if we subtract one set of consecutive numbers from another set, the answer will always be the same, whereas if we subtract consecutive numbers from non-consecutive numbers the answers will be different. In other words, by subtracting consecutive numbers we have turned “consecutive” into *equality* and “non-consecutive” into *inequality*.

#### A simple example

Here is the data from Aketi’s first example:

create table Ex1 (NumVal) as select 1 from dual union all select 2 from dual union all select 3 from dual union all select 5 from dual union all select 6 from dual union all select 7 from dual union all select 10 from dual union all select 11 from dual union all select 12 from dual union all SELECT 20 FROM dual;

Aketi uses `ROW_NUMBER()`

to get known consecutive numbers, then he does the subtraction:

SELECT NumVal, Row_Number() over(order by NumVal) as rn, NumVal-Row_Number() over(order by NumVal) as "GRP=NUMVAL-RN" FROM Ex1;

NUMVAL | RN | GRP=NUMVAL-RN |
---|---|---|

1 | 1 | 0 |

2 | 2 | 0 |

3 | 3 | 0 |

5 | 4 | 1 |

6 | 5 | 1 |

7 | 6 | 1 |

10 | 7 | 3 |

11 | 8 | 3 |

12 | 9 | 3 |

20 | 10 | 10 |

The consecutive rows now have the same value for GRP. It doesn’t matter what that value is, because all we want to do is group by it. Now here’s the condensed version of the data, with the first and last value and the number of rows in each group.

select min(NumVal) firstval, max(NumVal) lastval, count(*) cnt FROM ( SELECT NumVal, NumVal-Row_Number() over(order by NumVal) as grp FROM Ex1 ) GROUP BY grp ORDER BY MIN(NumVal);

FIRSTVAL | LASTVAL | CNT |
---|---|---|

1 | 3 | 3 |

5 | 7 | 3 |

10 | 12 | 3 |

20 | 20 | 1 |

You can see how efficient this solution is: all it takes is one inline view with an analytic function to get the GRP value, then we do the `GROUP BY`

and we’re done!

#### Using MATCH_RECOGNIZE

Let’s think of this as a pattern matching problem. The pattern is “consecutive rows”, which we can translate as “the first row, then every row where NumVal is 1 more than the previous NumVal”.

SELECT * FROM ex1 MATCH_RECOGNIZE ( ORDER BY numval MEASURES FIRST(numval) firstval, LAST(numval) lastval, COUNT(*) cnt PATTERN (A b*) DEFINE b as numval = prev(numval)+1 );

I introduced the basic syntax in a recent post, so I’ll just mention the new stuff.

- Here the pattern has two parts, A and B*.
- A all alone means “match exactly one row”. Since A is not defined, the definition defaults to TRUE, meaning match one row no matter what.
- B* means take any and all following rows that fit the definition. The only difference between the ‘
**+**‘ I used before and the ‘*****‘ is that ‘**+**‘ means “at least one row” while ‘*****‘ means “zero or more rows”.

- When I define B, I use PREV to refer to the previous row.

The last row, with NumVal = 20, is a match where I found one A and no B. If I had said ‘B+’ that row would be unmatched.

#### An example with more syntax

An AskTom reader wanted to “partition the range of sequential numbers”. Unlike our first example, he asked to output every line, showing the range on the last line of each match. Here is the test data and expected output.

create table RAND(N) as select 1 from dual union all select 2 from dual union all select 3 from dual union all select 5 from dual union all select 7 from dual union all select 8 from dual union all select 9 from dual union all select 11 from dual;

N | FIRSTN | LASTN |
---|---|---|

1 | ||

2 | ||

3 | 1 | 3 |

5 | 5 | |

7 | ||

8 | ||

9 | 7 | 9 |

11 | 11 |

To get this result, we have to change `MEASURES`

(for the output columns) and `ROWS PER MATCH`

(for the output rows).

SELECT * FROM rand MATCH_RECOGNIZE ( ORDER BY n MEASURES CASE WHEN n = FINAL LAST(n) THEN a.n END firstn, CASE WHEN n = FINAL LAST(n) and n != a.n THEN n END lastn ALL ROWS PER MATCH PATTERN (a b*) DEFINE b AS n = prev(n)+1 );

`ALL ROWS PER MATCH`

will output every row that is involved in a match. Unmatched rows are not returned. (Don’t worry, there is an option to get unmatched rows if you ever need it.)- Strangely enough, When you ask for ALL ROWS, you get more columns too! Notice I get column N in the output without asking for it in the
`MEASURES`

clause. - In the
`MEASURES`

clause, “N” means the value of N for the current row. “LAST(N)” means the same thing, since the “last” row is actually the last row*so far*. If you want to look ahead to the very last row of the match, you need to say`FINAL LAST`

. - Notice I say “a.n” instead of FIRST(n). I just wanted to show that the pattern identifiers can be used to qualify rows. “a.n” means the most recent row mapped to A, which is also the first row of the match.

#### Is MATCH_RECOGNIZE “better” than Tabibitosan?

Tabibitosan is, in my opinion, the most efficient pre-12c pattern matching method, since it finds the matches with one `SELECT`

using analytic functions. Unfortunately, it solves a limited range of problems and it isn’t that easy to understand and apply properly. `MATCH_RECOGNIZE`

seems easier to use and, as the next post will show, it also solves problems when Tabibitosan is not quite enough.

I see that you are solving algorithems using match_recognize. Thankyou for the example. Could you help me in testing if Maximum subarray problem could be solved using match_recognize ? ( more info at http://en.wikipedia.org/wiki/Maximum_subarray_problem)

Hi Vijay and thanks for your comment.

I think there are two steps in solving this kind of problem: first, find an algorithm that solves the problem, and second, implement the algorithm in SQL. Many times in blogs or forums you see the SQL, but no explanation of what algorithm it implements.

In this case I tried to implement Kadane’s algorithm. I don’t see any way to do this with one single MATCH_RECOGNIZE clause, because the clause doesn’t give you access to the results of previous matches.

Here is my attempt to implement the algorithm using the MODEL clause: