The heart of row pattern matching is finding which row matches what part of the pattern. Within the 12c
DEFINE lists the conditions a row may meet; it doesn’t always work the way you expect, especially if you use aggregates in the condition.
How DEFINE works
PATTERN clause lists the conditions that have to be met by a series of rows in order to match the pattern. The
DEFINE clause defines each condition. For example:
PATTERN (A B) DEFINE A as A.SOME_COLUMN = 'First row', B as A.SOME_COLUMN = 'First row' and B.SOME_COLUMN = 'Second row'
Do you see something strange here?
- In the “A” condition,
A.SOME_COLUMNrefers to the row being evaluated.
- In the “B” condition,
A.SOME_COLUMNrefers to a row that has already been accepted as meeting the “A” condition.
B.SOME_COLUMNrefers to the row being evaluated.
DEFINE starts by assuming the current row meets the condition. Then it evaluates the condition. If TRUE, the current row gets assigned. If FALSE, the current row does not get assigned to that condition.
Conditions with Aggregates
When you use an aggregate in
DEFINE, you have to know when the aggregate will include the current row being evaluated. Here is an example:
PATTERN (A B) DEFINE A as count(B.*) = 0, B as count(B.*) = 0
When the first row is evaluated, there are no B rows yet, and the current row is assumed to be part of “A”, so the “A” condition will be met. When the “B” condition is evaluated, the current row is assumed to be part of “B”, so
count(B.*) will always be greater than 0 and the “B” condition will never be met.
Excluding the current row
If you use an aggregate that includes the current row’s value, and you want that row excluded, you have to do it yourself:
COUNT(), subtract 1
SUM(), subtract the value of the current row.
What about MIN() or MAX()?
I hope I haven’t lost you already, because the hard part is coming up. It’s also the interesting part, so take a deep breath and read on.
You can’t just “exclude” the current row from a
MAX() function. You have to keep track of which row has the minimum or maximum value, and then refer to that row. Here’s an example.
Let’s say I have stock market summaries each day: for each stock, I get the date, the closing price and the quantity traded.
create table ticker ( stock_id integer, trade_date date, closing_price number not null, qty integer not null, primary key(stock_id, trade_date) ); insert into ticker select 1, sysdate, 20, 100 from dual union all select 1, sysdate+1, 30, 90 from dual union all select 1, sysdate+2, 15, 200 from dual union all select 1, sysdate+3, 14, 80 from dual union all select 1, sysdate+4, 20, 200 from dual;
I want to know when the stock hits a new low in price, or quantity, or both. To do this, I need to compare the current row’s price and quantity with the minimums up to but not including the current row.
I do this by defining four conditions:
- PQ: a row containing a new low price and a new low quantity;
- P: row with a new low price but not a new low quantity;
- Q: row with a new low quantity but not a new low price;
- OK: row without any new lows.
I also define two
- LOWP includes PQ and P rows: the most recent LOWP row has the minimum price so far.
- LOWQ includes PQ and Q rows: the most recent LOWQ row has the minimum quantity so far.
Now, how do I access the row just before the current one? If I compare price to
LAST(LOWP.price), I will be comparing the current price to itself.
To get the prior row, I just say
LAST(LOWP.price,1). This backs up exactly one row.
select * from ticker match_recognize( partition by stock_id order by trade_date measures classifier() new_low all rows per match pattern(pq (pq|p|q|ok)*) subset lowp = (pq,p), lowq = (pq,q) define pq as count(*) = 1 or ( closing_price < last(lowp.closing_price,1) and qty < last(lowq.qty,1) ), p as closing_price < last(lowp.closing_price,1), q as qty < last(lowq.qty,1) );