# MATCH_RECOGNIZE: matching based on aggregates

The heart of row pattern matching is finding which row matches what part of the pattern. Within the 12c `MATCH_RECOGNIZE` clause, `DEFINE` lists the conditions a row may meet; it doesn’t always work the way you expect, especially if you use aggregates in the condition.

#### How DEFINE works

The `PATTERN` clause lists the conditions that have to be met by a series of rows in order to match the pattern. The `DEFINE` clause defines each condition. For example:

```PATTERN (A B)
DEFINE
A as A.SOME_COLUMN = 'First row',
B as A.SOME_COLUMN = 'First row' and B.SOME_COLUMN = 'Second row'
```

Do you see something strange here?

• In the “A” condition, `A.SOME_COLUMN` refers to the row being evaluated.
• In the “B” condition, `A.SOME_COLUMN` refers to a row that has already been accepted as meeting the “A” condition. `B.SOME_COLUMN` refers to the row being evaluated.

`DEFINE` starts by assuming the current row meets the condition. Then it evaluates the condition. If TRUE, the current row gets assigned. If FALSE, the current row does not get assigned to that condition.

#### Conditions with Aggregates

When you use an aggregate in `DEFINE`, you have to know when the aggregate will include the current row being evaluated. Here is an example:

```PATTERN (A B)
DEFINE
A as count(B.*) = 0,
B as count(B.*) = 0
```

When the first row is evaluated, there are no B rows yet, and the current row is assumed to be part of “A”, so the “A” condition will be met. When the “B” condition is evaluated, the current row is assumed to be part of “B”, so `count(B.*)` will always be greater than 0 and the “B” condition will never be met.

#### Excluding the current row

If you use an aggregate that includes the current row’s value, and you want that row excluded, you have to do it yourself:

• For `COUNT()`, subtract 1
• For `SUM()`, subtract the value of the current row.

#### What about MIN() or MAX()?

I hope I haven’t lost you already, because the hard part is coming up. It’s also the interesting part, so take a deep breath and read on.

You can’t just “exclude” the current row from a `MIN()` or `MAX()` function. You have to keep track of which row has the minimum or maximum value, and then refer to that row. Here’s an example.

Let’s say I have stock market summaries each day: for each stock, I get the date, the closing price and the quantity traded.

```create table ticker (
stock_id integer,
closing_price number not null,
qty integer not null,
);
insert into ticker select
1, sysdate, 20, 100 from dual union all select
1, sysdate+1, 30, 90 from dual union all select
1, sysdate+2, 15, 200 from dual union all select
1, sysdate+3, 14, 80 from dual union all select
1, sysdate+4, 20, 200 from dual;```
1 2015-07-02 07:23:37 20 100
1 2015-07-03 07:23:37 30 90
1 2015-07-04 07:23:37 15 200
1 2015-07-05 07:23:37 14 80
1 2015-07-06 07:23:37 20 200

I want to know when the stock hits a new low in price, or quantity, or both. To do this, I need to compare the current row’s price and quantity with the minimums up to but not including the current row.

I do this by defining four conditions:

• PQ: a row containing a new low price and a new low quantity;
• P: row with a new low price but not a new low quantity;
• Q: row with a new low quantity but not a new low price;
• OK: row without any new lows.

I also define two `SUBSET`s:

• LOWP includes PQ and P rows: the most recent LOWP row has the minimum price so far.
• LOWQ includes PQ and Q rows: the most recent LOWQ row has the minimum quantity so far.

Now, how do I access the row just before the current one? If I compare price to `LAST(LOWP.price)`, I will be comparing the current price to itself.

To get the prior row, I just say `LAST(LOWP.price,1)`. This backs up exactly one row.

```select * from ticker
match_recognize(
partition by stock_id order by trade_date
measures classifier() new_low
all rows per match
pattern(pq (pq|p|q|ok)*)
subset lowp = (pq,p), lowq = (pq,q)
define
pq as count(*) = 1 or (
closing_price < last(lowp.closing_price,1) and qty < last(lowq.qty,1)
),
p as closing_price < last(lowp.closing_price,1),
q as qty < last(lowq.qty,1)
);
```
1 2015-07-02 07:23:37 PQ 20 100
1 2015-07-03 07:23:37 Q 30 90
1 2015-07-04 07:23:37 P 15 200
1 2015-07-05 07:23:37 PQ 14 80
1 2015-07-06 07:23:37 OK 20 200

## 2 thoughts on “MATCH_RECOGNIZE: matching based on aggregates”

1. A minor typo in your last example’s “q” pattern definition:

q as qty < last(lowq.closing_price,1)

should be:

q as qty < last(lowq.qty,1)

• Hi Gerard,

Thanks for the heads up! I have changed the input data to better test the code, and I have corrected the code.

Best regards, Stew