MATCH_RECOGNIZE: matching based on aggregates

The heart of row pattern matching is finding which row matches what part of the pattern. Within the 12c MATCH_RECOGNIZE clause, DEFINE lists the conditions a row may meet; it doesn’t always work the way you expect, especially if you use aggregates in the condition.

How DEFINE works

The PATTERN clause lists the conditions that have to be met by a series of rows in order to match the pattern. The DEFINE clause defines each condition. For example:

PATTERN (A B)
DEFINE
  A as A.SOME_COLUMN = 'First row',
  B as A.SOME_COLUMN = 'First row' and B.SOME_COLUMN = 'Second row'

Do you see something strange here?

  • In the “A” condition, A.SOME_COLUMN refers to the row being evaluated.
  • In the “B” condition, A.SOME_COLUMN refers to a row that has already been accepted as meeting the “A” condition. B.SOME_COLUMN refers to the row being evaluated.

DEFINE starts by assuming the current row meets the condition. Then it evaluates the condition. If TRUE, the current row gets assigned. If FALSE, the current row does not get assigned to that condition.

Conditions with Aggregates

When you use an aggregate in DEFINE, you have to know when the aggregate will include the current row being evaluated. Here is an example:

PATTERN (A B)
DEFINE
  A as count(B.*) = 0,
  B as count(B.*) = 0

When the first row is evaluated, there are no B rows yet, and the current row is assumed to be part of “A”, so the “A” condition will be met. When the “B” condition is evaluated, the current row is assumed to be part of “B”, so count(B.*) will always be greater than 0 and the “B” condition will never be met.

Excluding the current row

If you use an aggregate that includes the current row’s value, and you want that row excluded, you have to do it yourself:

  • For COUNT(), subtract 1
  • For SUM(), subtract the value of the current row.

What about MIN() or MAX()?

I hope I haven’t lost you already, because the hard part is coming up. It’s also the interesting part, so take a deep breath and read on.

You can’t just “exclude” the current row from a MIN() or MAX() function. You have to keep track of which row has the minimum or maximum value, and then refer to that row. Here’s an example.

Let’s say I have stock market summaries each day: for each stock, I get the date, the closing price and the quantity traded.

create table ticker (
  stock_id integer,
  trade_date date,
  closing_price number not null,
  qty integer not null,
  primary key(stock_id, trade_date)
);
insert into ticker select 
1, sysdate, 20, 100 from dual union all select
1, sysdate+1, 30, 90 from dual union all select
1, sysdate+2, 15, 200 from dual union all select
1, sysdate+3, 14, 80 from dual union all select
1, sysdate+4, 20, 200 from dual;
STOCK_ID TRADE_DATE CLOSING_PRICE QTY
1 2015-07-02 07:23:37 20 100
1 2015-07-03 07:23:37 30 90
1 2015-07-04 07:23:37 15 200
1 2015-07-05 07:23:37 14 80
1 2015-07-06 07:23:37 20 200

 
I want to know when the stock hits a new low in price, or quantity, or both. To do this, I need to compare the current row’s price and quantity with the minimums up to but not including the current row.

I do this by defining four conditions:

  • PQ: a row containing a new low price and a new low quantity;
  • P: row with a new low price but not a new low quantity;
  • Q: row with a new low quantity but not a new low price;
  • OK: row without any new lows.

I also define two SUBSETs:

  • LOWP includes PQ and P rows: the most recent LOWP row has the minimum price so far.
  • LOWQ includes PQ and Q rows: the most recent LOWQ row has the minimum quantity so far.

Now, how do I access the row just before the current one? If I compare price to LAST(LOWP.price), I will be comparing the current price to itself.

To get the prior row, I just say LAST(LOWP.price,1). This backs up exactly one row.

select * from ticker
match_recognize(
  partition by stock_id order by trade_date
  measures classifier() new_low
  all rows per match
  pattern(pq (pq|p|q|ok)*)
  subset lowp = (pq,p), lowq = (pq,q)
  define
    pq as count(*) = 1 or (
      closing_price < last(lowp.closing_price,1) and qty < last(lowq.qty,1)
    ),
    p as closing_price < last(lowp.closing_price,1),
    q as qty < last(lowq.qty,1)
);
STOCK_ID TRADE_DATE NEW_LOW CLOSING_PRICE QTY
1 2015-07-02 07:23:37 PQ 20 100
1 2015-07-03 07:23:37 Q 30 90
1 2015-07-04 07:23:37 P 15 200
1 2015-07-05 07:23:37 PQ 14 80
1 2015-07-06 07:23:37 OK 20 200

 

Advertisements

2 thoughts on “MATCH_RECOGNIZE: matching based on aggregates

    • Hi Gerard,

      Thanks for the heads up! I have changed the input data to better test the code, and I have corrected the code.

      Best regards, Stew

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s