MERGE using Change Data Capture (CDC)

When you want to get a target table in sync with a source table, sometimes you have to compare the tables to obtain the differences. Other times the source system provides you with the differences and you just need to apply them. MERGE can help you do that.

(Here is a list of my posts about comparing and synchronizing tables.)

Situation

  • Target and source have the same columns.
  • The target table is to receive all the changes from the source, including deletes.
  • Target and source have primary keys and additional columns, so updates are possible.
  • The change table contains all the columns in changed source rows, plus an OPERATION column with 3 possible values:
    • ‘I’ if the row was inserted in the source
    • ‘U’ if the row was updated in the source.
    • ‘D’ if the row was deleted in the source.

Test data

To cover every possibility, I have 9 test cases:

Operation Target row Data same Action #
I Present Yes Do nothing 1
I Present No Update 2
U Present Yes Do nothing 3
U Present No Update 4
D Present Delete 5
(absent) Present Do nothing 6
I Absent Insert 7
U Absent Insert 8
D Absent Do nothing 9

I want my solution to be robust and to do as little work as possible. I want to be able to run the same code a second time and have nothing happen, since the target table is already synchronized. For these reasons, I treat the ‘I’ and ‘U’ operations exactly the same: the MERGE statement will do an UPSERT if the rows are different, and nothing if the rows are already the same.

Here is the test data I came up with:

create table t_target(ckey, cvalue) as
select 1, 'I, target = cdc > do nothing' from dual union all
select 2, 'I, target != cdc > update OLD' from dual union all
select 3, 'U, target = cdc > do nothing' from dual union all
select 4, 'U, target != cdc > update OLD' from dual union all
select 5, 'D > delete' from dual union all
select 6, 'target not in cdc > do nothing' from dual;
CKEY CVALUE
1 I, target = cdc > do nothing
2 I, target != cdc > update OLD
3 U, target = cdc > do nothing
4 U, target != cdc > update OLD
5 D > delete
6 target not in cdc > do nothing
drop table cdc purge;
create table cdc(operation, ckey, cvalue) as
select 'I', 1, 'I, target = cdc > do nothing' from dual union all
select 'I', 2, 'I, target != cdc > update NEW' from dual union all
select 'U', 3, 'U, target = cdc > do nothing' from dual union all
select 'U', 4, 'U, target != cdc > update NEW' from dual union all
select 'D', 5, 'D > delete' from dual union all
select 'I', 7, 'I not in target > Insert' from dual union all
select 'U', 8, 'U not in target > Insert' from dual union all
select 'D', 9, 'D not in target > do nothing' from dual;
OPERATION CKEY CVALUE
I 1 I, target = cdc > do nothing
I 2 I, target != cdc > update NEW
U 3 U, target = cdc > do nothing
U 4 U, target != cdc > update NEW
D 5 D > delete
I 7 I not in target > Insert
U 8 U not in target > Insert
D 9 D not in target > do nothing

First try: DELETE with WHERE

Remember, I will always run each MERGE statement twice to make sure no unnecessary work is done.

merge into t_target o
using cdc n
on (o.ckey = n.ckey)
when matched then update
  set cvalue = n.cvalue
delete where n.operation = 'D'
when not matched then insert
  (ckey, cvalue)
  values(n.ckey, n.cvalue);
CKEY CVALUE
1 I, target = cdc > do nothing
2 I, target != cdc > update NEW
3 U, target = cdc > do nothing
4 U, target != cdc > update NEW
6 target not in cdc > do nothing
7 I not in target > Insert
8 U not in target > Insert
9 D not in target > do nothing

Well, that didn’t work out too well: row 9 was supposed to be deleted, but it was already gone from the target table so I actually inserted it!

Second try: WHERE with DELETE and with INSERT

Let’s tell the MERGE not to insert rows with ‘D’ flags:

merge into t_target o
using cdc n
on (o.ckey = n.ckey)
when matched then update
  set cvalue = n.cvalue
delete where n.operation = 'D'
when not matched then insert
  (ckey, cvalue)
  values(n.ckey, n.cvalue)
  where n.operation != 'D';

7 rows merged.
CKEY CVALUE
1 I, target = cdc > do nothing
2 I, target != cdc > update NEW
3 U, target = cdc > do nothing
4 U, target != cdc > update NEW
6 target not in cdc > do nothing
7 I not in target > Insert
8 U not in target > Insert

Well, that’s better. I merged 7 rows the first time and the target table looks good. Unfortunately, when I run the statement again I get “6 rows merged” for no good reason.

Third try: WHERE with UPDATE, DELETE and INSERT

merge into t_target o
using cdc n
on (o.ckey = n.ckey)
when matched then update
  set cvalue = n.cvalue
  where n.operation = 'D'
  or decode(o.cvalue,n.cvalue,0,1) = 1
delete where n.operation = 'D'
when not matched then insert
  (ckey, cvalue)
  values(n.ckey, n.cvalue)
  where n.operation != 'D';

5 rows merged. (first time)
0 rows merged. (second time)

The trick is to only update rows with changes – or with the ‘D’ flag since you can’t delete a row if it hasn’t been updated first. Please note the DECODE trick, which compares the columns correctly even if one or both values are NULL.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s