When you want to get a target table in sync with a source table, sometimes you have to compare the tables to obtain the differences. Other times the source system provides you with the differences and you just need to apply them. MERGE
can help you do that.
(Here is a list of my posts about comparing and synchronizing tables.)
Situation
- Target and source have the same columns.
- The target table is to receive all the changes from the source, including deletes.
- Target and source have primary keys and additional columns, so updates are possible.
- The change table contains all the columns in changed source rows, plus an OPERATION column with 3 possible values:
- ‘I’ if the row was inserted in the source
- ‘U’ if the row was updated in the source.
- ‘D’ if the row was deleted in the source.
Test data
To cover every possibility, I have 9 test cases:
Operation | Target row | Data same | Action | # |
---|---|---|---|---|
I | Present | Yes | Do nothing | 1 |
I | Present | No | Update | 2 |
U | Present | Yes | Do nothing | 3 |
U | Present | No | Update | 4 |
D | Present | Delete | 5 | |
(absent) | Present | Do nothing | 6 | |
I | Absent | Insert | 7 | |
U | Absent | Insert | 8 | |
D | Absent | Do nothing | 9 |
I want my solution to be robust and to do as little work as possible. I want to be able to run the same code a second time and have nothing happen, since the target table is already synchronized. For these reasons, I treat the ‘I’ and ‘U’ operations exactly the same: the MERGE
statement will do an UPSERT if the rows are different, and nothing if the rows are already the same.
Here is the test data I came up with:
create table t_target(ckey, cvalue) as select 1, 'I, target = cdc > do nothing' from dual union all select 2, 'I, target != cdc > update OLD' from dual union all select 3, 'U, target = cdc > do nothing' from dual union all select 4, 'U, target != cdc > update OLD' from dual union all select 5, 'D > delete' from dual union all select 6, 'target not in cdc > do nothing' from dual;
CKEY | CVALUE |
---|---|
1 | I, target = cdc > do nothing |
2 | I, target != cdc > update OLD |
3 | U, target = cdc > do nothing |
4 | U, target != cdc > update OLD |
5 | D > delete |
6 | target not in cdc > do nothing |
drop table cdc purge; create table cdc(operation, ckey, cvalue) as select 'I', 1, 'I, target = cdc > do nothing' from dual union all select 'I', 2, 'I, target != cdc > update NEW' from dual union all select 'U', 3, 'U, target = cdc > do nothing' from dual union all select 'U', 4, 'U, target != cdc > update NEW' from dual union all select 'D', 5, 'D > delete' from dual union all select 'I', 7, 'I not in target > Insert' from dual union all select 'U', 8, 'U not in target > Insert' from dual union all select 'D', 9, 'D not in target > do nothing' from dual;
OPERATION | CKEY | CVALUE |
---|---|---|
I | 1 | I, target = cdc > do nothing |
I | 2 | I, target != cdc > update NEW |
U | 3 | U, target = cdc > do nothing |
U | 4 | U, target != cdc > update NEW |
D | 5 | D > delete |
I | 7 | I not in target > Insert |
U | 8 | U not in target > Insert |
D | 9 | D not in target > do nothing |
First try: DELETE with WHERE
Remember, I will always run each MERGE
statement twice to make sure no unnecessary work is done.
merge into t_target o using cdc n on (o.ckey = n.ckey) when matched then update set cvalue = n.cvalue delete where n.operation = 'D' when not matched then insert (ckey, cvalue) values(n.ckey, n.cvalue);
CKEY | CVALUE |
---|---|
1 | I, target = cdc > do nothing |
2 | I, target != cdc > update NEW |
3 | U, target = cdc > do nothing |
4 | U, target != cdc > update NEW |
6 | target not in cdc > do nothing |
7 | I not in target > Insert |
8 | U not in target > Insert |
9 | D not in target > do nothing |
Well, that didn’t work out too well: row 9 was supposed to be deleted, but it was already gone from the target table so I actually inserted it!
Second try: WHERE with DELETE and with INSERT
Let’s tell the MERGE
not to insert rows with ‘D’ flags:
merge into t_target o using cdc n on (o.ckey = n.ckey) when matched then update set cvalue = n.cvalue delete where n.operation = 'D' when not matched then insert (ckey, cvalue) values(n.ckey, n.cvalue) where n.operation != 'D'; 7 rows merged.
CKEY | CVALUE |
---|---|
1 | I, target = cdc > do nothing |
2 | I, target != cdc > update NEW |
3 | U, target = cdc > do nothing |
4 | U, target != cdc > update NEW |
6 | target not in cdc > do nothing |
7 | I not in target > Insert |
8 | U not in target > Insert |
Well, that’s better. I merged 7 rows the first time and the target table looks good. Unfortunately, when I run the statement again I get “6 rows merged” for no good reason.
Third try: WHERE with UPDATE, DELETE and INSERT
merge into t_target o using cdc n on (o.ckey = n.ckey) when matched then update set cvalue = n.cvalue where n.operation = 'D' or decode(o.cvalue,n.cvalue,0,1) = 1 delete where n.operation = 'D' when not matched then insert (ckey, cvalue) values(n.ckey, n.cvalue) where n.operation != 'D'; 5 rows merged. (first time) 0 rows merged. (second time)
The trick is to only update rows with changes – or with the ‘D’ flag since you can’t delete a row if it hasn’t been updated first. Please note the DECODE
trick, which compares the columns correctly even if one or both values are NULL
.