Compare and Sync without Primary Keys

I have written a lot about comparing and synchronizing tables. My examples always had both primary keys and non-key columns, so I could do updates along with inserts and deletes. What about the other tables? Here’s a technique that works for them.

(Here is a list of my posts about comparing and synchronizing tables.)

The Idea

MERGE joins together rows from the target table and the source. The ON clause contains the join conditions. When we do an UPDATE, we can only change columns that are not mentioned in the ON clause.

  • What if we don’t have any primary or unique key? We have nothing to join on.
  • What if we have a primary key but no non-key columns? We can join, but there is nothing to update.

What we can do is DELETE and INSERT. We just need to provide the source (in the USING clause) containing:

  • the rows to delete, identified by ROWID
  • and the rows to insert.

Now, suppose there are duplicate rows, say 2 rows in the source and 3 rows in the target. Should we delete 3 target rows and insert 2 source rows, or just delete 1 target row? I prefer just deleting the 1 row, without doing any extra work.

Test data

I want to create test cases with different numbers of duplicate rows in the target and source tables. Here is a table showing each test case, the number of rows in each table and the number of deletes or inserts I want to do.

COL SOURCE_COUNT TARGET_COUNT INSERT_COUNT DELETE_COUNT
1 0 1 0 1
2 0 2 0 2
3 1 0 1 0
4 1 1 0 0
5 1 2 0 1
6 2 0 2 0
7 2 1 1 0
8 2 2 0 0

 

As you can see, I want to wind up with “8 rows merged”, 4 inserts and 4 deletes.My test data is simple, but the code to generate it is complicated. Please don’t get hung up on this part! Here’s the code:

create table t_target(
  col number
);

create table t_source(
  col number
);

insert first
when type = 't' then into t_target values(col)
when type = 's' then into t_source values(col)
with nums as (
  select level-1 n from dual
  connect by level <= 3
)
, test_cases as (
  select row_number() over(order by s.n, t.n) col,
  s.n s, t.n t
  from nums s, nums t
  where s.n+t.n > 0
)
select 's' type, col
from test_cases,
table(cast(multiset(
  select 0 from dual connect by level <= s) as sys.odcinumberlist
))
where s > 0
union all
select 't' type, col
from test_cases,
table(cast(multiset(
  select 0 from dual connect by level <= t) as sys.odcinumberlist
))
where t > 0;
commit;

Step 1: get all the data and the target ROWIDs

“Old” rows are flagged with -1 and “new” rows with 1.

select col,
  -1 Z##FLAG, rowid Z##RID
from T_TARGET o
union all
select col,
  1 Z##FLAG, null
from T_SOURCE n
order by col, z##flag;
COL Z##FLAG Z##RID
1 -1 AAAX1hAAHAAAQPbA
2 -1 AAAX1hAAHAAAQPbA
2 -1 AAAX1hAAHAAAQPbA
3 1
4 -1 AAAX1hAAHAAAQPbA
4 1
5 -1 AAAX1hAAHAAAQPbA
5 -1 AAAX1hAAHAAAQPbA
5 1
6 1
6 1
7 -1 AAAX1hAAHAAAQPbA
7 1
7 1
8 -1 AAAX1hAAHAAAQPbA
8 -1 AAAX1hAAHAAAQPbA
8 1
8 1

 

Step 2: how many rows to insert or delete

Here I use analytics to compare the rows. I partition by all the columns.

  • From the previous step, Z##FLAG is 1 for new rows and -1 for old ones.
  • Z##NUM_ROWS is the sum of Z##FLAG over the entire partition. so it indicates the number of rows to insert or delete. If Z##NUM_ROWS = 0, nothing needs to be done.
  • Z##NEW is an incremental number assigned to new rows.
  • Z##OLD is an incremental number assigned to old rows.
select
sum(Z##FLAG) over(partition by col) Z##NUM_ROWS,
count(nullif(Z##FLAG,-1)) over(
  partition by col
  order by null rows unbounded preceding
) Z##NEW,
count(nullif(Z##FLAG,1)) over(
  partition by col
  order by null rows unbounded preceding
) Z##OLD,
a.* from (
  select col,
    -1 Z##FLAG, rowid Z##RID
  from T_TARGET o
  union all
  select col,
    1 Z##FLAG, null
  from T_SOURCE n
) a
order by col, z##flag;
Z##NUM_ROWS Z##NEW Z##OLD COL Z##FLAG Z##RID
-1 0 1 1 -1 AAAX1hAAHAAAQPbA
-2 0 1 2 -1 AAAX1hAAHAAAQPbA
-2 0 2 2 -1 AAAX1hAAHAAAQPbA
1 1 0 3 1
0 0 1 4 -1 AAAX1hAAHAAAQPbA
0 1 1 4 1
-1 0 1 5 -1 AAAX1hAAHAAAQPbA
-1 0 2 5 -1 AAAX1hAAHAAAQPbA
-1 1 2 5 1
2 1 0 6 1
2 2 0 6 1
1 0 1 7 -1 AAAX1hAAHAAAQPbA
1 1 1 7 1
1 2 1 7 1
0 0 1 8 -1 AAAX1hAAHAAAQPbA
0 0 2 8 -1 AAAX1hAAHAAAQPbA
0 1 2 8 1
0 2 2 8 1

 

Step 3: Keep only rows of interest

For each partition:

  • I only care when “old” and “new” have different numbers of rows:
    Z##NUM_ROWS != 0
  • If Z##NUM_ROWS is positive, I want only “new” rows, and if it’s negative I want only “old” rows:
    sign(Z##NUM_ROWS) = Z##FLAG
  • I only want enough rows to make the numbers even. For example, if Z##NUM_ROWS is 1 then I want the row where Z##NEW is 1, but not the row where Z##NEW is 2:
    abs(Z##NUM_ROWS) >= case sign(Z##NUM_ROWS) when 1 then Z##NEW else Z##OLD end
select * from (
  select
  sum(Z##FLAG) over(partition by col) Z##NUM_ROWS,
  count(nullif(Z##FLAG,-1)) over(
    partition by col
    order by null rows unbounded preceding
  ) Z##NEW,
  count(nullif(Z##FLAG,1)) over(
    partition by col
    order by null rows unbounded preceding
  ) Z##OLD,
  a.* from (
    select col,
      -1 Z##FLAG, rowid Z##RID
    from T_TARGET o
    union all
    select col,
      1 Z##FLAG, null
    from T_SOURCE n
  ) a
)
where Z##NUM_ROWS != 0
and sign(Z##NUM_ROWS) = Z##FLAG
and abs(Z##NUM_ROWS) >=
  case sign(Z##NUM_ROWS) when 1 then z##new else z##old end;
Z##NUM_ROWS Z##NEW Z##OLD COL Z##FLAG Z##RID
-1 0 1 1 -1 AAAX1hAAHAAAQPbA
-2 0 1 2 -1 AAAX1hAAHAAAQPbA
-2 0 2 2 -1 AAAX1hAAHAAAQPbA
1 1 0 3 1
-1 0 1 5 -1 AAAX1hAAHAAAQPbA
2 1 0 6 1
2 2 0 6 1
1 1 0 7 1

 

Step 4: Use MERGE to delete old and insert new

This time I join Z##RID to the “old” ROWID.

  • When Z##RID is null there will be no match so the row will be inserted.
  • When there is a match, I update a column, any column, because I can’t delete a row unless I have updated it first.
  • Then I delete every row I have updated.

I should explain about the hint “use_nl(o)”. This tells Oracle to use a “nested loop” when joining the lines from step 3 to the target table. If I leave out the hint, Oracle will likely do a full scan. With the hint, Oracle will access the target table “BY USER ROWID”. Use the hint only when there are few rows to change, say around 1%. If you’re not sure, it might be more prudent to remove the hint.

merge /*+ use_nl(o) */into T_TARGET o
  using (
  select * from (
    select
    sum(Z##FLAG) over(partition by col) Z##NUM_ROWS,
    count(nullif(Z##FLAG,-1)) over(
      partition by col
      order by null rows unbounded preceding
    ) Z##NEW,
    count(nullif(Z##FLAG,1)) over(
      partition by col
      order by null rows unbounded preceding
    ) Z##OLD,
    a.* from (
      select col,
        -1 Z##FLAG, rowid Z##RID
      from T_TARGET o
      union all
      select col,
        1 Z##FLAG, null
      from T_SOURCE n
    ) a
  )
  where Z##NUM_ROWS != 0
  and sign(Z##NUM_ROWS) = Z##FLAG
  and abs(Z##NUM_ROWS) >=
    case sign(Z##NUM_ROWS) when 1 then z##new else z##old end
) n
on (o.ROWID = n.Z##RID)
when matched then update set col = n.col
delete where 1=1
when not matched then insert (col)
  values(n.col);

8 rows merged. (the first time)
0 rows merged. (the second time)

Step 5: Generate the MERGE

In the post Compare and sync tables: Generating the code, I generated the code for the GROUP BY and MERGE method. Now I’ll generate the code for this new method. Please see Generating SQL with SQL templates for an explanation of the MULTI_REPLACE package.

VARIABLE P_OLDOWNER varchar2(30)
VARIABLE P_OLDTABLE varchar2(30)
VARIABLE P_NEWSOURCE varchar2(256)
EXEC :P_OLDTABLE := 'T_TARGET';
EXEC :P_NEWSOURCE := 'T_SOURCE';

with INPUT as (
  select UPPER(NVL(:P_OLDOWNER, user)) OLD_OWNER,
  UPPER(:P_OLDTABLE) OLD_TABLE_NAME,
  :P_NEWSOURCE NEW_SOURCE,
  UPPER(NVL2(:P_OLDOWNER, :P_OLDOWNER || '.' || :P_OLDTABLE, :P_OLDTABLE)) OLD_TABLE
  from DUAL
)
, TAB_COLS as (
  select COLUMN_NAME, INTERNAL_COLUMN_ID COLUMN_ID
  from ALL_TAB_COLS, INPUT
  where (OWNER, TABLE_NAME) = ((OLD_OWNER, OLD_TABLE_NAME))
)
, COL_LIST as (
  select LISTAGG(COLUMN_NAME,',') within group(order by COLUMN_ID) ALL_COLS,
  LISTAGG('n.' || COLUMN_NAME,',') within group(order by COLUMN_ID) INSERT_COLS,
  min(COLUMN_NAME) COLUMN_NAME
  from TAB_COLS
)
select MULTI_REPLACE.TO_VARC(
'merge /*+ use_nl(o) */into #OLD_TABLE# o
  using (
  select * from (
    select 
    sum(Z##FLAG) over(partition by #ALL_COLS#) Z##NUM_ROWS,
    count(nullif(Z##FLAG,-1)) over(
      partition by #ALL_COLS#
      order by null rows unbounded preceding
    ) Z##NEW,
    count(nullif(Z##FLAG,1)) over(
      partition by #ALL_COLS#
      order by null rows unbounded preceding
    ) Z##OLD,
    a.* from (
      select #ALL_COLS#,
        -1 Z##FLAG, rowid Z##RID
      from #OLD_TABLE# o
      union all
      select #ALL_COLS#,
        1 Z##FLAG, null
      from #NEW_SOURCE# n
    ) a
  )
  where Z##NUM_ROWS != 0
  and sign(Z##NUM_ROWS) = Z##FLAG
  and abs(Z##NUM_ROWS) >=
    case sign(Z##NUM_ROWS) when 1 then Z##NEW else Z##OLD end
) n
on (o.ROWID = n.Z##RID)
when matched then update set #COLUMN_NAME# = n.#COLUMN_NAME#
delete where 1=1
when not matched then insert
  (#ALL_COLS#)
  values(#INSERT_COLS#);',
SYS.ODCIVARCHAR2LIST(
  '#OLD_TABLE#','#ALL_COLS#','#COLUMN_NAME#','#NEW_SOURCE#','#INSERT_COLS#'
),
SYS.ODCIVARCHAR2LIST(
    OLD_TABLE,    ALL_COLS,    COLUMN_NAME,    NEW_SOURCE,    INSERT_COLS
)
) SQL_TEXT
from INPUT, COL_LIST;

Conclusion

This method of synchronizing tables works with any combination of primary key and non-key fields. Most of the time, you will have tables with both primary keys and non-key fields; for those tables, the GROUP BY method is more efficient. For the others, you now have a solution so you’re all set.

Advertisements

2 thoughts on “Compare and Sync without Primary Keys

  1. Hi Steve, I have similar requirement where I need to compare two tables without primary key column. The changed records are required to be copied to another table with a flag ‘I’ for insert, ‘D’ for Deletes and ‘U’ for updates. Can you please suggest if this can be achieved? Many thanks in advance

    • Hello RJ,

      I don’t understand your requirement for a simple reason: when there is no primary or unique key, there is no way to UPDATE a row. How do you identify the row to be updated?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s