Compare and sync tables: Generating the code

[Update 2015-01-04: I now get the primary key columns from ALL_CONS_COLUMNS, not ALL_IND_COLUMNS.]
[Update 2014-11-29: I simplified the code a bit and changed the C## prefix to Z##. Oracle uses the C## prefix for “common users” in CDBs, and I wanted to avoid any confusion.]

(Here is a list of my posts about comparing and synchronizing tables.)

To continue my presentation of the GROUP BY + MERGE technique, here is a query that generates the SQL to synchronize two tables. You just need to specify the source and the target; the source can be a local or remote table or view, or a query. The target must be a local table (or view!) with a primary key.

The starting point is the MERGE statement presented at the end of my previous post.

merge into T_TARGET o
using (
  select * from (
    select KEY_NUM, KEY_DATE, VAL_TS, VAL_STRING,
    count(*) over(partition by KEY_NUM, KEY_DATE)
      - sum(Z##CNT) Z##IUD_FLAG
    FROM (
      select KEY_NUM, KEY_DATE, VAL_TS, VAL_STRING,
        -1 Z##CNT
      from T_TARGET o
      union all
      select KEY_NUM, KEY_DATE, VAL_TS, VAL_STRING,
        1 Z##CNT
      from T_SOURCE n
    )
    group by KEY_NUM, KEY_DATE, VAL_TS, VAL_STRING
    having sum(Z##CNT) != 0
  )
  where Z##IUD_FLAG < 3
) n
on (o.KEY_NUM = n.KEY_NUM and o.KEY_DATE = n.KEY_DATE)
when matched then update
  set VAL_TS = n.VAL_TS, VAL_STRING = n.VAL_STRING
  delete where n.Z##IUD_FLAG = 2
when not matched then insert
  (KEY_NUM, KEY_DATE, VAL_TS, VAL_STRING)
  values(n.KEY_NUM, n.KEY_DATE, n.VAL_TS, n.VAL_STRING);

As I mentioned in Generating SQL with SQL templates, we need to identify the elements that change and represent them with placeholders:

  • ‘#OLD_TABLE#’: the target table name, with or without the preceding owner name
  • ‘#ALL_COLS#’: a list of all the non-virtual columns, taken from the target table
  • ‘#PK_COLS#’: a list of all the columns in the target primary key
  • ‘#NEW_SOURCE#’: the table, view – or query in parentheses – representing the source
  • ‘#ON_COLS#’: the primary key columns, used in the ON clause of the MERGE statement
  • ‘#SET_COLS#’: the columns not in the primary key, used in the SET clause
  • ‘#INSERT_COLS#’: a list of all the columns used in the VALUES clause of the INSERT

Using as input the “old” table name (and optionally the “old” owner name), we can get the column names from the ALL_TAB_COLS view. The primary key columns come from the ALL_CONS_COLUMNS view, once we get the name of the primary key index – which is available in the ALL_CONSTRAINTS view.

VARIABLE P_OLDOWNER varchar2(30)
VARIABLE P_OLDTABLE varchar2(30)
VARIABLE P_NEWSOURCE varchar2(256)
EXEC :P_OLDTABLE := 'T_TARGET';
EXEC :P_NEWSOURCE := 'T_SOURCE';

with INPUT as (
  select UPPER(NVL(:P_OLDOWNER, user)) OLD_OWNER,
  UPPER(:P_OLDTABLE) OLD_TABLE_NAME,
  :P_NEWSOURCE NEW_SOURCE,
  UPPER(NVL2(:P_OLDOWNER, :P_OLDOWNER || '.' || :P_OLDTABLE, :P_OLDTABLE)) OLD_TABLE
  from DUAL
)
, TAB_COLS as (
  select COLUMN_NAME, INTERNAL_COLUMN_ID COLUMN_ID
  from ALL_TAB_COLS, INPUT
  where (OWNER, TABLE_NAME) = ((OLD_OWNER, OLD_TABLE_NAME))
)
, KEY_COLS as (
  select COLUMN_NAME, POSITION
  from ALL_CONS_COLUMNS, INPUT
  where (OWNER, CONSTRAINT_NAME) = (
    select OWNER, CONSTRAINT_NAME from ALL_CONSTRAINTS
    where (OWNER, TABLE_NAME, CONSTRAINT_TYPE) = ((OLD_OWNER, OLD_TABLE_NAME, 'P'))
  )
)
, COL_LIST as (
  select LISTAGG(COLUMN_NAME,',') within group(order by COLUMN_ID) ALL_COLS,
  LISTAGG('n.' || COLUMN_NAME,',') within group(order by COLUMN_ID) INSERT_COLS
  from TAB_COLS
)
, PK_LIST as (
  select LISTAGG(COLUMN_NAME,',') within group(order by POSITION) PK_COLS,
  LISTAGG('o.'||COLUMN_NAME||'=n.'||COLUMN_NAME,' and ')
    within group(order by POSITION) ON_COLS
  from KEY_COLS
)
, SET_LIST as (
  select LISTAGG(COLUMN_NAME || '=n.'||COLUMN_NAME,',')
    within group(order by COLUMN_ID) SET_COLS
  from TAB_COLS
  where COLUMN_NAME not in (select COLUMN_NAME from KEY_COLS)
)
select MULTI_REPLACE.TO_VARC(
'merge into #OLD_TABLE# O
using (
  select * from (
    select #ALL_COLS#,
    COUNT(*) over(partition by #PK_COLS#)
      - SUM(Z##_CNT) Z##IUD_FLAG
    from (
      select #ALL_COLS#,
        -1 Z##_CNT
      from #OLD_TABLE# O
      union all
      select #ALL_COLS#,
        1 Z##_CNT
      from #NEW_SOURCE# N
    )
    group by #ALL_COLS#
    having SUM(Z##_CNT) != 0
  )
  where Z##IUD_FLAG < 3
) N
on (#ON_COLS#)
when matched then update
  set #SET_COLS#
  delete where N.Z##IUD_FLAG = 2
when not matched then insert
  (#ALL_COLS#)
  values(#INSERT_COLS#)',
SYS.ODCIVARCHAR2LIST('#OLD_TABLE#','#ALL_COLS#','#PK_COLS#',
  '#NEW_SOURCE#','#ON_COLS#','#SET_COLS#','#INSERT_COLS#'),
SYS.ODCIVARCHAR2LIST(  OLD_TABLE,    ALL_COLS,    PK_COLS,
    NEW_SOURCE,    ON_COLS,    SET_COLS,    INSERT_COLS)
) SQL_TEXT
from INPUT, COL_LIST, PK_LIST, SET_LIST;

Here is the result, ready to execute:

merge into T_TARGET o
using (
  select * from (
    select KEY_NUM,KEY_DATE,VAL_TS,VAL_STRING,
    count(*) over(partition by KEY_NUM,KEY_DATE)
      - sum(Z##_CNT) Z##IUD_FLAG
    FROM (
      select KEY_NUM,KEY_DATE,VAL_TS,VAL_STRING,
        -1 Z##_CNT
      from T_TARGET o
      union all
      select KEY_NUM,KEY_DATE,VAL_TS,VAL_STRING,
        1 Z##_CNT
      from T_SOURCE n
    )
    group by KEY_NUM,KEY_DATE,VAL_TS,VAL_STRING
    having sum(Z##_CNT) != 0
  )
  where Z##IUD_FLAG < 3
) n
on (o.KEY_NUM=n.KEY_NUM and o.KEY_DATE=n.KEY_DATE)
when matched then update
  set VAL_TS=n.VAL_TS,VAL_STRING=n.VAL_STRING
  delete where n.Z##IUD_FLAG = 2
when not matched then insert
  (KEY_NUM,KEY_DATE,VAL_TS,VAL_STRING)
  values(n.KEY_NUM,n.KEY_DATE,n.VAL_TS,n.VAL_STRING);

Limitations and disclaimer

There are some situations that this utility does not cover: the target table may not have a primary key, or it may be in a remote database; also, some data types such as LOBs cannot be “grouped by”. There are some possible workarounds that would require enhancements to what I’ve presented here.

Above all, please be careful! All this tool does is generate some SQL code; please review and test the generated code carefully before using it on real data! I cannot share any responsibility for changes you make to your data; once the SQL is generated, it is yours and yours alone. To use or not to use: that is your question.

Of course, if you find any bugs in the tool, please leave a reply with details so I can correct my mistakes. Thanks in advance!

Advertisement

8 thoughts on “Compare and sync tables: Generating the code

  1. I have been using the merge technique for a long time and generating sql statements to do this for “any table” in pl/sql. I have just finished converting to a group by method and a nice multi_replace query. Loving the performance!
    Thanks to you and others for your work on this.

    Any pointers for handling LOBs as you mentioned?

    • Hi Neil,

      I’m very glad this helped, and thanks for your kind words. As for handling LOBs, the only thing I have done is to create a view on the source table that contains the first 4000 bytes of the LOB as a varchar2. The target only contains the varchar2, not the LOB. If you want more than that, it would be best to say in detail what you want to do, how you mean to compare and whether your database character set is single byte or multibyte (UTF8). Also, are the source and target tables in the same database or not? Pulling LOBs over a dblink to compare is going to take a long time!

      • Thanks for getting back Stewart. Was away for a few days so just getting back to this now.

        Typically, I don’t actually generate a merge statement using your code, instead I generate a statement to insert (IUDs) into a “changes table”. This changes table is further processed along with the source table and target table providing a “field/update changes” table, with new_val ,old_val info for updates. Change info is extracted from this for a whole series of downstream systems. This technology has been a good lowest common denominator in term of the range of systems it feeds. (We also utilize more “modern” approaches)
        Anyway, It sounds like a lot of work but works well enough and thanks to your code and some other bits and pieces I have put together over the years – trivial to setup – using pl/sql to generate to generate the sql or more recently a variation on your query above.

        So one of the source tables now has has an NCLOB. I think my database is single byte but NCLOB means that it can store multibyte or at least another characterset?? (not too sure on that)

        I pull this nclob from a remote table using a global temporary table. (Not ideal I guess – but the blobs are just text and typically short enough – and the source table itself is not huge.).

        I was hoping for some magic that I could add that would treat this as any other column BUT I have started converting the NCLOB using a view for comparison later as you suggest. Ill probably just re-blobify the column on the way out.

        Obviously all this is against a backdrop of it being ideal to keep the code generic and one solution fits all (when dealing with the new and future BLOB requirements).

        Thanks again for sharing your work

  2. Hi Stew,

    I am so glad I found your site as I was using Tom Kyte’s GROUP BY method, but have done a small test on my data and see that your MERGE approach yields far better performance and utilises less resources on one of our largest tables in our application. Now I want to take this a step further by somehow looping through a list of tables stored in my schema that I wish to compare by repeatedly using your MERGE technique to automate and report the change.

    I was looking at using PL/SQL so I was wondering if you have a way to do this in PL/SQL. Another issue/request I have is to be able to compare data betwenn tables that have a unique key but not defined as a primary key. Not sure if using the unique key will yield the correct results in the compare.

    I look forward to your reply.

    Best regards,
    Bal

  3. Hi Bal,

    Thanks for the kind words.Starting with your second question, a unique key would work as long as the column also has a NOT NULL constraint. Oracle does NOT enforce uniqueness when the column in the constraint is NULL.

    You can certainly use PL/SQL to loop through a set of tables and do the MERGE. In fact, I have done this for two different applications in my shop. I may blog about this in the near future. Do these tables have foreign key constraints? During the time you are doing the MERGE, is there any other DML being applied either the source or the target tables?

  4. Hi Stew,

    Happy New Year to you. Apologies for not being in touch sooner to answer your question. OK, so I’ve managed to apply your MERGE approach to a set of tables in my application where the tables have a primary key constraint, however I have circa 70 tables that have unique key constraints. These unique keys are composite keys i.e nullable and not nullable columns. I am thinking that the above code will not work with this type of index on the tables?

    To answer your question, when performing the MERGE on the tables in my schema, I disable all triggers and FK constraints before the MERGE and enable them after the job is complete. I am not working on a production system, so there is no detrimental effect.

    I would appreciate some insight into the unique key possibilities with your tool before I decide upon another approach for the outstanding 70 tables.

    N.B. I also noticed that you changed the query that the merge uses to generated the differences. I tested new v’s old on the table with a primary key and known differences and the result is the same :-)

    I look forward to your reply.

    Cheers,
    Bal

    • Hi Bal,

      The changes to the SQL “template” make it a bit more concise, that is all. There should be no change in performance.

      The changes to the SQL generator allow it to generate correct SQL when the target is a view, not a table. If it works for you already, it will still work.

      I only use the primary key when I generate the MERGE statement.

      When the MERGE actually runs, there is no need for any constraint. However, if there are duplicate rows in the source, the statement will fail.

      Also, if there are any NULL values in any of the columns used in the ON clause, the comparison will not do what you want. Oracle does not consider two NULL values to be “equal”.

      Are you sure that in every unique constraint there is at least one column defined as NOT NULL?

      I just posted a more detailed answer to your question here: Sync tables: generate MERGE using Unique constraint

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s