More on Comparing Columns across Rows

My previous post got several replies about other ways to compare columns across rows. Here’s my take on all of them.

Test data (again)

create table t as
with data as (
  select level n from dual connect by level <= 3
)
, tests as (
  select row_number() over (order by a.n, b.n) test_id,
  nullif(a.n, 3) a, 
  nullif(b.n, 3) b,
  case a.n when b.n then 'Same' else 'Different' end status
  from data a, data b
)
select test_id, tbl, n, status
from tests
unpivot include nulls (n for tbl in(A,B));

select * from t order by 1,2;
TEST_ID TBL N STATUS
1 A 1 Same
1 B 1 Same
2 A 1 Different
2 B 2 Different
3 A 1 Different
3 B Different
4 A 2 Different
4 B 1 Different
5 A 2 Same
5 B 2 Same
6 A 2 Different
6 B Different
7 A Different
7 B 1 Different
8 A Different
8 B 2 Different
9 A Same
9 B Same

 

My solution (again)

It looks like the most concise test for a difference is COUNT(N) BETWEEN 1 and COUNT(DISTINCT N).

select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n) over(partition by test_id)
  between 1 and count(distinct n) over(partition by test_id)
  then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Same
9 B Same 0 0 Same
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same

 

Using the DUMP() function (Andrej Pashchenko)

Since the DUMP() function returns a non-null result in every case, we can simply test

COUNT( DISTINCT DUMP(n) ) OVER (PARTITION BY TEST_ID)

If the result is 1, N is the same, otherwise it is different.

This solution is more concise, but in my tests it ran 10 times slower. The DUMP() function chews up some CPU.

Using SYS_OP_MAP_NONNULL() (Tony Hasler)

As with DUMP(), we can test

COUNT( DISTINCT SYS_OP_MAP_NONNULL(n) ) OVER (PARTITION BY TEST_ID)

Again a concise solution. Surprisingly, it runs twice as slow as my solution – and despite one mention in the documentation I doubt that this function is supported.

Simplifying the comparison

Zahar Hilkevich suggested simply comparing the two counts:

select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n)          over(partition by test_id)
        = count(distinct n) over(partition by test_id)
  then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Different
9 B Same 0 0 Different
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same

 

Notice that in test case 9, when both values are null, Zahar’s test says they are “different”. This is not a problem in the context of displaying different values, since the output values will be null no matter what.

Conclusion

Thanks to those who mentioned other more concise solutions. Despite my taste for concision, I think the solution I proposed is correct and efficient. The choice is yours!

Advertisements

One thought on “More on Comparing Columns across Rows

  1. Stew, your interpretation of my solution was a bit inaccurate. My criteria for the rows to be different included count(*)=1 condition (along with GROUP BY clause), so for your new test table, my solution should look as following:

    select t.test_id, LISTAGG(tbl,',') WITHIN GROUP(ORDER BY tbl) tbl, N, status, 
    count(distinct n) over(partition by test_id) cnt_distinct,
    count(n) over(partition by test_id) cnt,
    case when count(n)          over(partition by test_id)
            = count(distinct n) over(partition by test_id)
            and count(*)=1
      then 'Different' else 'Same' end new_status, count(*)
    from t
     group by t.test_id, N, status 
    order by 4,5,6,1,2
    

    I agree, that after you simplified your solution for the new T table, it looks very compact; though for comparing emp vs emp2 tables it looked bulkier than the one I suggested. My solution was focused on showing the differences between tables (and this was your ultimate goal) while your new solution shows both all records with the matching status.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s