My previous post got several replies about other ways to compare columns across rows. Here’s my take on all of them.
Test data (again)
create table t as with data as ( select level n from dual connect by level <= 3 ) , tests as ( select row_number() over (order by a.n, b.n) test_id, nullif(a.n, 3) a, nullif(b.n, 3) b, case a.n when b.n then 'Same' else 'Different' end status from data a, data b ) select test_id, tbl, n, status from tests unpivot include nulls (n for tbl in(A,B)); select * from t order by 1,2;
TEST_ID | TBL | N | STATUS |
---|---|---|---|
1 | A | 1 | Same |
1 | B | 1 | Same |
2 | A | 1 | Different |
2 | B | 2 | Different |
3 | A | 1 | Different |
3 | B | Different | |
4 | A | 2 | Different |
4 | B | 1 | Different |
5 | A | 2 | Same |
5 | B | 2 | Same |
6 | A | 2 | Different |
6 | B | Different | |
7 | A | Different | |
7 | B | 1 | Different |
8 | A | Different | |
8 | B | 2 | Different |
9 | A | Same | |
9 | B | Same |
My solution (again)
It looks like the most concise test for a difference is COUNT(N) BETWEEN 1 and COUNT(DISTINCT N).
select t.*, count(distinct n) over(partition by test_id) cnt_distinct, count(n) over(partition by test_id) cnt, case when count(n) over(partition by test_id) between 1 and count(distinct n) over(partition by test_id) then 'Different' else 'Same' end new_status from t order by 4,5,6,1,2;
TEST_ID | TBL | N | STATUS | CNT_DISTINCT | CNT | NEW_STATUS |
---|---|---|---|---|---|---|
3 | A | 1 | Different | 1 | 1 | Different |
3 | B | Different | 1 | 1 | Different | |
6 | A | 2 | Different | 1 | 1 | Different |
6 | B | Different | 1 | 1 | Different | |
7 | A | Different | 1 | 1 | Different | |
7 | B | 1 | Different | 1 | 1 | Different |
8 | A | Different | 1 | 1 | Different | |
8 | B | 2 | Different | 1 | 1 | Different |
2 | A | 1 | Different | 2 | 2 | Different |
2 | B | 2 | Different | 2 | 2 | Different |
4 | A | 2 | Different | 2 | 2 | Different |
4 | B | 1 | Different | 2 | 2 | Different |
9 | A | Same | 0 | 0 | Same | |
9 | B | Same | 0 | 0 | Same | |
1 | A | 1 | Same | 1 | 2 | Same |
1 | B | 1 | Same | 1 | 2 | Same |
5 | A | 2 | Same | 1 | 2 | Same |
5 | B | 2 | Same | 1 | 2 | Same |
Using the DUMP() function (Andrej Pashchenko)
Since the DUMP() function returns a non-null result in every case, we can simply test
COUNT( DISTINCT DUMP(n) ) OVER (PARTITION BY TEST_ID)
If the result is 1, N is the same, otherwise it is different.
This solution is more concise, but in my tests it ran 10 times slower. The DUMP() function chews up some CPU.
Using SYS_OP_MAP_NONNULL() (Tony Hasler)
As with DUMP(), we can test
COUNT( DISTINCT SYS_OP_MAP_NONNULL(n) ) OVER (PARTITION BY TEST_ID)
Again a concise solution. Surprisingly, it runs twice as slow as my solution – and despite one mention in the documentation I doubt that this function is supported.
Simplifying the comparison
Zahar Hilkevich suggested simply comparing the two counts:
select t.*, count(distinct n) over(partition by test_id) cnt_distinct, count(n) over(partition by test_id) cnt, case when count(n) over(partition by test_id) = count(distinct n) over(partition by test_id) then 'Different' else 'Same' end new_status from t order by 4,5,6,1,2;
TEST_ID | TBL | N | STATUS | CNT_DISTINCT | CNT | NEW_STATUS |
---|---|---|---|---|---|---|
3 | A | 1 | Different | 1 | 1 | Different |
3 | B | Different | 1 | 1 | Different | |
6 | A | 2 | Different | 1 | 1 | Different |
6 | B | Different | 1 | 1 | Different | |
7 | A | Different | 1 | 1 | Different | |
7 | B | 1 | Different | 1 | 1 | Different |
8 | A | Different | 1 | 1 | Different | |
8 | B | 2 | Different | 1 | 1 | Different |
2 | A | 1 | Different | 2 | 2 | Different |
2 | B | 2 | Different | 2 | 2 | Different |
4 | A | 2 | Different | 2 | 2 | Different |
4 | B | 1 | Different | 2 | 2 | Different |
9 | A | Same | 0 | 0 | Different | |
9 | B | Same | 0 | 0 | Different | |
1 | A | 1 | Same | 1 | 2 | Same |
1 | B | 1 | Same | 1 | 2 | Same |
5 | A | 2 | Same | 1 | 2 | Same |
5 | B | 2 | Same | 1 | 2 | Same |
Notice that in test case 9, when both values are null, Zahar’s test says they are “different”. This is not a problem in the context of displaying different values, since the output values will be null no matter what.
Conclusion
Thanks to those who mentioned other more concise solutions. Despite my taste for concision, I think the solution I proposed is correct and efficient. The choice is yours!
Stew, your interpretation of my solution was a bit inaccurate. My criteria for the rows to be different included count(*)=1 condition (along with GROUP BY clause), so for your new test table, my solution should look as following:
I agree, that after you simplified your solution for the new T table, it looks very compact; though for comparing emp vs emp2 tables it looked bulkier than the one I suggested. My solution was focused on showing the differences between tables (and this was your ultimate goal) while your new solution shows both all records with the matching status.