# More on Comparing Columns across Rows

My previous post got several replies about other ways to compare columns across rows. Here’s my take on all of them.

#### Test data (again)

```create table t as
with data as (
select level n from dual connect by level <= 3
)
, tests as (
select row_number() over (order by a.n, b.n) test_id,
nullif(a.n, 3) a,
nullif(b.n, 3) b,
case a.n when b.n then 'Same' else 'Different' end status
from data a, data b
)
select test_id, tbl, n, status
from tests
unpivot include nulls (n for tbl in(A,B));

select * from t order by 1,2;```
TEST_ID TBL N STATUS
1 A 1 Same
1 B 1 Same
2 A 1 Different
2 B 2 Different
3 A 1 Different
3 B Different
4 A 2 Different
4 B 1 Different
5 A 2 Same
5 B 2 Same
6 A 2 Different
6 B Different
7 A Different
7 B 1 Different
8 A Different
8 B 2 Different
9 A Same
9 B Same

#### My solution (again)

It looks like the most concise test for a difference is COUNT(N) BETWEEN 1 and COUNT(DISTINCT N).

```select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n) over(partition by test_id)
between 1 and count(distinct n) over(partition by test_id)
then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;```
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Same
9 B Same 0 0 Same
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same

#### Using the DUMP() function (Andrej Pashchenko)

Since the DUMP() function returns a non-null result in every case, we can simply test

`COUNT( DISTINCT DUMP(n) ) OVER (PARTITION BY TEST_ID)`

If the result is 1, N is the same, otherwise it is different.

This solution is more concise, but in my tests it ran 10 times slower. The DUMP() function chews up some CPU.

#### Using SYS_OP_MAP_NONNULL() (Tony Hasler)

As with DUMP(), we can test

`COUNT( DISTINCT SYS_OP_MAP_NONNULL(n) ) OVER (PARTITION BY TEST_ID)`

Again a concise solution. Surprisingly, it runs twice as slow as my solution – and despite one mention in the documentation I doubt that this function is supported.

#### Simplifying the comparison

Zahar Hilkevich suggested simply comparing the two counts:

```select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n)          over(partition by test_id)
= count(distinct n) over(partition by test_id)
then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;```
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Different
9 B Same 0 0 Different
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same

Notice that in test case 9, when both values are null, Zahar’s test says they are “different”. This is not a problem in the context of displaying different values, since the output values will be null no matter what.

#### Conclusion

Thanks to those who mentioned other more concise solutions. Despite my taste for concision, I think the solution I proposed is correct and efficient. The choice is yours!

Advertisements

## One thought on “More on Comparing Columns across Rows”

1. Stew, your interpretation of my solution was a bit inaccurate. My criteria for the rows to be different included count(*)=1 condition (along with GROUP BY clause), so for your new test table, my solution should look as following:

```select t.test_id, LISTAGG(tbl,',') WITHIN GROUP(ORDER BY tbl) tbl, N, status,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n)          over(partition by test_id)
= count(distinct n) over(partition by test_id)
and count(*)=1
then 'Different' else 'Same' end new_status, count(*)
from t
group by t.test_id, N, status
order by 4,5,6,1,2
```

I agree, that after you simplified your solution for the new T table, it looks very compact; though for comparing emp vs emp2 tables it looked bulkier than the one I suggested. My solution was focused on showing the differences between tables (and this was your ultimate goal) while your new solution shows both all records with the matching status.