# Actually Seeing the Differences

When comparing data between two tables, it’s one thing to query the differences and another thing to actually see them.

I have two tables EMP and EMP2. Here are the rows that are different. Quick, what columns have changed?

EMPNO TBL ROW_CNT ENAME JOB MGR SAL COMM
7369 EMP 2 SMITH CLERK 7902 800
7369 EMP2 2 SMITE CLERK 7902 800
7499 EMP 2 ALLEN SALESMAN 7698 1600 300
7499 EMP2 2 ALLEN SALESGUY 7698 1600 300
7521 EMP 2 WARD SALESMAN 7698 1250 500
7521 EMP2 2 WARD SALESMAN 7788 1250 500
7654 EMP 2 MARTIN SALESMAN 7698 1250 1400
7654 EMP2 2 MARTIN SALESMAN 7698 1750 1400
7698 EMP 2 BLAKE MANAGER 7839 2850
7698 EMP2 2 BLAKE MANAGER 7839 2850 1000
7788 EMP2 1 SCOTT ANALYST 7566 3000
7902 EMP 1 FORD ANALYST 7566 3000

I thought so. Now suppose I blank out the columns that are the same in both tables?

EMPNO TBL ROW_CNT ENAME JOB MGR SAL COMM
7369 EMP 2 SMITH
7369 EMP2 2 SMITE
7499 EMP 2 SALESMAN
7499 EMP2 2 SALESGUY
7521 EMP 2 7698
7521 EMP2 2 7788
7654 EMP 2 1250
7654 EMP2 2 1750
7698 EMP 2
7698 EMP2 2 1000
7788 EMP2 1 SCOTT ANALYST 7566 3000
7902 EMP 1 FORD ANALYST 7566 3000

That’s better. Now, how can I do that?

#### Comparing columns across rows

I need to compare two columns in two different rows, and I need the result of the comparison in each row. That sounds like a job for analytic functions. It would be hard to use LAG() or LEAD() because one row would need LEAD() and the other would need LAG(). I finally came up with a way to use COUNT().

For testing, I created a little table T:

TEST_ID TBL N STATUS
1 A 1 Same
1 B 1 Same
2 A 1 Different
2 B 2 Different
3 A 1 Different
3 B Different
4 A 2 Different
4 B 1 Different
5 A 2 Same
5 B 2 Same
6 A 2 Different
6 B Different
7 A Different
7 B 1 Different
8 A Different
8 B 2 Different
9 A Same
9 B Same

If I use COUNT(DISTINCT N) I should get either 1 (same values) or 2 (different values) and I’m done: wrong! When I count N or DISTINCT N, null values don’t count. So I thought of comparing COUNT(N) and COUNT(DISTINCT N).

```select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt
from t
order by 4,5,6,1,2;```
TEST_ID TBL N STATUS CNT_DISTINCT CNT
3 A 1 Different 1 1
3 B Different 1 1
6 A 2 Different 1 1
6 B Different 1 1
7 A Different 1 1
7 B 1 Different 1 1
8 A Different 1 1
8 B 2 Different 1 1
2 A 1 Different 2 2
2 B 2 Different 2 2
4 A 2 Different 2 2
4 B 1 Different 2 2
9 A Same 0 0
9 B Same 0 0
1 A 1 Same 1 2
1 B 1 Same 1 2
5 A 2 Same 1 2
5 B 2 Same 1 2

It looks like the most concise test for a difference is COUNT(N) BETWEEN 1 and COUNT(DISTINCT N).

```select t.*,
count(distinct n) over(partition by test_id) cnt_distinct,
count(n) over(partition by test_id) cnt,
case when count(n) over(partition by test_id)
between 1 and count(distinct n) over(partition by test_id)
then 'Different' else 'Same' end new_status
from t
order by 4,5,6,1,2;```
TEST_ID TBL N STATUS CNT_DISTINCT CNT NEW_STATUS
3 A 1 Different 1 1 Different
3 B Different 1 1 Different
6 A 2 Different 1 1 Different
6 B Different 1 1 Different
7 A Different 1 1 Different
7 B 1 Different 1 1 Different
8 A Different 1 1 Different
8 B 2 Different 1 1 Different
2 A 1 Different 2 2 Different
2 B 2 Different 2 2 Different
4 A 2 Different 2 2 Different
4 B 1 Different 2 2 Different
9 A Same 0 0 Same
9 B Same 0 0 Same
1 A 1 Same 1 2 Same
1 B 1 Same 1 2 Same
5 A 2 Same 1 2 Same
5 B 2 Same 1 2 Same

Finally, I apply this technique to each non-PK column in my EMP comparison and I get the desired result.

```select EMPNO,
case NEW_CNT when 1 then 'EMP2' else 'EMP' end tbl,
ROW_CNT,
case when count(ENAME) over(partition by EMPNO)
between 1 and count(distinct ENAME) over(partition by EMPNO)
then ENAME end ENAME,
case when count(JOB) over(partition by EMPNO)
between 1 and count(distinct JOB) over(partition by EMPNO)
then JOB end JOB,
case when count(MGR) over(partition by EMPNO)
between 1 and count(distinct MGR) over(partition by EMPNO)
then MGR end MGR,
case when count(SAL) over(partition by EMPNO)
between 1 and count(distinct SAL) over(partition by EMPNO)
then SAL end SAL,
case when count(COMM) over(partition by EMPNO)
between 1 and count(distinct COMM) over(partition by EMPNO)
then COMM end COMM
FROM (
select
EMPNO, ENAME, JOB, MGR, SAL, COMM,
sum(NEW_CNT) NEW_CNT, count(*) over(partition by EMPNO) ROW_CNT
FROM (
select
EMPNO, ENAME, JOB, MGR, SAL, COMM,
-1 NEW_CNT
from EMP O
union all
select
EMPNO, ENAME, JOB, MGR, SAL, COMM,
1 NEW_CNT
from emp2 N
)
group by
EMPNO, ENAME, JOB, MGR, SAL, COMM
having sum(NEW_CNT) != 0
)
order by 1, 2, new_cnt;```

#### Conclusion

By using two COUNT() analytic functions, I can tell whether two columns in two different rows are the same or not, considering two NULLs to be “the same”. This allows me to compare rows, then to compare columns and blank out all but the true differences.

## 7 thoughts on “Actually Seeing the Differences”

1. This is a nice technique. I wonder, wouldn’t it be easier to simply compare COUNT(DISTINCT ) to 1 and 2. If 1 then – ‘the same’ if 2 – then different.

• Hi Zahar,

That doesn’t work when one or both of the values is NULL. I did explain that in my post…

Best regards,
Stew

2. It seems that BETWEEN is not really needed in your query, simple match of COUNT to COUNT(DISTINCT …) is sufficient:

``` WITH x AS ( SELECT empno, ename, job, mgr, sal, comm, 'EMP' tbl FROM emp UNION ALL SELECT empno, ename, job, mgr, sal, comm, 'EMP2' tbl FROM emp2 ) SELECT empno, MAX(tbl) tbl, COUNT(*) OVER(PARTITION BY empno) row_cnt, CASE WHEN COUNT(DISTINCT ename) OVER(PARTITION BY empno)=COUNT(ename) OVER(PARTITION BY empno) THEN ename END ename, CASE WHEN COUNT(DISTINCT job) OVER(PARTITION BY empno)=COUNT(job) OVER(PARTITION BY empno) THEN job END job, CASE WHEN COUNT(DISTINCT mgr) OVER(PARTITION BY empno)=COUNT(mgr) OVER(PARTITION BY empno) THEN mgr END mgr, CASE WHEN COUNT(DISTINCT sal) OVER(PARTITION BY empno)=COUNT(sal) OVER(PARTITION BY empno) THEN sal END sal, CASE WHEN COUNT(DISTINCT comm) OVER(PARTITION BY empno)=COUNT(comm) OVER(PARTITION BY empno) THEN comm END comm FROM x GROUP BY empno, ename, job, mgr, sal, comm HAVING COUNT(*)=1 ORDER BY 1, 2 ```

• Zahar,

Yes, I see your point. If both values are NULL, your WHEN condition is met and mine is not, but that doesn’t matter because both values are NULL…

Best regards,
Stew

3. ```WITH x AS (
SELECT  empno, ename, job, mgr, sal, comm, 'EMP' tbl
FROM emp
UNION ALL
SELECT empno, ename, job, mgr, sal, comm, 'EMP2' tbl
FROM emp2
)
SELECT empno, MAX(tbl) tbl,
COUNT(*) OVER(PARTITION BY empno) row_cnt,
CASE WHEN COUNT(DISTINCT ename) OVER(PARTITION BY empno)=COUNT(ename) OVER(PARTITION BY empno)
THEN ename
END ename,
CASE WHEN COUNT(DISTINCT job) OVER(PARTITION BY empno)=COUNT(job) OVER(PARTITION BY empno)
THEN job
END job,
CASE WHEN COUNT(DISTINCT mgr) OVER(PARTITION BY empno)=COUNT(mgr) OVER(PARTITION BY empno)
THEN mgr
END mgr,
CASE WHEN COUNT(DISTINCT sal) OVER(PARTITION BY empno)=COUNT(sal) OVER(PARTITION BY empno)
THEN sal
END sal,
CASE WHEN COUNT(DISTINCT comm) OVER(PARTITION BY empno)=COUNT(comm) OVER(PARTITION BY empno)
THEN comm
END comm
FROM x
GROUP BY empno, ename, job, mgr, sal, comm
HAVING COUNT(*)=1
ORDER BY 1, 2
```
4. tonyhasler says:

Hi Stew,

I don’t know if this entry (https://docs.oracle.com/database/121/DWHSG/basicmv.htm#DWHSG8217) counts as official documentation for SYS_OP_MAP_NONNULL but if it does an easier to read variant becomes “supported”;

DECODE (
MAX (SYS_OP_MAP_NONNULL (n)) OVER (PARTITION BY test_id),
MIN (SYS_OP_MAP_NONNULL (n)) OVER (PARTITION BY test_id),
‘Same’,
‘Different’)

• Hi Tony,

If SYS_OP_MAP_NONNULL were supported, we could just do

COUNT(DISTINCT SYS_OP_MAP_NONNULL(n)) OVER (PARTITION BY test_id)

which would be the most concise solution, though strangely the solution I posted is twice as fast in the few tests I did: it took less CPU.

Best regards,
Stew