# My #OUGN19 Presentations

As promised, here are the two presentations I gave at the OUGN 2019 Boat conference. I had a great time with old friends and new, and managed to have discussions about politics without anyone getting upset! Hope to see some of you again at OUG Ireland!

https://www.slideshare.net/stewashton/json-in-18c-and-19c

https://www.slideshare.net/stewashton/make-your-data-dance

# PIVOT Function with Totals

I have updated the ADVANCED_PIVOT function to support row and/or column totals. It’s neat, but you need to know what you’re doing.

#### No totals

Let’s start from an example that has no totals. If we didn’t want to pivot, we could do a simple GROUP BY:

```SQL> select dname, job, sum(sal)
2  from emp join dept using(deptno)
3  group by dname, job
4  order by 1,2;

DNAME          JOB         SUM(SAL)
-------------- --------- ----------
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
RESEARCH       ANALYST         3000
RESEARCH       CLERK           1900
RESEARCH       MANAGER         2975
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600```

If we do want to pivot, the PIVOT clause will do the aggregation for us:

```SQL> var rc refcursor

SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select dname, job, sal
5  from emp join dept using(deptno)
6  )@',
7  p_aggfuncs => 'sum(sal)',
8  p_pivotfor => 'job'
9  );
10  end;
11  /

print :rc

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000
RESEARCH             3000       1900       2975
SALES                            950       2850                  5600```
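
For comparison, here is roughly what a hand-written PIVOT for the same result could look like. This is only a sketch: it assumes the usual EMP and DEPT demo tables, and its shape follows the query template in the function code from the post on the improved function, with the IN list typed by hand instead of generated from the data.

```sql
-- Hand-written counterpart of the call above (assumes the standard EMP and DEPT demo tables).
-- Every JOB value must be listed in advance; the function builds this list from the data.
select *
from (
  select dname, job, sal
  from emp join dept using(deptno)
)
pivot(
  sum(sal) for (job) in (
    ('ANALYST')   as "ANALYST",
    ('CLERK')     as "CLERK",
    ('MANAGER')   as "MANAGER",
    ('PRESIDENT') as "PRESIDENT",
    ('SALESMAN')  as "SALESMAN"
  )
)
order by 1, 2, 3;
```

If a new JOB value shows up in the data, this static query silently ignores it, which is the main reason to generate the IN list dynamically.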

#### Horizontal total

Now suppose we want an extra column that shows the total for each row. Is that a “row total” or a “column total”? To avoid confusion, I’m going to say “horizontal total” when we get an extra column, and “vertical total” when we get an extra row.

The first thing I realized is that, if we want any totals, we have to compute them beforehand: the PIVOT clause will only be used to create the pivot columns and move the data to the right places. Here is a query to do that:

```SQL> select dname,
2    job,
3    sum(sal) sal
4  from emp join dept using(deptno)
5  group by dname, rollup(job);

DNAME          JOB              SAL
-------------- --------- ----------
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
SALES                          9400
RESEARCH       CLERK           1900
RESEARCH       ANALYST         3000
RESEARCH       MANAGER         2975
RESEARCH                       7875
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
ACCOUNTING                     8750```

Notice the use of ROLLUP(JOB) to get totals for all the jobs, but always at the DNAME level.
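
As a cross-check, `group by dname, rollup(job)` is shorthand for two grouping sets: the detail level (DNAME, JOB) and the subtotal level (DNAME). A sketch of the longhand form, again assuming the standard EMP and DEPT demo tables:

```sql
-- Same grouping levels as "group by dname, rollup(job)", written out explicitly.
-- There is no () grouping set, so no grand total row is produced.
select dname,
  job,
  sum(sal) sal
from emp join dept using(deptno)
group by grouping sets ((dname, job), (dname));
```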

Notice also that the total rows have a JOB value of NULL. PIVOT will not allow us to pivot data based on a NULL column value. So we use the GROUPING() function to find out when there is a total of all jobs, and replace the NULL with a literal.

```SQL> select dname,
2    decode(grouping(job),1,'(Total)',job) job,
3    sum(sal) sal
4  from emp join dept using(deptno)
5  group by dname, rollup(job);

DNAME          JOB              SAL
-------------- --------- ----------
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
SALES          (Total)         9400
RESEARCH       CLERK           1900
RESEARCH       ANALYST         3000
RESEARCH       MANAGER         2975
RESEARCH       (Total)         7875
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
ACCOUNTING     (Total)         8750```

Notice I always put the literal ‘(Total)’ before the value. If the value is not a string, Oracle will implicitly convert it. If we said `decode(grouping(<column>),0,<column>,'(Total)')` and <column> was a date or a number, we would get a conversion exception.
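
The reason is that DECODE takes its return datatype from the first result argument. Here is a minimal sketch of the difference, using a small inline data set (`emp_sample` is a made-up name for illustration) so that the grouped column is a number:

```sql
-- emp_sample is hypothetical inline data; DEPTNO is a NUMBER here.
with emp_sample(deptno, sal) as (
  select 10, 1300 from dual union all
  select 10, 2450 from dual union all
  select 20, 3000 from dual
)
-- The literal comes first, so DECODE returns a string
-- and DEPTNO is implicitly converted to a string.
select decode(grouping(deptno), 1, '(Total)', deptno) deptno,
  sum(sal) sal
from emp_sample
group by rollup(deptno);

-- With the arguments reversed, the first result is a NUMBER, so Oracle
-- would have to convert '(Total)' to a number. That variant is expected
-- to fail with a conversion error (typically ORA-01722: invalid number):
--   decode(grouping(deptno), 0, deptno, '(Total)')
```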

Now that we have the totals with a valid label, let’s plug that into my function:

```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select dname,
5    decode(grouping(job),1,'(Total)',job) job,
6    sum(sal) sal
7  from emp join dept using(deptno)
8  group by dname, rollup(job)
9  )@',
10  p_aggfuncs => 'sum(sal)',
11  p_pivotfor => 'job'
12  );
13  end;
14  /

DNAME             (Total)    ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING           8750                  1300       2450       5000
RESEARCH             7875       3000       1900       2975
SALES                9400                   950       2850                  5600```

Oops! The Total column is in the wrong place. To fix that, I added an optional parameter `p_total_label`. Whenever we order by a column, I will make sure that the content of that parameter goes last. Let’s try it:

```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select dname,
5    decode(grouping(job),1,'(Total)',job) job,
6    sum(sal) sal
7  from emp join dept using(deptno)
8  group by dname, rollup(job)
9  )@',
10  p_aggfuncs => 'sum(sal)',
11  p_pivotfor => 'job',
12  p_total_label => '(Total)'
13  );
14  end;
15  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN    (Total)
-------------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000                  8750
RESEARCH             3000       1900       2975                             7875
SALES                            950       2850                  5600       9400```

Warnings:

1. Make sure you use the same literal value for `p_total_label` and within the `p_source` query! The function will never try to check for coherence, so it’s up to you.
2. To use something like ‘(Total)’ instead of NULL, dates and numbers must be converted to strings. If that happens, the function can only order by the string value, not the original date or number value!

#### Vertical total

Let’s try to ROLLUP by DNAME rather than JOB, in order to get an extra row with the totals for each job.

```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select decode(grouping(dname),1,'(Total)',dname) dname,
5    job,
6    sum(sal) sal
7  from emp join dept using(deptno)
8  group by rollup(dname), job
9  )@',
10  p_aggfuncs => 'sum(sal)',
11  p_pivotfor => 'job',
12  p_total_label => '(Total)'
13  );
14  end;
15  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
(Total)              3000       4150       8275       5000       5600
ACCOUNTING                      1300       2450       5000
RESEARCH             3000       1900       2975
SALES                            950       2850                  5600```

We still have a problem with sorting the rows. The function has no idea that the DNAME column has a ‘(Total)’ row that needs to go last. To fix that, I added another optional parameter `p_ordercol`. The function will order the final query by the column listed here, making sure that the total line goes last.

```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select decode(grouping(dname),1,'(Total)',dname) dname,
5    job,
6    sum(sal) sal
7  from emp join dept using(deptno)
8  group by rollup(dname), job
9  )@',
10  p_aggfuncs => 'sum(sal)',
11  p_pivotfor => 'job',
12  p_total_label => '(Total)',
13  p_ordercol => 'dname'
14  );
15  end;
16  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000
RESEARCH             3000       1900       2975
SALES                            950       2850                  5600
(Total)              3000       4150       8275       5000       5600```
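
The function code in the post on the improved function shows how the total line ends up last: it wraps each ordering column in `NULLIF(column, label)`. The label becomes NULL for sorting purposes, and Oracle sorts NULLs last in ascending order by default. A standalone sketch of that trick, using hypothetical inline data in place of the pivoted result:

```sql
-- Hypothetical inline data standing in for the pivoted rows.
with t(dname, sal) as (
  select 'ACCOUNTING', 8750 from dual union all
  select '(Total)',   26025 from dual union all
  select 'SALES',      9400 from dual union all
  select 'RESEARCH',   7875 from dual
)
select dname, sal
from t
-- NULLIF turns the label into NULL; NULLs sort last in ascending order.
order by nullif(dname, '(Total)');
```

A plain `order by dname` would put '(Total)' first, because the opening parenthesis sorts before the letters.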

#### Horizontal and vertical totals

To get both totals, we just need to change the source query. We could combine two ROLLUPs, but in this case the CUBE extension to GROUP BY is more concise:

```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select decode(grouping(dname),1,'(Total)',dname) dname,
5    decode(grouping(job),1,'(Total)',job) job,
6    sum(sal) sal
7  from emp join dept using(deptno)
8  group by cube(dname,job)
9  )@',
10  p_aggfuncs => 'sum(sal)',
11  p_pivotfor => 'job',
12  p_total_label => '(Total)',
13  p_ordercol => 'dname'
14  );
15  end;
16  /

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN    (Total)
-------------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000                  8750
RESEARCH             3000       1900       2975                             7875
SALES                            950       2850                  5600       9400
(Total)              3000       4150       8275       5000       5600      26025```
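
For reference, here is a sketch of the two-ROLLUP alternative mentioned above, assuming the same EMP and DEPT demo tables. Combining `rollup(dname)` with `rollup(job)` produces the grouping sets (DNAME, JOB), (DNAME), (JOB) and (), which are the same four levels that `cube(dname, job)` generates.

```sql
-- Source query written with two ROLLUPs instead of CUBE.
-- Each ROLLUP contributes "column" and "all values"; the combinations
-- give detail rows, both kinds of subtotal, and the grand total.
select decode(grouping(dname),1,'(Total)',dname) dname,
  decode(grouping(job),1,'(Total)',job) job,
  sum(sal) sal
from emp join dept using(deptno)
group by rollup(dname), rollup(job);
```

The two forms only diverge when there is more than one column on an axis, which is where the later examples switch to separate ROLLUPs.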

#### Two vertical totals

Now we’re getting fancy! Suppose we want totals by department, but also subtotals within each department by the year someone was hired?

• row 10: get the hiring year
• row 13: instead of ROLLUP on DNAME only, we ROLLUP on DNAME and HIREYEAR
• row 5: HIREYEAR should say ‘(Total)’ only when we are on a subtotal line, not on the grand total.
• row 18: the `p_ordercol` parameter can be a list of column names
```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select decode(grouping(dname),1,'(Total)',dname) dname,
5    decode(grouping_id(dname,hireyear),1,'(Total)',hireyear) hireyear,
6    job,
7    sum(sal) sal
8  from (
9    select dname, job, sal,
10      extract(year from hiredate) hireyear
11    from dept join emp using(deptno)
12  )
13  group by rollup(dname, hireyear), job
14  )@',
15  p_aggfuncs => 'sum(sal)',
16  p_pivotfor => 'job',
17  p_total_label => '(Total)',
18  p_ordercol => 'dname,hireyear'
19  );
20  end;
21  /

DNAME          HIREYEAR     ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- --------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING     1981                                  2450       5000
ACCOUNTING     1982                       1300
ACCOUNTING     (Total)                    1300       2450       5000
RESEARCH       1980                        800
RESEARCH       1981            3000                  2975
RESEARCH       1987                       1100
RESEARCH       (Total)         3000       1900       2975
SALES          1981                        950       2850                  5600
SALES          (Total)                     950       2850                  5600
(Total)                        3000       4150       8275       5000       5600```
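
To see why row 5 tests `grouping_id(dname,hireyear)` against 1, it can help to display the grouping flags side by side. The following sketch assumes the same EMP and DEPT demo tables and leaves JOB out to keep the output short; the column aliases are mine.

```sql
-- GROUPING_ID(dname, hireyear) = 2 * GROUPING(dname) + GROUPING(hireyear)
--   0 : detail row (one department, one hire year)
--   1 : subtotal for a department (HIREYEAR rolled up)
--   3 : grand total (both rolled up)
select dname,
  hireyear,
  grouping(dname) g_dname,
  grouping(hireyear) g_hireyear,
  grouping_id(dname, hireyear) gid,
  sum(sal) sal
from (
  select dname, sal,
    extract(year from hiredate) hireyear
  from dept join emp using(deptno)
)
group by rollup(dname, hireyear)
order by gid, dname, hireyear;
```

With `rollup(dname, hireyear)` the value 2 never occurs, since HIREYEAR is always rolled up before DNAME.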

#### Two vertical totals, one horizontal

• row 13: we ROLLUP on the vertical axis (DNAME, HIREYEAR) and on the horizontal (JOB)
• row 6: JOB must never have a NULL value, else the PIVOT will fail.
```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select decode(grouping(dname),1,'(Total)',dname) dname,
5    decode(grouping_id(dname,hireyear),1,'(Total)',hireyear) hireyear,
6    decode(grouping(job),1,'(Total)',job) job,
7    sum(sal) sal
8  from (
9    select dname, job, sal,
10      extract(year from hiredate) hireyear
11    from dept join emp using(deptno)
12  )
13  group by rollup(dname, hireyear), rollup(job)
14  )@',
15  p_aggfuncs => 'sum(sal)',
16  p_pivotfor => 'job',
17  p_total_label => '(Total)',
18  p_ordercol => 'dname,hireyear'
19  );
20  end;
21  /

DNAME          HIREYEAR     ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN    (Total)
-------------- --------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING     1981                                  2450       5000                  7450
ACCOUNTING     1982                       1300                                        1300
ACCOUNTING     (Total)                    1300       2450       5000                  8750
RESEARCH       1980                        800                                         800
RESEARCH       1981            3000                  2975                             5975
RESEARCH       1987                       1100                                        1100
RESEARCH       (Total)         3000       1900       2975                             7875
SALES          1981                        950       2850                  5600       9400
SALES          (Total)                     950       2850                  5600       9400
(Total)                        3000       4150       8275       5000       5600      26025```

#### One vertical, two horizontal

• row 5: HIREYEAR is now a source for a pivot column, so it must never be NULL.
• row 16: the `p_pivotfor` parameter can be a list of column names
```SQL> begin
2  :rc := advanced_pivot(
3  p_source => q'@(
4  select decode(grouping(dname),1,'(Total)',dname) dname,
5    decode(grouping(hireyear),1,'(Total)',hireyear) hireyear,
6    decode(grouping(job),1,'(Total)',job) job,
7    sum(sal) sal
8  from (
9    select dname, job, sal,
10      extract(year from hiredate) hireyear
11    from dept join emp using(deptno)
12  )
13  group by rollup(dname), rollup(hireyear, job)
14  )@',
15  p_aggfuncs => 'sum(sal)',
16  p_pivotfor => 'hireyear,job',
17  p_total_label => '(Total)',
18  p_ordercol => 'dname'
19  );
20  end;
21  /

DNAME          1980_CLERK 1980_(Total) 1981_ANALYST 1981_CLERK 1981_MANAGER 1981_PRESIDENT 1981_SALESMAN 1981_(Total) 1982_CLERK 1982_(Total) 1987_CLERK 1987_(Total) (Total)_(Total)
-------------- ---------- ------------ ------------ ---------- ------------ -------------- ------------- ------------ ---------- ------------ ---------- ------------ ---------------
ACCOUNTING                                                             2450           5000                       7450       1300         1300                                    8750
RESEARCH              800          800         3000                    2975                                      5975                               1100         1100            7875
SALES                                                      950         2850                         5600         9400                                                            9400
(Total)               800          800         3000        950         8275           5000          5600        22825       1300         1300       1100         1100           26025
```

#### Conclusion

Did you really read down to here? If so, congratulations! Here is what I wish you would take away from this post:

• The ROLLUP and CUBE extensions to GROUP BY are very powerful means to get various levels of subtotals, totals and grand totals.
• The PIVOT clause can manage vertical and / or horizontal totals very cleanly.
• The source query should do one ROLLUP on the vertical columns and another on the horizontal columns, except when there is exactly one of each: then just use CUBE.
• The ADVANCED_PIVOT function generates the appropriate PIVOT clause based on the data, and can handle multiple levels of vertical and / or horizontal totals.
• The horizontal “p_pivotfor” columns must never be NULL.
• The vertical “p_ordercol” columns can be NULL or labeled, your choice. If there are dates or numbers, NULL allows for sorting by the original values.

# Improved PIVOT Function

My generic PIVOT function only allowed one column in the FOR clause. The modestly named “Advanced” function has no such limit.

Here are some test runs to put it through its paces. I’ll save the code for the end.

UPDATE 2018-05-31: I added support for horizontal and vertical totals. The most recent code is still at the bottom of this post, but the additional explanations are here: PIVOT Function with Totals

NOTE: I changed the order of the input parameters to match the order of the PIVOT statement itself: aggregation functions before the pivot_for_clause.

#### One aggregation function, one FOR column

```SQL> set serveroutput off
SQL> var rc refcursor
SQL> begin
2  :rc := advanced_pivot(
3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))',
4      'AVG(SAL)',
5      'JOB'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

DNAME             ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
-------------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING                      1300       2450       5000
RESEARCH             3000        950       2975
SALES                            950       2850                  1400
```

#### Transposing the data and aliasing the aggregate

```SQL> begin
2  :rc := advanced_pivot(
3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))',
4      'AVG(SAL) AVG_SAL',
5      'DNAME'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

JOB       ACCOUNTING_AVG_SAL RESEARCH_AVG_SAL SALES_AVG_SAL
--------- ------------------ ---------------- -------------
ANALYST                                  3000
CLERK                   1300              950           950
MANAGER                 2450             2975          2850
PRESIDENT               5000
SALESMAN                                               1400
```

#### One aggregate, two FOR columns

The pivot columns are ordered by DNAME, JOB from left to right.

```SQL> begin
2  :rc := advanced_pivot(
3      '(select dname, job, sal from DEPT join EMP using(DEPTNO))',
4      'MAX(SAL) MAX_SAL',
5      'DNAME,JOB'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

ACCOUNTING_CLERK_MAX_SAL ACCOUNTING_MANAGER_MAX_SAL ACCOUNTING_PRESIDENT_MAX_SAL RESEARCH_ANALYST_MAX_SAL RESEARCH_CLERK_MAX_SAL RESEARCH_MANAGER_MAX_SAL SALES_CLERK_MAX_SAL SALES_MANAGER_MAX_SAL SALES_SALESMAN_MAX_SAL
------------------------ -------------------------- ---------------------------- ------------------------ ---------------------- ------------------------ ------------------- --------------------- ----------------------
1300                       2450                         5000                     3000                   1100                     2975                 950                  2850                   1600
```

#### Shorter pivot column names

The pivot column names can get long real fast. With this generic function, the way to shorten the names is to shorten the input data.

```SQL> begin
2  :rc := advanced_pivot(
3      '(select initcap(substr(dname,1,4)) dname, initcap(substr(job,1,5)) job, sal from DEPT join EMP using(DEPTNO))',
4      'MIN(SAL) "Min_Sal"',
5      'JOB,DNAME'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

Analy_Rese_Min_Sal Clerk_Acco_Min_Sal Clerk_Rese_Min_Sal Clerk_Sale_Min_Sal Manag_Acco_Min_Sal Manag_Rese_Min_Sal Manag_Sale_Min_Sal Presi_Acco_Min_Sal Sales_Sale_Min_Sal
------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------ ------------------
3000               1300                800                950               2450               2975               2850               5000               1250
```

#### Two aggregates, two FOR columns

I would rather not do this by hand…

```SQL> begin
2  :rc := advanced_pivot(
3      '(select substr(dname,1,2) dname, substr(job,1,2) job, sal from DEPT join EMP using(DEPTNO))',
4      'count(*),AVG(SAL) AVG',
5      'DNAME,JOB'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

AC_CL  AC_CL_AVG      AC_MA  AC_MA_AVG      AC_PR  AC_PR_AVG      RE_AN  RE_AN_AVG      RE_CL  RE_CL_AVG      RE_MA  RE_MA_AVG      SA_CL  SA_CL_AVG      SA_MA  SA_MA_AVG      SA_SA  SA_SA_AVG
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1       1300          1       2450          1       5000          1       3000          2        950          1       2975          1        950          1       2850          4       1400
```

#### What if the FOR column is a date?

I want to break down the number of hires in each department by month, so why not truncate HIREDATE by month, then just display the month and the year?

```SQL> begin
2  :rc := advanced_pivot(
3      '(select dname, to_char(trunc(hiredate,''MM''),''MON-yyyy'') hiredate from DEPT join EMP using(DEPTNO))',
4      'count(*)',
5      'hiredate'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

DNAME            APR-1981   DEC-1980   DEC-1981   FEB-1981   JAN-1982   JUN-1981   MAY-1981   MAY-1987   NOV-1981   SEP-1981
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING              0          0          0          0          1          1          0          0          1          0
RESEARCH                1          1          1          0          0          0          0          1          0          0
SALES                   0          0          1          2          0          0          1          0          0          2
```

Oops! The input HIREDATE is now a string, so the pivot column names are ordered by the string values.

#### Ordering dates in date order

Here is a situation where implicit conversion is your friend. Just keep the truncated date in your input query and set NLS_DATE_FORMAT as you wish. I’ll order the columns by date and Oracle will convert the dates to strings in the format you specified.

```SQL> alter session set nls_date_format='mon-yyyy';

Session altered.

SQL> begin
2  :rc := advanced_pivot(
3      '(select dname, trunc(hiredate,''MM'') hiredate from DEPT join EMP using(DEPTNO))',
4      'count(*)',
5      'hiredate'
6    );
7  end;
8  /

PL/SQL procedure successfully completed.

SQL> print :rc

DNAME            dec-1980   feb-1981   apr-1981   may-1981   jun-1981   sep-1981   nov-1981   dec-1981   jan-1982   may-1987
-------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNTING              0          0          0          0          1          0          1          0          1          0
RESEARCH                1          0          1          0          0          0          0          1          0          1
SALES                   0          2          0          1          0          2          0          1          0          0
```

#### The Code

```create or replace function advanced_pivot(
p_source in varchar2,       -- table, view or query in parentheses
p_aggfuncs in varchar2,     -- one or more aggregation functions with or without aliases
p_pivotfor in varchar2,     -- one or more columns from the input separated by commas
p_total_label in varchar2 default null, -- label to use for pivot totals (if CUBE, ROLLUP, etc.)
p_ordercol in varchar2 default null     -- column(s) to order by with p_total_label last
) return sys_refcursor authid current_user is
/*
Calculates pivot_in_list using SQL 1, updates SQL 2 text and opens ref cursor.
- Pivot_in_list concatenates values from all PIVOT_FOR columns
for example, if (DNAME,JOB) then
('ACCOUNTING','CLERK') as "ACCOUNTING_CLERK",
('ACCOUNTING','MANAGER') as "ACCOUNTING_MANAGER",
and so on
*/
l_sql sys.odcivarchar2list := sys.odcivarchar2list(

q'@select listagg('(''' || expr || ''') as "' || al || '"', ',
')
within group(order by #ORDERFOR#)
from (select distinct
#EXPR# expr,
#ALIAS# al,
#PIVOTFOR#
from #SOURCE#)@',

'select * from #SOURCE#
pivot(#AGGFUNCS# for (#PIVOTFOR#) in (
#PIVOTINLIST#
)) order by #ORDERCOL#1,2,3'
);
l_pivotfor varchar2(255);
l_orderfor varchar2(255);
l_refcur sys_refcursor;
l_pivotinlist varchar2(32767);
l_expr varchar2(4000);
l_alias varchar2(4000);
l_ordercol varchar2(255);
begin
l_pivotfor := trim( ',' from replace(p_pivotfor,' ') );
l_orderfor :=
case when p_total_label is null
then l_pivotfor
else 'nullif('
|| replace(l_pivotfor, ',', ','''||p_total_label||'''),nullif(')
|| ','''||p_total_label||''')'
end;
l_sql(1) := replace(l_sql(1), '#ORDERFOR#', l_orderfor);

l_expr := replace(l_pivotfor,',',q'%||''','''||%');
l_sql(1) := replace(l_sql(1), '#EXPR#', l_expr);

l_alias := replace(l_pivotfor,',',q'%||'_'||%');
l_sql(1) := replace(l_sql(1), '#ALIAS#', l_alias);

for i in 1..l_sql.count loop
l_sql(i) := replace(l_sql(i), '#SOURCE#', p_source);
l_sql(i) := replace(l_sql(i), '#PIVOTFOR#', l_pivotfor);
end loop;

dbms_output.put_line(l_sql(1));
dbms_output.put_line('/');
open l_refcur for l_sql(1);
fetch l_refcur into l_pivotinlist;
close l_refcur;

l_sql(2) := replace(l_sql(2), '#AGGFUNCS#', p_aggfuncs);
l_sql(2) := replace(l_sql(2), '#PIVOTINLIST#', l_pivotinlist);

l_ordercol := trim( ',' from replace(p_ordercol,' ') );
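l_ordercol :=
case
when l_ordercol is null
then null
-- no total label: keep the columns as given, plus the comma that #ORDERCOL# needs before "1,2,3"
when p_total_label is null
then l_ordercol || ','
else 'nullif('
|| replace(l_ordercol, ',', ','''||p_total_label||'''),nullif(')
|| ','''||p_total_label||'''),'
end;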
l_sql(2) := replace(l_sql(2), '#ORDERCOL#', l_ordercol);

dbms_output.put_line(l_sql(2));
dbms_output.put_line('/');
open l_refcur for l_sql(2);
return l_refcur;

end advanced_pivot;
/```

I really enjoyed the challenge of writing a truly generic function. I hope you enjoy using it!

# Techniques for Comparing Tables

In my “Advanced Row Pattern Matching” presentation, I demonstrate using MATCH_RECOGNIZE to compare tables. Kim Berg Hansen asked me to compare this technique with others. I did some quick tests and here are the results with some comments.

| Technique | Seconds |
|---|---|
| Full join | 1 |
| Group by (HASH) | 1 |
| Group by (SORT) | 1.4 |
| Analytic function | 2.5 |
| MATCH_RECOGNIZE | 3.7 |

The “Full join” technique only works when we have a primary or unique key that is shared by both tables. I prefer the GROUP BY technique popularized by Tom Kyte, even though it may be a bit slower. When testing, I noticed that the HASH GROUP BY algorithm performs better than SORT GROUP BY, as others have written.
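
For readers who have not seen it, here is a minimal sketch of the GROUP BY comparison technique. It is not the code I timed: the table names `old_t` and `new_t`, their columns and their contents are made up, and inline data replaces real tables so the query can be run as is.

```sql
-- Hypothetical inline data standing in for the two tables to compare.
with old_t(id, val) as (
  select 1, 'A' from dual union all
  select 2, 'B' from dual union all
  select 3, 'C' from dual
),
new_t(id, val) as (
  select 1, 'A' from dual union all
  select 2, 'X' from dual union all
  select 4, 'D' from dual
)
-- Tag each row with its origin, group on all compared columns,
-- and keep only the groups whose counts differ between the two sources.
select id, val,
  sum(old_cnt) old_cnt,
  sum(new_cnt) new_cnt
from (
  select id, val, 1 old_cnt, 0 new_cnt from old_t
  union all
  select id, val, 0, 1 from new_t
)
group by id, val
having sum(old_cnt) <> sum(new_cnt)
order by id, val;
```

The row (1, 'A') exists once on each side, so it is filtered out; every other row appears with counts that show which side it came from.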

If either of the tables contains duplicate rows (which may happen if we don’t compare all of the columns, or if there is no primary key), then GROUP BY will output only one row for each set of duplicates. This may be a problem if we want data (such as the ROWID) that was not included in the comparison. In that case, we could use analytic functions or the MATCH_RECOGNIZE clause to compare and output all the rows and columns of interest. As you can see, the analytic function is more than twice as slow as GROUP BY, but it easily beats the MATCH_RECOGNIZE clause.

I use the output from table comparisons to synchronize the tables, so capturing the ROWID is important to me even when a primary or unique key is not available. For that use case, I will prefer analytic functions from now on.
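
Here is a sketch of what the analytic variant could look like. Again, this is an illustration and not the code I timed: `old_t`, `new_t` and their data are hypothetical, and the `rid` column stands in for the ROWID that a real table would supply.

```sql
-- Hypothetical inline data; RID plays the role of ROWID.
-- (3, 'C') appears twice in OLD_T and once in NEW_T.
with old_t(id, val, rid) as (
  select 1, 'A', 'o1' from dual union all
  select 3, 'C', 'o2' from dual union all
  select 3, 'C', 'o3' from dual
),
new_t(id, val, rid) as (
  select 1, 'A', 'n1' from dual union all
  select 3, 'C', 'n2' from dual
)
select *
from (
  -- Count the rows per source within each set of compared values,
  -- without collapsing the rows the way GROUP BY does.
  select id, val, src, rid,
    sum(old_cnt) over (partition by id, val) old_total,
    sum(new_cnt) over (partition by id, val) new_total
  from (
    select id, val, 'OLD' src, rid, 1 old_cnt, 0 new_cnt from old_t
    union all
    select id, val, 'NEW', rid, 0, 1 from new_t
  )
)
where old_total <> new_total
order by id, val, src;
```

All three (3, 'C') rows come back, each with its own `rid`, which is what a later synchronization step needs.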

We very often make spreadsheets with subtotals for each row and for each column. Someone on the OTN forum asked how to produce data in this format. I answered using the cool `CUBE` extension to `GROUP BY`.

The question used `SCOTT.EMP` as input. The requirement was to sum salaries by `DEPTNO` and `JOB` and to display them as follows:

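| JOB | 10 | 20 | 30 | TOTAL |
|---|---|---|---|---|
| ANALYST | | 6000 | | 6000 |
| CLERK | 1300 | 1900 | 950 | 4150 |
| MANAGER | 2450 | 2975 | 2850 | 8275 |
| PRESIDENT | 5000 | | | 5000 |
| SALESMAN | | | 5600 | 5600 |
| Total | 8750 | 10875 | 9400 | 29025 |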

• The white cells with numbers (the body of the table) contain subtotals by `DEPTNO` and `JOB`;
• the yellow cells (right hand column) contain subtotals by `JOB`;
• the blue cells (bottom row) contain subtotals by `DEPTNO`;
• and the green cell (bottom right) contains the grand total.

#### Getting all the totals

The `CUBE` extension to `GROUP BY` is ideal for this kind of cross-tabular report: it will generate everything we need with one `SELECT` and one table scan.

```select deptno, job, sum(sal) sal
from scott.emp
group by cube(deptno, job);```
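
| DEPTNO | JOB | SAL |
|---|---|---|
| | | 29025 |
| | CLERK | 4150 |
| | ANALYST | 6000 |
| | MANAGER | 8275 |
| | SALESMAN | 5600 |
| | PRESIDENT | 5000 |
| 10 | | 8750 |
| 10 | CLERK | 1300 |
| 10 | MANAGER | 2450 |
| 10 | PRESIDENT | 5000 |
| 20 | | 10875 |
| 20 | CLERK | 1900 |
| 20 | ANALYST | 6000 |
| 20 | MANAGER | 2975 |
| 30 | | 9400 |
| 30 | CLERK | 950 |
| 30 | MANAGER | 2850 |
| 30 | SALESMAN | 5600 |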

#### Formatting the output

Some folks whose opinion I respect say that formatting reports should be done outside of SQL. I agree in principle, but that didn’t stop me from answering the question using the `PIVOT` clause. As always with this clause, you have to know in advance how many columns you want to end up with!

The tricky part of this particular pivoting operation is handling NULLs correctly. For one thing, the `JOB` subtotals need to be pivoted to the rightmost column, but they have no `DEPTNO` value to pivot to. For another thing, the input might have NULLs in the `JOB` or `DEPTNO` columns, so I need a reliable way to identify the output rows that have subtotals.

I use the `GROUPING()` function to identify the subtotals:

• When `GROUPING(DEPTNO)` is equal to 1, the row contains a `JOB` subtotal (or the grand total) and I have to assign an arbitrary `DEPTNO` value so I can pivot.
• When `GROUPING(JOB)` is equal to 1, the row contains a `DEPTNO` subtotal (or the grand total) so after pivoting I output ‘Total’ in the `JOB` column of the last row.
```select case gr_job when 1 then 'Total' else job end job,
  "10", "20", "30", "Total"
from (
  select case grouping(deptno) when 1 then -1 else deptno end deptno,
    job, grouping(job) gr_job, sum(sal) sal
  from scott.emp
  group by cube(deptno, job)
)
pivot(
  sum(sal) for deptno in (10, 20, 30, -1 as "Total")
)
order by gr_job, job;```
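
| JOB | 10 | 20 | 30 | Total |
|---|---|---|---|---|
| ANALYST | | 6000 | | 6000 |
| CLERK | 1300 | 1900 | 950 | 4150 |
| MANAGER | 2450 | 2975 | 2850 | 8275 |
| PRESIDENT | 5000 | | | 5000 |
| SALESMAN | | | 5600 | 5600 |
| Total | 8750 | 10875 | 9400 | 29025 |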