Strings that contain multiple values violate first normal form, which mandates “one value per column per row”. People still ask how to make or massage lists of values. This post is about aggregating values into lists.
Rewrite 2019-07-11: I am now using DNAME instead of DEPTNO so that the JOIN makes sense. Also Tony Hasler pointed out that JOIN before MODEL does not need an inline view.
(To split lists into little tables, see my series on “Splitting strings”.)
Chunks of lists
(For a discussion of this topic that has influenced my solutions, see Managing overflows in LISTAGG by Keith Laker, with solutions provided by Lei Sheng.)
Version 11.2 introduced LISTAGG, which concatenates values with or without a delimiter. For example:
select dname, listagg(ename, ',') within group(order by ename) enames from dept join emp using(deptno) group by dname order by dname;
DNAME | ENAMES |
---|---|
ACCOUNTING | CLARK,KING,MILLER |
RESEARCH | ADAMS,FORD,JONES,SCOTT,SMITH |
SALES | ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD |
Unfortunately, the return value cannot be a CLOB
; if it is too long, we get an ORA-01489 exception. Version 12.2 provides the ON OVERFLOW
clause that truncates the return value, so no error but we don’t get all the values back.
Even before version 12.2, we could get all the values if we divided the list into chunks with a valid maximum length.
Starting with 12.1 and using the MATCH_RECOGNIZE
clause:
select dname, chunk, listagg(ename, ',') within group(order by ename) enames from ( select dname, ename from dept join emp using(deptno) ) match_recognize( partition by dname order by ename measures match_number() chunk all rows per match pattern(s b*) define b as sum(lengthb(ename)) + count(b.*) <= 20 ) group by dname, chunk order by dname, chunk;
DNAME | CHUNK | ENAMES |
---|---|---|
ACCOUNTING | 1 | CLARK,KING,MILLER |
RESEARCH | 1 | ADAMS,FORD,JONES |
RESEARCH | 2 | SCOTT,SMITH |
SALES | 1 | ALLEN,BLAKE,JAMES |
SALES | 2 | MARTIN,TURNER,WARD |
Notice I am using LENGTHB
to calculate the length in bytes of the input strings. COUNT(B.*)
adds the lengths of the delimiters, assuming that the delimiter is one byte long. Lei uses LENGTHB
even on the delimiter.
Before 12.1 we can get there with the MODEL clause:
select dname, chunk, listagg(ename, ',') within group(order by ename) enames from ( select * from dept join emp using(deptno) model partition by (dname) dimension by (row_number() over(partition by dname order by ename) rn) measures(ename, lengthb(ename) length_so_far, 1 chunk) rules( length_so_far[rn>1] = lengthb(ename[cv()]) + case when lengthb(ename[cv()]) + length_so_far[cv()-1] + 1 <= 20 then length_so_far[cv()-1] + 1 else 0 end, chunk[rn>1] = chunk[cv()-1] + case when length_so_far[cv()] = lengthb(ename[cv()]) then 1 else 0 end ) ) group by dname, chunk order by dname, chunk;
- Lines 10-15 : the first
RULE
calculates the running length of the list that would be made, starting over once the list gets past the limit of 20 bytes. - Lines 16-21 : the second
RULE
increases theCHUNK
number whenever the running length starts over. - Finally,
LISTAGG
groups byDNAME
andCHUNK
.
But I want just one long string!
In that case, you need a CLOB. For that, you don’t even need version 11.2, just version 11.1.
select dname, SUBSTR( xmlcast( xmlagg( xmlelement(E, ',' || ename) order by ename ) AS CLOB -- AS VARCHAR2(4000) would be the equivalent of LISTAGG ), 2 ) enames from dept join emp using(deptno) group by dname order by dname;