Making Lists

Strings that contain multiple values violate first normal form, which mandates “one value per column per row”.  People still ask how to make or massage lists of values. This post is about aggregating values into lists.

Rewrite 2019-07-11: I am now using DNAME instead of DEPTNO so that the JOIN makes sense. Also Tony Hasler pointed out that JOIN before MODEL does not need an inline view.

(To split lists into little tables, see my series on “Splitting strings”.)

Chunks of lists

(For a discussion of this topic that has influenced my solutions, see Managing overflows in LISTAGG by Keith Laker, with solutions provided by Lei Sheng.)

Version 11.2 introduced LISTAGG, which concatenates values with or without a delimiter. For example:

select dname,
  listagg(ename, ',') within group(order by ename) enames
from dept join emp using(deptno)
group by dname
order by dname;
DNAME ENAMES
ACCOUNTING CLARK,KING,MILLER
RESEARCH ADAMS,FORD,JONES,SCOTT,SMITH
SALES ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

 
Unfortunately, the return value cannot be a CLOB; if it is too long, we get an ORA-01489 exception. Version 12.2 provides the ON OVERFLOW clause that truncates the return value, so no error but we don’t get all the values back.

Even before version 12.2, we could get all the values if we divided the list into chunks with a valid maximum length.

Starting with 12.1 and using the MATCH_RECOGNIZE clause:

select dname, chunk,
  listagg(ename, ',') within group(order by ename) enames
from (
  select dname, ename from dept join emp using(deptno)
)
match_recognize(
  partition by dname order by ename
  measures match_number() chunk
  all rows per match
  pattern(s b*)
  define b as sum(lengthb(ename)) + count(b.*) <= 20
)
group by dname, chunk
order by dname, chunk;
DNAME CHUNK ENAMES
ACCOUNTING 1 CLARK,KING,MILLER
RESEARCH 1 ADAMS,FORD,JONES
RESEARCH 2 SCOTT,SMITH
SALES 1 ALLEN,BLAKE,JAMES
SALES 2 MARTIN,TURNER,WARD

 

Notice I am using LENGTHB to calculate the length in bytes of the input strings. COUNT(B.*) adds the lengths of the delimiters, assuming that the delimiter is one byte long. Lei uses LENGTHB even on the delimiter.

Before 12.1 we can get there with the MODEL clause:

select dname, chunk,
  listagg(ename, ',') within group(order by ename) enames
from (
  select * from dept join emp using(deptno)
  model
    partition by (dname)
    dimension by (row_number() over(partition by dname order by ename) rn)
    measures(ename, lengthb(ename) length_so_far, 1 chunk)
    rules(
      length_so_far[rn>1] = lengthb(ename[cv()]) +
        case
          when lengthb(ename[cv()]) + length_so_far[cv()-1] + 1 <= 20
          then length_so_far[cv()-1] + 1
          else 0
        end,
      chunk[rn>1] = chunk[cv()-1] +
        case
          when length_so_far[cv()] = lengthb(ename[cv()])
          then 1
          else 0
        end    
    )
)
group by dname, chunk
order by dname, chunk;
  • Lines 10-15 : the first RULE calculates the running length of the list that would be made, starting over once the list gets past the limit of 20 bytes.
  • Lines 16-21 : the second RULE increases the CHUNK number whenever the running length starts over.
  • Finally, LISTAGG groups by DNAME and CHUNK.

But I want just one long string!

In that case, you need a CLOB. For that, you don’t even need version 11.2, just version 11.1.

select dname,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(E, ',' || ename)
      order by ename
    ) AS CLOB  -- AS VARCHAR2(4000) would be the equivalent of LISTAGG
  ), 2
) enames
from dept join emp using(deptno)
group by dname
order by dname;
Advertisements

2 thoughts on “Making Lists

  1. Hi Stew,

    Joins are always performed before model clause evaluation so only one inline view is required. Also, an order by clause is probably a good idea. Interesting as usual!

    select deptno, chunk,
    listagg(ename, ‘,’) within group(order by ename) enames
    from (
    select * from dept join emp using(deptno)
    model
    partition by (deptno)
    dimension by (row_number() over(partition by deptno order by ename) rn)
    measures(ename, lengthb(ename) length_so_far, 1 chunk)
    rules(
    length_so_far[rn>1] = lengthb(ename[cv()]) +
    case
    when lengthb(ename[cv()]) + length_so_far[cv()-1] + 1 LE 20
    then length_so_far[cv()-1] + 1
    else 0
    end,
    chunk[rn>1] = chunk[cv()-1] +
    case
    when length_so_far[cv()] = lengthb(ename[cv()])
    then 1
    else 0
    end
    )
    )
    group by deptno, chunk
    order by deptno,chunk;

    • Thanks for catching that, Tony. I thought I tested that but got my parentheses mixed up. I decided to rewrite the post, which makes the context of your remarks disappear, sorry…

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s