New, Improved IN Lists!

In SQL, the “varying IN list” problem comes up constantly and there are many published solutions. My new favorite solution is inspired from the video Synthesizing rows inside Oracle by Connor McDonald (start at the 6 minute mark).

The idea is to extract a series of delimited values from a string, and to present those values one row at a time. For example, the string
A,BB,CCC,DDDD,EEEEE

would become:
A
BB
CCC
DDDD
EEEEE

I’m not going to bother you with the other solutions, except for two remarks:

  1. Tom Kyte’s classic solution with SUBSTR() and INSTR() is a bit complicated and doesn’t work with strings longer than 3998 bytes.
  2. The popular REGEXP_SUBSTR() solution is a bit complicated and the REGEXP* functions use rather a lot of CPU.

Connor’s idea is efficient, works with the longest strings and seems slightly less complicated to me. However, that’s not going to stop me from explaining my variant in excruciating detail!

Generating rows

Like Connor, I’ll start with a reminder how easy and efficient it is to generate rows from DUAL using CONNECT BY:

select level from dual connect by level <= 5;
LEVEL
1
2
3
4
5

 

Extracting a value

So I have 5 rows, big deal… Now I need to put the right value in each row, which SUBSTR() can do. To extract the third value for example:

var txt varchar2(20);
exec :txt := 'A,BB,CCC,DDDD,EEEEE';

select :txt "Input",
  6 "Position",
  3 "Output length",
  substr(:txt, 6, 3) CCC
from dual;
Input Position Output length CCC
A,BB,CCC,DDDD,EEEEE 6 3 CCC

 

Locating the delimiters

As you probably know, or can guess, the key to getting the right positions is to locate the delimiters. This is a job for INSTR():

select :txt "Input",
  ',' "Substring",
  1 "Position",
  2 "Occurence",
  instr( :txt, ',', 1, 2 ) "Pos after BB"
from dual;
Input Substring Position Occurence Pos after BB
A,BB,CCC,DDDD,EEEEE , 1 2 5

 

Now let’s locate all the commas:

select instr(:txt, ',', 1, level) pos
from dual
connect by level <= 5;
POS
2
5
9
14
0

 

Do you see how clever Connor is there? There are only 4 commas, but 5 rows are needed. That last row contains a zero because INSTR() did not find a fifth comma. I can pretend that zero is a “virtual comma” before the first value.

[Update 2016-06-23: changed the CONNECT BY to avoid repeated unnecessary calls to INSTR().]

Of course, I mustn’t hard-code that “5” at the end of the statement. Instead,

  • I’ll calculate the length of the input;
  • I’ll calculate the length of the input without commas;
  • by subtraction I get the number of commas,
  • then I add 1 to get the number of output rows.
select instr(:txt, ',', 1, level) pos
from dual
connect by
  level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1;
POS
2
5
9
14
0

 

Putting it all together

Now that I have all my commas, with one “virtual comma” at position zero, I just add one to the comma positions and I have the starting point of each value. The length is equal to the next position, minus this position, minus 1.

select substr(
  :txt,
  pos + 1,
  lead(pos) over(order by pos) - pos - 1
) subs
from (
  select instr(:txt, ',', 1, level) pos
  from dual
  connect by
    level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1
);
SUBS
A
BB
CCC
DDDD

 

What happened on that last line? The LEAD() function tried to access a row that doesn’t exist, so it returned NULL. Fortunately, we can make LEAD() return another value in that situation:

select substr(
  :txt,
  pos + 1,
  lead(pos, 1, 4000) over(order by pos) - pos - 1
) subs
from (
  select instr(:txt, ',', 1, level) pos
  from dual
  connect by
    level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1
);
SUBS
A
BB
CCC
DDDD
EEEEE

 

Conclusion

This to me is an elegant and efficient solution that demonstrates at least 3 clever things we can do with Oracle SQL:

  1. Use CONNECT BY LEVEL to generate as many rows as we want;
  2. Use the zero, returned by INSTR() when it finds nothing, as a “virtual comma” before the first value;
  3. Use the “default” parameter of LEAD() to get a “virtual comma” after the last value.
Advertisements

2 thoughts on “New, Improved IN Lists!

  1. What a beautiful step by step explanation. Had you thrown final query, it would have been difficult to understand but step by step algorithm makes it elegant.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s