In SQL, the “varying IN list” problem comes up constantly and there are many published solutions. My new favorite solution is inspired from the video Synthesizing rows inside Oracle by Connor McDonald (start at the 6 minute mark).
The idea is to extract a series of delimited values from a string, and to present those values one row at a time. For example, the string
I’m not going to bother you with the other solutions, except for two remarks:
- Tom Kyte’s classic solution with
INSTR()is a bit complicated and doesn’t work with strings longer than 3998 bytes.
- The popular
REGEXP_SUBSTR()solution is a bit complicated and the
REGEXP* functions use rather a lot of CPU.
Connor’s idea is efficient, works with the longest strings and seems slightly less complicated to me. However, that’s not going to stop me from explaining my variant in excruciating detail!
Like Connor, I’ll start with a reminder how easy and efficient it is to generate rows from DUAL using CONNECT BY:
select level from dual connect by level <= 5;
Extracting a value
So I have 5 rows, big deal… Now I need to put the right value in each row, which
SUBSTR() can do. To extract the third value for example:
var txt varchar2(20); exec :txt := 'A,BB,CCC,DDDD,EEEEE'; select :txt "Input", 6 "Position", 3 "Output length", substr(:txt, 6, 3) CCC from dual;
Locating the delimiters
As you probably know, or can guess, the key to getting the right positions is to locate the delimiters. This is a job for
select :txt "Input", ',' "Substring", 1 "Position", 2 "Occurence", instr( :txt, ',', 1, 2 ) "Pos after BB" from dual;
|Input||Substring||Position||Occurence||Pos after BB|
Now let’s locate all the commas:
select instr(:txt, ',', 1, level) pos from dual connect by level <= 5;
Do you see how clever Connor is there? There are only 4 commas, but 5 rows are needed. That last row contains a zero because
INSTR() did not find a fifth comma. I can pretend that zero is a “virtual comma” before the first value.
[Update 2016-06-23: changed the
CONNECT BY to avoid repeated unnecessary calls to
Of course, I mustn’t hard-code that “5” at the end of the statement. Instead,
- I’ll calculate the length of the input;
- I’ll calculate the length of the input without commas;
- by subtraction I get the number of commas,
- then I add 1 to get the number of output rows.
select instr(:txt, ',', 1, level) pos from dual connect by level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1;
Putting it all together
Now that I have all my commas, with one “virtual comma” at position zero, I just add one to the comma positions and I have the starting point of each value. The length is equal to the next position, minus this position, minus 1.
select substr( :txt, pos + 1, lead(pos) over(order by pos) - pos - 1 ) subs from ( select instr(:txt, ',', 1, level) pos from dual connect by level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1 );
What happened on that last line? The
LEAD() function tried to access a row that doesn’t exist, so it returned
NULL. Fortunately, we can make
LEAD() return another value in that situation:
select substr( :txt, pos + 1, lead(pos, 1, 4000) over(order by pos) - pos - 1 ) subs from ( select instr(:txt, ',', 1, level) pos from dual connect by level <= length(:txt) - nvl(length(replace(:txt, ',', '')), 0) + 1 );
This to me is an elegant and efficient solution that demonstrates at least 3 clever things we can do with Oracle SQL:
- Use CONNECT BY LEVEL to generate as many rows as we want;
- Use the zero, returned by INSTR() when it finds nothing, as a “virtual comma” before the first value;
- Use the “default” parameter of LEAD() to get a “virtual comma” after the last value.