In my post New, Improved IN Lists!, I split one string into multiple rows. Now I want to split multiple input strings – but first, I’ve rediscovered an even faster technique!
In this post I’ll introduce the technique, and in the next post I’ll do some testing and comparing.
XMLTABLE with XMLTYPE
- (Warning: the XML services in the Oracle database have evolved over the last several versions. Some of the solutions here may not work in older versions. Also, prior to version 12c a DBA could decide not to install “XML DB” at all! I have only tested in version 22.214.171.124.)
Quite a few folks have used the
XMLTABLE function to solve this problem. One way is to change the string into an XML document: there is one root node that contains one child node per substring.
var txt varchar2(20); exec :txt := 'A,BB,CCC,DDDD,EEEEE'; select '<r><c>'||replace(:txt, ',', '</c><c>')||'</c></r>' txt from dual; <r><c>A</c><c>BB</c><c>CCC</c><c>DDDD</c><c>EEEEE</c></r>
Using the XPATH expression ‘
XMLTABLE will go through the child nodes and produce one row per substring.
select subs from xmltable( '/r/c/text()' passing xmltype('<r><c>'||replace(:txt, ',', '</c><c>')||'</c></r>') columns subs varchar2(4000) path '.' );
The main drawback of this solution, aside from performance, is that the input string cannot be a full 4000 bytes long.
[Update 2016-08-01: Marc Bleron rightly commented that I could use a CLOB as an input string and have way more that 4000 bytes. The limitation I mention is only true when the input string is a VARCHAR2.]
XMLTABLE with an XQUERY sequence
If I put double quotes (or single quotes) around all the substrings, then the result is an XQUERY sequence. XMLTABLE will simply output that sequence one row per item.
select '"'||replace(:txt, ',', '","')||'"' str from dual; "A","BB","CCC","DDDD","EEEEE" with data as ( select '"'||replace(:txt, ',', '","')||'"' str from dual ) select xmlcast(column_value as varchar2(4000)) subs from data, xmltable(str);
Again, this solution breaks if the input string is too long.
Why doesn’t Oracle just provide a function to split these confounded strings? It does!
select subs from xmltable( 'ora:tokenize($X, "\,")' passing :txt as X columns subs varchar2(4000) path '.' );
How simple is that? Well, not as simple as I would like. The nasty Oracle developers have decided that this function should raise an exception if the delimiter is not present in the input string. In other words, ‘A,BB’ is OK but just ‘A’ will produce “ORA-19176: FORX0003: regular expression matches zero-length string”.
Marc Bleron, who has published the
ora:tokenize solution, worked around this problem by concatenating an extra comma to the input string. I worked out this alternative that allows for a full 4000 byte VARCHAR2 input string:
select subs from xmltable( 'if (contains($X,",")) then ora:tokenize($X,"\,") else $X' passing :txt as X columns subs varchar2(4000) path '.' );
[Update 2016-08-02: in a comment on the next post, Todd Hershiser points out that the second parameter in ora:tokenize is a regex expression. In order to use a regex “metacharacter” like “|” as a delimiter, I need to escape it with a backslash. I decided to put the backslash in everywhere since it doesn’t do any harm.
[Update 2016-08-05: There is a problem with this solution if the ampersand & is involved. You have to convert it to & in the input string and the delimiter.]
In the next post I’ll compare the performance of these solutions with the
substr+instr approach, and with a popular