Splitting Strings: Surprise!

In my post New, Improved IN Lists!, I split one string into multiple rows. Now I want to split multiple input strings – but first, I’ve rediscovered an even faster technique!

In this post I’ll introduce the technique, and in the next post I’ll do some testing and comparing.

XMLTABLE with XMLTYPE

  • (Warning: the XML services in the Oracle database have evolved over the last several versions. Some of the solutions here may not work in older versions. Also, prior to version 12c a DBA could decide not to install “XML DB” at all! I have only tested in version 12.1.0.2.)

Quite a few folks have used the XMLTABLE function to solve this problem. One way is to change the string into an XML document: there is one root node that contains one child node per substring.

var txt varchar2(20);
exec :txt := 'A,BB,CCC,DDDD,EEEEE';

select '<r><c>'||replace(:txt, ',', '</c><c>')||'</c></r>' txt
from dual;

<r><c>A</c><c>BB</c><c>CCC</c><c>DDDD</c><c>EEEEE</c></r>

Using the XPATH expression ‘/r/c/text()‘, XMLTABLE will go through the child nodes and produce one row per substring.

select subs from xmltable(
  '/r/c/text()'
  passing xmltype('<r><c>'||replace(:txt, ',', '</c><c>')||'</c></r>')
  columns subs varchar2(4000) path '.'
);
SUBS
A
BB
CCC
DDDD
EEEEE

 

The main drawback of this solution, aside from performance, is that the input string cannot be a full 4000 bytes long.

[Update 2016-08-01: Marc Bleron rightly commented that I could use a CLOB as an input string and have way more that 4000 bytes. The limitation I mention is only true when the input string is a VARCHAR2.]

XMLTABLE with an XQUERY sequence

If I put double quotes (or single quotes) around all the substrings, then the result is an XQUERY sequence. XMLTABLE will simply output that sequence one row per item.

select '"'||replace(:txt, ',', '","')||'"' str 
from dual;

"A","BB","CCC","DDDD","EEEEE"

with data as (
  select '"'||replace(:txt, ',', '","')||'"' str 
  from dual
)
select xmlcast(column_value as varchar2(4000)) subs
from data, xmltable(str);
SUBS
A
BB
CCC
DDDD
EEEEE

Again, this solution breaks if the input string is too long.

Tokenize!

Why doesn’t Oracle just provide a function to split these confounded strings? It does!

Oracle XQuery function ora:tokenize lets you use a regular expression to split the input string … into a sequence of strings.

select subs from xmltable(
  'ora:tokenize($X, "\,")'
  passing :txt as X
  columns subs varchar2(4000) path '.'
);

How simple is that? Well, not as simple as I would like. The nasty Oracle developers have decided that this function should raise an exception if the delimiter is not present in the input string. In other words, ‘A,BB’ is OK but just ‘A’ will produce “ORA-19176: FORX0003: regular expression matches zero-length string”.

Marc Bleron, who has published the ora:tokenize solution, worked around this problem by concatenating an extra comma to the input string. I worked out this alternative that allows for a full 4000 byte VARCHAR2 input string:

select subs from xmltable(
    'if (contains($X,",")) then ora:tokenize($X,"\,") else $X'
  passing :txt as X
  columns subs varchar2(4000) path '.'
);

[Update 2016-08-02: in a comment on the next post, Todd Hershiser points out that the second parameter in ora:tokenize is a regex expression. In order to use a regex “metacharacter” like “|” as a delimiter, I need to escape it with a backslash. I decided to put the backslash in everywhere since it doesn’t do any harm.

[Update 2016-08-05: There is a problem with this solution if the ampersand & is involved. You have to convert it to &amp; in the input string and the delimiter.]

What’s Next?

In the next post I’ll compare the performance of these solutions with the substr+instr approach, and with a popular regexp_substr approach.

Advertisements

4 thoughts on “Splitting Strings: Surprise!

  1. Hi Stewart,

    Note that Oracle now supports (as of 12c, and maybe 11.2.0.4 too) the standard fn:tokenize() XQuery function, which does not raise any exception when the pattern is missing.

    • Hi Marc,

      It’s a pleasure to hear from you! You have taught the Oracle community (and me in particular) so many things about XML support in the Oracle Database. Thanks so much!

      I just tested “fn:tokenize” in 12.1.0.2 and it indeed works, however the official documentation states:

      Oracle XML DB does not support the following XQuery functions and operators:

      Function fn:tokenize. Use Oracle XQuery function ora:tokenize instead.

      See http://docs.oracle.com/database/121/ADXDB/xdb_xquery.htm#ADXDB5164

      I didn’t see any mention of fn:tokenize in MOS either. What makes you think it is supported?

    • Marc,

      You are right of course. I was not precise in my language. I meant a VARCHAR2 input string only. I updated my post to mention your correction.

      Thanks again, Stew

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s