APEX – Scratching the itch #JoelKallmanDay

My biggest regret upon retirement in 2015 was not having had the time to study APEX and promote it in my organisation.

Fortunately, I had a daughter in medical school in Paris. She was in the Commission that managed the process of assigning internships to students every trimester. I used the “Hungarian Algorithm” to optimise the students’ preferences and achieve the greatest average satisfaction. With the help of Richard Martens, I then developed an APEX application to accept the preferences and to enable an administrator to run the process directly without my intervention.

Students on average got their fifth choice out of about 150 possibilities!

Since then I have developed a new version of this application after the merger of two medical schools, a similar application for the medical school in Nancy, and two COVID-related applications.

By this time, I should have enough experience to temper my enthusiasm for APEX: I should be starting the downward slope in the Hype Cycle from “inflated expectations” to the “trough of disillusionment”. No way! APEX was always there for me, either meeting my need out of the box or allowing me to tinker with a bit of JavaScript. Not only that, but slightly more difficult tasks kept getting easier with each version as new features kept piling up.

Sometimes the feature I needed was hidden away where I could not find it, but online searches generally led me to some member of the APEX community who had already solved my problem. Often enough, that feature was simply there in the first spot I looked.

I know I have barely scratched the surface of all that APEX has to offer, but I was able to scratch my itch,
and boy did it feel good!

With my thanks to the entire team and especially Mr. Kallman,

Stew Ashton

SQL table macros 3: parameter strangeness

Like everything else in SQL macros, parameters are used differently from what we know or expect. Understanding how they work is key to understanding what macros do – and what they don’t.

[Update 2021-09-30: Chris Saxon reminded me that literal values can be inspected by SQL macros; the fact that VARCHAR2 literal values are nulled is an exception.]

Parse, then execute

I recently realized that Oracle never executes SQL SELECT statements! It parses them to generate execution plans, then executes the execution plans. The actual SQL statement text that was submitted has no bearing on the execution itself. This is just like classic third generation languages (COBOL, C, etc.) where a compiler generates an executable which is totally separate from the source code and which executes independently.

We have to accept that SQL macros operate at hard parse time, period. Once the execution plan is generated, how it got generated is irrelevant: it no longer matters whether a macro was called during the parse.

Table macro parameters

SQL table macros can have three types of formal parameters:

  • Table parameters of type DBMS_TF.TABLE_T
  • Column parameters of type DBMS_TF.COLUMNS_T
  • Classic parameters

To see how they work, here is a sample macro with all three types.

create or replace function select_table_column_equal(
  p_table in dbms_tf.table_t,
  p_column in dbms_tf.columns_t,
  p_string in varchar2
)
return varchar2 sql_macro is
  l_sql varchar2(4000);
begin
  l_sql := 
   'select t.*, ''' || p_string || ''' s from p_table t'
   || ' where ' || p_column(1) || ' = p_string';
  dbms_output.put_line(l_sql);
  return l_sql;
end select_table_column_equal;
/

select * from select_table_column_equal(emp, columns(ename),'KING');

When we submit this SELECT statement, the macro runs during the hard parse phase, so dbms_output.put_line shows us the text the macro passed on to the parser:

select t.*, '' s from p_table t where "ENAME" = p_string

and the output, as expected, is:

EMPNO ENAME JOB       MGR HIREDATE     SAL  COMM  DEPTNO S
----- ----- --------- --- ---------- ----- ----- ------- -
 7839 KING  PRESIDENT     1981-11-17  5000            10

There is so much going on here!

Let’s start by looking at the p_string parameter. This is a classic VARCHAR2 parameter whose value should be available at execution time in order to choose the rows to return. However, it makes no sense to use its value at parse time to change the macro’s output text, since the value can vary at each execution. In fact, we see that during the parse phase the value of the parameter is considered to be null. (It is true, as Chris Saxon has reminded me, that NUMBER parameters can be inspected by SQL macros, provided the caller passes a literal and provided that the CURSOR_SHARING initialization parameter is set to EXACT. I wish this were considered a bug and already fixed…)

Let me repeat: the value of a classic parameter must not be used to influence the content of the macro’s output text! In most cases that value is not available: I can’t stick it into the output, and I can’t use it in a condition within the macro.

What I should do is use the name of the parameter within the output text. It is a placeholder that tells the parser where the actual parameter should be used during execution.
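
Since the name is just a placeholder, the caller can even supply the value as a bind variable. Here is a quick SQL*Plus sketch reusing the macro above (the bind variable name is mine):

var v_ename varchar2(10);
exec :v_ename := 'KING';
select * from select_table_column_equal(emp, columns(ename), :v_ename);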

Now look at the COLUMNS_T parameter. When calling the macro, I have to use the COLUMNS pseudo-operator, which contains a list of one or more identifiers that could be used as column names. Since they are identifiers, they are part of the statement to be parsed and cannot be replaced by bind variables or expressions. Unlike the classic parameter p_string, the name “p_column” cannot be used for anything, but the values can. The list of identifiers that I pass is converted to a table of quoted identifiers; I plug the first entry from that table into the text that the macro outputs.

OK, so far we’ve seen that classic parameters must only be used as placeholders in the output text: they tell the parser where to use the actual parameters during execution. The name is used at parse time; the value is used at execution time. On the other hand, with COLUMNS_T parameters the name is never used, but the value(s) are available for use within the macro.

Finally, let’s look at the TABLE_T parameter. When we call the macro, we pass an identifier (emp), not a literal. This identifier has to refer to an existing table or view, or to a named subquery that precedes the call to the macro. Since it is an identifier, it is part of the text to be parsed and it cannot be replaced by a bind variable or other expression. Inside the macro, this parameter is a complex structure that contains a lot of information; however, here I don’t use that information at all, I just put the name of the parameter in the output text. The parser will see the name and substitute the table, view or named subquery as an inline view.
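
For instance, a sketch reusing the macro above with a named subquery instead of a table:

with high_paid as (select * from emp where sal >= 3000)
select * from select_table_column_equal(high_paid, columns(ename), 'KING');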

More details on COLUMNS_T

Again, the formal parameter is of type DBMS_TF.COLUMNS_T:

create function f(p_columns in DBMS_TF.COLUMNS_T)

The actual parameter is the COLUMNS pseudo-operator containing a list of identifiers that must be valid column names, in other words simple SQL names:

select * from f(columns(n,vis,"Fancy Name"))

The PL/SQL code in the function sees this parameter as a nested table of quoted identifiers, exactly as if the input was run through DBMS_ASSERT.ENQUOTE_NAME.

If quoted, all characters are allowed except \0 and the double quote.
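
Here is a quick sketch (the function name is mine) that dumps exactly what the macro receives:

create or replace function show_columns(
  p_table in dbms_tf.table_t,
  p_columns in dbms_tf.columns_t
) return varchar2 sql_macro is
begin
  for i in 1..p_columns.count loop
    -- each entry is already a quoted identifier
    dbms_output.put_line(i || ': ' || p_columns(i));
  end loop;
  return 'select * from p_table';
end show_columns;
/
select * from show_columns(dual, columns(n, vis, "Fancy Name"));

1: "N"
2: "VIS"
3: "Fancy Name"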

We can concatenate these values into our output text, or use the values to influence our logic.
For example, we can say what data types we want to include or exclude.

So, if you want a foolproof way to pass a literal value to a SQL macro, disguise it as a column name!
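
For example, a sketch of that trick (the function and the use case are mine): pass a row limit disguised as a quoted identifier, then strip the quotes inside the macro.

create or replace function top_n(
  p_table in dbms_tf.table_t,
  p_n in dbms_tf.columns_t  -- not a real column: a disguised literal
) return varchar2 sql_macro is
begin
  -- p_n(1) arrives as a quoted identifier such as "5"; strip the quotes
  return 'select * from p_table fetch first '
      || replace(p_n(1), '"') || ' rows only';
end top_n;
/
select * from top_n(emp, columns("5"));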

As time goes on, I will probably use (and abuse) this parameter to do things that have nothing to do with column names ;)

SQL table macros 2: “polymorphic” is not “dynamic”

A polymorphic SQL table macro can produce rows with different “shapes”, but the “shape” is established at parse time, not execution time. (By “shape” I mean the number of columns, their names and their characteristics such as data type.)

These macros work very differently from what we are used to. My goal here is to help understand how they work, not blow you away with an advanced use case. The examples are just illustrations, not meaningful solutions.

Starting simple: the TABLE_T parameter

Let’s call a Polymorphic (SQL) Table Macro a PTM for short. Here is a simple one:

SQL> create or replace function noop1(p_table in dbms_tf.table_t)
  2    return varchar2 sql_macro is
  3  begin
  4    dbms_output.put_line('table_schema_name: ' || p_table.table_schema_name);
  5    dbms_output.put_line('table_name: ' || p_table.table_name);
  6    return 'select * from p_table';
  7  end;
  8  /

Function NOOP1 compiled

The input parameter p_table has type DBMS_TF.TABLE_T. It is used in two different ways:

  1. Our PL/SQL code sees a record whose values we can access.
  2. The literal ‘p_table’ appears in the returned string where a table name would normally go, and the actual table name is substituted.

Think how strange that is: the code returns a literal, but part of the literal is really a placeholder for the value of the input parameter! The strangeness continues when we call the macro:

SQL> select * from noop1(dual);

D
-
X

table_schema_name: "SYS"
table_name: "DUAL"

SQL> select * from noop1("DUAL");

D
-
X

table_schema_name: "SYS"
table_name: "DUAL"

SQL> with x as (select * from dual)
  2  select * from noop1(x);

D
-
X

table_schema_name: ""
table_name: ""
  • Notice that the input parameter is not enclosed in single quotes: it is not a literal, but an identifier that is treated just the same as any other table name (or named subquery) that might be found in a SELECT statement.
  • When the input parameter refers to a named subquery, the schema and table name are left empty. The content of the named subquery is plugged into the SELECT statement as an inline view.
  • If we execute any of these SELECT statements twice in a row, the output from DBMS_OUTPUT.PUT_LINE will only appear once. This illustrates that the macro only executes when the statement is parsed.

Another simple example

If the TABLE_T input parameter references a real table or view (not a named subquery), we can plug the values from the input record into the return string.

create or replace function noop2(p_table in dbms_tf.table_t)
  return varchar2 sql_macro is
begin
  dbms_output.put_line('table_schema_name: ' || p_table.table_schema_name);
  dbms_output.put_line('table_name: ' || p_table.table_name);
  return
    case
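      -- a named subquery arrives with table_name set to the two characters "" (see the noop1 output above)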
      when p_table.table_name = '""' then
        'select * from p_table'
      else
        'select * from '|| p_table.table_schema_name || '.' || p_table.table_name
    end;
end;
/

Notice that in one case we write the parameter name and the macro uses it as a placeholder, but in the other case we concatenate input values into the SELECT statement ourselves.

Another new parameter type: COLUMNS_T

PTMs also accept input parameters of type DBMS_TF.COLUMNS_T. Let’s use this to decide which columns we want to see in the output.

create table t(a,b,c,d) as
select 1,2,3,4 from dual;

create or replace function pick_columns(
  p_table in dbms_tf.table_t,
  p_columns in dbms_tf.columns_t
)
  return varchar2 sql_macro is
begin
  return 'select p_columns from p_table';
end;
/
select * from pick_columns(t, columns(a,b,c));

ORA-00904: "P_COLUMNS": invalid identifier

Oops, it looks like automatic substitution doesn’t work. Let’s plug the values in ourselves.

create or replace function pick_columns(
  p_table in dbms_tf.table_t,
  p_columns in dbms_tf.columns_t
)
  return varchar2 sql_macro is
  l_columns varchar2(4000);
  l_sql varchar2(4000);
begin
  for i in 1..p_columns.count loop
    l_columns := l_columns
              || case when l_columns is not null then ',' end 
              || p_columns(i);
  end loop;
  l_sql := 'select ' || l_columns || ' from p_table';
  dbms_output.put_line(l_sql);
  return l_sql;
end;
/
select * from pick_columns(t, columns(a,b,c));

         A          B          C
---------- ---------- ----------
         1          2          3

select "A","B","C" from p_table

Lots of things to notice here:

  • When calling the function, the COLUMNS pseudo-operator must be used. The list contains valid simple identifiers.
  • The input appears to our code as a nested table containing quoted identifiers.
  • We have to plug the values into the SELECT statement ourselves.
  • When we print the statement ourselves, we see our own substitutions but the ‘p_table’ literal has not been replaced yet.

Other parameters

What happens if we use good old scalar parameters? This example takes as input a table, a column and a number, then returns the column value multiplied by the number.

create or replace function multiplier(
  p_table in dbms_tf.table_t,
  p_columns in dbms_tf.columns_t,
  p_multiplier in number
)
  return varchar2 sql_macro is
  l_sql varchar2(4000);
begin
  l_sql := 'select ' || p_columns(1) || ', ' 
            || p_columns(1) 
            || ' * p_multiplier multiplied_value from p_table';
  dbms_output.put_line('p_multiplier = <' || p_multiplier || '>');
  dbms_output.put_line(l_sql);
  return l_sql;
end;
/

SQL> select * from multiplier(t, columns(a), 9);

         A MULTIPLIED_VALUE
---------- ----------------
         1                9

p_multiplier = <9>
select "A", "A" * p_multiplier multiplied_value from p_table

The number 9 is a literal, and as such is part of the statement text; it is also available to the function code. However, in the returned string we should place the parameter name, not the value, just like the table parameter. We can see why if we use a bind variable instead of a literal.

SQL> var n number;
SQL> exec :n := 20;

PL/SQL procedure successfully completed.

SQL> select * from multiplier(t, columns(a), :n);

         A MULTIPLIED_VALUE
---------- ----------------
         1               20

p_multiplier = <>
select "A", "A" * p_multiplier multiplied_value from p_table

Notice that the value of the p_multiplier parameter is NULL. Values of “normal” parameters are not available to the macro when it executes during the parsing phase.

SQL> exec :n := 99;

PL/SQL procedure successfully completed.

SQL> select * from multiplier(t, columns(a), :n);

         A MULTIPLIED_VALUE
---------- ----------------
         1               99

Notice here that the DBMS_OUTPUT.PUT_LINE output doesn’t appear. Thanks to the use of a bind variable, the statement is not reparsed.

Synthesis

SELECT using a SQL table macro is a two-stage rocket:

  1. At hard parse time, the macro executes and returns a string containing a SELECT statement,
    which is then parsed.
  2. At execution time, the SELECT statement is executed. The macro doesn’t execute again, its work is done.

In other words, SQL table macros return code (the SELECT statement), not data, and they do their work at parse time, not execution time.

SQL table macros can have two new types of input parameters:

  • DBMS_TF.TABLE_T : refers to an existing table, view or named subquery
  • DBMS_TF.COLUMNS_T : refers to a list of identifiers that are valid column names, whether they exist or not.

These are in addition to “classic” parameters with SQL or PL/SQL data types.

The values of the TABLE_T and COLUMNS_T parameters are identifiers:

  • they are part of the SQL statement that SELECTs FROM the macro and cannot be replaced by bind variables, even with CURSOR_SHARING=FORCE;
  • their values are known at parse time, when the macro executes;

The “classic” parameters act differently depending on what the caller passes:

  • variables: values are NULL at parse time. They appear in the generated SELECT statement as bind variables.
    (The generated statement can be seen using DBMS_UTILITY.EXPAND_SQL_TEXT.)
  • literals: values are known at parse time and can be integrated directly into the SELECT statement.
    Careful, CURSOR_SHARING=FORCE turns the literals into bind variables, so never assume that “classic” parameter values are known at parse time!

At execution time, the values of the “classic” variable parameters are plugged into the bind variables of the final SELECT statement before execution.
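
As a sketch of the EXPAND_SQL_TEXT tip above (assuming the multiplier example is compiled), DBMS_UTILITY.EXPAND_SQL_TEXT shows the final statement, with the classic parameter left as a bind variable:

declare
  l_expanded clob;
begin
  dbms_utility.expand_sql_text(
    input_sql_text  => 'select * from multiplier(t, columns(a), :n)',
    output_sql_text => l_expanded
  );
  -- prints the post-macro SELECT that the parser actually sees
  dbms_output.put_line(l_expanded);
end;
/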

Conclusion: “polymorphic” vs. “dynamic”

If you ask me, a “dynamic” table macro would be like a PL/SQL function that returns a REF CURSOR: the cursor can have a different “shape” at every execution, even when the parameters are the same, so the result is as “dynamic” as we want it to be. A SQL table macro is different, since it only works at parse time. That is why the documentation calls it “polymorphic” and never “dynamic”.

SQL table macros 1: a moving target?

Testing SQL table macros in the Cloud (19c, 21c) and on LiveSQL (19.8): behaviors are different, and the differences change with time! Here I’ll zoom in on the TABLE input parameter, as of 2021-02-16.

[UPDATE 2020-12-08: 20c preview is dead, long live 21c! This post has been revised after redoing my tests on a Free Tier 21c.]

[UPDATE 2021-02-16: thanks to a test by Philipp Salvisberg, I can say that version 19.10 has the same limitations as version 19.8.]

Too long to read?

This is a long post without a direct, practical SQL solution at the end. If you just want the gist:

  • SQL table macros let you take a table, view or named subquery as input and transform it into another query at parse time.
  • SQL table macros in 19.8 and 19.10 accept only scalar data types in the input, but in the Cloud (19c, 21c) they accept practically any data type.
  • The information about the input data types comes from different sources that are incomplete and sometimes in disagreement.
  • I hope Oracle will complete the back-port of SQL table macros to 19c and that they will correct and complete the information about the input data types.

Preliminaries

A few words for those who are wondering what in the world I am talking about:

  • in classical programming languages, a “macro” is a fragment of code with a name. When you write a program, you write the name; when the program is compiled, as a first step the name is replaced with the code fragment.
  • A SQL macro is a function that returns either a SQL expression or a SELECT statement. The macro is executed only at hard parse time, and results in a final SQL statement which then goes through the normal parsing process.
  • A SQL table macro returns an entire SELECT statement. It can have a table, view or named subquery as an input parameter: this allows it to be a “polymorphic table function”.
    • “Polymorphic” means “able to change shapes”; in a SELECT statement, the “shape” comes from the number, names and types of the columns in the output.
    • The main contributor to the “shape” is the table input parameter.
    • For now, I’m going to call this type of macro a “polymorphic table macro” (PTM) to distinguish it from other types of “polymorphic table functions”.

Simple example

create or replace function simplest_ptm(p_table in DBMS_TF.TABLE_T)
return varchar2 sql_macro is
begin
  return 'select * from p_table';
end simplest_ptm;
/
select * from simplest_ptm(dual);

DUMMY
-----
X

There are at least two strange things to notice here:

  1. When I call the function, the parameter dual is an identifier, not a literal!
    dual is just as much a part of the SELECT statement as when I refer to it directly by select * from dual.
    Since it is part of the statement, changing the parameter changes the statement itself, which means the new statement has to be parsed again. This is what allows different “shapes”: if I change the parameter I get a new statement, which has to be parsed anew and therefore can have a new “shape”.
    (As an identifier, this parameter cannot be changed dynamically using a bind variable.)
  2. The function appears to return the literal ‘select * from p_table’ – but it doesn’t!
    Even though the return string is written as a literal, the “p_table” part is replaced by the table that p_table refers to.

Analyzing the table input parameter

The DBMS_TF.TABLE_T record has a complex structure that is used by all polymorphic table functions. When used in a SQL macro, some items are left null.

TYPE TABLE_T IS RECORD(
  column        TABLE_COLUMNS_T,
  schema_name   DBMS_ID,   -- always null
  package_name  DBMS_ID,   -- always null
  ptf_name      DBMS_ID);  -- always null

TYPE TABLE_COLUMNS_T IS TABLE OF COLUMN_T;

TYPE COLUMN_T IS RECORD (
  description   COLUMN_METADATA_T,
  pass_through  BOOLEAN,   -- always TRUE
  for_read      BOOLEAN);  -- always FALSE

TYPE COLUMN_METADATA_T IS RECORD (  -- THE GOOD STUFF
  type          PLS_INTEGER,
  max_len       PLS_INTEGER DEFAULT -1,
  name          VARCHAR2(32767),
  name_len      PLS_INTEGER,
  precision     PLS_INTEGER,
  scale         PLS_INTEGER,
  charsetid     PLS_INTEGER,
  charsetform   PLS_INTEGER,
  collation     PLS_INTEGER);

The bottom line is that this structure contains lots of information about the columns, but not information about the table itself: we don’t even know what the name of the table is! This makes more sense when we remember that the input parameter can be a named subquery from a WITH clause.

[UPDATE 2020-12-27: the definitions above are excerpts from the documentation. Looking at the source code of the DBMS_TF package, we see that the TABLE_T record has additional fields table_schema_name and table_name! These fields have appropriate values when the input parameter refers to a table or view; if it refers to a subquery, the values are null.]

A simple way to see what is in this complex record is the DBMS_TF.TRACE procedure.

create or replace function simplest_ptm(
  p_table in dbms_tf.table_t
) return varchar2 sql_macro is
begin
  dbms_tf.trace(p_table);
  return 'select * from p_table';
end simplest_ptm;
/
select * from simplest_ptm(dual);
........tab.column[1] =
pass through column
NOT a for read column
__________Type:...............VARCHAR2
__________Max Length:.........1
__________Name:..............."DUMMY"
__________Name Length:........7
__________Charset Id:.........AL32UTF8
__________Collation:..........USING_NLS_COMP

Of course, we can also use good old PL/SQL code to access all the fields directly. Using these methods, here is what I have found so far.

The Good

The input table parameter gives us lots of useful information about each column, starting with the data type. The data type is a code, and the DBMS_TF package specification tells us what each code means.

  /* Type Codes for supported types */
TYPE_VARCHAR2            CONSTANT PLS_INTEGER :=   1;
TYPE_NUMBER              CONSTANT PLS_INTEGER :=   2;
TYPE_DATE                CONSTANT PLS_INTEGER :=  12;
TYPE_RAW                 CONSTANT PLS_INTEGER :=  23;
TYPE_ROWID               CONSTANT PLS_INTEGER :=  69;
TYPE_CHAR                CONSTANT PLS_INTEGER :=  96;
TYPE_BINARY_FLOAT        CONSTANT PLS_INTEGER := 100;
TYPE_BINARY_DOUBLE       CONSTANT PLS_INTEGER := 101;
TYPE_CLOB                CONSTANT PLS_INTEGER := 112;
TYPE_BLOB                CONSTANT PLS_INTEGER := 113;
TYPE_JSON                CONSTANT PLS_INTEGER := 119; --21c only
TYPE_TIMESTAMP           CONSTANT PLS_INTEGER := 180;
TYPE_TIMESTAMP_TZ        CONSTANT PLS_INTEGER := 181;
TYPE_INTERVAL_YM         CONSTANT PLS_INTEGER := 182;
TYPE_INTERVAL_DS         CONSTANT PLS_INTEGER := 183;
TYPE_EDATE               CONSTANT PLS_INTEGER := 184;
TYPE_ETIMESTAMP_TZ       CONSTANT PLS_INTEGER := 186;
TYPE_ETIMESTAMP          CONSTANT PLS_INTEGER := 187;
TYPE_EINTERVAL_YM        CONSTANT PLS_INTEGER := 189;
TYPE_EINTERVAL_DS        CONSTANT PLS_INTEGER := 190;
TYPE_TIMESTAMP_LTZ       CONSTANT PLS_INTEGER := 231;
TYPE_ETIMESTAMP_LTZ      CONSTANT PLS_INTEGER := 232;

With this information, we can do all sorts of transformations of the query. For example, we can leave out columns of a certain data type, or format columns depending on the data type. This would be useful for formatting all the datetime data types without having to wrap them individually in TO_CHAR functions (or SQL scalar macros).
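
As an illustration, here is my own sketch (not from the package documentation) of a macro that wraps every DATE column in TO_CHAR and passes the other columns through unchanged:

create or replace function format_dates(p_table in dbms_tf.table_t)
return varchar2 sql_macro is
  l_list varchar2(4000);
  l_name varchar2(130);
begin
  for i in 1..p_table.column.count loop
    l_name := p_table.column(i).description.name;  -- already a quoted identifier
    l_list := l_list || case when i > 1 then ', ' end
      || case p_table.column(i).description.type
           when dbms_tf.type_date
           then 'to_char(' || l_name || ', ''YYYY-MM-DD'') ' || l_name
           else l_name
         end;
  end loop;
  return 'select ' || l_list || ' from p_table';
end format_dates;
/
select * from format_dates(emp);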

Another good thing happens before the function even executes: Oracle checks that the parameter is a valid identifier. This is an automatic protection against SQL injection.

The Bad and the Ugly

There is a lot to say here, sadly. I’m not sure what is “bad” and what is “ugly”: let’s just say there are quite a few things I wish were different.

No invisible columns

If the input table has invisible columns, they are not included in the column list and the final SELECT statement cannot access them, even if the final SELECT statement lists them explicitly. The only way to get access to such columns is by using a view or a named subquery that lists them explicitly.
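
A sketch of the workaround (the table t2 and its invisible column are hypothetical):

-- hypothetical table with an invisible column
create table t2 (a number, hidden_flag number invisible);

-- a named subquery that lists the column explicitly makes it visible to the macro
with t2_all as (select a, hidden_flag from t2)
select * from simplest_ptm(t2_all);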

[UPDATE 2020-12-27: the DBMS_TF.TABLE_T record has additional fields table_schema_name and table_name. We can plug those values into the SELECT statement (instead of referring to the parameter itself) and then all columns are accessible.]

Version weirdness

SQL macros were supposed to appear in 20c, and have now arrived in 21c. Table macros were back-ported to version 19, starting with 19.7. Unfortunately, 19.8 (on livesql.oracle.com) – and even 19.10 – have limited functionality compared to 21c, and even compared to the Cloud Autonomous Database which is theoretically at version 19.5!

Basically, 19.8 and 19.10 allow only scalar columns in the input table, whereas 19c “Cloud” and 21c allow many more data types. Here is the list:

LONG
In-memory DATE (e.g. SYSDATE)
LONG RAW
XMLTYPE
BFILE
JSON
UDT OBJECT
UDT NESTED TABLE
UDT VARRAY
In-memory TIMESTAMP
In-memory TIMESTAMP WITH TIME ZONE
In-memory INTERVAL_YM
In-memory INTERVAL_DS
In-memory TIMESTAMP WITH LOCAL TIME ZONE

I hope the on-premises version of 19c will eventually support all the same data types as the Cloud “version”.

Incoherent type codes

Sometimes the DBMS_TF constants define a value that is different from what the input column data actually shows.

  • CHAR / NCHAR are said to have code 96 but they return 1
  • In-memory DATE (such as SYSDATE in the SELECT list): said to be 184, is actually 13
  • In-memory TIMESTAMP WITH TIME ZONE: said to be 186, is actually 188

This is unfortunate because the constants in the DBMS_TF package are the only detailed “documentation” we have on what types are supported.

Undocumented data types

The DBMS_TF package only documents the scalar data types that work in versions up to 19.10. Even in 21c, the JSON data type is documented but not the dozen or more other data types that work fine in both 21c and 19c “Cloud”.

Type-code 58 overload

XMLTYPE columns are assigned type-code 58, which would be fine except that 58 is also the code for ANYDATA. There is no way to distinguish these types using only the information in the input TABLE_T record.

NVARACHAR2

I’m nitpicking here, but DBMS_TF.TRACE misspells NVARCHAR2 :-)

My Presentations at #UKOUG #techfest19

I just finished my second presentation at TechFest19 in sunny Brighton, England. The conference this year is great from every point of view: venue, people, content, conversations… Thanks to those who came to my talks for your benevolent attention and excellent questions/remarks!

Both my presentations are now on SlideShare. I beg you to download them so the animations work; if you don’t, you will see a jumble.

Best wishes to all for the end of the conference and the trip home.

Regards, Stew

Make your Data Dance: UNPIVOT, PIVOT and GROUP BY extensions

JSON in Oracle database 18c and 19c

MATCH_RECOGNIZE Restrictions

Sometimes we think that certain restrictions are not documented when in fact they are. Where do we forget to look? Database Error Messages.

As mentioned recently on asktom.oracle.com (thanks, Connor), all error messages and event codes are listed in $ORACLE_HOME/rdbms/mesg/oraus.msg. Searching for “pattern matching”, we read “62500 – 62549 are reserved for PATTERN MATCHING”.

So here is a link to all the MATCH_RECOGNIZE error messages: ORA-60001 to ORA-65535

As of version 19c, the range actually used is 62500 through 62521.

It was a surprise to discover that “bounded quantifiers” are limited to 200, but there it is in black and white:

“ORA-62518: MATCH_RECOGNIZE pattern is too large.”
“Permute with more than 7 elements or bounded quantifier bound greater than 200 are … currently not supported.”

I’m not suggesting we spend our lives reading database error messages, but we have to admit they are documented ;-)

Best regards,
Stew

P.S. Where are the values of UB2MAXVAL and UB4MAXVAL documented? If you know, please post a link in the comments below. I suspect UB2MAXVAL is 0XFFFF and UB4MAXVAL is 0XFFFFFFFF.

FOOTNOTE 2019-11-05 : thanks to Maxim Demenko for pointing out where UB*MAXVAL are documented. In the OCI Programmer’s Guide, at the very end of the chapter on Data Types, it says

“Throughout this guide there are references to data types like ub2 or sb4, or to constants like UB4MAXVAL. These types are defined in the oratypes.h header file, which is found in the public directory. The exact contents may vary according to the operating system that you are using.”

Backtracking

By popular request, here are my thoughts about the impact of “backtracking” on performance when using the MATCH_RECOGNIZE clause. This came up again because of a query that Jonathan Lewis wrote recently; however, I will concentrate on the theory, not the query.

Patterns

The match_recognize clause implements row pattern matching: it recognizes that a consecutive series of rows is a match for a defined pattern.

The pattern is described in the aptly named PATTERN clause. The syntax resembles a subset of the regular expression syntax. For example:

  • a regular expression pattern ‘AB’ means “find the character A immediately followed by the character B”.
  • In MATCH_RECOGNIZE, PATTERN(A B) means “find a row that meets condition A, immediately followed by a row that meets condition B”. The conditions are then described in the DEFINE clause.

Both syntaxes use two features that can lead to backtracking: quantifiers and alternation.

Quantifiers

PATTERN(A) means find exactly one row that meets condition A.

We can be more flexible in how many A rows we want:

  • A? means we want 0 or 1 A row;
  • A* means we want 0 or more rows;
  • A+ means we want 1 or more rows;
  • A{100} means we want exactly 100 rows;
  • A{3,100} means we want from 3 to 100 rows.

Notice the word “exactly” appears only once in this list. All the other quantifiers are what I’ll call indefinite: there can be more than one series of rows that match the pattern! Suppose we have 200 consecutive rows that meet condition A: the pattern A* could be met 201 different ways.

When a quantifier is indefinite, the rule is to match as many rows as possible: the quantifiers are greedy. If we add a question mark, the rule is to match as few rows as possible: the quantifier becomes reluctant.

  • A{3,100} will match 100 rows if it can.
  • A{3,100}? will match 3 rows and then stop.

Whether greedy or reluctant, indefinite quantifiers can lead to backtracking. More in a minute.

Alternation

To quote the documentation, an “alternation list is created by placing a vertical bar (|) between each regular expression. Alternatives are preferred in the order they are specified. As an example, PATTERN (A | B | C) attempts to match A first. If A is not matched, it attempts to match B. If B is not matched, it attempts to match C.”

Alternation is also indefinite: in this example, the number of rows is always the same, but the same row might meet any of three different conditions.

From now on I’ll concentrate on quantifiers, since they are much more common.

From “indefinite” to backtracking

Even though an indefinite quantifier means there is more than one correct answer, there is always one preferred answer, so what’s the big deal? Suppose there are 200 rows that meet the A condition:

  • A{1,100} returns 100 rows and
  • A{1,100}? returns 1 row.

Aha! But what if there is another condition after A?

With PATTERN(A{1,100} B), suppose there are 101 consecutive rows that meet A but not B.
A regular expression engine should find 100 A, then not find a B.
It will then backtrack, “giving back” A 100. It will then find there is no B.
It will then backtrack, “giving back” A 99. It will then find there is no B.
And so on all the way back to 1.

With PATTERN(A{1,100}? B), suppose there are 100 consecutive As followed by a B.
The engine should find 1 A, then not find a B.
It will then backtrack, adding A 2. It will then not find a B.
And so on all the way up to 100.

So backtracking does not mean “giving back” an A; it means backing up from B to A.

To summarize: backtracking can happen when an indefinite quantifier is followed by another condition. With greedy quantifiers, the worst backtracking happens when there is no match, because every possible solution must be tested. With reluctant quantifiers, backtracking may happen even when there is eventually a match.
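
To make this concrete, here is a contrived sketch (the data and conditions are mine) of the greedy worst case: every row can be an A, no row is ever a B, so there is no match and every possible split must be tested before failing.

with t(n) as (select level from dual connect by level <= 101)
select *
from t match_recognize(
  order by n
  measures count(*) as cnt
  pattern (a{1,100} b)
  define
    a as n > 0,  -- true for every row: 101 candidate As
    b as n < 0   -- never true: forces backtracking through all 100 splits
);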

Instrumentation?

There is one bit of instrumentation about backtracking in the explain plan. Here is a quote from Oracle development that Keith Laker sent me five years ago:

In the explain plan the specific pattern matching keywords are: MATCH RECOGNIZE (SORT | BUFFER) [DETERMINISTIC FINITE AUTO]

When the plan shows “MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO”, here “MATCH RECOGNIZE” refers to the row source for evaluating the match_recognize clause, “SORT” means the row source uses “SORT” to sort the data before running it through the state machine to find matches, and “DETERMINISTIC FINITE AUTO” means the state machine that we constructed is deterministic and thus when running the sorted rows through the state machine, we don’t do backtracking. “DETERMINISTIC FINITE AUTO” is desirable as the execution is efficient when there is no backtracking.

Currently we detect deterministic finite automaton by checking the state machine built: if any state has 2 or more outgoing transitions then we regard the state machine as non-deterministic, if any final state is followed by a non-final state, then the state machine is regarded as non-deterministic. We don’t check the predicates associated with each transition at all. At the moment we can only detect a few trivial cases such as PATTERN (A B C), PATTERN (A B+), PATTERN (A B*), etc.

For PATTERN (A | B) , or PATTERN (A B+ C) we just regard the state machine as non-deterministic. We don’t check the mutual exclusiveness of the define predicates in detecting a deterministic state machine.

Conclusions

The quote from Oracle development confirms that alternation, or an indefinite quantifier followed by another condition, can be subject to backtracking. If we are lucky enough to see DETERMINISTIC FINITE AUTO, we know backtracking is not a problem. In testing, we should always test situations where no match is found. If there are reluctant quantifiers, we should also test situations where there is a match, but not right away.

Finally, each condition should be defined as strictly as possible, saying what it should be and also what it should not be. More than once, I have run into backtracking problems because the first condition was always true; once I defined the first condition more strictly, potential matches were eliminated much earlier and the query sped up considerably.

Hope this helps,
Stew

Predict_Clustering_Factor

Jonathan Lewis recently wrote about estimating the clustering factor of an index, taking into account the intended value of the TABLE_CACHED_BLOCKS parameter of DBMS_STATS.SET_TABLE_PREFS. He included a function I wrote called predict_clustering_factor. Here is a corrected and improved version. Any questions or remarks about the code are welcome here.

Please refer to blog posts by Jonathan and Richard Foote for explanations of the clustering factor and the TABLE_CACHED_BLOCKS parameter.

UPDATE 2019-10-28: thanks to a comment by Viacheslav Andzhich on Jonathan’s post, I found that PLSQL_CODE_TYPE=’NATIVE’ makes the function run in two-thirds the time. He also mentioned the possibility of overflow on PLS_INTEGER, so I now use INTEGER only. Thanks Viacheslav!

Description

The function takes as input:
p_table_name: the table name
p_column_list: a comma-separated list of the column names to be indexed
p_max_table_cached_blocks: the MAXIMUM value of TABLE_CACHED_BLOCKS to be tested
(255 by default).

The function then queries the ROWIDs, sorted in index order:
– Rows are omitted if all index columns are NULL.
– In case of ties, rows are sorted by ROWID.
– Each ROWID has its “row number” portion set to 0, so it effectively becomes a block id.

The function collects the most recent p_max_table_cached_blocks block ids.
When there is no more room, it replaces the Least Recently Used block id with the current one.

lt_hits_per_RU is a table of intermediate results. The number of entries equals p_max_table_cached_blocks. When a new row has a block id among the most recent, I call that a “hit”. Every hit adds one to an entry in the result table: the index of the entry is based on how “recently used” the block id was. For example, if the current row and the previous row are in the same block, the index is one.

After all the rows are processed, the result table is transformed into a table of clustering factors: the clustering factor is the total number of rows minus all the hits up to and including the current entry.

In the output, the number of rows is equal to p_max_table_cached_blocks.
The ROWNUM provides the TABLE_CACHED_BLOCKS parameter value.

Sample of usage:

select rownum table_cached_blocks,
column_value clustering_factor
from table(predict_clustering_factor('T1','v1,rand'))

The code

ALTER SESSION SET PLSQL_CODE_TYPE='NATIVE';

create or replace function predict_clustering_factor(
/*
Function to predict the clustering factor of an index,
taking into account the intended value of
the TABLE_CACHED_BLOCKS parameter of DBMS_STATS.SET_TABLE_PREFS.

The function takes as input:
- p_table_name: the table name
- p_column_list: a comma-separated list of the column names to be indexed
- p_max_table_cached_blocks: the MAXIMUM value of TABLE_CACHED_BLOCKS to be tested
  (255 by default).

The function then queries the ROWIDs, sorted in index order:
- Rows are omitted if all index columns are NULL.
- In case of ties, rows are sorted by ROWID.
- Each ROWID has its "row number" portion set to 0, so it effectively becomes a block id.

The function collects the most recent p_max_table_cached_blocks block ids.
When there is no more room, it replaces the Least Recently Used block id with the current one.

The function returns a table of clustering factors.
The number of rows is equal to p_max_table_cached_blocks.
The ROWNUM of the table is the TABLE_CACHED_BLOCKS parameter value.

Sample of usage:
  select rownum table_cached_blocks,
    column_value clustering_factor
  from table(predict_clustering_factor('T1','v1,rand'))
*/
  p_table_name in varchar2,
  p_column_list in varchar2,
  p_max_table_cached_blocks in number default 255
) return sys.odcinumberlist authid current_user is

  sql_text varchar2(4000);
  rc sys_refcursor;
  type tt_rids is table of rowid;
  lt_rids tt_rids;

  type t_block is record(
    block_id rowid,
    most_recent_hit integer
  );
  type tt_blocks is table of t_block;
  lt_blocks tt_blocks := new tt_blocks();
  l_block_id rowid;
  l_block_id_prev rowid;

  l_rn integer := 0;
  b_block_found boolean;
  l_LRU integer;
  i_LRU integer := 0;

  lt_hits_per_RU sys.odcinumberlist := new sys.odcinumberlist();
  i_hits_per_RU integer;

  function truncated_rid(p_rid in rowid) return rowid is
    l_rowid_type number;
    l_object_number NUMBER;
    l_relative_fno NUMBER;
    l_block_number NUMBER;
    l_row_number NUMBER;
  begin
    DBMS_ROWID.ROWID_INFO (
      p_rid,
      l_rowid_type,
      l_object_number,
      l_relative_fno,
      l_block_number,
      l_row_number
    );
    return DBMS_ROWID.ROWID_CREATE (
      l_rowid_type,
      l_object_number,
      l_relative_fno,
      l_block_number,
      0
    );
  end truncated_rid;

  function hits_per_RU(p_most_recent_hit in integer) return integer is
    i_hits_per_RU integer := 1;
  begin
    for i in 1..lt_blocks.count loop
      if lt_blocks(i).most_recent_hit > p_most_recent_hit then
        i_hits_per_RU := i_hits_per_RU + 1;
      end if;
    end loop;
    return i_hits_per_RU;
  end hits_per_RU;

begin
  -- Check for valid TABLE_CACHED_BLOCKS value
  if p_max_table_cached_blocks != trunc(p_max_table_cached_blocks)
  or p_max_table_cached_blocks not between 1 and 255 then
    raise_application_error(
      -20001,
      'input parameter p_max_table_cached_blocks must be an integer between 1 and 255'
    );
  end if;

  -- Initialize hits_per_RU table
  lt_hits_per_RU.extend(p_max_table_cached_blocks);
  for i in 1..lt_hits_per_RU.count loop
    lt_hits_per_RU(i) := 0;
  end loop;

  -- Execute query that mimics index
  sql_text :=
    'select rowid from '||p_table_name
    ||' where '||replace(p_column_list, ',', ' is not null or ')||' is not null'
    ||' order by '||p_column_list||', rowid';
  dbms_output.put_line('Query text: '||sql_text);
  open rc for sql_text;

  loop
    fetch rc bulk collect into lt_rids limit 10000;

    for irid in 1..lt_rids.count loop

      l_rn := l_rn + 1;
      l_block_id := truncated_rid(lt_rids(irid));

      -- Optimized treatment of first row
      if l_rn = 1 then
        lt_blocks.extend;
        lt_blocks(1).block_id := l_block_id;
        lt_blocks(1).most_recent_hit := l_rn;
        l_block_id_prev := l_block_id;
        continue;
      end if;

      -- Optimized treatment of consecutive rows in same block
      if l_block_id = l_block_id_prev then
        lt_hits_per_RU(1) := lt_hits_per_RU(1) + 1;
        continue;
      end if;

      l_block_id_prev := l_block_id;
      l_LRU := l_rn;
      b_block_found := false;

      for i in 1..lt_blocks.count loop

        -- if the new block_id is never found,
        -- i_LRU will point to the Least Recently Used block
        if l_LRU > lt_blocks(i).most_recent_hit then
          l_LRU := lt_blocks(i).most_recent_hit;
          i_LRU := i;
        end if;

        -- if the new block_id is found,
        -- then how many blocks ago was it found?
        if lt_blocks(i).block_id = l_block_id then
          b_block_found := true;
          -- how recently used is the block?
          i_hits_per_RU := hits_per_RU(lt_blocks(i).most_recent_hit);
          -- update hit summary
          lt_hits_per_RU(i_hits_per_RU) := lt_hits_per_RU(i_hits_per_RU) + 1;
          -- the block_id was just hit, so update most_recent_hit value
          lt_blocks(i).most_recent_hit := l_rn;
          exit;
        end if;

      end loop;

      -- If new block_id, add to lt_blocks if there is room,
      -- otherwise overwrite Least Recently Used entry
      if not b_block_found then
        if lt_blocks.count < p_max_table_cached_blocks then
          lt_blocks.extend;
          lt_blocks(lt_blocks.count).block_id := l_block_id;
          lt_blocks(lt_blocks.count).most_recent_hit := l_rn;
        else
          lt_blocks(i_LRU).block_id := l_block_id;
          lt_blocks(i_LRU).most_recent_hit := l_rn;
        end if;
      end if;

    end loop;

    exit when rc%notfound;
  end loop;

  close rc;

-- Prepare output
  -- calculate cumulative sum of hits
  for i in 2..lt_hits_per_RU.count loop
    lt_hits_per_RU(i) := lt_hits_per_RU(i) + lt_hits_per_RU(i-1);
  end loop;
  -- subtract cumulative hits from total number of rows to get
  -- clustering factor. ROWNUM provides the TABLE_CACHED_BLOCKS value.
  for i in 1..lt_hits_per_RU.count loop
    lt_hits_per_RU(i) := l_rn - lt_hits_per_RU(i);
  end loop;

  dbms_output.put_line('Total number of rows in index = '|| l_rn);
  return lt_hits_per_RU;

exception when others then
  if rc%isopen then
    close rc;
  end if;
  raise;

end predict_clustering_factor;
/

LISTAGG() as a Cumulative Function

LISTAGG() can be used as an analytic function, but cannot be cumulative. What does that mean, and is there a workaround?

I’ll demonstrate with the boring old DEPT and EMP tables, just because they have so few rows, and I’ll keep only the department and employee names:

create or replace view denames as
select dname, ename from dept join emp using(deptno);

select * from denames
order by dname, ename;
DNAME      ENAME
---------- ------
ACCOUNTING CLARK
ACCOUNTING KING
ACCOUNTING MILLER
RESEARCH   ADAMS
RESEARCH   FORD
RESEARCH   JONES
RESEARCH   SCOTT
RESEARCH   SMITH
SALES      ALLEN
SALES      BLAKE
SALES      JAMES
SALES      MARTIN
SALES      TURNER
SALES      WARD

Reminder: aggregate versus analytic

“Aggregate functions return a single result row based on groups of rows, rather than on single rows.” (19c doc)

select dname,
  listagg(ename, ',') within group(order by ename) enames
from denames
group by dname
order by dname;
DNAME      ENAMES
---------- ------------------------------------
ACCOUNTING CLARK,KING,MILLER
RESEARCH   ADAMS,FORD,JONES,SCOTT,SMITH
SALES      ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

By contrast, “analytic functions…differ from aggregate functions in that they return multiple rows for each group…All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed.” (19c). In other words, the output rows are already determined and any analytic function simply adds a column to each existing row. The “analytic clause”, starting with the keyword OVER(), says what rows provide data for the function.

select de.*,
  listagg(ename, ',') within group(order by ename)
    over(partition by dname)
  enames
from denames de;
DNAME      ENAME  ENAMES
---------- ------ ------------------------------------
ACCOUNTING CLARK  CLARK,KING,MILLER
ACCOUNTING KING   CLARK,KING,MILLER
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH   ADAMS  ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH   FORD   ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH   JONES  ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH   SCOTT  ADAMS,FORD,JONES,SCOTT,SMITH
RESEARCH   SMITH  ADAMS,FORD,JONES,SCOTT,SMITH
SALES      ALLEN  ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES      BLAKE  ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES      JAMES  ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES      MARTIN ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES      TURNER ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
SALES      WARD   ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

LISTAGG() Limitations

The “analytic clause” has three parts:

  1. PARTITION BY: like GROUP BY, separates the data into chunks based on one or more values;
  2. ORDER BY: within the “partition”, orders the rows and sets up the next part;
  3. Windowing clause: says what rows within the ordered partition are concerned by the function.

Some analytic functions allow only the first or second parts, and LISTAGG() is one of them. Let’s try to use it as a cumulative function:

select de.*,
  listagg(ename, ',') within group(order by ename)
    over(partition by dname order by ename)
  enames
from denames de;

SQL Error: ORA-30487: ORDER BY not allowed here
30487. 00000 -  "ORDER BY not allowed here"
*Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY

The error message is a bit strange, but the conclusion is: it doesn’t work.

The workaround

The idea is simple: take what LISTAGG() gives us and do a SUBSTR() based on the row number:

select dname, ename,
  substr(enames, 1, instr(enames, ',', 1, rn) - 1) enames
from (
  select de.*,
    listagg(ename, ',') within group(order by ename)
      over(partition by dname)
    || ','
    enames,
    row_number() over(partition by dname order by ename) rn
  from denames de
);
DNAME      ENAME  ENAMES
---------- ------ ------------------------------------
ACCOUNTING CLARK  CLARK
ACCOUNTING KING   CLARK,KING
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH   ADAMS  ADAMS
RESEARCH   FORD   ADAMS,FORD
RESEARCH   JONES  ADAMS,FORD,JONES
RESEARCH   SCOTT  ADAMS,FORD,JONES,SCOTT
RESEARCH   SMITH  ADAMS,FORD,JONES,SCOTT,SMITH
SALES      ALLEN  ALLEN
SALES      BLAKE  ALLEN,BLAKE
SALES      JAMES  ALLEN,BLAKE,JAMES
SALES      MARTIN ALLEN,BLAKE,JAMES,MARTIN
SALES      TURNER ALLEN,BLAKE,JAMES,MARTIN,TURNER
SALES      WARD   ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

Caveats

This solution seems fine, as long as the source values are NOT NULL. If there is a NULL value, the “cumulative value” becomes NULL. Look at the last row of the output.

update emp set ename = null where ename = 'ALLEN';

(repeat the query)
DNAME      ENAME  ENAMES
---------- ------ ------------------------------
ACCOUNTING CLARK  CLARK
ACCOUNTING KING   CLARK,KING
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH   ADAMS  ADAMS
RESEARCH   FORD   ADAMS,FORD
RESEARCH   JONES  ADAMS,FORD,JONES
RESEARCH   SCOTT  ADAMS,FORD,JONES,SCOTT
RESEARCH   SMITH  ADAMS,FORD,JONES,SCOTT,SMITH
SALES      BLAKE  BLAKE
SALES      JAMES  BLAKE,JAMES
SALES      MARTIN BLAKE,JAMES,MARTIN
SALES      TURNER BLAKE,JAMES,MARTIN,TURNER
SALES      WARD   BLAKE,JAMES,MARTIN,TURNER,WARD
SALES

Also, with version 19c LISTAGG() allows the DISTINCT keyword. If you want to use DISTINCT, then instead of ROW_NUMBER() use DENSE_RANK()!

update emp set ename = 'MARTIN' where ename is null;

select dname, ename,
  substr(enames, 1, instr(enames, ',', 1, rn) - 1) enames
from (
  select de.*,
    listagg(DISTINCT ename, ',') within group(order by ename)
      over(partition by dname)
    || ','
    enames,
    DENSE_RANK() over(partition by dname order by ename) rn
  from denames de
);
DNAME      ENAME  ENAMES
---------- ------ ------------------------------
ACCOUNTING CLARK  CLARK
ACCOUNTING KING   CLARK,KING
ACCOUNTING MILLER CLARK,KING,MILLER
RESEARCH   ADAMS  ADAMS
RESEARCH   FORD   ADAMS,FORD
RESEARCH   JONES  ADAMS,FORD,JONES
RESEARCH   SCOTT  ADAMS,FORD,JONES,SCOTT
RESEARCH   SMITH  ADAMS,FORD,JONES,SCOTT,SMITH
SALES      BLAKE  BLAKE
SALES      JAMES  BLAKE,JAMES
SALES      MARTIN BLAKE,JAMES,MARTIN
SALES      MARTIN BLAKE,JAMES,MARTIN
SALES      TURNER BLAKE,JAMES,MARTIN,TURNER
SALES      WARD   BLAKE,JAMES,MARTIN,TURNER,WARD

With these precautions, why not use LISTAGG() as a cumulative function if you want? Hope this helps…

Making Longer Lists

For very long lists, we need the return values to be CLOBs. Alas, LISTAGG can only return VARCHAR2 lists, but XML and JSON aggregate functions can return CLOBs!

As an exercise, I’ll start from ALL_OBJECTS: I want to make a list of all fully qualified object names for each object type. To start with, a simple subquery:

with data as (
  select object_type, owner ||'.'|| object_name obj
  from all_objects
)
select * from data;

OBJECT_TYPE  OBJ
------------ ------------------
INDEX        SYS.I_FILE#_BLOCK#
INDEX        SYS.I_OBJ3
INDEX        SYS.I_TS1
INDEX        SYS.I_CON1
...

XML

XML functions work on the XMLTYPE data type, which has practically no length limitations. To make my list, I have to go through three steps:

  1. Make an XML fragment out of each object.
    This step will escape any special characters such as '<' or '>'.
  2. Aggregate the XML fragments.
  3. Extract the text from the aggregated XML data.
    This step will unescape the special characters.

I have to handle myself the comma that separates the elements of the list.

with data as (
  select object_type, owner ||'.'|| object_name obj
  from all_objects
)
select object_type,
SUBSTR(
  xmlcast(
    xmlagg(
      xmlelement(O, ',' || obj)
      order by obj
    ) AS CLOB
  ), 2
) qualified_objects
from data
group by object_type
order by object_type;

On my system with about 78000 objects, this takes 3 seconds and a bit.

JSON

JSON support in the Oracle database is pretty new. For full support of CLOBs, we need versions 18c or later (or a patched version 12.2, but I don’t know the precise version number).

The steps are a little different with JSON:

  1. I can aggregate the data directly into a JSON array.
    This escapes any special characters like double quotes, newlines, etc.
  2. To turn the multiple array items into one (which is my list), I replace "," (which separates the array items) with my separator.
  3. I then use JSON_VALUE to extract the list from the array.
    This unescapes any character that needs it.

with data as (
  select object_type, owner ||'.'|| object_name obj
  from all_objects
)
select object_type,
  json_value(
    replace(
      json_arrayagg(obj order by obj returning clob),
      '","',
      ','
    ),
    '$[0]' returning clob
  ) qualified_objects
from data
group by object_type
order by object_type;

Instead of 3+ seconds, this code takes a bit over 1 second, so three times faster.

As a reminder, JSON is great for going the other way: split the items of a list into a table. See Splitting Strings: a New Champion!