SQL table macros 1: a moving target?

Testing SQL table macros in the Cloud (19c, 21c) and on LiveSQL (19.8): the behaviors are different, and the differences change over time! Here I’ll zoom in on the TABLE input parameter, as of 2021-02-16.

[UPDATE 2020-12-08: 20c preview is dead, long live 21c! This post has been revised after redoing my tests on a Free Tier 21c.]

[UPDATE 2021-02-16: thanks to a test by Philipp Salvisberg, I can say that version 19.10 has the same limitations as version 19.8.]

Too long to read?

This is a long post without a direct, practical SQL solution at the end. If you just want the gist:

  • SQL table macros let you take a table, view or named subquery as input and transform it into another query at parse time.
  • SQL table macros in 19.8 and 19.10 accept only scalar data types in the input, but in the Cloud (19c, 21c) they accept practically any data type.
  • The information about the input data types comes from different sources that are incomplete and sometimes in disagreement.
  • I hope Oracle will complete the back-port of SQL table macros to 19c and that they will correct and complete the information about the input data types.

Preliminaries

A few words for those who are wondering what in the world I am talking about:

  • In classical programming languages, a “macro” is a fragment of code with a name. When you write a program, you write the name; when the program is compiled, the first step is to replace the name with the code fragment.
  • A SQL macro is a function that returns either a SQL expression or a SELECT statement. The macro is executed only at hard parse time, and results in a final SQL statement which then goes through the normal parsing process.
  • A SQL table macro returns an entire SELECT statement. It can have a table, view or named subquery as an input parameter: this allows it to be a “polymorphic table function”.
    • “Polymorphic” means “able to change shapes”; in a SELECT statement, the “shape” comes from the number, names and types of the columns in the output.
    • The main contributor to the “shape” is the table input parameter.
    • For now, I’m going to call this type of macro a “polymorphic table macro” (PTM) to distinguish it from other types of “polymorphic table functions”.

Simple example

create or replace function simplest_ptm(p_table in DBMS_TF.TABLE_T)
return varchar2 sql_macro is
begin
  return 'select * from p_table';
end simplest_ptm;
/
select * from simplest_ptm(dual);

DUMMY
-----
X

There are at least two strange things to notice here:

  1. When I call the function, the parameter dual is an identifier, not a literal!
    dual is just as much a part of the SELECT statement as when I refer to it directly by select * from dual.
    Since it is part of the statement, changing the parameter changes the statement itself, which means the new statement has to be parsed again. This is what allows different “shapes”: if I change the parameter, I get a new statement that is parsed anew and can therefore have a new “shape” (see the example just after this list).
    (As an identifier, this parameter cannot be changed dynamically using a bind variable.)
  2. The function appears to return the literal ‘select * from p_table’ – but it doesn’t!
    Even though the return string is written as a literal, the “p_table” part is replaced by the table that p_table refers to.
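
To see the polymorphism in action, here is the same macro applied to a different table (ALL_USERS is just a handy data dictionary view):

select * from simplest_ptm(all_users);

The statement is parsed anew, so the result now has the “shape” of ALL_USERS, not of DUAL.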

Analyzing the table input parameter

The DBMS_TF.TABLE_T record has a complex structure that is used by all polymorphic table functions. When used in a SQL macro, some items are left null.

TYPE TABLE_T IS RECORD(
  column                TABLE_COLUMNS_T,
  schema_name           DBMS_ID,   -- always null
  package_name          DBMS_ID,   -- always null
  ptf_name              DBMS_ID);  -- always null

TYPE TABLE_COLUMNS_T IS TABLE OF COLUMN_T;

TYPE COLUMN_T IS RECORD (
  description            COLUMN_METADATA_T,
  pass_through           BOOLEAN,   -- always TRUE
  for_read               BOOLEAN);  -- always FALSE

TYPE COLUMN_METADATA_T IS RECORD (  -- THE GOOD STUFF
  type               PLS_INTEGER,
  max_len            PLS_INTEGER DEFAULT -1,
  name               VARCHAR2(32767),
  name_len           PLS_INTEGER,
  precision          PLS_INTEGER,
  scale              PLS_INTEGER,
  charsetid          PLS_INTEGER,
  charsetform        PLS_INTEGER,
  collation          PLS_INTEGER );

The bottom line is that this structure contains lots of information about the columns, but not information about the table itself: we don’t even know what the name of the table is! This makes more sense when we remember that the input parameter can be a named subquery from a WITH clause.
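
For example, when the input parameter is a named subquery, there is simply no table name anywhere to report:

with q as (select 1 n from dual)
select * from simplest_ptm(q);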

[UPDATE 2021-01-11: the definitions above are excerpts from the documentation. Looking at the source code of the DBMS_TF package (in 21c and 19c Cloud only), we see that the TABLE_T record has additional fields table_schema_name and table_name! These fields have appropriate values when the input parameter refers to a table or view; if it refers to a subquery, the values are null.]

A simple way to see what is in this complex record is the DBMS_TF.TRACE procedure.

create or replace function simplest_ptm(
  p_table in dbms_tf.table_t
) return varchar2 sql_macro is
begin
  dbms_tf.trace(p_table);
  return 'select * from p_table';
end simplest_ptm;
/
select * from simplest_ptm(dual);

........tab.column[1] =

pass through column

NOT a for read column
__________Type:...............VARCHAR2
__________Max Length:.........1
__________Name:..............."DUMMY"
__________Name Length:........7
__________Charset Id:.........AL32UTF8
__________Collation:..........USING_NLS_COMP

Of course, we can also use good old PL/SQL code to access all the fields directly, as in the sketch below. Using these methods, here is what I have found so far.
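
Here is a minimal sketch of the PL/SQL approach (my own illustration, not part of DBMS_TF): loop over the COLUMN collection and print each column’s name and type code. Since the macro executes at hard parse time, DBMS_OUTPUT works here.

create or replace function describe_ptm(p_table in dbms_tf.table_t)
return varchar2 sql_macro is
begin
  -- Print each input column's name and type code at parse time
  for i in 1 .. p_table.column.count loop
    dbms_output.put_line(
      p_table.column(i).description.name
      || ' : type code ' || p_table.column(i).description.type
    );
  end loop;
  return 'select * from p_table';
end describe_ptm;
/
set serveroutput on
select * from describe_ptm(dual);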

The Good

The input table parameter gives us lots of useful information about each column, starting with the data type. The data type is a code, and the DBMS_TF package specification tells us what each code means.

/* Type Codes for supported types */
TYPE_VARCHAR2            CONSTANT PLS_INTEGER :=   1;
TYPE_NUMBER              CONSTANT PLS_INTEGER :=   2;
TYPE_DATE                CONSTANT PLS_INTEGER :=  12;
TYPE_RAW                 CONSTANT PLS_INTEGER :=  23;
TYPE_ROWID               CONSTANT PLS_INTEGER :=  69;
TYPE_CHAR                CONSTANT PLS_INTEGER :=  96;
TYPE_BINARY_FLOAT        CONSTANT PLS_INTEGER := 100;
TYPE_BINARY_DOUBLE       CONSTANT PLS_INTEGER := 101;
TYPE_CLOB                CONSTANT PLS_INTEGER := 112;
TYPE_BLOB                CONSTANT PLS_INTEGER := 113;
TYPE_JSON                CONSTANT PLS_INTEGER := 119; --21c only
TYPE_TIMESTAMP           CONSTANT PLS_INTEGER := 180;
TYPE_TIMESTAMP_TZ        CONSTANT PLS_INTEGER := 181;
TYPE_INTERVAL_YM         CONSTANT PLS_INTEGER := 182;
TYPE_INTERVAL_DS         CONSTANT PLS_INTEGER := 183;
TYPE_EDATE               CONSTANT PLS_INTEGER := 184;
TYPE_ETIMESTAMP_TZ       CONSTANT PLS_INTEGER := 186;
TYPE_ETIMESTAMP          CONSTANT PLS_INTEGER := 187;
TYPE_EINTERVAL_YM        CONSTANT PLS_INTEGER := 189;
TYPE_EINTERVAL_DS        CONSTANT PLS_INTEGER := 190;
TYPE_TIMESTAMP_LTZ       CONSTANT PLS_INTEGER := 231;
TYPE_ETIMESTAMP_LTZ      CONSTANT PLS_INTEGER := 232;

With this information, we can do all sorts of transformations of the query. For example, we can leave out columns of a certain data type, or format columns depending on the data type. This would be useful for formatting all the datetime data types without having to wrap them individually in TO_CHAR functions (or SQL scalar macros).
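
As a sketch of the “leave out columns” idea (my own code, assuming the type codes behave as documented), here is a macro that removes LOB columns. Note that DESCRIPTION.NAME already comes with the double quotes, as the trace output above shows.

create or replace function no_lobs_ptm(p_table in dbms_tf.table_t)
return varchar2 sql_macro is
  l_cols varchar2(32767);
begin
  for i in 1 .. p_table.column.count loop
    -- Keep every column whose type code is not CLOB or BLOB
    if p_table.column(i).description.type
       not in (dbms_tf.type_clob, dbms_tf.type_blob)
    then
      l_cols := l_cols
        || case when l_cols is not null then ', ' end
        || p_table.column(i).description.name;
    end if;
  end loop;
  -- Assumes at least one non-LOB column remains
  return 'select ' || l_cols || ' from p_table';
end no_lobs_ptm;
/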

Another good thing happens before the function even executes: the database checks that the argument is a valid identifier. This is automatic protection against SQL injection.
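
For example, this hypothetical attempt fails at parse time, before the function body ever runs, because the argument is not a valid identifier:

select * from simplest_ptm('dual; drop table t');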

The Bad and the Ugly

There is a lot to say here, sadly. I’m not sure what is “bad” and what is “ugly”: let’s just say there are quite a few things I wish were different.

No invisible columns

If the input table has invisible columns, they are not included in the column list, and the generated SELECT statement cannot access them even when it lists them explicitly. The only way to get access to such columns is through a view or a named subquery that lists them explicitly.

[UPDATE 2020-12-27: the DBMS_TF.TABLE_T record has additional fields table_schema_name and table_name. We can plug those values into the SELECT statement (instead of referring to the parameter itself) and then all columns are accessible.]
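
Here is a rough sketch of that workaround (21c and 19c “Cloud” only, since it relies on the undocumented fields; INVISIBLE_COL is a made-up column name for illustration):

create or replace function show_invisible_ptm(p_table in dbms_tf.table_t)
return varchar2 sql_macro is
begin
  -- Name the real table instead of referring to the parameter, so the
  -- generated statement can list invisible columns explicitly
  return 'select t.*, t.invisible_col from '
      || p_table.table_schema_name || '.' || p_table.table_name || ' t';
end show_invisible_ptm;
/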

Version weirdness

SQL macros were supposed to appear in 20c, and have now arrived in 21c. Table macros were back-ported to version 19, starting with 19.7. Unfortunately, 19.8 (on livesql.oracle.com) – and even 19.10 – have limited functionality compared to 21c, and even compared to the Cloud Autonomous Database which is theoretically at version 19.5!

Basically, 19.8 and 19.10 allow only scalar columns in the input table, whereas 19c “Cloud” and 21c also allow the following data types:

LONG
In-memory DATE (e.g. SYSDATE)
LONG RAW
XMLTYPE
BFILE
JSON
UDT OBJECT
UDT NESTED TABLE
UDT VARRAY
In-memory TIMESTAMP
In-memory TIMESTAMP WITH TIME ZONE
In-memory INTERVAL_YM
In-memory INTERVAL_DS
In-memory TIMESTAMP WITH LOCAL TIME ZONE
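
A quick way to test a given version (T_XML is a made-up table): on 19.8 and 19.10 the call below fails at parse time, whereas on 21c and the 19c “Cloud” database it parses and runs.

create table t_xml (id number, doc xmltype);
select * from simplest_ptm(t_xml);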

I hope the on-premises version of 19c will eventually support all the same data types as the Cloud “version”.

Incoherent type codes

Sometimes the DBMS_TF constants define a value that is different from what the input column data actually shows.

  • CHAR / NCHAR are said to have code 96 but they return 1
  • In-memory DATE (such as SYSDATE in the SELECT list): said to be 184, is actually 13
  • In-memory TIMESTAMP WITH TIME ZONE: said to be 186, is actually 188

This is unfortunate because the constants in the DBMS_TF package are the only detailed “documentation” we have on what types are supported.

Undocumented data types

The DBMS_TF package only documents the scalar data types that work in versions up to 19.10. Even in 21c, the JSON data type is documented, but not the dozen or more other data types that work fine in both 21c and 19c “Cloud”.

Type-code 58 overload

XMLTYPE columns are assigned type-code 58, which would be fine except that 58 is also the code for ANYDATA. There is no way to distinguish these types using only the information in the input TABLE_T record.

NVARACHAR2

I’m nitpicking here, but DBMS_TF.TRACE misspells NVARCHAR2 (hence this section’s title) :-)
