[UPDATE 2020-12-08: 20c preview is dead, long live 21c! This post has been revised after redoing my tests on a Free Tier 21c.]
Testing SQL table macros in the Cloud(19c, 21c) and on LiveSQL (19.8): behaviors are different, and the differences change with time! Here I’ll zoom in on the TABLE input parameter, as of 2020-12-08.
Too long to read?
This is a long post without a direct, practical SQL solution at the end. If you just want the gist:
- SQL table macros let you take a table, view or named subquery as input and transform it into another query at parse time.
- SQL table macros in 19.8 accept only scalar data types in the input, but in the Cloud (19c, 21c) they accept practically any data type.
- The information about the input data types comes from different sources that are incomplete and sometimes in disagreement.
- I hope Oracle will complete the back-port of SQL table macros to 19c and that they will correct and complete the information about the input data types.
A few words for those who are wondering what in the world I am talking about:
- in classical programming languages, a “macro” is a fragment of code with a name. When you write a program, you write the name; when the program is compiled, as a first step the name is replaced with the code fragment.
- A SQL macro is a function that returns either a SQL expression or a SELECT statement. The macro is executed only at hard parse time, and results in a final SQL statement which then goes through the normal parsing process.
- A SQL table macro returns an entire SELECT statement. It can have a table, view or named subquery as an input parameter: this allows it to be a “polymorphic table function”.
- “Polymorphic” means “able to change shapes”; in a SELECT statement, the “shape” comes from the number, names and types of the columns in the output.
- The main contributor to the “shape” is the table input parameter.
- For now, I’m going to call this type of macro a “polymorphic table macro” (PTM) to distinguish it from other types of “polymorphic table functions”.
create or replace function simplest_ptm(p_table in DBMS_TF.TABLE_T) return varchar2 sql_macro is begin return 'select * from p_table'; end simplest_ptm; / select * from simplest_ptm(dual); DUMMY ----- X
There are at least two strange things to notice here:
- When I call the function, the parameter
dualis an identifier, not a literal!
dualis just as much a part of the SELECT statement as when I refer to it directly by
select * from dual.
Since it is part of the statement, changing the parameter changes the statement itself, which means the new statement has to be parsed again. This is what allows different “shapes”: if I change the parameter I get a new statement, which has to be parsed anew and therefore can have a new “shape”.
(As an identifier, this parameter cannot be changed dynamically using a bind variable.)
- The function appears to return the literal ‘select * from p_table’ – but it doesn’t!
Even though the return string is written as a literal, the “p_table” part is replaced by the table that
Analyzing the table input parameter
DBMS_TF.TABLE_T record has a complex structure that is used by all polymorphic table functions. When used in a SQL macro, some items are left null.
TYPE TABLE_T IS RECORD( column TABLE_COLUMNS_T, schema_name DBMS_id, -- always null package_name DBMS_id, -- always null ptf_name DBMS_id); -- always null TYPE TABLE_COLUMNS_T IS TABLE OF COLUMN_T; TYPE column_t IS RECORD ( description COLUMN_METADATA_T, pass_through BOOLEAN, -- always TRUE for_read BOOLEAN); -- always FALSE TYPE COLUMN_METADATA_T IS RECORD -- THE GOOD STUFF ( type PLS_INTEGER, max_len PLS_integer DEFAULT -1, name VARCHAR2(32767), name_len PLS_INTEGER, precision PLS_INTEGER, scale PLS_INTEGER, charsetid PLS_INTEGER, charsetform PLS_INTEGER, collation PLS_INTEGER );
The bottom line is that this structure contains lots of information about the columns,
but not information about the table itself: we don’t even know what the name of the table is! This makes more sense when we remember that the input parameter can be a named subquery from a
[UPDATE 2020-12-27: the definitions above are excerpts from the documentation. Looking at the source code of the
DBMS_TF package, we see that the
TABLE_T record has additional fields
table_name! These fields have appropriate values when the input parameter refers to a table or view; if it refers to a subquery, the values are null.]
A simple way to see what is in this complex record is the
create or replace function simplest_ptm( p_table in dbms_tf.table_t )return varchar2 sql_macro is begin dbms_tf.trace(p_table); return 'select * from p_table'; end simplest_ptm; / select * from simplest_ptm(dual); / ........tab.column = pass through column NOT a for read column __________Type:...............VARCHAR2 __________Max Length:.........1 __________Name:..............."DUMMY" __________Name Length:........7 __________Charset Id:.........AL32UTF8 __________Collation:..........USING_NLS_COMP
Of course, we can also use good old PLS/SQL code to access all the fields directly. Using these methods, here is what I have found so far.
The input table parameter gives us lots of useful information about each column, starting with the data type. The data type is a code, and the
DBMS_TF package specification tells us what each code means.
/* Type Codes for supported types */ TYPE_VARCHAR2 CONSTANT PLS_INTEGER := 1; TYPE_NUMBER CONSTANT PLS_INTEGER := 2; TYPE_DATE CONSTANT PLS_INTEGER := 12; TYPE_RAW CONSTANT PLS_INTEGER := 23; TYPE_ROWID CONSTANT PLS_INTEGER := 69; TYPE_CHAR CONSTANT PLS_INTEGER := 96; TYPE_BINARY_FLOAT CONSTANT PLS_INTEGER := 100; TYPE_BINARY_DOUBLE CONSTANT PLS_INTEGER := 101; TYPE_CLOB CONSTANT PLS_INTEGER := 112; TYPE_BLOB CONSTANT PLS_INTEGER := 113; TYPE_JSON CONSTANT PLS_INTEGER := 119; --21c only TYPE_TIMESTAMP CONSTANT PLS_INTEGER := 180; TYPE_TIMESTAMP_TZ CONSTANT PLS_INTEGER := 181; TYPE_INTERVAL_YM CONSTANT PLS_INTEGER := 182; TYPE_INTERVAL_DS CONSTANT PLS_INTEGER := 183; TYPE_EDATE CONSTANT PLS_INTEGER := 184; TYPE_ETIMESTAMP_TZ CONSTANT PLS_INTEGER := 186; TYPE_ETIMESTAMP CONSTANT PLS_INTEGER := 187; TYPE_EINTERVAL_YM CONSTANT PLS_INTEGER := 189; TYPE_EINTERVAL_DS CONSTANT PLS_INTEGER := 190; TYPE_TIMESTAMP_LTZ CONSTANT PLS_INTEGER := 231; TYPE_ETIMESTAMP_LTZ CONSTANT PLS_INTEGER := 232;
With this information, we can do all sorts of transformations of the query. For example, we can leave out columns of a certain data type, or format columns depending on the data type. This would be useful for formatting all the datetime data types without having to wrap them individually in
TO_CHAR functions (or SQL scalar macros).
Another good thing happens before the function even executes: it checks to make sure the parameter is a valid identifer. This is an automatic protection against SQL injection.
The Bad and the Ugly
There is a lot to say here, sadly. I’m not sure what is “bad” and what is “ugly”: let’s just say there are quite a few things I wish were different.
No invisible columns
If the input table has invisible columns, they are not included in the column list and the final SELECT statement cannot access them, even if the final
SELECT statement lists them explicitly.
The only way to get access to such columns is by using a view or a named subquery that lists them explicitly.
[UPDATE 2020-12-27: the DBMS_TF.TABLE_T record has additional fields table_schema_name and table_name. We can plug those values into the SELECT statement (instead of referring to the parameter itself) and then all columns are accessible.]
SQL macros were supposed to appear in 20c, and have now arrived in 21c. Table macros were back-ported to version 19, starting with 19.7. Unfortunately, 19.8 (on livesql.oracle.com) has limited functionality compared to 21c, and even compared to the Cloud Autonomous Database which is theoretically at version 19.5!
Basically, 19.8 allows only scalar columns in the input table, whereas 19c “Cloud” and 21c allow many more data types. Here is the list:
LONG In-memory DATE (e.g. SYSDATE) LONG RAW XMLTYPE BFILE JSON UDT OBJECT UDT NESTED TABLE UDT VARRAY In-memory TIMESTAMP In-memory TIMESTAMP WITH TIME ZONE In-memory INTERVAL_YM In-memory INTERVAL_DS In-memory TIMESTAMP WITH LOCAL TIME ZONE
I hope the on-premises version of 19c will eventually support all the same data types as the Cloud “version”.
Incoherent type codes
DBMS_TF constants define a value that is different from what the input column data actually shows.
- CHAR / NCHAR are said to have code 96 but they return 1
- In-memory DATE (such as SYSDATE in the SELECT list): said to be 184, is actually 13
- In-memory TIMESTAMP WITH TIME ZONE: said to be 186, is actually 188
This is unfortunate because the constants in the
DBMS_TF package are the only detailed “documentation” we have on what types are supported.
Undocumented data types
DBMS_TF package only documents the scalar data types that work in version 19.8. Even in 21c, the JSON data type is documented but not the dozen or more other data types that work fine in both 21c and 19c “Cloud”.
Type-code 58 overload
XMLTYPE columns are assigned type-code 58, which would be fine except that 58 is also the code for ANYDATA. There is no way to distinguish these types using only the information in the input TABLE_T record.
I’m nitpicking here, but
DBMS_TF.TRACE misspells NVARCHAR2 :-)