My goal is to show how ORA_ROWSCN can help deal with the problem of “lost updates”. Before that, I need to explain the use case I have in mind: an OLTP application that provides a transactional API in the database, using JSON for data interchange, across a stateless connection.
OLTP versus Batch
Technically, OLTP (On Line Transaction Processing) implies lots of small transactions running concurrently, whereas “batch” implies changing lots of data in one process, using bulk or set-based techniques. There is another important difference: with OLTP, someone is there!
To avoid a “lost update”, someone has to be there to “find” it. Suppose you and I query a row at the same time, then we both try to update it. If your update happens first and commits, I need to know it so I can decide whether my update should overwrite yours. In other words, I need to “find” your update so I can think about how to deal with the conflict.
When a batch program runs, there is no one there to “find” an update made by something else while the batch is running.
A prerequisite of avoiding “lost updates” is the presence of a human being who can be notified in case of a conflict and decide what to do next.
Stateless: Request and Response
When you and I write SQL or PL/SQL, we generally use a tool like SQL Developer, Toad or SQL*Plus. These tools use the same client / server architecture that applications used in the 1980’s: each person has a conversation with the database using a session that keeps track of what is going on. If I query 1000 rows in SQL Developer, the tool will execute the query and fetch 100 rows. If I scroll down, the tool will fetch 100 more rows from the same query – because the cursor was left open in my session.
The fact that the session remembers what is going on during these exchanges is called “state”.
In modern OLTP applications, people access servers using HTTP, then the servers access the database using connections from a pool. Each connection has a session that can by used by you, then me, then you again. When you use the session a second time, the session remembers nothing from your previous use – if it does, there is a huge bug in the application. This absence of “state” is what we call “stateless”.
Being “stateless” makes a big difference. A “stateful” interface can do a SELECT FOR UPDATE, then UPDATE the rows that I change. A “stateless” interface cannot do that, since the session I used for the SELECT and the session I use for the UPDATE are not necessarily the same.
A stateless application must use a pure “Request and Response” protocol with the database:
- I ask the application to query some data for me. The application gets a connection from the pool, requests the data and shows me the response from the database. The database session forgets about me because the next user will likely be somebody else.
- I ask the application to update some data. The application gets whatever connection and session are free. If the database needs some information from my first query, the application has to pass it back.
The whole idea of “optimistic locking” is based on this scenario: a stateless OLTP application with a real life user who has to be informed of update conflicts.
JSON and the Object-relational impedance mismatch
As stated in the JSON standard, JSON is a data interchange format. Did I emphasize that enough for you? JSON was not meant for storing data, but for exchanging it between programming languages. Starting with Oracle Database 12c, SQL can query JSON (meaning it can map JSON to SQL rows and columns); starting with version 12.2, SQL can also generate JSON from “normal” database data. In other words, SQL all by itself can now use JSON as a data interchange format.
Ever since Java hype got huge mind share in the minds of enterprise IT architects, developers are supposed to do object-oriented programming. Thinking of data as rows and columns in a SQL database is old hat. Instead of data, there are objects with attributes and methods; several EMP objects can point to the same DEPT object, whereas in a JOIN the same DEPT data is repeated across several rows. The conceptual differences between “object” and “relational” are important enough to have a fancy name: the “object-relational impedance mismatch”. People write object-relational mapping frameworks so that application code can ignore SQL as much as possible, leading to all sorts of wasted resources, time and money.
By using JSON intelligently as a data interchange format, we can map relational data to an “object-friendly” form right in our SQL statements. As long as the application developers and the SQL developers understand the mapping and what underlies it, each can work in their own environment with data everyone can access with relative ease.
Note I am doing the JSON-relational mapping in the database, in SQL. The database client (application or generic tool) accesses the database via SQL*Net and receives data in JSON format already. That data may or may not be a payload for a RESTful web service. JSON in the database can be used in browser-based, #ThinJava applications but also in enterprise applications with thick application layers.
Transactional API: read-consistent, write-coherent
I wish I had a better term than “transactional API”, because I don’t mean the same thing by it as others – sorry! Let me explain.
Getting back to the “lost update” problem, the user needs one API to query some data and another API to update it. Each API has to include one “request” from the application and one “response” from the database (this is all stateless, remember). In my view, every “query” API should return read-consistent data, and every “update” API should be transactional – at least potentially.
As far as querying is concerned, I am not saying that two different responses from the database should be consistent – only that all the data contained within one response should be internally consistent. If data from two queries are combined in one response, then the queries should be “as of” the same SCN.
With respect to “update” APIs (or any API that changes data), the API should provide everything the database needs to make a coherent set of changes that can reasonably be committed together. This is what we used to call a “logical unit of work”. If a change doesn’t make sense by itself, but only as part of a whole, then there should be no API to make just that change.
Unlike the #SmartDB paradigm, I don’t insist that the API commit before returning. It may be convenient for the application to do a complex transaction that involves two APIs which must succeed together or fail together. It may be that the application accesses multiple data sources and does two-phase commits. I do insist that the API is “atomic”: the changes either entirely succeed or entirely fail, and if they fail the database returns to precisely the state it was in before the API was called. For want of a better phrase, I’ll call this API a “write-coherent” API.
When is there a lost update?
Let me add one final touch to my use case. Optimistic locking schemes are often row-based: if you change any column in a row, I get notified of a “lost update” even if I am changing another column. My approach will start with that idea, but then I will make it more granular so that only updates to the same column will cause a conflict.
Next up: an overview of the ORA_ROWSCN solution.