Reliable Messaging with REST

Marc de Graauw’s article Nobody Needs Reliable Messaging remains as relevant today as it was in 2010, when it was first published. It echoes the principles outlined in Scalable, Reliable, and Secure RESTful services from 2007.

Its argument is that REST does not need to support the delivery guarantees of WS-ReliableMessaging, because the business logic can achieve reliable delivery through retries, so long as the methods in the REST layer are idempotent (repeating the same request has the same effect as making it once). Let’s examine the implications in more detail.

First, we must design the REST methods to be idempotent. This is no small feat, and it is a huge topic that deserves its own separate examination. But let’s put it aside for now and assume that we have designed our REST web services to support idempotence.

If we are developing components that call REST web services for process automation, the above principle says that the caller is responsible for retrying on failure.

The caller must be able to distinguish a failure to deliver the request from a failure by the server to perform the requested method. The former should be retried, in the expectation that the failure is temporary. The latter should be treated as permanent, because sending the same request again will not change the outcome.
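To make the distinction concrete, here is a minimal sketch of how a caller might classify the outcome of one attempt. The names `Outcome`, `TRANSIENT_STATUS`, and `classify` are hypothetical, and the set of status codes treated as temporary is my own assumption, not something the HTTP specification or de Graauw's article prescribes.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    RETRY = "retry"          # request may not have been delivered, or server is temporarily unavailable
    PERMANENT = "permanent"  # server received the request and failed or refused to perform it


# Assumption: these HTTP status codes indicate a temporary condition worth retrying.
TRANSIENT_STATUS = {408, 429, 502, 503, 504}


def classify(status=None, network_error=False):
    """Map the result of one attempt to an Outcome.

    `status` is the HTTP status code, or None if no response arrived.
    `network_error` is True for connection failures and timeouts.
    """
    if network_error or status is None:
        return Outcome.RETRY
    if 200 <= status < 300:
        return Outcome.SUCCESS
    if status in TRANSIENT_STATUS:
        return Outcome.RETRY
    return Outcome.PERMANENT
```

In this sketch a plain 500 is treated as permanent, in keeping with the rule that a failure by the server to perform the method should not be retried. A different application might reasonably draw the line elsewhere.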

The caller must be able to implement retry in an efficient manner. If the request is retried immediately in a tight loop, it is likely to continue to fail for the same reason. Network connectivity issues sometimes take a few minutes to be resolved. Worse, if the failure occurs because the server is overloaded, having all clients retry in a tight loop will exacerbate the problem by slamming the server with a flood of requests when it is least able to process them. Clients should instead back off for some time and retry after a delay. Relying on clients to behave nicely on their honor is sure to fail if their retry logic is coded ad hoc without following a standard convention.
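One widely used convention is exponential backoff with random jitter. The following is a minimal sketch, assuming a base delay of one second and a cap of five minutes. The function name `backoff_delay` and its parameters are hypothetical choices for illustration.

```python
import random


def backoff_delay(attempt, base=1.0, cap=300.0, rng=random.random):
    """Return the number of seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`. The actual delay is a
    random fraction of that ceiling ("full jitter"), so that many clients
    failing at the same moment do not all retry at the same moment.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

A caller would sleep for `backoff_delay(n)` seconds before attempt n+1. The `rng` parameter exists only so that the behavior can be made deterministic in a test.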

The caller must be able to survive crashes and restarts, so that an automated task can be relied upon to reach a terminal state (success or failure) after starting. Therefore, message delivery must be backed by a persistent store. Delivery must be handled asynchronously so that it can be retried across restarts (including service migration to replacement hardware after a hardware failure), and so that the caller is not blocked waiting.
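As a rough illustration of what "backed by a persistent store" can mean, here is a sketch of an outbox table using SQLite from the Python standard library. The table layout and the function names (`open_outbox`, `enqueue`, `pending`, `mark`) are assumptions made for this example. A production system would more likely use the application's own transactional database or a message queue.

```python
import json
import sqlite3


def open_outbox(path):
    """Open (or create) the outbox database at `path`."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS outbox (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               url TEXT NOT NULL,
               body TEXT NOT NULL,
               state TEXT NOT NULL DEFAULT 'pending',
               attempts INTEGER NOT NULL DEFAULT 0)"""
    )
    db.commit()
    return db


def enqueue(db, url, payload):
    """Record a request durably before any attempt is made to send it."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "INSERT INTO outbox (url, body) VALUES (?, ?)",
            (url, json.dumps(payload)),
        )
    return cur.lastrowid


def pending(db):
    """Return the requests that still need delivery, oldest first."""
    return db.execute(
        "SELECT id, url, body, attempts FROM outbox "
        "WHERE state = 'pending' ORDER BY id"
    ).fetchall()


def mark(db, msg_id, state):
    """Record the result of a delivery attempt ('pending', 'delivered', or 'failed')."""
    with db:
        db.execute(
            "UPDATE outbox SET state = ?, attempts = attempts + 1 WHERE id = ?",
            (state, msg_id),
        )
```

A background worker would read `pending()` on startup and after each delay, which is what lets delivery resume after a crash or a move to replacement hardware, provided the database file moves with it.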

The caller must be able to detect when too many retry attempts have failed, so that it does not get stuck waiting forever for the request to be delivered. Temporary problems that take too long to be resolved need to be escalated for intervention. These requests should be diverted for special handling, and the caller should continue with other work, until someone can troubleshoot the problem. Poison message handling is essential so that retrying does not result in an infinite loop that would gum up the works.
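The retry limit and the diversion of poison messages can be sketched as follows. Here `send` is a hypothetical callable that returns True on success and False on a temporary failure, and `dead_letters` stands in for whatever queue or table an operator would inspect. Neither comes from a particular library.

```python
def deliver_with_limit(send, message, dead_letters, max_attempts=5):
    """Try `send(message)` up to `max_attempts` times.

    Returns ("delivered", n) if attempt n succeeded. Otherwise the message
    is appended to `dead_letters` for manual handling and the function
    returns ("dead-lettered", max_attempts), so the caller can move on.
    """
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return "delivered", attempt
        # A real sender would wait here, with backoff, before the next attempt.
    dead_letters.append(message)
    return "dead-lettered", max_attempts
```

The point of the design is that the loop always terminates: every message ends up either delivered or set aside where someone can troubleshoot it.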

POST is not defined to be idempotent, so retry must be handled very carefully to account for side effects. Even if the request is guaranteed to be delivered, and the server processes it properly (exactly once), the caller must still be able to reliably determine whether the method succeeded, because the reply can be lost. One approach is to deliver the reply reliably from the server back to the caller. Again, all of the above reliable delivery qualities apply. The interactions needed for this round-trip message exchange look very foreign next to the simple synchronous HTTP interaction. Either the caller would poll for the reply, or a callback mechanism would be needed. Another approach is to enable the caller to confirm that the original request was processed. With either approach, the reliable execution requirement forces changes to the methods of the REST web services. To achieve better quality of service in the transport, the definition of the methods needs to be radically redesigned. (If you are having a John McEnroe “you cannot be serious” moment right about now, it is perfectly understandable.)
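To illustrate the second approach, here is a sketch in which the caller attaches an identifier of its own making to each POST, and the server remembers the result under that identifier. `OrderService` is an in-memory stand-in for a server, and `new_request_id`, `post_order`, and `get_result` are hypothetical names. A real service would have to keep the recorded results in durable storage.

```python
import uuid


def new_request_id():
    """Generate a caller-side identifier that stays the same across retries."""
    return str(uuid.uuid4())


class OrderService:
    """In-memory stand-in for a server that deduplicates POSTs by request id."""

    def __init__(self):
        self._results = {}  # request id -> result of the first execution
        self.orders = []

    def post_order(self, request_id, order):
        # A retry of a request that was already processed returns the
        # recorded result instead of creating a second order.
        if request_id in self._results:
            return self._results[request_id]
        self.orders.append(order)
        result = {"order_number": len(self.orders), "status": "created"}
        self._results[request_id] = result
        return result

    def get_result(self, request_id):
        """Let the caller confirm whether the original request was processed."""
        return self._results.get(request_id)
```

If the reply to the first `post_order` is lost, the caller can either call `get_result` with the same identifier or simply repeat the POST. Either way only one order is created, but notice that the method signature had to change to make this possible.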

Taking these requirements into consideration, it is clear that it is not true that “nobody needs reliable messaging”. Enterprise applications with automated processes that perform mission-critical tasks need the ability to perform those tasks reliably. If reliable message delivery is not handled at the REST layer, the responsibility for retry falls to the message sender. We still need reliable messaging; we must implement the requirement ourselves above REST, and this becomes troublesome without a standard framework that behaves nicely. If we accept that REST can provide only idempotence toward this goal, we must implement a standard framework to handle delivery failures, retry with exponential backoff, and divert poison messages for escalation. That is to say, we need a reliable messaging framework on top of REST.

[Note that when we speak of a “client” above, we are not talking about a user sitting in front of a Web browser. We are talking about one mission-critical enterprise application communicating with another in a choreography to accomplish some business transaction. An example of a choreography is the interplay between a buyer and a seller through the systems for commerce, quote, procurement, and order fulfillment.]

OLTP database requirements

Here is what I want from a database in support of enterprise applications for online transaction processing (OLTP).

  1. ACID transactions – Enterprise CRM, ERP, and HCM applications manage data that is mission critical. People’s jobs, livelihoods, and businesses rely on this data to be correct. Real money is on the line.
  2. Document oriented – A JSON or XML representation should be the canonical way that we think of objects stored in the database.
  3. Schema aware – A document should conform to a schema (JSON Schema or XML Schema). Information has a structure and meaning, and it should have a formal definition.
  4. Schema versioned – A document schema may evolve in a controlled manner. Software is life cycle managed, and its data needs to evolve with it for compatibility, upgrades, and migration.
  5. Relational – A subset of a document schema may be modeled as relational tables with foreign keys and indexes to support SQL queries, which can be optimized for high performance.
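As a small illustration of the schema versioning requirement, here is a sketch in which each stored document carries a version number and is upgraded step by step when read. The field names and the v1-to-v2 change (splitting `name` into two fields) are invented for the example.

```python
def upgrade_v1_to_v2(doc):
    # Hypothetical change: version 2 splits "name" into given and family names.
    given, _, family = doc["name"].partition(" ")
    out = {k: v for k, v in doc.items() if k != "name"}
    out.update({"given_name": given, "family_name": family, "schema_version": 2})
    return out


# One upgrade function per version step; LATEST is the current schema version.
UPGRADES = {1: upgrade_v1_to_v2}
LATEST = 2


def upgrade(doc):
    """Return a copy of `doc` upgraded to the latest schema version."""
    doc = dict(doc)
    while doc.get("schema_version", 1) < LATEST:
        doc = UPGRADES[doc.get("schema_version", 1)](doc)
    return doc
```

Documents written under an older schema remain readable, and each later schema change adds one more entry to `UPGRADES`.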

The fundamental shift is from a relational to a document paradigm as the primary abstraction. Relational structures continue to play an adjunct role to improve query performance for those parts of the document schema that are heavily involved in query criteria (WHERE clauses). The document paradigm enables the vast majority of data to be stored and retrieved without having to rigidly conform to a relational schema, which cannot evolve as fluidly. That is not to say that data stored outside of relational tables is less important or less meaningful. To the contrary, some of the non-relational data may be the most critical to the business. This approach simply recognizes that information not directly involved in query criteria can be treated differently, to take advantage of greater flexibility in schema evolution and life cycle management.
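Here is a minimal sketch of that division of labor, using SQLite only because it ships with Python. The whole document is stored as JSON text, and a single field that appears in query criteria is promoted to an indexed column. The `customer` table, the choice of `country` as the promoted field, and the function names are all assumptions made for illustration.

```python
import json
import sqlite3


def open_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE customer (
               id INTEGER PRIMARY KEY,
               schema_version INTEGER NOT NULL,
               country TEXT NOT NULL,  -- promoted from the document for WHERE clauses
               doc TEXT NOT NULL       -- canonical JSON document
           )"""
    )
    db.execute("CREATE INDEX customer_country ON customer (country)")
    return db


def save(db, doc, schema_version=1):
    """Store the document and its promoted column in one transaction."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "INSERT INTO customer (schema_version, country, doc) VALUES (?, ?, ?)",
            (schema_version, doc["address"]["country"], json.dumps(doc)),
        )
    return cur.lastrowid


def find_by_country(db, country):
    """Query through the relational column, but return whole documents."""
    rows = db.execute(
        "SELECT doc FROM customer WHERE country = ? ORDER BY id", (country,)
    )
    return [json.loads(row[0]) for row in rows]
```

Fields that never appear in a WHERE clause, such as free-form notes, live only in the document, so adding or changing them requires no change to the table definition.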

Ideally, the adjunct relational tables and SQL queries would be confined by the database to its internal implementation. When exposing a document abstraction to applications, the database should also present a document-oriented query language, such as XQuery or its equivalent for JSON, which it would translate to SQL where appropriate as an optimization technique.

NoSQL database technology is often cited as supporting a document paradigm. NoSQL technologies as they exist today do not meet the need, because they do not support ACID transactions and they do not support adjunct structures (i.e., relational tables and indexes) to improve query performance in the manner described above.

Perhaps the next best thing would be to provide a Java persistent entity abstraction, much like EJB3/JPA, which would encapsulate the underlying representation in a document part (e.g., as an XMLType or a JSON CLOB column) and a relational part, all stored in a SQL database. This would also provide JAXB-like serialization and deserialization to and from JSON and XML representations. This is not far from what EclipseLink does today.