Tag Archives: platform

going meta

Anatomy of an n-tier application

A fully functioning web app involves several layers of software, each with its own technology, patterns, and techniques.

At the bottom of the stack is the database. A schema defines the data structures for storage. A query language is used to operate on the data. Regardless whether the database is relational, object-relational, NoSQL, or some other type, the programming paradigm at the database tier is distinctly different than and quite foreign from the layers above.

Above the database is the middle tier or application server. This is where the server-side business logic, APIs, and Web components reside.

There is usually a set of persistent entities, which provide an object abstraction of the database schema. The database query language (e.g., SQL) may be abstracted into an object query language (e.g., JPQL) for convenience. The majority of CRUD (create, read, update, delete) operations can be done naturally in the programming language without needing to formulate statements in the database query language. This provides a persistent representation of the model of the application.

Above the persistent entities is a layer of domain services. The transactional behavior of the business logic resides in this layer. This provides the API (local) that encapsulates the essence of the application functions.

The domain services are usually exposed as SOAP or RESTful services to remote clients for access from Web browsers and for machine-to-machine integration. This would necessitate that JSON and/or XML representations be derived from the persistent entities (i.e., using JAXB). This provides a serialized representation of the model of the application.

We finally come to the presentation layer, which is divided into server-side components residing in the application server and client-side components that execute in the Web browser. Usually there is a presentation-oriented representation called a view-model, which matches the information rendered on views or input on forms. The view and controls are constructed from HTML, CSS, and JavaScript. The programming paradigm in these technologies is distinctly different than the layers below.

Extending the application

Let’s examine what it would take to extend an application with a simple type (e.g., string) property on an entity. The database schema would need to be altered. A persistent entity would need a field, getter and setter methods, and a binding between the field and a column in the database schema. The property may be involved in the logic of the domain services. Next, the JSON and XML binding objects would need to be augmented with the property, and logic would be added to transform between these objects and the persistent entities used by the domain services. At the presentation layer, the view-model would be augmented with the property to expose it to the views. Various views to show an entity’s details and search results would likewise be enhanced to render the property. For editing and searching, a field would need to be added on forms with corresponding validation of any constraints associated with that property and on-submit transaction handling.

That is an awful lot of repetitive work at every layer. There are many technologies and skill sets involved. Much of the work is trivial and tedious. The entire process is far from efficient. It is worse if there is division of labor among multiple developers who require coordination.

A better platform

When confronted with coordinating many concomitant coding activities to accomplish a single well-defined goal, it is natural for an engineer to solve the more general problem rather than doing tedious work repeatedly. The solution is to “go meta”; instead of programming inefficiently, develop a better language to program in. Programming has evolved from machine language to assembly language for humans to express instructions more intuitively. Assembly evolved to structured languages with a long history of advances in control and data flow. Programming languages have evolved in conjunction with virtualization of the machine (i.e., bytecode) to provide better abstractions of software and hardware capabilities. In the spirit of Guy L. Steele’s Growing a Language talk from OOPSLA ’98, components, libraries, and frameworks have been developed using a programming language that itself supports extending the language itself within limits. All of these innovations continually raise the level of abstraction to increase human productivity.

We are hitting the limits of what can be expressed efficiently in today’s languages. We have a database storage abstraction that is separate from server-side application logic, which is itself separate from client-side (Web browser) presentation. There is growing support for database and server-side abstractions to scale beyond the confines of individual machines. Clustering enables a software to take advantage of multiple machines to distribute load and provide redundancy in case of failure. However, our abstractions seem to stop at the boundaries between database storage, server-side application logic, and client-side presentation. Hence, we have awkward impedance mismatches when integrating top-to-bottom. We also have impedance mismatches when integrating together heterogeneous application components or services, as RESTful and SOAP Web Services technologies cross the boundaries between distributed software components, but this style of control and data flow (remote procedure calls) is entirely foreign to the programming language. That is why we must perform inconvenient translations between persistent entities and their bindings to various serialized representations (JSON, XML).

It seems natural that these pain points will be relieved by again raising the level of abstraction so that these inefficiencies will be eliminated. Ease of human expression will better enable programming for non-programmers. We are trying to shape the world so that humans and machines can work together harmoniously. Having languages that facilitate effective communication is a big part of that. To get this right, we need to go meta.

Reliable Messaging with REST

Marc de Graauw’s article Nobody Needs Reliable Messaging remains as relevant today as it did in 2010, when it was first published. It echoes the principles outlined in Scalable, Reliable, and Secure RESTful services from 2007.

It basically says that you don’t need for REST to support WS-ReliableMessaging delivery requirements, because reliable delivery can be accomplished by the business logic through retries, so long as in the REST layer its methods are idempotent (the same request will produce the same result). Let’s examine the implications in more detail.

First, we must design the REST methods to be idempotent. This is no small feat. This is a huge topic that deserves its own separate examination. But let’s put this topic aside for now, and assume that we have designed our REST web services to support idempotence.

If we are developing components that call REST web services for process automation, the above principle says that the caller is responsible for retrying on failure.

The caller must be able to distinguish a failure to deliver the request from a failure by the server to perform the requested method. The former should be retried, expecting that the failure is temporary. The latter is permanent.

The caller must be able to implement retry in an efficient manner. If the request is retried immediately in a tight loop, it is likely to continue to fail for the same reason. Network connectivity issues sometimes take a few minutes to be resolved. However, if the reason for failure is because the server is overloaded, having all clients retry in a tight loop will exacerbate the problem by slamming the server with a flood of requests, when it is least able to process them. It would be helpful if clients would behave better by backing off for some time and retrying after a delay. Relying on clients to behave nicely on their honor is sure to fail, if their retry logic is coded ad hoc without following a standard convention.

The caller must be able to survive crashes and restarts, so that an automated task can be relied upon to reach a terminal state (success or failure) after starting. Therefore, message delivery must be backed by a persistent store. Delivery must be handled asynchronously so that it can be retried across restarts (including service migration to replacement hardware after a hardware failure), and so that the caller is not blocked waiting.

The caller must be able to detect when too many retry attempts have failed, so that it does not get stuck waiting forever for the request to be delivered. Temporary problems that take too long to be resolved need to be escalated for intervention. These requests should be diverted for special handling, and the caller should continue with other work, until someone can troubleshoot the problem. Poison message handling is essential so that retrying does not result in an infinite loop that would gum up the works.

POST methods are not idempotent, so retry must be handled very carefully to account for side-effects. Even if the request is guaranteed to be delivered, and it is processed properly (exactly once) by the server, the caller must be able to determine if the method succeeded reliably, because the reply can be lost. One approach is to deliver the reply reliably from the server back to the caller. Again, all of the above reliable delivery qualities apply. The interactions to enable this round trip message exchange certainly look very foreign to the simple HTTP synchronous interaction. Either the caller would poll for the reply, or a callback mechanism would be needed. Another approach is to enable the caller to confirm that the original request was processed. With either approach, the reliable execution requirement needs to alter the methods of the REST web services. To achieve better quality of service in the transport, the definition of the methods need to be radically redesigned. (If you are having a John McEnroe “you cannot be serious” moment right about now, it is perfectly understandable.)

Taking these requirements into consideration, it is clear that it is not true that “nobody needs reliable messaging”. Enterprise applications with automated processes that perform mission-critical tasks need the ability to perform those tasks reliably. If reliable message delivery is not handled at the REST layer, the responsibility for retry falls to the message sender. We still need reliable messaging; we must implement the requirement ourselves above REST, and this becomes troublesome without a standard framework that behaves nicely. If we accept that REST can provide only idempotence toward this goal, we must implement a standard framework to handle delivery failures, retry with exponential back off, and divert poison messages for escalation. That is to say, we need a reliable messaging framework on top of REST.

[Note that when we speak of a “client” above, we are not talking about a user sitting in front of a Web browser. We are talking about one mission-critical enterprise application communicating with another in a choreography to accomplish some business transaction. An example of a choreography is the interplay between a buyer and a seller through the systems for commerce, quote, procurement, and order fulfillment.]

OLTP database requirements

Here is what I want from a database in support of enterprise applications for online transaction processing.

  1. ACID transactions – Enterprise CRM, ERP, and HCM applications manage data that is mission critical. People’s jobs, livelihoods, and businesses rely on this data to be correct. Real money is on the line.
  2. Document oriented – A JSON or XML representation should be the canonical way that we should think of objects stored in the database.
  3. Schema aware – A document should conform to a schema (JSON Schema or XML Schema). Information has a structure and meaning, and it should have a formal definition.
  4. Schema versioned – A document schema may evolve in a controlled manner. Software is life cycle managed, and its data needs to evolve with it for compatibility, upgrades, and migration.
  5. Relational – A subset of a document schema may be modeled as relational tables with foreign keys and indexes to support SQL queries, which can be optimized for high performance.

The fundamental shift is from a relational to a document paradigm as the primary abstraction. Relational structures continue to play an adjunct role to improve query performance for those parts of the document schema that are heavily involved in query criteria (WHERE clauses). The document paradigm enables the vast majority of data to be stored and retrieved without having to rigidly conform to relational schema, which cannot evolve as fluidly. That is not to say that data stored outside of relational tables is less important or less meaningful. To the contrary, some of the non-relational data may be the most critical to the business. This approach is simply recognizing information that is not directly involved in query criteria can be treated differently to take advantage of greater flexibility in schema evolution and life cycle management.

Ideally, the adjunct relational tables and SQL queries would be confined by the database to its internal implementation. When exposing a document abstraction to applications, the database should also present a document-oriented query language, such as XQuery or its equivalent for JSON, which would be implemented as SQL, where appropriate as an optimization technique.

NoSQL database technology is often cited as supporting a document paradigm. NoSQL technologies as they exist today do not meet the need, because they do not support ACID transactions and they do not support adjunct structures (i.e., relational tables and indexes) to improve query performance in the manner described above.

Perhaps the next best thing would be to provide a Java persistent entity abstraction, much like EJB3/JPA, which would encapsulate the underlying representation in a document part (e.g., as a XMLType or a JSON CLOB column) and a relational part, all stored in a SQL database. This would also provide JAXB-like serialization and deserialization to and from JSON and XML representations. This is not far from what EclipseLink does today.

transparent persistence

Advantages

Transparent persistence has emerged into the mainstream over the past few years with the popularity of JDO and JPA for enterprise application development. This approach offers the following advantages.

  1. Domain modeling is expressed naturally as plain old Java objects (POJOs) without having to program any of the SQL or JDBC calls that are traditionally coded by hand.
  2. Navigation through relationships – objects are naturally related through references, and navigating a relationship will automatically load the related object on demand.
  3. Modified objects are stored automatically when the transaction is committed.
  4. Persistence by reachability – related objects are automatically stored, if they are reachable from another persistent object.

The programming model is improved by eliminating the tedium that is traditionally associated with object persistence. Loading, storing, and querying are all expressed in terms of the Java class and field names, as opposed to the physical schema names. The programmer is largely insulated from the impedance mismatch between Java objects and the relational database. The software can be expressed purely in terms of the domain model, as represented by Java objects.

Challenges

When developing domain objects, persistence is only one aspect. The business logic that applies to the graph of related objects is the most important concern. Transparent persistence introduces challenges to executing business logic to enforce constraints and complex business rules when creating, updating, and deleting persistent objects through reachability.

For example, an equipment rental application may need to enforce the following constraints:

  • When creating equipment, it must be related to a location.
  • When creating a rental, it must be related to a customer, and ensure that the equipment is available for the duration of the rental.
  • When updating a rental, it must ensure that equipment is available for the duration of the rental.

JPA 2.0 does not provide sufficient mechanisms for enforcing these constraints, when creating or updating these entities through reachability. The responsibility is placed on a service object to manage these graphs of entities. The constraint checking must be enforced by the service object per transaction. Java EE 5 does not provide any assistance to ensure that the constraint checks (implemented in Java) are deferred until commit, so that they are not repeated, when performing a sequence of operations in the same transaction.

Adding a preCommit event to a persistent object would provide a good place for expressing constraints. Allowing this event to be deferred until transaction commit would provide the proper optimization for good performance. Of course, preCommit would need to prevent any further modifications to the persistent objects enlisted in the transaction. This would factor out many of the invariants so that they are expressed per entity, removing the responsibility from every operation on service objects, which is prone to programmer error. The domain model would be greatly improved.

java christmas wish list 2008

JPA preCommit

Similar to the other events (PrePersist, PreRemove, PostPersist, PostRemove, PreUpdate, PostUpdate, and PostLoad), JPA needs to add a preCommit event. This would be useful for enforcing constraints (invariants) using Java logic, similar to how less expressive deferred constraints can be enforced in SQL.

read-only transaction

javax.transaction.UserTransaction needs the ability to begin a transaction with an awareness of whether the transaction will be read-only or read-write. A read-only transaction would prevent writes (inserts, updates, and deletes) from being done.

dynamic immutability

It would be helpful if an instance of an object can be mutable, when used by some classes (e.g., builder, factory, repository, deserializer), and immutable, when used by others. This would facilitate the ability to load persistent objects from a data store, derive transient fields from persistent fields, and marking the instance as immutable if the transaction is read-only. I do not want to develop entities that have both a mutable class and an immutable class; and access control (private, protected) is not sufficient, if the mutability is dependent on context (e.g., read-only transaction).

OOPs, here comes SOA (again)

Last week, I posted the following facetious comment about Service Oriented Architecture.

Posted By: Ben Eng on May 24, 2004 @ 09:27 PM in response to Message #123260

Strip all the behavior off of our domain objects. Define data structures to pass around. Provide sets of operations as stateless services, which perform functions on data, and return data to its callers. That sounds revolutionary.

All we have to do now is eliminate the object-oriented programmers, and the revolution will be complete.

There is a serious side to the issue. The recent popularity of SOA should give object oriented programmers much pause. Where do objects fit in the world of services? Are objects passé?

In the book Bitter EJB (p.57), the distinction is made between service-driven and domain-driven approaches. I am strong proponent of domain-driven approaches. However, maybe three years ago, I began feeling that a purely domain-driven approach was awkward for modeling object graphs with very sophisticated global constraints. This is a natural topic of interest, considering that I model telecommunications networks for a living.

Factoring out the global constraints, instead of embedding them within the domain objects, seemed natural. I started to develop stateless session local interfaces (using plain old Java objects, not EJB), which use domain objects as parameters and return values. Then I found myself adding conversational state to the session objects. I’m not sure, where I’m going with this yet. We’ll see after a few more weeks or months of experimentation.

After having developers fumble around with EJBs for the past four years, I am becoming disillusioned with the technology. XML and Web Services is going even further down the road of coarse-grained distributed components, which are a consequence of high latency due to expensive data marshaling and remote communications. Whenever EJB or Web Services is involved, the client code looks like garbage – calling functions on data, rather than methods on objects. Optimizing the interface to reduce client-server round trips pollutes the interface, until it also looks like garbage; objects become so polluted with denormalized data that the concepts are no longer recognizable. When faced with sophisticated object graphs, rather than simple services, the approach leads ultimately to garbage and more garbage. I’m tempted to call the approach Garbage Oriented Architecture, rather than SOA.

proprietary J2EE

I am less than impressed with this article.

The author is arguing that proprietary extensions threaten the promise of standards based application portability—that Java’s promise of “write once, run anywhere” is dead, and was never a reality anyway. This demonstrates to me a tremendous lack of vision.

The J2EE platform is growing as a language. This means starting small and collecting the best contributions over time into a larger, more capable language. The language evolves with the community that uses it. A standards based approach to application development suggests that if you constrain yourself to using only the standard features and interfaces, then you should be portable to any implementation. Generally, that is true of J2EE; standards based portability is certainly more a reality with J2EE than on any other equivalent platform (e.g., Microsoft .NET).

Proprietary extensions by application server vendors are expected. Applications written to the standard will still be portable, because they do not depend on proprietary extensions. In time, the scope of standards will expand, as technologies mature and become commonplace. The industry’s best practices become encoded into standards, because the market demands it for portability (reducing costs). Innovation (proprietary extensions) will happen further out into the leading edge, where technologies are unproven. Vendors can differentiate their product based on quality of implemention and value added features. The latter requires extending beyond the standard features.

The article paints a picture of a static standard, which is being eroded. In reality the J2EE standard is constantly evolving to expand its capabilities. Application server vendors are moving in lock step to embrace the newest standard; often, their products have already implemented the standard by the time it is officially published. In reality, J2EE application portability is continually improving rather than being threatened due to this natural evolution.