black hole and big bang singularities

Here is Ben’s theory about the impossibility of black hole and big bang singularities.

I think once we understand the Higgs mechanism better, we will discover that above a certain temperature the opposite of “condensation” happens. (Temperature rises as we work backward in time and the universe becomes denser.) The bosons no longer have mass. No mass, no gravity; once matter no longer contributes gravity, the very force that is squeezing things together stops squeezing at the core. I believe this should put a limit on how dense things can be: if this density limit cannot be passed at the extreme interior, a singularity cannot form.

I wish I knew a lick of math, so I could even comprehend what SU(2) × U(1) means. Sadly, I’ll have no chance of writing a paper and winning a Nobel prize. Math is hard.

cloud services for the enterprise

The Innovator’s Dilemma describes how the choice to sustain an incumbent technology may need to be weighed against pursuing disruptive new technologies. Nascent technologies tend to solve a desirable subset of a problem with greater efficiency. They change the game by making what used to be a costly high-end technology available as a commodity that is affordable to the masses. It turns out that high-end customers can often live without the rich capabilities of the costly solution, and they would rather save on cost. Meanwhile, with the success that the low-end solution is gaining in the market, it can invest in maturing its product to encroach into the high-end market. Eventually, the incumbent product’s market is entirely taken over by the rapidly growing upstart, who was able to establish a foothold in a larger installed base.

That is the situation we find ourselves in today with enterprise applications. Large companies rely on expensive software licenses for Customer Relationship Management, Enterprise Resource Planning, and Human Capital Management applications deployed on-premise. Small and medium-sized businesses may not be able to afford the same kinds of feature-rich software. Not only are the software license and annual maintenance costs high, but commercial off-the-shelf software for enterprises is typically a platform that requires months of after-market solution development, customization, and system integration to tailor it to the business policies and processes specific to the enterprise. The evolution to cloud services aims to disrupt this situation.

Let us explore the ways that cloud services aim to be disruptive.

As described above, traditional enterprise software products are platforms. An incumbent product that wants to evolve to cloud without disrupting its code base will merely be operating in a sustaining mode, not achieving significant gains in efficiency. Because such a product remains more PaaS-like, the prohibitive cost and onerous effort of after-market solution development remain a huge barrier to entry for customers. To become SaaS-like, a cloud service must be useful by default, immediately of value to the end users of its enterprise tenants.

Cloud services are disruptive by providing improved user experiences. Of course, this means a friendlier Web user interface that is optimized for users to perform their work more easily and intuitively. User interfaces need to be responsive to device screen size, orientation, locale, and input method. Cloud services also provide advantages for enterprise collaboration by enabling the work force to be mobile. Workers need to become more decoupled in space and time, so they can be more geographically dispersed and global in reach. Cloud services should assist in transforming how employees work together, not just replacing the same old ways of doing our jobs using a Web browser instead of a desktop application. Mobile applications may even enable new ways of interacting that are not recognizable today.

Cloud services are disruptive economically. Subscription pricing replaces perpetual software licensing and annual maintenance costs along with the capital costs of hardware infrastructure, IT staffing to operate an on-premise deployment, and on-going infrastructure maintenance, upgrades, and scaling. Subscription pricing in and of itself is not transformational. It is only superficially different by virtue of amortizing the traditional cost of on-premise deployment over many recurring payments. The main benefit is in eliminating the financial risk associated with huge up-front capital expenditures in case the project fails. Migrating a traditional on-premise application into the cloud is not really financially disruptive unless it can significantly alter the costs involved. In fact, by taking on the capital cost of infrastructure and the operational cost of the deployment, the software vendor has now cannibalized its on-premise application business and replaced it with a lower margin business with high upfront costs and risk. This is a terrible formula for profitability and a healthy business.

Multi-tenancy provides this disruptive benefit. Multi-tenancy enables a cloud service to support users from multiple tenants. This provides significant cost advantages over single-tenant deployments in terms of resource utilization, simplified operations, and economies of scale. Higher deployment density translates directly into higher profit, but by itself multi-tenancy provides no visible benefit to users. The disruption comes when the vendor realizes that at scale multi-tenancy enables a new tenant to be provisioned at near zero cost. This opens up the possibility of offering an entry level service to new tenants at a low price point, because the marginal cost to the vendor is near zero. Near-zero-cost entry-level pricing is transformational by virtue of making a cloud service available to small enterprises who would never have been able to afford such capabilities in the past. This enables innovation to be done by individuals or small-scale entrepreneurs (start-ups), who have the most radical, risky, unconventional, and paradigm-shifting ideas.
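To make the marginal-cost argument concrete, here is a minimal sketch. The cost model and every number in it are hypothetical, chosen only for illustration; a real shared deployment would also have to grow its fixed infrastructure in steps as tenants are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical monthly cost model for one deployed stack."""
    fixed_cost: float       # infrastructure and operations for the stack
    cost_per_tenant: float  # incremental storage and compute per tenant


def single_tenant_total(d: Deployment, tenants: int) -> float:
    # Every tenant gets its own stack, so the fixed cost is paid per tenant.
    return tenants * (d.fixed_cost + d.cost_per_tenant)


def multi_tenant_total(d: Deployment, tenants: int) -> float:
    # All tenants share one stack, so the fixed cost is paid once.
    return d.fixed_cost + tenants * d.cost_per_tenant


def marginal_cost(total, d: Deployment, tenants: int) -> float:
    # Cost of adding one more tenant to an existing population.
    return total(d, tenants + 1) - total(d, tenants)


# Illustrative numbers only.
stack = Deployment(fixed_cost=10_000.0, cost_per_tenant=5.0)
print(marginal_cost(single_tenant_total, stack, 1_000))
print(marginal_cost(multi_tenant_total, stack, 1_000))
```

Under these assumptions, the next single-tenant customer costs the vendor a whole new stack, while the next multi-tenant customer costs only the per-tenant increment. That difference is what makes a very low entry price viable.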

Elastic scaling provides another disruptive benefit. It enables a cloud service to perform as required as a tenant grows from seeding a proof-of-concept demonstrator to large scale (so-called Web scale) production. The expertise, techniques, and resources needed to scale a deployment are difficult and costly to acquire. When a vendor can provide this pain-free, an enormous burden is lifted from the tenant’s shoulders.

Cloud services evolve with the times through DevOps and continuous delivery. Traditional on-premise applications tend to be upgraded rarely due to the risk and high development cost of customization, which tends to suffer from compatibility breakage. Enterprise applications are often not upgraded for years. “If it ain’t broke, don’t fix it.” Even though the software vendor may be investing heavily in feature enhancements, functional and performance improvements, and other innovations, users don’t see the benefits in a timely manner, because the enterprise cannot afford the pain of upgrading. When the vendor operates the software as a SaaS offering, upgrades can be deployed frequently and painlessly for all tenants. Users enjoy the benefit of software improvements immediately, as the cloud service stays up-to-date with the current competitive business environment.

Combining the abilities to provision a tenant to be useful immediately by default, to start at near zero cost, to scale with growth, and to evolve with the times, cloud services provide tools that can enable business agility. A business needs to be able to turn on a dime, changing what it sells and how it operates in order to stay ahead of its competitors. Cloud services are innovative and disruptive in these ways in order to enable their enterprise tenants to be innovative and disruptive.

intent modeling as a programming language

In Tom Nolle’s blog article titled What NFV Needs is “Deep Orchestration”!, he identifies the need for a modernized Business Support System and Operations Support System (BSS/OSS) to improve operations efficiency and service agility by extending service-level automation uniformly downward into the network resources, including the virtual network functions and the NFV infrastructure.

Service orchestration is the automation of processes to fulfill orders for life cycle managed information and communications technology services. Traditionally, this process automation problem has been solved through business process modeling and work flow management, which includes a great deal of system integration to glue together heterogeneous software components that do not naturally work together. The focus of process modeling is on “how” to achieve the desired result, not on “what” that result is. The “what” is the intent; the content of the order captures the intent.

To achieve agility in launching services, we must be able to model services in a manner that allows a service provider to redefine the service to suit the current business need. This modeling must be done by product managers and business analysts, experts in the service provider’s business. Any involvement of software developers and system integrators will necessarily require programming at a level of abstraction that is far below the concepts that are natural to the service provider’s business. The software development life cycle is very costly and risky, because the abstractions are so mismatched with the business. When service modeling directly results in its realization in a completely automated executable runtime without involving other humans in any software development activities, this becomes Programming for Non-programmers.

The key is Going Meta. The “what” metadata is the intent modeling. The “how” metadata is the corresponding fulfillment and provisioning behavior (service-level automation). If the “what” and “how” can be designed as a language that can be expressed in modular packages, which are reusable by assembling higher level intent based on lower level components, this would provide an approach that would facilitate the service agility users are looking for. Bundling services together and utilizing lower level services as resources that support a higher level service are techniques that are already familiar to users who design services. When users express themselves using this language, they are in fact programming, but because the language is made up entirely of abstractions that are familiar and natural to the business, it does not feel burdensome. General purpose programming languages like Java feel burdensome, because the abstractions are for a low level computational machine, not a high level business-oriented machine for service-level automation of human intent.
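As a rough illustration of the idea, the sketch below declares the “what” as composable intents and derives the “how” as an ordered fulfillment plan. It is not a proposal for the actual language; the class, the function, and all service and step names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """The "what": a named service assembled from lower level intents."""
    name: str
    requires: tuple[Intent, ...] = ()
    # The "how" for this level only: hypothetical provisioning steps.
    steps: tuple[str, ...] = ()


def fulfillment_plan(intent: Intent) -> list[str]:
    """Derive the "how" for a whole order by walking the "what" bottom-up."""
    plan: list[str] = []
    seen: set[str] = set()

    def visit(node: Intent) -> None:
        if node.name in seen:      # a shared component is provisioned once
            return
        seen.add(node.name)
        for dep in node.requires:  # supporting resources come first
            visit(dep)
        plan.extend(f"{node.name}: {step}" for step in node.steps)

    visit(intent)
    return plan


# Lower level services act as resources for higher level ones.
access = Intent("access-line", steps=("allocate port", "activate line"))
broadband = Intent("broadband", requires=(access,), steps=("assign IP address",))
voice = Intent("voice", requires=(access,), steps=("assign phone number",))
bundle = Intent("double-play", requires=(broadband, voice), steps=("start billing",))

for line in fulfillment_plan(bundle):
    print(line)
```

The person defining `bundle` states only what the bundle consists of; the ordering of provisioning steps falls out of the composition rather than being hand-coded as a work flow.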

Our challenge in developing a modernized BSS/OSS is to invent this language for intent modeling for services. An IETF draft titled YANG Data Models for Intent-based NEtwork MOdel attempts to define a flavor of intent modeling for network resources. An IETF draft titled Intent Common Information Model attempts to define a flavor of intent modeling that is very general, but it is far removed from any executable runtime that can implement it, because it is so imprecise (not machine executable). ETSI NFV MANO defines an approach that captures intent as descriptors for network services as network functions and their connections. However, these abstractions are not expressive enough to extend upward into the service layer, across the entire spectrum of network technologies (physical and virtualized), and into the “how” for automation, to enable the composition of resources into services and the utilization of services as resources to support higher level services that can be commercialized. More thought is needed to design a good language for this purpose and a virtual machine that is capable of executing the code that is produced from it.

vertical integration

Applications have been pursuing operational efficiency through vertical integration for years. This is generally understood to mean assembling infrastructure (machine and operating system) with platform components (database, middleware) and application components into an engineered system that is pre-integrated and optimized to work together.

Now, the evolution to cloud services is following the same pattern. IaaS is integrated into PaaS. IaaS and PaaS are integrated with application components to deliver SaaS. However, just as we see in on-premise enterprise information systems, applications do not operate in silos. They are integrated by business processes, and they must collaborate to enforce business policies across business functions and organizations.

Marketing is deeply interwoven with sales. Product configuration, pricing, and quotation are tied to order capture and fulfillment. Fulfillment involves inventory, shipping, provisioning, billing, and financial accounting. Customer service is linked with various service assurance components, billing care, and also quote and order capture. All components need views of accounts, assets (products and services subscribed to), agreements, contracts, and warranties. Service usage and demand all feed analytics to drive marketing campaigns that generate more sales. What a tangled web.

What is clear from this picture is that vertical integration does not end with infrastructure, platform, and a software application. Applications contribute components that combine with business processes and business policies to construct higher level applications. This may continue for many layers of integration according to the self-similar paradigm.

The evolution to cloud should recognize the need for integration of SaaS components with business processes and business policies. However, it does not appear as though cloud services have anticipated the need for vertical integration to continue in layers. To construct assemblies, the platform should provide a means of defining such assemblies, so that they can be replicated by packaging and deploying them on infrastructure at various scales. The platform should provide a consistent programming and configuration model for extending and customizing applications in ways that are natural to being reapplied layer by layer.

Vertical integration is not an elegantly solved problem for on-premise applications. On-premise application integration is notoriously complex due to heterogeneity and vastly inconsistent interfaces and programming models. One component’s notion of customer is another’s notion of party. Two components with notions of customer do not agree on its schema and semantics. A product to one component is an offer to another. System integration projects routinely cost five to ten times the software license cost of the application components, because of the difficulty of overcoming impedance mismatches and gaps in functional capabilities with duct tape and bubblegum.
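The customer-versus-party mismatch can be sketched in a few lines. Both schemas and the mapping function below are invented for illustration; they stand in for two components that each model the same real-world person differently.

```python
from dataclasses import dataclass


@dataclass
class CrmCustomer:
    """Hypothetical schema from a CRM component."""
    customer_id: str  # e.g. "C-00042"
    full_name: str
    email: str


@dataclass
class BillingParty:
    """Hypothetical schema from a billing component."""
    party_ref: int
    given_name: str
    family_name: str
    contact: dict


def to_billing_party(c: CrmCustomer) -> BillingParty:
    # Identifier formats differ: strip the CRM prefix and parse the number.
    party_ref = int(c.customer_id.split("-", 1)[1])
    # Semantics differ too: splitting a full name is a lossy guess.
    given, _, family = c.full_name.rpartition(" ")
    return BillingParty(party_ref, given, family, {"email": c.email})


party = to_billing_party(CrmCustomer("C-00042", "Ada Lovelace", "ada@example.com"))
print(party)
```

Every pair of components needs glue of this kind, and each mapping embeds guesses (such as how to split a name) that break on real data. That is where much of the integration cost accumulates.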

Examining today’s cloud platforms and the applications built upon them, it is looking like we have not learned much from our past mistakes. We are faced with the same costly and clunky integration nightmare with no breakthrough in sight.

encapsulation

Encapsulation is the packing of data and functions into a single component.

Under this definition, encapsulation means that the internal representation of an object is generally hidden from view outside of the object’s definition.
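A minimal sketch of this definition follows, using a hypothetical `Account` class. Note that Python marks internal state by the leading-underscore convention rather than enforcing it.

```python
class Account:
    """Data (the balance) and the functions that operate on it, packed together."""

    def __init__(self, opening_cents: int = 0) -> None:
        # Internal representation: integer cents, hidden behind the methods.
        self._cents = opening_cents

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._cents += round(amount * 100)

    @property
    def balance(self) -> float:
        # Callers see a balance; they never depend on how it is stored.
        return self._cents / 100


account = Account()
account.deposit(12.5)
print(account.balance)
```

The representation could change from cents to a decimal type without affecting callers, which is exactly the property that is lost once the data representation itself becomes the interface.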

In 2004, I wrote OOPs, here comes SOA (again) to comment on how Service-Oriented Architecture (SOA) stands in stark contrast to Object-Oriented Programming (OOP).

As my previous article, Going Meta, shows, beyond the scale of a single object the architecture of an enterprise application breaks encapsulation by deliberately exposing data representations of entities. If data hiding through encapsulation is a fundamental principle of Object-Oriented Programming, that principle certainly breaks down at several levels. Technical challenges are partly to blame. However, I believe there are non-technical motivations to abandon data hiding.

From a technical perspective, the impedance mismatch between the middle tier and the database tier demands that the database schema be a central design consideration that is agreed upon by all stakeholders. The boundary between the middle tier and the database tier is all about the data and CRUD operations. This boundary may be more service-oriented, if the logic is implemented in the database tier as stored procedures, but the programming language available in the database is seldom expressive enough or natural enough for most developers to embrace. In the middle tier, domain services encapsulate the application logic, but the boundary between the middle tier and its clients (Web browsers and machine-to-machine integration with other applications) again is all about serialized data in the form of request, response, and fault messages for remote procedure calls (SOAP and RESTful services) or event messages (publish-subscribe). The entire data model (with very few exceptions for data that is used only for computations that are private to the logic) is exposed through these interfaces.

From a business perspective, it is natural to think in terms of business processes and the data artifacts (e.g., documents, files) that flow between tasks. The application services are merely a means of implementing those tasks; another means of implementing a task may be for a human to perform it either without the aid of software or with the aid of software that lacks the integration to enable the task to be performed automatically. Users do not think of their interactions with the software in terms of data hiding. They are very aware of the data structures that are relevant to the business.

Object-Oriented Programming is relegated to the micro level of an enterprise application. Encapsulation or data hiding is a concept that is relevant only to modules of logic, and these concepts do not extend naturally throughout the architecture for an enterprise application for both technical and business reasons. When developing enterprise applications and systems that involve business processes that integrate across applications, Object-Oriented Programming is a paradigm that sadly has little impact and relevance at a macro level, where SOA fills the vacuum. It seems like the software industry and computer science are still at a very immature stage of evolution, as we go without a programming paradigm that can unify how software is developed at the micro and macro levels.

going meta

Anatomy of an n-tier application

A fully functioning web app involves several layers of software, each with its own technology, patterns, and techniques.

At the bottom of the stack is the database. A schema defines the data structures for storage. A query language is used to operate on the data. Regardless of whether the database is relational, object-relational, NoSQL, or some other type, the programming paradigm at the database tier is distinctly different from and quite foreign to the layers above.

Above the database is the middle tier or application server. This is where the server-side business logic, APIs, and Web components reside.

There is usually a set of persistent entities, which provide an object abstraction of the database schema. The database query language (e.g., SQL) may be abstracted into an object query language (e.g., JPQL) for convenience. The majority of CRUD (create, read, update, delete) operations can be done naturally in the programming language without needing to formulate statements in the database query language. This provides a persistent representation of the model of the application.

Above the persistent entities is a layer of domain services. The transactional behavior of the business logic resides in this layer. This provides the API (local) that encapsulates the essence of the application functions.

The domain services are usually exposed as SOAP or RESTful services to remote clients for access from Web browsers and for machine-to-machine integration. This necessitates that JSON and/or XML representations be derived from the persistent entities (e.g., using JAXB). This provides a serialized representation of the model of the application.

We finally come to the presentation layer, which is divided into server-side components residing in the application server and client-side components that execute in the Web browser. Usually there is a presentation-oriented representation called a view-model, which matches the information rendered on views or input on forms. The view and controls are constructed from HTML, CSS, and JavaScript. The programming paradigm in these technologies is distinctly different from that of the layers below.

Extending the application

Let’s examine what it would take to extend an application with a simple type (e.g., string) property on an entity. The database schema would need to be altered. A persistent entity would need a field, getter and setter methods, and a binding between the field and a column in the database schema. The property may be involved in the logic of the domain services. Next, the JSON and XML binding objects would need to be augmented with the property, and logic would be added to transform between these objects and the persistent entities used by the domain services. At the presentation layer, the view-model would be augmented with the property to expose it to the views. Various views to show an entity’s details and search results would likewise be enhanced to render the property. For editing and searching, a field would need to be added on forms with corresponding validation of any constraints associated with that property and on-submit transaction handling.

That is an awful lot of repetitive work at every layer. There are many technologies and skill sets involved. Much of the work is trivial and tedious. The entire process is far from efficient. It is worse if there is division of labor among multiple developers who require coordination.

A better platform

When confronted with coordinating many concomitant coding activities to accomplish a single well-defined goal, it is natural for an engineer to solve the more general problem rather than doing tedious work repeatedly. The solution is to “go meta”; instead of programming inefficiently, develop a better language to program in. Programming has evolved from machine language to assembly language for humans to express instructions more intuitively. Assembly evolved to structured languages with a long history of advances in control and data flow. Programming languages have evolved in conjunction with virtualization of the machine (i.e., bytecode) to provide better abstractions of software and hardware capabilities. In the spirit of Guy L. Steele’s Growing a Language talk from OOPSLA ’98, components, libraries, and frameworks have been developed using programming languages that themselves support being extended, within limits. All of these innovations continually raise the level of abstraction to increase human productivity.
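One small way to picture “going meta” is to declare an entity’s properties once and derive each layer’s artifact from that single declaration. The sketch below is illustrative only: the `Property` and `Entity` classes and the three derivation functions are hypothetical, and a real platform would cover far more (bindings, views, transactions).

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Property:
    name: str
    sql_type: str
    max_length: Optional[int] = None


@dataclass(frozen=True)
class Entity:
    name: str
    properties: tuple[Property, ...]


def ddl(e: Entity) -> str:
    # Database tier: schema derived from the metadata.
    cols = ", ".join(f"{p.name} {p.sql_type}" for p in e.properties)
    return f"CREATE TABLE {e.name} ({cols})"


def to_json(e: Entity, row: dict) -> str:
    # Service tier: serialized representation limited to declared properties.
    return json.dumps({p.name: row.get(p.name) for p in e.properties})


def validate(e: Entity, form: dict) -> list[str]:
    # Presentation tier: form constraints derived from the same metadata.
    errors = []
    for p in e.properties:
        value = form.get(p.name)
        if p.max_length is not None and value is not None and len(value) > p.max_length:
            errors.append(f"{p.name} exceeds {p.max_length} characters")
    return errors


customer = Entity("customer", (Property("name", "VARCHAR(80)", 80),))
# Extending the application becomes one declaration instead of edits at every layer.
extended = Entity("customer", customer.properties + (Property("nickname", "VARCHAR(20)", 20),))
print(ddl(extended))
```

Adding the `nickname` property touches one line of metadata; the schema, the serialized form, and the validation rule all follow from it.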

We are hitting the limits of what can be expressed efficiently in today’s languages. We have a database storage abstraction that is separate from server-side application logic, which is itself separate from client-side (Web browser) presentation. There is growing support for database and server-side abstractions to scale beyond the confines of individual machines. Clustering enables software to take advantage of multiple machines to distribute load and provide redundancy in case of failure. However, our abstractions seem to stop at the boundaries between database storage, server-side application logic, and client-side presentation. Hence, we have awkward impedance mismatches when integrating top-to-bottom. We also have impedance mismatches when integrating heterogeneous application components or services, as RESTful and SOAP Web Services technologies cross the boundaries between distributed software components, but this style of control and data flow (remote procedure calls) is entirely foreign to the programming language. That is why we must perform inconvenient translations between persistent entities and their bindings to various serialized representations (JSON, XML).

It seems natural that these pain points will be relieved by again raising the level of abstraction so that these inefficiencies will be eliminated. Ease of human expression will better enable programming for non-programmers. We are trying to shape the world so that humans and machines can work together harmoniously. Having languages that facilitate effective communication is a big part of that. To get this right, we need to go meta.

you aren’t gonna need it

“You aren’t gonna need it” (YAGNI) is a principle espoused by extreme programming (XP). It says to implement things only when you actually need them, never when you foresee the need. I see serious problems with this principle.

From a product management perspective, the product owner’s job is to define a release roadmap and a backlog of requirements for development that foresee a need in the market. There is a vision and product strategy that guides what should be developed and how to develop it. Does the YAGNI principle apply equally to this backlog? It can be argued that anything on the future roadmap is subject to change. Therefore, developers should not be building features, infrastructure, and platform capabilities in anticipation of those future needs, no matter how confident they are that the backlog will be implemented.

Doing the minimum to meet the immediate need leads to technical debt. YAGNI causes developers to defer solving more difficult architectural problems until the need becomes critical. Architectural problems (“-ilities”) are often systemic, which means that as the code base grows, the cost to refactor and fix becomes greater. Security and error handling are examples of systemic issues that are very difficult to fix later. If the problem grows too large, it becomes too costly to fix, because the work cannot be accomplished within a sprint without leaving the code broken (i.e., failed builds, failed tests), and that is absolutely forbidden. That constraint makes it impractical to fix large scale architectural problems throughout the code base once the problem has become intractable. The software eventually collapses under the weight of its technical debt because of compound interest, as the debt is multiplied with a growing code base.

If we take YAGNI too seriously, we are being deliberately short-sighted, ignoring the requirements that we can (and should!) foresee. YAGNI encourages us to discount future requirements, expecting that they may not come to pass. A myopic approach tends to lead to a dead end, if we do not take care to set a course in the right direction, and to ensure that we have equipped ourselves to travel there. If you allow YAGNI to make the road ahead too difficult to travel, those future requirements certainly will not come to pass, because you’ll be broken down on the side of the road. There won’t be anyone to come rescue you, because although you could foresee the need for roadside assistance, you didn’t pay for it on your auto insurance policy, because “you aren’t gonna need it”.

service and resource

I have written about TM Forum SID before in What is wrong with TM Forum SID? My criticisms were focused on deficiencies in behavioral modeling. In this article, I turn my attention to the structural model itself.

Let’s start with the concept of Resource. SID defines a model for resources to represent communications network functions. [GB922 Logical and Compound Resource R14.5.0 §1.1.2] This approach seems self-evident. So far, so good. (My intent is not to evaluate how effective the SID resource model is in achieving its goal.)

When we examine the concept of Service, we run into difficulties. The overview of “service” in [GB922 Service Overview R14.5.0 §1.1.3] makes no attempt to provide a precise definition of the term. This section references other standards efforts that have attempted to address the topic. It references various eTOM process areas that apply to service. Finally, it discusses the things that surround and derive from service. All the while, “service” remains undefined, as the document proceeds to a detailed structural decomposition. I don’t consider this a fatal flaw, because we can fill this gap ourselves through contemplation, given that SID circles around the abstraction without nailing down its definition.

I would define “service” as something of value that can be delivered as a subscription by the resources of a communications network. That wasn’t too difficult. In the context of SID, this definition of “service” is not intended to include human activities that are provided to clients; that is an entirely different concept.

SID specializes “service” into two concepts: (1) customer-facing service and (2) resource-facing service. A CFS is a service that may be commercialized (branded, priced, and sold) as a product to customers. An RFS is not commercialized.

Here is where we begin to see things go wrong. When we model a service, such as network connectivity, it may be a CFS under certain circumstances, but it may be an RFS under other circumstances. I think, at this point, SID should have recognized that the concepts of “service” and “resource” are roles that can be taken on by entities. They are not superclasses that are specialized. Using our example of network connectivity, when it is commercialized, it becomes a service, and when it is used to enable (directly or indirectly) something else to be delivered, it acts as a resource. The concepts of “service” and “resource” should be thought of more like “manager” and “employee”. A person is not intrinsically a manager or an employee; a person may take on one or both of these roles contextually. By not recognizing this pattern, SID has made modeling very awkward for many types of communications network technologies, especially for layered services, and services that are built from other services (which are treated as resources).
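The role-based alternative can be sketched as follows. This is not SID’s model or any standard’s API; the `Role` enum, the `Entity` class, and the context names are hypothetical, meant only to show roles being assigned per context instead of fixed by a superclass.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    SERVICE = "service"    # commercialized to a customer
    RESOURCE = "resource"  # enabling something else to be delivered


@dataclass
class Entity:
    """One modeled thing; what it *is* does not fix the role it plays."""
    name: str
    roles_by_context: dict = field(default_factory=dict)

    def play(self, context: str, role: Role) -> None:
        self.roles_by_context.setdefault(context, set()).add(role)

    def roles_in(self, context: str) -> set:
        return self.roles_by_context.get(context, set())


connectivity = Entity("network connectivity")
# Sold directly to a customer, it acts as a service.
connectivity.play("retail broadband offer", Role.SERVICE)
# Supporting a higher level service, the same entity acts as a resource.
connectivity.play("hosted VPN", Role.RESOURCE)

print(connectivity.roles_in("retail broadband offer"))
print(connectivity.roles_in("hosted VPN"))
```

With roles attached contextually, a single definition of network connectivity serves both cases, whereas a superclass split would force two parallel definitions of the same thing.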

social applications

Facebook and Twitter have flourished in our personal lives. But their usefulness for work is limited to advertising and other marketing activities. Engagement is through sharing of status updates, links, photos, likes, and comments. This is a decade-old approach that has not advanced much.

In Mark Zuckerberg’s interview for Startup School, he shows his understanding that Facebook is a social platform for building social apps. However, it is my opinion that none of the players in the social networking space has a good vision of the future. Facebook and Twitter treat social interactions as ends in themselves. That is why they present information in a timeline, and they seek out trending topics. Information is like news that is stale after it is read. Engagement is a vehicle for targeted marketing.

Google has tried to compete with Facebook, but they can’t seem to find a formula for success. The article Why Google+ failed, according to Google insiders outlines their failure to achieve mass adoption and engagement. Providing an alternative to Facebook without a discernible improvement is not competitive, because users have no good reason to migrate away from an established network of friends.

Facebook “friend” relationships are more likely to be friends, family, and casual acquaintances. Facebook “follow” and “like” relationships are more likely to be public figures, celebrities, and business-to-consumer connections. Facebook is not the platform for professional relationships, work-related interactions, and business associations. LinkedIn is used for professional relationships with recruiting as its primary function. We should recognize that none of these platforms provides an application platform for actually doing work using social tools. Google failed to recognize this opportunity, as they began to integrate G+ with mail, storage, and other services. Providing a wall for posting information and comments is an extremely limited function for social interaction. It seems like no one has bothered to analyze how workers engage with each other to perform their jobs, so as to identify how social tools can make these interactions more productive.

We do see companies like Atlassian developing tools like JIRA and Confluence for assisting teams to work together. These tools recognize how social interactions are embedded into the information and processes that surround business functions. We need this kind of innovation applied across the board throughout the tools that we use in the enterprise.

Productive work relies on effective communication, coordination, and collaboration. These are social functions. Social networking is already mature in project management, wikis (crowdsourcing information), and discussion forums. But these are often peripheral to the tools that many workers use to perform their primary job functions. We need to look at the social interactions that surround these tools, and redevelop the tools to improve those interactions.

Let’s explore where social interactions are poor in our work environments today.

As our businesses expand across the globe, our teams are composed of workers who reside in different places and time zones. Interactions among remote team members can be extremely challenging and inefficient compared to those among workers who can meet regularly face to face, with tools like whiteboards and pens. There is a huge opportunity for tablet applications to better support remote workers.

As businesses scale, we may discover that the traditional organizational structures are too rigid to support the ever-accelerating pace of agility that we demand. Perhaps social tools can facilitate innovations in how workers organize themselves. As highly skilled and experienced workers mature, they become more capable of taking the initiative, making good decisions independently, and behaving in a self-motivated manner. Daniel Pink has identified that autonomy, mastery, and purpose are the intrinsic motivators that lead to happy and productive employees. Perhaps with social tooling, it is possible for organizations to evolve to take advantage of spontaneous order among workers instead of relying mostly on top-down management practices for assigning work.

These are two areas in which social networking may apply to enterprises in ways that are not well supported today. All we have to do is examine the pain points in our work environments to identify innovations that may be possible. It is quite surprising to me that we are not already seeing social tools revolutionize the workplace, especially in the technology sector where start-ups do not have an entrenched culture and management style.

Reliable Messaging with REST

Marc de Graauw’s article “Nobody Needs Reliable Messaging” remains as relevant today as it was in 2010, when it was first published. It echoes the principles outlined in “Scalable, Reliable, and Secure RESTful services” from 2007.

It basically says that REST does not need to support WS-ReliableMessaging delivery requirements, because reliable delivery can be accomplished by the business logic through retries, so long as the methods in the REST layer are idempotent (repeating the same request has the same effect as making it once). Let’s examine the implications in more detail.

First, we must design the REST methods to be idempotent. This is no small feat. This is a huge topic that deserves its own separate examination. But let’s put this topic aside for now, and assume that we have designed our REST web services to support idempotence.

If we are developing components that call REST web services for process automation, the above principle says that the caller is responsible for retrying on failure.

The caller must be able to distinguish a failure to deliver the request from a failure by the server to perform the requested method. The former should be retried, expecting that the failure is temporary. The latter is permanent, so retrying the same request will not help.
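To make the distinction concrete, here is a minimal Python sketch of how a caller might classify the outcome of one delivery attempt. The `Outcome` enum, the `classify` function, and the particular set of status codes treated as temporary are my own assumptions for illustration; a real service would need to document which of its responses are safe to retry.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "success"
    RETRY = "retry"          # delivery failed; likely temporary
    PERMANENT = "permanent"  # server received the request and rejected it


# Assumption: these status codes indicate a temporary condition
# (rate limiting, a gateway problem, or an overloaded server).
TRANSIENT_STATUS = {429, 502, 503, 504}


def classify(status: Optional[int],
             error: Optional[Exception] = None) -> Outcome:
    """Classify the result of one delivery attempt."""
    if error is not None:
        # ConnectionError and TimeoutError are Python built-ins; here they
        # stand for "the request or its reply never got through".
        if isinstance(error, (ConnectionError, TimeoutError)):
            return Outcome.RETRY
        return Outcome.PERMANENT
    if status is None:
        # No reply and no error recorded: treat as a delivery failure.
        return Outcome.RETRY
    if 200 <= status < 300:
        return Outcome.SUCCESS
    if status in TRANSIENT_STATUS:
        return Outcome.RETRY
    return Outcome.PERMANENT
```

With this sketch, `classify(503)` asks for a retry, while `classify(400)` reports a permanent failure that should be surfaced to the business logic.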

The caller must be able to implement retry in an efficient manner. If the request is retried immediately in a tight loop, it is likely to continue to fail for the same reason. Network connectivity issues sometimes take a few minutes to be resolved. Worse, if the failure occurs because the server is overloaded, having all clients retry in a tight loop will exacerbate the problem by slamming the server with a flood of requests when it is least able to process them. Clients should back off for some time and retry after a delay. Relying on clients to behave nicely on their honor is sure to fail if their retry logic is coded ad hoc without following a standard convention.
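One common convention for well-behaved retry is exponential backoff with random jitter, so that clients spread out their retries instead of arriving in lockstep. The sketch below is only an illustration: the function names, the one-second base, the 300-second cap, and the limit of five attempts are all assumptions of mine, not values from any standard.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`; the actual delay
    is drawn uniformly below the ceiling ("full jitter").
    """
    rng = rng or random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retry(send: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send`, backing off between delivery failures."""
    for attempt in range(max_attempts):
        try:
            return send()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # give up; the caller decides how to escalate
            sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists so that the waiting can be replaced in tests or by a scheduler. An in-process loop like this blocks the caller and does not survive a restart, so on its own it is not enough for unattended automation.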

The caller must be able to survive crashes and restarts, so that an automated task can be relied upon to reach a terminal state (success or failure) after starting. Therefore, message delivery must be backed by a persistent store. Delivery must be handled asynchronously so that it can be retried across restarts (including service migration to replacement hardware after a hardware failure), and so that the caller is not blocked waiting.
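A persistent store for outgoing requests is often called an outbox. Here is a rough sketch using SQLite from the Python standard library. The `Outbox` class, its table layout, and its method names are hypothetical, and a real implementation would also have to write the outbox entry in the same transaction as the business data that triggered it.

```python
import sqlite3
from typing import List, Tuple


class Outbox:
    """Durable queue of outgoing requests (a sketch, not production code)."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " url TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " state TEXT NOT NULL DEFAULT 'pending',"
            " attempts INTEGER NOT NULL DEFAULT 0,"
            " next_attempt_at REAL NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def enqueue(self, url: str, body: str) -> int:
        """Record a request before any attempt is made to deliver it."""
        cur = self.db.execute(
            "INSERT INTO outbox (url, body) VALUES (?, ?)", (url, body))
        self.db.commit()
        return cur.lastrowid

    def due(self, now: float) -> List[Tuple[int, str, str, int]]:
        """Return (id, url, body, attempts) for requests ready to send."""
        return self.db.execute(
            "SELECT id, url, body, attempts FROM outbox"
            " WHERE state = 'pending' AND next_attempt_at <= ?"
            " ORDER BY id", (now,)).fetchall()

    def reschedule(self, msg_id: int, delay: float, now: float) -> None:
        """Count a failed attempt and postpone the next one."""
        self.db.execute(
            "UPDATE outbox SET attempts = attempts + 1,"
            " next_attempt_at = ? WHERE id = ?", (now + delay, msg_id))
        self.db.commit()

    def mark_delivered(self, msg_id: int) -> None:
        self.db.execute(
            "UPDATE outbox SET state = 'delivered' WHERE id = ?", (msg_id,))
        self.db.commit()

    def close(self) -> None:
        self.db.close()
```

Because the request is committed to disk before delivery is attempted, a background worker can pick up the pending rows again after a crash or a migration to new hardware, and the business logic that called `enqueue` is never blocked waiting on the network.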

The caller must be able to detect when too many retry attempts have failed, so that it does not get stuck waiting forever for the request to be delivered. Temporary problems that take too long to be resolved need to be escalated for intervention. These requests should be diverted for special handling, and the caller should continue with other work, until someone can troubleshoot the problem. Poison message handling is essential so that retrying does not result in an infinite loop that would gum up the works.
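Poison message handling can be sketched as an attempt counter plus a separate holding area for messages that exceed the limit. In the Python below, `Message`, `Dispatcher`, and the default limit of five attempts are illustrative assumptions; the delay between attempts is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ATTEMPTS = 5  # assumption: tune to how long outages usually last


@dataclass
class Message:
    body: str
    attempts: int = 0


@dataclass
class Dispatcher:
    send: Callable[[str], None]
    max_attempts: int = MAX_ATTEMPTS
    pending: List[Message] = field(default_factory=list)
    dead_letters: List[Message] = field(default_factory=list)

    def process_once(self) -> None:
        """Make one delivery attempt for every pending message."""
        still_pending: List[Message] = []
        for msg in self.pending:
            try:
                self.send(msg.body)
            except (ConnectionError, TimeoutError):
                msg.attempts += 1
                if msg.attempts >= self.max_attempts:
                    # Divert for human troubleshooting; stop retrying.
                    self.dead_letters.append(msg)
                else:
                    still_pending.append(msg)
        self.pending = still_pending
```

A message that keeps failing ends up in `dead_letters` after `max_attempts` tries, while the other messages continue to flow. Someone still has to watch that list, which is the escalation step.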

POST methods are not guaranteed to be idempotent, so retry must be handled very carefully to account for side-effects. Even if the request is guaranteed to be delivered, and it is processed properly (exactly once) by the server, the caller must be able to determine reliably whether the method succeeded, because the reply can be lost. One approach is to deliver the reply reliably from the server back to the caller. Again, all of the above reliable delivery qualities apply. The interactions needed to enable this round-trip message exchange certainly look very foreign compared to the simple synchronous HTTP interaction. Either the caller would poll for the reply, or a callback mechanism would be needed. Another approach is to enable the caller to confirm that the original request was processed. With either approach, the reliable execution requirement alters the methods of the REST web services. To achieve better quality of service in the transport, the definition of the methods needs to be radically redesigned. (If you are having a John McEnroe “you cannot be serious” moment right about now, it is perfectly understandable.)
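One way the second approach is sometimes realized is to have the caller attach a unique request identifier that the server remembers. The in-memory Python sketch below is hypothetical: `OrderService`, `create_order`, and `lookup` are invented names, and a real server would keep the identifiers in durable storage and expire them eventually.

```python
import uuid
from typing import Dict, Optional


class OrderService:
    """Stand-in for a server whose create operation is safe to retry."""

    def __init__(self) -> None:
        self._by_request: Dict[str, str] = {}  # request id -> order id
        self.orders: Dict[str, str] = {}       # order id -> item
        self._next = 1

    def create_order(self, request_id: str, item: str) -> str:
        # A repeated request id means the caller is retrying: return the
        # earlier result instead of creating a second order.
        if request_id in self._by_request:
            return self._by_request[request_id]
        order_id = f"order-{self._next}"
        self._next += 1
        self.orders[order_id] = item
        self._by_request[request_id] = order_id
        return order_id

    def lookup(self, request_id: str) -> Optional[str]:
        """Let the caller confirm whether a request was processed."""
        return self._by_request.get(request_id)


def new_request_id() -> str:
    """The caller generates the id once and reuses it on every retry."""
    return str(uuid.uuid4())
```

If the reply to `create_order` is lost, the caller can either resend with the same identifier or call `lookup`; in both cases exactly one order exists. Notice that the method signature had to change to carry the identifier, which is the kind of redesign this paragraph is complaining about.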

Taking these requirements into consideration, it is clearly not true that “nobody needs reliable messaging”. Enterprise applications with automated processes that perform mission-critical tasks need the ability to perform those tasks reliably. If reliable message delivery is not handled at the REST layer, the responsibility for retry falls to the message sender. We still need reliable messaging; we must implement the requirement ourselves above REST, and this becomes troublesome without a standard framework that behaves nicely. If we accept that REST can provide only idempotence toward this goal, we must implement a standard framework to handle delivery failures, retry with exponential backoff, and divert poison messages for escalation. That is to say, we need a reliable messaging framework on top of REST.

[Note that when we speak of a “client” above, we are not talking about a user sitting in front of a Web browser. We are talking about one mission-critical enterprise application communicating with another in a choreography to accomplish some business transaction. An example of a choreography is the interplay between a buyer and a seller through the systems for commerce, quote, procurement, and order fulfillment.]

Insights into innovation