encapsulation – abandoning data hiding

Encapsulation is the packing of data and functions into a single component.

Under this definition, encapsulation means that the internal representation of an object is generally hidden from view outside of the object’s definition.

In 2004, I wrote OOPs, here comes SOA (again) to comment on how Service-Oriented Architecture (SOA) stands in stark contrast to Object-Oriented Programming (OOP).

In my previous article, Going Meta, we can see that beyond the scale of a single object, the architecture of an enterprise application breaks encapsulation by deliberately exposing data representations of entities. If data hiding through encapsulation is a fundamental principle of Object-Oriented Programming, that principle certainly breaks down at several levels. Technical challenges are partly to blame. However, I believe there are non-technical motivations to abandon data hiding.

From a technical perspective, the impedance mismatch between the middle tier and the database tier demands that the database schema be a central design consideration that is agreed upon by all stakeholders. The boundary between the middle tier and the database tier is all about the data and CRUD operations. This boundary may be more service-oriented, if the logic is implemented in the database tier as stored procedures, but the programming language available in the database is seldom expressiveness enough or natural enough for most developers to embrace. In the middle tier, domain services encapsulate the application logic, but the boundary between the middle tier and its clients (Web browsers and machine-to-machine integration with other applications) again is all about serialized data in the form of request, response, and fault messages for remote procedure calls (SOAP and RESTful services) or event messages (publish-subscribe). The entire data model (with very few exceptions for data that is used only for computations that are private to the logic) is exposed through these interfaces.

From a business perspective, it is natural to think in terms of business processes and the data artifacts (e.g., documents, files) that flow between tasks. The application services are merely a means of implementing those tasks; another means of implementing a task may be for a human to perform it either without the aid of software or with the aid of software that lacks the integration to enable the task to be performed automatically. Users do not think of their interactions with the software in terms of data hiding. They are very aware of the data structures that are relevant to the business.

Object-Oriented Programming is relegated to the micro level of an enterprise application. Encapsulation or data hiding is a concept that is relevant only to modules of logic, and these concepts do not extend naturally throughout the architecture for an enterprise application for both technical and business reasons. When developing enterprise applications and systems that involve business processes that integrate across applications, Object-Oriented Programming is a paradigm that sadly has little impact and relevance at a macro level, where SOA fills the vacuum. It seems like the software industry and computer science are still at a very immature stage of evolution, as we go without a programming paradigm that can unify how software is developed at the micro and macro levels.

going meta – the human-machine interface

Anatomy of an n-tier application

A fully functioning web app involves several layers of software, each with its own technology, patterns, and techniques.

At the bottom of the stack is the database. A schema defines the data structures for storage. A query language is used to operate on the data. Regardless whether the database is relational, object-relational, NoSQL, or some other type, the programming paradigm at the database tier is distinctly different than and quite foreign from the layers above.

Above the database is the middle tier or application server. This is where the server-side business logic, APIs, and Web components reside.

There is usually a set of persistent entities, which provide an object abstraction of the database schema. The database query language (e.g., SQL) may be abstracted into an object query language (e.g., JPQL) for convenience. The majority of CRUD (create, read, update, delete) operations can be done naturally in the programming language without needing to formulate statements in the database query language. This provides a persistent representation of the model of the application.

Above the persistent entities is a layer of domain services. The transactional behavior of the business logic resides in this layer. This provides the API (local) that encapsulates the essence of the application functions.

The domain services are usually exposed as SOAP or RESTful services to remote clients for access from Web browsers and for machine-to-machine integration. This would necessitate that JSON and/or XML representations be derived from the persistent entities (i.e., using JAXB). This provides a serialized representation of the model of the application.

We finally come to the presentation layer, which is divided into server-side components residing in the application server and client-side components that execute in the Web browser. Usually there is a presentation-oriented representation called a view-model, which matches the information rendered on views or input on forms. The view and controls are constructed from HTML, CSS, and JavaScript. The programming paradigm in these technologies is distinctly different than the layers below.

Extending the application

Let’s examine what it would take to extend an application with a simple type (e.g., string) property on an entity. The database schema would need to be altered. A persistent entity would need a field, getter and setter methods, and a binding between the field and a column in the database schema. The property may be involved in the logic of the domain services. Next, the JSON and XML binding objects would need to be augmented with the property, and logic would be added to transform between these objects and the persistent entities used by the domain services. At the presentation layer, the view-model would be augmented with the property to expose it to the views. Various views to show an entity’s details and search results would likewise be enhanced to render the property. For editing and searching, a field would need to be added on forms with corresponding validation of any constraints associated with that property and on-submit transaction handling.

That is an awful lot of repetitive work at every layer. There are many technologies and skill sets involved. Much of the work is trivial and tedious. The entire process is far from efficient. It is worse if there is division of labor among multiple developers who require coordination.

A better platform

When confronted with coordinating many concomitant coding activities to accomplish a single well-defined goal, it is natural for an engineer to solve the more general problem rather than doing tedious work repeatedly. The solution is to “go meta”; instead of programming inefficiently, develop a better language to program in. Programming has evolved from machine language to assembly language for humans to express instructions more intuitively. Assembly evolved to structured languages with a long history of advances in control and data flow. Programming languages have evolved in conjunction with virtualization of the machine (i.e., bytecode) to provide better abstractions of software and hardware capabilities. In the spirit of Guy L. Steele’s Growing a Language talk from OOPSLA ’98, components, libraries, and frameworks have been developed using a programming language that itself supports extending the language itself within limits. All of these innovations continually raise the level of abstraction to increase human productivity.

We are hitting the limits of what can be expressed efficiently in today’s languages. We have a database storage abstraction that is separate from server-side application logic, which is itself separate from client-side (Web browser) presentation. There is growing support for database and server-side abstractions to scale beyond the confines of individual machines. Clustering enables a software to take advantage of multiple machines to distribute load and provide redundancy in case of failure. However, our abstractions seem to stop at the boundaries between database storage, server-side application logic, and client-side presentation. Hence, we have awkward impedance mismatches when integrating top-to-bottom. We also have impedance mismatches when integrating together heterogeneous application components or services, as RESTful and SOAP Web Services technologies cross the boundaries between distributed software components, but this style of control and data flow (remote procedure calls) is entirely foreign to the programming language. That is why we must perform inconvenient translations between persistent entities and their bindings to various serialized representations (JSON, XML).

It seems natural that these pain points will be relieved by again raising the level of abstraction so that these inefficiencies will be eliminated. Ease of human expression will better enable programming for non-programmers. We are trying to shape the world so that humans and machines can work together harmoniously. Having languages that facilitate effective communication is a big part of that. To get this right, we need to go meta.