
Applied Cosmology: Self Similar Paradigm

Robert Oldershaw’s research on The Self Similar Cosmological Paradigm recognizes that nature is organized in a stratified hierarchy, where every level is similar. The shapes and motions of atoms resemble those of stellar systems, and the similarities extend from the stellar scale to the galactic scale and beyond.

Managing complexity greatly influences software design. Stratified hierarchy is familiar to this discipline.

At the atomic level, we organize our code into units. Each unit is a module with a boundary, which exposes an interface governing how clients interact with the unit. The unit’s implementation is hidden behind this boundary, enabling it to change independently of other units, as much as possible.
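
As a minimal Java sketch of this idea (the names are invented for illustration): the interface is the boundary that clients depend on, and the implementation stays package-private so that it can change without any client being aware.

    // RateCalculator.java - the unit's boundary: clients see only this.
    public interface RateCalculator {
        double rateFor(String planId);
    }

    // TieredRateCalculator.java - the hidden implementation: package-private,
    // free to change independently of every client.
    class TieredRateCalculator implements RateCalculator {
        @Override
        public double rateFor(String planId) {
            // Internal details may evolve behind the boundary.
            return planId.startsWith("premium") ? 0.05 : 0.10;
        }
    }

    // Rates.java - even the choice of implementation stays behind the boundary.
    public final class Rates {
        private Rates() {}
        public static RateCalculator standard() {
            return new TieredRateCalculator();
        }
    }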

We build upon units by reusing modules and integrating them together into larger units, which themselves are modular and reusable in the same way. Assembly of modules into integrated components is the bread and butter of object-oriented programming. This approach is able to scale up to the level of an application, which exhibits uniformity of platform technologies, programming language, design metaphors, conventions, and development resources (tools, processes, organizations).

The next level of stratification exists because uniformity cannot be maintained across applications. The similarity, however, is unbroken. We remain true to the principles of modular reuse. We continue to define a boundary with interfaces that encapsulate the implementation. We continue to integrate applications as components into larger scale components that themselves can be assembled further.

Enterprises are attempting to enable even higher levels of stratification. They define how an organization functions and how it interfaces with other organizations. This is with respect to protocols for human interaction as well as information systems. Organizations are integrated into business units that are integrated into businesses at local, national, multi-national, and global scales. Warren Buffett’s Berkshire Hathaway has demonstrated how entire enterprises exhibit such modular assembly.

This same pattern manifests itself across enterprises and across industries. A company exposes its products and services through an interface (branding, pricing, customer experience) which encapsulates its internal implementation. Through these protocols, we integrate across industries to create supply chains that provide ever more complex products and services.

Applied Cosmology: Machian Dynamics

Julian Barbour wrote the book titled “The End of Time: The Next Revolution in Physics” [http://www.platonia.com/ideas.html]. He argues that our failure to unify General Relativity with Quantum Theory stems from an ill-conceived preoccupation with time as a necessary component of such a theory. According to Machian Dynamics, a proper description of reality consists of the relationships between real things, not a description against an imaginary background (space and time). All you have, then, is a configuration of things that undergoes changes in arrangement. The path through this configuration-space is what we perceive as the flow of time.

We apply this very model of the universe in configuration management.

Software release management is a configuration management problem. The things in configuration-space are source files. A path through configuration space captures the versions of these source files relative to each other as releases of software are built. Our notion of time is with respect to these software releases.
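
A hedged sketch of this view, with invented types in Java: a release is a point in configuration-space (a map from source file to version), and the sequence of releases is the path that we perceive as time.

    import java.util.List;
    import java.util.Map;

    // A point in configuration-space: every source file pinned to a version.
    record Configuration(Map<String, String> fileVersions) {}

    // A path through configuration-space: releases in the order they were built.
    // Our notion of time is simply position along this path.
    record ReleaseHistory(List<Configuration> path) {
        Configuration release(int index) {
            return path.get(index);
        }
    }

    class ReleaseDemo {
        public static void main(String[] args) {
            var r1 = new Configuration(Map.of("Main.java", "1.0", "Util.java", "1.0"));
            var r2 = new Configuration(Map.of("Main.java", "1.1", "Util.java", "1.0"));
            var history = new ReleaseHistory(List.of(r1, r2));
            System.out.println(history.release(1).fileVersions()); // the world at release 2
        }
    }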

Enterprise resource management in the communications industry involves many configuration management problems in various domains. We normally refer to such applications as Operations Support Systems.

In network resource management, the configuration-space includes the devices and other resources in the network, their connectivity, and the metadata associated with that connectivity arrangement (normally called a “device configuration,” a term avoided here to prevent confusion with configuration-space).

In service resource management, the configuration-space includes services, their resource allocations, and the subscription metadata or “design” (normally called a “service configuration,” a term avoided here for the same reason).

Such applications have a notion of configuration-space because they cannot operate in a world limited to a dependence on a background of space and time. We need to be able to travel backward and forward in time arbitrarily, to see how the world looked in the past from the perspective of a particular transaction. These applications enable users to hypothesize many possible futures, perhaps only one of which is brought into reality through a rigorous process of analysis, design, planning, procurement, construction, and project management. Reality is always from the perspective of the observer, and one’s frame of reference is always somewhere on the path in configuration-space.
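
To make the time-travel requirement concrete, here is a small sketch with invented types: every change is appended to the path through configuration-space, and a view of the world is reconstructed as of any position on that path.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // One unit of change in configuration-space: an entity takes a new state.
    record ChangeEvent(String entityId, String newState) {}

    final class ConfigurationSpace {
        private final List<ChangeEvent> path = new ArrayList<>();

        void apply(ChangeEvent event) {
            path.add(event);
        }

        // Reconstruct the world as of any position on the path; this is how a
        // transaction travels backward and forward in time arbitrarily.
        Map<String, String> worldAsOf(int position) {
            Map<String, String> world = new HashMap<>();
            for (int i = 0; i < position && i < path.size(); i++) {
                ChangeEvent e = path.get(i);
                world.put(e.entityId(), e.newState());
            }
            return world;
        }
    }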

Software engineering is applied cosmology

Engineering is applied science. Some people believe that software engineering is applied computer science. In a limited sense, it is. But software is not entirely separated from hardware. Applications are not entirely separated from processes. Systems are not entirely separated from enterprises. Corporations are not entirely separated from markets. For this reason, I believe what we do is not software engineering at all. It is not limited to applied computer science. Our engineering discipline is actually applied cosmology.

what is wrong with TM Forum SID?

What is TM Forum?

The TM Forum is a standards organization for the communications industry. Rather than defining standards for particular network technologies, TM Forum is most interested in how to manage processes across a service provider’s entire business. Its approach to modeling the communications problem domain divides the space into processes (eTOM), data (SID), applications (TAM), and integration. TM Forum has gained wide acceptance in the industry, and at the size and scale of its membership and audience it has become the only game in town.

What are eTOM and SID?

The business process framework and information framework together serve as an analysis model. They provide a common vocabulary and model for conceptualizing the communications problem space. As with any standards organization, the specifications are an amalgam of contributions from its diverse membership, and the result is formed by consensus.

The business process framework (eTOM) decomposes the business starting at the macro level and explores the parts at increasingly granular levels of activity. Processes are described in terms of the actors, roles, responsibilities, goals, and the types of relevant information involved. There is no presumption about whether activities are performed by humans or by automated systems.

The information framework (SID) decomposes the data into the product, service, and resource domains. Each domain is a model of entities and relationships presented as an object model using UML as the notation. The product domain is concerned with customer-facing and commercial interests. The service domain is concerned with abstracting the network resources into capabilities that can be parameterized for commercialization and that are of value to deliver to subscribers. The resource domain is concerned with the networks that enable services to be delivered.

What is wrong with TM Forum SID?

As an analysis model, the eTOM and SID do a decent job of helping people understand the problem space. However, these standards are problematic, because proponents promote them as detailed and precise enough to be considered design models that can be translated directly into a software implementation. More accurately, the framework is promoted as a starting point, which implies the ability to extend it in a robust manner, and it is this position that I challenge. The position is stated in principle and demonstrated in practice through code generation tools. The separation of behavioral and structural modeling seems like a natural approach to organizing concepts, but at the level of detail needed to design software, behavior and data must come together cohesively as transactions. Object-oriented or service-oriented techniques would normally do this.

Because SID is represented in UML, it has the appearance of an object model, but it omits behavior in the form of operations and their signatures. This is explained away by delegating responsibility for interface specifications to the integration framework, which plainly contradicts the position that SID is a design model that translates into an implementation.

Missing detailed transactional behavior

The integration framework tries to define software interfaces that directly use the SID entities. The result is a collection of interfaces that only superficially represents the behavior of the system. Because the methodology separates process modeling from data modeling, the interfaces are very CRUD-like, and the majority of behavior is assumed to continue residing in process integration (i.e., BPEL). That would work well for service-provider-specific business processes, but it is the wrong approach for defining transactional behavior that is intrinsic to the domain. These behaviors include life cycle management, capacity management, utilization and availability, compatibility and eligibility, and topology (rules and constraints for designing services and networks).
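
To make the contrast concrete, here is a hedged Java sketch; the interfaces and types are invented for illustration and come from no TM Forum specification. The first interface is CRUD-like, saying nothing about the domain’s rules, so the behavior must live in orchestration; the second expresses behavior intrinsic to the domain, such as allocation against capacity, as an operation in its own right.

    // Minimal stand-in types for the sketch.
    record Resource(String id, int totalCapacity) {}
    record Allocation(String resourceId, int units) {}
    class InsufficientCapacityException extends Exception {}

    // CRUD-like: the interface says nothing about the domain's rules, so the
    // real behavior must live elsewhere (e.g., in BPEL process integration).
    interface ResourceRepository {
        Resource read(String id);
        void create(Resource resource);
        void update(Resource resource);
        void delete(String id);
    }

    // Transactional behavior intrinsic to the domain, expressed first class:
    // allocation enforces capacity as part of the operation itself.
    interface ResourceManager {
        Allocation allocate(String resourceId, int units) throws InsufficientCapacityException;
        void release(Allocation allocation);
    }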

Because eTOM does not describe behavior to this level of detail, SID does not define the entities and relationships to a level of detail that supports these essential behaviors. Consequently, neither do the interfaces defined by the integration framework. The problem is that many in the community view the model as robust, with rich and unambiguous semantics, when this is far from true. If the very generic-sounding elements of the SID are used to model behaviorally rich capabilities like resource utilization, we quickly discover missing attributes, missing relationships, and missing entities. Even worse, we may find that the existing elements conflict with how the model needs to be defined to support the behavior. Sometimes a simple attribute needs to become an entity (or a pattern of entities) with a richer set of attributes and relationships. Sometimes relationships on a generalized entity are repeated in a specialized way on a specialized entity, which is ambiguous for implementation. These issues undermine the robust extensibility of the framework and make the idea that it is easily usable as a starting point (extend by adding) very suspect, because extensions require radical, destabilizing redesign.
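
An illustrative Java sketch of the attribute-to-entity promotion just described; the classes are invented for this example and are not actual SID elements.

    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    // Before: utilization looks like a simple attribute. A bare number cannot
    // say who consumes the capacity, how much of it, or over what interval.
    class PortBefore {
        String id;
        int utilizationPercent;
    }

    // After: supporting real utilization behavior forces the attribute to grow
    // into an entity with its own attributes and relationships.
    class PortAfter {
        String id;
        List<UtilizationRecord> utilization = new ArrayList<>();
    }

    class UtilizationRecord {
        String consumerId;  // which service consumes the capacity
        int units;          // how much of it is consumed
        Instant from;       // when the consumption starts
        Instant until;      // when it ends
    }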

Software design is a complex set of trade-offs that balances many opposing forces, including performance, scalability, availability, and maintainability, alongside the functional requirements. SID cannot possibly be a design model, because none of these forces is considered. I strongly believe it would be an error to accept SID as a design model for direct implementation in software. I believe that any reasonable software implementation will have a data model that looks radically different from SID, but elements of the software interface will have a conceptual linkage to SID as their intellectual ancestry. If we think of SID in this capacity, we have some hope for a viable implementation.

designs are useless – like planning

“In preparing for battle I have always found that plans are useless, but planning is indispensable.” -Eisenhower (from Planning Extreme Programming)

I believe this is true, because “no plan survives contact with the enemy”. In software, the enemy takes the form of dark spirits hidden within the code, both legacy and yet to be written. Because plans (the schedule of work to be done) are intimately tied to designs (models of the software), it must also be true that no design survives contact with the enemy. Any programmer who begins writing code from a preconceived design will almost immediately feel the pain of opposing forces that beg to be resolved through refactoring; programmers who lack this emotional connection with their code are probably experiencing a failure of imagination (to improve the design).

Therefore, I think we can return to the original quote and state its corollary: designs are useless, but designing is indispensable.

All this is to say that, for the above reasons, the process artifacts (e.g., Functional Solution Approach, Functional Design, Technical Design) are more or less useless: as soon as they are written, they are rendered obsolete by improvements discovered contemporaneously with each line of code written. But the act of producing them (the thought invested in designing) is indispensable to producing good software.

This leads me to conclude that we should not fuss so much over the actual content of the artifacts, so long as they capture enough to show a fruitful journey through designing: that problems have been thought through and decisions have been based on good reasoning. Worrying about the content being perfectly precise, comprehensive, and consistent is a waste of effort, since the unrelenting act of designing will have already moved beyond the snapshot captured in the artifact.

Coincidentally, this theme also aligns with the notion of a learning organization espoused by Lean. The value of designing is the facilitation of learning.

transparent persistence

Advantages

Transparent persistence has moved into the mainstream over the past few years with the popularity of JDO and JPA for enterprise application development. This approach offers the following advantages.

  1. Domain modeling – the domain is expressed naturally as plain old Java objects (POJOs), without hand-coding the SQL or JDBC calls that persistence traditionally requires.
  2. Navigation through relationships – objects are naturally related through references, and navigating a relationship automatically loads the related object on demand.
  3. Automatic storage – modified objects are stored automatically when the transaction is committed.
  4. Persistence by reachability – related objects are automatically stored if they are reachable from another persistent object.

The programming model is improved by eliminating the tedium that is traditionally associated with object persistence. Loading, storing, and querying are all expressed in terms of the Java class and field names, as opposed to the physical schema names. The programmer is largely insulated from the impedance mismatch between Java objects and the relational database. The software can be expressed purely in terms of the domain model, as represented by Java objects.
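
A brief sketch of these advantages in JPA; the entity names are invented for illustration. Note that in JPA, persistence by reachability is opted into through cascade settings, whereas JDO provides it by default.

    import javax.persistence.CascadeType;
    import javax.persistence.Entity;
    import javax.persistence.FetchType;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;

    @Entity
    public class Equipment {
        @Id @GeneratedValue
        private Long id;

        private String name;

        // Navigating getLocation() loads the related Location on demand.
        // The cascade setting opts this relationship into persistence by
        // reachability: a new Location reachable from a persistent Equipment
        // is stored automatically.
        @ManyToOne(fetch = FetchType.LAZY, cascade = CascadeType.PERSIST)
        private Location location;

        public Location getLocation() { return location; }
        public void setLocation(Location location) { this.location = location; }
        public void setName(String name) { this.name = name; }
    }

    @Entity
    class Location {
        @Id @GeneratedValue
        private Long id;
        private String address;
    }

    // Within a transaction, modifications are stored automatically at commit;
    // no explicit save call is needed:
    //   Equipment e = entityManager.find(Equipment.class, 42L);
    //   e.setName("Excavator");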

Challenges

When developing domain objects, persistence is only one aspect. The business logic that applies to the graph of related objects is the most important concern. Transparent persistence makes it challenging to execute the business logic that enforces constraints and complex business rules when persistent objects are created, updated, and deleted through reachability.

For example, an equipment rental application may need to enforce the following constraints:

  • When creating equipment, it must be related to a location.
  • When creating a rental, it must be related to a customer, and the equipment must be available for the duration of the rental.
  • When updating a rental, the equipment must remain available for the duration of the rental.

JPA 2.0 does not provide sufficient mechanisms for enforcing these constraints when creating or updating these entities through reachability. The responsibility falls on a service object that manages these graphs of entities, and the constraint checking must be enforced by the service object per transaction. Java EE 5 provides no assistance in ensuring that the constraint checks (implemented in Java) are deferred until commit, so that they are not repeated when performing a sequence of operations in the same transaction.
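
A sketch of what that burden looks like in practice, using invented types from the rental example above and an in-memory stand-in for the persistence layer. Every operation that can touch a rental must repeat these checks; nothing defers them to commit.

    import java.time.LocalDate;
    import java.util.ArrayList;
    import java.util.List;

    // Minimal stand-ins for the entities named in the constraints above.
    record DateRange(LocalDate start, LocalDate end) {
        boolean overlaps(DateRange other) {
            return start.isBefore(other.end()) && other.start().isBefore(end);
        }
    }
    record Customer(String id) {}
    record Equipment(String id) {}
    record Rental(Customer customer, Equipment equipment, DateRange period) {}

    class RentalService {
        private final List<Rental> rentals = new ArrayList<>(); // stand-in for the data store

        // The service object must repeat checks like these in every operation
        // that creates or updates a rental.
        Rental createRental(Customer customer, Equipment equipment, DateRange period) {
            if (customer == null) {
                throw new IllegalArgumentException("a rental requires a customer");
            }
            boolean clash = rentals.stream().anyMatch(r ->
                    r.equipment().equals(equipment) && r.period().overlaps(period));
            if (clash) {
                throw new IllegalStateException("equipment not available for the period");
            }
            Rental rental = new Rental(customer, equipment, period);
            rentals.add(rental);
            return rental;
        }
    }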

Adding a preCommit event to a persistent object would provide a good place for expressing constraints. Allowing this event to be deferred until transaction commit would provide the proper optimization for good performance. Of course, preCommit would need to prevent any further modifications to the persistent objects enlisted in the transaction. This would factor out many of the invariants so that they are expressed per entity, removing the responsibility from every operation on service objects, which is prone to programmer error. The domain model would be greatly improved.

java christmas wish list 2008

JPA preCommit

Similar to the other events (PrePersist, PreRemove, PostPersist, PostRemove, PreUpdate, PostUpdate, and PostLoad), JPA needs to add a preCommit event. This would be useful for enforcing constraints (invariants) using Java logic, similar to how less expressive deferred constraints can be enforced in SQL.
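
Here is a sketch of what such a callback might look like, modeled on the existing annotations; @PreCommit is hypothetical and does not exist in JPA.

    import java.time.LocalDate;
    import javax.persistence.Entity;
    import javax.persistence.Id;

    // HYPOTHETICAL annotation, analogous to javax.persistence.PreUpdate;
    // it does not exist in JPA.
    @interface PreCommit {}

    @Entity
    class Rental {
        @Id
        private Long id;
        private LocalDate start;
        private LocalDate end;

        // HYPOTHETICAL: invoked once at commit, after all modifications in the
        // transaction are complete, so the invariant is checked exactly once
        // no matter how many operations touched this entity.
        @PreCommit
        void checkInvariants() {
            if (start == null || end == null || !start.isBefore(end)) {
                throw new IllegalStateException("rental period must be non-empty");
            }
        }
    }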

read-only transaction

javax.transaction.UserTransaction needs the ability to begin a transaction with an awareness of whether the transaction will be read-only or read-write. A read-only transaction would prevent writes (inserts, updates, and deletes) from being done.
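
A sketch of one possible API shape; this overload of begin is hypothetical, since the standard javax.transaction.UserTransaction offers only begin().

    import javax.transaction.NotSupportedException;
    import javax.transaction.SystemException;
    import javax.transaction.UserTransaction;

    // HYPOTHETICAL extension of the standard interface.
    interface ModalUserTransaction extends UserTransaction {
        enum Mode { READ_ONLY, READ_WRITE }

        // A READ_ONLY transaction would reject inserts, updates, and deletes,
        // and could also skip dirty-checking and write-locking overhead.
        void begin(Mode mode) throws NotSupportedException, SystemException;
    }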

dynamic immutability

It would be helpful if an instance of an object could be mutable when used by some classes (e.g., builder, factory, repository, deserializer) and immutable when used by others. This would make it possible to load persistent objects from a data store, derive transient fields from persistent fields, and then mark the instance as immutable if the transaction is read-only. I do not want to develop entities that have both a mutable class and an immutable class, and access control (private, protected) is not sufficient when mutability depends on context (e.g., a read-only transaction).
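
The closest approximation today is a hand-written freeze flag, sketched below with invented names; the wish is for the language or runtime to enforce immutability by context instead of requiring a check in every setter.

    class Account {
        private String owner;
        private boolean frozen;

        // Builders, factories, repositories, and deserializers mutate freely,
        // then freeze the instance before handing it to a read-only context.
        void setOwner(String owner) {
            if (frozen) {
                throw new IllegalStateException("instance is immutable in this context");
            }
            this.owner = owner;
        }

        void freeze() { frozen = true; }
        String getOwner() { return owner; }
    }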

Java technology challenges

Here is my Java wish list.

  1. module deployment – Java has done well to define an archive format and class loading system that enables code to be organized into libraries. However, a modular application needs more than the deployment of Java classes. It also needs to accommodate the following:
    • SQL DDL for initial database schema creation and on-going evolution
    • SQL DML and possibly Java code for upgrading data as the schema evolves and the application features are upgraded
    • XML documents and binary resources containing data that needs to be processed by the application and possibly loaded into the database
    • HTML pages or templates (e.g., Tapestry) that can be dynamically added to a Web application
    • Scripts (e.g., Groovy), rules, or other forms of code that can be dynamically executed.
  2. module dependency – Java needs a better way to dynamically enable or disable behaviors depending on whether modules of code are available, similar to how function_exists(f) works in PHP, without resorting to invoking methods through Java reflection. A static programming model should be possible that still allows a block of code to execute conditionally, only when another class is loadable (the current reflective workaround is sketched below).
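
For contrast, today’s workaround is sketched below; the module class name is hypothetical. Probing for the class is easy, but actually using it from statically compiled code still requires reflection or a separate compilation unit, which is exactly the awkwardness the wish addresses.

    final class OptionalFeature {
        // Roughly the Java analogue of PHP's function_exists(f): probe whether
        // a module's class is loadable before enabling dependent behavior.
        static boolean isAvailable(String className) {
            try {
                Class.forName(className);
                return true;
            } catch (ClassNotFoundException e) {
                return false;
            }
        }

        public static void main(String[] args) {
            if (isAvailable("org.example.reporting.ReportEngine")) { // hypothetical module
                System.out.println("reporting module enabled");
            } else {
                System.out.println("reporting module not installed");
            }
        }
    }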

universe of events – cosmology in software

On my second reading of Three Roads to Quantum Gravity by Lee Smolin, the concept of a relational universe stands out as something fundamentally important.

Each measurement is supposed to reveal the state of the particle, frozen at some moment of time. A series of measurements is like a series of movie stills — they are all frozen moments.

The idea of a state in Newtonian physics shares with classical sculpture and painting the illusion of the frozen moment. This gives rise to the illusion that the world is composed of objects. (p.53)

In object oriented programming, the objects correspond to the particles. The focus is on capturing the state of the object, frozen at some moment of time. As methods are called on the object, changes to its state (variables) are like a series of movie stills.

Lee Smolin goes on to write:

If this were really the way the world is, then the primary description of something would be how it is, and change in it would be secondary. Change would be nothing but alterations in how something is. But relativity and quantum theory each tell us that this is not how the world is. They tell us — no, better they scream at us — that our world is a history of processes. Motion and change are primary. Nothing is, except in a very approximate and temporary sense. How something is, or what its state is, is an illusion. It may be a useful illusion for some purposes, but if we want to think fundamentally we must not lose sight of the essential fact that ‘is’ is an illusion. So to speak the language of the new physics we must learn a vocabulary in which process is more important than, and prior to, stasis. Actually, there is already available a suitable and very simple language which you will have no trouble understanding.

From this new point of view, the universe consists of a large number of events. An event may be thought of as the smallest part of a process, a smallest unit of change. But do not think of an event happening to an otherwise static object. It is just a change, no more than that.

The universe of events is a relational universe. That is, all its properties are described in terms of relationships between the events. The most important relationship that two events can have is causality. This is the same notion of causality that we found was essential to make sense of stories.

If objects are merely an illusion, and it is really causal events that are fundamental to modeling a universe that is relational and dynamical, then perhaps we should re-examine how effective object-oriented programming is at producing software that models real world processes. Classes of objects definitely focus on the static structure of the universe. The methods on these classes can be considered to correspond to events, which carry information in, perform some computation, and carry information out. However, the causal relationships between events are buried in the procedural code within each method; they are not expressed in a first-class manner.

Personal productivity applications like spreadsheets and word processors model objects (e.g., documents) and relationships that undergo relatively simple processes involving only a few actors. The causal history of events is not as important, because a document contains only one set of objects among which to maintain integrity, and the series-of-frozen-moments model of the universe works rather well. Enterprise applications such as Enterprise Resource Planning (ERP) facilitate a multitude of parallel business processes that involve many actors and sophisticated collaborations. Each actor performs transactions against some subset of objects, each of which progresses through a distinct life cycle. Maintaining integrity among the objects changed by these many concurrent events is incredibly complicated. It becomes important to keep a causal history of events in addition to the current state of the universe, as well as a schedule of future events (for planning) that have not yet come to pass. A series of frozen moments becomes less appealing, whereas a set of processes and events seems like a better description of the universe.
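
A hedged Java sketch of events and their causal relationships as first-class data; the types are invented. Each event records what changed and which events caused it, so the causal history is queryable rather than buried in procedural code.

    import java.util.ArrayList;
    import java.util.List;

    // An event: the smallest unit of change, plus the events that caused it.
    record Event(String id, String description, List<String> causedBy) {}

    final class CausalHistory {
        private final List<Event> events = new ArrayList<>();

        void record(Event event) {
            events.add(event);
        }

        // Causality is queryable as data, rather than implicit in the ordering
        // of procedural method calls.
        List<Event> effectsOf(String causeId) {
            List<Event> effects = new ArrayList<>();
            for (Event e : events) {
                if (e.causedBy().contains(causeId)) {
                    effects.add(e);
                }
            }
            return effects;
        }
    }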

programming for non-programmers

Users want to express their desires to a computer and have the computer fulfill their wishes on demand. We frequently see this desire manifested as software “requirements” for configurability or customization without the need for programming. Users view programming as a highly technical, error prone, and cumbersome chore that should be left to professional software developers.

Users would like developers to produce software that is capable of intelligently adapting its structure and behavior to the needs of the day, as the environment and the business requirements change. They believe they should be able to inform the software of these requirements through intuitive visual techniques. Click on a tool and fill in some information. Drag and drop some icons across the screen. New computations, new algorithms, and new ways of doing business should be configurable with a few clicks by users, who know nearly nothing about how a computer executes instructions, the nature of such instructions, and how such instructions come into existence.

What users do not realize is that the act of expressing themselves through clicks and gestures to produce instructions for the computer is itself an act of programming. The complexity involved in formulating those instructions is proportional to the degree of intelligence intrinsic to the computation. The breadth of requirements that can be satisfied by those instructions is a function of how much can be expressed by the user. Narrow coverage of requirements implies an inflexible system that targets a niche problem domain. Broad coverage of requirements implies a general purpose system.

General purpose programming languages have traditionally been expressed as text. Like all languages, programming languages have a vocabulary (words formed from a character set) and a grammar, which governs how those words are organized in meaningful ways. Textual programming languages allow humans to express themselves concisely in a form that is both precise enough for a computer to comprehend and similar enough to natural language (e.g., English) to be intuitive.

General purpose programming languages have naturally evolved from primitive instructions close to the machine (e.g., assembly language) toward abstractions that correspond to natural language. Text is a highly efficient and flexible medium for communicating ideas. Graphical techniques may be suitable for expressing ideas that can be tangibly visualized, but images are wholly inappropriate for non-visual ideas. An image may convey a thousand words, but controlling exactly which words are formulated is not easy with visual techniques.

We can expect programming languages to continue to advance, becoming more intuitive to humans and more efficient at expressing complex instructions to machines. We can also expect graphical techniques to continue to evolve, making programming easier for both professional programmers and users who do not want to program. What we must accept is that instructing a machine is programming, regardless of whether it is expressed as text or through visual techniques.