All posts by Ben Eng

I am a software architect in the Communications Applications Global Business Unit (CAGBU) within Oracle. I currently work on cloud services for Business Support System (BSS) and Operations Support System (OSS) applications for communications service providers. My previous responsibilities include architecture and product management for the RODOD and RSDOD solutions to provide an integrated suite of BSS and OSS applications for communications providers. I founded the Oracle Communications Service & Subscriber Management (S&SM) application and the Oracle Communications Unified Inventory Management (UIM) application. I pioneered the adoption of Object Relational Mapping (ORM) based persistence techniques within OSS applications. I introduced the XML Schema based entity relationship modeling language, which is compiled into the persistent object modeling service. I established the notion of a valid time temporal object model and database schema for life cycle management of entities and the ability to travel through time by querying with a temporal frame of reference or a time window of interest. I established the patterns for resource consumption for capacity management. I championed the development of Web based user interfaces and Web Services for SOA based integration of OSS applications. I was responsible for single-handedly developing the entire prototype that formed the foundation of the current generation of the OSS inventory application. I have been engaged in solution architecture with service providers to adopt and deploy Oracle's OSS applications across the globe. I am responsible for requirements analysis and architectural design for the Order-to-Activate Process Integration Pack (PIP) proposed to integrate the OSS application suite for the communications industry using the Application Integration Architecture (AIA). Any opinions expressed on this site are my own, and do not necessarily reflect the views of Oracle.

Microservices Life Cycle

There is friction between a microservices architecture and life cycle management goals for application releases. One significant motivation for microservices is independent life cycle management, so that capabilities with well-defined boundaries can be developed and operated by self-contained, self-directed teams. This allows for more efficient workflows, so that a fast-moving code base is not held back by other slower-moving code bases.

Typically, an application (a collection of services that form an integrated whole and are offered together as a product to users) is rolled out with major and minor releases on some cadence. Major releases include large feature enhancements and some degree of compatibility breakage, so these may happen on an annual or semi-annual basis. Minor releases or patches may happen quarterly, monthly, or even more frequently. With microservices, the expectation is that each service may release on its own schedule without coordination with all others even within the scope of an integrated application. A rapid release cadence is conducive to responsiveness for bug fixes and security fixes, which protect against exposing vulnerabilities to exploits.

One advantage of applications on the cloud is that a single release of software can be rolled out to all users in short order. This removes the substantial burden on developers to maintain multiple code branches, as they had to do in the past for on-premises deployments. Unfortunately, the burden is not entirely lifted, because as software under development graduates toward production use, various pre-release versions must be made available for pre-production staging, testing, and quality assurance.

Development is already complex, needing feature development toward a future release to proceed in parallel with being able to implement bug fixes for the release that is already in production (assuming all users are on only the latest). These parallel streams of development will be in various phases of pre-production testing toward being released to production, and in various phases of integration testing with a longer-term future release schedule. Varying levels of severity for bugs mean that the urgency for fixes varies. For example, emergency fixes for exploitable security vulnerabilities need to be released as a patch to production immediately, whereas fixes for functional defects may wait for the next release on the regular cadence. Cherry-picking and merging fixes across code branches is tedium that every developer dreads. Independent life cycle management of source code organized according to microservices is seen as helping to decouple coordination across development teams, which are organized according to microservice boundaries.

Independent life cycle management of services relies on both backward compatibility and forward compatibility. Integration between services needs to be tolerant of mismatched versions to be resilient to independent release timing, including upgrades, rollbacks due to failed upgrades, and rerunning an upgrade after a prior failure. Backward compatibility enables a new version of a service to interoperate with an older client. Forward compatibility enables the current version of a service (soon to be upgraded) to interoperate with a newer client, especially during the span of time (brief or lengthy) in which one may be upgraded before the other. In my article about system integration, I explained the numerous problems that make compatibility difficult to achieve. Verification of API compatibility through contract testing is the best practice, but test coverage is seldom perfect. Moreover, no contract language specifies everything that impacts compatibility. Mocking will never be representative of non-functional qualities. This is one of many reasons why confidence in verification cannot be achieved without a fully integrated system. This is how the desire for independent life cycles for microservices is thwarted. The struggle is more real than most people realize. As software professionals, we enter into every new project with fresh optimism that this time we will do things properly to achieve utopia (well, at least independent life cycle would be a small victory), and at each and every turn we are confronted by this one insurmountable obstacle.
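To make the two directions of compatibility concrete, here is a minimal sketch of a "tolerant reader" in Python. The message shape (`ProductOffer`, `offer_id`, `currency`, `bundle_ids`) is invented for illustration and does not come from any real API. It covers only structural tolerance (unknown and missing fields); it does nothing for the semantic and non-functional aspects of compatibility that contract languages cannot express.

```python
from dataclasses import dataclass, fields


@dataclass
class ProductOffer:
    # Hypothetical message shape; field names are assumptions for illustration.
    offer_id: str
    name: str
    currency: str = "USD"  # added in a later version, so it carries a default


def parse_offer(payload: dict) -> ProductOffer:
    """Tolerant reader: drop unknown fields, default missing optional ones."""
    known = {f.name for f in fields(ProductOffer)}
    return ProductOffer(**{k: v for k, v in payload.items() if k in known})


# An older sender omits the newer field (backward compatibility).
from_old_sender = parse_offer({"offer_id": "o-1", "name": "Basic"})

# A newer sender adds a field this reader has never heard of (forward compatibility).
from_new_sender = parse_offer(
    {"offer_id": "o-2", "name": "Plus", "currency": "EUR", "bundle_ids": ["b-7"]}
)
```

Both calls succeed, which is what lets either side upgrade first. Whether silently dropping `bundle_ids` is actually safe is exactly the kind of question a schema cannot answer.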

Application features involve workflows that span two or more collaborating microservices. For example, a design-time component provides the product modeling for a runtime component for selling and ordering those products. Selling and ordering cannot function without the product model, so the collaboration between those services must integrate properly for features to work. Most features rely on collaborations involving several services. Often, the work resulting from one service is needed to drive the processing in other services, as was the case in the selling and ordering example above. This pattern is repeated broadly in most applications. Once all collaborations are accounted for across the supported use cases, the integrations across services would naturally cover every service. The desire for an independent life cycle for each service that composes the application faces the interoperability challenges across this entire scope. There goes our independence.

Given the need to certify a snapshot of all services that compose an application to work properly together, we need a mechanism to correlate the versioning of source code to versions of binaries (container images) for deployment. Source code can be tagged with a release. This includes Helm charts, Kubernetes YAML files, Ansible playbooks, and whatever other artifacts support the control plane and operations pipelines for the application. A snapshot must be taken of the Helm chart versions and their corresponding container image versions, so that the complete deployment can be reproduced.

This identifies an application release as a set of releases of services deployed together. This information aids in troubleshooting, bug reporting, and reproducing a build of those container images and artifacts from source code, each at the same version as what was released for deployment. This is software release management 101, nothing out of the ordinary. What is noteworthy is our inability to extricate ourselves from the monolithic approach to life cycle management despite adopting a modern microservices architecture.
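As a rough illustration of such a snapshot, the sketch below models an application release as a set of service releases and derives a fingerprint from it. The class names, fields, registry host, and version strings are all hypothetical; a real pipeline would more likely keep this in a lock file or in the chart repository itself.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ServiceRelease:
    name: str
    source_tag: str     # tag applied to the source code repository
    chart_version: str  # version of the Helm chart
    image: str          # container image reference


@dataclass(frozen=True)
class ApplicationRelease:
    version: str
    services: tuple  # tuple of ServiceRelease

    def fingerprint(self) -> str:
        """Order-independent hash, so two environments can be compared."""
        doc = {
            "version": self.version,
            "services": sorted((asdict(s) for s in self.services),
                               key=lambda s: s["name"]),
        }
        canonical = json.dumps(doc, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical services and versions, for illustration only.
catalog = ServiceRelease("catalog", "v1.4.2", "1.4.2",
                         "registry.example.com/catalog:1.4.2")
ordering = ServiceRelease("ordering", "v2.0.1", "2.0.1",
                          "registry.example.com/ordering:2.0.1")
release = ApplicationRelease("7.0", (catalog, ordering))
```

Comparing `release.fingerprint()` between staging and production is one cheap way to confirm that both run the same certified line-up.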

Worse still, if our application is integrated into a suite of applications, as enterprise applications tend to be, the system integration nightmare broadens the scope to the entire suite. The desire for an independent life cycle even for each application that composes the suite faces interoperability challenges across this entire scope. What a debacle this has turned out to be. The system integration nightmare is the challenge that modern software engineering continues to fail at solving across the industry.

DevOps Transparency and Coordination

Coordinating human activities across organizations and disciplines is fundamental to DevOps. This requires having documented procedures to handle any situation, tools that enable participants to collaborate effectively, and a shared understanding of what information needs to be captured and communicated to work — especially when the actors are likely to be separated by space and time.

A DevOps procedure is initiated for a reason. These situations include scheduled maintenance, a response to an alert (detection of a condition that deserves attention), or a response to a request for support. In each case, there should be a ‘ticket’ (a record in a tool) that notifies a responsible DevOps engineer to work the issue. The relevant procedure that applies to a ticket should be obvious and, ideally, referenced explicitly.

Responding to an alert or a support request (usually a complaint about a service malfunction) usually begins by confirming the reported condition. This requires gathering information about the context and collecting diagnostics to aid in troubleshooting. Ideally, the ticket clearly identifies the problem; otherwise, further interactions are needed to gain such clarity. Humans routinely and wrongly assume that the recipient of a request has all the necessary context to understand what is being asked of them and why. To mitigate this inefficiency, tooling and procedural documentation are usually provided to guide how tickets should be written, so that many questions and answers are not needed afterward to satisfy the request.

The engineer who works a ticket should capture the service configuration, relevant logs to determine the failure mode, and any other data for the context associated with the problem for analysis toward an operational fix or for submitting a bug, if applicable.

A designated channel should be used for engineers to collaborate. Each engineer must record in real time the actions taken to troubleshoot, analyze, and correct the problem. This aids in coordination between multiple individuals in different roles across organizations to work together. Good communication enables everyone involved to stay informed and avoid taking actions that interfere with each other. Moreover, an accurate record can be reviewed later as a post-mortem. In the course of the

Severe incidents, such as those that cause a service outage, demand a root cause analysis toward preventative actions, such as process improvements, procedure documentation, operational tooling, or developing a permanent fix (for a software bug). This depends on the ticket capturing the necessary information to trace how the problem originated, such as the transaction processing in progress, events or metrics indicating resource utilization out of bounds (e.g., out of memory, insufficient CPU, I/O bandwidth limited, storage exhaustion) or performance impairment (e.g., lock contention, queue length, latency), or anything else that appears out of the ordinary.

One of the biggest impediments to verifying that a problem is resolved is that DevOps normally does not have access to the service features being exercised by end users. When a functional problem is reported by users, it may not be possible for DevOps to confirm that the problem is fixed from the user’s perspective. Communication with users may need to be mediated by customer support staff. The information on the ticket would need to facilitate this interaction.

Accurate record-keeping also enables later audits in case there are subsequent problems. The record of actions taken can be used to analyze whether these actions (such as configuration changes or software patches) are contributing to other problems. Troubleshooting procedures should include data mining of past incidents (have we seen this problem before? how did we fix it previously?) and auditing what procedures may have impacted the service under investigation (what could have broken this?).
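The "what could have broken this?" audit can be sketched as a query over recorded actions. The `Ticket` and `Action` shapes below are assumptions made up for this example, not the data model of any particular ticketing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Action:
    at: datetime
    engineer: str
    service: str
    description: str


@dataclass
class Ticket:
    ticket_id: str
    summary: str
    actions: list = field(default_factory=list)

    def record(self, engineer, service, description, at=None):
        """Write it down: append a timestamped entry at the time of the action."""
        entry = Action(at or datetime.now(timezone.utc), engineer, service, description)
        self.actions.append(entry)
        return entry


def actions_affecting(tickets, service, since):
    """Return (ticket_id, action) pairs for `service` at or after `since`, oldest first."""
    hits = [(t.ticket_id, a) for t in tickets for a in t.actions
            if a.service == service and a.at >= since]
    return sorted(hits, key=lambda pair: pair[1].at)
```

Given a list of past tickets, `actions_affecting(tickets, "billing", since=outage_start)` would list every recorded change to a hypothetical `billing` service in the window of interest, which is only as useful as the discipline with which actions were recorded.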

The above guidance can be summarized as follows.

  • Say what you do
  • Do what you say
  • Write it down

Future Distributed Applications

Big Tech censorship and cancel culture are becoming intolerable. Politicization of business is destroying the fabric of society. Corporate oligarchs are implementing partisan agendas to shape public discourse by applying so-called “community standards” for social media content moderation. They de-platform personalities who express opinions that run counter to approved narratives. They silence dissent. Free speech and freedom of association are under threat, as private companies are coerced by state regulatory action, looming threats of state intervention, and mob rule through heckler’s veto, bullying, harassment, doxxing, and cancel culture. Concentration of power and control in a few dominant platforms, such as Google, YouTube, Facebook, Twitter, Wikipedia, and their peers has harmed consumer choice. Anti-competitive behavior, such as collusion among platform and infrastructure services to deny service to competitive upstarts and undesirable non-conformists, has suppressed alternatives like Parler, Gab, and BitChute.

The current generation of dominant platforms does not allow editorial control to be retained by content creators. The platform is viewed as the ultimate authority, and users are limited in their ability to assert control to form self-moderated communities and to set their own community standards. Control is asserted by the central platform authorities.

Control needs to be decoupled from centralized platform authorities and put back in the hands of content creators (authors, podcasters, video makers) and end users (content consumers and social participants). Editorial control over legal content does not belong with Big Tech. What constitutes legal content is dependent on the user’s jurisdiction, not Big Tech’s harmonization of globalist attitudes. To Americans, hate speech is protected speech, and it needs to be freely expressible. Similarly, users in other jurisdictions should be governed according to their own standards.

We need to develop apps with peer-to-peer protocols and end-to-end encryption to cut out the middlemen, which will exterminate today’s generation of social media companies. Better yet, application logic itself should be deployable on user-controlled compute with user-controlled encrypted storage on any choice of infrastructure providers (providing a real impetus for the adoption of Edge Computing), so that centralized technology monopolies cannot dominate as they do today. This approach needs to be applied to decentralize all apps, including video, audio podcasts, music, messaging, news, and other content distribution.

I believe the next frontier for the Internet will be the development of a generalized approach on top of HTTP or as an adjunct to HTTP (like bittorrent) to enable distributed apps that put app logic and data storage at end-points controlled by users. This would eliminate control by middlemen over what content can be created and shared.

Applications must be distributed in a topology where a node is dedicated to each user, so that the user maintains control over the processing and data storage associated with their own content. Applications must be portable across cloud infrastructures available from multiple providers. A user should be able to deploy an application node on any choice of infrastructure provider. This would enable users to be immune from being de-platformed.

With an application whose logic and data are distributed in topology and administrative control, the content should be digitally signed so that it can be authenticated (verified to be produced by the user who owns it). This is necessary, so that a user’s application node can be moved to an alternative infrastructure (compute and storage) without other application nodes needing to establish any form of trust. Consumers (the audience with whom the owner shares content) and processors (other computational services that may operate on the content) of the information would be able to verify that the information is authentic, not forged or tampered with. The relationship between users and among application nodes, as well as processors, is based on zero trust.
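To show what "digitally signed so that it can be authenticated" means mechanically, here is a small sketch using a Lamport one-time signature. This scheme is swapped in purely because it can be written with the Python standard library alone; it is not a proposal for the actual design. Each key pair may safely sign only one message, and the keys and signatures are large, so a real system would more plausibly use an established scheme such as Ed25519 from a vetted library.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes) -> list:
    """The 256 bits of the message digest, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def generate_keypair():
    # Private key: 256 pairs of random secrets. Public key: their hashes.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def sign(message: bytes, private) -> list:
    # Reveal one secret from each pair, chosen by the matching digest bit.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(message: bytes, signature, public) -> bool:
    # Anyone holding only the public key can check the revealed secrets.
    if len(signature) != 256:
        return False
    return all(_h(sig) == public[i][bit]
               for i, (sig, bit) in enumerate(zip(signature, _bits(message))))


private_key, public_key = generate_keypair()
post = b"my article, published from my own node"
signature = sign(post, private_key)
```

A consumer who knows only `public_key` can call `verify(post, signature, public_key)` without trusting whichever infrastructure currently hosts the node, which is the property the paragraph above relies on.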

Processing of information often involves mirroring and syndication. Mirroring with locality for low latency access gives certain types of transaction processing, such as search indexing, the performance characteristics they need. Authorizing a search engine to index one’s content does not automatically grant users of the search engine access to the content. Perhaps only an excerpt is presented by the search engine along with the owner’s address, where the user may request access. A standard protocol is needed to enable this negotiation to be efficient and automated, if the content owner chooses to forgo human review and approval.

We need to change how social applications control the relationship between content producers and content consumers. First, for original source content, the root of a new discussion thread, the owner must control how broadly it is published. Second, consumers of content must control what sources of information they consume and how it is presented. Equally important, consumers of an article become producers of reviews and comments, when interacting in a social network. The same principle must apply universally to the follow-on interactions, so that the article’s author should not be able to block haters from commenting, but the author is not obligated to read them. Similarly, readers are not obligated to see hateful commenters, who they want to exclude from their network. The intent should be to enable each person to control their own content and experience, ceding no control to others.

Social applications need self-managed communities with member administered access control and content moderation. Community membership tends to be fluid with subgroups merging and splitting regularly. Each member’s access and content should follow their own memberships rather than being administered by others in those communities. The intent is to mitigate a blacklisted individual being cancelled by mobs. If a cancelled individual can form their own community and move their allies there with ease, cancel culture becomes powerless as a tool of suppression with global reach. Its reach is limited to communities that quarantine themselves.

This notion of social network or community is decentralized. A social application may support a registry of members, which would serve as a superset of potential relationships for content distribution. This would enable a new member to join a social network and request access to their content. Presumably, most members would enable automatic authorization of new members to see their content, if the new member has not been blocked previously. That is, enable a community to default to public square with open participation. However, honor freedom of association, so that no one is forced to interact with those with whom there is no desire to associate, and no one can be banned from forming their own mutually agreed relationships.

We need software innovations to address this urgent need to counter the censors, the cancellers, the de-platformers, the prohibitionists, the silencers of dissent, and the government oppressors. We don’t yet have a good understanding of the requirements which I’ve touched upon above, as I have only scratched the surface. We need an architecture to enable the unstoppable Open Internet that we failed to preserve from the early days. We need to develop a platform that realizes this vision to restore a healthy social fabric for our online communities.

indirect aggression

An earlier article titled corporations acting as agents of foreign governments describes one form of indirect aggression. No one would dispute that when a person hires a hitman to murder someone, the hiring party is guilty of aggression against the victim, even though the violence was indirect. We must recognize the same causal connection when government wields the machinery of state violence indirectly through private entities; and similarly when private entities wield government power indirectly against their competitors and the public at large.

In current events, we see David Boaz of the Cato Institute, a libertarian think-tank, lament Florida Governor Ron DeSantis implementing legislation to counter Critical Race Theory, election irregularities, stringent COVID restrictions, and vaccine passports. We also see state action being taken to punish protests that block road traffic and social media censorship and de-platforming, which trammel upon free speech.

Libertarians face a moral dilemma of epic proportions. The left weaponizes government power to aggress. The right weaponizes government power to aggress in their own ways and to counter the left. Libertarians want non-aggression, but they cannot reconcile their views with the reality of a world filled with aggression.

Right and left engage in indirect aggression through crony relationships between private entities and governments. When private entities influence government to wield legislative and regulatory force against their competitors or the public at large, the corruption results in a phenomenon known as regulatory capture. Increasingly, we also see the government infiltrate private entities to coerce them into implementing policy that the US Constitution forbids government from implementing directly, such as censorship (content moderation). Politicians threaten to regulate or otherwise interfere in the market. Governments use subsidies, spending, loans, tax policy, tariffs, export controls, licenses, permits, and authorizations as tools of coercive power. Sometimes the mere threat or even a wink and a nod are enough to influence private entities to acquiesce to a politician’s desires, as any mobster is aware. Government agencies are also looking to outsource their coercive functions to private entities, who are not subject to Constitutional constraints and legislative oversight.

Libertarians are faced with the dilemma of either being passive (“private entities can do what they want”) in ignorance of indirect aggression, or retaliating. When countering indirect aggression, if one’s economic power (cancel culture) and political power (voting, lobbying, and bribing for legislation and regulation) are too weak, passivity is surrendering to coercion (resigning to be victims). Alternatively, libertarians may recognize that in the face of indirect aggression having already been initiated against them, the same weapons can be wielded as a defensive measure in retaliation and this would be morally defensible. It is the principle: return fire when fired upon.

system integration

There is something terribly wrong with software development in the enterprise application space. No one is able to release working software without coordinating across all product development teams to align the version of every product in the universe, because end-to-end workflows can’t be made to work as products are released on independent life cycles.

I believe we are missing architectural design principles. We talk about forward and backward compatibility of APIs, but I’m not sure the industry deeply understands what that entails. The problem goes beyond teams within an organization, because the software industry doesn’t even understand what compatibility entails.

The issue lies in how the base application (e.g., product catalog, store front, sales automation, care, order fulfillment, customer and subscription management, charging, billing, revenue management) is horizontal (generic) and hollow, expecting after-market extensibility to provide the vertical behavior that is specialized for the industry and the enterprise’s business model. The intent of the application vendor is to provide a general purpose platform that can be tailored after-market to the peculiarities of any enterprise. The application will implement an API defined by industry standards (say, tmforum.org for the communications industry) that reflects this general purpose hollowness. The application doesn’t have any real substance until it is customized to model the business. For example, a product catalog would not come populated with 5G mobile product specifications that are branded and priced according to a 5G service provider’s business model.

When extending entities with data that have hidden meaning, implied behavior, constraints, and statefulness (life cycle, workflow), these contribute to the API in ways that were not defined by the original specification. Each new element introduces some degree of incompatibility. Industry standards can never specify in a precise and rigorous manner things they did not foresee.

Stateful behavior is especially troublesome to specify in a manner that ensures compatibility. This includes conversational state and persistent state. Conversational state is where linked information is implicitly kept across multiple requests involved in the same session. A cursor for iterating through a collection of query results is an example of conversational state. Persistent state is durable across transactions, having memory that spans the life of a transaction, a session, a process, and even the life of a compute instance. When methods can only act against objects in certain states, but not others, this constraint must be honored for compatibility across collaborating components.

Objects and attributes are allowed to take on certain values at various points in their life cycles, and transactional behavior and workflow (the steps performed by business processes) are conditional upon the state of these objects. For example, when equipment is installed, it may be in various states of readiness for production use, but when not installed the equipment’s operational characteristics and configuration are irrelevant. Every component with access to that object must understand these semantics and enforce them consistently, otherwise there is no compatibility. Unfortunately, even these very simple conditional constraints and the ones in the previous paragraph are beyond the capability of today’s prevailing interface specification languages and entity modeling frameworks.

Immutability is often conditional on the life cycle state of an entity. For example, an order can be edited during information capture, but its captured intent cannot be edited after the order is firm and in the process of being fulfilled. Again, this constraint cannot be specified in a manner that ensures compatibility across collaborating components.
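The order example can be written down in a few lines of code, which is part of the point: the rule is easy to implement, yet it lives in one component's code rather than in the interface specification that other components read. The state names and class below are illustrative assumptions, not taken from any standard.

```python
from enum import Enum


class OrderState(Enum):
    CAPTURING = "capturing"  # information capture, still editable
    FIRM = "firm"            # intent is fixed, fulfillment in progress
    FULFILLED = "fulfilled"


class OrderNotEditable(Exception):
    """Raised when an edit is attempted outside the CAPTURING state."""


class Order:
    def __init__(self):
        self.state = OrderState.CAPTURING
        self.items = []

    def add_item(self, sku: str) -> None:
        # Mutability is conditional on life cycle state.
        if self.state is not OrderState.CAPTURING:
            raise OrderNotEditable(f"order is {self.state.value}")
        self.items.append(sku)

    def submit(self) -> None:
        # Only an order that is still being captured can become firm.
        if self.state is not OrderState.CAPTURING:
            raise OrderNotEditable(f"order is {self.state.value}")
        self.state = OrderState.FIRM
```

Every collaborating component that can touch an order has to re-implement the same check and keep it consistent, with nothing in a typical API description to tell it so.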

Methods have failure modes, usually specified as failure responses, error codes, or exceptions. Some kinds of failures are recoverable using techniques like retrying, while others are non-recoverable. This too is usually not expressible for compatibility.
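A caller can only retry sensibly if it knows which failures are recoverable. The sketch below assumes the two kinds are distinguished by exception type (`TransientError` and `PermanentError` are names invented here); in practice that classification is usually a convention documented in prose rather than something an interface definition enforces.

```python
import time


class TransientError(Exception):
    """A failure that may succeed if the call is repeated (e.g. a timeout)."""


class PermanentError(Exception):
    """A failure that repeating the call cannot fix (e.g. invalid input)."""


def call_with_retry(operation, attempts: int = 3, delay_seconds: float = 0.0):
    """Retry `operation` on TransientError; let every other error propagate."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts, surface the last transient failure
            time.sleep(delay_seconds * attempt)  # simple linear back-off
```

If a newer version of a service starts reporting a formerly transient failure as permanent (or the reverse), callers written this way change behavior even though no method signature changed.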

Methods have performance expectations in terms of latency, concurrency, and transaction volume. Methods have resource consumption expectations in terms of memory, CPU, storage, network, and I/O. Methods that involve data sets have expectations about how much data can be passed with corresponding performance and scalability characteristics. This too is usually not expressible for compatibility.

Objects and their attributes are often persistent on durable storage. Subsets of attributes may be persistent, while others are volatile or derived (computed based on the value of other attributes, such as a rolled-up status or a count of a collection). This too is usually not expressible for compatibility.

Methods must trade off consistency, availability, and partition tolerance. The expectation of what trade-offs should be chosen is usually not expressible for compatibility.

Methods expect the caller to be authenticated and they are expected to enforce access control to verify that the caller is authorized. Moreover, the method is expected to enforce data permissions and data privacy. This too is usually not expressible for compatibility.

The list of requirements and constraints that contribute to compatibility goes on. The above is a sampling to give the reader a sense of the problem, not to be comprehensive. The intent is to show how formal specifications are grossly insufficient to ensure a high degree of compatibility across heterogeneous suppliers and independently developed implementations.

Because API compatibility is so unreliable based on specifications and contract testing, the promise of a microservice architecture (within an application) or a service-oriented architecture (for integrating applications across the enterprise) cannot be achieved naively. System integration continues to be plagued by a waterfall model of requiring a complete line-up of application versions to be tested end-to-end, before we have any confidence that they work together. The benefits of agile development and independent life cycles are not achievable, because the pre-requisite compatibility guarantees cannot be met. System integration of enterprise applications remains in the stone age because of this crippling deficiency.

Wealth versus Quality of Life

Conflating “wealth” with “quality of life” in criticism of wealth inequality is a fatal error. It is important to recognize that wealth in the form of capital (savings that are re-invested into factors of production toward increasing capacity for supplying goods and services into the future) speaks to supply-side capacity. The abundance created by this productive capacity is what provides for quality of life. On the demand side, quality of life comes from consumers with incomes that have purchasing power to acquire those goods and services. The greater the abundance of supply, the greater the purchasing power that consumers can wield (as expenses on the income statement or outflows on the cash flow statement) WITHOUT wealth (assets and equity on the balance sheet) playing any role for consumers. The role of wealth is to associate ownership for management responsibility over factors of production to create and maintain supply. The role of income is to have purchasing power to enable quality of life for consumers. Savings (retained earnings that are re-invested) is how consumers cross over to participate in wealth toward the management of supply.

Pain Feedback Loops

Feedback loops are very important to regulate behavior within an enterprise. This applies to both rewarding positive behavior, and encouraging more of it, as well as correcting negative behavior to get less of it. Continuous improvement is about feedback loops.

Focusing on negative feedback, we should recognize a phenomenon called ‘pain’. In this context, it refers mostly to pains in the ass, which are discomforts, inconveniences, and frustrations which burden people’s lives, draining their time and energy in unproductive ways. In DevOps, when high severity operational problems arise, such as a service outage in the middle of the night, pain manifests in a pager alert that wakes up an engineer to troubleshoot the incident and resolve the problem.

When fires need to be fought, fire-fighters experience this pain in proportion to the number of fires and their severity. Development teams tend to avoid work that does not serve the goal of delivering features with faster time to market. They inevitably cut corners in areas that would make operations more efficient, because they tend not to be placed in the position of experiencing the pain when it comes. Disconnecting development priorities from operational responsibilities is a recipe for the infliction of pain on those who do not deserve it, and the result is an excess of unexpected pain that should have been foreseen and mitigated. The integration of development with operations into DevOps is intended to establish this connection. This connection must not be undermined by paying mere lip-service to operations without putting real skin in the game, so that development staff experience pain for operational failures as much as operations staff.

Social Media Bias

Tim Pool did a reasonable job of enumerating several areas of concern.

  1. Applying a single global standard (“community standards”) to American citizens imposes “hate speech” regulations that are antithetical to American principles of free speech protected by the US Constitution. [Similar to concerns I raise: https://www.jetpen.com/blog/2020/06/20/corporations-acting-as-agents-of-foreign-governments/ and https://www.jetpen.com/blog/2019/09/16/social-media-and-the-first-amendment/]
  2. Twitter claims to hold no politically biased agenda by intention, while trying to implement narrow goals around maximizing inclusion of users to conduct speech by protecting their physical safety (i.e., by disallowing targeted harassment and doxxing), but they have adopted ideologically biased policies that are selectively enforced in a manner that predominantly punishes conservatives.
  3. The near-monopoly status of the social media giants within their own niches, combined with their unilateral decision-making that appears to most people to be politically biased in one direction, will lead politicians, who are ignorant of the actual issues and incompetent to formulate good solutions, to take a sledgehammer approach to regulation, and consequently make the situation much worse.
  4. The coordinated efforts among corporations to punish individuals across social media, hosting, Internet infrastructure, and payment processing systems demonstrate an abuse of power that is terrifying for how it can implement a social credit system that can unperson people in an extra-judicial manner without due process of law or avenues of redemption.

Black hole jets

In my hubris, I sometimes write emails to established scientists with my stupid ideas. I have a knack for formulating what I believe to be good questions. Here is what I sent to Netta Engelhardt at MIT this morning.

I appreciate the videos you’ve done on YouTube on black holes.

Some rhetorical questions come to mind. I don’t expect an answer. My intention is to ask them, in case they help stimulate some curiosity toward maybe forming a useful idea.

  1. How much of the mass that is falling into a black hole adds to the mass of the black hole, versus being ejected, say through its jets?
  2. Can we think of the jets as carrying information away from the black hole, given that the BH is accelerating the outbound particles substantially, thereby transferring energy from it?
  3. Wouldn’t (2) then be consistent with a model whereby all information about the BH is thought to be encoded on its boundary, with accreted matter seen as sticking to the boundary as it falls in, and over time that same information migrating to the poles of the BH and being ejected through its jets?

It just seems to me, as a layman, that black hole jets are such a prominent feature, but I haven’t seen much talk about what mechanisms generate these jets, and what their relationships are to the flow of energy and information into and out of the black hole.

Ben