Category Archives: software

system integration

There is something terribly wrong with software development in the enterprise application space. No one is able to release working software without coordinating across all product development teams to align the version of every product in the universe, because end-to-end workflows can’t be made to work as products are released on independent life cycles.

I believe we are missing architectural design principles. We talk about forward and backward compatibility of APIs, but I’m not sure the industry deeply understands what that entails. The problem goes beyond teams within an organization, because the software industry doesn’t even understand what compatibility entails.

The issue lies in how the base application (e.g., product catalog, store front, sales automation, care, order fulfillment, customer and subscription management, charging, billing, revenue management) is horizontal (generic) and hollow, expecting after-market extensibility to provide the vertical behavior that is specialized for the industry and the enterprise’s business model. The intent of the application vendor is to provide a general purpose platform that can be tailored after-market to the peculiarities of any enterprise. The application will implement an API defined by industry standards (say, tmforum.org for the communications industry) that reflects this general purpose hollowness. The application doesn’t have any real substance until it is customized to model the business. For example, a product catalog would not come populated with 5G mobile product specifications that are branded and priced according to a 5G service provider’s business model.

When entities are extended with data that carry hidden meaning, implied behavior, constraints, and statefulness (life cycle, workflow), these extensions contribute to the API in ways that were not defined by the original specification. Each new element introduces some degree of incompatibility. Industry standards can never specify in a precise and rigorous manner things they did not foresee.

Stateful behavior is especially troublesome to specify in a manner that ensures compatibility. This includes conversational state and persistent state. Conversational state is where linked information is implicitly kept across multiple requests involved in the same session. A cursor for iterating through a collection of query results is an example of conversational state. Persistent state is durable across transactions, having memory that spans the life of a transaction, a session, a process, and even the life of a compute instance. When methods can only act against objects in certain states, but not others, this constraint must be honored for compatibility across collaborating components.

Objects and attributes are allowed to take on certain values at various points in their life cycles, and transactional behavior and workflow (the steps performed by business processes) are conditional upon the state of these objects. For example, when equipment is installed, it may be in various states of readiness for production use, but when not installed the equipment’s operational characteristics and configuration are irrelevant. Every component with access to that object must understand these semantics and enforce them consistently, otherwise there is no compatibility. Unfortunately, even these very simple conditional constraints and the ones in the previous paragraph are beyond the capability of today’s prevailing interface specification languages and entity modeling frameworks.

Immutability is often conditional on the life cycle state of an entity. For example, an order can be edited during information capture, but its captured intent cannot be edited after the order is firm and in the process of being fulfilled. Again, this constraint cannot be specified in a manner that ensures compatibility across collaborating components.
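
The state-dependent constraints described above can at least be made explicit in code, even though interface specification languages struggle to express them. The following is a minimal sketch in Python of the order example. The `Order` class, its states, and its method names are hypothetical, chosen only for illustration.

```python
from enum import Enum


class OrderState(Enum):
    CAPTURING = "capturing"    # information capture, still editable
    FIRM = "firm"              # intent is fixed, fulfillment in progress
    FULFILLED = "fulfilled"


class IllegalStateError(Exception):
    """Raised when a method is called in a state that does not allow it."""


class Order:
    # Hypothetical transition table: which states may follow which.
    _TRANSITIONS = {
        OrderState.CAPTURING: {OrderState.FIRM},
        OrderState.FIRM: {OrderState.FULFILLED},
        OrderState.FULFILLED: set(),
    }

    def __init__(self):
        self.state = OrderState.CAPTURING
        self.items = []

    def add_item(self, item):
        # Conditional immutability: edits are allowed only during capture.
        if self.state is not OrderState.CAPTURING:
            raise IllegalStateError(
                f"cannot edit an order in state {self.state.value}")
        self.items.append(item)

    def transition(self, new_state):
        # Methods may act only against objects in permitted states.
        if new_state not in self._TRANSITIONS[self.state]:
            raise IllegalStateError(
                f"{self.state.value} -> {new_state.value} is not allowed")
        self.state = new_state
```

The difficulty the text points to remains: every collaborating component would need to agree on the same transition table and enforce it in the same way, and nothing in a typical API specification carries that table across implementations.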

Methods have failure modes, usually specified as failure responses, error codes, or exceptions. Some kinds of failures are recoverable using techniques like retrying, while others are non-recoverable. This too is usually not expressible for compatibility.

Methods have performance expectations in terms of latency, concurrency, and transaction volume. Methods have resource consumption expectations in terms of memory, cpu, storage, network, and I/O. Methods that involve data sets have expectations about how much data can be passed with corresponding performance and scalability characteristics. This too is usually not expressible for compatibility.

Objects and their attributes are often persistent on durable storage. Subsets of attributes may be persistent, while others are volatile or derived (computed based on the value of other attributes, such as a rolled-up status or a count of a collection). This too is usually not expressible for compatibility.

Methods must trade off consistency, availability, and partition tolerance. The expectation of what trade-offs should be chosen is usually not expressible for compatibility.

Methods expect the caller to be authenticated and they are expected to enforce access control to verify that the caller is authorized. Moreover, the method is expected to enforce data permissions and data privacy. This too is usually not expressible for compatibility.

The list of requirements and constraints that contribute to compatibility goes on. The above is a sampling to give the reader a sense of the problem, not to be comprehensive. The intent is to show how formal specifications are grossly insufficient to ensure a high degree of compatibility across heterogeneous suppliers and independently developed implementations.

Because API compatibility is so unreliable based on specifications and contract testing, the promise of a microservice architecture (within an application) or a service-oriented architecture (for integrating applications across the enterprise) cannot be achieved naively. System integration continues to be plagued by a waterfall model of requiring a complete line-up of application versions to be tested end-to-end, before we have any confidence that they work together. The benefits of agile development and independent life cycles are not achievable, because the prerequisite compatibility guarantees cannot be met. System integration of enterprise applications remains in the stone age because of this crippling deficiency.

Pain Feedback Loops

Feedback loops are very important to regulate behavior within an enterprise. This applies to both rewarding positive behavior, and encouraging more of it, as well as correcting negative behavior to get less of it. Continuous improvement is about feedback loops.

Focusing on negative feedback, we should recognize a phenomenon called ‘pain’. In this context, it refers mostly to pains in the ass, which are discomforts, inconveniences, and frustrations that burden people’s lives, draining their time and energy in unproductive ways. In DevOps, when high severity operational problems arise, such as a service outage in the middle of the night, pain manifests in a pager alert that wakes up an engineer to troubleshoot the incident and resolve the problem.

When fires need to be fought, fire-fighters experience this pain in proportion to the number of fires and their severity. Development teams tend to avoid work that does not contribute to delivering features with faster time to market. They inevitably cut corners in areas that make operations more efficient, because they tend not to be placed in the position of experiencing the pain, when it comes. Disconnecting development priorities from operational responsibilities is a recipe for the infliction of pain on those who do not deserve it, and the result is an excess of unexpected pain that should have been foreseen and mitigated. The integration of development with operations into DevOps is intended to establish this connection. This connection must not be undermined by paying mere lip-service to operations without putting real skin in the game, so that development staff experience pain for operational failures as much as operations staff.

DevOps Mentality

Developers who are new to operations may envision, as they become immersed in DevOps culture, that their involvement in operations follows after development is done. However, operations are not an afterthought. This article enumerates some operations-related impacts on development practices that may not be at the forefront of a developer’s mind, but should be. Design for operations.

Logging to enable monitoring, alerting, troubleshooting, and problem resolution

When coding and testing, pay attention to how helpful log messages are to troubleshooting problems. Do messages contain enough context to assist in identifying the root cause and corrective actions? Are messages at appropriate severity levels? One of the biggest impediments to monitoring and troubleshooting is excessive, imprecise, and unnecessary logging, otherwise known as noise.

If logging an ERROR level message, it should represent an operational problem that can be monitored. Each error should have a corresponding corrective action, if one is necessary. If a failure is transient and correctable by retrying, it should not be logged as an error until all attempts have been exhausted without success; repeated messages are unhelpful. Error level messages must be documented in the monitoring, troubleshooting, and problem resolution procedures. Error messages that are functional without any operational significance (not correctable through operational procedures) should be marked as such, so that they are not monitored for intervention. The knowledge base should document every foreseeable failure mode and its corrective actions.

Every log message incurs cost.

  • Computational cost to produce the message and collect it.
  • Storage cost for retention and indexing for search. Accounting for the volume of messages collected per service instance per day multiplied by one year retention and the number of service instances deployed, that may be hundreds of terabytes of data at a cost of tens of thousands of dollars per month.
  • Documentation cost to understand the meaning of the message and the expected operational procedures, if any, to monitor, alert, and carry out any corrective actions, when detected.
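
To make the storage arithmetic concrete, here is a back-of-the-envelope calculation in Python. Every figure in it is an illustrative assumption rather than a measurement; only the one-year retention period comes from the text above.

```python
# All figures below are illustrative assumptions, not measurements.
GB_PER_INSTANCE_PER_DAY = 1.0    # assumed log volume per service instance
RETENTION_DAYS = 365             # one year retention, as in the text
SERVICE_INSTANCES = 500          # assumed number of deployed instances
DOLLARS_PER_GB_MONTH = 0.10      # assumed price for storage and indexing

# Steady-state volume once a full retention period has accumulated.
retained_gb = GB_PER_INSTANCE_PER_DAY * RETENTION_DAYS * SERVICE_INSTANCES
retained_tb = retained_gb / 1000
monthly_cost = retained_gb * DOLLARS_PER_GB_MONTH

print(f"retained: {retained_tb:.1f} TB, cost: ${monthly_cost:,.0f} per month")
```

Under these assumptions the retained volume is 182.5 TB at roughly $18,250 per month, which is the order of magnitude described above. Halving the log volume per instance halves both numbers.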

Excessive logging produces noise that becomes an impediment to operational efficiency and effectiveness. Monitoring and troubleshooting become more difficult when significant information is buried among the noise. Seek to reduce noise by eliminating log messages that are not valuable. This can be done by classifying messages at a finer-grained log level (e.g., INFO, DEBUG) or by suppressing them altogether.

Specify a log message format so that alerts can be defined based on pattern matching. A precise identifier (e.g., OLTP-0123) for each type of message is helpful for monitoring solutions to key off of, rather than matching arbitrary strings that are not guaranteed to remain invariant.

Log messages should be parameterized to carry contextual information, such as the identifier of the entity being processed and the values of the most significant properties to the transaction. Avoid logging sensitive data that would compromise security, such as credentials or personally identifiable information subject to data privacy regulations. The context is important for isolating the problem when troubleshooting, parameterizing corrective actions to resolve the problem, and providing a useful description when reporting a bug.
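
As a rough illustration of a message format with a stable identifier and parameterized context, consider the following Python sketch. The `format_event` helper, the key=value layout, and the `orders` logger name are assumptions made for this example; only the `OLTP-0123` identifier comes from the text above.

```python
import logging
import re

# Hypothetical format: "<LEVEL> <MESSAGE-ID> <text> key=value ..."
logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("orders")


def format_event(message_id, text, **context):
    """Build a log line with a stable identifier and key=value context."""
    fields = " ".join(
        f"{key}={value}" for key, value in sorted(context.items()))
    return f"{message_id} {text} {fields}".rstrip()


# An alert rule keys off the identifier, not the free-form text,
# so the wording of the message can change without breaking the alert.
ALERT_PATTERN = re.compile(r"\bOLTP-0123\b")

line = format_event("OLTP-0123", "order persistence failed",
                    order_id="A-1001", attempts=3)
log.error(line)
```

The context here is limited to an entity identifier and a retry count. Credentials and personally identifiable information would be left out of the keyword arguments for the reasons given above.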

Avoid logging stack traces for non-debug levels of logging. Stack traces are verbose (noisy), and they carry information about the internal workings of the software (packages, classes, and source file names) that may be interesting to developers, but is not useful for operations.

Avoid repeatedly logging the same message. Repeated logging is often the consequence of an error condition that is handled with a retry loop. To avoid noise, retries can be counted without logging the continuing error. A summary can be logged when the retry loop ends. If retrying is successful, silence is preferred unless it is important to note violations of performance targets caused by retrying. Otherwise, timing out may entail escalating the error condition to a fall back mechanism or a circuit breaker, and logging this exceptional condition may be informative later, if the condition persists.
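
A retry loop that counts attempts quietly and logs a single summary might look like the sketch below. The function name `call_with_retries`, its parameters, and the choice of `ConnectionError` as the transient failure type are assumptions for illustration.

```python
import logging
import time

log = logging.getLogger("retry")


def call_with_retries(operation, max_attempts=5, delay_seconds=0.0):
    """Retry a transient failure quietly and log one summary at the end."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
        except ConnectionError as error:  # assumed transient failure type
            last_error = error            # count it, but do not log it
            if attempt < max_attempts:
                time.sleep(delay_seconds)
            continue
        if attempt > 1:
            # Success after retrying: silence is preferred, so this stays
            # at DEBUG and is visible only when extra scrutiny is enabled.
            log.debug("succeeded after %d attempts", attempt)
        return result
    # All attempts exhausted: the single ERROR that monitoring should see.
    log.error("failed after %d attempts: %s", max_attempts, last_error)
    raise last_error
```

The caller that catches the final exception is the natural place to escalate to a fallback mechanism or a circuit breaker.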

Do log a message when an error is detected for which a bug should be reported, an operations engineer should be alerted about a possible malfunction, or a corrective action is required. Error conditions that represent a possible service outage are especially important, as these are the messages that should be matched for alerting. Errors will be associated with corrective actions, which operations engineers will perform to resolve the problem, when encountered.

Avoid logging a message for normal operations, such as successful liveness probes and readiness probes. This is worthless noise.

Pay special attention to methods and transactions with security events.

  1. Redirect a request to access a user interface to login
  2. A successful login
  3. A failed login attempt
  4. Performing a privileged action that must be audited, such as administrative actions or gaining access to private information not owned by the user
  5. A denial of access due to insufficient privileges

Security events should be logged with a format that allows such messages to be classified, so that they can be directed to SIEM for special handling. SIEM is responsible for auditing, intrusion detection, and fraud detection. Being able to detect a security breach is among the most important operational responsibilities.
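
One possible way to keep security events classifiable is to send them through a dedicated logger with its own format and handler. In the sketch below, the `security` logger name, the `SECURITY` prefix, the event type names, and the in-memory stream standing in for a SIEM forwarder are all assumptions for illustration.

```python
import io
import logging

# Hypothetical convention: every security event goes through the "security"
# logger and carries an event type, so a collector can route it to a SIEM.
security_stream = io.StringIO()   # stand-in for a SIEM forwarder
handler = logging.StreamHandler(security_stream)
handler.setFormatter(
    logging.Formatter("SECURITY %(event_type)s %(message)s"))

security_log = logging.getLogger("security")
security_log.setLevel(logging.INFO)
security_log.addHandler(handler)
security_log.propagate = False    # keep these out of the operational log


def log_security_event(event_type, message, **context):
    """Log one classified security event with key=value context."""
    fields = " ".join(
        f"{key}={value}" for key, value in sorted(context.items()))
    security_log.info(f"{message} {fields}".rstrip(),
                      extra={"event_type": event_type})


log_security_event("LOGIN_FAILED", "failed login attempt", user_id="u-42")
```

Because the events carry a fixed prefix and an event type, a log collector can separate them from operational messages by pattern rather than by guessing at free-form text.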

Audiences

One of the most common mistakes is to conflate the log stream directed toward operations and information intended for end users. Services that enable administrative end users to configure or customize features need to provide transparency to the steps that are executing. This includes integrating other services to collaborate or specifying workflows or policies, which can be done erroneously. When something can go wrong, users must have a way of debugging those errors. Legacy server-based applications tended to not differentiate between these use cases and operations. Unfortunately, when evolving such applications into cloud services, these use cases are the most difficult to tease apart, so that information is directed to the proper audiences.

It is equally problematic when functional issues, which are intended for feedback to end users, are directed to operational logs. Operations have no role to play in monitoring such issues and taking corrective action. Directing such messages to operational logs adds to noise.

Developers must pay attention to the intended audience of messages, so that they are either directed to operational logs or end users or both.

Telemetry

“You get what you measure.”

Produce metrics for things you care about.

  • Service utilization – end user activity
  • Service outages – times when the liveness and readiness probes fail, so that the duration of each outage counts toward service availability and can be assessed against the Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
  • Latency – the time between receiving a request and sending its corresponding response tells us how well the system is performing, as perceived by human users interacting through a user interface
  • Transaction processing throughput – the volume of transactions processed in each time interval tells us how well the system is performing with regard to workloads
  • Resource utilization – the infrastructure and platform resources (compute, storage, network) consumed to provide the service tell us how the demand is trending and how capacity should be managed to enable the service to scale into the future

Collect these metrics in a time series database for monitoring (including visualization and reporting) and alerting based on threshold crossing. Specify threshold crossing events for conditions that need attention, such as the following.

  • Violations of service level objectives (availability, performance)
  • Exceeding service limits (what the user is allowed) toward up-selling higher levels of service to the customer
  • Exceeding service and resource demands (capacity planning) toward adjusting scale and forecasting future scaling needs
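
Threshold crossing can be sketched independently of any particular time series database. The function below is a minimal illustration in Python; its name, the sample data, and the 300 ms objective are hypothetical.

```python
def threshold_crossings(samples, threshold):
    """Yield (timestamp, value, direction) when the series crosses the threshold."""
    above = False
    for timestamp, value in samples:
        now_above = value > threshold
        if now_above != above:
            yield timestamp, value, "breach" if now_above else "recovery"
        above = now_above


# Hypothetical latency samples: (seconds since start, p95 latency in ms).
latency_ms = [(0, 120), (60, 180), (120, 450), (180, 470), (240, 150)]
events = list(threshold_crossings(latency_ms, threshold=300))
```

Emitting an event only when the series crosses the threshold, rather than for every sample above it, keeps alerting consistent with the earlier advice about avoiding repeated messages. With the sample data, one breach is reported at 120 seconds and one recovery at 240 seconds.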

Autonomous operation

Developers should design a service to be self-sustaining indefinitely without the need for human intervention, as much as possible. Human involvement is expensive (labor intensive), error-prone, and slow to respond compared to automated procedures. Any condition that resorts to human intervention should be considered a failure on the part of the developers to design for autonomous operation.

  • Fault tolerance – as a consumer of other services and resources, be resilient to failures and outages that are likely to recover after some time
  • Self-healing – as a provider that is experiencing a failure or outage, detect the problem and implement measures to recover, such as restarting, rescheduling to use alternative resources, or shifting workloads to surviving instances
  • Auto-scaling – adjust the number of resources to match the workload, so that the service continues to satisfy performance objectives
  • Self-maintaining – routine housekeeping should be scheduled and automated to prevent storage exhaustion (e.g., data purging), to maintain reasonable performance (e.g., recalculating statistics), and to enforce policies (e.g., secrets rotation)

Pipelines

Operations involve actions initiated through human access (by operations staff) and systems integration (e.g., by capturing an order submitted by the subscriber). Ad hoc changes to a production deployment should be forbidden, because these cannot be reproduced programmatically from source code. Therefore, all types of changes must be anticipated during development, so that they are available as pipelines (programmatic workflows) operationally. Each action should be parameterized, so that a precise set of information is input to drive the execution of its pipeline.

  • Provisioning – creation and termination of the service subscription
  • Upgrades and patches – deploying software updates and bug fixes
  • Configuration – scaling, enabling and disabling features, naming, certificates, policies, customization
  • Diagnostic actions – checks, tests, and probes with a verbose log level for troubleshooting and debugging, when increased scrutiny is needed for problem resolution
  • Administrative actions and maintenance – password and secrets rotation, data purging according to retention policy, and housekeeping (e.g., storage optimization)
  • Capacity management – adding and removing infrastructure and platform resources (e.g., compute, storage, addresses)
  • Corrective actions – interventions for problem resolution, such as stopping and restarting services (rescheduling on compute resources), forcing maintenance tasks like purging data (due to storage exhaustion) or password rotation (to prevent security breaches), and replacing defective resources (e.g., those affected by kernel deadlocks)

See also:

  1. The Twelve-Factor App
  2. 10 key attributes of cloud native applications

Scaling operations across tenants in the cloud

Dear Santa,

Currently, when using the tenant-per-namespace deployment model, operational management procedures are difficult to scale to many tenants, because typical actions like patching, upgrading, stopping, starting, etc. must be initiated as pipeline jobs, once per tenant, and watched for successful execution per job. This is labor intensive, error-prone (having to re-input the same input parameters per pipeline job), and tedious to manage. Therefore, it is not scalable in its current form.

To enable this model to scale, tooling is required to enable a single specification of intent to serve as input into an automated workflow that performs the required action across every applicable namespace (tenant). The intended action may be as simple as `kubectl patch` or it may be a very complex job (upgrade all resources). The workflow would coordinate the parallel execution of these actions against their respective namespaces (identified either by label or a list of names), possibly throttling for limited concurrency to avoid resource contention, and reporting output for status monitoring and troubleshooting. This would reduce the operational cost and complexity of deploying patches and upgrades from O(n) to approximately O(1) for n tenants.
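
The shape of such a workflow can be sketched in a few lines of Python using a thread pool for throttled parallelism. This is only an outline of the idea, not existing tooling: `run_across_tenants` and its parameters are hypothetical, and `action` stands for any callable that takes a namespace name, for example a wrapper that invokes `kubectl patch` for that namespace.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_across_tenants(action, namespaces, max_concurrency=4):
    """Apply one action to every namespace, with bounded parallelism.

    `action` is a hypothetical callable that takes a namespace name and
    returns a result, raising an exception on failure.
    """
    results, failures = {}, {}
    # max_workers throttles concurrency to avoid resource contention.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = {pool.submit(action, ns): ns for ns in namespaces}
        for future in as_completed(futures):
            ns = futures[future]
            try:
                results[ns] = future.result()
            except Exception as error:
                # Record the failure per tenant and keep going.
                failures[ns] = error
    return results, failures
```

The operator specifies the action and the set of namespaces once, then reviews a single report of successes and failures. The machine still does O(n) work; it is the human effort that approaches O(1).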

Personal Assistants

Continuing the series on Revolutionizing the Enterprise, where we left off at Sparking the Revolution, I would like to further emphasize immediate opportunities for productive improvements, which do not need to venture into much-hyped speculative technologies like blockchain and artificial intelligence.

In the previous article, I identified communication and negotiation as skills where software agents can contribute superior capabilities to improve human productivity by offloading tedium and toil. Basic elements of this problem can be solved without applying advanced technology like AI. Machine learning can provide additional value by discerning a person’s preferences and priorities. For example, a person may always prefer to reschedule dentist appointments but never reschedule family events to accommodate work. Automating the learning of such rules enables the prioritization of activities to be automated, further offloading cognitive load.

In my own work, I wish I had a personal assistant, who could shadow my every move. I want it to record my activities so I can replay them later. I want these activities to be in the most concise and compact form, not only as audio and video. For example, as I execute commands in a bash shell, I want to record the command line arguments, the inputs, and the outputs, so this textual information can be copied to technical documentation. As I point and click through a graphical user interface, I want these events to be described as instructions (e.g., input “John Doe” in the field labeled “Name” and click on the “Submit” button).

With a history of my work in this form, this information will be useful for a number of purposes.

  • Someone who pioneers a procedure will eventually need to document it for knowledge transfer. Operating procedures teach others how to accomplish the same tasks by showing how they were done.
  • Pair programming is often inconvenient due to team members being located remotely from each other and separated by time zones. An activity log can enable two remote workers to collaborate more effectively.
  • Context switching between tasks is expensive in terms of organizing one’s thoughts. Remembering what a person was doing, so that they can resume later would save time and improve effectiveness.

The above would be a good starting point for a personal assistant without applying any form of AI or analytics. Then, imagine what might be possible as future enhancements. Procedures can be optimized. Bad habits can be replaced by better ones. Techniques used by more effective workers can be taught to others. Highly repeatable tasks can be automated to remove that burden from humans.

I truly believe the places to begin innovating to revolutionize the enterprise are the mundane and ordinary, which machines have the patience, discipline, and endurance to perform better than humans. More ambitious technological capabilities are good value-adds, but we should start with the basics to establish personal assistants in the enterprise as participants in ordinary work, not as esoteric tools in obscure niches.

[Image credit – Robotics and the Personal Assistant of the Future]

planning is useless

Whenever an organization is faced with challenges that require many people to move in a different direction, change their behavior, adjust their attitudes, or alter their thinking, the first thing that management wants to put in place is leadership. They always believe that with the proper top-down inspiration, instruction, and oversight, it will drive the desired results. They believe this model scales hierarchically.

I don’t believe it’s true of problems for which the organization does not have experience and expertise. The more technical and schedule risk that a project incurs because of greater unknowns, the less helpful project planning is. The ability to plan relies on a degree of analysis and design. Without relevant experience to help speculate on how to implement something, planning must happen in ignorance. The plans are meaningless, because actual implementation experience will likely invalidate those plans and designs. Unfortunately, the natural reaction is to spend more time and effort getting those plans right, as the plan goes off track with execution. The more right you try to make it, the worse that situation becomes, as the organization invests more in a futile activity, and less in activities that actually achieve the result. A “learning organization” is what is needed, not one that assumes it knows (or more importantly “can know”) what it’s doing without having done it yet.

The idea of “spontaneous order” is appealing, but that requires all participants to behave rationally with the right signals, so they can work things out among themselves. In large engineering organizations, this does not seem to work, because the communications channels are too narrow, the number of participants too great, and the volume and complexity of knowledge that must be exchanged too vast. Individuals become overwhelmed and cannot keep up. Management structures are inevitably put in place to introduce controls and gatekeepers. Whereas chaos is too noisy and incoherent, the imposition of order prevents knowledge pathways from forming spontaneously.

I’m left wondering if there are methods that facilitate spontaneous order. Autonomy, mastery, and purpose are great motivators in the abstract, but they don’t easily translate into concrete methods and tools. I noticed that Facebook has started implementing a system like khanacademy.org for helping edit location information, where it awards points, badges, and levels. Such systems really do provide users with a motivation to achieve the measured outcome. I’m wondering if gamification is a superior way to achieve outcomes.

Sparking the Revolution

In my previous article, Revolutionizing the Enterprise, I provided an outlook for how emerging technologies may help to transform how we do work. Now, let’s explore how we might provide the spark that starts the fire to burn down the old and welcome the new. The world does not change in a radical way without a progression of steps that pave a path for getting from here to there. What might the first step be to introducing robots and AIs as personal assistants into the regular work lives of numerous employees?

We need only look to our daily struggles to identify where every person would see the value of machine intelligence. Organizing a meeting among several participants can be challenging. You need to find a convenient time when every participant is available. You need to find a suitable venue that can accommodate everyone. If folks need to travel, the complexity rises enormously, because each traveler’s attendance is then dependent upon successfully booking travel arrangements. The risk of a single unsatisfied requirement causing the meeting to be non-viable rises with each participant and their special needs. If the meeting needs to be moved to accommodate certain participants, this would then trigger a storm of activity to renegotiate, and a flurry of activity to explore how calendars can be readjusted with a cascade of renegotiations of other appointments, each having its own priority and constraints.

This kind of negotiation among a network of people is virtually impossible for humans to accomplish among themselves, because of the latency of human communication. However, if every human could be represented by an agent, who could negotiate on their behalf, this kind of activity could become painless. Imagine how many hours of phone tag, email, and travel booking could be saved. Even if an agent were not entrusted to finalize decisions on travel booking, all of the negotiation and arrangements could be prepared and presented for final approval by the human; or the agent could involve the human at key decision points by presenting a short list of options to guide the way forward.

I believe ordinary mundane problems such as this one, which every person has experienced, will serve as an opportunity to introduce machine intelligence to work alongside us. The off-loading of such unproductive and non-creative toil to an automated personal assistant would be a welcome change that would be seen as another useful tool, rather than a radical development. And that’s how the revolution should begin.

Revolutionizing the Enterprise

It has been over five years since I wrote an article titled Enterprise Collaboration, in which I identified the need for innovations to transform how people do their work. Since then, we have seen no significant advances. Enterprise applications continue to move very slowly to the cloud, driven primarily by cost efficiencies with little noticeable functional improvement except at the margins (big data analytics, social, search, mobile, user experience).

Where can we go from here?

I still firmly believe that a global work force needs to be decoupled in space and time. Mobility and cloud services will continue to provide an improving platform to enable work to be performed at any time from wherever people want. We should enable people to do their work as effectively from the office as from home, in their vehicles, during air travel, at the coffee shop, or anywhere else they happen to be. Advances in computing power, miniaturization, virtual reality, alternative display and input technologies (e.g., electronic skin, heads up displays, voice recognition, brain computer interfaces, etc.), and networking will continue to provide an improving platform for inventing better ways of doing work and play. This path does not need too much imagination to foresee.

Recently, we have seen an up-tick in applying artificial intelligence. Every major company seems to be embracing AI in some form. Image recognition and natural language are areas that have been researched for decades, and they are now being employed more ubiquitously in everyday applications. These technologies lower the barrier between the virtual world and the real world, where humans want to interact with machine intelligence on their own terms.

However, I believe an area where AI will provide revolutionary benefits is in decision support and autonomous decision-making. So much of what people do at work is tedium that they wish could be automated. Some forms of tedium are drudgery, such as reporting status and time to management, organizing and scheduling meetings among team members, planning work and tracking progress, and keeping people informed. These tasks are routine and time-consuming, not creative and value-producing. Machines can interact among themselves to negotiate on behalf of humans for the most mundane tasks that people don’t really care too much to be involved in. Machines can slog through an Internet full of information to gather, prune, and organize the most relevant set of facts that drive decisions. Machines can carry out tasks on their own time, freeing up humans to work on more important or interesting things.

Personal assistants as computing applications are a new phenomenon. Everyone has heard of Amazon Echo and Google Assistant by now. I can imagine advances in this capability expanding into all areas of work and personal life to help off-load tedium. As AI becomes more capable, we should see it taking over mundane tasks, like research (e.g., comparing products to offer recommendations toward a purchasing decision, comparing providers toward recommending a selection), planning, coordinating, note taking, recalling relevant information from memory, distilling large volumes of information into a concise summary, etc. Eventually, AI will even become capable enough to take over mundane decision-making tasks that a person no longer cares to make (e.g., routinely replenishing supplies of consumables from the lowest priced supplier, repetitive tasks).

The other phenomenon that will revolutionize the workplace even more than in the past is robotics. Robots have been revolutionizing manufacturing for decades by replacing repetitive, error-prone, labor-intensive tasks with perfectly reproducible, error-free automation. We are seeing politics influence businesses to apply robots, where human labor sufficed in the past, purely because of the increasing cost of labor. Minimum wage legislation (bans on jobs that pay less than some mandated floor in wages) that raises labor costs above the value produced will force businesses to rethink how to operate profitably. Beyond entry-level jobs, such as fast food service, self-driving cars and trucks are already in trials for ride-sharing and long haul cargo transport. As robots become more dexterous, mobile, compact, and intelligent, we will see them become personal assistants that perform physical tasks, just as we see their software counterparts perform computing tasks. We should anticipate that robots will serve in a broad spectrum of capacities, from low-skilled drudgery to highly skilled artisanal and professional work.

The future enterprise will involve a workforce where humans, AIs, and robots collaborate closely. Humans have a comparative advantage in performing creative and path-finding tasks with ill-defined goals, many unknowns, and little experience to draw upon. Robots and AIs have a comparative advantage in performing repetitive, well-defined, and tedious tasks. Together, they will transform the enterprise in ways that we have never seen before.

cloud services for the enterprise

The Innovator’s Dilemma describes how the choice to sustain an incumbent technology may need to be weighed against pursuing disruptive new technologies. Nascent technologies tend to solve a desirable subset of a problem with greater efficiency. They change the game by making what used to be a costly high-end technology available as a commodity that is affordable to the masses. It turns out that high-end customers can often live without the rich capabilities of the costly solution, and they would rather save on cost. Meanwhile, with the success that the low-end solution is gaining in the market, it can invest in maturing its product to encroach into the high-end market. Eventually, the incumbent product’s market is entirely taken over by the rapidly growing upstart, who was able to establish a foothold in a larger installed base.

That is the situation we find ourselves in today with enterprise applications. Large companies rely on expensive software licenses for Customer Relationship Management, Enterprise Resource Planning, and Human Capital Management applications deployed on-premise. Small and medium sized businesses may not be able to afford the same kinds of feature rich software. Not only are the software license and annual maintenance costs expensive, but commercial off-the-shelf software for enterprises is typically a platform that requires months of after-market solution development, customization, and system integration to tailor the software to the business policies and processes specific to the enterprise. The evolution to cloud services aims to disrupt this situation.

Let us explore the ways that cloud services aim to be disruptive.

As described above, traditional enterprise software products are platforms. An incumbent product that wants to evolve to cloud without disrupting its code base will merely be operating in a sustaining mode, not achieving significant gains in efficiency. Because such a product remains more PaaS-like, the prohibitive cost and onerous effort of after-market solution development remain a huge barrier to entry for customers. To become SaaS-like, a cloud service must be useful by default, immediately of value to the end users of its enterprise tenants.

Cloud services are disruptive by providing improved user experiences. Of course, this means a friendlier Web user interface that is optimized for users to perform their work more easily and intuitively. User interfaces need to be responsive to device screen size, orientation, locale, and input method. Cloud services also provide advantages for enterprise collaboration by enabling the workforce to be mobile. Workers need to become more decoupled in space and time, so they can be more geographically dispersed and global in reach. Cloud services should assist in transforming how employees work together, not just replicate the same old ways of doing our jobs using a Web browser instead of a desktop application. Mobile applications may even enable new ways of interacting that are not recognizable today.

Cloud services are disruptive economically. Subscription pricing replaces perpetual software licensing and annual maintenance costs along with the capital costs of hardware infrastructure, IT staffing to operate an on-premise deployment, and on-going infrastructure maintenance, upgrades, and scaling. Subscription pricing in and of itself is not transformational. It is only superficially different by virtue of amortizing the traditional cost of on-premise deployment over many recurring payments. The main benefit is in eliminating the financial risk associated with huge up-front capital expenditures in case the project fails. Migrating a traditional on-premise application into the cloud is not really financially disruptive unless it can significantly alter the costs involved. In fact, by taking on the capital cost of infrastructure and the operational cost of the deployment, the software vendor has now cannibalized its on-premise application business and replaced it with a lower margin business with high upfront costs and risk. This is a terrible formula for profitability and for a healthy business.

Multi-tenancy provides the change in cost structure that makes cloud services financially disruptive. Multi-tenancy enables a single deployment of a cloud service to support users from multiple tenants. This provides significant cost advantages over single-tenant deployments in terms of resource utilization, simplified operations, and economies of scale. Higher deployment density translates directly into higher profit, but by itself multi-tenancy provides no visible benefit to users. The disruption comes when the vendor realizes that at scale multi-tenancy enables a new tenant to be provisioned at near zero cost. This opens up the possibility of offering an entry level service to new tenants at a low price point, because the incremental cost to the vendor is close to zero. Entry-level pricing at or near zero is transformational by virtue of making a cloud service available to small enterprises who would never have been able to afford such capabilities in the past. This enables innovation to be done by individuals or small scale entrepreneurs (start-ups), who have the most radical, risky, unconventional, and paradigm-shifting ideas.
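The cost argument can be made concrete with a small sketch. The figures below are invented purely for illustration (they are not measurements from any real service), and the function names are hypothetical; the point is only the shape of the arithmetic, where a fixed deployment cost is shared by all tenants and each additional tenant adds only a small marginal cost.

```python
# Illustrative only: all figures are made-up assumptions, not real data.
FIXED_COST = 100_000   # assumed yearly cost of running one deployment
MARGINAL_COST = 5      # assumed yearly cost of one more tenant in a shared deployment


def total_cost(tenants: int) -> int:
    """Yearly cost of one multi-tenant deployment serving `tenants` tenants."""
    return FIXED_COST + MARGINAL_COST * tenants


def cost_of_next_tenant(tenants: int) -> int:
    """Incremental cost of provisioning one more tenant on the shared deployment."""
    return total_cost(tenants + 1) - total_cost(tenants)


def single_tenant_cost_of_next_tenant() -> int:
    """With one deployment per tenant, every new tenant pays the full fixed cost."""
    return FIXED_COST + MARGINAL_COST


def average_cost(tenants: int) -> float:
    """Average yearly cost per tenant on the shared deployment."""
    return total_cost(tenants) / tenants


next_shared = cost_of_next_tenant(1000)
next_dedicated = single_tenant_cost_of_next_tenant()
avg_at_scale = average_cost(1000)
```

Under these assumed numbers, the next tenant on a shared deployment costs the vendor 5, while a dedicated deployment for the same tenant costs 100,005. That gap is what makes a very low entry-level price point viable.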

Elastic scaling provides another disruptive benefit. It enables a cloud service to perform as required as a tenant grows from seeding a proof-of-concept demonstrator to large scale (so-called Web scale) production. The expertise, techniques, and resources needed to scale a deployment are difficult and costly to acquire. When a vendor can provide this pain-free, an enormous burden is lifted from the tenant’s shoulders.

Cloud services evolve with the times through DevOps and continuous delivery. Traditional on-premise applications tend to be upgraded rarely due to the risk and high development cost of customization, which tends to suffer from compatibility breakage. Enterprise applications are often not upgraded for years. “If it ain’t broke, don’t fix it.” Even though the software vendor may be investing heavily in feature enhancements, functional and performance improvements, and other innovations, users don’t see the benefits in a timely manner, because the enterprise cannot afford the pain of upgrading. When the vendor operates the software as a SaaS offering, upgrades can be deployed frequently and painlessly for all tenants. Users enjoy the benefit of software improvements immediately, as the cloud service stays up-to-date with the current competitive business environment.

Combining the abilities to provision a tenant to be useful immediately by default, to start at near zero cost, to scale with growth, and to evolve with the times, cloud services provide tools that can enable business agility. A business needs to be able to turn on a dime, changing what it sells and how it operates in order to stay ahead of its competitors. Cloud services are innovative and disruptive in these ways in order to enable their enterprise tenants to be innovative and disruptive.

intent modeling as a programming language

In Tom Nolle’s blog article titled What NFV Needs is “Deep Orchestration”!, he identifies the need for a modernized Business Support System and Operations Support System (BSS/OSS) to improve operations efficiency and service agility by extending service-level automation uniformly downward into the network resources, including the virtual network functions and the NFV infrastructure.

Service orchestration is the automation of processes to fulfill orders for life cycle managed information and communications technology services. Traditionally, this process automation problem has been solved through business process modeling and work flow management, which includes a great deal of system integration to glue together heterogeneous software components that do not naturally work together. The focus of process modeling is on “how” to achieve the desired result, not on “what” that result is. The “what” is the intent; the content of the order captures the intent.

To achieve agility in launching services, we must be able to model services in a manner that allows a service provider to redefine the service to suit the current business need. This modeling must be done by product managers and business analysts, experts in the service provider’s business. Any involvement of software developers and system integrators will necessarily require programming at a level of abstraction that is far below the concepts that are natural to the service provider’s business. The software development life cycle is very costly and risky, because the abstractions are so mismatched with the business. When service modeling directly results in its realization in a completely automated executable runtime without involving other humans in any software development activities, this becomes Programming for Non-programmers.

The key is Going Meta. The “what” metadata is the intent modeling. The “how” metadata is the corresponding fulfillment and provisioning behavior (service-level automation). If the “what” and “how” can be designed as a language that can be expressed in modular packages, which are reusable by assembling higher level intent from lower level components, this would provide an approach that would facilitate the service agility users are looking for. Bundling services together and utilizing lower level services as resources that support a higher level service are techniques already familiar to users who design services. When users express themselves using this language, they are in fact programming, but because the language is made up entirely of abstractions that are familiar and natural to the business, it does not feel burdensome. General purpose programming languages like Java feel burdensome, because the abstractions are for a low level computational machine, not a high level business-oriented machine for service-level automation of human intent.
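To make the separation of “what” and “how” more tangible, here is a minimal sketch in Python. It is not a proposal for the actual language. The Intent class, the how decorator, the handler registry, and the service names (vlan, firewall, business-vpn) are all hypothetical, chosen only to show a higher level intent assembled from lower level components and fulfilled by behavior that is registered separately.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Intent:
    """The "what": a named service, its parameters, and the lower level
    services it uses as supporting resources."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)
    components: List["Intent"] = field(default_factory=list)


# The "how": fulfillment behavior, registered per intent name.
# Everything in this registry is a hypothetical example.
HANDLERS: Dict[str, Callable[[Intent], str]] = {}


def how(name: str):
    """Register a fulfillment handler for intents with the given name."""
    def register(fn: Callable[[Intent], str]) -> Callable[[Intent], str]:
        HANDLERS[name] = fn
        return fn
    return register


@how("vlan")
def provision_vlan(intent: Intent) -> str:
    return f"allocate VLAN {intent.params['id']}"


@how("firewall")
def provision_firewall(intent: Intent) -> str:
    return f"deploy firewall with policy {intent.params['policy']}"


@how("business-vpn")
def provision_vpn(intent: Intent) -> str:
    return f"activate VPN for {intent.params['customer']}"


def fulfill(intent: Intent) -> List[str]:
    """Derive an ordered plan: supporting components first, then the service."""
    steps: List[str] = []
    for component in intent.components:
        steps.extend(fulfill(component))
    steps.append(HANDLERS[intent.name](intent))
    return steps


# The order states only what is wanted; the plan is derived from it.
order = Intent(
    "business-vpn",
    {"customer": "acme"},
    [Intent("vlan", {"id": "100"}), Intent("firewall", {"policy": "default-deny"})],
)
plan = fulfill(order)
```

In this sketch, a business analyst would author only the order, while the handlers play the role of reusable packages. A real intent language would also need life cycle states, constraints, and error handling, none of which is modeled here.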

Our challenge in developing a modernized BSS/OSS is to invent this language for intent modeling for services. An IETF draft titled YANG Data Models for Intent-based NEtwork MOdel attempts to define a flavor of intent modeling for network resources. An IETF draft titled Intent Common Information Model attempts to define a flavor of intent modeling that is very general, but it is far removed from any executable runtime that can implement it, because it is so imprecise (not machine executable). ETSI NFV MANO defines an approach that captures intent as descriptors for network services as network functions and their connections. However, these abstractions are not expressive enough to extend upward into the service layer, across the entire spectrum of network technologies (physical and virtualized), and into the “how” for automation, to enable the composition of resources into services and the utilization of services as resources to support higher level services that can be commercialized. More thought is needed to design a good language for this purpose and a virtual machine that is capable of executing the code that is produced from it.