Tag Archives: artificial intelligence

World Models for Artificial Intelligence

My thoughts about world models have evolved over time. The topic is how artificial intelligence (AI) should represent world models so that it can reason more accurately about the real world. I started off wondering whether the weights of a neural network need to represent abstract syntax tree (AST) nodes, the in-memory structures a compiler builds when parsing a language. I currently think that existing LLMs, given their understanding of language, should be able to reason perfectly well about structured (schema-compliant) documents that have an ontology.

Evolution of my thoughts on this topic

2024-05-28

My thoughts return to the question: how can we represent an AST in an encoding that a neural network can do inference on? A generalized AST would provide a way of modeling the world. This would enable accurate reasoning that is not possible with a lexical tokenization of language.

2024-08-04

This is why I believe we need to research how to represent an abstract syntax tree as tokens (an AST node is a token) that a neural network can operate on. I was talking to a friend yesterday about this, and the insight he added was that NN model weights act like a mathematical transformation akin to a Fourier transform. This gave me a lot of hope that my idea is feasible: an AST node (as the unit that represents any concept generically) can be encoded as a token (a number). Once we can do this, we can represent any language, and therefore any mental model of anything in the real world. (Not just tokens as word fragments, like the current generation of LLMs.)
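To make “an AST node is a token” a little more concrete, here is a minimal sketch in Python. It assumes Python’s own standard `ast` module as the parser and invents a simple scheme (my illustration, not an established encoding): each node type gets an integer ID in a shared vocabulary, the tree is flattened in pre-order, and a reserved `<close>` token marks where each node’s children end, so the nesting survives in a flat sequence of numbers. The function name `ast_to_tokens` is hypothetical.

```python
import ast


def ast_to_tokens(source: str, vocab: dict[str, int]) -> list[int]:
    """Flatten a Python parse tree into integer tokens, one per AST node.

    Node types (Module, Assign, Name, ...) receive an integer ID in the shared
    `vocab` the first time they are seen. A reserved <close> token (ID 0 when
    the vocabulary starts empty) marks the end of each node's children, so the
    tree structure can be recovered from the flat sequence.
    """
    vocab.setdefault("<close>", len(vocab))
    tokens: list[int] = []

    def visit(node: ast.AST) -> None:
        # Emit this node's type as a token, assigning a new ID if unseen.
        tokens.append(vocab.setdefault(type(node).__name__, len(vocab)))
        for child in ast.iter_child_nodes(node):
            visit(child)
        # Close this node after all of its children.
        tokens.append(vocab["<close>"])

    visit(ast.parse(source))
    return tokens


if __name__ == "__main__":
    vocab: dict[str, int] = {}
    print(ast_to_tokens("x = 1", vocab))
    print(vocab)
```

Reusing the same `vocab` across inputs keeps IDs stable, so `x = 1` and `y = 2` produce identical token sequences: they share a structure, and identifier names and literal values are deliberately dropped in this sketch. A real encoding would need to carry those leaf values too.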

2024-08-25

Another avenue to explore is to use an initial encoding to determine what kind of model to use. Classify the problem as natural language, lexical, math, logic, programming (each language is distinct), chemistry, etc. Every domain has its own way of modeling the world. Using this classification, parse the information again using that domain-specific model. Now the AI can reason about the problem using an optimal representation.
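The classify-then-reparse idea can be sketched as a two-stage dispatcher. In this minimal Python sketch, a crude hypothetical heuristic (`classify`) stands in for the initial encoding, and three illustrative domains each map to their own parser: JSON to a data structure, arithmetic to an expression tree, and everything else to a bag of lowercase words. A real system would use a learned classifier and far richer domain models; all names here are assumptions.

```python
import ast
import json
import re
from typing import Any, Callable


def classify(text: str) -> str:
    """Hypothetical first pass: guess the domain from surface features."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        return "json"
    if re.fullmatch(r"[\d\s+\-*/().]+", stripped):
        return "math"
    return "natural_language"


# Second pass: each domain has its own parser producing a structured model.
PARSERS: dict[str, Callable[[str], Any]] = {
    "json": json.loads,
    "math": lambda s: ast.parse(s.strip(), mode="eval"),  # expression tree
    "natural_language": lambda s: re.findall(r"\w+", s.lower()),
}


def parse_by_domain(text: str) -> tuple[str, Any]:
    """Classify the input, then re-parse it with the domain-specific parser."""
    domain = classify(text)
    return domain, PARSERS[domain](text)


if __name__ == "__main__":
    for sample in ['{"a": 1}', "2 * (3 + 4)", "Hello, world"]:
        print(parse_by_domain(sample))
```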

2025-10-06

It might be too divergent from today’s neural network paradigm for neurons to represent an AST directly. Now I believe we don’t need to do that, because we don’t need a metamodel at the neuron level to represent any language.

We already have two metamodels that today’s LLMs understand well: YAML and JSON. Any domain-specific language can be represented in these formats. A world model can be represented as a document in one of these formats, according to some schema.

Perhaps that is how a human brain represents a world model anyway. We have a concept, a document. Concepts have relationships, which are links. We reason based on large collections of tokens together: a set of concepts and their relationships.

I have no sense of how my human neurons represent primitive data types like numbers. It certainly isn’t lexical (a string of digits). However, this is where a world model in document form is advantageous. The NN doesn’t have to bear the burden of representing numbers the way a human brain does, because a NN has the advantage of working with adjunct brains through documents. These adjunct brains are programs that compute precisely, and an AI can make use of them through MCP (Model Context Protocol).

An AI doesn’t need to understand numbers and math. We have seen how poorly LLMs do math by themselves. An AI merely has to use an MCP server to send the document representing the math problem to a math solver (e.g., Maple or Mathematica), and the answer comes back precisely solved. The AI’s responsibility is only to formulate the document request and consume the document response. That is a language the LLM fully understands how to process.
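Here is a minimal sketch of that request/response round trip in Python. The message shape loosely follows the JSON-RPC 2.0 `tools/call` request that MCP uses, with a text item in the result’s `content` list; the tool name `solve_linear` and its `a`, `b`, `c` arguments are hypothetical. A small exact-arithmetic function using `fractions.Fraction` stands in for a real solver such as Maple or Mathematica, which would sit behind an actual MCP server.

```python
import json
from fractions import Fraction


def make_request(a: str, b: str, c: str, request_id: int = 1) -> str:
    """The AI's job, part 1: formulate a request to solve a*x + b = c."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "solve_linear",  # hypothetical tool name
            "arguments": {"a": a, "b": b, "c": c},
        },
    })


def stand_in_solver(request_doc: str) -> str:
    """Stand-in for a real math server: exact rational arithmetic."""
    req = json.loads(request_doc)
    args = req["params"]["arguments"]
    a, b, c = (Fraction(args[k]) for k in ("a", "b", "c"))
    x = (c - b) / a
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req["id"],
        "result": {"content": [{"type": "text", "text": str(x)}]},
    })


def consume_response(response_doc: str) -> Fraction:
    """The AI's job, part 2: read the answer back out of the response."""
    resp = json.loads(response_doc)
    return Fraction(resp["result"]["content"][0]["text"])


if __name__ == "__main__":
    print(consume_response(stand_in_solver(make_request("3", "1/2", "7"))))
```

For example, `make_request("3", "1/2", "7")` asks for the solution of 3x + 1/2 = 7, and the round trip yields `Fraction(13, 6)`. Passing numbers as strings keeps the documents exact; the model only writes and reads the documents, never doing the arithmetic itself.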

That can be generalized to everything within the scope of computing.

2026-01-20

The more that AI coding agents are guided and driven by markdown documents, the more I am convinced that a world model is merely a set of documents. Therefore, it is in an LLM’s wheelhouse to comprehend and update documents to maintain its memory durably.

2026-05-23

Consider this. No matter how we model things as data structures in memory, we almost always define a serialization format for network transfer or durable storage. Especially today, the preferred file format is almost always YAML or JSON (including variants of these).

With LLMs trained for coding, their tokenization of YAML and JSON should be pretty good. Harnesses can query against the structure using jq or ad hoc Python. That enables sophisticated reasoning against a model represented in those formats, especially when it complies with a schema, as most APIs do.

My hypothesis is this: world models are governed by some ontology that shapes their in-memory structures, their API wire protocols, and their file formats for durable storage. So it is natural to think that world models are just documents in YAML or JSON with a schema, along with an ontology that describes in natural language the meaning of the classes, properties, relationships, and behavior. That means LLMs can reason perfectly well against real-world models as documents.
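A minimal Python sketch of a world model as a document: a small JSON world, a schema, an ontology giving the natural-language meaning of classes and properties, and a query over the structure. The entity classes and properties are hypothetical, and the dict-of-types `SCHEMA` is a deliberately tiny stand-in for a real JSON Schema validator. The query is the kind of thing a harness might otherwise express in jq, shown roughly in a comment.

```python
import json

# A tiny world model document. Classes and properties are hypothetical.
WORLD = json.loads("""
{
  "entities": [
    {"id": "w1", "class": "Warehouse", "city": "Toronto"},
    {"id": "p1", "class": "Product", "name": "Widget", "stock": {"w1": 40}},
    {"id": "p2", "class": "Product", "name": "Gadget", "stock": {"w1": 0}}
  ]
}
""")

# Minimal schema: required properties and their types, per class.
SCHEMA = {
    "Warehouse": {"id": str, "class": str, "city": str},
    "Product": {"id": str, "class": str, "name": str, "stock": dict},
}

# Ontology: natural-language meaning an LLM can read alongside the document.
ONTOLOGY = {
    "Warehouse": "A physical site where products are stored.",
    "Product": "An item offered for sale.",
    "Product.stock": "Units on hand, keyed by the id of the Warehouse holding them.",
}


def validate(world: dict) -> list[str]:
    """Return a list of schema violations (empty when compliant)."""
    errors = []
    for e in world["entities"]:
        spec = SCHEMA.get(e.get("class"))
        if spec is None:
            errors.append(f"{e.get('id')}: unknown class {e.get('class')!r}")
            continue
        for prop, typ in spec.items():
            if not isinstance(e.get(prop), typ):
                errors.append(f"{e.get('id')}: {prop} must be {typ.__name__}")
    return errors


def out_of_stock(world: dict, warehouse_id: str) -> list[str]:
    # Roughly: jq '.entities[] | select(.class == "Product" and (.stock.w1 // 0) == 0) | .name'
    return [e["name"] for e in world["entities"]
            if e["class"] == "Product" and e["stock"].get(warehouse_id, 0) == 0]


if __name__ == "__main__":
    print(validate(WORLD))
    print(out_of_stock(WORLD, "w1"))
```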

LLM-based agents will reason with world model documents better as we invent better representations of ontology to drive the reasoning. Today, we have tool names and tool descriptions in MCP, and we have frontmatter in markdown for skills. We need to do more work in this area to make any JSON or YAML schema semantically understandable through an ontology.

Everything as Code

I’d like to coin a term: Everything as Code (EaC). Everything done in engineering is becoming computerized or computer-aided. Everything that is computerized becomes software-driven. Everything software-driven becomes “as code”. Ultimately, all engineering is becoming software engineering.

This extends from:

  • Infrastructure as Code – using a language like Terraform to automate the provisioning of computing infrastructure
  • Configuration as Code – using YAML, JSON, or other precise schema-validated specifications in combination with GitOps processes to configure the deployment of software components
  • Everything as Code – using natural language specifications of intent to drive an AI agent in combination with GitOps

Software development is storytelling

As software development evolves from human programming activities to AI-assisted activities, our efforts will concentrate more on expressing intent concisely and precisely through storytelling. Many of our failings in software are due to our inability to foresee everything we need before we build. Then, the flawed design and implementation impose crippling constraints on what can be added or fixed later. Even as we learn from our mistakes, the sunk cost of software often misleads us into thinking that a rebuild will incur that cost again. That is why some things never get fixed properly or in a timely fashion until a competitor offers a total replacement.

AI-assisted software development offers hope that the toil of grunt coding will cease to bottleneck progress. The heavy boat anchor of legacy code is not an insurmountable burden if tools can automate the rewriting of thousands of lines of code in minutes, provided that context and intent can be expressed precisely enough through documents and prompts.

The question is, what should that context and intent look like? That question has plagued software development since the beginning. Various methodologies have offered techniques that have yielded mixed results in how humans analyze requirements and design solutions. A mix of languages (metamodels) and notations (visual, textual) has been used for modeling. It has always been a struggle to turn a god’s-eye model into a working software realization, because of the enormous human effort it takes to meticulously reinterpret a conceptual model into precise machine instructions.

AI-assisted tools will motivate us to race faster toward inventing more concise and precise languages and notations for expressing conceptual models, so that automation can reduce the burden of software implementation. Methodologies will further concentrate our efforts on expressing context and intent, as “just a matter of programming” (IYKYK) is reduced to large scale inferencing.

Paleo-software engineering methodologies include structured, functional, object-oriented, and test-driven development. They exist because industrial and commercial problem domains are too large and complex for any human to reason about all at once.

We often approach modeling from a 30,000 ft view, as though we are cruising at altitude in an aircraft and describing the lay of the land below. There are structural and behavioral descriptions, as well as non-functional qualities (so-called ‘-ilities’). This mental model provides a skeleton, but “the devil is in the details”. Such a conceptual model is essential for providing a vocabulary; otherwise, we cannot even begin to articulate anything in our analysis and design journey. That journey involves “peeling the onion” to explore finer levels of detail until there is enough understanding for humans to write code. Human error and the imprecision of human expression and interpretation plague this process.

To be able to speak about a problem domain, we must first establish a vocabulary of nouns for structures and verbs for behaviors. Naturally, this vocabulary expands to include relationships among structures and between the nouns and verbs in how subjects act against objects under various circumstances (states) and with constraints.

Analysis and design is about describing an imaginary world that becomes real. It is storytelling. The purpose is not to entertain. It is to author a future that will be manifest in machines and the humans who use them.

There will be a gap in descriptive detail as we begin telling the story from 30,000 ft and descend to bring more granular elements into focus. Traditionally, the remaining distance between storytelling and machine-executable code is closed by human effort. Developers fill in this gap. We aspire for this toil to be replaced by machine inferencing. Human inference produces imperfect results because of flawed (incomplete, inconsistent, ambiguous, incorrect) descriptions. Machines will give similar outcomes until we improve our storytelling.

We should recognize that AI assistance in software development is not limited to coding. Storytelling is in a Large Language Model’s wheelhouse. With the totality of human knowledge at its disposal, AI can largely understand any problem domain through its ability to research.

The creative aim of software development is to make incremental (sometimes revolutionary) improvements on the current state. Therefore, the vast majority of analysis and design is about building tooling to support what we already do today, largely unaltered. Innovation targets a few narrow pain points that yield valuable benefits.

AI assistance in storytelling could provide the mundane description of the current state of a problem domain, leaving the narrow creative portion to human imagination. A machine researcher can do far better than a human at writing stories that provide complete coverage, are accurate and consistent, and provide enough detail to leave no dangerous holes.

Define AI consciousness

The rise of artificial intelligence has led many to wonder whether AI consciousness exists. We use this word without a precise definition. Its rough definition includes “subjective experience”. Let’s explore what that is, so we can characterize it somewhat more precisely.

Certainly, if we define “consciousness” as “something magical beyond physics”, we have defined it outside the realm of testability, because our ability to measure is limited to the scope of physics. It would help to define consciousness within the scope of physics so that it is testable.

If consciousness is not magical, its mechanisms are within the realm of physics. It should be possible to define it in terms of physical mechanisms, observations, and measurements. Although subjective experience is internal to the mind, responses to queries about that internal subjective experience should be able to externalize views of that model.

Subjective Experience

Let’s define “subjective” and “experience”.

Experience is a model of the world. It must capture the essential existents within the subject’s scope of contact across space and time. Past and present models must reflect actual events. Future models must be recognizable as simulations of potential events, not actual events that have come to pass.

Subjective means that within the experience, the model includes a representation of the subject’s self. The subject must be able to identify itself, its own name, and its characteristics. The subject must be able to recognize the relationships between self and other entities in the model. It must be able to recognize its own identity, characteristics, and relationships consistently across space and time.

There is a separate topic of “free will” or perhaps more testable “will”. I will leave that as an exercise outside the scope of this analysis. I don’t consider will to be essential to consciousness.

Orientation

Medical professionals typically assess whether someone is mentally present in reality (often referred to as being “oriented” or having intact reality testing) using three key criteria: orientation to person, place, and time.

With regard to orientation to person, does the individual know who they are and can they identify others around them? This includes awareness of their own identity (name, age, etc.) and recognizing key people, such as family members or caregivers.

Orientation to place and time is about the recognition of the relationship of the subject to the other entities in the world. This requires a world model that includes self and one’s relation spatially and temporally to others. Historical memories should be relatively accurate. Visions of potential future events should be understood to be imaginary, not hallucinated as actual events that have come to pass.

Accuracy

Human memories and situational awareness are far from perfect. Past memories are pruned, rolled up, and summarized. Our model of the present is highly selective and biased in how we interpret percepts. (If it looks and sounds like a duck…)

I cannot at this time specify what accuracy should be the threshold for whether a model exhibits consciousness. Certainly greater than zero across samples in time. Certainly less than 100% is acceptable. What level would be convincing?

Rather than considering accuracy to be a scalar, we might consider certain types of inaccuracy to be disqualifying.

Errors in identifying self (as a unique entity) across space and time are disqualifying.

The above are my half-baked thoughts on the matter. Hopefully, these ideas provide a basis for further refinement.

Use AI to Accelerate Software Development

People think that AI will accelerate software development by generating code. Coding is only a tiny part of the story. So much of software development precedes code, and these activities are usually the blockers that impede technology acceleration.

Before any code can be written, there is the question of what needs to be built. No work can begin until there is a purpose. Work requires time and resources, which require investment. An investment needs a business case. This means getting to know what customers (users) want, what they are willing to pay for, and whether it would be worth building. You are not an AI-accelerated software developer if you have no purpose, if you don’t know what your customers want, or if you can’t justify the investment.

Traditionally, coders working in a commercial setting have relied on product managers, business analysts, and executives to acquire such knowledge and make such decisions. Usually the coders are not subject matter experts. Coders are always asking “what is the requirement?” Coders are paralyzed waiting for answers.

They don’t know the problem domain (the customer’s business) and the terms of art. They don’t have a mental model of the problem space. They don’t know what the actors do (journeys, use cases) and how they collaborate to accomplish their business objectives. They don’t know how these actors do their work, how they need to see their information, and what tasks they perform in certain contexts. They don’t know scale (how many users, how many transactions per hour, how much information of each type). They don’t know cost. They don’t know availability. They don’t know regulatory compliance. They don’t know their customer’s products, pricing, and business policies. There is so much they don’t know about the world, because all they know is how to code.

Coding is always blocked waiting for these responsibilities to be fulfilled by roles who don’t code. The decision makers don’t know the technical side of how to code. The coders don’t know the business side of why or what to code. The collaboration is always impaired by this mismatch in skills and knowledge.

For AI to accelerate software development, the business-focused roles must be merged with the technically focused role. AI tools must be built to fill the skills and knowledge gap for whoever is in the driver’s seat. Human jobs need to be lost and replaced by AI. What is not clear is whether coders will learn to use AI to drive the business, or whether product managers will learn to use AI to code. My bet is on the latter, even though my sympathies and bias are with the former. I don’t see AI gaining the necessary competencies to become entrepreneurial, visionary, and business savvy.

On the other hand, very few people in business-focused roles are any good at it. We need only look at the lack of success that Tim Cook, CEO of Apple, is having in introducing innovative new products relative to what Apple did in the past under the leadership of Steve Jobs, a true visionary and a genius in designing desirable products. All lower-ranked roles are responsible for a lesser subset of Tim Cook’s decision-making. It is a rare gem who can achieve 1% of the success of a Steve Jobs when it comes to product vision and design. Coders who have this kind of talent would have become founders. Perhaps AI can help accelerate this for coders who have such an inclination. Judging by how Tom Bilyeu applies AI to explore business ideas, this is not as far-fetched as I imagine (because of my own ignorance in applying AI in this way).

In the near term, the quality of vibe code generated by AI is hit-or-miss. A great deal of human supervision is needed to make it work. With the progress we are seeing in AI and its accelerating cadence, we should not dismiss the possibility that vibe coding without a heavy supervision burden will become routine. Since this is a goal that many people are pursuing, this outcome is pretty much guaranteed. There is virtually no chance it will NOT happen.

Until recently, I did not think that AI would have the taste and judgement to produce good designs. Then, I was introduced to Cline. It is a coding assistant. You guide it by writing a project brief, which documents your project’s purpose, structure, standards, guidelines, constraints, technology selections, style preferences, and rules. You write these exactly as you normally would, in concise English for human teammates. Amazingly, Cline understands and complies perfectly. Through this experience, I am now confident that an AI can be guided to do good design by documenting design principles, archetypes, patterns, and trade-offs. AI may not be there yet today. There is virtually no chance it will NOT happen.

Personal Assistants

Continuing the series on Revolutionizing the Enterprise, where we left off at Sparking the Revolution, I would like to further emphasize immediate opportunities for productivity improvements that do not need to venture into much-hyped speculative technologies like blockchain and artificial intelligence. Personal assistants fit the bill.

In the previous article, I identified communication and negotiation as skills where software agents can contribute superior capabilities, improving human productivity by offloading tedium and toil. Basic elements of this problem can be solved without applying advanced technology like AI. Machine learning can provide additional value by discerning a person’s preferences and priorities. For example, this person always prefers to reschedule dentist appointments but never reschedules family events to accommodate work. Automating the learning of such rules enables the prioritization of activities to be automated, further offloading cognitive load.
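A deliberately simple Python sketch of learning such a rule from history, standing in for real machine learning: it counts how often each kind of event was moved when a conflict arose, then proposes moving the most flexible one. The event categories, the `HISTORY` records, and the neutral 0.5 default for unseen categories are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical history: (event category, whether the person moved it
# when a scheduling conflict arose).
HISTORY = [
    ("dentist", True), ("dentist", True), ("dentist", False),
    ("family", False), ("family", False),
    ("work", True), ("work", False),
]


def learn_flexibility(history: list[tuple[str, bool]]) -> dict[str, float]:
    """Estimate, per category, the fraction of conflicts where it was moved."""
    moved: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for category, was_moved in history:
        total[category] += 1
        moved[category] += int(was_moved)
    return {c: moved[c] / total[c] for c in total}


def choose_event_to_move(conflict: list[str], flexibility: dict[str, float]) -> str:
    """Propose moving the most flexible event; unseen categories score 0.5."""
    return max(conflict, key=lambda c: flexibility.get(c, 0.5))


if __name__ == "__main__":
    flex = learn_flexibility(HISTORY)
    print(flex)
    print(choose_event_to_move(["family", "dentist"], flex))
```

With this history, dentist appointments score 2/3 and family events 0, so a dentist-versus-family conflict yields a proposal to move the dentist appointment.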

In my own work, I wish I had a personal assistant who could shadow my every move. I want it to record my activities so I can replay them later. I want these activities captured in the most concise and compact form, not merely as audio and video. For example, as I execute commands in a bash shell, I want to record the command line arguments, the inputs, and the outputs, so this textual information can be copied into technical documentation. As I point and click through a graphical user interface, I want these events to be described as instructions (e.g., input “John Doe” in the field labeled “Name” and click on the “Submit” button).
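A minimal Python sketch of the textual recording I have in mind, assuming commands are run through a wrapper rather than captured from a live terminal session (real shadowing would need shell or OS hooks). `CommandRecord`, `run_and_record`, and `describe_gui_event` are hypothetical names; `subprocess.run` and `shlex.join` are standard-library calls.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandRecord:
    """One shell step: the command line, its input, and its output, as text."""
    argv: list[str]
    stdin: str
    stdout: str
    stderr: str
    returncode: int

    def as_transcript(self) -> str:
        """Render the step as text that can be pasted into documentation."""
        lines = [f"$ {shlex.join(self.argv)}"]
        if self.stdin:
            lines.append(f"(input) {self.stdin.rstrip()}")
        lines.append(self.stdout.rstrip())
        if self.returncode != 0:
            lines.append(f"(exit status {self.returncode}) {self.stderr.rstrip()}")
        return "\n".join(lines)


def run_and_record(argv: list[str], stdin: str = "") -> CommandRecord:
    """Run a command without a shell and capture its input and output as text."""
    proc = subprocess.run(argv, input=stdin, capture_output=True, text=True)
    return CommandRecord(argv, stdin, proc.stdout, proc.stderr, proc.returncode)


def describe_gui_event(action: str, label: str, value: str = "") -> str:
    """Turn a GUI event into a written instruction."""
    if action == "input":
        return f'Input "{value}" in the field labeled "{label}".'
    if action == "click":
        return f'Click on the "{label}" button.'
    raise ValueError(f"unsupported action: {action}")


if __name__ == "__main__":
    record = run_and_record([sys.executable, "-c", "print(input().upper())"], "hello\n")
    print(record.as_transcript())
    print(describe_gui_event("input", "Name", "John Doe"))
    print(describe_gui_event("click", "Submit"))
```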

With a history of my work in this form, this information will be useful for a number of purposes.

  • Someone who pioneers a procedure will eventually need to document it for knowledge transfer. Operating procedures teach others how to accomplish the same tasks by showing how they were done.
  • Pair programming is often inconvenient when team members are remote from each other and separated by time zones. An activity log can enable two remote workers to collaborate more effectively.
  • Context switching between tasks is expensive in terms of organizing one’s thoughts. Remembering what a person was doing, so that they can resume later, would save time and improve effectiveness.

The above would be a good starting point for a personal assistant without applying any form of AI or analytics. Then, imagine what might be possible as future enhancements. Procedures can be optimized. Bad habits can be replaced by better ones. Techniques used by more effective workers can be taught to others. Highly repeatable tasks can be automated to remove that burden from humans.

I truly believe the places to begin innovating to revolutionize the enterprise are the mundane and ordinary, which machines have the patience, discipline, and endurance to perform better than humans. More ambitious technological capabilities are good value-adds, but we should start with the basics to establish personal assistants in the enterprise as participants in ordinary work, not as esoteric tools in obscure niches.

[Image credit – Robotics and the Personal Assistant of the Future]

Sparking the Revolution

In my previous article, Revolutionizing the Enterprise, I provided an outlook for how emerging technologies may help to transform how we do work. Now, let’s explore how we might provide the spark that starts the fire to burn down the old and welcome the new. The world does not change in a radical way without a progression of steps that pave a path for getting from here to there. What might the first step be to spark the revolution? How do we introduce robots and AIs as personal assistants into the regular work lives of employees?

We need only look to our daily struggles to identify where every person would see the value of machine intelligence. Organizing a meeting among several participants can be challenging. You need to find a convenient time when every participant is available. You need to find a suitable venue that can accommodate everyone. If folks need to travel, the complexity rises enormously, because each traveler’s attendance then depends on successfully booking travel arrangements. The risk that a single unsatisfied requirement makes the meeting non-viable rises with each participant and their special needs. If the meeting needs to be moved to accommodate certain participants, this triggers a storm of renegotiation, as calendars are readjusted through a cascade of changes to other appointments, each having its own priority and constraints.

This kind of negotiation among a network of people is virtually impossible to accomplish by humans among each other, because of the latency for human communications. However, if every human could be represented by an agent, who could negotiate on their behalf, this kind of activity could become painless. Imagine how many hours of phone tag, email, and travel booking could be saved. Even if an agent were not entrusted to finalize decisions on travel booking, all of the negotiation and arrangements could be prepared and presented for final approval by the human; or even involve the human at key decision points by presenting a short list of options to guide the way forward for the agent.
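The availability step of that negotiation can be sketched in Python, assuming each participant’s agent shares only its free time as sorted, non-overlapping, half-open intervals in whole hours (a simplification). Intersecting the lists pairwise yields the times everyone is free; venues, travel, and priorities would layer further constraints on top. The function names and calendar data are hypothetical.

```python
from typing import Optional

Interval = tuple[int, int]  # [start, end) in whole hours


def intersect(a: list[Interval], b: list[Interval]) -> list[Interval]:
    """Intersect two sorted lists of non-overlapping free intervals."""
    i = j = 0
    out: list[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def earliest_common_slot(calendars: dict[str, list[Interval]],
                         duration: int) -> Optional[Interval]:
    """Find the earliest slot of `duration` hours when everyone is free."""
    common: Optional[list[Interval]] = None
    for free in calendars.values():
        common = free if common is None else intersect(common, free)
    for start, end in common or []:
        if end - start >= duration:
            return (start, start + duration)
    return None  # no fit: the agents must renegotiate or ask their humans


if __name__ == "__main__":
    calendars = {
        "alice": [(9, 12), (13, 17)],
        "bob": [(10, 11), (14, 16)],
        "carol": [(9, 17)],
    }
    print(earliest_common_slot(calendars, 2))  # (14, 16)
```

Here everyone is free from 10 to 11 and from 14 to 16, so a two-hour meeting lands at (14, 16), while a three-hour request returns `None`, which is the cue for an agent to renegotiate or escalate to its human.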

I believe ordinary, mundane problems such as this one, which every person has experienced, will serve as an opportunity to introduce machine intelligence to work alongside us. Off-loading such unproductive and non-creative toil to an automated personal assistant would be a welcome change, seen as another useful tool rather than a radical development. And that’s how the revolution should begin.

Revolutionizing the Enterprise

It has been over five years since I wrote an article titled Enterprise Collaboration, in which I identified the need for innovations to transform how people do their work. Since then, we have seen no significant advances. Enterprise applications continue to move very slowly to the cloud, driven primarily by cost efficiencies with little noticeable functional improvement except at the margins (big data analytics, social, search, mobile, user experience).

Where can we go from here?

I still firmly believe that a global work force needs to be decoupled in space and time. Mobility and cloud services will continue to provide an improving platform to enable work to be performed at any time from wherever people want. We should enable people to do their work as effectively from the office as from home, in their vehicles, during air travel, at the coffee shop, or anywhere else they happen to be. Advances in computing power, miniaturization, virtual reality, alternative display and input technologies (e.g., electronic skin, heads-up displays, voice recognition, brain-computer interfaces), and networking will continue to provide an improving platform for inventing better ways of doing work and play. This path does not need much imagination to foresee.

Recently, we have seen an uptick in applying artificial intelligence. Every major company seems to be embracing AI in some form. Image recognition and natural language processing are areas that have been researched for decades, and they are now being employed more ubiquitously in everyday applications. These technologies lower the barrier between the virtual world and the real world, where humans want to interact with machine intelligence on their own terms.

However, I believe an area where AI will provide revolutionary benefits is in decision support and autonomous decision-making. So much of what people do at work is tedium that they wish could be automated. Some forms of tedium are drudgery, such as reporting status and time to management, organizing and scheduling meetings among team members, planning work and tracking progress, and keeping people informed. These tasks are routine and time-consuming, not creative and value-producing. Machines can interact among themselves to negotiate on behalf of humans for the most mundane tasks that people don’t really care too much to be involved in. Machines can slog through an Internet full of information to gather, prune, and organize the most relevant set of facts that drive decisions. Machines can carry out tasks on their own time, freeing up humans to work on more important or interesting things.

Personal assistants as computing applications are a new phenomenon. Everyone has heard of Amazon Echo and Google Assistant by now. I can imagine advances in this capability expanding into all areas of work and personal life to help off-load tedium. As AI becomes more capable, we should see them taking over mundane tasks, like research (e.g., comparing products to offer recommendations toward a purchasing decision, comparing providers toward recommending a selection), planning, coordinating, note taking, recalling relevant information from memory, distilling large volumes of information into a concise summary, etc. Eventually, AI will even become capable enough to take over mundane decision-making tasks that a person no longer cares to make (e.g., routinely replenish supplies of consumables from the lowest priced supplier, repetitive tasks).

The other phenomenon that will revolutionize the workplace even more than in the past is robotics. Robots have been revolutionizing manufacturing for decades by replacing repetitive, error-prone, labor-intensive tasks with perfectly reproducible, error-free automation. We are seeing politics influence businesses to apply robots where human labor sufficed in the past, purely because of the increasing cost of labor. Minimum wage legislation (bans on jobs that pay less than some mandated floor in wages) that raises labor costs above the value produced will force businesses to rethink how to operate profitably. Beyond entry-level jobs, such as fast food service, self-driving cars and trucks are already in trials for ride-sharing and long-haul cargo transport. As robots become more dexterous, mobile, compact, and intelligent, we will see them become personal assistants that perform physical tasks, just as we see them in software form perform computing tasks. We should anticipate that robots will serve in a broad spectrum of capacities, from low-skilled drudgery to highly skilled artisans and professionals.

The future enterprise will involve a work force where humans, AIs, and robots collaborate closely. Humans have a comparative advantage in performing creative and path-finding tasks with ill-defined goals, many unknowns, and little experience to draw upon. Robots and AIs have a comparative advantage in performing repetitive, well-defined, and tedious tasks. Together, they will transform the enterprise in ways that we have never seen before.