
World Models for Artificial Intelligence

My thoughts about world models have evolved over time. The question is how artificial intelligence (AI) should represent world models so that it can reason more accurately about the real world. I started off wondering whether the weights of a neural network need to represent abstract syntax tree (AST) nodes, the in-memory structures a compiler builds when it parses a language. I currently think that existing LLMs, given their understanding of language, should be able to reason perfectly well against structured (schema-compliant) documents that have an ontology.

Evolution of my thoughts on this topic

2024-05-28

My thoughts return to the question: how can we represent an AST in an encoding that a neural network can do inference on? A generalized AST would provide a way of modeling the world. This would enable accurate reasoning that is not possible with a lexical tokenization of language.

2024-08-04

This is why I believe we need to research how to represent an abstract syntax tree as tokens (each AST node is a token) that a neural network can operate on. I was talking to a friend yesterday about this, and the insight he added was that NN model weights perform a mathematical transformation akin to a Fourier transform. This gave me a lot of hope for my idea that an AST node (as the unit that represents any concept generically) can be encoded as a token (a number). Once we can do this, we can represent any language, and therefore any mental model of anything in the real world (not just tokens as word fragments, like the current generation of LLMs).
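A minimal sketch of the "AST node as token" idea, assuming Python's standard `ast` module as the parser: each node type becomes an integer token, emitted in pre-order together with its depth so the tree shape can be rebuilt from the flat sequence. The function name `ast_to_tokens` is hypothetical, and this encoding deliberately drops leaf values such as identifiers and literals; a real scheme would need a way to encode those as well.

```python
import ast

def ast_to_tokens(source: str) -> tuple[list[tuple[int, int]], dict[str, int]]:
    """Encode a Python AST as (node-type token, depth) pairs, one per node.

    Nodes are visited in pre-order (parent before children). Each node type
    gets a stable integer ID from a vocabulary built on the fly; the depth
    lets the tree shape be rebuilt from the flat sequence.
    """
    vocab: dict[str, int] = {}
    tokens: list[tuple[int, int]] = []

    def visit(node: ast.AST, depth: int) -> None:
        # Assign the next free ID the first time a node type is seen.
        token = vocab.setdefault(type(node).__name__, len(vocab))
        tokens.append((token, depth))
        for child in ast.iter_child_nodes(node):
            visit(child, depth + 1)

    visit(ast.parse(source), 0)
    return tokens, vocab
```

For `x = 1 + 2`, the nodes visited are Module, Assign, Name, Store, BinOp, Constant, Add, Constant, so the token IDs are `[0, 1, 2, 3, 4, 5, 6, 5]`: the repeated `Constant` node reuses its ID, which is what lets a model learn one representation per concept.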

2024-08-25

Another avenue to explore is to use an initial encoding to determine what kind of model to use. Classify the problem as natural language, lexical, math, logic, programming (each language is distinct), chemistry, etc. Every domain has its own way of modeling the world. Using this classification, parse the information again using that domain-specific model. Now the AI can reason about the problem using an optimal representation.
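A minimal sketch of classify-then-reparse. The keyword and regular-expression heuristics in `classify` are placeholders assumed for illustration (a real system would presumably use a learned classifier), and the domain list is trimmed to four; each domain then gets its own parser from the Python standard library.

```python
import ast
import json
import re
from typing import Any, Callable

def classify(text: str) -> str:
    """Crude first pass: guess which domain a piece of text belongs to."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        return "json"
    if re.fullmatch(r"[\d\s.+\-*/()]+", stripped):
        return "math"
    if re.match(r"(def|class|import)\b", stripped):
        return "python"
    return "natural_language"

# Domain-specific parsers: each turns raw text into that domain's own structure.
PARSERS: dict[str, Callable[[str], Any]] = {
    "json": json.loads,
    "math": lambda s: ast.parse(s.strip(), mode="eval"),  # arithmetic expression tree
    "python": ast.parse,                                  # full module AST
    "natural_language": str.split,                        # stand-in for a real NL parser
}

def parse(text: str) -> tuple[str, Any]:
    """Classify first, then parse again with the domain-specific model."""
    domain = classify(text)
    return domain, PARSERS[domain](text)
```

The design point is the second pass: after classification, the same text is parsed with a representation suited to its domain (an expression tree for arithmetic, a module AST for code) instead of one lexical tokenization for everything.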

2025-10-06

It might be too divergent from today’s neural network paradigm for neurons to represent an AST directly. I now believe we don’t need to do that, because we don’t need a metamodel at the neuron level to represent any language.

We already have metamodels that today’s LLMs understand well: YAML and JSON. Any domain-specific language can be represented in these formats. A world model can be represented as a document in one of these formats, according to some schema.

Perhaps that is how a human brain represents a world model anyway. We have a concept, a document. Concepts have relationships, which are links. We reason based on large collections of tokens together: a set of concepts and their relationships.
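A sketch of concepts as documents and relationships as links, using a hypothetical JSON layout in which each concept document names related concepts by id. Reasoning here is simply following links: the illustrative helper `ancestors` walks the `is_a` links upward.

```python
import json

# A tiny world model: each concept is a document; relationships are links
# (by id) to other concept documents. Ids and relation names are illustrative.
WORLD = json.loads("""
{
  "concepts": {
    "cat":    {"kind": "animal", "links": {"is_a": "mammal", "eats": "mouse"}},
    "mouse":  {"kind": "animal", "links": {"is_a": "mammal"}},
    "mammal": {"kind": "class",  "links": {"is_a": "animal"}},
    "animal": {"kind": "class",  "links": {}}
  }
}
""")

def ancestors(world: dict, concept: str) -> list[str]:
    """Follow 'is_a' links upward to collect every class a concept belongs to."""
    chain: list[str] = []
    current = world["concepts"][concept]["links"].get("is_a")
    # Stop at the top of the hierarchy, or if the links ever form a cycle.
    while current is not None and current not in chain:
        chain.append(current)
        current = world["concepts"][current]["links"].get("is_a")
    return chain
```

Here `ancestors(WORLD, "cat")` returns `["mammal", "animal"]`: the conclusion comes from a set of concepts and their relationships taken together, not from any single token.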

I have no feel for how my human neurons represent primitive data types like numbers. It certainly isn’t lexical (a string of digits). However, this is where the document-format world model has an advantage. The NN doesn’t have to take on the burden of representing numbers the way a human brain does, because an NN can work with adjunct brains through documents. These adjunct brains are programs that do computation precisely, and an AI can make use of them through the Model Context Protocol (MCP).

An AI doesn’t need to understand numbers and math. We have seen how poorly LLMs do math by themselves. An AI merely has to use an MCP server to send a document representing the math problem to a math solver (e.g., Maple or Mathematica), and the answer comes back solved exactly. The AI’s responsibility is only to formulate the document request and consume the document response. That is a language the LLM fully understands how to process.
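A sketch of that request/response round trip. The envelope follows the JSON-RPC 2.0 shape that MCP uses for a `tools/call` request, but the tool name `solve_linear`, its arguments, and `stand_in_solver` are hypothetical; the stand-in uses exact rational arithmetic from Python's `fractions` module in place of a real solver such as Maple or Mathematica.

```python
import json
from fractions import Fraction

def make_request(a: str, b: str, c: str, request_id: int = 1) -> str:
    """Formulate the request document: solve a*x + b = c for x."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "solve_linear",  # hypothetical tool name
            "arguments": {"a": a, "b": b, "c": c},
        },
    })

def stand_in_solver(request_json: str) -> str:
    """Stand-in for an exact math server: rational arithmetic, no floats."""
    request = json.loads(request_json)
    args = request["params"]["arguments"]
    a, b, c = (Fraction(args[k]) for k in ("a", "b", "c"))
    if a == 0:
        result = {"content": [{"type": "text", "text": "a must be nonzero"}], "isError": True}
    else:
        result = {"content": [{"type": "text", "text": str((c - b) / a)}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})

def read_answer(response_json: str) -> str:
    """Consume the response document: pull out the text content."""
    return json.loads(response_json)["result"]["content"][0]["text"]
```

For example, `read_answer(stand_in_solver(make_request("1", "0.1", "0.3")))` returns `"1/5"`, whereas the float expression `0.3 - 0.1` evaluates to `0.19999999999999998`. The model's only job is writing and reading the two documents.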

That can be generalized to everything within the scope of computing.

2026-01-20

The more that AI coding agents are guided and driven by markdown documents, the more I am convinced that a world model is merely a set of documents. Therefore, comprehending and updating documents to maintain durable memory is squarely in an LLM’s wheelhouse.

2026-05-23

Consider this. No matter how we model things as data structures in memory, we almost always define a serialization format for network transfer or durable storage. Today, the preferred file format is almost always YAML or JSON (or a variant of one of them).

With LLMs trained for coding, tokenization of YAML and JSON should work well. Harnesses can query against the structure using jq or ad hoc Python. That enables sophisticated reasoning against a model represented in those formats, especially when it complies with a schema, as most API payloads do.
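As an illustration of querying structure rather than text, here is ad hoc Python that does the same job as the jq filter `[.spec.template.spec.containers[] | {key: .name, value: .resources.limits}] | from_entries`. The document is a made-up, Kubernetes Deployment-shaped example, and JSON is used instead of YAML only to stay within the standard library.

```python
import json

# A Kubernetes Deployment-shaped document (values are made up for illustration).
DOC = json.loads("""
{
  "kind": "Deployment",
  "spec": {
    "replicas": 3,
    "template": {"spec": {"containers": [
      {"name": "web", "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
      {"name": "sidecar", "resources": {"limits": {"cpu": "100m", "memory": "64Mi"}}}
    ]}}
  }
}
""")

def container_limits(doc: dict) -> dict[str, dict]:
    """Map each container name to its resource limits."""
    containers = doc["spec"]["template"]["spec"]["containers"]
    return {c["name"]: c["resources"]["limits"] for c in containers}
```

Because the path through the document is fixed by the schema, the same query works on any compliant document; the LLM only needs to know the schema, not re-read the text.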

My hypothesis: a world model is governed by some ontology that determines its in-memory structures, its wire protocol for an API, and its file format for durable storage. It is therefore natural to think of world models as just documents in YAML or JSON with a schema, along with an ontology that describes in natural language the meaning of the classes, properties, relationships, and behavior. That means LLMs can reason perfectly well against real-world models as documents.

LLM-based agents will reason with world model documents better as we invent better representations of ontology to drive the reasoning. Today, we have tool names and tool descriptions in MCP, and we have frontmatter in markdown for skills. We need to do more work in this area to make any JSON Schema or YAML schema semantically understandable through an ontology.
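A sketch of one small step in that direction: JSON Schema's `title` and `description` annotations already carry natural-language meaning, so a harness could render them as plain sentences next to the data. The `WorkerNode` class, its properties, and the `ontology_text` helper are hypothetical.

```python
import json

# A JSON Schema fragment whose "description" annotations double as a small
# natural-language ontology. Class and property names are illustrative.
SCHEMA = json.loads("""
{
  "title": "WorkerNode",
  "description": "A machine in a cluster that runs application containers.",
  "type": "object",
  "properties": {
    "shape": {"type": "string",
              "description": "Compute shape: CPU architecture, CPU count, memory, and boot volume."},
    "pool":  {"type": "string",
              "description": "The node pool this node belongs to; nodes in a pool share a shape."}
  },
  "required": ["shape"]
}
""")

def ontology_text(schema: dict) -> str:
    """Render schema annotations as plain sentences an LLM can read alongside the data."""
    lines = [f"{schema['title']}: {schema['description']}"]
    required = set(schema.get("required", []))
    for name, prop in schema.get("properties", {}).items():
        flag = "required" if name in required else "optional"
        lines.append(f"- {name} ({prop['type']}, {flag}): {prop['description']}")
    return "\n".join(lines)
```

This only covers classes and properties; relationships and behavior would need richer annotations than JSON Schema's `description` keyword provides.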

Applied Cosmology: Dimensions and Degrees of Freedom

Today’s lesson in Applied Cosmology: dimensions and degrees of freedom. In physics, Minkowski spacetime has 4 dimensions (3 spatial dimensions and 1 time dimension), expressed as X⁴. In curved spacetime, the number of degrees of freedom for X⁴, that is, the number of parameters needed to specify this model, is fourteen (14).

  • 4 coordinates for identifying a point in spacetime (x, y, z, t)
  • 3 rulers in the space dimensions to measure distance
  • 1 clock in the time dimension to measure duration
  • 3 protractors (x-y, y-z, z-x) to measure the angle of each space dimension with respect to the other space dimensions
  • 3 protractors (x-t, y-t, z-t) to measure the angle of each space dimension with respect to the time dimension
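The tally above can be written as a worked sum. As one possible reading, the ten measuring parameters (3 rulers, 1 clock, 6 protractors) match the number of independent components of a symmetric 4×4 metric tensor g_{μν}: four diagonal entries and six off-diagonal ones.

```latex
\underbrace{4}_{\text{coordinates}}
+ \underbrace{3}_{\text{rulers}}
+ \underbrace{1}_{\text{clock}}
+ \underbrace{3}_{\text{space--space protractors}}
+ \underbrace{3}_{\text{space--time protractors}} = 14

\underbrace{3 + 1}_{\text{diagonal}} + \underbrace{3 + 3}_{\text{off-diagonal}}
= \left.\frac{n(n+1)}{2}\right|_{n=4} = 10
```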

In the same way, we wish to identify configuration dimensions with respect to separation of concerns. The number of separate concerns is the number of dimensions.

Within each dimension, each concern has many parameters. The number of dimensions is modest (fewer than a dozen?), while the total number of degrees of freedom (parameters) is large (hundreds?).

For example, the horizontal scaling dimension is parameterized for the platform and infrastructure by the number of worker nodes in a cluster. Within an application component (e.g., a Deployment or StatefulSet), horizontal scaling is parameterized by the replica count.

The vertical scaling dimension is parameterized for the platform and infrastructure by the compute shape (CPU architecture, CPU, memory, boot volume) of each worker node. Within an application component, vertical scaling is parameterized by the CPU, memory, and storage requests and limits of each container in the pod template. Other dimensions of interest are:

  • high availability
  • disaster recovery
  • workload complexity
  • workload scale
  • workload isolation
  • security isolation (and many more)
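A sketch of the dimension/parameter split as a data structure, filled in only for the two dimensions described above; the parameter names are hypothetical labels for the settings the text mentions. It shows how few keys sit at the top level compared with the parameters beneath them; the other dimensions listed would be further top-level keys.

```python
# Hypothetical configuration model: each dimension (a separate concern) maps
# to the parameters (degrees of freedom) that tune it, split by layer.
CONFIG_DIMENSIONS: dict[str, dict[str, list[str]]] = {
    "horizontal_scaling": {
        "platform": ["worker_node_count"],
        "application": ["replicas"],
    },
    "vertical_scaling": {
        "platform": ["cpu_architecture", "cpu", "memory", "boot_volume"],
        "application": [
            "cpu_request", "cpu_limit",
            "memory_request", "memory_limit",
            "storage_request", "storage_limit",
        ],
    },
}

def count_dimensions(model: dict[str, dict[str, list[str]]]) -> int:
    """Number of separate concerns."""
    return len(model)

def count_degrees_of_freedom(model: dict[str, dict[str, list[str]]]) -> int:
    """Total number of parameters across all dimensions and layers."""
    return sum(len(params) for layers in model.values() for params in layers.values())
```

With just these two dimensions the model already has 2 dimensions and 12 degrees of freedom, and the per-container parameters repeat for every container in the pod template, which is how the total climbs toward the hundreds.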

intent modeling as a programming language

This article explores how intent modeling can serve as a programming language for launching services with agility.

In Tom Nolle’s blog article “What NFV Needs is ‘Deep Orchestration’!”, he identifies the need for a modernized Business Support System and Operations Support System (BSS/OSS) to improve operational efficiency and service agility by extending service-level automation uniformly downward into the network resources, including the virtual network functions and the NFV infrastructure.

Service orchestration is the automation of processes to fulfill orders for life-cycle-managed information and communications technology services. Traditionally, this process automation problem has been solved through business process modeling and workflow management, which involves a great deal of system integration to glue together heterogeneous software components that do not naturally work together. The focus of process modeling is on “how” to achieve the desired result, not on “what” that result is. The “what” is the intent; the content of the order captures the intent.
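To make the “what” versus “how” distinction concrete, here is a sketch of a technique different from process modeling: desired-state reconciliation, the pattern Kubernetes controllers use. The order is a desired-state document, and one generic loop derives the actions by comparing it with the current state. Service names and fields are hypothetical.

```python
# The order captures "what" (the intent) as desired state; a generic
# reconciler derives "how" (the actions) by comparing it with current state.

def reconcile(desired: dict[str, dict], current: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (action, service) pairs that would move current toward desired."""
    actions: list[tuple[str, str]] = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

With `desired = {"vpn": {"sites": 3}, "firewall": {"policy": "strict"}}` and `current = {"vpn": {"sites": 2}, "dns": {}}`, the result is `[("update", "vpn"), ("create", "firewall"), ("delete", "dns")]`. Nobody scripted those steps; they follow from the intent.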

To achieve agility in launching services, we must be able to model services in a way that lets a service provider redefine a service to suit the current business need. This modeling must be done by product managers and business analysts, the experts in the service provider’s business. Any involvement of software developers and system integrators necessarily requires programming at a level of abstraction far below the concepts that are natural to the service provider’s business. The software development life cycle is costly and risky because the abstractions are so mismatched with the business. When service modeling results directly in its realization in a fully automated, executable runtime, without involving other people in any software development activity, it becomes Programming for Non-programmers.

The key is Going Meta. The “what” metadata is the intent modeling. The “how” metadata is the corresponding fulfillment and provisioning behavior (service-level automation). If the “what” and “how” can be designed as a language expressed in modular packages, reusable by assembling higher-level intent from lower-level components, this would provide an approach that delivers the service agility users are looking for. Bundling services together and using lower-level services as resources that support a higher-level service are techniques already familiar to users who design services. When users express themselves in this language, they are in fact programming, but because the language is made up entirely of abstractions that are familiar and natural to the business, it does not feel burdensome. General-purpose programming languages like Java feel burdensome because their abstractions are for a low-level computational machine, not a high-level, business-oriented machine for service-level automation of human intent.
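A sketch of reusable intent packages, assuming a hypothetical catalog in which each service lists the lower-level components it is assembled from. `security_bundle` is a service in its own right and is also used as a resource by `business_internet`; `decompose` expands a higher-level intent down to the primitive resources that a provisioning (“how”) layer, not shown here, would act on.

```python
# Hypothetical service catalog: each entry is a modular package whose "what"
# is the list of lower-level components it is assembled from. Entries with no
# components are primitive resources handled by a provisioning layer.
CATALOG: dict[str, list[str]] = {
    "business_internet": ["access_line", "managed_router", "security_bundle"],
    "security_bundle": ["firewall", "ddos_protection"],
    "access_line": [],
    "managed_router": [],
    "firewall": [],
    "ddos_protection": [],
}

def decompose(service: str, catalog: dict[str, list[str]]) -> list[str]:
    """Expand a higher-level intent into the primitive resources that realize it."""
    components = catalog[service]
    if not components:
        return [service]
    resources: list[str] = []
    for component in components:
        resources.extend(decompose(component, catalog))
    return resources
```

`decompose("business_internet", CATALOG)` yields `["access_line", "managed_router", "firewall", "ddos_protection"]`. Redefining the offer means editing catalog entries, not writing new fulfillment code.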

Our challenge in developing a modernized BSS/OSS is to invent this language for intent modeling for services. An IETF draft titled YANG Data Models for Intent-based NEtwork MOdel attempts to define a flavor of intent modeling for network resources. An IETF draft titled Intent Common Information Model attempts to define a flavor of intent modeling that is very general, but it is far removed from any executable runtime that can implement it, because it is so imprecise (not machine executable). ETSI NFV MANO defines an approach that captures intent as descriptors for network services as network functions and their connections. However, these abstractions are not expressive enough to extend upward into the service layer, across the entire spectrum of network technologies (physical and virtualized), and into the “how” for automation, to enable the composition of resources into services and the utilization of services as resources to support higher level services that can be commercialized. More thought is needed to design a good language for this purpose and a virtual machine that is capable of executing the code that is produced from it.

going meta – the human-machine interface

Anatomy of an n-tier application

A fully functioning web app involves several layers of software, each with its own technology, patterns, and techniques.

At the bottom of the stack is the database. A schema defines the data structures for storage. A query language is used to operate on the data. Regardless of whether the database is relational, object-relational, NoSQL, or some other type, the programming paradigm at the database tier is distinctly different from, and quite foreign to, the layers above.

Above the database is the middle tier or application server. This is where the server-side business logic, APIs, and Web components reside.

There is usually a set of persistent entities, which provide an object abstraction of the database schema. The database query language (e.g., SQL) may be abstracted into an object query language (e.g., JPQL) for convenience. The majority of CRUD (create, read, update, delete) operations can be done naturally in the programming language without needing to formulate statements in the database query language. This provides a persistent representation of the model of the application.
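For readers outside Java, a minimal sketch of the same idea in Python, with the standard `sqlite3` module standing in for JPA: a `Customer` entity and a hypothetical `CustomerRepository` that keeps SQL out of the calling code, so CRUD reads as ordinary method calls. This is an analogy, not how JPA itself is implemented.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class Customer:
    """Persistent entity: an object view of one row in the customer table."""
    id: Optional[int]
    name: str

class CustomerRepository:
    """Hides SQL behind CRUD methods so callers work with objects, not statements."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def create(self, customer: Customer) -> Customer:
        cur = self.conn.execute("INSERT INTO customer (name) VALUES (?)", (customer.name,))
        customer.id = cur.lastrowid  # the database assigns the primary key
        return customer

    def read(self, customer_id: int) -> Optional[Customer]:
        row = self.conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def update(self, customer: Customer) -> None:
        self.conn.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (customer.name, customer.id)
        )

    def delete(self, customer_id: int) -> None:
        self.conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
```

Usage: `repo = CustomerRepository(sqlite3.connect(":memory:"))`, then `repo.create(Customer(None, "Ada"))`; the caller never formulates a SQL statement.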

Above the persistent entities is a layer of domain services. The transactional behavior of the business logic resides in this layer. This layer provides the local API that encapsulates the essence of the application functions.

The domain services are usually exposed as SOAP or RESTful services to remote clients, for access from Web browsers and for machine-to-machine integration. This requires that JSON and/or XML representations be derived from the persistent entities (e.g., using JAXB). This provides a serialized representation of the model of the application.
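A minimal sketch of deriving both serialized representations from one entity, with Python's `json` and `xml.etree.ElementTree` playing the role JAXB plays in Java. The `Customer` class and the element-naming convention (lowercased class name as the root element) are assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

@dataclass
class Customer:
    id: int
    name: str

def to_json(entity) -> str:
    """JSON binding: one key per entity field."""
    return json.dumps(asdict(entity))

def to_xml(entity) -> str:
    """XML binding: a root element named after the class, one child per field."""
    root = ET.Element(type(entity).__name__.lower())
    for field, value in asdict(entity).items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For `Customer(1, "Ada")`, `to_json` returns `{"id": 1, "name": "Ada"}` and `to_xml` returns `<customer><id>1</id><name>Ada</name></customer>`. Both are derived from the entity's fields, which is exactly the translation work the later sections describe as repetitive.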

We finally come to the presentation layer, which is divided into server-side components residing in the application server and client-side components that execute in the Web browser. Usually there is a presentation-oriented representation called a view-model, which matches the information rendered on views or entered on forms. The views and controls are constructed from HTML, CSS, and JavaScript. The programming paradigm in these technologies is distinctly different from that of the layers below.

Extending the application

Let’s examine what it would take to extend an application with a property of a simple type (e.g., string) on an entity. The database schema would need to be altered. A persistent entity would need a field, getter and setter methods, and a binding between the field and a column in the database schema. The property may be involved in the logic of the domain services. Next, the JSON and XML binding objects would need to be augmented with the property, and logic would be added to transform between these objects and the persistent entities used by the domain services. At the presentation layer, the view-model would be augmented with the property to expose it to the views. Various views that show an entity’s details and search results would likewise be enhanced to render the property. For editing and searching, a field would need to be added to forms, with validation of any constraints associated with that property and on-submit transaction handling.

That is an awful lot of repetitive work at every layer. There are many technologies and skill sets involved. Much of the work is trivial and tedious. The entire process is far from efficient. It is worse if there is division of labor among multiple developers who require coordination.

A better platform

When confronted with coordinating many concomitant coding activities to accomplish a single well-defined goal, it is natural for an engineer to solve the more general problem rather than doing tedious work repeatedly. The solution is to “go meta”: instead of programming inefficiently, develop a better language to program in. Programming evolved from machine language to assembly language so that humans could express instructions more intuitively. Assembly gave way to structured languages, with a long history of advances in control and data flow. Programming languages have evolved in conjunction with virtualization of the machine (i.e., bytecode) to provide better abstractions of software and hardware capabilities. In the spirit of Guy L. Steele’s Growing a Language talk from OOPSLA ’98, components, libraries, and frameworks have been developed in programming languages that support extending the language itself, within limits. All of these innovations continually raise the level of abstraction to increase human productivity.
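A sketch of what going meta could look like for the property-extension scenario above, under the assumption of a hypothetical metamodel format: the entity is described once, as data, and small generators derive the database DDL, the JSON binding, and the HTML form fields from it. Adding a property such as `email` becomes a one-entry change to the metamodel rather than edits across every tier.

```python
import html
import json

# Hypothetical metamodel: the entity is described once, as data.
CUSTOMER = {
    "entity": "customer",
    "properties": [
        {"name": "name", "type": "string", "required": True},
        {"name": "email", "type": "string", "required": False},  # the newly added property
    ],
}

SQL_TYPES = {"string": "TEXT", "integer": "INTEGER"}
INPUT_TYPES = {"string": "text", "integer": "number"}

def ddl(model: dict) -> str:
    """Database tier: generate the table definition."""
    cols = ["id INTEGER PRIMARY KEY"]
    for p in model["properties"]:
        col = f"{p['name']} {SQL_TYPES[p['type']]}"
        cols.append(col + (" NOT NULL" if p["required"] else ""))
    return f"CREATE TABLE {model['entity']} ({', '.join(cols)})"

def to_json(model: dict, record: dict) -> str:
    """Serialized representation: emit only the properties the model declares."""
    return json.dumps({p["name"]: record.get(p["name"]) for p in model["properties"]})

def form(model: dict) -> str:
    """Presentation tier: generate form fields carrying the required constraint."""
    fields = []
    for p in model["properties"]:
        req = " required" if p["required"] else ""
        fields.append(f'<input name="{html.escape(p["name"])}" type="{INPUT_TYPES[p["type"]]}"{req}>')
    return "\n".join(fields)
```

`ddl(CUSTOMER)` produces `CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)`, and `form(CUSTOMER)` emits one `<input>` per property. A real platform would also need migrations for existing tables, which this sketch ignores.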

We are hitting the limits of what can be expressed efficiently in today’s languages. We have a database storage abstraction that is separate from server-side application logic, which is itself separate from client-side (Web browser) presentation. There is growing support for database and server-side abstractions to scale beyond the confines of individual machines. Clustering enables software to take advantage of multiple machines to distribute load and provide redundancy in case of failure. However, our abstractions seem to stop at the boundaries between database storage, server-side application logic, and client-side presentation. Hence, we have awkward impedance mismatches when integrating top to bottom. We also have impedance mismatches when integrating heterogeneous application components or services: RESTful and SOAP Web Services technologies cross the boundaries between distributed software components, but this style of control and data flow (remote procedure calls) is entirely foreign to the programming language. That is why we must perform inconvenient translations between persistent entities and their bindings to various serialized representations (JSON, XML).

It seems natural that these pain points will be relieved by again raising the level of abstraction so that these inefficiencies will be eliminated. Ease of human expression will better enable programming for non-programmers. We are trying to shape the world so that humans and machines can work together harmoniously. Having languages that facilitate effective communication is a big part of that. To get this right, we need to go meta.