3D simulation - the key to AI
Keith A Hoyes, June 2002
Inca Research Ltd
Abstract
The proposal is a radical one: that human cognition is significantly weaker than we presume, and AI significantly closer than we dared hope. The human mind is largely made up of tricks and sleights of hand that fill us with pride, but our pedestal might not be quite so high or robust as we imagine. I will argue that human cognition is based largely on 3D simulation and, as such, is particularly vulnerable to co-option by future advances in animation software.
Introduction
‘A is A’ - Ayn Rand
Monsters, Inc. was an entertaining film and, like so many others of its genre, it allowed us, for a time, to enter a world that never really existed. To the computers that generated the images, that world doesn't exist either; it is just so many 1's and 0's. But those bits got transformed into a language we could all understand: a world we can feel, fear and predict. Our eyes similarly take a cryptic stream of bits and somehow also create a world we can feel and predict. If you close your eyes and imagine entering your kitchen to get a soda, you must surely have created a 3D world to navigate. As you re-open your eyes, just how are the dancing 2D patterns you see converted into the 3D virtual realities in your mind? [1]
In the virtual world, when a princess kisses a frog it turns into a prince. The real world does not work that way. For general AI to solve real-world problems, its thinking needs to be bound by real-world behaviors. All significant phenomena in the real world exist in three dimensions, or can be expressed as such. The common language describing computers, bicycles and brains is that of their 3D material existence animated over time (A is A). Further, derivative concepts such as math, stock markets, software and emotion can similarly be bound. If a concept cannot be described in three dimensions over time, it is quite likely false. Like the frog above, it may exist only in some virtual domain. [2]
The real world cannot violate the laws of physics, logic or axioms to enter a fantasy world where frogs turn into princes. But the virtual can. It can be bound or unbound, and when it is bound to physics, it can accurately simulate reality. This has important consequences for AI.
Finally, the real world is bound by time. The virtual is not. It can run time backwards and forwards, and at any speed. It can also accept time discontinuities, freezes and gaps. The virtual can predict events in reality before they have even happened! It can represent the now, the future or the past. It's when the real and the virtual are mixed that the magic really begins.
Deep Blue
Deep Blue operated primarily on just one of the three pillars of intelligence: time travel. I'll explain. The important aspects of a chess game can be simulated essentially perfectly in a computer. At any given instant of real time, the game will obviously be in its current real state. Deep Blue took that state as its starting point. It made predictions, grading each outcome as far into the future as time and resources would permit. Its final move was thus the one calculated to have the greatest probability of success. And the rest, as they say, is history.
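The look-ahead described above can be illustrated with a depth-limited minimax search, the textbook form of game-tree search. This is a minimal sketch, not Deep Blue's actual program (which relied on custom hardware and a far richer chess evaluation): it plays a hypothetical take-away game in which players remove 1 to 3 stones and whoever takes the last stone wins. The `depth` parameter stands in for "as far into the future as time and resources would permit", and positions beyond that horizon are graded as neutral.

```python
MOVES = (1, 2, 3)  # a player may remove 1, 2 or 3 stones per turn (toy rules)


def minimax(stones: int, depth: int, maximizing: bool) -> int:
    """Grade a position: +1 if the maximizing player wins, -1 if they lose,
    0 if the outcome lies beyond the search horizon."""
    if stones == 0:
        # The previous player took the last stone and won.
        return -1 if maximizing else 1
    if depth <= 0:
        return 0  # horizon reached: outcome unknown, graded as neutral
    scores = [
        minimax(stones - m, depth - 1, not maximizing)
        for m in MOVES
        if m <= stones
    ]
    return max(scores) if maximizing else min(scores)


def best_move(stones: int, depth: int) -> int:
    """Simulate each legal move ahead and pick the one with the best grade."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: minimax(stones - m, depth - 1, False))
```

With enough depth the search rediscovers the known winning strategy of leaving the opponent a multiple of four stones; with `depth=1` it can only spot an immediate win.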
Your basic human being is constructed from a virtual reality chamber connected to a carbon-based, self-assembling, nanotech robot with sensors. The chamber is self-learning through exposure to the outside world, and free will stems from a process of grading simulated predictions against genetically pre-programmed and culturally programmed schemas. Without a simulated environment running behind our eyes, we would be totally blind. The stream of data can only ever represent a series of bitmaps; there is no hidden information our eyes can see that a camera can't. If anything, there is less! The images are simply used as cues in the construction of a virtual environment. The contents of that environment are actually drawn from memory, and the bitmaps simply maintain simulation alignment and paint texture over the model surfaces. The experience of consciousness is bound to that simulation.
Feeling and Qualia
For humans there is a first-person relationship between the sense modalities and the affect within the mind. The signal from every sensory receptor, whether for touch, sight, sound, smell or taste, flows into memory somewhere and maps directly to the first-person perspective in a simulated environment. This nexus represents the "eye" or "I" of consciousness. Together with the muscles, the sensorimotor system forges a simulated environment, which is processed by filters that grade simulations according to genetic and cultural presets. These analyses produce the illusion of feeling and emotion, and they are used to guide subsequent cognitive attention.
But this information can have no meaning until it is grounded, through instantiation, in virtual objects that have form and, invariably, a history timeline (behavior). What we actually perceive are virtual objects within our own minds; the senses are used to align these objects to external reality, to consolidate existing memories, or to train new memories if the objects are novel. Once the sensory flows are aligned to precedents, the scene can become known. This is not simply because of the informational connection to a form (the instantiation), but because the virtual object forms have known behavior precedent potentials and a "spatial" home within the mirror-world simulation.
This is the point at which subconsciousness can take hold, by taking these behavior precedent options and running trials "subconsciously", away from the perceived scene, which may or may not be linked to reality through the sensorimotor system. Subconscious simulations are fast, dynamic simulations that seek out narratives with significant grading points. And this is where emotions and feeling states, the "qualia", enter the picture. Genetically, the brain is programmed, and programmable, with a value hierarchy. Just as the eyes are formed in expectation of light, so the brain is formed with memory references in expectation of information against which to compare. We describe the resulting gradings as our feelings and emotions. The most obvious example is the sexual beauty (form) and grace (behavior) of the opposite sex: an emotional recognition that is innate.
The higher speed of subconsciousness is necessary to discover scene outcomes ahead of real-world time, so that actions aligned to goals can be discovered before it is too late, as when catching a ball. Emotional grading of simulations is generally more intense if they are currently aligned to reality through the senses. This motivates action in preference to reflection.
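The ball-catching case can be made concrete with a toy physics simulation. A minimal sketch, assuming simple projectile motion with no air resistance; the function name, time step and launch values are hypothetical. The simulated ball is stepped forward in small increments until it falls back to hand height, which on a computer takes far less time than the real flight, so the catch point is known before the ball arrives.

```python
G = 9.81  # gravitational acceleration in m/s^2; air resistance is ignored


def predict_catch_point(x, y, vx, vy, hand_height=1.0, dt=0.001):
    """Step a simulated ball forward until it falls back to hand height.

    Returns (horizontal position in metres, simulated seconds elapsed).
    Uses semi-implicit Euler steps: update the velocity, then the position.
    """
    t = 0.0
    while True:
        vy -= G * dt  # gravity changes the vertical speed...
        x += vx * dt  # ...and both speeds move the ball
        y += vy * dt
        t += dt
        if vy < 0 and y <= hand_height:  # descending through hand height
            return x, t


x_catch, t_catch = predict_catch_point(0.0, 1.0, 5.0, 5.0)
```

For a ball thrown from hand height at 5 m/s both forwards and upwards, the predicted catch point is roughly 5.1 m away after about one second, close to the analytic flight time of 2·vy/g.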
To summarize: the brain contains information describing object forms and behaviors. These memories are organized into a spatial hierarchy mimicking the external world. Some memories are created by the genes, but the bulk are forged into memory as the sensorimotor system interacts with reality. The contents of consciousness are the scene alignments currently in resonance within memory and reflecting back to the sensory cortices, such that the sensory envelope of the modalities can extend to embrace imaginary worlds. Subconscious processes are resonances not currently aligned to the sensory cortices, although they are fully capable of being emotionally graded, leading to non-conscious feeling states and motivations. The brain needs subconscious processes (i.e. the simulation and grading of memory precedents) in order to discover choices upon which to align consciousness and/or physical actions.
The analog nature of human biology, which interfaces electrical, chemical and cellular processes beneath the computations of "mind", leads to physical pathways and sensitized chemical boundaries that link the computation of feeling to the sensory "feeling" of feeling, that is, to qualia. There is strong evidence to suggest that supplemental sensory regions exist beyond the traditional modalities, mapped instead to an existential feeling space deep inside the body, but in actuality merely extending across chemical boundaries within the brain. Such a mechanism would provide powerful feedback paths of the same "class of feeling sensation" as the touch senses, but without the concomitant mapping to the external body surface. Instead, it is as if some "phantom limb" were at the body's core. Thus the feeling effects of subconscious emotional script analysis will have physical manifestations. Evidence from pharmacology clearly points to the existence of chemical pathways affecting emotion and states of mind.
Like a "second sense", emotion would impart an evolutionary advantage even before the emergence of higher cognition, since primitive emotions can provide effective shortcuts to otherwise complex, or slow and effortful, cognitive processes. This can often be seen directly in children before they learn to subordinate their emotions to their emerging, wider-scope cognition. The same effect occurs in the processing of language, providing shortcuts to understanding. Much of our social language is predominantly emotional, often pre-empting and short-circuiting rational thought, since the whole meaning really is meant to be just the emotion tags. It would often be considered quite disingenuous even to attempt a rational analysis. Spock comes to mind here!
Emotions are not only used by the brain to grade simulations; they can also be linked to objects to help predict their behaviors. Animation within a simulated environment will involve causes, objects (actors) and effects. The emotional states of objects (which include people and animals, as well as inanimate objects) originate from the context, the initial conditions and the historic memory records. This empathic knowledge within the simulation is different from the first-person emotional analysis of subconsciousness used to grade the scripts. It instead provides behavior cues to guide the simulation more accurately. For example, empathic knowledge of joy or anger in a character will significantly affect their expected behaviors and interactions. Even traditionally inanimate objects can be injected with empathic behavior attributes, as evidenced in cartoons by an "angry car" or a "cheerful flower".
1) A physical medium upon which it can bind the predictions - reality
2) A representative medium in which it can model the predictions - virtual
3) A motive force - energy
And for intelligence to speculate on our reality it needs a means to:
1) Access that reality - exposure
2) Perceive that reality - modalities
3) Decipher that reality - instantiation machinery
4) Model that reality - modeling machinery
5) Grade the simulations - emotional machinery
6) Classify and store data - memory machinery
A common presumption is that, due to quantum effects at the very small scale and chaos effects at the very large, prediction, and thus intelligence, will remain illusory. Added uncertainty arises with other biologically constructed animated beings, which are constrained by physical law yet animated by reflex, genetically programmed instinct or their own internal cognitive processes. How can such complexity ever be intelligently predicted? Yet we ourselves appear able, at least to some extent, to overcome all of these effects.
At the atomic level, it is rarely necessary to predict particle animation with certainty, because all significant effects occur in the aggregate, where statistical probability can reliably model behavior. Also, predictions can be constrained to avoid chaotic events (rather than walk a tightrope to get from A to B, you take the footbridge). With biology, statistical prediction still works well on macro events but is limited in the details. So although intelligent prediction does appear to have some constraints, there are still very large areas where it can be relied upon. Within the oceans of chaos there is much dry land upon which to build a rational intellect.
It is further presumed that computers are deterministic and humans non-deterministic. That is, given an initial set of conditions, a computer can only ever follow a predetermined course, whereas a human, with 'free will', can follow his own. For all intents and purposes, both can be considered non-deterministic, though statistically predictable. The study of human twins illustrates how the same largely deterministic genetic inheritance can be affected by real-world chaotic forces: internal brain chemistry guiding emotions; sensory data flow, unique first-person perspectives and the resulting memory structures; differentiated emotional responses; and so on. Add all these variables and more together and you have a combinatorial explosion. Genuine AI will similarly benefit from many of these same forces. Even blind random inputs could easily be added if found beneficial.
Intelligence is non-judgmental and the pursuit of knowledge morally neutral. But any action affecting other conscious entities creates moral hazard. Morality arises from the exigencies of biological survival within a social framework and is dominated by genetic and social programming biases. For instance:
The primary
genetically derived grading process leads to the basic positive moral status
of survival (existence), feeding and mating. Secondary genetic and socially
trained schemas lead to the moral grading of simulations involving cultural
concepts such as cooperation, altruism, group patriotism, treachery, over
consumption, monogamy etc.
Biological intelligence evolved through natural selection. It developed inside a mobile mechanical body with rich sense modalities and programmed survival instincts to grade the information flow. It is protected during a nurture phase in which a subconscious computational process can learn to extract meaning from the sensory modality flow and bind the internal simulation architecture to the physics and object behaviors of the real world. This subconscious simulation builds a personal feeling of familiarity with the outside world; otherwise each moment would forever seem strange and new, as if being met for the very first time. Intelligence then develops gradually as the outcomes of continued interactions with the environment are compared with script predictions. The level of intelligence reached depends both on the initial biological construction and on subsequent interactions with the environment: nature and nurture.
A human infant, exposed to the outside world, gradually learns to interpret 2D visual images as 3D virtual objects. This process is significantly aided by muscular feedback, mobility and the other sensory modalities, together with genetically inspired machinery dedicated to this purpose. The 3D objects, once extracted, exist not in isolation but within their virtual environments and as animated scripts. These gradually build up structured and cross-linked historic memory records, forming an increasingly accurate world model. Objects have textures and behaviors (animated shape morphs and/or motion scripts), together with empathic emotional hues.
Intelligence, as such, really begins to kick in when a sufficiently detailed world model has formed and enough 3D object behaviors have accumulated. The maturing mind can then focus more on the content than on the 3D instantiation (sometimes referred to as binding: translating modality inputs to percepts [6]). An inner virtual world will come to map the external world, and the ability to notice and interpret anomalies between the two will increase, as will the ability to predict events from precedents.
The Human Mind – Learning
When a child awakes from sleep, her mind will resume the virtual model of her room, and her waking eyes will orientate, texturize and track that model. She will experience a feeling of familiarity as her sensory flow matches the virtual model she holds in memory. As she moves, so will the perspective of the model. In fact, a series of subconscious 3D script predictions will have pre-empted her motion even before she gets started, and it is partly those predictions that give rise to her intention to act. As her eyes scan the visual scene, detailed 2D image data will paint accuracy into, and reinforce the authenticity of, her virtual world. It is in this way that she is conscious she is in a room and feels competent to negotiate reality.
The Human Mind – Free Will
An unconscious
process runs memorized script behaviors ahead of real ‘modality’ time to generate
as many predictive script estimates as time or satiation permit. The best
case script can be used to form new learned memories or to animate physical
action by aligning the virtual simulation to the modality inputs, linking
the virtual body animation to motor control in the real body.
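The loop described above, generating script estimates until time or satiation runs out and acting on the best one, resembles what is usually called an anytime, satisficing search. A minimal sketch, assuming hypothetical `propose_script` and `grade` callbacks, a fixed trial budget in place of real time, and a `satiation` threshold at which the search stops early:

```python
import random


def choose_script(propose_script, grade, budget=100, satiation=0.9, seed=0):
    """Generate and grade candidate scripts until the budget runs out or one
    is good enough ("satiation"); return (best script, its grade, trials used).

    propose_script(rng) -> candidate script   (hypothetical callback)
    grade(script) -> float in [0, 1]          (stands in for emotional grading)
    """
    if budget < 1:
        raise ValueError("budget must allow at least one trial")
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for trial in range(1, budget + 1):
        script = propose_script(rng)
        score = grade(script)
        if score > best_score:
            best, best_score = script, score
        if best_score >= satiation:
            break  # satiated: stop simulating and act on the best script
    return best, best_score, trial


# Hypothetical example: pick an arm angle in degrees, where 45 is ideal.
angle, score, used = choose_script(
    lambda rng: rng.uniform(0, 90),
    lambda a: 1 - abs(a - 45) / 45,
    budget=1000,
    satiation=0.95,
)
```

Because the generator is seeded, repeated runs return the same script; lowering `satiation` trades quality for speed, much as reactive thought trades accuracy for time.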
There are two priorities to human cognition. The first, mentioned above, is reactive thought, which involves negotiating real-world environments, objects and people in real time. Here, the subconscious simulators may operate at maximum speed, with a concomitant reduction in accuracy. The simulations are generally bound to the real world through the modalities. The second, reflective thought, involves thinking by processing memory records, with limited or no external sensory perception, but with far greater depth and precision.
The content of reflective thought is based on simulations built from learned objects and behaviors acting on historic episodic scripts. Virtual in nature, these simulations will be time-discontinuous for easier layering, merging and comparison, in order to discover relationships and metaphor. Cost-benefit analysis and risk assessment are extensively used to guide, grade and judge this script discovery process; they are synonymous with human emotions. Compared to reactive cognition, these simulations are not driven by exigencies from the outside world.
Other factors influencing this process are genetically derived biases carrying heavy emotional content (like fear of snakes, desire for the opposite sex etc.). Such imprints must surely have been written into memory by the genes, and must also exist in the very same language as whatever instantiations the modalities cause. The fact that genetically derived instinctive triggers can be recognized, emotionally graded and responded to from untrained input categorically implies a priori knowledge of that percept and a common language for its recognition. For 2D visual input, where 2D images can so easily disguise content, 3D instantiation is by far the most credible link. Thus genetically derived instinctive imprints must correlate directly with our modalities, particularly vision, with the most likely common language being 3D instantiation.
The process of human learning is thus predicated on exposure to the real world through the sense modalities. The mind gradually builds historic records of familiar environments, 3D objects and features with increasing fidelity, adding more objects and details as time goes by. The power of time shifting, time discontinuity and layering/blending in virtual simulations leads to rational prediction and intelligent cognition. A side effect of this process is the seductive lure of unbinding the virtual models from real-world physics and historically learned behaviors and promoting, instead, an internal world of fantasy. This process is further encouraged by the effects of biological feedback in the form of emotion. Human cognition is highly tuned to emotional cues within content and uses them as shortcuts to cognitive effort. Unbound simulations can thus be used to amplify emotion. Presumably, the need to attend to material survival has kept such processes in check.
Man successfully learned to express and then codify knowledge by symbolic notation. It could then be externalized and preserved through generations as a common resource to be shared and built upon. But language has a subsidiary relationship to reality. If you take a 3D cube to represent all of spacetime, what lies inside that cube is reality. But the virtual extends both inside and outside that cube. Language, too, straddles both worlds like floating braids, weaving in and out of reality, embracing fairytales and hard science alike. As such, it may not be so reliable a foundation upon which to base AI.
The relationship between 3D simulation and language
Even when language tries to constrain itself to describing real-world objects or behaviors, it is not always easy to test whether the braid is really bound by reality. It is often ambiguous. There are other problems:
1) Language can break physical law and logic with impunity
2) Language is interpreted differently by each conscious entity
3) Language does not fully circumscribe or instantiate an event
4) Language is serial in time, whereas consciousness is parallel
Nowadays, visual media too can subvert the authenticity of our simulations by invoking fake imagery, the way language has always been able to. In any event, the best way to test the truth of any language is to bind it to reality through physical experiment. But can virtual 3D simulations be bound by real-world physics to keep them in the "reality cube"? It's often said that a picture is worth a thousand words. Maybe a 3D model is worth a thousand pictures. At a million words per model, 3D simulations might build a better basis for AI.
Any language must exist within the context of a simulated world model; this will help determine its boundaries. Nouns are drawn from object and environment memories, verbs from the spatial and temporal ‘behavior’ memories. Thus language can build simulation scripts, or allegories. Script validity may be discovered by testing the simulation for violations of logic and the like. But for much of language, real meaning is hidden within inference or metaphor (i.e. the substitution of disparate objects with matching behavior patterns, or vice versa). These metaphorical script trials can similarly be interpreted based on context and logic, and graded through emotional cost-benefit analyses.
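As a toy illustration of nouns drawing on object memories and verbs on behavior memories, the sketch below turns short 'subject verb object' sentences into simulation steps and flags a script whose outcome breaks a physical constraint. The object table, verb distances and room length are all hypothetical placeholders; real language would need far richer parsing and physics.

```python
# Hypothetical object memory: each noun has a position (metres) along a room.
OBJECTS = {"ball": 1.0, "fred": 0.5}
# Hypothetical behavior memory: each verb moves its object a typical distance (metres).
BEHAVIORS = {"putts": 1.5, "chips": 4.0, "drives": 200.0}
ROOM_LENGTH = 6.0  # a living room; anything beyond this hits the wall


def run_script(script):
    """Simulate 'subject verb object' sentences, checking each step for violations.

    Returns (final world state, verdict string).
    """
    world = dict(OBJECTS)  # copy so every run starts from the same memories
    for sentence in script:
        subject, verb, obj = sentence.lower().split()
        if subject not in world or obj not in world:
            raise KeyError(f"unknown object in '{sentence}'")
        world[obj] += BEHAVIORS[verb]  # unknown verbs raise KeyError too
        if world[obj] > ROOM_LENGTH:
            return world, f"'{sentence}' sends the {obj} through the wall"
    return world, "ok"
```

In this toy setup a putt keeps the ball inside the room, while a drive sends it through the wall.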
But how can a 3D simulation interpret concepts such as math, statistics or software? The temptation, of course, is not to bother translating them into a simulation at all, because binary computational algorithms are already naturally suited to these domains. But that would be a mistake. An algorithm can solve a calculation millions of times faster and more accurately, but there will be no concomitant understanding of what happened. It is only when the numbers, graphs or code are modeled and analyzed in simulation with reference to historic representations of reality that meaning and understanding can occur. The simulators within the human brain are not well suited to modeling mathematical or repetitive iterative processes, due to rapid informational decay and weak cognitive focus, so we tend to use memorized shortcuts to help maintain momentum.
If the goal is to test for possible relationships within a set of numbers, they might enter a simulator as columns of varying height. The simulator could draw on its historic memories of common number series, such as shoe sizes, imperial weights, removable storage media sizes or French coin denominations, or of calculated series, like prime numbers or various other mathematical series. It is by sorting, layering, scaling, merging and comparing these graphic patterns that relationships or meaning can be found within the numbers behind them, and that meaning can then be bound to existing memories and thus to representations of the real world. The traditional brittleness of computers dealing with numbers and language in the context of AI stems from the difficulty of blending the data into wider knowledge integrations, particularly through metaphor, where the substitution of disparate knowledge areas extends the reach and depth of understanding.
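A minimal sketch of the sort, scale and compare idea, assuming a small hypothetical memory of calculated series (the real-world series named above, such as shoe sizes, would need real data and are left out). The input numbers are sorted like columns of varying height, scaled to a common range so only their shape remains, and compared point by point with each remembered series using Euclidean distance, itself an arbitrary choice.

```python
import math


def first_primes(n):
    """Return the first n prime numbers by trial division."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes if p * p <= k):
            primes.append(k)
        k += 1
    return primes


# Hypothetical "historic memory" of familiar series (first 8 terms of each).
MEMORY = {
    "primes": first_primes(8),
    "squares": [k * k for k in range(1, 9)],
    "powers of two": [2 ** k for k in range(8)],
    "even numbers": [2 * k for k in range(1, 9)],
}


def normalize(series):
    """Sort the columns, then scale heights to [0, 1] so only the shape remains."""
    s = sorted(series)
    lo, hi = s[0], s[-1]
    if hi == lo:
        raise ValueError("need at least two distinct values")
    return [(v - lo) / (hi - lo) for v in s]


def best_match(numbers):
    """Name the remembered series whose normalized shape is closest to the input."""
    if not 2 <= len(numbers) <= 8:
        raise ValueError("expected between 2 and 8 numbers")
    target = normalize(numbers)

    def distance(name):
        reference = normalize(MEMORY[name][: len(target)])
        return math.dist(target, reference)  # Euclidean distance between shapes

    return min(MEMORY, key=distance)
```

`best_match([22, 4, 10, 26, 6, 14])` names the primes, because the input is simply the primes from 2 to 13 doubled and scaling removes the factor of two.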
The proper place for math and language notation is as a mechanism for coding and serializing information, so it can be efficiently stored, transferred or retrieved through constrained informational channels. Within AI, the best way to process such shorthand notation is to translate it back to the 3D domain, where it can be bound to the constraints of either real-world physics or, at the very least, a notional 3D space, and have its behaviors referenced to historic precedents.
Language gives the illusion of delivering more content than it really does, and it is this very imprecision and ambiguity that gives it such flexibility for social communication. But the devil is in the details, and it's those missing details where the real action lies. Ayn Rand states that a single word can imply a thousand instances, but an implication is not the same as the thing. To identify a chair or a molecule as a class might be efficient, but it is not precise until it is instantiated as a specific chair or molecule at a specific location. 3D simulation is the real fire in the mind, but to be fair, adding symbolic language is like throwing gasoline on that fire: it provides a turbocharged addressing system for our 3D memory records. Language thus drives our simulators hard, as if on steroids, igniting the firestorm of our wider human culture.
Language is used extensively in human cognition to economically build up simulations and to express their script progression in a serial, communicable form. It is also almost certainly the coding mechanism used to classify objects for subsequent retrieval from memory, and possibly even a predominant part of our episodic scripts. But serial language alone is simply insufficient for dealing fully with the real challenges of AI, though it is certainly an essential element. Language is to the mind as a scene scripting language is to animation software: it describes and directs the animation flow.
Some examples: Fred was in the living room practicing his putting. What would happen if he practiced his driving? How could AI based on language alone understand this type of common-sense content? Or, even more importantly, solve the following tasks: design a mechanical human arm, a virus that can target cancer cells, or a three-dimensional memory chip, or simulate a 256-bit RISC processor core?

Epistemology
If you
take the image from an eye or camera, or you listen to speech, you create
parallel information wave-fronts. These are meaningless without reference
to a common reality - which for humans is existence. So how can it be that
blind or deaf people can think? It is because they have constructed the same
3D world model from the remaining modalities; particularly touch and movement.
For instance, a sighted person cannot see clear glass, yet he understands
the concept of glass. If he is told a sheet of perfect invisible glass separates
a room, though he may not be able to see it, he can conceptualize its existence
quite clearly and act accordingly. For a deaf person, the language tags directing
simulations would be purely visual rather than audible in nature.
A predominant feature of human existence is physical animation. These abilities are likely heavily supported by specific trained neural networks rather than any intimate conscious control. Motion requires fast cybernetic feedback to handle momentum. Offloading this work onto subprocesses leaves consciousness more time to deal with higher goals. Just as a plane requires only limited input to guide its flight, so the human body can animate largely free of direct conscious control.
The human organism is but one half of the coin; the other is his environment. Moreover, intelligence is but one aspect of a complex set of processes involved in biological existence. Any artificial intelligence in the true likeness of man will surely be quite an anomaly, for these variables are the source of all our biological motivations for survival and cognitive attention. The human organism uses exposure to the environment over time to develop a realistic world model, and success in this endeavor aids survival. But human cognitive focus is largely dominated by biological imperatives. These drive much of our intentionality and subsequent physical activity, creating the curious human civilization we live in.
If we come to the question of our objective in building machine intelligence, we might ask: is it to replicate as closely as possible the human condition? Or will other goals be better aligned to our technology and desires? The human means to knowledge unfolds over many decades through full-on reality immersion, with repeated trial-and-error learning cycles. Such methods, even if practical, might be too slow a strategy for developing useful AI. One might presume that AI will have been achieved once the Turing test is successfully passed. However true this may be, targeting it might not be the wisest strategy for current research. The reason is that the test presumes anthropomorphic qualities in a machine are necessarily indicative of the most advanced state of consciousness to be sought, one where concepts such as social inclusion and biological proclivities are pre-eminent. To put it bluntly, knowing how to eat a banana or understand a joke, admirable though these may be, might not be quite as important as an ability to accurately model a specific protein fold and predict the resulting regions of subtle chemical reactivity!
A lump of clay weighing a few pounds can instantiate a greater variety of forms than there are atoms in the universe. But only a tiny subset of those forms will have any meaning attached and be associated with any behaviors: cat, fridge, airplane etc. The human mind is able, with only a few pounds of meat, to instantiate form and behavior from novel 2D vision scenes at a rate of about one object per second. Considering how many 3D pattern matches must be made against our library of known objects, this is quite an achievement. In most circumstances, significant mystery can remain within a scene (bitmap areas without instantiation), so long as the major items are decoded, such as environments, significant life forms or emotionally charged objects.
Possibly,
with unlimited time and processing power, artificial instantiation could be
achieved through 3D scene estimates, rendered down to 2D and then compared
with the bitmap input. Corrective feedback cycles could iteratively discover
the light sources (from radiosity and shadow effects) and camera perspective
(from room edge key points or with lock-in provided from a single object discovery).
But it should be possible to design faster search algorithms than such brute
force trials. Perhaps by comparing pre-rendered trial object ‘icons’ to the
2D scene. Or in reverse, by extracting edge patterns from the 2D image, normalizing
scale and tossing those into a search path through memory to catch shape and/or
surface pattern matches.
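The brute-force 'render and compare' loop described above is often called analysis by synthesis in computer vision. A minimal sketch, assuming a known object (the eight corners of a unit cube), a simple pinhole camera, known point correspondences and a single unknown rotation about the vertical axis; light sources, shading and camera placement are not modeled, and all names and parameters are hypothetical. A coarse-to-fine search stands in for the corrective feedback cycles.

```python
import math

# Hypothetical remembered 3D object: the eight corners of a unit cube.
CUBE = [(x, y, z) for x in (-0.5, 0.5) for y in (-0.5, 0.5) for z in (-0.5, 0.5)]


def render(points, angle, distance=4.0, focal=1.0):
    """Rotate points about the vertical (y) axis, then project them to 2D
    with a pinhole camera placed `distance` units from the object's centre."""
    c, s = math.cos(angle), math.sin(angle)
    image = []
    for x, y, z in points:
        xr, zr = c * x + s * z, -s * x + c * z  # rotation about the y axis
        depth = zr + distance
        image.append((focal * xr / depth, focal * y / depth))
    return image


def mismatch(rendered, observed):
    """Sum of squared 2D distances between corresponding points."""
    return sum((rx - ox) ** 2 + (ry - oy) ** 2
               for (rx, ry), (ox, oy) in zip(rendered, observed))


def estimate_angle(observed, rounds=4, samples=36):
    """Coarse-to-fine search: render trial angles, keep the best match, then
    narrow the window around it (a crude corrective feedback loop)."""
    best, width = 0.0, 2 * math.pi
    for _ in range(rounds):
        trials = [best - width / 2 + width * k / samples for k in range(samples + 1)]
        best = min(trials, key=lambda a: mismatch(render(CUBE, a), observed))
        width = 2 * width / samples  # new window spans two old grid steps
    return best % (2 * math.pi)
```

Rendering the cube at 47.3 degrees and passing the result to `estimate_angle` recovers the angle to well within a thousandth of a radian after about 150 trial renders, rather than an exhaustive sweep.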
The challenge is to design a 3D object description language that can be interrogated rapidly using fuzzy search criteria. You cannot use a polling search metaphor against a million images, each in a thousand orientations; you have to use an ‘interrupt’ or ‘vector’ search metaphor. Human vision is based on the identification of features rather than exact form, so a violin twisted around a pole can still be recognized, as can a clock printed on a crumpled tablecloth. The challenges of high-speed instantiation make decisions about where to focus attention, such as on the motion of a cat or following the eyes of a human, seem almost trivial by comparison.
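One way to read the ‘interrupt’ or ‘vector’ metaphor is as indexed lookup: rather than polling every stored object, each observed feature addresses memory directly and votes for the objects that share it. A minimal sketch of such an inverted index, assuming hypothetical feature labels in place of real shape or surface descriptors:

```python
from collections import defaultdict


class FeatureMemory:
    """Hypothetical inverted index from visual features to known objects."""

    def __init__(self):
        self.index = defaultdict(set)  # feature -> names of objects that have it

    def learn(self, name, features):
        for feature in features:
            self.index[feature].add(name)

    def recall(self, observed, min_votes=2):
        """Let each observed feature 'fire' into its bucket and tally votes.

        Only objects sharing at least one feature are ever visited, instead of
        polling every stored object. Returns names, most votes first.
        """
        votes = defaultdict(int)
        for feature in observed:
            for name in self.index.get(feature, ()):
                votes[name] += 1
        hits = [name for name, count in votes.items() if count >= min_votes]
        return sorted(hits, key=lambda name: (-votes[name], name))


memory = FeatureMemory()
memory.learn("violin", {"scroll", "f-holes", "strings", "waisted body"})
memory.learn("guitar", {"sound hole", "strings", "waisted body"})
memory.learn("clock", {"numerals", "hands", "round face"})
```

Because matching is by features rather than exact form, a violin showing its scroll, strings and f-holes is recalled however its outline is distorted, while objects sharing no features are never visited.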
Just as
a human is built upon autonomous biological layers, cognition has its own
autonomous layers, for instance instantiation, morphing and tweening (the
construction of in-between time frames during simulation). When we script
a human actor entering a room, the motion tweens do not need to be consciously
re-calculated; they are either generated automatically or already
stored in memory as animated motion tweens. Only the environment, context
and emotional attitude need to be scripted in order to direct simulations.
Rendering
is the translation of 3D scenes to 2D bitmaps. Instantiation is the reverse:
the creation of 3D scenes from 2D bitmaps. Consider a neuron array metaphor,
in which a projected image triggers firing along axons. If those neurons each
have, say, 256 axons (connections) propagating out, then within that tangle
all possible orientations and translations of any 3D object are spatially
encoded. That data could be decoded out from the propagating input wave
function through time. For example, suppose each element on one face of the
array is connected to every element on the opposing face, so that each input
pixel has 256 vectors spreading out. If it took one hour for the
signal of a firing neuron to travel along the axons between the surfaces,
and you divided that time period up into small enough units, then at any instant
in time a set of those vectors from the expanding pixel wave fronts would
be optimally aligned to a specific translation of the projected object. Were
those vectors known (trained) and linked together, the full 3D translation
could in theory be described by those lateral connection sets. If those connection
channels were two-way, the objects could either be instantiated (identified)
from input modality patterns or, in reverse, be used to trigger the same visual
imagery (memorized experience) directly from the linked network patterns,
themselves connected to similar and associated modality patterns of visual
and oral language tags, or even taste, smell and touch attributes.
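The forward direction, rendering, is well understood, and a pinhole-camera projection is its core. The sketch below uses an illustrative function name and assumed camera parameters `f`, `cx` and `cy`. It also shows why instantiation, the reverse, is hard: points at different depths along the same line of sight land on the same pixel, so depth must be recovered from other cues.

```python
def project(point: tuple[float, float, float], f: float = 1.0,
            cx: float = 0.0, cy: float = 0.0) -> tuple[float, float]:
    """Illustrative pinhole projection of a camera-space point (z forward) to 2D pixels."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (cx + f * x / z, cy + f * y / z)


if __name__ == "__main__":
    near, far = (1.0, 2.0, 4.0), (2.0, 4.0, 8.0)
    print(project(near, f=100, cx=320, cy=240))  # (345.0, 290.0)
    print(project(far, f=100, cx=320, cy=240))   # (345.0, 290.0): same pixel, twice as far
```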
Modality
flows, whether from sight, sound, touch, taste or smell, manifest in the brain
as parallel analog data channels of specific and appropriate frequency, phase
and dynamic (amplitude) ranges. The same principle of instantiation applies
equally to all these analog sensory data sets, with receiving neural arrays
optimally tuned to the character of each input class. For example, the sound
of a word or event, as with vision, will enter the neural array as a parallel
two-dimensional analog data wave-front of frequency and phase channels, or ‘aural
pixels’, extending into the neural array as a third dimension through time.
Cross connections linking spatial patterns will again identify those with
the closest correlation to existing memory traces. In this way, as for vision,
if only part of a word is heard, in any tone or accent, or even masked by
other sounds, there will be sufficient signature correlation to make reasonable
probabilistic guesses for subsequent wider context simulation trials and grading.
These data signatures, once instantiated, are thus linked to the universal
environment map of objects and environments. Otherwise, the inputs would merely
remain unidentified sounds bearing only fleeting similarities to known aural
traces.
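Partial, masked or differently pitched input can still be matched by correlation against stored traces. The following minimal sketch rests on stated assumptions. A word is a short sequence of 'aural pixel' frames (tuples of channel energies), and a masked frame is `None`. Cosine similarity stands in for the neural correlation, which makes the match ignore overall loudness. All names are hypothetical.

```python
import math
from typing import Optional, Sequence

Frame = tuple[float, ...]  # hypothetical: one time slice of channel energies


def cosine(a: Frame, b: Frame) -> float:
    """Similarity that ignores overall loudness (vector length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_score(heard: Sequence[Optional[Frame]], template: Sequence[Frame]) -> float:
    """Average similarity over the frames that were actually heard (None = masked)."""
    pairs = [(h, t) for h, t in zip(heard, template) if h is not None]
    return sum(cosine(h, t) for h, t in pairs) / len(pairs) if pairs else 0.0


def recognize(heard: Sequence[Optional[Frame]], memory: dict[str, Sequence[Frame]]) -> str:
    """Pick the stored trace with the closest correlation to the partial input."""
    return max(memory, key=lambda word: match_score(heard, memory[word]))


if __name__ == "__main__":
    memory = {"cat": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)],
              "cap": [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]}
    heard = [None, (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]  # onset masked, louder voice
    print(recognize(heard, memory))  # cat
```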
Instantiation
processing from sensory modalities is automatic and unconscious; there is
little mental effort involved. Further, not only are the 3D objects instantiated,
but so are any associated animation tweens (object behaviors). Just as bitmaps
link to 3D objects, so those 3D objects link to form animated behaviors, either
as internal memorized tweens or newly constructed object motion or morph tweens.
Take a
mouse object at time t1 and a teaspoon at t2, place them in the same spatial
location and connect their surfaces together with orthogonal vector lines.
Divide those lines into equal ‘time’ segments and render a perspective to
create frames for the movie script. This process is known as ‘morph tweening’,
and will represent one of the core visual translation tools necessary for
AI to both interpret modality flow and to create new and novel content. During
any visual thought process, creating smooth in-between renders between distant
or disparate objects in time and/or space will be crucial.
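The morph tween just described can be sketched with straight-line interpolation. The simplifying assumption (not from the text) is that both shapes are already reduced to the same number of corresponding surface points. 'Connecting the surfaces' then becomes pairing point i of the mouse with point i of the teaspoon. The function name is hypothetical.

```python
Point = tuple[float, float, float]


def morph_tween(start: list[Point], end: list[Point], segments: int) -> list[list[Point]]:
    """Hypothetical sketch: divide each start->end line into equal 'time' segments.

    Returns segments + 1 frames; frame 0 is the start shape, the last is the end shape.
    """
    if len(start) != len(end):
        raise ValueError("shapes need the same number of corresponding points")
    if segments < 1:
        raise ValueError("need at least one time segment")
    frames = []
    for k in range(segments + 1):
        t = k / segments
        frames.append([tuple(a + (b - a) * t for a, b in zip(p, q)) for p, q in zip(start, end)])
    return frames


if __name__ == "__main__":
    mouse = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    spoon = [(0.0, 4.0, 0.0), (2.0, 8.0, 0.0)]
    frames = morph_tween(mouse, spoon, 4)
    print(len(frames), frames[2])  # 5 [(0.0, 2.0, 0.0), (2.0, 4.0, 0.0)]
```

Each frame would then be rendered from the chosen perspective to give the frames of the movie script.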
Even apart
from AI, the commercial spin-offs from an instantiation engine will be enormous.
To start with, consider the possible re-animation of all historic language
documents and visual 2D media, to create a cornucopia of rich, new, flexible
animatable content.
All sense
modalities converge to memory space as pure information, which is the very
locus of consciousness. This information is instantiated, simulated and then
graded to guide behavior and generate data we experience as feelings. Subconscious
processes unlock the time and reality constraints enforced by the external
world via the modalities and allow object behaviors to flow freely, constrained
only by ‘prior art’ and processing resources. These script variations subsequently
feed simulations back into the area of consciousness, like Dennett's 'Cartesian
Theatre'. The first-person conscious observer occurs at the information interface
between the rendered virtual simulations and those ‘rendered’ by the modalities
of the outside world. The subconscious processes place consciousness, and
thus our perception of reality, into a known place, and a time event horizon
of the present placed between predicted futures and a remembered past. 7
The subconscious
uses cues from the external world, or recent episodic memory scripts, to seed
script diversity as simulations are intimately dissected and transposed. Virtual
time travel and time discontinuities are aggressively used to construct metaphor,
meaning and relevance out of the resulting script compositions. This 'meaning'
is discovered using genetically and socially programmed emotional filters
which grade the scripts according to factors such as survival, security and
cost benefit analyses, prioritizing social and resource capital, so that
for every act, a human will know, to the best of his cognitive ability, what
is most in his interests at that time. It is the breadth of this process of
subconscious wide scope accounting, with ever increasing circles of virtual
time expansion within simulation scripts, coupled with ‘emotional’ cost benefit
analyses, that defines the depth of a man's intellect. 8
Subconscious
processing uses the shortcuts of context and precedent to speed script discovery,
and when the rules of simulation are grounded in history and reality, the
subject can use the simulations as the basis of learning and for future plans
in dealing with real world situations, without then needing to physically
act them out. Because the simulations are unbound by time, they can often
beat external reality and thus anticipate real world events.
Once the
optimum simulation script has emerged, real world human animation can be guided
through one-step-ahead simulation linked to modality feedback. Trained neural
cybernetic scripts would greatly enhance the speed, accuracy and grace of
these animations, such that the individual control of limbs and body momentum
is left to subsidiary, pre-trained, largely automatic processes. Human action
subsequently follows, with intentionality declared to be free will.
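Here is a minimal sketch of one-step-ahead simulation linked to modality feedback. It rests on loud assumptions: a single 1D 'limb' position, a scripted list of target positions, and a body that only carries out 80% of each command (the `gain`). Both function names are hypothetical. Planning each step from the sensed position keeps the error small and bounded, while replaying the script blind lets the error accumulate.

```python
def follow_with_feedback(targets: list[float], gain: float = 0.8) -> list[float]:
    """Hypothetical sketch: sense the limb, plan the move to the next point, execute."""
    pos, trace = targets[0], [targets[0]]
    for nxt in targets[1:]:
        command = nxt - pos      # one-step-ahead plan from the sensed position
        pos += gain * command    # imperfect execution by the body (assumed gain)
        trace.append(pos)
    return trace


def follow_blind(targets: list[float], gain: float = 0.8) -> list[float]:
    """Same imperfect body, but commands come from the script alone, with no sensing."""
    pos, trace = targets[0], [targets[0]]
    for prev, nxt in zip(targets, targets[1:]):
        pos += gain * (nxt - prev)
        trace.append(pos)
    return trace


if __name__ == "__main__":
    script = [float(k) for k in range(11)]  # move steadily from 0 to 10
    print(round(script[-1] - follow_with_feedback(script)[-1], 3))  # 0.25
    print(round(script[-1] - follow_blind(script)[-1], 3))          # 2.0
```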
With subconscious
activity constantly trawling memory records and modality stimuli, free will
is simply the ability, at any given time, to flip life’s animated momentum
to be aligned with alternative virtual script offerings, even a destructive
one if proof of courage, or free will, is defined as a higher goal (such goals
are themselves guided by socially or genetically programmed emotions).
During sleep or quiet meditation, the process is driven by memory records
alone and, away from the roar of sense modality flows, the subconscious script
simulations can leak like ghosts into full consciousness, leading to imagination,
creativity, planning and ultimately to self-consciousness.
Humans
have the ability to compute and render into consciousness the scene from any
movie, placing, say, Donald Duck or their grandmother in the leading role.
This ability comes from an internal scripting language that has access to
powerful modeling and animation functions. We do not need to consciously solve
the mathematics for the inverse kinematics of mechanical motion, or of momentum
or gravity. We create the animated collage from prior learned 3D models, environments
and either pre-rendered animation sequences or on the fly with motion and
shape morph tweening. After morphing and blending the objects and scenes from
prior learned behaviors, we can then render them to an observer perspective
into consciousness, mentally skipping over much detail - the way we’re deceived
by a skilled magician – believing all the while we’ve missed nothing. But
the mind has an advantage the eyes do not; it can censor and lie at will.
The senses and internal memory contradictions try to keep the mind honest.
To
summarize the postulates:
1) That
our reality exists in 3 dimensions over time.
2) That
units of matter can be represented by units of information.
3) That
the aggregates of atoms within objects and environments of reality can be
converted (through modalities) to stored information within memory.
4) That
the identity and behavior of objects can be instantiated from those memory
records (through computation) and then stored as informational representations.
5) That
these representations can subsequently be recalled and manipulated to simulate
the behavior of their real world correlates as 3D animation tweens.
6) That
a software process can judge and emotionally grade the intrinsic value of
these simulations to guide and optimize script formation.
7) That
step-ahead animation and pre-trained cybernetics can be used to align physical
action to the script.
8) That
with sufficient computation, memory resources and exposure to reality, this
process can become a self-reinforcing seed process - leading to advancing
intelligence.
Intelligence,
consciousness and feeling are virtual informational processes based around
3D simulated environments which are bound to reality by preformed genetic
road maps and experientially over time through sensory modalities and mobility.
Consciousness arises from the supervision of these simulations linked to feeling
- which is the computational process of grading those simulations. Intelligence
is the ability to expand the time horizon (time dilation) to discover causes
and make predictions. All these processes can be achieved artificially and
will lead to AI. The controversial aspects of this paper are that:
2) Human
language is a subsidiary process.
3) Any
language which contains meaning can be reduced to a 3D simulation.
4) Human
feelings are illusory; they are self-referential computation processes.
5) The
nexus of consciousness is the boundary between the modalities and the feedback
from simulated environments created by subconscious computations.
Thus,
the proposal is - that matter can be represented by information; that objects
and environments can be instantiated from perceptions of reality; that they
can subsequently be simulated and that information can be stored; that these
simulations can be graded based on their progression in time; that simulations
can run faster than reality; that through the superposition of memorized behaviors,
simulations can represent potential versions of reality; that these superpositions
can be 'emotionally' graded and evolve toward an optimal prediction - using
historic precedent; that chaotic discontinuities can be avoided; that these
simulations, being able to predict reality, can be used to align physical
action to those simulations; that this computational process can beat the
procession of time in reality. This then, is the process that leads to consciousness,
intelligence and intentionality.
On the
evidence that immobile, deaf children can still develop high intelligence,
presumably from visual stimuli, we might also expect a similarly restricted
machine analog to have an equal chance of success. In order to conceptualize
a credible AI architecture from vision, imagine following our current technological
trends for a few years to where the following levels are reached:
1) Cameras
- 36 bit color depth, 6000 x 3000 resolution, 60 frames per second
2) Exposure
- 1 gigabit broadband internet connection attached to browser clients
3) Memories
- 10 terabyte, non-volatile, shared, directly addressable, 10 ns access time
4) Processors
- 10 teraflops, serial (in autonomous clusters)
5) Instantiation
- accurate 2D to 3D translation software.
6) 3D
modelers - any shape, scale, texture, orientation, behavior etc.
7) 3D
simulators - supporting physics, collisions, chaos, time shifting etc.
8) 2D
renderer - supporting shaders, shadows, radiosity, fog etc.
9) Animation
scripting language - object insertions, orientations, behaviors, morphing,
tweening, layer and time management.
10) Database
- of records, concepts, objects, environments and episodic scripts
11) A
language to animation script translator.
12) A
reverse, animation script to language translator
13) Script
grader - cost benefit analysis, entropy, normal, harm, irreversibility,
danger, opportunity, 3rd person script empathy and 1st person emotion analysis.
The major
software challenges:
2)
Construction and maintenance of a universe environment map
3)
Construction and maintenance of object records and behaviors
4)
Powerful, multilayer 3D simulation engine
5)
Blending/morphing of environments, objects, properties and behaviors
6)
Grading of simulations to guide script progression
The most
appealing hardware structure would be a network cluster of maybe 10 or more
powerful self-contained computers but with shared memory resources, each dealing
independently with separate aspects of AI. The continuing massive worldwide
investment in operating systems and application software, such as Windows, Linux,
commercial 3D modelers, OCR and speech recognition software, can be leveraged
to become tools, blurring the boundaries between modalities and consciousness.
But the boundary between man and machine is already getting very
blurred, with ubiquitous cell phones providing an almost telepathic modality;
speech control of computers, graphical interfaces, instant messaging and email
etc. To some extent, most people already spend most of their lives in virtual
reality; they just don’t recognize novels, radio, TV, computer games or software
as being virtual environments.
A human
without recourse to modality extensions: an auto, cell phone, internet, print
literature, PDA, computer, software (e.g. spelling and grammar checking etc.),
calculator, watch, FedEx account, credit card, 3D printer etc. would be a
greatly diminished soul, and the same goes for AI. The most important cognitive
skill will not be to walk or even talk, but to manage multiple computer graphical
user interfaces.
But how
could these advanced technologies begin to be organized to create intelligence?
First, the camera would project its bitmap data to a memory map, which would
be routinely processed by the instantiation engine to identify known objects
and environments from memory records. A subsequent 3D simulated environment
would be constructed in memory to match the visual scene and simultaneously
rendered back down to a 2D bitmap memory space at the same first-person perspective
as the camera input - much like a 3D animation film is rendered to the 2D
screen image. If the camera data flow were interrupted, the rendered 2D data
from simulation would remain an accurate mirror copy of the real scene.
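While the simulation tracks the camera, the rendered bitmap and the camera bitmap should agree, and the useful signal is where they do not. The minimal comparison below assumes both frames are same-sized grayscale grids. The function name and threshold are illustrative. It returns the bounding box of the disagreement, or `None` when the simulation still mirrors the scene.

```python
from typing import Optional

Grid = list[list[int]]


def anomaly_box(rendered: Grid, camera: Grid,
                threshold: int = 0) -> Optional[tuple[int, int, int, int]]:
    """Hypothetical sketch: (top, left, bottom, right) of pixels that disagree."""
    hits = [(r, c)
            for r, (row_r, row_c) in enumerate(zip(rendered, camera))
            for c, (a, b) in enumerate(zip(row_r, row_c))
            if abs(a - b) > threshold]
    if not hits:
        return None  # the simulation is still an accurate mirror of the scene
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))
```

Focused instantiation would then be applied only inside the returned box.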
There
are three dynamic events that can now occur within the visual field. An object
can change, the perspective can change, or the whole scene can change. For
scene changes, the previously described process of instantiation and discovery
would be repeated. For perspective changes, motion vectors (as used in video
compression) would be calculated to keep track of scene perspective. For object
animation, the software process would recognize localized anomalies between
the simulated projection and the vision projection. Then using normal instantiation
techniques focused on the anomaly, the object in simulation would be oriented
until the 2D rendered projection and the input vision were once again in alignment.
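The motion vectors borrowed from video compression are usually found by block matching. The exhaustive-search sketch below uses illustrative function names, block size and search range. For one block of the current frame, it finds the offset into the previous frame with the smallest sum of absolute differences (SAD).

```python
Grid = list[list[int]]


def sad(prev: Grid, cur: Grid, top: int, left: int, dy: int, dx: int, size: int) -> int:
    """Sum of absolute differences between a block of `cur` and a shifted block of `prev`."""
    return sum(abs(cur[top + r][left + c] - prev[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))


def motion_vector(prev: Grid, cur: Grid, top: int, left: int,
                  size: int = 2, search: int = 2) -> tuple[int, int]:
    """Hypothetical sketch: offset (dy, dx) from a block of `cur` to its match in `prev`."""
    height, width = len(prev), len(prev[0])
    best_cost, best_vec = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the previous frame
            cost = sad(prev, cur, top, left, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vec = cost, (dy, dx)
    return best_vec
```

One plausible reading of the three cases: if most blocks share the same vector, the perspective has changed, while a lone block that disagrees marks a candidate object animation.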
The memory
management software would need to maintain an associative database linking
all objects, environments, behaviors and scripts, together with growing lists
of knowledge about these models, such as language tags, price, legal status,
disposal, source, manufacture, flammability, safety, uses, weight, dangers,
precautions, social status, classes, trends, history, composition, aging properties,
storage, popularity, component parts, assembly, regulatory compliance, standards,
size, growth time, environmental impact etc. Object behaviors would be characterized
and stored based on:
1) Motion
vectors over time. So a feather would tumble through air differently from how
a balloon floats or an insect darts - stored as positional and temporal data
sets.
2) Shape
variation or morphing over time - butterfly, bouncing ball, coiled spring
etc.
3) Reaction
to stimuli (touch, drop, cut etc.)
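The three kinds of behavior record listed above could be held in a structure like the following. Everything here is a hypothetical sketch, not a prescribed schema: the class names, the fields and the sample data. Motion is stored as timed positions, shape variation as a sequence of named keyframe shapes, reactions as a stimulus-to-outcome table, and associative links as mutual pointers.

```python
from dataclasses import dataclass, field


@dataclass
class MotionSample:
    t: float                                # time from the start of the behavior
    position: tuple[float, float, float]    # position in the environment frame (units assumed)


@dataclass
class ObjectRecord:
    """Hypothetical memory record for one object."""
    name: str
    facts: dict[str, str] = field(default_factory=dict)  # language tags, price, weight...
    motions: dict[str, list[MotionSample]] = field(default_factory=dict)  # 1) motion over time
    morphs: dict[str, list[str]] = field(default_factory=dict)  # 2) named keyframe shapes
    reactions: dict[str, str] = field(default_factory=dict)      # 3) stimulus -> outcome
    links: set[str] = field(default_factory=set)                 # associative pointers


class ObjectMemory:
    """Hypothetical associative store linking object records together."""

    def __init__(self) -> None:
        self.records: dict[str, ObjectRecord] = {}

    def add(self, record: ObjectRecord) -> None:
        self.records[record.name] = record

    def link(self, a: str, b: str) -> None:
        """Mutual pointers, so either record can recall the other."""
        self.records[a].links.add(b)
        self.records[b].links.add(a)

    def react(self, name: str, stimulus: str) -> str:
        return self.records[name].reactions.get(stimulus, "unknown")
```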
The overall
environment map would need to hold concepts ranging from the universe, through
planets, countries, cities, neighborhoods, homes and factories, to materials,
chemicals, molecules and atoms. At any point in the simulation, a relationship
would exist to this universal map. Which specific country, town and room?
Or if generic, it would still need a generic history with the potential to
be 'fixed' by subsequent facts. The depth and accuracy of this virtual world
will largely determine the bounds and precision of thought for the artificial
intelligence.
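The nesting described above maps naturally onto a tree in which every simulated scene hangs from some node. The minimal sketch below uses a hypothetical `Place` class and methods. It shows a scene first attached to a generic town and later 'fixed' by moving it under a specific one.

```python
from typing import Optional


class Place:
    """Hypothetical node in the universal environment map (universe -> ... -> room)."""

    def __init__(self, name: str, parent: Optional["Place"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: dict[str, "Place"] = {}
        if parent is not None:
            parent.children[name] = self

    def child(self, name: str) -> "Place":
        """Return an existing sub-place or create it."""
        return self.children.get(name) or Place(name, self)

    def path(self) -> list[str]:
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def relocate(self, new_parent: "Place") -> None:
        """'Fix' a generic placement once a later fact says where it really is."""
        if self.parent is not None:
            del self.parent.children[self.name]
        self.parent = new_parent
        new_parent.children[self.name] = self
```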
All objects
blend together in an overall environment map, which fits within a wider contextual
world map. Physics rules (gravity, hardness, weight, momentum, heat, speed
of light etc.) guide behavior and interactions between objects. (Cloth against
solids, light through glass etc.) Much of this is already well advanced in
commercial 3D software packages. The overall resolution and speed depend
simply on processing power and memory resources. The software must subsequently
recognize any bitmap changes as object behavior animation or changes to perspective
and re-calculate to keep the simulation bound to the vision input.
The input
video stream drives the construction of the virtual scripts. If novel, those
scripts might be the basis of new memory formation. Inconsistencies would
be challenged based on the source credibility or physical law, with certain
knowledge discovery causing rippling adjustments throughout memory. Logical
inconsistencies and vagueness might be highlighted to trigger some human supervisory
training to help bootstrap the process. The addition of a language translator
to convert words to simulation scripts will greatly speed learning, since
most human knowledge and communication channels exist in the form of serial
language streams. The language parser would construct scenes from any objects
alluded to in the text, with action scripts proceeding from memory precedent
and/or from the language verbs, syntax or emphasis.
Any proto-intelligence
would begin as basic memory formation and correction processes, but the main
advances will arise when running the subconscious simulation machinery separately
from the vision input. The content of those simulations could be guided either
by recent episodic memory scripts, prior behaviors or simulator physics, and
graded by 'genetic obsessions', such as the 'need' to understand.
The process
of discovery might involve the searching of any language scripts associated
with the problem, with their subsequent conversion to animation. Metaphor
will be examined through object or scene substitutions within the trial scripts.
Script diversity would be built by breaking up object sets and re-ordering time through
dislocations. But how exactly are all these script trials to be graded? This
is the most difficult part of the process to explain with any clarity. There
are several grading concepts like testing against law, mores or relevance
to global goals. Further grading concepts might be: normal object condition;
reduction of scene entropy; novelty detection; consequences to the wider time
frame or applicability to other environments. But the most likely method will
revolve around either quick-and-dirty pre-programmed emotional prejudices
or, if more time is available, growing circles of cost benefit analysis expanding
in time and in environment space, as the potential effects of the sample scripts
ripple outward. These wider scope integrations will ultimately be graded against
predefined 'genetic' schema, such as profit, shame, humor, social capital
etc. A final script must be found that predicts the highest probability of
benefit and the lowest possibility of costs.
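The 'growing circles' of cost benefit analysis can be illustrated with a toy grader. None of its assumptions come from the text. Each script lists its consequences as (time, probability, value) triples. The grade within a horizon is the sum of probability times value, and horizons are widened step by step. The function names are hypothetical. The point of the example is that the preferred script can change as the circle widens.

```python
Consequence = tuple[float, float, float]  # hypothetical: (time, probability, value)


def grade(script: list[Consequence], horizon: float) -> float:
    """Expected benefit minus cost of everything the script causes within the horizon."""
    return sum(p * value for t, p, value in script if t <= horizon)


def choose_by_horizon(scripts: dict[str, list[Consequence]],
                      horizons: list[float]) -> list[tuple[float, str]]:
    """Best script for each successively wider time horizon."""
    return [(h, max(scripts, key=lambda name: grade(scripts[name], h))) for h in horizons]


if __name__ == "__main__":
    scripts = {
        "eat the cake now": [(0, 1.0, 5), (10, 0.8, -10)],
        "go for a run": [(0, 1.0, -2), (10, 0.5, 12)],
    }
    print(choose_by_horizon(scripts, [1, 10]))
    # [(1, 'eat the cake now'), (10, 'go for a run')]
```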
Another
strategy for knowledge discovery would be the joining of means with ends to
build a script timeline from the missing links in between. Once in the 3D
domain, tweening can be used to bridge gaps, with the new tweened content
tested against simulator reality constraints such as gravity, physical form
and behavior, social mores and rules etc. Or the process might work more like a jigsaw puzzle,
only with the pieces made up of memory records of objects and their behaviors
or triggered from external search results. Finally, the expansion of complex
objects into simpler subunits, or, in reverse, the assembly of the complex from
the simple, would further aid knowledge discovery.
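Joining means with ends is, in search terms, a path-finding problem over memory links. The sketch below uses breadth-first search as a stand-in; that technique is a substitute chosen here, not the text's mechanism, and the link table is invented. It returns the shortest chain of records bridging the gap. Each hop is a 'missing link' that a tween would then have to realize.

```python
from collections import deque
from typing import Optional


def join_means_to_ends(links: dict[str, list[str]], means: str, ends: str) -> Optional[list[str]]:
    """Hypothetical sketch: breadth-first search from means to ends; shortest chain or None."""
    queue = deque([[means]])
    seen = {means}
    while queue:
        chain = queue.popleft()
        if chain[-1] == ends:
            return chain
        for nxt in links.get(chain[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(chain + [nxt])
    return None
```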
So at
this point we have a simulated environment held in 'conscious' memory tracking
the live video feed (and/or receiving script revisions from a subconscious
process). We have a subconscious simulator building and expanding upon those
conscious scripts using prior behaviors, with particular interest in novelty.
We have script expansion through behavior extrapolation (e.g. a vase being
nudged toward the edge of a counter will be predicted to fall and shatter).
We have scripts graded through cost benefit analysis (a broken vase creates
a loss of value and a mess - entropy). Next, we need a method for allocating
time and resources to maximum effect, to direct focus and attention, and an
ability to interact with external knowledge bases. Finally we need a satiation
response to help allocate computational resources and escape dead ends.
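The vase example is behavior extrapolation in miniature. The minimal sketch below uses a hypothetical function name and assumed numbers: a nudge speed, a counter height, and sliding friction modeled as a constant deceleration of friction times g. It steps the vase along the counter and predicts whether it stays or falls, and when it would hit the floor.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def extrapolate_vase(x: float, v: float, edge: float, height: float,
                     friction: float = 0.0, dt: float = 0.01, t_max: float = 10.0):
    """Hypothetical sketch: ('stays', t_stop) or ('shatters', t_impact) for a nudged vase."""
    t = 0.0
    while t < t_max:
        if x > edge:  # centre past the edge: assume it drops freely from the counter
            return "shatters", t + math.sqrt(2 * height / G)
        v = max(0.0, v - friction * G * dt)  # assumed constant sliding friction
        if v == 0.0:
            return "stays", t
        x += v * dt
        t += dt
    return "stays", t
```

A grader could then charge the predicted shattering against the vase's value.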
The ability
to search the external world for solutions would require language formation
from the simulated scripts and an output method for gaining human attention
or an ability to directly enter text searches into internet search engines.
Due to speed, the first choice would likely be the internet, with human intervention
being the least rational choice for guidance. Humans will be totally unable
to keep up with the data velocity associated with AI thinking. An un-tethered
AI would quickly overtake one kept anchored to the dead weight of human consciousness.
The primary
source for learning material would be the translation of web based information
to animated scripts within a global environment map to form the basis of knowledge
integration. This would require 3D script construction built from text, charts,
sounds and images, in conjunction with the previously described video/image
instantiation engine. The seed AI could begin making its own predictions and
then testing those predictions through further internet searches to discover
if it had found the correct causes, processes or results. Only when a concept
exists without internal contradictions can it be said to be properly integrated
and its authenticity secure. The cognitive advantages available to AI will
include the following:
Finally,
the human mind is unable to properly render its internal 3D content to anywhere
near the clarity as when ‘painted’ directly by the modalities. Thus, we are
only really partially conscious; there is an enormous richness to existence
and experience we are blind to. The mind is full of ghosts rather than realistic
impressions and a ghost world is hard to fully embrace.
The only
credible mechanism for self awareness to occur is as a computational process
dealing with information representing and bound to reality. The self can then
exist and be aware through a process of reflection (simulation) in a time
controlled domain, where emotional grading (feeling) can percolate through
time dilating script trials.
If you
doubt that such processes of simulation and virtual time travel will really
lead to intelligence, think of this analogy. You suddenly find yourself able
to re-run time backwards and forwards in the real world as many times as you
like, even making notes as you go along. After many such ‘simulations’, do
you not think it likely that the action you finally take might be a little
wiser?
This memory
space can be thought of as a movie stage - a 'Cartesian Theatre' or to use
modern parlance - a virtual reality chamber. This virtual chamber can be filled
with objects or environments and at any scale. Gas will disperse, liquids
will spread to boundaries and solids will have weight and maintain form. Animated
objects will flow according to their motion vectors and morphology - a perfect
analog of real life; except matter is replaced by information. Like a 3D window
or camera, this ‘box’ can float across virtual landscapes and environments,
to be filled at one moment with the great expanses of space and time and in
the next, the most intimate molecular spaces of tiny living cells.
At the
centre of this chamber is a virtual, animate human character. The contents
of the theatre always render down to this 2D observer perspective, which is
also the source point of filling the chamber from any modality inputs. This
virtual space will form the contents of waking consciousness.
Further,
shadow realms exist. These are again like scrolls set beside the originals, except that the
contents of these scrolls are able to break free from the straitjacket
of modality flow. Here, the behavior of objects can follow trajectories learned
from the past, together with substitutions and time discontinuities. These
‘subconscious’ shadow realms can leak in and out of ‘consciousness’ to also
fill the ‘stage’.
Within
the human mind, a teapot can be blended with a donkey! The resulting simulation
can be infected with the properties of china, flesh, ice cream or whatever.
Inconsistencies fade out of the scene. This ability to mechanically draw disparate
objects of class, form and scale together in the most structurally consistent
and plausible way, is the basic stuff of our simulation machinery. The teapot
handle may detach from the lower join to become a free flowing tail and the
spout, the donkey head. But through introspection, inconsistencies will come
to light.
Although
this ability seems very powerful, it is at the same time very weak. No
more than five or six attributes of a simulation can be held in focus at any
one time. A simple long division would appear a somewhat trivial symbolic
animation in comparison, but few are able to maintain sufficient control over
the parts to achieve even this simple feat. 9
Math and
software memories similarly exist either as animation scripts or simple learned
pattern recognitions - such as multiplication tables. Like the images on a
die face, digits can have direct dot pattern equivalences for subsequent
math animation (add, subtract etc.). Thus math can manifest as either image
animations (e.g. joining/separating dot groups) or rely on memorized symbolic
beliefs, such as 12 x 12 has ‘an equivalence to’ 144. Or for the binary truth
tables - or should I say ‘belief tables’. 10
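The two routes to arithmetic described above can be contrasted in a toy sketch: animating dot groups, or recalling a memorized 'belief'. The times table here is precomputed only to stand in for rote memory, and the function names are invented for illustration.

```python
# Hypothetical stand-in for rote-learned 'beliefs': 1x1 to 12x12, e.g. 12 x 12 -> 144.
TIMES_TABLE = {(a, b): a * b for a in range(1, 13) for b in range(1, 13)}


def add_by_dots(a: int, b: int) -> int:
    """'Animate' addition: lay out two dot groups, join them, count the dots."""
    joined = ["o"] * a + ["o"] * b
    return len(joined)


def multiply(a: int, b: int) -> tuple[int, str]:
    """Use a memorized belief if one exists, else animate repeated joining of dot groups."""
    if (a, b) in TIMES_TABLE:
        return TIMES_TABLE[(a, b)], "belief"
    total = 0
    for _ in range(b):
        total = add_by_dots(total, a)
    return total, "animation"
```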
Just as
there are behavior scripts for the way a ball bounces, a rabbit runs and a
feather floats, so there are the more abstract behavior scripts of memory
indexing, for-next loops and the like. Most math and software concepts would
likely exist initially as animation scripts, but as our familiarity and confidence
with them grows, short cuts are taken, jumping straight from beginning to
end, and so over time they become simple memory beliefs without the intermediate
animations. Like when we imagine a vase falling to the ground and breaking,
we jump from the initial fall to the shattered remains more through belief,
than the accurate simulation of each part of the event down to each individual
shard.
Human
cognition evolved to integrate 3D objects, environments and behaviors into
a knowledge hierarchy - not 2D symbolic abstractions. It takes a great deal
of effort and training for a human mind to so contort itself as to be able to
attempt these classes of problems. But with persistence, the help of external
tools like pen, paper, calculator and computer, together with a little academic
‘coercion’ – and we are sometimes rewarded with results.
The method
of discovery does not need to be infallible or super efficient, it just needs
to have a statistical chance of success in finding connections and thus guide
knowledge formation within the time allocated. The higher goal, as always,
is to discover meaning through finding memory connections, joining means with
ends and reducing mystery. In this case, the means is a barcode image, the
ends - a decimal subscript number. A simulator deals primarily with object
shapes and forms. Apart from drawing upon prior memorized beliefs in the form
of animated scripts or static image relationships, there are fundamental 'instructions'
operating on those forms:
Instantiation - identification
Separation, scene explosion
Re-scaling
Perspective translation
Geometric alignments
Language attachments
Object substitutions
Joining – connecting
And grading machinery based on:
Similarity of scale, quantity and class
Pattern matching
Scene entropy
Scene simplicity (Occam’s razor)
Completeness/loose ends
These processes
are fast, automatic and operate in layers through reversible animated pipelined
scripts. Humans use pen and paper to 'fix' parts of these flows to create
order and permanence out of these somewhat chaotic streams. This helps construct
an external framework to guide the process. AI will have the ability to do
this internally by way of ‘persistent’ simulation layers. 11
Each process
is essentially dumb and automatic, but as a whole, and connected to sufficient
source material and memory support, new connections can usually be found and
integrated into memory. Dead end simulations will fade away and if grading
progress stalls, higher level processes will kick in - overall goal re-appraisal;
seek more real world data through the modalities or widen the internal associative
memory search.
Applying
instantiation to the global barcode image would yield six classes of abstract
objects: two rectangle shapes and four numeric digit shapes. Language attachment
to the object instances would tag the rectangles as thick and thin bars and the four
digits as a number.
At the
'ends' part of the problem, we have a number 1234. Memory references will
recall a belief that numbers have ‘an equivalence to’ binary 1's and 0's.
The first script trial might show an ASCII equivalence yielding 8 bits per
character. Thus an image of 1234 transforms to 32 binary digits. A second script
layer might show each separate digit converted to a simple binary count. The
third has the whole decimal number, 1234 represented by a binary count. Of
the three scripts, simple pattern recognition would grade binary expansion
as the closest match between means and ends. Further sample barcode image
trials would confirm the link. Memory formations of the newly discovered script
sequences would follow, including mutual pointers between the existing precursor
knowledge records of decimal to binary equivalence etc. (This, incidentally,
would reinforce the familiarity and trust in those prior beliefs.)
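The three script trials and their grading can be sketched directly, under loud assumptions. The bars are written as a string with 'T' for a thick bar and 't' for a thin one, and a thick bar stands for 1. The sample barcode is assumed to encode 1234 as a whole-number binary count. `difflib`'s similarity ratio stands in for 'simple pattern recognition', and the trial names and function names are hypothetical.

```python
import difflib

# Hypothetical trial scripts: each turns the decimal digits into a bit string.
TRIALS = {
    "ASCII, 8 bits per character": lambda digits: "".join(format(ord(ch), "08b") for ch in digits),
    "each digit as a binary count": lambda digits: "".join(format(int(ch), "b") for ch in digits),
    "whole number as a binary count": lambda digits: format(int(digits), "b"),
}


def bars_to_bits(bars: str) -> str:
    """Object substitution: thick bar 'T' -> '1', thin bar 't' -> '0' (assumed)."""
    return bars.translate(str.maketrans("Tt", "10"))


def grade_trials(bars: str, digits: str) -> tuple[str, dict[str, float]]:
    """Score each trial script by how closely its bit string matches the bars."""
    bits = bars_to_bits(bars)
    scores = {name: difflib.SequenceMatcher(None, script(digits), bits).ratio()
              for name, script in TRIALS.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    best, scores = grade_trials("TttTTtTttTt", "1234")  # 1234 = 10011010010 in binary
    print(best)  # whole number as a binary count
```

Further samples confirm the link, with one caution the sketch makes visible. For a single digit such as 7, the second and third scripts give identical bits, so such samples cannot tell them apart.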
Now, when
presented with similar barcode images, the scene will be recognized and will
draw from memory links to the newly formed animation scripts and an intimate
familiarity with the scene will ensue due to these very same memory references,
together with the emotional confidence that comes from recognition and understanding.
The fundamental simulator operations used in this example of discovery were:
Scene instantiation - to shape primitives
Language tagging - from memory recognition of images/forms
Prior memory associations - decimal to binary equivalence (as animation or belief)
Object substitutions - bar shapes to ‘thick’ / ‘thin’ or to 1's and 0's
Image comparisons - the bit patterns
The process
of decoding the barcode will not be understood in some isolated abstract way,
but within the known framework of reality through intimate linkages with existing
memory records; all being a part of a world knowledge and environment map.
If a barcode is now presented with no number or vice versa, the simulation
can play the script in forward or reverse to discover the missing parts through
simulation to final substitution of bar patterns or decimal digits.
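The learned decode script can be played in either direction. The sketch assumes that thick bars ('T') stand for 1, thin bars ('t') stand for 0, and the bars spell the whole number in binary. The function names are hypothetical.

```python
TO_BITS = str.maketrans("Tt", "10")  # thick bar -> 1, thin bar -> 0 (assumed encoding)
TO_BARS = str.maketrans("10", "Tt")


def play_forward(digits: str) -> str:
    """Hypothetical sketch: decimal digits -> bar pattern."""
    return format(int(digits), "b").translate(TO_BARS)


def play_reverse(bars: str) -> str:
    """Hypothetical sketch: bar pattern -> decimal digits, running the script backwards."""
    return str(int(bars.translate(TO_BITS), 2))
```

Leading zeros are not preserved ("0042" comes back as "42"), since this assumed encoding has no fixed width.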
Using prior knowledge of indexed memory containers, a simple symbol substitution layer can form to match the data in our example. The initial 'means' are still the bars substituted for 1's and 0's. The 'ends' are the decimal digits, with the previously learned simulated decode script in between. But this script is neither a formal flow chart nor software. It includes all sorts of miracles and beliefs to get from the bars to the numbers. (Bars turn into symbols, symbols into patterns. Patterns are compared to other patterns.) There needs to be discovered, through trial and error, linking morph translations between the means and ends using software animation char