The most appealing hardware structure would be a network cluster
of perhaps 10 or more powerful, self-contained computers
with shared memory resources, each dealing
independently with a separate aspect of AI. The continuing
massive worldwide investment in operating systems and application
software (such as Windows, Linux, commercial 3D modelers, OCR
and speech recognition software) can be leveraged to provide tools,
blurring the boundaries between modalities and consciousness.
But the boundary between man and machine is already becoming
very blurred, with ubiquitous cell phones providing an almost
telepathic modality, along with speech control of computers, graphical
interfaces, instant messaging, email etc. To some extent,
most people already spend most of their lives in virtual reality;
they just don't recognize novels, radio, TV, computer games
or software as being virtual environments.
A human without recourse to modality extensions (an auto, cell
phone, internet, print literature, PDA, computer, software
such as spelling and grammar checkers, calculator, watch,
FedEx account, credit card, 3D printer etc.) would be a greatly
diminished soul, and the same goes for AI. The most important
cognitive skill will not be to walk or even talk, but to manage
multiple computer graphical user interfaces.
But how could these advanced technologies begin to be organized
to create intelligence? First, the camera would project its
bitmap data to a memory map, which would be routinely processed
by the instantiation engine to identify known objects and
environments from memory records. A 3D simulated
environment would then be constructed in memory to match the visual
scene and simultaneously rendered back down to a 2D bitmap
memory space at the same first-person perspective as the camera
input, much like a 3D animation film is rendered to a 2D
screen image. If the camera data flow were interrupted, the
2D data rendered from the simulation would still be an accurate
mirror copy of the real scene.
There are three dynamic events that can now occur within the visual
field. An object can change, the perspective can change, or
the whole scene can change. For scene changes, the previously
described process of instantiation and discovery would be
repeated. For perspective changes, motion vectors (as used
in video compression) would be calculated to keep track of
scene perspective. For object animation, the software process
would recognize localized anomalies between the simulated
projection and the vision projection. Then using normal instantiation
techniques focused on the anomaly, the object in simulation
would be oriented until the 2D rendered projection and the
input vision were once again in alignment.
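
The three cases above can be sketched as a comparison between the frame rendered from the simulation and the frame arriving from the camera. This is only a minimal illustration under stated assumptions: frames are small grids of intensity values, a perspective change is approximated by a pure whole-pixel translation (a stand-in for real motion vector estimation), and the names `mismatch` and `classify_change`, as well as the thresholds, are hypothetical rather than part of the design described here.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # intensity values, indexed as frame[y][x]


def mismatch(sim: Frame, cam: Frame, dx: int = 0, dy: int = 0,
             tol: int = 10) -> List[Tuple[int, int]]:
    """Pixels (x, y) where cam differs from sim shifted by (dx, dy).

    Pixels that fall outside the overlap of the two frames are ignored.
    """
    h, w = len(cam), len(cam[0])
    bad = []
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy
            if 0 <= sx < w and 0 <= sy < h and abs(cam[y][x] - sim[sy][sx]) > tol:
                bad.append((x, y))
    return bad


def classify_change(sim: Frame, cam: Frame, max_shift: int = 2,
                    local_frac: float = 0.25) -> Tuple[str, Optional[tuple]]:
    """Label the difference as 'none', 'object', 'perspective' or 'scene'.

    Assumes max_shift is smaller than both frame dimensions.
    """
    h, w = len(cam), len(cam[0])
    bad = mismatch(sim, cam)
    if not bad:
        return ("none", None)
    # A localized anomaly (small bounding box) is treated as object animation.
    xs = [x for x, _ in bad]
    ys = [y for _, y in bad]
    box = (min(xs), min(ys), max(xs), max(ys))
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    if box_area <= local_frac * w * h:
        return ("object", box)
    # A widespread change explained by one global shift is a perspective change.
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if (dx, dy) != (0, 0) and not mismatch(sim, cam, dx, dy):
                return ("perspective", (dx, dy))
    # Otherwise the whole scene has changed and instantiation must be repeated.
    return ("scene", None)
```

In this sketch the bounding box returned for an object change marks where instantiation would be focused, and the shift returned for a perspective change plays the role of a single global motion vector.
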
The memory management software would need to maintain an associative
database linking all objects, environments, behaviors and
scripts, together with growing lists of knowledge about these
models, such as language tags, price, legal status, disposal,
source, manufacture, flammability, safety, uses, weight, dangers,
precautions, social status, classes, trends, history, composition,
aging properties, storage, popularity, component parts, assembly,
regulatory compliance, standards, size, growth time, environmental
impact etc. Object behaviors would be characterized and stored
based on:
- Motion vectors over time, stored as positional and temporal data sets. A feather would tumble through air differently from the way a balloon floats or an insect darts.
- Shape variation or morphing over time (butterfly, bouncing ball, coiled spring etc.)
- Reaction to stimuli (touch, drop, cut etc.)
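
One way to picture such an associative store is sketched below. It is a minimal illustration, not a proposed schema: the class names `Behavior`, `ObjectRecord` and `AssociativeMemory`, the `(t, x, y, z)` sample layout and the `mean_speed` summary are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Sample = Tuple[float, float, float, float]  # (t, x, y, z): time plus position


@dataclass
class Behavior:
    """A motion track stored as a positional and temporal data set."""
    name: str
    track: List[Sample]

    def mean_speed(self) -> float:
        """Path length divided by elapsed time (0.0 if undefined)."""
        if len(self.track) < 2:
            return 0.0
        total = 0.0
        for (_, x0, y0, z0), (_, x1, y1, z1) in zip(self.track, self.track[1:]):
            total += ((x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2) ** 0.5
        duration = self.track[-1][0] - self.track[0][0]
        return total / duration if duration > 0 else 0.0


@dataclass
class ObjectRecord:
    """An object model with an open-ended list of facts and behaviors."""
    name: str
    facts: Dict[str, object] = field(default_factory=dict)
    behaviors: List[Behavior] = field(default_factory=list)


class AssociativeMemory:
    """Records plus symmetric links between them."""

    def __init__(self) -> None:
        self.records: Dict[str, ObjectRecord] = {}
        self.links: Dict[str, Set[str]] = {}

    def add(self, record: ObjectRecord) -> None:
        self.records[record.name] = record
        self.links.setdefault(record.name, set())

    def link(self, a: str, b: str) -> None:
        for name in (a, b):
            if name not in self.records:
                raise KeyError(name)
        self.links[a].add(b)
        self.links[b].add(a)

    def related(self, name: str) -> List[str]:
        return sorted(self.links[name])
```

A summary such as `mean_speed` is one simple feature by which a slow tumbling track could be told apart from a fast darting one; shape morphing and reactions to stimuli would need their own record types.
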
The overall environment map would need to hold concepts ranging
from the universe, through planets, countries, cities, neighborhoods,
homes and factories, to materials, chemicals, molecules and
atoms. At any point in the simulation, a relationship to this
universal map would exist: which specific country, town
and room? If the setting were generic, it would still need a generic
history with the potential to be 'fixed' by subsequent facts. The
depth and accuracy of this virtual world will largely determine
the bounds and precision of thought for the artificial intelligence.
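
The idea of a generic location that is later 'fixed' by subsequent facts can be sketched as a simple tree of nested environments. The class name `EnvNode` and its methods are hypothetical, chosen only to illustrate the relationship between a scene and the universal map.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnvNode:
    """One level of the environment map, e.g. a planet, country or room."""
    kind: str
    name: Optional[str] = None          # None means generic, not yet fixed
    parent: Optional["EnvNode"] = None

    def is_generic(self) -> bool:
        return self.name is None

    def fix(self, name: str) -> None:
        """Bind a generic node to a specific place once a fact identifies it."""
        if self.name is not None and self.name != name:
            raise ValueError(f"{self.kind} already fixed as {self.name}")
        self.name = name

    def path(self) -> List[str]:
        """Chain of enclosing environments, outermost first."""
        parts = []
        node: Optional[EnvNode] = self
        while node is not None:
            parts.append(node.name or f"<generic {node.kind}>")
            node = node.parent
        return list(reversed(parts))
```

For example, a scene could start in a generic room within a generic country, and a later fact naming the country would update every path that passes through that node.
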
All objects blend together in an overall environment map, which
fits within a wider contextual world map. Physics rules (gravity,
hardness, weight, momentum, heat, speed of light etc.) guide
behavior and interactions between objects (cloth against
solids, light through glass etc.). Much of this is already
well advanced in commercial 3D software packages. The overall
resolution and speed depend simply on processing power
and memory resources. The software must subsequently recognize
any bitmap changes as object behavior animation or changes
to perspective and re-calculate to keep the simulation bound
to the vision input.
The input video stream drives the construction of the virtual scripts.
If novel, those scripts might be the basis of new memory formation.
Inconsistencies would be challenged based on the source credibility
or physical law, with certain knowledge discovery causing
rippling adjustments throughout memory. Logical inconsistencies
and vagueness might be highlighted to trigger some human supervisory
training to help bootstrap the process. The addition of a
language translator to convert words to simulation scripts
will greatly speed learning, since most human knowledge and
communication channels exist in the form of serial language
streams. The language parser would construct scenes from any
objects alluded to in the text, with action scripts proceeding
from memory precedent and/or from the language verbs, syntax
or emphasis.
Any proto-intelligence would begin as basic memory formation and
correction processes, but the main advances will arise when
running the subconscious simulation machinery separately from
the vision input. The content of those simulations could be
guided either by recent episodic memory scripts, prior behaviors
or simulator physics, and graded by 'genetic obsessions',
such as the 'need' to understand.
The process of discovery might involve the searching of any language
scripts associated with the problem, with their subsequent
conversion to animation. Metaphor will be examined through
object or scene substitutions within the trial scripts. Script
diversity would be built by breaking up object sets and re-ordering
time through dislocations. But how exactly are all these script
trials to be graded? This is the most difficult part of the
process to explain with any clarity. There are several grading
concepts like testing against law, mores or relevance to global
goals. Further grading concepts might be: normal object condition;
reduction of scene entropy; novelty detection; consequences
to the wider time frame or applicability to other environments.
But the most likely method will revolve around either quick-and-dirty
pre-programmed emotional prejudices or, if more time is available,
growing circles of cost benefit analysis expanding in time
and in environment space, as the potential effects of the
sample scripts ripple outward. These wider-scope integrations
will ultimately be graded against predefined 'genetic' schema
such as profit, shame, humor, social capital etc. A final
script must be found that predicts the highest probability
of benefit and the lowest probability of cost.
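
The cost benefit grading described above can be sketched as an expected-value sum over predicted effects, evaluated out to a growing horizon. This is a rough illustration only: the `Effect` and `Script` types, the numeric `distance` standing in for how far an effect lies in time and environment space, and the schema weights are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Effect:
    schema: str         # 'genetic' schema the effect is graded against
    value: float        # positive for benefit, negative for cost
    probability: float  # predicted likelihood, 0..1
    distance: int       # how far out in time/environment the effect lands


@dataclass
class Script:
    name: str
    effects: List[Effect]


def grade(script: Script, weights: Dict[str, float], horizon: int) -> float:
    """Expected weighted value of all effects within the horizon."""
    return sum(
        weights.get(e.schema, 0.0) * e.value * e.probability
        for e in script.effects
        if e.distance <= horizon
    )


def best_script(scripts: List[Script], weights: Dict[str, float],
                horizon: int) -> Script:
    """Pick the script with the highest grade at this horizon."""
    return max(scripts, key=lambda s: grade(s, weights, horizon))


weights = {"humor": 1.0, "profit": 1.0, "shame": 2.0}
nudge = Script("nudge the vase for a laugh", [
    Effect("humor", 5, 1.0, 0),
    Effect("profit", -40, 0.5, 1),   # the vase may fall and shatter
    Effect("shame", -10, 0.5, 2),
])
move = Script("move the vase to safety", [Effect("profit", -1, 1.0, 0)])
```

With a horizon of 0 only immediate effects count, which plays the part of a quick-and-dirty judgment; widening the horizon lets later costs ripple into the grade and can reverse the choice.
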
Another strategy for knowledge discovery would be the joining of
means with ends to build a script timeline from the missing
links in between. Once in the 3D domain, tweening
can be used to bridge gaps, with the new tweened
content tested against simulator reality constraints such
as gravity, physical form and behavior, social mores and rules
etc. The process might be more like a jigsaw puzzle, only with the
pieces made up of memory records of objects and their behaviors,
or triggered from external search results. Finally, the expansion
of complex objects into simpler sub-units or, in reverse, the
assembly of the complex from the simple would further aid knowledge
discovery.
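
Tweening between a known starting state (the means) and a known goal state (the ends) can be sketched as linear interpolation followed by a constraint check. This is a simplified illustration: states are reduced to coordinate tuples, the interpolation is linear, and the function names `tween` and `violations` and the floor constraint are assumptions for the example rather than a description of any particular 3D package.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def tween(start: Sequence[float], end: Sequence[float], steps: int) -> List[State]:
    """Linearly interpolate from start to end, returning steps + 1 frames."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(steps + 1):
        t = i / steps
        frames.append(tuple(a + (b - a) * t for a, b in zip(start, end)))
    return frames


def violations(frames: List[State],
               constraints: List[Callable[[State], bool]]) -> List[int]:
    """Indices of tweened frames that break any simulator constraint."""
    return [i for i, f in enumerate(frames)
            if not all(check(f) for check in constraints)]


def above_floor(state: State) -> bool:
    """Example constraint: an object cannot pass below the floor (z >= 0)."""
    return state[2] >= 0
```

Frames flagged by `violations` mark where the bridged timeline conflicts with the simulator's rules, so a different intermediate script would have to be tried for that stretch.
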
So at this point we have a simulated environment held in 'conscious'
memory tracking the live video feed (and/or receiving script
revisions from a subconscious process). We have a subconscious
simulator building and expanding upon those conscious scripts
using prior behaviors, with particular interest in novelty.
We have script expansion through behavior extrapolation (e.g.
a vase being nudged toward the edge of a counter will be predicted
to fall and shatter). We have scripts graded through cost
benefit analysis (a broken vase creates a loss of value and
a mess - entropy). Next, we need a method for allocating time
and resources to maximum effect, to direct focus and attention,
and an ability to interact with external knowledge bases.
Finally we need a satiation response to help allocate computational
resources and escape dead ends.
The ability to search the external world for solutions would require
language formation from the simulated scripts and an output
method for gaining human attention or an ability to directly
enter text searches into internet search engines. Due to speed,
the first choice would likely be the internet, with human
intervention being the least rational choice for guidance.
Humans will be totally unable to keep up with the data velocity
associated with AI thinking. An un-tethered AI would quickly
overtake one kept anchored to the dead weight of human consciousness.
The primary source for learning material would be the translation
of web based information to animated scripts within a global
environment map to form the basis of knowledge integration.
This would require 3D script construction built from text,
charts, sounds and images, in conjunction with the previously
described video/image instantiation engine. The seed AI could
begin making its own predictions and then testing those predictions
through further internet searches to discover if it had found
the correct causes, processes or results. Only when a concept
exists without internal contradictions can it be said to be
properly integrated and its authenticity secure. The cognitive
advantages available to AI will include the following:
- Persistence in simulation layers
- Simulation accuracy and precision (e.g. for math & software)
- Increased number of conscious objects
- Increased size of simulation
- Accurate simulation of physical law
- Accurate 'photographic' memory
- Multiple parallel modality inputs (e.g. 100 simultaneous internet channels)
- Extended modality inputs (e.g. data protocols, radio, IR, UV, ultrasonic etc.)
- Automatic, high speed multi language translators
- Greater conscious control of simulation progression and persistence
- Scientific calculator, thesaurus, dictionary and encyclopedic resources
- Patience, rationality and deep foresight


Finally, the human mind is unable to render its internal 3D
content with anywhere near the clarity it has when 'painted' directly
by the modalities. Thus, we are only really partially conscious;
there is an enormous richness to existence and experience
we are blind to. The mind is full of ghosts rather than realistic
impressions and a ghost world is hard to fully embrace.
The only credible mechanism for self awareness to occur is as a
computational process dealing with information representing
and bound to reality. The self can then exist and be aware
through a process of reflection (simulation) in a time controlled
domain, where emotional grading (feeling) can percolate through
time dilating script trials.
If you doubt that such processes of simulation and virtual time