Humans
inhabit a world full of language, symbols and images. In
order to bootstrap, an AI will need to inherit as much intellectual
capital as humans are able to provide. The vast majority
of this knowledge is encoded in the form of serial language,
mathematical symbology and 2D imagery. An AI needs to translate
this into 3D simulation scripts for subsequent abstraction
to memory and integration into the world model.
Humans
can interact with computers through keyboard, mouse and
sometimes speech. These are all far too slow for AI. If
a human wants to select a graphical screen object, he needs
complex motor control and accurate visual display and feedback
systems. An AI will need direct access to human graphical
user interfaces from the conscious 3D simulation process
itself, without necessarily going through language, and
especially not through mechanical mouse pointer control. Thus 2D
graphical interfaces should ideally enter the AI as a secondary,
enhanced visual modality, with spatial/mechanical pointer control
replaced by standard virtual 3D object manipulation, using
environmental morph targets and the usual script trial
animation processes for the various screen options.
Of the 120 million or so cones
and rods in a human retina, only one million channels actually
pass through the optic nerve, a substantial reduction in
bandwidth. Suppose you set an SVGA (800x600) projector's
shutter speed to a quarter second and turn off the color.
The experience may not be pleasant, but you could easily
follow the movie. If your own vision were similarly restricted
with very slightly out-of-focus glasses and a quarter-second
LCD shutter, you could still get about quite well in normal life.
This is a dramatic reduction in vision bandwidth,
to a mere 2 Mbps of uncompressed video (roughly what 800x600
pixels at four frames per second would come to at one bit per
pixel) or 50 Kbps compressed. A modern PC could process many
thousands of seconds of such video every single second.
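
The figures above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the text does not state a bit depth, so the one-bit-per-pixel case is an assumption chosen because it lands near the quoted 2 Mbps, and the 8-bit grayscale case is shown for comparison. The function and variable names are hypothetical.

```python
def raw_bitrate_bps(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed video bitrate in bits per second."""
    return width * height * frames_per_second * bits_per_pixel


WIDTH, HEIGHT = 800, 600   # resolution quoted in the text
FPS = 4                    # a quarter-second shutter gives four frames per second

# Assumption: 1 bit per pixel (pure black and white), not stated in the text.
mono_bps = raw_bitrate_bps(WIDTH, HEIGHT, FPS, 1)

# For comparison: 8-bit grayscale, also an assumption.
gray_bps = raw_bitrate_bps(WIDTH, HEIGHT, FPS, 8)

# Compression ratio needed to reach the quoted 50 Kbps from the 1-bit stream.
COMPRESSED_BPS = 50_000
compression_ratio = mono_bps / COMPRESSED_BPS

# Retina-to-optic-nerve reduction quoted in the text (120 million to 1 million).
retina_reduction = 120_000_000 / 1_000_000

print(f"1-bit stream:  {mono_bps / 1e6:.2f} Mbps")
print(f"8-bit stream:  {gray_bps / 1e6:.2f} Mbps")
print(f"compression needed for 50 Kbps: about {compression_ratio:.1f}:1")
print(f"retina to optic nerve: {retina_reduction:.0f}:1")
```

At one bit per pixel the raw stream is 1.92 Mbps, so the 50 Kbps figure implies compression of a little under 40:1. With 8-bit grayscale the raw stream would be eight times larger.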
If you consider all the modality
inputs of the human organism, the data bandwidth is not nearly
as massive as is often thought. There's a lot of parallel
processing behind it, but you could almost transmit the
whole modality data set (admittedly much degraded) over
a compressed 56K modem channel. Even if you don't accept
these vision estimates, consider the zero vision input of
blind people, who manage great intelligence from the remaining
sound, touch, taste and smell senses. Yet another vast bandwidth
reduction.
The realization that the essence
of 3D simulation can be built from touch (spatial animation)
and sound alone is quite intriguing. It implies cognition
does not need a visual 2D render plane to concretize simulations.
How can a 3D animation be emotionally graded without first
being rendered to a perspective? More research is needed
here.
Is conscious experience always
derived from the subconscious simulation layers and primarily
rendered back to a 2D perspective at the visual cortex region,
to be perceived as mental imagery? Or can the simulations
be fully 'experienced' in their native 3D form? What role
does rendering thoughts down to 2D vision play?
Are un-rendered (non-visualised)
thoughts the origin of background feelings? That would again imply
that emotional grading occurs separately from visual imagery:
in a sense, feeling what is occurring in the simulation
despite the 'curtain' being pulled.