We will reduce the task of creating AI to the task of finding an appropriate language for describing the world. This will not be a programming language, because programming languages describe only computable functions, while our language will describe a somewhat broader class of functions. Another distinctive feature of this language is that the description will consist of separate modules. This will enable us to search for the description of the world automatically, discovering it module after module. Our approach to the creation of this new language will be to start with a particular world and write the description of that particular world. The point is that a language which can describe this particular world will be appropriate for describing any world.
Keywords: Artificial Intelligence, Language for Description of Worlds, Event-Driven Model, Definition of Property, Definition of Algorithm.
Introduction
Our task is to understand the world. This means we have to describe it, but before we can do so we need to develop a specific language for description of worlds.
The starting point of our language development process is answering the question “What is the world?” It will be shown that the world can be thought of as a function f, which takes the current state of the world and the agent’s action and returns the next state of the world and the next observation.
In order to find a simpler description, we will admit a degree of randomness in the world and will look for a non-deterministic description. To make the f function non-deterministic, we will substitute the initial state of the world with a set of states.
The set of states is a rough description of what we know about the state of the world. A more detailed description will be obtained when we have multiple sets of states and assign to each set the probability of the world’s state being in that very set. In our terminology, the partitioning of the set of states into disjoint subsets will be a question, while the resulting subsets will be basic answers. The answer to a question will be the probabilities we have assigned to the basic answers.
We will define the non-deterministic function f through an answer to a question. Then we will refine the definition by presenting it through an answer to a group of questions (in this article the term group will be synonymous with set). We will assume that we do not ask irrelevant questions. For a question to be relevant, the answer to that question should (i) depend on the past and (ii) have some impact on the future. The definition of the world will depend significantly on the group of questions we have asked. We will assume that we have asked sufficiently many questions so that we can present the world as a Markov Decision Process (MDP) which satisfies the Markov property.
In this article we present the world (the f function) in a quite complicated way, using some convoluted and unintelligible formulas. The only purpose of these formulas is to give us an idea of what the world we are trying to describe looks like. These formulas will not be involved in the creation of AI and will not be employed by AI itself in its reasoning.
We will not try to find the f function as such, but an approximation of that function, which will be referred to as the function g. The approximating function will also depend on the group of questions; however, here the questions will be abstract. In other words, we will not be interested in how we partition the set of states of the world. Instead, all that will be important is how many questions there are and how many answers each question has. That is, how many times we have partitioned the set S into basic answers and how many basic answers there are in each of these partitions.
So the g function describes the world. But is the g function computable? If the function were computable, the language which describes the world would be some form of programming language. We will find out that the computable functions are not enough and therefore we will describe a somewhat broader class of functions.
We will show that once we have the language for description of worlds, creating Artificial Intelligence becomes easy. In order to understand the world, we need to find its description. One way to do this is by an exhaustive (brute-force) search through all possible descriptions. This approach is doomed to fail. We will take a smarter approach by assuming that the description of the world consists of individual modules which can be discovered one by one.
How many agents will live in the world? Each world has at least one agent – the protagonist. We will consider two versions of the world we are going to describe. In the first version, the protagonist will be alone and will play against himself. In the second version there will be another agent who will play against the first one.
Although the new
language will not be a programming language, we will create it using the same
approach that we would employ for developing a new programming language. We
will start with a particular problem (i.e. a specific world) and will try to write a program which solves this problem (i.e. provides a description of the specific world).
The specific world we are going to describe will be the one in which the agent plays chess. We have written a program [8] which contains a simple description of this world and which emulates the world by this description. Run this program and see how simple that description (of the chess game) happens to be. The description consists of 24 modules presented in the form of Event-Driven (ED) models. In addition to the ED models there are also two moving traces (i.e. two arrays). The ED models of the chess game belong to three types: 5 patterns + 9 algorithms + 10 properties = 24 models. We also added seven simple rules which are needed as well.
It is important that our agent will not be able to see everything in this world. Rather than seeing the whole chessboard, the agent will only be able to see one square at a time. Thus we will be exploring a Partial Observability world, which is more interesting, because if the agent were able to see everything the task would not be so stimulating. When the agent can only see a fragment of the picture, he will need to figure out what the rest of the picture (the part he cannot see) looks like.
We will use simple coding in order to limit the number of commands given by the agent. To this end, we will divide the steps into three groups (by the step number modulo 3). The specific command will depend on the step at which the agent gives that command.
This type of coding
will provide the first pattern, namely “1, 2, 3”. Similar to all other patterns in the world, the “1,
2, 3” pattern will be presented through an Event-Driven model. An ED model is
an oriented graph consisting of several states and arrows between the states.
The arrows are associated with events which in turn are presented as
conjunctions. An event can be true
sometimes and false at other times.
We will assume that
something special occurs in certain states of the ED models. In our terminology this
special occurrence will be a trace.
The trace is what makes the model meaningful because if all states were the
same it would not matter which state we are currently in.
The next two
patterns will be Horizontal and Vertical. These patterns will provide the x and y coordinates of the
observed chess square. The Cartesian product of these two models will be the
model of the whole chessboard.
The trace of this
Cartesian product will be the chessboard position. It will be a “moving” trace
because the chessboard position is not still but keeps evolving.
Now that we have described the chessboard, we should describe how the chess pieces move. This is where we need algorithms. What is an algorithm? We presented the patterns as Event-Driven models and will likewise use ED models to present algorithms. In our understanding, an algorithm is a sequence of actions (events). The programs which calculate functions are only a special case of algorithms.
We will describe
the algorithms which determine how the chess pieces move. For example, the
king’s algorithm is quite simple because it is deterministic, while the rook’s
algorithm is more complex because it contains non-deterministic furcations.
Very few authors
have ventured to define the term algorithm.
The only attempts at an algorithm definition known to us are those of Moschovakis [1, 2].
In order to
demonstrate that our definition of an algorithm makes sense, we will present
the Turing machine as an Event-driven model. We will thus show that our
definition covers the meaning which other authors imply in the term algorithm.
We will present the
Turing machine as a separate world. The description of that world will consist
of two ED models, the first of which will describe an infinite tape and the
second model will describe the Turing machine as such. A proper algorithm will
be the second ED model because the first one simply describes a tape of
infinite length. An algorithm does not make sense unless we have something to
run the algorithm on. Likewise, a Turing machine will not run unless we supply an
infinite tape for the machine to run on.
Another fundamental term introduced in this article is property. This concept will also be a pattern and will also be presented through an Event-Driven model. We will use properties in order to code the agent’s input. Although the input will consist of as little as four characters, these four characters will enable us to present infinitely many properties.
The first world
which we described is deterministic and for our description we used a g function which is deterministic. Most worlds however are not deterministic,
hence our language for describing worlds should also be able to describe non-deterministic
functions. As soon as we introduce a second agent, the world becomes non-deterministic
because the exact behavior of the second agent is unknown.
The description of the world will include various events. Most of these events will be
linked to the input and to the output. Certain events will be linked to the
states of certain ED models. There will also be random events which occur with
a certain probability.
The description will go as far as allowing for impossible events. These will be
events which never occur in reality, but can occur in our imagination.
Regardless of their impossibility, these events will help us describe the world
because they will be intertwined in various algorithms.
The first world we described included just one agent (the protagonist) playing against himself. We will describe a second version of the chess game where, in addition to the protagonist, there will be a second agent (an antagonist). Importantly, the state of the world will be different for the two agents. The state of the world is the context in which each agent finds himself: “who”, “where”, “when”. While we can reasonably assume that the “when” context is the same for all agents, the “where” context may vary across agents. The “who” context may also vary, as some agents may be male while others may be female. One agent can play with the white pieces and another agent may play with the black pieces.
We have written a program which emulates the first world. Can we write such a program for the second world as well? Generally speaking, the answer is “no”, because in that world there are statements such as “this algorithm can be executed” (i.e. “termination is possible”) and operations such as “the agent executes an algorithm”. In the general case, these are undecidable problems and non-computable operations. So we said that in the general case the world can be a non-computable function, meaning that in certain cases one can describe the world but cannot write a program which emulates the description. The particular case of the chess game is different. In that particular case, one can create an emulating program because everything in the chess game is finite and everything is computable.
What does the world look like?
Let us first see what the world looks like in the eyes of the agent. The agent sees a data stream or a series of inputs and outputs (observations and actions). This is what the stream looks like:
v0, a1, v1, a2, v2, a3, v3, …
So the world looks
like (appears to be) a data stream. The appearance however covers the
underlying essence of the world which the agent should endeavor to understand.
What is the
underlying essence? This is the function which generates the data stream. More precisely, it is a
function which takes its arguments from actions (the outputs) and returns its
values in the form of observations (the inputs):
vi = f(ai)
This description is oversimplified, because we assumed that the f function does not have a memory and does not depend on what has happened before. We assumed that the f function depends only and exclusively on the last action of the agent. This excessive simplification bypasses the core of the problem. Nevertheless, we do mention this possible simplification because it is used in nearly all articles dedicated to AI. In the literature, the presence of this simplification is termed Full Observability and, accordingly, its absence is denoted as Partial Observability. In this article we will assume that the f function has a memory and accordingly we will be dealing with Partial Observability.
This is what the f function with memory looks like:
< si+1, vi > = f(si, ai)
Here si is the state of the world. The state of the world is the memory of the world. But the f function is the world proper, meaning that si is the memory of the f function. In other words, si is what the world has memorized at moment i.
The f function takes the current state of the world and the agent’s action as arguments and returns two values (as a 2-tuple). The first value returned will be the new state of the world (the new value of the memory) and the second one will be the new observation.
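As an illustration, the pair <f, s0> could be sketched in Python as follows (a toy sketch, not the program [8]): the world is a function with memory which is fed the agent’s actions and produces the observations.

from typing import Callable, List, Tuple

State = float        # the state of the memory (a real number, see the Note below)
Action = str         # actions are letters from a finite alphabet
Observation = str    # observations are letters from a finite alphabet

# <s_{i+1}, v_i> = f(s_i, a_i)
WorldFunction = Callable[[State, Action], Tuple[State, Observation]]

def run_world(f: WorldFunction, s0: State, actions: List[Action]) -> List[Observation]:
    """Feed a sequence of actions into the world <f, s0> and collect
    the observations v1, v2, ... which the world returns in response."""
    s, stream = s0, []
    for a in actions:
        s, v = f(s, a)
        stream.append(v)
    return stream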
Can we assert that the f function describes the world and is therefore its underlying essence? Not quite, because we also need s0. Hence, we need to add the initial state of the world (the initial state of the memory). Now we can say that the pair <f, s0> is the world, or at least that it describes the world precisely.
Note: We will assume that the agent’s action and
observation are letters from a finite alphabet and the state of the memory is a
real number. In other words we will assume that the amount of information
received and transmitted at each step is limited while the world’s memory is the
continuum. We need a continuum size to ensure that the memory of the world can
store an infinite sequence of 0’s and 1’s (the number of these sequences is the
continuum).
We do not need sizes larger than the continuum, because any f function whose memory is larger than the continuum can be expressed as a function with a continuum-sized memory. Let us take the equivalence relation “indistinguishable states” (i.e. two states are equivalent if it does not matter which of them we start from). The size of the factor set of this relation is not larger than the continuum. We can express f as a function over that factor set instead of a function over S. In this way we will present f as a function with a continuum-sized memory.
All continuum-sized sets are isomorphic to one another. Hence, we can choose any one of these sets, and we selected the most prominent of them – the set of real numbers.
Note: We will make the picture more colorful and interesting by assuming that not all actions of the agent are allowed. Certain actions will be allowed at certain moments and other actions will be allowed at other moments. For this purpose we will assume that the f function is not a total one and is therefore undefined for some “state of the world – action” tuples. This is how we will obtain the total function f1 from the partial function f:

f1(s, a) = f(s, a), when f is defined for <s, a>
f1(s, a) = <s, undef>, otherwise

Here undef is a special constant equal to the agent’s observation when he tries to play an incorrect move.
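This totalization could be sketched as follows; the convention that the partial f raises KeyError where it is undefined is an assumption of the sketch.

UNDEF = "undef"   # the special observation for an incorrect move

def make_total(f):
    """Obtain the total function f1 from the partial function f: where f
    is undefined, the state stays the same and undef is observed."""
    def f1(s, a):
        try:
            return f(s, a)
        except KeyError:          # our convention for "f is undefined here"
            return s, UNDEF       # the state of the world does not change
    return f1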
A simple non-deterministic world
Although we did a
good job defining the world as a deterministic function, this is inconsistent
with our idea that the world is non-deterministic and involves randomness. How
can we introduce randomness? By replacing the concrete states of the world with
sets of states. When we know exactly which state the world is in, the f function will be deterministic and we
will be able to tell exactly the next state of the world and the next
observation. It may also be the case that we know the state of the world only approximately
rather than exactly. In this case we will have a set of states instead of a
single state.
We will also have to introduce probability. We must be able to tell the probability of the various subsets of S. To this end we will use some probability space:

W = <ℝ, F, P>

Here F will be the set of the measurable subsets of ℝ. But we want to have probabilities for all subsets of ℝ, including the non-measurable ones, so for an arbitrary subset we will use the interval of probabilities determined by its inner and outer measure.
In order to define
the f4 function – which
is the non-deterministic version of the f function – we will
need two more versions of f:
f2(M, a, v) = { s | ∃s′∈M, <s, v> = f1(s′, a) }
This is the image
of M after we have performed action a and have seen observation v.
f3(M, a, vi) = <M′, pi> = <f2(M, a, vi), α·P(Mi)>

The f3 function returns a set and a probability value. The set M′ is the image of M and pi is the probability that the next observation will be vi.
We have n+1 possible results (n possible observations and one possibility that a is an incorrect move). Action a partitions the set S into n+1 disjoint subsets Ai:

Ai = { s | ∃s′, <s′, vi> = f1(s, a) }

From these subsets we obtain Mi = M∩Ai. Each Mi set has its own probability or interval of probabilities, namely the probability of the result vi. Here vi traverses all possible observations. The last possible result is undef. In this case the state of the world remains the same, i.e. the M set is reduced to Mn+1. Therefore, the next set of states will be Mn+1 = f2(M, a, undef).
The probability of each result is the probability of the corresponding set Mi. However, these probabilities should be normalized, i.e. their sum should be 1. To do so, we will move to a probability which is relative to the probability of M by multiplying it by some normalizing constant α. In our case, normalization is the transition to the conditional probability under the condition M. This will be done by multiplying by some constant α. If the probability of M is p, then the conditional probability will be obtained by dividing by p. If the probability of M is in the interval [a, b], then we have to divide by b.
From the f3 function we will easily obtain
the non-deterministic function:
f4(M, a)
This function returns n+1 possible results in the form of < f2(M, a, vi), vi> and each result will be returned with the probability pi, which is given by f3.
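In the finite case (a finite set of states and exact probabilities) the chain f1 → f2 → f4 could be sketched as follows; representing the probability space by a weight for every state is an assumption made for the sketch.

def f2(f1, M, a, v):
    """The image of the set M after we have played a and have seen v."""
    return {f1(s, a)[0] for s in M if f1(s, a)[1] == v}

def f4(f1, M, a, weight, observations):
    """Return every possible result <f2(M, a, v), v> together with its
    normalized probability p (finite, exact-probability case)."""
    total = sum(weight(s) for s in M)
    results = []
    for v in observations:                       # undef is one of them
        Mi = {s for s in M if f1(s, a)[1] == v}  # Mi = M ∩ Ai
        if Mi:
            p = sum(weight(s) for s in Mi) / total   # conditional on M
            results.append((f2(Mi, a, v), v, p))
    return results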
We presented the deterministic
world as the 2-tuple <f, s0>, so the simple non-deterministic
world will be the following 3-tuple:
< f4, M0, W>
Here M0 is some set of states of the
world (the initial set) and W is some probability space. There
are many probability spaces which can be chosen for W and the various spaces will give
various non-deterministic worlds.
What is a question?
We described the state of the world in an extremely simplified way – by partitioning the set S into two parts (M and S\M) and saying that the probability that the current state is in the first set is 1 and in the second set is 0.
A better way of
describing the state of the world would be to partition S into finitely many disjoint sets Qi (or into a countable number of such disjoint sets) and
match each Qi set with a
probability or an interval of probabilities.
Some definitions:
Definition: A question is the partitioning of the set S into a finite or countable number of disjoint subsets Qi, where ∀i P(Qi) ≠ 0. Each Qi subset will be a basic answer.
Definition: An answer to a question is a correspondence which matches each basic answer with a probability or an interval of probabilities.
Definition: A group of questions is a finite or countable set of questions.
Definition: An answer to a group of questions is the set of answers wherein each question in the group has exactly one answer.
Definition: “Do not know” is the answer to a question where each basic answer to this question is matched with the probability interval [0, 1].
The answer to a group of questions is
“Do not know” if the answers to all
questions in the group are “Do not know”.
Definition: An answer to a question will
be deterministic if the probability of one of its basic answers is 1 and the
probabilities of all other basic answers are 0. The answer to a group of questions will be deterministic
if the answers to all questions in the group are deterministic.
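For a finite set of states, these definitions could be sketched in Python like this (the dictionary representation is an assumption of the sketch):

# A question is a partition of S; an answer maps each basic answer
# to a probability or to an interval [lo, hi] of probabilities.
def do_not_know(question):
    """The "Do not know" answer: every basic answer gets the interval [0, 1]."""
    return {basic: (0.0, 1.0) for basic in question}

def is_deterministic(answer):
    """Deterministic answer: one basic answer has probability 1, the rest 0."""
    values = list(answer.values())
    return all(isinstance(p, (int, float)) for p in values) \
        and max(values) == 1.0 and sum(values) == 1.0

question = {"white": {1, 3}, "black": {2, 4}}    # a partition of S = {1, 2, 3, 4}
print(is_deterministic({"white": 1.0, "black": 0.0}))   # True
print(do_not_know(question))   # both basic answers get (0.0, 1.0)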
Arriving at a definition on
the back of a question
We will describe
the state of the world by answering a certain question. It can be assumed that
the answer AQ to question Q transforms the probability space W into a new probability space W′ where the probability of each
set Qi (of each basic answer) is
multiplied by some factor qi. How shall we calculate these
factors?
qi = AQ(Qi)/P(Qi)
When the value of Qi in AQ is a number and the probability of Qi is also a number, then qi is a number, too. If these are
the intervals of probabilities [a′, b′] and [a, b], then qi is the interval of factors [a′/b, b′/b].
We want to get the new idea of the state of the world in the form of an answer to a question. We cannot seek answers to all questions because they are uncountably many. Therefore, we will choose a new question N and try to obtain the new idea of the state of the world as an answer to the question N.
What is the probability pi for the next observation to be vi, if the action is a? That probability is equal to the probability of the set Ai in the probability space W′, which can be expressed as the sum

pi = Σr qr·P(Qr ∩ Ai)

where r traverses the basic answers to Q.
Which new answer ANi to the question N will describe the state of the world, provided that we (i) have started from a state described by the answer AQ, (ii) have performed action a and (iii) have seen observation vi? The answer is computed by a similar sum, in which r traverses the basic answers to Q, j traverses the basic answers to N and the constant αi normalizes the answer ANi.
From the f function we will develop one more function and will
name it f5.
<ANi, pi> = f5(AQ, a, vi, N)
Here ANi is an answer to the question N and pi is a probability or an
interval of probabilities. The value of pi is equal to the
probability that the next observation is vi. That value does not depend on question N. The answer ANi is the one which will describe
the new state of the world if the observation is vi.
Arriving at a definition
on the back of multiple questions
Thus far we described the state of the world using the answer to a single question. Now let’s replace the single answer with a group of answers. We will examine the case where (i) the group consists of two questions and (ii) the answers do not contain probability intervals (i.e. each basic answer comes with an exact probability).
Let’s have the two questions B and C and their basic answers Br and Cs. Let’s also have some factors krs which depend on the questions
only and not on the answers.
Let

P′(Br ∩ Cs) = trs·qr·qs·P(Br ∩ Cs)

Here P′ is the new probability in the new probability space W′ which we obtain after considering the answers AB and AC to the questions B and C, and qr and qs are the factors of these two answers. Now we have to find the factors trs. How can we find them?
When some factor k is 0, the corresponding factor t is also 0. When all factors k are 1, then all factors t will be 1, too. In the general case, the factors t depend on the answers and we will calculate them using the following equalities:

∀r: Σs P′(Br ∩ Cs) = AB(Br)   (1)
∀s: Σr P′(Br ∩ Cs) = AC(Cs)   (2)
We will easily find factors br which satisfy equalities (1). Then we will easily find factors cs which satisfy equalities (2). Unfortunately, when we fix equalities (2) we will unfix equalities (1). We will assume that we have found factors br and cs such that when trs = br·cs both (1) and (2) are satisfied. Here we do not say how we will find these factors.
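One standard technique that alternates between enforcing equalities (1) and (2) is iterative proportional fitting; the sketch below (finite case, exact probabilities) shows one possible way to search for such factors, not necessarily the way intended here.

def fit_factors(P, row_targets, col_targets, iterations=100):
    """P[r][s] is the unnormalized new probability of Br ∩ Cs.
    Returns t[r][s] = b[r]*c[s] such that the row sums approach the
    targets of (1) and the column sums approach the targets of (2)."""
    R, S = len(P), len(P[0])
    b, c = [1.0] * R, [1.0] * S
    for _ in range(iterations):
        for r in range(R):                        # enforce equalities (1)
            total = sum(P[r][s] * c[s] for s in range(S))
            b[r] = row_targets[r] / total if total else 0.0
        for s in range(S):                        # enforce equalities (2)
            total = sum(P[r][s] * b[r] for r in range(R))
            c[s] = col_targets[s] / total if total else 0.0
    return [[b[r] * c[s] for s in range(S)] for r in range(R)]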
Thus, we have found the factors trs and thereby the factors qrs. So we can extend the f5 function so that it uses an idea of the state of the world described through an answer to a group of questions (rather than to a single question as per the previous section).

<ANi, pi> = f5(AG, a, vi, N)

Here pi and ANi are computed as in the previous section, except that the sums now run over the intersections of the basic answers of the questions in the group.
A real non-deterministic world
It is very important which questions we are going to ask. As we said, we cannot afford to ask all questions because they are uncountably many. Therefore, we will assume that (i) we have fixed the group of questions GQ and (ii) both the current state of the world and the new state of the world are described by an answer to that group of questions. We will define the function f6 such that it depends on GQ although GQ is not an argument of f6.
The f5 function provides us with an answer to a single question and enables us to find answers to all questions in the GQ group (one at a time). Thus, from f5 we obtain f6.
< AGi, pi> = f6(AG, a, vi)
Here AG is an answer to the questions in the GQ group. The answers AGi are also answers to the GQ questions.
Using the deterministic function f6 we will construct
a non-deterministic function.
f7(AG, a)
The f7 function will (i) return n+1 possible answers in the form of <AGi, vi> and (ii) each of these answers will be returned with the corresponding probability
pi.
So far, we presented the deterministic world as the 2-tuple <f, s0> and the simple
non-deterministic world as the 3-tuple < f4, M0, W>. Now, the real
non-deterministic world will be the 4-tuple
< f7, Answer0, GQ, W >
Here Answer0 is some answer to the group of
questions GQ (the initial answer we will
start from). We will assume that the initial answer is possible. An example of
an impossible answer is where the answer is deterministic and the corresponding
factor krs equals 0. In other words, an impossible situation is
where (i) questions B and C cannot have answers r and s at the same time and (ii) another possibility does not
exist.
We could replace the group of questions with a single question which has many basic answers, but we will not do this because we prefer to have many questions each of which has only a few basic answers.
Irrelevant questions
The non-deterministic world hinges heavily on GQ (the group of questions we chose). If we choose the empty group of questions ∅, we will have only one possible answer: the empty answer ∅. In this case we can assume that we have only one state of the world and enjoy Full Observability.
The more questions we include in the group and the finer the partitioning of the set S becomes, the more accurate our description of the f function will be. For example, if the partitioning coincides with the factor set of the relation “indistinguishable states” and if the initial answer is deterministic, then we will obtain a deterministic world.
How many questions should we ask? Are there irrelevant questions? Yes. For a question to be relevant, its answer must somehow depend on the past and somehow influence the future. If the answer to the question does not depend on the past in any way, or does not influence the future in any way, then the question is irrelevant.
An example of an irrelevant question is “Which side of the coin will land face up? Heads or tails?”. Despite all the importance of that question, we will not include it in the group, because the answer to the question does not depend in any way on what has happened before. In other words, the answer to the question does not depend on the past and we cannot know the answer before we toss the coin.
Another example is “How many pea beans are there in my dish? An odd or an even number?”. The answer to that question has no impact on the future whatsoever. Neither does the answer depend on the past, unless we have made the effort to count the pea beans.
Note: A question is irrelevant if it contains two needlessly separated basic answers. They may be needlessly separated from the perspective of the past if, regardless of the previous answer, of the action performed and of the observation seen, the two basic answers always obtain the same factor q. They are needlessly separated from the perspective of the future if the two deterministic answers which assign probability 1 to those basic answers produce one and the same forecast for the future. In that case these basic answers do not affect the future in any way.
We will assume that we will not be asking any irrelevant questions. It follows therefore that the classes of the relation “indistinguishable states” cannot be partitioned any further, because the answer to a question which leads to such partitioning would not affect the future in any way.
We presented the non-deterministic
world as a Markov Decision Process (MDP). There are three minor
differences from the classic MDP definition:
1. The set of states here is the
set of intersections between basic answers. The latter set may not always be
finite. It may be infinite or even reach the size of a continuum.
2. The probabilities are replaced
with probability intervals.
3. We do not have rewards here
because the objective is only to find a model while rewards are needed when we
try to find a strategy.
Does our MDP satisfy the Markov property? This property means that the MDP model cannot be further improved. The property will be satisfied when we have asked a sufficient group of questions.
Definition: A group of questions is sufficient when every further question is either irrelevant or incapable of producing a further partitioning of the set S.
The Markov property
says that the future depends only on the state, while the way by which we have
arrived at that state does not matter. Therefore, every next question which
leads to further partitioning will either be independent from the past or will
not affect the future.
We will assume that GQ is a sufficient group of questions. Otherwise, the description of the world would be rough, and that is not enough for us because we aim at a description which is maximally precise. For the same reason we will also assume that the initial answer is deterministic.
Now that we have an idea of what the world we aim to describe looks like, we should find a way to make this description.
How shall we describe the
world?
Thus far we have described the world in a very complicated way. We used the f function and a group of questions. The only purpose of that complicated description was to give us an idea of what the world we are trying to describe looks like. To arrive at the actual description of the world, we will use a very straightforward g function.
First, the
questions we are going to ask will be abstract ones, meaning that we will care only
about the number of questions and the number of answers to each question. We
will not care about the way the S set
has been partitioned to basic answers or about the probability space W from which we would
have obtained the probabilities.
All we know about the world (the f function) is the life (the sequence of inputs and outputs). We will aim to find the g function on the basis of this information, in the hope that g will successfully predict the future behavior of the world. Because life is finite, we can find a computable deterministic function g which provides a precise description of life until now. Such a function may turn out to be excessively complicated, which is not good, because overly complicated functions are highly unlikely to produce a precise description of the future (the Occam’s razor principle). If we let randomness come in, then we may be able to find a much simpler non-deterministic function g which is still capable of describing life, albeit with an approximation, because the description thus obtained would include randomness. We will try to reconcile the two requirements and obtain a g function which is both maximally simple and maximally deterministic.
We will describe
the state of the world through the answers to the questions asked. If the
number of questions is countable, the answers will be a continuum. In this case
only part of the answers will be describable and we will assume that the g function operates only with the
describable answers. Thus, the g
function is defined only for the describable answers and always returns a
describable answer.
In order to
describe the g function, it would be
sufficient to describe the behavior of this function with deterministic answers. In order to extend the g function to non-deterministic answers,
we will need the factors krs, but will typically assume that
the answers are independent, i.e. the krs
factors are equal to 1. In certain cases we may assume that a combination of
answers is impossible or highly improbable, i.e. the corresponding factor is
zero or close to zero.
While we will be
aiming to find the right set of questions, we will never be sure about the
sufficiency of the questions or about the absence of any irrelevant questions.
There are two
reasons why we will not look for the f
function and will only try to find its approximation, the g function.
1. The f function may not be
describable, i.e. the function may not have a description which can be found.
2. The f function may be
describable, but too complicated for us to find its description. That is why we
will be looking for an approximation which is simpler although not very
precise.
The general form of
the g function is:
< Answer′, p> = g(Answer, a, v)
where Answer is an answer to the questions asked, Answer′ is the new answer obtained
after we have played a and have
observed v, and p is a probability or
an interval of probabilities.
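In programming terms the signature of g could be sketched like this (the type names are assumptions of the sketch):

from typing import Callable, Dict, Tuple, Union

Prob = Union[float, Tuple[float, float]]   # an exact probability or an interval
Answer = Dict[str, Prob]                   # an answer to the questions asked

# <Answer', p> = g(Answer, a, v)
G = Callable[[Answer, str, str], Tuple[Answer, Prob]]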
Computability of the world
The first question we are going to ask as we proceed to describe the world is whether the f function is computable. The answer is that the function is probably non-computable. The second question is whether the g function – which describes the f function – must be computable.
We can imagine the world and the agent as two black boxes which exchange information. The
function which describes the agent must be computable, otherwise we would not
be able to write a program which computes this function and will therefore fail
to create AI. On the other hand, we can comfortably live with a non-computable g function (the function which describes
the world). We are not going to create the world because it is already there
and all we need to do is describe it. However, can we describe a non-computable
function? Yes, we can, but of course we would not be able to compute what we
have described.
Indeed, in our own
perception the world is non-computable. Let us take as an example the following
rule: “A statement is true if there is proof for that statement”. This rule is non-computable
because the question about the existence of proof is semi-decidable.
If we had to create
the world (i.e. emulate it with a computer program), what would we do? We would
need a g function which, besides
being computable, is computable in reasonable time. For example, in a chess
game each position is either winning or non-winning. While we can calculate
whether a position is winning, it would take more than a lifetime to compute
whether the initial position is
winning. In fact the lifetime of our Galaxy would be too short for such a
computation, even if we use the most powerful computer which we have for the
moment. This phenomenon is known as combinatorial
explosion and means that some functions are computable in theory but not in
practice.
In other words, we will not worry about the computability of the g function or about the easiness of its computation. This means that the language which describes the world can even describe non-computable worlds. But will our language be able to describe indescribable worlds? Can a world be indescribable? Yes, it can, but if a world is indescribable we would not be able to describe it. Whatever language we choose, the descriptions will be countably many, but there will be a lot more worlds than descriptions. So, there will obviously be indescribable worlds, but we will stick to the describable ones and assume that each indescribable world has a describable counterpart which is sufficiently similar to it.
The essence of the AI program
As we said before, we will reduce the task of creating AI to the task of finding a language which can describe worlds. If we had that descriptive language, how could we create the AI program with it?
In [7] we wrote that AI should answer two questions: “What is going on?” and “What should I do?”. In this article we will not deal with the second question, but if we have understood the world, we can in our mind make a few steps forward and, using an algorithm similar to Min-Max, select the action which is likely to bring about the best possible future development.
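The “few steps forward” idea can be sketched with a standard depth-limited Min-Max search; model, evaluate and actions below are hypothetical placeholders, not parts of the program [8].

def minimax(state, depth, maximizing, model, evaluate, actions):
    """Look a few steps ahead in the imagined world and return the value
    of the best possible future development (a standard Min-Max sketch)."""
    moves = actions(state)
    if depth == 0 or not moves:
        return evaluate(state)
    values = [minimax(model(state, a), depth - 1, not maximizing,
                      model, evaluate, actions) for a in moves]
    return max(values) if maximizing else min(values)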
Answering the first question is more difficult. The first question is linked to understanding the world, i.e. to finding the f function which is the underlying essence of the world. Finding the f function is tantamount to finding its description written in some language for description of worlds. As we said before, in addition to the function we have to find the current state of the world (the memory value). In order to find the current value, we will again need the language for description of worlds, because that language will give us the format of the memory. In other words, before we find the current memory value, we must know the format in which the value is stored. (As we saw from the description of the f function, the memory format will be the group of questions GQ or, more precisely, the abstract version of the GQ group, which describes only the number of questions and the number of answers to each question.)
If we have the
language for description of worlds, we would know what we are looking for and
will easily write a program which searches for such description. Of course, it
will not be easy to make this search efficient.
The description must be discoverable. For this purpose it has to consist of individual modules (individual questions). These modules must be patterns which can be discovered one by one. For example, if you try to guess my email password, you will have a hard time because the possible combinations are far too many. Your task would be much easier if you were able to guess the password characters one by one. For the first character there are not too many choices and you can easily guess it, provided however that I am kind enough to tell you whether you have guessed it or not. Therefore, the description has to consist of simple individual modules such that each module contains some telltale specificity by which it can be discovered.
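The password analogy can be made concrete with a toy illustration: brute force needs on the order of 26^n attempts, while guessing character by character with feedback needs at most 26·n. The function below is purely illustrative.

import string

def guess_with_feedback(secret, alphabet=string.ascii_lowercase):
    """Guess the password character by character, being told each time
    whether the current character has been guessed correctly."""
    guessed, tries = "", 0
    for position in range(len(secret)):
        for ch in alphabet:
            tries += 1
            if ch == secret[position]:   # "you have guessed it"
                guessed += ch
                break
    return guessed, tries

print(guess_with_feedback("chess"))   # ('chess', ...) in at most 26*5 tries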
How many agents will there be?
In every world there is at least one agent – the protagonist. This is the agent who receives inputs from the world and returns outputs which have an impact on the world. You are the protagonist in your world. You have a model of the world and your model describes many other agents. These agents are other humans as well as animals, deities and even inanimate objects, because we expect some objects to perform certain actions from time to time. For example, your car breaks down. You might assume that the perpetrator of this action is the car itself. Conversely, when your car is in good order and running on the road, you will not assume that it is driving of its own free will, but this is exactly how dogs perceive the world. A dog barks at the car, not at the driver. That is, the dog perceives the car as a separate agent and does not realize that it is the driver inside who runs the car.
You may assume that in your world there is just one agent and that agent is you. As concerns all other people, you may assume that they are just a product of your imagination rather than real beings. Each world can be presented by a single-agent or a multi-agent model; however, the multi-agent model is far more natural and understandable. This is the reason why most of us believe that we are not alone in the world and that there are other people (agents) in it as well.
The particular
world we will look at is the chess game and we will explore two versions of
that world. In the first version there will be a single agent playing against
himself. In the second version there will be two agents. The protagonist will
play with the white pieces and there will be a second agent (antagonist). The
second version is more interesting and sophisticated, but also more difficult
to describe.
Why a particular world?
When we develop a
new programming language, we do not create it in one go but first choose a
particular problem and write a program which solves that problem. Then we pick
another problem and write another program. Thus, we improve the language by
tailoring it to the problems we need to solve.
We will take a similar approach in our quest for a language which describes worlds. We will take a particular world and will try to describe it in some language. Initially, that language will not be universal and capable of describing any world, but it will be a success if the language manages to describe the first world. We will gradually improve that language by applying it to various worlds until we eventually end up with the universal language we need.
The first world we start from is the chess game. More precisely, this will be a world in which the agent plays chess. Will the agent be alone in this world, or will the world include another agent – an antagonist (opponent) who moves the black pieces?
We will start with the simpler version, where the agent (the protagonist) plays solitaire by moving both the white and the black pieces. Once the agent plays a white piece, he will change sides and will play a black piece next. This is how people play when they are quarantined, jailed or placed in some other isolation. Playing against an opponent is far more interesting, but that makes the world much more complicated. In the two-player case the world becomes non-computable, because we will have an agent who plays the black pieces and this agent executes an algorithm.
In the
single-player case the world will be computable except for one rule: we cannot
play a move after which we will be in check. This means we cannot play a move
if it enables the opponent to capture our king in the next move. That is, if
there is an algorithm by which the opponent can capture our king. (More precisely, the
algorithm for moving the pieces is given, but there is a specific execution of
this algorithm in which our opponent captures our king.)
Now I can hear you saying that this is perfectly computable: if I tell you a move and a position, you will tell me right away whether I will be in check after that move. Correct, this is computable with chess, because everything about chess is finite (a finite 8 x 8 board). In the general case, however, the existence of an algorithm is non-computable. (Whether there is a specific execution of an algorithm is non-computable.)
Which particular world?
Let the agent play chess by moving chess pieces on a chessboard. This world is emulated in the computer program [8], written in the language [9].
Figure 1
The left-hand side of Figure 1 shows the stream of input-output information – in fact not the full stream, but only the last 50 steps. The top row shows the agent’s observations and the bottom row the agent’s actions. There are four possible observations: {0, x, y, z}. The possible actions are also four: {0, a, b, c}. For the sake of legibility, the nil character and the character “c” are replaced with a dot and a minus sign. In the second row we can see which actions are permitted at the present moment (2 means that action a is incorrect, 4 means that b is incorrect and 6 means that both a and b are incorrect).
All the agent can see is the left-hand side of Figure 1. The agent cannot see what is in the right-hand side and must figure it out in order to understand the world. In the right-hand side we can see (i) the position on the board, (ii) the piece lifted by the agent (a knight), (iii) the place from which the knight was lifted (the yellow square) and (iv) the currently observed square (the one framed in red).
Partial Observability
We have restricted
the agent’s observability to a single square (the currently observed one)
rather than letting him see the full chessboard. The agent can move his gaze
from one square to another and thus obtain an overview of the full board.
Nevertheless, the important assumption here is that the agent cannot observe
the entire board so we have Partial Observability. This requires the agent to
imagine the part of the world which he cannot see at the moment. All the agent
can see is a single square, therefore the rest of the chessboard will be in his
imagination.
The agent’s imagination must rely on some idea about the current state of the world (i.e. about si). What sort of idea? In the best-case scenario, the agent should know the current state of the world with utmost precision (i.e. should have a deterministic answer to all questions asked). If utmost precision is not achievable, the agent should have some approximate knowledge of the state of the world (deterministic answers to certain questions and non-deterministic answers to all other questions, e.g. several possibilities or “Do not know”). When the agent moves a piece on the board and his virtual opponent makes a countermove, at first that countermove would be unknown to the agent. What the agent can do is obtain an overview of the entire chessboard (by moving the currently observed square) and figure out what the opponent’s last move has been. Thus, from a non-deterministic answer the agent can arrive at a deterministic one.
We use coding
The agent will be able to do 8 things: move his gaze (the currently observed square) in the four directions, lift the piece he sees at the moment, and drop the lifted piece on the square he sees at this moment. The seventh and the eighth things the agent can do are to do nothing.
We will limit the agent’s actions to the four characters {0, a, b, c}. The 0 and ‘c’ symbols will be reserved for the “do nothing” action. This leaves us with 6 actions to describe with as few as 2 characters. How can we do that? We will do it by coding. Let us divide the process into three steps. Every first step will describe how we move the square in the horizontal direction (i.e. how we move our gaze). Every second step will describe how we move the square in the vertical direction, and every third step will indicate whether we lift a piece or drop the lifted piece.
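This decoding could be sketched as follows; the concrete assignment of the characters a and b to directions, and the numbering of the phases, are assumptions made for illustration.

def decode(step, command):
    """The meaning of a command depends on the step number modulo 3."""
    if command in ("0", "c"):
        return "do nothing"                # the two empty actions
    phase = step % 3
    if phase == 0:                         # every first step: horizontal
        return "left" if command == "a" else "right"
    if phase == 1:                         # every second step: vertical
        return "backward" if command == "a" else "forward"
    return "lift piece" if command == "a" else "drop piece"   # third step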
We mentioned in [5] that we should avoid excessive coding, because the world is complicated enough and we do not want to complicate it further. However, our coding here is not excessive, because it replaces eight actions with four and therefore simplifies the world rather than complicating it.
Two empty actions
Why do we introduce two actions that mean “I do nothing”? Actually, when the agent stands and does nothing, he observes the world. The question is, will he be a passive observer or will he observe actively?
When you stand and
observe the world, you are not a passive observer. At the very least, you are
moving your gaze.
All patterns that
the passive observer can see are periodic. In a sense, the periodic patterns
are few and not very interesting. Much more interesting are the patterns that
the active observer can see.
We expect the agent to be able to notice certain patterns (properties). For example, the type and color of the pieces are such properties. When the agent stands on a square and does nothing, it will be difficult for him to catch the pattern (property), especially since he may have to catch two or three patterns at the same time. If the agent is active and can alternate two actions, then the patterns he observes will be much clearer and more quickly detectable.
To distinguish between the two “I do nothing” actions, we called the second one “survey”.
1, 2, 3
The first pattern which will exist in this particular world stems from our division of the steps into three groups. Let us name this pattern “1, 2, 3”. The pattern is modeled in Figure 2.
Figure 2
What is the gist of
this pattern? It counts: one, two, three.
The pattern is presented through an Event-Driven (ED) model. (ED models were first introduced in [3, 4] and elaborated in [6].) This particular ED model has three states. There is only one event in this model, namely the event “always” (i.e. “true” or “at each step”).
This ED model
corresponds to a question with three basic answers. The question is “In which
state is the model now?”. The number of basic answers to this question is equal
to the number of states because the model is deterministic. If the model were
non-deterministic, the basic answers would be as many as the subsets of states
of the model.
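An ED model can be sketched as a directed graph; here is the “1, 2, 3” pattern in that form (the class is illustrative and is not the representation used in [8]).

class EDModel:
    """A directed graph: states, plus transitions labeled by events."""
    def __init__(self, states, transitions, start):
        self.states = states
        self.transitions = transitions     # (state, event) -> next state
        self.current = start

    def step(self, events):
        """Follow an arrow whose event has occurred at this step."""
        for e in events:
            if (self.current, e) in self.transitions:
                self.current = self.transitions[(self.current, e)]
                break

one_two_three = EDModel(
    states=[1, 2, 3],
    transitions={(1, "always"): 2, (2, "always"): 3, (3, "always"): 1},
    start=1,
)

for _ in range(4):
    one_two_three.step({"always"})
print(one_two_three.current)   # 2: the model has counted 1, 2, 3, 1, 2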
Trace
Does anything
specific occur in the states of the above model so that we can notice it and
thereby discover the model? In other words, is there a “trace” (this
terminology was introduced in [6])?
Yes, in the third
state, action a or
action b (or both) must be
incorrect. The reason is that the third state indicates whether we lift the
piece we see or drop a piece which is already lifted. These two actions cannot
be possible concurrently.
We can describe the
world without this trace, but without it the “1, 2, 3” pattern would be far
more difficult to discover. That is why it is helpful to have some trace in
this model.
The trace is the telltale characteristic which makes the model meaningful. Example: cold beer in the refrigerator. Cold beer is what makes the fridge a more special cupboard. If there were cold beer in all cupboards, the refrigerator would not be special in any way and it would not matter which cupboard we are going to open.
The trace enables us to predict what is going to happen. If we open the fridge, we expect to find cold beer inside. Furthermore, the trace helps us recognize which state we are in now and thereby reduce non-determinacy. Suppose we open a white cupboard, but we do not know if it is the fridge or another white cupboard. If there is cold beer inside, then we will know that we have opened the fridge and thus we will reduce non-determinacy.
We will consider
two types of traces – permanent and moving. The permanent trace will be the special features
(phenomena) which occur every time while the moving trace will represent features
which occur from time to time (transiently).
An example in this respect is a house which we describe as an Event-Driven model. The rooms will be the states of that model. A permanent feature of those rooms will be the number of doors. Transient phenomena which appear and then disappear are “sunlit” and “warm”. That is, the permanent trace can tell us which room is actually a hallway between rooms, and the moving trace will indicate which room is warm at the moment.
Rooms can be linked
to various objects. These objects have
properties (the phenomena we see when we observe the relevant object). Objects
can also be permanent or moving and accordingly their properties will be
permanent or transient phenomena. Furniture items (in particular heavyweight
ones) are examples of permanent objects. People and pets are examples of moving
objects. To sum up, a fixed trace will describe what is permanent and a moving
trace will describe what is transient.
We will associate
the permanent trace with a question which always returns the same answer. We
will not deal with these questions since we assume that each question has more
than one possible answer. We will also assume that permanent questions are not
part of the state of the world and are instead inbuilt in the definition of the
g function. Nevertheless, as we
improve the g function and change our
idea of the world, we will assume that some permanent question may evolve into
a real question. For example, “How many doors are there in this room?” is a
permanent one until some enthusiast drills another door.
We will associate
the moving trace with a question such as “Is the cat in the room?”. This question has two answers
which depict the current state of the world. The question “How many cats are
there in the room?” has more answers the number of which depends on the number
of cats we have in the house. If another cat comes by, the model of the world
will change and so will the number of answers to this question.
Horizontal and Vertical
The next Event-Driven
model we need for our description of the world is the Horizontal model (Figure 3).
Figure 3
This model tells us in which column of the chessboard the currently observed square is. Here we have two events: left and right, which reflect the direction in which the agent moves his gaze – to the left or to the right. The agent performs these actions (a and b) when model 1 (the “1, 2, 3” model) is in state 1. We also have two traces. In state 1, playing to the left is not possible; therefore, the left event cannot occur in state 1. Similarly, we have state 8 and the trace that playing to the right is not possible.
These two traces will make the model discoverable. For example, if you are in a
dark room which is 7 paces wide, you will find that after making 7 paces you
cannot continue to the left. You will realize this because you will bump
against the wall. Therefore, the trace in this case will be the bump against
the wall. These bumps will occur only in the first and in the last position.
In addition to helping us discover the model, the trace will do a nice job of explaining the world. How else would you explain that in the leftmost column one cannot play left?
Very similar is the
Vertical model shown on Figure 4.
Figure 4
This model will
tell us the row of the currently observed square. Likewise, we have two events
(forward and backward) and two traces (not
forward and not backward).
It makes perfect
sense to do the Cartesian product of the two models above and obtain a model
with 64 states which represents the chessboard.
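The behavior of the product model could be sketched directly; the traces (left is impossible in column 1, and so on) show up as the guards on the updates, and the event names are the ones used above.

def product_step(column, row, events):
    """One step of the Cartesian product of Horizontal and Vertical:
    the state is the pair (column, row), one of 8 x 8 = 64 squares."""
    if "left" in events and column > 1:       # in column 1 left cannot occur
        column -= 1
    if "right" in events and column < 8:      # in column 8 right cannot occur
        column += 1
    if "backward" in events and row > 1:
        row -= 1
    if "forward" in events and row < 8:
        row += 1
    return column, row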
The bad news is
that our Cartesian product will not have a permanent trace. In other words,
nothing special will happen in any of the squares. Indeed, various things
happen, but they are all transient, not permanent. For example, seeing a white
pawn in the square may be relatively permanent, but not fully permanent,
because the player can move the pawn at some point of time.
Thus we arrive at
the conclusion that the trace may not always be permanent.
The moving trace
As we said, moving
traces are the special features which occur in a given state only from time to
time (transiently), but not permanently.
How can we depict a moving trace? In the case of permanent traces, we indicated on the state whether an event always occurs in that state (using red color, and accordingly blue color for events which never occur in that state).
We will depict the
moving trace by an array with as many cells as are the states in the model
under consideration. In each cell we
will write the moving traces which are in the corresponding state. That is, the
moving trace array will be changing its values.
Here is the moving
trace array of the Cartesian product of models 2 and 3:
Figure 5
This moving trace is very complicated, because it pertains to a model with 64 states. Let us take the moving trace of a model with two states (Figure 6). This is model 4, which remembers whether we have lifted a chess piece. Its moving trace will remember which the lifted piece is. Certainly, the model will also have its permanent trace, which says that in state 2 the event up is impossible and in state 1 the event down is impossible.
The moving trace of that model will be an array with two cells which correspond to the two states of the ED model. The cell which corresponds to the current state is framed in red. The content of the red cell is not very important. What is important is the content of the other cells, because they tell us what will happen if one of those cells becomes the current one. In this case, if we drop the lifted piece we will go to state 1, where we will see the lifted piece. (We will see the piece which we have dropped, in this case a white knight.)
Figure 6
We said that the language for description of worlds will tell us what the memory of the f function looks like. Where is the internal state of the world stored? At two locations – first, the current state of each ED model and, second, the moving traces. For example, in Figure 5 we can see how the moving trace presents the position on the chessboard.
If the language for
description of worlds were a standard programming language, its memory would
hold the values of the variables and of the arrays. By analogy, we can say that
the current state of the ED model is the value of one variable and the value of
one moving trace is the value of one array.
The value of the
current state of an ED model is usually a number when the model is
deterministic or several numbers if the ED model has several current states.
The value of each cell of the moving trace array will consist of several numbers because one state can have many moving traces. Certainly, there can also be more than one permanent trace.
If we present the
current state of the world by means of questions, we will have one question for
the current state of each ED model and one question for each cell of each
moving trace. The number of answers to these questions depends on the number of
states of the ED model and on whether the model is deterministic, as well as on
the number of moving traces that can be present in the corresponding moving
trace cell.
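This bookkeeping can be summarized in a small sketch (illustrative only; the concrete numbers are taken from the chess world described later in this article):

```python
# One question per ED model (its current state) plus one question per
# cell of each moving-trace array.
def count_questions(num_ed_models, moving_trace_arrays):
    return num_ed_models + sum(len(a) for a in moving_trace_arrays)

# Example: 24 ED models and two moving traces with 64 and 2 cells:
print(count_questions(24, [[None] * 64, [None] * 2]))  # -> 90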
Algorithms
Now that we have
described the basic rules of the chess game and the position of the chessboard,
the next step is to describe how the chess pieces move. For this purpose we
will resort to the concept of algorithm.
For most people an
algorithm is a Turing machine. The reason is that they only look at
In our definition, an algorithm can be executed without our participation at all. The Moonlight Sonata, for example, is an algorithm which we can execute by playing it. However, if somebody else plays the Moonlight Sonata, it will still be an algorithm, albeit executed by someone else. When we hear the piece and recognize that it is the Moonlight Sonata, we will have recognized the algorithm even though we do not execute it ourselves.
Who actually
executes the algorithm will not be a very important issue. It makes sense to
have somebody demonstrate the algorithm to us first before we execute it on our
own.
We will examine
three versions of algorithm:
1. Railway track;
2. Mountain footpath;
3. Going home.
In the first
version there will be restrictions which do not allow us to deviate from the
execution of the algorithm. For example, when we board a coach, all we can do
is travel the route. We cannot make detours because someone else is driving the
coach. Similarly, when listening to someone else’s performance of the Moonlight
Sonata, we are unable to change anything because we are not playing it.
In the second version, we are allowed to make detours, but then consequences will occur. A mountain footpath passes near an abyss. If we stray from the footpath, we will fall into the abyss.
In the third version, we can detour from the road. After the detour we can go back to the road or take another road. The Going home algorithm tells us that if we execute it properly, we will end up at our home, but we are in no way bound to execute it, or to execute it in exactly the same way.
Typically, we associate algorithms with determinacy. We picture in our mind a computer program where the next action is perfectly known. However, even computer programs are not single-threaded anymore. With multi-threaded programs it is not very clear what the next action will be. Cooking recipes are an even better example. When making pancakes, we are not told which ingredient to put in first – eggs or milk. In both cases we will be executing the same algorithm.
Imagine an algorithm as a walk in a cave. You can go forward, but you can also turn around and go backward. The gallery has branches and you are free to choose which branch to take. Only when you exit the cave will you have ended the execution of the cave walk algorithm. In other words, we imagine the algorithm as an oriented graph with multiple branches and not as a road without any forks.
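A minimal sketch of this picture (the cave layout and names are invented for illustration): the algorithm is a directed graph, and in each state the walker may freely pick any allowed event.

```python
# The "cave walk" algorithm as a directed graph: several outgoing
# events may be allowed in one state, so execution is not a single road.
cave = {
    "entrance":  {"forward": "fork"},
    "fork":      {"left": "gallery_a", "right": "gallery_b", "backward": "entrance"},
    "gallery_a": {"backward": "fork"},
    "gallery_b": {"backward": "fork", "forward": "exit"},
    "exit":      {},  # leaving the cave ends the execution
}

def allowed_events(state):
    return list(cave[state])   # the walker freely chooses among these

def walk(state, event):
    return cave[state][event]  # follow the chosen branch
```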
The algorithm of chess
pieces
We will use
algorithms to describe the movement of chess pieces. We will choose the Railway
track version (the first one of the versions examined above). This means that
when you lift a piece you will invoke an algorithm which prevents you from
making an incorrect move.
We could have
chosen the Mountain footpath which allows you to detour from the algorithm, but
with consequences. For example, lift the piece and continue with the algorithm,
but if you break it at some point the piece will escape and go back to its
original square.
We could have
chosen the Going home version where you can move as you like but can drop the
piece only at places where the algorithm would put it if it were properly
executed. That is, you have full freedom of movement while the algorithm will
tell you which moves are the correct ones.
We will choose the first version of the algorithm mainly because we let the agent play randomly, and if we do not usher him onto some rail track, he will struggle a lot to make a correct move. Moreover, we should consider how the agent would understand the world. How would he discover these algorithms? If we put him on a rail track, he will learn the algorithm – like it or not – but if we let him loose he would have a hard time trying to guess what the rules for movement (the algorithms) are. For example, if you demonstrate to a school student the algorithm for finding a square root, he will learn to do so relatively easily. But the kid's life would become very difficult if you just explain to him what a square root is and tell him to find the algorithm which computes square roots. You can show the student what a square root is with a definition or examples, but he would grasp it more readily if you demonstrate hands-on how the algorithm works.
What will be the gist of our algorithms? They will be Event-Driven models. There will be some input event which triggers the algorithm and another output event which will put an end to the execution of the algorithm. Later on we will split each output in two (a successful and an unsuccessful output).
Each chess piece
will have its algorithm:
The king and knight
algorithms
The king’s
algorithm (Figure 7) will be the
simplest one. The input event
will be king lifted. The input point
will be state 1 (this applies to all algorithms described here). The events will
be four (left, right, forward and backward).
Figure 7
The trace will consist of four events (not left, not right, etc.). These four events (traces) will restrict the king's movement to nine squares. Thus, the four events (traces) will be the rail tracks which we enter and which will not let us leave the nine squares until we execute the algorithm. In Figure 7, the four traces are marked with red horizontal lines. For example, the three upper states have the first trace, which means one cannot move forward from these three states.
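As a sketch (the geometry and state naming are assumptions based on the description, not the exact content of Figure 7), the king's algorithm can be pictured as nine states on a 3x3 grid with the four traces as walls:

```python
# Nine states (dx, dy) around the square the king was lifted from; the
# input point is (0, 0). The four traces keep us inside the nine squares.
KING_MOVES = {"left": (-1, 0), "right": (1, 0), "forward": (0, 1), "backward": (0, -1)}

def king_step(state, event):
    dx, dy = KING_MOVES[event]
    x, y = state[0] + dx, state[1] + dy
    if abs(x) > 1 or abs(y) > 1:
        # the rail track: e.g. "not forward" in the three upper states
        raise ValueError(f"trace: not {event} in state {state}")
    return (x, y)

# Dropping the king in (0, 0) is a fake move; in any other state it is real.
```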
We may drop the lifted piece (the king) whenever we wish. Certainly, there will be other restrictive rules and algorithms. E.g. the rule that we cannot capture our own pieces is an example of a restriction which is not imposed by this algorithm. If we drop the piece in state 1, the move will not be real but fake. If we drop the piece in another state, then we will have played a real move.
The knight's algorithm is somewhat more complicated (Figure 8). The main difference from the king's algorithm is that here we have one more trace. This trace forbids us to drop the lifted piece in certain states. (Only this trace is marked in Figure 8; the other four traces are not.) In this algorithm we have only two options – play a correct move with the knight or play a fake move by returning the knight to the square from which we lifted it.
Figure 8
The rook and bishop
algorithms
Although it has fewer states, the rook's algorithm is more complex (Figure 9). The reason is that this algorithm is non-deterministic. In state 3, for example, there are two arrows for the forward event. Therefore, two states are candidates to be the next state. This non-determinacy is resolved immediately because in state 1 it must be seen that a piece has been lifted from that square, while in state 3 the opposite must be seen. Therefore, we have a trace which resolves the algorithm's non-determinacy immediately.
Figure 9
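The resolution mechanism can be sketched like this (state numbers and observations are illustrative; the trace is modeled as a required observation per state):

```python
# Non-determinacy resolved immediately by a trace: the event yields two
# candidate states, and the observation rules out all but one.
def resolve(candidates, observation, trace):
    # trace maps a state to the observation required in that state
    return {s for s in candidates if trace.get(s) in (None, observation)}

candidates = {1, 3}  # two arrows for the forward event
trace = {1: "lifted piece seen", 3: "no lifted piece seen"}
print(resolve(candidates, "lifted piece seen", trace))  # -> {1}
```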
Even more complicated is the bishop's algorithm (Figure 10). The reason is that we cannot move the bishop diagonally outright and have to do this in two steps: first a horizontal move and then a vertical move. If the event left occurs in state 1, we cannot know whether our diagonal move is left and forward or left and backward. This is another non-determinacy, which however cannot be resolved immediately. Nevertheless, the non-determinacy will be resolved when a forward or backward event occurs. In the two possible states we have traces which tell us not forward in state 8 and not backward in state 2. If the not forward restriction applied in both states, the forward event would breach the algorithm. But in this case the event is allowed in one of the states and disallowed in the other. So, the forward event is allowed, but if it occurs state 8 will become inactive and the non-determinacy will be resolved. (In Figure 10 we have marked only the traces not forward and not backward.)
Figure 10
The most complicated algorithm is that of the queen because it is a
combination of the rook and bishop algorithms. The pawn’s algorithm is not
complicated, but in fact we have four algorithms: for white/black pawns and for
moved/unmoved pawns.
The term algorithm
Importantly, this article defines the term algorithm as such. Very few people bother to ask what an algorithm is in the first place. The only attempts at a definition I am aware of are those of Moschovakis [1, 2]. In these works Moschovakis says that most authors define algorithms
through some abstract machine and equate algorithms with the programs of that
abstract machine. Moschovakis goes on to explain what kind of an algorithm
definition we need – a generic concept which does not depend on a particular
abstract machine. The computable function is such a concept, but for Moschovakis
it is too general so he seeks to narrow it down to a more specific concept
which reflects the notion that a computable function can be computed by a
variety of substantially different algorithms. This is a tall order, which Moschovakis could not fill in [1]. What he did there can be regarded as a new abstract machine. Indeed, the machine is very interesting and more abstract than most known machines, but again we run into the trap that the machine's program may become needlessly complicated and in this way morph into a new program which implements the same algorithm. Although [1] does not achieve the objective of creating a generic definition of an algorithm, Moschovakis himself admits that his primary objective is to put the question on the table even if he may not be able to answer it. His exact words are: “my chief goal is to convince the reader that the problem of founding the theory of algorithms is important, and that it is ripe for solution.”
The Turing machine
So far we have described the chess pieces' algorithms as Event-Driven (ED) models. Should we assert that each algorithm can be presented as an ED model? Can the Turing machine be presented in this way?
We will describe a world which represents the Turing machine. The first thing we need to describe in this world is the infinite tape. In the chess game, we described the chessboard as the moving trace of an ED model with 64 states. Here we will also use a moving trace; however, we will need a model with countably many states. Let us take the model in Figure 3. This is a model of a tape comprised of eight cells. We need the same model, which again has two events (left and right), but is not limited by a leftmost and a rightmost state. This means an ED model with infinitely many states. So far we have only used models with finitely many states. Now we will have to add some infinite ED models which nevertheless have structures as simple as this one. In this case the model is merely a counter which keeps an integer number (i.e. an element of ℤ).
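A minimal sketch of this counter model (the representation is an assumption; the point is only that the state set is all of ℤ):

```python
# An infinite ED model with two events and no leftmost/rightmost wall:
# merely a counter holding the cell currently under the machine's head.
class Counter:
    def __init__(self):
        self.state = 0                     # an element of Z
    def step(self, event):
        self.state += -1 if event == "left" else 1
```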
What kind of memory will this world have? We must memorize the counter value (which is the cell on which the head of the machine is placed). This is an integer number. Thus, we will have a question with a countable number of answers. Besides this, we will need to memorize what is stored on the tape. For this purpose we will need an infinite sequence of 0s and 1s; the set of such sequences is equinumerous with the continuum. We will express this by countably many questions, each of which has two answers. We usually use Turing machines in order to compute computable functions.
Note: The agent's idea of the state of the world will be countable even though the memory of the world is a continuum. In other words, the agent cannot figure out all possible configurations on the tape, but only a countable subset of these configurations. In this statement we imagine the agent as an abstract machine with an infinite memory. If we imagine the agent as a real computer with a finite memory, then in the above statement we must replace countable with finite. Anyway, if the agent is a program for a real computer, the finite memory would be enormous, so for the sake of simplicity we will think of it as countable.
Thus, we have
described the tape of the Turing machine with an infinite ED model. In order to
describe the head of the machine (the algorithm per se) we will need another ED model. We will employ the Turing
machine in order to construct the second ED model.
We assumed that the
machine uses two characters {0, 1}. Let us construct an ED model with four events:
write(0),
write(1),
move left,
move right.
Then each command
to the machine will be in the following format:
if observe(0) then write Symbol_0, move Direction_0, goto Command_0
if observe(1) then write Symbol_1, move Direction_1, goto Command_1
Here Symbol_i, Direction_i and Command_i have been replaced
with concrete values. For
example:
if observe(0) then write(1), move left, goto s3
if observe(1) then write(0), move right, goto s7
We will replace
each command with four states which describe it. The above command will take
the following form:
Figure 11
In Figure 11, the input is over the move right event. In fact there will be many input paths – sometimes over the move left event and at other times over the move right event. Importantly, the input will be non-deterministic, but the non-determinacy will be resolved immediately because the first two states have a trace. In the top state the event “observe(0)” must always occur and in the bottom state the “observe(0)” event must never occur.
Thus, each state of
the machine is replaced with four states as shown in Figure 11 and then the individual
quaternaries are interconnected. For example, the quaternary in Figure 11
connects to the quaternary in s3
by arrows over the move left event
and to the quaternary in s7 by
arrows over the move right event.
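Figure 11 itself is not reproduced here, so the following sketch shows only the command semantics that each quaternary encodes (observe, write, move, go to the next command); the representation is an assumption:

```python
# One Turing-machine step, driven by a command table of the form
# program[command][observed_symbol] = (write, direction, next_command).
def tm_step(tape, head, command, program):
    observed = tape.get(head, 0)           # blank cells read as 0
    symbol, direction, nxt = program[command][observed]
    tape[head] = symbol
    head += -1 if direction == "left" else 1
    return tape, head, nxt

program = {"s1": {0: (1, "left", "s3"), 1: (0, "right", "s7")}}
print(tm_step({}, 0, "s1", program))  # -> ({0: 1}, -1, 's3')
```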
We have to add more traces to accommodate the rule that only one of the four events is possible in each state. The new traces should tell us that the other three events are impossible. We should do this if we want a Railway track type of algorithm. If we prefer a Mountain footpath algorithm, we must add a trace which tells us which consequences will occur if one of the other three events happens. If we wish a Going home algorithm, then the other three events must lead to a termination of the algorithm.
Thus we presented
the Turing machine through an Event-Driven model or, more precisely, through
two ED models – the first one with infinitely many states and the second one
with a finite number of states (the number of machine states multiplied by 4).
Who executes the
algorithm of the Turing machine? We may assume the four events are actions of
the agent and that he is the one who runs the algorithm. We may also assume
that the events are acts of another agent or that the events just happen. In
that case the agent will not be the executor of the algorithm, but just an
observer. In the general case, some events in the algorithm will be driven by
the agent and all the rest will not. For example, “I pour water in the pot” is an action of the agent, while “The water boils” is not his action. The agent can influence even those events which are not driven by his actions. This is described in [7]. For these events the agent may have some “preference”, and through his “preferences” the agent could have some influence on whether an event will or will not occur.
Properties
Having defined the
term algorithm, we will try to define
another fundamental term: property.
For the definition of this term we will again resort to Event-Driven models. A property is the phenomenon we see when we observe an object which possesses that property. Properties are patterns which are not observed all the time but only from time to time. Given that the other patterns are presented through ED models, it makes perfect sense to present properties through ED models, too.
The difference
between a pattern and a property will be that the pattern will be active on a
permanent basis (will be observed all the time) while the property will be observed from time to time
(when we observe the object which possesses that property).
The basic term will be property, while object will be an abstraction of higher order. For example, if in the chess world one observes the properties white and knight, he may conclude that there is a white knight object which is observed and which possesses these two properties. We may also dispense with this abstraction and simply imagine that some properties come and go, i.e. some phenomena appear and disappear.
The second coding
The agent's output consists of only four characters, which made us use coding in order to describe the eight possible actions of the agent. The input is also limited to four characters. While it is true that the input will depict to us only one square rather than the full chessboard, four characters are still too few because a square can accommodate six different pieces in two distinct colors. Furthermore, we need to know whether the pawn on the square has moved and whether the lifted piece comes from that square. How can one present all that amount of information with only four characters?
That information need not come to the agent in a single step. The agent can spend some time staying on the square and observing the input. As the agent observes the square, he may spot various patterns. The presence or absence of each of these patterns will be the information which the agent receives about the square he observes. Although the input characters are only four, the patterns that can be described with four characters are countless.
Let us call these
patterns properties and assume that
the agent is able to identify (capture) these patterns. We will further assume
that the agent can capture two or more patterns even if they are layered on top
of each other. Thus, the agent should be able to capture the properties white and knight even when these properties appear at the same time.
What would the properties look like? In the case of chess pieces, the patterns and algorithms of their movement are written by a human who has an idea of the chess rules and of how the pieces move. The properties are not written by a human but are generated automatically. As an example, Figure 12 depicts the property pawn. That property appears rather bizarre and illogical. The reason is, as we said, that the property is generated automatically in a random way. It is not written by us because we do not know what the pawn should look like. What the pawn looks like does not matter. What matters is that the pawn should have a certain appearance such that it can be recognized by the agent. In other words, the pawn should have some face, but what that face looks like is irrelevant.
Figure 12
In our program [8] there are 10 properties and each of them has some trace. When several properties are active at the same time, each of them has an impact (via its trace) on the agent's input. Sometimes these impacts may be contradictory. For example, one property tells us that the next input must be the character x, while another property insists on the opposite (the input must not be x). The issue at hand is then solved by voting. The world counts the votes for each decision and selects the one which reaps the highest number of votes. Contradictory recommendations cancel each other out.
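The voting rule can be sketched as follows (a simplification; the exact rule used in [8] may differ):

```python
from collections import Counter

# Each active property votes via its trace: +1 for "the next input must
# be this character", -1 for "it must not be". Contradictory votes cancel.
def decide(votes):
    tally = Counter()
    for char, vote in votes:
        tally[char] += vote
    best = max(tally.values(), default=0)
    return [c for c, t in tally.items() if t == best and t > 0]

print(decide([("x", +1), ("x", -1), ("y", +1)]))  # -> ['y']
```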
A deterministic world
So far we have described a chess world where the agent plays solitaire against himself. The description consists of 24 modules (Event-Driven models), two arrays (moving traces) and seven additional rules. These rules provide us with additional information about how the world has changed. For example, when we lift a piece, the property Lifted will appear in the square of that piece. We are able to formulate these rules owing to the fact that we have the context of the chessboard (the moving trace from Figure 5). If we had no idea that a chessboard exists, we would not be able to formulate rules for the behavior of the pieces on that board. We wrote a computer program [8] which emulates this world. In that demonstration program the agent plays randomly. Of course, the agent's actions are not important. What matters is that we have described the g function. Therefore, we have described the world.
The description thus obtained is deterministic, i.e. the initial answer
is deterministic and the g function
is also deterministic. (Meaning that for each deterministic answer and for each
action there will be one observation, for which the g function will return a deterministic answer and a probability of
1. For all other observations the g
function will return a probability of 0.)
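In other words (a toy sketch with an assumed transition table, not the actual program [8]):

```python
# A deterministic g: for each (answer, action) exactly one observation
# gets probability 1; every other observation gets probability 0.
TRANSITION = {("answer_a", "action_1"): ("answer_b", "obs_1")}

def g(answer, action, observation):
    next_answer, predicted_obs = TRANSITION[(answer, action)]
    if observation == predicted_obs:
        return next_answer, 1.0
    return None, 0.0

print(g("answer_a", "action_1", "obs_1"))  # -> ('answer_b', 1.0)
print(g("answer_a", "action_1", "obs_2"))  # -> (None, 0.0)
```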
A deterministic description means that the described world is free from
randomness. Should the description of the world be deterministic? Should we
deal only with deterministic descriptions? The idea that the world may be
deterministic seems outlandish. And even if it were, we need not constrain
ourselves to deterministic descriptions.
If we apply a deterministic description to a non-deterministic world, that description will very soon exhibit its imperfections. Conversely, the world may be deterministic but its determinacy may be too complicated and therefore beyond our understanding (rendering us unable to describe it). Accordingly, instead of a deterministic description of the world we will find a non-deterministic description which works sufficiently well.
Typically, the world is non-deterministic. When we shoot at a target, we may miss it. This means that our actions may not necessarily yield a result, or may yield different results at different times.
We will assume that the f function
may be non-deterministic. For most authors, non-deterministic implies that for
each possible value of the f function
there is one precisely defined probability. In [6] and [7] we showed that the
latter statement is too deterministic. Telling the exact probability of
occurrence for each and every event would be an exaggerated requirement.
Accordingly, we will assume that we do not know the exact probability, but only an interval [a, b] in which this probability resides. Typically that would be the interval [0, 1], which means that we are in total darkness as regards the probability of the event occurring.
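A small sketch of such interval-valued probabilities (the representation is assumed; the idea follows [6] and [7]):

```python
from dataclasses import dataclass

# An event's probability is only known to lie in [a, b];
# [0, 1] expresses total ignorance.
@dataclass
class ProbInterval:
    a: float = 0.0
    b: float = 1.0
    def exact(self):
        return self.a == self.b

hit_target = ProbInterval(0.2, 0.7)  # we can only bound the chance of a hit
unknown = ProbInterval()             # total darkness: [0, 1]
```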
We will sophisticate the chess game by adding another agent: the opponent (antagonist) who will play the black pieces. This will induce non-determinacy because we will not be able to tell what move the opponent will play. Even if the opponent is deterministic, his determinacy would be too complicated for us to describe. That is why we will describe the world through a non-deterministic function g.
Random events
One way of introducing non-determinacy in Event-Driven models is through random events. If all events in the ED model are deterministic, the ED model itself behaves deterministically (even if the model is non-deterministic, the set of its active states is determined). To get the ED model to behave non-deterministically, we need to add random events to it. All we know about random events is that they occur with a certain probability, so we do not know exactly when they occur. (The probability is a number or an interval.)
While random events are not used in either of the two worlds examined in this article, we mention them here because they will be needed in other worlds. When seeking a model of the world, we may well assume that we have discovered some event which switches the states of some ED model. Suppose we have discovered that event indirectly (as described in [7]), meaning that we can observe the event but do not know its characteristic function. Then it makes sense to assume that it is a random event which occurs with a certain probability. Later on we can find out exactly when the event occurs, whereupon the event will become deterministic (i.e. later on we can describe it by a characteristic function).
Impossible events
We said the world would be more interesting if we do not play against ourselves but against another agent who moves the black pieces.
For this purpose we will modify the fifth ED model (the one which tells us which pieces we are playing with – white or black). This model has two states which are switched by the event change. The event is defined as “real_move” (this event occurs when we really move a piece, while a “fake_move” is when we only touch a piece). We will change the definition of that event and define it as never (this is the opposite of every time).
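In code, redefining the event amounts to swapping its characteristic function (a sketch; the representation is assumed):

```python
# The event "change" before and after the redefinition.
def change_as_real_move(step_info):
    return step_info == "real_move"   # fires whenever we really move a piece

def change_as_never(step_info):
    return False                      # "never" -- an impossible event
```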
Does it make sense to describe in our model events which can never happen? The answer is yes, because these events may happen in our imagination. I.e. even though these events do not occur, we need them in order to understand the world. For example, we are unable to fly or change our gender, but we can do this in our imagination. (The example is not very appropriate, because nowadays we can already fly and change our gender if we wish to.) In other words, we can imagine impossible events in our minds. Also, at some point, these impossible events may become possible.
We will use the impossible event change in order to add the rule that we cannot play a move after which we would be in check (i.e. after which our opponent would be able to capture our king). Figure 13 depicts the algorithm which describes how we switch sides (turn the chessboard around) and capture the king. If this algorithm is possible, then the move is incorrect.
Figure 13
This algorithm is more consistent with our understanding of algorithms.
While the algorithms of the chess pieces were oriented graphs with multiple
branches, this one is a path without any branches. In other words, this
algorithm is simply a sequence of actions without any diversions.
This algorithm needs some more restrictions (traces) which are not shown in Figure 13. In state 1, for example, we cannot move in any of the four directions (otherwise we could go to another square and play another move). The change event can only occur in state 2 and not in any other state. In state 4 we have the restriction “not observe(King) => not down”, which means that the only move we can make is to capture the king.
Part of this algorithm is the impossible action change. As mentioned above, although this action is not possible, we can perform it in our imagination. This event may be part of the definition of algorithms which will never be executed but are still important because we need to know whether an execution of them exists.
Note: In this article
when we say that an algorithm can be executed we mean that it can be executed
successfully. This means that the execution may finish in a final (accepting)
state or with an output event (successful exit).
The second agent
The algorithm in Figure 13 would become simpler if we allow the existence of a second agent. Instead of the agent switching the color of the pieces (turning the chessboard around), we will have someone else always play the black pieces. Thus, we will end up with an algorithm performed by more than one agent, which is fine because such algorithms are natural. For example, “I gave some money to someone and he bought something with my money” is an example of an algorithm executed by two agents.
The important aspect here is that once we move a white piece, we will have somebody else (another agent) move a black piece. While in the solitaire version of the game we wanted to know whether a certain algorithm is possible, in the two-player version we want a certain algorithm to actually be executed. Knowing that a certain algorithm is possible is not the same as having that algorithm actually executed. Knowing that “someone can cook pancakes” is okay, but “your roommate cooking pancakes this morning” is something different. In the first case you will know something about the world, while in the latter case you will have some pancakes for breakfast. If the actual execution of an algorithm is in the hands of an agent, then it does matter who the executing agent is. E.g. we will suppose that the pancakes coming from your roommate's hands will be better than those cooked by you.
We will assume that after each “real_move” we play, the black-pieces
agent will execute the algorithm in Figure 14.
Figure 14
The execution of an algorithm does not happen outright because it is a
multi-step process. Nevertheless, we will assume here that the opponent will
play the black pieces right away (in one step). When people expect someone to
do something, they tend to imagine the final result and ignore the fact that the
process takes some time. Imagine that “Today is my birthday and my roommate will
cook pancakes for me”. In this reflection you take the pancakes for granted and
do not bother that cooking the pancakes takes some time.
As mentioned already, it matters a lot who the black-pieces agent is. Highly important is whether the agent is friend or foe (will he assist us or try to disrupt us). The agent's smartness is also important (because he may intend certain things but may not be smart enough to do them). It is also important to know what the agent can see. In the chess game we assume that the agent can see everything (the whole chessboard), but in other worlds the agent may only be able to know and see some part of the information. The agent's location can also be important. Here we will assume that it is not important, i.e. wherever the agent is, he may move to any square and lift the piece in that square. Under another assumption the agent's position may matter, because the pieces that are nearer the agent may be more likely to be moved than the more distant pieces.
Agent-specific state
We assumed here that the second agent sees a distinct state of the world. I.e. the second agent has his own distinct position <x, y> on the chessboard (the square which he observes). We also assume that the second agent plays with the black pieces, as opposed to the protagonist playing with the white pieces.
We assume that the two agents change the world through the same g function but the memory of that
function is specific to each agent. (The function’s memory describes the state
of the world through answers to the questions asked.) We might assume that the
two states of the world have nothing in common, but then the antagonist’s
actions will not have any impact on the protagonist’s world. Therefore, we will
assume that the chessboard position is the same for both agents (i.e. the trace
in Figure 5 is the same for both). We will further assume that each agent has
his own coordinates and his own piece color (i.e. the active states of
Event-Driven models 2, 3 and 5 are different for the two agents). As concerns
the other ED models and the trace in Figure 6, we will assume that they are
also specific to each agent, although nothing prevents us from making the
opposite assumption.
Had we assumed that the two agents share the same state of the world,
the algorithm in Figure 14 would become heavily complicated. The antagonist
would first turn the chessboard around (change),
then play his move and then turn the chessboard around again in order to leave
the protagonist’s world unchanged. Moreover, the antagonist would need to go
back to his starting coordinates <x, y> (these are the protagonist’s coordinates). It would be bizarre to think
that the separate agents are absolutely identical and share the same location.
The natural way of thinking is that the agents are distinct and have distinct,
but partially overlapping states of the world. For example, “Right now I am cooking pancakes and my roommate is cooking pancakes, too”. We may be cooking the same pancakes or it may be that my pancakes have nothing to do with his.
Note: It is not very accurate to say that the world has distinct states for the agents. The world is one and it has only one state. It would be more accurate to say that we have changed the world and now we have a world with more questions. The questions that are common to the two agents have remained the same, but the other questions are now bifurcated. For example, “Where am I?” is replaced with the questions “Where is the protagonist?” and “Where is the antagonist?”. From the g function we derive the new function g', which operates with the answers to the new questions and describes (i) the world through the two agents and (ii) how the agents change their states through the g function. Nevertheless, for the sake of convenience we may assume that the world has different states for the two agents and that these agents change their states through the g function, which operates only with the answers to the questions that apply to one of the agents.
The second world
So far we have described the first world in which the agent plays
solitaire against himself and have written the program [8] which emulates the
first world. The program [8] is the g
function which describes the first world. We have also described a second world
in which the agent plays against some opponent (antagonist). Now, can we also
create a program which emulates the second world?
In the second world we added the statement “This algorithm can be executed”. (This statement should have been added in the first world as well, because playing moves after which we are in check is not allowed in the first world either. For the time being the program [8] allows us to make this kind of moves.) In the second world we also added the operation “Opponent executes algorithm”. In the general case, that statement and that operation are unresolvable (more precisely, they are semi-resolvable).
For example, let us take the statement “This algorithm can be executed”. In this particular case the question is whether the opponent can capture our king, which is fully resolvable because the chessboard is finite, has finitely many positions, and all algorithms operating over the chessboard are resolvable. In the general case the algorithm may be a Turing machine and then the statement will be equivalent to the halting problem.
The same can be said of the operation “Opponent executes algorithm”. While
the algorithm can be executed by many different methods, the problem of finding
at least one of these methods is semi-resolvable. In the particular case of the
chess game we can easily find one method of executing the algorithm, or even
all methods (i.e. all possible moves). In the general case, however, the
problem is semi-resolvable.
Therefore, in this particular case we can write a program which emulates the second world, provided however that we select the opponent's behavior, because it can go down many different paths. In other words, in order to create a program which emulates the chess world, we should embed in it a program which emulates a chess player.
In the general case however, we will not be able to write a program
which emulates the world we have described. Thus, the language for description of worlds is
already capable of describing worlds that cannot be emulated by a computer
program. We said in the very beginning that the g function may turn out to be non-computable. Writing a program which
computes non-computable functions is certainly impossible.
However, being unable to write a program that emulates the world we have
described is not a big issue, because we are not aiming to emulate the world,
but write an AI program which acts on its understanding of the world (the
description of the world which it has found) in order to successfully plan its
future moves. Certainly, the AI program can proceed with one emulation of the
world, play out some of its possible future developments and select the best
development. (Essentially this is how the Min-Max algorithm of chess programs
works.) I.e. making an emulation of the world would be a welcome though not
mandatory achievement.
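For reference, a minimal depth-limited Min-Max sketch (the helpers moves, apply and evaluate are hypothetical parameters, not functions from [8]):

```python
# Plays out possible developments a few moves deep and picks the best one;
# a partial emulation of the world rather than a complete one.
def minimax(position, depth, maximizing, moves, apply, evaluate):
    if depth == 0 or not moves(position):
        return evaluate(position)
    results = (minimax(apply(position, m), depth - 1, not maximizing,
                       moves, apply, evaluate) for m in moves(position))
    return max(results) if maximizing else min(results)

# Toy usage: positions are numbers, a move adds 1 or 2, value = the number.
best = minimax(0, 3, True,
               moves=lambda p: [1, 2] if p < 5 else [],
               apply=lambda p, m: p + m,
               evaluate=lambda p: p)
print(best)  # -> 5
```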
Besides being unable to produce a complete emulation of the world (when
the g function is non-computable), AI
would be unable to even figure out the current state of the world (when the
possible states are continuum many). Nevertheless, AI will be able to produce a
partial emulation and figure out the state of the world to some extent. For
example, if there is an infinite tape in the world and this tape carries an
infinite amount of information, AI will not be able to discern the current
state of the world, but would be capable of describing some finite section of
the tape and the information on that finite section. In other words, AI cannot
find the answers to infinitely many questions, but can live with the answers to
a subset of the questions.
Even the Min-Max algorithm is not a complete emulation, due to combinatorial explosion. Instead, Min-Max produces a partial emulation by traversing only the first few moves.
If the description of the world contains a semi-resolvable rule, AI will use that rule only in one direction. An example is the rule “A statement is true if there is a proof for that statement”. People use that rule if (i) a proof exists and (ii) they have found that proof. If a proof does not exist, the rule is not used, because we cannot ascertain that there is no proof at all.
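Using the rule in one direction only can be sketched like this (the checker and the proof enumeration are hypothetical parameters):

```python
# Semi-resolvable: finding a proof establishes the statement; failing to
# find one (within a bound) establishes nothing.
def try_prove(statement, candidate_proofs, checks, max_attempts=1000):
    for i, proof in enumerate(candidate_proofs):
        if i >= max_attempts:
            return None                # gave up: no conclusion either way
        if checks(statement, proof):
            return True                # proof found: the statement is true
    return None                        # enumeration ended: still no conclusion
```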
Conclusion
We have reduced the task of creating AI to a purely logical problem. Now we have to create a language for description of worlds, which will be a logical language because it will enable the description of non-computable functions. If a language enables only the description of computable functions, it is a programming language and not a logical one.
The main building
blocks of our new language will be Event-Driven models. These will be the simple
modules which we are going to discover one by one. With these modules we will
present patterns, algorithms and phenomena.
Then we will
proceed with the next abstraction and introduce objects. We will not observe
the objects directly and instead will detect them by observing their
properties. A property is a special phenomenon which transpires when we observe
an object which possesses that property. Thus the property is also presented
through an ED model.
Our next abstraction will be the agent. Similar to objects, we will not be able to detect agents outright but will gauge them indirectly by observing their actions. The detection of agents is a difficult task. People manage to detect agents; however, they search for them everywhere. Whenever something happens, people quickly jump to the explanation that some agent has done it. In people's eyes, behind every event there lurks a perpetrator, which can be another human, an animal, or some deity. Very seldom would they accept that the event has occurred of its own accord. AI should behave as people do and look for agents everywhere.
Once AI detects an
agent it should proceed to investigate the agent and try to connect to it. To
detect an agent means to conjure up an agent. When AI conjures up an existing
agent, then we can say that AI has detected the agent. When AI conjures up a
non-existing agent, the best we can say is that AI has conjured up a
non-existing thing. It does not really matter whether agents are real or
fictitious as long as the description of the world obtained through these
agents is adequate and yields appropriate results.
AI will investigate agents and classify them as friends or foes. It will label them as smart or stupid and as grateful or revengeful. AI will try to connect to agents. To this end, AI must first find out what each agent is aiming at and offer that thing to the agent in exchange for some benefit for itself. This exchange of benefits is the implementation of a coalition strategy. Typically it is assumed that agents meet somewhere outside the world and there they negotiate their coalition strategy. But, because there is no such place outside the world, we will assume that agents communicate within the walls of the world. The principle of their communication is: “I will do something good for you and expect you to do something good for me in return”. The other principle is: “I will behave predictably and expect you to find out what my behavior is and start implementing a coalition strategy (engage in behavior which is beneficial to both of us)”.
This is how we communicate with dogs. We give a bone to a dog and right away we make friends. What do we get in return? The dog will not bite us or bark at us, which is a fair deal. As time goes by, the communication may become more sophisticated. We may show an algorithm to the agent and ask him to replicate it. We can teach the dog to “shake hands” with a paw. Further on, we can get to linguistic communication by associating objects with phenomena. For example, a spoken word can be a phenomenon, and if this phenomenon is associated with a certain object or algorithm, the agent will execute the algorithm as soon as he hears the word. E.g. the dog will come to us as soon as it hears its name, or bring our slippers when it hears us say “slippers”.
You see that through its simple constituent modules, the language for description of worlds can describe quite complicated worlds with multiple agents and complex relationships among the agents. The superstructure we build on these modules cannot hover in thin air and should rest on some steady foundation. Event-Driven models are exactly that foundation of the language for description of worlds and the base on which we will develop all abstractions of higher order.
References
[1] Yiannis N. Moschovakis (2001). What is an algorithm? In: Mathematics Unlimited – 2001 and Beyond (B. Engquist and W. Schmid, eds.), Springer, pp. 919–936.
[2] Yiannis N. Moschovakis (2018). Abstract Recursion and Intrinsic Complexity. Cambridge University Press, Lecture Notes in Logic, Volume 48. ISBN 9781108415583.
[3] Xi-Ren Cao (2005). Basic Ideas for Event-Based Optimization of Markov Systems. Discrete Event Dynamic Systems: Theory and Applications, 15, pp. 169–197.
[4] Xi-Ren Cao, Junyu Zhang (2008). Event-Based Optimization of Markov Systems. IEEE Transactions on Automatic Control, Vol. 53, No. 4, May 2008.
[5] Dimiter Dobrev (2013). Giving the AI definition a form suitable for the engineer. arXiv:1312.5713 [cs.AI].
[6] Dimiter Dobrev (2018). Event-Driven Models. International Journal "Information Models and Analyses", Volume 8, Number 1, 2019, pp. 23–58.
[7] Dimiter Dobrev (2019). Before we can find a model, we must forget about perfection. arXiv:1912.04964 [cs.AI].
[8] Dimiter Dobrev (2020). AI Unravels Chess. http://dobrev.com/software/AI_unravels_chess.zip.
[9] Dimiter Dobrev (2020). Strawberry Prolog, version 5.1. http://dobrev.com/.