How do we describe the invisible? Let’s take a sequence: input, output,
input, output ... Behind this sequence stands a world and the sequence of its
internal states. We do not see the internal state of the world, but only a part
of it. To describe that part which is invisible, we will use the concept of ‘incorrect move’
and its generalization ‘testable state’. Thus, we will reduce the problem of partial observability to
the problem of full
observability.
Keywords: Artificial
Intelligence, Machine Learning, Reinforcement Learning, Partial Observability,
Diagnosability.
Introduction:
Our first task in the field of Reinforcement learning is to describe the
current state of the world (or the current position of the game). When we see
everything (full observability) this question does not arise because in this
case the input fully describes the state of the world. More interesting is the
case when we see only a part of the world (partial observability). Then we will
add more coordinates to the input vector, and thus we get a new vector which already
fully describes the state of the world.
To add these new coordinates we will first introduce the concept of ‘incorrect
move’. The idea that not all moves are correct is not new. Other authors
suggest, for example, that you cannot spend more money than you have. That is,
they assume that the output is limited by a parameter that changes over time.
What’s new in this article is that we will use this parameter to give further
description of the state of the world.
If at a specific moment we know what we see and what moves are correct
at that specific moment, we know a lot about the current state of the world,
yet we do not know everything. When we generalize the concept of ‘incorrect
move’, we will derive the new concept of ‘testable state’. If we add to the
input vector the values of all testable states, we will get an
infinite-dimensional vector that fully describes the state of the world. That
is, for every two distinct states there will be a testable state whose value in
these two states is different.
Note: The closest to the term ‘testable
state’ is the term ‘diagnosability’ (see [1, 2]). However, there are three
significant differences between these two concepts:
1. Diagnosability corresponds to a semi-decidable testable state. This is
so because with the testable state we have some experiment that, when it can be
performed, will give us the value of a parameter. With ‘diagnosability’,
whenever the experiment can be performed the parameter is necessarily true.
When the parameter is not true, the experiment cannot be performed.
2. With ‘diagnosability’, we look at a parameter that changes its value
only once (first, the failure event has not happened, and then it has already
happened, that is, there is a single change from 0 to 1). With the testable
state, we look at a parameter that can change its value many times, and so it
is very important to us when exactly the parameter has changed its value,
whereas in [1, 2] this question is not significant.
3. With ‘diagnosability’, we start with the parameter and look for
experiments to diagnose this parameter. With the testable state it is the
opposite. Testable state is a specific experiment itself. Later, a link is
sought between the different testable states. For example, it can be determined
that two testable states define the same parameter.
Incorrect moves and testable states are something that actually exists
just like the input we get on each step. However, unlike the input, the value
of testable states is not ready to derive. For example, is the door locked? To
check this we need to push the handle and see whether the door will open, but
we can do it only if we stand by the door. In other words, the check requires
extra effort and it is not always possible (there are moments in which we can
check and moments in which we can’t). The locked door can be considered as an
example of both an incorrect move and of a testable state.
To describe the incorrect moves and testable states we will search for a
theory that gives us this description. Of course, we may have many theories for
a given testable state all of which will compete with each other over time.
What will the theory constitute? Statistics shows us that in specific
situations a given testable state is true. Specific situations means a
situation in which a conjunction is true. This conjunction may be associated only
with the past, but may also be associated with the past and the future. For
example, let’s say we’ve checked and we’ve seen that a door is locked, but is
it the door that we are interested in? We may decide that it is precisely this
door on the basis of what we have seen before checking, or maybe after
checking, a posteriori, we’ve seen something which tells us that it is exactly
this door.
Another generalization of ‘specific situations’ will be to allow
dependencies with memory (except dependencies without memory). A dependency
without memory is the situation in which specific events occur in the course of
several consecutive steps (i.e. this is an ordinary conjunction). A dependency
with memory is the situation in which specific events occur at specific moments
(not consecutive), and in the periods between those moments specific events do
not occur.
The theory may be a set of such implications and this set can be
generalized as a finite-state machine. Let’s take an example where we are
moving in a castle in which some of the doors are locked. We can have a rule of
the following type: “If you see a white sofa and you turn right, then the door
will be locked.” If we represent the map of the castle as a finite-state
machine, we will see that if after the white sofa we turn right, we are at the
door X and that this door is locked.
If we know the finite-state machine, we can easily derive the corresponding rules.
Unfortunately, we need the opposite – to derive a finite-state machine from the
rules, and that’s a much more difficult task.
Let us now imagine a castle the doors of which are not permanently
locked or unlocked but change following specific rules. Then our theory will suggest
some sustainability of testable states. For example, if a door has been locked
at a specific moment and shortly thereafter we check again, we assume that it
will be locked again, especially if during this time no specific events have
occurred (for example, to hear a click of door locks).
The next upgrade of the theory would be to assume that there is some
kind of creature inside the castle, which unlocks and locks the doors in its
sole discretion. Then we can predict whether a door is locked or unlocked
predicting the behavior of that creature.
Once we’ve created one or several theories that predict a testable
state, we will use these theories and gather statistics that will help us
predict the future and to create new theories of other testable states. For
example, in our case, if we’ve noticed that behind the locked door there is a
princess, and a tiger behind the unlocked door, then based on the theory that
tells us which door is locked, we can make a theory that tells us where the
princesses are.
The article begins with a few definitions, then follows a specific example,
the concept of ‘incorrect move’, and we say what the dependencies with and
without memory are. We define the testable state and show that the incorrect
move is its special case. We will say what an axiom is and how the axioms make
theories. It remains to be said how from axioms we will make probable theories
and how we will make more sophisticated theories based on a finite-state
machines or group of creatures (agents) living in the same world and changing
the testable states at their own discretion.
Definitions
Let’s take a sequence of output,
input, output, input, …, and the goal is to understand this sequence.
Of course, this sequence is not accidental. We can assume that we are
playing a game and that’s the sequence:
move, observation, move,
observation ...
And our goal is to understand the rules of the game and what is the
current position on the game board.
We might assume that we are in a world and then the sequence would be:
action, view, action, view ...
In this case, our goal is to understand the rules of this world and what
is the current state of the world.
The description of the world is given by the functions World and View, and the following applies:
si+1=World(si
, ai+1)
vi=View(si)
Here, actions and
observations (ai and vi) are vectors from scalars with dimensions n and m, respectively.
Our goal is to
present the internal state (si) also as a vector of scalars, in which the
first m coordinates will be exactly vi. In
other words, the function View will
be the projective function that returns the first m coordinates of si.
We will call ‘states’ to all coordinates of si. We will call the first m ones ‘visible states’. Other coordinates will be called ‘testable states’.
The coordinates of
the input and output will be called ‘signals’. These are functions of time that return a
scalar. To the input and output signals we will add other signals as well, like
the testable states; more precisely – the value of the testable state according
to the relevant theory, because we will not know the exact value of these
states, and will approximate them with some theories. For each finite-state
machine we will add a signal whose value will be equal to the state in which
the finite-state machine is situated at the moment t. Of course, if the
machine is nondeterministic, the value of this signal may not be exact but
approximate.
We will call the
Boolean signals ‘events’. When the Boolean signal is truth, we will say that the event is happening.
Example
To make things
clear, let’s take an example. Let’s imagine we are playing chess with an
imaginary opponent without seeing the board (we are playing blind). We will not
specify whether the imaginary opponent is human or a computer program.
Our move will be
the following 4-dimensional vector:
X1, Y1, X2, Y2
The meaning is
that we are moving the piece from the (X1, Y1) coordinates to the (X2, Y2) coordinates.
What we will see
after each move is a 5-dimensional vector:
A1, B1, A2, B2, R
The first four
coordinates of the input show us the counter-move of the imaginary opponent,
and R shows us our immediate reward.
All scalars have
values from 1 to 8, except R, whose
value is in the {-1, 0, 1, nothing}
set. The meaning of these values is as follows: {loss, draw, win, the game is not over}.
Incorrect move
Shall we allow the
existence of incorrect moves?
Our first
suggestion is to choose a world in which all moves are correct. In the example
we took, we cannot move the bishop as a knight, so it is natural to assume that
some of our moves are incorrect or impossible.
Our second
suggestion is to have incorrect moves and when we play such a move to assume
that the world penalizes us with a loss. So we will very quickly learn to avoid
incorrect moves, but here we are denied the opportunity to study the world by
checking which move is correct (like touching the walls in the darkness).
Our third
suggestion is: When
an incorrect move is made the state of the world to remain the same, and
instead of ‘loss’ the immediate reward to be ‘incorrect move’ and all other
coordinates at the input to return ‘nothing’. This option is better, but thus
we would unnecessarily prolong the history. (We will call ‘history’ the
following sequence: a1, v1, ... , at-1, vt-1, where t
is the current time). Given that the state of the world remains the same, there
is no need to increase the count of the steps.
The fourth
suggestion is: When you play an incorrect move, you to be
informed that the move is incorrect but without prolongation of the history.
The new history will look like this: u1, a1, v1, ... , ut, at, vt. Here ui
is the set of incorrect moves at the i-th
step which we have tried and we know for sure that they are incorrect. Thus the
history is not prolonged, yet the information that certain moves are incorrect
is recorded in the history. Here we assume that the order in which we’ve tried
the incorrect moves does not matter.
The fifth option,
which we will discuss in this article, will be even more liberal. In the
previous suggestion we have the opportunity to try whether a move is incorrect,
but if it proves correct, we have already made this move. Now we will assume
that we can try different moves as many as we want and we can get information
whether it is correct or incorrect for each one of them. After that, we make a
move, for which we already know that it is correct. We know because we’ve tried
it. Here, the history is the same as with suggestion No. 5, except that ui is a tuple of two sets,
the first of which is from the tried incorrect moves, the second – from the
tried correct moves.
There is a sixth
option and it is for every step to get full information about which moves are
correct and which are not. In other words, we get ui with all the moves in it like they have been tested.
However, we do not like this option because ui
may be too large, i.e. it may contain too much information. Moreover, we assume
that after some training we will have a fairly accurate idea about which move
is correct and which is incorrect and will not have to make many tests in order
to understand this.
Dependencies without memory
We will assume
that the current moment t is a very big number, and the history is too long
and it is not worth remembering it all. Even if we remembered it, it would be
foolish to process the whole of it on each step in order to choose our next
move. So we will assume that instead of remembering the entire history we have
collected some statistics and will determine our next move solely based on
these statistics.
What will this
statistics constitute? We will look at the conjunctions of literals. Here
‘literal’ means the equality between a signal for a given t and a
particular value that this signal could receive. In the present example, with
the game of chess, let us take for example the following conjunction:
X1(t)=5, Y1(t)=2, X2(t)=5, Y2(t)=4, A2(t)=5, B2(t)=4
This means that at
the moment t we’ve moved a piece from E2 to E4 and the imaginary
opponent has captured the piece that we’ve played.
Let’s take another
example:
A2(t-1)=5, B2(t-1)=4, X2(t)=5, Y2(t)=4
This means that
imaginary opponent has played on E4 and we immediately, the very next move,
have captured the piece that he has played.
We will assume
that this conjunction are not scrambled and are ordered:
1. Older ones
supersede new ones (i.e., t-1 supersedes before t).
2. The output
supersedes the input.
3. The coordinate i
of the input (output) supersedes coordinate i+1.
We will assume
that these conjunctions are stabilized, i.e. the time at the rightmost literal
is t. What is the idea of stabilization? Conjunction is an event that is
happening at the moment t. If we put t+1 instead of t we
get the same event that happens at t+1. Since this is the same event,
possibly shifted in time, we will collect statistics only for one of all
possible variants of the event. The value of literals at t+1 is not yet
known and we will therefore take the event whose value is already known and has
become known exactly at the moment t.
For each of these
conjunctions we will count how many times it has occurred (i.e. it has been
truth).
Definition: We will call a move a conjunction, in
which all literals are for the moment t, there are no literals from the
input and all coordinates at the output are taking part.
Definition: A cumulative move will be the same as
move, except that not all coordinates at the output will necessarily
participate.
That is, the
conjunction ‘move’ describes a particular move, while the cumulative move
describes a set of moves.
Definition: We will call a conjunction, whose last
literal is from the output, a conditional move. Such a conjunction can be
represented as A & M, where A represents the condition, and M is a
cumulative move. (Here we assume that we’ve divided it into A and M so that M
is the maximum.)
For the
conditional moves will not only count how many times they’ve been played, but
also how many times they’ve been tried and how many times they have shown a
correct or incorrect move when tried. That is, for every conditional move we
will keep three counters:
1. How many times
it has been played;
2. How many times
it has been tried and it has been correct
3. How many times
it has been tried and it has been incorrect.
The value of
counter 2 will be greater or equal to the value of counter 1.
For all other
conjunctions we will have only counter 1 (which will only remember how many
times this conjunction was true). Here we will not count how many times there
has been a correct move (because there has always been) and will not count how
many times there have been an incorrect move (because it is information that is
not worth the effort to collect).
Note: We will not consider the empty
conjunction as a cumulative move. That is, we will not consider the set of all
possible moves. We’ll skip the signal which is truth when all possible moves
are correct. We will skip this signal because the effort that we need to make
to collect statistics about it is not worth it. Is there a case where all
possible moves are incorrect? Yes, it can happen, but if it happens it will be
only once. We will call this situation a cul-de-sac
or premature death. In this
situation, the history will end because there won’t be a single possible
correct move.
We will organize
all conjunctions in a tree, where each conjunction will be a node of the tree.
This will save memory, but mostly will save time because at each step we will
not go around the entire tree, but only around that part of it where nodes are
true conjunctions (i.e. true at this step).
When we go around
the tree we will increase by one the counters which are on the nodes through
which we pass. It is important to note that when we have several correct moves;
they will be presented as a sub-tree of our tree. We need to go round this
sub-tree and increase by one all counters for correct move in it. These nodes,
which have several successors, should also be increased only by one. If we are
not careful, we may pass through such a node several times and increase it by
one each time, which would be a mistake. (What we’ve said about several correct
moves applies correspondingly to several incorrect moves.)
Example: If at the current moment the incorrect
moves are the cumulative ones: X1(t)=1,
Y1(t)=1 and X1(t)
= 1, Y1(t) = 2 (i.e., we cannot capture a piece from A1 and from
A2) then we need to increase by one the counters of these cumulative moves
(counter 3 – for the case when in this cumulative move there has been an
incorrect move). Counter 3 for the cumulative move X1(t)=1 must also be increased by one because there has
been one moment in which this cumulative move contained an incorrect move,
regardless of the fact that in this case it contains not one but two incorrect
moves.
With the use of
this tree we will collect statistics on how many times each of the observed
conjunctions has been a truth. Of course, all of conjunctions are countless, so
we will observe only some of them (which are shorter and include fewer steps).
We’ll assume that if we are observing a conjunction, we are observing all its
subsets.
However, the
information about how many times a conjunction was a truth is not sufficient.
We also want to know when it was a truth for the last time. This can easily be
achieved by adding another memory cell to the counter (to all three counters)
we’ve already attributed to this conjunction. Every time we increase counter 1,
we will record the value of the counter of time in that memory cell.
Let’s say we want
to know not only the last time when the conjunction was a truth but the last
100 times when it was a truth. This can easily be achieved with the expense of
memory, but without the expense of computer time. Instead of an additional cell
we will attribute a hundred cells. In which cell will we record the value of
the counter of time? We’ll take the value of counter 1 by module 100 and this
will give us the number of the cell.
Thus we will keep
some very strict memories of the last 100 steps, and even of more than 100
steps, because we will know what conjunctions were well back in the past for
these conjunctions, which are rarely a truth.
That will not be
enough and we want to have more information about the time before last hundred successes
of the event. This information will not be entirely accurate, but it will be
approximate. For this purpose we will choose 100 important points that will
divide the history to 101 intervals. We
will count how many times the event occurred in each of these 101 intervals.
More precisely we will count how many times it happened from the beginning
(from the moment of birth) to the end of each of these intervals, but the
difference between two such numbers will give us how many times it occurred
between any two important points.
Those hundred
important points will not be static and will change over time. The idea is
those are densely when close to the current moment, and less frequent back in
the past. Here is a sample algorithm for changing the important points:
We put an
important point at every 10 steps. When the 100 points finish we begin
rarefaction of the important points, one in one, starting from the beginning to
the current moment. When we finish the first rarefaction we start the second
one, again from the beginning and so on.
It is not very
smart to put important points at regular intervals. It would be smarter to
specify some important events and put important points when such an event
occurs. For example, if you are trying to find a finite-state machine whose
transitions are certain events, it would be smart to put important points when
these events (transitions) occur.
This would help us to gather statistics on this finite-state machine and to
build it for more easily.
How will we organize
the counting? This will again be at the expense of some memory, but without the
expense of computer time. For each conjunction we will set aside another 100
memory cells for the one hundred important points. Thus, the cells for each
conjunction become 201 (203) total.
Let’s make an
auxiliary table. It will consist of 100 rows and two columns. The first column
will be the times of the one hundred important points (sorted) and the second
will be the numbers of points (which are between 0 and 99). This table will
help us to calculate how many times a conjunction between two important points
has been a truth. We will choose from the table the two moments that define our
interval of interest. We take from the table the numbers of the respective
important points. We check the last time the conjunction has been a truth and
if the two are before this point, we take the difference. If both are after,
the answer is zero. If the first moment is before, and the second after, then
we take the difference between the counter 1 and the cell of the first moment.
This auxiliary
table will be used in counting as well. Here’s how the algorithm of counting
will look like:
When a conjunction
is a truth, we take counter 1 by module 100 and this is the address of the cell
which remembers the last time this conjunction has been a truth. We take this
number. In the next cell (by module 100) we record the current time. We look in
the auxiliary table from the bottom to the top and we go until the time of
creation of the important point is after the time we remember (the last time
this conjunction has been a truth). We take the number of each such (new)
important point and write down the value of counter 1 in the cell which address
is this number. When finished, we increment counter 1 and we are ready.
Testable states
A testable state
is the result of an experiment. Here are two examples:
The door is
locked
If I push the
handle Þ the door will open.
The stove is
hot
If I touch the
stove Þ I will get burned.
Here we have the
testable state and the experiment, which must be made in order to obtain the
value of the testable state. The result of the experiment can be ‘Yes’ or ‘No’.
From the above two
examples you might get the impression that the experiment consists of any actions
that we need to do, but it is not so. The experiment could be something that
happens without our intervention. Here is an example:
The roof is
damaged
If it rains Þ the
roof will leak.
In this example
the testable state is checked when it rains, and this is something that does
not depend on our actions.
Definition: A testable state will consist of two
statements. We will call the first one ‘condition’ and the second one
‘conclusion’. Below we will give additional requirements to the condition and
the conclusion of a testable state.
Definition: We say that a testable state is a special
case of another one if they have the same conclusion, and if the condition of
the second is a special case of the condition of the first.
Note: We can think of the testable state as an implication,
although this is not quite accurate because its possible values are three
(true, false and undefined – when the condition is not true). If we imagine
this implication as a disjunction, a special case is the disjunction that has
more literals. Accordingly, the conditions will be conjunctions, and a special
case is the conjunction that has fewer literals.
Let’s define
incorrect move.
Definition: An incorrect move is testable state whose
condition is a conjunction in which all literals are from the output and are
for the moment t+1 and whose
conclusion would be “the move is incorrect.”
That is, the
incorrect move is testable state, whose condition is a cumulative move (at the
moment t+1).
To complete the
definition of testable state we will first define an immediately testable state
is. This is testable state that, if it can be tested, the test will be carried
out at the next step.
Definition: An immediately testable state is one of
the following:
1. An incorrect
move.
2. A testable
state, whose condition is a conjunction, in which all literals are for the
moment t+1, and the conclusion would
be a literal from the input, also for the moment t+1.
Here two questions
arise. Why the conclusion is only one literal and why the literal is from the
input? The answer to the first question will be: If the conclusion is the
conjunction of two literals, then we can present this testable state by four
other testable states in which the conclusion is only one literal. If this
testable state is of the type A Þ B1&
B2, this
testable state is a truth just when the testable states A Þ B1 and A
Þ B2 are a truth. This state is a lie just when
the testable states A & B1
Þ Ø B2 and A
& B2Þ Ø B1 are a truth. (Here the negation with Boolean signals
is one literal, and with non-Boolean ones – a disjunction of several literals.)
As for the second
question: We want to describe the world, not to describe our behaviour. That in
certain circumstances we have always played something (or have never played
this thing) is not a description of the world, and a description of our behaviour.
We may never have played it because it’s an incorrect move or we could have
always played it, because this is the only correct move. This latter is already
a description of the state of the world, but this information is given by the
term ‘incorrect move’.
Having defined an
immediately testable state we are to get a definition of the term ‘testable
state’ by adding something to the condition. We will add precondition or
postcondition or both. A precondition will be a conjunction of literals which
are for moments before t+1, and a
postcondition will be a conjunction of literals which are for moments after t+1.
Here, one part of
the conjunction is associated with the past (before t+1), and the other part is associated with the future (after t). The testable state is a prediction
for the result of carrying out an experiment that has yet to be held. So a
testable state must fully involve the future. Otherwise, we get something that
is partially known and it can not be verified (for the known part is a lie) or
will be a testable state, but some other testable state (the first will be a
special case of the second) because the known part will be a truth and the new
testable state will depend only on the part that is still unknown.
Therefore we will
correct the definition of a testable state by shifting the entire conjunction
to the right, so it all goes into the future (we add a k to the time of all literals). We need to shift the conjunction to
the right until the most left literal is for the moment t+1.
If we shift
further to the right (e.g. by one more step), we will get another testable
state, but it will be a prediction of more distant future. So this new testable
state will be a truth if the old testable state is a truth at step t+1 whatever happened at step t+1. That is, the new testable state
will be a more common case of the old taken for step t+1. This is so because the new one can be obtained from the old
one taken for step t+1 by adding what
happened at step t (i.e. to add a
conjunction describing completely the output at step t). That is, we quite naturally get that the testable state that
looks ahead to the future, is a more general case of the one that looks closer.
In the definition
of a testable state we can assume that the precondition and the postcondition
are not conjunctions but cumulative conjunctions. That is, the precondition and
the postcondition can have memory and apply for a longer period of time (and
not just for a few steps).
Note that the
number of immediately testable states is final (if the input and output signals
are finite functions). By adding preconditions and postconditions we get an
infinite number of testable states, which is natural, because we can not
describe an infinite world with finitely many parameters.
When do we test?
We want for each testable
state to assign a signal. That is, we look for a time function that is related
to the result of the experiment. The question arises if the experiment is
conducted over a period of time, when exactly the testable state signal takes
the experimental result as its value.
If the experiment was
limited to just one step, it would be easy, but it could take several steps. If
the condition of the testable state is a common conjuncture, then we know how
many steps this experiment takes, but if it is a cumulative conjunction, we do
not know even that.
We will look at several
options:
1. Let’s assume the
signal receives the value at the moment the experiment ends. That is, at the
moments in which the experiment ends the signal receives a value and it is
equal to the result of the experiment, and at the rest of the moments the
signal is undefined. Under this definition, the value of the signal depends
entirely on the past (history) and does not depend on the future (the current state)
at all.
2. Let’s assume the value
is obtained at the moment when the testable state conclusion is tested. At this
moment, the value is already clear, but the postcondition has not yet been
verified, and it may be true or false, depending on what we will do (what are
our future moves). We will assume that the signal will have value at this
moment if the postcondition can be fulfilled (assuming we play it so to fulfill
it). In this definition, the value of the signal depends on both the past
(history) and the future (the current state).
3. Let’s choose the
moment immediately before the experiment starts. At this point, we still do not
know what will be the result of the experiment, and whether it can be
conducted. The value of the signal will be the forecast of the result of the
experiment. This will be the definition we will choose. The good thing is that
this definition depends only on the future (on the current state).
4. Let’s choose a moment
before the beginning of the experiment, but not immediately before. Then the
signal value should still be a forecast, but this will be a more distant
forecast. The possible developments seen from a distance are more. For example,
if you see from a distance that the experiment cannot be performed, then from a
closer distance you will see the same, but not vice versa. That is, in this way
we would get another, different signal that corresponds to a more generally
testable state.
The testable state is something real
The testable state is
something quite real, as the input we get at every step is real. We can look at
the value of the testable state as a function that depends only on the current
state of the world. This function takes one of four possible values:
We get the value ‘Cannot
be tested’ when the experiment is not possible. When does this happen? Let’s
look at all the possible developments (all the moves we could play starting
from the current moment t). If the
experiment cannot be performed with any of these developments, then for moment t we have: ‘Cannot be tested’. It should
be able to perform the experiment immediately, i.e. its beginning must be at
the moment t and not later. If the
experiment cannot be performed, it means that the precondition or the postcondition
is false, or the condition of the respective immediately testable state is
false.
Accordingly, ‘Yes’ will
be if it can be tested, and always when it is possible to test it the result is
‘Yes’. It will be similar for ‘No’. The ‘Yes and No’ will be in all other
cases.
When is the value ‘Yes
and No’? This is possible in a loose experiment. This can also happen if the
function World is not deterministic.
What does a loose
experiment mean? This is when only some of the output signals are determined by
the condition of the testable state. When all output signals are determined,
then we will know which is exactly the move we have played, and then we will
call the experiment tight.
Example:
If I touch the stove Þ I will get burned.
This experiment is loose
because we have not said whether we will touch with a glove or without a glove.
If I touch the stove without a glove Þ I will get burned.
This experiment is
tighter because the move has more detailed description.
Note: Here, if the precondition and the postcondition are elementary
conjunctions, then the number of all possible developments will be finite,
because then they will depend only a few steps before and after the test. If
these are cumulative conjunctions, then all possible developments can be
infinite, but a testable state still will be a well-defined function, although
this function may not be computable. (We mean could we calculate it if we had
the function World as a subroutine.
Of course, we do not have the function
World, so this remark is purely theoretical.)
Theories
The testable state is not
ready to be obtained. As we have said, it is something real, but we cannot
directly measure it, but we need to approximate it by theory.
The theory will be a
function that will take the history as an argument and return a signal. This
signal will have three possible values: {Yes,
No, I don’t know}.
The purpose of the theory
will be to describe the signal of the testable state. We will say that the
theory is correct if, whenever it says ‘Yes’ the signal of the testable state
has one of these two values: {Yes, Cannot
be tested}. (The same shall also apply for ‘No’.)
We will say that the
theory is complete if, whenever the signal of the testable state is ‘Yes’ the
theory says ‘Yes’. (The same shall also apply for ‘No’.)
We will say that the
theory is super complete if it is complete and if it makes a prediction of all
the moments when the signal is ‘Cannot be tested’. That is, for any such moment,
the theory is to say ‘Yes’ or ‘No’.
The theories that we will
use will not be correct nor will they be complete, but we will strive to make
them as correct and as complete as possible.
Every theory gives us
some predictions about the value of the testable state, and this prediction has
some degree of confidence. That is, for any moment, the theory will return two
values: a prediction and a degree of confidence of this prediction.
Axioms
Definition: We will call a positive axiom a testable state that,
whenever it was tested, it was true and that has been tested a sufficient
number of times.
Similarly, we will define
a negative axiom.
What does “whenever it
was tested” mean? It means that this is so in the history we have accumulated.
There is no guarantee that this will be the case with all possible histories,
and there is no guarantee for all possible extensions of the current history.
However, if the axiom is checked for a sufficient number of times, then with a
high degree of confidence we can assume that this will always be the case.
What does “a sufficient
number of times” mean? Let it be 10 times.
If something has happened only once, it could have happened by accident, if it
has happened 10 times, then it increases the degree of confidence, but of
course there is no guarantee that the 11th time the opposite will
not happen.
Instead of counting to
10, it is better to want to have so many experiments that the expectation that
the conclusion of the condition does not
happen to be at least 10 times. If the conclusion has probability near zero,
then to test 10 times will be enough. If the conclusion has probability about
50%, then we will have to test at least 20 times, and so on.
The conclusion is a
specific literal and its probability can be taken from statistics by seeing how
many times this literal was true and divide it to the counter of the steps. It
is better not to take the probability of the conclusion but the probability of
the small testable state (the one this axiom will describe). This value will
still be taken from the statistics. How many times this small testable state
has been tested and how many times it was true when tested.
Theories of axioms
What are we going to use
the axioms for? If a testable state is a positive axiom, there is nothing to
describe it. Its theory is the one that always says ‘Yes’. Interestingly, with
the help of axioms we will describe the smaller testable states (those to which
the particular axiom is a special case of).
Let’s take a testable
state. From the statistics we will find axioms that are a special case of this
testable state, and we will use these axioms to describe this testable state.
Definition: A prerequisite for an axiom will be the difference
between the axiom and the small testable state it describes. (That is, the
prerequisite depends on what we describe with this axiom.)
From each positive axiom
we will make a theory that describes the small testable state and says it is
true always under certain circumstances. The certain circumstances are when the
axiom’s prerequisite is true.
Let’s say we want to
predict the value of a testable state for the moment t based on the information we have at moment t. We will take the set of all histories with a length t. Each of these histories takes us to a
certain internal state (several histories can take us to the same internal
state). We divide this set of histories into 4 as we’ve divided the set of
internal states into 4 (Fig. 1) by placing the history in the corresponding
quarter based on what internal state it takes us to.
We have taken three
axioms whose prerequisites are before t
(that is, if we shift the axiom to the left, so that the small axiom starts
from the moment t, then the whole
prerequisite must be before t). For
each of these three axioms, we draw a red circle surrounding the histories in
which the prerequisite of the corresponding axiom is true.
Note: An internal state may correspond to two histories and
one could be in the red circle and the other – outside it, but both will surely
be in the same quarter.
We will assume that the
axioms we have found through the statistics are correct. That is, we assume
that their red circles are entirely in the ‘Yes’ and ‘Cannot be checked’
quarters. Of course, this assumption may not be true, and the red circle of any
of these axioms may contain a state that could generate a ‘No’ result.
The red circle may be
completely in the ‘Yes’ quarter, but it may also be partially in it. Can the
red circle be completely in the ‘Cannot be checked’ quarter? We’ve checked this
axiom in our history many times. So there is at least one internal state that
could generate a ‘Yes’ result. The problem is that, although this state is
reachable, there is no guarantee that it can be reached with a history of
length t. For this reason, we change
the description of Figure 2 and thus all stories will be included there, not
just those with a length t.
We would like all
histories to end at the moment t,
which would be difficult if they are of different lengths. Here, the numbering
of the steps in the history is largely conditional, and so we can align them to
the right instead of aligning them to the left. That is, instead of the
numbering always to start from 1 we will make it to end at t. So we will start the numbering of the shorter histories from a
larger number, and the longer ones – from a negative number. (We are interested
in how the history ends, not how it begins, and so we can safely number it
backwards.)
Once we have made this
change, all histories and all reachable states are now included in Figure 2.
Now we can say that the red circle can not be entirely in the quarter of ‘Cannot
be checked’.
This was for the moment t, based on the information we have at
the moment t. We will want to be able
to predict the testable state both at earlier and later moments. Very often the
late prediction is useless like after death, the doctor, but it will still be
helpful in collecting statistics. In addition, sometimes the late prediction
may be useful and give us important information. (We are talking about a
testable state. Because the moment is over, it does not mean that we already
know what its value was at that moment. For example, we understand that at
lunchtime the stove was hot, which means that the children have had lunch. This
is useful information obtained from the value of a testable state at a past
moment.)
Now we will make Figure 2 so that it depicts the axioms that predict the
value of the testable state for the moment t,
based on the information we have at the moment t+k. We had taken all the histories that ended at the moment t, and for each such history we had
assigned the state to which it brought us at the moment t. We now want to take all the histories that end at the moment t+k, and for each such history we have
assigned the state to which it has brought us at the moment t. What do we do when k is negative? Then this history does
not reach the moment t. For this
purpose we will take all the histories that finish at the moment max (t, t+k). When k is negative, the last few steps of the history will not be used
to determine if the prerequisite of the axiom is valid, because these few steps
will still be unknown.
We received a simple picture with a very complicated description of what is
depicted in the picture. Despite the complexity of this description, the
algorithm that follows from this picture is quite simple. We take all positive
axioms that describe a given testable state. We unite the red circles of all
these axioms. We get a red circle that describes the histories for which our
theory says that the testable state is true at the moment t. We act with the negative axioms in a similar way and we get a
blue circle. Figure 3 shows how our theory looks like:
Note that the red and blue circles can cross. For the cases where the
theory tells us both ‘Yes’ and ‘No’, we will assume that it says ‘I do not know’.
What is the algorithm? At each step, we check which axioms are coming out
(that is, their prerequisite is already known and true). If, until now, our
theory has said, ‘I do not know because I have no information’, then, after the
axiom, the theory already knows and predicts something. If our theory already
predicts ‘Yes’, then the new axiom, if positive, will only confirm this
prediction, and if it is negative, the prediction will become: ‘I do not know
because I have too much information.’
Note that when the axioms are ordinary conjunctions, then an axiom gives us
a prediction for a moment, but when the conjunctions are cumulative, an axiom
gives us a prediction for a whole interval of time.
Conclusion
In the field of AI
our first task is to say what we are looking for, and the second task is to
find the thing we are looking for. This article is entirely devoted to the
second task.
Articles [3-5], which are devoted to the first task, tell us that
the thing we are looking for is an intelligent program that proves its
intelligence by demonstrating it can understand an unknown world and cope in it
to a sufficient degree.
The key element in
this article is the understanding of the world itself, and more precisely – the
understanding of the state of the world. We argue that if we understand the
state of the world, we understand the world itself. Formally speaking, to
understand the state of the world is like reducing the problem for
Reinforcement learning with partial observability to the problem for Reinforcement
learning with full observability. Not accidentally, almost all articles on
Reinforcement learning deal with the case of full observability. This is
because it is the easiest case. The only problem in this case is to overcome
the combinatorial explosion. Of course, combinatorial explosion itself is a big
enough problem because we get algorithms that would work if you have an
endlessly fast computer and indefinitely long time for training (long in terms
of the number of steps). Such algorithms are completely useless in practice and
their value is only theoretical.
In this article we
presented the state of the world as infinite-dimensional vector of the values
of all testable states. This seems enough to understand the state of the
world, but we need a finite
description and we would like this description to be as simple as possible. For
this purpose, we’ve made three additional steps. A model of the world was
introduced as a finite-state machine. Each such machine describes an infinite
number of testable states and this is a very simple description which is easy
to operate.
The second step
was the assumption that testable states are inert and change only if specific
events occur. That is, if between two checks, none of these events has
occurred, we can assume that the value of the testable state has not changed.
The third step was
the introduction of agents which under certain conditions may alter the values
of testable states. This step is particularly important because the agent
hides in itself much of the complexity of the world and thus the world becomes
much simpler and more understandable. We will not describe the agent as a
system of formal rules. Instead, we will approximate it with assumptions such
as that it is our ally and tries to help us or that it is an opponent and tries
to counteract.
Without these
three steps the understanding of a more complex world would be completely
impossible. Formally speaking, testable states only would be sufficient to
understand the state of the world. Even testable states which conditions are
simple conjunction dependent only on the last few steps of the past are
sufficient. However, without the additional generalizations made, we would face
an enormous combinatorial explosion.
Acknowledgments
I would like to thank my
colleague, Dimitar Guelev, who introduced me to the concept of ‘Diagnosability’
and the related literature.
References
[1] Meera Sampath, Raja
Sengupta, Stephane Lafortune, Kasim Sinnamohideen and Demosthenis Teneketzis.
Diagnosability of discrete-event systems. IEEE
Transactions on Automatic Control, 40(9):1555–1575, sep 1995.
[2] Xiaowei Huang.
Diagnosability in Concurrent Probabilistic Systems. Proceedings of the 12th International Conference on Autonomous Agents
and Multiagent Systems (AAMAS 2013), Ito, Jonker, Gini, and Shehory (eds.),
May, 6–10, 2013, Saint Paul, Minnesota, USA.
[3] Dobrev D. AI – What is this. PC Magazine – Bulgaria, November'2000, pp.12-13.
http://www.dobrev.com/AI/definition.html
[4] Dobrev D. Formal Definition of Artificial
Intelligence. International Journal
"Information Theories & Applications", vol.12, Number 3, 2005,
pp.277-285. http://www.dobrev.com/AI/
[5] Dobrev D. Comparison between the two definitions of
AI. arXiv:1302.0216, January, 2013.
http://www.dobrev.com/AI/