  with Etienne Houzé Pierre-Henri Paris Zacchary Sadeddine # Description complexity

Machine learning relies on relevance criteria that says what the system should pay attention to, what it should keep in memory and what it should ignore. Frequency is often the only measure that is used as relevance criterion in machine learning. Human beings, however, are able to learn from unique situations that are relevant to them for reasons that owe nothing to their frequency of occurrence. Description complexity is a fundamental conceptual tool to capture this ability.

The concept of description complexity is the topic of a whole chapter in this course because of its importance for the future of Artificial Intelligence. Though invented in the 1960s, the notion of complexity is only now becoming pervasive in many approaches to AI and to cognitive science. One may even regard complexity as the key concept that may unlock several of the fundamental problems that AI is still facing at the beginning of this century.

Douglas Hofstadter poses the following question (Hofstadter, D. R. (1984). The Copycat Project: An Experiment in Nondeterminism and Creative Analogies).

If abc gives abd, what will pqrs give? Most people answer the following:

a b c ---> a b d

p q r s ---> p q r t

Imagine that you wrote a program that is supposed to pass this kind of test. When faced with this problem of induction, your program might have answered pqrd, or pqds, or even worse: abd. Human beings ‘know’ that pqrt is a better answer. Similarly, when prompted with ppqqrr, human beings typically answer:

a b c ---> a b d

ppqqrr ---> ppqqss

What would your programme do in this case? And what about abc --> abd, xyz --> ? ? In such creative analogies, complexity may provide the answer (Cornuéjols, A.,1996: Analogie, principe d’économie et complexité algorithmique). The idea is that the ‘correct’ answer should minimize the overall complexity of the quadruple.

## The mathematical concept of complexity

The notion of description complexity was independently defined in the sixties by several scientists, including Ray Solomonoff, Andrei Kolmogorov, and Gregory Chaitin.

• Definition 1. The description complexity of a structure is measured
by the length of the shortest algorithm
generating the structure.

Equivalent definitions mention maximal compression (as the generating algorithm can be seen as a compressed version of the structure). Complexity is thus measured by the size of the structure after compression (imagine you have an ideal ‘Zip’ programme).

(Note that definition 1 makes no reference to any time or memory requirement for the computation, contrary to another notion, ‘resource complexity’, that is commonly used by computer scientists to assess computation cost.)

Definition 1 is usually made objective by considering that the algorithm is run on a universal Turing machine. If one changes the machine, one only pays the fixed cost of emulating the old machine on the new one. Such a general definition is too general for A.I. purposes, as one does not wish to eliminate the observer. We adopt the following definition.

• Definition 2. The description complexity of a structure is measured
by the length of the shortest algorithm generating the structure
that is available to the observer at a given moment.

The advantage of definition 2 is that it takes background knowledge into account. For instance, a number seen in a lottery draw may appear random to others, but not to you if it matches your phone number.

In Hofstadter’s problem, the sequence ppqqrr might prove simpler when seen as resulting from the programme ‘increment letters from p, then duplicate all letters’, rather than any alternative coding (especially as the part ‘increment letters from’ was already made available by the analysis of abc).

Alphabetic Complexity (1)
 Rank the following structures based on your perception of their complexity: d j x ka a a ap q r sa b c dw x y z

 1 - simplest --d j x ka a a ap q r sa b c dw x y z 2 - second simplest --d j x ka a a ap q r sa b c dw x y z 3 - intermediary --d j x ka a a ap q r sa b c dw x y z 4 - second most complex --d j x ka a a ap q r sa b c dw x y z 5 - most complex --d j x ka a a ap q r sa b c dw x y z

To deal with Hofstadter’s problem, imagine a simple "machine" with a few available operations such as:

 copy: C copy the preceding item (here: letter). F C → aa first: F return the first element of the current list (here, the alpabet) F F F → aaa repeat: R(n) repeat the last actual operation n > 0 times F I(2) R(2) → aceg increment: I(n) copy and increment the preceding item (letter) by n > 0. F C I(3) → aad get: G(n) return the nth element of the alphabet. G(4) F → da last: L return the last element of the current list. F L L → azz mirror: M reverse the whole string F L M → za opposite: O modify the last operation by performing its opposite     (not an actual operation) F G(4) I(1) O R(1) → adcb

Alphabetic Coding
 Using the preceding operators, provide clever codes for: a a a aa b c dd j x kp q r sw x y z

 a a a a --G(23) R(3) I(1)G(4) G(10) G(24) G(11)G(23) G(24) G(25) G(26)G(16) I(1) R(2)G(23) I(1) R(2)F C R(2)F R(3)F I(1) R(2)F I(1) R(3)L I(1) O R(2) ML R(3) O R(2)L I(3) O I(1) R(2) a b c d --G(23) R(3) I(1)G(4) G(10) G(24) G(11)G(23) G(24) G(25) G(26)G(16) I(1) R(2)G(23) I(1) R(2)F C R(2)F R(3)F I(1) R(2)F I(1) R(3)L I(1) O R(2) ML R(3) O R(2)L I(3) O I(1) R(2) d j x k --G(23) R(3) I(1)G(4) G(10) G(24) G(11)G(23) G(24) G(25) G(26)G(16) I(1) R(2)G(23) I(1) R(2)F C R(2)F R(3)F I(1) R(2)F I(1) R(3)L I(1) O R(2) ML R(3) O R(2)L I(3) O I(1) R(2) p q r s --G(23) R(3) I(1)G(4) G(10) G(24) G(11)G(23) G(24) G(25) G(26)G(16) I(1) R(2)G(23) I(1) R(2)F C R(2)F R(3)F I(1) R(2)F I(1) R(3)L I(1) O R(2) ML R(3) O R(2)L I(3) O I(1) R(2) w x y z --G(23) R(3) I(1)G(4) G(10) G(24) G(11)G(23) G(24) G(25) G(26)G(16) I(1) R(2)G(23) I(1) R(2)F C R(2)F R(3)F I(1) R(2)F I(1) R(3)L I(1) O R(2) ML R(3) O R(2)L I(3) O I(1) R(2)

 rank binary code code length 1 1 1 2 10 2 3 11 2 4 100 3 5 101 3 6 110 3
One needs at most log2(N+1) bits to represent any object taken among N objects.
By default, a series of four letters requires 4 × log2(26+1) = 20 bits.
Can we do better for our five sequences?
We can note that the nth element in an ordered list can be retrieved by expressing its rank r expressed in bits: log2(r+1), as in the table on the right.

(Note: slightly more efficient codings do exist).

Operators can be given complexities by considering that they are ordered as they appear in the above list of definitions:

[C, F, R, I, G, L, M, O].

For operators that take an argument, the complexity of the argument should be added.
Designating the letter d within the alpabet (with operator G) requires a complexity of 6 bits: three bits 101 to designate G in the list, and again three bits as d appears at rank 4.

The binary representation of the code F I(1) C R(2) is: 10 100 1 1 11 10    (note that this code is unambiguous, provided we are using blank separators) So the complexity of F I(1) C R(2), which codes for abbbb, is 11 bits.
 Provide a binary code that represents abcd.

Alphabetic Complexity (2)
 Based on this coding, provide complexity estimates         (in number of bits) for the 5 sequences. a a a aa b c dd j x kp q r sw x y z

a a a a     a b c d     d j x k

p q r s     w x y z

Complexity provides a relevance criterion for inductive learning, as understood by Solomonoff from the very beginning in 1964 (Solomonoff, R. J. (1978), Complexity-based induction systems), and more generally for any clever learning. Even statistical learning systems, by using frequency-based criteria, rely on complexity: memorizing frequent forms is a good way to achieve compression, by allocating fewer bits to them.
Complexity theory’s statement has a significantly broader scope: ideal learners should achieve maximal compression. Cf. Gregory Chaitin’s beautiful sentence: Comprehension is compression.

The program compressor.pl looks for copied elements in a list. Run the program (type go.). Try different examples by modifying testcompress at the end of the program.
Observe one more time Prolog’s spectacular reversibility: the compressor can be used to decompress lists!
 Use complexity twice successively to compress the following list: [31,31,31,31,32,32,32,32,33,33,33,33,34,34,34,34, 35,35,35,35,36,36,36,36]. What do you get after two successive compression operations? Do you retrieve the original list by running compress twice with swapped arguments?

The second compression step makes use of the cs operation, which means "copy-step". [cs, Times, X, Step] means that pattern X is copied Times times while allowing Step items in-between.

 Using predicate plusOne, define operation increment (just after plusOne). Include increment into the compressor by uncommenting lines in compress. You should be able to get:?- testCompress2.original list: [31,31,31,31,32,32,32,32,33,33,33,33,34,34,34,34,35,35,35,35,36,36,36,36]compressed list: [cs, 6, [c,4], 1, inc, 31, 5](paste the increment predicate below)

 Please comment on the content of this lab session, and more generally on the use of Complexity in AI. Back to the main page