Version 2 (modified by schiptsov, 7 years ago)

--

# Data

Know its structure (form, shape) and conventions.

Let's first think about a simple fixed-length sequence of numbers only.

```
[1 2 3 4]
```

Suppose that in computer memory it is stored as 4 consecutive bytes, one contiguous chunk:

```
[1][2][3][4]
```

How do I know where this sequence begins?

Well, the address of the byte that contains its first element must be stored somewhere for me. Otherwise the sequence is lost.
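A minimal sketch in C of this point, assuming the four numbers are stored as one-byte values. The names `seq` and `first_element` are illustrative and not from the original text. The address of the chunk and the address of its first element are the same.

```c
#include <stdint.h>

/* Four one-byte numbers stored as one contiguous chunk of memory. */
static const uint8_t seq[4] = {1, 2, 3, 4};

/* The stored "beginning" of a sequence is the address of its first byte.
   (Hypothetical helper, for illustration only.) */
const uint8_t *first_element(const uint8_t *base) {
    return &base[0];   /* same address as base itself */
}
```

Calling `first_element(seq)` gives back `seq` itself: knowing where the first byte lives is all it takes to find the sequence.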

What is an address of a byte in computer memory?

It is the offset of a byte from the "base of memory". Think of a ruler. Zero on it is the base.

What is an offset? It is just the number of a byte: first, second, third, etc., counting from zero. Repeat after me: counting from zero.

But why? Humans start counting from one - there are no zero fingers.

Well, there is a reason for it. An offset is not just the number of a byte, counting from zero. It is the coefficient by which I multiply the size of an element to get that element's position relative to the beginning of the sequence. A long sentence, I know.

Suppose the size of a number is just 1 byte. Then, to find the n-th number (counting from one) of a sequence of numbers I compute:

```
(+ base (* (- n 1) (size-of byte)))
```

The offset of the first element is zero. So, after multiplication I get zero. Base plus zero is base.

It is the address (or offset) of the first element.

The address of the second element is base plus one, of the third base plus two, and so on.
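The address arithmetic can be sketched in C as follows. This is an illustration under assumptions: `nth_address` is a hypothetical helper, `n` counts from one as in the formula, and addresses are modelled as byte pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of the n-th element (n counts from 1, so n must be at least 1),
   for elements that are elem_size bytes each.
   Mirrors the formula: base + (n - 1) * size. */
const uint8_t *nth_address(const uint8_t *base, size_t n, size_t elem_size) {
    size_t offset = n - 1;              /* the first element has offset 0 */
    return base + offset * elem_size;   /* offset 0 gives base itself */
}
```

With one-byte elements, `nth_address(base, 1, 1)` is `base`, and `nth_address(base, 3, 1)` is `base + 2`.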

This is why in computer science we start counting from zero. Always. Just accept it. There is a deep reason for it.

What if I do not know the size of my sequence in advance? It means that I can't ask what the element at offset 10 is, because there might be only 5 of them, or 7.

If I tried to read the byte at offset 10, I would get whatever value happens to be in that particular byte, which has nothing to do with the data in my structure.

The answer is: I must have something special to mark the end of the sequence. It must be a unique value which cannot be a part of the sequence itself. Very difficult.

But, look, the base address of a sequence and the address of its first element are the same number - offset 0 is the same as the base address. So, why not use 0 to mark the end-of-sequence?

This means we can have any numbers in a sequence, except zero, anything but zero.

Let's ask another question. What can a number represent? Well, an offset or a code for something (another number).

Code, by the way, is just an abstract term, so, for example, an offset (of another number) is a code for it.

Another word (a synonym) for an offset is an address.

So, for sequences of addresses (offsets) it is OK, because offset 0 is the same as the base. For representing sequences of codes it is also fine, just avoid 0.

The only problem is a sequence of arbitrary numbers. Well, with this convention there is no such thing as an arbitrary-length sequence of arbitrary numbers, zero included - it is too abstract. But it works for almost everything else - offsets, codes, etc.

So, this is how zero became a universal marker for the end-of-sequence of codes.
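C strings are a familiar instance of this convention: a string literal is stored as its character codes followed by a zero byte. A small sketch, where `greeting` and `stored_size` are illustrative names and not from the original text:

```c
#include <stddef.h>

/* The compiler stores the character codes of "Hi" followed by a zero byte. */
static const char greeting[] = "Hi";

/* Size of the stored chunk in bytes, end marker included. */
size_t stored_size(void) {
    return sizeof greeting;   /* 'H', 'i', and the terminating 0 */
}
```

The two visible characters occupy two bytes, and the third byte, `greeting[2]`, is the zero that marks the end.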

How do I know the length of an arbitrary-length sequence? I will count until I hit the end-of-sequence mark.

This means I would check the content of each consecutive byte, until I get zero.

```
(define (length l)
  (if (null? l)
      0                          ; empty list: nothing to count
      (+ 1 (length (cdr l)))))   ; one element plus the length of the rest
```
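The same count over raw bytes can be sketched in C, assuming one-byte elements and a zero byte as the end-of-sequence mark. `seq_length` is a hypothetical name chosen for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bytes before the first zero; the zero itself is not counted.
   Assumes the sequence really is terminated by a zero byte. */
size_t seq_length(const uint8_t *base) {
    size_t n = 0;
    while (base[n] != 0) {   /* check each consecutive byte */
        n++;
    }
    return n;
}
```

For the bytes `1 2 3 4 0` this returns 4, and for a chunk that starts with 0 it returns 0, which matches the empty case of the Scheme version. A zero inside the data would cut the count short, which is exactly the "anything but zero" restriction.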

This code is a premature enlightenment, but these rules and this reasoning about the structure of the data are the essence of CS.

Now, what is the-empty-list? It is a list-of-no-thing. Just nothing. To represent no thing there is a symbol for it - zero. 0. So, '() is just 0.
