Version 1 (modified by schiptsov, 7 years ago) ( diff )

--

# Data

Know its structure (form, shape) and conventions.

Let's first think about a simple fixed-length sequence of numbers only.

```
[1 2 3 4]
```

Suppose that in computer memory it is stored as 4 consecutive bytes, in one chunk of memory:

```
```

How do I know where this sequence begins?

Well, the address of the byte that contains its first element must be stored somewhere for me. Otherwise the sequence is lost.

What is the address of a byte in computer memory?

It is the offset of a byte from the "base of memory". Think of a ruler. Zero on it is the base.

What is an offset? It is just the number of a byte: first, second, third, and so on, starting from zero. Repeat after me: starting from zero.

But why? Humans start counting from one - there is no zeroth finger.

Well, there is a reason for it. An offset is not just the number of a byte, starting from zero. It is a coefficient that I multiply by the size of an element of the sequence to get the element's position relative to the beginning of the sequence. A long sentence, I know.

Suppose the size of a number is just 1 byte. Then, to get the address of the n-th number of a sequence of numbers, I do:

```
(+ base (* (- n 1) (size-of byte)))
```

The offset of the first element is zero. So, after multiplication I get zero. Base plus zero is base.

It is an address (or an offset) of the first element.

The address of the second element is base plus one, of the third base plus two, and so on.
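As a rough sketch in C rather than the Lisp-style notation used here, the same arithmetic might look like the following. The function name `nth_address` and its parameters are made up for illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>

/* Address of the n-th element, counting positions from 1 as in the text.
   `base` is the address of the first element and `size` is the size of
   one element in bytes (both names are hypothetical). */
const unsigned char *nth_address(const unsigned char *base,
                                 size_t n, size_t size) {
    assert(n >= 1);                /* positions are 1-based */
    return base + (n - 1) * size;  /* the offset is (n - 1) * size */
}
```

With `unsigned char seq[] = {1, 2, 3, 4};`, calling `nth_address(seq, 1, 1)` gives back `seq` itself, because the offset of the first element is zero.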

This is why in computer science we start counting from zero. Always. Just accept it. There is a deep reason for it.

What if I do not know the size of my sequence in advance? It means that I can't ask for the element at offset 10, because there may be only five of them, or seven.

If I tried to read the byte at offset 10, I would get just some number that happens to be in that byte and has nothing to do with my structure.

The answer is, I must have something special to mark the end-of-sequence. It must be a unique thing, which cannot be part of the sequence itself. Very difficult.

But, look, the base address of a sequence and the address of its first element are the same number. Offset 0 is the same as the base address. So, why not use 0 to mark the end-of-sequence?

This means we can have any numbers in a sequence, except zero, anything but zero.

What can a number represent? Generally two things - an offset or a code for something.

Code, by the way, is just an abstract term, so, for example, an offset (of another number) is a code for it.

Another word (a synonym) for an offset is an address.

So, for sequences of addresses (offsets) it is OK, because offset 0 is the same as base. For representing sequences of characters it is also fine, just avoid 0.

The only problem is a sequence of arbitrary numbers. Well, there is no such thing as an arbitrary-length sequence of arbitrary numbers, including zero. But it works for almost everything else - offsets, characters, etc.

So, this is how zero became a universal marker for the end-of-sequence.

How do I know the length of an arbitrary-length sequence? I will count until I hit the end-of-sequence mark.

This means I check the content of each consecutive byte, until I get zero.
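The counting can be sketched in C like this. It assumes one-byte elements and a 0 byte as the end-of-sequence mark. The name `sequence_length` is hypothetical. C strings use the same convention, and the standard `strlen` function counts in the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Count the elements of a zero-terminated byte sequence by checking
   each consecutive byte until the end-of-sequence mark (0) is found. */
size_t sequence_length(const unsigned char *base) {
    assert(base != NULL);
    size_t offset = 0;
    while (base[offset] != 0) {  /* stop at the end-of-sequence mark */
        offset++;
    }
    return offset;               /* number of bytes before the mark */
}
```

For the bytes `{1, 2, 3, 4, 0}` this counts 4 elements. For `{3, 0, 7, 0}` it stops at the first zero and reports 1, which is exactly why a zero cannot be an ordinary element of such a sequence.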

These rules about the structure of the data are the essence of CS.

What is the-empty-list? It is a list-of-no-thing. Or just nothing. To represent nothing there is a symbol for it - zero. 0. So, '() is just 0.
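One possible reading of this in terms of bytes, assuming the zero-terminated layout described above, is that an empty sequence is one whose very first byte is already the end-of-sequence mark. The name `is_empty` is made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the zero-terminated sequence holds no elements, that is,
   when its first byte is already the end-of-sequence mark (0). */
bool is_empty(const unsigned char *base) {
    assert(base != NULL);
    return base[0] == 0;
}
```

So a sequence stored as the single byte `{0}` is empty, while `{5, 0}` is not.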
