Version 2 (modified by 7 years ago) ( diff ) | ,
---|

# Data

Know its *structure* (form, shape) and *conventions*.

Lets first think about a simple *fixed-length* sequence of only numbers.

[1 2 3 4]

Suppose, in a computer memory it is stored as a 4 consecutive bytes as one chunk of memory:

[1][2][3][4]

How do I know where is the *beginning of this sequence*?

Well, there must be an *address of a byte* that contains *the first element* of it, stored somewhere for me. Otherwise it lost.

What is an *address of a byte* in computer memory?

It is an *offset* of a byte from the "base of memory". Think of a ruler. Zero on it is the base.

What is an *offset*? It is just a *number of a byte* - first, second, third, etc. *starting from zero*. Repeat after me: *starting from zero*.

But why? Humans start counting from one - there is no zero fingers.

Well, there is a reason for it. *Offset* is not just a *number of a byte, starting from zero*. It is a *coefficient* to multiply to the *size-of-an-element of a sequence* to get its position, relative of the its beginning. A long sentence, I know.

Suppose the *size of a number* is just 1 byte. Then, to get the n-th number of of a sequence of numbers I do:

(+ base (* (- n 1) (size-of byte)))

The offset of the first element if zero. So, after multiplication I get zero. Base plus zero is base.

It is an address (or an offset) of the first element.

Address of the second element is base plus 1, third - base + two, and so on.

This is why in computer science we start counting from zero. Always. Just accept it. There is a deep reason for it.

What if I do not know the *size of my sequence* in advance? It means that I can't ask *what is an element with offset 10*, because there might be only 5 of them, or 7.

If I would try to get the content of a byte at offset 10, it will be just random number which happen to be in (a value of) this particular byte, and have nothing to do with the data in my structure.

The answer is - I must have something special to mark the *end-of-sequence*. It must be an *unique identity* which cannot be a part of sequence itself. Very difficult.)

But, look, the *base address* of a sequence and the *address of its first element* are the same number - *offset 0* is the same as base address. So, why not use 0 to mark the *end-of-sequence*?

This means we can have any numbers in a sequence, *except zero*, *anything but zero*.

Lets ask another question. What a number can represent? Well, an *offset* or a *code* for something (another number).

Code, by the way, is just an abstract term, so, for example, an offset (of another number) is a *code* for it.

Another word (a *synonym*) for an *offset* is an *address*.

So, for sequences of addresses (offsets) it is OK, because *offset 0* is the same as the *base*. For representing *sequences of codes* it is also fine, just avoid 0.

The only problem is a sequence of an arbitrary numbers. Well, there is no such thing as an arbitrary-length sequence of arbitrary numbers, including zero - it is too abstract. But it works for almost everything else - offsets, codes, etc.

So, this is how zero became a universal marker for the end-of-sequence of codes.

How do I know the length of an arbitrary-length sequence? I will count until I hit the *end-of-sequence* mark.

This means I would check the content of each consecutive byte, until I got zero.

(define (length l) (if (null? l) 0 (+ 1 (length (cdr l)))))

This code is a premature enlightenment, but these *rules* and *reasoning* about the structure of the data are the essence of CS.)

Now, what is *the-empty-list*? It is a *list-of-no-thing*. Just nothing. To represent *no thing* there is a *symbol* for it - *zero*. 0. So, '() is just 0.

**Note:**See TracWiki for help on using the wiki.