Know your data

How could we make an arbitrary-length sequence?

Hey, it is just an end-of-sequence mark! Every journey begins with the first step, sort of.)


The-empty-list is a list, no surprise.

(list? '()) ; #t

Wait, it says that 0 *is* a list? Well, technically it is just the end-of-list mark. A number zero.

Logically, it is a list with no elements in it. So, it is correct.

Now, how we create a pair? By sticking something in front of the-end-of-list mark.

(cons 1 '())

'(1 . '()) ; this is what we got

It is a pair of two elements. One is the number 1, another is the-empty-list.

This is called compound data structure - something made out of atoms.)

(pair '())   ; false
(pair '(1))  ; true

'() is just an atom. '(1 . '()) is a structure.

There is how it works. We have some byte which contain zero in it. By convenience we use it as a mark for the-end-of-list.

We can access this empty list if we know its address in memory, which is just an offset of this particular byte (containing zero).

This is fine, how could we stick a new element in front of it?

Well, here is a trick. This new element, the number 1, is already stored somewhere in memory in some byte. Otherwise how could we access it. This byte (with number 1 in it) has some address, its own offset.

Now we have 1 and '() in two separate bytes somewhere in memory, how do we put them together?

The naive way would be to find an two unused consecutive bytes, store values in them and remember the address of the first byte somewhere somehow. This requires copying of values from one location to another, which, for real-world cases, requires the knowledge of the size and encoding (the type) of the values.

There is another algorithm. We shall find an unused, consecutive region of memory, big enough to hold two machine words.

A machine word is a name for the size of a register, big enough to hold any possible address (offset of a byte) in computer's memory. This is where fancy terms of 32bit or 64bit processors came from. It is just the size of a machine word, or just how big the address is.

So, we have a chunk of memory with size of two machine words - big enough to hold two addresses. In the lower (or the leftmost) word we store the address of the number 1, and put the address of the-empty-list into another. Then we remember the address of the first (lower) word as the address of the whole thing.

Why to know all this too specific stuff, all those little details? To be able to notice and realize the beauty, of course.)

What we just made in an abstract computer memory is called Pair. It is just a fixed-length structure (an array of) two addresses. Because it is of a fixed size we don't need any special values to mark its end. Knowing the address of the first byte and the structure is enough.

So, a Pair is just two numbers, which are addresses (offsets) of another numbers.

Wait, does it mean that we always have to have an address of the empty list in each pair?

Logically, yes. Technically, no. We just put zeroes in the second address, to indicate that there is nothing in it. There is, of course, the reason for doing this.

How do we make a List? Well, we make two Pairs and then link them together.

First we make a new pair, whose first word has an address of a byte holding a value, and the second word is filled with zeros.

Now we have two Pairs, each of which is a data-structure (array) of two addresses.

There is the magic:

We put the address of the old pair into the second word of the new pair, and return the address of the new pair as the address of the whole new thing.

So, a list is a sequence of zero, one, or more pairs linked together using pointers (words containing addresses).

There is a few thing to realize:

  • We allocated memory for a each new pair only once.
  • We didn't copy or overwrite any data.
  • We didn't change the old pair.
  • We know that there is another pair, because the second address is not zero.

Look at this again. It is not just some consequences of some random set of actions. This tiny set of rules and simple algorithm allow us to model and represent almost *everything* in the computer memory. Not just one-dimensional sequences, like lists or strings, but also nested lists, trees or even tables.

More than that. We almost never change the data, once it was allocated and set up. We just work with addresses.

'() ; logically, the-empty-list is an unique entity, there are many pointers to it.

Each time we do something we return an address of a data structure, containing a value instead of a value itself. Usually it is an address of the whole new thing, or an address of the old, unmodified one. Thus we almost never over-write or lose the data.

(list a b) ; return a new list, leaving a and b unchanging, whatever they are

(map square '(1 2 3)) ; returns a new list of squares, leaving original unmodified

Because we almost always re-creating a data structure anew, instead of replacing the old values, we could do different operations with the same data structure in parallel, without any locking, in any order.

These are things to realize. This is what so-called computer science is for.

There is some action..

Last modified 7 years ago Last modified on Dec 14, 2012, 5:08:03 PM
Note: See TracWiki for help on using the wiki.