wiki:Lists

Lists

A single-linked list, in my opinion, is the most fundamental data-structure in whole CS. It uses two fundamental concepts - a pointer - an address (offset) of another byte in memory, and recursion - List is the simplest (and hence fundamental) recursive data-structure.

It is also asymmetrical (in terms of access times) data-structure - unlike an Array the "complexity" of adding an element to the "head" of a single-linked list is of O(1), while adding to the "tail" of a list is of O(n) (because we have to traverse the whole spine of a List.

Understanding Lists, like everything in CS, is full of subtleties, depending on which language or library we are using. We will focus on a few mostly-functional languages.

List type and related functions

First of all, no matter what other people are saying, List is nothing but Abstract Data Type which is, like any other ADT is defined by the set of possible operations over values of a given type . These operations are implemented as procedures of different kind - constructors, selectors and predicates.

For a single-linked list, which we will call List there are two constructors cons and list.

The primary constructor cons is usually an tightly coupled with a language runtime system, so we won't show the code here.

The list procedure - a constructor - in Scheme (and other Lisps) is a piece of beauty

(lambda x x)

From here all the magic and power of Lisps are coming from. We are saying All you need is Lambda. Not just that, we also need a List structure to represent our programs as Lists. Thus, we need heterogeneous Lists, because our expressions could contain different kind of values. Mark it - heterogeneous. This is the source of power.

Well, no static type-checker could ensure that each element of a List is of the same type, because we don't have such restriction. Some people would tell that they have a "safe lists" in their languages, because it type-checks and they have some option type, which represent "Nothing", or "NONE", and thus it also type-checks. This notion of "safety" is Just Nonsense.

In Lisps values are typed (type-taged) not variables. Technically speaking, there are NO variables in Lisps, only bindings - associations between a symbol and type-tagged value in a frame of environment. Thus, we could have any kind of aggregates, including a List - the most fundamental one.

Having lambda as the basic building block, list-structure as common representation for code and data and list-aware read function, we could define a constructor for List so elegantly.

1 ]=> (define list (lambda x x))

;Value: list

1 ]=> (list)

;Value: ()

1 ]=> (list 1 2)

;Value 13: (1 2)

1 ]=> (list '+ 2 3)

;Value 14: (+ 2 3)

Notice how we have created another expression - (+ 2 3) by making a List of a symbol and two self-evaluating numbers. Expressions is Lisps are Lists of symbols, numbers and other compound values. Procedures are mere values.

There are two "selectors" - car and cdr. In other languages they could be called hd and tl or first and rest. We prefer traditional, historic names, especially because we could say cadr or cddr, which is a small miracle in itself.

1 ]=> (car '(1 2 3))

;Value: 1

1 ]=> (cdr '(1 2 3))

;Value 13: (2 3)

1 ]=> (cadr '(1 2 3))

;Value: 2

1 ]=> (cddr '(1 2 3))

;Value 14: (3)

As intuition telling us, (cadr x) is mere (car (cdr x)) and (cddr x) is (cdr (cdr x)). Beautiful.

A List could be empty. That means it has 0 elements. We use an appropriate notation '() - see? No elements in it.

Is it a List?

1 ]=> (list? '())

;Value: #t

Sure.

Is it empty?

1 ]=> (null? '())

;Value: #t

It is.

Why we are so pedantic with this boring stuff? Because it is the essential to understand that a List type is completely defined by these procedures cons, list, car, cdr, list? and null?. These procedures encapsulate all the implementation details and define the behavior of a List. No more, no less.

Now repeat after me: a value of type List is returned by one of the "constructors" - cons or list, or implicitly created via list comprehension, like '(1 2 3) - the read procedure will create a value of type List containing 1, 2 and 3.

In other languages, such as Haskell or Standard ML a List type could be defined as an Algebraic Data Type

data [] a = [] | a : [a]

List in Haskell are homogeneous - all the values of a List must be of the same type. This is a huge difference.

The definition above could be read as "a value of type List is either an empty - [] or it is a cons - : of a value and another List of values of the same type". This is a recursive definition. This is an either-of type. This is boring.

Nevertheless the only way to create a value of type List is to call an appropriate constructor, like : which stands for cons in Haskell, or [] which constructs an empty list.

λ> let x = 1:2:3:[]
λ> sum x
6

There are two "constructors" [] and : - for an empty list and for consing a value to a List. In Haskell we could pattern-match on data-constructors.

append [] ys      =  ys
append [x:xs] ys  =  x : append xs ys

Pattern Matching is a clever technique, which combines "testing-for-the-same-shape of data" and "creating of local binding for the pieces of data upon success". It is a clever alternative of using "selectors" for bindings.

In Scheme we have to explicitly use "selectors"

(define (append xs ys)
    (if (null? xs)
        ys
        (cons (car xs)
              (append (cdr xs) ys))))

which is more verbose and less elegant. But. We could "see" the very essence of using an ADT, whole "machinery", all the details, how everything - behavior and implementation - is encapsulated by a few procedures which together define an ADT. Some-times it is a good trade-off for elegance.

Here is a good part. Since we have type-tagging of values (constructors do allocate memory for values and stick tags to them), universal list-structure for code and data, and the read procedure - we have macros. With macros we could implement a pattern-matching DSL and embed it into a Lisp. Hundreds of people have done it.

Here is append procedure in Erlang

append([], L) -> L;
append([H|T], L) -> [H|append(T, L)].

Same pattern-matching, different syntax.

"Classic" list procedures

Scheme is elegant

(define (member? x xs)
   (if (null? xs)
       #f
       (or (= x (car xs)) (member? x (cdr xs)))))

Haskell is beautiful, as usual.

elem _ []       =  False
elem x (y:ys)   =  x == y || elem x ys

or

elem x  =  any (== x)

member = elem

Erlang is pragmatic

member(X, [X|_]) -> true;
member(X, [_|Y]) -> member(X, Y);
member(X, []) -> false.

Scheme

(define (reverse xs)
   (let loop ((ys '()) (xs xs))
        (if (null? xs)
            ys
            (loop (cons (car xs) ys) (cdr xs)))))

which could be "abstracted out"

(define (foldl f a xs)
    (if (null? xs)
        a
        (foldl f (f a (car xs)) (cdr xs))))

(define (flip f)
   (lambda (x y) (f y x)))

(define (reverse xs)
   (foldl (flip cons) '() xs))

Haskell with a helper procedure

reverse l =  loop l []
  where
    loop []     a = a
    loop (x:xs) a = loop xs (x:a)

or

foldl f a  =  go
                where
                   go a []      =  a
                   go a (x:xs)  =  go (f a x) xs

flip f x y  =  f y x

reverse  =  foldl (flip (:)) []

Erlang

reverse([] = L) -> L;
reverse([_] = L) -> L;
reverse([A, B]) ->[B, A];
reverse([A, B | L]) ->reverse(L, [B, A]).

reverse([H|T], Y) -> reverse(T, [H|Y]);
reverse([], X) -> X.
Last modified 5 years ago Last modified on Oct 31, 2014, 8:18:30 PM
Note: See TracWiki for help on using the wiki.