wiki:BigIdeas/AbstractDataTypes

Abstract Data Types

An abstract type is a type equipped with a set of operations on its values, which are the only operations applicable to that type.

Such type forms an Abstraction Barrier such that its representation can be changed — perhaps to a more efficient one — without affecting the rest of the program.

cons, car, cdr pair? of the type Pair is the canonical example of an ADT.

Abstractions

Appropriate abstractions (the ones that reflects certain aspects of reality, of what is) is the secret of the trade. From right abstractions comes right data-structures, and from these right algorithms follow.

Abstractions are obtained from the process of modeling aspects of reality. Modeling is a part of the scientific method.

The first principle about abstractions - they must not be disconnected from reality, from what is, like Hegelian nonsense. Abstraction means generalization, not some "higher level" bullshit from "the universe of ideas". Abstractions must be concrete. An atom is a good abstraction, sequence is good abstraction. "Higher dimensions" is bullshit.

Abstract Data Types or ADTs.

Abstract data types are the very essence of programming. In contrast to so-called "machine types" which define values according with hard-wired encoding of this or that hardware architecture, abstract data types are used by programmer to describe and represent real-world concepts and data.

The power of a programming language mostly comes from the ability to define ADTs and procedures over them.

ADTs are abstractions described in terms of a programming language, with a set of procedures "closed over them". The procedures are of three different kinds: constructors, selectors and predicates.

Values of a certain type must follow (or satisfy) a set of conventions and constraints - how values of this type are represented and interpreted.

The precise definition is - a type is defined by a set of possible operations (procedures) over its values. Constructors and selectors define representation aspect, predicates - interpretation. To understand the ideas behind data types we have to pay attention to details and spell out all the nuances of real-world phenomena we are trying to capture.

For example, we could say that a list is either an "empty list" - a list containing zero elements - nothing, emptiness, or it is a "cons cell" (or a pair) of an element and "another list". Now lets try to define these "abstract" relationships in terms of a "concrete" programming language.

In Standard ML we could write it this way:

datatype List = Empty
              | Cons of int * List

This almost literally means - a List is either Empty or it is a Cons of int and another List.

This expression happen to be a definition of so-called Algebraic Data Type but this in NOT the point. What we are studying here is more general concept of an ADT to which an Algebraic Type are just a "specialization". So, let's forget about it for a while.

Here we have defined a List of Ints. According to the semantics of Standard ML, Empty and Cons are so-called "data-constructors". Logically, these are procedures which returns a piece of data "type-tagged" with the tag List. We could say that they do wrap a value into a List-container or lift a value into the world of Lists while technically we just "tag a value with an additional marker, like (cons 'type x).

The Empty constructor requires no arguments, while Cons requires exactly two - the first one of the type Int and the second must be of type List. They both returns a value of the same type List.

Having just this we could define a value of type List:

val x = Cons (4, Cons (23, Cons (2008, Empty)))

Think about this as a nested procedure calls - Empty, then Cons of 2008 and that value returned by Empty, and then Cons 23 to the value of that, and then Cons 4 to it.

The clever idea in the ML family of languages is that they have a DSL - a sub-language for describing data types, which also defines "data-constructors" and "templates for pattern-matching" instead of "selectors", which must be explicitly defined (as ordinary procedures) by a programmer. The very same idea we will see in Haskell.

In Common Lisp we have a DSL (a bunch of macros) for defining structures, which implicitly creates "selectors", "constructors" and "helper procedures for a generalized setter" (setf macro) for a given type. The same idea, generalized.

We do not have to think about the representation, only about logic. The "data-constructors" are hiding (encapsulating) all the implementation details of how a value of the type List is represented, which does not concern us.

Now we could define the append procedure for our type List in Standard ML language:

fun list_append (xs, ys) =
    case xs of
         Empty         => ys
       | Cons (x, xs') => Cons (x, list_append (xs', ys))

And here is append for built-in lists:

fun append (xs, ys) =
    case xs of
         []     => ys
       | x::xs' => x :: append(xs', ys)

In both cases we are using pattern-matching against data-constructors ([] and :: in the case of built-in ML lists).

Now we could define a polymorphic type - a generic list - a list of any type.

datatype 'a List = Empty
                 | Cons of 'a * 'a List

where 'a is a "type-parameter", a placeholder for any other type, such as Int or Real.

In ML and Haskell all the lists must be of values of the same type. This property has a special name - homogeneous lists.

Now let's consider Scheme's built-in Lists. The naive append function would have almost the same shape:

(define (append xs ys)
  (cond ((null? xs) ys)
        (#t (cons (car xs) (append (cdr xs) ys)))))

Instead of pattern-matching which performs "type-checking" and "destructuring" (variable binding) implicitly we are explicitly using a conditional cond with the "predicate" function (null?) and "selectors" (car and cdr).

The ML's version of append is much more elegant, but to understand ADTs we have to learn to define and use "constructors", "selectors" and "predicates" for each ADT.

So, for the built-in List type there are "constructors" - cons and list, "selectors" car and cdr, "predicates" - null? and list?.

More idiomatic version of apppend:

(define (append xs ys)
  (if (null? xs)
      ys
      (cons (car xs) (append (cdr xs) ys))))

Defining a generic List

data List a = Nil | Cons a (List a)

and the naive append for built-in Lists in Haskell

append [] ys      =  ys
append [x:xs] ys  =  x : append xs ys

or

[] ++ ys      =  ys 
(x:xs) ++ ys  =  x : (xs ++ ys)
Last modified 17 months ago Last modified on Nov 3, 2018, 5:44:45 AM
Note: See TracWiki for help on using the wiki.