To keep it simple, we can think of data structures in terms of its shape. There are
- list - a sequence of the values of the same type (an array or vector) or of any type (a dynamic list or a fixed set)
- table - of the values of the same type (matrix or multi-dimensional array) or of mixed types an associative list (alist) or an associative array or a hash table.
- tree - trees of any kind, such as binary trees, binary-search trees, red-black trees, etc.
- net - a graph or a network with any kind connections between its nodes.
They all are abstract data types, because deep down there is nothing but pointers and numbers. Pointers are mere numbers themselves.
We should use annotations for the compiler to choice appropriate built-in primitives and hardware-supported types of values. We should forget all space optimizations because memory is cheap, and use partitioning and pool-based pre-allocation instead. There is a pool for pointers, another one for indexes, another one for data, etc. The goal is to keep the most clear symbolic notation and and to transform it to simple immutable data structures based on pointers, assuming 64-bit x86-64 Long mode and SSE4.2+ instruction set.
cons: a pair of pointers - the basic building block
(a . b)
list: a sequence (could be annotated as a set, one-dimension array, etc)
(1 "two" 3 (4 5))
vecror: an array with a shape, which is actually just a pointer to a sequence of chunks of memory of the same size.
((1) (2) (3))
matrix: multi-dimension array, which is a flat sequence of memory chunks in memory.
((1 2 3) (3 4 5) (6 7 8))
associative array: two linked sets of pointers
(("a" . 1) ("b" . 2) ("c" . '(1 2 3)))
hash table: set of slots plus a hash-function to produce an offset in the index.
(("apple" . 1) ("orange" . 2) ("grapes" . 3))
a table could be "called" by sending a message to it:
(t "orange") => 2
Structure. Form. Shape.
|1D||Line||List||Pairs-of-Pairs, 1-D Array (Vector)|
|2D||Rectangle||Table||Array-of-Arrays (Matrix), Associative List or Array, Hash-Table|
|nD||Tree||Tree||B-Tree, Binary Tree, BST, RB, AVL, etc.|
|nD||Network||Graph||Acyclic, Directed graph, Undirected graph, etc.|
Computer memory is linear, a flat sequence of words.
The crucial difference between Arrays and other data structures is that Arrays are represented in memory as a continuous chunk of memory, while other data structures involving pointers (addresses of a byte in memory) to link nodes together.
So, a Nested Array (an Array Of Arrays) is one continuous chunk of memory in which the next row starts where previous row ends. There is no pointers between its elements.
With continuous chunks of memory we can use Streaming (SSE) and Vector (AVX) Extensions to efficiently implement Vector or Matrix manipulations. All the elements of an array must be of the same machine type - floats or integers.
Pair (a cons cell - cons/car/cdr)
(cons 1 2) ; => (1 . 2) (car '(1 . 2)) ; => 1 (cdr '(1 . 2)) ; => 2 '(1 . (2 . (3))) ; => (1 2 3) - cons 2 to 3 and then cons 1 to the result '(1 . (2 3)) ; => (1 2 3) - cons 1 to a list of 2 and 3 '(1 . (2 . 3)) ; => (1 2 . 3) - cons 1 to a pair of 2 and 3
Table is one or more column (a vertical sequece)
((1) ; one column, three rows (2) (3))
Table is one or more row (horisontal sequence)
((1 2 3) (4 5 6)) ; two rows, three columns ((1 2 3) ; three rows, three columns (4 5 6) (7 8 9))
Tree is a nested list of conses. car is a node, cdr is a pointer to the next node.
'(* . (2 . ((+ . (x . (y)))))) ; => (* 2 (+ x y))
y + * x 2
or more familiar
* 2 + x y