INTRODUCTION Linked
list is a chain of structs or records called nodes. Each node has at least
two members, one of which points to the next item or node in the list! These
are defined as Single Linked Lists because they only point to the next
item, and not the previous. Those that do point to both are called Doubly
Linked Lists or Circular Linked Lists.
Linked lists are among the
simplest and most common data structures, and are used to implement many
important abstract datastructures, such as stacks, queues, hash tables, symbolic expressions, skip lists, and many more.
The principal benefit of a linked list over a
conventional array is that
the order of the linked items may be different from the order that the data
items are stored in memory or on disk. For that reason, linked lists allow
insertion and removal of nodes at any point in the list, with a constant number
of operations.
Linked List Types:
Node and Pointer
Before
writing the code to build the above list, we need two data types...
•
Node The type for the nodes which will make up the body of the list.
These
are allocated in the heap. Each node contains a single client data
element
and a pointer to the next node in the list. Type: struct node
struct
node {
int
data;
struct
node* next;
};
•
Node Pointer The type for pointers to nodes. This will be the type of the
head
pointer and the .next fields inside each node. In C and C++, no
separate
type declaration is required since the pointer type is just the node
type
followed by a '*'. Type: struct node*
BuildOneTwoThree() Function
Here
is simple function which uses pointer operations to build the list {1, 2, 3}.
The
memory
drawing above corresponds to the state of memory at the end of this function.
This
function demonstrates how calls to malloc() and pointer assignments (=) work to
build
a pointer structure in the heap.
/*
Build
the list {1, 2, 3} in the heap and store
its
head pointer in a local stack variable.
Returns
the head pointer to the caller.
*/
struct
node* BuildOneTwoThree() {
struct
node* head = NULL;
struct
node* second = NULL;
struct
node* third = NULL;
head
= malloc(sizeof(struct node)); // allocate 3 nodes in the heap
second
= malloc(sizeof(struct node));
third
= malloc(sizeof(struct node));
head->data
= 1; // setup first node
head->next
= second; // note: pointer assignment rule
second->data
= 2; // setup second node
second->next
= third;
third->data
= 3; // setup third link
third->next
= NULL;
// At
this point, the linked list referenced by "head"
//
matches the list in the drawing.
return
head;
}
Basic
concepts and nomenclature
Each record of a linked list is often called an element or node.
The field of each node that contains address of the next node is usually
called the next link or next pointer. The remaining
fields may be called the data, information, value, or payload
fields.
The head of a list is its first node, and the tail is the list
minus that node (or a pointer thereto). In Lisp and some derived languages, the
tail may be called the CDR (pronounced
could-R) of the list, while the payload of the head node may be called
the
Linear and circular lists
In the last node of a list, the link field often contains a null
reference, a special value that is interpreted by programs as meaning
"there is no such node". A less common convention is to make it point
to the first node of the list; in that case the list is said to be circular
or circularly linked; otherwise it is said to be open or linear.
Simply-, doubly-, and multiply-linked lists
In a doubly-linked list, each node contains, besides the next-node link, a
second link field pointing to the previous node in the sequence. The two
links may be called forward(s) and backwards. Linked lists
that lack such pointers are said to be simply linked, or simple
linked lists.
A doubly-linked list whose nodes contain three fields: an integer value, the link forward to the next node, and the link backward to the previous node
The technique known as XOR-linking allows a doubly-linked list to be implemented using
a single link field in each node. However, this technique requires the ability
to do bit operations on addresses, and therefore may not be available in some
high-level languages.
In a multiply-linked list, each node contains two or more link
fields, each field being used to connect the same set of data records in a
different order (e.g., by name, by department, by date of birth, etc.). (While
doubly-linked lists can be seen as special cases of multiply-linked list, the
fact that the two orders are opposite to each other leads to simpler and more
efficient algorithms, so they are usually treated as a separate case.)
Linked lists vs. arrays
Array
|
Linked list
|
|
Indexing
|
Θ(1)
|
Θ(n)
|
Inserting / Deleting at end
|
Θ(1)
|
Θ(1) or Θ(n)
|
Inserting / Deleting in middle (with iterator)
|
Θ(n)
|
Θ(1)
|
Persistent
|
No
|
Singly yes
|
Locality
|
Great
|
Poor
|
Linked lists have several advantages over arrays. Insertion of an element at a specific
point of a list is a constant-time operation, whereas insertion in an array may
require moving half of the elements, or more. While one can "delete"
an element from an array in constant time by somehow marking its slot as
"vacant", an algorithm that iterates over the elements may have to
skip a large number of vacant slots.
Simply-linked linear lists vs. other lists
While doubly-linked and/or circular lists have
advantages over simply-linked linear lists, linear lists offer some advantages
that make them preferable in some situations.
For one thing, a simply-linked linear list is a recursive data structure, because it contains a
pointer to a smaller object of the same type. For that reason, many
operations on simply-linked linear lists (such as merging two lists, or enumerating the elements
in reverse order) often have very simple recursive algorithms, much simpler
than any solution using iterative commands. While
one can adapt those recursive solutions for doubly-linked and circularly-linked
lists, the procedures generally need extra arguments and more complicated base
cases.
Doubly-linked vs. singly-linked
Double-linked lists require more space per node
(unless one uses xor-linking), and their elementary operations are more
expensive; but they are often easier to manipulate because they allow
sequential access to the list in both directions. In particular, one can insert
or delete a node in a constant number of operations given only that node's
address. To do the same in a singly-linked list, one must have the previous
node's address. Some algorithms require access in both directions. On the other
hand, they do not allow tail-sharing, and cannot be used as persistent data
structures.
Circularly-linked vs. linearly-linked
A circularly linked list may be a natural option to represent arrays that
are naturally circular, e.g. for the corners of a polygon, for a pool of buffers that are used and released in FIFO order, or for a set of processes that
should be time-shared in round-robin order.
In these applications, a pointer to any node serves as a handle to the whole
list.
With a circular list, a pointer to the last node gives easy access also to
the first node, by following one link. Thus, in applications that require
access to both ends of the list (e.g., in the implementation of a queue), a
circular structure allows one to handle the sructure by a single pointer,
instead of two.
Using sentinel nodes
Sentinel node may simplify certain list operations, by ensuring that the
next and/or previous nodes exist for every element, and that even empty lists
have at least one node. One may also use a sentinel node at the end of the
list, with an appropriate data field, to eliminate some end-of-list tests. For
example, when scanning the list looking for a node with a given value x,
setting the sentinel's data field to x makes it unnecessary to test for
end-of-list inside the loop. Another example is the merging two sorted lists:
if their sentinels have data fields set to +∞, the choice of the next output
node does not need special handling for empty lists.
However, sentinel nodes use up extra space (especially in applications that
use many short lists), and they may complicate other operations (such as the
creation of a new empty list).
However, if the circular list is used merely to simulate a linear list, one
may avoid some of this complexity by adding a single sentinel node to every
list, between the last and the first data nodes. With this convention, an empty
list consists of the sentinel node alone, pointing to itself via the next-node
link. The list handle should then be a pointer to the last data node, before
the sentinel, if the list is not empty; or to the sentinel itself, if the list
is empty.
The same trick can be used to simplify the handling of a doubly-linked
linear list, by turning it into a circular doubly-linked list with a single
sentinel node. However, in this case, the handle should be a single pointer to
the dummy node itself.
Linked list operations
When manipulating linked lists in-place, care must be taken to not use values that you have invalidated in previous assignments. This makes algorithms for inserting or deleting linked list nodes somewhat subtle. This section gives pseudocode for adding or removing nodes from singly, doubly, and circularly linked lists in-place. Throughout we will use null to refer to an end-of-list marker or sentinel, which may be implemented in a number of ways.
Linearly-linked lists
Singly-linked lists
Our node data structure will have two fields. We also keep a variable firstNode
which always points to the first node in the list, or is null for an
empty list.
record Node {
data // The data being stored in the node
next // A reference to the next node, null for last node
}
record List {
Node firstNode // points to first node of list; null for empty list
}
Traversal of a singly-linked list is simple,
beginning at the first node and following each next link until we come
to the end:
node := list.firstNode
while node not null {
(do something with node.data)
node := node.next
}
The following code inserts a node after an
existing node in a singly linked list. The diagram shows how it works.
Inserting a node before an existing one cannot be done; instead, you have to
locate it while keeping track of the previous node.
function insertAfter(Node node, Node newNode) { // insert newNode after node
newNode.next := node.next
node.next := newNode
}
Inserting at the beginning of the list requires a
separate function. This requires updating firstNode.
function insertBeginning(List list, Node newNode) { // insert node before current first node
newNode.next := list.firstNode
list.firstNode := newNode
}
Similarly, we have functions for removing the
node after a given node, and for removing a node from the beginning of
the list. The diagram demonstrates the former. To find and remove a particular
node, one must again keep track of the previous element.
function removeAfter(node node) { // remove node past this one
obsoleteNode := node.next
node.next := node.next.next
destroy obsoleteNode
}
function removeBeginning(List list) { // remove first node
obsoleteNode := list.firstNode
list.firstNode := list.firstNode.next // point past deleted node
destroy obsoleteNode
}
Notice that removeBeginning() sets list.firstNode
to null when removing the last node in the list.
Since we can't iterate backwards, efficient
"insertBefore" or "removeBefore" operations are not
possible.
Appending one linked list to another can be
inefficient unless a reference to the tail is kept as part of the List
structure, because we must traverse the entire first list in order to find the
tail, and then append the second list to this. Thus, if two linearly-linked
lists are each of length n,
list appending has asymptotic time
complexity of O(n). In the
Lisp family of languages, list appending is provided by the append procedure.
Many of the special cases of linked list
operations can be eliminated by including a dummy element at the front of the
list. This ensures that there are no special cases for the beginning of the
list and renders both insertBeginning() and removeBeginning() unnecessary. In
this case, the first useful data in the list will be found at
list.firstNode.next.
Circularly-linked list
In a circularly linked list, all nodes are linked in a continuous circle,
without using null. For lists with a front and a back (such as a queue),
one stores a reference to the last node in the list. The next node after
the last node is the first node. Elements can be added to the back of the list
and removed from the front in constant time.
Circularly-linked lists can be either singly or
doubly linked.
Both types of circularly-linked lists benefit
from the ability to traverse the full list beginning at any given node. This
often allows us to avoid storing firstNode and lastNode, although
if the list may be empty we need a special representation for the empty list,
such as a lastNode variable which points to some node in the list or is null
if it's empty; we use such a lastNode here. This representation
significantly simplifies adding and removing nodes with a non-empty list, but
empty lists are then a special case.
Assuming that someNode is some node in a non-empty circular
simply-linked list, this code iterates through that list starting with someNode:
function iterate(someNode)
if someNode ≠ null
node := someNode
do
do something with node.value
node := node.next
while node ≠ someNode
Notice that the test "while node ≠ someNode" must be at the
end of the loop. If it were replaced by the test "" at the beginning
of the loop, the procedure would fail whenever the list had only one node.
This function inserts a node "newNode" into a circular linked list
after a given node "node". If "node" is null, it assumes
that the list is empty.
function insertAfter(Node node, Node newNode)
if node = null
newNode.next := newNode
else
newNode.next := node.next
node.next := newNode
Suppose that "L" is a variable pointing to the last node of a
circular linked list (or null if the list is empty). To append "newNode"
to the end of the list, one may do
insertAfter(L, newNode)
L = newNode
To insert "newNode" at the beginning of the list, one may
do
insertAfter(L, newNode)
if L = null
L = newNode
Linked
lists using arrays of nodes
Languages that do not support any type of reference
can still create links by replacing pointers with array indices. The approach
is to keep an array of records, where each record has integer fields indicating
the index of the next (and possibly previous) node in the array. Not all nodes
in the array need be used. If records are not supported as well, parallel arrays can often be used instead.
As an example, consider the following linked list
record that uses arrays instead of pointers:
record Entry {
integer next; // index of next entry in array
integer prev; // previous entry (if double-linked)
string name;
real balance;
}
By creating an array of these structures, and an integer variable to store
the index of the first element, a linked list can be built:
integer listHead;
Entry Records[1000];
Links between elements are formed by placing the array index of the next (or
previous) cell into the Next or Prev field within a given element. For example:
Index
|
Next
|
Prev
|
Name
|
Balance
|
0
|
1
|
4
|
Jones, John
|
123.45
|
1
|
-1
|
0
|
Smith, Joseph
|
234.56
|
2 (listHead)
|
4
|
-1
|
Adams, Adam
|
0.00
|
3
|
Ignore, Ignatius
|
999.99
|
||
4
|
0
|
2
|
Another, Anita
|
876.54
|
5
|
||||
6
|
||||
7
|
In the above example,
ListHead
would be set to 2, the location of the first entry in the list. Notice that
entry 3 and 5 through 7 are not part of the list. These cells are available for
any additions to the list. By creating a ListFree
integer variable, a free list could be
created to keep track of what cells are available. If all entries are in use,
the size of the array would have to be increased or some elements would have to
be deleted before new entries could be stored in the list.
The following code would traverse the list and display names and account
balance:
i := listHead;
while i >= 0 { '// loop through the list
print i, Records[i].name, Records[i].balance // print entry
i = Records[i].next;
}
When faced with a choice, the advantages of this approach include:
- The linked list is relocatable, meaning it can be moved about in memory at will, and it can also be quickly and directly serialized for storage on disk or transfer over a network.
- Especially for a small list, array indexes can occupy significantly less space than a full pointer on many architectures.
- Locality of reference can be improved by keeping the nodes together in memory and by periodically rearranging them, although this can also be done in a general store.
- Naïve dynamic memory allocators can produce an excessive amount of overhead storage for each node allocated; almost no allocation overhead is incurred per node in this approach.
- Seizing an entry from a pre-allocated array is faster than using dynamic memory allocation for each node, since dynamic memory allocation typically requires a search for a free memory block of the desired size.
This approach has one main disadvantage, however: it creates and manages a
private memory space for its nodes. This leads to the following issues:
- It increase complexity of the implementation.
- Growing a large array when it is full may be difficult or impossible, whereas finding space for a new linked list node in a large, general memory pool may be easier.
- Adding elements to a dynamic array will occasionally (when it is full) unexpectedly take linear (O (n)) instead of constant time (although it's still an amortized constant).
- Using a general memory pool leaves more memory for other data if the list is smaller than expected or if many nodes are freed.
For these reasons, this approach is mainly used for languages that do not
support dynamic memory allocation. These disadvantages are also mitigated if
the maximum size of the list is known at the time the array is created.
Language support
Many programming languages such as Lisp and Scheme have singly linked lists built in. In many functional
languages, these lists are constructed from nodes, each called a cons or cons cell. The cons has two fields: the car, a
reference to the data for that node, and the cdr, a
reference to the next node. Although cons cells can be used to build other data
structures, this is their primary purpose.
In languages that support Abstract data types
or templates, linked list ADTs or templates are available for building linked
lists. In other languages, linked lists are typically built using references
together with records.
Here is a complete example in C:
#include <stdio.h> /* for printf */
#include <stdlib.h> /* for malloc */
struct node
{
int data;
struct node *next; /* pointer to next element in list */
};
struct node *list_add(struct node **p, int i)
{
struct node *n = malloc(sizeof(struct node));
if (n == NULL)
return NULL;
n->next = *p; /* the previous element (*p) now becomes the "next" element */
*p = n; /* add new empty element to the front (head) of the list */
n->data = i;
return *p;
}
void list_remove(struct node **p) /* remove head */
{
if (*p != NULL)
{
struct node *n = *p;
*p = (*p)->next;
free(n);
}
}
struct node **list_search(struct node **n, int i)
{
while (*n != NULL)
{
if ((*n)->data == i)
{
return n;
}
n = &(*n)->next;
}
return NULL;
}
void list_print(struct node *n)
{
if (n == NULL)
{
printf("list is empty\n");
}
while (n != NULL)
{
printf("print %p %p %d\n", n, n->next, n->data);
n = n->next;
}
}
int main(void)
{
struct node *n = NULL;
list_add(&n, 0); /* list: 0 */
list_add(&n, 1); /* list: 1 0 */
list_add(&n, 2); /* list: 2 1 0 */
list_add(&n, 3); /* list: 3 2 1 0 */
list_add(&n, 4); /* list: 4 3 2 1 0 */
list_print(n);
list_remove(&n); /* remove first (4) */
list_remove(&n->next); /* remove new second (2) */
list_remove(list_search(&n, 1)); /* remove cell containing 1 (first) */
list_remove(&n->next); /* remove second to last node (0) */
list_remove(&n); /* remove last (3) */
list_print(n);
return 0;
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------
No comments:
Post a Comment