Linked Lists - Advanced Programming Techniques (2011, 2013)

Advanced Programming Techniques (2011, 2013)

3. Linked Lists

33

A linked list is a flexible abstract data structure that is useful for relatively short lists where items are frequently added and removed from the list. Figure 9 shows a non-circular singly linked list that contains three nodes. Because of today’s modern programming libraries such as the C++ standard template library (STL) and the Java collections found in the package java.util, few programmers need to write a linked list. However, there are still some programmers that are required to write them, most notably students, library developers, and kernel developers.

A non-circular singly linked list containing three nodes

Figure 9: A non-circular singly linked list containing three nodes.

A linked list is made up of connected nodes. In C and C++ the nodes are connected using pointers, and in Java and other languages that don’t have pointers, the nodes are connected using references. The example code in this chapter is written in C because example 3 in this chapter uses the address of pointers or in other words, pointers to pointers which is not possible in Java because Java doesn’t have pointers.

Inefficient Node Centric Lists

Most programmers would consider writing a linked list, singly or doubly linked, circular or non-circular, as trivial. Unfortunately, almost all programmers write linked lists inefficiently and incorrectly because they are never taught the correct way to write one. Computer programmers are almost always taught to visualize a linked list in a node centric way. In other words, they are taught to focus on the nodes when writing code to add, find, and remove elements from a linked list, which results in C code for a non-circular singly linked list as shown in Figure 10 and example 1.

34

LinkedNode

–next : LinkedNode *
which : int

+createNode(which : int) : LinkedNode *
+freeNode(node : LinkedNode *) : void

LinkedList

–head : LinkedNode *
–tail : LinkedNode *

+createList() : LinkedList *
+freeList(list : LinkedList *) : void
+listIsEmpty(list : LinkedList *) : int
+getNode(list : LinkedList *, index : int) : LinkedNode *
+findNode(list : LinkedList *, key : int) : LinkedNode *
+insertNode(list : LinkedList *, node : LinkedNode *) : void
+appendNode(list : LinkedList *, node : LinkedNode *) : void
+removeFirst(list : LinkedList *) : LinkedNode *
+removeNode(list : LinkedList *, node : LinkedNode *) : void

Figure 10: A UML class diagram for a node centric singly linked list.

Example 1

typedef struct slnode {

struct slnode *next;

/* Programmer defined data goes here. */

int which;

} LinkedNode;

/* Creates and returns a singly-linked node. */

LinkedNode *createNode(int which) {

LinkedNode *node = malloc(sizeof(*node));

node->next = NULL;

node->which = which;

return node;

}

/* Frees a node. */

void freeNode(LinkedNode *node) {

node->next = NULL;

free(node);

}

typedef struct sllist {

LinkedNode *head, *tail;

} LinkedList;

/* Creates and initializes a

* non-circular, singly-linked list. */

LinkedList *createList(void) {

LinkedList *list = malloc(sizeof(*list));

list->head = NULL;

list->tail = NULL;

return list;

}

35

/* Frees all the nodes in this list. */

void freeList(LinkedList *list) {

LinkedNode *prev, *curr = list->head;

while ((prev = curr) != NULL) {

curr = curr->next;

freeNode(prev);

}

list->head = NULL;

list->tail = NULL;

free(list);

}

/* Returns true if this list is

* empty; otherwise returns false. */

int listIsEmpty(const LinkedList *list) {

return list->head == NULL;

}

/* Returns a pointer to the node in this list at

* location index or NULL if no such node exists. */

LinkedNode *getNode(const LinkedList *list, int index) {

LinkedNode *curr = list->head;

while (curr != NULL) {

if (index == 0) {

break;

}

--index;

curr = curr->next;

}

return curr;

}

/* Returns a pointer to the node in this list that

* contains key or NULL if no such node exists. */

LinkedNode *findNode(const LinkedList *list, int key) {

LinkedNode *curr = list->head;

while (curr != NULL) {

if (curr->which == key) {

break;

}

curr = curr->next;

}

return curr;

}

/* Inserts a node at the beginning of this list. */

void insertNode(LinkedList *list, LinkedNode *node) {

if (listIsEmpty(list)) {

list->tail = node;

}

node->next = list->head;

list->head = node;

}

36

/* Appends a node at the end of this list. */

void appendNode(LinkedList *list, LinkedNode *node) {

if (listIsEmpty(list)) {

list->head = node;

}

else {

list->tail->next = node;

}

list->tail = node;

node->next = NULL;

}

/* Removes and returns the first node from this list. */

LinkedNode *removeFirst(LinkedList *list) {

LinkedNode *first = list->head;

if (!listIsEmpty(list)) {

LinkedNode *next = first->next;

list->head = next;

if (next == NULL) {

/* There was only one node in this list. Now

* that we are removing it, this list is empty. */

list->tail = NULL;

}

first->next = NULL;

}

return first;

}

/* Removes a node from this list. */

void removeNode(LinkedList *list, LinkedNode *node) {

LinkedNode *prev = list->head;

if (prev == node) {

/* The node to be removed is at the beginning

* of the list. Remove the node. */

removeFirst(list);

}

else {

/* Traverse the list to find the node that

* comes before the one to be removed. */

while (prev != NULL) {

if (prev->next == node) {

/* We found the node, so remove it. */

LinkedNode *next = node->next;

prev->next = next;

if (next == NULL) {

/* We are removing the node at the end

* of the list, so change the tail. */

list->tail = prev;

}

node->next = NULL;

break;

}

prev = prev->next;

}

}

}

37

Notice the complexity of the appendNode and removeNode functions, especially the removeNode function which even includes a call to the removeFirst function. Because we have taken a node centric approach when writing this linked list, we must write if statements to handle two special cases: when the list is empty and when the node to be removed is at the beginning of the list. The complexity of this code, especially the remove function makes it difficult to code correctly and to test c

Dummy Node

Many programmers will reduce the complexity of the append and remove functions by adding an empty dummy node at the beginning of the list. This dummy node contains no user defined data and serves only as a place holder.Figure 11 shows two non-circular singly linked lists that use dummy nodes at the beginning of the list. The first list is empty, and the second contains three nodes with data. Adding a dummy node reduces the complexity of the append and remove functions but slightly increases the complexity of the other functions, and the dummy node wastes some memory. The example code below is an implementation of a non-circular singly linked list with a dummy node at the beginning of the list. Only the functions that differ from example 1 are listed and each differing line is highlighted in bold font.

Two non-circular singly linked lists, each with a dummy node

Figure 11: Two non-circular singly linked lists, each with a dummy node.

Example 2

/* Creates and initializes a

* non-circular, singly-linked list

* with a dummy node at the beginning. */

LinkedList *createList(void) {

LinkedList *list = malloc(sizeof(*list));

list->head = createNode(-1); /* Create the dummy node. */

list->tail = list->head;

return list;

}

/* Returns true if this list is

* empty; otherwise returns false. */

int listIsEmpty(const LinkedList *list) {

return list->head->next == NULL;

}

38

/* Returns a pointer to the node in this list at

* location index or NULL if no such node exists. */

LinkedNode *getNode(const LinkedList *list, int index) {

LinkedNode *curr = list->head->next;

while (curr != NULL) {

if (index == 0) {

break;

}

--index;

curr = curr->next;

}

return curr;

}

/* Returns a pointer to the node in this list that

* contains key or NULL if no such node exists. */

LinkedNode *findNode(const LinkedList *list, int key) {

LinkedNode *curr = list->head->next;

while (curr != NULL) {

if (curr->which == key) {

break;

}

curr = curr->next;

}

return curr;

}

/* Inserts a node at the beginning of this list. */

void insertNode(LinkedList *list, LinkedNode *node) {

if (listIsEmpty(list)) {

list->tail = node;

}

LinkedNode *head = list->head;

node->next = head->next;

head->next = node;

}

/* Appends a node at the end of this list. */

void appendNode(LinkedList *list, LinkedNode *node) {

list->tail->next = node;

list->tail = node;

node->next = NULL;

}

39

/* Removes a node from this list. */

void removeNode(LinkedList *list, LinkedNode *node) {

/* Traverse the list to find the node that

* comes before the one to be removed. */

LinkedNode *prev = list->head;

LinkedNode *curr = prev->next;

while (curr != NULL) {

if (curr == node) {

/* We found the node, so remove it. */

LinkedNode *next = node->next;

prev->next = next;

if (next == NULL) {

/* We are removing the node at the end

* of the list, so change the tail. */

list->tail = prev;

}

node->next = NULL;

break;

}

prev = curr;

curr = curr->next;

}

}

/* Removes and returns the first node from this list. */

LinkedNode *removeFirst(LinkedList *list) {

LinkedNode *first = list->head->next;

if (!listIsEmpty(list)) {

removeNode(list, first);

}

return first;

}

Elegant and Efficient Pointer Centric Lists

The correct way to write a singly linked list is to visualize the list in a pointer centric way, focusing on the links (pointers) between the nodes instead of focusing on the nodes. Pointer centric thinking results in code that uses the address of the head pointer and next pointers. Such code uses no dummy node, requires less special case handling, is easier to test because it has fewer paths through the code, and executes slightly faster than node centric code.Figure 12 shows a non-circular singly linked list that contains three nodes. Figure 13 and example 3 show an implementation of a non-circular singly linked list using pointer centric code. Only the functions that differ from example 1are listed below and each differing line is highlighted in bold font.

A non-circular singly linked list

Figure 12: A non-circular singly linked list.

40

LinkedList

–head : LinkedNode *
–tailp : LinkedNode **

+createList() : LinkedList *
+freeList(list : LinkedList *) : void
+listIsEmpty(list : LinkedList *) : int
+getNode(list : LinkedList *, index : int) : LinkedNode *
+findNode(list : LinkedList *, key : int) : LinkedNode *
+insertNode(list : LinkedList *, node : LinkedNode *) : void
+appendNode(list : LinkedList *, node : LinkedNode *) : void
+removeFirst(list : LinkedList *) : LinkedNode *
+removeNode(list : LinkedList *, node : LinkedNode *) : void

Figure 13: A UML class diagram for a pointer centric singly linked list.

Example 3

typedef struct sllist {

LinkedNode *head, **tailp;

} LinkedList;

/* Creates and initializes a

* non-circular, singly-linked list. */

LinkedList *createList(void) {

LinkedList *list = malloc(sizeof(*list));

list->head = NULL;

list->tailp = &list->head;

return list;

}

/* Frees all the nodes in this list. */

void freeList(LinkedList *list) {

LinkedNode *prev, *curr = list->head;

while ((prev = curr) != NULL) {

curr = curr->next;

freeNode(prev);

}

list->head = NULL;

list->tailp = NULL;

free(list);

}

/* Inserts a node at the beginning of this list. */

void insertNode(LinkedList *list, LinkedNode *node) {

if (listIsEmpty(list)) {

list->tailp = &node->next;

}

node->next = list->head;

list->head = node;

}

/* Appends a node at the end of this list. */

void appendNode(LinkedList *list, LinkedNode *node) {

*list->tailp = node;

list->tailp = &node->next;

node->next = NULL;

}

41

/* Removes a node from this list. */

void removeNode(LinkedList *list, LinkedNode *node) {

/* Traverse the list to find the next pointer of the

* node that comes before the one to be removed. */

LinkedNode *curr, **pnp = &list->head;

while ((curr = *pnp) != NULL) {

if (curr == node) {

/* We found the node, so remove it. */

*pnp = node->next;

if (list->tailp == &node->next) {

/* We are removing the node at the end

* of the list, so change the tail. */

list->tailp = pnp;

}

node->next = NULL;

break;

}

pnp = &curr->next;

}

}

/* Removes and returns the first node from this list. */

LinkedNode *removeFirst(LinkedList *list) {

LinkedNode *first = list->head;

if (first != NULL) {

removeNode(list, first);

}

return first;

}

Notice that the appendNode and removeNode functions are much less complex when using a pointer centric approach (example 3). You may be thinking, “The node centric approach (example 1) would not be so complex if you used a circular list or if you wrote it in C++ instead of C.” Try it. No matter what you try, if you need to implement a singly linked list with append and remove functions, the pointer centric approach (example 3) will always be less complex.

It is helpful to see why the pointer centric approach is less complex. Notice within the pointer centric LinkedList structure that tailp is a double pointer so that it can hold the address of the head pointer or next pointer within the last node in the list. Consider this C code which creates a linked list and appends three nodes to it.

Example 4

1 int main(void) {

2 LinkedList *list = createList();

3 appendNode(list, createNode(1));

4 appendNode(list, createNode(2));

5 appendNode(list, createNode(3));

6 freeList(list);

7 return 0;

8 }

42

Memory
Address

Variable
Name

Value

0x2000

list->head

0x0

0x2008

list->tailp

0x2000

Figure 14: A representation of the contents of the computer’s memory after line 2 inexample 4 has been executed.

Figure 14 shows a representation of the linked list in the computer’s memory after line 2 is executed. Notice the linked list has been created but is empty because list->head is NULL andlist->tailp points to list->head.

Memory
Address

Variable
Name

Value

0x2000

list->head

0x2010

0x2008

list->tailp

0x2010

0x2010

node1->next

0x0

0x2018

node1->which

1

Figure 15: A representation of the contents of the computer’s memory after one node has been added to the list.

Figure 15 shows the linked list after line 3 has been executed and one node has been added to the list. list->head now points to node1, and list->tailp points to node1->next, which is where the next node will be added. This is what makes the appendNode function so simple to write. When node2 is appended to the list, the first line of code in appendNode:

*list->tailp = node;

changes the value in node1->next to point at node2. Then the second line in appendNode:

list->tailp = &node->next;

changes tailp to point at node2->next in preparation for another node to be added to the list. This line of code with the ampersand (&) deserves some explanation. The ampersand is theaddress-of operator, so the line is assigning to tailp the address of the next field within the last node. This assignment does not require the computer to read any data from memory. Instead it requires the computer to compute the address of the next field within a node by simply adding the offset of the next field to the address of the start of its node. However, in all the example code in this chapter, the offset of the next field is 0 bytes within theLinkedNode structure, and so the line:

list->tailp = &node->next;

doesn’t even require addition. For the computer it is a simple assignment like this:

list->tailp = (LinkedNode **)node;

It is interesting to learn that the C statement without the address-of operator requires the computer to do more work than the C statement that includes the address-of operator. The statement without the address-of operator:

curr = curr->next;

requires the computer to

43

1. Add the offset of next (0 in this example) to the address stored in curr.

2. Read from memory the address stored at the address computed in step 1.

3. Store that address in curr.

The statement with the address-of operator:

pnp = &curr->next;

requires the computer to

1. Add the offset of next (0 in this example) to the address stored in curr.

2. Store that address in pnp.

Memory
Address

Variable
Name

Value

0x2000

list->head

0x2010

0x2008

list->tailp

0x2020

0x2010

node1->next

0x2020

0x2018

node1->which

1

0x2020

node2->next

0x0

0x2028

node2->which

2

Figure 16: A representation of the contents of the computer’s memory after two nodes have been added to the list.

Figure 16 shows the linked list after line 4 has been executed and two nodes have been added to the list. Notice that list->tailp now contains the address of node2->next. In other words,list->tailp points to node2->next.

Memory
Address

Variable
Name

Value

0x2000

list->head

0x2010

0x2008

list->tailp

0x2030

0x2010

node1->next

0x2020

0x2018

node1->which

1

0x2020

node2->next

0x2030

0x2028

node2->which

2

0x2030

node3->next

0x0

0x2038

node3->which

3

Figure 17: A representation of the contents of the computer’s memory after three nodes have been added to the list.

Figure 17 shows the linked list after line 5 has been executed and all three nodes have been added to the list. list->head still points to node1. Of course node1->next points to node2 andnode2->next points to node3. list->tailp points to node3->next which is where the next node will be added to the list.

It is important to remember that the nodes of a linked list do not have to be allocated and stored sequentially in memory as they are in this simple example. Instead the nodes may be scattered throughout memory in a random order, but the next pointers link the nodes together in the order that they were added to the list.

44

Other Data Structures with Pointers

Pointer centric code also works well with other data structures that use pointers. Consider this node centric C code to add a node to a binary search tree. Notice the special case code that is executed when a tree is empty. Special case code is the hallmark of node centric code.

Example 5

/** Adds a node to a binary search tree. */

void addNode(BinaryTree *tree, BinaryNode *node) {

BinaryNode *curr = tree->root;

if (curr == NULL) {

/* This tree is empty so add node as its root. */

tree->root = node;

}

else {

BinaryNode *parent;

int cmp;

/* Find within this tree, the node that

* should be the parent of node. */

do {

parent = curr;

cmp = node->which - parent->which;

if (cmp < 0) {

curr = parent->left;

}

else if (cmp > 0) {

curr = parent->right;

}

else {

return; /* Don't allow duplicate nodes. */

}

} while (curr != NULL);

/* parent now points to the node that should be

* the parent of node. So just add node. */

if (cmp < 0) {

parent->left = node;

}

else if (cmp > 0) {

parent->right = node;

}

}

node->left = node->right = NULL;

}

Now consider this pointer centric code that also adds a node to a binary search tree. Notice how much shorter and simpler it is than the node centric code in the previous example.

45

Example 6

/** Adds a node to a binary search tree. */

void addNode(BinaryTree *tree, BinaryNode *node) {

BinaryNode *curr, **p = &tree->root;

while ((curr = *p) != NULL) {

int cmp = node->which - curr->which;

if (cmp < 0) {

p = &curr->left;

}

else if (cmp > 0) {

p = &curr->right;

}

else {

return; /* Don't allow duplicate nodes. */

}

}

/* p now points to the pointer where node

* must be added. So just add node. */

*p = node;

node->left = node->right = NULL;

}

Doubly Linked List

The most efficient way to write a doubly linked list seems to be to use a dummy node at the beginning of the list. This dummy node holds no user defined data, but instead holds pointers to the first and last nodes in the list. When the list is empty, both of these pointers point to the dummy node itself. Figure 18 shows two non-circular doubly linked lists. The first list is empty, and the second contains three nodes with data. Figure 19 shows a UML class diagram for a doubly linked list and its nodes, and the C code is listed in example 7.

Two non-circuclar doubly linked lists, each with a dummy node

Figure 18: Two non-circuclar doubly linked lists, each with a dummy node.

46

LinkedNode

–prev : LinkedNode *
–next : LinkedNode *
which : int

+createNode(which : int) : LinkedNode *
+freeNode(node : LinkedNode *) : void

LinkedList

–head : LinkedNode *

+createList() : LinkedList *
+freeList(list : LinkedList *) : void
+listIsEmpty(list : const LinkedList *) : int
+getNode(list : LinkedList *, index : int) : LinkedNode *
+findNode(list : LinkedList *, key : int) : LinkedNode *
+insertNode(list : LinkedList *, node : LinkedNode *) : void
+appendNode(list : LinkedList *, node : LinkedNode *) : void
+removeNode(list : LinkedList *, node : LinkedNode *) : void

Figure 19: A UML class diagram for a doubly linked list.

Example 7

typedef struct dlnode {

struct dlnode *prev;

struct dlnode *next;

/* Programmer defined data goes here. */

int which;

} LinkedNode;

/* Creates and returns a doubly-linked node. */

LinkedNode *createNode(int which) {

LinkedNode *node = malloc(sizeof(*node));

node->prev = node->next = NULL;

node->which = which;

return node;

}

/* Frees a node. */

void freeNode(LinkedNode *node) {

node->prev = node->next = NULL;

free(node);

}

typedef struct dllist {

/* head points to a dummy node at

* the beginning of the list. */

LinkedNode *head;

/* These two pointers are the dummy

* node at the beginning of the list. */

LinkedNode *prev;

LinkedNode *next;

} LinkedList;

47

/* Creates and initializes a doubly-linked list. */

LinkedList *createList(void) {

LinkedList *list = malloc(sizeof(*list));

/* Fool the computer into thinking the dummy node

* is a separate structure even though it is

* actually stored as part of the list itself. */

list->head = (LinkedNode *)&list->prev;

/* Initialize the list to be empty by making the

* first and last node pointers within the dummy

* node point to the dummy node itself. */

LinkedNode *head = list->head;

head->prev = head->next = head;

return list;

}

/* Frees all the nodes in this list. */

void freeList(LinkedList *list) {

LinkedNode *head = list->head;

LinkedNode *prev, *curr = head->next;

while ((prev = curr) != head) {

curr = curr->next;

freeNode(prev);

}

list->head = NULL;

free(list);

}

/* Returns true if this list is

* empty; otherwise returns false. */

int listIsEmpty(const LinkedList *list) {

LinkedNode *head = list->head;

return head->next == head;

}

/* Returns a pointer to the node in this list at

* location index or NULL if no such node exists. */

LinkedNode *getNode(const LinkedList *list, int index) {

LinkedNode *head = list->head;

LinkedNode *curr = head->next;

while (curr != head) {

if (index == 0) {

return curr;

}

--index;

curr = curr->next;

}

return NULL;

}

48

/* Returns a pointer to the node in this list that

* contains key or NULL if no such node exists. */

LinkedNode *findNode(const LinkedList *list, int key) {

LinkedNode *head = list->head;

LinkedNode *curr = head->next;

while (curr != head) {

if (curr->which == key) {

return curr;

}

curr = curr->next;

}

return NULL;

}

/* Inserts a node at the beginning of this list. */

void insertNode(LinkedList *list, LinkedNode *node) {

LinkedNode *head = list->head;

node->prev = head;

node->next = head->next;

head->next->prev = node;

head->next = node;

}

/* Appends a node at the end of this list. */

void appendNode(LinkedList *list, LinkedNode *node) {

LinkedNode *head = list->head;

node->prev = head->prev;

node->next = head;

head->prev->next = node;

head->prev = node;

}

/* Removes a node from this list. */

void removeNode(LinkedList *list, LinkedNode *node) {

/* Ensure node is actually in this list. */

if (findNode(list, node->which) != NULL) {

node->prev->next = node->next;

node->next->prev = node->prev;

node->prev = node->next = NULL;

}

}

49

Linked List Comparison

The following table compares the four different styles of linked lists presented in this chapter. The speed measurement in the table is for a test program that repeatedly calls insertNode, appendNode, and removeNode. The complexity column shows the cyclomatic complexity for four functions from each of the linked lists.

Comparison of Linked List Data Structures

Example

Name

Relative
Speed
(1 is best)

Dummy
Nodes

Complexity

1

Node centric, singly-linked list

1.08

0

insertNode

2

appendNode

2

removeFirst

3

removeNode

5

total

12

2

Node centric, singly-linked list with a dummy node

1.00

1

insertNode

2

appendNode

1

removeFirst

2

removeNode

4

total

9

3

Pointer centric, singly-linked list

1.00

0

insertNode

2

appendNode

1

removeFirst

2

removeNode

4

total

9

7

Doubly-linked list

1.11

1

insertNode

1

appendNode

1

removeFirst

2

removeNode

2

total

6

From the table, we see that the pointer centric singly-linked list is only slightly faster than the node centric singly-linked list and the doubly-linked list, so the performance difference is probably insignificant in most programs. However, the reduced complexity of the pointer centric, singly-linked list is a sufficient reason to switch from the node centric to the pointer centric. It is also interesting to see that the complexity of the doubly-linked list is even lower than the pointer centric singly-linked list. If you are not worried about the extra memory required for a doubly-linked list, it may be the best choice.

50

Programming Exercises

1. Read about the cyclomatic complexity metric and count the complexity of the freeList and findNode functions as written in the node centric singly-linked list code.

2. Use the pointer centric singly-linked list code in this chapter to help you write a singly-linked list that uses a string (const char *) as the key for each node instead of an integer.

ompletely.