Tutorial on structures in C using singly linked lists as an example

In C++, when you declare a variable to be an instance of a class, the constructor is automatically called. For example, if Single_list is a templated singly linked list class, then

void f() {
	// ...
	Single_list<int> history;
	// ...
}

declares history to be an instance of a singly linked list class storing integers. The memory for this class would be allocated by the compiler on the stack and the constructor would be called with this being assigned the memory location on the stack. When the variable history goes out of scope (in this case, at the end of the function), the compiler schedules a call to the destructor.

Alternatively, if you were to use new, you would have the following:

void f() {
	// ...
	Single_list<int> *p_history = new Single_list<int>();
	// ...
}

Here, the memory is allocated on the heap; that is, by the operating system. The operating system will find a block of memory of the appropriate size. When the constructor is called, this is assigned the address of this memory location found. The value returned by new and assigned to p_history is that address. To deallocate this memory, a call to delete p_history; must be made, at which point, the destructor will be called.

To give an example, consider the following program:

#include <vector>
#include <iostream>

int main() {
        // 'array' is an instance of a vector
        std::vector<int> array;

        std::cout << "The address of 'array' is    " << &array << std::endl;

        // 'p_array' is a pointer to a vector
        std::vector<int> *p_array = new std::vector<int>();

        std::cout << "The address of 'p_array' is  " << &p_array << std::endl;
        std::cout << "'p_array' stores the address " << p_array << std::endl;

        delete p_array;

        return 0;
}

When executed, this produces the output:

The address of 'array' is    0x7ffff28d47f0
The address of 'p_array' is  0x7ffff28d47e8
'p_array' stores the address 0x191ec010

You will note that the local variables array and p_array are stored on the stack, while the address of the memory allocated by new (on the heap) has an address much closer to the start of memory.

When the function main exits, the local variables array and p_array go out of scope. As p_array is a pointer (a primitive data type), its memory sits on the stack and may be reused by future local variables. When array goes out of scope, because it is an instance of a class, the compiler will schedule a call to the destructor right before the function exits. After that, the memory allocated for array will also sit on the stack, ready to be reused for a future local variable.

In C, memory allocation, initialization, clean-up (the destructor), and memory de-allocation each form distinct steps.

1.1 Declarations

In C++, we use classes (class) where both member variables and member functions are associated with the class. In C, we use structures (struct) where only member variables (called fields) are associated with the structure. There is no language association between the structure and any functions meant to manipulate the structure.

For example, a singly linked list class storing integers in C++ may look like:

class Single_node {
	private:
		int node_value;
		Single_node *next_node;

	public:
		int value() const;
		Single_node *next() const;
};
		
class Single_list {
	private:
		Single_node *list_head;
		Single_node *list_tail;
		int list_size;

	public:
		Single_list();
		~Single_list();

		int front() const;
		int back() const;
		void push_front( int );
		void pop_front();
};

In C, the corresponding structure is

struct single_node {
	int value;
	struct single_node *next;
};

struct single_list {
	struct single_node *head;
	struct single_node *tail;
	int size;
};

There are two things you will note:

There is no keyword public or private. All fields are public at all times.
It is always necessary to prefix the structure name with struct.

This second issue can be resolved through the use of typedef:

	typedef struct single_node single_node_t;
	typedef struct single_list single_list_t;

Now, any time you use single_list_t inside your code, the compiler will replace it with struct single_list and deal with it accordingly.

1.2 Memory allocation and de-allocation

Like C++, the compiler allocates sufficient memory on the stack for local variables declared to be of a specific struct. However, to get memory allocated on the heap, in C++, a call to new makes a request to the operating for the requisite amount of memory. In C, the function void *malloc( int n ) requests that the operating system allocates n bytes. To determine the amount of memory required by a data structure, you can use the sizeof operator:

	single_list_t *p_history = (single_list_t *) malloc( sizeof( single_list_t ) );

In C++, to deallocate the memory, the same address must be returned using the delete operator. In C, it is necessary to call the free function:

	free( p_history );
	p_history = NULL;

Note that in C, NULL takes the place of nullptr in C++.

1.3 Constructors and initialization

In C++, the constructor would initialize the class:

Single_node::Single_node( int n, Single_node *next ):
node_value( n ),
next_node( next ) {
	// Only initialization is required
}

Single_list::Single_list():
list_head( nullptr ),
list_head( nullptr ),
list_size( 0 ) {
	// Only initialization is required
}

If you are still uncomfortable with C++'s initialization list, you can consider the above to be equivalent to

Single_node::Single_node( int n, Single_node *next ) {
	node_value = n;
	next_node = next;
}

Single_list::Single_list() {
	list_head = nullptr;
	list_head = nullptr;
	list_size = 0;
}

however, this is not recommended especially when the member variables are themselves classes and not just primitive data types.

As there are no constructors in C, we must perform initialization in a separately defined function:

void single_node_init( single_node_t *node, int v, single_node_t *n ) {
	node->value = v;
	node->next_node = n;
}

void single_list_init( single_list_t *const list ) {
	list->head = NULL;
	list->tail = NULL;
	list->size = 0;
}

In this case, there is no this member variable. Instead, we must explicitly pass the address of the object we are initializing. Therefore, the following code in C++:

	Single_list history;
	Single_list *p_history = new Single_list<int>();

would now look like

	single_list_t history;
	single_list_init( &history );

	single_list_t *p_history = (single_list_t *) malloc( sizeof( single_list_t ) );
	single_list_init( p_history );

1.4 Functions for manipulating a data structure

In C++, a member function for accessing, pushing a new node onto, and popping a node from the front of the linked list may look like:

int Single_list::front() const {
	// If the list is empty, return 0
	if ( list_size == 0 ) {
		return 0;
	}

	return list_head->value();
}

int Single_list::back() const {
	// If the list is empty, return 0
	if ( list_size == 0 ) {
		return 0;
	}

	return list_tail->value();
}

void Single_list::push_front( int n ) {
	// Create a new node that stores 'n' and points to the node
	// currently stored in 'list_head'.  Assign the address of
	// this new node to 'list_head'.

	list_head = new Single_node( n, list_head );

	// If the list was empty prior to this function being called,
	// we need to have 'list_tail' point to the same node as 'list_head'.

	if ( list_size == 0 ) {
		list_tail = list_head;
	}

	// Increment the number of nodes stored in this linked list

	++list_size;
}

void Single_list::pop_front() {
	// If it is empty, do nothing

	if ( list_size == 0 ) {
		return;
	}

	// Temporarily store the address of the node currently at the head of
	// the linked list, set 'list_head' to be the address of the node
	// pointed to by the current head of the linked list, and then
	// delete the temporary node.

	Single_node *tmp = list_head;
	list_head = list_head->next_node;
	delete tmp;

	// Decrement the number of nodes stored in this linked list
	--list_size;

	// If the list is now empty, set the tail pointer to null, as well

	if ( list_size == 0 ) {
		list_tail = nullptr;
	}
}

Remember that every time you make a reference to a member variable of the same class, you have the option of explicitly indicating that it is that member variable of the object pointed to by this:

int Single_list::front() const {
	if ( this->list_size == 0 ) {
		return 0;
	}

	return this->list_head->value();
}

int Single_list::back() const {
	if ( this->list_size == 0 ) {
		return 0;
	}

	return this->list_tail->value();
}

void Single_list::push_front( int n ) {
	this->list_head = new Single_node( n, this->list_head );

	if ( this->list_size == 0 ) {
		this->list_tail = this->list_head;
	}

	++this->list_size;
}

void Single_list::pop_front() {
	if ( this->list_size == 0 ) {
		return;
	}

	Single_node *tmp = this->list_head;
	this->list_head = this->list_head->next_node;
	delete tmp;
	--this->list_size;

	if ( this->list_size == 0 ) {
		this->list_tail = nullptr;
	}
}

This member function is called explicitly on an instance of the linked list, so we may see something like

	history.push_front( 3 );
	p_history->push_front( 3 );

	std::cout << history.front() << std::endl;
	std::cout << p_history->front() << std::endl;

and inside the member function, this is assigned the address of the object on which push_front is being called. In C, however, we do not have the language association between the structure and any functions that manipulate it. Instead, we will use the convention that the address will always be passed as the first argument of such functions:

int single_list_front( single_list_t *const list ) {
	// If the list is empty, return 0
	if ( list->size == 0 ) {
		return 0;
	}

	return list->head->value;
}

int single_list_back( single_list_t *const list ) {
	// If the list is empty, return 0
	if ( list->size == 0 ) {
		return 0;
	}

	return list->tail->value;
}

void single_list_push_front( single_list_t *const list, int n ) {
	// Create a new node that stores 'n' and points to the node
	// currently stored in 'head'.  Assign the address of
	// this new node to 'head'.

	single_node_t *new_head = (single_node_t *) malloc( sizeof( single_node_t ) );
	single_node_init( new_head, n, list->head );
	list->head = new_head;

	// If the list was empty prior to this function being called,
	// we need to have 'tail' point to the same node as 'head'.

	if ( list->size == 0 ) {
		list->tail = list->head;
	}

	// Increment the number of nodes stored in this linked list
	++( list->size );
}

void single_list_pop_front( single_list_t *const list ) {
	// If it is empty, do nothing

	if ( list->size == 0 ) {
		return;
	}

	// Temporarily store the address of the node currently at the head of
	// the linked list, set 'head' to be the address of the node
	// pointed to by the current head of the linked list, and then
	// free the temporary node.

	single_node_t *tmp = list->head;
	list->head = list->head->next_node;
	free( tmp );

	// Decrement the number of nodes stored in this linked list
	--( list->size );

	// If the list is now empty, set the tail pointer to null, as well

	if ( list->size == 0 ) {
		list->tail = NULL;
	}
}

This member function is called explicitly on an instance of the linked list, so we may see something like

	single_list_push_front( &history, 3 );
	single_list_push_front( p_history, 3 );

	printf( "%d\n", single_list_back( &history ) );
	printf( "%d\n", single_list_back( p_history ) );

As you can see, while classes and structures do the same thing, the idea with a C++ class is that there is an intrinsic connection between the member variables and the member functions and programmers are prevented from accessing the member variables except through the member functions. In C structures, however, the data structure is something that is operated on by a number of functions.

1.5 Cleaning up after a data structure

Suppose an instance of a class goes out of scope. In this case, it may be necessary to clean up. This ensures that there is an opportunity to deallocate any allocated memory, to close files, etc. In C++, this is all done in the destructor:

Single_list::~Single_list() {
	// Remove all remaining nodes in the linked list

	while ( list_size != 0 ) {
		pop_front();
	}
}

The destructor should never be explicitly called: if a local variable goes out of scope, the compiler schedules a call to the destructor. If memory allocated on the heap is deallocated through a call to delete, the compiler automatically calls the destructor, too. In C, however, any clean-up must be performed in a function that is explicitly called. We will call such functions destroy:

void single_list_destroy( single_list_t *const list ) {
	// Remove all remaining nodes in the linked list

	while ( list->size != 0 ) {
		single_list_pop_front( list );
	}
}

In C++, it is not necessary to call the destructor. In C, you must place a call to destroy at the appropriate moment:

void f() {
	// ...

	single_list_t history;
	single_list_init( &history );

	single_list_t *p_history = (single_list_t *) malloc( sizeof( single_list_t ) );
	single_list_init( p_history );

	// ...

	single_list_destroy( &history );

	single_list_destroy( p_history );
	free( p_history );
	p_history = NULL;  // Not really necessary here, but useful
}

Introduction to Computer Structures and Real-time Systems